ValueError: logits and labels must have the same shape ((None, 10) vs (None, 1))

Problem description

I'm new to TensorFlow, and I'm trying to build a simple model that outputs the probability of an install (the install column).

Here is a subset of the dataset:

{'A': {0: 12, 2: 28, 3: 26, 4: 9, 5: 36},
 'B': {0: 10, 2: 17, 3: 22, 4: 2, 5: 31},
 'C': {0: 1, 2: 0, 3: 5, 4: 0, 5: 1},
 'D': {0: 5, 2: 0, 3: 0, 4: 0, 5: 0},
 'E': {0: 12, 2: 1, 3: 4, 4: 3, 5: 1},
 'F': {0: 12, 2: 2, 3: 14, 4: 9, 5: 11},
 'install': {0: 0, 2: 0, 3: 1, 4: 0, 5: 0},
 'G': {0: 21, 2: 12, 3: 8, 4: 13, 5: 19},
 'H': {0: 0, 2: 5, 3: 1, 4: 6, 5: 5},
 'I': {0: 21, 2: 22, 3: 5, 4: 10, 5: 20},
 'J': {0: 0.0, 2: 136.5, 3: 0.0, 4: 0.1, 5: 29.5},
 'K': {0: 0.15220949263502456,
  2: 0.08139534883720931,
  3: 0.15625,
  4: 0.15384584755440725,
  5: 0.04188829787234043},
 'L': {0: 649, 2: 379, 3: 531, 4: 660, 5: 242},
 'M': {0: 0, 2: 0, 3: 0, 4: 1, 5: 1},
 'N': {0: 1, 2: 1, 3: 1, 4: 0, 5: 0},
 'O': {0: 0, 2: 1, 3: 0, 4: 1, 5: 0},
 'P': {0: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'Q': {0: 1, 2: 0, 3: 1, 4: 0, 5: 1}}
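
For reference, this dict can be loaded into the df used below (the name data is only for illustration; it stands for the dict literal above):

import pandas as pd

# Assumption: `data` is bound to the dict shown above.
df = pd.DataFrame(data)
print(df.shape)  # (5, 18): 5 rows, 17 feature columns plus the 'install' target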

Here is the code I'm working with:

import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split

X = df.drop('install', axis=1) # data
y = df['install'] # target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.3)

X_train = ss.fit_transform(X_train)
X_test = ss.fit_transform(X_test)

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='softmax'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(10)
])

loss = keras.losses.BinaryCrossentropy(from_logits=True)
optim = keras.optimizers.Adam(lr=0.001)
metrics = ["accuracy"]

model.compile(loss=loss, optimizer=optim, metrics=metrics)

batch_size = 32
epoch = 5
model.fit(X_train, y_train, batch_size=batch_size, epochs=epoch, shuffle=True, verbose=1)

Can you help me understand the error? I know the problem has to do with the shapes of my X and y.

Tags: python, tensorflow, keras, deep-learning

Solution


Note: you haven't specified which class the `ss` object belongs to, so I will discuss the code with everything related to it removed.

First, let's talk about your target: the install column. Based on its values, I assume your problem is binary classification, i.e. predicting 0 or 1, and you want the probabilities of each.

To do this, you should define your model as follows.

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(2, activation='softmax')
])

'''
Note: I have changed the activation of the first `Dense` layer from
`softmax` to `relu`, as `softmax` is not ideal for inner layers: it greatly
reduces the information carried by each node. Having 'softmax' there does
not cause a syntax error, but it is methodologically wrong.

The next major change is reducing the number of units in the last `Dense`
layer from 10 to 2. What you want is the probability of the target being
either 0 or 1. If the output of your model is `[a, b]`, where a is a value
corresponding to 0 and b a value corresponding to 1, then you can turn them
into probabilities using the 'softmax' activation. Without an activation,
the raw values are called 'logits'.
'''
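
# A quick hedged illustration (the values below are made up, not model
# output): softmax turns a pair of logits [a, b] into probabilities that
# sum to 1.
example_logits = tf.constant([[2.0, -1.0]])    # hypothetical logits [a, b]
print(tf.nn.softmax(example_logits).numpy())   # ~[[0.9526, 0.0474]], row sums to 1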

# Now you have to change your loss function as below
loss = tf.keras.losses.SparseCategoricalCrossentropy()  # from_logits=False (default), since the model now outputs probabilities

# The rest is the same. Now we run a dummy trial of the model after training it with your code.
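
# A minimal sketch of "the rest" (assuming the X_train/y_train and the
# batch_size/epochs from the question):
model.compile(loss=loss,
              optimizer=keras.optimizers.Adam(learning_rate=0.001),
              metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=32, epochs=5, shuffle=True, verbose=1)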

preds = model.predict(X_test)
preds
'''
This gives the results:
array([[9.9999726e-01, 2.7777487e-06],
       [9.5156413e-01, 4.8435837e-02]], dtype=float32)

This says the probability of sample 1 being 0 is '9.9999726e-01', i.e.
'0.999..', and of it being 1 is '2.7777487e-06', i.e. '0.00000277..', and
these gracefully sum up to 1. Same for sample 2.
'''

There is another way to do this. Since you have only one label, if you have the probability corresponding to that label, you can get the probability corresponding to the other one by subtracting it from 1. You can implement it as follows:

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(1, activation='sigmoid')
])

'''
The difference between 'softmax' and 'sigmoid' is that 'softmax' is applied
to all the units together in a unified manner, while 'sigmoid' is applied to
each unit individually. So you can say that 'softmax' operates on the
'layer' and 'sigmoid' operates on the 'units'.

The output of the 'sigmoid' is the probability of the result being 1. So the
predicted class is either 0 or 1 depending on the output probability and
some threshold, and hence we now use a different loss, BinaryCrossentropy,
as the labels are binary (either 0 or 1).
'''
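
# A hedged side-by-side (made-up values) of the per-unit vs per-layer point:
x = tf.constant([[2.0, -1.0]])
print(tf.nn.sigmoid(x).numpy())  # each unit squashed independently; no sum constraint
print(tf.nn.softmax(x).numpy())  # units coupled across the layer; row sums to 1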

loss = keras.losses.BinaryCrossentropy() # again from_logits=False (default): the sigmoid output is already a probability

# We once again train the model using the rest of the code and analyze the outputs.

preds = model.predict(X_test)
preds
'''
This gives the results:
array([[1.6424768e-13],
       [2.0349980e-06]], dtype=float32)

So for sample 1 the probability of it being '1' is '1.6424768e-13', and as
we have only '1' and '0', the probability of it being '0' is
'1 - 1.6424768e-13'. Same for sample 2.
'''
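
# A one-line sketch of the subtraction described above (the sigmoid output
# is the probability of class 1; its complement is the probability of 0):
p1 = preds
p0 = 1.0 - preds  # probability of class 0, per sample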

Now for the answer from @Mattpats. That answer also works, but in that case you will not get probabilities as output; you will get logits instead, because no activation is used on the last layer and the loss is computed on the logits by specifying from_logits=True. To get probabilities from it, you have to use it as below:

preds = model.predict(X_test)
sigmoid_preds = tf.math.sigmoid(preds).numpy()
preds, sigmoid_preds
'''
This gives the following results:
preds = array([[-51.056973],
              [-32.444508]], dtype=float32)

sigmoid_preds = array([[6.702527e-23],
                      [8.119502e-15]], dtype=float32)
'''
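
Should you then need hard class predictions rather than probabilities, a common choice is to threshold the sigmoid output (0.5 here is an assumed threshold, not from the original answer):

import numpy as np

# Assumption: 0.5 threshold; tune it if your classes are imbalanced.
class_preds = (sigmoid_preds > 0.5).astype(np.int32)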
