Model trains on only the first sample of the dataset

Problem description

I am training the following model:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=30, output_dim=64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=1024)),
    tf.keras.layers.Dense(128, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="linear")
])

The network processes text, so I turn each string in the dataset into a numpy array by mapping every character to a number:

import numpy as np

def converter(fen):
    normal_list = []

    for letter in fen:
        if letter == "/" or letter == " " or letter == "-":
            normal_list.append(0)
        elif letter == "p":
            normal_list.append(1)
        elif letter == "P":
            normal_list.append(2)
        elif letter == "n":
            normal_list.append(3)
        elif letter == "N":
            normal_list.append(4)
        elif letter == "b":
            normal_list.append(5)
        elif letter == "B":
            normal_list.append(6)
        elif letter == "r":
            normal_list.append(7)
        elif letter == "R":
            normal_list.append(8)
        elif letter == "q":
            normal_list.append(9)
        elif letter == "Q":
            normal_list.append(10)
        elif letter == "k":
            normal_list.append(11)
        elif letter == "K":
            normal_list.append(12)
        elif letter == "a":
            normal_list.append(13)
        elif letter == "b":  # unreachable: "b" is already matched above and mapped to 5
            normal_list.append(14)
        elif letter == "c":
            normal_list.append(15)
        elif letter == "d":
            normal_list.append(16)
        elif letter == "e":
            normal_list.append(17)
        elif letter == "f":
            normal_list.append(18)
        elif letter == "g":
            normal_list.append(19)
        elif letter == "h":
            normal_list.append(20)
        elif letter == "1":
            normal_list.append(21)
        elif letter == "2":
            normal_list.append(22)
        elif letter == "3":
            normal_list.append(23)
        elif letter == "4":
            normal_list.append(24)
        elif letter == "5":       
            normal_list.append(25) 
        elif letter == "6":
            normal_list.append(26)
        elif letter == "7":
            normal_list.append(27)
        elif letter == "8":
            normal_list.append(28)
        elif letter == "9":
            normal_list.append(29)
        else:
            normal_list.append(0)
    
    return np.array(normal_list, ndmin=2).astype(np.float32)
    # I used ndmin = 2 because the embedding layer turns it into ndmin = 3

Then I load the dataset and convert its samples for training:

x_set = []
y_set = []

for position in df["position"]:
    x_set.append(cvt.converter(position))

len(x_set) is 950, and x_set[0].shape is (1, ?), where ? varies between 50 and 70.

For y_set, I used:

for a in range(len(df["position"])):
    y_set.append(np.array([
        df["Pawns"][a], df["Knights"][a], df["Bishops"][a], df["Rooks"][a],
        df["Queens"][a], df["Mobility"][a], df["King"][a], df["Threats"][a],
        df["Passed"][a], df["Space"][a]
    ], ndmin=2)) # If I don't use ndmin = 2 here I get ValueError: Data cardinality is ambiguous

Its len is also 950.

When I call model.fit(x_set, y_set, epochs = 10), the model uses only one sample to train the network:

Epoch 1/10
1/1 [==============================] - 19s 19s/step - loss: 0.2291 - mae: 0.4116
Epoch 2/10
1/1 [==============================] - 3s 3s/step - loss: 0.1645 - mae: 0.3302
Epoch 3/10
1/1 [==============================] - 3s 3s/step - loss: 0.0764 - mae: 0.1982
Epoch 4/10
1/1 [==============================] - 3s 3s/step - loss: 1.4347 - mae: 1.0087
Epoch 5/10
1/1 [==============================] - 3s 3s/step - loss: 0.0038 - mae: 0.0461
Epoch 6/10
1/1 [==============================] - 3s 3s/step - loss: 0.0532 - mae: 0.1780
Epoch 7/10
1/1 [==============================] - 3s 3s/step - loss: 0.0597 - mae: 0.1931
Epoch 8/10
1/1 [==============================] - 3s 3s/step - loss: 0.0522 - mae: 0.1814
Epoch 9/10
1/1 [==============================] - 3s 3s/step - loss: 0.0375 - mae: 0.1583
Epoch 10/10
1/1 [==============================] - 3s 3s/step - loss: 0.0252 - mae: 0.1432

Shouldn't it use all 950 samples in x_set? What is wrong with this code?

Tags: python, tensorflow, machine-learning, keras, deep-learning

Solution


This line means it trained on one batch, not on one sample:

1/1 [==============================] - 19s 19s/step - loss: 0.2291 - mae: 0.4116

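I believe the default batch_size in Keras is 32. The Keras Embedding layer expects integers, not floats, and you are using one dimension too many, so you should change this line in the converter: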

return np.array(normal_list, ndmin=2).astype(np.float32)

to this:

return np.array(normal_list)
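As a rough sketch of what the converter could look like after that change, here is a lookup-table version that returns a 1-D integer array. The name CHAR_TO_ID is made up for illustration, and the mapping is only my reading of the if/elif chain in the question, so check it against your own encoding before relying on it.

```python
import numpy as np

# Hypothetical lookup table mirroring the mapping in the question.
# Any character that is not listed ("/", " ", "-", and everything else) maps to 0.
CHAR_TO_ID = {ch: i for i, ch in enumerate("pPnNbBrRqQkK", start=1)}
for i, ch in enumerate("abcdefgh", start=13):
    # "b" is already 5 from the piece letters, so 14 stays unused,
    # which matches what the original if/elif chain actually does.
    CHAR_TO_ID.setdefault(ch, i)
for i, ch in enumerate("123456789", start=21):
    CHAR_TO_ID[ch] = i


def converter(fen):
    """Return a 1-D int32 array with one id per character of the string."""
    return np.array([CHAR_TO_ID.get(ch, 0) for ch in fen], dtype=np.int32)


print(converter("pP/ a1"))        # [ 1  2  0  0 13 21]
print(converter("pP/ a1").shape)  # (6,)
```

Each converted sample is now one-dimensional and integer-typed, which is the form the Embedding layer works with once the samples are stacked into a batch.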

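You want each training sample to have shape (?,), where ? is 50 to 70 in your case. You want each target to have shape (10,), because your model outputs 10 values from its last Dense layer. Combined with the number of samples, you want x_set to have shape (950, ?) and y_set to have shape (950, 10). To avoid problems, you should probably pad all samples to the same length instead of letting it vary between 50 and 70.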
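A minimal sketch of that padding step, using plain numpy and toy data in place of your 950 positions. The helper name pad_and_stack is hypothetical. Padding with 0 on the right fits the mask_zero=True setting of your Embedding layer, since 0 is treated as the masked value.

```python
import numpy as np


def pad_and_stack(samples, pad_value=0):
    """Right-pad 1-D integer arrays to a common length and stack them into one 2-D array."""
    max_len = max(len(s) for s in samples)
    batch = np.full((len(samples), max_len), pad_value, dtype=np.int32)
    for row, sample in enumerate(samples):
        batch[row, :len(sample)] = sample
    return batch


# Toy stand-ins: three "positions" of different lengths, each with 10 target values.
x_list = [np.array([1, 2, 3]), np.array([4, 5]), np.array([6, 7, 8, 9])]
y_list = [np.zeros(10), np.ones(10), np.full(10, 0.5)]

x_set = pad_and_stack(x_list)                 # shape (3, 4)
y_set = np.stack(y_list).astype(np.float32)   # shape (3, 10)

print(x_set.shape, y_set.shape)
```

With the real data the two arrays would come out as (950, max_len) and (950, 10), and they can be passed to model.fit(x_set, y_set, epochs=10) as they are. Keras also ships tf.keras.preprocessing.sequence.pad_sequences, which does the same job when called with padding="post".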

Your model expects the following input shape:

>>> model.input_shape
(None, None)

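The output of model.summary() is as follows (the first None dimension is the batch dimension, which Keras fills in at run time; your 950 samples are split into batches along it):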

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_2 (Embedding)      (None, None, 64)          1920      
_________________________________________________________________
bidirectional (Bidirectional (None, 2048)              8921088   
_________________________________________________________________
dense (Dense)                (None, 128)               262272    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 9,186,570
Trainable params: 9,186,570
Non-trainable params: 0
_________________________________________________________________
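If you want to sanity-check the parameter counts in that summary, they can be reproduced by hand from the layer sizes in the question. The sketch below uses the standard formulas for Embedding, LSTM and Dense layers; the helper names are made up for illustration.

```python
def embedding_params(vocab_size, output_dim):
    # One vector of length output_dim per vocabulary entry.
    return vocab_size * output_dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units


embedding = embedding_params(30, 64)          # 1920
bidirectional = 2 * lstm_params(64, 1024)     # 8921088 (forward + backward LSTM)
dense = dense_params(2 * 1024, 128)           # 262272
dense_1 = dense_params(128, 10)               # 1290

total = embedding + bidirectional + dense + dense_1
print(total)  # 9186570
```

The numbers match the summary, which suggests the model itself is built as intended and the problem lies in the shape of the data passed to fit.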

In short, I believe you embedded the whole training set into a single sample.
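One quick way to tell whether Keras is seeing all of your samples is to compare the step count in the progress bar with what you would expect. This is a small sketch, assuming the default batch_size of 32 and array inputs; the function name steps_per_epoch is only illustrative.

```python
import math


def steps_per_epoch(num_samples, batch_size=32):
    """Number of batches needed to go through num_samples once."""
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(950))  # 30
print(steps_per_epoch(1))    # 1
```

With 950 properly shaped samples the log should read 30/30 per epoch, so the 1/1 in your output is consistent with the data arriving as a single batch at most.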

