python - Loss function evaluation returns an error in an LSTM model
Question
I am trying to fit an LSTM model with a pre-trained embedding for text generation, using tf.keras.Sequential. I get the following error when the loss is evaluated:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [Condition x == y did not hold element-wise:] [x (sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/Shape_1:0) = ] [5 199] [y (sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/strided_slice:0) = ] [200 199]
[[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/assert_equal_1/Assert/Assert (defined at <input>:161) ]] [Op:__inference_train_function_4885]
My model is as follows:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size, embedding_matrix):
    model = tf.keras.Sequential([
        # vocab_size = 30000, embedding_dim = 300, batch_size = 64, embedding_matrix.shape = (30000, 300)
        tf.keras.layers.Embedding(vocab_size, embedding_dim, weights=[embedding_matrix],
                                  trainable=False, batch_input_shape=[max_len, None]),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.LSTM(rnn_units,
                             return_sequences=True,
                             stateful=True,
                             recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.LSTM(rnn_units,
                             return_sequences=True,
                             stateful=True,
                             recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model
model = build_model(
    vocab_size=len(vocab),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=batch_size,
    embedding_matrix=embedding_matrix
)
optimizer = tf.keras.optimizers.Adam()
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
patience = 10
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience)
checkpoint_dir = './checkpoints'+ datetime.datetime.now().strftime("_%Y.%m.%d-%H:%M:%S")
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True
)
history = model.fit(text_ds, epochs=epochs, callbacks=[checkpoint_callback, early_stop], validation_data=text_ds)
After looking at other similar questions, the problem appears to be related to the input and output shapes. Still, I cannot see what is wrong.
The model summary is:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (200, None, 300) 9000000
_________________________________________________________________
dropout (Dropout) (200, None, 300) 0
_________________________________________________________________
lstm (LSTM) (200, None, 1024) 5427200
_________________________________________________________________
dropout_1 (Dropout) (200, None, 1024) 0
_________________________________________________________________
lstm_1 (LSTM) (200, None, 1024) 8392704
_________________________________________________________________
dropout_2 (Dropout) (200, None, 1024) 0
_________________________________________________________________
dense (Dense) (200, None, 30000) 30750000
=================================================================
Total params: 53,569,904
Trainable params: 44,569,904
Non-trainable params: 9,000,000
_________________________________________________________________
The input and output shapes of each layer are as follows:
Output:
(200, None, 300)
(200, None, 300)
(200, None, 1024)
(200, None, 1024)
(200, None, 1024)
(200, None, 1024)
(200, None, 30000)
Input:
(200, None)
(200, None, 300)
(200, None, 300)
(200, None, 1024)
(200, None, 1024)
(200, None, 1024)
(200, None, 1024)
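The numbers in the failed assertion can be reproduced with plain shape arithmetic. The following sketch assumes (based on the `[5 199]` vs `[200 199]` values in the error) that `text_ds` yields batches of 5 sequences of 200 tokens each, split into 199-token inputs and 199-token shifted targets; with `batch_input_shape=[max_len, None]` the model's batch dimension is fixed to `max_len = 200`, while the labels keep the dataset's batch size of 5:

```python
# Sketch (assumption): text_ds yields batches of 5 sequences, each 200 tokens
# long, split into 199-token inputs and 199-token shifted targets.
max_len = 200          # sequence length, mistakenly used as the batch dimension
data_batch = 5         # batch size actually produced by text_ds
seq_len = max_len - 1  # 199, after the input/target shift

# batch_input_shape=[max_len, None] fixes the model's batch dimension, so the
# logits come out with batch size 200 while the labels keep batch size 5.
labels_shape = (data_batch, seq_len)   # x in the assertion: [5, 199]
logits_shape = (max_len, seq_len)      # y in the assertion: [200, 199]
assert labels_shape != logits_shape    # the mismatch behind InvalidArgumentError

# With batch_input_shape=[data_batch, None] the two shapes agree:
fixed_logits_shape = (data_batch, seq_len)
assert labels_shape == fixed_logits_shape
```

The same reasoning explains the second error after setting `return_sequences=False`: the dropout layer sees model-side activations of shape `[200, 199, 300]` against data of shape `[5, 199, 300]`.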
Edit:
By setting return_sequences=False on the last LSTM, I get:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [200,199,300] vs. [5,199,300]
[[node sequential/dropout/dropout/Mul_1 (defined at <input>:161) ]] [Op:__inference_train_function_4801]
In this case, the model summary is:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (200, None, 300) 9000000
_________________________________________________________________
dropout (Dropout) (200, None, 300) 0
_________________________________________________________________
lstm (LSTM) (200, None, 1024) 5427200
_________________________________________________________________
dropout_1 (Dropout) (200, None, 1024) 0
_________________________________________________________________
lstm_1 (LSTM) (200, 1024) 8392704
_________________________________________________________________
dropout_2 (Dropout) (200, 1024) 0
_________________________________________________________________
dense (Dense) (200, 30000) 30750000
=================================================================
Total params: 53,569,904
Trainable params: 44,569,904
Non-trainable params: 9,000,000
_________________________________________________________________
Input:
(200, None)
(200, None, 300)
(200, None, 300)
(200, None, 1024)
(200, None, 1024)
(200, 1024)
(200, 1024)
Solution
Change the batch_input_shape argument so the fixed batch dimension matches the dataset's batch size (5), not the sequence length:
tf.keras.layers.Embedding(vocab_size, embedding_dim, weights=[embedding_matrix], trainable=False, batch_input_shape=[5, None]),
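Rather than hard-coding the batch size, a small guard can catch this class of error before training starts. This is a sketch, not part of the original answer; `model_batch` stands for `batch_input_shape[0]` and `data_batch` for the batch size the dataset actually yields:

```python
# Sketch (assumption): pre-flight check for stateful RNNs, where the model's
# fixed batch dimension must equal the dataset's batch size.
def check_stateful_batch(model_batch, data_batch):
    """Raise early, with a clear message, if batch_input_shape[0] disagrees
    with the batch size the dataset yields."""
    if model_batch != data_batch:
        raise ValueError(
            f"batch_input_shape[0]={model_batch} does not match the dataset "
            f"batch size {data_batch}; stateful layers need them to agree"
        )
    return True

# The original code fails this check (max_len=200 vs. batch size 5);
# the fixed code passes it.
check_stateful_batch(5, 5)
```

When the dataset was batched with `drop_remainder=True`, the data-side batch size can typically be read from a batched `tf.data.Dataset` as `text_ds.element_spec[0].shape[0]` (otherwise that dimension is `None`); whether `text_ds` was built that way is an assumption here.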