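python - Keras gives nan when training a categorical LSTM sequence-to-sequence model
Problem description
I am trying to write a Keras model (using the TensorFlow backend) that uses an LSTM to predict labels for sequences, as in a part-of-speech tagging task. The model I wrote returns nan as the loss for every training epoch and for every label prediction. I suspect that my model is configured incorrectly, but I can't figure out what I am doing wrong.
The complete program is here.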
from random import shuffle, sample
from typing import Tuple, Callable

from numpy import arange, zeros, array, argmax, newaxis


def sequence_to_sequence_model(time_steps: int, labels: int, units: int = 16):
    from keras import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense

    model = Sequential()
    model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
    model.add(TimeDistributed(Dense(labels)))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model


def labeled_sequences(n: int, sequence_sampler: Callable[[], Tuple[array, array]]) -> Tuple[array, array]:
    """
    Create training data for a sequence-to-sequence labeling model.

    The features are an array of size samples * time steps * 1.
    The labels are a one-hot encoding of time step labels of size samples * time steps * number of labels.

    :param n: number of sequence pairs to generate
    :param sequence_sampler: a function that returns two numeric sequences of equal length
    :return: feature and label sequences
    """
    from keras.utils import to_categorical

    xs, ys = sequence_sampler()
    assert len(xs) == len(ys)
    x = zeros((n, len(xs)), int)
    y = zeros((n, len(ys)), int)
    for i in range(n):
        xs, ys = sequence_sampler()
        x[i] = xs
        y[i] = ys
    x = x[:, :, newaxis]
    y = to_categorical(y)
    return x, y


def digits_with_repetition_labels() -> Tuple[array, array]:
    """
    Return a random list of 10 digits from 0 to 9. Two of the digits will be repeated. The rest will be unique.
    Along with this list, return a list of 10 labels, where the label is 0 if the corresponding digit is unique and 1
    if it is repeated.

    :return: digits and labels
    """
    n = 10
    xs = arange(n)
    ys = zeros(n, int)
    shuffle(xs)
    i, j = sample(range(n), 2)
    xs[j] = xs[i]
    ys[i] = ys[j] = 1
    return xs, ys


def main():
    # Train
    x, y = labeled_sequences(1000, digits_with_repetition_labels)
    model = sequence_to_sequence_model(x.shape[1], y.shape[2])
    model.summary()
    model.fit(x, y, epochs=20, verbose=2)
    # Test
    x, y = labeled_sequences(5, digits_with_repetition_labels)
    y_ = model.predict(x, verbose=0)
    x = x[:, :, 0]
    for i in range(x.shape[0]):
        print(' '.join(str(n) for n in x[i]))
        print(' '.join([' ', '*'][int(argmax(n))] for n in y[i]))
        print(y_[i])


if __name__ == '__main__':
    main()
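My feature sequences are arrays of 10 digits from 0 to 9. The corresponding label sequences are arrays of 10 zeros and ones, where zero marks a unique digit and one marks a repeated digit. (The idea is to create a simple classification task that involves long-distance dependencies.)
Training looks like this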
Epoch 1/20
- 1s - loss: nan
Epoch 2/20
- 0s - loss: nan
Epoch 3/20
- 0s - loss: nan
All the label array predictions look like this
[[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]]
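So clearly something is wrong.
The feature matrix passed to model.fit has dimensions samples × time steps × 1. The label matrix has dimensions samples × time steps × 2, where the 2 comes from the one-hot encoding of the labels 0 and 1.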
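The shapes described above can be checked without Keras. The following is a minimal sketch in plain NumPy: the toy batch of 3 sequences and the identity-matrix one-hot encoding are assumptions made for illustration (a stand-in for keras.utils.to_categorical), not the code used in the question.

```python
import numpy as np

# Hypothetical toy batch: 3 sequences of 10 digits with a 0/1 label per time step.
rng = np.random.default_rng(0)
digits = rng.integers(0, 10, size=(3, 10))
labels = rng.integers(0, 2, size=(3, 10))

# Features: add a trailing axis so each time step carries exactly one feature.
x = digits[:, :, np.newaxis]

# Labels: one-hot encode by indexing into an identity matrix
# (a NumPy stand-in for keras.utils.to_categorical).
num_classes = 2
y = np.eye(num_classes, dtype=int)[labels]

print(x.shape)  # samples x time steps x 1
print(y.shape)  # samples x time steps x number of labels
```

Each row of y along the last axis contains a single 1, so taking argmax over that axis recovers the original integer labels.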
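I am using a time-distributed dense layer to predict sequences, following the Keras documentation and posts like this and this. To the best of my knowledge, the model topology defined in sequence_to_sequence_model above is correct. The model summary looks like this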
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 10, 16)            1152
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 2)             34
=================================================================
Total params: 1,186
Trainable params: 1,186
Non-trainable params: 0
_________________________________________________________________
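Stack Overflow questions like this one make it sound as if nan results are an indicator of numerical problems: runaway gradients and so on. However, since I am working on a tiny data set and every number that comes back from my model is a nan, I suspect that I am not seeing a numerical problem but rather a problem with how I have constructed the model.
Does the code above have the right model/data shape for sequence-to-sequence learning? If so, why do I get nans everywhere?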
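Solution
By default, the Dense layer has no activation, so the network emits unbounded linear outputs, while categorical_crossentropy by default expects a probability distribution over the labels at each time step. If you specify an activation, the nans go away. Change the following line in the code above.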
model.add(TimeDistributed(Dense(labels, activation='softmax')))
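To illustrate why the activation matters, here is a small NumPy sketch. It is not Keras's actual loss implementation: naive_cross_entropy is a textbook formula written for this example, and the logits values are hypothetical outputs of a linear Dense layer. It only shows that a cross-entropy taken over raw linear outputs can hit the logarithm of a negative number, while softmax outputs are always valid probabilities.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def naive_cross_entropy(y_true, y_pred):
    # Textbook formula -sum(y * log(p)); an illustration, NOT Keras's exact code.
    with np.errstate(invalid="ignore", divide="ignore"):
        return -(y_true * np.log(y_pred)).sum(axis=-1)

# Hypothetical raw outputs of a Dense layer with no activation, three time steps.
logits = np.array([[1.5, -0.5], [-2.0, 0.3], [0.1, 0.2]])
y_true = np.array([[0, 1], [1, 0], [0, 1]])

# Negative "probabilities" make the logarithm undefined.
raw_loss = naive_cross_entropy(y_true, logits)

# Softmax maps every row to positive values that sum to 1.
probs = softmax(logits)
soft_loss = naive_cross_entropy(y_true, probs)

print(raw_loss)
print(probs.sum(axis=-1))
print(soft_loss)
```

With activation='softmax' on the Dense layer, Keras applies the same kind of normalization at every time step before the loss is computed.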