python - Keras CNN-RNN won't train. Could use some debugging help
Problem description
I have a binary classification problem I'm training on. I've had reasonable success passing my data through a pre-trained embedding, then several CNNs in parallel, pooling the results, and finally a Dense layer to predict the class. But when I instead stack an RNN after the CNNs, training fails completely. The code follows (this is a long post).
Here is the CNN-only model, which works. My inputs are vectors of length 100.
inputs = L.Input(shape=(100,))
embedding = L.Embedding(input_dim=weights.shape[0],
                        output_dim=weights.shape[1],
                        input_length=100,
                        weights=[weights],
                        trainable=False)(inputs)
# The summary below shows a Dropout layer here that was missing from the
# posted code; its rate is not given in the original, 0.5 is a placeholder.
dropout = L.Dropout(0.5)(embedding)
conv3 = L.Conv1D(m, kernel_size=3)(dropout)  # m = 100 filters, per the summary
conv4 = L.Conv1D(m, kernel_size=4)(dropout)
conv5 = L.Conv1D(m, kernel_size=5)(dropout)
maxpool3 = L.MaxPool1D(pool_size=(100-3+1, ), strides=(1,))(conv3)
maxpool4 = L.MaxPool1D(pool_size=(100-4+1, ), strides=(1,))(conv4)
maxpool5 = L.MaxPool1D(pool_size=(100-5+1, ), strides=(1,))(conv5)
concatenated_tensor = L.Concatenate(axis=1)([maxpool3,maxpool4,maxpool5])
flattened = L.Flatten()(concatenated_tensor)
output = L.Dense(units=1, activation='sigmoid')(flattened)
Here is the summary:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_25 (InputLayer) (None, 100) 0
____________________________________________________________________________________________________
embedding_25 (Embedding) (None, 100, 50) 451300 input_25[0][0]
____________________________________________________________________________________________________
dropout_25 (Dropout) (None, 100, 50) 0 embedding_25[0][0]
____________________________________________________________________________________________________
conv1d_73 (Conv1D) (None, 98, 100) 15100 dropout_25[0][0]
____________________________________________________________________________________________________
conv1d_74 (Conv1D) (None, 97, 100) 20100 dropout_25[0][0]
____________________________________________________________________________________________________
conv1d_75 (Conv1D) (None, 96, 100) 25100 dropout_25[0][0]
____________________________________________________________________________________________________
max_pooling1d_73 (MaxPooling1D) (None, 1, 100) 0 conv1d_73[0][0]
____________________________________________________________________________________________________
max_pooling1d_74 (MaxPooling1D) (None, 1, 100) 0 conv1d_74[0][0]
____________________________________________________________________________________________________
max_pooling1d_75 (MaxPooling1D) (None, 1, 100) 0 conv1d_75[0][0]
____________________________________________________________________________________________________
concatenate_25 (Concatenate) (None, 3, 100) 0 max_pooling1d_73[0][0]
max_pooling1d_74[0][0]
max_pooling1d_75[0][0]
____________________________________________________________________________________________________
flatten_25 (Flatten) (None, 300) 0 concatenate_25[0][0]
____________________________________________________________________________________________________
dense_47 (Dense) (None, 1) 301 flatten_25[0][0]
====================================================================================================
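As a quick sanity check (a sketch, not from the original post), the parameter counts in the summary can be reproduced by hand: a Conv1D layer has kernel_size * in_channels * filters weights plus one bias per filter, and the final Dense layer has one weight per input plus a bias:

```python
def conv1d_params(kernel_size, in_channels, filters):
    # weights: kernel_size * in_channels * filters, plus one bias per filter
    return kernel_size * in_channels * filters + filters

# Embedding output has 50 channels; each Conv1D uses 100 filters
print(conv1d_params(3, 50, 100))  # 15100, matches conv1d_73
print(conv1d_params(4, 50, 100))  # 20100, matches conv1d_74
print(conv1d_params(5, 50, 100))  # 25100, matches conv1d_75
print(300 * 1 + 1)                # 301, matches dense_47 (300 inputs -> 1 unit)
```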
As I said above, this works fine, reaching good accuracy after only 3-4 epochs. My thinking, however, is that the CNNs pick up local patterns, and if I also want to model how those patterns relate to each other over longer distances within a given input vector, I should stack some flavor of RNN after the convolutions. So I tried changing the pool_size of the MaxPooling1D layers after the convolutions, removing the Flatten layer, and passing the Concatenate output to an RNN. For example:
maxpool3 = L.MaxPool1D(pool_size=(50,), strides=(1,))(conv3)
maxpool4 = L.MaxPool1D(pool_size=(50,), strides=(1,))(conv4)
maxpool5 = L.MaxPool1D(pool_size=(49,), strides=(1,))(conv5)
concatenated_tensor = L.Concatenate(axis=1)([maxpool3,maxpool4,maxpool5])
rnn=L.SimpleRNN(75)(concatenated_tensor)
output = L.Dense(units=1, activation='sigmoid')(rnn)
Now the summary becomes:
max_pooling1d_95 (MaxPooling1D) (None, 50, 100) 0 conv1d_97[0][0]
____________________________________________________________________________________________________
max_pooling1d_96 (MaxPooling1D) (None, 50, 100) 0 conv1d_98[0][0]
____________________________________________________________________________________________________
max_pooling1d_97 (MaxPooling1D) (None, 49, 100) 0 conv1d_99[0][0]
____________________________________________________________________________________________________
concatenate_32 (Concatenate) (None, 149, 100) 0 max_pooling1d_95[0][0]
max_pooling1d_96[0][0]
max_pooling1d_97[0][0]
____________________________________________________________________________________________________
simple_rnn_5 (SimpleRNN) (None, 75) 13200 concatenate_32[0][0]
____________________________________________________________________________________________________
dense_51 (Dense) (None, 1) 76 simple_rnn_5[0][0]
====================================================================================================
When I train this model, the predictions are all exactly the same: the ratio of class[1] to class[0] in the data. I've read papers where people use this scheme successfully, so clearly I'm doing something wrong, and I'd bet it's an embarrassingly silly mistake. Would anyone care to help diagnose it?
Solution
The first thing you can try is concatenating along the feature axis rather than the time axis. Basically, try this:
maxpool3 = L.MaxPool1D(pool_size=(50,), strides=(1,))(conv3)
maxpool4 = L.MaxPool1D(pool_size=(50,), strides=(1,))(conv4)
maxpool5 = L.MaxPool1D(pool_size=(50,), strides=(1,))(conv5)
concatenated_tensor = L.Concatenate(axis=2)([maxpool3,maxpool4,maxpool5])
rnn=L.SimpleRNN(75)(concatenated_tensor)
output = L.Dense(units=1, activation='sigmoid')(rnn)
(Note that you have to make sure maxpool3, maxpool4, and maxpool5 have the same number of "time" steps, i.e. maxpool3.shape[1] == maxpool4.shape[1] == maxpool5.shape[1].)
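Because the three Conv1D branches produce different lengths (98, 97, 96 for kernels 3, 4, 5 on a length-100 input with 'valid' padding), equal time steps require per-branch pool sizes. The output-length formula can be checked directly; this is a pure-Python sketch, and the pool sizes below are one illustrative choice, not from the original post:

```python
def pool1d_out_len(in_len, pool_size, stride=1):
    # MaxPool1D output length with Keras' default 'valid' padding:
    # floor((in_len - pool_size) / stride) + 1
    return (in_len - pool_size) // stride + 1

# Conv1D ('valid') output lengths for a length-100 input with kernels 3, 4, 5
conv_lens = [100 - k + 1 for k in (3, 4, 5)]      # [98, 97, 96]

# One choice of pool sizes that equalizes all branches at 50 steps each
pool_sizes = [49, 48, 47]
steps = [pool1d_out_len(c, p) for c, p in zip(conv_lens, pool_sizes)]
print(steps)  # [50, 50, 50]
```

With equal step counts, Concatenate(axis=2) yields a (None, 50, 300) tensor the RNN can consume; note that a uniform pool_size of 50, as in the question, gives unequal lengths (49, 48, 47) instead.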
Second, with 50 time steps, give an LSTM or GRU a try, since they capture longer time dependencies better than a SimpleRNN.