Keras CNN-RNN won't train. Could use some debugging

Problem description

I have a binary classification problem I'm training on. I've had fairly good success passing my data through a pre-trained embedding, followed by several CNNs in parallel, pooling the results, and then using a dense layer to predict the class. But when I instead layer an RNN after the CNNs, training fails completely. The code is below (this is a long post).

Here is the CNN-only model, which works. My inputs are vectors of length 100.

# L refers to keras.layers; `weights` is the pre-trained embedding matrix
inputs = L.Input(shape=(100,))
embedding = L.Embedding(input_dim=weights.shape[0],
                        output_dim=weights.shape[1],
                        input_length=100,
                        weights=[weights],
                        trainable=False)(inputs)
# the rate isn't shown in the post, but the summary below includes a Dropout layer here
dropout = L.Dropout(0.5)(embedding)
conv3 = L.Conv1D(m, kernel_size=(3))(dropout)
conv4 = L.Conv1D(m, kernel_size=(4))(dropout)
conv5 = L.Conv1D(m, kernel_size=(5))(dropout)
maxpool3 = L.MaxPool1D(pool_size=(100-3+1,), strides=(1,))(conv3)
maxpool4 = L.MaxPool1D(pool_size=(100-4+1,), strides=(1,))(conv4)
maxpool5 = L.MaxPool1D(pool_size=(100-5+1,), strides=(1,))(conv5)
concatenated_tensor = L.Concatenate(axis=1)([maxpool3, maxpool4, maxpool5])
flattened = L.Flatten()(concatenated_tensor)
output = L.Dense(units=1, activation='sigmoid')(flattened)
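The parameter counts in the summary below can be sanity-checked by hand. A minimal sketch, assuming a vocabulary of 9026 words (which is what 451300 / 50 implies) and m = 100 filters per branch (as the summary's output shapes suggest):

```python
# Sanity-check the parameter counts reported in the Keras summary.
# Assumptions inferred from the summary, not stated in the post:
# vocab size 9026 (since 451300 / 50 = 9026), 50-dim embeddings,
# m = 100 filters per Conv1D branch.
vocab, emb_dim, m = 9026, 50, 100

embedding_params = vocab * emb_dim                           # frozen, so not trained
conv_params = {k: (k * emb_dim + 1) * m for k in (3, 4, 5)}  # kernel weights + bias
dense_params = 3 * m + 1                                     # 300 pooled features + bias

print(embedding_params)   # 451300
print(conv_params)        # {3: 15100, 4: 20100, 5: 25100}
print(dense_params)       # 301
```

All three values match the `Param #` column below, which is a quick way to confirm the summary corresponds to the code you think you ran.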

Here is the summary:

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_25 (InputLayer)            (None, 100)           0                                            
____________________________________________________________________________________________________
embedding_25 (Embedding)         (None, 100, 50)       451300      input_25[0][0]                   
____________________________________________________________________________________________________
dropout_25 (Dropout)             (None, 100, 50)       0           embedding_25[0][0]               
____________________________________________________________________________________________________
conv1d_73 (Conv1D)               (None, 98, 100)       15100       dropout_25[0][0]                 
____________________________________________________________________________________________________
conv1d_74 (Conv1D)               (None, 97, 100)       20100       dropout_25[0][0]                 
____________________________________________________________________________________________________
conv1d_75 (Conv1D)               (None, 96, 100)       25100       dropout_25[0][0]                 
____________________________________________________________________________________________________
max_pooling1d_73 (MaxPooling1D)  (None, 1, 100)        0           conv1d_73[0][0]                  
____________________________________________________________________________________________________
max_pooling1d_74 (MaxPooling1D)  (None, 1, 100)        0           conv1d_74[0][0]                  
____________________________________________________________________________________________________
max_pooling1d_75 (MaxPooling1D)  (None, 1, 100)        0           conv1d_75[0][0]                  
____________________________________________________________________________________________________
concatenate_25 (Concatenate)     (None, 3, 100)        0           max_pooling1d_73[0][0]           
                                                                   max_pooling1d_74[0][0]           
                                                                   max_pooling1d_75[0][0]           
____________________________________________________________________________________________________
flatten_25 (Flatten)             (None, 300)           0           concatenate_25[0][0]             
____________________________________________________________________________________________________
dense_47 (Dense)                 (None, 1)             301         flatten_25[0][0]                 
====================================================================================================

As I said above, this works great, reaching good accuracy after only 3-4 epochs. However, my thinking is that while the CNNs identify regional patterns, if I also want to model how those patterns relate to each other over longer distances within a given input vector, I should use some flavor of RNN after the convolutions. So I tried changing the pool_size of the MaxPooling1D layers after the convolutions, removing the Flatten, and passing the Concatenate layer to an RNN. For example:

maxpool3 = L.MaxPool1D(pool_size=(50,), strides=(1,))(conv3)
maxpool4 = L.MaxPool1D(pool_size=(50,), strides=(1,))(conv4)
maxpool5 = L.MaxPool1D(pool_size=(49,), strides=(1,))(conv5)
concatenated_tensor = L.Concatenate(axis=1)([maxpool3,maxpool4,maxpool5])
rnn=L.SimpleRNN(75)(concatenated_tensor) 
output = L.Dense(units=1, activation='sigmoid')(rnn)

Now the summary becomes:

max_pooling1d_95 (MaxPooling1D)  (None, 50, 100)       0           conv1d_97[0][0]                  
____________________________________________________________________________________________________
max_pooling1d_96 (MaxPooling1D)  (None, 50, 100)       0           conv1d_98[0][0]                  
____________________________________________________________________________________________________
max_pooling1d_97 (MaxPooling1D)  (None, 49, 100)       0           conv1d_99[0][0]                  
____________________________________________________________________________________________________
concatenate_32 (Concatenate)     (None, 149, 100)      0           max_pooling1d_95[0][0]           
                                                                   max_pooling1d_96[0][0]           
                                                                   max_pooling1d_97[0][0]           
____________________________________________________________________________________________________
simple_rnn_5 (SimpleRNN)         (None, 75)            13200       concatenate_32[0][0]             
____________________________________________________________________________________________________
dense_51 (Dense)                 (None, 1)             76          simple_rnn_5[0][0]               
====================================================================================================

When I train this model, the predictions all come out exactly identical: the ratio of class[1] to class[0]. I've read papers where people used this scheme successfully, so clearly I'm doing something wrong, and I'd bet it's an embarrassingly silly mistake. Would anyone care to help diagnose it?

Tags: python, tensorflow, keras, conv-neural-network, rnn

Solution


The first thing you could try is concatenating along the feature axis instead of the time axis. Basically, try this:

# pool sizes chosen so every branch produces the same number of time steps (50)
maxpool3 = L.MaxPool1D(pool_size=(49,), strides=(1,))(conv3)  # 98 - 49 + 1 = 50
maxpool4 = L.MaxPool1D(pool_size=(48,), strides=(1,))(conv4)  # 97 - 48 + 1 = 50
maxpool5 = L.MaxPool1D(pool_size=(47,), strides=(1,))(conv5)  # 96 - 47 + 1 = 50
concatenated_tensor = L.Concatenate(axis=2)([maxpool3,maxpool4,maxpool5])
rnn=L.SimpleRNN(75)(concatenated_tensor) 
output = L.Dense(units=1, activation='sigmoid')(rnn)

(Note that you have to make sure maxpool3, maxpool4 and maxpool5 have the same number of "time" steps, i.e. maxpool3.shape[1] == maxpool4.shape[1] == maxpool5.shape[1].)
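That constraint is easy to check on paper: with 'valid' padding and stride 1, each Conv1D or MaxPool1D layer outputs length `l_in - window + 1`. A quick sketch (the pool sizes here are illustrative choices that make the three branches line up at 50 steps, not values from the original post):

```python
def out_len(l_in, window, stride=1):
    """Output length of a 'valid' Conv1D/MaxPool1D: floor((l_in - window) / stride) + 1."""
    return (l_in - window) // stride + 1

seq = 100
# conv outputs for kernel sizes 3, 4, 5
conv_lens = {k: out_len(seq, k) for k in (3, 4, 5)}            # {3: 98, 4: 97, 5: 96}
# pool sizes that bring every branch to the same number of time steps
pool = {3: 49, 4: 48, 5: 47}
pooled = {k: out_len(conv_lens[k], pool[k]) for k in (3, 4, 5)}
print(pooled)  # {3: 50, 4: 50, 5: 50} -> safe to Concatenate on axis=2
```

If the three values in `pooled` disagree, the axis=2 concatenation will fail (or, on axis=1, silently produce a time axis whose steps are not aligned across branches).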

Second, with 50 time steps, give an LSTM or a GRU a shot, since they can capture longer temporal dependencies better than a SimpleRNN.
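For a sense of what that swap costs: a SimpleRNN layer has `units * (units + input_dim) + units` parameters, an LSTM roughly 4x that (four gates), and a GRU roughly 3x. (Hedging here: Keras's default GRU with `reset_after=True` carries a small extra bias term, so real counts differ slightly from this classic formulation.) With 75 units over 100 features:

```python
def rnn_params(units, input_dim, gates=1):
    # per gate: input weights + recurrent weights + bias
    return gates * (units * (units + input_dim) + units)

units, features = 75, 100
print(rnn_params(units, features))            # 13200, matches simple_rnn_5 above
print(rnn_params(units, features, gates=4))   # 52800 for an LSTM(75)
print(rnn_params(units, features, gates=3))   # 39600 for a classic GRU(75)
```

So the gated variants are a few times more expensive, but on 50-step sequences they are usually much easier to train than a plain SimpleRNN.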

