Keras Bidirectional LSTM: an `initial_state` was passed that is not compatible with `cell.state_size`

Problem description

I am trying to build a stacked bidirectional LSTM seq2seq model in Keras, but I am running into a problem when passing the encoder's output states in as the decoder's initial states. According to this pull request, this should be possible. Ultimately, I also want to keep the encoder_output vector for other downstream tasks.

Error message:

ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(None, 100), ndim=2)]; however `cell.state_size` is (100, 100)

My model:

MAX_SEQUENCE_LENGTH = 50
EMBEDDING_DIM = 250
latent_size_1 = 100
latent_size_2 = 50
latent_size_3 = 250

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False,
                            mask_zero=True)

encoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="encoder_input")
encoder_emb = embedding_layer(encoder_inputs)
encoder_lstm_1 = Bidirectional(LSTM(latent_size_1, return_sequences=True),                                                         
                               merge_mode="concat",
                               name="encoder_lstm_1")(encoder_emb)
encoder_outputs, forward_h, forward_c, backward_h, backward_c = Bidirectional(LSTM(latent_size_2, return_state=True),
                               merge_mode="concat",
                               name="encoder_lstm_2")(encoder_lstm_1)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="decoder_input")
decoder_emb = embedding_layer(decoder_inputs)
decoder_lstm_1 =  Bidirectional(LSTM(latent_size_1, return_sequences=True), 
                                merge_mode="concat", 
                                name="decoder_lstm_1")(decoder_emb, initial_state=encoder_states)
decoder_lstm_2 =  Bidirectional(LSTM(latent_size_3, return_sequences=True), 
                                merge_mode="concat",
                                name="decoder_lstm_2")(decoder_lstm_1)
decoder_outputs = Dense(num_words, activation='softmax', name="Dense_layer")(decoder_lstm_2)

seq2seq_Model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

Any help/suggestions/pointers would be much appreciated!

Tags: python, tensorflow, keras

Solution

There are two problems with your code:

  1. As @Daniel pointed out, you should not concatenate the states; pass all four of them separately, i.e. encoder_states = [forward_h, forward_c, backward_h, backward_c] (see the sketch after this list).

  2. The states returned by the encoder have size latent_size_2 (not latent_size_1), so if you want to use them as the decoder's initial state, the first decoder LSTM must have latent_size_2 units as well.
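
To see the state-passing contract in isolation, here is a minimal standalone sketch (toy sizes; TensorFlow 2.x assumed). A Bidirectional LSTM called with return_state=True returns five tensors, and its initial_state must be the four separate state tensors, which the wrapper splits between the forward and backward layers:

from tensorflow.keras.layers import Input, LSTM, Bidirectional

units = 4
enc_in = Input(shape=(10, 8))
# return_state=True on a Bidirectional LSTM yields
# [output, forward_h, forward_c, backward_h, backward_c]
out, fh, fc, bh, bc = Bidirectional(LSTM(units, return_state=True))(enc_in)

dec_in = Input(shape=(10, 8))
# The wrapper hands the first half of initial_state to the forward layer and
# the second half to the backward layer, so exactly four tensors are required;
# two concatenated tensors trigger the ValueError above.
dec_out = Bidirectional(LSTM(units, return_sequences=True))(dec_in,
                                                            initial_state=[fh, fc, bh, bc])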

You can find the code with these corrections below.

from tensorflow.keras.layers import Embedding, Input, Bidirectional, LSTM, Dense
from tensorflow.keras.initializers import Constant
from tensorflow.keras.models import Model

MAX_SEQUENCE_LENGTH = 50
EMBEDDING_DIM = 250
latent_size_1 = 100
latent_size_2 = 50
latent_size_3 = 250
num_words = 5000
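# NOTE: Constant(1.0) below is a stand-in for the asker's Constant(embedding_matrix),
# so that this snippet runs standalone.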
embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(1.0),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False,
                            mask_zero=True)

encoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="encoder_input")
encoder_emb = embedding_layer(encoder_inputs)
encoder_lstm_1 = Bidirectional(LSTM(latent_size_1, return_sequences=True),                                                         
                               merge_mode="concat",
                               name="encoder_lstm_1")(encoder_emb)
encoder_outputs, forward_h, forward_c, backward_h, backward_c = Bidirectional(LSTM(latent_size_2, return_state=True), 
                               merge_mode="concat", name="encoder_lstm_2")(encoder_lstm_1)
encoder_states = [forward_h, forward_c, backward_h, backward_c]

decoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="decoder_input")
decoder_emb = embedding_layer(decoder_inputs)
decoder_lstm_1 =  Bidirectional(
    LSTM(latent_size_2, return_sequences=True), 
    merge_mode="concat", name="decoder_lstm_1")(decoder_emb, initial_state=encoder_states)
decoder_lstm_2 =  Bidirectional(LSTM(latent_size_3, return_sequences=True), 
                                merge_mode="concat",
                                name="decoder_lstm_2")(decoder_lstm_1)
decoder_outputs = Dense(num_words, activation='softmax', name="Dense_layer")(decoder_lstm_2)

seq2seq_Model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
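
As a quick sanity check (a sketch with dummy data; TensorFlow 2.x and the variables above assumed), you can compile the model and push a random batch through it. The output should be one softmax over num_words per decoder timestep:

import numpy as np

seq2seq_Model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
enc_batch = np.random.randint(1, num_words, size=(2, MAX_SEQUENCE_LENGTH))
dec_batch = np.random.randint(1, num_words, size=(2, MAX_SEQUENCE_LENGTH))
preds = seq2seq_Model.predict([enc_batch, dec_batch])
print(preds.shape)  # (2, 50, 5000) -> (batch, MAX_SEQUENCE_LENGTH, num_words)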
