keras - Merging two layers in an LSTM Seq2Seq model for a question-answering use case
Problem description
I am trying to build a question-answering model based on the bAbI Task 8 example, but I'm having trouble merging the two input layers into one. Here is my current model architecture:
story_input = Input(shape=(story_maxlen,vocab_size), name='story_input')
story_input_proc = Embedding(vocab_size, latent_dim, name='story_input_embed', input_length=story_maxlen)(story_input)
story_input_proc = Reshape((latent_dim,story_maxlen), name='story_input_reshape')(story_input_proc)
query_input = Input(shape=(query_maxlen,vocab_size), name='query_input')
query_input_proc = Embedding(vocab_size, latent_dim, name='query_input_embed', input_length=query_maxlen)(query_input)
query_input_proc = Reshape((latent_dim,query_maxlen), name='query_input_reshape')(query_input_proc)
story_query = dot([story_input_proc, query_input_proc], axes=(1, 1), name='story_query_merge')
encoder = LSTM(latent_dim, return_state=True, name='encoder')
encoder_output, state_h, state_c = encoder(story_query)
encoder_output = RepeatVector(3, name='encoder_3dim')(encoder_output)
encoder_states = [state_h, state_c]
decoder = LSTM(latent_dim, return_sequences=True, name='decoder')(encoder_output, initial_state=encoder_states)
answer_output = Dense(vocab_size, activation='softmax', name='answer_output')(decoder)
model = Model([story_input, query_input], answer_output)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
Here is the output of model.summary():
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
story_input (InputLayer) (None, 358, 38) 0
__________________________________________________________________________________________________
query_input (InputLayer) (None, 5, 38) 0
__________________________________________________________________________________________________
story_input_embed (Embedding) (None, 358, 64) 2432 story_input[0][0]
__________________________________________________________________________________________________
query_input_embed (Embedding) (None, 5, 64) 2432 query_input[0][0]
__________________________________________________________________________________________________
story_input_reshape (Reshape) (None, 64, 358) 0 story_input_embed[0][0]
__________________________________________________________________________________________________
query_input_reshape (Reshape) (None, 64, 5) 0 query_input_embed[0][0]
__________________________________________________________________________________________________
story_query_merge (Dot) (None, 358, 5) 0 story_input_reshape[0][0]
query_input_reshape[0][0]
__________________________________________________________________________________________________
encoder (LSTM) [(None, 64), (None, 17920 story_query_merge[0][0]
__________________________________________________________________________________________________
encoder_3dim (RepeatVector) (None, 3, 64) 0 encoder[0][0]
__________________________________________________________________________________________________
decoder (LSTM) (None, 3, 64) 33024 encoder_3dim[0][0]
encoder[0][1]
encoder[0][2]
__________________________________________________________________________________________________
answer_output (Dense) (None, 3, 38) 2470 decoder[0][0]
==================================================================================================
Total params: 58,278
Trainable params: 58,278
Non-trainable params: 0
__________________________________________________________________________________________________
where vocab_size = 38, story_maxlen = 358, query_maxlen = 5, latent_dim = 64, and batch size = 64.
When I try to train this model, I get the error:
Input to reshape is a tensor with 778240 values, but the requested shape has 20480
Here are the formulas for those two values:
input_to_reshape = batch_size * latent_dim * query_maxlen * vocab_size
requested_shape = batch_size * latent_dim * query_maxlen
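Plugging in the values from the question confirms that these formulas reproduce the two numbers in the error message:

```python
# Verify the two sizes from the error message using the values
# stated in the question: batch_size=64, latent_dim=64,
# query_maxlen=5, vocab_size=38.
batch_size, latent_dim, query_maxlen, vocab_size = 64, 64, 5, 38

input_to_reshape = batch_size * latent_dim * query_maxlen * vocab_size
requested_shape = batch_size * latent_dim * query_maxlen

print(input_to_reshape)  # 778240 -- matches the error message
print(requested_shape)   # 20480  -- matches the error message
```

The extra factor of vocab_size between the two numbers hints that the tensor reaching the Reshape carries an unexpected vocab_size dimension.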
Where I'm at
I believe the error message is saying that the tensor fed into the query_input_reshape layer has shape (?, 5, 38, 64), while the layer expects a tensor of shape (?, 5, 64) (see the formulas above), but I could be wrong.
When I change the Reshape's target_shape to 3D (i.e. Reshape((latent_dim,query_maxlen,vocab_size))), I get the error total size of new array must be unchanged, which makes no sense to me since the input is 3D. You would think Reshape((latent_dim,query_maxlen)) would give me that error, since it turns a 3D tensor into a 2D one, yet it compiles fine, so I have no idea what is going on there.
The only reason I'm using Reshape at all is that I need to merge the two tensors into a single input for the LSTM encoder. When I try to remove the Reshape layers, I just get dimension-mismatch errors at compile time. The architecture above at least compiles, but I can't train it.
Can someone help me figure out how to merge the story_input and query_input layers? Thanks!
Solution
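A likely culprit is the input shapes. Keras's Embedding layer expects integer token indices of shape (sequence_length,), while the model above feeds it one-hot rows of shape (sequence_length, vocab_size). Embedding then emits a 4D tensor of shape (None, seq_len, vocab_size, latent_dim), which is exactly what the Reshape chokes on at train time. The sketch below is one possible fix, not a definitive answer: it switches to integer inputs, drops the Reshape layers, and merges the two embedded sequences by concatenating along the time axis (a substitution for the dot-based merge in the question; targets must then be one-hot vectors of shape (3, vocab_size) to match the loss).

```python
# Sketch of a possible fix (assumptions: integer-encoded inputs and a
# concatenate-based merge; hyperparameter values mirror the question).
from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                     RepeatVector, concatenate)
from tensorflow.keras.models import Model

vocab_size, latent_dim = 38, 64
story_maxlen, query_maxlen = 358, 5

# Embedding expects integer indices of shape (sequence_length,),
# not one-hot rows of shape (sequence_length, vocab_size).
story_input = Input(shape=(story_maxlen,), name='story_input')
story_embed = Embedding(vocab_size, latent_dim,
                        name='story_input_embed')(story_input)

query_input = Input(shape=(query_maxlen,), name='query_input')
query_embed = Embedding(vocab_size, latent_dim,
                        name='query_input_embed')(query_input)

# Merge along the time axis: (None, story_maxlen + query_maxlen, latent_dim).
story_query = concatenate([story_embed, query_embed], axis=1,
                          name='story_query_merge')

encoder_output, state_h, state_c = LSTM(latent_dim, return_state=True,
                                        name='encoder')(story_query)
encoder_output = RepeatVector(3, name='encoder_3dim')(encoder_output)

decoder = LSTM(latent_dim, return_sequences=True, name='decoder')(
    encoder_output, initial_state=[state_h, state_c])
answer_output = Dense(vocab_size, activation='softmax',
                      name='answer_output')(decoder)

model = Model([story_input, query_input], answer_output)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
```

With integer inputs, each embedded sequence is already the 3D tensor (batch, time, latent_dim) that LSTM expects, so no Reshape is needed at all.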