python - How to manipulate the encoder state in a multi-layer bidirectional RNN with an attention mechanism
Problem description
I am implementing a Seq2Seq model with a multi-layer bidirectional RNN and an attention mechanism. While following this tutorial https://github.com/tensorflow/nmt I got confused about how to correctly manipulate the encoder_state after the bidirectional layer.
Quoting the tutorial: "For multiple bidirectional layers, we need to manipulate the encoder_state a bit, see model.py, method _build_bidirectional_rnn() for more details." This is the relevant part of the code (https://github.com/tensorflow/nmt/blob/master/nmt/model.py line 770):
encoder_outputs, bi_encoder_state = (
    self._build_bidirectional_rnn(
        inputs=self.encoder_emb_inp,
        sequence_length=sequence_length,
        dtype=dtype,
        hparams=hparams,
        num_bi_layers=num_bi_layers,
        num_bi_residual_layers=num_bi_residual_layers))

if num_bi_layers == 1:
    encoder_state = bi_encoder_state
else:
    # alternatively concat forward and backward states
    encoder_state = []
    for layer_id in range(num_bi_layers):
        encoder_state.append(bi_encoder_state[0][layer_id])  # forward
        encoder_state.append(bi_encoder_state[1][layer_id])  # backward
    encoder_state = tuple(encoder_state)
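To make concrete what that loop produces, here is a minimal plain-Python sketch (my own illustration, with placeholder strings standing in for the actual LSTMStateTuples) for num_bi_layers = 2:

# bi_encoder_state is a pair (fw_states, bw_states), one entry per layer.
bi_encoder_state = (("fw_0", "fw_1"),   # forward MultiRNNCell states
                    ("bw_0", "bw_1"))   # backward MultiRNNCell states

encoder_state = []
for layer_id in range(2):               # num_bi_layers = 2
    encoder_state.append(bi_encoder_state[0][layer_id])  # forward
    encoder_state.append(bi_encoder_state[1][layer_id])  # backward
encoder_state = tuple(encoder_state)

print(encoder_state)  # ('fw_0', 'bw_0', 'fw_1', 'bw_1'): twice as many entries as layers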
So this is what I have so far:
def get_a_cell(lstm_size):
    lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
    # drop = tf.nn.rnn_cell.DropoutWrapper(lstm,
    #                                      output_keep_prob=keep_prob)
    return lstm

encoder_FW = tf.nn.rnn_cell.MultiRNNCell(
    [get_a_cell(num_units) for _ in range(num_layers)])
encoder_BW = tf.nn.rnn_cell.MultiRNNCell(
    [get_a_cell(num_units) for _ in range(num_layers)])

bi_outputs, bi_encoder_state = tf.nn.bidirectional_dynamic_rnn(
    encoder_FW, encoder_BW, encoderInput,
    sequence_length=x_lengths, dtype=tf.float32)

encoder_output = tf.concat(bi_outputs, -1)

encoder_state = []
for layer_id in range(num_layers):
    encoder_state.append(bi_encoder_state[0][layer_id])  # forward
    encoder_state.append(bi_encoder_state[1][layer_id])  # backward
encoder_state = tuple(encoder_state)

# DECODER -------------------
decoder_cell = tf.nn.rnn_cell.MultiRNNCell(
    [get_a_cell(num_units) for _ in range(num_layers)])

# Create an attention mechanism
attention_mechanism = tf.contrib.seq2seq.LuongAttention(
    num_units_attention, encoder_output,
    memory_sequence_length=x_lengths)

decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    decoder_cell, attention_mechanism,
    attention_layer_size=num_units_attention)

decoder_initial_state = decoder_cell.zero_state(
    batch_size, tf.float32).clone(cell_state=encoder_state)
The problem is that I get this error:
The two structures don't have the same nested structure.
First structure: type=AttentionWrapperState
str=AttentionWrapperState(cell_state=(LSTMStateTuple(c=, h=),
LSTMStateTuple(c=, h=)), attention=, time=, alignments=, alignment_history=
(), attention_state=)
Second structure: type=AttentionWrapperState
str=AttentionWrapperState(cell_state=(LSTMStateTuple(c=, h=),
LSTMStateTuple(c=, h=), LSTMStateTuple(c=, h=), LSTMStateTuple(c=, h=)),
attention=, time=, alignments=, alignment_history=(), attention_state=)
This makes some sense to me, because for the outputs we don't include all the layers but (I guess) only the last one, whereas for the state we are in fact concatenating every layer.
So, as I expected, when I concatenate only the last layer's states, like this:
encoder_state = []
encoder_state.append(bi_encoder_state[0][num_layers-1])  # forward
encoder_state.append(bi_encoder_state[1][num_layers-1])  # backward
encoder_state = tuple(encoder_state)
it runs without errors.
As far as I can tell, no part of the code transforms the encoder state again before passing it to the attention layer. So how does their code work? And, more importantly, does my fix break the correct behaviour of the attention mechanism?
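(A side note on how the tutorial itself avoids this mismatch: if I read the nmt model.py correctly, its encoder builds the bidirectional stack with only half the configured depth, num_bi_layers = int(num_encoder_layers / 2), so after flattening the state tuple is back to the decoder's depth. A minimal sketch of that bookkeeping, with num_encoder_layers assumed to be 4:)

# Depth bookkeeping mirroring the tutorial's encoder setup (a sketch,
# not the tutorial's literal code).
num_encoder_layers = 4                       # also the decoder's depth
num_bi_layers = int(num_encoder_layers / 2)  # bidirectional stack is half as deep

# The flattening loop emits one LSTMStateTuple per direction per layer,
# so the result matches the decoder's MultiRNNCell depth again.
assert 2 * num_bi_layers == num_encoder_layers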
Solution
Here is the problem:
Only the encoder is bidirectional, but you feed its bidirectional state to the decoder, which is always unidirectional.
The solution:
All you have to do is concatenate the forward and backward states, so that you are handing the decoder "unidirectional-shaped" data again:
encoder_state = []
for layer_id in range(num_layers):
    state_fw = bi_encoder_state[0][layer_id]
    state_bw = bi_encoder_state[1][layer_id]
    # Merge the fw state and the bw state
    cell_state = tf.concat([state_fw.c, state_bw.c], 1)
    hidden_state = tf.concat([state_fw.h, state_bw.h], 1)
    # This state has the same structure as a uni-directional encoder state
    state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state, h=hidden_state)
    encoder_state.append(state)
encoder_state = tuple(encoder_state)
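One caveat worth adding (my own note, not part of the original answer): concatenating c and h along axis 1 doubles the state width from num_units to 2 * num_units, so the decoder cells must be built with 2 * num_units for the cloned state to fit. A minimal sketch, reusing the names from the question:

# With Luong attention, the query (decoder output) depth must match the
# attention depth, so num_units_attention has to be 2 * num_units here.
num_units_attention = 2 * num_units

# Decoder cells sized for the concatenated (doubled) encoder state.
decoder_cell = tf.nn.rnn_cell.MultiRNNCell(
    [get_a_cell(2 * num_units) for _ in range(num_layers)])

attention_mechanism = tf.contrib.seq2seq.LuongAttention(
    num_units_attention, encoder_output,
    memory_sequence_length=x_lengths)

decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    decoder_cell, attention_mechanism,
    attention_layer_size=num_units_attention)

# Seed the attention wrapper's zero state with the merged encoder state.
decoder_initial_state = decoder_cell.zero_state(
    batch_size, tf.float32).clone(cell_state=encoder_state)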