tensorflow - Attention mechanism / TensorFlow tutorial
Problem description
I am trying to improve my draft of an attention mechanism, in which I basically iterate over the decoder steps, and an LSTM decoder cell receives a context vector from the attention module at every step:
from tensorflow.keras import layers
from tensorflow.keras.layers import Bidirectional, Dense, Input, LSTM
from tensorflow.keras.models import Model

# n_a, n_s, past_period, preview_period, raw_dataset and
# one_step_attention are defined elsewhere in my code
post_activation_LSTM_cell = layers.LSTM(n_s, return_state=True)
output_layer = Dense(1)

s0 = Input(shape=(n_s,), name='s0')
c0 = Input(shape=(n_s,), name='c0')
s = s0
c = c0
outputs = []

input_tensor = Input(shape=(past_period, raw_dataset.shape[-1]))
h = Bidirectional(LSTM(n_a, return_sequences=True))(input_tensor)

for t in range(preview_period):
    # one context vector per decoder step, computed from the current state s
    context = one_step_attention(h, s)
    s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])
    out = output_layer(s)
    outputs.append(out)

model = Model([input_tensor, s0, c0], outputs)
model.summary()
I find the implementation in the TensorFlow tutorial much clearer, but I cannot see how the decoder obtains a different context vector from the Bahdanau attention at each output step; it looks as if the decoder only ever gets a single context vector. What am I missing?
https://www.tensorflow.org/tutorials/text/nmt_with_attention
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query hidden state shape == (batch_size, hidden size)
        # query_with_time_axis shape == (batch_size, 1, hidden size)
        # values shape == (batch_size, max_len, hidden size)
        # we are doing this to broadcast addition along the time axis to calculate the score
        query_with_time_axis = tf.expand_dims(query, 1)

        # score shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        # the shape of the tensor before applying self.V is (batch_size, max_length, units)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))

        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights


class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)

        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))

        # output shape == (batch_size, vocab)
        x = self.fc(output)

        return x, state, attention_weights
Solution
You are right, the decoder only gets one context vector. The call method of the Decoder class implements just a single step of the decoder, and the attention is recomputed from whatever hidden state is passed in on each call.
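Here is a condensed sketch of the training loop that appears later in the tutorial (inside train_step; the gradient tape and optimizer step are elided, and targ, decoder, loss_function and enc_output are the tutorial's own names). The decoder is invoked once per target timestep, and dec_hidden changes between iterations, which is why the attention layer yields a different context vector at every step:

# teacher forcing: the ground-truth token is fed in as the next input
for t in range(1, targ.shape[1]):
    # dec_hidden was updated by the previous call, so self.attention
    # inside the decoder computes a fresh context vector here
    predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
    loss += loss_function(targ[:, t], predictions)
    dec_input = tf.expand_dims(targ[:, t], 1)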
Further on in the tutorial you can see both loops: one iterating over the target sentence at training time, as sketched above, and another doing the step-by-step sampling at inference time. The sampling loop works the same way, except that the decoder's own prediction is fed back as the next input.
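Condensed in the same way from the tutorial's evaluate function (max_length_targ, targ_lang, result and enc_out are defined in the tutorial; the attention plotting is elided and the early return is replaced by a break):

for t in range(max_length_targ):
    # again, dec_hidden is different on every iteration,
    # so each step gets its own context vector
    predictions, dec_hidden, attention_weights = decoder(dec_input,
                                                         dec_hidden,
                                                         enc_out)
    predicted_id = tf.argmax(predictions[0]).numpy()
    result += targ_lang.index_word[predicted_id] + ' '
    if targ_lang.index_word[predicted_id] == '<end>':
        break
    # the predicted token is fed back in as the next decoder input
    dec_input = tf.expand_dims([predicted_id], 0)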