python - Tensorflow: replicating dynamic_rnn behavior with raw_rnn
Problem description
I am trying to replicate the behavior of tf.nn.dynamic_rnn using the lower-level API tf.nn.raw_rnn. To do so I feed both implementations the same chunk of data, set the random seed, and create the cell and RNN with the same hparams. However, the two implementations do not produce equal outputs. The data and code are below.
The data X and the sequence lengths X_len:
import numpy as np
import tensorflow as tf

X = np.array([[[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [0.0, 0.0, 0.0]],
              [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9]],
              [[1.1, 2.2, 3.3], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]], dtype=np.float32)
X_len = np.array([2, 3, 1], dtype=np.int32)
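Both implementations run in time-major mode (time_major=True for dynamic_rnn, and raw_rnn is always time-major), which is why X is transposed before being fed in. A minimal numpy sketch of that reshaping, using the same data:

```python
import numpy as np

# batch-major data: (batch=3, time=3, features=3)
X = np.array([[[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [0.0, 0.0, 0.0]],
              [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9]],
              [[1.1, 2.2, 3.3], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]], dtype=np.float32)

# swap the batch and time axes -> time-major (time=3, batch=3, features=3)
X_tm = np.transpose(X, (1, 0, 2))

# step t of the time-major array holds step t of every sequence in the batch
print(X_tm.shape)  # (3, 3, 3)
print(X_tm[1])     # second time step of all three sequences
```

Each slice X_tm[t] is exactly what the RNN consumes at step t, which matches how loop_fn later reads one time step at a time from the TensorArray.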
The tf.nn.dynamic_rnn implementation:
tf.reset_default_graph()
tf.set_random_seed(42)
inputs = tf.placeholder(shape=(3, None, 3), dtype=tf.float32)
lengths = tf.placeholder(shape=(None,), dtype=tf.int32)
lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
outputs, state = tf.nn.dynamic_rnn(inputs=inputs, sequence_length=lengths, cell=lstm_cell,
                                   dtype=tf.float32,
                                   initial_state=lstm_cell.zero_state(3, dtype=tf.float32),
                                   time_major=True)
outputs_reshaped = tf.transpose(outputs, perm=[1, 0, 2])
sess = tf.Session()
sess.run(tf.initializers.global_variables())
X = np.transpose(X, (1, 0, 2))
hidden_state = sess.run(outputs_reshaped, feed_dict={inputs: X, lengths: X_len})
print(hidden_state)
The tf.nn.raw_rnn implementation:
tf.reset_default_graph()
tf.set_random_seed(42)
inputs = tf.placeholder(shape=(3, None, 3), dtype=tf.float32)
lengths = tf.placeholder(shape=(None,), dtype=tf.int32)
inputs_ta = tf.TensorArray(dtype=tf.float32, size=3)
inputs_ta = inputs_ta.unstack(inputs)
lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
def loop_fn(time, cell_output, cell_state, loop_state):
    emit_output = cell_output  # == None for time == 0
    if cell_output is None:  # time == 0
        next_cell_state = lstm_cell.zero_state(3, tf.float32)
    else:
        next_cell_state = cell_state
    elements_finished = (time >= lengths)
    finished = tf.reduce_all(elements_finished)
    next_input = tf.cond(finished,
                         true_fn=lambda: tf.zeros([3, 3], dtype=tf.float32),
                         false_fn=lambda: inputs_ta.read(time))
    next_loop_state = None
    return (elements_finished, next_input, next_cell_state, emit_output, next_loop_state)
outputs_ta, final_state, _ = tf.nn.raw_rnn(lstm_cell, loop_fn)
outputs_reshaped = tf.transpose(outputs_ta.stack(), perm=[1, 0, 2])
sess = tf.Session()
sess.run(tf.initializers.global_variables())
X = np.transpose(X, (1, 0, 2))
hidden_state = sess.run(outputs_reshaped, feed_dict={inputs: X, lengths: X_len})
print(hidden_state)
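For intuition, the contract that loop_fn implements above can be sketched as a plain-numpy driving loop with a toy cell. Everything here is illustrative, not TensorFlow's actual implementation: at each step the loop reports which batch elements are finished, feeds the next time-major input slice, and emits zeros for elements past their sequence length.

```python
import numpy as np

def toy_cell(x, state):
    # stand-in for an RNN cell: new state mixes input and previous state
    new_state = np.tanh(x + state)
    return new_state, new_state  # (output, next_state)

def raw_rnn_sketch(inputs_tm, lengths):
    """inputs_tm: time-major array (time, batch, features)."""
    max_time, batch, feat = inputs_tm.shape
    state = np.zeros((batch, feat))           # zero_state
    emitted = []
    for time in range(max_time):
        elements_finished = time >= lengths   # like (time >= lengths) in loop_fn
        output, state_candidate = toy_cell(inputs_tm[time], state)
        # finished elements emit zeros and keep their old state,
        # mirroring how raw_rnn uses elements_finished
        mask = (~elements_finished)[:, None]
        emitted.append(np.where(mask, output, 0.0))
        state = np.where(mask, state_candidate, state)
    return np.stack(emitted)                  # (time, batch, features)

inputs_tm = np.ones((3, 3, 3), dtype=np.float32)
lengths = np.array([2, 3, 1])
out = raw_rnn_sketch(inputs_tm, lengths)
print(out.shape)   # (3, 3, 3)
print(out[2, 0])   # sequence 0 has length 2 -> zeros at step 2
```

With lengths [2, 3, 1], the emitted outputs past each sequence's length are all zeros, which is the same padding behavior both dynamic_rnn (via sequence_length) and the loop_fn above produce.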
I am sure there is some difference between them, but I cannot figure out where it is or what it is. If anyone has an idea, that would be great.
Looking forward to your answers!
Solution
The reason for the difference is that your variables are initialized to different values. You can see this by calling:
print(sess.run(tf.trainable_variables()))
after they have been initialized.
The reason for this is that there is both a global (graph-level) seed and a per-op seed, so setting the random seed does not force the initializer calls buried inside the LSTM code to use the same random seed. See this answer for more details. To summarize: the seed used for any random op starts from your global seed and then depends on the id of the last operation added to the graph.
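The interaction between the two seeds can be illustrated with a toy model outside TensorFlow. The op_seed function below is a made-up stand-in, not TensorFlow's real seed derivation; the point is only that the value an op draws depends on both the global seed and the op's position in the graph:

```python
import random

def op_seed(global_seed, op_id):
    # made-up stand-in for combining the graph-level seed
    # with the id of the op being added to the graph
    return (global_seed * 1000003 + op_id) % (2 ** 31)

# same global seed, but the variable is created as a different
# op id in the two graphs -> different initial values
a = random.Random(op_seed(42, 3)).random()
b = random.Random(op_seed(42, 7)).random()
assert a != b

# same global seed AND same op id -> identical values, which is
# why building both graphs identically up to the variables
# makes the initializers match
assert random.Random(op_seed(42, 3)).random() == a
```

This is exactly the situation in the question: the two graphs add different ops before the LSTM variables are created, so the variables get different op seeds even though the global seed is the same.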
Knowing this, we can force the variable seeds to be the same in both implementations by building the graph in exactly the same order up to the point where the variables are created: we start from the same global seed and add the same operations to the graph in the same order up to the variables, so the variables get the same op seed. We can do this like so:
tf.reset_default_graph()
tf.set_random_seed(42)
lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
inputs_shape = (3, None, 3)
lstm_cell.build(inputs_shape)
The build method is needed because that is what actually adds the variables to the graph.
Here is a complete working version of what you have:
import tensorflow as tf
import numpy as np
X = np.array([[[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [0.0, 0.0, 0.0]],
              [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9]],
              [[1.1, 2.2, 3.3], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]], dtype=np.float32)
X_len = np.array([2, 3, 1], dtype=np.int32)
def dynamic():
    tf.reset_default_graph()
    tf.set_random_seed(42)
    lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
    inputs_shape = (3, None, 3)
    lstm_cell.build(inputs_shape)
    inputs = tf.placeholder(shape=inputs_shape, dtype=tf.float32)
    lengths = tf.placeholder(shape=(None,), dtype=tf.int32)
    outputs, state = tf.nn.dynamic_rnn(inputs=inputs, sequence_length=lengths, cell=lstm_cell,
                                       dtype=tf.float32,
                                       initial_state=lstm_cell.zero_state(3, dtype=tf.float32),
                                       time_major=True)
    outputs_reshaped = tf.transpose(outputs, perm=[1, 0, 2])
    sess = tf.Session()
    sess.run(tf.initializers.global_variables())
    a = np.transpose(X, (1, 0, 2))
    hidden_state = sess.run(outputs_reshaped, feed_dict={inputs: a, lengths: X_len})
    print(hidden_state)
def replicated():
    tf.reset_default_graph()
    tf.set_random_seed(42)
    lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
    inputs_shape = (3, None, 3)
    lstm_cell.build(inputs_shape)
    inputs = tf.placeholder(shape=inputs_shape, dtype=tf.float32)
    lengths = tf.placeholder(shape=(None,), dtype=tf.int32)
    inputs_ta = tf.TensorArray(dtype=tf.float32, size=3)
    inputs_ta = inputs_ta.unstack(inputs)

    def loop_fn(time, cell_output, cell_state, loop_state):
        emit_output = cell_output  # == None for time == 0
        if cell_output is None:  # time == 0
            next_cell_state = lstm_cell.zero_state(3, tf.float32)
        else:
            next_cell_state = cell_state
        elements_finished = (time >= lengths)
        finished = tf.reduce_all(elements_finished)
        next_input = tf.cond(finished,
                             true_fn=lambda: tf.zeros([3, 3], dtype=tf.float32),
                             false_fn=lambda: inputs_ta.read(time))
        next_loop_state = None
        return (elements_finished, next_input, next_cell_state, emit_output, next_loop_state)

    outputs_ta, final_state, _ = tf.nn.raw_rnn(lstm_cell, loop_fn)
    outputs_reshaped = tf.transpose(outputs_ta.stack(), perm=[1, 0, 2])
    sess = tf.Session()
    sess.run(tf.initializers.global_variables())
    a = np.transpose(X, (1, 0, 2))
    hidden_state = sess.run(outputs_reshaped, feed_dict={inputs: a, lengths: X_len})
    print(hidden_state)
if __name__ == '__main__':
    dynamic()
    replicated()