tensorflow - Is the (test) error of LSTMBlockFusedCell really 6% higher than LSTMCell's, or did I make a mistake with dropout?
Problem description
I built a simple stacked dynamic bidirectional LSTM for a regression problem using LSTMCell, DropoutWrapper, MultiRNNCell, and bidirectional_dynamic_rnn (Model_Orig). The test absolute error after 20 epochs is 2.89, and training took 14.5 hours.
Then I tried an alternative implementation (Model_blockfused) with the same structure but built from the block-fused components (i.e. tf.layers.dropout, LSTMBlockFusedCell, TimeReversedFusedRNN). Model_blockfused trains much faster (3.6 hours), but its test absolute error after 20 epochs is about 6% higher (3.06).
So, should I expect a 6% performance gap between LSTMBlockFusedCell and LSTMCell? Or did I make a mistake when building Model_blockfused (especially with the dropout)?
Here is the simplified code for Model_Orig:
```python
LSTM_CELL_SIZE = 200
keep_prob = 0.90
parallel_iterations = 512

dropcells = []
for iiLyr in range(3):
    cell_iiLyr = tf.nn.rnn_cell.LSTMCell(num_units=LSTM_CELL_SIZE, state_is_tuple=True)
    dropcells.append(tf.nn.rnn_cell.DropoutWrapper(cell=cell_iiLyr, output_keep_prob=keep_prob))
MultiLyr_cell = tf.nn.rnn_cell.MultiRNNCell(cells=dropcells, state_is_tuple=True)

outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=MultiLyr_cell,
    cell_bw=MultiLyr_cell,
    inputs=Orig_input_TSs,  # shape of Orig_input_TSs: [#batches, time_len, #input_features]
    dtype=tf.float32,
    sequence_length=length,  # shape of length: [#batches, 1]
    parallel_iterations=parallel_iterations,  # default: 32; operations without temporal dependency are run in parallel
    scope="BiLSTM"
)

states_fw, states_bw = states
# get the states (c and h, both directions) from the top LSTM layer for the final fully connected layers.
c_fw_lstLyr, h_fw_lstLyr = states_fw[-1]
c_bw_lstLyr, h_bw_lstLyr = states_bw[-1]
```
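For reference on the dropout question: in Model_Orig, DropoutWrapper with output_keep_prob=0.90 applies (inverted) dropout to every layer's output, so the three-layer stack sees three dropout applications rather than a single one on the input. A minimal numpy sketch (not TensorFlow's implementation) of what one such application does:

```python
import numpy as np

# Inverted dropout, as applied by DropoutWrapper(output_keep_prob=0.90)
# to EACH layer's output: kept units are rescaled by 1/keep_prob at
# training time so the expected activation is unchanged.
keep_prob = 0.90
rng = np.random.default_rng(0)

def output_dropout(h, keep_prob):
    mask = rng.random(h.shape) < keep_prob       # keep each unit with prob keep_prob
    return np.where(mask, h / keep_prob, 0.0)    # zero the rest, rescale the kept

h = np.ones((4, 200))  # one layer's output: [batch, LSTM_CELL_SIZE]
dropped = output_dropout(h, keep_prob)
```

Every surviving entry of `dropped` equals 1/0.90, and roughly 10% of the entries are zeroed.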
Here is the simplified code for Model_blockfused:
```python
LSTM_CELL_SIZE = 200
keep_prob = 0.90
Flg_training = True  # True: training

# convert the input sequences (Orig_input_TSs) to time-major format
# shape of input_TSs_TimeMajor: [time_len, #batches, #input_features]
input_TSs_TimeMajor = tf.transpose(Orig_input_TSs, perm=[1, 0, 2])

# apply dropout
# shape of dropout_input_TSs_TimeMajor: [time_len, #batches, #input_features]
dropout_input_TSs_TimeMajor = tf.layers.dropout(
    input_TSs_TimeMajor,
    rate=1.0 - keep_prob,  # dropout rate
    training=Flg_training
)

# build the stacked dynamic bidirectional LSTM
for iiLyr in range(3):
    cur_fw_BFcell_obj = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
    cur_bw_BFcell_obj = tf.contrib.rnn.TimeReversedFusedRNN(cur_fw_BFcell_obj)
    if iiLyr == 0:
        # first layer (different inputs)
        # shape of fw_out_TM (or bw_out_TM): [time_len, #batches, LSTM_CELL_SIZE]
        # fw_state (or bw_state): LSTMStateTuple(c, h)
        fw_out_TM, fw_state = cur_fw_BFcell_obj(dropout_input_TSs_TimeMajor, dtype=tf.float32, sequence_length=length)
        bw_out_TM, bw_state = cur_bw_BFcell_obj(dropout_input_TSs_TimeMajor, dtype=tf.float32, sequence_length=length)
    else:
        # shape of fw_out_TM (or bw_out_TM): [time_len, #batches, LSTM_CELL_SIZE]
        # fw_state (or bw_state): LSTMStateTuple(c, h)
        fw_out_TM, fw_state = cur_fw_BFcell_obj(fw_out_TM, dtype=tf.float32, sequence_length=length)
        bw_out_TM, bw_state = cur_bw_BFcell_obj(bw_out_TM, dtype=tf.float32, sequence_length=length)

# get the LSTM states (c and h, both directions) from the top LSTM layer for the final fully connected layers.
c_fw_lstLyr, h_fw_lstLyr = fw_state
c_bw_lstLyr, h_bw_lstLyr = bw_state
```
Thanks.
Solution
First, you should use two separate tf.contrib.rnn.LSTMBlockFusedCell instances for the fw and bw directions. Change this:

```python
cur_fw_BFcell_obj = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
cur_bw_BFcell_obj = tf.contrib.rnn.TimeReversedFusedRNN(cur_fw_BFcell_obj)
```

to this:

```python
cur_fw_BFcell_obj = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
cur_bw_BFcell_obj_cell = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
cur_bw_BFcell_obj = tf.contrib.rnn.TimeReversedFusedRNN(cur_bw_BFcell_obj_cell)
```
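The reason two instances matter: a cell object creates its weight variables on first use, so wrapping the same object in TimeReversedFusedRNN makes the backward direction reuse the forward weights. A toy plain-Python analogy (not the TF API) of that lazy-variable behavior:

```python
import numpy as np

# A toy "cell" that lazily creates its weights on first call, mimicking
# how an RNN cell builds its variables when first invoked.
class ToyFusedCell:
    def __init__(self, num_units, seed):
        self.num_units = num_units
        self.kernel = None
        self.seed = seed

    def __call__(self, x):
        if self.kernel is None:  # weights created on first call only
            rng = np.random.default_rng(self.seed)
            self.kernel = rng.standard_normal((x.shape[-1], self.num_units))
        return np.tanh(x @ self.kernel)

x = np.ones((4, 3))  # [batch, features]

# Wrong: one cell object used for both directions -> weights are shared.
shared = ToyFusedCell(2, seed=0)
fw_shared = shared(x)
bw_shared = shared(x[::-1])[::-1]

# Right: two separate cell objects -> independent weights per direction.
fw_cell = ToyFusedCell(2, seed=0)
bw_cell = ToyFusedCell(2, seed=1)
fw_out = fw_cell(x)
bw_out = bw_cell(x[::-1])[::-1]
```

With the shared object both directions compute with the same kernel; with two objects each direction gets its own parameters, which is what the bidirectional architecture intends.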
Second, the documentation for the tf.contrib.rnn.stack_bidirectional_dynamic_rnn API says:

"The combined forward and backward layer outputs are used as input of the next layer."
So the following code:

```python
fw_out_TM, fw_state = cur_fw_BFcell_obj(fw_out_TM, dtype=tf.float32, sequence_length=length)
bw_out_TM, bw_state = cur_bw_BFcell_obj(bw_out_TM, dtype=tf.float32, sequence_length=length)
```

should be changed to:

```python
next_layer_input = tf.concat([fw_out_TM, bw_out_TM], axis=2)
fw_out_TM, fw_state = cur_fw_BFcell_obj(next_layer_input, dtype=tf.float32, sequence_length=length)
bw_out_TM, bw_state = cur_bw_BFcell_obj(next_layer_input, dtype=tf.float32, sequence_length=length)
```
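As a quick shape check (a numpy sketch, assuming the time-major layout from the question): each direction emits [time_len, #batches, LSTM_CELL_SIZE], and the concatenation along the feature axis feeds the next layer with doubled depth.

```python
import numpy as np

# Stand-ins for one layer's forward and backward outputs in time-major
# layout: [time_len, #batches, LSTM_CELL_SIZE].
time_len, batch, cell_size = 5, 4, 200
fw_out_TM = np.zeros((time_len, batch, cell_size))
bw_out_TM = np.zeros((time_len, batch, cell_size))

# Depth-concatenate along the feature axis (axis=2), as tf.concat does.
next_layer_input = np.concatenate([fw_out_TM, bw_out_TM], axis=2)
print(next_layer_input.shape)  # (5, 4, 400)
```

This is why each layer above the first sees an input of size 2 * LSTM_CELL_SIZE rather than LSTM_CELL_SIZE.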