tensorflow - OutputProjectionWrapper vs fully connected layer on top of RNN
Problem description
I'm reading the 14th chapter of Hands-On Machine Learning with Scikit-Learn and TensorFlow. It says:
Although using an OutputProjectionWrapper is the simplest solution to reduce the dimensionality of the RNN's output sequences down to just one value per time step (per instance), it is not the most efficient. There is a trickier but more efficient solution: you can reshape the RNN outputs, then apply a single fully connected layer with the appropriate output size. [...] This can provide a significant speed boost since there is just one fully connected layer instead of one per time step.
This makes no sense to me. In the case of OutputProjectionWrapper, we need to perform two operations per time step:
- Calculate the new hidden state based on the previous hidden state and the input.
- Calculate the output by applying the dense layer to the calculated hidden state.
Of course, when we use a plain BasicRNNCell with a dense layer on top, we only need to do one operation at each time step (the first one), but then we need to pipe each output tensor through our dense layer. So we perform the same number of operations in both cases.
Also, I can't understand the following part:
This can provide a significant speed boost since there is just one fully connected layer instead of one per time step.
Don't we have only one fully connected layer in both cases? As far as I understand, OutputProjectionWrapper uses the same shared layer at each time step. I don't even see how it could create a different layer for every time step, because OutputProjectionWrapper has no information about the number of time steps we will be using.
I will be very grateful if someone can explain the difference between these approaches.
UPD Here is pseudocode for the question. Am I missing something?
# 2 time steps, x1 and x2 - inputs, h1 and h2 - hidden states, y1 and y2 - outputs.
# OutputProjectionWrapper
h1 = calc_hidden(x1, 0)
y1 = dense(h1)
h2 = calc_hidden(x2, h1)
y2 = dense(h2)
# BasicRNNCell + dense layer on top of all time steps
h1 = calc_hidden(x1, 0)
y1 = h1
h2 = calc_hidden(x2, h1)
y2 = h2
y1 = dense(y1)
y2 = dense(y2)
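The "reshape trick" the book describes can be sketched in NumPy (a toy sketch, not the actual TensorFlow implementation; the dimensions and variable names are made up for illustration):

```python
import numpy as np

# Illustrative dimensions: 4 sequences in a batch, 2 time steps,
# 5 hidden units, 1 output value per time step.
batch, steps, hidden, out_dim = 4, 2, 5, 1
outputs = np.random.rand(batch, steps, hidden)   # stacked RNN outputs (h1, h2, ...)
W = np.random.rand(hidden, out_dim)              # shared dense-layer weights
b = np.zeros(out_dim)                            # shared dense-layer bias

# Per-time-step projection (what OutputProjectionWrapper does conceptually):
per_step = np.stack([outputs[:, t] @ W + b for t in range(steps)], axis=1)

# Reshape trick: fold time into the batch dimension, apply the layer once,
# then restore the original shape.
flat = outputs.reshape(-1, hidden)               # shape (batch*steps, hidden)
projected = (flat @ W + b).reshape(batch, steps, out_dim)

assert np.allclose(per_step, projected)          # same numbers either way
```

Both paths use the same shared weights `W` and `b` and produce identical values; the only difference is whether the matrix multiplication happens once or once per time step.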
UPD 2 I've created two small code snippets (one with OutputProjectionWrapper and another with BasicRNNCell plus tf.layers.dense on top); both created 14 variables with the same shapes. So there is definitely no memory difference between these approaches.
Solution
My guess is that, thanks to matrix multiplication optimizations, applying a layer once to a tensor of shape (x, n) is faster than applying the same layer n times to tensors of shape (x).
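That guess can be checked with a quick NumPy timing sketch (the dimensions are made up, and absolute timings vary by machine and BLAS backend, so no expected numbers are given; on most setups the single large matmul wins):

```python
import time
import numpy as np

# Illustrative dimensions: 50 time steps, batch of 128, 256 hidden units.
steps, batch, hidden, out_dim = 50, 128, 256, 1
outputs = np.random.rand(steps, batch, hidden)
W = np.random.rand(hidden, out_dim)

# One matmul per time step (analogous to OutputProjectionWrapper):
t0 = time.perf_counter()
for _ in range(100):
    ys = [outputs[t] @ W for t in range(steps)]
per_step_time = time.perf_counter() - t0

# Single matmul on the reshaped tensor (the book's trick):
t0 = time.perf_counter()
for _ in range(100):
    y = (outputs.reshape(-1, hidden) @ W).reshape(steps, batch, out_dim)
single_time = time.perf_counter() - t0

# The results are identical; only the timings differ.
assert np.allclose(np.stack(ys), y[:, :, :])
print(f"per-step: {per_step_time:.3f}s, single matmul: {single_time:.3f}s")
```

The flop count is the same in both cases, but one large multiplication amortizes Python-loop and kernel-dispatch overhead and makes better use of cache and vectorization, which is where the speed boost comes from.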