machine-learning - Bidirectional LSTM output in PyTorch
Question
Hi, I have a question about how to collect the correct result from the output of a BI-LSTM module.
Suppose I feed a length-10 sequence into a single-layer LSTM module with 100 hidden units:
lstm = nn.LSTM(5, 100, 1, bidirectional=True)
output
will have the following shape:
[10 (seq_length), 1 (batch), 200 (num_directions * hidden_size)]
# or according to the doc, can be viewed as
[10 (seq_length), 1 (batch), 2 (num_directions), 100 (hidden_size)]
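The shapes above can be checked with a small runnable sketch (the random input here is just a placeholder to illustrate the shapes):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(5, 100, 1, bidirectional=True)
x = torch.randn(10, 1, 5)  # (seq_len, batch, input_size) -- placeholder input

output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([10, 1, 200])

# per the docs, the last dimension can be viewed as (num_directions, hidden_size)
print(output.view(10, 1, 2, 100).shape)  # torch.Size([10, 1, 2, 100])
```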
If I want to get the output for the third (1-indexed) input in both directions (two 100-dim vectors), how do I do that correctly?
I know output[2, 0]
gives me a 200-dim vector. Does this 200-dim vector represent the output for the third input in both directions?
One thing that bothers me: when feeding in the reverse direction, the third (1-indexed) output vector is computed from the 8th (1-indexed) input, right?
Does pytorch handle this automatically and group the outputs by direction?
Thanks!
Solution
Yes, when using a BiLSTM the hidden states of the two directions are simply concatenated (the second half after the middle is the hidden state for the sequence fed in reverse),
so splitting in the middle works just fine.
Since reshaping works from the right towards the left dimensions, you won't have any problems separating the two directions.
Here is a small example:
import torch

# so these are your original hidden states for each direction
# in this case hidden size is 5, but this works for any size
direction_one_out = torch.tensor(range(5))
direction_two_out = torch.tensor(list(reversed(range(5))))
print('Direction one:')
print(direction_one_out)
print('Direction two:')
print(direction_two_out)
# before outputting they will be concatenated
# I'm adding a batch dimension and a sequence length here; in this case seq length is 1
hidden = torch.cat((direction_one_out, direction_two_out), dim=0).view(1, 1, -1)
print('\nYour hidden output:')
print(hidden, hidden.shape)
# trivial case, reshaping for one hidden state
hidden_reshaped = hidden.view(1, 1, 2, -1)
print('\nReshaped:')
print(hidden_reshaped, hidden_reshaped.shape)
# This works for arbitrary sequence lengths as well, as you can see here
# I've set the sequence length to 5, but this will work for any other value too
print('\nThis also works for multiple hidden states in a tensor:')
multi_hidden = hidden.expand(5, 1, 10)
print(multi_hidden, multi_hidden.shape)
print('Directions can be split up just like this:')
multi_hidden = multi_hidden.view(5, 1, 2, 5)
print(multi_hidden, multi_hidden.shape)
Output:
Direction one:
tensor([0, 1, 2, 3, 4])
Direction two:
tensor([4, 3, 2, 1, 0])
Your hidden output:
tensor([[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]]]) torch.Size([1, 1, 10])
Reshaped:
tensor([[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]]]) torch.Size([1, 1, 2, 5])
This also works for multiple hidden states in a tensor:
tensor([[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]],
[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]],
[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]],
[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]],
[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]]]) torch.Size([5, 1, 10])
Directions can be split up just like this:
tensor([[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]],
[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]],
[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]],
[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]],
[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]]]) torch.Size([5, 1, 2, 5])
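To address the alignment question directly, here is a sketch with an actual nn.LSTM (dimensions taken from the question; the random input is just a placeholder). PyTorch re-aligns the backward direction's outputs to the original time axis, which we can verify through the final hidden states: the forward direction's final state appears at the last time step of output, while the backward direction's final state appears at the first time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(5, 100, 1, bidirectional=True)
x = torch.randn(10, 1, 5)  # (seq_len, batch, input_size) -- placeholder input

output, (h_n, c_n) = lstm(x)

# h_n has shape (num_layers * num_directions, batch, hidden_size) = (2, 1, 100):
# h_n[0] is the forward direction's final state (computed at time step 10),
# h_n[1] is the backward direction's final state (computed at time step 1).
assert torch.allclose(h_n[0], output[-1, :, :100])
assert torch.allclose(h_n[1], output[0, :, 100:])

# So output[2, 0, :100] is the forward output for the 3rd (1-indexed) input,
# and output[2, 0, 100:] is the backward output for the same input --
# the two directions are grouped by time step automatically.
fwd_3, bwd_3 = output[2, 0, :100], output[2, 0, 100:]
print(fwd_3.shape, bwd_3.shape)  # torch.Size([100]) torch.Size([100])
```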
Hope this helps! :)