python - PyTorch ValueError: Expected target size (2, 13), got torch.Size([2]) when calling CrossEntropyLoss
Problem description
I am trying to train a PyTorch LSTM network, but I get ValueError: Expected target size (2, 13), got torch.Size([2]) when I try to compute CrossEntropyLoss. I think I need to change a shape somewhere, but I don't know where.
Here is my network definition:
import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.2):
        super(LSTM, self).__init__()
        # network size parameters
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        # the layers of the network
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_dim, self.n_layers, dropout=drop_prob, batch_first=True)
        self.dropout = nn.Dropout(drop_prob)
        self.fc = nn.Linear(self.hidden_dim, self.vocab_size)

    def forward(self, input, hidden):
        # Perform a forward pass of the model on some input and hidden state.
        batch_size = input.size(0)
        print(f'batch_size: {batch_size}')
        print(f'Input shape: {input.shape}')
        # pass through embeddings layer
        embeddings_out = self.embedding(input)
        print(f'Shape after Embedding: {embeddings_out.shape}')
        # pass through LSTM layers
        lstm_out, hidden = self.lstm(embeddings_out, hidden)
        print(f'Shape after LSTM: {lstm_out.shape}')
        # pass through dropout layer
        dropout_out = self.dropout(lstm_out)
        print(f'Shape after Dropout: {dropout_out.shape}')
        # pass through fully connected layer
        fc_out = self.fc(dropout_out)
        print(f'Shape after FC: {fc_out.shape}')
        # return output and hidden state
        return fc_out, hidden

    def init_hidden(self, batch_size):
        # Initializes hidden state
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        hidden = (torch.zeros(self.n_layers, batch_size, self.hidden_dim),
                  torch.zeros(self.n_layers, batch_size, self.hidden_dim))
        return hidden
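I added print statements that show the tensor shape at each step of the network. My data is in a TensorDataset called training_dataset with two attributes, features and labels. Features has shape torch.Size([97, 3]) and labels has shape torch.Size([97]).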
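To reproduce these shapes without the original data, a stand-in dataset can be built from random integers. This is only a sketch: the random token ids and labels are an assumption, and only the shapes and the vocabulary size of 13 come from the question.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumption: random integers stand in for the real (unpublished) data.
vocab_size = 13
features = torch.randint(0, vocab_size, (97, 3))  # 97 sequences of 3 token ids
labels = torch.randint(0, vocab_size, (97,))      # one class index per sequence
training_dataset = TensorDataset(features, labels)

training_loader = DataLoader(training_dataset, batch_size=2, drop_last=True, shuffle=True)
batch_features, batch_labels = next(iter(training_loader))
print(batch_features.shape)  # torch.Size([2, 3])
print(batch_labels.shape)    # torch.Size([2])
```

Each batch therefore carries one label per sequence, not one label per token.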
Here is the training code:
from torch.utils.data import DataLoader

# Size parameters
vocab_size = 13
embedding_dim = 256
hidden_dim = 256
n_layers = 2
# Training parameters
epochs = 3
learning_rate = 0.001
clip = 1
batch_size = 2
training_loader = DataLoader(training_dataset, batch_size=batch_size, drop_last=True, shuffle=True)
net = LSTM(vocab_size, embedding_dim, hidden_dim, n_layers)
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
loss_func = torch.nn.CrossEntropyLoss()
net.train()
for e in range(epochs):
    print(f'Epoch {e}')
    print(batch_size)
    hidden = net.init_hidden(batch_size)
    # loops through each batch
    for features, labels in training_loader:
        # detach the hidden state so gradients do not flow into earlier batches
        hidden = tuple(each.detach() for each in hidden)
        net.zero_grad()
        # forward pass, then compute the gradient of the loss by backprop
        output, hidden = net(features, hidden)
        loss = loss_func(output, labels)
        loss.backward()
        # using clipping to avoid exploding gradient
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()
When I try to train, I get the following error:
Traceback (most recent call last):
  File "train.py", line 75, in <module>
    loss = loss_func(output, labels)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 947, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/usr/local/lib/python3.8/site-packages/torch/nn/functional.py", line 2422, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/functional.py", line 2227, in nll_loss
    raise ValueError('Expected target size {}, got {}'.format(
ValueError: Expected target size (2, 13), got torch.Size([2])
Here is the output of the print statements as well:
batch_size: 2
Input shape: torch.Size([2, 3])
Shape after Embedding: torch.Size([2, 3, 256])
Shape after LSTM: torch.Size([2, 3, 256])
Shape after Dropout: torch.Size([2, 3, 256])
Shape after FC: torch.Size([2, 3, 13])
There is a shape mismatch somewhere, but I can't figure out where. Any help would be appreciated. In case it's relevant, I'm using Python 3.8.5 and PyTorch 1.6.0.
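Solution
For anyone who runs into this in the future: I asked the same question on the PyTorch forums and got a great answer from ptrblock.
The problem is that my LSTM layer (created with batch_first=True) returns an output for every element of the input sequence, with shape (batch_size, sequence_size, hidden_dim), so after the fully connected layer the output has shape (batch_size, sequence_size, vocab_size). However, I only want the output for the last element of the input sequence, which gives a final output of shape (batch_size, vocab_size).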
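These shapes can be checked in isolation. The following is a minimal sketch using the sizes from the question and a random input (the input values are an assumption). It only illustrates that nn.LSTM returns one output per time step and that indexing with [:, -1] keeps the last one.

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim, hidden_dim, n_layers = 2, 3, 256, 256, 2

lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, batch_first=True)
# Assumption: random values stand in for the embedding layer's output.
embeddings_out = torch.randn(batch_size, seq_len, embedding_dim)

# With no hidden state passed in, nn.LSTM starts from zeros.
lstm_out, (h_n, c_n) = lstm(embeddings_out)
print(lstm_out.shape)   # torch.Size([2, 3, 256]): one output per time step

last_step = lstm_out[:, -1]
print(last_step.shape)  # torch.Size([2, 256]): last time step only

# For a unidirectional LSTM, the last time step of the output matches
# the final hidden state of the top layer.
same_as_final_hidden = torch.allclose(last_step, h_n[-1])
print(same_as_final_hidden)
```

Passing last_step through the fully connected layer then yields a (batch_size, vocab_size) tensor, which is what CrossEntropyLoss expects for labels of shape (batch_size,).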
So in my forward function, instead of
# pass through LSTM layers
lstm_out, hidden = self.lstm(embeddings_out, hidden)
it should be
# pass through LSTM layers
lstm_out, hidden = self.lstm(embeddings_out, hidden)
# slice lstm_out to just get output of last element of the input sequence
lstm_out = lstm_out[:, -1]
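This fixed the shape problem. The error message is a bit misleading, because it says the target has the wrong shape when in fact the output has the wrong shape.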
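The message names the target because, for inputs with more than two dimensions, CrossEntropyLoss treats dimension 1 as the class dimension. An output of shape (2, 3, 13) is therefore read as 2 samples, 3 classes, and 13 extra positions, so a target of shape (2, 13) is expected. Here is a small sketch with random logits (the values are an assumption) that reproduces the failure and shows that slicing to the last time step resolves it. The exception type may differ between PyTorch versions, so both ValueError and RuntimeError are caught.

```python
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()
labels = torch.tensor([1, 2])   # shape (2,): one class index per sample
fc_out = torch.randn(2, 3, 13)  # (batch, sequence, vocab), as printed in the question

# Dimension 1 (size 3) is taken as the class dimension, so the loss
# expects a target of shape (2, 13) and rejects the (2,) labels.
try:
    loss_func(fc_out, labels)
    error_message = None
except (ValueError, RuntimeError) as err:
    error_message = str(err)
print(error_message)

# For illustration the logits are sliced directly here; the fix in the answer
# slices lstm_out before the fully connected layer, which gives the same shape.
loss = loss_func(fc_out[:, -1], labels)  # input (2, 13), target (2,)
print(loss.shape)  # torch.Size([])
```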