python - My LSTM model is not learning and its weights are not updating
Problem description
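My LSTM model in PyTorch is not learning: its weights receive no updates during training. After every epoch I print the sum of the weights of several layers, and the values never change.
The y array has shape (n, 3): the first column holds the actual sequence length, the second column is the label (0 or 1), and the last column is the weight used to penalize the loss function.
Apparently optimizer.step() is not working and does not apply the gradients to the weights. As a side note, I have tried the model with different learning rates and mini-batch sizes, with no difference in the results. The results shown below come from randomly generated dummy variables, and the y labels form an imbalanced dataset (~3%). I tried different weights to compensate for the imbalance, but I suspect something is wrong with the model itself.
In addition, if I run the model with my own dataset, the gradients of the LSTM layer (all four parameters) are zero, while the linear layer does have gradients.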
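One possible contributor to the missing LSTM gradients, sketched under the assumption that sequences within a batch have different lengths: `pad_packed_sequence` fills positions past each sequence's true length with zeros, so indexing the final time step with `[:, -1, :]`, as the `forward` method in the listing does, reads padding for every sequence shorter than the longest one in the batch. The toy tensors and sizes below are hypothetical and only illustrate the effect; indexing at `length - 1` (the idea behind the commented-out line in `forward`) picks the last real output instead.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Hypothetical toy batch: 3 sequences padded to length 5, 4 features each.
x = torch.randn(3, 5, 4)
lengths = torch.tensor([5, 2, 3])  # true lengths (CPU int64 tensor)

lstm = nn.LSTM(input_size=4, hidden_size=6, batch_first=True)

packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, _ = lstm(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True)  # shape (3, 5, 6)

# Final time step: zeros (padding) for the two shorter sequences.
last_step = out[:, -1, :]

# Position length - 1: the last real LSTM output of every sequence.
batch_idx = torch.arange(out.size(0))
last_valid = out[batch_idx, lengths - 1]

print(last_step.abs().sum(dim=1))   # zero for rows 1 and 2
print(last_valid.abs().sum(dim=1))  # non-zero for every row
```

Rows that read padding hand a constant zero vector to the linear layer, so they contribute no gradient to the LSTM parameters. Whether this fully explains the zero gradients on the asker's own dataset cannot be determined from the post.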
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils.rnn import pack_padded_sequence


class LSTMClassifier(nn.Module):
    """
    This is the simple RNN model we will be using to perform Sentiment Analysis.
    """

    def __init__(self, feature_size, hidden_dim, layer_dim=1):
        """
        Initialize the model by setting up the various layers.
        """
        super(LSTMClassifier, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.lstm = nn.LSTM(feature_size, hidden_dim, layer_dim, batch_first=True)
        self.dense = nn.Linear(in_features=hidden_dim, out_features=1)
        self.sig = nn.Sigmoid()

    def init_hidden(self, x):
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)
        return [t for t in (h0, c0)]

    def forward(self, x, y):
        #import pdb; pdb.set_trace()
        """
        Perform a forward pass of our model on some input.
        """
        #h0, c0 = self.init_hidden(x)
        # First column of y holds the true length of each sequence.
        x_seq = y[:,0]
        x = pack_padded_sequence(x, x_seq, batch_first=True, enforce_sorted=False)
        lstm_out, _ = self.lstm(x)
        lstm_out, _ = torch.nn.utils.rnn.pad_packed_sequence(lstm_out, batch_first=True)
        lstm_out = lstm_out.contiguous()
        out = self.dense(lstm_out)[:,-1,:]
        #out = out[range(len(x_seq)), (x_seq - 1)]
        return self.sig(out.squeeze())

def _get_train_data_loader(batch_size, X, y):
    print("Get train/test data loader.")
    train_y = torch.from_numpy(y).long()
    try:
        train_X = torch.from_numpy(X).float()
    except:
        train_X = X
    train_ds = torch.utils.data.TensorDataset(train_X, train_y)
    return torch.utils.data.DataLoader(train_ds, batch_size=batch_size, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print('train dist', sum(y_train[:,1])/y_train.shape[0], '\n ****\n',
      'test dist', sum(y_test[:,1])/y_test.shape[0])
optimizer = optim.SGD(model.parameters(), lr=.01)
train_loader = _get_train_data_loader(batch_size=128, X=X_, y=y_weighted)
test_loader = _get_train_data_loader(batch_size=512, X=X_test, y=y_test)
epochs = 2
model = LSTMClassifier(87, 5)
loss_fn = torch.nn.BCELoss(reduction='none')
train_loss = []
test_loss = []

for epoch in range(1, epochs + 1):
    print('epoch = ', epoch)
    model.train()
    total_loss = 0
    for batch in train_loader:
        batch_X, batch_y = batch
        #batch_X = batch_X.to(device)
        #batch_y = batch_y.to(device)
        # TODO: Complete this train method to train the model provided.
        optimizer.zero_grad()
        # Forward pass
        outputs = model(batch_X, batch_y)
        y_ = batch_y[:,1].float()
        loss = loss_fn(outputs, y_)
        weight = batch_y[:,2].float()
        loss = (loss * weight).mean()
        # Backward and optimize
        loss.backward()
        optimizer.step()
        total_loss += loss.data.item()
    for p in model.parameters():
        print(torch.sum(p.grad))
    print('total loss', total_loss)
    print('lstm weight', torch.sum(model.lstm.weight_hh_l0.data), 'dense_weight', torch.sum(model.dense.weight.data))
    train_loss.append(total_loss)
    with torch.no_grad():
        n_correct = 0
        n_samples = 0
        for test, labels in test_loader:
            #labels = labels.to(device)
            outputs = model(test, labels)
            # max returns (value ,index)
            predicted = torch.round(outputs)
            n_samples += labels.size(0)
            n_correct += (predicted == labels[:,1]).sum().item()
        acc = 100.0 * n_correct / n_samples
        print(f'Accuracy of the network on the 10000 test images: {acc} %')
        test_loss.append(acc)
******************************************
epoch = 1
tensor(6518.8760)
tensor(-236.9392)
tensor(149.6967)
tensor(149.6966)
tensor(-1551.1709)
tensor(5021.9199)
total loss 1871.447255373001
lstm weight tensor(3.0054) dense_weight tensor(-0.5395)
Accuracy of the network on the 10000 test images: 96.92037099752012 %
epoch = 2
tensor(7822.5503)
tensor(-284.3271)
tensor(179.6338)
tensor(179.6338)
tensor(-1861.4037)
tensor(6026.2920)
total loss 1871.465574145317
lstm weight tensor(3.0054) dense_weight tensor(-0.5395)
Accuracy of the network on the 10000 test images: 96.92037099752012 %
***********************************************
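Another question: when the data is loaded through a data loader, how can I use the weights in BCELoss(weight)? Is the only remaining option to instantiate a loss_fn in every loop iteration?
Any help is greatly appreciated!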
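On the weighting question, a new loss module per iteration should not be needed. The sketch below, using made-up probabilities, labels and weights, compares the approach already used in the training loop (`reduction='none'` followed by a manual multiply) with the functional form `torch.nn.functional.binary_cross_entropy`, which accepts a per-call `weight` tensor. Both are expected to give the same value.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical mini-batch: sigmoid outputs, binary labels, per-sample weights.
probs = torch.sigmoid(torch.randn(8))
targets = torch.tensor([0., 1., 0., 0., 1., 0., 0., 0.])
weights = torch.tensor([1., 30., 1., 1., 30., 1., 1., 1.])

# Variant 1: unreduced loss from a module created once, weighted by hand.
loss_fn = torch.nn.BCELoss(reduction='none')
manual = (loss_fn(probs, targets) * weights).mean()

# Variant 2: functional call that takes the batch's weights directly.
functional = F.binary_cross_entropy(probs, targets, weight=weights)

print(manual.item(), functional.item())
```

One thing to check in the posted loader: `torch.from_numpy(y).long()` casts all three columns of y to integers, so any fractional weights in the third column would be truncated before they reach the loss. Whether the asker's weights are fractional is not stated in the post.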
Solution
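One likely cause that is visible in the posted script: `optim.SGD(model.parameters(), lr=.01)` is called before `model = LSTMClassifier(87, 5)`. If a `model` object already existed at that point (for example from an earlier notebook cell, which is an assumption), the optimizer keeps updating the old object's parameters while the loop trains the new one. The minimal sketch below reproduces that effect with a small `nn.Linear` stand-in and random data; the helper `one_step` is hypothetical and not part of the original code.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


def one_step(model, optimizer):
    """Run one SGD step on random data and return how far the weights moved."""
    before = model.weight.detach().clone()
    x, y = torch.randn(16, 3), torch.randn(16, 1)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return (model.weight.detach() - before).abs().sum().item()


# Order as in the question: the optimizer is built first, from an older model.
stale_model = nn.Linear(3, 1)  # stands in for a leftover `model`
optimizer = optim.SGD(stale_model.parameters(), lr=0.01)
model = nn.Linear(3, 1)        # new model the optimizer knows nothing about
moved_wrong_order = one_step(model, optimizer)

# Corrected order: create the model, then hand its parameters to the optimizer.
model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
moved_right_order = one_step(model, optimizer)

print(moved_wrong_order, moved_right_order)
```

In the first case gradients are computed for the new model but `optimizer.step()` never touches it, which matches the symptom of non-zero gradient sums next to unchanged weight sums. The gradient sums in the log also grow by the same factor (about 1.2) for every parameter between the two epochs, which would be consistent with gradients accumulating on a model that never changes, because `optimizer.zero_grad()` then clears the wrong parameters. The post itself does not confirm this diagnosis.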