deep-learning - Getting nan loss in PyTorch with the Adam optimizer
Problem description
I am new to training neural networks, so forgive me if this is a very silly question or breaks any unstated Stack Overflow rules. I recently started working on the Titanic dataset. I cleaned the data and built a feature tensor by concatenating the normalized continuous features with one-hot tensors of the categorical features. I pass this data into a simple linear model, and I get a nan loss in every epoch.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from tqdm import tqdm
import pickle
import pathlib

path = pathlib.Path('./drive/My Drive/Kaggle/Titanic')
with open(path/'feature_tensor.pickle', 'rb') as f:
    features = pickle.load(f)
with open(path/'label_tensor.pickle', 'rb') as f:
    labels = pickle.load(f)
features = features.float()
labels = labels.float()

import math
valid_size = -1 * math.floor(0.2*len(features))
train_features = features[:valid_size]
valid_features = features[valid_size:]
train_labels = labels[:valid_size]
valid_labels = labels[valid_size:]

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.h_l1 = nn.Linear(18, 64)
        self.h_l2 = nn.Linear(64, 32)
        self.o_l = nn.Linear(32, 2)

    def forward(self, x):
        x = F.relu(self.h_l1(x))
        x = F.relu(self.h_l2(x))
        return self.o_l(x)

model = Model()
model.to('cuda')
optimizer = optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

EPOCHS = 5
BATCH_SIZE = 20

for EPOCH in range(0, EPOCHS):
    for i in tqdm(range(0, len(features), BATCH_SIZE)):
        train_feature_batch = train_features[i:i+BATCH_SIZE,:].to('cuda')
        train_label_batch = train_labels[i:i+BATCH_SIZE,:].to('cuda')
        valid_feature_batch = valid_features[i:i+BATCH_SIZE,:].to('cuda')
        valid_label_batch = valid_labels[i:i+BATCH_SIZE,:].to('cuda')
        train_loss = loss_fn(model(train_feature_batch), train_label_batch)
        with torch.no_grad():
            valid_loss = loss_fn(model(valid_feature_batch), valid_label_batch)
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
    print(f"Epoch : {EPOCH}\tTrain Loss : {train_loss}\tValid_loss : {valid_loss}\n")
I get the following output:
100%|██████████| 45/45 [00:00<00:00, 511.50it/s]
100%|██████████| 45/45 [00:00<00:00, 604.10it/s]
100%|██████████| 45/45 [00:00<00:00, 586.21it/s]
0%| | 0/45 [00:00<?, ?it/s]Epoch : 0 Train Loss : nan Valid_loss : nan
Epoch : 1 Train Loss : nan Valid_loss : nan
Epoch : 2 Train Loss : nan Valid_loss : nan
100%|██████████| 45/45 [00:00<00:00, 555.55it/s]
100%|██████████| 45/45 [00:00<00:00, 607.65it/s]Epoch : 3 Train Loss : nan Valid_loss : nan
Epoch : 4 Train Loss : nan Valid_loss : nan
Yes, the output really is interleaved like that. Please help.
Solution
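One likely culprit can be inferred from the question's code (an inference, not a confirmed diagnosis): the batch loop runs over `len(features)` even though it indexes `train_features` and `valid_features`, both of which are shorter than the full dataset. Slicing a tensor past its end silently yields an empty tensor, and `nn.MSELoss` with its default mean reduction averages over zero elements, which is 0/0 = nan; once `backward()` runs on a nan loss, the weights are poisoned and every subsequent loss is nan as well. A minimal sketch of the failure mode:

```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss()

# Slicing past the end of a tensor raises no error -- it silently
# returns an empty tensor with zero rows.
t = torch.randn(5, 2)
empty = t[10:20]
print(empty.shape)            # torch.Size([0, 2])

# MSELoss with the default 'mean' reduction averages over zero
# elements, which produces nan.
print(loss_fn(empty, empty))  # tensor(nan)

# Likely fix: iterate over the split actually being indexed, e.g.
#   for i in range(0, len(train_features), BATCH_SIZE):
#       batch = train_features[i:i+BATCH_SIZE]
# and loop over the validation split separately.
```

Separately, training a 2-unit output with `MSELoss` is unusual for binary classification; `nn.CrossEntropyLoss` on integer class labels (or a 1-unit output with `nn.BCEWithLogitsLoss`) would be the more idiomatic choice, though that alone would not explain the nan.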