machine-learning - Why does the same PyTorch code (different implementation) give different loss?
Question
I was tackling the Fashion-MNIST dataset problem on Udacity. However, my implementation gives a drastically different loss compared to the solution shared by the Udacity team. I believe the only difference is the definition of the neural network; apart from that, everything is the same. I cannot figure out the reason for such a drastic difference in loss.
Code 1: My solution:
import torch.nn as nn
from torch import optim

images, labels = next(iter(trainloader))

model = nn.Sequential(nn.Linear(784, 256),
                      nn.ReLU(),
                      nn.Linear(256, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))

optimizer = optim.Adam(model.parameters(), lr=0.003)
criterion = nn.NLLLoss()

for i in range(10):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten images
        images = images.view(images.shape[0], -1)
        output = model.forward(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss}")

# Loss is coming around 4000
Code 2: Official Solution:
from torch import nn, optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        return x

model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

epochs = 5
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss}")

# Loss is coming around 200
Is there any explanation for the vast difference in loss?
Solution
You forgot to zero/clear the gradients in your implementation. That is what you are missing:
optimizer.zero_grad()
In other words, just do:
for i in range(10):
    running_loss = 0
    for images, labels in trainloader:
        images = images.view(images.shape[0], -1)
        output = model.forward(images)
        loss = criterion(output, labels)
        # missed this!
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss}")
And you are good to go!
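To see why this matters: PyTorch accumulates gradients in each parameter's .grad buffer on every backward() call, so without zero_grad() each optimizer step applies the sum of all previous gradients, not just the current batch's. A minimal sketch with a toy linear layer (hypothetical, just to demonstrate the accumulation) shows the effect:

```python
# Demonstrates that PyTorch accumulates gradients across backward() calls
# until they are explicitly cleared.
import torch
import torch.nn as nn

layer = nn.Linear(2, 1)
x = torch.ones(1, 2)

# First backward pass: .grad holds the gradient of one pass
layer(x).sum().backward()
grad_once = layer.weight.grad.clone()

# Second backward pass WITHOUT zeroing: gradients add up
layer(x).sum().backward()
grad_twice = layer.weight.grad.clone()
print(torch.allclose(grad_twice, 2 * grad_once))  # True: gradient doubled

# Clearing the buffer resets the accumulation
layer.weight.grad.zero_()
layer(x).sum().backward()
print(torch.allclose(layer.weight.grad, grad_once))  # True: back to one pass
```

In a training loop, optimizer.zero_grad() does this clearing for every parameter the optimizer manages, which is why it belongs before each backward()/step() pair.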