Effect of xavier and kaiming_uniform weight initialization in PyTorch/TensorFlow

Problem description

I am using the following model to train a network:

import torch as th
import torch.nn as nn


class FemnistNet(nn.Module):
    def __init__(self):
        super(FemnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2)  # output shape (batch, 32, 28, 28)
        self.pool1 = nn.MaxPool2d(2, stride=2)  # output shape (batch, 32, 14, 14)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2)  # output shape (batch, 64, 14, 14)
        self.pool2 = nn.MaxPool2d(2, stride=2)  # output shape (batch, 64, 7, 7)

        self.fc1 = nn.Linear(3136, 2048)  # 3136 = 64 * 7 * 7
        self.fc2 = nn.Linear(2048, 62)

    def forward(self, x):
        x = x.view(-1, 1, 28, 28)
        x = self.conv1(x)
        x = th.nn.functional.relu(x)

        x = self.pool1(x)

        x = self.conv2(x)
        x = th.nn.functional.relu(x)

        x = self.pool2(x)

        x = x.flatten(start_dim=1)

        x = self.fc1(x)
        l1_activations = th.nn.functional.relu(x)

        x = self.fc2(l1_activations)

        # Tensor.softmax requires an explicit dim; dim=1 is the class dimension
        x = x.softmax(dim=1)

        return x, l1_activations

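The default weight initialization is kaiming_uniform, and the model trains well with it. When I instead initialize the weights with Xavier, using th.nn.init.xavier_uniform_(self.fc1.weight), the model parameters become nan in the dense/linear layers. What is the effect of the initialization distribution? Why do the weights become nan?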

Different initialization distributions work in TensorFlow. I have not experienced NaNs in TensorFlow.
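To make the difference between the two distributions concrete, here is a minimal sketch that computes the uniform sampling bounds each scheme would use for the fc1 and fc2 shapes above. It relies only on the standard formulas and does not call PyTorch. The helper names kaiming_uniform_bound and xavier_uniform_bound are hypothetical, and the a=sqrt(5) default is an assumption based on how recent PyTorch versions initialize nn.Linear weights. For fc1 this gives roughly 0.018 for the default versus 0.034 for Xavier, so the Xavier weights start out almost twice as large. That alone does not show where the NaNs come from, but it is a difference worth keeping in mind when choosing the learning rate.

```python
import math


def kaiming_uniform_bound(fan_in, a=math.sqrt(5)):
    """Bound b of U(-b, b) for Kaiming uniform init with leaky-ReLU slope a.

    Assumption: a=sqrt(5) mirrors the nn.Linear default in recent PyTorch
    versions, which simplifies to b = 1 / sqrt(fan_in).
    """
    gain = math.sqrt(2.0 / (1.0 + a * a))
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Bound b of U(-b, b) for Xavier (Glorot) uniform init."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


# (fan_in, fan_out) of the two linear layers in the question
layers = {"fc1": (3136, 2048), "fc2": (2048, 62)}

for name, (fan_in, fan_out) in layers.items():
    k = kaiming_uniform_bound(fan_in)
    x = xavier_uniform_bound(fan_in, fan_out)
    print(f"{name}: default bound={k:.5f}, xavier bound={x:.5f}, ratio={x / k:.2f}")
```

If the larger initial weights turn out to matter, one thing to try is passing a smaller gain to xavier_uniform_ or lowering the learning rate. Whether that removes the NaNs in this model is untested here.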

Tags: python, tensorflow, pytorch

Solution

