Sigmoid function in PyTorch breaks gradient computation

Problem description

Hey, I've been struggling with this strange problem. Here is my neural network code:

import torch
import torch.nn as nn

# batch_size and input_sizes are assumed to be defined globally elsewhere.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Three 3D convolutions (1 -> 1 channels, kernel 9, stride 1, padding 4),
        # each followed by a LeakyReLU.
        self.conv_3d_ = nn.Sequential(
            nn.Conv3d(1, 1, 9, 1, 4),
            nn.LeakyReLU(),
            nn.Conv3d(1, 1, 9, 1, 4),
            nn.LeakyReLU(),
            nn.Conv3d(1, 1, 9, 1, 4),
            nn.LeakyReLU()
        )

        # Fully connected head; the final Sigmoid is the activation in question.
        self.linear_layers_ = nn.Sequential(
            nn.Linear(batch_size * 32 * 32 * 32, batch_size * 32 * 32 * 3),
            nn.LeakyReLU(),
            nn.Linear(batch_size * 32 * 32 * 3, batch_size * 32 * 32 * 3),
            nn.Sigmoid()
        )

    def forward(self, x, y, z):
        conv_layer = x + y + z
        conv_layer = self.conv_3d_(conv_layer)
        conv_layer = torch.flatten(conv_layer)  # flattens across the batch dimension too
        conv_layer = self.linear_layers_(conv_layer)
        conv_layer = conv_layer.view((batch_size, 3, input_sizes, input_sizes))
        return conv_layer

The strange problem I'm facing is that running this NN gives me the error

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3072]], which is output 0 of SigmoidBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
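
As a side note on what the message means: the "version" it refers to is Autograd's per-tensor version counter, which every in-place operation increments. When backward() finds that a saved tensor's counter has moved past the value recorded at save time, it raises exactly this error. The counter can be inspected through the internal, undocumented ._version attribute; a tiny sketch:

import torch

x = torch.randn(3, requires_grad=True)
out = torch.sigmoid(x)
print(out._version)  # 0 -- the value recorded when SigmoidBackward saves `out`
out += 1             # any in-place op bumps the counter
print(out._version)  # 1 -- "is at version 1; expected version 0"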

The stack trace shows that the problem lies in the line

conv_layer = self.linear_layers_(conv_layer)

However, if I replace the last activation function of the FCN from nn.Sigmoid() to nn.LeakyReLU(), the NN executes correctly.
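
The same asymmetry is reproducible outside the network. Below is a minimal sketch with toy tensors (my own reproduction, not the original training code): an in-place write on the activation's output is tolerated after leaky_relu but fails after sigmoid:

import torch
import torch.nn.functional as F

# In-place write on a LeakyReLU output: backward still succeeds.
x = torch.randn(4, requires_grad=True)
out = F.leaky_relu(x)
out += 1
out.sum().backward()  # runs fine

# The identical pattern on a Sigmoid output: backward raises the error above.
x = torch.randn(4, requires_grad=True)
out = torch.sigmoid(x)
out += 1
out.sum().backward()  # RuntimeError: ... modified by an inplace operation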

Can anyone tell me why the Sigmoid activation function causes my backward computation to break?

Tags: python, deep-learning, pytorch, conv-neural-network

Solution

I found the problem with my code. I dug deeper into what in-place actually means. So, if you check the line

conv_layer = self.linear_layers_(conv_layer)

the assignment of linear_layers_ was changing the values of conv_layer in place, so those values were getting overwritten, and as a result the gradient computation failed. The simple solution to this problem is to use the clone() function,

i.e.

conv_layer = self.linear_layers_(conv_layer).clone()

This creates a copy of the right-hand computation, and Autograd is able to keep its reference into the computation graph intact.
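
Applied to a toy version of the failing pattern (a sketch, not the original training code), clone() gives the in-place write its own storage, while the tensor saved for SigmoidBackward stays untouched:

import torch

x = torch.randn(4, requires_grad=True)
out = torch.sigmoid(x).clone()  # SigmoidBackward keeps the unmodified original
out += 1                        # the in-place write only touches the copy
out.sum().backward()            # succeeds

This also suggests why swapping nn.Sigmoid() for nn.LeakyReLU() made the error disappear (my reading, not stated in the original answer): sigmoid's backward pass is computed from its saved output, grad_out * out * (1 - out), so a later in-place write on that output invalidates the saved tensor, whereas LeakyReLU's backward only needs its saved input, which an in-place write on the output never touches.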

