The difference between load_state_dict and nn.Parameter when loading model parameters in PyTorch

Problem description

When I want to assign part of a pre-trained model's parameters to another module defined in a new PyTorch model, I get two different outputs from two different methods.

The network is defined as follows:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # pre-trained ResNet18 backbone with the final fc layer removed
        self.resnet = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
        self.resnet = nn.Sequential(*list(self.resnet.children())[:-1])
        self.freeze_model(self.resnet)
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 3),
        )

    def freeze_model(self, model):
        # assumed implementation: the original post references freeze_model
        # without defining it
        for param in model.parameters():
            param.requires_grad = False

    def forward(self, x):
        out = self.resnet(x)
        out = out.flatten(start_dim=1)
        out = self.classifier(out)
        return out

What I want is to assign pre-trained parameters to the classifier in the new module. I used two different ways for this task.

# First way
net.load_state_dict(torch.load('model_CNN_pretrained.ptl'))

# Second way
params = torch.load('model_CNN_pretrained.ptl')
net.classifier[1].weight = nn.Parameter(params['classifier.1.weight'], requires_grad=False)
net.classifier[1].bias = nn.Parameter(params['classifier.1.bias'], requires_grad=False)
net.classifier[3].weight = nn.Parameter(params['classifier.3.weight'], requires_grad=False)
net.classifier[3].bias = nn.Parameter(params['classifier.3.bias'], requires_grad=False)

The parameters are assigned correctly in both cases, yet the two methods produce different outputs for the same input data. The first method works correctly, but the second does not. Could someone point out the difference between these two methods?

Tags: python, pytorch, parameter-passing

Solution


Finally, I found out where the problem was.

During pre-training, the buffer values in the BatchNorm2d layers of the ResNet18 model change even though we set requires_grad of the parameters to False. The buffers (the running mean and variance) are computed from the input data while the model is in train() mode, and they stay fixed after calling model.eval().
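
A minimal sketch (my addition, not from the original post) demonstrating this: the running statistics of a BatchNorm2d layer are buffers rather than parameters, so they still move under train() when every parameter has requires_grad=False, and they are stored in the state_dict, which is why load_state_dict restores them while assigning nn.Parameter to the classifier does not.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
for p in bn.parameters():
    p.requires_grad = False  # "freeze" the affine weight and bias

before = bn.running_mean.clone()
bn.train()                     # buffers ARE updated in train() mode
bn(torch.randn(8, 3, 4, 4))    # one forward pass
print(torch.allclose(before, bn.running_mean))  # False: running_mean moved

bn.eval()                      # in eval() mode buffers stay fixed
before = bn.running_mean.clone()
bn(torch.randn(8, 3, 4, 4))
print(torch.allclose(before, bn.running_mean))  # True: unchanged

# The buffers live in the state_dict, so load_state_dict() restores them;
# assigning nn.Parameter by hand only touches the parameters you name.
print(list(bn.state_dict().keys()))
# ['weight', 'bias', 'running_mean', 'running_var', 'num_batches_tracked']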

Here is a link about how to freeze BN layers:

How to freeze BN layers while training the rest of the network (mean and var won't freeze)
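
For reference, a minimal sketch (assuming the approach discussed in the linked thread) of keeping the BN buffers frozen: put every BatchNorm2d submodule back into eval() mode after calling model.train(), so the running statistics are no longer updated while the rest of the network trains. The freeze_bn helper name is mine.

import torch.nn as nn

def freeze_bn(model):
    # put all BatchNorm2d layers in eval() mode so their running
    # statistics (buffers) are not updated by forward passes
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()

# in the training loop:
# net.train()            # enables Dropout in the classifier
# freeze_bn(net.resnet)  # keeps the backbone's BN buffers fixed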

