Overriding the stem in a video classification model to change the filter channels

Problem description

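I am trying to use torchvision's video classification models (R3D, R(2+1)D, MC18), but my data is single-channel (grayscale video) and these models expect 3-channel input. I am therefore trying to override the stem class. Can someone confirm whether what I did is correct?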

For R3D18 and MC18, stem=BasicStem:

import torch.nn as nn
import torchvision

class BasicStemModified(nn.Sequential):
    def __init__(self):
        super(BasicStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(7, 7, 1),  #changing filter to 1 channel input
                      stride=(2, 2, 1), padding=(3, 3, 0),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),

            nn.Conv3d(45, 64, kernel_size=(1, 1, 3),
                      stride=(1, 1, 1), padding=(0, 0, 1),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))


model = torchvision.models.video.mc3_18(pretrained=False)

model.stem = BasicStemModified() #here assigning the modified stem


model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)
)


model.to('cuda:0')

For R(2+1)D:

#For R(2+1)D model `stem=R2Plus1dStem`

class R2Plus1dStemModified(nn.Sequential):
    """R(2+1)D stem is different than the default one as it uses separated 3D convolution
    """
    def __init__(self):
        super(R2Plus1dStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(1, 7, 7),   # changing filter to 1 channel input
                      stride=(1, 2, 2), padding=(0, 3, 3),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),
            nn.Conv3d(45, 64, kernel_size=(3, 1, 1),
                      stride=(1, 1, 1), padding=(1, 0, 0),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))

model = torchvision.models.video.r2plus1d_18(pretrained=False)

model.stem = R2Plus1dStemModified() #here assigning the modified stem

model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)
)


model.to('cuda:0')

Tags: deep-learning, pytorch, resnet

Solution


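When switching from RGB to grayscale, the simplest approach is to change the data rather than the model:
If your input frames have only one (gray) channel, you can simply expand the singleton channel dimension to span three channels. This is trivial and lets you use the pretrained models as-is.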
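
As a minimal sketch of the data-side approach, assuming clips are stored as tensors of shape (batch, channels, frames, height, width), which is the layout `nn.Conv3d` expects, the channel expansion could look like this. The tensor sizes are made up for illustration.

```python
import torch

# Hypothetical batch of grayscale clips: (batch, channels=1, frames, height, width)
gray = torch.rand(2, 1, 8, 112, 112)

# expand() broadcasts the singleton channel dimension to 3 channels.
# It returns a view, so no pixel data is copied.
rgb_like = gray.expand(-1, 3, -1, -1, -1)

print(rgb_like.shape)
```

Because the result is a view onto the same memory, call `.contiguous()` or `.clone()` first if a later step needs to write into the expanded tensor in place.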


If you insist on modifying the model, you can do so while keeping most of the pretrained weights:

import torch
import torchvision

model = torchvision.models.video.mc3_18(pretrained=True)  # get the pretrained model
# modify only the first conv layer
origc = model.stem[0]  # the original 3-channel conv layer
# build a new layer with the same hyperparameters but only one input channel
c1 = torch.nn.Conv3d(1, origc.out_channels, kernel_size=origc.kernel_size, stride=origc.stride, padding=origc.padding, bias=origc.bias is not None)

# this is the nice part - init the new weights using the original ones
with torch.no_grad():
    c1.weight.copy_(origc.weight.sum(dim=1, keepdim=True))

model.stem[0] = c1  # swap the new layer into the stem
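
Summing the weights over the input-channel dimension works because convolution is linear in its input channels: if all three channels carry the same gray frame, the 3-channel response equals the response of a 1-channel filter whose weights are the per-channel sum. The sketch below checks this numerically. It uses a randomly initialised `Conv3d` as a stand-in for the pretrained stem layer (an assumption made so the check runs without downloading weights), and the input size is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained 3-channel stem conv (random weights, not real pretrained ones)
conv3 = nn.Conv3d(3, 64, kernel_size=(3, 7, 7), stride=(1, 2, 2),
                  padding=(1, 3, 3), bias=False)

# Single-channel layer with the same hyperparameters
conv1 = nn.Conv3d(1, conv3.out_channels, kernel_size=conv3.kernel_size,
                  stride=conv3.stride, padding=conv3.padding,
                  bias=conv3.bias is not None)

# Initialise it with the channel-summed weights
with torch.no_grad():
    conv1.weight.copy_(conv3.weight.sum(dim=1, keepdim=True))

# Hypothetical grayscale clip: (batch, 1, frames, height, width)
gray = torch.rand(1, 1, 4, 32, 32)

with torch.no_grad():
    out_gray = conv1(gray)                            # 1-channel path
    out_rgb = conv3(gray.expand(-1, 3, -1, -1, -1))   # replicated 3-channel path

# The two outputs should agree up to floating-point rounding
print(torch.allclose(out_gray, out_rgb, atol=1e-4))
```

The equivalence holds only for the first layer and only when the 3-channel input would have been the gray frame replicated. Any per-channel normalisation applied to RGB inputs during pretraining would need to be accounted for separately.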
