CNN on top of BERT - Convolution and Maxpooling

Problem description

I am trying to fine-tune a pre-trained BERT model (Hugging Face Transformers) by inserting a CNN layer. In this model, the outputs of all transformer encoders are used, not just the output of the last one. The output matrices of the transformer encoders are stacked together, producing a larger matrix:

The convolution is performed with a window of size (3, hidden size of BERT, which is 768 in the BERT-base model), and max pooling is applied to the convolution output to produce one maximum value per transformer encoder.

Concatenating these values produces a vector that is fed into a fully connected network, and applying softmax performs the classification.


My problem is that I can't seem to find the right parameters to perform the convolution and max pooling on this matrix.

With batch size = 32, there are 13 transformer layers (in BERT-base, the embedding output plus 12 encoder layers), each of which takes the encoded tokenized text of shape [64, 768] as input and outputs an encoding of the same dimensions. (64 is the maximum tokenized length.)

I want to perform the convolution on each transformer's output matrix ([64, 768]) separately, and then apply max pooling to the output of that convolution. So I should get one maximum value per transformer, and feed those maxima into the neural network.
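To make the intended shapes concrete, here is a minimal, self-contained shape check with dummy tensors (the grouped convolution with groups=13 is just one way to apply a separate (3, 768) window to each of the 13 layer outputs independently):

import torch
import torch.nn.functional as F

batch, layers, seq_len, hidden = 32, 13, 64, 768

# stand-in for the 13 stacked transformer outputs
x = torch.randn(batch, layers, seq_len, hidden)

# one (3, 768) kernel per layer; groups=layers keeps the layers independent
weight = torch.randn(layers, 1, 3, hidden)
y = F.conv2d(x, weight, groups=layers)  # -> [32, 13, 62, 1]

# one maximum per layer
y = y.amax(dim=(2, 3))                  # -> [32, 13]
print(y.shape)

After pooling, each example in the batch is reduced to a 13-dimensional vector, one value per transformer output.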

My code:

class BERT_Arch(nn.Module):

    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.conv = nn.Conv2d(in_channels=13, out_channels=13, kernel_size= (3, 768), padding=True) 
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool1d(kernel_size=768, stride=1)
        self.dropout = nn.Dropout(0.1)
        self.fc = nn.Linear(9118464, 3)
        self.flat = nn.Flatten()
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):
        _, _, all_layers = self.bert(sent_id, attention_mask=mask, output_hidden_states=True)
        # all_layers: tuple of 13 hidden states, each of shape [32, 64, 768]
        x = torch.cat(all_layers, 0) # x= [416, 64, 768]
        x = self.conv(x)
        x = self.relu(x)
        x = self.pool(x)
        x = self.flat(x)
        x = self.fc(x)
        return self.softmax(x)

I keep getting error messages saying that the convolution expects an input of a certain dimension but got a different one.

<generator object BERT_Arch.forward.<locals>.<genexpr> at 0x7fbeffc2d200>
torch.Size([416, 64, 768])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-3a2c2cd7c02d> in <module>()
    362 
    363         # train model
--> 364         train_loss, _ = train()
    365 
    366         # evaluate model

5 frames
<ipython-input-12-3a2c2cd7c02d> in train()
    148 
    149         # get model predictions for the current batch
--> 150         preds = model(sent_id, mask)
    151 
    152         # compute the loss between actual and predicted values

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

<ipython-input-12-3a2c2cd7c02d> in forward(self, sent_id, mask)
     42         x = torch.cat(all_layers, 0) # torch.Size([13, 32, 64, 768])
     43         print(x.shape)
---> 44         x = self.conv(x)
     45         x = self.relu(x)
     46         x = self.pool(x)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in forward(self, input)
    421 
    422     def forward(self, input: Tensor) -> Tensor:
--> 423         return self._conv_forward(input, self.weight)
    424 
    425 class Conv3d(_ConvNd):

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight)
    418                             _pair(0), self.dilation, self.groups)
    419         return F.conv2d(input, weight, self.bias, self.stride,
--> 420                         self.padding, self.dilation, self.groups)
    421 
    422     def forward(self, input: Tensor) -> Tensor:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [13, 13, 3, 768], but got 3-dimensional input of size [416, 64, 768] instead

I have tried different values for the convolution parameters and I still get similar errors. Sometimes the error says the max pooling output size is too small:

Given input size: (64x62x1). Calculated output size: (64x31x0). Output size is too small

Sometimes this error appears (after changing the parameters of the CNN layer):

RuntimeError: Given groups=1, weight of size [32, 32, 3, 3], expected input[13, 4, 64, 768] to have 32 channels, but got 4 channels instead

Or:

Expected input batch_size (X) to match target batch_size (Y)

How can I achieve this? Any help on how to implement this CNN layer correctly would be greatly appreciated.

Tags: deep-learning, neural-network, pytorch, conv-neural-network, huggingface-transformers

Solution


I suggest you read the documentation carefully: PyTorch Conv2d

Basically, it assumes the input is a 4D tensor with shape (batch size, number of input channels, height, width). Given appropriate padding, it outputs another 4D tensor with shape (batch size, number of output channels, height, width). Your in_channels argument is the number of input channels and out_channels is the number of output channels. You will have out_channels distinct kernels, each of shape (in_channels, kernel_size[0], kernel_size[1]). Thus the weight tensor is 4D with shape (out_channels, in_channels, kernel_size[0], kernel_size[1]) - which is exactly the [13, 13, 3, 768] weight in your traceback, while your torch.cat produced a 3D input.
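Applied to the code in the question, that means keeping the batch dimension and stacking the 13 hidden states along the channel dimension with torch.stack, so the convolution receives a [32, 13, 64, 768] tensor; torch.cat(all_layers, 0) instead merges them into the batch dimension, which produces the 3-dimensional [416, 64, 768] input from the traceback. Below is a minimal sketch of a corrected module under those assumptions (the groups=13 convolution and the AdaptiveMaxPool2d reduction are one possible reading of "one maximum per transformer encoder", not the only one):

import torch
import torch.nn as nn
from transformers import BertModel

class BERT_Arch(nn.Module):

    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        # groups=13: convolve each of the 13 hidden states independently
        # with its own (3, 768) window -> output [batch, 13, 62, 1]
        self.conv = nn.Conv2d(in_channels=13, out_channels=13,
                              kernel_size=(3, 768), groups=13)
        self.relu = nn.ReLU()
        # global max over the remaining spatial dims: one value per layer
        self.pool = nn.AdaptiveMaxPool2d(1)
        self.dropout = nn.Dropout(0.1)
        self.fc = nn.Linear(13, 3)  # 13 max values -> 3 classes
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):
        out = self.bert(sent_id, attention_mask=mask,
                        output_hidden_states=True)
        # out.hidden_states: tuple of 13 tensors, each [batch, 64, 768]
        # (attribute access assumes a transformers version that returns a
        # model-output object; older versions return a plain tuple instead)
        x = torch.stack(out.hidden_states, dim=1)  # [batch, 13, 64, 768]
        x = self.relu(self.conv(x))                # [batch, 13, 62, 1]
        x = self.pool(x).flatten(1)                # [batch, 13]
        x = self.fc(self.dropout(x))               # [batch, 3]
        return self.softmax(x)

With per-layer pooling, the classifier input is a fixed 13-dimensional vector, so the oversized nn.Linear(9118464, 3) is no longer needed.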

