首页 > 解决方案 > 在 Huggingface BERT 模型之上添加密集层

问题描述

我想在输出原始隐藏状态的裸 BERT 模型转换器之上添加一个密集层,然后微调生成的模型。具体来说,我正在使用这个基本模型。这是模型应该做的:

  1. 对句子进行编码(句子的每个标记有 768 个元素的向量)
  2. 只保留第一个向量(与第一个标记相关)
  3. 在此向量之上添加一个密集层,以获得所需的转换

到目前为止,我已经成功地对句子进行了编码:

from sklearn.neural_network import MLPRegressor

import torch

from transformers import AutoModel, AutoTokenizer

# List of strings
sentences = [...]
# List of numbers
labels = [...]

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")

# 2D array, one line per sentence containing the embedding of the first token
encoded_sentences = torch.stack([model(**tokenizer(s, return_tensors='pt'))[0][0][0]
                                 for s in sentences]).detach().numpy()

regr = MLPRegressor()
regr.fit(encoded_sentences, labels)

通过这种方式,我可以通过输入编码的句子来训练神经网络。然而,这种方法显然没有微调基础 BERT 模型。有谁能够帮我?如何构建一个可以完全微调的模型(可能在 pytorch 中或使用 Huggingface 库)?

标签: pythonpython-3.xneural-networkpytorchhuggingface-transformers

解决方案


有两种方法可以做到这一点:由于您希望为类似于分类的下游任务微调模型,您可以直接使用:

BertForSequenceClassification班级。在 768 的输出维度上执行逻辑回归层的微调。

或者,您可以定义一个自定义模块,该模块基于预训练的权重创建一个 bert 模型并在其之上添加层。

from transformers import BertModel
class CustomBERTModel(nn.Module):
    def __init__(self):
          super(CustomBERTModel, self).__init__()
          self.bert = BertModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
          ### New layers:
          self.linear1 = nn.Linear(768, 256)
          self.linear2 = nn.Linear(256, 3) ## 3 is the number of classes in this example

    def forward(self, ids, mask):
          sequence_output, pooled_output = self.bert(
               ids, 
               attention_mask=mask)

          # sequence_output has the following shape: (batch_size, sequence_length, 768)
          linear1_output = self.linear1(sequence_output[:,0,:].view(-1,768)) ## extract the 1st token's embeddings

          linear2_output = self.linear2(linear2_output)

          return linear2_output

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = CustomBERTModel() # You can pass the parameters if required to have more flexible model
model.to(torch.device("cpu")) ## can be gpu
criterion = nn.CrossEntropyLoss() ## If required define your own criterion
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))

for epoch in epochs:
    for batch in data_loader: ## If you have a DataLoader()  object to get the data.

        data = batch[0]
        targets = batch[1] ## assuming that data loader returns a tuple of data and its targets
        
        optimizer.zero_grad()   
        encoding = tokenizer.batch_encode_plus(data, return_tensors='pt', padding=True, truncation=True,max_length=50, add_special_tokens = True)
        outputs = model(input_ids, attention_mask=attention_mask)
        outputs = F.log_softmax(outputs, dim=1)
        input_ids = encoding['input_ids']
        attention_mask = encoding['attention_mask']
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        


推荐阅读