python - How do I feed Huggingface BERT model outputs into a binary classifier CNN?
Problem Description
I'm a bit confused about how to use the output of the Hugging Face transformers library to train a simple binary language classifier that predicts whether Albert Einstein said a given sentence.
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
sentences = ["Hello World", "Hello There", "Bye Bye", "Two things are infinite: the universe and human stupidity; and I'm not sure about the universe."]
for sentence in sentences:  # avoid shadowing the built-in input and the list itself
    inputs = tokenizer(sentence, return_tensors="pt")
    outputs = model(**inputs)
    print(outputs[0].shape, sentence, len(sentence))
Output:
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
torch.Size([1, 4, 768]) Hello World 11
torch.Size([1, 4, 768]) Hello There 11
torch.Size([1, 4, 768]) Bye Bye 7
torch.Size([1, 23, 768]) Two things are infinite: the universe and human stupidity; and I'm not sure about the universe. 95
As you can see, the size of the output varies with the length of the input. Now suppose I want to train a binary classifier that predicts whether Einstein said the input sentence, with the BERT transformer's output as the input to the network.
How do I write a CNN model in PyTorch that consumes a [1, None, 768] tensor? The second dimension seems to change with the length of the input.
Solution
In PyTorch you don't need a fixed input dim for a CNN. The only requirement is that your kernel_size must not be larger than the input size.
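For instance, here is a minimal sketch (illustrative, not part of the original answer) of a Conv1d head that accepts BERT's variable-length (batch, seq_len, 768) output; adaptive pooling collapses whatever length remains, so the FC layer always sees a fixed-size input:

import torch

class ConvHead(torch.nn.Module):
    def __init__(self, hidden_size=768, num_classes=1, kernel_size=3):
        super().__init__()
        # Conv1d convolves over the token dimension; seq_len may vary freely
        # as long as it is >= kernel_size.
        self.conv = torch.nn.Conv1d(hidden_size, 128, kernel_size)
        # Adaptive pooling maps any remaining length to a fixed size.
        self.pool = torch.nn.AdaptiveMaxPool1d(1)
        self.fc = torch.nn.Linear(128, num_classes)

    def forward(self, hidden_states):        # (batch, seq_len, hidden_size)
        x = hidden_states.transpose(1, 2)    # (batch, hidden_size, seq_len)
        x = torch.relu(self.conv(x))         # (batch, 128, seq_len - kernel_size + 1)
        x = self.pool(x).squeeze(-1)         # (batch, 128)
        return self.fc(x)                    # (batch, num_classes)

head = ConvHead()
print(head(torch.randn(1, 4, 768)).shape)   # torch.Size([1, 1])
print(head(torch.randn(1, 23, 768)).shape)  # torch.Size([1, 1])

The same head works for both a 4-token and a 23-token sequence, which is exactly the variable second dimension the question asks about.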
In general, the best way to put a classifier (sequence classifier) on top of a Transformer model is to add a pooling layer plus a fully connected (FC) layer. You can use global pooling (average or max) or adaptive pooling, followed by a fully connected layer.
Note that you can also use AutoModelForSequenceClassification, which does all of this for you (see the sketch after the example below).
# An example with simple average pooling
from transformers import AutoTokenizer, AutoModel
import torch

NUM_CLASSES = 1
MAX_LEN = 30

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
# Attach a linear head; BERT's hidden size (768) is read off the pooler.
model.classifier = torch.nn.Linear(model.pooler.dense.in_features, NUM_CLASSES)

inputs_str = ["Hello World", "Hello There", "Bye Bye", "Two things are infinite: the universe and human stupidity; and I'm not sure about the universe."]
inputs = tokenizer(inputs_str, padding="max_length", return_tensors="pt", max_length=MAX_LEN)

# Recent transformers versions return a ModelOutput rather than a tuple,
# so take the hidden states explicitly instead of tuple-unpacking.
outputs = model(**inputs).last_hidden_state  # (4, MAX_LEN, 768)
outputs = torch.mean(outputs, dim=1)         # average over the token dimension -> (4, 768)
outputs = model.classifier(outputs)
print(outputs.shape)  # => torch.Size([4, 1])
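As noted above, AutoModelForSequenceClassification wires the pooling and classification head up for you. A minimal sketch, assuming num_labels=2 and made-up labels for the binary "said by Einstein or not" task:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer(["Hello World", "Bye Bye"], padding=True, return_tensors="pt")
labels = torch.tensor([0, 1])  # hypothetical labels: 1 = Einstein, 0 = not

# Passing labels makes the model compute a cross-entropy loss for you.
outputs = model(**inputs, labels=labels)
print(outputs.loss)          # scalar loss, ready for backprop
print(outputs.logits.shape)  # => torch.Size([2, 2])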