How to use Sentence-BERT with transformers and torch

Problem Description

I want to use sentence_transformers, but due to a policy restriction I cannot install the sentence-transformers package.

I do have the transformers and torch packages.

I went to this page and tried to run the following code.

Before that, I went to the page and downloaded all of the files:

import os
path="/yz/sentence-transformers/multi-qa-mpnet-base-dot-v1/" #local path where I have stored files
os.listdir(path)

['.dominokeep',
 'config.json',
 'data_config.json',
 'modules.json',
 'sentence_bert_config.json',
 'special_tokens_map.json',
 'tokenizer_config.json',
 'train_script.py',
 'vocab.txt',
 'tokenizer.json',
 'config_sentence_transformers.json',
 'README.md',
 'gitattributes',
 '9e1e76b7a067f72e49c7f571cd8e811f7a1567bec49f17e5eaaea899e7bc2c9e']

The code I ran is:

from transformers import AutoTokenizer, AutoModel
import torch

# Load model from HuggingFace Hub

path="/yz/sentence-transformers/multi-qa-mpnet-base-dot-v1/"

"""tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")
model = AutoModel.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")"""

tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModel.from_pretrained(path)

The error I get is as follows:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-18-bb33f7c519e0> in <module>
     32 model = AutoModel.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")"""
     33 
---> 34 tokenizer = AutoTokenizer.from_pretrained(path)
     35 model = AutoModel.from_pretrained(path)
     36 

/usr/local/anaconda/lib/python3.6/site-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    308         config = kwargs.pop("config", None)
    309         if not isinstance(config, PretrainedConfig):
--> 310             config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
    311 
    312         if "bert-base-japanese" in str(pretrained_model_name_or_path):

/usr/local/anaconda/lib/python3.6/site-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    342 
    343         if "model_type" in config_dict:
--> 344             config_class = CONFIG_MAPPING[config_dict["model_type"]]
    345             return config_class.from_dict(config_dict, **kwargs)
    346         else:

KeyError: 'mpnet'

My questions:

  1. How should I fix this error?
  2. Is there a way to use the same approach with MiniLM-L6-H384-uncased? I want to use it because it seems faster.

=============================== Package versions are as follows -

transformers - 4.0.0
torch - 1.4.0

Tags: nlp, huggingface-transformers, transformer, sentence-similarity, sentence-transformers

Solution


The answer is simple: you cannot use the 'MiniLM-L6-H384-uncased' model with pytorch 1.4.0. Its checkpoint was written in a newer serialization format than torch 1.4.0 can read:

print(torch.__version__)
# 1.4.0

torch.load("/content/MiniLM-L6-H384-uncased/pytorch_model.bin", location="cpu")

"""RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED 
at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to 
PyTorch. Attempted to read a PyTorch file with version 3, but the maximum 
supported version for reading is 2. Your PyTorch installation may be too old. 
(init at /pytorch/caffe2/serialize/inline_container.cc:132)"""
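
If you cannot upgrade torch in the restricted environment, one possible workaround is to re-save the checkpoint in the legacy serialization format from a machine that does have a newer torch (>= 1.6). This is a minimal sketch under that assumption; the path is the one from the error above, and whether the re-saved file then loads cleanly under torch 1.4.0 still needs to be verified:

# Run this on a machine with torch >= 1.6 (an assumption: the restricted
# environment itself stays on 1.4.0). torch 1.6 switched to a zip-based
# checkpoint format; _use_new_zipfile_serialization=False writes the old
# pickle-based format that older torch versions can still read.
import torch

src = "/content/MiniLM-L6-H384-uncased/pytorch_model.bin"

# Load the new-format checkpoint onto the CPU.
state_dict = torch.load(src, map_location="cpu")

# Re-save it in the legacy format, overwriting the original file.
torch.save(state_dict, src, _use_new_zipfile_serialization=False)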
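
As for the KeyError: 'mpnet' in the question itself, the traceback shows that CONFIG_MAPPING in transformers 4.0.0 simply has no entry for the mpnet model type, so upgrading transformers (alongside torch) should let AutoConfig resolve the local config.json. Once the versions cooperate, a minimal sketch of getting sentence embeddings from the local files with plain transformers could look like the following; CLS pooling is an assumption here (the pooling the checkpoint actually uses is recorded in the downloaded modules.json):

# A minimal sketch, assuming a transformers version that registers the
# "mpnet" model type and a torch new enough to read the checkpoint.
from transformers import AutoTokenizer, AutoModel
import torch

path = "/yz/sentence-transformers/multi-qa-mpnet-base-dot-v1/"

tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModel.from_pretrained(path)

sentences = ["How do I use sentence-BERT without sentence-transformers?"]

encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    output = model(**encoded)

# CLS pooling (an assumption): take the hidden state of the first token
# of each sequence as the sentence embedding.
embeddings = output.last_hidden_state[:, 0]
print(embeddings.shape)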
