AssertionError: Padding_idx must be within num_embeddings

Problem description

Some old code of mine had been running fine for the past two months, but today it suddenly started throwing this error. I don't know what changed, since I haven't touched the code, and I don't understand the new error:

INFO:pytorch_transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-clm-ende-1024-vocab.json from cache at /root/.cache/torch/pytorch_transformers/6e42a59f5e60f1efc6116fd1a2c05a72ecf713a3022b9c274b727ed6469e6ac1.2c29a4b393decdd458e6a9744fa1d6b533212e4003a4012731d2bc2261dc35f3
INFO:pytorch_transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-mlm-ende-1024-merges.txt from cache at /root/.cache/torch/pytorch_transformers/85d878ffb1bc2c3395b785d10ce7fc91452780316140d7a26201d7a912483e44.42fa32826c068642fdcf24adbf3ef8158b3b81e210a3d03f3102cf5a899f92a0
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-31-c365f437b895> in <module>()
      9 tokenizer = tokenizer_class.from_pretrained(args['model_name'])
     10 
---> 11 model = model_class.from_pretrained(args['model_name'])
     12 model.to(device);
     13 

3 frames
/usr/local/lib/python3.6/dist-packages/pytorch_transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    534 
    535         # Instantiate model.
--> 536         model = cls(config, *model_args, **model_kwargs)
    537 
    538         if state_dict is None and not from_tf:

/usr/local/lib/python3.6/dist-packages/pytorch_transformers/modeling_xlm.py in __init__(self, config)
    842         self.num_labels = config.num_labels
    843 
--> 844         self.transformer = XLMModel(config)
    845         self.sequence_summary = SequenceSummary(config)
    846 

/usr/local/lib/python3.6/dist-packages/pytorch_transformers/modeling_xlm.py in __init__(self, config)
    543         if config.n_langs > 1 and config.use_lang_emb:
    544             self.lang_embeddings = nn.Embedding(self.n_langs, self.dim)
--> 545         self.embeddings = nn.Embedding(self.n_words, self.dim, padding_idx=self.pad_index)
    546         self.layer_norm_emb = nn.LayerNorm(self.dim, eps=config.layer_norm_eps)
    547 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/sparse.py in __init__(self, num_embeddings, embedding_dim, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse, _weight)
     86         if padding_idx is not None:
     87             if padding_idx > 0:
---> 88                 assert padding_idx < self.num_embeddings, 'Padding_idx must be within num_embeddings'
     89             elif padding_idx < 0:
     90                 assert padding_idx >= -self.num_embeddings, 'Padding_idx must be within num_embeddings'

AssertionError: Padding_idx must be within num_embeddings
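For context, the assertion comes from the `nn.Embedding` constructor shown at the bottom of the traceback: a positive `padding_idx` must be strictly less than `num_embeddings`, and a negative one must be no smaller than `-num_embeddings`. A minimal sketch of that check, mirrored in plain Python (the numbers below are illustrative, not the actual XLM vocabulary size):

```python
def check_padding_idx(padding_idx, num_embeddings):
    # Mirrors the validation in torch/nn/modules/sparse.py from the traceback.
    if padding_idx is not None:
        if padding_idx > 0:
            assert padding_idx < num_embeddings, 'Padding_idx must be within num_embeddings'
        elif padding_idx < 0:
            assert padding_idx >= -num_embeddings, 'Padding_idx must be within num_embeddings'

check_padding_idx(2, 10)        # ok: pad index falls inside the vocabulary
try:
    check_padding_idx(10, 10)   # index == vocab size, i.e. one past the end
except AssertionError as e:
    print(e)                    # prints the assertion message
```

In the traceback this corresponds to `self.pad_index` ending up outside `self.n_words` for the loaded config.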

Can anyone shed some light on what might be going on?

Thanks a lot!

Tags: python, pytorch

Solution


https://discuss.pytorch.org/t/assertionerror-padding-idx-must-be-within-num-embeddings/78295?u=lenyabloko

It turns out that some classes were moved out of the pytorch_transformers package into transformers. I still had to use pytorch_transformers for the other classes; the fix was simply to replace the package name in the import statement below:

from pytorch_transformers import XLMConfig, XLMTokenizer, ...
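Concretely, the fix might look like this. This is a sketch that assumes the XLM classes and the `xlm-clm-ende-1024` checkpoint appearing in the question's cache log; only the package name in the import changes, everything after it stays the same:

```python
# Old import (remove) -- these classes were moved out of pytorch_transformers:
# from pytorch_transformers import XLMConfig, XLMTokenizer, XLMForSequenceClassification

# New import -- identical class names, now provided by the transformers package:
from transformers import XLMConfig, XLMTokenizer, XLMForSequenceClassification

if __name__ == "__main__":
    # Checkpoint name taken from the cache log in the question; note this
    # downloads the tokenizer files and model weights on first run.
    tokenizer = XLMTokenizer.from_pretrained("xlm-clm-ende-1024")
    model = XLMForSequenceClassification.from_pretrained("xlm-clm-ende-1024")
```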
