Unable to load a model trained in Gensim: pickle-related error

Problem description

When trying to load a word2vec model trained with Gensim on a Windows machine, I get the following error:

AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>

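In the past I have successfully trained many models with Gensim on this system. The only change is that this time I split up the model.build_vocab() and model.train() stages and added per-epoch saving and timing. I also used different iterators for the vocabulary-building and training phases, but over the same dataset with the same tokenization pipeline.

This is how I do the epoch progress tracking and saving: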

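import time

from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile

class EpochProgress(CallbackAny2Vec):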
    '''saves the model after each epoch'''

    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0
        self.start_time = time.time()

    def on_epoch_begin(self, model):
        print("epoch #{} started".format(self.epoch))

    def on_epoch_end(self, model):
        print("epoch #{} completed".format(self.epoch))
        passed = (time.time() - self.start_time)/60/60 # elapsed time since start in HOURS
        print("{} hours have passed".format(str(passed)))
        output_path = get_tmpfile('{}_epoch{}.model'.format(self.path_prefix, self.epoch))
        model.save(output_path)
        print("model saved at: {}".format(output_path))
        self.epoch += 1

epoch_progress = EpochProgress('E:/jade_prism/embeddings/phrase-embed-over-time/mega_WOS_word2vec/w2v_models/in_progress/')

Then I load a baseline model whose vocabulary is already built and set some parameters:

model = gensim.models.Word2Vec.load(baseline_models_directory+chosen_name)
model.window = window
model.size = size
model.workers = workers 
model.callbacks = [epoch_progress]

Then I train like this:

model.train(corpus, total_examples=model.corpus_count, epochs=epochs)

Finally, I save the final product like this:

model.save('E:/w2v_models/trained/{}'.format(new_model_filename))

Training seems to work fine and the models save as expected. Unfortunately, I now cannot load them.

Here is the full debug readout:

AttributeError                            Traceback (most recent call last)
C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py in load(cls, *args, **kwargs)
   1329         try:
-> 1330             model = super(Word2Vec, cls).load(*args, **kwargs)
   1331 

C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py in load(cls, *args, **kwargs)
   1243         """
-> 1244         model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
   1245         if not hasattr(model, 'ns_exponent'):

C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py in load(cls, fname_or_handle, **kwargs)
    602         """
--> 603         return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
    604 

C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in load(cls, fname, mmap)
    425 
--> 426         obj = unpickle(fname)
    427         obj._load_specials(fname, mmap, compress, subname)

C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in unpickle(fname)
   1383         if sys.version_info > (3, 0):
-> 1384             return _pickle.load(f, encoding='latin1')
   1385         else:

AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-4-0206f9f8f3ad> in <module>
      3 
      4 # Load the model based onthe model name
----> 5 model = gensim.models.Word2Vec.load(model_name)

C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py in load(cls, *args, **kwargs)
   1339             logger.info('Model saved using code from earlier Gensim Version. Re-loading old model in a compatible way.')
   1340             from gensim.models.deprecated.word2vec import load_old_word2vec
-> 1341             return load_old_word2vec(*args, **kwargs)
   1342 
   1343 

C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py in load_old_word2vec(*args, **kwargs)
    170 
    171 def load_old_word2vec(*args, **kwargs):
--> 172     old_model = Word2Vec.load(*args, **kwargs)
    173     vector_size = getattr(old_model, 'vector_size', old_model.layer1_size)
    174     params = {

C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py in load(cls, *args, **kwargs)
   1639     @classmethod
   1640     def load(cls, *args, **kwargs):
-> 1641         model = super(Word2Vec, cls).load(*args, **kwargs)
   1642         # update older models
   1643         if hasattr(model, 'table'):

C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py in load(cls, fname, mmap)
     85         compress, subname = SaveLoad._adapt_by_suffix(fname)
     86 
---> 87         obj = unpickle(fname)
     88         obj._load_specials(fname, mmap, compress, subname)
     89         logger.info("loaded %s", fname)

C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py in unpickle(fname)
    377             b'gensim.models.wrappers.fasttext', b'gensim.models.deprecated.fasttext_wrapper')
    378         if sys.version_info > (3, 0):
--> 379             return _pickle.loads(file_bytes, encoding='latin1')
    380         else:
    381             return _pickle.loads(file_bytes)

AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>

Tags: python-3.x, nlp, gensim

Solution


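Python pickling/unpickling can run into problems with blocks of code, or with classes and class instances, that were defined before the save but may not be available at load time. (This applies especially to anonymous or global-scope types that are not imported from an explicit path.) Here the saved model holds a reference to the EpochProgress callback, and the traceback shows the load failing because no EpochProgress is defined in __main__.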
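The failure can be reproduced with the standard library alone. The sketch below is only an illustration under stated assumptions: `training_session` is a hypothetical module standing in for the notebook's `__main__`, and the toy `EpochProgress` is a plain class, not the gensim callback from the question. It shows that a pickle records the class's module and name rather than its code, so loading fails once that name can no longer be found.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the notebook's __main__ module.
session = types.ModuleType("training_session")
sys.modules["training_session"] = session

# Toy callback class (an assumption for illustration, not gensim's class).
CALLBACK_SOURCE = '''
class EpochProgress:
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0
'''

# Define the class inside the stand-in module, as a notebook cell would.
exec(CALLBACK_SOURCE, session.__dict__)

# Pickle an instance: only the module name, class name and attributes are stored.
payload = pickle.dumps(session.EpochProgress("in_progress/"))
stores_names_only = (
    b"training_session" in payload
    and b"EpochProgress" in payload
    and b"def __init__" not in payload
)

# Simulate a fresh session in which the class was never defined.
del session.EpochProgress

try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc

print(load_error)
```

The message printed at the end has the same shape as the error in the question, with the stand-in module name in place of `__main__`.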

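This is a known issue with gensim model saving, and a future version may avoid storing such callback code in the model at all. (Instead, you would have to specify the callbacks each time you call a method that uses them, and they would only remain in effect for that one call.)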
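A toy model can make the per-call design concrete. This is a sketch under assumptions, not gensim's API: `ToyModel` and its `train` signature are hypothetical, and the callbacks are plain callables. The point is only that callbacks passed to a single call never become part of the state that gets saved.

```python
import pickle


class ToyModel:
    """Hypothetical stand-in for a trainable model; not gensim's API."""

    def __init__(self):
        self.epochs_trained = 0

    def train(self, epochs, callbacks=()):
        # Callbacks are used during this call only and never stored on self.
        for _ in range(epochs):
            for callback in callbacks:
                callback(self)
            self.epochs_trained += 1


seen = []
model = ToyModel()
model.train(epochs=3, callbacks=[lambda m: seen.append(m.epochs_trained)])

# The state a pickle-based save would store contains no callback at all,
# so it round-trips even though a lambda itself cannot be pickled.
state = vars(model)
restored_state = pickle.loads(pickle.dumps(state))
```

With this design, a later load has no dependency on the callback's definition, at the cost of passing the callbacks again on every training call.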

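For more details, see gensim project issue #2136, which includes a workaround that seems to have helped others reload their models: make sure the same EpochProgress class is defined or imported wherever you attempt the load.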
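The workaround can be sketched with plain `pickle`, which the traceback shows gensim's `load` relying on. Everything here is an assumption made for illustration: `loading_session` is a hypothetical module standing in for `__main__`, the toy `EpochProgress` is not the gensim callback, and a dict stands in for the saved model.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the notebook's __main__ module.
session = types.ModuleType("loading_session")
sys.modules["loading_session"] = session

# Toy callback class with the same name as the one that was saved.
CALLBACK_SOURCE = '''
class EpochProgress:
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0
'''

# "Training" session: define the callback and save an object that holds it.
exec(CALLBACK_SOURCE, session.__dict__)
saved = pickle.dumps({"callbacks": [session.EpochProgress("in_progress/")]})

# "Fresh" session: the class is gone, as in a new notebook kernel.
del session.EpochProgress

# Workaround: define (or import) the class again BEFORE loading.
exec(CALLBACK_SOURCE, session.__dict__)
restored = pickle.loads(saved)
callback = restored["callbacks"][0]
```

In the setting of the question, this corresponds to running the EpochProgress class definition, together with its imports, in the loading notebook before calling gensim.models.Word2Vec.load(model_name).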

