python-3.x - Unable to load a model trained in Gensim - pickle-related error
Question
When I try to load a word2vec model trained with Gensim on a Windows machine, I get the following error:
AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>
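In the past I have successfully trained many models with Gensim on this system. The only change is that this time I split the model.build_vocab() and model.train() stages and added saving and timing hooks for each epoch. I also used different iterators for the vocabulary-building and training phases, but over the same dataset with the same tokenization pipeline.
This is how I do the epoch progress tracking/saving: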
import time

from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile


class EpochProgress(CallbackAny2Vec):
    '''saves the model after each epoch'''

    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0
        self.start_time = time.time()

    def on_epoch_begin(self, model):
        print("epoch #{} started".format(self.epoch))

    def on_epoch_end(self, model):
        print("epoch #{} completed".format(self.epoch))
        passed = (time.time() - self.start_time) / 60 / 60  # elapsed time since start in HOURS
        print("{} hours have passed".format(str(passed)))
        output_path = get_tmpfile('{}_epoch{}.model'.format(self.path_prefix, self.epoch))
        model.save(output_path)
        print("model saved at: {}".format(output_path))
        self.epoch += 1


epoch_progress = EpochProgress('E:/jade_prism/embeddings/phrase-embed-over-time/mega_WOS_word2vec/w2v_models/in_progress/')
Then I load the baseline model, which already has its vocabulary built, and set some parameters:
model = gensim.models.Word2Vec.load(baseline_models_directory+chosen_name)
model.window = window
model.size = size
model.workers = workers
model.callbacks = [epoch_progress]
Then I run the training like this:
model.train(corpus, total_examples=model.corpus_count, epochs=epochs)
Finally, I save the end product like this:
model.save('E:/w2v_models/trained/{}'.format(new_model_filename))
Training appears to work fine and the model is saved as expected; unfortunately, I can no longer load it.
Here is the full traceback:
> AttributeError Traceback (most recent call
> last)
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py
> in load(cls, *args, **kwargs) 1329 try:
> -> 1330 model = super(Word2Vec, cls).load(*args, **kwargs) 1331
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py
> in load(cls, *args, **kwargs) 1243 """
> -> 1244 model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs) 1245 if not hasattr(model,
> 'ns_exponent'):
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py
> in load(cls, fname_or_handle, **kwargs)
> 602 """
> --> 603 return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
> 604
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in
> load(cls, fname, mmap)
> 425
> --> 426 obj = unpickle(fname)
> 427 obj._load_specials(fname, mmap, compress, subname)
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in
> unpickle(fname) 1383 if sys.version_info > (3, 0):
> -> 1384 return _pickle.load(f, encoding='latin1') 1385 else:
>
> AttributeError: Can't get attribute 'EpochProgress' on <module
> '__main__'>
>
> During handling of the above exception, another exception occurred:
>
> AttributeError Traceback (most recent call
> last) <ipython-input-4-0206f9f8f3ad> in <module>
> 3
> 4 # Load the model based onthe model name
> ----> 5 model = gensim.models.Word2Vec.load(model_name)
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py
> in load(cls, *args, **kwargs) 1339 logger.info('Model
> saved using code from earlier Gensim Version. Re-loading old model in
> a compatible way.') 1340 from
> gensim.models.deprecated.word2vec import load_old_word2vec
> -> 1341 return load_old_word2vec(*args, **kwargs) 1342 1343
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py
> in load_old_word2vec(*args, **kwargs)
> 170
> 171 def load_old_word2vec(*args, **kwargs):
> --> 172 old_model = Word2Vec.load(*args, **kwargs)
> 173 vector_size = getattr(old_model, 'vector_size', old_model.layer1_size)
> 174 params = {
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py
> in load(cls, *args, **kwargs) 1639 @classmethod 1640 def
> load(cls, *args, **kwargs):
> -> 1641 model = super(Word2Vec, cls).load(*args, **kwargs) 1642 # update older models 1643 if hasattr(model,
> 'table'):
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py
> in load(cls, fname, mmap)
> 85 compress, subname = SaveLoad._adapt_by_suffix(fname)
> 86
> ---> 87 obj = unpickle(fname)
> 88 obj._load_specials(fname, mmap, compress, subname)
> 89 logger.info("loaded %s", fname)
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py
> in unpickle(fname)
> 377 b'gensim.models.wrappers.fasttext', b'gensim.models.deprecated.fasttext_wrapper')
> 378 if sys.version_info > (3, 0):
> --> 379 return _pickle.loads(file_bytes, encoding='latin1')
> 380 else:
> 381 return _pickle.loads(file_bytes)
>
> AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>
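Answer
Python pickling/unpickling can run into problems when what is saved includes code blocks, classes, or class instances that were defined before the save but may not be available at load time. (This applies especially to anonymous or global-scope types that are not imported from an explicit path.)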
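As a minimal illustration of why availability at load time matters, the sketch below uses only the standard library, not gensim. It uses `OrderedDict` as an arbitrary stand-in class to show that pickle records a class as a reference (module name plus class name) rather than as code, and that an anonymous function, which has no importable name, cannot be pickled at all. The variable names are illustrative only.

```python
import pickle
from collections import OrderedDict

# Pickling a class stores a reference (module name + class name), not its code.
payload = pickle.dumps(OrderedDict)
stores_module = b"collections" in payload
stores_name = b"OrderedDict" in payload
print(len(payload), stores_module, stores_name)

# An instance is stored as that same reference plus the instance's state,
# so loading it requires the class to be importable under the same name.
restored = pickle.loads(pickle.dumps(OrderedDict(epoch=3)))
print(restored)

# An anonymous function has no importable name, so pickle refuses it up front.
# (The exact exception type is an assumption that may vary by Python version.)
try:
    pickle.dumps(lambda x: x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)
```

Because only the name is stored, a class defined in a notebook or script is recorded as living in `__main__`, and the loader looks for it there.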
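This is a known issue with gensim model saving, and future versions may avoid storing such callback code inside the model altogether. (Instead, you would have to specify the callbacks each time you call a method that uses them, and they would remain in effect only for that one call.)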
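For more details, see gensim project issue #2136, which includes a workaround that seems to have helped others reload their models: make sure the same EpochProgress class is defined/imported in the place where the load is attempted.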
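The failure and the workaround can be reproduced without gensim. The sketch below is a simplified stand-in under stated assumptions: `EpochProgress` here is a plain class rather than a `CallbackAny2Vec` subclass, `pickle.dumps`/`pickle.loads` stand in for `model.save()`/`Word2Vec.load()`, and the hypothetical `session_main` module and `new_session` helper simulate two separate interpreter sessions.

```python
import pickle
import sys
import types

# Simplified stand-in for the callback; only its name and state matter here.
CLASS_SOURCE = '''
class EpochProgress:
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0
'''


def new_session():
    """Simulate a fresh interpreter: an empty module with no classes defined."""
    module = types.ModuleType("session_main")
    sys.modules["session_main"] = module
    return module


# Session 1: define the class, then pickle an instance (as a model save would).
training = new_session()
exec(CLASS_SOURCE, training.__dict__)
payload = pickle.dumps(training.EpochProgress("in_progress/"))

# Session 2: the class is missing, so unpickling fails as in the traceback above.
loading = new_session()
try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Workaround: define (or import) the same class first, then load again.
exec(CLASS_SOURCE, loading.__dict__)
restored = pickle.loads(payload)
print(type(restored).__name__, restored.path_prefix, restored.epoch)
```

In the asker's situation this would mean running the EpochProgress class definition in the loading notebook or script before calling Word2Vec.load(). Keeping the class in a small module that both the training and the loading code import is one way to keep the name stable, though that layout is a suggestion rather than something the issue thread prescribes.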