python - How to convert pretrained fastText vectors to a gensim model
Question
How can I convert pretrained fastText vectors into a gensim model? I need the predict_output_word method.
import gensim
from gensim.models import Word2Vec
from gensim.models.wrappers import FastText

model_wiki = gensim.models.KeyedVectors.load_word2vec_format("wiki.ru.vec")
model3 = Word2Vec(sentences=model_wiki)
TypeError                                 Traceback (most recent call last)
----> 1 model3 = Word2Vec(sentences=model_wiki)  # train a model from the corpus

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in __init__(self, sentences, corpus_file, size, alpha, window, min_count, max_vocab_size, sample, seed, workers, min_alpha, sg, hs, negative, ns_exponent, cbow_mean, hashfxn, iter, null_word, trim_rule, sorted_vocab, batch_words, compute_loss, callbacks, max_final_vocab)
    765             callbacks=callbacks, batch_words=batch_words, trim_rule=trim_rule, sg=sg, alpha=alpha, window=window,
    766             seed=seed, hs=hs, negative=negative, cbow_mean=cbow_mean, min_alpha=min_alpha, compute_loss=compute_loss,
--> 767             fast_version=FAST_VERSION)
    768
    769     def _do_train_epoch(self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/base_any2vec.py in __init__(self, sentences, corpus_file, workers, vector_size, epochs, callbacks, batch_words, trim_rule, sg, alpha, window, seed, hs, negative, ns_exponent, cbow_mean, min_alpha, compute_loss, fast_version, **kwargs)
    757                 raise TypeError("You can't pass a generator as the sentences argument. Try an iterator.")
    758
--> 759             self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
    760             self.train(
    761                 sentences=sentences, corpus_file=corpus_file, total_examples=self.corpus_count,

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/base_any2vec.py in build_vocab(self, sentences, corpus_file, update, progress_per, keep_raw_vocab, trim_rule, **kwargs)
    934         """
    935         total_words, corpus_count = self.vocabulary.scan_vocab(
--> 936             sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule)
    937         self.corpus_count = corpus_count
    938         self.corpus_total_words = total_words

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in scan_vocab(self, sentences, corpus_file, progress_per, workers, trim_rule)
   1569             sentences = LineSentence(corpus_file)
   1570
-> 1571         total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule)
   1572
   1573         logger.info(

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in _scan_vocab(self, sentences, progress_per, trim_rule)
   1538         vocab = defaultdict(int)
   1539         checked_string_types = 0
-> 1540         for sentence_no, sentence in enumerate(sentences):
   1541             if not checked_string_types:
   1542                 if isinstance(sentence, string_types):

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/keyedvectors.py in __getitem__(self, entities)
    337             return self.get_vector(entities)
    338
--> 339         return vstack([self.get_vector(entity) for entity in entities])
    340
    341     def __contains__(self, entity):

TypeError: 'int' object is not iterable
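Solution
According to the Gensim documentation, you can use the load_fasttext_format function of gensim.models.wrappers.FastText:
"Load the input-hidden weight matrix from Facebook's native fasttext .bin and .vec output files."
Here is an example: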
from gensim.models.wrappers import FastText
model = FastText.load_fasttext_format('wiki.vec')
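
As background on the TypeError in the question: the last traceback frame shows Word2Vec enumerating the object passed as sentences and ending up in KeyedVectors.__getitem__ with an integer index. Word2Vec expects an iterable of tokenized sentences (lists of words), not a set of already-trained vectors. The sketch below reproduces that mechanism in plain Python. ToyKeyedVectors is a hypothetical stand-in written for illustration, modeled only on the lines visible in the traceback; it is not gensim's actual class.

```python
class ToyKeyedVectors:
    """Hypothetical stand-in for gensim's KeyedVectors (illustration only)."""

    def __init__(self, vectors):
        self.vectors = vectors  # dict mapping word -> list of floats

    def get_vector(self, word):
        return self.vectors[word]

    def __getitem__(self, entities):
        # Same shape as the logic visible in the traceback: a single string
        # key returns one vector, anything else is treated as a collection
        # of keys and iterated over.
        if isinstance(entities, str):
            return self.get_vector(entities)
        return [self.get_vector(entity) for entity in entities]


kv = ToyKeyedVectors({"cat": [0.1, 0.2], "dog": [0.3, 0.4]})

# The class defines __getitem__ but not __iter__, so enumerate() falls back
# to the legacy sequence protocol and calls kv[0], kv[1], ...
# kv[0] then tries to iterate over the integer 0 and fails.
try:
    for sentence_no, sentence in enumerate(kv):
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Note also that a .vec file stores only the final word vectors. As far as I know, predict_output_word additionally relies on the output-layer weights of a trained model, so it will likely require the full .bin model rather than the .vec file alone.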