How to convert pretrained fastText vectors to a gensim model

Question

How can I convert pretrained fastText vectors to a gensim model? I need the predict_output_word method.

import gensim
from gensim.models import Word2Vec
from gensim.models.wrappers import FastText

model_wiki = gensim.models.KeyedVectors.load_word2vec_format("wiki.ru.vec")
model3 = Word2Vec(sentences=model_wiki)

TypeError                                 Traceback (most recent call last)
----> 1 model3 = Word2Vec(sentences=model_wiki)  # train a model from the corpus

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in __init__(self, sentences, corpus_file, size, alpha, window, min_count, max_vocab_size, sample, seed, workers, min_alpha, sg, hs, negative, ns_exponent, cbow_mean, hashfxn, iter, null_word, trim_rule, sorted_vocab, batch_words, compute_loss, callbacks, max_final_vocab)
    765             callbacks=callbacks, batch_words=batch_words, trim_rule=trim_rule, sg=sg, alpha=alpha, window=window,
    766             seed=seed, hs=hs, negative=negative, cbow_mean=cbow_mean, min_alpha=min_alpha, compute_loss=compute_loss,
--> 767             fast_version=FAST_VERSION)
    768
    769     def _do_train_epoch(self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/base_any2vec.py in __init__(self, sentences, corpus_file, workers, vector_size, epochs, callbacks, batch_words, trim_rule, sg, alpha, window, seed, hs, negative, ns_exponent, cbow_mean, min_alpha, compute_loss, fast_version, **kwargs)
    757                 raise TypeError("You can't pass a generator as the sentences argument. Try an iterator.")
    758
--> 759             self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
    760             self.train(
    761                 sentences=sentences, corpus_file=corpus_file, total_examples=self.corpus_count,

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/base_any2vec.py in build_vocab(self, sentences, corpus_file, update, progress_per, keep_raw_vocab, trim_rule, **kwargs)
    934         """
    935         total_words, corpus_count = self.vocabulary.scan_vocab(
--> 936             sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule)
    937         self.corpus_count = corpus_count
    938         self.corpus_total_words = total_words

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in scan_vocab(self, sentences, corpus_file, progress_per, workers, trim_rule)
   1569             sentences = LineSentence(corpus_file)
   1570
-> 1571         total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule)
   1572
   1573         logger.info(

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in _scan_vocab(self, sentences, progress_per, trim_rule)
   1538         vocab = defaultdict(int)
   1539         checked_string_types = 0
-> 1540         for sentence_no, sentence in enumerate(sentences):
   1541             if not checked_string_types:
   1542                 if isinstance(sentence, string_types):

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/keyedvectors.py in __getitem__(self, entities)
    337             return self.get_vector(entities)
    338
--> 339         return vstack([self.get_vector(entity) for entity in entities])
    340
    341     def __contains__(self, entity):

TypeError: 'int' object is not iterable

Tags: python, nlp, gensim, word2vec

Solution


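The sentences argument of Word2Vec expects a corpus of tokenized sentences to train on, not an already loaded KeyedVectors object, which is why the call above raises a TypeError. According to the Gensim documentation, you can use the FastText wrapper in gensim.models.wrappers instead:

Load the input-hidden weight matrix from Facebook's native fasttext .bin and .vec output files.

Here is an example: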

from gensim.models.wrappers import FastText

model = FastText.load_fasttext_format('wiki.vec')
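
The gensim.models.wrappers package was removed in gensim 4.0, so the import above only works on older releases. The following is a minimal sketch of the same idea for recent gensim versions, using load_facebook_model from gensim.models.fasttext. The helper name load_fasttext_model and the file name wiki.ru.bin are assumptions made for illustration; the sketch also assumes the native .bin file is available, since a .vec file stores only the word vectors and not the full trained model.

```python
from pathlib import Path


def load_fasttext_model(bin_path):
    """Load a full fastText model from Facebook's native binary file.

    `load_fasttext_model` is a hypothetical helper name; the gensim call it
    wraps is `gensim.models.fasttext.load_facebook_model`.
    """
    path = Path(bin_path)

    # A .vec file is plain text holding only the word vectors, so it cannot
    # restore the trained network. Reject it before touching gensim.
    if path.suffix == ".vec":
        raise ValueError(f"expected the native .bin model, got {path.name}")

    # Imported here so the path check above works even without gensim installed.
    from gensim.models.fasttext import load_facebook_model

    return load_facebook_model(str(path))
```

A call such as model = load_fasttext_model("wiki.ru.bin") would then return a FastText model object. In gensim 4.x that class derives from Word2Vec, so model.predict_output_word(context_words, topn=5) should be available on it, though the method relies on the model having been trained with negative sampling, which this sketch does not check.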
