Evaluating a word2vec model with SimLex-999

Question

I have trained my model with Gensim. Now I want to evaluate it with SimLex-999, but I get an error. My code:

model.wv.evaluate_word_analogies('SimLex-999.txt')
2019-08-25 13:43:22,766 : INFO : Evaluating word analogies for top 300000 words in the model on SimLex-999.txt

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-60cb96c45579> in <module>()
----> 1 model.wv.evaluate_word_analogies('SimLex-999.txt')

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in evaluate_word_analogies(self, analogies, restrict_vocab, case_insensitive, dummy4unknown)
   1088             else:
   1089                 if not section:
-> 1090                     raise ValueError("Missing section header before line #%i in %s" % (line_no, analogies))
   1091                 try:
   1092                     if case_insensitive:

ValueError: Missing section header before line #0 in SimLex-999.txt

I also tried:

from gensim.test.utils import datapath

similarities = model.evaluate_word_pairs(datapath('SimLex-999.txt'))

print(similarities)

But that gives me a KeyError. Please help me solve this problem.

KeyError                                  Traceback (most recent call last)
<ipython-input-29-caeb682cb7ff> in <module>()
      1 from gensim.test.utils import datapath
      2 
----> 3 similarities = model.wv.evaluate_word_pairs(datapath('SimLex-999.txt'),dummy4unknown=True)
      4 
      5 print(similarities)

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in evaluate_word_pairs(self, pairs, delimiter, restrict_vocab, case_insensitive, dummy4unknown)
   1287 
   1288         """
-> 1289         ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
   1290         ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
   1291 

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in <listcomp>(.0)
   1287 
   1288         """
-> 1289         ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
   1290         ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
   1291 

KeyError: 'movie'

Tags: python-3.x, gensim, word2vec

Answer


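SimLex-999.txt does not appear to be a word-analogy list, so it is not a suitable argument for the evaluate_word_analogies() function. That function expects an analogy file divided into sections, which is why it fails with "Missing section header before line #0".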
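To make the mismatch concrete, here is a minimal sketch that imitates the section-header check visible in the traceback. It is a simplified stand-in, not Gensim's actual code: the helper name `first_analogy_problem` is hypothetical, and the sample rows are illustrative (the SimLex-style header and row are assumptions about that file's layout, not copied from the dataset).

```python
def first_analogy_problem(lines):
    """Return an error message for the first line that appears before any
    ': section' header, or None if every data line follows a header."""
    section_seen = False
    for line_no, line in enumerate(lines):
        if line.startswith(": "):
            # Analogy files mark each section with a ': name' header line.
            section_seen = True
        elif not section_seen:
            return "Missing section header before line #%i" % line_no
    return None


# Analogy-style input: a section header followed by four-word lines.
analogy_sample = [": capital-common-countries", "Athens Greece Baghdad Iraq"]

# Similarity-style input (assumed layout): a column header, then word pairs
# with a score. There is no ': section' line, so the check fails at line 0.
simlex_sample = ["word1\tword2\tPOS\tSimLex999", "old\tnew\tA\t1.5"]

print(first_analogy_problem(analogy_sample))
print(first_analogy_problem(simlex_sample))
```

A word-similarity file has no section headers at all, so the very first line already trips the check, which matches the "line #0" in the error above.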

Have you tried the evaluate_word_pairs() function instead? It is documented at:

https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.Word2VecKeyedVectors.evaluate_word_pairs
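According to that documentation, evaluate_word_pairs() reads a file whose lines are 3-tuples: a word pair and a similarity value. The sketch below shows one way to reduce a multi-column SimLex-999 file to that shape. It assumes the file is tab-separated with a header row containing the columns `word1`, `word2` and `SimLex999`; the helper name `simlex_to_pairs` and the output file name are hypothetical, so check them against your copy of the data.

```python
import csv


def simlex_to_pairs(src_path, dst_path, score_column="SimLex999"):
    """Write 'word1<TAB>word2<TAB>score' lines to dst_path and return the
    number of pairs written. Column names are assumptions about the input."""
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            # Keep only the two words and the chosen similarity score.
            writer.writerow([row["word1"], row["word2"], row[score_column]])
            written += 1
    return written


# Usage with a trained model (not executed here, requires gensim and the data):
# simlex_to_pairs("SimLex-999.txt", "simlex_pairs.tsv")
# pearson, spearman, oov_ratio = model.wv.evaluate_word_pairs("simlex_pairs.tsv")
```

Note that this only addresses the file format. The KeyError in the question is raised while the vocabulary is being built, before the pairs file is read, so it may need to be investigated separately.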

