python - KeyError("word '%s' not in vocabulary" % word)
Problem description
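I convert the labels predicted for my images into a list, all_tags, then split them and store them in word_list, where all the labels sit in a sentence-like structure.
All I want to do is use Google's pretrained Word2Vec model (https://mccormickml.com/2016/04/12/googles-pretrained-word2vec-model-in-python/) to generate and print the Word2Vec values for all of my predicted labels. I imported and mapped the model's pretrained weights, but I get this error:
KeyError: "word '['cliff'' not in vocabulary"
However, the word "cliff" can be found in the dictionary. Any insight would be much appreciated. Please see the code snippet below for reference.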
execution_path = os.getcwd()
TEST_PATH = '/home/guest/Documents/Aikomi'
prediction = ImagePrediction()
prediction.setModelTypeAsDenseNet()
prediction.setModelPath(os.path.join(execution_path, "/home/guest/Documents/Test1/ImageAI-master/imageai/Prediction/Weights/DenseNet.h5"))
prediction.loadModel()
pred_array = np.empty((0,6), dtype=object)
predictions, probabilities = prediction.predictImage(os.path.join(execution_path, "1.jpg"), result_count=5)
for img in os.listdir(TEST_PATH):
    if img.endswith('.jpg'):
        image = Image.open(os.path.join(TEST_PATH, img))
        image = image.convert("RGB")
        image = np.array(image, dtype=np.uint8)
        predictions, probabilities = prediction.predictImage(os.path.join(TEST_PATH, img), result_count=5)
        temprow = np.zeros((1,pred_array.shape[1]),dtype=object)
        temprow[0,0] = img
        for i in range(len(predictions)):
            temprow[0,i+1] = predictions[i]
        pred_array = np.append(pred_array, temprow, axis=0)
all_tags = list(pred_array[:,1:].reshape(1,-1))
_in_sent = ' '.join(list(map(str, all_tags)))
import gensim
from gensim.models import Word2Vec
from nltk.tokenize import sent_tokenize, word_tokenize
import re
import random
import nltk
nltk.download('punkt')
word_list = _in_sent.split()
from gensim.corpora.dictionary import Dictionary
# be sure to split each sentence before feeding it into Dictionary
word_list_2 = [d.split() for d in word_list]
dictionary = Dictionary(word_list_2)
print("\n", dictionary, "\n")
corpus_bow = [dictionary.doc2bow(doc) for doc in word_list_2]
model = Word2Vec(word_list_2, min_count= 1)
model = gensim.models.KeyedVectors.load_word2vec_format('/home/guest/Downloads/Google.bin', binary=True)
print(*map(model.most_similar, word_list))
Solution
The answer is in the error message itself. The message is built as
KeyError("word '%s' not in vocabulary" % word)
and the error you got is
KeyError: "word '['cliff'' not in vocabulary"
The contents of the variable word are whatever appears between the outer ' and '.
Hence the word variable holds the string ['cliff'
not the string cliff
Remove punctuation such as ' and [ ] from your tokens before looking them up in the model.
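As a rough illustration of where the stray characters come from and how they might be removed, here is a minimal sketch. The sample pred_array and the helper name strip_punct are assumptions made for this example, not part of the original code; the labels are placeholders standing in for real predictions.

```python
import re
import numpy as np

# Hypothetical stand-in for the pred_array built in the question:
# column 0 is the file name, columns 1..5 are the predicted labels.
pred_array = np.array(
    [["1.jpg", "cliff", "valley", "alp", "promontory", "lakeside"]],
    dtype=object,
)

# What the question does: list() over a (1, N) array yields a single inner
# array, and str() of that array keeps the brackets and quotes.
all_tags = list(pred_array[:, 1:].reshape(1, -1))
_in_sent = " ".join(map(str, all_tags))
broken = _in_sent.split()  # the first token is "['cliff'", not "cliff"

# Option 1: flatten the array so that every label is its own plain string.
clean_tags = [str(tag) for tag in pred_array[:, 1:].ravel()]

# Option 2: strip brackets and quotes from tokens that are already split.
def strip_punct(token):
    """Remove [, ], ' and " characters from a single token."""
    return re.sub(r"[\[\]'\"]", "", token)

repaired = [strip_punct(t) for t in broken]
repaired = [t for t in repaired if t]  # drop tokens that became empty
```

Either cleaned list can then be passed to model.most_similar one word at a time. Checking `word in model` first should avoid a KeyError for labels that are genuinely missing from the pretrained vocabulary.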