Distance between embedded word vectors

Problem description

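I need to find the relationships between the words 4G, 5G, and mobile phones or Internet, so that sentences about technology can be clustered together. Following a suggestion, I used word2vec for this.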

I tried the following:

from time import time
start_nb = time()

# Initialize logging.
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s')

sentence_1 = '4G is the fourth generation of broadband network.'
sentence_2 = 'I bought a new mobile phone. '
sentence_1 = sentence_1.lower().split()
sentence_2 = sentence_2.lower().split()

# Import and download stopwords from NLTK.
from nltk.corpus import stopwords
from nltk import download
download('stopwords')  # Download stopwords list.

# Remove stopwords.
stop_words = stopwords.words('english')
sentence_1 = [w for w in sentence_1 if w not in stop_words]
sentence_2 = [w for w in sentence_2 if w not in stop_words]


start = time()
import os

# Pretrained word2vec-format vectors are loaded through KeyedVectors in current gensim releases.
from gensim.models import KeyedVectors
if not os.path.exists('/data/w2v_googlenews/GoogleNews-vectors-negative300.bin.gz'):
    raise ValueError("SKIP: You need to download the google news model")

model = KeyedVectors.load_word2vec_format('/data/w2v_googlenews/GoogleNews-vectors-negative300.bin.gz', binary=True)


distance = model.wmdistance(sentence_1, sentence_2)
print('distance = %.4f' % distance)

sentence_3 = '5G is dangerous!'
sentence_3 = sentence_3.lower().split()
sentence_3 = [w for w in sentence_3 if w not in stop_words]

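# Print each distance; otherwise the second call silently overwrites the first.
distance = model.wmdistance(sentence_1, sentence_3)
print('distance(sentence_1, sentence_3) = %.4f' % distance)
distance = model.wmdistance(sentence_2, sentence_3)
print('distance(sentence_2, sentence_3) = %.4f' % distance)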

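The results do not show a meaningful connection between 4G, 5G, and mobile phones. I would like to show how these terms relate visually, for example in a plot like this one:

[image]

but with 4G, 5G, and mobile phones.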
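One thing that may contribute to uninformative distances is the preprocessing in the code above: `lower().split()` turns `4G` into `4g` and leaves punctuation attached (`network.`, `dangerous!`). If the pretrained vocabulary is case-sensitive, which appears to be the case for the GoogleNews vectors, such tokens may simply be missing from it and so cannot contribute to the distance. The sketch below is only an illustration under that assumption: `toy_vocab`, `toy_stop_words`, and `tokenize` are hypothetical names, and the vocabulary is a small stand-in rather than the real model.

```python
import re

# Hypothetical stand-in for the model vocabulary. With a loaded gensim
# KeyedVectors object, the membership test would be done against the model.
toy_vocab = {"4G", "5G", "fourth", "generation", "broadband", "network",
             "bought", "new", "mobile", "phone", "dangerous"}

# Tiny illustrative stopword list (the question uses NLTK's English list).
toy_stop_words = {"is", "the", "of", "i", "a"}


def tokenize(sentence, vocab, stop_words):
    """Extract alphanumeric tokens, keep their case, drop stopwords,
    and separate in-vocabulary tokens from out-of-vocabulary ones."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    kept, dropped = [], []
    for tok in tokens:
        if tok.lower() in stop_words:
            continue
        if tok in vocab:
            kept.append(tok)
        else:
            dropped.append(tok)
    return kept, dropped


kept, dropped = tokenize("4G is the fourth generation of broadband network.",
                         toy_vocab, toy_stop_words)
print("kept:", kept)
print("out of vocabulary:", dropped)

# The preprocessing from the question, for comparison: lowercase + split.
naive = [w for w in "5G is dangerous!".lower().split() if w not in toy_stop_words]
naive_missing = [w for w in naive if w not in toy_vocab]
print("naive tokens missing from vocabulary:", naive_missing)
```

With the real model, listing the tokens of each sentence that are absent from the vocabulary before calling `wmdistance` would show whether words such as 4G and 5G are being compared at all.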

However, the main issue (and my actual question) is how to improve the connection/distance between these words.
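Because Word Mover's Distance compares whole sentences, it can help to inspect word-to-word similarity directly before trying to improve anything. Below is a minimal sketch of cosine similarity, the measure that gensim's `similarity` method reports for two word vectors. The three-dimensional vectors in `toy_vectors` are made up purely for illustration, and `cosine_similarity` and `rank_neighbors` are hypothetical helper names; real vectors would come from the loaded model.

```python
import math

# Made-up 3-d vectors for illustration only; real word2vec vectors
# (300-d for GoogleNews) would be read from the loaded model.
toy_vectors = {
    "4G": [0.9, 0.8, 0.1],
    "5G": [0.8, 0.9, 0.2],
    "phone": [0.7, 0.5, 0.3],
    "banana": [0.1, 0.0, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_neighbors(word, vectors):
    """Return (other_word, similarity) pairs, most similar first."""
    scores = [(other, cosine_similarity(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for other, score in rank_neighbors("4G", toy_vectors):
    print("%s: %.3f" % (other, score))
```

With the pretrained vectors loaded, the equivalent checks would be along the lines of `model.similarity('4G', '5G')` or `model.most_similar('4G')`, assuming those exact tokens exist in the vocabulary.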

Tags: python, nlp, word2vec, move-semantics

Solution

