python - Distance between embedded word vectors
Problem description
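I need to find the relationship between the words 4G, 5G and mobile phones or Internet, so that I can cluster sentences about technology together. Following a suggestion, I am using word2vec for this.
I tried the following: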
from time import time
start_nb = time()
# Initialize logging.
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s')
sentence_1 = '4G is the fourth generation of broadband network.'
sentence_2 = 'I bought a new mobile phone. '
sentence_1 = sentence_1.lower().split()
sentence_2 = sentence_2.lower().split()
# Import and download stopwords from NLTK.
from nltk.corpus import stopwords
from nltk import download
download('stopwords') # Download stopwords list.
# Remove stopwords.
stop_words = stopwords.words('english')
sentence_1 = [w for w in sentence_1 if w not in stop_words]
sentence_2 = [w for w in sentence_2 if w not in stop_words]
start = time()
import os
# In gensim >= 1.0, pretrained word2vec files are loaded via KeyedVectors, not Word2Vec.
from gensim.models import KeyedVectors
if not os.path.exists('/data/w2v_googlenews/GoogleNews-vectors-negative300.bin.gz'):
    raise ValueError("SKIP: You need to download the google news model")
model = KeyedVectors.load_word2vec_format('/data/w2v_googlenews/GoogleNews-vectors-negative300.bin.gz', binary=True)
# Word Mover's Distance between the two token lists (lower = more similar).
distance = model.wmdistance(sentence_1, sentence_2)
print('distance = %.4f' % distance)
sentence_3 = '5G is dangerous!'
sentence_3 = sentence_3.lower().split()
sentence_3 = [w for w in sentence_3 if w not in stop_words]
distance = model.wmdistance(sentence_1, sentence_3)
print('distance = %.4f' % distance)
distance = model.wmdistance(sentence_2, sentence_3)
print('distance = %.4f' % distance)
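One thing that may weaken the distances above: `.lower().split()` leaves punctuation attached to tokens ('network.', 'phone.', 'dangerous!') and turns '4G' into '4g'. Tokens that are not in the model's vocabulary are dropped by gensim's `wmdistance` rather than compared, so they contribute nothing. Whether '4G' or '4g' is actually in the GoogleNews vocabulary can be checked with `'4G' in model`. The sketch below is a hypothetical `tokenize` helper (not part of the question or of gensim) that strips punctuation with a regex and keeps the original case by default, on the assumption that the case-sensitive GoogleNews vectors are more likely to contain '4G' than '4g'. The small stopword set is also an illustrative assumption standing in for NLTK's full English list.

```python
import re

# Hypothetical, hand-picked stopword subset for illustration only; the
# question uses NLTK's full list via stopwords.words('english').
STOP_WORDS = {"is", "the", "of", "i", "a"}


def tokenize(sentence, stop_words=STOP_WORDS, lowercase=False):
    """Split a sentence into word tokens, dropping punctuation and stopwords.

    lowercase defaults to False because the pretrained GoogleNews vectors
    are case-sensitive (assumption: '4G' may exist where '4g' does not).
    """
    # \w+ keeps runs of letters, digits and underscores, so '4G' stays
    # intact while trailing '.' or '!' is discarded.
    tokens = re.findall(r"\w+", sentence)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    # Compare against the stopword list case-insensitively.
    return [t for t in tokens if t.lower() not in stop_words]


if __name__ == "__main__":
    print(tokenize('4G is the fourth generation of broadband network.'))
    # ['4G', 'fourth', 'generation', 'broadband', 'network']
```

The resulting lists can be passed to `model.wmdistance` in place of `sentence_1`, `sentence_2` and `sentence_3`.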
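The results do not show any meaningful connection between 4G, 5G and mobile phones. I would like to visualize their relationships, for example as in this plot, but with 4G, 5G and mobile phone.
However, the main problem (and my actual question) is how to improve the connections/distances between these words.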
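For comparing individual words (rather than whole sentences, which is what `wmdistance` does), word2vec relatedness is usually measured as the cosine similarity of the two word vectors; gensim's `KeyedVectors` exposes this as `model.similarity('4G', '5G')`. The sketch below shows that computation and a simple ranking in plain Python. The 3-dimensional vectors are made-up toy values for illustration only, not real GoogleNews embeddings (which have 300 dimensions and would come from `model['4G']` and similar lookups), and `most_similar` here is a hypothetical helper, not gensim's method of the same name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, made up for illustration only.
toy_vectors = {
    "4G": [0.9, 0.1, 0.2],
    "5G": [0.8, 0.2, 0.3],
    "phone": [0.5, 0.7, 0.1],
    "dangerous": [0.0, 0.1, 0.9],
}


def most_similar(word, vectors):
    """Rank all other words by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print([w for w, _ in most_similar("4G", toy_vectors)])
    # ['5G', 'phone', 'dangerous']
```

With real embeddings, pairwise scores like these are what a plot of the relationships between 4G, 5G and mobile phone would be built from.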
Solution