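python - Intersection of a Python dictionary and a text file
Question
I am working on an NLP exercise and need some help understanding the best way to get the result. I have two text files: one is a word list, like a vocabulary, and the other is an article. I need to count how often each word from my word-list file occurs in the input article.
I am trying to do this step by step so that I improve my skills.
I have imported the text, tokenized/split the words in both files, and put the words from the article into a dictionary.
My next step (I assume) is to find the intersection of the dictionary and the word-list text file, and return the frequency of each word-list entry that appears in my article.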
wordlist = terms.split()
splittext = input_article.split()
freq = {}
for term in splittext:
    if term in freq:
        freq[term] += 1
    else:
        freq[term] = 1
#print(freq)
result = {i for i in wordlist if i in freq.keys()}
print(result)
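This ^ is what I have so far, but the last line is where I am stuck: it builds a set, so it only gives me the matching words and not their counts. I have all the words from the article in a dictionary; now I want to return the frequency of each vocabulary entry in the input article.
Any tips on how to achieve this?
Answer
As I understand it, this should work: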
text = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum"
key = "? Lorem Ipsum more was not the with 123 test notin desktop"

# Named freq/result rather than dict/dict2 to avoid shadowing the built-in dict.
freq = {}    # word -> number of occurrences in the text
result = {}  # vocabulary word -> frequency, only for words found in the text

words = text.split(" ")
keys = key.split(" ")

# Count every word in the text.
for word in words:
    if word in freq:
        freq[word] += 1
    else:
        freq[word] = 1

# Look up each vocabulary word and keep the ones that occur in the text.
for k in keys:
    if k in freq:
        print("Key: {} freq: {}".format(k, freq[k]))
        result[k] = freq[k]

print(result)
Output (last line printed):
{'Lorem': 4, 'Ipsum': 4, 'more': 1, 'was': 1, 'not': 1, 'the': 6, 'with': 2, 'desktop': 1}
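The same idea can probably be written more compactly with collections.Counter from the standard library and a dict comprehension. This is only a sketch, not the answer's original code: the function name vocab_frequencies and the sample strings are made up for illustration, and it assumes plain whitespace splitting is good enough (so "passages," with a trailing comma would not match "passages").

```python
from collections import Counter


def vocab_frequencies(article, vocabulary):
    """Return {word: count} for each vocabulary word that occurs in the article.

    Hypothetical helper for illustration; both arguments are plain strings
    that are split on whitespace.
    """
    counts = Counter(article.split())
    # Keep only vocabulary words that actually occur, in vocabulary order.
    return {word: counts[word] for word in vocabulary.split() if word in counts}


article = "the cat sat on the mat with the dog"
vocabulary = "the dog bird"
print(vocab_frequencies(article, vocabulary))  # {'the': 3, 'dog': 1}
```

If vocabulary words that never occur should be reported with a count of 0, dropping the `if word in counts` filter should do it, since a Counter returns 0 for missing keys.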