Counting the frequency of three-word phrases

Problem description

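I have the code below, which finds the frequency of two-word phrases. I need to do the same for three-word phrases.

However, I have not been able to get this code to work for three-word phrases.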

from collections import Counter
import re

sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
words = re.findall(r'\w+', sentence)
two_words = [' '.join(ws) for ws in zip(words, words[1:])]
wordscount = {w:f for w, f in Counter(two_words).most_common() if f > 1}
wordscount
{'show makes': 2, 'makes me': 2, 'I love': 2}

Tags: python, string, python-3.x, counter

Solution

You can use collections.Counter on an iterable of three-word groupings. That iterable is built with a generator expression and list slicing.

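from collections import Counter

# `words` is the list built in the question with re.findall(r'\w+', sentence)
# Slide a window of three words over the list (lazily, via a generator expression)
three_words = (words[i:i+3] for i in range(len(words)-2))
# Lists are not hashable, so convert each window to a tuple before counting
counts = Counter(map(tuple, three_words))
# Join each tuple into a phrase only at the end, keeping phrases seen more than once
wordscount = {' '.join(word): freq for word, freq in counts.items() if freq > 1}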

print(wordscount)

{'show makes me': 2}

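Note that str.join is not used until the very end, which avoids unnecessary repeated string operations. In addition, the tuple conversion is needed for Counter because dict keys must be hashable, and list slices are not.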
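
The same idea can be generalized to phrases of any length. The following is a minimal sketch, not part of the original answer: the function name repeated_ngrams and its parameters n and min_count are hypothetical names chosen for illustration. It builds the windows by zipping n shifted slices of the word list, which is the question's zip(words, words[1:]) pattern extended to n words, and it yields tuples directly, so no separate tuple conversion is needed.

```python
from collections import Counter
import re


def repeated_ngrams(text, n, min_count=2):
    """Return {phrase: count} for n-word phrases seen at least min_count times.

    Hypothetical helper for illustration; the name and parameters are assumptions.
    """
    words = re.findall(r'\w+', text)
    # zip over n shifted slices: each resulting tuple is one n-word window.
    # zip stops at the shortest slice, so incomplete trailing windows are dropped.
    windows = zip(*(words[i:] for i in range(n)))
    # Tuples are hashable, so they can be counted directly.
    counts = Counter(windows)
    # Join into strings only once per distinct phrase, at the very end.
    return {' '.join(gram): freq for gram, freq in counts.items() if freq >= min_count}


sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
print(repeated_ngrams(sentence, 3))  # {'show makes me': 2}
```

Calling repeated_ngrams(sentence, 2) should give the same three two-word phrases as the code in the question. Zipping shifted slices copies the list n times, which is fine for short texts; for very long inputs the index-based generator from the answer above may use less memory.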
