How do I get the top "n" most common words from a list?

Problem description

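I have two lists, each containing words. Some words appear in both lists and some do not. I want to output only the 20 most frequent common words, but my code prints every common word. How can I limit the output to 20? I am not allowed to use Counter.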

def countwords(lst):
    dct = {}
    for word in lst:
        dct[word] = dct.get(word, 0) + 1
    return dct


count1 = countwords(finallist1)
count2 = countwords(finallist2)

words1 = set(count1.keys())
words2 = set(count2.keys())

common_words = words1.intersection(words2)
for i,w in enumerate (common_words,1):
    print(f"{i}\t{w}\t{count1[w]}\t{count2[w]}\t{count1[w] + count2[w]}")

Expected output:

common   f1 f2 sum 
1 program 5 10 15 
2 python  2  4  6 
.
.
until 20

Tags: python, python-3.x, list

Solution


You can use the .most_common() method of collections.Counter to achieve this:

>>> from collections import Counter
>>> word_list = ["one", "two", "three", "four", "two", "three", "four", "three", "four", "four"]

>>> Counter(word_list).most_common(2)
[('four', 4), ('three', 3)]

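From the Counter().most_common() documentation:

Return a list of the n most common elements and their counts, from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered in the order first encountered.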
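
As a quick illustration of the behaviour described in that excerpt, the sketch below calls most_common() with and without n. The sample words are made up for demonstration and do not come from the question.

```python
from collections import Counter

# Hypothetical sample data, not from the question.
words = ["b", "a", "b", "a", "c"]
counts = Counter(words)

# With n given, only the n most common entries are returned.
top_one = counts.most_common(1)

# With n omitted (or None), every entry is returned, most common first.
# "b" and "a" tie at 2; "b" was encountered first, so it is listed first.
all_counts = counts.most_common()

print(top_one)     # [('b', 2)]
print(all_counts)  # [('b', 2), ('a', 2), ('c', 1)]
```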


Here is an alternative way to achieve the same result without importing any modules:

# Step 1: Create Counter dictionary holding frequency. 
#         Similar to: `collections.Counter()` 
my_counter = {}
for word in word_list:
    my_counter[word] = my_counter.get(word, 0) + 1

# where `my_counter` will hold (in insertion order on Python 3.7+):
# {'one': 1, 'two': 2, 'three': 3, 'four': 4}
#-------------

# Step 2: Get sorted list holding word & frequency in descending order.
#         Similar to: `Counter.most_common()`
sorted_frequency = sorted(my_counter.items(), key=lambda x: x[1], reverse=True)

# where `sorted_frequency` will hold:
# [('four', 4), ('three', 3), ('two', 2), ('one', 1)]
#-------------

# Step 3: Get top two words by slicing the ordered list from Step 2.
#         Similar to: `.most_common(2)`
top_two = sorted_frequency[:2]

# where `top_two` will hold:
# [('four', 4), ('three', 3)]

See the comments in the code snippet above for a step-by-step explanation.
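
To connect this back to the question, the same three steps can be applied to the words shared by both lists. The following is a minimal sketch rather than part of the original answer: the helper name top_common_words and the sample lists are hypothetical, and it assumes that "most common" means the highest combined count across the two lists.

```python
def countwords(lst):
    """Count occurrences of each word (same helper as in the question)."""
    dct = {}
    for word in lst:
        dct[word] = dct.get(word, 0) + 1
    return dct


def top_common_words(list1, list2, n=20):
    """Return up to n rows (word, count1, count2, total) for shared words.

    Hypothetical helper: rows are ordered by combined count, highest first.
    """
    count1 = countwords(list1)
    count2 = countwords(list2)
    # Words that appear in both lists.
    common = set(count1) & set(count2)
    rows = [(w, count1[w], count2[w], count1[w] + count2[w]) for w in common]
    # Sort by total descending; break ties alphabetically so output is stable.
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows[:n]


# Made-up sample data standing in for finallist1 and finallist2.
finallist1 = ["program"] * 5 + ["python"] * 2 + ["only_here"]
finallist2 = ["program"] * 10 + ["python"] * 4 + ["elsewhere"]

print("\tcommon\tf1\tf2\tsum")
for i, (w, f1, f2, total) in enumerate(top_common_words(finallist1, finallist2), 1):
    print(f"{i}\t{w}\t{f1}\t{f2}\t{total}")
```

Slicing with rows[:n] is safe even when fewer than n common words exist, so the default of 20 simply caps the output.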

