I'm getting a list of lists in my reducer output instead of paired values, and I'm not sure what to change in my code

Problem description

The code below gives me output that is almost, but not quite, what I want.

def reducer(self, year, words):
    x = Counter(words)
    most_common = x.most_common(3)
    sorted(x, key=x.get, reverse=True)
    yield (year, most_common)

This gives me the output

"2020" [["coronavirus",4],["economy",2],["china",2]]

What I want it to give me is

"2020" "coronavirus china economy"

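I would appreciate it if someone could explain why I get a list rather than the output I need, along with ideas on how to improve the code to get what I need.

Tags: python, counter, reducers, mrjob

Solution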


The documentation for Counter.most_common explains why you get a list of lists: the method returns a list of (element, count) pairs.

most_common(n=None) method of collections.Counter instance
    List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.
    
    >>> Counter('abracadabra').most_common(3)
    [('a', 5), ('b', 2), ('r', 2)]
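
The docstring shows tuples, while the output in the question shows nested square brackets. A likely explanation, assuming the job writes its values as JSON (the bracketed output suggests this, but it is an assumption here), is that JSON has no tuple type, so each (word, count) tuple is encoded as an array. A small sketch using only the standard library:

```python
import json
from collections import Counter

# most_common returns a list of (element, count) tuples
top = Counter('abracadabra').most_common(3)

# JSON has no tuple type, so every tuple is written out as an array
encoded = json.dumps(top)

print(top)      # [('a', 5), ('b', 2), ('r', 2)]
print(encoded)  # [["a", 5], ["b", 2], ["r", 2]]
```

So to get a plain string of words, the reducer has to build that string itself before yielding it.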

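Sorting from highest to lowest frequency is a descending sort, while sorting alphabetically is an ascending sort. To handle both at once, you can use a tuple as the sort key in which the frequency is negated, and then sort everything in ascending order.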

from collections import Counter

words = ['coronavirus'] * 4 + ['economy'] * 2 + ['china'] * 2 + ['whatever']
x = Counter(words)
most_common = x.most_common(3)
# Sort by descending frequency, then alphabetically. After sorting,
# discard the frequency from each (word, freq) tuple.
result = ' '.join(word for word, _ in sorted(most_common, key=lambda pair: (-pair[1], pair[0])))
print(result)  # coronavirus china economy
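
For completeness, here is one way the same idea might look inside the reducer. This is only a sketch: it is written as a plain generator function without `self` so that it runs without mrjob installed, and the sample data is made up for illustration. It also sorts all the counts before taking the top three, rather than calling most_common(3) first, so that ties at the cut-off are broken alphabetically instead of by the order in which the words were first seen.

```python
from collections import Counter


def reducer(year, words):
    """Plain-function stand-in for the reducer method in the question."""
    counts = Counter(words)
    # Sort every (word, count) pair: highest count first, ties alphabetical
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Keep the top three words and drop their counts
    yield year, ' '.join(word for word, _ in ranked[:3])


# Hypothetical input, deliberately not in sorted order
sample = ['economy'] * 2 + ['coronavirus'] * 4 + ['china'] * 2 + ['whatever']
results = list(reducer('2020', sample))
print(results)  # [('2020', 'coronavirus china economy')]
```

In the real job the function would keep its `self` parameter and live in the job class; only the body needs to change.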
