Python: Finding the most frequent combinations of any length in a list of lists

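Problem description

How can I find the combinations that occur most often in a list of lists? The combinations can be of any length.

So, given the sample data: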

l = [['action','mystery','horror','thriller'],
     ['drama','romance'],
     ['comedy','drama','romance'],
     ['scifi','mystery','horror','thriller'],
     ['horror','mystery','thriller']]

Expected output:

'mystery','horror','thriller' - 3 times
'drama','romance' - 2 times

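With the help of this post, I was able to find the most frequent pairs (combinations of 2), but how can this be extended to find combinations of any length?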

Edit: following @CrazyChucky's comment:

Sample input:

l = [['action','mystery','horror','thriller'],
     ['drama','romance'],
     ['comedy','drama','romance'],
     ['scifi','mystery','horror','thriller'],
     ['horror','mystery','thriller'],
     ['mystery','horror']]

Expected output:

'mystery','horror' - 4 times
'mystery','horror','thriller' - 3 times
'drama','romance' - 2 times

Tags: python, python-3.x, list

Solution

You can adapt the code from the linked post to iterate over every combination of every possible size (2 and up) in each sublist:

from collections import Counter
from itertools import combinations

l = [['action','mystery','horror','thriller'],
     ['drama','romance'],
     ['comedy','drama','romance'],
     ['scifi','mystery','horror','thriller'],
     ['horror','mystery','thriller']]

d = Counter()
for sub in l:
    if len(sub) < 2:
        continue
    # Sort a copy so the same genres always form the same tuple,
    # without reordering the input lists in place.
    items = sorted(sub)
    # Count every combination from size 2 up to the full sublist length.
    for sz in range(2, len(items) + 1):
        for comb in combinations(items, sz):
            d[comb] += 1

print(d.most_common())

Output:

[
 (('horror', 'mystery'), 3),
 (('horror', 'thriller'), 3),
 (('mystery', 'thriller'), 3),
 (('horror', 'mystery', 'thriller'), 3),
 (('drama', 'romance'), 2),
 (('action', 'horror'), 1),
 (('action', 'mystery'), 1),
 (('action', 'thriller'), 1),
 (('action', 'horror', 'mystery'), 1),
 (('action', 'horror', 'thriller'), 1),
 (('action', 'mystery', 'thriller'), 1),
 (('action', 'horror', 'mystery', 'thriller'), 1),
 (('comedy', 'drama'), 1),
 (('comedy', 'romance'), 1),
 (('comedy', 'drama', 'romance'), 1),
 (('horror', 'scifi'), 1),
 (('mystery', 'scifi'), 1),
 (('scifi', 'thriller'), 1),
 (('horror', 'mystery', 'scifi'), 1),
 (('horror', 'scifi', 'thriller'), 1),
 (('mystery', 'scifi', 'thriller'), 1),
 (('horror', 'mystery', 'scifi', 'thriller'), 1)
]
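Note that the number of combinations grows quickly: a sublist of n distinct items contributes 2^n - n - 1 combinations of size 2 or more (11 for a four-item sublist). The following is a minimal sketch, not part of the original answer, that wraps the same counting in a function and optionally caps the combination size. The names `count_combinations`, `min_size`, and `max_size` are hypothetical, and deduplicating each sublist with `set` is an added assumption about how repeated items should be treated.

```python
from collections import Counter
from itertools import combinations


def count_combinations(lists, min_size=2, max_size=None):
    """Count how many sublists contain each combination of items.

    The function name and the min_size/max_size parameters are
    illustrative; they do not come from the original answer.
    """
    counts = Counter()
    for sub in lists:
        # Deduplicate and sort so each combination has one canonical tuple.
        items = sorted(set(sub))
        largest = len(items) if max_size is None else min(max_size, len(items))
        for size in range(min_size, largest + 1):
            # Counter.update with an iterable adds 1 per element seen.
            counts.update(combinations(items, size))
    return counts


genres = [['action', 'mystery', 'horror', 'thriller'],
          ['drama', 'romance'],
          ['comedy', 'drama', 'romance'],
          ['scifi', 'mystery', 'horror', 'thriller'],
          ['horror', 'mystery', 'thriller']]

# Only pairs: useful when the sublists are long.
pair_counts = count_combinations(genres, max_size=2)
print(pair_counts.most_common(3))

# All sizes, equivalent to the loop in the answer.
all_counts = count_combinations(genres)
print(len(all_counts))
```

Capping `max_size` trades completeness for speed, so it only makes sense when long combinations are not of interest.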

To get the genre combinations with the highest count, you can iterate over the counter:

max_count = max(d.values())
most_frequent = [g for g, cnt in d.items() if cnt == max_count]
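The expected output in the edited question lists fewer entries than the raw counter. One reading, which is an assumption rather than something the question states, is that it keeps only combinations that appear at least twice and drops any combination that is contained in a larger one with the same count (so 'horror','thriller' is dropped in favor of 'horror','mystery','thriller'). A rough sketch under that assumption, using the hypothetical helper `is_subsumed`:

```python
from collections import Counter
from itertools import combinations

samples = [['action', 'mystery', 'horror', 'thriller'],
           ['drama', 'romance'],
           ['comedy', 'drama', 'romance'],
           ['scifi', 'mystery', 'horror', 'thriller'],
           ['horror', 'mystery', 'thriller'],
           ['mystery', 'horror']]

counts = Counter()
for sub in samples:
    items = sorted(set(sub))
    for size in range(2, len(items) + 1):
        counts.update(combinations(items, size))


def is_subsumed(combo, count):
    """True if a strictly larger combination has the same count.

    Hypothetical helper; compares against every counted combination,
    so the overall filter is quadratic in the number of combinations.
    """
    combo_set = set(combo)
    return any(other_count == count and combo_set < set(other)
               for other, other_count in counts.items())


# Keep repeated combinations that are not explained by a larger one.
closed = [(combo, cnt) for combo, cnt in counts.most_common()
          if cnt >= 2 and not is_subsumed(combo, cnt)]

for combo, cnt in closed:
    print(combo, '-', cnt, 'times')
```

For the six-list sample this keeps three combinations with counts 4, 3, and 2, the same groups as the expected output, although the items within each tuple appear in sorted order.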
