Compare all lists in a list of lists against a condition and group them together by their differences

Question

I have the following list:

a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8], [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15], [13,14,15]]

Here they are with their indices, for ease of understanding:

0 [1, 2, 3, 4, 5]
1 [4, 5, 6, 7, 8]
2 [1, 2, 3, 4]
3 [4, 5, 6, 7, 8, 9]
4 [2, 3, 4, 5, 6, 7, 8]
5 [6, 7, 8, 9]
6 [5, 6, 7, 8, 9]
7 [2, 3, 4, 5, 6]
8 [3, 4, 5, 6]
9 [11, 12, 13, 14, 15]
10 [13, 14, 15]

The output I expect is a list of tuples, as follows:

output = [(0,2,1), (3,1,1), (4,7,2), (4,1,2), (6,5,1), (3,5,2), (3,6,1), (7,8,1), (9,10,2)]

For example, to explain the first item of the output, (0,2,1):

0 ---> index of the longer of the two lists being compared
2 ---> index of the shorter of the two lists being compared
1 ---> difference in length between lists 0 and 2

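Now, on to the problem:

My lists contain similar items: some differ from others only by one or two (or three) extra items at the start or the end of the list.

I want to sort and group the lists, and identify the matching pairs by their indices and their length difference, as tuples.

I went through multiple Stack Overflow questions but couldn't find a similar one.

I am new to Python and started with the following code, then got stuck: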

from itertools import groupby

a = sorted(a, key=len)

incr = [list(g) for k, g in groupby(a, key=len)]

decr = list(reversed(incr))

ndecr = [i for j in decr for i in j]

for i in range(len(ndecr)-1):
    if len(ndecr[i]) - len(ndecr[i+1]) == 1:
        print(ndecr[i])

for i in range(len(ndecr)-2):
    if len(ndecr[i]) - len(ndecr[i+2]) == 2:
        print(ndecr[i])

# Compare every pair without removing items while iterating,
# since removing during iteration makes the loop skip elements.
for ele in ndecr:
    for j in ndecr:
        if ele[:-1] == j:
            print(j)

for ele in ndecr:
    for j in ndecr:
        if ele[:-2] == j:
            print(ele)

Please help me with an approach to get this output.

Tags: python, arrays, python-3.x, list, nested-lists

Answer


IIUC, and assuming the total number of lists is small enough that comparing every pair (on the order of len(lists)^2 comparisons) is still cheap, something like

from itertools import combinations

# sort by length but preserve the index
ax = sorted(enumerate(a), key=lambda x: len(x[1]))

done = []

for (i0, seq0), (i1, seq1) in combinations(ax, 2):
    if seq1[:len(seq0)] == seq0 or seq1[-len(seq0):] == seq0:
        done.append((i1, i0, len(seq1)-len(seq0)))

gives me

In [117]: sorted(done)
Out[117]: 
[(0, 2, 1),
 (3, 1, 1),
 (3, 5, 2),
 (3, 6, 1),
 (4, 1, 2),
 (4, 7, 2),
 (6, 5, 1),
 (7, 8, 1),
 (9, 10, 2)]

which matches your output except for the order.
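
To make the snippet above easy to rerun, here is a minimal self-contained sketch that wraps the same idea in a function. The name `group_by_overlap` and the `max_diff` parameter are my own assumptions (the question mentions differences of one, two or three, so I used 3 as the cut-off); the pairing logic itself is the same as above.

```python
from itertools import combinations


def group_by_overlap(lists, max_diff=3):
    """Return sorted (longer_index, shorter_index, length_difference) tuples.

    Hypothetical helper: a pair is reported when the shorter list equals
    the start or the end of the longer one and the lengths differ by at
    most max_diff (an assumed cut-off, not part of the snippet above).
    """
    # Sort by length but keep each list's original index.
    indexed = sorted(enumerate(lists), key=lambda pair: len(pair[1]))
    result = []
    for (i0, short), (i1, longer) in combinations(indexed, 2):
        diff = len(longer) - len(short)
        if not 0 < diff <= max_diff:
            continue  # same length, or too far apart
        # Prefix check, then suffix check (slicing from the front avoids -0).
        if longer[:len(short)] == short or longer[diff:] == short:
            result.append((i1, i0, diff))
    return sorted(result)


a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8],
     [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15],
     [13,14,15]]

print(group_by_overlap(a))
```

Passing `max_diff=1` would keep only the pairs whose lengths differ by exactly one.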

seq1[:len(seq0)] == seq0 

is the "does seq1 start with seq0?" condition, and

seq1[-len(seq0):] == seq0

is the "does seq1 end with seq0?" condition.
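
As a quick illustration of the two slice checks, the sketch below tries them on lists from the question. The helper names `starts_with` and `ends_with` are hypothetical. One thing to be aware of: if `seq0` is empty, `-len(seq0)` is `0`, so `seq1[-0:]` is the whole list rather than an empty one. That does not happen with the data in the question, but the `ends_with` helper sidesteps it by computing the start position from the front.

```python
def starts_with(seq1, seq0):
    """True if seq1 begins with the items of seq0."""
    return seq1[:len(seq0)] == seq0


def ends_with(seq1, seq0):
    """True if seq1 ends with the items of seq0."""
    # Start position counted from the front; max() keeps it non-negative
    # when seq0 is longer than seq1.
    start = max(len(seq1) - len(seq0), 0)
    return seq1[start:] == seq0


# List 0 starts with list 2, but does not end with it.
print(starts_with([1, 2, 3, 4, 5], [1, 2, 3, 4]))
print(ends_with([1, 2, 3, 4, 5], [1, 2, 3, 4]))

# List 3 ends with list 5.
print(ends_with([4, 5, 6, 7, 8, 9], [6, 7, 8, 9]))

# The -0 pitfall: this slice is the whole list, not an empty one.
print([1, 2, 3][-0:])
```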

