首页 > 解决方案 > 如何从嵌套列表中找到包含较高值的列表并返回这些列表?

问题描述

我有这个包含重复条目的嵌套列表:

[['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
 ['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
 ['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
 ['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
 ['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
 ['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
 ['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]

我想通过 i[3] 过滤嵌套列表,所以最终输出将是这样的

[['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
 ['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
 ['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]

我尝试了一个 for 循环,但我不知道如何获得重复列表的最大值

标签: pythonnested-lists

解决方案


这是我能想到的最pythonic的方式。我的方法是首先对列表列表进行排序,通过sublist[3],这意味着当我们遍历列表时,我们最终会在遇到重复项之前遇到具有最大评论数的子列表。这个技巧将用于构建最终列表。

meta_list = [['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
 ['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
 ['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
 ['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
 ['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
 ['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
 ['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]

# Sort the list by review count and review name - make sure the highest review is first
meta_list.sort(key=lambda x: (int(x[3]), x[0]), reverse=True)

# This is the list we'll use to store the final data in
final_list = []
# Go through all the items in the meta_list
for meta in meta_list:
    
    if not meta[0] in [item[0] for item in final_list]:
        '''
        If another meta with the same name (0th index)
        doesn't already exist in final_list, add it
        '''
        final_list.append(meta)

输出-

[['Instagram',
  'SOCIAL',
  '4.5',
  66577446,
  'Varies with device',
  '1,000,000,000+',
  'Free',
  '0',
  'Teen',
  'Social',
  'July 31, 2018',
  'Varies with device',
  'Varies with device'],
 ['Gmail',
  'COMMUNICATION',
  '4.3',
  4604483,
  'Varies with device',
  '1,000,000,000+',
  'Free',
  '0',
  'Everyone',
  'Communication',
  'August 2, 2018',
  'Varies with device',
  'Varies with device'],
 ['Coloring book moana',
  'FAMILY',
  '3.9',
  974,
  '14M',
  '500,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design;Pretend Play',
  'January 15, 2018',
  '2.0.0',
  '4.0.3 and up']]

基本上它将所有不存在的元数据添加到final_list. 为什么这行得通?因为循环时遇到的第一个元数据是评论数最高的元数据。因此,一旦添加了那个,就无法添加它的骗子,我们就完成了。

注意:这不会保留评论本身的顺序。它只会确保只保留评论数量最多的评论,以防有同名的骗子。


推荐阅读