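python - How do I find the lists with the higher value in a nested list and return those lists?
Problem description
I have this nested list, which contains duplicate entries: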
[['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
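I want to filter the nested list by i[3], keeping for each name only the entry with the highest value, so the final output would look like this: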
[['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
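I tried a for loop, but I don't know how to get the maximum among the duplicate lists.
Solution
This is the most Pythonic way I can think of. My approach is to first sort the list of lists by sublist[3] in descending order, which means that as we iterate over the list we encounter the sublist with the highest review count before any of its duplicates. This trick is then used to build the final list.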
meta_list = [['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
# Sort by review count (then by app name), descending, so that for each
# app the entry with the highest review count comes first
meta_list.sort(key=lambda x: (int(x[3]), x[0]), reverse=True)

# This is the list we'll use to store the final data in
final_list = []

# Go through all the items in the meta_list
for meta in meta_list:
    # If another meta with the same name (0th index)
    # doesn't already exist in final_list, add it
    if meta[0] not in [item[0] for item in final_list]:
        final_list.append(meta)
Output:
[['Instagram',
'SOCIAL',
'4.5',
66577446,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Teen',
'Social',
'July 31, 2018',
'Varies with device',
'Varies with device'],
['Gmail',
'COMMUNICATION',
'4.3',
4604483,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Everyone',
'Communication',
'August 2, 2018',
'Varies with device',
'Varies with device'],
['Coloring book moana',
'FAMILY',
'3.9',
974,
'14M',
'500,000+',
'Free',
'0',
'Everyone',
'Art & Design;Pretend Play',
'January 15, 2018',
'2.0.0',
'4.0.3 and up']]
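Basically, it adds to final_list every meta whose name is not already there. Why does this work? Because the first meta encountered for each name while looping is the one with the highest review count. Once that one has been added, its dupes cannot be added, and we are done.
Note: this does not preserve the original order of the entries. It only ensures that, when there are dupes with the same name, only the entry with the highest review count is kept.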
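If the original order matters, one alternative (not part of the answer above, just a sketch) is to skip the sort and track the best entry per name in a dictionary. The helper name keep_max_per_name and its parameters are hypothetical, and the sample rows are shortened to their first four fields for brevity. It relies on dictionaries preserving insertion order, which holds in Python 3.7 and later.

```python
def keep_max_per_name(rows, name_idx=0, count_idx=3):
    """Return one row per name, keeping the row with the highest count.

    Hypothetical helper: rows appear in the order in which each name
    was first seen, and the input list is not modified.
    """
    best = {}  # name -> row with the highest count seen so far
    for row in rows:
        name = row[name_idx]
        if name not in best or int(row[count_idx]) > int(best[name][count_idx]):
            # Reassigning an existing key keeps its original position
            best[name] = row
    return list(best.values())


# Shortened sample data (only the first four fields of each entry)
rows = [
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967],
    ['Coloring book moana', 'FAMILY', '3.9', 974],
    ['Gmail', 'COMMUNICATION', '4.3', 4604324],
    ['Gmail', 'COMMUNICATION', '4.3', 4604483],
    ['Instagram', 'SOCIAL', '4.5', 66577313],
    ['Instagram', 'SOCIAL', '4.5', 66577446],
    ['Instagram', 'SOCIAL', '4.5', 66509917],
]

result = keep_max_per_name(rows)
for row in result:
    print(row)
```

This makes a single pass over the data and avoids rebuilding the list of names on every iteration, which may matter for larger inputs.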