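python - Reducing the complexity of getting unique values from a dict in Python
Question
I'm fairly new to Python (and to writing good, efficient algorithms), and I'm not very familiar with the different data structures available for iterating efficiently over large amounts of data. I need to find the set of unique values in a nested dictionary, and wrote the following code: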
data = {'c14da622-7fb8-4da3-a2fb-d8c632957fbe': {'25': {'label': 'no plane'}, '50': {'label': 'no plane'}, '125': {'label': 'no plane'}, '150': {'label': 'no plane'}, '175': {'label': 'plane'}, '200': {'label': 'plane'}, '275': {'label': 'plane'}, '300': {'label': 'plane'}, '325': {'label': 'plane'}, '350': {'label': 'plane'}, '375': {'label': 'plane'}, '400': {'label': 'plane'}, '425': {'label': 'plane'}, '450': {'label': 'plane'}, '475': {'label': 'plane'}, '500': {'label': 'plane'}, '525': {'label': 'plane'}, '550': {'label': 'plane'}, '575': {'label': 'plane'}, '600': {'label': 'plane'}, '625': {'label': 'plane'}, '650': {'label': 'plane'}, '875': {'label': 'plane'}, '900': {'label': 'plane'}, '925': {'label': 'plane'}, '950': {'label': 'plane'}, '975': {'label': 'plane'}, '1000': {'label': 'plane'}, '1025': {'label': 'plane'}, '1050': {'label': 'plane'}, '1075': {'label': 'plane'}, '1100': {'label': 'plane'}, '1125': {'label': 'plane'}, '1150': {'label': 'plane'}, '1175': {'label': 'plane'}}, '60cb59c7-6b0a-4225-b00f-2d888a9d5250': {'30': {'label': 'no plane'}, '60': {'label': 'no plane'}, '90': {'label': 'no plane'}, '120': {'label': 'no plane'}, '150': {'label': 'no plane'}, '180': {'label': 'plane'}, '210': {'label': 'plane'}, '240': {'label': 'plane'}, '270': {'label': 'plane'}, '300': {'label': 'plane'}, '330': {'label': 'plane'}, '360': {'label': 'plane'}, '390': {'label': 'plane'}, '420': {'label': 'plane'}, '450': {'label': 'plane'}, '480': {'label': 'plane'}, '510': {'label': 'plane'}, '570': {'label': 'plane'}, '600': {'label': 'plane'}, '660': {'label': 'plane'}, '690': {'label': 'plane'}, '720': {'label': 'plane crash'}, '750': {'label': 'plane crash'}, '780': {'label': 'plane crash'}, '810': {'label': 'plane crash'}, '840': {'label': 'plane crash'}, '870': {'label': 'plane crash'}, '900': {'label': 'plane crash'}, '930': {'label': 'plane crash'}, '960': {'label': 'plane crash'}, '990': {'label': 'no plane'}, '1020': {'label': 'plane crash'}, 
'1050': {'label': 'plane crash'}, '1080': {'label': 'plane crash'}, '1110': {'label': 'plane crash'}, '1140': {'label': 'plane crash'}, '1170': {'label': 'plane crash'}, '1200': {'label': 'plane crash'}, '1230': {'label': 'plane crash'}, '1260': {'label': 'plane crash'}, '1290': {'label': 'plane crash'}, '1320': {'label': 'plane crash'}, '1350': {'label': 'plane crash'}, '1380': {'label': 'plane crash'}, '1410': {'label': 'plane crash'}, '1560': {'label': 'plane crash'}, '1590': {'label': 'plane crash'}, '1620': {'label': 'plane crash'}, '1650': {'label': 'plane crash'}, '1680': {'label': 'plane crash'}, '1710': {'label': 'plane crash'}}}
def parse_label_categories(data):
    tuples = list(data.values())
    unique_labels = []
    for labels in tuples:
        labels_dump = list(labels.values())
        for dump in labels_dump:
            label = list(dump.values())
            new = label.pop()
            unique_labels.append(new)
    return list(set(unique_labels))

parse_label_categories(data)
It returns the three unique values:
['plane crash', 'plane', 'no plane']
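I have a nested for loop, and overall my code is pretty bad, but I've been struggling to find a more elegant and efficient solution to this problem in Python.
Any help or advice would be greatly appreciated :-)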
Answer
Pro tip: jsonlint will format the data into a readable layout, even if the JSON has already been parsed into Python lists/dicts.
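As a standard-library alternative to pasting the data into jsonlint, the nested dict can also be pretty-printed from Python itself with `json.dumps`. This is only a minimal sketch: `sample` is a hypothetical, shortened stand-in shaped like the data in the question, not the real data.

```python
import json

# Hypothetical miniature of the question's data:
# id -> frame number -> {'label': ...}
sample = {
    "id-1": {"25": {"label": "no plane"}, "175": {"label": "plane"}},
    "id-2": {"720": {"label": "plane crash"}},
}

# indent=2 puts every nesting level on its own indented line
formatted = json.dumps(sample, indent=2)
print(formatted)
```

This works here because all keys are strings and all values are JSON-serializable; for dicts holding arbitrary Python objects, `pprint.pprint` is the more general choice.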
def parse_label_categories(data):
    seen = set()  # a set stores each label only once
    for some_label, data_dict in data.items():
        for some_number, outcome in data_dict.items():
            seen.add(outcome['label'])
    return seen

a = parse_label_categories(data)
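The same traversal can also be written as a set comprehension. This is just a more compact spelling of the two loops, not an asymptotically faster algorithm: every innermost dict still has to be visited once. The names `unique_labels` and `sample` are illustrative and do not come from the original post.

```python
def unique_labels(data):
    # Visit every innermost dict once and collect its 'label' into a set
    return {
        outcome["label"]
        for frames in data.values()
        for outcome in frames.values()
    }


# Hypothetical miniature of the question's data
sample = {
    "id-1": {"25": {"label": "no plane"}, "175": {"label": "plane"}},
    "id-2": {"690": {"label": "plane"}, "720": {"label": "plane crash"}},
}

labels = unique_labels(sample)
print(sorted(labels))  # ['no plane', 'plane', 'plane crash']
```

Iterating over `.values()` skips the keys, which the loop version binds but never uses.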
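I don't think there is a more efficient way to do this in Python. You could probably use pandas and perhaps push the loop down into C, since it expands the JSON into a DataFrame, but I'm not convinced.
Since a pandas approach did come up, I ran some timings: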
import pandas as pd
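def parse_label_categories(data):
    seen = set()
    for some_label, data_dict in data.items():
        for some_number, outcome in data_dict.items():
            seen.add(outcome['label'])
    return seen

def pandas_approach(data):
    all_df = None
    # Build one DataFrame per id and concatenate them all
    for uid, d in data.items():
        df = pd.DataFrame.from_dict(d, orient="index")
        if all_df is None:
            all_df = df
        else:
            all_df = pd.concat([all_df, df])
    # Assumption: the original snippet ends without returning anything;
    # returning the unique labels makes it comparable to the function above.
    return all_df['label'].unique()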
This gives:
%timeit parse_label_categories(data)
18 µs ± 2.31 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit pandas_approach(data)
2.7 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
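`%timeit` is an IPython magic. Outside IPython, a comparable measurement can be taken with the standard `timeit` module. The following is a rough sketch only: `sample` is a small synthetic dict (an assumption, not the data from the question), so the absolute numbers will differ from the figures above and will vary by machine.

```python
import timeit


def parse_label_categories(data):
    seen = set()
    for data_dict in data.values():
        for outcome in data_dict.values():
            seen.add(outcome["label"])
    return seen


# Synthetic stand-in (an assumption): 2 ids with 50 frames each
sample = {
    f"id-{i}": {
        str(n): {"label": "plane" if n % 2 else "no plane"} for n in range(50)
    }
    for i in range(2)
}

# repeat() returns one total time per run; divide by `number` for per-call time
number = 1000
runs = timeit.repeat(lambda: parse_label_categories(sample), number=number, repeat=5)
best_per_call = min(runs) / number
print(f"best of 5: {best_per_call * 1e6:.1f} µs per call")
```

Taking the minimum over several runs is the usual way to reduce noise from other processes; the pandas variant can be timed the same way by passing it to `timeit.repeat`.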