python - Traversing a multi-level dictionary
Problem description
Here is my data:
data = [
    {
        "title": "Main Topic 1",
        "num": "Type 1",
        "text": "Some Text",
        "sub": [
            {
                "title": "Sub Topic 1",
                "num": "1",
                "text": "Some more Text",
                "sub": [
                    {
                        "num": "(a)",
                        "text": "This is the actual text for the topic 1(a)",
                    },
                    {
                        "num": "(b)",
                        "text": "This is the actual text for the topic 1(b)",
                    },
                ],
            },
            {
                "title": "Sub Topic 2",
                "num": "2",
                "text": "This is the actual text for the topic 2",
            },
        ],
    },
    {
        "title": "Main Topic 2",
        "num": "Type 2",
        "text": "Some Text",
        "sub": [
            {
                "title": "Sub Topic 3",
                "num": "3",
                "text": "Some more Text",
                "sub": [
                    {
                        "num": "(a)",
                        "text": "This is the actual text for the topic 3(a)",
                    },
                    {
                        "num": "(b)",
                        "text": "This is the actual text for the topic 3(b)",
                    },
                ],
            },
            {
                "title": "Sub Topic 4",
                "num": "4",
                "text": "This is the actual text for the topic 4",
            },
        ],
    },
]
Now I want output like this:
{'title': 'Main Topic 1~Sub Topic 1~NA', 'num': 'Type 1~1~(a)', 'text': 'This is the actual text for the topic 1(a)'}
{'title': 'Main Topic 1~Sub Topic 1~NA', 'num': 'Type 1~1~(b)', 'text': 'This is the actual text for the topic 1(b)'}
{'title': 'Main Topic 1~Sub Topic 2', 'num': 'Type 1~2', 'text': 'This is the actual text for the topic 2'}
{'title': 'Main Topic 2~Sub Topic 3~NA', 'num': 'Type 2~3~(a)', 'text': 'This is the actual text for the topic 3(a)'}
{'title': 'Main Topic 2~Sub Topic 3~NA', 'num': 'Type 2~3~(b)', 'text': 'This is the actual text for the topic 3(b)'}
{'title': 'Main Topic 2~Sub Topic 4', 'num': 'Type 2~4', 'text': 'This is the actual text for the topic 4'}
Here is the code I wrote to achieve this:
def get_each_provision(title, num, text):
    provision = {}
    provision['title'] = title
    provision['num'] = num
    provision['text'] = text
    return provision

def get_consolidated_provisions(data):
    provisions = []
    for level1 in data:
        title_level1 = level1['title']
        num_level1 = level1['num']
        text_level1 = level1['text']
        if 'sub' in level1:
            level2_subs = level1['sub']
            for level2 in level2_subs:
                title_level2 = '%s~%s'%(title_level1, level2['title'])
                num_level2 = '%s~%s'%(num_level1, level2['num'])
                text_level2 = level2['text']
                if 'sub' in level2:
                    level3_subs = level2['sub']
                    for level3 in level3_subs:
                        title = '%s~%s'%(title_level2, level3.get('title', 'NA'))
                        num = '%s~%s'%(num_level2, level3['num'])
                        text = level3['text']
                        provisions.append(get_each_provision(title, num, text))
                else:
                    provisions.append(get_each_provision(title_level2, num_level2, text_level2))
        else:
            provisions.append(get_each_provision(title_level1, num_level1, text_level1))
    return provisions

print('----------------------------------------------')
provisions = get_consolidated_provisions(data)
for each_provision in provisions:
    print(each_provision)
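It works as expected. What I want is basically to get the lowest-level "text" (under the key "sub") from every dictionary and sub-dictionary. My question has two parts: (1) Is there a better way to do this? (2) My code will break if there is another level of lists of dictionaries. I could add handling for another level, but I would rather not.
In case you are wondering, the variable "data" above is JSON obtained by extracting a PDF file. The extraction succeeded, as the variable "data" shows. The idea is to identify each subsection together with the leading sequence of its "num" and "title" values.
Two things to note: (1) the lowest level has no 'title' key, and (2) dictionaries at the lowest level have no 'sub' key. Both can be seen in the variable data.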
Solution
You should use recursion to "flatten" this list of dictionaries.
def flatten(items):
    new_list = []
    for i in items:
        if "sub" in i:
            new_dict = {}
            for k, v in i.items():
                if k != "sub":
                    new_dict[k] = v
            new_list.append(new_dict)
            new_list += flatten(i["sub"])
        else:
            new_list.append(i)
    return new_list

# I've tested this with your data
flatten(data)
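Note that flatten returns every node (parents included) with its "sub" key stripped, so by itself it does not build the "~"-joined 'title' and 'num' values shown in the question. One possible way to get those is to carry the accumulated prefixes down the recursion and emit a dictionary only at the leaves. The following is a minimal sketch under that assumption, not the answerer's code: the names flatten_leaves, parent_titles, parent_nums and sample are made up for illustration, and sample is a trimmed copy of the question's data.

```python
def flatten_leaves(items, parent_titles=(), parent_nums=()):
    """Return one dict per leaf, with ancestor titles/nums joined by '~'."""
    leaves = []
    for item in items:
        # A missing 'title' becomes 'NA', mirroring level3.get('title', 'NA')
        # in the question's code (here applied at every level).
        titles = parent_titles + (item.get("title", "NA"),)
        nums = parent_nums + (item["num"],)
        if item.get("sub"):
            # Not a leaf: recurse, passing the prefixes built so far.
            leaves.extend(flatten_leaves(item["sub"], titles, nums))
        else:
            # Leaf (no 'sub' key, or an empty 'sub' list): emit one record.
            leaves.append({
                "title": "~".join(titles),
                "num": "~".join(nums),
                "text": item["text"],
            })
    return leaves


# Trimmed, hypothetical sample in the same shape as the question's data.
sample = [
    {
        "title": "Main Topic 1",
        "num": "Type 1",
        "text": "Some Text",
        "sub": [
            {
                "title": "Sub Topic 1",
                "num": "1",
                "text": "Some more Text",
                "sub": [
                    {"num": "(a)", "text": "This is the actual text for the topic 1(a)"},
                ],
            },
            {"title": "Sub Topic 2", "num": "2", "text": "This is the actual text for the topic 2"},
        ],
    },
]

for leaf in flatten_leaves(sample):
    print(leaf)
```

Because the prefixes travel as arguments, the function does not depend on how many levels of "sub" there are, which is the second concern raised in the question. One difference from the question's code worth knowing: an item whose "sub" list is empty is treated as a leaf here, whereas the original loops would emit nothing for it.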