Python 3: deep-merging nested dictionaries by value

Problem description

I have two kinds of data structure:

data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': [{'name':class_3_name, 'type':'directory', 'children': []}]}]}

data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': []}]}

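My problem arises when I merge multiple versions of these dictionaries in a loop. Because the children are always different, all of my attempts return a dictionary that is merged at only one level. For example: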

{
"name": "class_1_1",
"type": "directory",
"children": [
    {
        "name": "class_2_1",
        "type": "directory",
        "children": []
    },
    {
        "name": "class_2_2",
        "type": "directory",
        "children": [
            {
                "name": "class_3_1",
                "type": "directory",
                "children": []
            }
        ]
    },
    {
        "name": "class_2_2",
        "type": "directory",
        "children": [
            {
                "name": "class_3_2",
                "type": "directory",
                "children": []
            }
        ]
    }
]
}

The result should be:

{
"name": "class_1_1",
"type": "directory",
"children": [
    {
        "name": "class_2_1",
        "type": "directory",
        "children": []
    },
    {
        "name": "class_2_2",
        "type": "directory",
        "children": [
            {
                "name": "class_3_1",
                "type": "directory",
                "children": []
            },
            {
                "name": "class_3_2",
                "type": "directory",
                "children": []
            }
        ]
    }
]
}

I am currently using avian2's jsonmerge (https://github.com/avian2/jsonmerge), because I really do not know where to start with deep-merging two dictionaries by value.

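Every time I try to solve this I run into logic errors, and I really do not know how to approach it. Any help or a pointer in the right direction would be much appreciated.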

Cheers.

Edit: code

import os
import io
import json
import bs4 as bs
from jsonmerge import Merger

list = [ '' ]
g_dict = {}

def getJsonInfo( eggs ):
    if (eggs == 3):
        data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': [{'name':class_3_name, 'type':'directory', 'children': []}]}]}
    else:
        data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': []}]}

    schema = {
        "properties": {
            "children": {
                "type": "array",
                "mergeStrategy": "append"
            }
        }
    }

    global g_dict
    merger = Merger(schema)
    g_dict = merger.merge(data, g_dict)

with open('catalogue.html') as html_file:
    tree = bs.BeautifulSoup( html_file,'lxml' )

for class_1 in tree.find_all('div',class_="class_1"):
    class_1_name = class_1['name']
    for class_2 in class_1.find_all('div',class_="class_2"):
        class_2_name = class_2['name']
        class_3 = class_2.find_all('div',class_="class_3")
        if len(class_3) != 0:
            for class_3 in class_2.find_all('div',class_="class_3"):
                class_3_name = class_3['name']
                print(class_1['name'] + ' -> ' + class_2['name'] + ' -> ' + class_3['name'])
                getJsonInfo(3)
        else:
            print(class_1['name'] + ' -> ' + class_2['name'] )
            getJsonInfo(2)

print('Creating JSON Tree')

with io.open('database.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(g_dict, ensure_ascii=False, indent=4))

print('Done!')

catalogue.html:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="ja">
    <body>
        <div class="class_1" name="A">
            <div class="class_2" name="A2">
                <div class="class_3" name="a31"></div>
                <div class="class_3" name="a32"></div>
            </div>
        </div>
        <div class="class_1" name="B">
            <div class="class_2" name="b1"></div>
        </div>
    </body>
</html>

Tags: python, python-3.x

Solution


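You can use a dict, seen, to keep track of the first child dict for each distinct name, keep extending its children with the children of any other child dict that has the same name, and recursively traverse the children's children: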

def deep_merge(d):
    # Map each distinct child name to the first child dict seen with that name.
    seen = {}
    for c in d['children']:
        if c['name'] in seen:
            # Same name as an earlier sibling: fold its children into that one.
            seen[c['name']]['children'] += c['children']
        else:
            seen[c['name']] = c
    # Keep one child per name, then merge recursively one level down.
    d['children'] = list(seen.values())
    for c in d['children']:
        deep_merge(c)

deep_merge(d)  # d is the one-level dict shown in the question

d becomes:

{'children': [{'children': [],
               'name': 'class_2_1',
               'type': 'directory'},
              {'children': [{'children': [],
                             'name': 'class_3_1',
                             'type': 'directory'},
                            {'children': [],
                             'name': 'class_3_2',
                             'type': 'directory'}],
               'name': 'class_2_2',
               'type': 'directory'}],
 'name': 'class_1_1',
 'type': 'directory'}
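
If you would rather not create the duplicates in the first place, one alternative is to merge each new record into the accumulated tree by name as it arrives, instead of appending and cleaning up afterwards. The following is only a minimal sketch and is not part of jsonmerge: merge_tree and node are hypothetical helper names, and the code assumes that every node is a dict with 'name' and 'children' keys, that both arguments describe the same node, and that sibling names are unique within each incoming record.

```python
import copy


def merge_tree(base, head):
    """Merge head into base in place, matching children by their 'name'.

    Hypothetical helper: assumes both dicts describe the same node
    and that every node carries a 'children' list.
    """
    # Index the existing children of base by name for quick lookup.
    by_name = {child['name']: child for child in base['children']}
    for child in head['children']:
        if child['name'] in by_name:
            # Same name already present: merge one level down.
            merge_tree(by_name[child['name']], child)
        else:
            # New name: append a copy so head is never shared or mutated.
            new_child = copy.deepcopy(child)
            base['children'].append(new_child)
            by_name[new_child['name']] = new_child
    return base


def node(name, children=()):
    # Small constructor for the node shape used in the question.
    return {'name': name, 'type': 'directory', 'children': list(children)}


tree = node('class_1_1')
records = [
    node('class_1_1', [node('class_2_1')]),
    node('class_1_1', [node('class_2_2', [node('class_3_1')])]),
    node('class_1_1', [node('class_2_2', [node('class_3_2')])]),
]
for record in records:
    merge_tree(tree, record)

names = [(c['name'], [g['name'] for g in c['children']]) for c in tree['children']]
print(names)
# [('class_2_1', []), ('class_2_2', ['class_3_1', 'class_3_2'])]
```

In the question's script a call like this would take the place of merger.merge(...). Because that script produces several top-level class_1 names, it would need either a single root node to merge into or one tree per class_1 name.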
