Recursively flatten a complex list of Python dicts

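Problem description

I want to create a CSV file from a very complex dictionary. The real dict has thousands of keys and is more than 9 levels deep, but here is a small example of the structure: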

import pandas
my_stuff = [
    {
        "a":
            [
                {"1": "example1"},
                {"2": [
                    {"2": "example2"},
                    {"3": "example3"}
                ]},
                {"4": "example4"},
                {"5": "example5"}
            ],
        "b":
            [
                "example6", "61", "62"
            ]
        }
]
result = pandas.json_normalize(my_stuff)
print(result.to_csv())

This prints:

,a,b
0,"[{'1': 'example1'}, {'2': [{'2': 'example2'}, {'3': 'example3'}]}, {'4': 'example4'}, {'5': 'example5'}]","['example6', '61', '62']"

But I want this output:

"0.a.0.1, 0.a.0.2.2, 0.a.0.2.3, 0.a.0.4, 0.a.0.5, 0b.0"
"example1, example2, example3, example4, example5, example6;61;62"

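I thought pandas could flatten the dict, but apparently it can't. I need the keys to become the headers, in the form sectiona.subsection1.fieldwhatever, because the .csv will later be loaded into a database.

I hope someone can help.

Bonus: I tried to do it without pandas but got stuck here: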

def flatten(py_structure, depth=""):
    """make a flatten dict"""
    new_dict = {}
    if isinstance(py_structure, dict):
        for k, v in py_structure.items():
            if isinstance(v, dict):
                flattened_v = flatten(v, k)
            elif isinstance(v, list):
                flattened_v = flatten(v, k)
            else:
                flattened_v = v
            new_dict[f"{depth}{k}"] = flattened_v
        return new_dict
    elif isinstance(py_structure, list):
        for idx, v in enumerate(py_structure):
            new_dict[f"{depth}{idx}"] = flatten(v, f"{depth}{idx}")
        return new_dict

Tags: python, json, pandas

Solution


You can achieve this with a depth-first traversal of a custom tree container:

import pprint


class Container:
    def __init__(self, data):
        self.is_leaf = False
        if type(data) is list:
            self.data = [Container(x) for x in data]
        elif type(data) is dict:
            self.data = {k: Container(v) for k, v in data.items()}
        else:
            self.is_leaf = True
            self.data = data

    def walk(self, callback):
        self._walk(self, callback=callback, path=[])

    def _walk(self, container, callback=None, path=None):
        # A leaf, or a list containing only leaves, is reported as one entry.
        if type(container.data) is not dict \
           and all(x.is_leaf for _, x in container.items()):
            callback(".".join(path), [x.data for _, x in container.items()])
        else:
            # Otherwise descend depth-first, extending the path with each key or index.
            for k, c in container.items():
                self._walk(c, callback=callback, path=path+[str(k)])

    def items(self):
        if type(self.data) is list:
            yield from enumerate(self.data)
        elif type(self.data) is dict:
            yield from self.data.items()
        else:
            yield None, self

    def flatten(self):
        result = {}

        def callback(key, value):
            result[key] = value

        self.walk(callback)
        return result


data = [
    {
        "a":
            [
                {"1": "example1"},
                {"2": [
                    {"2": "example2"},
                    {"3": "example3"}
                ]},
                {"4": "example4"},
                {"5": "example5"}
            ],
        "b":
            [
                "example6", "61", "62"
            ]
        }
]

c = Container(data)
pprint.pprint(c.flatten())

This will output:

{'0.a.0.1': ['example1'],
 '0.a.1.2.0.2': ['example2'],
 '0.a.1.2.1.3': ['example3'],
 '0.a.2.4': ['example4'],
 '0.a.3.5': ['example5'],
 '0.b': ['example6', '61', '62']}
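
The flattened result still has to be written out as CSV. As a rough sketch that is not part of the original answer, the same depth-first idea can also be written as a plain recursive generator and combined with the standard-library csv module. The names flatten and to_csv are hypothetical, and joining a list of scalars with ";" is an assumption taken from the desired output in the question. The keys keep every list index, as in the output above, so they differ slightly from the headers the question asked for.

```python
import csv
import io


def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for the leaves of nested dicts/lists."""
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        # Assumption: a non-empty list of scalars becomes one ';'-joined cell.
        if node and not any(isinstance(x, (dict, list)) for x in node):
            yield prefix, ";".join(str(x) for x in node)
            return
        # Lists holding dicts or lists are descended by index.
        # Note that an empty list yields nothing and is dropped.
        children = enumerate(node)
    else:
        # Scalar leaf: report it under the path built so far.
        yield prefix, node
        return
    for key, child in children:
        path = f"{prefix}.{key}" if prefix else str(key)
        yield from flatten(child, path)


def to_csv(record):
    """Return a two-line CSV string: dotted paths as header, values as row."""
    flat = dict(flatten(record))
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(flat.keys())
    writer.writerow(flat.values())
    return buffer.getvalue()


data = [
    {
        "a": [
            {"1": "example1"},
            {"2": [{"2": "example2"}, {"3": "example3"}]},
            {"4": "example4"},
            {"5": "example5"},
        ],
        "b": ["example6", "61", "62"],
    }
]

print(to_csv(data), end="")
```

For the sample data this should print one header line with the six dotted paths and one value line ending in example6;61;62. With thousands of keys it may be worth sorting or fixing the column order before loading the file into a database, since the order here simply follows the traversal.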
