Flattening multi-nested JSON in Pandas and exporting to CSV

Problem description

The original JSON file looks like this:

data = [
    {
        "masterName": "AAAAAAAAAAA",
        "mainNames": [
            {
                "numbers": [
                    {
                        "date": "2019-05-16T00:00:00Z",
                        "NumberOne": 402.0,
                        "NumberTwo": 7830.0
                    }
                ],
                "name": "randomca"
            },
            {
                "numbers": [
                    {
                        "date": "2019-05-16T00:00:00Z",
                        "NumberOne": 222.0,
                        "NumberTwo": 4015.31
                    },
                    {
                        "date": "2019-05-31T00:00:00Z",
                        "NumberOne": 192.0,
                        "NumberTwo": 3685.64
                    }
                ],
                "name": "randomka"
            },
            {
                "numbers": [],
                "name": "randomop"
            }
        ]
    },
    {
        "masterName": "BBBBB",
        "mainNames": [
            {
                "numbers": [],
                "name": "randomha"
            },
            {
                "numbers": [
                    {
                        "date": "2019-05-17T00:00:00Z",
                        "NumberOne": 31.0,
                        "NumberTwo": 1500.0
                    },
                    {
                        "date": "2019-05-31T00:00:00Z",
                        "NumberOne": 236.0,
                        "NumberTwo": 31819.96
                    }
                ],
                "name": "randomba"
            }
        ]
    }
]

With my code, the result is: (image not available)

My code is as follows:

import pandas as pd

test_data = {
    "main": []
}

for item in range(len(data)):
  test_data['main'].append(data[item])

df = pd.DataFrame(test_data)

df = pd.concat(
    [
        pd.concat([pd.Series(m) for m in t['mainNames']], axis=1) for t in test_data['main']
    ], keys=[t['masterName'] for t in test_data['main']]
)

df.index.levels[0].name = 'masterName'
df.columns.name = 'member'

df2 = df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()

df2.to_csv('stack.csv', sep=',', encoding='utf-8', index=False)

The expected result is: (image not available)

PS: I have left the member column out of the CSV output screenshot because I no longer need it.

Tags: python, pandas, export-to-csv

Solution


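A short way to aggregate the records in plain Python:

import pandas as pd

# One flat record per entry in "numbers"; `or [{}]` keeps names whose list is empty
records = ({'teamname': d['masterName'], 'name': name['name'], **num_dct}
           for d in data
           for name in d['mainNames'] for num_dct in name['numbers'] or [{}])

df = pd.DataFrame(records)
# Put the key columns first, followed by whatever columns remain
cols = ['teamname', 'name', 'date']
print(df[cols + df.columns[~df.columns.isin(cols)].tolist()])

Output: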

      teamname      name                  date  NumberOne  NumberTwo
0  AAAAAAAAAAA  randomca  2019-05-16T00:00:00Z      402.0    7830.00
1  AAAAAAAAAAA  randomka  2019-05-16T00:00:00Z      222.0    4015.31
2  AAAAAAAAAAA  randomka  2019-05-31T00:00:00Z      192.0    3685.64
3  AAAAAAAAAAA  randomop                   NaN        NaN        NaN
4        BBBBB  randomha                   NaN        NaN        NaN
5        BBBBB  randomba  2019-05-17T00:00:00Z       31.0    1500.00
6        BBBBB  randomba  2019-05-31T00:00:00Z      236.0   31819.96
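
The answer stops at printing the frame, while the question asks for a CSV file. The following is a minimal sketch of that last step, not part of the original answer. It assumes a shortened copy of the question's data (named `sample` here) and a hypothetical helper `flatten` that applies the same flattening rule as the generator expression above. It writes to an in-memory buffer so that it runs without touching the disk.

```python
import io

import pandas as pd

# Shortened copy of the question's data so the sketch runs on its own.
sample = [
    {
        "masterName": "AAAAAAAAAAA",
        "mainNames": [
            {
                "numbers": [
                    {
                        "date": "2019-05-16T00:00:00Z",
                        "NumberOne": 402.0,
                        "NumberTwo": 7830.0,
                    }
                ],
                "name": "randomca",
            },
            {"numbers": [], "name": "randomop"},
        ],
    },
]


def flatten(items):
    """Yield one flat dict per entry in "numbers".

    A name whose "numbers" list is empty still yields one row
    that carries only the team name and the name.
    """
    for d in items:
        for name in d["mainNames"]:
            for num_dct in name["numbers"] or [{}]:
                yield {"teamname": d["masterName"], "name": name["name"], **num_dct}


flat = pd.DataFrame(list(flatten(sample)))

# index=False keeps the row index out of the CSV output.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a file instead, pass a path to `to_csv`, for example the `'stack.csv'` used in the question. Missing values are written as empty fields by default.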
