python - Flattening multi-nested JSON in Pandas and exporting it to CSV
Problem description
The original JSON file looks like this:
data = [
    {
        "masterName": "AAAAAAAAAAA",
        "mainNames": [
            {
                "numbers": [
                    {
                        "date": "2019-05-16T00:00:00Z",
                        "NumberOne": 402.0,
                        "NumberTwo": 7830.0
                    }
                ],
                "name": "randomca"
            },
            {
                "numbers": [
                    {
                        "date": "2019-05-16T00:00:00Z",
                        "NumberOne": 222.0,
                        "NumberTwo": 4015.31
                    },
                    {
                        "date": "2019-05-31T00:00:00Z",
                        "NumberOne": 192.0,
                        "NumberTwo": 3685.64
                    }
                ],
                "name": "randomka"
            },
            {
                "numbers": [],
                "name": "randomop"
            }
        ]
    },
    {
        "masterName": "BBBBB",
        "mainNames": [
            {
                "numbers": [],
                "name": "randomha"
            },
            {
                "numbers": [
                    {
                        "date": "2019-05-17T00:00:00Z",
                        "NumberOne": 31.0,
                        "NumberTwo": 1500.0
                    },
                    {
                        "date": "2019-05-31T00:00:00Z",
                        "NumberOne": 236.0,
                        "NumberTwo": 31819.96
                    }
                ],
                "name": "randomba"
            }
        ]
    }
]
My code is as follows:
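import pandas as pd

test_data = {
    "main": []
}
# Copy every top-level record into test_data['main']
for item in range(len(data)):
    test_data['main'].append(data[item])
df = pd.DataFrame(test_data)
# One column per "mainNames" entry, one outer index key per masterName
df = pd.concat(
    [
        pd.concat([pd.Series(m) for m in t['mainNames']], axis=1) for t in test_data['main']
    ], keys=[t['masterName'] for t in test_data['main']]
)
# Name the outer index level; recent pandas versions reject
# assigning to df.index.levels[0].name directly
df.index = df.index.set_names('masterName', level=0)
df.columns.name = 'member'
df2 = df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
df2.to_csv('stack.csv', sep=',', encoding='utf-8', index=False)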
PS: I left the member column out of the CSV output screenshot because I no longer need it.
Solution
A short approach that does the aggregation in plain Python:
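import pandas as pd

# One record per entry in "numbers"; `or [{}]` keeps names whose
# "numbers" list is empty, so they show up as a row of NaN values
records = ({'teamname': d['masterName'], 'name': name['name'], **num_dct}
           for d in data
           for name in d['mainNames'] for num_dct in name['numbers'] or [{}])
df = pd.DataFrame(records)
# Put the identifying columns first, then whatever else is present
cols = ['teamname', 'name', 'date']
print(df[cols + df.columns[~df.columns.isin(cols)].tolist()])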
Output:
      teamname      name                  date  NumberOne  NumberTwo
0  AAAAAAAAAAA  randomca  2019-05-16T00:00:00Z      402.0    7830.00
1  AAAAAAAAAAA  randomka  2019-05-16T00:00:00Z      222.0    4015.31
2  AAAAAAAAAAA  randomka  2019-05-31T00:00:00Z      192.0    3685.64
3  AAAAAAAAAAA  randomop                   NaN        NaN        NaN
4        BBBBB  randomha                   NaN        NaN        NaN
5        BBBBB  randomba  2019-05-17T00:00:00Z       31.0    1500.00
6        BBBBB  randomba  2019-05-31T00:00:00Z      236.0   31819.96
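
The answer stops at printing the frame, while the question asks for a CSV file. A minimal sketch of that last step follows, assuming the same record-building approach as above. The helper name flatten and the trimmed-down sample are made up for illustration, and the column reordering assumes at least one "numbers" entry exists, since otherwise there is no date column to select.

```python
import io

import pandas as pd

# Trimmed-down sample with the same shape as the data in the question.
data = [
    {
        "masterName": "AAAAAAAAAAA",
        "mainNames": [
            {
                "numbers": [
                    {"date": "2019-05-16T00:00:00Z", "NumberOne": 402.0, "NumberTwo": 7830.0}
                ],
                "name": "randomca",
            },
            {"numbers": [], "name": "randomop"},
        ],
    },
]


def flatten(data):
    """Hypothetical helper: one row per "numbers" entry, NaN row if the list is empty."""
    records = (
        {"teamname": d["masterName"], "name": m["name"], **num}
        for d in data
        for m in d["mainNames"]
        for num in m["numbers"] or [{}]
    )
    df = pd.DataFrame(records)
    lead = ["teamname", "name", "date"]
    # Identifying columns first, remaining columns in their original order
    return df[lead + [c for c in df.columns if c not in lead]]


flat = flatten(data)

# Write to an in-memory buffer so the sketch leaves no file behind
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of a buffer, passing a path should work the same way as in the question, for example flat.to_csv('stack.csv', sep=',', encoding='utf-8', index=False). Missing values are written as empty fields by default.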