Pandas - Export groupby on 3 columns to JSON

Problem description

I have a DataFrame carsML with data about cars:

+-------+-------------+--------------------+
| Manuf |    Model    |        Type        |
+-------+-------------+--------------------+
| VW    | VWModel 1   | VWModel 1 Type 1   |
| VW    | VWModel 2   | VWModel 2 Type 1   |
| VW    | VWModel 2   | VWModel 2 Type 2   |
| Opel  | OpelModel 1 | OpelModel 1 Type 1 |
| Opel  | OpelModel 2 | OpelModel 2 Type 1 |
| Opel  | OpelModel 2 | OpelModel 2 Type 2 |
+-------+-------------+--------------------+
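To make the examples reproducible, the table above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the column names are exactly Manuf, Model and Type as in the header row, with no trailing spaces.

```python
import pandas as pd

# Sample data rebuilt from the question's table (column names assumed
# to be exactly "Manuf", "Model" and "Type").
carsML = pd.DataFrame({
    "Manuf": ["VW", "VW", "VW", "Opel", "Opel", "Opel"],
    "Model": ["VWModel 1", "VWModel 2", "VWModel 2",
              "OpelModel 1", "OpelModel 2", "OpelModel 2"],
    "Type": ["VWModel 1 Type 1", "VWModel 2 Type 1", "VWModel 2 Type 2",
             "OpelModel 1 Type 1", "OpelModel 2 Type 1", "OpelModel 2 Type 2"],
})

print(carsML)
```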

I need to export the unique values to JSON. I know how to get two levels:

j = carsML.groupby('Manuf')['Model'].unique().to_json()

This gives me nice JSON mapping Manufacturers to Models, but I don't know how to extend it to the third level (Types).

The final JSON should look like this:

{"Opel":
  {"OpelModel 1": ["OpelModel 1 Type 1"],
   "OpelModel 2": ["OpelModel 2 Type 1", "OpelModel 2 Type 2"]},
 "VW":
  {"VWModel 1": ["VWModel 1 Type 1"],
   "VWModel 2": ["VWModel 2 Type 1", "VWModel 2 Type 2"]}}

Tags: python, pandas

Solution


First, create a Series with a MultiIndex by grouping on two columns, then build the nested dictionary with a dict comprehension:

s = carsML.groupby(['Manuf','Model'])['Type'].unique().apply(list)
d = {l: s.xs(l).to_dict() for l in s.index.levels[0]}
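A self-contained run of the two lines above, with the question's table rebuilt as sample data, shows what the intermediate Series s and the nested dict d contain. The DataFrame construction is an assumption for illustration; the groupby and the dict comprehension are the ones from this answer.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption for illustration).
carsML = pd.DataFrame({
    "Manuf": ["VW", "VW", "VW", "Opel", "Opel", "Opel"],
    "Model": ["VWModel 1", "VWModel 2", "VWModel 2",
              "OpelModel 1", "OpelModel 2", "OpelModel 2"],
    "Type": ["VWModel 1 Type 1", "VWModel 2 Type 1", "VWModel 2 Type 2",
             "OpelModel 1 Type 1", "OpelModel 2 Type 1", "OpelModel 2 Type 2"],
})

# Group on the first two columns; each group's unique Types become a list.
# The result is a Series indexed by a (Manuf, Model) MultiIndex.
s = carsML.groupby(['Manuf', 'Model'])['Type'].unique().apply(list)
print(s)

# For each manufacturer, xs() selects its cross-section (a Series indexed
# by Model, with the Manuf level dropped), which to_dict() turns into
# an inner dict {Model: [Types]}.
d = {l: s.xs(l).to_dict() for l in s.index.levels[0]}
print(d)
```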

To get JSON from the nested dictionary, use json.dumps:

import json
j = json.dumps({l: s.xs(l).to_dict() for l in s.index.levels[0]})

print (j)
{"Opel": {"OpelModel 1": ["OpelModel 1 Type 1"], 
          "OpelModel 2": ["OpelModel 2 Type 1", "OpelModel 2 Type 2"]},
 "VW": {"VWModel 1": ["VWModel 1 Type 1"], 
        "VWModel 2": ["VWModel 2 Type 1", "VWModel 2 Type 2"]}}
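An alternative sketch, a different technique from the xs-based answer, nests two groupby calls in a single dict comprehension: the outer groupby splits the frame by manufacturer and the inner one collects each model's unique types. Passing indent=2 to json.dumps produces multi-line output similar to the formatted result above. The sample DataFrame is again rebuilt from the question's table as an assumption.

```python
import json
import pandas as pd

# Sample data rebuilt from the question's table (an assumption for illustration).
carsML = pd.DataFrame({
    "Manuf": ["VW", "VW", "VW", "Opel", "Opel", "Opel"],
    "Model": ["VWModel 1", "VWModel 2", "VWModel 2",
              "OpelModel 1", "OpelModel 2", "OpelModel 2"],
    "Type": ["VWModel 1 Type 1", "VWModel 2 Type 1", "VWModel 2 Type 2",
             "OpelModel 1 Type 1", "OpelModel 2 Type 1", "OpelModel 2 Type 2"],
})

# Outer groupby yields (manufacturer, sub-DataFrame) pairs; the inner
# groupby maps each model to the list of its unique types.
nested = {
    manuf: grp.groupby('Model')['Type'].unique().apply(list).to_dict()
    for manuf, grp in carsML.groupby('Manuf')
}

# indent=2 pretty-prints the JSON over several lines.
j = json.dumps(nested, indent=2)
print(j)
```

In this sketch groupby sorts the keys by default, so Opel comes before VW; passing sort=False to both groupby calls should keep the order of first appearance instead.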
