Python: Converting multiple columns of a CSV file to nested JSON

Problem description

This is my multi-column input CSV file. I want to convert it into a JSON file that contains department, departmentID, and a nested field named customer that holds first and last.

department, departmentID, first, last
fans, 1, Caroline, Smith
fans, 1, Jenny, White
students, 2, Ben, CJ
students, 2, Joan, Carpenter
...

The JSON output I need:

[
  {
    "department": "fans",
    "departmentID": "1",
    "customer": [
      {
        "first": "Caroline",
        "last": "Smith"
      },
      {
        "first": "Jenny",
        "last": "White"
      }
    ]
  },
  {
    "department": "students",
    "departmentID": "2",
    "customer": [
      {
        "first": "Ben",
        "last": "CJ"
      },
      {
        "first": "Joan",
        "last": "Carpenter"
      }
    ]
  }
]

My code:

from csv import DictReader
from itertools import groupby
with open('data.csv') as csvfile:
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]

    groups = []
    uniquekeys = []

    for k, g in groupby(data, lambda r: (r['group'], r['groupID'])):
        groups.append({
            "group": k[0],
            "groupID": k[1],
            "user": [{k:v for k, v in d.items() if k != 'group'} for d in list(g)]
        })
        uniquekeys.append(k)

pprint(groups)

My problem is that groupID appears twice in the data, both inside and outside the nested JSON. What I want is for group and groupID to act only as the groupby keys.

Tags: python, json, csv, nested, multiple-columns

Solution


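The problem is that you mixed up the key names, so the line "user": [{k:v for k, v in d.items() if k != 'group'} for d in list(g)] does not remove the grouping keys from your dict. The filter only tests for 'group', so any key with a different name (groupID, or department and departmentID in the sample CSV) passes through and stays in the nested records.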
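To see the effect in isolation, here is a minimal sketch using one hand-written row (the dict literal is an assumption standing in for what DictReader would yield for the sample CSV). A filter that names a missing key removes nothing, a filter that names one grouping key leaves the other behind, and only a filter that names both leaves just the nested fields.

```python
# One row as DictReader would produce it for the sample CSV (hand-written here).
row = {"department": "fans", "departmentID": "1", "first": "Caroline", "last": "Smith"}

# Filtering on a key that is not in the row removes nothing.
unchanged = {k: v for k, v in row.items() if k != "group"}

# Filtering on only one of the grouping keys leaves the other one behind.
partial = {k: v for k, v in row.items() if k != "department"}

# Excluding both grouping keys leaves only the nested fields.
group_keys = ("department", "departmentID")
nested = {k: v for k, v in row.items() if k not in group_keys}

print(unchanged)
print(partial)
print(nested)
```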

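I don't fully understand which keys you want, so the example below assumes data.csv looks exactly as in your question, with the columns department and departmentID, while the script renames them to group and groupID in the output.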

from csv import DictReader
from itertools import groupby
from pprint import pprint

with open('data.csv') as csvfile:
    # skipinitialspace=True drops the blank that follows each comma
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]

groups = []
uniquekeys = []

# groupby only merges adjacent rows, so data.csv must already be ordered by department
for k, g in groupby(data, lambda r: (r['department'], r['departmentID'])):
    groups.append({
        "group": k[0],
        "groupID": k[1],
        # drop both grouping columns from every nested record
        "user": [{name: v for name, v in d.items() if name not in ('department', 'departmentID')} for d in g]
    })
    uniquekeys.append(k)

pprint(groups)

Output:

[{'group': 'fans',
  'groupID': '1',
  'user': [{'first': 'Caroline', 'last': 'Smith'},
           {'first': 'Jenny', 'last': 'White'}]},
 {'group': 'students',
  'groupID': '2',
  'user': [{'first': 'Ben', 'last': 'CJ'},
           {'first': 'Joan', 'last': 'Carpenter'}]}]
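
The question asks for a JSON file, while pprint shows Python's own representation with single quotes. As a small sketch, the standard-library json module can serialize the same structure. The groups literal below is copied from the output above, and the file name output.json in the comment is only an example.

```python
import json

# The grouped result, copied from the output above.
groups = [
    {"group": "fans", "groupID": "1",
     "user": [{"first": "Caroline", "last": "Smith"},
              {"first": "Jenny", "last": "White"}]},
    {"group": "students", "groupID": "2",
     "user": [{"first": "Ben", "last": "CJ"},
              {"first": "Joan", "last": "Carpenter"}]},
]

# json.dumps emits valid JSON (double quotes); indent=2 keeps it readable.
text = json.dumps(groups, indent=2)
print(text)

# To write a file instead (the name output.json is only an example):
# with open("output.json", "w") as f:
#     json.dump(groups, f, indent=2)
```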

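I used different keys for the input and the output so that it is obvious which line does what, and so that the script is easy to adapt to other input or output key names.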
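
One way to make that customization explicit is to pass the key names in as parameters. The sketch below is not part of the original answer: nest_rows, key_map and nested_name are hypothetical names, and the rows are hand-written stand-ins for the DictReader output. It also sorts the rows first, because itertools.groupby only merges consecutive rows with equal keys.

```python
from itertools import groupby


def nest_rows(rows, key_map, nested_name):
    """Group rows by the input columns in key_map and nest the other columns.

    key_map maps input column names to output key names,
    e.g. {"department": "group", "departmentID": "groupID"}.
    """
    in_keys = list(key_map)

    def key_of(row):
        return tuple(row[k] for k in in_keys)

    result = []
    # groupby only merges adjacent rows, so sort by the same key first.
    for key, members in groupby(sorted(rows, key=key_of), key=key_of):
        entry = {key_map[k]: value for k, value in zip(in_keys, key)}
        entry[nested_name] = [
            {k: v for k, v in row.items() if k not in key_map} for row in members
        ]
        result.append(entry)
    return result


# Deliberately unordered rows to show that sorting makes grouping reliable.
rows = [
    {"department": "students", "departmentID": "2", "first": "Ben", "last": "CJ"},
    {"department": "fans", "departmentID": "1", "first": "Caroline", "last": "Smith"},
    {"department": "students", "departmentID": "2", "first": "Joan", "last": "Carpenter"},
    {"department": "fans", "departmentID": "1", "first": "Jenny", "last": "White"},
]
same_names = {"department": "department", "departmentID": "departmentID"}
nested = nest_rows(rows, same_names, "customer")
print(nested)
```

Passing {"department": "group", "departmentID": "groupID"} and "user" instead would produce the key names used in the answer's script.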

