首页 > 解决方案 > 如何更快地将嵌套字典转换为 pd.dataframe?

问题描述

我有一个看起来像这样的 json 文件

{
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [{
                "user": "John",
                "comment": "It is defined as",
                "value": [
                          {
                            "my_value": 5,
                            "value_2": 10
                          },
                          {
                            "my_value": 24,
                            "value_2": 30
                          }
                          ]
            },
            {
                "user": "Sam",
                "comment": "as John said above it simply means",
                "value": [
                          {
                            "my_value": 9,
                            "value_2": 10
                          },
                          {
                            "my_value": 54,
                            "value_2": 19
                          }
                          ]
            }
        ],
        "closed": "no"
    }]
}

期望的结果:

Question_no      question  my_value_sum      value_2_sum       user      comment
Q.1             what is ?      29                40            john    It is defined as
Q.1             what is ?      63                29            Sam     as John said above it simply means

我尝试过的是data = json_normalize(file_json, "main")然后使用for循环

for ans, row in data.iterrows():
    ....
    ....
    df = df.append(the data)

但是使用它的问题是我的客户会花费很多时间拒绝该解决方案。列表中有大约 1200 个项目,main并且有 450 个这样的 json 文件要转换。所以这个转换的中间过程需要将近一个小时才能完成。

编辑:是否可以将和的总和my_value作为value_2一列?(也更新了所需的结果)

标签: pythonjsonpandasdictionaryitertools

解决方案


main通过参数record_path和选择字典meta

data = pd.json_normalize(file_json["main"], 
                         record_path='answer', 
                         meta=['question_no', 'question'])
print (data)
   user                             comment question_no   question
0  John                    It is defined as         Q.1  what is ?
1   Sam  as John said above it simply means         Q.1  what is ?

然后,如果顺序很重要,则将最后 N 列转换为第一个位置:

N = 2
data = data[data.columns[-N:].tolist() + data.columns[:-N].tolist()]
print (data)
  question_no   question  user                             comment
0         Q.1  what is ?  John                    It is defined as
1         Q.1  what is ?   Sam  as John said above it simply means

推荐阅读