Convert a JSON file with nested dictionaries in one column into a Pandas DataFrame

Question

I have a JSON file with the following structure:

[{
    "field1": "first",
    "field2": "d",
    "id": 35,
    "features": [
        {"feature_id": 2, "value": 6},
        {"feature_id": 3, "value": 8.5},
        {"feature_id": 5, "value": 6.7},
        {"feature_id": 10, "value": 3.4}
    ],
    "time": "2018-11-17"
},
{
    "field1": "second",
    "field2": "b",
    "id": 36,
    "features": [
        {"feature_id": 3, "value": 5.4},
        {"feature_id": 10, "value": 9.5}
    ],
    "time": "2018-11-17"
}]

I can load it into a Pandas DataFrame like this:

import json
import pandas as pd

# `file` is the path to the JSON file above
with open(file) as json_data:
    data = json.load(json_data)

df = pd.DataFrame(data)
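Loading the data this way leaves each cell of the features column holding a raw Python list of dicts rather than separate columns. A minimal check, using an inline copy of the sample data in place of reading from file:

```python
import json

import pandas as pd

# Inline copy of the sample JSON from the question (stands in for open(file)).
raw = """
[{"field1": "first", "field2": "d", "id": 35,
  "features": [{"feature_id": 2, "value": 6}, {"feature_id": 3, "value": 8.5},
               {"feature_id": 5, "value": 6.7}, {"feature_id": 10, "value": 3.4}],
  "time": "2018-11-17"},
 {"field1": "second", "field2": "b", "id": 36,
  "features": [{"feature_id": 3, "value": 5.4}, {"feature_id": 10, "value": 9.5}],
  "time": "2018-11-17"}]
"""

data = json.loads(raw)
df = pd.DataFrame(data)

# Each cell of "features" is still a list of dicts, not flat columns.
print(type(df.loc[0, "features"]))
print(df.loc[1, "features"])
```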

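But one column contains nested dictionaries inside a list: each cell of the features column is a list of dicts. I want to flatten all of the data so that the final table looks like this. Any help is appreciated.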

[image: final_dataframe]

Tags: python, json, pandas

Answer


To flatten a JSON object with nested keys into a single-level dict, use the following function:

def flatten_json(nested_json):
    """
    Flatten a JSON object with nested keys into a single level.

    Args:
        nested_json: A nested JSON object (dicts and lists).
    Returns:
        A flat dict whose keys join the nested keys and list indices with '_'.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # Recurse into each value, extending the key prefix
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # Use the list position as part of the key
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated key
            out[name[:-1]] = x

    flatten(nested_json)
    return out
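A sketch of how the function might be applied to the question's data: flatten each record and pass the resulting list of dicts to pd.DataFrame. The function is repeated here so the snippet runs on its own, and the sample data is inlined as a Python list (an assumption standing in for json.load).

```python
import pandas as pd


def flatten_json(nested_json):
    """Flatten nested dicts/lists into one level (same logic as the function above)."""
    out = {}

    def flatten(x, name=""):
        if isinstance(x, dict):
            for key, val in x.items():
                flatten(val, f"{name}{key}_")
        elif isinstance(x, list):
            for i, val in enumerate(x):
                flatten(val, f"{name}{i}_")
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Sample records from the question, inlined for a self-contained example.
data = [
    {"field1": "first", "field2": "d", "id": 35,
     "features": [{"feature_id": 2, "value": 6}, {"feature_id": 3, "value": 8.5},
                  {"feature_id": 5, "value": 6.7}, {"feature_id": 10, "value": 3.4}],
     "time": "2018-11-17"},
    {"field1": "second", "field2": "b", "id": 36,
     "features": [{"feature_id": 3, "value": 5.4}, {"feature_id": 10, "value": 9.5}],
     "time": "2018-11-17"},
]

# One flattened dict per record; keys missing from a record become NaN.
df = pd.DataFrame([flatten_json(record) for record in data])
print(df[["id", "features_0_feature_id", "features_0_value"]])
```

Note that the column names come from list positions, not from feature_id: feature 3 lands in features_1_* for the first record but in features_0_* for the second, so the same feature is not aligned in one column.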

Hope this function helps.
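If the goal is one row per record with one column per feature_id (an assumption, since the image of the desired table is not available), a different technique than the recursive flattener may fit better: pd.json_normalize (pandas 1.0+) with record_path and meta produces one row per feature, and pivot_table then spreads feature_id into columns. The "feature_" column prefix is a hypothetical naming choice.

```python
import pandas as pd

# Sample records from the question, inlined for a self-contained example.
data = [
    {"field1": "first", "field2": "d", "id": 35,
     "features": [{"feature_id": 2, "value": 6}, {"feature_id": 3, "value": 8.5},
                  {"feature_id": 5, "value": 6.7}, {"feature_id": 10, "value": 3.4}],
     "time": "2018-11-17"},
    {"field1": "second", "field2": "b", "id": 36,
     "features": [{"feature_id": 3, "value": 5.4}, {"feature_id": 10, "value": 9.5}],
     "time": "2018-11-17"},
]

meta_cols = ["field1", "field2", "id", "time"]

# Long format: one row per (record, feature), with the metadata repeated.
long_df = pd.json_normalize(data, record_path="features", meta=meta_cols)

# Wide format: one row per record, one column per feature_id.
# Features absent from a record become NaN.
wide_df = (
    long_df.pivot_table(index=meta_cols, columns="feature_id", values="value")
    .add_prefix("feature_")  # hypothetical naming: feature_2, feature_3, ...
    .reset_index()
)
wide_df.columns.name = None
print(wide_df)
```

pivot_table aggregates duplicates (mean by default), which is harmless here because each feature_id appears at most once per record; rows whose metadata contains NaN would be dropped by the grouping.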

