Normalizing a column with deeply nested data in a pandas DataFrame

Problem description

My question concerns data extracted from a JSON file that has a structure of nested arrays, as shown below:

data = { "data": [
{
  "insights": {
    "data": [
      {
        "account_id": "10",
        "actions": [
          {
            "action_type": "link",
            "value": "3"
          },
          {
            "action_type": "post",
            "value": "3"
          }
        ],
        "clicks": "3"
      },
      {
        "account_id": "10",
        "actions": [
          {
            "action_type": "save",
            "value": "3"
          }
        ],
        "clicks": "123"
      },
      {
        "account_id": "10",
        "actions": [
          {
            "action_type": "save",
            "value": "1"
          },
          {
            "action_type": "link",
            "value": "11"
          },
          {
            "action_type": "view",
            "value": "10"
          }
        ],
        "clicks": "19"
      },
      {
        "account_id": "10",
        "clicks": "0"
      }
    ],
    "paging": {
      "cursors": {
        "before": "ON",
        "after": "OFF"
      }
    }
  },
  "id": "1"
}]}

My goal is to convert it into a readable table in Python, by way of a CSV file. The output should take the following form:

    account_id  action_type  value  clicks  id  before  after
    10          link         3      3       1   ON      OFF
    10          post         3      3       1   ON      OFF
    10          save         3      123     1   ON      OFF
    10          save         1      19      1   ON      OFF
    10          link         11     19      1   ON      OFF
    10          view         10     19      1   ON      OFF
    10          Null         Null   0       1   ON      OFF

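I tried to work out a solution from the answers given to the question "Converting a JSON with a nested array to CSV".

I also tried json_normalize, but I am still stuck because of the multiple levels of nested arrays. I used this code: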

    from pandas import json_normalize

    df = json_normalize(data['data'], record_path=['insights', 'data'], meta=['id'])

Two problems remain:

Does anyone see what I am missing here?

Tags: python, json, multidimensional-array, database-normalization, flatten

Solution
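One possible way to get the table described in the question is to skip json_normalize and walk the structure with a plain loop, emitting one row per action and a single row with empty action fields when a record has no "actions" key. What follows is only a minimal sketch under assumptions: the function name flatten_insights is made up for illustration, the sample is a shortened copy of the data from the question, and every item is assumed to carry an "insights" object with a "data" list.

```python
import pandas as pd

# Shortened copy of the structure from the question (illustrative sample).
data = {"data": [{
    "insights": {
        "data": [
            {"account_id": "10", "clicks": "3",
             "actions": [{"action_type": "link", "value": "3"},
                         {"action_type": "post", "value": "3"}]},
            {"account_id": "10", "clicks": "0"},  # record without "actions"
        ],
        "paging": {"cursors": {"before": "ON", "after": "OFF"}},
    },
    "id": "1",
}]}

COLUMNS = ["account_id", "action_type", "value", "clicks", "id", "before", "after"]


def flatten_insights(payload):
    """Hypothetical helper: one output row per action, keeping parent fields."""
    rows = []
    for item in payload["data"]:
        insights = item["insights"]
        cursors = insights.get("paging", {}).get("cursors", {})
        for record in insights["data"]:
            # A missing or empty "actions" list still produces one row,
            # with action_type and value left as None.
            actions = record.get("actions") or [{}]
            for action in actions:
                rows.append({
                    "account_id": record.get("account_id"),
                    "action_type": action.get("action_type"),
                    "value": action.get("value"),
                    "clicks": record.get("clicks"),
                    "id": item.get("id"),
                    "before": cursors.get("before"),
                    "after": cursors.get("after"),
                })
    return pd.DataFrame(rows, columns=COLUMNS)


df = flatten_insights(data)
# CSV text; pass a file path to to_csv instead to write a file.
csv_text = df.to_csv(index=False)
print(df)
```

All values stay as strings, as in the source JSON; converting "value" and "clicks" to numbers (for example with pd.to_numeric) would be a separate step if needed.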

