首页 > 解决方案 > 如何用python解析复杂的json?

问题描述

我正在尝试解析这个 json 文件,但遇到了麻烦。json 看起来像这样:

    <ListObject list at 0x2161945a860> JSON: {
  "data": [
    {
      "amount": 100,
      "available_on": 1621382400,
      "created": 1621264875,
      "currency": "usd",
      "description": "0123456",
      "exchange_rate": null,
      "fee": 266,
      "fee_details": [
        {
          "amount": 266,
          "application": null,
          "currency": "usd",
          "description": "processing fees",
          "type": "fee"
        }
      ],
      "id": "txn_abvgd1234",
      "net": 9999,
      "object": "balance_transaction",
      "reporting_category": "charge",
      "source": "cust1",
      "sourced_transfers": {
        "data": [],
        "has_more": false,
        "object": "list",
        "total_count": 0,
        "url": "/v1/source"
      },
      "status": "pending",
      "type": "charge"
    },
    {
      "amount": 25984,
      "available_on": 1621382400,
      "created": 1621264866,
      "currency": "usd",
      "description": "0326489",
      "exchange_rate": null,
      "fee": 93,
      "fee_details": [
        {
          "amount": 93,
          "application": null,
          "currency": "usd",
          "description": "processing fees",
          "type": "fee"
        }
      ],
      "id": "txn_65987jihgf4984oihydgrd",
      "net": 9874,
      "object": "balance_transaction",
      "reporting_category": "charge",
      "source": "cust2",
      "sourced_transfers": {
        "data": [],
        "has_more": false,
        "object": "list",
        "total_count": 0,
        "url": "/v1/source"
      },
      "status": "pending",
      "type": "charge"
    },
  ],
  "has_more": true,
  "object": "list",
  "url": "/v1/balance_"
}

我正在尝试使用此脚本在 python 中解析它:

import pandas as pd
df = pd.json_normalize(json)
df.head()

但我得到的是:

在此处输入图像描述

我需要的是在自己的列中解析每个数据点。所以我将有 2 行数据,每个数据点都有列。像这样的东西:

在此处输入图像描述

我现在该怎么做?

标签: pythonjsonpython-3.x

解决方案


除了一个字段之外,所有字段都是 JSON 的直接副本,因此您只需列出可以复制的字段,然后对 fee_details 进行额外处理。

import json
import pandas as pd

inp =  """{
  "data": [
    {
      "amount": 100,
      "available_on": 1621382400,
      "created": 1621264875,
      "currency": "usd",
      "description": "0123456",
      "exchange_rate": null,
      "fee": 266,
      "fee_details": [
        {
          "amount": 266,
          "application": null,
          "currency": "usd",
          "description": "processing fees",
          "type": "fee"
        }
      ],
      "id": "txn_abvgd1234",
      "net": 9999,
      "object": "balance_transaction",
      "reporting_category": "charge",
      "source": "cust1",
      "sourced_transfers": {
        "data": [],
        "has_more": false,
        "object": "list",
        "total_count": 0,
        "url": "/v1/source"
      },
      "status": "pending",
      "type": "charge"
    },
    {
      "amount": 25984,
      "available_on": 1621382400,
      "created": 1621264866,
      "currency": "usd",
      "description": "0326489",
      "exchange_rate": null,
      "fee": 93,
      "fee_details": [
        {
          "amount": 93,
          "application": null,
          "currency": "usd",
          "description": "processing fees",
          "type": "fee"
        }
      ],
      "id": "txn_65987jihgf4984oihydgrd",
      "net": 9874,
      "object": "balance_transaction",
      "reporting_category": "charge",
      "source": "cust2",
      "sourced_transfers": {
        "data": [],
        "has_more": false,
        "object": "list",
        "total_count": 0,
        "url": "/v1/source"
      },
      "status": "pending",
      "type": "charge"
    }
  ],
  "has_more": true,
  "object": "list",
  "url": "/v1/balance_"
}"""

copies = [
    'id',
    'net',
    'object',
    'reporting_category',
    'source',
    'amount',
    'available_on',
    'created',
    'currency',
    'description',
    'exchange_rate',
    'fee'
]

data = json.loads(inp)
rows = []
for inrow in data['data']:
    outrow = {}
    for copy in copies:
        outrow[copy] = inrow[copy]
    outrow['fee_details'] = inrow['fee_details'][0]['description']
    rows.append(outrow)

df = pd.DataFrame(rows)
print(df)

输出:

timr@tims-gram:~/src$ python x.py
                           id   net               object reporting_category source  amount  ...     created  currency description exchange_rate  fee      fee_details
0               txn_abvgd1234  9999  balance_transaction             charge  cust1     100  ...  1621264875       usd     0123456          None  266  processing fees
1  txn_65987jihgf4984oihydgrd  9874  balance_transaction             charge  cust2   25984  ...  1621264866       usd     0326489          None   93  processing fees

[2 rows x 13 columns]
timr@tims-gram:~/src$ 

推荐阅读