Pandas converts JSON to a single-cell CSV instead of a full spreadsheet

Problem description

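I'm trying to convert JSON data into a CSV with id, created_at, and updated_at headers, with each column holding the corresponding data from the JSON. The fields can change from query to query, so I can't just hard-code the headers by hand.
Instead, I get a CSV with two cells: a "data" header, and below it a single cell containing everything else.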

This is test data returned from a GraphQL API.

{
  "data": {
    "accounts": {
      "entities": [
        {
          "id": "1",
          "created_at": "2021-05-06T15:35:49+00:00",
          "updated_at": "2021-11-09T15:52:02+00:00"
        },
        {
          "id": "2",
          "created_at": "2021-05-08T01:51:54+00:00",
          "updated_at": "2021-10-20T15:53:42+00:00"
        },
        {
          "id": "3",
          "created_at": "2021-05-10T15:53:01+00:00",
          "updated_at": "2021-10-27T17:15:41+00:00"
        },
        {
          "id": "4",
          "created_at": "2021-05-11T13:25:02+00:00",
          "updated_at": "2021-11-09T15:35:44+00:00"
        },
        {
          "id": "5",
          "created_at": "2021-05-11T13:42:24+00:00",
          "updated_at": "2021-11-09T15:39:50+00:00"
        }
      ]
    }
  }
}

I'm running what I think is a pretty standard operation.

import pandas as pd

def json_to_csv(csv_path, json_path):
    with open(json_path, encoding='utf-8-sig') as j_input:
        df = pd.read_json(j_input)

    df.to_csv(csv_path, encoding='utf-8', index=False, line_terminator='\n')

But the output all ends up in one cell, and looks like this.

data
"{'entities': [{'id': '1', 'created_at': '2021-05-06T15:35:49+00:00', 'updated_at': '2021-11-09T15:52:02+00:00'}, {'id': '2', 'created_at': '2021-05-08T01:51:54+00:00', 'updated_at': '2021-10-20T15:53:42+00:00'}, {'id': '3', 'created_at': '2021-05-10T15:53:01+00:00', 'updated_at': '2021-10-27T17:15:41+00:00'}, {'id': '4', 'created_at': '2021-05-11T13:25:02+00:00', 'updated_at': '2021-11-09T15:35:44+00:00'}, {'id': '5', 'created_at': '2021-05-11T13:42:24+00:00', 'updated_at': '2021-11-09T15:39:50+00:00'}]}"

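I haven't used pandas much, but it should eventually be used for data manipulation, so I guess I'm missing something somewhere. Any help would be appreciated.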

Tags: python, json, pandas, csv

Solution


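Here is one way to walk down through the JSON data, stop at the first list, and build a DataFrame from it. If your JSON is always a chain of nested single-key dicts followed by a list of dicts, this works even when the key names of the outer dicts change.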

import pandas as pd
import json
import io

j_input = io.StringIO(
"""
{
  "data": {
    "accounts": {
      "entities": [
        {
          "id": "1",
          "created_at": "2021-05-06T15:35:49+00:00",
          "updated_at": "2021-11-09T15:52:02+00:00"
        },
        {
          "id": "2",
          "created_at": "2021-05-08T01:51:54+00:00",
          "updated_at": "2021-10-20T15:53:42+00:00"
        },
        {
          "id": "3",
          "created_at": "2021-05-10T15:53:01+00:00",
          "updated_at": "2021-10-27T17:15:41+00:00"
        },
        {
          "id": "4",
          "created_at": "2021-05-11T13:25:02+00:00",
          "updated_at": "2021-11-09T15:35:44+00:00"
        },
        {
          "id": "5",
          "created_at": "2021-05-11T13:42:24+00:00",
          "updated_at": "2021-11-09T15:39:50+00:00"
        }
      ]
    }
  }
}
"""
)


json_data = json.load(j_input)
pd_data = None

# Walk down the JSON structure looking for the first list
while True:
    if isinstance(json_data, list):
        # Found a list: use it as the rows of the DataFrame
        pd_data = json_data
    elif isinstance(json_data, dict) and len(json_data) == 1:
        # Only descend when the dict has exactly one key
        json_data = next(iter(json_data.values()))
        continue
    break

# Each dict in the list becomes a row; its keys become the column headers
df = pd.DataFrame(pd_data)
print(df)
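
To tie this back to the question, the same walk can be folded into the original json_to_csv function. This is only a sketch under the same assumption as above (single-key dicts leading to a list of dicts); the helper name find_first_list and the ValueError are illustrative choices, not part of the original answer.

```python
import json

import pandas as pd


def find_first_list(node):
    """Descend through single-key dicts; return the first list found, else None."""
    while isinstance(node, dict) and len(node) == 1:
        node = next(iter(node.values()))
    return node if isinstance(node, list) else None


def json_to_csv(csv_path, json_path):
    # utf-8-sig tolerates a leading byte-order mark, as in the question's code
    with open(json_path, encoding="utf-8-sig") as j_input:
        records = find_first_list(json.load(j_input))
    if records is None:
        # Hypothetical error handling: the structure did not match the assumption
        raise ValueError("no list of records found under single-key dicts")
    # Column headers come from the keys of the record dicts, so nothing is hard-coded
    df = pd.DataFrame(records)
    df.to_csv(csv_path, encoding="utf-8", index=False)
    return df
```

Called as json_to_csv("out.csv", "in.json") on the test data, this should write one row per entity with id, created_at, and updated_at as the header row. The line terminator argument from the question is left out here because its keyword name differs between pandas versions.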
