Deserializing data from nested JSON with Python and Pandas

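Problem description

I have time series data in nested JSON that I am struggling to get into a flat dataframe.

Input data

The data is available here: https://corona.lmao.ninja/v2/historical

Expected output

A flat Pandas dataframe: Country | Date | Cases | Deaths | Recovered

What I have tried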

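import pandas as pd
import requests

# No custom headers are needed; the original call passed an undefined `headers` name.
r = requests.get('https://corona.lmao.ninja/v2/historical')
# r.json() already parses the response body, so the json module is not required.
json_data = r.json()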

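Now, I can do df = pd.json_normalize(json_data, max_level=1), but that leaves me with nested dictionaries embedded in the cells. I can also do df = pd.json_normalize(json_data), but that just creates a new column for every date, which is not sustainable as the time series grows.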
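To make the two behaviours concrete, here is a minimal sketch on a two-day sample. The `sample` list is a hand-trimmed copy of the Afghanistan entry quoted in the answer, not a live API response, and the variable names `shallow` and `wide` are only illustrative.

```python
import pandas as pd

# Hand-trimmed sample with the same shape as one entry of the API response.
sample = [{
    "country": "Afghanistan",
    "province": None,
    "timeline": {
        "cases": {"3/13/20": 7, "3/14/20": 11},
        "deaths": {"3/13/20": 0, "3/14/20": 0},
        "recovered": {"3/13/20": 0, "3/14/20": 0},
    },
}]

# max_level=1 stops flattening one level down, so each timeline cell is still a dict.
shallow = pd.json_normalize(sample, max_level=1)
print(type(shallow.loc[0, "timeline.cases"]))

# Full flattening yields one column per metric and date, e.g. "timeline.cases.3/13/20".
wide = pd.json_normalize(sample)
print(sorted(wide.columns))
```

With full flattening the column count is 2 plus three per date, so it keeps growing as new days are added.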

There must be an elegant way to do this. Writing a Python loop would be my last resort.

Tags: python, json, python-3.x, pandas, json-deserialization

Solution


Here is a subset of the data for Afghanistan (the first entry in the JSON data):

content = [{"country":"Afghanistan","province":None,"timeline":{"cases":{"3/13/20":7,"3/14/20":11,"3/15/20":16,"3/16/20":21,"3/17/20":22,"3/18/20":22,"3/19/20":22,"3/20/20":24,"3/21/20":24,"3/22/20":40,"3/23/20":40,"3/24/20":74,"3/25/20":84,"3/26/20":94,"3/27/20":110,"3/28/20":110,"3/29/20":120,"3/30/20":170,"3/31/20":174,"4/1/20":237,"4/2/20":273,"4/3/20":281,"4/4/20":299,"4/5/20":349,"4/6/20":367,"4/7/20":423,"4/8/20":444,"4/9/20":484,"4/10/20":521,"4/11/20":555},"deaths":{"3/13/20":0,"3/14/20":0,"3/15/20":0,"3/16/20":0,"3/17/20":0,"3/18/20":0,"3/19/20":0,"3/20/20":0,"3/21/20":0,"3/22/20":1,"3/23/20":1,"3/24/20":1,"3/25/20":2,"3/26/20":4,"3/27/20":4,"3/28/20":4,"3/29/20":4,"3/30/20":4,"3/31/20":4,"4/1/20":4,"4/2/20":6,"4/3/20":6,"4/4/20":7,"4/5/20":7,"4/6/20":11,"4/7/20":14,"4/8/20":14,"4/9/20":15,"4/10/20":15,"4/11/20":18},"recovered":{"3/13/20":0,"3/14/20":0,"3/15/20":0,"3/16/20":1,"3/17/20":1,"3/18/20":1,"3/19/20":1,"3/20/20":1,"3/21/20":1,"3/22/20":1,"3/23/20":1,"3/24/20":1,"3/25/20":2,"3/26/20":2,"3/27/20":2,"3/28/20":2,"3/29/20":2,"3/30/20":2,"3/31/20":5,"4/1/20":5,"4/2/20":10,"4/3/20":10,"4/4/20":10,"4/5/20":15,"4/6/20":18,"4/7/20":18,"4/8/20":29,"4/9/20":32,"4/10/20":32,"4/11/20":32}}}]

One approach is to read the timeline data into a dataframe and then assign the country and province values as extra columns:

res = pd.DataFrame(content[0]['timeline']).assign(country = content[0]['country'],
                                                  province = content[0]['province']
                                                  )

res.head()


         cases  deaths  recovered      country province
3/13/20      7       0          0  Afghanistan     None
3/14/20     11       0          0  Afghanistan     None
3/15/20     16       0          0  Afghanistan     None
3/16/20     21       0          1  Afghanistan     None
3/17/20     22       0          1  Afghanistan     None

Note that the entire data set is wrapped in a list, hence the index 0.
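The same idea can be extended to every entry in the list. This is one possible sketch, not part of the original answer. The helper name `flatten_timeline` is hypothetical, the second sample entry ("Exampleland") and its numbers are made up for illustration, and the code assumes every entry has the `country`, `province` and `timeline` keys shown above.

```python
import pandas as pd

def flatten_timeline(entries):
    """Build one row per (country, province, date) from a list of API-style entries."""
    frames = []
    for entry in entries:
        frame = (
            pd.DataFrame(entry["timeline"])   # dates become the index, metrics the columns
            .rename_axis("date")
            .reset_index()                    # move the dates into a regular column
            .assign(country=entry["country"], province=entry["province"])
        )
        frames.append(frame)
    flat = pd.concat(frames, ignore_index=True)
    # Dates arrive as strings such as "3/13/20" (month/day/two-digit year).
    flat["date"] = pd.to_datetime(flat["date"], format="%m/%d/%y")
    return flat[["country", "province", "date", "cases", "deaths", "recovered"]]

# First entry: trimmed from the Afghanistan data above. Second entry: invented for illustration.
sample = [
    {"country": "Afghanistan", "province": None,
     "timeline": {"cases": {"3/13/20": 7, "3/14/20": 11},
                  "deaths": {"3/13/20": 0, "3/14/20": 0},
                  "recovered": {"3/13/20": 0, "3/14/20": 0}}},
    {"country": "Exampleland", "province": "north",
     "timeline": {"cases": {"3/13/20": 1, "3/14/20": 2},
                  "deaths": {"3/13/20": 0, "3/14/20": 0},
                  "recovered": {"3/13/20": 0, "3/14/20": 1}}},
]

flat = flatten_timeline(sample)
print(flat)
```

For the live data, `flatten_timeline(json_data)` should work the same way, provided the response keeps this structure. The loop runs once per country rather than once per date, so the number of columns stays fixed however long the time series gets.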

