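python - Deserializing data from nested JSON with Python and Pandas
Problem description
I have time series data in nested JSON that I am struggling to get into a flat data frame.
Input data
The data is here: https://corona.lmao.ninja/v2/historical
Expected output
A flat Pandas data frame: country | date | cases | deaths | recovered
What I tried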
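import pandas as pd
import requests

# Fetch the historical time series and decode the JSON body.
# (The original call passed an undefined `headers` variable; it is not needed here.)
r = requests.get('https://corona.lmao.ninja/v2/historical')
json_data = r.json()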
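Now, I can do df = pd.json_normalize(json_data, max_level=1), but that leaves me with nested dicts embedded in the cells. I can also do df = pd.json_normalize(json_data), but that just creates a new column for every date, which is not sustainable as the series grows.
There must be an elegant way to do this. The last resort would be to write a Python loop.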
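To make the two behaviours concrete, here is a minimal sketch on a hand-trimmed record in the shape the API returns (two dates only). The names sample, nested, and wide are illustrative and not part of the original post.

```python
import pandas as pd

# Hand-trimmed record in the shape of one API entry (two dates only).
sample = [{
    "country": "Afghanistan",
    "province": None,
    "timeline": {
        "cases": {"3/13/20": 7, "3/14/20": 11},
        "deaths": {"3/13/20": 0, "3/14/20": 0},
        "recovered": {"3/13/20": 0, "3/14/20": 0},
    },
}]

# max_level=1 stops one level down: each timeline.* cell still holds a dict.
nested = pd.json_normalize(sample, max_level=1)
print(type(nested.loc[0, "timeline.cases"]))

# Full normalization flattens everything: one column per metric per date.
wide = pd.json_normalize(sample)
print(wide.shape)
print([c for c in wide.columns if c.startswith("timeline.cases")])
```

With three metrics, every additional date adds three more columns to wide, which is why neither call yields the long country | date | cases | deaths | recovered layout directly.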
解决方案
Here is a subset of the data for Afghanistan (the first entry in the JSON data):
content = [{"country":"Afghanistan","province":None,"timeline":{"cases":{"3/13/20":7,"3/14/20":11,"3/15/20":16,"3/16/20":21,"3/17/20":22,"3/18/20":22,"3/19/20":22,"3/20/20":24,"3/21/20":24,"3/22/20":40,"3/23/20":40,"3/24/20":74,"3/25/20":84,"3/26/20":94,"3/27/20":110,"3/28/20":110,"3/29/20":120,"3/30/20":170,"3/31/20":174,"4/1/20":237,"4/2/20":273,"4/3/20":281,"4/4/20":299,"4/5/20":349,"4/6/20":367,"4/7/20":423,"4/8/20":444,"4/9/20":484,"4/10/20":521,"4/11/20":555},"deaths":{"3/13/20":0,"3/14/20":0,"3/15/20":0,"3/16/20":0,"3/17/20":0,"3/18/20":0,"3/19/20":0,"3/20/20":0,"3/21/20":0,"3/22/20":1,"3/23/20":1,"3/24/20":1,"3/25/20":2,"3/26/20":4,"3/27/20":4,"3/28/20":4,"3/29/20":4,"3/30/20":4,"3/31/20":4,"4/1/20":4,"4/2/20":6,"4/3/20":6,"4/4/20":7,"4/5/20":7,"4/6/20":11,"4/7/20":14,"4/8/20":14,"4/9/20":15,"4/10/20":15,"4/11/20":18},"recovered":{"3/13/20":0,"3/14/20":0,"3/15/20":0,"3/16/20":1,"3/17/20":1,"3/18/20":1,"3/19/20":1,"3/20/20":1,"3/21/20":1,"3/22/20":1,"3/23/20":1,"3/24/20":1,"3/25/20":2,"3/26/20":2,"3/27/20":2,"3/28/20":2,"3/29/20":2,"3/30/20":2,"3/31/20":5,"4/1/20":5,"4/2/20":10,"4/3/20":10,"4/4/20":10,"4/5/20":15,"4/6/20":18,"4/7/20":18,"4/8/20":29,"4/9/20":32,"4/10/20":32,"4/11/20":32}}}]
One approach is to read the timeline data into a data frame, then assign the country and province values to it:
res = pd.DataFrame(content[0]['timeline']).assign(
    country=content[0]['country'],
    province=content[0]['province'],
)
res.head()
         cases  deaths  recovered      country province
3/13/20      7       0          0  Afghanistan     None
3/14/20     11       0          0  Afghanistan     None
3/15/20     16       0          0  Afghanistan     None
3/16/20     21       0          1  Afghanistan     None
3/17/20     22       0          1  Afghanistan     None
Note that the whole payload is wrapped in a list, hence the index 0.
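The same idea can be extended to every entry in the list. The following is a minimal sketch, not the answerer's code: flatten_timelines is a hypothetical helper, the second record ("Exampleland") is made-up sample data, and the date format "%m/%d/%y" is an assumption inferred from keys such as "3/13/20".

```python
import pandas as pd

# Two records in the shape of the API response.
# The first uses values from the Afghanistan subset; the second is made up.
sample = [
    {"country": "Afghanistan", "province": None,
     "timeline": {"cases": {"3/13/20": 7, "3/14/20": 11},
                  "deaths": {"3/13/20": 0, "3/14/20": 0},
                  "recovered": {"3/13/20": 0, "3/14/20": 0}}},
    {"country": "Exampleland", "province": None,
     "timeline": {"cases": {"3/13/20": 33, "3/14/20": 38},
                  "deaths": {"3/13/20": 1, "3/14/20": 1},
                  "recovered": {"3/13/20": 0, "3/14/20": 0}}},
]


def flatten_timelines(entries):
    """Build one long frame: one row per (country, province, date)."""
    frames = []
    for entry in entries:
        frame = (
            pd.DataFrame(entry["timeline"])   # dates become the index
            .rename_axis("date")
            .reset_index()                    # move dates into a column
            .assign(country=entry["country"], province=entry["province"])
        )
        frames.append(frame)
    flat = pd.concat(frames, ignore_index=True)
    # Assumed format: month/day/two-digit year, e.g. "3/13/20".
    flat["date"] = pd.to_datetime(flat["date"], format="%m/%d/%y")
    return flat[["country", "province", "date", "cases", "deaths", "recovered"]]


flat = flatten_timelines(sample)
print(flat)
```

On the real response this would be called as flatten_timelines(json_data). It still loops over the entries, but each iteration only builds a small frame, and the number of columns stays fixed as new dates arrive.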