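python - Parsing nested JSON into pandas DataFrames
Problem description
I'm reading stock earnings data from a legacy system. The data is exported as JSON, split into modules such as this earnings module: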
earnings_dict = {
"earningsChart": {
"quarterly": [
{
"date": "1Q2018",
"actual": {
"raw": 0.12,
"fmt": "0.12"
},
"estimate": {
"raw": 0.05,
"fmt": "0.05"
}
},
{
"date": "2Q2018",
"actual": {
"raw": 0.21,
"fmt": "0.21"
},
"estimate": {
"raw": 0.19,
"fmt": "0.19"
}
},
{
"date": "3Q2018",
"actual": {
"raw": 0.16,
"fmt": "0.16"
},
"estimate": {
"raw": 0.21,
"fmt": "0.21"
}
},
{
"date": "4Q2018",
"actual": {
"raw": 0.07,
"fmt": "0.07"
},
"estimate": {
"raw": 0.14,
"fmt": "0.14"
}
}
],
"currentQuarterEstimate": {
"raw": 0.15,
"fmt": "0.15"
},
"currentQuarterEstimateDate": "1Q",
"currentQuarterEstimateYear": 2019,
"earningsDate": [
{
"raw": 1556496000,
"fmt": "2019-04-29"
},
{
"raw": 1556841600,
"fmt": "2019-05-03"
}
]
},
"financialsChart": {
"yearly": [
{
"date": 2015,
"revenue": {
"raw": 74977000,
"fmt": "74.98M",
"longFmt": "74,977,000"
},
"earnings": {
"raw": -15668000,
"fmt": "-15.67M",
"longFmt": "-15,668,000"
}
},
{
"date": 2016,
"revenue": {
"raw": 105586000,
"fmt": "105.59M",
"longFmt": "105,586,000"
},
"earnings": {
"raw": -8281000,
"fmt": "-8.28M",
"longFmt": "-8,281,000"
}
},
{
"date": 2017,
"revenue": {
"raw": 143803000,
"fmt": "143.8M",
"longFmt": "143,803,000"
},
"earnings": {
"raw": 9716000,
"fmt": "9.72M",
"longFmt": "9,716,000"
}
},
{
"date": 2018,
"revenue": {
"raw": 190071000,
"fmt": "190.07M",
"longFmt": "190,071,000"
},
"earnings": {
"raw": 19967000,
"fmt": "19.97M",
"longFmt": "19,967,000"
}
}
],
"quarterly": [
{
"date": "1Q2018",
"revenue": {
"raw": 42340000,
"fmt": "42.34M",
"longFmt": "42,340,000"
},
"earnings": {
"raw": 4320000,
"fmt": "4.32M",
"longFmt": "4,320,000"
}
},
{
"date": "2Q2018",
"revenue": {
"raw": 47240000,
"fmt": "47.24M",
"longFmt": "47,240,000"
},
"earnings": {
"raw": 7474000,
"fmt": "7.47M",
"longFmt": "7,474,000"
}
},
{
"date": "3Q2018",
"revenue": {
"raw": 50126000,
"fmt": "50.13M",
"longFmt": "50,126,000"
},
"earnings": {
"raw": 5524000,
"fmt": "5.52M",
"longFmt": "5,524,000"
}
},
{
"date": "4Q2018",
"revenue": {
"raw": 50365000,
"fmt": "50.37M",
"longFmt": "50,365,000"
},
"earnings": {
"raw": 2649000,
"fmt": "2.65M",
"longFmt": "2,649,000"
}
}
]
},
"financialCurrency": "USD"}
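As you can see, the JSON is nested, with some metadata at the top level of the dictionary. That top level is easy to read with something like pandas' json_normalize: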
import pandas as pd
df = pd.json_normalize(earnings_dict)  # pandas < 1.0: pd.io.json.json_normalize
df
Out[13]:
earningsChart.currentQuarterEstimate.fmt ... financialsChart.yearly
0 0.15 ... [{'date': 2015, 'revenue': {'raw': 74977000, '...
[1 rows x 9 columns]
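However, it does not expand the nested lists of dictionaries that hold the yearly and quarterly earnings data. The quarterly and yearly lists are simply added to the DataFrame as lists of dicts.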
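A minimal sketch of what is happening, using a trimmed copy of earnings_dict (only financialsChart.yearly, two years) and assuming pandas 1.0 or later for the top-level pd.json_normalize: nested dicts become dotted columns, but a list of dicts stays in a single cell. Running json_normalize on that cell by itself expands it into one row per dict.

```python
import pandas as pd

# Trimmed copy of the question's data: financialsChart.yearly only, two years.
earnings_dict = {
    "financialsChart": {
        "yearly": [
            {"date": 2015,
             "revenue": {"raw": 74977000, "fmt": "74.98M", "longFmt": "74,977,000"},
             "earnings": {"raw": -15668000, "fmt": "-15.67M", "longFmt": "-15,668,000"}},
            {"date": 2016,
             "revenue": {"raw": 105586000, "fmt": "105.59M", "longFmt": "105,586,000"},
             "earnings": {"raw": -8281000, "fmt": "-8.28M", "longFmt": "-8,281,000"}},
        ]
    },
    "financialCurrency": "USD",
}

flat = pd.json_normalize(earnings_dict)
# Nested dicts are flattened into dotted column names,
# but the list of dicts is stored as one object in a single cell.
cell = flat.at[0, "financialsChart.yearly"]
print(type(cell))

# Normalizing that cell separately turns each dict into its own row.
yearly = pd.json_normalize(cell)
print(yearly[["date", "revenue.raw", "earnings.raw"]])
```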
I suspect this was originally several SQL tables linked by foreign keys.
I've read the json_normalize documentation, but I can't work out how to use the record_path and meta parameters to parse this dictionary.
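One way record_path and meta fit together, sketched on a trimmed copy of earningsChart (two quarters): record_path is the sequence of keys leading to the list whose items become rows, and each meta entry is either a top-level key or a list of keys naming a scalar to repeat on every row. Which meta fields to carry along is only an illustrative choice here.

```python
import pandas as pd

# Trimmed copy of earnings_dict from the question: two quarters only.
earnings_dict = {
    "earningsChart": {
        "quarterly": [
            {"date": "1Q2018", "actual": {"raw": 0.12, "fmt": "0.12"},
             "estimate": {"raw": 0.05, "fmt": "0.05"}},
            {"date": "2Q2018", "actual": {"raw": 0.21, "fmt": "0.21"},
             "estimate": {"raw": 0.19, "fmt": "0.19"}},
        ],
        "currentQuarterEstimateYear": 2019,
    },
    "financialCurrency": "USD",
}

quarterly_eps = pd.json_normalize(
    earnings_dict,
    # Keys leading to the list whose items become rows.
    record_path=["earningsChart", "quarterly"],
    # Values repeated on every row: one nested path and one top-level key.
    meta=[["earningsChart", "currentQuarterEstimateYear"], "financialCurrency"],
)
print(quarterly_eps.columns.tolist())
print(quarterly_eps)
```

Nested meta paths are joined with the separator (default "."), so the year column comes out as earningsChart.currentQuarterEstimateYear.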
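I think json_normalize can create DataFrames even from dictionaries nested several levels deep. It looks like I'd need at least five: one for the metadata and four for the yearly and quarterly tables.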
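A sketch of the "several DataFrames" idea, assuming each nested list should become its own table: one json_normalize call per record_path, plus a one-row metadata table built from the scalar fields. The table names, the TABLE_PATHS mapping, split_into_tables and the report_id key column are all hypothetical; report_id stands in for the foreign key the original SQL tables probably had. The data is a trimmed copy of earnings_dict with one entry per list.

```python
import pandas as pd

# Trimmed copy of earnings_dict from the question: one entry per nested list.
earnings_dict = {
    "earningsChart": {
        "quarterly": [
            {"date": "1Q2018", "actual": {"raw": 0.12, "fmt": "0.12"},
             "estimate": {"raw": 0.05, "fmt": "0.05"}},
        ],
        "currentQuarterEstimate": {"raw": 0.15, "fmt": "0.15"},
        "currentQuarterEstimateDate": "1Q",
        "currentQuarterEstimateYear": 2019,
        "earningsDate": [{"raw": 1556496000, "fmt": "2019-04-29"}],
    },
    "financialsChart": {
        "yearly": [
            {"date": 2015,
             "revenue": {"raw": 74977000, "fmt": "74.98M", "longFmt": "74,977,000"},
             "earnings": {"raw": -15668000, "fmt": "-15.67M", "longFmt": "-15,668,000"}},
        ],
        "quarterly": [
            {"date": "1Q2018",
             "revenue": {"raw": 42340000, "fmt": "42.34M", "longFmt": "42,340,000"},
             "earnings": {"raw": 4320000, "fmt": "4.32M", "longFmt": "4,320,000"}},
        ],
    },
    "financialCurrency": "USD",
}

# Hypothetical table names mapped to the record_path of each nested list.
TABLE_PATHS = {
    "earnings_quarterly": ["earningsChart", "quarterly"],
    "earnings_dates": ["earningsChart", "earningsDate"],
    "financials_yearly": ["financialsChart", "yearly"],
    "financials_quarterly": ["financialsChart", "quarterly"],
}


def split_into_tables(doc, report_id):
    """Return {table_name: DataFrame}; report_id is a hypothetical shared key."""
    flat = pd.json_normalize(doc)
    # Scalar fields form the one-row metadata table; list-valued cells are dropped.
    list_cols = [c for c in flat.columns if isinstance(flat.at[0, c], list)]
    tables = {"meta": flat.drop(columns=list_cols)}
    # Each nested list of records becomes its own DataFrame.
    for name, path in TABLE_PATHS.items():
        tables[name] = pd.json_normalize(doc, record_path=path)
    # Tag every table with the same key so rows can be joined back together.
    for df in tables.values():
        df.insert(0, "report_id", report_id)
    return tables


tables = split_into_tables(earnings_dict, report_id=1)
for name, df in tables.items():
    print(name, df.shape)
```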
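Bonus:
How would you store this? Would you keep it as strings in a NoSQL database, or put it in SQL? My requirements are fairly low-load, lightweight analysis that will need some views and charts built with pandas and matplotlib.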
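For the bonus question, one lightweight option (an assumption, not the only reasonable choice) is SQLite through Python's built-in sqlite3 module, since DataFrame.to_sql and pd.read_sql_query accept a sqlite3 connection directly. The sketch stores the raw EPS values for two quarters from earningsChart.quarterly and reads back an actual-minus-estimate view. The table name earnings_quarterly and the column names are made up for illustration.

```python
import sqlite3

import pandas as pd

# Two quarters from the question's earningsChart.quarterly list.
quarterly = [
    {"date": "1Q2018", "actual": {"raw": 0.12, "fmt": "0.12"},
     "estimate": {"raw": 0.05, "fmt": "0.05"}},
    {"date": "2Q2018", "actual": {"raw": 0.21, "fmt": "0.21"},
     "estimate": {"raw": 0.19, "fmt": "0.19"}},
]
eps = pd.json_normalize(quarterly)
# Keep only the raw numbers; the "fmt" strings can be regenerated for display.
eps = eps[["date", "actual.raw", "estimate.raw"]].rename(
    columns={"actual.raw": "actual", "estimate.raw": "estimate"}
)

# In-memory database for the sketch; pass a file path instead to persist it.
con = sqlite3.connect(":memory:")
eps.to_sql("earnings_quarterly", con, index=False, if_exists="replace")

# A "view" is just a SQL query read back into pandas, ready for matplotlib.
surprise = pd.read_sql_query(
    "SELECT date, actual - estimate AS surprise "
    "FROM earnings_quarterly ORDER BY date",
    con,
)
print(surprise)
con.close()
```

Keeping only the raw values in normalized tables keeps the SQL side simple; the original JSON could still be archived separately if the exact export format needs to be preserved.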
Thanks for your help!