python - 如何处理在 Pandas 中返回类似字典的对象列表的 JSON?
问题描述
我正在使用来自collegefootballdata.com 的API 来获取有关分数和投注线的数据。我想使用投注线来推断预期的获胜百分比,然后将其与实际结果进行比较(我觉得我的球队输掉了太多比赛,而我们是大热门并想对其进行测试。)此代码检索一场比赛以用于示例目的。
parameters = {
"gameId": 401112435,
"year": 2019
}
response = requests.get("https://api.collegefootballdata.com/lines", params=parameters)
JSON 输出是这样的:
[
{
"awayConference": "ACC",
"awayScore": 28,
"awayTeam": "Virginia Tech",
"homeConference": "ACC",
"homeScore": 35,
"homeTeam": "Boston College",
"id": 401112435,
"lines": [
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "57.5",
"provider": "consensus",
"spread": "4.5"
},
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "57",
"provider": "Caesars",
"spread": "4.5"
},
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "58",
"provider": "numberfire",
"spread": "4.5"
},
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "56.5",
"provider": "teamrankings",
"spread": "4.5"
}
],
"season": 2019,
"seasonType": "regular",
"week": 1
}
]
然后我将使用以下内容加载到熊猫数据框中:
def jstring(obj):
# create a formatted string of the Python JSON object
text = json.dumps(obj, sort_keys=True, indent=4)
return text
json_str = jstring(response.json())
df = pd.read_json(json_str)
这将创建一个带有“lines”列的数据框,其中包含 JSON 的整个行部分作为字符串。最终,我想在“provider”=“consensus”的块中使用“spread”值。对于我的目的而言,其他一切都是无关紧要的。我试过用
df = df.explode('lines')
这给了我 4 行,每场比赛都是这样的(如预期的那样):
{'formattedSpread': 'Virginia Tech -4.5', 'overUnder': '57.5', 'provider': 'consensus', 'spread': '4.5'}
这是我卡住的地方。我只想保留 'provider' = 'consensus' 的行,而且我需要在我的分析中将 'spread' 用作单独的变量/列。我尝试过第二次爆炸,df.split,df.replace 将 { 更改为 [ 并作为列表爆炸,都无济于事。任何帮助表示赞赏!
解决方案
这可能是您正在寻找的 -
编辑:处理特殊情况。
import pandas as pd
import requests
params = {
"gameId": 401112435,
"year": 2019,
}
r = requests.get("https://api.collegefootballdata.com/lines", params=params)
df = pd.DataFrame(r.json()) # Create a DataFrame with a lines column that contains JSON
df = df.explode('lines') # Explode the DataFrame so that each line gets its own row
df = df.reset_index(drop=True) # After explosion, the indices are all the same - this resets them so that you can align the DataFrame below cleanly
def fill_na_lines(lines):
if pd.isna(lines):
return {k: None for k in ['provider', 'spread', 'formattedSpread', 'overUnder']}
return lines
df.lines = df.lines.apply(fill_na_lines)
lines_df = pd.DataFrame(df.lines.tolist()) # A separate lines DataFrame created from the lines JSON column
df = pd.concat([df, lines_df], axis=1) # Concatenating the two DataFrames along the vertical axis.
# Now you can filter down to whichever rows you need.
df = df[df.provider == 'consensus']
推荐阅读
- javascript - 如何捕捉 postman.setNextRequest 错误
- php - 如何使用 if isset $_POST 输出中的变量?
- r - 如何按字母和数字顺序排序刺痛?
- jasper-reports - Jasper 报告时区问题
- kotlin - Jetpack Compose BasicTextField 中的装饰框
- c# - 如何根据角度计算不同的速度平均值?
- java - 具有不同数字、可选字母和点的数字的混合正则表达式
- c++ - 使用 C/C++ 在 Linux 上更改麦克风级别?
- jobs - mjphaynes/php-resque Class not found 错误
- jboss7.x - 如何在 Jboss/Wildfly 中测试分配重试超时参数?