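python - Scraping data from a hockey API using dictionaries in Python
Problem description
I started working on a fun project to practice my data-scraping skills: I'm pulling data from the NHL API and trying to record the location coordinates of every shot and goal. (The API exposes any NHL game, with coordinates and player information for every event that occurred in that game.) However, I'm having trouble indexing into the data and don't really know how to approach it. Here is my code: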
import requests as rq
import csv
GAME_ID = "2017021121" #Game ID indicates which game I want to look at...first 4 digits is the year, second two the point in season, (01 Pre, 02 Reg, 03 Playoffs, 04 All Star)
#URL to access the coordinates of every event in given game...comes in nested dictionary form
url = f"https://statsapi.web.nhl.com/api/v1/game/{GAME_ID}/feed/live"
game = rq.get(url)
#turn the file into a readable one
contents = game.text
#split text into list so we can fool around with it
contents_list = list(csv.reader(contents.splitlines()))
def main():
    file = open(f'coordinates.{GAME_ID}.txt', 'a')
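What I want to do now is use a for loop to iterate over the dataset and check each event's type: if it equals "Shot" or "Goal", I want to add its x, y coordinates to a dictionary that gets written to a new file. I have tried to work out the indexing myself, but I'm not very experienced with data scraping, so I didn't get far. For reference, here is what the dataset looks like (or at least a snippet of it).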
  } ],
  "result" : {
    "event" : "Penalty",
    "eventCode" : "COL162",
    "eventTypeId" : "PENALTY",
    "description" : "Blake Coleman Tripping against Erik Johnson",
    "secondaryType" : "Tripping",
    "penaltySeverity" : "Minor",
    "penaltyMinutes" : 2
  },
  "about" : {
    "eventIdx" : 30,
    "eventId" : 162,
    "period" : 1,
    "periodType" : "REGULAR",
    "ordinalNum" : "1st",
    "periodTime" : "04:47",
    "periodTimeRemaining" : "15:13",
    "dateTime" : "2019-03-17T19:15:33Z",
    "goals" : {
      "away" : 0,
      "home" : 0
    }
  },
  "coordinates" : {
    "x" : -58.0,
    "y" : -37.0
  },
  "team" : {
    "id" : 1,
    "name" : "New Jersey Devils",
    "link" : "/api/v1/teams/1",
    "triCode" : "NJD"
  }
}, {
  "players" : [ {
    "player" : {
      "id" : 8471233,
      "fullName" : "Travis Zajac",
      "link" : "/api/v1/people/8471233"
    },
    "playerType" : "Winner"
  }, {
    "player" : {
      "id" : 8473544,
      "fullName" : "Derick Brassard",
      "link" : "/api/v1/people/8473544"
    },
    "playerType" : "Loser"
  } ],
  "result" : {
    "event" : "Faceoff",
    "eventCode" : "COL25",
    "eventTypeId" : "FACEOFF",
    "description" : "Travis Zajac faceoff won against Derick Brassard"
  },
  "about" : {
    "eventIdx" : 31,
    "eventId" : 25,
    "period" : 1,
    "periodType" : "REGULAR",
    "ordinalNum" : "1st",
    "periodTime" : "04:47",
    "periodTimeRemaining" : "15:13",
    "dateTime" : "2019-03-17T19:15:59Z",
    "goals" : {
      "away" : 0,
      "home" : 0
    }
  },
  "coordinates" : {
    "x" : -69.0,
    "y" : -22.0
  },
  "team" : {
    "id" : 1,
    "name" : "New Jersey Devils",
    "link" : "/api/v1/teams/1",
    "triCode" : "NJD"
To me it looks like a bunch of nested dictionaries, but again, I'm not sure.
Any help would be greatly appreciated! Thanks!
Solution
It looks like a big list of dictionaries, where each dictionary describes one event and holds further nested dictionaries. You could iterate through it with a for-each loop:
for entry in plays:  # plays is the list of event dictionaries (avoid naming it "list", which shadows the built-in)
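The loop needs an actual list of dictionaries to iterate over, and the text of the response has to be parsed as JSON to get one (csv.reader would only split it on commas). Below is a minimal sketch of that step. The sample text is a made-up, trimmed stand-in for game.text, and the "liveData" -> "plays" -> "allPlays" key path is an assumption about the full response, since the excerpt in the question only shows the inner entries.

```python
import json

# Made-up, trimmed stand-in for the text returned by the API (game.text).
# The "liveData" -> "plays" -> "allPlays" nesting is an assumption; inspect
# the real response to confirm where the event list lives.
sample_text = """
{
  "liveData": {
    "plays": {
      "allPlays": [
        {"result": {"eventTypeId": "PENALTY"},
         "coordinates": {"x": -58.0, "y": -37.0}},
        {"result": {"eventTypeId": "FACEOFF"},
         "coordinates": {"x": -69.0, "y": -22.0}}
      ]
    }
  }
}
"""

# json.loads turns JSON text into nested Python dicts and lists.
feed = json.loads(sample_text)
plays = feed["liveData"]["plays"]["allPlays"]

# Each entry is a dict describing one event.
for entry in plays:
    print(entry["result"]["eventTypeId"], entry["coordinates"])
```

With the requests library used in the question, game.json() performs the same parsing in one call, so feed = game.json() could replace the json.loads line.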
Then you can look at each entry individually and check whether its eventTypeId is equal to either SHOT or GOAL:
if entry["result"]["eventTypeId"] in ("SHOT", "GOAL"):
And when that's the case, you can pull out the values of the X and Y coordinates like this:
x = entry["coordinates"]["x"]
y = entry["coordinates"]["y"]
After that, you can do whatever you want with those coordinates.
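Putting the steps together, here is a rough sketch of how the filtering and coordinate extraction might look as one function, plus a helper that writes the result to a file as the question intended. The function names, the sample entries, and the choice to skip entries with missing coordinates are all illustrative assumptions, not part of the original answer.

```python
import json


def collect_shot_coordinates(plays, wanted=("SHOT", "GOAL")):
    """Return a list of {"event", "x", "y"} dicts for the wanted event types."""
    found = []
    for entry in plays:
        event_type = entry.get("result", {}).get("eventTypeId")
        if event_type not in wanted:
            continue
        coords = entry.get("coordinates", {})
        # Assumption: some events may lack coordinates, so skip those
        # instead of raising a KeyError.
        if "x" not in coords or "y" not in coords:
            continue
        found.append({"event": event_type, "x": coords["x"], "y": coords["y"]})
    return found


def save_coordinates(shots, path):
    """Write the collected coordinates to a file as JSON."""
    with open(path, "w") as handle:
        json.dump(shots, handle, indent=2)


# Made-up sample entries shaped like the excerpt in the question.
sample_plays = [
    {"result": {"eventTypeId": "FACEOFF"}, "coordinates": {"x": -69.0, "y": -22.0}},
    {"result": {"eventTypeId": "SHOT"}, "coordinates": {"x": 61.0, "y": 12.0}},
    {"result": {"eventTypeId": "GOAL"}, "coordinates": {"x": 80.0, "y": -3.0}},
    {"result": {"eventTypeId": "SHOT"}, "coordinates": {}},
]

shots = collect_shot_coordinates(sample_plays)
print(shots)
```

Calling save_coordinates(shots, f"coordinates.{GAME_ID}.txt") would then write the list to the file name used in the question. A list of small dictionaries is used here rather than a single dictionary because one game contains many shots.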