Scraping data from a hockey API using dictionaries in Python

Problem description

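I've started a fun project to practice my data scraping skills: I'm pulling data from the NHL API and trying to record the location coordinates of every shot and goal. (The API covers any NHL game and provides the coordinates and player information for every event that happened in that game.) However, I'm having trouble indexing into the data and really don't know how to approach it. Here is my code so far...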

import requests as rq
import csv

GAME_ID = "2017021121" #Game ID indicates which game I want to look at...first 4 digits is the year, second two the point in season, (01 Pre, 02 Reg, 03 Playoffs, 04 All Star)

#URL to access the coordinates of every event in given game...comes in nested dictionary form
url = f"https://statsapi.web.nhl.com/api/v1/game/{GAME_ID}/feed/live"
game = rq.get(url)
#turn the file into a readable one
contents = game.text

#split text into list so we can fool around with it
contents_list = list(csv.reader(contents.splitlines()))

def main():
    file = open( f'coordinates.{GAME_ID}.txt', 'a')

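What I want to do now is use a for loop to go through the dataset, check each event type, and, whenever it is equal to "Shot" or "Goal", add that event's x, y coordinates to a dictionary that gets written to a new file. I've tried working out the indexing myself, but I'm not very experienced with data scraping, so I didn't get far. For reference, here is what the dataset looks like (or at least a snippet of it).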

} ],
        "result" : {
          "event" : "Penalty",
          "eventCode" : "COL162",
          "eventTypeId" : "PENALTY",
          "description" : "Blake Coleman Tripping against Erik Johnson",
          "secondaryType" : "Tripping",
          "penaltySeverity" : "Minor",
          "penaltyMinutes" : 2
        },
        "about" : {
          "eventIdx" : 30,
          "eventId" : 162,
          "period" : 1,
          "periodType" : "REGULAR",
          "ordinalNum" : "1st",
          "periodTime" : "04:47",
          "periodTimeRemaining" : "15:13",
          "dateTime" : "2019-03-17T19:15:33Z",
          "goals" : {
            "away" : 0,
            "home" : 0
          }
        },
        "coordinates" : {
          "x" : -58.0,
          "y" : -37.0
        },
        "team" : {
          "id" : 1,
          "name" : "New Jersey Devils",
          "link" : "/api/v1/teams/1",
          "triCode" : "NJD"
        }
      }, {
        "players" : [ {
          "player" : {
            "id" : 8471233,
            "fullName" : "Travis Zajac",
            "link" : "/api/v1/people/8471233"
          },
          "playerType" : "Winner"
        }, {
          "player" : {
            "id" : 8473544,
            "fullName" : "Derick Brassard",
            "link" : "/api/v1/people/8473544"
          },
          "playerType" : "Loser"
        } ],
        "result" : {
          "event" : "Faceoff",
          "eventCode" : "COL25",
          "eventTypeId" : "FACEOFF",
          "description" : "Travis Zajac faceoff won against Derick Brassard"
        },
        "about" : {
          "eventIdx" : 31,
          "eventId" : 25,
          "period" : 1,
          "periodType" : "REGULAR",
          "ordinalNum" : "1st",
          "periodTime" : "04:47",
          "periodTimeRemaining" : "15:13",
          "dateTime" : "2019-03-17T19:15:59Z",
          "goals" : {
            "away" : 0,
            "home" : 0
          }
        },
        "coordinates" : {
          "x" : -69.0,
          "y" : -22.0
        },
        "team" : {
          "id" : 1,
          "name" : "New Jersey Devils",
          "link" : "/api/v1/teams/1",
          "triCode" : "NJD"

To me it looks like a bunch of nested dictionaries, but again, I'm not sure.

Any help would be greatly appreciated! Thanks!

Tags: python, api, dictionary, nested, screen-scraping

Solution


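The snippet looks like a big list of dictionaries, one per event, each containing further nested dictionaries and lists. The response is JSON, so parse it with game.json() (or json.loads(game.text)) rather than csv.reader to get real Python lists and dictionaries. You can then iterate over the list of events with a for loop.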

for entry in plays:  # plays is the list of event dictionaries

Then you can look at each entry individually and check whether its eventTypeId is either "SHOT" or "GOAL":

if entry["result"]["eventTypeId"] in ("SHOT", "GOAL"):

And when that's the case, you can pull out the values of the X and Y coordinates like this:

x = entry["coordinates"]["x"]
y = entry["coordinates"]["y"]

After that, you can do whatever you need with the coordinates, such as adding them to a dictionary or writing them to a file.
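
Putting these steps together, here is a minimal sketch rather than a definitive implementation. It runs on a small inline sample shaped like the snippet in the question so that it works without a network call. Two things in it are assumptions: the outer "liveData" -> "plays" -> "allPlays" path, which is not shown in the question and should be checked against the real response, and the helper name shot_and_goal_coordinates, which is made up for illustration. Keying the dictionary by eventIdx is also just one possible choice.

```python
import json

# Sample shaped like the snippet in the question. The outer
# "liveData" -> "plays" -> "allPlays" nesting is an ASSUMPTION about
# the full feed; verify it against the real response.
SAMPLE_TEXT = """
{"liveData": {"plays": {"allPlays": [
  {"result": {"eventTypeId": "FACEOFF"},
   "about": {"eventIdx": 31},
   "coordinates": {"x": -69.0, "y": -22.0}},
  {"result": {"eventTypeId": "SHOT"},
   "about": {"eventIdx": 32},
   "coordinates": {"x": 55.0, "y": 10.0}},
  {"result": {"eventTypeId": "GOAL"},
   "about": {"eventIdx": 40},
   "coordinates": {"x": 80.0, "y": -3.0}},
  {"result": {"eventTypeId": "SHOT"},
   "about": {"eventIdx": 41},
   "coordinates": {}}
]}}}
"""

WANTED = {"SHOT", "GOAL"}


def shot_and_goal_coordinates(plays):
    """Map the eventIdx of each SHOT/GOAL event to its (x, y) pair.

    Hypothetical helper written for this example.
    """
    found = {}
    for entry in plays:
        if entry["result"]["eventTypeId"] not in WANTED:
            continue
        coords = entry.get("coordinates", {})
        # Skip events that carry no location data.
        if "x" not in coords or "y" not in coords:
            continue
        found[entry["about"]["eventIdx"]] = (coords["x"], coords["y"])
    return found


# With requests, this line would be: feed = game.json()
feed = json.loads(SAMPLE_TEXT)
plays = feed["liveData"]["plays"]["allPlays"]
coordinates = shot_and_goal_coordinates(plays)
print(coordinates)
```

The check for missing "x" or "y" keys is there because the sample includes an event with an empty coordinates dictionary; whether the real feed contains such events is worth confirming before dropping or keeping that guard.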

