How would I scrape this ESPN Fantasy player table using BeautifulSoup?

Problem description

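The link below has the table I am trying to scrape. I have tried before, but the page is formatted strangely, and I am not sure whether they block people from scraping it.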

Here is what I tried:

import pandas as pd
import requests
from bs4 import BeautifulSoup

req_url = "https://fantasy.espn.com/basketball/players/add?leagueId=133998"
req = requests.get(req_url)
soup = BeautifulSoup(req.content, 'html.parser')
table = soup.find('table', {'id':'fitt-analytics'})
df = pd.read_html(str(table))[0]
df.head()

But it gave me an error saying that no tables were found.

https://fantasy.espn.com/basketball/players/add?leagueId=133998

Tags: python, html, pandas, web-scraping, beautifulsoup

Solution


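BeautifulSoup alone will not get you this table, because the table is rendered dynamically by JavaScript and is not present in the HTML that requests downloads. You could explore selenium, or fetch the data directly from the API.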
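A quick way to check the "rendered by JS" claim for any page is to count the `<table>` tags in the raw HTML before reaching for a parser. The following is only a minimal sketch using the standard library; the class `TableCounter`, the helper `count_tables`, and both sample HTML strings are made up for illustration and are not ESPN's actual markup.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    """Return how many <table> elements appear in the raw HTML text."""
    parser = TableCounter()
    parser.feed(html)
    return parser.tables


# Illustrative inputs only: a JS "shell" page versus a page with a static table.
js_shell = '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>'
static_page = '<html><body><table id="t"><tr><td>1</td></tr></table></body></html>'

print(count_tables(js_shell))     # 0
print(count_tables(static_page))  # 1
```

If `count_tables(requests.get(req_url).text)` returns 0 for a page that shows a table in the browser, the table is most likely injected by JavaScript after load, which is why `soup.find('table', ...)` returns `None` and `pd.read_html` reports that no tables were found.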

Here is a working example that uses the API to get the player names along with their total and average fantasy points.

import json

import requests
from tabulate import tabulate

filter_string = {
  "players": {
    "filterStatus": {
      "value": [
        "FREEAGENT",
        "WAIVERS"
      ]
    },
    "filterSlotIds": {
      "value": [
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10,
        11
      ]
    },
    "filterRanksForScoringPeriodIds": {
      "value": [
        1
      ]
    },
    "sortPercOwned": {
      "sortPriority": 2,
      "sortAsc": False
    },
    "sortDraftRanks": {
      "sortPriority": 100,
      "sortAsc": True,
      "value": "STANDARD"
    },
    "limit": 50,
    "offset": 0,
    "filterStatsForTopScoringPeriodIds": {
      "value": 5,
      "additionalValue": [
        "002021",
        "102021",
        "002020",
        "012021",
        "022021",
        "032021",
        "042021"
      ]
    }
  }
}

headers = {
    "x-fantasy-filter": json.dumps(filter_string)
}

api_url = "https://fantasy.espn.com/apis/v3/games/fba/seasons/2021/segments/0/leagues/133998?view=kona_player_info"
response = requests.get(api_url, headers=headers).json()

sample_table = []
for p in response["players"]:
    try:
        sample_table.append(
            [
                p['player']['fullName'],
                p['player']['stats'][9]['appliedTotal'],
                round(p['player']['stats'][9]['appliedAverage'], 2),
            ]
        )
    except (KeyError, IndexError):
        continue

print(tabulate(sample_table, headers=["Player", "Total", "Average"]))

Output:

Player                     Total    Average
-----------------------  -------  ---------
Giannis Antetokounmpo       4173      62.28
Anthony Davis               3518      57.67
Nikola Jokic                3728      54.82
Luka Doncic                 4172      63.21
LeBron James                3474      55.14
James Harden                4078      60.87
Stephen Curry               3506      56.55
Devin Booker                3491      52.89
Damian Lillard              3892      59.88
Trae Young                  3575      55.86
Kevin Durant                3339      52.17
Joel Embiid                 3375      54.44
Kawhi Leonard               3131      53.98
Jayson Tatum                3559      55.61
Jimmy Butler                2906      44.03
Russell Westbrook           3324      52.76
Kyrie Irving                3093      51.55
Donovan Mitchell            3383      49.75
Karl-Anthony Towns          3888      58.03
Pascal Siakam               3418      51.79
Bradley Beal                3571      53.3
Brandon Ingram              3306      50.86
Zion Williamson             3164      51.87
Bam Adebayo                 2647      38.93
Paul George                 3281      51.27
Ja Morant                   2946      45.32
Jamal Murray                3084      47.45
Andre Drummond              2949      44.01
DeMar DeRozan               2876      42.29
Khris Middleton             3064      47.88
Ben Simmons                 2825      44.14
Rudy Gobert                 2554      39.91
Jrue Holiday                2931      45.09
Deandre Ayton               2944      46
Chris Paul                  2561      41.31
Nikola Vucevic              2930      47.26
Shai Gilgeous-Alexander     3081      48.14
D'Angelo Russell            3127      50.44
Zach LaVine                 2944      49.07
Domantas Sabonis            2593      43.22
Tobias Harris               3213      46.57
John Collins                2584      41.68
Kyle Lowry                  2619      40.29
CJ McCollum                 2906      44.03
De'Aaron Fox                2765      43.89
Fred VanVleet               2657      40.88
T.J. Warren                 2838      44.34
LaMarcus Aldridge           2404      39.41
Kristaps Porzingis          2694      49.89
Kemba Walker                2398      42.82
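Since the question asked for a DataFrame rather than a printed table, the same extraction could be wrapped in a function that returns a list of row dictionaries. This is a sketch under assumptions: the function name `players_to_rows` and its `stats_index` parameter are hypothetical, and it simply reuses the fixed index 9 from the example above, which may not hold for other seasons or leagues.

```python
from typing import Any, Dict, List


def players_to_rows(response: Dict[str, Any], stats_index: int = 9) -> List[Dict[str, Any]]:
    """Flatten the API response into one dict per player.

    Players whose payload lacks the expected keys or stats entry are skipped,
    mirroring the try/except in the original loop.
    """
    rows = []
    for p in response.get("players", []):
        try:
            player = p["player"]
            stats = player["stats"][stats_index]
            rows.append(
                {
                    "Player": player["fullName"],
                    "Total": stats["appliedTotal"],
                    "Average": round(stats["appliedAverage"], 2),
                }
            )
        except (KeyError, IndexError):
            continue
    return rows
```

With pandas installed, `pd.DataFrame(players_to_rows(response))` should then give a DataFrame with `Player`, `Total`, and `Average` columns, on which `df.head()` works as in the question.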

Bonus:

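If you want to "paginate" the API, just change the offset value in the filter. With limit set to 50, an offset of 50 gives you page 2, an offset of 100 gives you page 3, and so on.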
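The pagination idea can be sketched as a loop that raises `offset` by `limit` on each request and stops when a page comes back short. This is a minimal sketch, not a confirmed description of the API's behavior: the names `build_headers`, `iter_players`, and `fetch_with_requests` are hypothetical, and it assumes that a page with fewer than `limit` players marks the end of the results.

```python
import json
from typing import Any, Callable, Dict, Iterator


def build_headers(base_filter: Dict[str, Any], offset: int, limit: int = 50) -> Dict[str, str]:
    """Copy the filter, override offset/limit, and encode it as the request header."""
    new_filter = dict(base_filter)
    new_filter["players"] = dict(base_filter["players"], offset=offset, limit=limit)
    return {"x-fantasy-filter": json.dumps(new_filter)}


def iter_players(
    fetch_page: Callable[[Dict[str, str]], Dict[str, Any]],
    base_filter: Dict[str, Any],
    limit: int = 50,
    max_pages: int = 10,
) -> Iterator[Dict[str, Any]]:
    """Yield players page by page until a short page or max_pages is reached."""
    for page in range(max_pages):
        headers = build_headers(base_filter, offset=page * limit, limit=limit)
        players = fetch_page(headers).get("players", [])
        yield from players
        if len(players) < limit:  # assumption: a short page means no more results
            break


def fetch_with_requests(headers: Dict[str, str]) -> Dict[str, Any]:
    """One possible fetch_page, using the URL from the example above."""
    import requests  # imported lazily so the helpers above work without it

    api_url = (
        "https://fantasy.espn.com/apis/v3/games/fba/seasons/2021/"
        "segments/0/leagues/133998?view=kona_player_info"
    )
    return requests.get(api_url, headers=headers, timeout=30).json()
```

Calling `list(iter_players(fetch_with_requests, filter_string))` would then collect up to `max_pages * limit` players. Passing the fetch function in as an argument keeps the paging logic separate from the network call, so it can be exercised with canned data.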

