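python - How would I scrape this ESPN Fantasy player table using BeautifulSoup?
Question
The link below has the table I want to scrape. I have tried in the past, but the page is formatted strangely, and I am not sure whether the site blocks scraping.
This is what I tried: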
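import pandas as pd
import requests
from bs4 import BeautifulSoup

req_url = "https://fantasy.espn.com/basketball/players/add?leagueId=133998"
req = requests.get(req_url)
soup = BeautifulSoup(req.content, 'html.parser')
# find() returns None when the table is not present in the raw HTML
table = soup.find('table', {'id':'fitt-analytics'})
df = pd.read_html(str(table))[0]
df.head()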
But it gives me an error saying that no tables were found.
https://fantasy.espn.com/basketball/players/add?leagueId=133998
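Solution
Well... you could explore selenium, since you won't get the table with BeautifulSoup alone (it is rendered dynamically by JS), or you could try fetching the data from the API.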
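To see why the parser finds nothing, it can help to check whether the raw HTML contains a table element at all. The following is a minimal sketch that uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies. The two HTML strings are hypothetical stand-ins, not the actual ESPN markup, and TableCounter and count_tables are illustrative names.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(html):
    """Return how many <table> elements appear in the given HTML text."""
    parser = TableCounter()
    parser.feed(html)
    return parser.tables


# Hypothetical stand-in for a JS-rendered page as seen by requests.get():
# an empty mount point plus a script that builds the table in the browser.
js_shell = '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>'
# Hypothetical static page where the table is already in the HTML.
static_page = "<html><body><table><tr><td>1</td></tr></table></body></html>"

print(count_tables(js_shell))     # 0
print(count_tables(static_page))  # 1
```

If the count for the downloaded page is 0, the table is probably being built in the browser, and a parser working on the raw response has nothing to find.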
Here is a working example that fetches player names along with their total and average fantasy points.
import json
import requests
from tabulate import tabulate
filter_string = {
    "players": {
        "filterStatus": {"value": ["FREEAGENT", "WAIVERS"]},
        "filterSlotIds": {"value": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]},
        "filterRanksForScoringPeriodIds": {"value": [1]},
        "sortPercOwned": {"sortPriority": 2, "sortAsc": False},
        "sortDraftRanks": {"sortPriority": 100, "sortAsc": True, "value": "STANDARD"},
        "limit": 50,
        "offset": 0,
        "filterStatsForTopScoringPeriodIds": {
            "value": 5,
            "additionalValue": [
                "002021", "102021", "002020", "012021", "022021", "032021", "042021"
            ]
        }
    }
}
headers = {
    "x-fantasy-filter": json.dumps(filter_string)
}
api_url = "https://fantasy.espn.com/apis/v3/games/fba/seasons/2021/segments/0/leagues/133998?view=kona_player_info"
response = requests.get(api_url, headers=headers).json()
sample_table = []
for p in response["players"]:
    try:
        sample_table.append(
            [
                p['player']['fullName'],
                p['player']['stats'][9]['appliedTotal'],
                round(p['player']['stats'][9]['appliedAverage'], 2),
            ]
        )
    except (KeyError, IndexError):
        # Skip players whose entry lacks the expected fields
        continue
print(tabulate(sample_table, headers=["Player", "Total", "Average"]))
Output:
Player                     Total    Average
-----------------------  -------  ---------
Giannis Antetokounmpo       4173      62.28
Anthony Davis               3518      57.67
Nikola Jokic                3728      54.82
Luka Doncic                 4172      63.21
LeBron James                3474      55.14
James Harden                4078      60.87
Stephen Curry               3506      56.55
Devin Booker                3491      52.89
Damian Lillard              3892      59.88
Trae Young                  3575      55.86
Kevin Durant                3339      52.17
Joel Embiid                 3375      54.44
Kawhi Leonard               3131      53.98
Jayson Tatum                3559      55.61
Jimmy Butler                2906      44.03
Russell Westbrook           3324      52.76
Kyrie Irving                3093      51.55
Donovan Mitchell            3383      49.75
Karl-Anthony Towns          3888      58.03
Pascal Siakam               3418      51.79
Bradley Beal                3571      53.3
Brandon Ingram              3306      50.86
Zion Williamson             3164      51.87
Bam Adebayo                 2647      38.93
Paul George                 3281      51.27
Ja Morant                   2946      45.32
Jamal Murray                3084      47.45
Andre Drummond              2949      44.01
DeMar DeRozan               2876      42.29
Khris Middleton             3064      47.88
Ben Simmons                 2825      44.14
Rudy Gobert                 2554      39.91
Jrue Holiday                2931      45.09
Deandre Ayton               2944      46
Chris Paul                  2561      41.31
Nikola Vucevic              2930      47.26
Shai Gilgeous-Alexander     3081      48.14
D'Angelo Russell            3127      50.44
Zach LaVine                 2944      49.07
Domantas Sabonis            2593      43.22
Tobias Harris               3213      46.57
John Collins                2584      41.68
Kyle Lowry                  2619      40.29
CJ McCollum                 2906      44.03
De'Aaron Fox                2765      43.89
Fred VanVleet               2657      40.88
T.J. Warren                 2838      44.34
LaMarcus Aldridge           2404      39.41
Kristaps Porzingis          2694      49.89
Kemba Walker                2398      42.82
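The row-building loop in the example can also be written as a small function that checks for the expected fields instead of catching exceptions. This is only a sketch: extract_rows and its stat_index parameter are hypothetical names, and the sample payload below is invented to mimic the shape the example relies on (a "players" list whose items hold "player" -> "fullName" and "stats"). It is not real API output.

```python
def extract_rows(payload, stat_index=9):
    """Return [name, total, average] rows from a response-shaped dict.

    Entries that lack a name, or whose stats list is too short or is
    missing the applied totals, are skipped.
    """
    rows = []
    for entry in payload.get("players", []):
        player = entry.get("player", {})
        stats = player.get("stats", [])
        if "fullName" not in player or len(stats) <= stat_index:
            continue
        stat = stats[stat_index]
        if "appliedTotal" not in stat or "appliedAverage" not in stat:
            continue
        rows.append(
            [player["fullName"], stat["appliedTotal"], round(stat["appliedAverage"], 2)]
        )
    return rows


# Invented payload for illustration only; a real response has many more fields.
sample_payload = {
    "players": [
        {"player": {"fullName": "Player A",
                    "stats": [{"appliedTotal": 100.0, "appliedAverage": 41.256}]}},
        {"player": {"fullName": "Player B", "stats": []}},  # skipped: no stats
        {"player": {"stats": [{"appliedTotal": 1.0, "appliedAverage": 1.0}]}},  # skipped: no name
    ]
}

print(extract_rows(sample_payload, stat_index=0))  # [['Player A', 100.0, 41.26]]
```

With a real response, the default stat_index=9 matches the index used in the example, and the returned rows can be passed straight to tabulate.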
Bonus:
If you want to "paginate" the API, just change the offset value in the filter to 50. That's your page 2.
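The offset idea can be sketched as a loop that rebuilds the x-fantasy-filter header for each page. The names page_headers, iter_players and fetch_page are hypothetical. In particular, stopping when a page comes back shorter than the limit is an assumption about how the API behaves, not something the answer confirms. The fake fetcher below stands in for the network call so the sketch runs offline.

```python
import copy
import json


def page_headers(base_filter, offset, limit=50):
    """Return request headers for one page, leaving base_filter untouched."""
    page_filter = copy.deepcopy(base_filter)
    page_filter["players"]["offset"] = offset
    page_filter["players"]["limit"] = limit
    return {"x-fantasy-filter": json.dumps(page_filter)}


def iter_players(fetch_page, base_filter, limit=50, max_pages=5):
    """Yield players page by page.

    fetch_page is a caller-supplied callable taking the headers dict and
    returning the decoded JSON response. Assumption: a page with fewer
    than `limit` players is the last one.
    """
    for page in range(max_pages):
        headers = page_headers(base_filter, page * limit, limit)
        players = fetch_page(headers).get("players", [])
        yield from players
        if len(players) < limit:
            break


# Offline stand-in for the API: 120 fake "players", sliced by offset and limit.
fake_data = list(range(120))


def fake_fetch(headers):
    f = json.loads(headers["x-fantasy-filter"])["players"]
    return {"players": fake_data[f["offset"]:f["offset"] + f["limit"]]}


base_filter = {"players": {"limit": 50, "offset": 0}}
print(len(list(iter_players(fake_fetch, base_filter))))  # 120
```

Against the real endpoint, fetch_page could be something like lambda h: requests.get(api_url, headers=h).json(), with filter_string from the example passed as base_filter.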