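python - Scraping values from descendant tags inside a div with bs4 in Python
Question
My code can only find the div with class="league-player-tracking-shots". I have tried descendants, children, and contents, but I cannot reach the bottom of the tree. I need the values in the td tags. Please help.
URL - https://stats.nba.com/players/bio/?sort=PLAYER_NAME&dir=-1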
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'https://stats.nba.com/players/bio/?sort=PLAYER_NAME&dir=-1'
##url = input('Enter -')
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

player = soup.find('nba-stat-table')
stat_table = player.find(class_='nba-stat-table__overlay')
for child in stat_table.children:
    c = child.findAll('td')
    #print(c)
print(player)
print(stat_table)
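Solution
The page loads its data from an external source, which is why the td cells never appear in the HTML that BeautifulSoup parses. You can use the requests module to simulate that request.
For example: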
import json
import requests

# Browser-like headers sent along with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
    'Referer': 'https://stats.nba.com/players/bio/?sort=PLAYER_NAME&dir=-1',
    'x-nba-stats-token': 'true',
}

# The endpoint that supplies the table data as JSON
url = 'https://stats.nba.com/stats/leaguedashplayerbiostats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&Season=2019-20&SeasonSegment=&SeasonType=Regular Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight='
data = requests.get(url, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# Each row is a list of values; print every value left-aligned in a 20-character column
for row in data['resultSets'][0]['rowSet']:
    print(('{:<20}'*len(row)).format(*row))
Prints:
203932 Aaron Gordon 1610612753 ORL 24.0 6-8 80 235 Arizona USA 2014 1 4 58 14.4 7.6 3.7 -2.0 0.051 0.175 0.203 0.511 0.164
1628988 Aaron Holiday 1610612754 IND 23.0 6-0 72 185 UCLA USA 2018 1 23 58 9.4 2.3 3.3 1.9 0.015 0.076 0.188 0.517 0.191
1627846 Abdel Nader 1610612760 OKC 26.0 6-5 77 225 Iowa State Egypt 2016 2 58 48 6.0 1.9 0.7 -3.4 0.018 0.095 0.16 0.58 0.07
1629690 Adam Mokoka 1610612741 CHI 21.0 6-5 77 190 None France Undrafted Undrafted Undrafted 11 2.9 0.9 0.4 17.1 0.057 0.029 0.11 0.538 0.043
1629678 Admiral Schofield 1610612764 WAS 23.0 6-5 77 241 Tennessee United Kingdom 2019 2 42 27 3.1 1.3 0.5 -4.9 0.018 0.092 0.123 0.514 0.068
201143 Al Horford 1610612755 PHI 34.0 6-9 81 240 Florida Dominican Republic 2007 1 3 60 12.0 6.9 4.1 3.5 0.05 0.17 0.174 0.526 0.185
202329 Al-Farouq Aminu 1610612753 ORL 29.0 6-8 80 220 Wake Forest USA 2010 1 8 18 4.3 4.8 1.2 -5.4 0.053 0.158 0.127 0.395 0.088
202692 Alec Burks 1610612755 PHI 28.0 6-6 78 214 Colorado USA 2011 1 12 59 15.1 4.4 2.9 -8.3 0.025 0.134 0.23 0.549 0.178
1629346 Alen Smailagic 1610612744 GSW 19.0 6-10 82 215 None Serbia 2019 2 39 14 4.2 1.9 0.9 -3.0 0.077 0.11 0.175 0.61 0.133
1627936 Alex Caruso 1610612747 LAL 26.0 6-5 77 186 Texas A&M USA Undrafted Undrafted Undrafted 58 5.4 1.9 1.8 10.3 0.014 0.088 0.136 0.538 0.128
203458 Alex Len 1610612758 SAC 27.0 7-0 84 250 Maryland Ukraine 2013 1 5 49 8.3 6.0 1.0 -5.6 0.096 0.195 0.178 0.596 0.085
1628035 Alfonzo McKinnie 1610612739 CLE 27.0 6-7 79 215 None USA Undrafted Undrafted Undrafted 40 4.6 2.8 0.4 -7.4 0.061 0.131 0.147 0.493 0.032
1628993 Alize Johnson 1610612754 IND 24.0 6-7 79 212 Missouri State USA 2018 2 50 13 1.4 1.4 0.2 4.7 0.119 0.204 0.142 0.51 0.045
203459 Allen Crabbe 1610612750 MIN 28.0 6-5 77 212 California USA 2013 2 31 37 4.6 2.1 0.9 -15.0 0.014 0.098 0.122 0.47 0.073
1629019 Allonzo Trier 1610612752 NYK 24.0 6-4 76 200 Arizona USA Undrafted Undrafted Undrafted 24 6.5 1.2 1.2 -11.1 0.018 0.078 0.208 0.62 0.164
1628518 Amile Jefferson 1610612753 ORL 27.0 6-9 81 222 Duke USA Undrafted Undrafted Undrafted 18 0.8 1.3 0.2 8.5 0.112 0.163 0.129 0.372 0.078
...and so on.
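The rows printed above are bare lists, so each value has to be picked out by position. As a minimal sketch, assuming the result set also carries a 'headers' list next to 'rowSet' (not shown in the answer above, so check it against the actual JSON), the two can be zipped into dictionaries keyed by column name. The helper rows_to_dicts and the column names in the sample are hypothetical; the sample is a truncated stand-in for data['resultSets'][0], built from the first two rows of the output.

```python
def rows_to_dicts(result_set):
    """Pair each row in 'rowSet' with the column names in 'headers'."""
    headers = result_set['headers']
    return [dict(zip(headers, row)) for row in result_set['rowSet']]


# Hypothetical, truncated stand-in for data['resultSets'][0].
# The real response has more columns, and these header names are assumed.
sample = {
    'headers': ['PLAYER_ID', 'PLAYER_NAME', 'TEAM_ABBREVIATION', 'AGE'],
    'rowSet': [
        [203932, 'Aaron Gordon', 'ORL', 24.0],
        [1628988, 'Aaron Holiday', 'IND', 23.0],
    ],
}

players = rows_to_dicts(sample)
for p in players:
    # Look values up by column name instead of by list index
    print(p['PLAYER_NAME'], p['TEAM_ABBREVIATION'], p['AGE'])
```

With the live response, the same helper would be called as rows_to_dicts(data['resultSets'][0]), provided the 'headers' key is present.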