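python - Beautiful Soup: trying to get the children of a div
Problem description
I'm trying to get the Overwatch League team names and scores from: https://overwatchleague.com/en-us/schedule?stage=regular_season&week=1
What I need to do is scrape a series of children from the children of a larger div.
So far I have: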
# bs is the BeautifulSoup object for the schedule page
match_schedule = []
matches = bs.find_all('div', {'class': 'schedule-boardstyles__ContainerCards-j4x5cc-8 jcvNlt'})
for match in matches:
    rows = match.find_all('div', {'class': 'schedule-boardstyles__ContainerMatchCard-j4x5cc-9 esCuul match-cardstyles__Container-sc-1rgscfz-0 doBeIs'})
    print("here")
    for row in rows:
        print('here2')
        team1 = row.find('p', {'class': 'match-cardstyles__MiddleText-sc-1rgscfz-12 hueupq'})
        score1 = row.find('p', {'class': 'match-cardstyles__ScoreText-sc-1rgscfz-23 gOtrSB'})
        score2 = row.find('p', {'class': 'match-cardstyles__ScoreText-sc-1rgscfz-23 jRejaZ'})
        team2 = row.find('p', {'class': 'match-cardstyles__MiddleText-sc-1rgscfz-12 cLYgmY'})
        temp = 'team_1:{}, score:{}-{}, team_2:{}'.format(team1.text, score1.text, score2.text, team2.text)
        print(temp)
        match_schedule.append(temp)
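But it doesn't return anything, not even from the initial matches search. What am I doing wrong?
Solution
The information is generated dynamically by JavaScript, so normally a browser would be needed to build it. However, it can also be extracted from the site's API in two steps. First, fetch the main page to determine the required schedule ID. That ID can then be used to request the relevant matches. The information is returned as JSON.
For example: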
import requests
from bs4 import BeautifulSoup
import json

url = "https://overwatchleague.com/en-us/schedule?stage=regular_season&week=1"
session = requests.Session()

# Step 1: load the page and read the embedded __NEXT_DATA__ JSON to find the schedule ID
r_main = session.get(url)
soup = BeautifulSoup(r_main.content, "html.parser")
js = soup.find('script', id="__NEXT_DATA__")
data_main = json.loads(js.string)
schedule = data_main['props']['pageProps']['blocks'][2]['schedule']['uid']

# Step 2: request the week's matches from the API that the page itself calls
headers = {
    "Referer": "https://overwatchleague.com/",
    "x-origin": "overwatchleague.com",
    "Origin": "https://overwatchleague.com",
    "DNT": "1",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
}
r_schedule = session.get(f'https://wzavfvwgfk.execute-api.us-east-2.amazonaws.com/production/v2/content-types/schedule/{schedule}/week/1?locale=en-us', headers=headers)
data_schedule = r_schedule.json()

# Build one (team 1, score 1, team 2, score 2) tuple per match and print it as a row
matches = []
for match in data_schedule['data']['tableData']['events'][0]['matches']:
    competitors = [c['name'] for c in match['competitors']]
    scores = match['scores']
    row = (competitors[0], scores[0], competitors[1], scores[1])
    matches.append(row)
    print(f"{row[0]:25} {row[1]:2} {row[2]:25} {row[3]}")
This gives you:
Houston Outlaws 3 Dallas Fuel 2
Los Angeles Gladiators 1 San Francisco Shock 3
Guangzhou Charge 0 Shanghai Dragons 3
Los Angeles Valiant 1 Chengdu Hunters 3
Philadelphia Fusion 3 Seoul Dynasty 1
Toronto Defiant 3 Vancouver Titans 1
Atlanta Reign 1 Florida Mayhem 3
Dallas Fuel 3 Los Angeles Gladiators 1
Guangzhou Charge 0 Seoul Dynasty 3
Chengdu Hunters 3 Shanghai Dragons 0
Philadelphia Fusion 3 Los Angeles Valiant 0
Houston Outlaws 3 San Francisco Shock 2
Florida Mayhem 3 Vancouver Titans 1
Toronto Defiant 3 Atlanta Reign 2
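The `blocks[2]` index in the script above is tied to the page's current layout, so it will break if a block is added or reordered. A minimal sketch of a less position-dependent lookup follows. It assumes, as the script does, that `data_main` has a `props` → `pageProps` → `blocks` list and that the wanted block holds a `schedule` dict with a `uid`. The helper `find_schedule_uid` and the sample data are hypothetical.

```python
def find_schedule_uid(data_main):
    """Return the first schedule uid found in pageProps.blocks, or None.

    Hypothetical helper: assumes the props -> pageProps -> blocks layout
    the script above relies on, but scans every block instead of blocks[2].
    """
    blocks = data_main.get("props", {}).get("pageProps", {}).get("blocks", [])
    for block in blocks:
        schedule = block.get("schedule") if isinstance(block, dict) else None
        if isinstance(schedule, dict) and "uid" in schedule:
            return schedule["uid"]
    return None


# Hypothetical, heavily trimmed stand-in for the real __NEXT_DATA__ JSON.
sample = {
    "props": {
        "pageProps": {
            "blocks": [
                {"header": {}},
                {"hero": {}},
                {"schedule": {"uid": "example-uid-123"}},
            ]
        }
    }
}

print(find_schedule_uid(sample))
```

In the script, `schedule = find_schedule_uid(data_main)` could stand in for the indexed lookup. A `None` result is a clear signal that the page layout has changed, instead of an `IndexError` or `KeyError` from deep inside the expression.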
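I strongly recommend printing out the JSON, e.g. data_schedule, to get a better idea of all the information that is returned. The other details in the script were found by using the browser's developer tools to see which requests are made when the page loads.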
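Because `data_schedule` is deeply nested, listing the path to each leaf value can be quicker than scrolling through a full dump. A small sketch follows. The `key_paths` helper and the sample dict are hypothetical; the sample only mirrors the keys the script reads. For a plain indented dump, `json.dumps(data_schedule, indent=2)` also works.

```python
import json


def key_paths(obj, prefix="", max_items=2):
    """Yield 'path = value' strings for every leaf in nested dicts/lists.

    Only the first `max_items` elements of each list are walked, so long
    arrays such as a list of matches do not flood the output. Empty dicts
    and lists produce no lines.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from key_paths(value, f"{prefix}['{key}']", max_items)
    elif isinstance(obj, list):
        for index, value in enumerate(obj[:max_items]):
            yield from key_paths(value, f"{prefix}[{index}]", max_items)
    else:
        yield f"{prefix} = {obj!r}"


# Hypothetical sample shaped like the keys the script reads.
sample = {
    "data": {
        "tableData": {
            "events": [
                {"matches": [
                    {"competitors": [{"name": "Team A"}, {"name": "Team B"}],
                     "scores": [3, 2]},
                ]}
            ]
        }
    }
}

for path in key_paths(sample):
    print(path)

# A full, indented dump of the same data:
print(json.dumps(sample, indent=2))
```

Each printed path can be pasted directly after a variable name (for example `data_schedule['data']['tableData']...`) to pull out that value.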