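python - How do I use bs4 to find a div nested inside another div?
Problem description
I'm trying to build a Python script that scrapes live scores from the UEFA website, but I can't find the element that contains the match score, because it sits inside another div.
Here is the code: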
from datetime import date
import requests
from bs4 import BeautifulSoup
today = date.today()
d = today.strftime("%Y-%m-%d")
page = requests.get("https://www.uefa.com/livescores/?date=" + d)
soup = BeautifulSoup(page.content, "html.parser")
matches_list = soup.find_all("div", class_="matches-list")
print(matches_list)
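I'd like to know whether I can search for that element directly from the top of the document, without stepping down through three levels of divs.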
Solution
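On the literal question: BeautifulSoup's find, find_all and select search all descendants by default, so a nested div can be matched from the top of the tree without walking each level. Here is a minimal sketch on made-up markup; the class names match-row, match-info and score are hypothetical and are not taken from the UEFA page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the score sits three levels below the outer div.
html = """
<div class="matches-list">
  <div class="match-row">
    <div class="match-info">
      <div class="score">2-1</div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() searches every descendant by default (recursive=True).
by_find = soup.find("div", class_="score")

# A CSS descendant selector expresses "a score div somewhere inside matches-list".
by_select = soup.select_one("div.matches-list div.score")

print(by_find.get_text(strip=True))
print(by_select.get_text(strip=True))
```

This only helps when the element is present in the HTML that requests receives, which is why the rest of this answer goes to the API instead.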
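The site loads its data through an API call:
GET https://match.uefa.com/v2/matches
with query parameters for the date range, pagination, and competition IDs.
The call requires an API key that is embedded in a JavaScript tag on the page. One approach is to extract this key with a regular expression and then pass it to requests when making the call: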
from datetime import date
import re

import requests

today = date.today()
d = today.strftime("%Y-%m-%d")

# Fetch the live-scores page; the API key is embedded in one of its scripts.
r = requests.get("https://www.uefa.com/livescores/?date=" + d)
reg = re.search("apiKey.*['\"](.*)['\"]", r.text)
if reg is None:
    raise RuntimeError("apiKey not found in the page source")
apiKey = reg.group(1)

# Call the matches API, sending the extracted key as a header.
r = requests.get(
    "https://match.uefa.com/v2/matches",
    params={
        "fromDate": d,
        "toDate": d,
        "order": "ASC",
        "offset": 0,
        "limit": 100,
        "competitionId": "18,39,14,27,38,22,19,2014,2017,5,28,9,1,13,3,2018,101,17,2008,23",
    },
    headers={"x-api-key": apiKey},
)
result = r.json()

# Keep only the fields of interest; score and winner may be absent before a match ends.
data = [
    {
        "awayTeam": t["awayTeam"]["internationalName"],
        "homeTeam": t["homeTeam"]["internationalName"],
        "datetime": t["kickOffTime"]["dateTime"],
        "score": t["score"]["total"] if t.get("score") else {},
        "winner": {
            "reason": t["winner"]["match"]["reason"],
            "team": t["winner"]["match"]["team"]["internationalName"]
            if t["winner"]["match"].get("team") else "",
        } if t.get("winner") else {},
    }
    for t in result
]
print(data)
This prints the match information, including the score when one is available at that point:
[{
'awayTeam': 'Turkey',
'homeTeam': 'Switzerland',
'datetime': '2021-06-20T16:00:00Z',
'score': {},
'winner': {}
}, {
'awayTeam': 'Wales',
'homeTeam': 'Italy',
'datetime': '2021-06-20T16:00:00Z',
'score': {},
'winner': {}
}]
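The key-extraction step can be tried in isolation. The sketch below runs against a made-up script snippet rather than the real page, so the sample text, the extract_api_key helper, and the stricter pattern are all assumptions for illustration. The pattern stops at the first closing quote, which avoids the greedy `.*` picking up a later quoted value when several appear on the same line.

```python
import re

# Hypothetical page fragment; the real UEFA markup may look different.
sample = 'window.config = { apiKey: "abc123", locale: "en" };'

# Match apiKey, an optional closing quote of the key name, ":" or "=",
# then capture everything up to the next quote character.
API_KEY_PATTERN = re.compile(r"apiKey['\"]?\s*[:=]\s*['\"]([^'\"]+)['\"]")


def extract_api_key(text):
    """Return the first apiKey value found in text, or None if there is none."""
    match = API_KEY_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_api_key(sample))
print(extract_api_key("no key in this text"))
```

Returning None instead of raising leaves it to the caller to decide what to do when the page layout changes and the key can no longer be found.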
Edit
It turns out you don't even need the API key, which makes things simpler:
from datetime import date

import requests

today = date.today()
d = today.strftime("%Y-%m-%d")

r = requests.get(
    "https://match.uefa.com/v2/matches",
    params={
        "fromDate": d,
        "toDate": d,
        "order": "ASC",
        "offset": 0,
        "limit": 100,
        "competitionId": "18,39,14,27,38,22,19,2014,2017,5,28,9,1,13,3,2018,101,17,2008,23",
    },
)
result = r.json()

data = [
    {
        "awayTeam": t["awayTeam"]["internationalName"],
        "homeTeam": t["homeTeam"]["internationalName"],
        "datetime": t["kickOffTime"]["dateTime"],
        "score": t["score"]["total"] if t.get("score") else {},
        "winner": {
            "reason": t["winner"]["match"]["reason"],
            "team": t["winner"]["match"]["team"]["internationalName"]
            if t["winner"]["match"].get("team") else "",
        } if t.get("winner") else {},
    }
    for t in result
]
print(data)
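The field-picking logic in the comprehension can also be pulled into a small function and checked offline. This is a sketch only: summarize_match is a hypothetical helper name, and the two sample records are made up to mirror the keys the code above reads. The shape of score["total"] and the reason string are assumptions, not values taken from the API.

```python
def summarize_match(t):
    """Reduce one match record to the fields used above, tolerating missing keys."""
    winner = (t.get("winner") or {}).get("match") or {}
    return {
        "awayTeam": t["awayTeam"]["internationalName"],
        "homeTeam": t["homeTeam"]["internationalName"],
        "datetime": t["kickOffTime"]["dateTime"],
        "score": (t.get("score") or {}).get("total", {}),
        "winner": {
            "reason": winner.get("reason", ""),
            "team": (winner.get("team") or {}).get("internationalName", ""),
        } if winner else {},
    }


# Made-up record for a match that has not been played yet (no score, no winner).
upcoming = {
    "awayTeam": {"internationalName": "Wales"},
    "homeTeam": {"internationalName": "Italy"},
    "kickOffTime": {"dateTime": "2021-06-20T16:00:00Z"},
}

# Made-up record for a finished match; the score layout and reason are assumed.
finished = dict(
    upcoming,
    score={"total": {"home": 1, "away": 0}},
    winner={"match": {"reason": "WIN_REGULAR", "team": {"internationalName": "Italy"}}},
)

print(summarize_match(upcoming))
print(summarize_match(finished))
```

With a helper like this, the comprehension becomes data = [summarize_match(t) for t in result].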