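python - BeautifulSoup does not retrieve all of the HTML
Question
I am trying to scrape the player statistics for this match: "https://siege.gg/matches/5694-invitational-intl-faze-clan-vs-team-liquid", but my code does not seem to retrieve all of the HTML. Can someone help me?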
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
url = "https://siege.gg/matches/5694-invitational-intl-faze-clan-vs-team-liquid"
match_page = requests.get(url, headers=headers)
match_soup = BeautifulSoup(match_page.content, features="lxml")
# Look up the stats table wrapper by its id
all_stats_soup = match_soup.find(id="DataTables_Table_0_wrapper")
This part of the HTML does not appear in "match_soup", so match_soup.find returns None.
Answer
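The data is stored in a JavaScript variable, so it is not part of the parsed HTML tree. You can use the re module to extract it.
This example parses the table data into a pandas DataFrame: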
import re
import requests
import pandas as pd
from io import StringIO
url = "https://siege.gg/matches/5694-invitational-intl-faze-clan-vs-team-liquid"
html_doc = requests.get(url).text
df = pd.read_html(StringIO(re.search(r"var a = `(.*)`", html_doc).group(1)))[0]
print(df)
Prints:
   Unnamed: 0  Rating    K-D (+/-)  Entry (+/-)  KOST   KPR  SRV  1vX  Plant  HS%       Atk     Def  Team
0   cameram4n    0.74  16-27 (-11)     1-4 (-3)   56%  0.44  25%    1      0  47%      Iana    Mute    50
1     muringa    0.83   15-20 (-5)     1-3 (-2)   58%  0.42  44%    0      1  67%  Thatcher   Smoke    19
2       Astro    1.03   24-23 (+1)     2-3 (-1)   56%  0.67  36%    2      3  50%       Ace    Kaid    50
3     NESKWGA    1.20  35-25 (+10)     5-5 (+0)   58%  0.97  31%    0      1  56%    Hibana   Jager    19
4     Bullet1    0.84   22-29 (-7)     5-7 (-2)   53%  0.61  19%    0      1  32%       Ash   Jager    50
5        psk1    0.83   16-23 (-7)     2-6 (-4)   61%  0.44  36%    0      1  31%     Nomad    Mute    19
6   xS3xyCake    1.13   27-23 (+4)     5-1 (+4)   78%  0.75  36%    0      3  50%  Maverick    Echo    19
7       Cyber    0.90   25-28 (-3)     4-4 (+0)   56%  0.69  22%    0      0  36%    Sledge   Smoke    50
8       Paluh    1.47  42-21 (+21)     6-2 (+4)   72%  1.17  42%    3      0  72%    Sledge  Melusi    19
9      soulz1    0.88   24-29 (-5)     5-1 (+4)   58%  0.67  19%    0      1  52%  Maverick    Echo    50
Or with bs4:
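from bs4 import BeautifulSoup

# html_doc is the page source fetched with requests in the previous example
soup = BeautifulSoup(
    re.search(r"var a = `(.*)`", html_doc).group(1), "html.parser"
)
# Print each table row with its cells separated by tabs
for tr in soup.select("tr"):
    print(*tr.get_text(strip=True, separator="|").split("|"), sep="\t")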
Prints:
Rating K-D (+/-) Entry (+/-) KOST KPR SRV 1vX Plant HS% Atk Def Team
cameram4n 0.74 16-27 (-11) 1-4 (-3) 56% 0.44 25% 1 0 47% Iana Mute 50
muringa 0.83 15-20 (-5) 1-3 (-2) 58% 0.42 44% 0 1 67% Thatcher Smoke 19
Astro 1.03 24-23 (+1) 2-3 (-1) 56% 0.67 36% 2 3 50% Ace Kaid 50
NESKWGA 1.20 35-25 (+10) 5-5 (+0) 58% 0.97 31% 0 1 56% Hibana Jager 19
Bullet1 0.84 22-29 (-7) 5-7 (-2) 53% 0.61 19% 0 1 32% Ash Jager 50
psk1 0.83 16-23 (-7) 2-6 (-4) 61% 0.44 36% 0 1 31% Nomad Mute 19
xS3xyCake 1.13 27-23 (+4) 5-1 (+4) 78% 0.75 36% 0 3 50% Maverick Echo 19
Cyber 0.90 25-28 (-3) 4-4 (+0) 56% 0.69 22% 0 0 36% Sledge Smoke 50
Paluh 1.47 42-21 (+21) 6-2 (+4) 72% 1.17 42% 3 0 72% Sledge Melusi 19
soulz1 0.88 24-29 (-5) 5-1 (+4) 58% 0.67 19% 0 1 52% Maverick Echo 50
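Both snippets above call .group(1) directly on the result of re.search, which raises AttributeError if the page layout changes and the pattern no longer matches. The following is a minimal sketch of a more defensive extraction step, not the site's actual structure: the helper name extract_template_literal and the sample_page string are made up for illustration, and the sketch assumes the table HTML sits in a backtick-delimited JavaScript template literal that contains no backticks of its own.

```python
import re


# Hypothetical helper: the function name is an assumption; the default
# variable name "a" is taken from the regex used in the answer above.
def extract_template_literal(html_doc, var_name="a"):
    """Return the text between the backticks of `var <name> = `...``, or None."""
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*`(.*?)`"
    # re.DOTALL lets "." match newlines in case the literal spans several lines
    match = re.search(pattern, html_doc, flags=re.DOTALL)
    return match.group(1) if match else None


# Made-up page fragment standing in for the real response body
sample_page = """
<script>
var a = `<table>
<tr><th></th><th>Rating</th></tr>
<tr><td>Paluh</td><td>1.47</td></tr>
</table>`;
</script>
"""

table_html = extract_template_literal(sample_page)
missing = extract_template_literal("<html><body>no script here</body></html>")

if table_html is None:
    print("table variable not found")
else:
    print(table_html)
```

When the helper returns a string, it can be passed to pd.read_html(StringIO(...)) or to BeautifulSoup exactly as in the examples above; when it returns None, the caller can report the problem instead of crashing.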