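python - How do I parse content inside BeautifulSoup comments?
Problem description
I am using Python's BeautifulSoup to do some data mining on football statistics. Some tables cause problems when I try to filter the soup. On closer inspection, the data I need appears to be wrapped in an HTML comment, even though it does not look that way when I view the page in the browser's web developer tools.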
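import requests
from bs4 import BeautifulSoup, Comment
url = 'https://aws.pro-football-reference.com/teams/mia/2000.htm'
# requests is a module, so the page has to be fetched with requests.get()
page = requests.get(url)
# pass the response body (not the Response object) to the parser
soup = BeautifulSoup(page.content, 'html.parser')
table = soup.find(id='all_passing')
print(table)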
Here is a sample of what is printed.
<div class="table" id="all_passing"> <div class="placeholder"></div> <!-- <div class="table_outer_container">
<div class="overthrow table_container" id="div_passing">
<table class="sortable stats_table" id="passing"> <caption>Passing Table</caption> <colgroup><col><col><col></colgroup> <thead>
<tr>
<th aria-label="Uniform number" data-stat="uniform_number" scope="col">No.</th>
<th aria-label="Player's age" data-stat="age" scope="col">Age</th> <th aria-label="Position" data-stat="pos" scope="col">Pos</th>
</tr>
</thead> <tbody> <tr ><th scope="row" class="right " data-stat="uniform_number" >9</th><td class="right " data-stat="age"
>29</td><td class="left " data-stat="pos" >QB</td></tr> </tbody> </table>
</div> </div>
--> <div class="placeholder"></div> </div>
How can I filter out the comment? Here is what I tried.
comments = table.find_all(text=lambda text: isinstance(text, Comment))
rows = comments[0].find_all('tr')
print(rows)
This prints:
None
Solution
You can iterate over the tag's .contents and check whether each item is a Comment. You can then load the comment into a new soup:
import requests
from bs4 import BeautifulSoup, Comment
url = 'https://aws.pro-football-reference.com/teams/mia/2000.htm'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# find the comment inside the #all_passing container:
for c in soup.select_one('#all_passing').contents:
    if isinstance(c, Comment):
        break
# parse the comment text as HTML in its own soup:
all_passing = BeautifulSoup(c, 'html.parser')
# print some data to screen:
for tr in all_passing.select('tr'):
    print(tr.get_text(strip=True, separator='\t'))
Prints:
No. Player Age Pos G GS QBrec Cmp Att Cmp% Yds TD TD% Int Int% Lng Y/A AY/A Y/C Y/G Rate QBR Sk Yds NY/A ANY/A Sk% 4QC GWD
9 Jay Fiedler 29 QB 15 15 10-5-0 204 357 57.1 2402 14 3.9 14 3.9 61 6.7 5.7 11.8 160.1 74.5 23 129 5.98 5.06 6.11 1
11 Damon Huard 27 qb 16 1 1-0-0 39 63 61.9 318 1 1.6 3 4.8 29 5.0 3.2 8.2 19.9 60.2 4 22 4.42 2.70 6.01 1
26 Lamar Smith 30 RB 15 15 0 1 0.0 0 0 0.0 0 0.0 0 0.0 0.0 0.0 39.6 0 0 0.00 0.00 0.0
34 Thurman Thomas 34 9 0 0 0 0 0 0 0 0.0 1 2 -2.00 -2.00 100.0
Team Total 27.3 16 11-5-0 243 421 57.7 2720 15 3.6 17 4.0 61 6.5 5.4 11.2 170.0 72.2 28 153 5.72 4.68 6.2 2 2
Opp Total 16 282 530 53.2 3170 13 2.5 28 5.3 6.0 4.09 11.2 198.1 57.5 48 270 5.0 3.3 8.3
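The attempt in the question fails because a Comment is a string node: the markup inside it is not parsed into tags until it is passed through BeautifulSoup again. The sketch below shows the same re-parsing idea without a network request. SAMPLE_HTML is a trimmed-down stand-in for the markup printed in the question, and rows_from_comments is a hypothetical helper name introduced here for illustration, not part of bs4.

```python
from bs4 import BeautifulSoup, Comment

# Assumption: a shortened copy of the markup from the question,
# used so the example runs offline.
SAMPLE_HTML = """
<div class="table" id="all_passing">
  <div class="placeholder"></div>
  <!--
  <table id="passing">
    <thead><tr><th>No.</th><th>Age</th><th>Pos</th></tr></thead>
    <tbody><tr><th>9</th><td>29</td><td>QB</td></tr></tbody>
  </table>
  -->
  <div class="placeholder"></div>
</div>
"""


def rows_from_comments(container):
    """Hypothetical helper: re-parse each comment under `container`
    and return the cell text of every <tr> found inside it."""
    rows = []
    # Select only the string nodes that are Comment instances.
    for comment in container.find_all(string=lambda s: isinstance(s, Comment)):
        # The comment body is plain text, so parse it as HTML again.
        inner = BeautifulSoup(comment, "html.parser")
        for tr in inner.find_all("tr"):
            cells = tr.find_all(["th", "td"])
            rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = rows_from_comments(soup.find(id="all_passing"))
print(rows)
```

Unlike the loop with break, this version collects rows from every comment under the container and returns an empty list when there is none, which may be easier to handle if some pages lack the commented-out table.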