Extract integer from HTML file with BS4

Question

I am trying to use BeautifulSoup to extract the integer (0) from the div inside the row with the class 'high' and store it in a variable:

[<tr class="high">
<td>
<div>
<a href="#*_high">High</a>
</div>
</td>
<td style="text-align: center;">
<div>0</div>
</td>
</tr>]

I am able to extract the section above from the html file using

high = soup.find_all(class_="high")

However, every attempt to narrow it down to just the integer comes back empty. For example, this prints None:

div = soup.find("div", class_= "High")
print(div)

Any help would be greatly appreciated!

Tags: python, beautifulsoup

Answer


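Your find("div", class_="High") call returns None because the class "high" is on the <tr>, not on the <div>. Class matching is also case-sensitive, so "High" would not match "high" anyway. Instead, first find the <tr class='high'> tag. From there, find the two <td> elements; the second one contains the data you want. For example: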

from bs4 import BeautifulSoup

html = """<tr class="high">
<td>
<div>
<a href="#*_high">High</a>
</div>
</td>
<td style="text-align: center;">
<div>0</div>
</td>
</tr>"""

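soup = BeautifulSoup(html, "html.parser")
# Locate the row first: the class "high" belongs to the <tr>, not to any <div>
tr = soup.find('tr', class_="high")
# The second <td> (index 1) holds the number; strip whitespace before converting
data = int(tr.find_all('td')[1].get_text(strip=True))

print(data)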
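If you prefer CSS selectors, the same lookup can be written with select_one(), which recent versions of bs4 support through the soupsieve package. The sketch below assumes the HTML has exactly the structure shown in the question. It also shows how to continue from the high list that the question already builds with find_all(). The names cell, value, last_div and value_from_list are illustrative only.

```python
from bs4 import BeautifulSoup

html = """<tr class="high">
<td>
<div>
<a href="#*_high">High</a>
</div>
</td>
<td style="text-align: center;">
<div>0</div>
</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

# Variant 1: CSS selector for the <div> inside the second <td> of the row
# whose class is "high". select_one() returns None when nothing matches,
# so check before converting.
cell = soup.select_one("tr.high td:nth-of-type(2) div")
value = int(cell.get_text(strip=True)) if cell is not None else None
print(value)

# Variant 2: reuse the list returned by find_all(class_="high") in the question.
# high[0] is the <tr>; its last <div> is the one holding the number
# (this assumes the row layout shown above).
high = soup.find_all(class_="high")
last_div = high[0].find_all("div")[-1]
value_from_list = int(last_div.get_text(strip=True))
print(value_from_list)
```

The selector version expresses the whole path in one string, while the find_all() version relies on the number being in the row's last <div>; if the table layout can change, the explicit "second <td>" lookup from the answer is the safer choice.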
