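html - How to get the content of span elements whose class has a specific suffix?
Question
I admit I'm really not good at scraping web pages with bs4, so here is the problem I'm facing. I have this HTML file, and all I want is to get the number from every span whose class ends with the suffix -confirmed-vn: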
....
<div class="board-content--left">
<div class="board-detail">
<div class="board-col col1 text-blue">
Cases<br>
<div style="margin-right:10px;">
<span class="live-confirmed-vn">15115 </span>
<span class="plus-confirmed-vn">+578</span>
</div>
</div>
<div class="board-col col2">
<div class="board-col-child">
Recovered:
<span class="live-recovered-vn"> 5949</span>
<span class="plus-recovered-vn">+0</span>
</div>
<div class="board-col-child">
Deaths:
<span class="live-death-vn"> 74 </span>
<span class="plus-death-vn"></span>
</div>
</div>
</div>
</div>
This is what I'm doing right now:
import re
import requests
from bs4 import BeautifulSoup
url = "https://thanhnien.vn/e-magazine/toan-canh-covid-19-tin-tuc-so-lieu-phan-tich-1265104.html"
# url contains html that contain structure above
req = requests.get(url)
soup = BeautifulSoup(req.text,features="html.parser")
test = soup.find_all('span', class_=re.compile(r'.+-confirmed-vn'))
print(test)
Output:
[<span class="live-confirmed-vn"></span>, <span class="plus-confirmed-vn"></span>, <span class="live-confirmed-vn text-red"></span>, <span class="live-confirmed-vn"></span>, <span class="live-confirmed-vn text-red"></span>, <span class="live-confirmed-vn"></span>]
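For comparison, here is a minimal sketch of the question's own find_all approach run against a static copy of the snippet (trimmed to a few spans, which is an assumption for brevity). BeautifulSoup tests a compiled regex against each individual class value, so anchoring the pattern with $ is enough to select the -confirmed-vn spans. Because the same approach returns text here, the empty spans in the output above suggest that the live page fills the numbers in with JavaScript after loading, so the HTML fetched by requests only contains empty tags. That is an inference, not something the question confirms.

```python
import re
from bs4 import BeautifulSoup

# Static copy of part of the markup from the question (trimmed).
html = """
<div style="margin-right:10px;">
<span class="live-confirmed-vn">15115 </span>
<span class="plus-confirmed-vn">+578</span>
</div>
<span class="live-recovered-vn"> 5949</span>
"""

soup = BeautifulSoup(html, "html.parser")

# A compiled regex passed as class_ is searched against each class value,
# so "$" restricts it to classes that end in "-confirmed-vn".
spans = soup.find_all("span", class_=re.compile(r"-confirmed-vn$"))

# get_text(strip=True) drops the surrounding whitespace.
texts = [s.get_text(strip=True) for s in spans]
print(texts)  # ['15115', '+578']
```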
Answer
from bs4 import BeautifulSoup
html = """<div class="board-content--left">
<div class="board-detail">
<div class="board-col col1 text-blue">
Cases<br>
<div style="margin-right:10px;">
<span class="live-confirmed-vn">15115 </span>
<span class="plus-confirmed-vn">+578</span>
</div>
</div>
<div class="board-col col2">
<div class="board-col-child">
Recovered:
<span class="live-recovered-vn"> 5949</span>
<span class="plus-recovered-vn">+0</span>
</div>
<div class="board-col-child">
Deaths:
<span class="live-death-vn"> 74 </span>
<span class="plus-death-vn"></span>
</div>
</div>
</div>
</div>"""
soup = BeautifulSoup(html, 'lxml')
print([x.text for x in soup.select('span[class$=confirmed-vn]')])
Output:
['15115 ', '+578']
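One caveat with span[class$=confirmed-vn] is that the selector compares against the whole class attribute string. A span such as class="live-confirmed-vn text-red", which appears in the question's output, ends in "text-red" and is therefore not selected. The sketch below uses hypothetical markup (including the placeholder value 123) and a hypothetical helper, has_confirmed_class. The helper checks each class value individually and then converts the text to integers.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the last confirmed span carries an extra class,
# like the "live-confirmed-vn text-red" spans in the question's output.
html = """
<span class="live-confirmed-vn">15115 </span>
<span class="plus-confirmed-vn">+578</span>
<span class="live-confirmed-vn text-red">123</span>
<span class="live-death-vn"> 74 </span>
"""

soup = BeautifulSoup(html, "html.parser")

# [class$=...] tests the full attribute value, so the span whose
# class attribute ends in "text-red" is missed.
by_css = [s.get_text(strip=True) for s in soup.select("span[class$=confirmed-vn]")]

def has_confirmed_class(tag):
    # Hypothetical helper: true if ANY single class value ends with the suffix.
    return tag.name == "span" and any(
        c.endswith("-confirmed-vn") for c in tag.get("class", [])
    )

by_func = [s.get_text(strip=True) for s in soup.find_all(has_confirmed_class)]

# int() accepts a leading "+"; commas are stripped defensively and
# empty spans are skipped.
numbers = [int(t.replace(",", "")) for t in by_func if t]

print(by_css)   # ['15115', '+578']
print(by_func)  # ['15115', '+578', '123']
print(numbers)  # [15115, 578, 123]
```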