How do I get the content of span elements whose class has a specific suffix?

Problem description

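I admit I'm really not good at scraping web pages with bs4, so here is the problem I'm facing. I have this HTML file, and all I want is to get the number from every span whose class has the suffix -confirmed-vn.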

....
<div class="board-content--left">
    <div class="board-detail">
        <div class="board-col col1 text-blue">
            Cases<br>
            <div style="margin-right:10px;">
                <span class="live-confirmed-vn">15115 </span>
                <span class="plus-confirmed-vn">+578</span>
            </div>

        </div>
        <div class="board-col col2">
            <div class="board-col-child">
                Recovered:
                <span class="live-recovered-vn"> 5949</span>
                <span class="plus-recovered-vn">+0</span>
            </div>
            <div class="board-col-child">
                Deaths:
                <span class="live-death-vn"> 74 </span>
                <span class="plus-death-vn"></span>
            </div>
        </div>
    </div>
    
</div>

This is what I'm doing right now:

import re
import requests
from bs4 import BeautifulSoup

url = "https://thanhnien.vn/e-magazine/toan-canh-covid-19-tin-tuc-so-lieu-phan-tich-1265104.html"
# The page at this URL contains the HTML structure shown above
req = requests.get(url)
soup = BeautifulSoup(req.text, features="html.parser")
# Find every span that has a class matching <prefix>-confirmed-vn
test = soup.find_all('span', class_=re.compile(r'.+-confirmed-vn'))
print(test)

[<span class="live-confirmed-vn"></span>, <span class="plus-confirmed-vn"></span>, <span class="live-confirmed-vn text-red"></span>, <span class="live-confirmed-vn"></span>, <span class="live-confirmed-vn text-red"></span>, <span class="live-confirmed-vn"></span>]

Tags: html, python-3.x, beautifulsoup

Solution


from bs4 import BeautifulSoup
html = """<div class="board-content--left">
    <div class="board-detail">
        <div class="board-col col1 text-blue">
            Cases<br>
            <div style="margin-right:10px;">
                <span class="live-confirmed-vn">15115 </span>
                <span class="plus-confirmed-vn">+578</span>
            </div>

        </div>
        <div class="board-col col2">
            <div class="board-col-child">
                Recovered:
                <span class="live-recovered-vn"> 5949</span>
                <span class="plus-recovered-vn">+0</span>
            </div>
            <div class="board-col-child">
                Deaths:
                <span class="live-death-vn"> 74 </span>
                <span class="plus-death-vn"></span>
            </div>
        </div>
    </div>
    
</div>"""


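# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(html, 'lxml')

# [class$=...] selects spans whose class attribute value ends with confirmed-vn
print([x.text for x in soup.select('span[class$=confirmed-vn]')])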

Output:

['15115 ', '+578']
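
One caveat: `[class$=...]` compares against the whole class attribute string, so a span such as `<span class="live-confirmed-vn text-red">` (which appears in the question's printed output) would not be selected, because its attribute ends in `text-red`. The sketch below is one possible way around that, not the only one: it passes a function to `class_`, which Beautiful Soup applies to each class token. The sample markup, the `has_suffix` filter, and the `to_int` helper are illustrative names made up for this example, and it uses the built-in `html.parser` so that lxml is not needed.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modeled on the question. The second span carries an extra
# class, like "live-confirmed-vn text-red" in the question's printed output.
html = """
<div>
    <span class="live-confirmed-vn">15115 </span>
    <span class="live-confirmed-vn text-red">15115 </span>
    <span class="plus-confirmed-vn">+578</span>
    <span class="live-recovered-vn"> 5949</span>
    <span class="plus-death-vn"></span>
</div>
"""

SUFFIX = "-confirmed-vn"


def has_suffix(css_class):
    # Hypothetical filter: Beautiful Soup passes each class token here,
    # or None when the tag has no class attribute.
    return css_class is not None and css_class.endswith(SUFFIX)


def to_int(text):
    # Hypothetical helper: pick out an optional sign plus digits and
    # convert them; return None when the span holds no number.
    match = re.search(r"[+-]?\d+", text)
    return int(match.group()) if match else None


soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", class_=has_suffix)
values = [to_int(span.get_text()) for span in spans]
print(values)
```

Because the filter looks at individual class tokens, the order of classes on the span does not matter. Spans that are empty in the downloaded HTML, as in the question's output, would come back as `None` from `to_int`.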
