How do I extract only the price text from this HTML with bs4 in Python?

Question

I'm building a web scraper and can't extract just the price from this page. Python is also pulling in the $550. I'm only looking for the $41,991. The HTML is below.

<div class="snapshot__body-content">
              <div class="snapshot__col1">
               <ul class="snapshot__details list-unstyled">
                <li class="snapshot__details-price">
                 <sup>
                  $
                 </sup>
                 41,991
                 <!-- -->
                 <a class="btn-link snapshot__details-monthly hidden-xs hidden-sm" href="/vehicle/details/73082384">
                  <sup>
                   $
                  </sup>
                  <span>
                   550
                  </span>
                  /mo*
                 </a>

Here is my current bs4 code.

try:
    data["Price"] = item.find_all("li", {"class":"snapshot__details-price"})[0].text.replace("/mo*","")
except:
    data["Price"] = None

Tags: python, web-scraping, beautifulsoup

Solution


You can use the get_text() method to extract the text inside the tag, then call rsplit() on the result to cut off the monthly amount.

from bs4 import BeautifulSoup
response = """<div class="snapshot__body-content">
    <div class="snapshot__col1">
        <ul class="snapshot__details list-unstyled">
            <li class="snapshot__details-price">
                <sup>
                  $
                 </sup>
                 41,991
                 
                
                <!-- -->
                <a class="btn-link snapshot__details-monthly hidden-xs hidden-sm" href="/vehicle/details/73082384">
                    <sup>
                   $
                  </sup>
                    <span>
                   550
                  </span>
                  /mo*
                 
                
                </a>
            </li>
        </ul>
    </div>
</div>"""
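soup = BeautifulSoup(response, 'lxml')  # needs the lxml package; 'html.parser' also works

for data in soup.find_all('li',{"class":"snapshot__details-price"}):
    # get_text(strip=True) gives "$41,991$550/mo*"; splitting once from the
    # right on "$" drops the monthly amount and keeps "$41,991"
    print(data.get_text(strip=True).rsplit('$', maxsplit=1)[0])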
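If you would rather not depend on where the "$" signs fall, another option is to read only the text nodes that sit directly inside the li and ignore everything nested in child tags. This is a rough sketch, not the answer's method: the helper name direct_price is made up for illustration, the HTML is a trimmed copy of the snippet from the question, and html.parser is used here only so that lxml is not required. Note that the result has no "$", because that character lives inside the sup tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Trimmed copy of the markup from the question (assumption: real pages match it).
html = """<li class="snapshot__details-price">
  <sup>$</sup>
  41,991
  <!-- -->
  <a class="btn-link snapshot__details-monthly" href="/vehicle/details/73082384">
    <sup>$</sup><span>550</span>/mo*
  </a>
</li>"""


def direct_price(li):
    """Hypothetical helper: join the li's own text nodes, skipping comments."""
    parts = [
        str(child).strip()
        for child in li.children
        # Comment is a NavigableString subclass, so it must be excluded explicitly.
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    ]
    return "".join(parts)


soup = BeautifulSoup(html, "html.parser")
prices = [
    direct_price(li)
    for li in soup.find_all("li", class_="snapshot__details-price")
]
print(prices)
```

Because the monthly figure sits entirely inside the a tag, it never shows up among the li's direct children, so no string splitting is needed.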
