Removing a tag with .decompose() in Beautiful Soup

Problem description

I'm using Beautiful Soup to scrape a website, and I'm having trouble using decompose() to remove a <del> tag inside the section I'm scraping.

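Every product on the page has its price in a <div> with the class product-card__price. However, some products are on sale, and their <div> contains two prices. The full price is wrapped in a <del> tag (<del>$</del>) that comes before the current price.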

# Example 1 - one price
<div class="flex-split__item product-card__price">
    $11.99          
</div>
# Example 2 - two prices
<div class="flex-split__item product-card__price">
   <del>$9.99</del> 
   $8.99          
</div>

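If I simply grab the text of the div with price = container.find(class_ = 'product-card__price').text.strip(), Example #2 returns $9.99 $8.99. From reading the documentation, I thought I should be able to use decompose() to strip the text contained in <del></del> with the following code: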

if container.find(class_ = 'product-card__price'):
   if container.find('del'):
      full_price = container.find('del').text.strip()
      current_price = container.find(class_ = 'product-card__price').decompose()
   else:
      full_price = None
      price = container.find(class_ = 'product-card__price').text.strip()
else:
   price = None
   full_price = None

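However, this returns None. I could split the string with a regular expression, but I'd like to understand what I'm doing wrong with decompose/extract. An example page is here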

Tags: python, beautifulsoup

Solution


To get full_price and price, you don't have to .extract() or .decompose() the <del> tag. All it takes is a simple str.split():

import requests
from bs4 import BeautifulSoup


url = "https://gtfoitsvegan.com/shop/?v=7516fd43adaa"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for product in soup.select(".product-card"):
    prices = product.select_one(".product-card__price").text.split()
    if len(prices) == 2:
        full_price, price = prices
    else:
        full_price = "-"
        price = prices[0]

    title = product.select_one(".product-card__title").get_text(strip=True)

    print("{:<65}{:<7}{:<7}".format(title, full_price, price))

Prints:

Italian Sausage Meatballs by Hungry Planet                       -      $7.99  
Pork Gyoza by Hungry Planet                                      -      $7.99  
Asian Pork Meatballs by Hungry Planet                            -      $6.29  
Grilled and Diced Chicken by Hungry Planet                       -      $7.99  
Grilled Chicken Strips by Hungry Planet                          -      $7.99  
Crispy Fried Chicken Patties by Hungry Planet                    -      $7.99  
New England Style Crab Cakes by Hungry Planet                    -      $11.99 
Ground Beef by Hungry Planet                                     $9.99  $8.99  
Burger Patties by Hungry Planet                                  $11.99 $9.99  
Southwest Chipotle Chicken Patties by Hungry Planet              -      $11.99 
Italian Jack Sausages by Jack & Annie’s                          -      $8.69  
Apple Jack Sausages by Jack & Annie’s                            -      $8.69  
Sliced Mozzarella Soy Cheese by Tofutti                          -      $4.69  
Train Your Dragon Smoothie / Pitaya Bowl by Rollin’ n Bowlin’    -      $6.89  
Organic Mini Thyme Leaf by Simply Organic                        -      $2.49  
Organic Mini Rosemary Leaf by Simply Organic                     -      $2.49  
Organic Mini Onion Powder by Simply Organic                      -      $2.49  
Organic Mini Ground Cumin by Simply Organic                      -      $2.49  
Chick’n Pieces By Like Meat                                      -      $8.59  
BBQ Chick’n By Like Meat                                         -      $8.59  
Nuggets by Like Meat                                             -      $8.59  
Grilled Chick’n by Like Meat                                     -      $8.59  
Zalmon Sashimi 10.9oz by Vegan Zeastar                           -      $15.99 
Very Good Dog by The Very Good Butchers                          -      $7.99  
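
As for why the code in the question ends up with None: decompose() removes a tag from the tree in place and returns None, and the question's code calls it on the product-card__price div rather than on the <del> tag, so current_price is assigned None and the price itself is discarded. A minimal sketch of the decompose() route is below, assuming the two price snippets from the question; the surrounding product-card wrappers and the parse_prices helper are hypothetical names added for illustration, and the HTML is inlined so nothing is fetched from the site.

```python
from bs4 import BeautifulSoup

# The two price snippets from the question, wrapped in hypothetical
# "product-card" containers so each product can be handled separately.
html = """
<div class="product-card">
  <div class="flex-split__item product-card__price">
    $11.99
  </div>
</div>
<div class="product-card">
  <div class="flex-split__item product-card__price">
    <del>$9.99</del>
    $8.99
  </div>
</div>
"""


def parse_prices(container):
    """Return (full_price, price) for one product container (hypothetical helper)."""
    price_div = container.find(class_="product-card__price")
    if price_div is None:
        return None, None

    full_price = None
    del_tag = price_div.find("del")
    if del_tag is not None:
        # Read the struck-through price first, then remove only the <del> tag.
        full_price = del_tag.get_text(strip=True)
        # decompose() edits the tree in place and returns None,
        # so its return value is deliberately not assigned to anything.
        del_tag.decompose()

    # Whatever text is left in the div is the current price.
    return full_price, price_div.get_text(strip=True)


soup = BeautifulSoup(html, "html.parser")
results = [parse_prices(card) for card in soup.select(".product-card")]
for full_price, price in results:
    print(full_price, price)
```

The key difference from the question's code is that the text is read from the price div after the <del> tag has been removed, instead of taking the return value of decompose(). The str.split() approach above is shorter; this variant may be preferable if a price could ever contain whitespace.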
