python - Removing tags with .decompose() in Beautiful Soup
Question
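I'm using Beautiful Soup to scrape a website, and I'm having trouble using decompose() to remove a <del> tag inside the section I'm scraping.
The price of every product on the page is in a <div> with the class product-card__price. However, some products are on sale, and their <div> contains two prices. The full price is wrapped in a <del> tag (<del>$</del>) that comes before the current price.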
# Example 1 - one price
<div class="flex-split__item product-card__price">
$11.99
</div>
# Example 2 - two prices
<div class="flex-split__item product-card__price">
<del>$9.99</del>
$8.99
</div>
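If I simply grab the text of the div with price = container.find(class_ = 'product-card__price').text.strip(), Example #2 returns $9.99 $8.99. From reading the documentation, I thought I should be able to use decompose() to strip out the text inside <del></del> with the following code: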
if container.find(class_ = 'product-card__price'):
    if container.find('del'):
        full_price = container.find('del').text.strip()
        current_price = container.find(class_ = 'product-card__price').decompose()
    else:
        full_price = None
        price = container.find(class_ = 'product-card__price').text.strip()
else:
    price = None
    full_price = None
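However, this returns None. I could split the string with a regular expression, but I'd like to understand what I'm doing wrong with decompose/extract. An example page is here.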
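The symptom can be reproduced offline with the Example 2 markup. This is a minimal sketch: the outer product-card wrapper div is an assumption added only to stand in for container, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Example 2 markup from the question; the outer "product-card" div is a
# hypothetical wrapper that plays the role of `container`.
html = """
<div class="product-card">
  <div class="flex-split__item product-card__price">
    <del>$9.99</del>
    $8.99
  </div>
</div>
"""

container = BeautifulSoup(html, "html.parser").find(class_="product-card")

# Reading the text of the whole price div picks up both prices.
both = container.find(class_="product-card__price").text.split()
print(both)  # ['$9.99', '$8.99']

# decompose() destroys the tag in place and has no return value,
# so the assignment stores None rather than a tag or a string.
current_price = container.find(class_="product-card__price").decompose()
print(current_price)  # None

# The whole price div (not just <del>) has been removed from the tree.
print(container.find(class_="product-card__price"))  # None
```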
Answer
To get full_price and price you don't have to .extract()/.decompose() the <del> tag. All it takes is a simple str.split():
import requests
from bs4 import BeautifulSoup

url = "https://gtfoitsvegan.com/shop/?v=7516fd43adaa"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for product in soup.select(".product-card"):
    prices = product.select_one(".product-card__price").text.split()
    if len(prices) == 2:
        full_price, price = prices
    else:
        full_price = "-"
        price = prices[0]
    title = product.select_one(".product-card__title").get_text(strip=True)
    print("{:<65}{:<7}{:<7}".format(title, full_price, price))
Prints:
Italian Sausage Meatballs by Hungry Planet - $7.99
Pork Gyoza by Hungry Planet - $7.99
Asian Pork Meatballs by Hungry Planet - $6.29
Grilled and Diced Chicken by Hungry Planet - $7.99
Grilled Chicken Strips by Hungry Planet - $7.99
Crispy Fried Chicken Patties by Hungry Planet - $7.99
New England Style Crab Cakes by Hungry Planet - $11.99
Ground Beef by Hungry Planet $9.99 $8.99
Burger Patties by Hungry Planet $11.99 $9.99
Southwest Chipotle Chicken Patties by Hungry Planet - $11.99
Italian Jack Sausages by Jack & Annie’s - $8.69
Apple Jack Sausages by Jack & Annie’s - $8.69
Sliced Mozzarella Soy Cheese by Tofutti - $4.69
Train Your Dragon Smoothie / Pitaya Bowl by Rollin’ n Bowlin’ - $6.89
Organic Mini Thyme Leaf by Simply Organic - $2.49
Organic Mini Rosemary Leaf by Simply Organic - $2.49
Organic Mini Onion Powder by Simply Organic - $2.49
Organic Mini Ground Cumin by Simply Organic - $2.49
Chick’n Pieces By Like Meat - $8.59
BBQ Chick’n By Like Meat - $8.59
Nuggets by Like Meat - $8.59
Grilled Chick’n by Like Meat - $8.59
Zalmon Sashimi 10.9oz by Vegan Zeastar - $15.99
Very Good Dog by The Very Good Butchers - $7.99
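As for what goes wrong with decompose() in the question: it removes the tag it is called on and returns None, and there it is called on the whole price div rather than on the <del> tag. A minimal sketch of a decompose()-based variant follows, assuming the markup from the two examples in the question; parse_prices and the wrapping product-card divs are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# The two examples from the question, each wrapped in a hypothetical
# "product-card" container so the snippet runs without network access.
html = """
<div class="product-card">
  <div class="flex-split__item product-card__price">
    $11.99
  </div>
</div>
<div class="product-card">
  <div class="flex-split__item product-card__price">
    <del>$9.99</del>
    $8.99
  </div>
</div>
"""


def parse_prices(container):
    """Return (full_price, price) for one product container (illustrative helper)."""
    price_tag = container.find(class_="product-card__price")
    if price_tag is None:
        return None, None

    full_price = None
    del_tag = price_tag.find("del")
    if del_tag is not None:
        # Read the struck-through price first, then remove only the <del> tag.
        full_price = del_tag.get_text(strip=True)
        del_tag.decompose()  # modifies the tree in place; do not assign its result

    # With <del> gone, the remaining text is just the current price.
    return full_price, price_tag.get_text(strip=True)


soup = BeautifulSoup(html, "html.parser")
results = [parse_prices(card) for card in soup.select(".product-card")]
print(results)  # [(None, '$11.99'), ('$9.99', '$8.99')]
```

extract() could be used in place of decompose() here; it also removes the tag from the tree but returns the removed tag, so its text can still be read afterwards.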