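python - Web scraping: adding product names and their prices to a pandas DataFrame
Problem description
I am practicing web scraping and want to extract product names and prices into a pandas DataFrame.
This is my code: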
for web in website:
    r=re.get(web)
    soup = BeautifulSoup(r.content, 'html.parser')
    for i in soup.find_all("div", {"class":"row productspm"}):
        for name in soup.find_all("h4"):
            if name.text not in productname:
                productname.append(name.text)
        for price in soup.find_all("p",{'class':"price text-right"}):
            prices.append(price.text)
print(len(productname))
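When I extract the data I get no errors, but the DataFrame contains wrong information throughout.
First, instead of 43 products, it extracts 61 product names. Second, the prices do not match the prices shown on the website. When a product is on sale, the site uses different HTML, which causes problems for the scrape.
Here is the HTML for a product that is not on sale: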
<div class="product-layout product-grid col-lg-3 col-md-4 col-sm-6 col-xs-12">
<div class="product-thumb transition">
<div class="image"><a href="---"><img src="--" alt="BREATHING BAG 3L N-LATEX PARKER" title="BREATHING BAG 3L N-LATEX PARKER" class="img-responsive center-block"></a>
<!-- Webiarch Images Start -->
<!-- End -->
<div class="topbutton">
<button type="button" data-toggle="tooltip" title="" onclick="wishlist.add('250');" data-original-title="Add to Wish List"><svg width="20px" height="20px"><use xlink:href="#wishlist"></use></svg><span class="hidden-xs"></span></button>
<button type="button" data-toggle="tooltip" title="" onclick="compare.add('250');" class="wishcom" data-original-title="Compare this Product"><svg width="20px" height="20px"><use xlink:href="#pcom"></use></svg><span class="hidden-xs"></span></button>
<div class="bquickv" title="" data-toggle="tooltip" data-original-title="quickview"><div class="webi-ownstyle webi-quickview"><a href="#"><svg width="20px" height="20px"><use xlink:href="#pquick"></use></svg></a></div></div>
</div>
</div>
<div class="caption">
<h4><a href="---">BREATHING BAG 3L N-LATEX PARKER</a></h4>
<p class="list-des">BREATHING
BAG 3L N-LATEX PARKER..</p>
<div class="rating pull-left"> <span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
<span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
<span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
<span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
<span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
</div>
<p class="price text-right"> SAR 150</p>
<div class="clearfix"></div>
<div class="button-group">
<button type="button" onclick="cart.add('250');" class="acart">
<span>Add to Cart</span>
</button>
</div>
</div>
</div>
</div>
And this is a product that is on sale:
<div class="product-layout product-grid col-lg-3 col-md-4 col-sm-6 col-xs-12">
<div class="product-thumb transition">
<div class="image"><a href="---"><img src="---" title="Everbrite In-Office Tooth Whitening Kit (3 Patients)" class="img-responsive center-block"></a>
<!-- Webiarch Images Start -->
<!-- End -->
<span class="salep">sale</span>
<div class="topbutton">
<button type="button" data-toggle="tooltip" title="" onclick="wishlist.add('189');" data-original-title="Add to Wish List"><svg width="20px" height="20px"><use xlink:href="#wishlist"></use></svg><span class="hidden-xs"></span></button>
<button type="button" data-toggle="tooltip" title="" onclick="compare.add('189');" class="wishcom" data-original-title="Compare this Product"><svg width="20px" height="20px"><use xlink:href="#pcom"></use></svg><span class="hidden-xs"></span></button>
<div class="bquickv" title="" data-toggle="tooltip" data-original-title="quickview"><div class="webi-ownstyle webi-quickview"><a href="#"><svg width="20px" height="20px"><use xlink:href="#pquick"></use></svg></a></div></div>
</div>
</div>
<div class="caption">
<h4><a href="---">Everbrite In-Office Tooth Whitening Kit (3 Patients)</a></h4>
<p class="list-des">Everbrite In-Office Tooth Whitening Kit (3 Patients)
Used for Dentamerica Whitening System. One hour..</p>
<div class="rating pull-left"> <span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
<span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
<span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
<span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
<span class="fa fa-stack"><i class="fa fa-star-o fa-stack-2x"></i></span>
</div>
<p class="pricedis price text-right"><span class="price-new"> SAR 275</span> <span class="price-old"> SAR 345</span></p>
<div class="clearfix"></div>
<div class="button-group">
<button type="button" onclick="cart.add('189');" class="acart">
<span>Add to Cart</span>
</button>
</div>
</div>
</div>
</div>
Could someone tell me where I went wrong and how to fix it? Many thanks.
This is the list of prices I get:
prices
[' SAR 110', ' SAR 41', ' SAR 1,760', ' SAR 150', ' SAR 3,103', ' SAR 5,770', ' SAR 540', ' SAR 4,900', ' SAR 2,650', ' SAR 603', ' SAR 58', ' SAR 15', ' SAR 15', ' SAR 3,200', ' SAR 35', ' SAR 890', ' SAR 75', ' SAR 10,500', ' SAR 1,560', ' SAR 2,421', ' SAR 4,904', ' SAR 223', ' SAR 5,072', ' SAR 1,600', ' SAR 9,700', ' SAR 354', ' SAR 25,600', ' SAR 1,800', ' SAR 84', ' SAR 256', ' SAR 120', ' SAR 349', ' SAR 2,100', ' SAR 21,500', ' SAR 15', ' SAR 3,450']
Solution
It is hard to give a precise recommendation from the information provided, so it would help to improve the question.
What happens?
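The names and prices drift apart because of the way you loop over the response and append to the lists: the two lists are filled independently of each other. Both inner loops also call soup.find_all(...) rather than i.find_all(...), so they search the whole page on every pass instead of a single product block.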
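To make the mismatch concrete, here is a small self-contained sketch. The HTML is a made-up, stripped-down stand-in for the page: the extra "Refine Search" heading and the exact wrapper markup are assumptions for illustration, not copied from the site. It suggests one plausible reason the counts differ: a page-wide h4 search also collects headings that are not products, while a class string containing a space is compared against the whole class attribute, so the sale markup is skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one regular product, one sale product,
# and one unrelated <h4> outside the product grid.
html = """
<h4>Refine Search</h4>
<div class="row productspm">
  <div class="product-layout">
    <h4><a href="#">BREATHING BAG 3L N-LATEX PARKER</a></h4>
    <p class="price text-right"> SAR 150</p>
  </div>
  <div class="product-layout">
    <h4><a href="#">Everbrite In-Office Tooth Whitening Kit (3 Patients)</a></h4>
    <p class="pricedis price text-right"><span class="price-new"> SAR 275</span> <span class="price-old"> SAR 345</span></p>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Searching the whole soup picks up every <h4>, not only product titles.
names = [h4.get_text(strip=True) for h4 in soup.find_all("h4")]

# A class string with a space is matched against the full class attribute,
# so the sale product's "pricedis price text-right" is not found.
prices = [
    p.get_text(strip=True)
    for p in soup.find_all("p", {"class": "price text-right"})
]

print(len(names), len(prices))
```

On this sample the two lists end up with different lengths (three names, one price), so pairing them by position cannot work.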
How to fix that?
You should grab all the information in one step, like this:
data = []
for item in soup.select('div.row.productspm > div'):
    data.append({
        'name':item.h4.get_text(),
        'price': item.select_one('p.price').get_text('^^', strip=True).split('^^')[0]
    })
Because it is not clear which price you want, I grab the regular price, or the new price when the product is on sale, like this:
'price': item.select_one('p.price').get_text('^^', strip=True).split('^^')[0]
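As a rough illustration of what that expression does, the sketch below runs it on the two price snippets from the question. The helper name first_price is made up for this example. get_text('^^', strip=True) joins the stripped text pieces with '^^', and taking element 0 after the split keeps only the first piece.

```python
from bs4 import BeautifulSoup

# The two <p> variants from the question: regular price and sale price.
regular = BeautifulSoup(
    '<p class="price text-right"> SAR 150</p>', "html.parser"
)
sale = BeautifulSoup(
    '<p class="pricedis price text-right">'
    '<span class="price-new"> SAR 275</span> '
    '<span class="price-old"> SAR 345</span></p>',
    "html.parser",
)

def first_price(soup):
    """Hypothetical helper: return the first price string inside p.price."""
    # The CSS selector p.price matches any <p> whose class list contains
    # "price", so it finds both the regular and the sale variant.
    text = soup.select_one("p.price").get_text("^^", strip=True)
    return text.split("^^")[0]

joined = sale.select_one("p.price").get_text("^^", strip=True)
print(joined)
print(first_price(regular))
print(first_price(sale))
```

For the sale snippet the joined text holds the new price followed by the old one, so index 0 is the new price. Index 1 would give the old price if that is what you need.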
Example
import requests
from bs4 import BeautifulSoup
import pandas as pd

page = requests.get("https://alrazimed.me/index.php?route=product/category&path=178_115")
soup = BeautifulSoup(page.content, "html.parser")

data = []
# Each direct child <div> of the product row is one product card
for item in soup.select('div.row.productspm > div'):
    data.append({
        'name':item.h4.get_text(),
        # first text piece: regular price, or the new price when on sale
        'price': item.select_one('p.price').get_text('^^', strip=True).split('^^')[0]
    })

pd.DataFrame(data)
Output
   name                                                price
0  C-BRIGHT Teeth whitening accelerators               SAR 3,103
1  Everbrite At-Home Tooth Whitening Kit               SAR 120
2  Everbrite In-Office Tooth Whitening Kit (3 Pat...  SAR 275
3  Everbrite In-Office Tooth Whitening Kit (Single)    SAR 135
4  FLOCARE – 0.4% Stannous Fluoride                    SAR 35
5  LITEX 686 LED CURING AND WHITENING SYSTEM           SAR 10,500
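Since the question loops over several pages, the same idea could be wrapped in a function and applied page by page. This is only a sketch: parse_products and the two inline HTML strings are hypothetical stand-ins, and it assumes every page uses the same div.row.productspm layout. It also skips blocks that lack a title or a price instead of raising an error.

```python
from bs4 import BeautifulSoup

def parse_products(html):
    """Return one dict per product card found in a page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("div.row.productspm > div"):
        title = item.find("h4")
        price = item.select_one("p.price")
        if title is None or price is None:
            continue  # not a product card, skip it
        rows.append({
            "name": title.get_text(strip=True),
            "price": price.get_text("^^", strip=True).split("^^")[0],
        })
    return rows

# Hypothetical stand-ins for the HTML of two category pages.
pages = [
    '<div class="row productspm">'
    '<div><h4>Product A</h4><p class="price text-right"> SAR 150</p></div>'
    '<div class="clearfix"></div>'
    '</div>',
    '<div class="row productspm">'
    '<div><h4>Product B</h4><p class="pricedis price text-right">'
    '<span class="price-new"> SAR 275</span> '
    '<span class="price-old"> SAR 345</span></p></div>'
    '</div>',
]

data = []
for html in pages:
    data.extend(parse_products(html))

print(data)
```

With real pages you would pass requests.get(url).content for each URL in your website list, then build the frame with pd.DataFrame(data). Because name and price are stored together in one dict per product, they cannot get out of step.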