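python - How to do web scraping - BeautifulSoup
Problem description
I'm trying to get a list of the "title and price" of each product from this link - https://www.price.ro/preturi_notebook-1193.htm - but I can't combine the two lists into one, in this form:
"title price"
I've done part of it in my code, but I'm stuck on merging the two columns: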
import requests
from bs4 import BeautifulSoup

url_link = 'https://www.price.ro/preturi_notebook-1193.htm'
page = requests.get(url_link)
soup = BeautifulSoup(page.content, 'html.parser')

title = soup.findAll('a', {'class': "titlu"})
price = soup.findAll('a', {'class': "price"})

for t in title:
    print(t.text.strip())
for p in price:
    print(p.text.strip())
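If every product on the page has exactly one title link and one price link (an assumption; the live markup may differ), the two lists the question builds can be paired element by element with zip. A minimal sketch that works on the already-extracted strings, so it runs without network access; merge_columns is a hypothetical helper name, and the sample values stand in for [t.text.strip() for t in title] and [p.text.strip() for p in price]:

```python
def merge_columns(titles, prices):
    """Pair the i-th title with the i-th price and format each pair as one line.

    Assumes both lists are in page order and describe the same products.
    zip() silently stops at the shorter list, so a length mismatch is
    reported explicitly instead of producing a truncated result.
    """
    if len(titles) != len(prices):
        raise ValueError(
            f"{len(titles)} titles but {len(prices)} prices; lists are misaligned"
        )
    return [f"{t} {p}" for t, p in zip(titles, prices)]


# Sample strings standing in for the scraped title and price texts
titles = ["Asus ZenBook UX430UA-GV340R", "Asus ZenBook ux331fal-eg006t"]
prices = ["3,579.00 lei", "3,298.99 lei"]

for line in merge_columns(titles, prices):
    print(line)
```

The length check only catches the case where the counts differ; if one product lacks a price while another has two, the lists shift without any error. That is why scoping each lookup to a single product container is the safer design.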
Expected output:
Asus ZenBook UX430UA-GV340R 3,579.00 lei
Asus ZenBook ux331fal-eg006t 3,298.99 lei
Asus UX334FL-A4005T 8,403.98 lei
Asus UX461FA-E1035T 3,292.95 lei
Lenovo IdeaPad S530-13IWL 81j7004grm 3,499.00 lei
Asus ZenBook 13 UX331FN-EG003T 5,229.00 lei
Asus UX334FL-A4014R 3,692.28 lei
Asus FX705GM-EW137 4,460.96 lei
Asus S330FA-EY095 4,174.00 lei
Asus UX333FA-A4109 5,794.00 lei
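Solution
Use the produs-lista class to find all product containers, then iterate over that list and scrape the title and price of each product. Because each lookup is made inside one product's container, every title stays paired with its own price.
Example: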
import requests
from bs4 import BeautifulSoup

url_link = 'https://www.price.ro/preturi_notebook-1193.htm'
page = requests.get(url_link)
soup = BeautifulSoup(page.content, 'html.parser')

# Each product on the page sits in its own div.produs-lista container
produs_list = soup.find("div", {'class': 'produse'}).find_all("div", {'class': 'produs-lista'})

data = []
for x in produs_list:
    # Search only inside this product's container, so title and price stay paired
    title = x.find("a", {'class': 'titlu'}).text.strip()
    price = x.find("a", {'class': 'price'}).text.strip()
    product = dict(title=title, price=price)
    data.append(product)

print(data)
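The code above raises AttributeError if any product lacks a title or price link, because find returns None. A minimal offline sketch of the same per-container approach that tolerates a missing price, run against a small hypothetical HTML snippet that only mimics the class names used above (produse, produs-lista, titlu, price); the real page structure is an assumption. It uses BeautifulSoup's CSS-selector methods select and select_one, and scrape_products is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the class names from the answer;
# the live page's real structure may differ.
SAMPLE_HTML = """
<div class="produse">
  <div class="produs-lista">
    <a class="titlu" href="#"> Asus UX334FL-A4005T </a>
    <a class="price" href="#"> 8,403.98 lei </a>
  </div>
  <div class="produs-lista">
    <a class="titlu" href="#"> Asus UX461FA-E1035T </a>
    <!-- this product has no price link -->
  </div>
</div>
"""


def scrape_products(html):
    """Return one dict per product container; a missing field becomes None."""
    soup = BeautifulSoup(html, "html.parser")
    data = []
    # Each lookup is scoped to one product, so a missing price cannot
    # shift later prices onto the wrong titles.
    for product in soup.select("div.produse div.produs-lista"):
        title_tag = product.select_one("a.titlu")
        price_tag = product.select_one("a.price")
        data.append({
            "title": title_tag.get_text(strip=True) if title_tag else None,
            "price": price_tag.get_text(strip=True) if price_tag else None,
        })
    return data


print(scrape_products(SAMPLE_HTML))
```

To use it on the live page, pass page.content from the requests call instead of SAMPLE_HTML.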
Output:
[{'title': 'Asus ZenBook UX430UA-GV340R', 'price': '3,292.95 lei'},
{'title': 'Asus ZenBook ux331fal-eg006t', 'price': '3,499.00 lei'},
{'title': 'Asus UX334FL-A4005T', 'price': '5,229.00 lei'},
{'title': 'Asus UX461FA-E1035T', 'price': '3,692.28 lei'},
{'title': 'Lenovo IdeaPad S530-13IWL 81j7004grm', 'price': '4,460.96 lei'},
{'title': 'Asus ZenBook 13 UX331FN-EG003T', 'price': '4,174.00 lei'},
{'title': 'Asus UX334FL-A4014R', 'price': '5,794.00 lei'},
{'title': 'Asus FX705GM-EW137', 'price': '5,885.48 lei'},
{'title': 'Asus S330FA-EY095', 'price': '3,279.46 lei'},
{'title': 'Asus UX333FA-A4109', 'price': '4,098.99 lei'},
{'title': 'Apple The New MacBook Pro 13 Retina (mpxr2ze/a)', 'price': '6,040.67 lei'},
{'title': 'Lenovo Legion Y530 81FV003MRM', 'price': '3,098.99 lei'},
{'title': 'Asus UX433FA-A5046R', 'price': '3,699.00 lei'},
{'title': 'HP ProBook 450 G6 5TL51EA', 'price': '3,299.99 lei'},
{'title': 'Asus X542UA-DM525', 'price': '2,424.00 lei'},
{'title': 'Lenovo ThinkPad X1 Carbon 6th gen 20KH006JRI', 'price': '10,202.99 lei'},
{'title': 'Asus VivoBook X540UA-DM972', 'price': '1,659.00 lei'},
{'title': 'Asus X507UA-EJ782', 'price': '2,189.00 lei'},
{'title': 'Apple MacBook Air 13 (mqd32ze/a)', 'price': '3,998.00 lei'},
{'title': 'HP ProBook 470 G5 2rr84ea', 'price': '4,460.49 lei'}]
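The answer prints a list of dicts, while the question asked for "title price" lines. A minimal sketch that turns the answer's data list into that format; to_lines is a hypothetical helper name, and the sample entries are copied from the output above:

```python
def to_lines(data):
    """Format each {'title': ..., 'price': ...} dict as a 'title price' line."""
    return [f"{item['title']} {item['price']}" for item in data]


# A few entries copied from the output above
data = [
    {'title': 'Asus ZenBook UX430UA-GV340R', 'price': '3,292.95 lei'},
    {'title': 'Asus ZenBook ux331fal-eg006t', 'price': '3,499.00 lei'},
    {'title': 'Asus UX334FL-A4005T', 'price': '5,229.00 lei'},
]

for line in to_lines(data):
    print(line)
```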