How to do web scraping - BeautifulSoup

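Problem description

I'm trying to get a list of "title and price" for each product from this link - https://www.price.ro/preturi_notebook-1193.htm - but I can't merge the two lists into one like the following:

"title price"

I've made some progress in my code, but I'm stuck on merging these two columns.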

import requests
from bs4 import BeautifulSoup

url_link = 'https://www.price.ro/preturi_notebook-1193.htm'
page = requests.get(url_link)
soup = BeautifulSoup(page.content, 'html.parser')

title=soup.findAll('a',{'class':"titlu"})
price=soup.findAll('a',{'class':"price"})

for t in title:
    print(t.text.strip())
for p in price:
    print(p.text.strip())

Expected output:

Asus ZenBook UX430UA-GV340R 3,579.00 lei
Asus ZenBook ux331fal-eg006t 3,298.99 lei
Asus UX334FL-A4005T 8,403.98 lei
Asus UX461FA-E1035T 3,292.95 lei
Lenovo IdeaPad S530-13IWL 81j7004grm 3,499.00 lei
Asus ZenBook 13 UX331FN-EG003T 5,229.00 lei
Asus UX334FL-A4014R 3,692.28 lei
Asus FX705GM-EW137 4,460.96 lei
Asus S330FA-EY095 4,174.00 lei
Asus UX333FA-A4109 5,794.00 lei

Tags: python, beautifulsoup

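Solution

Find every div with the class produs-lista to get the list of products, then iterate over that list and scrape the title and price from inside each product. Because both lookups happen within the same product element, each title stays paired with its own price.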
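To see why scoping the lookup to each product helps, here is a minimal offline sketch. The HTML snippet is made up for illustration: it only mimics the class names used above, and the second product deliberately has no price. It is not the real page markup, and the None fallback is an added assumption about how one might handle a missing element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="produse">
  <div class="produs-lista">
    <a class="titlu">Laptop A</a><a class="price">1,000.00 lei</a>
  </div>
  <div class="produs-lista">
    <a class="titlu">Laptop B</a>
  </div>
  <div class="produs-lista">
    <a class="titlu">Laptop C</a><a class="price">3,000.00 lei</a>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Two flat lists: zip() pairs items by position, so one missing price
# shifts every later price onto the wrong title and drops the last title.
titles = [a.text.strip() for a in soup.find_all("a", class_="titlu")]
prices = [a.text.strip() for a in soup.find_all("a", class_="price")]
zipped = list(zip(titles, prices))

# Per-product lookup: each title is matched with the price in the same div.
scoped = []
for product in soup.find_all("div", class_="produs-lista"):
    title_tag = product.find("a", class_="titlu")
    price_tag = product.find("a", class_="price")
    scoped.append((
        title_tag.text.strip() if title_tag else None,
        price_tag.text.strip() if price_tag else None,
    ))

print(zipped)
print(scoped)
```

In this made-up sample, the zipped list pairs Laptop B with Laptop C's price, while the per-product list keeps Laptop B with an empty price slot.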

Example:

import requests
from bs4 import BeautifulSoup

url_link = 'https://www.price.ro/preturi_notebook-1193.htm'
page = requests.get(url_link)
soup = BeautifulSoup(page.content, 'html.parser')

# Each product sits in its own div.produs-lista inside div.produse.
produs_list = soup.find("div", {'class': 'produse'}).find_all(
    "div", {'class': 'produs-lista'})

data = []
for x in produs_list:
    # Search inside the current product only, so title and price stay paired.
    title = x.find("a", {'class': 'titlu'}).text.strip()
    price = x.find("a", {'class': 'price'}).text.strip()
    product = dict(title=title, price=price)
    data.append(product)

print(data)

Output:

[{'title': 'Asus ZenBook UX430UA-GV340R', 'price': '3,292.95 lei'}, 
{'title': 'Asus ZenBook  ux331fal-eg006t', 'price': '3,499.00 lei'}, 
{'title': 'Asus UX334FL-A4005T', 'price': '5,229.00 lei'}, 
{'title': 'Asus UX461FA-E1035T', 'price': '3,692.28 lei'}, 
{'title': 'Lenovo IdeaPad S530-13IWL  81j7004grm', 'price': '4,460.96 lei'}, 
{'title': 'Asus ZenBook 13  UX331FN-EG003T', 'price': '4,174.00 lei'}, 
{'title': 'Asus UX334FL-A4014R', 'price': '5,794.00 lei'}, 
{'title': 'Asus FX705GM-EW137', 'price': '5,885.48 lei'}, 
{'title': 'Asus S330FA-EY095', 'price': '3,279.46 lei'}, 
{'title': 'Asus UX333FA-A4109', 'price': '4,098.99 lei'}, 
{'title': 'Apple The New MacBook Pro 13 Retina (mpxr2ze/a)', 'price': '6,040.67 lei'}, 
{'title': 'Lenovo Legion Y530 81FV003MRM', 'price': '3,098.99 lei'}, 
{'title': 'Asus UX433FA-A5046R', 'price': '3,699.00 lei'}, 
{'title': 'HP ProBook 450 G6 5TL51EA', 'price': '3,299.99 lei'},
{'title': 'Asus X542UA-DM525', 'price': '2,424.00 lei'}, 
{'title': 'Lenovo ThinkPad X1 Carbon 6th gen 20KH006JRI', 'price': '10,202.99 lei'}, 
{'title': 'Asus VivoBook  X540UA-DM972', 'price': '1,659.00 lei'}, 
{'title': 'Asus X507UA-EJ782', 'price': '2,189.00 lei'}, 
{'title': 'Apple MacBook Air 13 (mqd32ze/a)', 'price': '3,998.00 lei'}, 
{'title': 'HP ProBook 470 G5  2rr84ea', 'price': '4,460.49 lei'}]
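The question asked for plain "title price" lines rather than a list of dicts. A minimal sketch of that last step follows, using the first two records from the output above as sample data. The helper name format_rows is hypothetical, and collapsing repeated spaces in titles is an assumption about the desired formatting.

```python
# Sample records copied from the first two entries of the output above.
data = [
    {'title': 'Asus ZenBook UX430UA-GV340R', 'price': '3,292.95 lei'},
    {'title': 'Asus ZenBook  ux331fal-eg006t', 'price': '3,499.00 lei'},
]


def format_rows(records):
    """Return one 'title price' string per record.

    Hypothetical helper: runs of whitespace inside a title are
    collapsed to a single space before the price is appended.
    """
    rows = []
    for record in records:
        title = " ".join(record['title'].split())
        rows.append(f"{title} {record['price']}")
    return rows


lines = format_rows(data)
for line in lines:
    print(line)
```

The same function can be called on the full data list built by the scraping loop.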
