Scraping used car listings with BeautifulSoup

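Problem description

I'm trying to write a program that scrapes used car listings from a website and outputs each listing's link, price, mileage, and engine power. Right now it just repeats the first listing over and over. It should output every listing on the page.

The website is in Estonian; I hope that isn't a problem.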

import requests
from bs4 import BeautifulSoup
import unicodedata

url = 'https://www.auto24.ee/kasutatud/nimekiri.php?bn=2&a=100&b=7&ae=2&af=50&ssid=21570860'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

for div in soup.find_all('div', {'class' : 'result-row'}):

    def getLink():
        find_link = soup.find('a', {'class' : 'main'})
        link = (find_link.get('href'))
        link_string = ('https://www.auto24.ee' + link)
        return link_string

    def getPrice():
        find_price = soup.find('span', {'class' : 'price'})
        price = (find_price.get_text())
        price_string = unicodedata.normalize("NFKD", price)
        return price_string + ','

    def getMileage():
        find_mileage = soup.find('span', {'class' : 'mileage'})
        mileage = (find_mileage.get_text())
        return mileage + ','

    def getPower():
        engine = requests.get(getLink())
        kW_string = 'kW'
        engine_stats = BeautifulSoup(engine.text, 'lxml')

        if engine_stats.find(kW_string) != -1:
            power_find = engine_stats.find('tr', {'class' : 'field-mootorvoimsus'})
            power = power_find.find('span', {'class' : 'value'})
            power_string = power.get_text()
            return power_string
        else:
            return ('Engine power not specified.')

    print(getLink() + ',', getPrice(), getMileage(), getPower())

Output:

https://www.auto24.ee/soidukid/3554965, 1600 €, 174 000 km, 1.8
https://www.auto24.ee/soidukid/3554965, 1600 €, 174 000 km, 1.8
https://www.auto24.ee/soidukid/3554965, 1600 €, 174 000 km, 1.8
https://www.auto24.ee/soidukid/3554965, 1600 €, 174 000 km, 1.8

...and so on.

Tags: python, html, for-loop, web-scraping, beautifulsoup

Solution


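If you watch the page URL while moving through the result pages, you will see that it changes: the `ak` parameter goes ak=0, ak=50, and so on. We can use that parameter to fetch the data page by page.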
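As a rough sketch of that idea, the page URLs could be generated by rewriting the `ak` query parameter with the standard library. The helper name `page_urls` and its `step` default are illustrative assumptions. The step of 50 and the three pages simply mirror the `range(0, 150, 50)` used in the script that follows.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Search URL taken from the answer, without the ak offset.
BASE_URL = ("https://www.auto24.ee/kasutatud/nimekiri.php"
            "?bn=2&a=100&b=7&ae=2&af=50&ssid=21612624")


def page_urls(base_url, pages, step=50):
    """Yield one URL per results page by setting the ak offset (hypothetical helper)."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))  # keeps the original parameter order
    for page in range(pages):
        query["ak"] = str(page * step)    # 0, 50, 100, ...
        yield urlunsplit(parts._replace(query=urlencode(query)))


urls = list(page_urls(BASE_URL, pages=3))
for url in urls:
    print(url)
```

Building the query from parsed parameters avoids hand-editing the string, so the same helper keeps working if other search filters in the URL change.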

import requests
from bs4 import BeautifulSoup

# Each results page is selected by the ak offset: 0, 50, 100.
for offset in range(0, 150, 50):
    print(offset)
    res = requests.get(f"https://www.auto24.ee/kasutatud/nimekiri.php?bn=2&a=100&b=7&ae=2&af=50&ssid=21612624&ak={offset}")
    soup = BeautifulSoup(res.text, "html.parser")
    main_data = soup.find("div", attrs={"id": "usedVehiclesSearchResult-flex"}).find_all("div", class_="description")
    # Search inside each listing (item), not the whole page, so every row yields its own values.
    for item in main_data:
        print(item.find("a", class_="main")["href"], end=" ")
        print(item.find("span", class_="engine").get_text(), end=" ")
        print(item.find("span", class_="price").get_text(), end=" ")
        try:
            print(item.find("span", class_="mileage").get_text())
        except AttributeError:
            # Some listings have no mileage span; find() returns None in that case.
            print("NAN")

Output:

0
/soidukid/3554965 1.8 450 € 174 000 km
/soidukid/3563070 1.9 85kW 450 € 514 000 km
/soidukid/3564181 1.6 74kW 500 € 323 032 km
/soidukid/3563999 1.8 85kW 500 € 374 699 km
/soidukid/3550730 2.0 85kW 500 € 420 000 km
..
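The repetition in the question comes from calling `soup.find(...)` inside the loop: that searches the whole page every time and always returns the first match. Calling `find` on the current row instead limits the search to that listing. The sketch below shows this on made-up markup that only mimics the class names from the question, so it runs without network access. `SAMPLE_HTML`, `text_or_default`, and `parse_rows` are hypothetical names, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Made-up markup; only the class names are borrowed from the question.
SAMPLE_HTML = """
<div class="result-row">
  <a class="main" href="/soidukid/1">Car A</a>
  <span class="price">1 600 €</span>
  <span class="mileage">174 000 km</span>
</div>
<div class="result-row">
  <a class="main" href="/soidukid/2">Car B</a>
  <span class="price">450 €</span>
</div>
"""


def text_or_default(row, tag, class_name, default="N/A"):
    """Return the text of the first match inside `row`, or `default` if absent."""
    node = row.find(tag, class_=class_name)
    return node.get_text(strip=True) if node is not None else default


def parse_rows(html):
    """Collect link, price, and mileage for every result row."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for row in soup.find_all("div", class_="result-row"):
        # row.find(...) searches only this listing, unlike soup.find(...).
        link = row.find("a", class_="main")
        listings.append({
            "link": "https://www.auto24.ee" + link["href"],
            "price": text_or_default(row, "span", "price"),
            "mileage": text_or_default(row, "span", "mileage"),
        })
    return listings


listings = parse_rows(SAMPLE_HTML)
for listing in listings:
    print(listing)
```

Returning a default for missing fields plays the same role as the `try`/`except AttributeError` in the answer, but applies to every field rather than only the mileage.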
