Web scraping the detail pages

Problem description

I'm currently scraping a car dataset from this website: https://www.marktplaats.nl/l/auto-s/p/1/#f:10882

My problem is that the parts that matter for my analysis (transmission, engine type, price, etc.) are on a more detailed page, for example: https://www.marktplaats.nl/a/auto-s/volkswagen/m1547281937-volkswagen-polo-1-0-tsi-highline-beats-edition-navi-xenon.html?c=df2f21f683612b45d62c413c0ca719df&previousPage=lr

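I have already scraped the information from the general listing pages, but I don't know how to iterate over the detail pages and scrape the fields I need.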

Tags: python, web-scraping

Solution


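You have to go through the listing page first to find the URL of each car. Then download each car's detail page and parse them one by one. I used the bs4 package (BeautifulSoup). The code below needs to be adapted to your needs, but it shows the idea: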

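from urllib.parse import urljoin

import requests
import bs4

BASE_URL = 'https://www.marktplaats.nl/'
url = 'https://www.marktplaats.nl/l/auto-s/p/1/#f:10882'

def downloading_and_parsing_url(url):
    # Downloading the webpage; raise an error on HTTP 4xx/5xx responses
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Parsing the webpage
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    return soup

soup = downloading_and_parsing_url(url)
# A CSS selector matches the <ul> even if its class list is reordered or extended
soup_table = soup.select_one('ul.mp-Listings.mp-Listings--list-view')
if soup_table is None:
    raise RuntimeError('Listing table not found; the page markup may have changed')

for car in soup_table.find_all('li'):

    # Finding the URL for each car; skip list items without a link
    link = car.find('a', href=True)
    if link is None:
        continue
    # urljoin avoids a double slash when href already starts with '/'
    sub_url = urljoin(BASE_URL, link['href'])

    # Downloading each detail page
    sub_soup = downloading_and_parsing_url(sub_url)

    # Finding the 'div' with id 'car-attributes'; skip pages without it
    attributes = sub_soup.find('div', {'id': 'car-attributes'})
    if attributes is None:
        continue
    for car_item in attributes.find_all('div', {'class': 'spec-table-item'}):
        key = car_item.find('span', {'class': 'key'})
        value = car_item.find('span', {'class': 'value'})
        if key is not None and value is not None:
            print(key.text, value.text)
    print('\n')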

And the output:

Merk & Model: Lako
Bouwjaar: 1996
Uitvoering: 233 C
Carrosserie: Open wagen
Kenteken: OD-31-VD
APK tot: 29 juni 2020
Prijs: € 7.500,00


Merk & Model: RAM
Bouwjaar: 2020
Carrosserie: SUV of Terreinwagen
Brandstof: LPG
Kilometerstand: 70 km
Transmissie: Automaat
Prijs: Zie omschrijving
Motorinhoud: 5.700 cc
Opties: 

Parkeersensor
Dodehoekdetectie
Elektrische achterklep
Metallic lak
Panoramadak
Radio
Mistlampen
Adaptive Cruise Control
Keyless entry
Airconditioning
Boordcomputer
Bekleding leder
Stoelverwarming
Trekhaak
Elektrische ramen
Climate control
Emergency brake assist
Isofix
Alarm
Spraakbediening
Navigatiesysteem
Elektrische buitenspiegels
Traction-control
...
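Printing the pairs is fine for a first look, but for analysis it helps to keep each car as a record and convert the Dutch-formatted numbers (€ 7.500,00, 5.700 cc, 70 km) into floats, while values such as "Zie omschrijving" ("see description") become None. The sketch below assumes the loop collects the (key.text, value.text) tuples into a list per car instead of printing them; the helper names to_record, parse_dutch_number and records_to_csv, and the choice of numeric fields, are hypothetical and not part of the original answer.

```python
import csv
import io
import re

# Assumed numeric fields; values use Dutch formatting ('.' thousands, ',' decimals)
NUMERIC_FIELDS = {'Prijs', 'Kilometerstand', 'Motorinhoud'}


def parse_dutch_number(text):
    """Return the first number in text as a float, or None if there is none."""
    match = re.search(r'\d[\d.]*(?:,\d+)?', text)
    if match is None:
        return None
    # '7.500,00' -> '7500,00' -> '7500.00'
    return float(match.group().replace('.', '').replace(',', '.'))


def to_record(pairs):
    """Build one dict per car from (key, value) text pairs."""
    record = {}
    for key, value in pairs:
        key = key.strip().rstrip(':').strip()  # 'Prijs:' -> 'Prijs'
        value = value.strip()
        record[key] = parse_dutch_number(value) if key in NUMERIC_FIELDS else value
    return record


def records_to_csv(records):
    """Serialise records to CSV text; fields a car lacks become empty cells."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

With the loop above, each detail page would yield a list like [(key.text, value.text), ...]; appending to_record(pairs) to a list and passing that list to records_to_csv at the end gives one CSV row per car, with empty cells for fields a listing does not have (the Lako above has no Kilometerstand, for instance).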
