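python - Empty lists with Python Beautiful Soup
Problem description
I'm new to web scraping. I'm trying to extract information about car listings, but when I run the following code I only get empty lists.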
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
from time import sleep
from random import randint

title=[]
kilometres=[]
transmission=[]
engine=[]
price=[]
adtype=[]

url='https://www.carsales.com.au/cars/new-south-wales-state/sydney-metro-region/suv-bodystyle/?offset=0'
headers = {"Accept-Language": "en-AU, en;q=0.5"}
page=requests.get(url,headers=headers)
soup=BeautifulSoup(page.text,'html.parser')

names=soup.find_all(class_='col')
for item in names:
    title.append(item.find('a').txt)

distances=soup.find_all('li',{'data-type':'Odometer'})
for item in distances:
    kilometres.append(item.text)

trans=soup.find_all('li',{'data-type':'Transmission'})
for item in trans:
    transmission.append(item.text)

engines=soup.find_all('li',{'data-type':'Engine'})
for item in engines:
    engine.append(item.text)

prices=soup.find_all(class_='price')
for item in prices:
    price.append(item.find('a').text)

adtypes=soup.find_all(class_='seller-type')
for item in adtypes:
    adtype.append(item.text)
What am I doing wrong here? I want to scrape the data from the URL into a pandas DataFrame.
Solution
To get the correct page, set the User-Agent header and set Accept-Language to "en-US,en;q=0.5":
import requests
import pandas as pd
from bs4 import BeautifulSoup

url='https://www.carsales.com.au/cars/new-south-wales-state/sydney-metro-region/suv-bodystyle/?offset=0'
headers = {"Accept-Language": "en-US,en;q=0.5", 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
page=requests.get(url,headers=headers)
soup=BeautifulSoup(page.text,'html.parser')

all_data = []
# Work one listing at a time so the fields of each car stay together
for car in soup.select('.listing-item'):
    title = car.select_one('h3 > a').text
    price = car.select_one('.price > a').text
    type_ = car.select_one('.seller-type, .franchise-stock-type').get_text(strip=True)
    # Each <li data-type="..."> becomes its own column (Odometer, Engine, ...)
    details = {li['data-type']: li.text for li in car.select('li[data-type]')}
    all_data.append(dict(title=title, price=price, type=type_, **details))

df = pd.DataFrame(all_data)
print(df)
df.to_csv('data.csv')
Prints:
    title                                              price      type                Odometer    Body Style  Transmission  Engine                  Build Date
0   2019 Nissan Pathfinder ST-L R52 Series III Aut...  $45,878*   Dealer Used Car     1,400 km    SUV         Automatic     6cyl 3.5L Petrol        NaN
1   2020 Land Rover Range Rover Evoque D150 S Auto...  $70,000*   Private Seller Car  3,000 km    SUV         Automatic     4cyl 2.0L Turbo Diesel  NaN
2   2011 SsangYong Korando S Manual 2WD                $8,750*    Dealer Used Car     164,834 km  SUV         Manual        4cyl 2.0L Turbo Diesel  NaN
3   2016 BMW X3 xDrive20d F25 LCI Auto 4x4             $31,000*   Private Seller Car  99,654 km   SUV         Automatic     4cyl 2.0L Turbo Diesel  NaN
4   2019 Mitsubishi Outlander ES ZL Auto 2WD MY20      $29,580    Dealer Demo         2 km        SUV         Automatic     4cyl 2.4L Petrol        NaN
5   2012 Mazda CX-5 Grand Touring KE Series Auto AWD   $18,000*   Private Seller Car  116,590 km  SUV         Automatic     4cyl 2.2L Turbo Diesel  NaN
6   2020 MG HS Excite Auto FWD MY20                    $32,848    New Car In Stock    NaN         SUV         Automatic     4cyl 1.5L Turbo Petrol  Build date Jan 2020
7   2019 BMW X3 xDrive30i G01 Auto 4x4                 $67,800*   Dealer Used Car     10,637 km   SUV         Automatic     4cyl 2.0L Turbo Petrol  NaN
8   2019 BMW X1 xDrive25i F48 LCI Auto AWD             $56,990    Dealer Used Car     7,203 km    SUV         Automatic     4cyl 2.0L Turbo Petrol  NaN
9   2019 Jeep Cherokee Trailhawk Auto 4x4 MY19         $50,890    Dealer Demo         10 km       SUV         Automatic     6cyl 3.2L Petrol        NaN
10  2019 Audi Q2 35 TFSI design Auto MY19              $44,850    Dealer Demo         2,135 km    SUV         Automatic     4cyl 1.4L Turbo Petrol  NaN
11  2020 Land Rover Range Rover Sport SDV8 HSE Aut...  $162,500*  Private Seller Car  48 km       SUV         Automatic     8cyl 4.4L Turbo Diesel  NaN
12  2015 Porsche Macan S Diesel 95B Auto AWD MY15      $59,800*   Dealer Used Car     71,926 km   SUV         Automatic     6cyl 3.0L Turbo Diesel  NaN
13  2018 Mazda CX-5 Akera KF Series Auto i-ACTIV AWD   $39,990*   Dealer Used Car     14,855 km   SUV         Automatic     4cyl 2.5L Petrol        NaN
14  2019 Mazda CX-5 Maxx Sport KF Series Auto i-AC...  $39,950*   Dealer Used Car     9,592 km    SUV         Automatic     4cyl 2.2L Turbo Diesel  NaN
15  2019 Mitsubishi ASX LS XD Auto 2WD MY20            $29,685    Dealer Demo         447 km      SUV         Automatic     4cyl 2.0L Petrol        NaN
16  2012 Audi Q5 TFSI Auto quattro MY12                $22,900*   Private Seller Car  69,518 km   SUV         Automatic     4cyl 2.0L Turbo Petrol  NaN
17  2013 Subaru XV 2.0i G4X Auto AWD MY13              $16,990*   Dealer Used Car     94,245 km   SUV         Automatic     4cyl 2.0L Petrol        NaN
18  2019 Mitsubishi Pajero Sport Exceed QF Auto 4x...  $58,880    Dealer Demo         1,755 km    SUV         Automatic     4cyl 2.4L Turbo Diesel  NaN
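The NaN cells in the output come from listings that do not carry every data-type item, so their dictionaries simply lack that key. The sketch below illustrates this on a small, made-up HTML fragment, and also guards against select_one returning None when an element is missing. The markup in SAMPLE_HTML and the helper names text_or_none and parse_listings are hypothetical; they only mimic the class names and attributes used by the selectors in the answer, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: it only mimics the class names and data-type
# attributes targeted by the answer's selectors, not the real page.
SAMPLE_HTML = """
<div class="listing-item">
  <h3><a href="#">2019 Example SUV</a></h3>
  <div class="price"><a href="#">$29,580</a></div>
  <div class="seller-type">Dealer Demo</div>
  <ul>
    <li data-type="Odometer">2 km</li>
    <li data-type="Transmission">Automatic</li>
  </ul>
</div>
<div class="listing-item">
  <h3><a href="#">2020 Example SUV</a></h3>
  <div class="franchise-stock-type">New Car In Stock</div>
  <ul>
    <li data-type="Transmission">Automatic</li>
    <li data-type="Build Date">Build date Jan 2020</li>
  </ul>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_listings(html):
    """Turn each .listing-item block into one dictionary."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for car in soup.select(".listing-item"):
        row = {
            "title": text_or_none(car, "h3 > a"),
            "price": text_or_none(car, ".price > a"),
            "type": text_or_none(car, ".seller-type, .franchise-stock-type"),
        }
        # One key per <li data-type="...">; listings may carry different sets.
        for li in car.select("li[data-type]"):
            row[li["data-type"]] = li.get_text(strip=True)
        rows.append(row)
    return rows


rows = parse_listings(SAMPLE_HTML)
```

Here the second dictionary has no Odometer key and a price of None. Passing rows to pd.DataFrame fills the keys a row lacks with NaN, which matches the pattern seen in the printed table.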
The script also saves the table to data.csv.
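As background for the header change: when no User-Agent is supplied, requests identifies itself as python-requests/<version>. Whether the site serves a different page to that client is an assumption here, not something the answer states, but it is easy to inspect which headers each variant would send. The sketch below builds both requests without sending them; the helper name effective_headers is made up for illustration.

```python
import requests

URL = ("https://www.carsales.com.au/cars/new-south-wales-state/"
       "sydney-metro-region/suv-bodystyle/?offset=0")


def effective_headers(extra_headers):
    """Prepare (but do not send) a GET request and return its final headers."""
    session = requests.Session()
    request = requests.Request("GET", URL, headers=extra_headers)
    # prepare_request merges the session's default headers with ours
    prepared = session.prepare_request(request)
    return prepared.headers


# Headers from the question: no User-Agent, so the library default is used
question_headers = effective_headers({"Accept-Language": "en-AU, en;q=0.5"})

# Headers from the answer: a browser-like User-Agent overrides the default
answer_headers = effective_headers({
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) "
                   "Gecko/20100101 Firefox/77.0"),
})

print(question_headers["User-Agent"])
print(answer_headers["User-Agent"])
```

No network traffic is involved, so this only shows what would be sent; it does not confirm how the server reacts.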