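python - Python HTML parsing with a partial class name
Problem description
I'm trying to parse a web page with bs4, but the elements I want to access all have different class names. For example: class='list-item Listing ... id-12984' and class='list-item Listing ... id-10359'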
import requests
from bs4 import BeautifulSoup

def preownedaston(url):
    preownedaston_resp = requests.get(url)
    if preownedaston_resp.status_code == 200:
        bs = BeautifulSoup(preownedaston_resp.text, 'lxml')
        posts = bs.find_all('div', class_='')  # don't know what to put here
        for p in posts:
            title_year = p.find('div', class_='inset').find('a').find('span', class_='model_year').text
            print(title_year)

preownedaston('https://preowned.astonmartin.com/preowned-cars/search/?finance%5B%5D=price&price-currency%5B%5D=EUR&custom-model%5B404%5D%5B%5D=809&continent-country%5B%5D=France&postcode-area=United%20Kingdom&distance%5B%5D=0&transmission%5B%5D=Manual&budget-program%5B%5D=pay&section%5B%5D=109&order=-usd_price&pageId=3760')
Is there a way to match a partial class name, e.g. class_='list-item '?
Solution
The CSS selector for matching part of an attribute's value is:
div[class*='list-item'] # the * means match the class with this partial value
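To illustrate, here is a minimal sketch run against a small, hypothetical HTML snippet modelled on the class names from the question (not the real page). It compares the `[class*=...]` selector with two bs4 alternatives: passing one class name to `class_`, which bs4 checks against each class of a multi-valued `class` attribute, and passing a compiled regex. It assumes a bs4 version whose `select()` is backed by soupsieve (4.7 or later).

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names in the question
html = """
<div class="list-item Listing id-12984"><span class="model_year">2011</span></div>
<div class="list-item Listing id-10359"><span class="model_year">2011</span></div>
<div class="sidebar">not a listing</div>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. CSS attribute selector: substring match on the whole class attribute
by_substring = soup.select("div[class*='list-item']")

# 2. A plain string for class_ matches any single class in the class list
by_class = soup.find_all("div", class_="list-item")

# 3. A regex for class_ is tested against each individual class value
by_regex = soup.find_all("div", class_=re.compile(r"^id-\d+$"))

for div in by_substring:
    print(div["class"])  # bs4 returns class as a list of strings
```

Note that `*=` is a plain substring test, so it would also match a class such as `list-item-header`; `class_='list-item'` only matches that exact class.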
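However, if you look at the page source, you'll see that the content you're trying to scrape is generated by JavaScript, so you have three options:
- Use Selenium with a headless browser to render the JavaScript
- Find the Ajax calls and try to replicate them; for example, this URL is the Ajax call the site uses to retrieve the data: Ajax URL
- Find the data you're trying to scrape inside a script tag, as shown below:
In cases like this I prefer the last option, because you get to parse JSON.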
import json
import requests
from bs4 import BeautifulSoup
URL = 'https://preowned.astonmartin.com/preowned-cars/search/?finance%5B%5D=price&price-currency%5B%5D=EUR&custom-model%5B404%5D%5B%5D=809&continent-country%5B%5D=France&postcode-area=United%20Kingdom&distance%5B%5D=0&transmission%5B%5D=Manual&budget-program%5B%5D=pay&section%5B%5D=109&order=-usd_price&pageId=3760'
page = requests.get(URL, headers={"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"})
soup = BeautifulSoup(page.text, 'html.parser')
json_obj = soup.find('script',{'type':"application/ld+json"}).text
#{"@context":"http://schema.org","@graph":[{"@type":"Brand","name":""},{"@type":"OfferCatalog","itemListElement":[{"@type":"Offer","name":"Pre-Owned By Aston Martin","price":"€114,900.00","url":"https://preowned.astonmartin.com/preowned-cars/12984-aston-martin-v12-vantage-v8-volante/","itemOffered":{"@type":"Car","name":"Aston Martin V12 Vantage V8 Volante","brand":"Aston Martin","model":"V12 Vantage","itemCondition":"Used","category":"Used","productionDate":"2010","releaseDate":"2011","bodyType":"6.0 Litre V12","emissionsCO2":"388","fuelType":"Obsidian Black","mileageFromOdometer":"42000","modelDate":"2011","seatingCapacity":"2","speed":"190","vehicleEngine":"6l","vehicleInteriorColor":"Obsidian Black","color":"Black"}},{"@type":"Offer","name":"Pre-Owned By Aston Martin","price":"€99,900.00","url":"https://preowned.astonmartin.com/preowned-cars/10359-aston-martin-v12-vantage-carbon-edition-coupe/","itemOffered":{"@type":"Car","name":"Aston Martin V12 Vantage Carbon Edition Coupe","brand":"Aston Martin","model":"V12 Vantage","itemCondition":"Used","category":"Used","productionDate":"2011","releaseDate":"2011","bodyType":"6.0 Litre V12","emissionsCO2":"388","fuelType":"Obsidian Black","mileageFromOdometer":"42000","modelDate":"2011","seatingCapacity":"2","speed":"190","vehicleEngine":"6l","vehicleInteriorColor":"Obsidian Black","color":"Black"}}]},{"@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":"1","item":{"@id":"https://preowned.astonmartin.com/","name":"Homepage"}},{"@type":"ListItem","position":"2","item":{"@id":"https://preowned.astonmartin.com/preowned-cars/","name":"Pre-Owned Cars"}},{"@type":"ListItem","position":"3","item":{"@id":"//preowned.astonmartin.com/preowned-cars/search/","name":"Pre-Owned By Aston Martin"}}]}]}
items = json.loads(json_obj)['@graph'][1]['itemListElement']
for item in items:
    print(item['itemOffered']['name'])
Output:
Aston Martin V12 Vantage V8 Volante
Aston Martin V12 Vantage Carbon Edition Coupe
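Because the data is embedded in the page, the JSON approach can be tried offline. The sketch below parses a trimmed copy of the ld+json shown in the comment above, placed in a hypothetical HTML string, so no request is made. As a design choice of this sketch (not the answer's code), it looks up the `OfferCatalog` node by its `@type` instead of the hard-coded `['@graph'][1]` index, which would break if the site reordered the graph. `extract_offers` is a hypothetical helper name.

```python
import json
from bs4 import BeautifulSoup

# Trimmed copy of the ld+json from the answer (only the fields used here)
html = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@graph": [
  {"@type": "Brand", "name": ""},
  {"@type": "OfferCatalog", "itemListElement": [
    {"@type": "Offer", "price": "€114,900.00",
     "itemOffered": {"@type": "Car", "name": "Aston Martin V12 Vantage V8 Volante", "productionDate": "2010"}},
    {"@type": "Offer", "price": "€99,900.00",
     "itemOffered": {"@type": "Car", "name": "Aston Martin V12 Vantage Carbon Edition Coupe", "productionDate": "2011"}}
  ]}
]}
</script>
</head><body></body></html>
"""

def extract_offers(page_html):
    """Return (name, production year, price) tuples from every OfferCatalog node."""
    soup = BeautifulSoup(page_html, "html.parser")
    offers = []
    for script in soup.find_all("script", type="application/ld+json"):
        data = json.loads(script.string)
        # Find the catalog by its @type rather than its position in @graph
        for node in data.get("@graph", []):
            if node.get("@type") == "OfferCatalog":
                for offer in node.get("itemListElement", []):
                    car = offer["itemOffered"]
                    offers.append((car["name"], car.get("productionDate"), offer["price"]))
    return offers

for name, year, price in extract_offers(html):
    print(year, name, price)
```

On the live page you would pass `page.text` from the request above instead of the sample string.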