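python - Python web scraping - URL does not match Chrome Inspect results
Problem description
I am trying to retrieve some data from the link below, but my request returns different results when I change the URL to the one I get after clicking the next-page button at the bottom of the site (https://www.carmax.com/cars?location=all).
Code that works for the initial URL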
import requests
from bs4 import BeautifulSoup
car_url = "https://www.carmax.com/cars?location=all"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
r = requests.get(car_url, headers = headers)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'lxml')
# Information that I am looking for
info = soup.find_all('div', class_='vehicle-browse--result--info')
When I run the same code against the site's next page, the response contains no elements with the class 'vehicle-browse--result--info'.
Code with the new URL from the next page
url_test = 'https://www.carmax.com/search?location=all#BT=0&Distance=all&ExposedCategories=249+250+1001+1000+265+999+772&ExposedDimensions=249+250+1001+1000+265+999+772&Page=4&PerPage=20&SortKey=0&StartIndex=80&Zip=20877'
test_request = requests.get(url_test, headers = headers)
html_doc_test = test_request.text
soup_test = BeautifulSoup(html_doc_test, 'lxml')
# This returns an empty list instead of the info I need
info_test = soup_test.find_all('div', class_='vehicle-browse--result--info')
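The new URL (one of the following pages, reached by clicking the next arrow on the initial URL) does not give the same result. What can I do to get the same kind of response for the following pages?
For more detail: when I click "Inspect" in Google Chrome on the site, I do see the same kind of information as for the initial URL, but for some reason it does not carry over to my code.
Solution
It looks like the page is generated dynamically. Everything after the # in the new URL is a fragment, which is never sent to the server, so requests only receives the base page and the results are filled in by JavaScript in the browser. Selenium with a driver such as Chrome can load the injected DOM and fetch the HTML on the fly: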
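from selenium import webdriver
from selenium.webdriver.common.by import By
# Drive a real Chrome instance so the page's JavaScript runs and fills in the results
chrome = webdriver.Chrome()
chrome.get("https://www.carmax.com/search?location=all#BT=0&Distance=all&ExposedCategories=249+250+1001+1000+265+999+772&ExposedDimensions=249+250+1001+1000+265+999+772&Page=4&PerPage=20&SortKey=0&StartIndex=80&Zip=20877")
# Selenium 4 dropped find_elements_by_class_name; find_elements(By.CLASS_NAME, ...) replaces it
for elem in chrome.find_elements(By.CLASS_NAME, "vehicle-browse--result--info"):
    print(elem.text)  # or just use elem
chrome.quit()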
Output:
No-haggle price$22,998*Mileage12K6 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16201078
No-haggle price$13,998*Mileage52K46 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:15916654
No-haggle price$40,998*Mileage28K16 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16227545
No-haggle price$19,998*Mileage7K0 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16187367
No-haggle price$18,998*Mileage12K95 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16227745
No-haggle price$49,998*Mileage10K2 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16227961
No-haggle price$23,598*Mileage34K1 Review
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16269716
No-haggle price$24,998*Mileage38K1 Review
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16227325
No-haggle price$17,998*Mileage5K36 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16032863
No-haggle price$15,998*Mileage12K36 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16033129
No-haggle price$14,598*Mileage56K126 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:15974221
No-haggle price$18,998*Mileage14K2 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16227491
No-haggle price$16,998*Mileage20K36 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16227699
No-haggle price$18,598*Mileage24K36 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16269723
No-haggle price$14,998*Mileage35K151 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16291864
No-haggle price$15,598*Mileage54K151 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16318227
No-haggle price$14,998*Mileage45K36 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16402895
No-haggle price$17,998*Mileage12K36 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16416608
No-haggle price$18,998*Mileage24K151 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16246905
No-haggle price$30,998*Mileage33K31 Reviews
GET PRE-QUALIFIED
CarMax Gaithersburg
Gaithersburg, MD
Stock #:16187751
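To see why the plain requests call came back without these results, it can help to split the second URL into the part an HTTP client actually requests and the part that stays in the browser. The following is a minimal sketch using only the standard library. The URL is a shortened copy of the one in the question (the ExposedCategories and ExposedDimensions parameters are left out for readability), and the variable names sent_to_server and client_side_state are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened version of the URL from the question (assumption: some
# fragment parameters are omitted here to keep the example readable).
url_test = (
    "https://www.carmax.com/search?location=all"
    "#BT=0&Distance=all&Page=4&PerPage=20&SortKey=0&StartIndex=80&Zip=20877"
)

parts = urlsplit(url_test)

# An HTTP client requests scheme, host, path and query; the fragment
# (everything after '#') is not part of the request.
sent_to_server = parts._replace(fragment="").geturl()
print(sent_to_server)

# The fragment here happens to be shaped like a query string, so
# parse_qs can split it into key/value pairs for inspection.
client_side_state = parse_qs(parts.fragment)
print(client_side_state["Page"], client_side_state["StartIndex"])
```

The paging values (Page, PerPage, StartIndex) all live in the fragment, so the server sees the same request regardless of which page was clicked. That is consistent with the answer's suggestion to let a browser run the page's JavaScript before reading the DOM.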