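python - How to scrape a page after its JavaScript has run, without opening a browser?
Problem description
I have been using Selenium to scroll through and scrape web pages that need JavaScript to run. Now I want to scrape those pages without opening a browser window, so I tried this: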
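from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import requests

# Headless Chrome: no browser window is opened
op = webdriver.ChromeOptions()
op.add_argument("--headless")
driver = webdriver.Chrome(service=Service("C:/bin/chromedriver.exe"), options=op)

header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
          " AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36"}
link = (
    "https://www.zillow.com/homedetails/303-E-57th-St-APT-44G-New-York-NY-10022/2076710336_zpid/"
)

# The page is fetched with requests, not with the driver above:
# requests downloads the raw HTML only and never runs its JavaScript
response = requests.get(link, headers=header)
webpage = response.content
soup = BeautifulSoup(webpage, "html.parser")
print(soup)

driver.quit()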
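The problem is that the JavaScript is not executed, so the information I need never gets loaded. How can I do this with headless Chrome, and scroll the page, without a browser window opening?
Solution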
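A minimal sketch, assuming Selenium 4 (which takes a Service object and, from 4.6 on, can locate ChromeDriver by itself), Chrome, and beautifulsoup4 are installed. The page is loaded by the headless driver with driver.get so Chrome runs its JavaScript. The window is then scrolled with execute_script until document.body.scrollHeight stops growing, and driver.page_source (the DOM after the scripts have run) is handed to BeautifulSoup. The helper names make_headless_driver, scroll_to_bottom and scrape, and the pause/max_rounds defaults, are hypothetical choices for illustration.

```python
import time


def make_headless_driver(chromedriver_path=None):
    """Start Chrome without a visible window.

    Assumption: Selenium 4 and Chrome are installed. If no path is given,
    Selenium (4.6+) tries to locate a matching ChromeDriver itself.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    # Headless windows default to a small size; a desktop size renders more content
    options.add_argument("--window-size=1920,1080")
    service = Service(chromedriver_path) if chromedriver_path else Service()
    return webdriver.Chrome(service=service, options=options)


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until the page height stops growing (hypothetical helper).

    Works with any object that has execute_script, so it runs the same
    in headless mode as with a visible window.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page's JavaScript time to load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we are at the bottom
        last_height = new_height
    return last_height


def scrape(url, chromedriver_path=None):
    """Load url in headless Chrome, scroll it, and parse the rendered DOM."""
    from bs4 import BeautifulSoup

    driver = make_headless_driver(chromedriver_path)
    try:
        driver.get(url)             # Chrome executes the page's JavaScript here
        scroll_to_bottom(driver)
        html = driver.page_source   # HTML of the DOM after scripts have run
    finally:
        driver.quit()               # always shut the browser process down
    return BeautifulSoup(html, "html.parser")
```

For example, soup = scrape(link) with the link from the question, or scrape(link, "C:/bin/chromedriver.exe") to use a specific ChromeDriver binary. The selenium and bs4 imports sit inside the functions only so that scroll_to_bottom can be exercised without a browser; moving them to the top of the file works just as well. A fixed pause is the simplest heuristic; an explicit wait (WebDriverWait from selenium.webdriver.support.ui) on an element you actually need is usually more robust.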