Scraping a JS-rendered page fails even after using Chrome

Problem description

I am trying to scrape this site:

https://www.izkor.gov.il/%D7%90%D7%94%D7%A8%D7%95%D7%9F%20%D7%94%D7%A8%D7%A9%D7%9C%D7%A8/en_399451c07d6af2edbb259e94a77362b2

This is what I tried:

import requests
from selenium import webdriver
from bs4 import BeautifulSoup

first_fallen_url = r'https://www.izkor.gov.il/%D7%90%D7%94%D7%A8%D7%95%D7%9F%20%D7%94%D7%A8%D7%A9%D7%9C%D7%A8/en_399451c07d6af2edbb259e94a77362b2'

# Open the page in Chrome
driver = webdriver.Chrome()
driver.get(first_fallen_url)

# Fetch the same URL again with requests and parse the response
resp = requests.get(first_fallen_url)
html = resp.content
soup = BeautifulSoup(html, features="lxml")

with open("page_example.html", "w", encoding="utf-8") as file:
    file.write(str(soup))

But the HTML that gets saved is different from what the browser actually shows.

Tags: python, web-scraping, beautifulsoup

Solution


The page is rendered client-side by JavaScript, so requests.get only receives the initial HTML shell; you need to read the source from the Selenium driver after the page has had time to render. Try this:

import time
from selenium import webdriver
from bs4 import BeautifulSoup

first_fallen_url = r'https://www.izkor.gov.il/%D7%90%D7%94%D7%A8%D7%95%D7%9F%20%D7%94%D7%A8%D7%A9%D7%9C%D7%A8/en_399451c07d6af2edbb259e94a77362b2'

driver = webdriver.Chrome()
driver.get(first_fallen_url)
time.sleep(3)  # give the page's JavaScript a few seconds to render

# Parse the browser-rendered DOM instead of the raw HTTP response
soup = BeautifulSoup(driver.page_source, features="lxml")

with open("page_example.html", "w", encoding="utf-8") as file:
    file.write(str(soup))

driver.close()
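
If the fixed time.sleep(3) turns out to be unreliable, an explicit wait is the usual alternative. Below is a minimal sketch of the same approach using WebDriverWait; the (By.TAG_NAME, "body") locator is only a placeholder assumption, and in practice you would wait for a selector that appears once the page's JavaScript has finished rendering:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

first_fallen_url = r'https://www.izkor.gov.il/%D7%90%D7%94%D7%A8%D7%95%D7%9F%20%D7%94%D7%A8%D7%A9%D7%9C%D7%A8/en_399451c07d6af2edbb259e94a77362b2'

driver = webdriver.Chrome()
try:
    driver.get(first_fallen_url)
    # Wait up to 10 seconds for the element to be present in the rendered DOM
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "body"))
    )
    soup = BeautifulSoup(driver.page_source, features="lxml")
    with open("page_example.html", "w", encoding="utf-8") as file:
        file.write(str(soup))
finally:
    driver.quit()

Here quit() is used instead of close() so the ChromeDriver process is shut down even if the wait times out.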
