Scraper does not return results from a dynamic web page

Problem description

I'm trying to scrape all the update notes from https://store.steampowered.com/newshub/app/1145360. I identified the update notes by the class "eventcalendar_CalendarRow_398u2" and wrote the following code:

updatenotes = soup.find_all("div", attrs={"class": "eventcalendar_CalendarRow_398u2"})
for updatenote in updatenotes:
    print(updatenote.get_text(strip=True))

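But when I run it, it returns no results. I think this is because the site is dynamic. I use Selenium to scroll all the way down before I start scraping, but it still doesn't work. Can anyone help?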
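If, as suspected, the rows are rendered by JavaScript, BeautifulSoup only finds them when it parses the HTML the browser produced after rendering, not the raw HTTP response. A minimal sketch, assuming a Selenium `driver` has already loaded and scrolled the page: hand `driver.page_source` to BeautifulSoup. The helper name `extract_update_notes` is hypothetical, and the class name is copied from the question; it may change if Steam rebuilds its front end.

```python
from bs4 import BeautifulSoup

# Class name taken from the question; assumed, not guaranteed to be stable.
ROW_CLASS = "eventcalendar_CalendarRow_398u2"


def extract_update_notes(html: str) -> list[str]:
    """Return the text of every update-note row in an HTML string.

    In real use, pass driver.page_source *after* the scrolling loop has
    finished, so the JavaScript-rendered rows are present in the markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # "class" is multi-valued in BeautifulSoup, so this also matches
    # divs that carry ROW_CLASS alongside other classes.
    rows = soup.find_all("div", attrs={"class": ROW_CLASS})
    return [row.get_text(" ", strip=True) for row in rows]


# Offline demonstration with a hand-written snippet instead of a live page.
sample = '<div class="eventcalendar_CalendarRow_398u2 other"><b>Patch 1.2</b> Fixes</div>'
print(extract_update_notes(sample))  # ['Patch 1.2 Fixes']
```

With a live browser the call would be `extract_update_notes(driver.page_source)`; the parsing code from the question stays the same, only the HTML source changes.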

Tags: python, selenium, beautifulsoup, steam, scrape

Solution


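Try the following. It reads the rows with Selenium itself, then scrolls to the bottom and repeats until the page height stops changing: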

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # any WebDriver works; Chrome is just an example
driver.get('https://store.steampowered.com/newshub/app/1145360')
scroll_pause_time = 1
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Read the rows currently rendered in the page
    # (find_elements_by_css_selector was removed in Selenium 4; use find_elements with By)
    updatenotes = driver.find_elements(By.CSS_SELECTOR, "div.eventcalendar_CalendarRow_398u2")
    print(len(updatenotes))
    for updatenote in updatenotes:
        print(updatenote.text)
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(scroll_pause_time)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # If the heights are the same, nothing new was loaded, so exit the loop
        break
    last_height = new_height

driver.quit()
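The scroll-until-the-height-stops-changing loop above can also be packaged as a reusable helper. This is a hypothetical `scroll_until_stable` function, not part of Selenium: it takes the height-reading and scrolling actions as callables, so the stop condition can be exercised without a browser, and it adds a `max_rounds` cap (an assumption, not in the original answer) so a feed that keeps growing cannot loop forever. `FakePage` is likewise a made-up stand-in for a browser.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause: float = 1.0,
    max_rounds: int = 50,
) -> int:
    """Scroll until the page height stops growing, or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_bottom()
        time.sleep(pause)  # give the page time to fetch and render more rows
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new was loaded: end of the feed
        last_height = new_height
    return max_rounds  # safety cap hit; the page may still have more content


class FakePage:
    """Hypothetical browser stand-in: each scroll reveals the next height."""

    def __init__(self, heights: list[int]):
        self.heights = heights
        self.index = 0

    def height(self) -> int:
        return self.heights[self.index]

    def scroll(self) -> None:
        self.index = min(self.index + 1, len(self.heights) - 1)


page = FakePage([1000, 2000, 3000, 3000])
rounds_used = scroll_until_stable(page.height, page.scroll, pause=0)
print(rounds_used)  # 3
```

With a real driver, the two callables would be `lambda: driver.execute_script("return document.body.scrollHeight")` and `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")`; after the helper returns, the rows can be collected once with `driver.find_elements(By.CSS_SELECTOR, "div.eventcalendar_CalendarRow_398u2")`.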
