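python - Scrolling down Google reviews with Selenium
Question
I am trying to scrape reviews from this link:
To load the page content, I use the following code: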
from selenium import webdriver
import datetime
import time
import argparse
import os
# URL of the Google reviews page to scrape
url = "https://www.google.com/search?q=google+reviews+2nd+chance+treatment+40th+street&rlz=1C1JZAP_enUS697US697&oq=google+reviews+2nd+chance+treatment+40th+street&aqs=chrome..69i57j69i64.6183j0j7&sourceid=chrome&ie=UTF-8#lrd=0x872b7179b68e33d5:0x24b5517d86a95f89,1"
# Initialize the Chrome webdriver and open the URL
#driver = webdriver.Chromium()
profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko")
#driver = webdriver.Firefox(profile)
# https://stackoverflow.com/questions/22476112/using-chromedriver-with-selenium-python-ubuntu
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
driver.get(url)
driver.implicitly_wait(2)

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
The page loads fine, but it does not scroll down. I have used the same code on other sites, such as LinkedIn, and it works there.
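Solution
Here is logic you can use to scroll down without JavaScript. It relies on the location_once_scrolled_into_view property, which scrolls to the element; this is simple and effective.
As part of the logic below, we scroll to the last review and then check whether the desired number of reviews has been loaded.
Required imports: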
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
In the code below, change the value of the desiredReviewsCount variable to suit your needs.
wait = WebDriverWait(driver, 10)
url = "https://www.google.com/search?q=google+reviews+2nd+chance+treatment+40th+street&rlz=1C1JZAP_enUS697US697&oq=google+reviews+2nd+chance+treatment+40th+street&aqs=chrome..69i57j69i64.6183j0j7&sourceid=chrome&ie=UTF-8#lrd=0x872b7179b68e33d5:0x24b5517d86a95f89,1"
driver.get(url)

# XPath matching every review block currently loaded on the page
review_xpath = "//div[@class='gws-localreviews__general-reviews-block']//div[@class='WMbnJf gws-localreviews__google-review']"

x = 0
desiredReviewsCount = 30
# Wait until the first batch of reviews is present
wait.until(EC.presence_of_all_elements_located((By.XPATH, review_xpath)))
while x < desiredReviewsCount:
    # Reading this property scrolls the last loaded review into view
    driver.find_element(By.XPATH, f"({review_xpath})[last()]").location_once_scrolled_into_view
    # Count how many reviews are loaded now
    x = len(driver.find_elements(By.XPATH, review_xpath))

print(len(driver.find_elements(By.XPATH, review_xpath)))
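One caveat with the loop above: if the page holds fewer reviews than desiredReviewsCount, the count never reaches the target and the loop does not end. A minimal sketch of a guarded variant follows. The function name scroll_until_count and the parameters pause and max_stalls are hypothetical names chosen for illustration, not part of Selenium; the only Selenium calls assumed are driver.find_elements(by, value) and the location_once_scrolled_into_view property already used in the answer.

```python
import time


def scroll_until_count(driver, locator, desired_count, pause=1.0, max_stalls=3):
    """Scroll the last matching element into view until enough elements exist.

    `locator` is a (by, value) tuple such as (By.XPATH, "...").
    Gives up when the element count has not grown for `max_stalls`
    consecutive checks. Returns the number of elements found.
    """
    previous = -1
    stalls = 0
    while True:
        items = driver.find_elements(*locator)
        count = len(items)
        if count >= desired_count:
            return count
        if count == previous:
            # Nothing new was loaded since the last check.
            stalls += 1
            if stalls >= max_stalls:
                return count
        else:
            stalls = 0
        previous = count
        if items:
            # Reading this property scrolls the element into view.
            items[-1].location_once_scrolled_into_view
        # Give the page some time to load more items.
        time.sleep(pause)
```

It could be called with the driver and the same (By.XPATH, ...) tuple that is passed to presence_of_all_elements_located, plus the desired number of reviews. The fixed pause is a simplification; suitable values for pause and max_stalls depend on how quickly the page loads new reviews.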