首页 > 解决方案 > Python Selenium Webdriver:无法在浏览器上加载所有评论

问题描述

我正在尝试提取餐厅的所有谷歌评论。这家餐厅的评论只有 900 多条。但是,我的脚本只能提取 50 条评论。我不确定我在哪里犯了错误。任何解决此问题的帮助将不胜感激。这是我的代码:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = webdriver.Chrome()
base_url = 'https://www.google.com/search?tbs=lf:1,lf_ui:9&tbm=lcl&sxsrf=AOaemvJFjYToqQmQGGnZUovsXC1CObNK1g:1633336974491&q=10+famous+restaurants+in+Dunedin&rflfq=1&num=10&sa=X&ved=2ahUKEwiTsqaxrrDzAhXe4zgGHZPODcoQjGp6BAgKEGo&biw=1280&bih=557&dpr=2#lrd=0xa82eac0dc8bdbb4b:0x4fc9070ad0f2ac70,1,,,&rlfi=hd:;si:5749134142351780976,l,CiAxMCBmYW1vdXMgcmVzdGF1cmFudHMgaW4gRHVuZWRpbiJDUjEvZ2VvL3R5cGUvZXN0YWJsaXNobWVudF9wb2kvcG9wdWxhcl93aXRoX3RvdXJpc3Rz2gENCgcI5Q8QChgFEgIIFkiDlJ7y7YCAgAhaMhAAEAEQAhgCGAQiIDEwIGZhbW91cyByZXN0YXVyYW50cyBpbiBkdW5lZGluKgQIAxACkgESaXRhbGlhbl9yZXN0YXVyYW50mgEkQ2hkRFNVaE5NRzluUzBWSlEwRm5TVU56ZW5WaFVsOUJSUkFCqgEMEAEqCCIEZm9vZCgA,y,2qOYUvKQ1C8;mv:[[-45.8349553,170.6616387],[-45.9156414,170.4803685]]'
driver.get(base_url)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//div[./span[text()='Newest']]"))).click()

title = driver.find_element_by_xpath("//div[@class='P5Bobd']").text
address = driver.find_element_by_xpath("//div[@class='T6pBCe']").text
overall_rating = driver.find_element_by_xpath("//div[@class='review-score-container']//span[@class='Aq14fc']").text
total_reviews_text =driver.find_element_by_xpath("//div[@class='review-score-container']//div//div//span//span[@class='z5jxId']").text
 
num_reviews = int (total_reviews_text.split()[0])
all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
total_reviews = len(all_reviews)

while total_reviews < num_reviews:
        driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])
        WebDriverWait(driver, 5, 0.25).until_not(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[class$="activityIndicator"]')))
        #all_reviews = driver.find_elements_by_css_selector('div.gws-localreviews__google-review')
        all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
        print(total_reviews)
        total_reviews +=1

标签: pythonpython-3.xseleniumselenium-webdriverwebdriverwait

解决方案


Selenium 预期条件presence_of_all_elements_located并没有真正等待与传递给该方法定位器的所有元素匹配。
它实际上等待至少 1 个与传递的定位器匹配的元素。
所以而不是

num_reviews = int (total_reviews_text.split()[0])
all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
total_reviews = len(all_reviews)

请试试这个:

num_reviews = int (total_reviews_text.split()[0])
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
time.sleep(2)
all_reviews = driver.find_elements_by_css_selector('div.gws-localreviews__google-review')
total_reviews = len(all_reviews)

可能您在第二次使用时也会遇到同样的问题presence_of_all_elements_located
一般来说,永远不要相信presence_of_all_elements_located,它只会给你第一个抓到的火柴。


推荐阅读