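python - Python Selenium WebDriver: unable to load all reviews in the browser
Problem description
I am trying to extract all of the Google reviews for a restaurant. The restaurant has only a little over 900 reviews, but my script extracts just 50 of them. I am not sure where I went wrong. Any help resolving this would be greatly appreciated. Here is my code: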
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
import time
driver = webdriver.Chrome()
base_url = 'https://www.google.com/search?tbs=lf:1,lf_ui:9&tbm=lcl&sxsrf=AOaemvJFjYToqQmQGGnZUovsXC1CObNK1g:1633336974491&q=10+famous+restaurants+in+Dunedin&rflfq=1&num=10&sa=X&ved=2ahUKEwiTsqaxrrDzAhXe4zgGHZPODcoQjGp6BAgKEGo&biw=1280&bih=557&dpr=2#lrd=0xa82eac0dc8bdbb4b:0x4fc9070ad0f2ac70,1,,,&rlfi=hd:;si:5749134142351780976,l,CiAxMCBmYW1vdXMgcmVzdGF1cmFudHMgaW4gRHVuZWRpbiJDUjEvZ2VvL3R5cGUvZXN0YWJsaXNobWVudF9wb2kvcG9wdWxhcl93aXRoX3RvdXJpc3Rz2gENCgcI5Q8QChgFEgIIFkiDlJ7y7YCAgAhaMhAAEAEQAhgCGAQiIDEwIGZhbW91cyByZXN0YXVyYW50cyBpbiBkdW5lZGluKgQIAxACkgESaXRhbGlhbl9yZXN0YXVyYW50mgEkQ2hkRFNVaE5NRzluUzBWSlEwRm5TVU56ZW5WaFVsOUJSUkFCqgEMEAEqCCIEZm9vZCgA,y,2qOYUvKQ1C8;mv:[[-45.8349553,170.6616387],[-45.9156414,170.4803685]]'
driver.get(base_url)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//div[./span[text()='Newest']]"))).click()
title = driver.find_element(By.XPATH, "//div[@class='P5Bobd']").text
address = driver.find_element(By.XPATH, "//div[@class='T6pBCe']").text
overall_rating = driver.find_element(By.XPATH, "//div[@class='review-score-container']//span[@class='Aq14fc']").text
total_reviews_text = driver.find_element(By.XPATH, "//div[@class='review-score-container']//div//div//span//span[@class='z5jxId']").text
num_reviews = int(total_reviews_text.split()[0])
all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
total_reviews = len(all_reviews)
while total_reviews < num_reviews:
    driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])
    WebDriverWait(driver, 5, 0.25).until_not(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[class$="activityIndicator"]')))
    #all_reviews = driver.find_elements_by_css_selector('div.gws-localreviews__google-review')
    all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
    print(total_reviews)
    total_reviews +=1
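Solution
Selenium's expected condition presence_of_all_elements_located does not actually wait for all of the elements matching the locator passed to it. It only waits until at least one element matches that locator, and then returns whatever matches exist at that moment.
So instead of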
num_reviews = int (total_reviews_text.split()[0])
all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
total_reviews = len(all_reviews)
try this:
num_reviews = int (total_reviews_text.split()[0])
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
time.sleep(2)
all_reviews = driver.find_elements(By.CSS_SELECTOR, 'div.gws-localreviews__google-review')
total_reviews = len(all_reviews)
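As an alternative to the fixed time.sleep(2), a custom wait condition can poll until a minimum number of elements is present. The following is only a minimal sketch: the name at_least_n_elements is hypothetical (it is not part of Selenium), and it relies on the fact that WebDriverWait.until keeps polling while the condition returns a falsy value.

```python
def at_least_n_elements(locator, n):
    """Build a wait condition that passes once at least n elements match.

    `locator` is a (by, value) tuple, as used with Selenium's expected
    conditions. This helper is an illustration, not a Selenium API.
    """
    def _predicate(driver):
        elements = driver.find_elements(*locator)
        # Returning False makes WebDriverWait.until poll again;
        # returning the non-empty list ends the wait and hands it back.
        return elements if len(elements) >= n else False
    return _predicate
```

It could then be used as WebDriverWait(driver, 20).until(at_least_n_elements((By.CSS_SELECTOR, 'div.gws-localreviews__google-review'), 10)), where the threshold of 10 is an assumed batch size chosen for illustration.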
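You will probably run into the same problem with the second use of presence_of_all_elements_located.
In general, never rely on presence_of_all_elements_located to collect every match; it only gives you the matches that are present when the first one is found.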
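One way to apply this to the scrolling loop is to re-query the review elements after every scroll and track how many were actually found, stopping when the count stops growing. This is a rough sketch under stated assumptions: load_all_items, find_items, scroll_to, stall_timeout and poll are hypothetical names introduced here, and the two callables are expected to wrap the driver calls from the question.

```python
import time


def load_all_items(find_items, scroll_to, target, stall_timeout=10.0, poll=0.25):
    """Scroll until `target` items are loaded or the count stops growing.

    find_items: callable returning the current list of matching elements.
    scroll_to:  callable that scrolls the given element into view.
    """
    items = find_items()
    last_growth = time.monotonic()
    while len(items) < target:
        if items:
            # Scrolling to the last loaded item should trigger the next batch.
            scroll_to(items[-1])
        time.sleep(poll)
        current = find_items()
        grew = len(current) > len(items)
        items = current
        if grew:
            last_growth = time.monotonic()
        elif time.monotonic() - last_growth > stall_timeout:
            # No new items for a while: stop instead of looping forever.
            break
    return items
```

With the question's driver, find_items could be lambda: driver.find_elements(By.CSS_SELECTOR, 'div.gws-localreviews__google-review') and scroll_to could be lambda el: driver.execute_script('arguments[0].scrollIntoView(true);', el), with num_reviews passed as target. The stall timeout is there because the page may end up loading fewer reviews than the number shown in its header.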