Web scraping issue with Selenium / getting comments

Problem description

I have been trying to scrape the Dealabs website.

Here is an example page:

https://www.dealabs.com/bons-plans/saneo-climatiseur-2166879

The main goal is to retrieve all of the comments and print them.

My sample code is below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.dealabs.com/bons-plans/saneo-climatiseur-2166879"

# Run Firefox without a visible window.
options = Options()
options.add_argument("-headless")

driver = webdriver.Firefox(options=options)
try:
    driver.get(url)

    # Wait for the button at this XPath to become clickable, then click it.
    button = WebDriverWait(driver, 2).until(
        EC.element_to_be_clickable((By.XPATH, "/html/body/main/div[4]/div[1]/div/div[1]/div[2]/button[2]/span"))
    )
    button.click()

    # The find_element_by_* helpers were removed in Selenium 4;
    # use find_element(By.<strategy>, value) instead.
    comments_list = driver.find_element(By.CLASS_NAME, "commentList")
    comments = comments_list.find_elements(By.CLASS_NAME, "commentList-item")

    for comment in comments:
        _id = comment.get_attribute("id")
        author = comment.find_element(By.CLASS_NAME, "userInfo-username").text
        content = comment.find_element(By.CLASS_NAME, "userHtml-content").text
        timestamp = comment.find_element(By.CLASS_NAME, "text--color-greyShade").text
        print(_id)
        print(author)
        print(content)
        print(timestamp)
        print("-" * 30)
finally:
    # quit() ends the whole browser session, even if an error occurred above.
    driver.quit()

The problem is that this only collects the top-rated comments, not all of them.

I'm a bit confused.

Am I missing something?

Thanks in advance.

Tags: python, selenium, web-scraping

Solution


You can fetch the comments easily by passing a page parameter in the URL:

https://www.dealabs.com/bons-plans/saneo-climatiseur-2166879?page=1
https://www.dealabs.com/bons-plans/saneo-climatiseur-2166879?page=2
https://www.dealabs.com/bons-plans/saneo-climatiseur-2166879?page=3

and so on, instead of clicking the Next button each time.
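
The page-parameter approach could be sketched roughly as follows. This is only an illustration under several assumptions: the helper names page_url and collect_comments are hypothetical, the class names are taken from the question's code, and the stopping rule (stop when a page yields no comment id that has not been seen already) and the max_pages cap are guesses about how the site behaves past the last page, not something the answer states. Depending on how the page loads, an explicit WebDriverWait before reading the comments may also be needed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page):
    """Return base_url with its `page` query parameter set to `page`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def collect_comments(driver, base_url, max_pages=50):
    """Visit ?page=1, ?page=2, ... and gather comments from each page.

    Assumption: a page past the last one shows no new comment ids,
    so the loop stops there. max_pages is a hypothetical safety cap.
    """
    # Imported here so page_url stays usable without Selenium installed.
    from selenium.webdriver.common.by import By

    seen_ids = set()
    results = []
    for page in range(1, max_pages + 1):
        driver.get(page_url(base_url, page))
        items = driver.find_elements(By.CLASS_NAME, "commentList-item")
        new_items = [i for i in items if i.get_attribute("id") not in seen_ids]
        if not new_items:
            break  # nothing new on this page: assume we ran out of pages
        for item in new_items:
            seen_ids.add(item.get_attribute("id"))
            results.append({
                "id": item.get_attribute("id"),
                "author": item.find_element(By.CLASS_NAME, "userInfo-username").text,
                "content": item.find_element(By.CLASS_NAME, "userHtml-content").text,
            })
    return results
```

With the driver created as in the question, something like collect_comments(driver, url) would return a list of dictionaries that can then be printed one by one.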

