Unable to locate element, cannot scrape reviews

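Problem description

I am scraping product reviews from the Sephora website, where the reviews are rendered by JavaScript, but I cannot get them. Here is my code: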

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support.expected_conditions import presence_of_element_located as EC
    import time
    chrome_path = '/media/danish-khan/New Volume/Web_scraping/rgcrawler2/chromedriver'
    driver = webdriver.Chrome(chrome_path)
    
    chrome_options = Options()
    url = 'https://www.sephora.com/product/the-porefessional-face-primer-P264900?skuId=1259068&icid2=products%20grid:p264900:product'
    
    driver.get(url)
    WebDriverWait(driver, 70)
    time.sleep(70)
    review = driver.find_element_by_class_name('css-1jg2pb9 eanm77i0')
    for post in review:
    #try:
    #    element = WebDriverWait(driver, 50).until(
    #        EC.presence_of_element_located((By.XPATH, "//div[@class = 'css-1jg2pb9 eanm77i0']"))
    #    )
    #finally:
    #    driver.quit()
    #
    
       print(review)
    
    
    driver.close()

The output is:

    Traceback (most recent call last):
      File "resgt.py", line 15, in <module>
        review = driver.find_element_by_class_name('css-1jg2pb9 eanm77i0')
      File "/home/danish-khan/miniconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 564, in find_element_by_class_name
        return self.find_element(by=By.CLASS_NAME, value=name)
      File "/home/danish-khan/miniconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 978, in find_element
        'value': value})['value']
      File "/home/danish-khan/miniconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
        self.error_handler.check_response(response)
      File "/home/danish-khan/miniconda3/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".css-1jg2pb9 eanm77i0"}
      (Session info: chrome=85.0.4183.102)

Tags: python, selenium, selenium-webdriver

Solution


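The reviews on that page are loaded asynchronously, specifically when the reviews section is scrolled into view. You have to scroll close to the element that holds the reviews and wait for it to appear; only then can you retrieve it.
I was able to do that with this code: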

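    # Scroll halfway down the page so the lazily loaded reviews section is requested.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/2);")
    # Give the reviews time to load and render.
    time.sleep(10)
    # Both class names are chained with dots and no space between them.
    review = driver.find_element_by_css_selector('.css-1jg2pb9.eanm77i0')
    # review = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/main/div/div[2]/div[1]/div/div[5]/div/div[2]/div[1]/div[2]')
    # Print the element's text rather than the WebElement object itself.
    print(review.text)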
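
A side note on the selector: `find_element_by_class_name` expects a single class name. As the traceback in the question shows, passing `'css-1jg2pb9 eanm77i0'` produced the CSS selector `.css-1jg2pb9 eanm77i0`, which CSS reads as a descendant element with the tag name `eanm77i0`, so it can never match. Chaining the classes with dots works instead. A minimal sketch of that conversion, where the helper name `classes_to_css` is hypothetical and not part of Selenium:

```python
def classes_to_css(class_attr):
    """Turn a class attribute value such as 'a b' into the CSS selector '.a.b'."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # Prefix every class with a dot and join them without spaces,
    # so the selector requires all classes on the same element.
    return "".join("." + name for name in names)


print(classes_to_css("css-1jg2pb9 eanm77i0"))  # .css-1jg2pb9.eanm77i0
```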

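I left the commented-out XPath in the code because that is how I first located the element. Note: you may need to adjust the wait time and the scroll height to get this to work reliably.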
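
One way to avoid hand-tuning the sleep and the scroll position is to scroll in small steps and check for the element after each step. The sketch below is only an illustration under some assumptions: the helper name `scroll_until_found` and its `step`, `max_scrolls` and `pause` parameters are made up for this example, and the default values are guesses rather than numbers tested against the Sephora page. It relies only on `driver.find_elements` and `driver.execute_script`, which a Selenium WebDriver provides.

```python
import time

# Selenium's By.CSS_SELECTOR constant is the plain string "css selector";
# it is spelled out here so the helper has no import-time dependency.
CSS_SELECTOR = "css selector"


def scroll_until_found(driver, selector, step=800, max_scrolls=20, pause=1.0):
    """Scroll down in steps until `selector` matches, then return the matches.

    Returns an empty list if nothing matched after `max_scrolls` steps.
    """
    for _ in range(max_scrolls):
        found = driver.find_elements(CSS_SELECTOR, selector)
        if found:
            return found
        # Scroll a little further so lazily loaded sections get triggered.
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(pause)  # give the page time to fetch and render
    # One last check after the final scroll.
    return driver.find_elements(CSS_SELECTOR, selector)
```

With the driver from the question, this would be called as `scroll_until_found(driver, '.css-1jg2pb9.eanm77i0')`, and the text of each returned element is available through its `.text` attribute.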

