Removing an element inside a container with Selenium

Problem description

I only want to scrape the information inside the black box, and delete/remove/exclude the information inside the red box.

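I am doing this because the class names "entry" and "partial_entry" appear in both boxes. Only the first "partial_entry" contains the information I need, so I plan to delete/remove/exclude the element with the class name "mgrRspnInLine".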

My code is:

while True:
    container = driver.find_elements_by_xpath('.//*[contains(@class,"review-container")]')
    for item in container:
        try:
            element = item.find_element_by_class_name('mgrRspnInline')
            driver.execute_script("""var element = document.getElementsByClassName("mgrRspnInline")[0];element.parentNode.removeChild(element);""", element)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element.click()
            time.sleep(2)
            rating = item.find_elements_by_xpath('.//*[contains(@class,"ui_bubble_rating bubble_")]')
            for rate in rating:
                rate = rate.get_attribute("class")
                rate = str(rate)
                rate = rate[-2:]
                score_list.append(rate)
            time.sleep(2)
            stay = item.find_elements_by_xpath('.//*[contains(@class,"recommend-titleInline noRatings")]')
            for stayed in stay:
                stayed = stayed.text
                stayed = stayed.split(', ')
                stayed.append(stayed[0])
                travel_type.append(stayed[1])
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"noQuotes")]')))
            summary = item.find_elements_by_xpath('.//*[contains(@class,"noQuotes")]')
            for comment in summary:
                comment = comment.text
                comments.append(comment)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"ratingDate")]')))
            rating_date = item.find_elements_by_xpath('.//*[contains(@class,"ratingDate")]')
            for date in rating_date:
                date = date.get_attribute("title")
                date = str(date)
                review_date.append(date)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"partial_entry")]')))
            review = item.find_elements_by_xpath('.//*[contains(@class,"partial_entry")]')
            for comment in review:
                comment = comment.text
                print(comment)
                reviews.append(comment)
        except (NoSuchElementException) as e:
            continue
    try:
        element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"nav next taLnk ui_button primary")]')))
        element.click()
        time.sleep(2)
    except (ElementClickInterceptedException,NoSuchElementException) as e:
        print(e)
        break

Basically, inside each "review-container" I first search for the class name "mgrRspnInLine" and then try to remove it with execute_script.

Unfortunately, the output still shows the content inside "mgrRspnInLine".

Tags: python, selenium, selenium-webdriver, web-scraping, selenium-chromedriver

Solution


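If you want to keep the XPath from matching the second element, you can modify it as follows:

.//*[contains(@class,"partial_entry") and not(ancestor::*[@class="mgrRspnInLine"])]

This matches an element with the class name "partial_entry" only if it has no ancestor with the class name "mgrRspnInLine". XPath string comparisons are case-sensitive, and the code in the question spells the class "mgrRspnInline", so the name in the expression has to match the page's markup exactly. Because @class="..." compares the whole attribute value, switch to contains(@class, ...) if the ancestor carries additional classes.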

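As a rough sketch of how the expression above might be used from the question's loop, the helpers below build the XPath for a given class name and collect the text of the matching elements. The function names `review_text_xpath` and `collect_review_texts` are hypothetical, and passing the plain string "xpath" as the locator strategy (the value of `By.XPATH`) is an assumption made so the snippet needs no imports.

```python
def review_text_xpath(excluded_class):
    """Build the XPath from the answer: "partial_entry" elements that have
    no ancestor whose class attribute equals `excluded_class`."""
    return (
        './/*[contains(@class,"partial_entry") '
        'and not(ancestor::*[@class="{}"])]'.format(excluded_class)
    )


def collect_review_texts(container, excluded_class):
    """Return the text of each matching element inside `container`.

    `container` is assumed to be a Selenium WebElement, for example one
    "review-container" item from the question's loop. The string "xpath"
    is the value of By.XPATH.
    """
    elements = container.find_elements("xpath", review_text_xpath(excluded_class))
    return [element.text for element in elements]
```

In the question's code this could stand in for the "partial_entry" lookup, for example `reviews.extend(collect_review_texts(item, "mgrRspnInline"))`, with the class name spelled exactly as it appears in the page. Nothing is removed from the DOM with this approach, so the execute_script call is no longer needed for filtering.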
