Incorrect looping with a nested loop

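Problem description

I'm trying to iterate over the books on a website; each page should yield only 20 results before I move on to the next page.

I read an element to get the total number of pages (num_page), which gives me the maximum number of pages I can iterate over.

The problem with my code is that the nested loop (the one that locates the anchor nodes) doesn't return just the 20 URLs of a single page; it keeps looping over the same page.

I'm not 100% sure where the nested loop goes wrong, so any pointers would be very helpful.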

options = webdriver.FirefoxOptions()
options.add_argument("--headless")

driver = webdriver.Firefox(executable_path=GeckoDriverManager().install(), options=options)
#driver = webdriver.Chrome(executable_path=chromedriver, options=options)
print("Browsing to Wordery")
driver.get('https://wordery.com/search?viewBy=grid&resultsPerPage=20&page=1&leadTime[]=any&interestAge[]=Babies')
#print((driver.page_source).encode('utf-8'))
driver.implicitly_wait(3)

#Get total pages
num_page = driver.find_element_by_xpath('//span[@class="js-pnav-max"]')


#iterate through pages grabbing links
for i in range(int(num_page.text)):
    
    #locate anchor nodes
    lists = driver.find_elements_by_xpath("//a[@class='"'c-book__title'"']")
    links = []
    for lis in lists:
        
        # Fetch and store the links
        links.append(lis.get_attribute('href'))
        with open('search_results_urls.txt', 'a') as filehandle:
            filehandle.write('%s\n' % lis.get_attribute('href'))
            print(lis.get_attribute('href'))
    
    page_ = i + 1
    click_next = driver.find_element_by_xpath('//a[@class="o-layout__item o-link--arrow js-pnav-next u-utils-pnav__next"]').click()

driver.quit()

Strangely, it loops over the first page 33 times (there are only 20 items, so they are repeated) and then raises the following error:

selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of <a class="c-book__title" href="/peppa-pig-practise-with-peppa-wipe-clean-first-letters-peppa-pig-9780723292081"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

I tested the following with the page-navigation loop removed, and it works as expected.

lists = driver.find_elements_by_xpath("//a[@class='c-book__title']")
links = [link.get_attribute('href') for link in lists]

with open('search_results_urls.txt', 'a') as filehandle:
    for link in links:
        filehandle.write(link)
        print(link)

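As soon as I add it to the page loop, it ends up iterating over the URLs of the same page.

Here is my latest code, based on the answer below. I'm still struggling with it looping over the first page multiple times.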

#Get total pages
num_page = driver.find_element_by_xpath('//span[@class="js-pnav-max"]')

for i in range(int(num_page.text)):
    driver.implicitly_wait(10)
    lists = driver.find_elements_by_xpath("//a[@class='c-book__title']")
    links = [link.get_attribute('href') for link in lists]
    
    with open('search_results_urls.txt', 'a') as filehandle:
        for link in links:
            filehandle.write(link + "\n")
            print(link + "\n")
    
    click_next = driver.find_element_by_xpath('//a[@class="o-layout__item o-link--arrow js-pnav-next u-utils-pnav__next"]').click()

Tags: python-3.x, selenium

Solution


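You are getting the same links because the element list was assigned outside the loop: when the page refreshes, you are still holding the previous links.

Do the lookup inside the for loop. Use WebDriverWait() and wait for presence_of_all_elements_located(), so that after clicking through to the next page the lookup is synchronized with the page refresh.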

num_page = driver.find_element_by_xpath('//span[@class="js-pnav-max"]')
for i in range(int(num_page.text)):
    # Re-locate the anchors on every iteration, waiting until they are present.
    lists = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//a[@class='c-book__title']")))
    links = [link.get_attribute('href') for link in lists]
    with open('search_results_urls.txt', 'a') as filehandle:
        for link in links:
            filehandle.write(link + "\n")
            print(link)

    click_next = driver.find_element_by_xpath('//a[@class="o-layout__item o-link--arrow js-pnav-next u-utils-pnav__next"]').click()
    # Give the page some time to refresh.
    time.sleep(2)
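
The fixed `time.sleep(2)` is only a guess at how long the next page takes to load. As a rough sketch of the alternative, the loop can poll until the listing has actually changed before reading it. The code below is written against plain callables so that it runs without a browser; `wait_until`, `collect_links`, `read_links` and `click_next` are hypothetical names for illustration and are not part of Selenium. In Selenium itself, a similar role is played by `WebDriverWait(driver, 20).until(EC.staleness_of(old_element))`.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() until it returns a truthy value; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


def collect_links(read_links, click_next, num_pages, timeout=20.0, poll=0.5):
    """Collect links page by page, waiting for the listing to change after each click.

    read_links and click_next are hypothetical stand-ins for the Selenium
    lookups: read_links() returns the hrefs currently shown, click_next()
    triggers navigation to the next page.
    """
    all_links = []
    previous = None
    for page in range(num_pages):
        def fresh_links():
            # Accept the links only if they are non-empty and differ from
            # the links of the page that was read before the click.
            links = read_links()
            return links if links and links != previous else None

        current = wait_until(fresh_links, timeout, poll)
        all_links.extend(current)
        previous = current
        # Do not click "next" on the last page, where the link may not exist.
        if page < num_pages - 1:
            click_next()
    return all_links
```

With Selenium, `read_links` could wrap the `find_elements_by_xpath` and `get_attribute('href')` calls and `click_next` the click on the "next" anchor. Two caveats of this sketch: it times out if two consecutive pages happen to show identical links, and it does not catch a `StaleElementReferenceException` raised while the page is being swapped, which a real `read_links` would likely need to retry on.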

You need the following imports (`time` is used by the `time.sleep(2)` call).

import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
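
Because the start URL in the question already carries a `page=1` query parameter, another option (not the approach taken in this answer) might be to skip the click entirely and load each page by URL. The sketch below assumes the site honors `page=N` on direct requests, which is not verified here; `page_url` is a hypothetical helper built only on the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(url, page):
    """Return url with its `page` query parameter set to the given page number."""
    parts = urlsplit(url)
    # Keep every other parameter and replace (or add) the page parameter.
    pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "page"]
    pairs.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


start = ("https://wordery.com/search?viewBy=grid&resultsPerPage=20"
         "&page=1&leadTime[]=any&interestAge[]=Babies")
# Assumption: three pages, purely for illustration.
urls = [page_url(start, n) for n in range(1, 4)]
```

Each URL could then be passed to `driver.get(...)` before locating the `c-book__title` anchors, so no element from a previous page is ever reused. Note that `urlencode` percent-encodes the square brackets in `leadTime[]` and `interestAge[]`, which servers normally decode back to the same parameter names.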
