Python Selenium web scraping crashes after running the loop 30 times

Problem description

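I'm trying to scrape data from a website that has a button for loading more data. After you click the button, new data arrives each time you scroll down, and after a while you have to click the button again. Here is my code: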
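The loading pattern described here is: click the button while it is available, otherwise scroll, and stop once the page height no longer grows. That control flow can be sketched without a browser. This is a minimal sketch under stated assumptions, not the poster's code. load_until_stable, click_more and scroll_and_measure are hypothetical names. In a real script the two callables would wrap the Selenium click on the "more" button and the scrollBy / scrollHeight JavaScript calls.

```python
from typing import Callable


def load_until_stable(
    click_more: Callable[[], bool],
    scroll_and_measure: Callable[[], int],
    start_height: int,
    max_rounds: int = 1000,
) -> int:
    """Alternate between clicking "load more" and scrolling until nothing new loads.

    Hypothetical helper. click_more() should return True if a "load more"
    button was found and clicked. scroll_and_measure() should scroll down and
    return the new page height. Returns the number of rounds performed
    (capped at max_rounds).
    """
    last_height = start_height
    for round_no in range(1, max_rounds + 1):
        if click_more():
            continue  # the button loaded more data; check again next round
        new_height = scroll_and_measure()
        if new_height == last_height:
            return round_no  # height unchanged: assume the end was reached
        last_height = new_height
    return max_rounds
```

A real click_more would typically catch Selenium's NoSuchElementException (and similar errors) and return False, instead of relying on a bare except. The max_rounds cap keeps the loop from running forever if the page keeps growing.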

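import time

# driver, condition, info_divs, i, last_height and list_of_links_riyadh
# are initialized earlier in the script (not shown in the question).
while condition:
    # Collect the href of the last "postTitle" element inside each result div.
    for div in range(0, len(info_divs)):
        i += 1
        elem = info_divs[div]
        links = elem.find_elements_by_class_name("postTitle")[-1]
        links2 = links.find_element_by_tag_name("a")
        list_of_links_riyadh.append(links2.get_property("href"))

    try:
        try:
            # Click the "load more" button if it is present.
            time.sleep(4)
            driver.find_element_by_id("more").click()
        except:
            # No clickable button: scroll down and check whether the page grew.
            time.sleep(4)
            driver.execute_script("window.scrollBy(0,2000)")
            time.sleep(8)
            new_height = driver.execute_script("return document.body.scrollHeight")
            # print(list_of_links_riyadh)
            print(new_height)
            print(last_height)

            if last_height == new_height:
                # Page height did not change: no more data, so stop.
                condition = False
            else:
                last_height = new_height
    except:
        condition = False
        print("False")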
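As posted, the for loop walks the whole of info_divs on every pass of the while loop. Any post that was already processed is therefore visited again on later passes, and its link is appended again. The following is a minimal sketch of keeping only links not seen before, in collection order. append_new_links is a hypothetical helper, and hrefs would be the values returned by get_property("href").

```python
from typing import Iterable, List, Optional, Set


def append_new_links(
    hrefs: Iterable[Optional[str]], seen: Set[str], out: List[str]
) -> int:
    """Append hrefs that were not collected before; return how many were new.

    Hypothetical helper: `seen` gives O(1) membership checks, while `out`
    keeps the links in the order they were first encountered.
    """
    added = 0
    for href in hrefs:
        if href and href not in seen:  # skip empty/missing hrefs and repeats
            seen.add(href)
            out.append(href)
            added += 1
    return added
```

The return value also gives a cheap stop signal: a pass that adds zero new links suggests nothing new was loaded.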

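I previously used the same code to scrape the same website, when there were only about 600 rows. Now the data is much larger (more than 35k rows), and the script crashes on the line links = elem.find_elements_by_class_name("postTitle")[-1] with the following error: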

Traceback (most recent call last):
  File "abc", line 24, in <module>
    links = elem.find_elements_by_class_name("postTitle")[-1]
  File "abc", line 413, in find_elements_by_class_name
    return self.find_elements(by=By.CLASS_NAME, value=name)
  File "abc", line 684, in find_elements
    return self._execute(Command.FIND_CHILD_ELEMENTS,
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=91.0.4472.77)
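StaleElementReferenceException means the WebElement reference (here elem, taken from info_divs) no longer points to a node in the current document. This typically happens when the page re-renders or replaces that part of the DOM after new data loads. A common mitigation is to locate elements again right before using them, and to retry when a reference has gone stale. The following is a minimal, library-agnostic sketch. with_relocate_retry is a hypothetical helper, and the exception types are passed in so the sketch does not depend on a Selenium install.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_relocate_retry(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Run action(), retrying when it raises one of the retry_on exceptions.

    Hypothetical helper. action should locate the element itself each time it
    is called, so every retry works with a fresh reference.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)  # give the page a moment to settle
    raise AssertionError("unreachable")
```

With Selenium this might be called as with_relocate_retry(lambda: find_info_divs()[index].find_elements_by_class_name("postTitle")[-1], (StaleElementReferenceException,)). Here find_info_divs is a hypothetical function that repeats whatever lookup originally produced info_divs, and StaleElementReferenceException is imported from selenium.common.exceptions. Retrying only helps if the lambda re-locates the element. Retrying with the same stale elem will fail every time.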

Tags: python, selenium, selenium-webdriver

Solution

