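python - Python Selenium web scraping crashes after running the loop 30 times
Problem description
I am trying to scrape data from a website that has a button you click to load more data. After clicking it, each scroll down loads new data, and after a while you have to click the button again. Here is my code: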
while condition:
    for div in range(0,len(info_divs)):
        i += 1
        elem = info_divs[div]
        links = elem.find_elements_by_class_name("postTitle")[-1]
        links2 = links.find_element_by_tag_name("a")
        list_of_links_riyadh.append(links2.get_property("href"))
    try:
        try:
            time.sleep(4)
            driver.find_element_by_id("more").click()
        except:
            time.sleep(4)
            driver.execute_script("window.scrollBy(0,2000)")
            time.sleep(8)
        new_height = driver.execute_script("return document.body.scrollHeight")
        # print(list_of_links_riyadh)
        print(new_height)
        print(last_height)
        if last_height == new_height:
            condition = False
        else:
            last_height = new_height
    except:
        condition = False
        print("False")
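I have used the same code above to scrape data from this website before, but the data is now much larger: previously there were only 600 rows, and now there are more than 35k. The script crashes on the line `links = elem.find_elements_by_class_name("postTitle")[-1]` with the following error: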
Traceback (most recent call last):
  File "abc", line 24, in <module>
    links = elem.find_elements_by_class_name("postTitle")[-1]
  File "abc", line 413, in find_elements_by_class_name
    return self.find_elements(by=By.CLASS_NAME, value=name)
  File "abc", line 684, in find_elements
    return self._execute(Command.FIND_CHILD_ELEMENTS,
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=91.0.4472.77)
Solution
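Selenium raises `StaleElementReferenceException` when a WebElement located earlier is no longer attached to the current DOM. A likely cause here is that `info_divs` was located before the page loaded more posts, so some of the elements it holds were replaced while the loop was still using them. With 35k rows there are far more page updates than with 600, which would explain why the same code worked before. One possible workaround, sketched below, is to read all the hrefs in a single `execute_script` call on each pass, so that no WebElement references are held across page updates. The names `collect_new_links` and `COLLECT_HREFS_JS` and the selector `"div.post"` are hypothetical; the real container selector depends on how `info_divs` was located, which the question does not show.

```python
# JavaScript executed in the page. For every container matching the selector
# passed as arguments[0], take the last ".postTitle" element and return the
# href of its first <a>. Containers without such a link are skipped.
COLLECT_HREFS_JS = """
return Array.from(document.querySelectorAll(arguments[0])).map(function (div) {
    var titles = div.querySelectorAll('.postTitle');
    if (!titles.length) { return null; }
    var a = titles[titles.length - 1].querySelector('a');
    return a ? a.href : null;
}).filter(Boolean);
"""


def collect_new_links(driver, container_selector, seen, links):
    """Append hrefs that have not been seen yet to `links`, keeping order.

    `container_selector` is a CSS selector for the post containers
    (an assumption: the question does not show how they are located).
    """
    hrefs = driver.execute_script(COLLECT_HREFS_JS, container_selector)
    for href in hrefs:
        if href not in seen:
            seen.add(href)
            links.append(href)
    return links
```

In the loop from the question, the inner `for div in range(...)` block could then be replaced with one call per pass, for example `collect_new_links(driver, "div.post", seen, list_of_links_riyadh)` with `seen = set()` created before the `while` loop. The `seen` set also keeps links collected on earlier passes from being appended again.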