python - How to "replenish" the worker queue in Python after an exception?
Problem description
I am trying to build a multithreaded Selenium scraper. Say I want to fetch 100,000 websites using 20 ChromeDriver instances and print their page sources. So far I have the following code:
from queue import Queue
from threading import Thread
from selenium import webdriver

selenium_data_queue = Queue()
worker_queue = Queue()

# Start 20 ChromeDriver instances
worker_ids = list(range(20))
selenium_workers = {i: webdriver.Chrome() for i in worker_ids}
for worker_id in worker_ids:
    worker_queue.put(worker_id)

def selenium_task(worker, data):
    # Open website
    worker.get(data)
    # Print website page source
    print(worker.page_source)

def selenium_queue_listener(data_queue, worker_queue):
    while True:
        url = data_queue.get()
        worker_id = worker_queue.get()
        worker = selenium_workers[worker_id]
        # Assign current worker and url to your selenium function
        selenium_task(worker, url)
        # Put the worker back into the worker queue as it has completed its task
        worker_queue.put(worker_id)
        data_queue.task_done()
    return

if __name__ == '__main__':
    selenium_processes = [Thread(target=selenium_queue_listener,
                                 args=(selenium_data_queue, worker_queue)) for _ in worker_ids]
    for p in selenium_processes:
        p.daemon = True
        p.start()
    # Adding urls indefinitely to data queue
    # Generating url just for testing
    for i in range(100000):
        d = f'http://www.website.com/{i}'
        selenium_data_queue.put(d)
    # Wait for all selenium queue listening processes to complete
    selenium_data_queue.join()
    # Tearing down web workers
    for b in selenium_workers.values():
        b.quit()
My question is: if any ChromeDriver shuts down abruptly (i.e., with an unrecoverable exception such as InvalidSessionIdException), is there a good practice for replenishing the worker queue with a replacement instance so the remaining tasks still complete? If so, what is a good way to accomplish it?
Solution
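One common pattern is to have the listener catch the driver exception, quit and discard the dead worker, create a replacement under a fresh id, and re-queue the failed URL so another driver retries it. Below is a minimal runnable sketch of that pattern; `FakeDriver` and `new_driver` are hypothetical stand-ins for `webdriver.Chrome()` so the logic can run without a browser, and the exception handling is deliberately broad where real code might catch `InvalidSessionIdException` specifically.

```python
import itertools
from queue import Queue
from threading import Thread, Lock

processed = []            # URLs successfully fetched
processed_lock = Lock()

class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome so the sketch runs without a browser."""
    def __init__(self, fail_on=None):
        self.fail_on = fail_on          # URL on which this driver "dies"
    def get(self, url):
        if url == self.fail_on:
            raise RuntimeError('session lost')  # stands in for InvalidSessionIdException
        with processed_lock:
            processed.append(url)
    def quit(self):
        pass

def new_driver():
    # In the real scraper this would be: return webdriver.Chrome()
    return FakeDriver()

# Small pool for illustration; worker 0 is rigged to die on one URL
selenium_workers = {0: FakeDriver(fail_on='http://www.website.com/3'),
                    1: FakeDriver()}
next_worker_id = itertools.count(len(selenium_workers))  # fresh ids for replacements

def selenium_queue_listener(data_queue, worker_queue):
    while True:
        url = data_queue.get()
        worker_id = worker_queue.get()
        worker = selenium_workers[worker_id]
        try:
            worker.get(url)                  # the selenium task
        except Exception:                    # e.g. InvalidSessionIdException
            try:
                worker.quit()                # best-effort teardown of the dead session
            except Exception:
                pass
            del selenium_workers[worker_id]  # retire the dead worker id
            worker_id = next(next_worker_id) # replenish the pool with a fresh worker
            selenium_workers[worker_id] = new_driver()
            data_queue.put(url)              # re-queue the failed URL for a retry
        finally:
            worker_queue.put(worker_id)      # an id goes back either way
            data_queue.task_done()

data_queue = Queue()
worker_queue = Queue()
for wid in selenium_workers:
    worker_queue.put(wid)
for _ in range(2):
    Thread(target=selenium_queue_listener,
           args=(data_queue, worker_queue), daemon=True).start()
for i in range(5):
    data_queue.put(f'http://www.website.com/{i}')
data_queue.join()   # returns only after any retried URL has also been processed
print(sorted(processed))
```

Note the ordering in the except branch: the failed URL is re-queued before `task_done()` runs in the finally block, so `data_queue.join()` cannot return while a retry is still pending. Replacements get ids the original pool never used, which avoids colliding with an id another thread currently holds.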