python - ResponseNotReady error when making concurrent selenium/chromedriver requests
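Problem description
I have a script that uses selenium to make requests with multiple headless browsers (plus the Chrome driver), each of which sends its HTTP requests through a different SOCKS proxy. All requests are made inside a concurrent.futures.ThreadPoolExecutor(). For some reason I periodically get the error ResponseNotReady: Idle, and I don't understand why.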
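My questions are:
1) What causes this ResponseNotReady error? Am I doing something wrong, or is it a normal exception that I just need to catch and handle?
2) How do I handle a ResponseNotReady exception properly? What is the best way to recover from it?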
Here is the function that makes the request:
def _fetch_selenium(self, url, session, port):
    domain = self.domainFromURL(url)
    with self.locks[port][domain]:
        try:
            start_time = datetime.now()
            session.get(url)
            sleep(self.delay)
            return {'url': url,
                    'html': session.page_source,
                    'time': datetime.now() - start_time,
                    'proxy_port': port}
        except selenium_exceptions.WebDriverException as e:
            print("Request of URL " + url + " failed with exception: " + str(e))
            sleep(self.delay)
            return {'url': url,
                    'html': None,
                    'time': datetime.now() - start_time,
                    'proxy_port': port}
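One detail of the function above is worth spelling out: its except clause catches only WebDriverException, whereas ResponseNotReady is defined in http.client and derives from http.client.HTTPException, so it propagates out of _fetch_selenium(). A minimal sketch of catching both and retrying with a fresh session follows. It assumes a hypothetical make_session factory that builds a new driver (not part of the original code), and it uses a stand-in WebDriverException class so the snippet runs without selenium installed:

```python
import http.client


class WebDriverException(Exception):
    """Stand-in for selenium.common.exceptions.WebDriverException."""


def fetch_with_retry(session, url, make_session, max_attempts=3):
    """Return (html, session); html is None if every attempt failed.

    make_session is a hypothetical factory that returns a fresh driver.
    """
    for attempt in range(max_attempts):
        try:
            session.get(url)
            return session.page_source, session
        except (WebDriverException, http.client.HTTPException) as exc:
            print(f"Attempt {attempt + 1} for {url} failed: {exc!r}")
            # The driver's HTTP connection may be in an unknown state,
            # so discard the session instead of reusing it.
            try:
                session.quit()
            except Exception:
                pass
            session = make_session()
    return None, session
```

Whether replacing the session is the right recovery depends on why the connection ended up idle; the sketch only shows where the exception can be intercepted.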
Here is the code that dispatches requests to the different selenium sessions (the fetch() function basically just ends up calling _fetch_selenium()):
def fetchConcurrent(self, urls):
    results = []
    timeouts = defaultdict(int)
    with ThreadPoolExecutor(max_workers=self.num_threads) as executor:
        futures = []
        for url in urls:
            session = self.sessions.popleft()
            futures.append(executor.submit(self.fetch, url, session))
            self.sessions.append(session)
        for future in as_completed(futures):
            result, session = future.result()
            results.append(result)
            if not result['html']:
                socks_port = result['proxy_port']
                print(f"Got no HTML for url {result['url']}, using port {socks_port}.")
                timeouts[socks_port] += 1
                if timeouts[socks_port] > MAX_TIMEOUTS_PER_CLIENT:
                    tor_client_pool.replaceClient(socks_port)
                    self.killSeleniumSession(session)
                    self.sessions.remove(session)
                    self.newSeleniumSession(socks_port)
                    timeouts[socks_port] = 0
                continue
            print(f"GOT: {result['url'].strip()} in {result['time']} seconds, using proxy on port {result['proxy_port']})")
    return results
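In the loop above, each session is appended back to self.sessions right after executor.submit(), before its request has finished, so when there are more URLs than sessions the same driver may be handed to two worker threads at once. The lock in _fetch_selenium() is keyed by port and domain, so it would not serialize two different domains on one session. An http.client connection is not designed for interleaved use from several threads, which makes this a plausible cause, though not a confirmed one. A sketch of checking sessions out of a queue.Queue so that each driver is used by one thread at a time follows; FakeSession and fetch_all are illustrative names, not from the original code:

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class FakeSession:
    """Illustrative stand-in for a selenium driver; records overlapping use."""

    def __init__(self, name):
        self.name = name
        self.page_source = None
        self.overlap_detected = False
        self._busy = threading.Lock()

    def get(self, url):
        # If another thread is already inside get(), the non-blocking
        # acquire fails and the overlap is recorded.
        if not self._busy.acquire(blocking=False):
            self.overlap_detected = True
            return
        try:
            self.page_source = f"<html>{url} via {self.name}</html>"
        finally:
            self._busy.release()


def fetch_all(urls, sessions, num_threads=4):
    pool = queue.Queue()
    for s in sessions:
        pool.put(s)

    def worker(url):
        session = pool.get()      # blocks until a session is free
        try:
            session.get(url)
            return url, session.page_source
        finally:
            pool.put(session)     # handed back only after the request is done

    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(worker, u) for u in urls]
        return [f.result() for f in as_completed(futures)]
```

The key difference from the deque-based loop is that a session re-enters the pool in the worker's finally block, not at submission time.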
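When I run the code above, it downloads many pages successfully, but eventually it hits a page that triggers this ResponseNotReady error, and I can't tell what it is about that page that makes it crash. Here is the traceback I see when the error occurs: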
~/Code/gis project/code/TorGetter.py in _fetch_selenium(self, url, session, port)
    202             try:
    203                 start_time = datetime.now()
--> 204                 session.get(url)
    205                 sleep(self.delay) # no other requests to this domain can be made by this tor client while we sleep() here
    206                 return {'url': url,

~/.local/share/virtualenvs/code-pIyQci_2/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in get(self, url)
    324         Loads a web page in the current browser session.
    325         """
--> 326         self.execute(Command.GET, {'url': url})
    327
    328     @property

~/.local/share/virtualenvs/code-pIyQci_2/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
    310
    311         params = self._wrap_value(params)
--> 312         response = self.command_executor.execute(driver_command, params)
    313         if response:
    314             self.error_handler.check_response(response)

~/.local/share/virtualenvs/code-pIyQci_2/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py in execute(self, command, params)
    470         data = utils.dump_json(params)
    471         url = '%s%s' % (self._url, path)
--> 472         return self._request(command_info[0], url, body=data)
    473
    474     def _request(self, method, url, body=None):

~/.local/share/virtualenvs/code-pIyQci_2/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py in _request(self, method, url, body)
    494             try:
    495                 self._conn.request(method, parsed_url.path, body, headers)
--> 496                 resp = self._conn.getresponse()
    497             except (httplib.HTTPException, socket.error):
    498                 self._conn.close()

/usr/lib/python3.6/http/client.py in getresponse(self)
   1319         #
   1320         if self.__state != _CS_REQ_SENT or self.__response:
-> 1321             raise ResponseNotReady(self.__state)
   1322
   1323         if self.debuglevel > 0:

ResponseNotReady: Idle
Any idea what is going on here and how to fix it? Thanks!
Solution
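The traceback shows http.client raising ResponseNotReady because getresponse() was called while the connection state was Idle, meaning that connection object had no request pending. The state check can be reproduced without selenium or any network traffic. The sketch below only illustrates the condition; it does not establish what put the driver's connection into that state, and the host and port are arbitrary placeholders:

```python
import http.client

# Constructing the object does not open a socket, and no request has
# been sent, so the connection is still in its initial 'Idle' state.
conn = http.client.HTTPConnection("localhost", 9515)

try:
    conn.getresponse()
except http.client.ResponseNotReady as exc:
    message = str(exc)
    print(f"ResponseNotReady: {message}")
```

One way a driver's connection might end up idle between request() and getresponse() is another thread closing or resetting the same connection object in the meantime, so it may be worth confirming that no selenium session is ever used by two threads at once.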