python - socket.gaierror when using aiohttp
Problem description
I am trying to fetch page titles from multiple domains, so I wrote this code:
import aiohttp
import asyncio
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
    'Accept-Encoding': ', '.join(('gzip', 'deflate', 'br')),
    'Accept': '*/*',
    'Connection': 'keep-alive'
}

async def fetch(url, session):
    async with session.get(f'http://{url}') as response:
        text = await response.text()
        return url, BeautifulSoup(text, 'lxml').title.string

async def main():
    async with asyncio.Semaphore(50):
        async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False), timeout=aiohttp.ClientTimeout(10),
                                         headers=headers) as session:
            titles = await asyncio.gather(*[fetch(domain, session) for domain in domains[0:500]],
                                          return_exceptions=True)
            for title in titles:
                print(title)

if __name__ == '__main__':
    domains = []
    with open('input', 'r') as f:
        for line in f:
            domains.append(line.rstrip())
    asyncio.run(main())
It works, but sometimes it throws errors like this:
Task exception was never retrieved
future: <Task finished name='Task-1635' coro=<TCPConnector._resolve_host() done, defined at /venv/lib/python3.8/site-packages/aiohttp/connector.py:774> exception=gaierror(8, 'nodename nor servname provided, or not known')>
Traceback (most recent call last):
  File "/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 829, in _resolve_host
    addrs = await \
  File "/venv/lib/python3.8/site-packages/aiohttp/resolver.py", line 29, in resolve
    infos = await self._loop.getaddrinfo(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 817, in getaddrinfo
    return await self.run_in_executor(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 914, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
Task exception was never retrieved
future: <Task finished name='Task-1617' coro=<TCPConnector._resolve_host() done, defined at /venv/lib/python3.8/site-packages/aiohttp/connector.py:774> exception=gaierror(8, 'nodename nor servname provided, or not known')>
Traceback (most recent call last):
  File "/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 829, in _resolve_host
    addrs = await \
  File "/venv/lib/python3.8/site-packages/aiohttp/resolver.py", line 29, in resolve
    infos = await self._loop.getaddrinfo(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 817, in getaddrinfo
    return await self.run_in_executor(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 914, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
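Sometimes it throws more of these errors, sometimes fewer. Can anyone explain why they are thrown? I tried wrapping the get call in a try: except: construct like this, but it still does not work.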
async def fetch(url, session):
    async with session.get(f'http://{url}') as response:
        try:
            text = await response.text()
            return url, BeautifulSoup(text, 'lxml').title.string
        except BaseException as e:
            return e
Solution
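No answer text survives on this page, so the following is only a hedged sketch of two things visible in the question's code, not a confirmed diagnosis. First, `async with asyncio.Semaphore(50)` wraps the whole `gather`, so the semaphore is acquired once and all 500 requests still start together; a burst of simultaneous DNS lookups is a plausible (assumed) reason for intermittent `gaierror`s. Second, the `try` in the second snippet only covers `response.text()`, while connection and DNS errors are raised when `session.get(...)` is entered. The helper names `fetch_guarded`, `fetch_all`, and `fake_fetch` below are hypothetical, and `fake_fetch` is an offline stand-in for the real aiohttp request.

```python
import asyncio
import socket


async def fetch_guarded(url, fetch, semaphore):
    """Run fetch(url) while holding the semaphore; return (url, result or exception)."""
    async with semaphore:  # acquired per request, so at most `limit` run at once
        try:
            # The try covers the whole request, including connect and DNS lookup.
            return url, await fetch(url)
        except (OSError, asyncio.TimeoutError) as exc:
            # socket.gaierror is a subclass of OSError.
            # Assumption: with aiohttp you may also want to catch aiohttp.ClientError.
            return url, exc


async def fetch_all(urls, fetch, limit=50):
    # Create the semaphore inside the running event loop.
    semaphore = asyncio.Semaphore(limit)
    return await asyncio.gather(*(fetch_guarded(u, fetch, semaphore) for u in urls))


async def fake_fetch(url):
    # Hypothetical stand-in for the aiohttp call so the sketch runs without a network.
    await asyncio.sleep(0)
    if url.endswith('.invalid'):
        raise socket.gaierror(8, 'nodename nor servname provided, or not known')
    return f'title of {url}'


if __name__ == '__main__':
    results = asyncio.run(fetch_all(['example.com', 'nope.invalid'], fake_fetch, limit=2))
    for url, outcome in results:
        print(url, repr(outcome))
```

In the original program, `fetch` would be a coroutine that performs the `session.get(...)` call and parses the title. Note that the logged "Task exception was never retrieved" messages come from aiohttp's internal `_resolve_host` task according to the traceback, so catching exceptions in your own coroutine may not silence them; lowering concurrency (here via the semaphore, or possibly via the `limit` parameter of `aiohttp.TCPConnector`) is the part more likely to reduce them.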