python - Max retries exceeded with URL (Caused by NewConnectionError)
Problem
I am trying to write code that scrapes archive.org and downloads specific files from it. When I run the program, I get this error:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\ROMS\Gamecube\main.py", line 16, in <module>
response = requests.get(DOMAIN + file_link)
File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='archive.org007%20-%20agent%20under%20fire%20%28usa%29.nkit.gcz', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x043979B8>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
Here is my code:
from bs4 import BeautifulSoup as bs
import requests

DOMAIN = 'https://archive.org'
URL = 'https://archive.org/download/GCRedumpNKitPart1'
FILETYPE = '%28USA%29.nkit.gcz'

def get_soup(url):
    return bs(requests.get(url).text, 'html.parser')

for link in get_soup(URL).find_all('a'):
    file_link = link.get('href')
    if FILETYPE in file_link:
        print(file_link)
        with open(link.text, 'wb') as file:
            response = requests.get(DOMAIN + file_link)
            file.write(response.content)
Solution
You simply forgot the / after https://archive.org, so you built an incorrect URL. Either add / at the end of the domain:

DOMAIN = 'https://archive.org/'

or add / when concatenating:

response = requests.get(DOMAIN + '/' + file_link)

or use urllib.parse.urljoin() to build the URL:
import urllib.parse
response = requests.get(urllib.parse.urljoin(DOMAIN, file_link))
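A quick sketch (using a hypothetical relative href, since the real ones on the page vary) shows why bare concatenation breaks while urljoin() does not: without a separator, the filename gets fused onto the hostname, which is exactly the bogus host= value in the traceback above.

```python
import urllib.parse

DOMAIN = 'https://archive.org'
# Hypothetical relative href with no leading slash, like the ones the page serves
file_link = '007%20-%20Agent%20Under%20Fire%20%28USA%29.nkit.gcz'

# Bare concatenation fuses the filename into the hostname:
print(DOMAIN + file_link)
# -> https://archive.org007%20-%20Agent%20Under%20Fire%20%28USA%29.nkit.gcz

# urljoin() inserts the missing / separator correctly:
print(urllib.parse.urljoin(DOMAIN, file_link))
# -> https://archive.org/007%20-%20Agent%20Under%20Fire%20%28USA%29.nkit.gcz
```

Note that urljoin() resolves the href relative to the base, so it also handles hrefs that already start with / without doubling the slash.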