
Problem description

I'm trying to write code that scrapes archive.org and downloads specific files. When I run the program, I get this error:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\ROMS\Gamecube\main.py", line 16, in <module>
    response = requests.get(DOMAIN + file_link)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='archive.org007%20-%20agent%20under%20fire%20%28usa%29.nkit.gcz', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x043979B8>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Here is my code:

from bs4 import BeautifulSoup as bs
import requests

DOMAIN = 'https://archive.org'
URL = 'https://archive.org/download/GCRedumpNKitPart1'
FILETYPE = '%28USA%29.nkit.gcz'

def get_soup(url):
    return bs(requests.get(url).text, 'html.parser')

for link in get_soup(URL).find_all('a'):
    file_link = link.get('href')
    if FILETYPE in file_link:
        print(file_link)
        with open(link.text, 'wb') as file:
            response = requests.get(DOMAIN + file_link)
            file.write(response.content)

Tags: python

Solution


You simply forgot the / between https://archive.org and the file link, so you built an incorrect URL: the filename was glued directly onto the hostname, which you can see in the traceback as host='archive.org007%20...'.

Add a / at the end of the domain:

DOMAIN = 'https://archive.org/'

or add the / when building the URL:

response = requests.get(DOMAIN + '/' + file_link)

Or use urllib.parse.urljoin() to build the URL:

import urllib.parse

response = requests.get(urllib.parse.urljoin(DOMAIN, file_link))
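To see the difference between plain concatenation and urljoin(), here is a minimal sketch (the relative href is a made-up example, not an actual link from the page):

```python
import urllib.parse

DOMAIN = 'https://archive.org'
# Hypothetical relative href, like the ones the download page returns:
file_link = '007%20-%20Agent%20Under%20Fire%20%28USA%29.nkit.gcz'

# Plain concatenation drops the separating slash, so the filename is
# glued onto the hostname -- exactly the bad host seen in the traceback:
print(DOMAIN + file_link)
# https://archive.org007%20-%20Agent%20Under%20Fire%20%28USA%29.nkit.gcz

# urljoin() inserts the missing separator for you:
print(urllib.parse.urljoin(DOMAIN, file_link))
# https://archive.org/007%20-%20Agent%20Under%20Fire%20%28USA%29.nkit.gcz
```

urljoin() is the safer choice because it produces a correct URL whether or not the domain already ends with a slash.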
