How can I download images from a list of URLs with Python when not all of the URLs are active?

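Problem description

I am trying to download tennis ball images from the ImageNet dataset using the URLs provided on their website, but my code always stops executing once it reaches a URL that no longer exists.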

import requests
path = "./imageslist.txt"
j = 0
file1 = open(path,'r')
for i in file1.readlines():
         imagename = "Image{0}.jpg".format(j)
         result = requests.get(i)
         if result.status_code == 200:
            print(i)
            image = result.raw.read()
            open(imagename,"wb").write(image)
         j = j+1

It shows this error:

ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.jpmorganchaseopen.com', port=80): Max retries exceeded with url: /images/tennisball.jpg%0A (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000186FDAD0A08>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

How do I handle this error?

Tags: python, python-3.x, python-requests

Solution

Try this:

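import requests

path = "./imageslist.txt"
with open(path, 'r') as file1:
    # Strip the trailing '\n' from every line so it is not sent as part of the URL
    all_links = [i.strip() for i in file1.readlines()]

for j, link in enumerate(all_links):
    # Build the file name per URL; a single name would overwrite the same file
    imagename = f"Image{j}.jpg"
    try:
        # The request itself raises ConnectionError for dead hosts,
        # so it has to sit inside the try block
        result = requests.get(link, timeout=10)
    except requests.exceptions.RequestException:
        continue  # skip URLs that cannot be reached
    if result.status_code == 200:
        # result.content holds the downloaded bytes; result.raw is meant
        # for requests made with stream=True
        with open(imagename, "wb") as f:
            f.write(result.content)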

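You need to strip the trailing '\n' from each line returned by file1.readlines(); the %0A at the end of the URL in your error message is that newline, URL-encoded. Also, use f-strings (faster) rather than .format for formatting.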
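The same idea can be sketched as two small functions, which makes the "skip dead URLs" behaviour easier to check without touching the network. This is only an illustration under stated assumptions: the names fetch_image and download_all, the fetch and out_dir parameters, and the 10 second timeout are hypothetical and do not come from the original answer. Unreachable hosts, malformed URLs, and non-2xx responses are all treated the same way here: the URL is skipped and the loop moves on.

```python
import os


def fetch_image(url, timeout=10):
    """Return the response body as bytes, or None if the URL cannot be fetched."""
    # Imported here so the rest of the sketch can be exercised with a fake fetcher.
    import requests

    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.exceptions.RequestException:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors.
        return None
    return response.content


def download_all(lines, fetch=fetch_image, out_dir="."):
    """Save every reachable URL in `lines` as Image<index>.jpg; return saved paths."""
    saved = []
    for index, line in enumerate(lines):
        url = line.strip()  # drop the trailing newline left by readlines()
        if not url:
            continue  # ignore blank lines in the list file
        data = fetch(url)
        if data is None:
            continue  # dead or failing URL: skip it and keep going
        target = os.path.join(out_dir, f"Image{index}.jpg")
        with open(target, "wb") as handle:
            handle.write(data)
        saved.append(target)
    return saved
```

With the list file from the question, a call would look like download_all(open("./imageslist.txt").readlines()). Passing the fetcher as a parameter is a design choice made for this sketch: it lets a stand-in function replace the real HTTP call when trying the loop out.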

