Python: identifying invalid online links to zip files

Problem description

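I am trying to automate the extraction of stock price data from https://www.nseindia.com/. The data is stored as zip files, and the URL of each zip file varies by date. If the stock market is closed on a particular date, such as a weekend or a holiday, there is no file/URL for that date.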

I want to identify invalid links (links that do not exist) and skip to the next link.

This is a valid link -
path = 'https://archives.nseindia.com/content/historical/EQUITIES/2021/MAY/cm05MAY2021bhav.csv.zip'

This is an invalid link - (1 May fell on a weekend, so the stock market was closed that day)
path2 = 'https://archives.nseindia.com/content/historical/EQUITIES/2021/MAY/cm01MAY2021bhav.csv.zip'

This is how I extract the data

from urllib.request import urlopen
from io import BytesIO
from zipfile import ZipFile
import pandas as pd
import datetime

start_date = datetime.date(2021, 5, 3)
end_date = datetime.date(2021, 5, 7)
delta = datetime.timedelta(days=1)
final = pd.DataFrame()

while start_date <= end_date:
    print(start_date)
    day = start_date.strftime('%d')
    month = start_date.strftime('%b').upper()
    year = start_date.strftime('%Y')
    start_date += delta
    path = 'https://archives.nseindia.com/content/historical/EQUITIES/'  + year + '/' + month + '/cm' + day + month + year + 'bhav.csv.zip'
    file = 'cm' + day + month + year + 'bhav.csv' 
    try:
        with urlopen(path) as f: 
            with BytesIO(f.read()) as b, ZipFile(b) as myzipfile:
                foofile = myzipfile.open(file)
                df = pd.read_csv(foofile)
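                # concat returns a new frame, so assign it back (DataFrame.append was removed in pandas 2.0)
                final = pd.concat([final, df], ignore_index=True)
    except:
        print(file + ' not there')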

If the path is invalid, Python gets stuck and I have to restart it. I cannot handle the error or identify invalid links while looping over multiple dates.

So far I have tried the following to tell valid links from invalid ones -

# Attempt 1
import os
os.path.exists(path)
os.path.isfile(path)
os.path.isdir(path)
os.path.islink(path)

# output is False for both path and path2

# Attempt 2
import validators
validators.url(path)

# output is True for both path and path2

# Attempt 3
import requests
site_ping = requests.get(path)
site_ping.status_code < 400

# Output for path is True, but Python crashes/gets stuck when I run requests.get(path2) and I have to restart every time.

Thanks in advance for your help.

Tags: python-3.x, exception, data-extraction

Solution


As SuperStormer suggested, adding a timeout to the request solved the problem.

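    # Inside the while loop from the question: give up after 5 seconds instead of hanging.
    # `path` is the zip URL built in the loop.
    try:
        with urlopen(path, timeout=5) as f:
            with BytesIO(f.read()) as b, ZipFile(b) as myzipfile:
                foofile = myzipfile.open(file)
                df = pd.read_csv(foofile)
                # concat returns a new frame, so assign it back
                final = pd.concat([final, df], ignore_index=True)
    except Exception:
        print(file + ' not there')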
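
For anyone who wants the timeout idea in a reusable form, here is a minimal sketch that splits the loop into small functions. It is one possible arrangement, not the only way to do it. The names bhavcopy_names, fetch_csv_bytes, fetch_range and the opener parameter are made up for illustration; the URL pattern is the one from the question. It assumes an English locale for strftime('%b'), and it assumes that catching OSError is broad enough, since urllib's URLError/HTTPError and socket timeouts are all OSError subclasses.

```python
import datetime
from io import BytesIO
from urllib.request import urlopen
from zipfile import BadZipFile, ZipFile

BASE_URL = 'https://archives.nseindia.com/content/historical/EQUITIES/'


def bhavcopy_names(day):
    """Return (url, csv_name) for one date, following the question's URL pattern."""
    month = day.strftime('%b').upper()                 # e.g. MAY (assumes English locale)
    csv_name = 'cm' + day.strftime('%d') + month + day.strftime('%Y') + 'bhav.csv'
    url = BASE_URL + day.strftime('%Y') + '/' + month + '/' + csv_name + '.zip'
    return url, csv_name


def fetch_csv_bytes(day, timeout=5, opener=urlopen):
    """Return the raw CSV bytes for one date, or None if the link is missing or unresponsive.

    `opener` is a hypothetical hook so the function can be exercised without network access.
    """
    url, csv_name = bhavcopy_names(day)
    try:
        # The timeout is what stops an unresponsive link from blocking forever.
        with opener(url, timeout=timeout) as response:
            payload = response.read()
        with ZipFile(BytesIO(payload)) as archive:
            return archive.read(csv_name)
    except (OSError, BadZipFile, KeyError) as exc:
        # OSError: HTTP errors and timeouts; BadZipFile: not a zip; KeyError: CSV not in archive.
        print(csv_name + ' not there (' + type(exc).__name__ + ')')
        return None


def fetch_range(start, end, **kwargs):
    """Return a list of (date, csv_bytes) for every date in [start, end] that could be fetched."""
    results = []
    day = start
    while day <= end:
        data = fetch_csv_bytes(day, **kwargs)
        if data is not None:
            results.append((day, data))
        day += datetime.timedelta(days=1)
    return results
```

Each entry returned by fetch_range can then be read with pd.read_csv(BytesIO(data)), and the resulting frames combined once at the end with pd.concat, which avoids growing a DataFrame inside the loop.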
