How to load a zipped machine-learning dataset from a URL?

Problem description

I'm trying to load the zipped, tab-separated "MHEALTHDATASET" from a URL: https://archive.ics.uci.edu/ml/machine-learning-databases/00319/

Code:

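import re
from shutil import unpack_archive
from tempfile import NamedTemporaryFile
from urllib.request import urlopen

import numpy as np

zipurl = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00319/MHEALTHDATASET.zip'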
with urlopen(zipurl) as zipresp, NamedTemporaryFile() as tfile:
    tfile.write(zipresp.read())
    tfile.seek(0)
    unpack_archive(tfile.name, '/tmp/MHEALTHDATASET.zip', format='zip')
    dataset = np.loadtxt(urlopen(zipurl), dtype=str, delimiter="/t")
    for file in dataset:
        file = re.sub("mHealth_", "", file)

Error:

Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\algorithms\elbow.py", line 17, in <module>
    unpack_archive(tfile.name, '/tmp/MHEALTHDATASET.zip', format='zip')
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\shutil.py", line 1247, in unpack_archive
    func(filename, extract_dir, **dict(format_info[2]))
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\shutil.py", line 1151, in _unpack_zipfile
    raise ReadError("%s is not a zip file" % filename)
shutil.ReadError: C:\Users\User\AppData\Local\Temp\tmp_x_c1ejk is not a zip file
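The traceback most likely has a Windows-specific cause rather than a corrupt download. The Python `tempfile` docs note that a `NamedTemporaryFile` cannot be opened a second time by name on Windows while it is still open. `shutil.unpack_archive` calls `zipfile.is_zipfile`, which catches the resulting `OSError` and simply returns `False`, so the failure surfaces as "is not a zip file". A minimal sketch that keeps the temporary-file approach: create the file with `delete=False`, close it before unpacking, and delete it yourself afterwards. `download_and_unpack` is a hypothetical helper name, and the extraction directory is your choice (the question's `'/tmp/MHEALTHDATASET.zip'` would create a directory whose name ends in `.zip`).

```python
import os
import shutil
import tempfile
from urllib.request import urlopen


def download_and_unpack(url: str, extract_dir: str) -> None:
    """Download a zip archive to a temporary file, close it, then unpack it."""
    # delete=False keeps the file on disk after close(), so shutil can reopen
    # it by name (Windows refuses to reopen a NamedTemporaryFile still open).
    tfile = tempfile.NamedTemporaryFile(suffix=".zip", delete=False)
    try:
        with tfile, urlopen(url) as resp:
            # Stream the response to disk instead of read()-ing it all at once.
            shutil.copyfileobj(resp, tfile)
        # The temporary file is closed at this point, so unpacking works on any OS.
        shutil.unpack_archive(tfile.name, extract_dir, format="zip")
    finally:
        # We disabled automatic deletion, so clean up explicitly.
        os.remove(tfile.name)


# Usage (downloads over the network):
# download_and_unpack(
#     "https://archive.ics.uci.edu/ml/machine-learning-databases/00319/MHEALTHDATASET.zip",
#     "MHEALTHDATASET",
# )
```

After this runs, the extracted `.log` files can be read from `extract_dir` with `np.loadtxt(path, delimiter="\t")`.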

Tags: python, dataset

Solution
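One way to avoid the temporary file entirely is to read the archive into memory with `io.BytesIO`, open it with `zipfile.ZipFile`, and parse each member with `np.loadtxt(..., delimiter="\t")`. Note that `"\t"` (backslash) is the tab character, while the question's `"/t"` is a literal slash followed by `t`. The question's `np.loadtxt(urlopen(zipurl), ...)` also parses the raw zip bytes rather than the extracted files, and `file = re.sub(...)` inside the loop only rebinds a local variable, so the renamed value is discarded. This is a minimal sketch, not a definitive loader. It assumes, based on the question's `re.sub("mHealth_", ...)` call, that the data files are named like `mHealth_subject1.log` and contain only numeric, tab-separated columns. `load_mhealth` and `fetch` are hypothetical helper names.

```python
import io
import re
import zipfile
from urllib.request import urlopen

import numpy as np

ZIP_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00319/MHEALTHDATASET.zip"


def fetch(url: str = ZIP_URL) -> bytes:
    """Download the whole archive into memory."""
    with urlopen(url) as resp:
        return resp.read()


def load_mhealth(zip_bytes: bytes) -> dict:
    """Parse every tab-separated .log member of the archive into a NumPy array.

    Keys are the bare file names with the "mHealth_" prefix and ".log"
    suffix stripped, e.g. "subject1".
    """
    arrays = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            # Assumption: data files end in ".log"; skip READMEs and directories.
            if not name.endswith(".log"):
                continue
            # zf.open() yields bytes; wrap it so loadtxt receives text lines.
            with zf.open(name) as raw, io.TextIOWrapper(raw, encoding="utf-8") as fh:
                data = np.loadtxt(fh, delimiter="\t")
            base = name.rsplit("/", 1)[-1]  # drop the folder part of the path
            key = re.sub(r"^mHealth_|\.log$", "", base)
            arrays[key] = data
    return arrays


# Usage (downloads the archive over the network):
# datasets = load_mhealth(fetch())
# print(datasets["subject1"].shape)
```

Keeping the archive in memory trades RAM for never touching the filesystem, which also sidesteps the Windows temporary-file behavior entirely. The returned dict is keyed by subject, so the renaming the question attempted with `re.sub` is applied to the keys instead of being discarded.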

