How do I open nested zip archives without extracting them and append their contents to a DataFrame?

Problem description

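I am trying to open a large number of csv files that sit inside several layers of zip files. Given the nature of this project, I want to open each one in place, read it into a DataFrame with read_csv, append that data to an aggregate DataFrame, and then continue the loop.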

Example: folder directory/First Zip/Second Zip/Third Zip/csv file.csv

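My existing code can iterate through the contents of the second- and third-level zip files and get the name of each csv file. I know that importing glob might make this code simpler, but I am not familiar with it.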

import io
import zipfile

import pandas as pd

directory = 'C:/Test/'
fname = 'test.zip'

frames = []  # one DataFrame per csv file, combined once after the loops
with zipfile.ZipFile(directory + fname, 'r') as zfile:
    # second level of zip files
    for zipname in zfile.namelist():
        if not zipname.lower().endswith('.zip'):
            continue
        # read the nested archive into memory instead of extracting it
        zfiledata = io.BytesIO(zfile.read(zipname))
        with zipfile.ZipFile(zfiledata) as zfile2:
            # third level of zip files
            for zipname2 in zfile2.namelist():
                # this zipfile contains xml and csv contents. This filters out the xmls
                if not zipname2.endswith('_csv.zip'):
                    continue
                zfiledata2 = io.BytesIO(zfile2.read(zipname2))
                with zipfile.ZipFile(zfiledata2) as zfile3:
                    # csv file names are always the same as their zips. this cleans the string.
                    csvf = zipname2.replace('_csv.zip', '.csv')
                    # open(fullpath, 'rb') raised FileNotFoundError here: the path
                    # points inside the archive, not at a file on disk, so the
                    # member is opened from the in-memory archive instead
                    with zfile3.open(csvf) as csvfile:
                        frames.append(pd.read_csv(csvfile))

# build the aggregate DataFrame once all csv files have been read
data = pd.concat(frames, ignore_index=True)
print(data.head())

Tags: python-3.x, pandas, dataframe, zipfile

Solution
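
The FileNotFoundError in the question comes from calling open() on a path that only exists inside the archive. A path such as C:/Test/test.zip/inner.zip/... is not a file on disk, so each nested archive has to be read into memory and handed to zipfile.ZipFile as a file-like object. One possible way to avoid hard-coding the number of levels is a small recursive generator. The sketch below is only an illustration under assumptions: the name iter_nested_members and its suffix and prefix parameters are hypothetical, every member ending in .zip is assumed to be a valid archive, and each nested archive is assumed to fit in memory.

```python
import io
import zipfile


def iter_nested_members(archive, suffix=".csv", prefix=""):
    """Yield (inner_path, raw_bytes) for each member whose name ends in
    `suffix`, descending into nested zip archives without extracting them.

    `archive` may be a path on disk or a binary file-like object.
    (Hypothetical helper written for this answer, not a library function.)
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            inner_path = prefix + name
            lowered = name.lower()
            if lowered.endswith(".zip"):
                # Load the nested archive into memory and recurse into it.
                nested = io.BytesIO(zf.read(name))
                yield from iter_nested_members(nested, suffix, inner_path + "/")
            elif lowered.endswith(suffix):
                # A matching leaf file: hand back its path and contents.
                yield inner_path, zf.read(name)
```

Each yielded payload can be wrapped in io.BytesIO and passed to pd.read_csv. Collecting the resulting frames in a list and calling pd.concat(frames, ignore_index=True) once at the end gives the aggregate DataFrame, and it avoids the cost of growing a DataFrame inside the loop. The yielded inner_path can be kept as an extra column if the origin of each row matters.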

