Read CSV files in multiple zip files by using one CSV as an example and loop

Problem description

I have multiple zip files in a folder, and each zip file contains multiple CSV files. Not all of the CSV files have every column, but a few do. How can I use a file that has all the columns as a template, then loop over the rest to extract all the data into one DataFrame and save it as a single CSV for further use?

The code I am using right now is below:

import glob
import zipfile
import pandas as pd


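dfs = []
for zip_file in glob.glob(r"C:\Users\harsh\Desktop\Temp\*.zip"):
    # Close each archive once its members have been read
    with zipfile.ZipFile(zip_file) as zf:
        # Read every member as a semicolon-separated, Latin-1 encoded CSV
        dfs += [pd.read_csv(zf.open(f), sep=";", encoding='latin1') for f in zf.namelist()]

# Concatenate once, after the loop, instead of on every iteration
df = pd.concat(dfs, ignore_index=True)

print(df)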

However, I am not getting the columns and headers at all. I am stuck at this stage.

If you'd like to know the file structure, please find the output of the code here and the example CSV file here.

If you would like to see my project files for this code, please find the shared Google Drive link here.

Also, at the risk of sounding redundant, why am I required to use the sep=";", encoding='latin1' part? The code gives me an error without it.

Tags: python, pandas, csv

Solution
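
One possible approach to the question above, offered as a minimal sketch rather than a confirmed fix: read only the header of the complete CSV to get the full column list, then reindex every CSV found in the zip files to that list before concatenating. The function names `reference_columns_from` and `read_zipped_csvs` are made up for illustration. The sketch assumes that every file uses the same separator and encoding as in the question, and that the complete file is available outside the archives.

```python
import glob
import zipfile

import pandas as pd


def reference_columns_from(csv_source, sep=";", encoding="latin1"):
    """Return the column names of the 'complete' CSV (hypothetical helper)."""
    # nrows=0 parses only the header row, so no data is loaded
    header = pd.read_csv(csv_source, sep=sep, encoding=encoding, nrows=0)
    return list(header.columns)


def read_zipped_csvs(zip_pattern, reference_columns, sep=";", encoding="latin1"):
    """Read every CSV inside every matching zip, aligned to reference_columns."""
    frames = []
    for zip_path in sorted(glob.glob(zip_pattern)):
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                if not name.lower().endswith(".csv"):
                    continue  # skip directories and non-CSV members
                with zf.open(name) as fh:
                    df = pd.read_csv(fh, sep=sep, encoding=encoding)
                # reindex fills missing columns with NaN and drops
                # any column that is not in the reference list
                frames.append(df.reindex(columns=reference_columns))
    if not frames:
        # No CSV found: return an empty frame with the expected columns
        return pd.DataFrame(columns=reference_columns)
    return pd.concat(frames, ignore_index=True)
```

To use it, pass the path of the complete CSV to `reference_columns_from`, pass the resulting list together with the `*.zip` glob pattern to `read_zipped_csvs`, and write the result out with `to_csv("combined.csv", index=False)`, where `combined.csv` is only a placeholder name. Note that `reindex` silently drops columns that are absent from the reference file, so this only fits if that file really does contain every column.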

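On the `sep=";"` and `encoding='latin1'` question: `pd.read_csv` defaults to a comma separator and UTF-8 decoding. If the files are semicolon-delimited and contain bytes that are not valid UTF-8 (an assumption, since the files themselves are not shown here), the defaults either collapse each row into a single column or fail while decoding. The sketch below reproduces both effects on small in-memory samples that were invented for illustration.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited sample, plain ASCII
ascii_raw = b"name;value\nabc;1\n"

# With the default sep=",", the whole header line becomes one column name
one_col = pd.read_csv(io.BytesIO(ascii_raw))

# Hypothetical sample with an accented character stored as Latin-1 (0xE9)
latin1_raw = b"name;value\ncaf\xe9;1\n"

# With the default encoding="utf-8", the 0xE9 byte cannot be decoded
try:
    pd.read_csv(io.BytesIO(latin1_raw), sep=";")
    default_error = None
except (UnicodeDecodeError, pd.errors.ParserError) as exc:
    default_error = type(exc).__name__

# Passing both options parses the sample into two proper columns
df = pd.read_csv(io.BytesIO(latin1_raw), sep=";", encoding="latin1")
```

Latin-1 maps every possible byte to a character, so it never raises a decoding error. That also means it can silently produce wrong characters if the files were actually written in a different encoding, so it is worth checking a few accented values after loading.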
