python - Read csv files in multiple zip files by using one csv as an example and loop
Problem description
I have multiple zip files in a folder, and each zip file contains multiple CSV files. Not every CSV file has all the columns, but a few do. How can I use a file that has all the columns as an example, then loop over the rest to extract all the data into one dataframe and save it as a single CSV for further use?
The code I am using right now is below:
import glob
import zipfile
import pandas as pd
dfs = []
for zip_file in glob.glob(r"C:\Users\harsh\Desktop\Temp\*.zip"):
    zf = zipfile.ZipFile(zip_file)
    dfs += [pd.read_csv(zf.open(f), sep=";", encoding='latin1') for f in zf.namelist()]
df = pd.concat(dfs,ignore_index=True)
print(df)
However, I am not getting the columns and headers at all. I am stuck at this stage.
If you'd like to know the file structure, please find the output of the code here. If you would like to see my project files for this code, please find the shared Google Drive link here.
Also, at the risk of sounding redundant, why am I required to use the sep=";", encoding='latin1' part? The code gives me an error without it.
Solution
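One possible approach, offered as a sketch rather than a confirmed fix because the actual files are not visible here: read every CSV member of every zip, treat the frame with the most columns as the "example" layout, and align all other frames to it before concatenating. The function name `read_zipped_csvs` and the variable `reference_columns` are made up for this illustration. The sketch assumes the header is on the first line of each CSV; if the files carry extra lines above the header, `read_csv` would also need a suitable `skiprows` value, which cannot be determined from the question.

On the `sep=";", encoding='latin1'` part: `pd.read_csv` defaults to a comma separator and UTF-8 decoding. An error without those arguments suggests the files are semicolon-delimited and contain bytes that are not valid UTF-8; `latin1` maps every possible byte to a character, so decoding never fails with it.

```python
import glob
import zipfile

import pandas as pd


def read_zipped_csvs(zip_pattern, sep=";", encoding="latin1"):
    """Read every CSV inside every zip matching zip_pattern into one DataFrame.

    Hypothetical helper for illustration; sep and encoding default to the
    values used in the question.
    """
    frames = []
    for zip_path in sorted(glob.glob(zip_pattern)):
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                # Skip folders and non-CSV members inside the archive.
                if not name.lower().endswith(".csv"):
                    continue
                with zf.open(name) as fh:
                    frames.append(pd.read_csv(fh, sep=sep, encoding=encoding))

    if not frames:
        return pd.DataFrame()

    # Assumption: the frame with the most columns is the "example" file.
    reference_columns = max(frames, key=lambda f: len(f.columns)).columns

    # reindex adds missing columns as NaN and drops columns that are
    # not part of the reference layout.
    aligned = [f.reindex(columns=reference_columns) for f in frames]
    return pd.concat(aligned, ignore_index=True)
```

Calling `read_zipped_csvs(r"C:\Users\harsh\Desktop\Temp\*.zip")` and then `df.to_csv("combined.csv", index=False)` on the result would write the combined data to one file; the output name `combined.csv` is only a placeholder. Note that `pd.concat` on its own already takes the union of all columns, so the explicit `reindex` mainly matters when the column order and set should follow the example file exactly.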