Generating Pandas DataFrames from a List of CSV Files

Problem description

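Framing the problem: I search a directory for all csv files and save each file's path, together with a description, in a DataFrame. I now want to iterate over that DataFrame and read specific csv files into DataFrames whose names are generated from the original file names. I can't figure out how to generate these DataFrames dynamically. I started coding a few days ago, so apologies if the syntax is poor.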

import os
from glob import glob

import pandas as pd

# Looks in a given directory and all of its subdirectories for the extension ".csv"
# Reads the path of every csv file and creates a list
PATH = r"Z:\Adam"  # raw string, so the backslash is not read as an escape sequence
EXT = "*.csv"
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
# The list of csv file paths is read into a DataFrame
# The DataFrame is then split into columns on the "\" separators found in each path

df_csv_path = pd.DataFrame(all_csv_files, columns=['Path'])
df_split_path = df_csv_path['Path'].str.split('\\', n=-1, expand=True)
df_split_path = df_split_path.rename(columns={0: 'Drive', 1: 'Main', 2: 'Project', 3: 'Imaging Folder',
                                              4: 'Experimental Group', 5: 'Experimental Rep', 6: 'File Name'})
df_csv_info = df_split_path.join(df_csv_path['Path'])

# Generates a DataFrame for each of the csv files found in the directory
# Each DataFrame should get a name based on the csv filename
for index in df_csv_info.index:
    filename = df_csv_info['File Name'].values[index]
    filepath = str(df_csv_info['Path'].values[index])
    # Problem: this only rebinds the variable `filename` to the DataFrame;
    # it does not create a variable named after the file
    filename = pd.read_csv(filepath)

Tags: python, pandas, dataframe, csv

Solution


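The best approach is to create a dictionary whose keys are the file names and whose values are the corresponding DataFrames. Rather than using os.path and glob, the modern approach is to use pathlib from the standard library.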
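To make the pathlib suggestion concrete, here is a small sketch comparing it with the os.walk and glob combination from the question. The temporary folder and the file names in it (GroupA, Rep1, cells.csv, and so on) are made up purely so the snippet runs anywhere; they are not from the original post.

```python
import os
import tempfile
from glob import glob
from pathlib import Path

# Build a small throwaway directory tree (hypothetical names) for the demo.
root = Path(tempfile.mkdtemp())
(root / "GroupA" / "Rep1").mkdir(parents=True)
(root / "GroupA" / "Rep1" / "cells.csv").write_text("x,y\n1,2\n")
(root / "summary.csv").write_text("x,y\n3,4\n")
(root / "notes.txt").write_text("not a csv\n")

# os.walk + glob, as in the question
via_os = sorted(
    file
    for path, subdirs, files in os.walk(root)
    for file in glob(os.path.join(path, "*.csv"))
)

# pathlib: rglob searches the folder and all of its subfolders
via_pathlib = sorted(str(p) for p in root.rglob("*.csv"))

# Both approaches find the same two csv files here
same_files = via_os == via_pathlib

# Path objects also expose the pieces the question extracts by splitting on "\\"
example = root / "GroupA" / "Rep1" / "cells.csv"
name = example.name                          # file name with extension
stem = example.stem                          # file name without extension
rel_parts = example.relative_to(root).parts  # folder levels below root
```

Because `parts` is computed by pathlib, it should not depend on hard-coding the separator the way `str.split('\\')` does.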

Assuming you don't actually need the DataFrame containing the file names, but only one DataFrame per csv file, you can simply do

from pathlib import Path

import pandas as pd

PATH = Path(r"Z:\Adam")  # raw string, so the backslash is not read as an escape sequence
EXT = "*.csv"

# dictionary holding all the files' DataFrames in the format {"filename": file_DataFrame}
files_dfs = {}

# recursive search for csv files in the PATH folder and its subfolders
for csv_file in PATH.rglob(EXT):
    filename = csv_file.name     # get the filename
    df = pd.read_csv(csv_file)   # read the csv file as a DataFrame
    files_dfs[filename] = df     # add the DataFrame to the dictionary

Then, to access the DataFrame of a specific file, you can do

filename_df = files_dfs["<filename>"]
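One thing to watch with file-name keys: if two subfolders contain a file with the same name, the later one overwrites the earlier one in the dictionary. A possible variation, sketched below, keys the dictionary by the path relative to the search root and optionally filters which files are read. The helper `load_csvs`, its `keep` parameter, and the Control/Treated demo folders are hypothetical names introduced for illustration, not part of the original answer.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csvs(root, pattern="*.csv", keep=None):
    """Return {relative path: DataFrame} for the csv files found under root.

    keep is an optional predicate taking a Path; files for which it
    returns False are skipped. (Hypothetical helper, for illustration.)
    """
    root = Path(root)
    frames = {}
    for csv_file in sorted(root.rglob(pattern)):
        if keep is not None and not keep(csv_file):
            continue
        # Relative path with forward slashes, so keys stay unique per folder
        key = csv_file.relative_to(root).as_posix()
        frames[key] = pd.read_csv(csv_file)
    return frames


# Throwaway demo data: the same file name in two different folders
root = Path(tempfile.mkdtemp())
for group in ("Control", "Treated"):
    (root / group).mkdir()
    (root / group / "results.csv").write_text("cell,area\n1,10\n2,20\n")

all_dfs = load_csvs(root)
control_dfs = load_csvs(root, keep=lambda p: p.parent.name == "Control")
```

A single DataFrame is then looked up by its relative path, for example `all_dfs["Control/results.csv"]`.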
