How do I load many pickle files into a dataframe?

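Problem description

I have many files created with `pickle`. I want to load them into a dataframe, compute the mean of each one (from the second row to the end), multiply it by 1000, and round it to 2 decimal places.

So far I have managed this with 1 pickle file.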

import pandas as pd

df = pd.read_pickle(r'C:\Users\file_inference_time')
df = pd.DataFrame(df)
df.rename(columns={0:'MobileNet'},inplace=True)

# note: iloc[2:] starts at index position 2, so the first two rows are skipped
df_mean = (df.iloc[2:, :].mean() * 1000).round(decimals=2)
df_mean2=pd.DataFrame(df_mean)
df_mean2

This is the result I get from 1 file:

[image: result for one file]

These are the pickle files I need to read:

[image: list of pickle files]

Edit: this is the error I get when running the second option

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-b72e45d8bcfc> in <module>
     16 
     17 
---> 18 df_mean_all = pd.concat(df_mean_list).reset_index(drop=True)
     19 
     20 print(df_mean_all)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    253         verify_integrity=verify_integrity,
    254         copy=copy,
--> 255         sort=sort,
    256     )
    257 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    302 
    303         if len(objs) == 0:
--> 304             raise ValueError("No objects to concatenate")
    305 
    306         if keys is None:

ValueError: No objects to concatenate

This is a plot of the results:

[image: plot of the results]

Tags: python-3.x, pandas, pickle

Solution


Get a dict of dataframes

  • Save the computed mean for each file to a dict
from pathlib import Path

import pandas as pd

dir_path = Path(r'C:\Users\path_to_files')
# match every file whose name starts with file_inference_time,
# in the main directory and all subdirectories
files = dir_path.glob('**/file_inference_time*')

df_mean_dict = dict()

for i, file in enumerate(files):
    df = pd.DataFrame(pd.read_pickle(file))
    df.rename(columns={0: 'MobileNet'}, inplace=True)

    # skip the first two rows, then take the mean, multiply by 1000, round to 2 decimals
    df_mean_dict[i] = pd.DataFrame((df.iloc[2:, :].mean() * 1000).round(decimals=2))

    # if all the file names are unique, the dict key can be the file name (w/o the file extension)
    # df_mean_dict[file.stem] = pd.DataFrame((df.iloc[2:, :].mean() * 1000).round(decimals=2))
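
If the dict of dataframes later needs to become one table, `pd.concat` also accepts a mapping and uses its keys as the outer index level. The sketch below assumes the dict is keyed by file stem; the sample values, the index level names `file` and `model`, and the column name `mean_x1000` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for df_mean_dict: keys are file stems,
# values are one-row dataframes like the ones built in the loop above
df_mean_dict = {
    "file1": pd.DataFrame({0: [3.24]}, index=["MobileNet"]),
    "file2": pd.DataFrame({0: [2.34]}, index=["MobileNet"]),
}

# pd.concat accepts a mapping; the dict keys become the outer index level
combined = pd.concat(df_mean_dict, names=["file", "model"])
combined.columns = ["mean_x1000"]  # illustrative column name

print(combined)
```

A single value can then be looked up with `combined.loc[("file1", "MobileNet"), "mean_x1000"]`.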

Get a single dataframe - this is what I would do

  • The result, df_mean_all, will be a 2-column dataframe.
    • Column 0 will be MobileNet
    • Column 1 will be file
from pathlib import Path

import pandas as pd

dir_path = Path(r'C:\Users\path_to_files')
# match every file whose name starts with file_inference_time,
# in the main directory and all subdirectories
files = list(dir_path.glob('**/file_inference_time*'))

# check that the files are found: if an empty list prints, no files matched,
# and pd.concat below will raise "ValueError: No objects to concatenate"
print(files[:5])

df_mean_list = list()

for file in files:
    df = pd.DataFrame(pd.read_pickle(file))

    # skip the first two rows, then take the mean, multiply by 1000, round to 2 decimals
    df_mean = (
        pd.DataFrame((df.iloc[2:, :].mean() * 1000).round(decimals=2))
        .reset_index(drop=True)
        .rename(columns={0: 'MobileNet'})
    )
    df_mean['file'] = file  # or file.stem for just the file name

    df_mean_list.append(df_mean)

# df_mean_list is a list of dataframes, pd.concat combines them all into one dataframe
df_mean_all = pd.concat(df_mean_list).reset_index(drop=True)

print(df_mean_all)

   MobileNet                                    file
0       3.24  C:\Users\file_inference_time\file1.pkl
1       2.34  C:\Users\file_inference_time\file2.pkl
2       4.23  C:\Users\file_inference_time\file3.pkl
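
The "No objects to concatenate" error in the question is what `pd.concat` raises when it is handed an empty list, which here most likely means the glob pattern matched no files. A minimal sketch of the same approach with an explicit guard follows. The function name `mean_per_file`, the temporary directory, and the sample timings are hypothetical and exist only so the example can run on its own; it also assumes each pickle holds a flat list of numbers.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd


def mean_per_file(dir_path, pattern="**/file_inference_time*"):
    """Return one row per matching pickle file: scaled mean and file stem."""
    files = sorted(p for p in Path(dir_path).glob(pattern) if p.is_file())
    if not files:
        # pd.concat would fail later with "No objects to concatenate",
        # so stop here with a message that names the likely cause
        raise FileNotFoundError(f"no files match {pattern!r} in {dir_path}")

    rows = []
    for file in files:
        df = pd.DataFrame(pd.read_pickle(file))
        # iloc[2:] skips the first two rows, as in the code above
        mean = round(float(df.iloc[2:, 0].mean()) * 1000, 2)
        rows.append({"MobileNet": mean, "file": file.stem})
    return pd.DataFrame(rows)


# Demo with made-up data written to a temporary directory
sample = {"a": [0.5, 0.4, 0.002, 0.004], "b": [0.9, 0.8, 0.001, 0.002]}
with tempfile.TemporaryDirectory() as tmp:
    for name, timings in sample.items():
        with open(Path(tmp) / f"file_inference_time_{name}.pkl", "wb") as fh:
            pickle.dump(timings, fh)
    result = mean_per_file(tmp)

print(result)
```

Building the rows as plain dicts and creating the dataframe once avoids the `pd.concat` call entirely; whether that is preferable to the list-of-dataframes version above is a matter of taste.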
