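python - Opening multiple pickle files from a folder in Jupyter notebook doesn't work
Problem description
I'm using Jupyter Notebook on a server (the folder is not on my computer). I have a folder with 30 pickled dataframes that all have exactly the same columns. They are all saved under the following path: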
Reut/folder_no_one/here_the_files_located
I want to open them all and concatenate them. I know I can do something like this:
df1=pd.read_pickle('table1')
df2=pd.read_pickle('table2')
df3=pd.read_pickle('table3')
...
#and then concat
But I'm sure there is a better, smarter way to do this. I tried to open all the files and store them separately, like this:
num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'
{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}
But I get this error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 {f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}
TypeError: 'str' object is not callable
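I tried playing around with different versions of the path, and also giving no path at all (since my notebook is in the same location as these files), but I keep getting the same error.
*It's worth mentioning that when the notebook is in that folder too, I can open these files without specifying a path.
My end goal is to open all these tables automatically and concatenate them into one big table.
Edit: I also tried this: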
import glob
import pandas as pd

path = r'file_name/file_location_with_all_pickles'
all_files = glob.glob(path + "/*.pkl")
li = []
for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
and
import pathlib
import pandas as pd

path_to_files = r'file_name/file_location_with_all_pickles'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pkl"):
    print(table)
    tables.append(pd.read_pickle(table))
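But in both cases I get the error
ValueError: No objects to concatenate
when I try to concat. Also, when I tell it to print the filename/table, it prints nothing. Moreover, if I print just a plain string inside the loop (like print('hello')), nothing happens either, so the loop body never runs. It seems like there is a problem with the path, but when I open one specific pickle like this: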
pd.read_pickle(r'file_name/file_location_with_all_pickles/specific_table.pkl')
it opens.
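A loop that never prints anything means the glob matched no files, usually because the relative path is resolved from a different working directory than expected or because the extension in the pattern does not match the files on disk. The following is a minimal diagnostic sketch, not part of the original post; the helper name diagnose_folder is hypothetical.

```python
import os
from pathlib import Path


def diagnose_folder(folder, pattern="*.pkl"):
    """Collect facts that explain why a glob pattern might match nothing.

    Hypothetical helper for illustration only.
    """
    p = Path(folder)
    is_dir = p.is_dir()
    return {
        # Relative paths are resolved against this directory.
        "cwd": os.getcwd(),
        # The absolute location the folder string actually points to.
        "resolved": str(p.resolve()),
        # False means the path itself is wrong, not the pattern.
        "exists": is_dir,
        # File names matched by the pattern (empty list if none).
        "matches": sorted(f.name for f in p.glob(pattern)) if is_dir else [],
        # Extensions really present, e.g. ['.pickle'] when '*.pkl' was used.
        "suffixes": sorted({f.suffix for f in p.iterdir()}) if is_dir else [],
    }


# Usage: print(diagnose_folder(r'file_name/file_location_with_all_pickles'))
```

If "exists" is False, the folder string is wrong relative to "cwd"; if "matches" is empty but "suffixes" shows a different extension, only the pattern needs to change.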
Update:
This is what eventually worked for me:
import pandas as pd
import glob
path = r'folder' # use your path
all_files = glob.glob(path + "/*.pkl")
li = []
for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
Solution
How about:
path_to_files = r'Reut/here_the_files_located'
df = pd.concat([pd.read_pickle(f'{path_to_files}/table{num}.pickle') for num in range(1, 33)])
This is equivalent to:
path_to_files = r'Reut/here_the_files_located'
tables = []
for num in range(1, 33):
    filename = f'{path_to_files}/table{num}.pickle'
    print(filename)
    tables.append(pd.read_pickle(filename))
df = pd.concat(tables)
Output:
Reut/here_the_files_located/table1.pickle
Reut/here_the_files_located/table2.pickle
Reut/here_the_files_located/table3.pickle
Reut/here_the_files_located/table4.pickle
Reut/here_the_files_located/table5.pickle
Reut/here_the_files_located/table6.pickle
Reut/here_the_files_located/table7.pickle
Reut/here_the_files_located/table8.pickle
Reut/here_the_files_located/table9.pickle
Reut/here_the_files_located/table10.pickle
Reut/here_the_files_located/table11.pickle
Reut/here_the_files_located/table12.pickle
Reut/here_the_files_located/table13.pickle
Reut/here_the_files_located/table14.pickle
Reut/here_the_files_located/table15.pickle
Reut/here_the_files_located/table16.pickle
Reut/here_the_files_located/table17.pickle
Reut/here_the_files_located/table18.pickle
Reut/here_the_files_located/table19.pickle
Reut/here_the_files_located/table20.pickle
Reut/here_the_files_located/table21.pickle
Reut/here_the_files_located/table22.pickle
Reut/here_the_files_located/table23.pickle
Reut/here_the_files_located/table24.pickle
Reut/here_the_files_located/table25.pickle
Reut/here_the_files_located/table26.pickle
Reut/here_the_files_located/table27.pickle
Reut/here_the_files_located/table28.pickle
Reut/here_the_files_located/table29.pickle
Reut/here_the_files_located/table30.pickle
Reut/here_the_files_located/table31.pickle
Reut/here_the_files_located/table32.pickle
Some comments on your code:
num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'
{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}
num=list(range(1, 33)) #number of tables I have in the folder
There is no need to create a list from the range. Using range directly in a for loop or in a list/dict comprehension is perfectly efficient.
Path=r'Reut/folder_no_one/here_the_files_located'
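I guess you had previously imported Path from pathlib. This assignment rebinds the name Path to a string, so Path(path_to_files) tries to call a str. That is why you get the error TypeError: 'str' object is not callable. If you want to call Path as usual, you need to choose a different name for that variable.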
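To make the name clash concrete, here is a small sketch that keeps the folder string under its own name and then reproduces the mistake inside a function so the rest of the session is not affected. The variable name folder and the function call_shadowed_path are made up for illustration.

```python
from pathlib import Path

# Keep the folder string under a name that does not clash with the class.
folder = "Reut/folder_no_one/here_the_files_located"

# Path still refers to pathlib.Path here, so calling it works.
files = sorted(Path(folder).glob("*.pickle"))


def call_shadowed_path():
    """Reproduce the original error in isolation (illustrative only)."""
    Path = "some/folder"  # the local name Path now refers to a str
    try:
        Path("x")  # calling a string raises TypeError
    except TypeError as exc:
        return str(exc)


print(call_shadowed_path())  # 'str' object is not callable
```

In a notebook the shadowing persists across cells, so after renaming the variable you also need to re-run from pathlib import Path (or restart the kernel) to get the class back.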
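If the table names are not the same, is there a way to use this? For example, if one is table1 and another is dataframe3, just read them regardless of their names.
Sure. Assuming the file names of all your saved tables end in .pickle, you can use the glob approach from your first attempt. Don't forget to import pathlib.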
import pathlib
import pandas as pd

path_to_files = r'Reut/here_the_files_located'
tables = []
# Read every file ending in .pickle, whatever its base name is.
for table in pathlib.Path(path_to_files).glob("*.pickle"):
    tables.append(pd.read_pickle(table))
df = pd.concat(tables)
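The same idea can be wrapped in a function that fails with a readable message when the pattern matches nothing, instead of the less obvious ValueError from pd.concat. This is only a sketch under assumptions: the name load_pickles, the default pattern, and the choice to reset the index are not from the original answer.

```python
from pathlib import Path

import pandas as pd


def load_pickles(folder, pattern="*.pickle"):
    """Read every pickled dataframe matching pattern and concatenate them.

    Hypothetical helper; assumes all files hold dataframes with the same columns.
    """
    # Sort so the row order is reproducible; note the order is lexicographic,
    # so table10.pickle comes before table2.pickle.
    files = sorted(Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(
            f"no files matching {pattern!r} in {Path(folder).resolve()}"
        )
    tables = [pd.read_pickle(f) for f in files]
    # ignore_index=True gives the result a fresh 0..n-1 index.
    return pd.concat(tables, ignore_index=True)


# Usage: df = load_pickles(r'Reut/here_the_files_located')
```

Pass pattern="*.pkl" if the files were saved with that extension instead.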