Opening multiple pickle files from a folder in Jupyter Notebook does not work

Question

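I am using Jupyter Notebook on a server (the folder is not on my computer). I have a folder with 30 pickled dataframes that all have exactly the same columns. They are all saved under the following path: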

Reut/folder_no_one/here_the_files_located

I want to open them and concatenate them. I know I can do something like this:

df1=pd.read_pickle('table1')
df2=pd.read_pickle('table2')
df3=pd.read_pickle('table3')
...
#and then concat

But I am sure there is a better, smarter way to do this. I tried to open all the files and keep them separately, like this:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'

{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

But I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 {f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

TypeError: 'str' object is not callable

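I tried different versions of the path, and also tried leaving the path out entirely (since my notebook is in the same location as these files), but I keep getting the same error.

*It is worth mentioning that when the notebook is in that folder as well, I can open these files without specifying the path.

My end goal is to open all of these tables automatically and concatenate them into one big table.

Edit: I also tried this: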

path = r'file_name/file_location_with_all_pickles'
all_files = glob.glob(path + "/*.pkl")

li = []

for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

and

path_to_files = r'file_name/file_location_with_all_pickles'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pkl"):
    print(table)
    tables.append(pd.read_pickle(table))

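But in both cases I get the error

ValueError: No objects to concatenate

when I try to concatenate. Likewise, when I tell it to print the file name/table, it prints nothing. Moreover, if I try to print just a plain string inside the loop (such as print('hello')), nothing happens. There seems to be a problem with the path, but when I open one specific pickle like this: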

pd.read_pickle(r'file_name/file_location_with_all_pickles/specific_table.pkl')

it opens.

Update:

This is what eventually worked for me:

import pandas as pd
import glob

path = r'folder' # use your path
all_files = glob.glob(path + "/*.pkl")

li = []

for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)


Tags: python, server, jupyter-notebook, pickle, concat

Answer


How about:

path_to_files = r'Reut/here_the_files_located'
df = pd.concat([pd.read_pickle(f'{path_to_files}/table{num}.pickle') for num in range(1, 33)])

This is equivalent to:

path_to_files = r'Reut/here_the_files_located'
tables = []
for num in range(1, 33):
    filename = f'{path_to_files}/table{num}.pickle'
    print(filename)
    tables.append(pd.read_pickle(filename))

df = pd.concat(tables)

Output:

Reut/here_the_files_located/table1.pickle
Reut/here_the_files_located/table2.pickle
Reut/here_the_files_located/table3.pickle
Reut/here_the_files_located/table4.pickle
Reut/here_the_files_located/table5.pickle
Reut/here_the_files_located/table6.pickle
Reut/here_the_files_located/table7.pickle
Reut/here_the_files_located/table8.pickle
Reut/here_the_files_located/table9.pickle
Reut/here_the_files_located/table10.pickle
Reut/here_the_files_located/table11.pickle
Reut/here_the_files_located/table12.pickle
Reut/here_the_files_located/table13.pickle
Reut/here_the_files_located/table14.pickle
Reut/here_the_files_located/table15.pickle
Reut/here_the_files_located/table16.pickle
Reut/here_the_files_located/table17.pickle
Reut/here_the_files_located/table18.pickle
Reut/here_the_files_located/table19.pickle
Reut/here_the_files_located/table20.pickle
Reut/here_the_files_located/table21.pickle
Reut/here_the_files_located/table22.pickle
Reut/here_the_files_located/table23.pickle
Reut/here_the_files_located/table24.pickle
Reut/here_the_files_located/table25.pickle
Reut/here_the_files_located/table26.pickle
Reut/here_the_files_located/table27.pickle
Reut/here_the_files_located/table28.pickle
Reut/here_the_files_located/table29.pickle
Reut/here_the_files_located/table30.pickle
Reut/here_the_files_located/table31.pickle
Reut/here_the_files_located/table32.pickle
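
One detail that may matter here: pd.concat(tables) keeps each table's original row index, whereas the version in the question's update passes ignore_index=True to renumber the rows. The sketch below illustrates the difference on two tiny stand-in tables; the column name and values are made up for illustration and are not taken from the asker's data.

```python
import pandas as pd

# Two tiny stand-in tables with identical columns (hypothetical data).
tables = [
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": [3, 4]}),
]

kept = pd.concat(tables)                      # original indices are kept
reset = pd.concat(tables, ignore_index=True)  # rows are renumbered from 0

print(list(kept.index))   # [0, 1, 0, 1]
print(list(reset.index))  # [0, 1, 2, 3]
```

If the row labels of the individual tables carry no meaning, ignore_index=True is probably the more convenient choice, since duplicate labels can make later label-based selection ambiguous.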

Some comments on your code:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'

{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

num=list(range(1, 33)) #number of tables I have in the folder

There is no need to create a list from the range. Using range directly in a for loop or in a list/dict comprehension is more efficient.
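
As a small illustration of the point about range, the sketch below builds the same file names both ways. It only constructs strings and reads nothing from disk; the variable names names_from_range and names_from_list are introduced here for the example.

```python
path_to_files = r'Reut/here_the_files_located'

# range is iterated directly; no intermediate list is built.
names_from_range = [f'{path_to_files}/table{num}.pickle' for num in range(1, 33)]

# Wrapping it in list() first gives exactly the same result.
names_from_list = [f'{path_to_files}/table{num}.pickle' for num in list(range(1, 33))]

print(names_from_range == names_from_list)  # True
print(len(names_from_range))                # 32
```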

Path=r'Reut/folder_no_one/here_the_files_located'

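I guess you imported Path from pathlib earlier. Assigning a string to the name Path replaces the class, so the later call Path(path_to_files) tries to call a string. This is why you get the error TypeError: 'str' object is not callable. If you want to call Path as usual, choose another name for that variable.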
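
The following minimal sketch reproduces the name clash and shows one possible fix. It assumes Path was imported from pathlib, as guessed above; the names saved_class, folder and path_obj are introduced only for this example, and nothing is read from disk.

```python
from pathlib import Path

# Rebinding the name Path to a string hides the pathlib class.
saved_class = Path
Path = r'Reut/folder_no_one/here_the_files_located'
try:
    Path('some_folder')  # this now calls a str, not pathlib.Path
except TypeError as err:
    message = str(err)
    print(message)  # 'str' object is not callable

# Fix: restore the class and keep the string under a different name.
Path = saved_class
folder = r'Reut/folder_no_one/here_the_files_located'
path_obj = Path(folder)  # works again
```

In a notebook, re-running the import cell (or restarting the kernel) after renaming the variable should have the same effect as restoring the class by hand.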


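Is there a way to use this if the table names are not the same? For example, if one is table1 and another is dataframe3, just read them regardless of their names.

Sure. Assuming all the tables you saved have file names ending in .pickle, you can use the glob approach from your first attempt. Don't forget to import pathlib: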

import pathlib
path_to_files = r'Reut/here_the_files_located'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pickle"):
    tables.append(pd.read_pickle(table))

df = pd.concat(tables)
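
The ValueError: No objects to concatenate reported in the question is what pd.concat raises when the list is empty, which happens when the glob pattern matches nothing (for example *.pkl against files ending in .pickle, or a relative path that does not resolve from the notebook's working directory). The sketch below is one possible way to make that case easier to diagnose. The helper name load_pickles and its patterns parameter are hypothetical, not part of pandas.

```python
from pathlib import Path

import pandas as pd


def load_pickles(folder, patterns=("*.pickle", "*.pkl")):
    """Read every matching pickle in `folder` and concatenate the results."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(
            f"{folder} is not a directory (working directory: {Path.cwd()})"
        )
    # Collect matches for all patterns; sort for a reproducible row order.
    files = sorted({p for pattern in patterns for p in folder.glob(pattern)})
    if not files:
        # Show what is actually in the folder instead of failing in pd.concat.
        found = sorted(p.name for p in folder.iterdir())
        raise FileNotFoundError(
            f"No files matching {patterns} in {folder}; it contains: {found}"
        )
    return pd.concat([pd.read_pickle(f) for f in files], ignore_index=True)
```

It could then be called as df = load_pickles(r'Reut/here_the_files_located'). Note that the sort is alphabetical, so table10.pickle comes before table2.pickle. That is harmless when row order does not matter, but worth keeping in mind otherwise.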
