python - 使用列值python对excel文件进行排序
问题描述
我有 n 个 excel 文件,我需要根据列的值对它们进行排序。实际上,我需要在创建子文件夹时组织放置在特定文件夹下的 excel 文件,并且每个子文件夹都包含相同的 excel 文件DEPTNAME
,知道这DEPTNAME
是一个列名,每个 excel 文件都有 m 个工作表,但所有工作表都有相同的DEPTNAME
.
示例:一个包含 4 个 excel 文件的文件夹:
df1= pd.DataFrame({'Last Name':[‘Stark’, ‘Stark’, ‘ Stark’, ‘Stark’],
'FirstName':['Arya', ,'Arya','Arya','Arya',],
'DEPTNAME':['Sécu','Sécu','Sécu','Sécu']})
df2= pd.DataFrame({'Last Name':[‘Lannister’, ‘Lannister’, ‘ Lannister’, ‘Lannister’],
'FirstName':['Cersei', ,'Cersei','Cersei','Cersei',],
'DEPTNAME':['Auto','Auto','Auto','Auto']})
df3= pd.DataFrame({'Last Name':[‘Snow’, ‘Snow’, ‘ Snow’, ‘Snow’, ‘ Snow’, ‘Snow’],
'FirstName':['Jon', 'Jon','Jon','Jon','Jon','Jon'],
'DEPTNAME':['Aero','Aero','Aero','Aero','Aero','Aero']})
df4= pd.DataFrame({'Last Name':[‘Lannister’, ‘Lannister’, ‘ Lannister’, ‘Lannister’],
'FirstName':['Tyrion', 'Tyrion','Tyrion','Tyrion',],
'DEPTNAME':['Aero','Aero','Aero','Aero']})
现在我需要自动创建 3 个文件夹Sécu
:Aero
和Auto
.
Sécu
将包含一个 excel 文件
Aero
将包含两个excel文件
Auto
将包含一个 excel 文件
知道我的初始文件夹包含 n 个带有多张工作表的 excel 文件是否可行?
解决方案
here is one way which combines all files in a folder and all sheets in each file and then groups on DEPTNAME
and the filename + sorts the files in the folder(Note: if same DEPTNAME
are in 2 different excel fies, they are saves as 2 different files in the same folder <- as requested):
def myf(folder,files_to_be_created_in_folder):
""" folder is the path to input files and files_to_be_created_in_folder
is the path where the directories are to be created"""
folder = folder
list_of_files=os.listdir(folder)
combined_sheets={i[:-5]:pd.concat(pd.read_excel(os.path.join(folder,i),sheet_name=None)
.values(),sort=False)for i in list_of_files}
combined_all_files=pd.concat(combined_sheets.values(),keys=combined_sheets.keys())
d={i:g for i,g in combined_all_files.groupby(['DEPTNAME'
,combined_all_files.index.get_level_values(0)])}
to_create_folder=files_to_be_created_in_folder
for k,v in d.items():
newpath=os.path.join(to_create_folder,k[0])
if not os.path.exists(newpath):
os.makedirs(newpath)
v.to_excel(os.path.join(newpath,f"{k[1]}.xlsx"),index=False)
myf(r'C:\path_to_files\test_folder',r'C:\path_to_write\New folder') #replace paths carefully
For testing I have tried t print a folder tree based on this solution which depicts a folder tree:
ptree(r'C:\path_to_files\test_folder')
test_folder/
|-- test_1.xlsx
|-- test_2.xlsx
|-- test_3.xlsx
|-- test_4.xlsx
ptree(r'C:\path_to_write\New folder') #this also has the test folder
New folder/
|-- Aero/
| |-- test_3.xlsx
| |-- test_4.xlsx
|-- Auto/
| |-- test_2.xlsx
|-- Sécu/
| |-- test_1.xlsx
|-- test_folder/
| |-- test_1.xlsx
| |-- test_2.xlsx
| |-- test_3.xlsx
| |-- test_4.xlsx
推荐阅读
- amazon-web-services - Amazon Location CalculateRoute API 是否支持返回多个路由,以便您可以使用返回的结果找到最短路径?
- sql - 在主打开查询选择语句中使用 where sub select
- c# - IdentityServer4 (v3.1.x) 实体框架 - 会话不持久
- javascript - 在这个确切的函数中 if 子句下面的“return”的工作是什么?还请用文字解释这个功能是如何工作的
- json - 如何从go中的大json获取数据
- java - 如何在 Java Unirest POST 请求的字段中转义引号
- go - 从 GO WebAssembly 调用 JavaScript 事件
- kubernetes - Gitlab-ci Kubernetes 集群未创建命名空间(作业失败:未找到命名空间)
- android - 如何在 Android 相机 hal 中检索位置数据?
- r - 优化两个变量中的函数,创建一个向量作为输出?