首页 > 解决方案 > 根据 Pandas 中的常用列将一个数据帧拆分为多个子数据帧

问题描述

假设数据帧df如下:

            ISIN  Coupon  Issue Date Maturity Date  Jan 18th  Jan 19th  \
0   CA135087A610  0.0150   7/30/2012      6/1/2023    103.13    103.14   
1   CA135087F254  0.0075  10/19/2015      3/1/2021    100.08    100.08   
2   CA135087F585  0.0075   4/11/2016      9/1/2021    100.40    100.40   
3   CA135087G328  0.0050  10/11/2016      3/1/2022    100.44    100.43   
4   CA135087H490  0.0175   10/6/2017      3/1/2023    103.29    130.29   
5   CA135087ZU15  0.0275    8/2/2011      6/1/2022    103.61    103.60   
6   CA135087J546  0.0225   10/5/2018      3/1/2024    106.21    106.21   
7   CA135087J967  0.0150    4/5/2019      9/1/2024    104.27    104.28   
8   CA135087K528  0.0125  10/11/2019      3/1/2025    103.60    103.60   
9   CA135087K940  0.0050    4/3/2020      9/1/2025    100.33    100.32   
10  CA135087L518  0.0025   10/9/2020      3/1/2026     98.78     98.77   

    Jan 20th  
0     103.10  
1     100.08  
2     100.39  
3     100.41  
4     103.25  
5     103.57  
6     106.17  
7     104.25  
8     103.59  
9     100.30  
10     98.76  

我需要根据公共列ISIN, Coupon, Issue Date,将其拆分为 3 个子数据框Maturity Date

common_columns = ['ISIN', 'Coupon', 'Issue Date', 'Maturity Date']
df1.columns = common_columns + 'Jan 18th'
df2.columns = common_columns + 'Jan 19th'
df3.columns = common_columns + 'Jan 20th'

然后将它们保存到一个 excel 文件中,使用sheet_nameJan 18th和. 请注意,在实际数据中还有其他日期列,等。Jan 19thJan 20thJan 21thJan 22th

要保存数据框:

writer = pd.ExcelWriter('data.xlsx')
df1.to_excel(writer, 'Jan 18th')
df2.to_excel(writer, 'Jan 19th')
df3.to_excel(writer, 'Jan 20th')
writer.save()

我怎么能在 Python 中做到这一点?谢谢。

更新:

对于第一张表ISIN替换为2021-01-18,1 月 18 日替换为Close Price,第二张表ISIN替换为2021-01-19,1 月 19 日替换为Close Price,第三张表ISIN替换为2021-01-20,1 月 20 日替换为Close Price

标签: python-3.xpandasdataframe

解决方案


让我们index.difference通过从数据框中删除来获取日期类列comm_columns,然后对于每个日期类列使用DataFrame.to_excelwithpd.ExcelWriter来创建工作表:

comm_cols = ['ISIN', 'Coupon', 'Issue Date', 'Maturity Date']
date_cols = df.columns.difference(comm_cols)

with pd.ExcelWriter('data.xlsx') as writer:
    for col in date_cols:
        sub_df = df[comm_cols + [col]]
        sub_df.to_excel(writer, sheet_name=col)
        
    writer.save()

更新:在保存工作表时使用rename该列:ISIN

comm_cols = ['ISIN', 'Coupon', 'Issue Date', 'Maturity Date']
isin_cols = ['2021-01-18', '2021-02-18', '2021-02-19']
date_cols = df.columns.difference(comm_cols)

with pd.ExcelWriter('data.xlsx') as writer:
    for col, isin_col in zip(date_cols, isin_cols):
        sub_df = df[comm_cols + [col]].rename(columns={'ISIN': isin_col})
        sub_df.to_excel(writer, sheet_name=col)
        
    writer.save()

推荐阅读