首页 > 解决方案 > 对内层多索引列进行操作

问题描述

假设我有一个多索引列的数据框,

       TSLA                                MSFT                   
Year   revenues other_revenues expenses    revenues other_revenues   expenses
2019        851             10      110         200             13        213
2018        725             11      111         150             14        214

如何添加内列以获得

       TSLA                                    MSFT                   
Year   revenues other_revenues expenses  sum    revenues other_revenues   expenses  sum
2019        851             10      110  971        200             13        213   426
2018        725             11      111  847        150             14        214   378

其次,一般来说,在使用多索引列时我应该注意哪些常用功能?使用多索引列时有没有一种思考方式?我习惯于正常(单级索引)思考,但不习惯多索引。谢谢!

标签: pythonpandasdataframemulti-index

解决方案


sum首先创建由s 和MultiIndexto填充的新 DataFrame df1

sub = ['revenues', 'other_revenues', 'expenses']


df1 = df.sum(level=0, axis=1)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])

然后concat用于连接在一起:

df = pd.concat([df, df1], axis=1)

并添加了自定义订单的 lasr reindex

mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
         MSFT                                  TSLA                          \
     revenues other_revenues expenses  sum revenues other_revenues expenses   
Year                                                                          
2019      200             13      213  426      851             10      110   
2018      150             14      214  378      725             11      111   

           
      sum  
Year       
2019  971  
2018  847  

编辑:您可以使用切片器查看(但我认为这里有必要进行排序MultiIndex):

idx = pd.IndexSlice
print (df.loc[:, idx[:, ['revenues','other_revenues']]])
         TSLA     MSFT           TSLA           MSFT
     revenues revenues other_revenues other_revenues
2019      851      200             10             13
2018      725      150             11             14

# df.index.name = 'Year'
sub = ['revenues', 'other_revenues', 'expenses']


df1 = df.loc[:, idx[:, ['revenues','other_revenues']]].sum(level=0, axis=1)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])
df = pd.concat([df, df1], axis=1)

mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
         MSFT                                  TSLA                          \
     revenues other_revenues expenses  sum revenues other_revenues expenses   
2019      200             13      213  213      851             10      110   
2018      150             14      214  164      725             11      111   

           
      sum  
2019  861  
2018  736  

推荐阅读