python - 对内层多索引列进行操作
问题描述
假设我有一个多索引列的数据框,
TSLA MSFT
Year revenues other_revenues expenses revenues other_revenues expenses
2019 851 10 110 200 13 213
2018 725 11 111 150 14 214
如何添加内列以获得
TSLA MSFT
Year revenues other_revenues expenses sum revenues other_revenues expenses sum
2019 851 10 110 971 200 13 213 426
2018 725 11 111 847 150 14 214 378
其次,一般来说,在使用多索引列时我应该注意哪些常用功能?使用多索引列时有没有一种思考方式?我习惯于正常(单级索引)思考,但不习惯多索引。谢谢!
解决方案
sum
首先创建由s 和MultiIndex
to填充的新 DataFrame df1
:
sub = ['revenues', 'other_revenues', 'expenses']
df1 = df.sum(level=0, axis=1)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])
然后concat
用于连接在一起:
df = pd.concat([df, df1], axis=1)
并添加了自定义订单的 lasr reindex
:
mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
MSFT TSLA \
revenues other_revenues expenses sum revenues other_revenues expenses
Year
2019 200 13 213 426 851 10 110
2018 150 14 214 378 725 11 111
sum
Year
2019 971
2018 847
编辑:您可以使用切片器查看(但我认为这里有必要进行排序MultiIndex
):
idx = pd.IndexSlice
print (df.loc[:, idx[:, ['revenues','other_revenues']]])
TSLA MSFT TSLA MSFT
revenues revenues other_revenues other_revenues
2019 851 200 10 13
2018 725 150 11 14
# df.index.name = 'Year'
sub = ['revenues', 'other_revenues', 'expenses']
df1 = df.loc[:, idx[:, ['revenues','other_revenues']]].sum(level=0, axis=1)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])
df = pd.concat([df, df1], axis=1)
mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
MSFT TSLA \
revenues other_revenues expenses sum revenues other_revenues expenses
2019 200 13 213 213 851 10 110
2018 150 14 214 164 725 11 111
sum
2019 861
2018 736