首页 > 解决方案 > 应用 groupby 方法后如何切片成熊猫数据框

问题描述

如何切片以下数据框以选择与 sign == -1 和 csum_count == 5 的行关联的数据?

import pandas as pd
ret = np.array([np.nan,0.022, -0.007,  0.005,  0.08 ,  0.047, -0.012, -0.012,0.026, -0.022, -0.064, -0.006, -0.02 , -0.013,  0.012, -0.005,-0.01 ,  0.007, -0.005,  0.002])
sign = np.array([-1,  1, -1,  1,  1,  1, -1, -1,  1, -1, -1, -1, -1, -1,  1, -1, -1,1, -1,  1])
cumsum = np.array([ 1,  2,  3,  4,  4,  4,  5,  5,  6,  7,  7,  7,  7,  7,  8,  9,  9,10, 11, 12])
index = pd.bdate_range("2007-01-03","2007-01-31",holidays=['2007-01-15'], freq='C', name='Index')
raw = pd.DataFrame({'return':ret,'sign': sign,'cumsum':cumsum}, index = index)
raw2=raw.groupby(['sign','cumsum']).agg(csum_count=('cumsum','count'),mean_return=('return','sum')).sort_values(['sign','csum_count'],ascending=False)
raw2
                csum_count  mean_return

sign cumsum
 1   4                3        0.132
     2                1        0.022
     6                1        0.026
     8                1        0.012
     10               1        0.007
     12               1        0.002
-1   7                5       -0.125
     5                2       -0.024
     9                2       -0.015
     1                1        0.000
     3                1       -0.007
     11               1       -0.005

我试过 raw2.loc[-1,'csum_count'==5] 但这产生了一个错误。

标签: pythonpandas

解决方案


我想你需要get_level_values在这里:

raw2.loc[raw2['csum_count'].eq(5) &
         (raw2.index.get_level_values('sign') == -1)]

输出:

             csum_count  mean_return
sign cumsum                         
-1   7                5       -0.125

或者query

# would have work if `sign` is not a python keyword
# raw2.query('sign=-1 & csum_count=5')

推荐阅读