DataFrame expanding window and applying multiple functions

Problem description

I am trying to compute statistics on a time-indexed DataFrame using an expanding window.

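import numpy as np
import pandas as pd

date_range = pd.date_range('2015-01-01', '2019-12-31')

# One column of uniform random values, indexed by calendar day
df = pd.DataFrame(np.random.rand(len(date_range)), index=date_range, columns=['X'])

# Expanding median: for each date, the 50th percentile of all values so far
df['X'].expanding(min_periods=1).apply(lambda x: np.nanpercentile(x, 50))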

How can I compute several percentiles for each date in one go, to get something like:

            50Th_percentile 60Th_percentile
2015-01-01    0.373176        0.373176
2015-01-02    0.601829        0.647559
2015-01-03    0.373176        0.464637
2015-01-04    0.335774        0.358215

Something like the following does not work:

df['X'].expanding(min_periods=1).apply(lambda x:
x.agg[np.nanpercentile(x.values,50),np.nanpercentile(x.values,60)])

Tags: python, pandas, numpy, dataframe

Solution

Pass agg a dict that maps each output column name to a function:

(df['X'].expanding(min_periods=1).agg({'50':lambda x: np.nanpercentile(x,50),
                                       '60':lambda x: np.nanpercentile(x,60)})
        .add_suffix('_percentile'))


            50_percentile  60_percentile
2015-01-01       0.520419       0.520419
2015-01-02       0.726755       0.768022
2015-01-03       0.547734       0.624805
2015-01-04       0.597529       0.627407
2015-01-05       0.547734       0.587570
...                   ...            ...
2019-12-27       0.485802       0.591071
2019-12-28       0.485946       0.590530
2019-12-29       0.486197       0.591342
2019-12-30       0.486448       0.591684
2019-12-31       0.486467       0.591791

[1826 rows x 2 columns]
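
As a possible alternative to the lambdas, pandas windows also provide a built-in quantile method. The following is a minimal sketch rather than part of the original answer: the fixed seed, the names window, result and matches, and the column labels are assumptions made for illustration. Note that quantile takes a fraction between 0 and 1, so each percentile is divided by 100.

```python
import numpy as np
import pandas as pd

# Same setup as in the question; the fixed seed is an assumption for repeatability.
rng = np.random.default_rng(0)
date_range = pd.date_range('2015-01-01', '2019-12-31')
df = pd.DataFrame(rng.random(len(date_range)), index=date_range, columns=['X'])

# Build the expanding window once and reuse it for every percentile.
window = df['X'].expanding(min_periods=1)

# One column per percentile; quantile expects a fraction, hence p / 100.
result = pd.DataFrame(
    {f'{p}_percentile': window.quantile(p / 100) for p in [50, 60]}
)

# Cross-check one column against the apply-based approach from the question.
check = window.apply(lambda x: np.nanpercentile(x, 50), raw=True)
matches = np.allclose(result['50_percentile'], check)
print(result.head())
print(matches)
```

The built-in method avoids calling a Python function once per row, which tends to matter as the window grows.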

To add more percentiles:

list_percentiles = [50,60,70,80,90]
agg_percentiles = dict(zip(map(str,list_percentiles),
                           map(lambda val: lambda x: np.nanpercentile(x,val),
                               list_percentiles)
                           )
                      )
df['X'].expanding(min_periods=1).agg(agg_percentiles).add_suffix('_percentiles')

            50_percentiles  60_percentiles  70_percentiles  80_percentiles  \
2015-01-01        0.520419        0.520419        0.520419        0.520419   
2015-01-02        0.726755        0.768022        0.809290        0.850557   
2015-01-03        0.547734        0.624805        0.701877        0.778948   
2015-01-04        0.597529        0.627407        0.675902        0.761632   
2015-01-05        0.547734        0.587570        0.627407        0.704478   
...                    ...             ...             ...             ...   
2019-12-27        0.485802        0.591071        0.691039        0.796903   
2019-12-28        0.485946        0.590530        0.690891        0.796819   
2019-12-29        0.486197        0.591342        0.691197        0.796735   
2019-12-30        0.486448        0.591684        0.691088        0.796652   
2019-12-31        0.486467        0.591791        0.691237        0.796568   

            90_percentiles  
2015-01-01        0.520419  
2015-01-02        0.891824  
2015-01-03        0.856020  
2015-01-04        0.847361  
2015-01-05        0.818785  
...                    ...  
2019-12-27        0.899027  
2019-12-28        0.898999  
2019-12-29        0.898971  
2019-12-30        0.898943  
2019-12-31        0.898916  

[1826 rows x 5 columns]

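Or with a list comprehension (val=val binds the current percentile to each lambda; without it, every lambda would use the last value, 90):

agg_percentiles = dict(zip(map(str,list_percentiles),
                           [lambda x, val=val: np.nanpercentile(x,val)
                            for val in list_percentiles]
                          )
                      )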
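
One thing to watch when lambdas are created inside a comprehension: Python looks up the loop variable when the lambda is called, not when it is created, so each lambda needs its own copy of val, for example through a default argument. A minimal pure-Python sketch of the difference, using the placeholder names late and bound:

```python
percentiles = [50, 60, 70]

# Late binding: each lambda reads `val` only when it is called,
# by which time the comprehension has finished and val is 70.
late = [lambda: val for val in percentiles]

# Default argument: `val=val` is evaluated when each lambda is created,
# so every lambda keeps its own percentile.
bound = [lambda val=val: val for val in percentiles]

late_values = [f() for f in late]
bound_values = [f() for f in bound]
print(late_values)   # [70, 70, 70]
print(bound_values)  # [50, 60, 70]
```

The nested lambda passed to map in the earlier version is not affected, because val is a parameter of the outer lambda and is therefore fixed on each call.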
