pandas: efficiently performing multiple rolling computations?

Problem description

Suppose I have a date-indexed dataset:

id, date, col1, col2
1, 4, 1, 12
1, 5, 2, 13
1, 6, 6, 14
2, 4, 20, 16 
2, 5, 8, 17 
2, 6, 11, 18
...

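I want to compute the rolling mean, sum, min and max of col1 and col2, grouped by id, with window sizes 2 and 3 (the code below also computes count). I can do it in a loop like this: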

import pandas as pd

def multi_rolling(df, winsize, column):
    # Each statistic re-creates the grouped rolling window for one column
    return [df.groupby("id")[column].rolling(winsize).mean(),
            df.groupby("id")[column].rolling(winsize).sum(),
            df.groupby("id")[column].rolling(winsize).min(),
            df.groupby("id")[column].rolling(winsize).max(),
            df.groupby("id")[column].rolling(winsize).count()]

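Then I just call the above in a loop. But this feels inefficient. Is there a more efficient way to compute every combination of function, column and window size, for example by running them in parallel?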
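For reference, the loop described above might look like the minimal sketch below. It assumes only the sample rows shown in the question (the "..." rows are not reproduced) and uses a hypothetical helper, rolling_stats, a variant of multi_rolling that builds the grouped rolling window once per column and window size and joins the five statistics into one DataFrame with pd.concat. It does not run anything in parallel.

```python
import pandas as pd

# Sample rows from the question (the "..." rows are not reproduced)
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "date": [4, 5, 6, 4, 5, 6],
    "col1": [1, 2, 6, 20, 8, 11],
    "col2": [12, 13, 14, 16, 17, 18],
})

def rolling_stats(df, winsize, column):
    """Hypothetical helper: five rolling statistics of one column, per id."""
    # Build the grouped rolling window once and reuse it for every statistic
    r = df.groupby("id")[column].rolling(winsize)
    stats = {"mean": r.mean(), "sum": r.sum(), "min": r.min(),
             "max": r.max(), "count": r.count()}
    # Name each output column like "col1_mean_2"
    return pd.concat(
        {f"{column}_{name}_{winsize}": s for name, s in stats.items()}, axis=1
    )

# Loop over every column / window-size combination and join the pieces
result = pd.concat(
    [rolling_stats(df, w, c) for c in ["col1", "col2"] for w in [2, 3]], axis=1
)
print(result)
```

The result has one row per original row, indexed by (id, original index), and one column per column/statistic/window combination.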

Tags: python pandas dataframe

Solution


Use agg on the grouped rolling object; like pandas.DataFrame.agg, it accepts a list of function names:

new_df = df.groupby("id").rolling(2)[["col1","col2"]].agg(['mean','sum','min','max','count'])
print(new_df)

Output:

      col1                                                      col2        \
      mean         sum        min         max       count       mean         
      col1  col2  col1  col2 col1  col2  col1  col2  col1 col2  col1  col2   
id                                                                           
1  0   NaN   NaN   NaN   NaN  NaN   NaN   NaN   NaN   1.0  1.0   NaN   NaN   
   1   1.5  12.5   3.0  25.0  1.0  12.0   2.0  13.0   2.0  2.0   1.5  12.5   
   2   4.0  13.5   8.0  27.0  2.0  13.0   6.0  14.0   2.0  2.0   4.0  13.5   
2  3   NaN   NaN   NaN   NaN  NaN   NaN   NaN   NaN   1.0  1.0   NaN   NaN   
   4  14.0  16.5  28.0  33.0  8.0  16.0  20.0  17.0   2.0  2.0  14.0  16.5   
   5   9.5  17.5  19.0  35.0  8.0  17.0  11.0  18.0   2.0  2.0   9.5  17.5   


       sum        min         max       count       
      col1  col2 col1  col2  col1  col2  col1 col2  
id                                                  
1  0   NaN   NaN  NaN   NaN   NaN   NaN   1.0  1.0  
   1   3.0  25.0  1.0  12.0   2.0  13.0   2.0  2.0  
   2   8.0  27.0  2.0  13.0   6.0  14.0   2.0  2.0  
2  3   NaN   NaN  NaN   NaN   NaN   NaN   1.0  1.0  
   4  28.0  33.0  8.0  16.0  20.0  17.0   2.0  2.0  
   5  19.0  35.0  8.0  17.0  11.0  18.0   2.0  2.0  
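To cover both window sizes from the question in one result, one option is a single agg call per window size, with the pieces placed side by side by pd.concat(..., keys=...). This sketch selects the columns on the groupby (df.groupby("id")[["col1", "col2"]]) before calling rolling, which should give one column per column/statistic pair instead of the duplicated blocks printed above; the exact layout may vary between pandas versions. The sample data is again only the rows shown in the question.

```python
import pandas as pd

# Sample rows from the question (the "..." rows are not reproduced)
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "date": [4, 5, 6, 4, 5, 6],
    "col1": [1, 2, 6, 20, 8, 11],
    "col2": [12, 13, 14, 16, 17, 18],
})

funcs = ["mean", "sum", "min", "max", "count"]
windows = [2, 3]

# Select the columns on the groupby, then roll; one agg call per window size
grouped = df.groupby("id")[["col1", "col2"]]
result = pd.concat(
    [grouped.rolling(w).agg(funcs) for w in windows],
    axis=1,
    keys=windows,  # outermost column level = window size
)
print(result)
```

The columns form a three-level index (window, column, statistic), so, for example, result[3]["col2"]["sum"] is the 3-row rolling sum of col2 per id. agg still computes each statistic in turn; it is not a parallel execution.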
