首页 > 解决方案 > Pandas:滚动平均在新的多指数值上重新开始

问题描述

我有以下数据框:

df = pd.DataFrame({'Team':['A','A','A','A','B','B','B','B'],
                   'Date':list(pd.date_range(start='1/1/2021', periods=8)),
                   'Score':[7,3,3,6,7,3,7,5],
                  }).set_index(['Team', 'Date'])

我想添加一个滚动平均列,当 Level 0 索引是一个新值时它会重置。下面的简单代码不起作用,因为滚动平均值在索引值之间结转:

df['Avg'] = df['Score'].rolling(window=2).mean()


                 Score  Avg
Team Date                  
A    2021-01-01      7  NaN
     2021-01-02      3  5.0
     2021-01-03      3  3.0
     2021-01-04      6  4.5
B    2021-01-05      7  6.5
     2021-01-06      3  5.0
     2021-01-07      7  5.0
     2021-01-08      5  6.0

如何获得以下数据框?:

                 Score  Avg
Team Date                  
A    2021-01-01      7  NaN
     2021-01-02      3  5.0
     2021-01-03      3  3.0
     2021-01-04      6  4.5
B    2021-01-05      7  NaN
     2021-01-06      3  5.0
     2021-01-07      7  5.0
     2021-01-08      5  6.0

谢谢

标签: pythonpandas

解决方案


使用groupby rolling meanonlevel='Team'droplevel正确对齐索引:

df['Avg'] = (
    df.groupby(level='Team')['Score'].rolling(window=2).mean().droplevel(0)
)

df

                 Score  Avg
Team Date                  
A    2021-01-01      7  NaN
     2021-01-02      3  5.0
     2021-01-03      3  3.0
     2021-01-04      6  4.5
B    2021-01-05      7  NaN
     2021-01-06      3  5.0
     2021-01-07      7  5.0
     2021-01-08      5  6.0

droplevel过度使用的好处values是索引将正确对齐。

给定一个无序的 DataFrame,例如:

df = pd.DataFrame({'Team': ['B', 'B', 'B', 'B', 'A', 'A', 'A', 'A'],
                   'Date': list(pd.date_range(start='1/1/2021', periods=8)),
                   'Score': [7, 7, 7, 8, 1, 2, 1, 2],
                   }).set_index(['Team', 'Date'])

df

                 Score
Team Date             
B    2021-01-01      7
     2021-01-02      7
     2021-01-03      7
     2021-01-04      8
A    2021-01-05      1
     2021-01-06      2
     2021-01-07      1
     2021-01-08      2

droplevel注意和之间的区别values

df['drop_level'] = (
    df.groupby(level='Team')['Score'].rolling(window=2).mean().droplevel(0)
)
df['values'] = (
    df.groupby(level='Team')['Score'].rolling(window=2).mean().values
)
                 Score  drop_level  values
Team Date                                 
B    2021-01-01      7         NaN     NaN
     2021-01-02      7         7.0     1.5
     2021-01-03      7         7.0     1.5
     2021-01-04      8         7.5     1.5  # These are the averages from A
A    2021-01-05      1         NaN     NaN
     2021-01-06      2         1.5     7.0  # These are the averages from B
     2021-01-07      1         1.5     7.0
     2021-01-08      2         1.5     7.5

推荐阅读