python - Pandas:滚动平均在新的多指数值上重新开始
问题描述
我有以下数据框:
df = pd.DataFrame({'Team':['A','A','A','A','B','B','B','B'],
'Date':list(pd.date_range(start='1/1/2021', periods=8)),
'Score':[7,3,3,6,7,3,7,5],
}).set_index(['Team', 'Date'])
我想添加一个滚动平均列,当 Level 0 索引是一个新值时它会重置。下面的简单代码不起作用,因为滚动平均值在索引值之间结转:
df['Avg'] = df['Score'].rolling(window=2).mean()
Score Avg
Team Date
A 2021-01-01 7 NaN
2021-01-02 3 5.0
2021-01-03 3 3.0
2021-01-04 6 4.5
B 2021-01-05 7 6.5
2021-01-06 3 5.0
2021-01-07 7 5.0
2021-01-08 5 6.0
如何获得以下数据框?:
Score Avg
Team Date
A 2021-01-01 7 NaN
2021-01-02 3 5.0
2021-01-03 3 3.0
2021-01-04 6 4.5
B 2021-01-05 7 NaN
2021-01-06 3 5.0
2021-01-07 7 5.0
2021-01-08 5 6.0
谢谢
解决方案
使用groupby rolling mean
onlevel='Team'
和droplevel
正确对齐索引:
df['Avg'] = (
df.groupby(level='Team')['Score'].rolling(window=2).mean().droplevel(0)
)
df
:
Score Avg
Team Date
A 2021-01-01 7 NaN
2021-01-02 3 5.0
2021-01-03 3 3.0
2021-01-04 6 4.5
B 2021-01-05 7 NaN
2021-01-06 3 5.0
2021-01-07 7 5.0
2021-01-08 5 6.0
droplevel
过度使用的好处values
是索引将正确对齐。
给定一个无序的 DataFrame,例如:
df = pd.DataFrame({'Team': ['B', 'B', 'B', 'B', 'A', 'A', 'A', 'A'],
'Date': list(pd.date_range(start='1/1/2021', periods=8)),
'Score': [7, 7, 7, 8, 1, 2, 1, 2],
}).set_index(['Team', 'Date'])
df
:
Score
Team Date
B 2021-01-01 7
2021-01-02 7
2021-01-03 7
2021-01-04 8
A 2021-01-05 1
2021-01-06 2
2021-01-07 1
2021-01-08 2
droplevel
注意和之间的区别values
:
df['drop_level'] = (
df.groupby(level='Team')['Score'].rolling(window=2).mean().droplevel(0)
)
df['values'] = (
df.groupby(level='Team')['Score'].rolling(window=2).mean().values
)
Score drop_level values
Team Date
B 2021-01-01 7 NaN NaN
2021-01-02 7 7.0 1.5
2021-01-03 7 7.0 1.5
2021-01-04 8 7.5 1.5 # These are the averages from A
A 2021-01-05 1 NaN NaN
2021-01-06 2 1.5 7.0 # These are the averages from B
2021-01-07 1 1.5 7.0
2021-01-08 2 1.5 7.5