python - pandas: How to efficiently run multiple rolling calculations?
Problem description
Suppose I have a date-indexed dataset:
id, date, col1, col2
1, 4, 1, 12
1, 5, 2, 13
1, 6, 6, 14
2, 4, 20, 16
2, 5, 8, 17
2, 6, 11, 18
...
I want to compute the rolling mean, sum, min, and max of col1 and col2, grouped by id, with window sizes 2 and 3. I can do this in a loop, like this:
def multi_rolling(df, winsize, column):
    # Return one Series per aggregation for the given window size and column
    return [
        df.groupby("id")[column].rolling(winsize).mean(),
        df.groupby("id")[column].rolling(winsize).sum(),
        df.groupby("id")[column].rolling(winsize).min(),
        df.groupby("id")[column].rolling(winsize).max(),
        df.groupby("id")[column].rolling(winsize).count(),
    ]
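Then I just call the above in a loop. But this feels inefficient. Is there a more efficient way to call it over every combination of functions, columns, and window sizes? For example, by running them in parallel?
Solution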
new_df = df.groupby("id").rolling(2)[["col1","col2"]].agg(['mean','sum','min','max','count'])
print(new_df)
Output:
      col1                                                        col2       \
      mean         sum         min         max       count        mean
      col1  col2  col1  col2  col1  col2  col1  col2  col1  col2  col1  col2
id
1  0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   1.0   1.0   NaN   NaN
   1   1.5  12.5   3.0  25.0   1.0  12.0   2.0  13.0   2.0   2.0   1.5  12.5
   2   4.0  13.5   8.0  27.0   2.0  13.0   6.0  14.0   2.0   2.0   4.0  13.5
2  3   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   1.0   1.0   NaN   NaN
   4  14.0  16.5  28.0  33.0   8.0  16.0  20.0  17.0   2.0   2.0  14.0  16.5
   5   9.5  17.5  19.0  35.0   8.0  17.0  11.0  18.0   2.0   2.0   9.5  17.5

       sum         min         max       count
      col1  col2  col1  col2  col1  col2  col1  col2
id
1  0   NaN   NaN   NaN   NaN   NaN   NaN   1.0   1.0
   1   3.0  25.0   1.0  12.0   2.0  13.0   2.0   2.0
   2   8.0  27.0   2.0  13.0   6.0  14.0   2.0   2.0
2  3   NaN   NaN   NaN   NaN   NaN   NaN   1.0   1.0
   4  28.0  33.0   8.0  16.0  20.0  17.0   2.0   2.0
   5  19.0  35.0   8.0  17.0  11.0  18.0   2.0   2.0
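
The answer above covers a single window size. To also cover window sizes 2 and 3 as the question asks, one possible approach is to repeat the same agg call per window and join the results with pd.concat. This is only a sketch, not part of the original answer: the names funcs, windows, and results are made up for illustration, the DataFrame is rebuilt from the sample rows in the question, and the rows are assumed to be already sorted by date within each id.

```python
import pandas as pd

# Sample data copied from the question (assumed already sorted by date within each id)
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "date": [4, 5, 6, 4, 5, 6],
    "col1": [1, 2, 6, 20, 8, 11],
    "col2": [12, 13, 14, 16, 17, 18],
})

funcs = ["mean", "sum", "min", "max", "count"]  # illustrative name
windows = [2, 3]                                # illustrative name

# One agg call per window size; the dict keys become the top column level,
# so the columns end up as (window, column, function).
results = pd.concat(
    {
        w: df.groupby("id")[["col1", "col2"]].rolling(w).agg(funcs)
        for w in windows
    },
    axis=1,
)

# Pick out a single combination, e.g. the rolling mean of col1 with window 2
print(results[(2, "col1", "mean")])
```

Selecting the columns before calling rolling keeps id out of the aggregated output, and each window size costs one groupby pass instead of one pass per function and column.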