python - Calculate the maximum difference over a rolling interval across n columns
Problem description
I have a dataset df:
Time Spot Ubalance
0 2017-01-01T00:00:00+01:00 20.96 NaN
1 2017-01-01T01:00:00+01:00 20.90 29.40
2 2017-01-01T02:00:00+01:00 18.13 24.73
3 2017-01-01T03:00:00+01:00 16.03 24.73
4 2017-01-01T04:00:00+01:00 16.43 27.89
5 2017-01-01T05:00:00+01:00 13.75 28.26
6 2017-01-01T06:00:00+01:00 11.10 30.43
7 2017-01-01T07:00:00+01:00 15.47 32.85
8 2017-01-01T08:00:00+01:00 16.88 33.91
9 2017-01-01T09:00:00+01:00 21.81 28.58
10 2017-01-01T10:00:00+01:00 26.24 28.58
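I want to generate a Series/DataFrame in which I compute the maximum difference between the highest and lowest values across multiple columns over the last n rows. For example, for the "last" 10 rows the maximum difference is
33.91 (the highest, in "Ubalance") - 11.10 (the lowest, in "Spot") = 22.81
I tried .rolling(), but it apparently has no method for a difference.
Expected result: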
Time Spot Ubalance Diff
0 2017-01-01T00:00:00+01:00 20.96 NaN NaN
1 2017-01-01T01:00:00+01:00 20.90 29.40 NaN
2 2017-01-01T02:00:00+01:00 18.13 24.73 NaN
3 2017-01-01T03:00:00+01:00 16.03 24.73 NaN
4 2017-01-01T04:00:00+01:00 16.43 27.89 NaN
5 2017-01-01T05:00:00+01:00 13.75 28.26 NaN
6 2017-01-01T06:00:00+01:00 11.10 30.43 NaN
7 2017-01-01T07:00:00+01:00 15.47 32.85 NaN
8 2017-01-01T08:00:00+01:00 16.88 33.91 NaN
9 2017-01-01T09:00:00+01:00 21.81 28.58 NaN
10 2017-01-01T10:00:00+01:00 26.24 28.58 22.81
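To make the example reproducible, here is a minimal sketch that rebuilds the sample frame and checks the 22.81 figure by hand. The `Time` values are kept as plain strings and the variable names `last` and `diff` are only illustrative; the real data may well be parsed as datetimes.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; Time is kept as plain strings here
df = pd.DataFrame({
    "Time": [f"2017-01-01T{h:02d}:00:00+01:00" for h in range(11)],
    "Spot": [20.96, 20.90, 18.13, 16.03, 16.43, 13.75,
             11.10, 15.47, 16.88, 21.81, 26.24],
    "Ubalance": [np.nan, 29.40, 24.73, 24.73, 27.89, 28.26,
                 30.43, 32.85, 33.91, 28.58, 28.58],
})

# Highest and lowest value over both columns in the last 10 rows
last = df[["Spot", "Ubalance"]].tail(10)
diff = last.max().max() - last.min().min()
print(round(diff, 2))
```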
Solution
Use Rolling.aggregate, then subtract the minimum from the maximum:
# Rolling minimum and maximum of 'Spot' over a 10-row window
df1 = df['Spot'].rolling(10).agg(['min', 'max'])
print(df1)
min max
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 11.1 21.81
10 11.1 26.24
df['dif'] = df1['max'].sub(df1['min'])
print (df)
Time Spot Ubalance dif
0 2017-01-01T00:00:00+01:00 20.96 NaN NaN
1 2017-01-01T01:00:00+01:00 20.90 29.40 NaN
2 2017-01-01T02:00:00+01:00 18.13 24.73 NaN
3 2017-01-01T03:00:00+01:00 16.03 24.73 NaN
4 2017-01-01T04:00:00+01:00 16.43 27.89 NaN
5 2017-01-01T05:00:00+01:00 13.75 28.26 NaN
6 2017-01-01T06:00:00+01:00 11.10 30.43 NaN
7 2017-01-01T07:00:00+01:00 15.47 32.85 NaN
8 2017-01-01T08:00:00+01:00 16.88 33.91 NaN
9 2017-01-01T09:00:00+01:00 21.81 28.58 10.71
10 2017-01-01T10:00:00+01:00 26.24 28.58 15.14
Or use a custom lambda function:
df['dif'] = df['Spot'].rolling(10).agg(lambda x: x.max() - x.min())
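If the lambda turns out to be slow on a long series, one possible variant is to hand each window to NumPy as a raw array and use `np.ptp` ("peak to peak", i.e. max minus min). This is only a sketch, not part of the original answer; the `spot` series below simply repeats the sample `Spot` values.

```python
import numpy as np
import pandas as pd

# Sample 'Spot' values from the question
spot = pd.Series([20.96, 20.90, 18.13, 16.03, 16.43, 13.75,
                  11.10, 15.47, 16.88, 21.81, 26.24])

# raw=True passes each window as a NumPy array instead of a Series;
# np.ptp returns max - min of that array
dif = spot.rolling(10).apply(np.ptp, raw=True)
print(dif)
```

The first nine values are NaN because the window is not yet full; the last two should match the `dif` column shown above.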
Edit:
To process all the columns in a list, use:
cols = ['Spot','Ubalance']
N = 10
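# Stack the columns into one long Series (keeping NaN), roll over
# len(cols) * N stacked values, then take the max per original row
df['dif'] = (df[cols].stack(dropna=False)
                     .rolling(len(cols) * N)
                     .agg(lambda x: x.max() - x.min())
                     .groupby(level=0)
                     .max())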
print (df)
Time Spot Ubalance dif
0 2017-01-01T00:00:00+01:00 20.96 NaN NaN
1 2017-01-01T01:00:00+01:00 20.90 29.40 NaN
2 2017-01-01T02:00:00+01:00 18.13 24.73 NaN
3 2017-01-01T03:00:00+01:00 16.03 24.73 NaN
4 2017-01-01T04:00:00+01:00 16.43 27.89 NaN
5 2017-01-01T05:00:00+01:00 13.75 28.26 NaN
6 2017-01-01T06:00:00+01:00 11.10 30.43 NaN
7 2017-01-01T07:00:00+01:00 15.47 32.85 NaN
8 2017-01-01T08:00:00+01:00 16.88 33.91 NaN
9 2017-01-01T09:00:00+01:00 21.81 28.58 NaN
10 2017-01-01T10:00:00+01:00 26.24 28.58 22.81
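An alternative sketch, not from the original answer, is to roll each column separately and then reduce across columns. Every window then covers exactly N rows of each column, whereas in the stacked version the window that ends on the first column reaches back into row i - N. It also avoids `stack(dropna=False)`; as far as I know, the `dropna` argument is being phased out in recent pandas releases (2.1 and later) in favour of `future_stack=True`. The names `roll`, `hi` and `lo` are just illustrative, and `skipna=False` is an assumption made so that a row stays NaN until every column has a full window, matching the expected result.

```python
import numpy as np
import pandas as pd

# Sample data from the question (Time column omitted for brevity)
df = pd.DataFrame({
    "Spot": [20.96, 20.90, 18.13, 16.03, 16.43, 13.75,
             11.10, 15.47, 16.88, 21.81, 26.24],
    "Ubalance": [np.nan, 29.40, 24.73, 24.73, 27.89, 28.26,
                 30.43, 32.85, 33.91, 28.58, 28.58],
})

cols = ["Spot", "Ubalance"]
N = 10

roll = df[cols].rolling(N)
# Per-column rolling extremes, then the extreme across columns;
# skipna=False keeps NaN until all columns have N valid rows
hi = roll.max().max(axis=1, skipna=False)
lo = roll.min().min(axis=1, skipna=False)
df["dif"] = hi - lo
print(df)
```

On the sample data this should leave rows 0 to 9 as NaN and give 22.81 in row 10. Dropping `skipna=False` would instead report a value as soon as any single column has a full window.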