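python - Pandas rolling max for time series data
Problem description
When I apply rolling("1D").max() to two datasets in a Jupyter notebook, I get two different behaviors.
I need to compute the rolling maximum for each day.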
Sample:
import pandas as pd

df = pd.DataFrame({'B': [0, 4, 3, 3, 4, 2, 1, 2, 3, 4]},
                  index=[pd.Timestamp('20130101 09:00:00'),
                         pd.Timestamp('20130101 09:02:02'),
                         pd.Timestamp('20130101 09:03:03'),
                         pd.Timestamp('20130101 09:04:05'),
                         pd.Timestamp('20130101 09:15:06'),
                         pd.Timestamp('20130102 09:16:06'),
                         pd.Timestamp('20130102 09:17:06'),
                         pd.Timestamp('20130102 09:35:06'),
                         pd.Timestamp('20130102 09:36:06'),
                         pd.Timestamp('20130102 09:37:06')])
df.rolling("1D").max()  # gives the desired output
B
2013-01-01 09:00:00 0.0
2013-01-01 09:02:02 4.0
2013-01-01 09:03:03 4.0
2013-01-01 09:04:05 4.0
2013-01-01 09:15:06 4.0
2013-01-02 09:16:06 2.0 # <- 2 is the highest value for new day
2013-01-02 09:17:06 2.0
2013-01-02 09:35:06 2.0
2013-01-02 09:36:06 3.0
2013-01-02 09:37:06 4.0
When I try to apply it to real data, I get:
# Sample data
data = '{"High":{"1611221400000":0.99615,"1611222300000":0.9751,"1611223200000":1.035,"1611224100000":0.9894,"1611225000000":1.385,"1611225900000":1.345,"1611226800000":1.235,"1611227700000":1.245,"1611228600000":1.315,"1611229500000":1.295,"1611230400000":1.28,"1611231300000":1.295,"1611232200000":1.415,"1611233100000":1.415,"1611234000000":1.355,"1611234900000":1.385,"1611235800000":1.335,"1611236700000":1.325,"1611237600000":1.365,"1611238500000":1.445,"1611239400000":1.515,"1611240300000":1.475,"1611241200000":1.405,"1611242100000":1.375,"1611243000000":1.255,"1611243900000":1.225,"1611307800000":1.375,"1611308700000":1.415,"1611309600000":1.495}}'
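from io import StringIO  # newer pandas versions deprecate passing a literal JSON string
df2 = pd.read_json(StringIO(data))
df2.rolling("1D").max()
# keeps rolling from the previous day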
High
Date
2021-01-21 09:30:00 0.99615
2021-01-21 09:45:00 0.99615
2021-01-21 10:00:00 1.03500
2021-01-21 10:15:00 1.03500
2021-01-21 10:30:00 1.38500
2021-01-21 10:45:00 1.38500
2021-01-21 11:00:00 1.38500
2021-01-21 11:15:00 1.38500
2021-01-21 11:30:00 1.38500
2021-01-21 11:45:00 1.38500
2021-01-21 12:00:00 1.38500
2021-01-21 12:15:00 1.38500
2021-01-21 12:30:00 1.41500
2021-01-21 12:45:00 1.41500
2021-01-21 13:00:00 1.41500
2021-01-21 13:15:00 1.41500
2021-01-21 13:30:00 1.41500
2021-01-21 13:45:00 1.41500
2021-01-21 14:00:00 1.41500
2021-01-21 14:15:00 1.44500
2021-01-21 14:30:00 1.51500
2021-01-21 14:45:00 1.51500
2021-01-21 15:00:00 1.51500
2021-01-21 15:15:00 1.51500
2021-01-21 15:30:00 1.51500
2021-01-21 15:45:00 1.51500
2021-01-22 09:30:00 1.51500 # <- value got rolled from previous day
2021-01-22 09:45:00 1.51500
2021-01-22 10:00:00 1.51500
pandas version = 0.25.1
Both DataFrames have a DatetimeIndex, dtype='datetime64[ns]', freq=None
Any idea why this happens?
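Solution
A one-day rolling window ('1D') does not run from midnight to midnight. It spans 24 hours, regardless of the date change. You can see this by printing the first and last timestamp of each window: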
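def fun(x):
    # print the first and last timestamp covered by the current window
    print(x.index[0], x.index[-1])
    return len(x)

# raw=False passes each window as a Series, so x.index is available
df2.rolling("1d").apply(fun, raw=False)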
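This also explains why the first sample in the question appeared to work. A minimal sketch using a few values copied from each sample (the names gap_over_24h and gap_under_24h are illustrative, not from the original post), assuming the default closed="right" window, i.e. (t - 1 day, t]:

```python
import pandas as pd

# First sample: day two starts at 09:16:06, just over 24 hours after
# the last row of day one (09:15:06), so no day-one row is in the window.
gap_over_24h = pd.Series(
    [4.0, 2.0, 1.0],
    index=pd.to_datetime(["2013-01-01 09:15:06",
                          "2013-01-02 09:16:06",
                          "2013-01-02 09:17:06"]),
).rolling("1D").max()

# Second sample: day two starts at 09:30, less than 24 hours after the
# 14:30 row of day one, so that row's 1.515 is still inside the window.
gap_under_24h = pd.Series(
    [1.515, 1.375, 1.415],
    index=pd.to_datetime(["2021-01-21 14:30:00",
                          "2021-01-22 09:30:00",
                          "2021-01-22 09:45:00"]),
).rolling("1D").max()

print(gap_over_24h.tolist())   # [4.0, 2.0, 2.0]
print(gap_under_24h.tolist())  # [1.515, 1.515, 1.515]
```

So the "desired" output on the first dataset comes from the spacing of its timestamps, not from a different rolling behavior.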
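So what you need is df2.set_index(df2.index.normalize()).rolling("1d").max(). Normalizing sets every timestamp to midnight of its own day, so all rows of a day share one index value and the one-day window can no longer reach back into the previous day: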
df2.High = df2.set_index(df2.index.normalize()).rolling("1d").max().to_numpy()
Result:
High
2021-01-21 09:30:00 0.99615
2021-01-21 09:45:00 0.99615
2021-01-21 10:00:00 1.03500
2021-01-21 10:15:00 1.03500
2021-01-21 10:30:00 1.38500
2021-01-21 10:45:00 1.38500
2021-01-21 11:00:00 1.38500
2021-01-21 11:15:00 1.38500
2021-01-21 11:30:00 1.38500
2021-01-21 11:45:00 1.38500
2021-01-21 12:00:00 1.38500
2021-01-21 12:15:00 1.38500
2021-01-21 12:30:00 1.41500
2021-01-21 12:45:00 1.41500
2021-01-21 13:00:00 1.41500
2021-01-21 13:15:00 1.41500
2021-01-21 13:30:00 1.41500
2021-01-21 13:45:00 1.41500
2021-01-21 14:00:00 1.41500
2021-01-21 14:15:00 1.44500
2021-01-21 14:30:00 1.51500
2021-01-21 14:45:00 1.51500
2021-01-21 15:00:00 1.51500
2021-01-21 15:15:00 1.51500
2021-01-21 15:30:00 1.51500
2021-01-21 15:45:00 1.51500
2021-01-22 09:30:00 1.37500
2021-01-22 09:45:00 1.41500
2021-01-22 10:00:00 1.49500
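To see what the normalized index does to the windows, here is a small sketch on a hand-built frame (df_demo, by_day, window_sizes and daily_max are illustrative names; the values are copied from the data above). Counting the rows in each window suggests that the window restarts at every new date and grows within a day:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"High": [1.515, 1.225, 1.375, 1.415]},
    index=pd.to_datetime(["2021-01-21 14:30:00",
                          "2021-01-21 15:45:00",
                          "2021-01-22 09:30:00",
                          "2021-01-22 09:45:00"]),
)

# Replace each timestamp with midnight of the same day.
by_day = df_demo.set_index(df_demo.index.normalize())

# Number of rows that fall into each one-day window.
window_sizes = by_day["High"].rolling("1D").count()

# Running maximum that restarts on each new date.
daily_max = by_day["High"].rolling("1D").max().to_numpy()

print(window_sizes.tolist())  # [1.0, 2.0, 1.0, 2.0]
print(daily_max.tolist())     # [1.515, 1.515, 1.375, 1.415]
```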
The normalize approach is about 2-3 times faster than a groupby on index.date followed by dropping the extra index level.
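For comparison, a groupby-based variant could look like the sketch below. It uses cummax on a small hand-built frame (df_demo and per_day_max are illustrative names) rather than the groupby plus rolling combination that the timing refers to, so it is not the benchmarked code:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"High": [1.445, 1.515, 1.225, 1.375, 1.415, 1.495]},
    index=pd.to_datetime(["2021-01-21 14:15:00",
                          "2021-01-21 14:30:00",
                          "2021-01-21 15:45:00",
                          "2021-01-22 09:30:00",
                          "2021-01-22 09:45:00",
                          "2021-01-22 10:00:00"]),
)

# Group rows by calendar day and take the running maximum within each day.
# cummax keeps the original index, so there is no extra level to drop.
per_day_max = df_demo.groupby(df_demo.index.normalize())["High"].cummax()

print(per_day_max.tolist())
# [1.445, 1.515, 1.515, 1.375, 1.415, 1.495]
```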
Another possibility is to use a VariableOffsetWindowIndexer (available in pandas 1.1 and later) with a normalized DateOffset of 0 days, but this is very slow:
indexer = pd.api.indexers.VariableOffsetWindowIndexer(index=df2.index, offset=pd.tseries.offsets.DateOffset(0, normalize=True))
df2.rolling(indexer).max()