How to see trend and residual patterns in daily data with a strong pattern

Question

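I am trying to remove the repeating pattern from a dataset that has the daily activity shape reproduced by the code further down. I tried seasonal_decompose, which may not be the right tool.

What I want is to strip out the expected peak-usage pattern and be left with the trend and any spikes, the way you can when you apply seasonal_decompose to monthly data.

Does anyone know how I can see the trend and the anomalous points in daily data like this?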


Edit: here is the code to reproduce the example above.

import pandas as pd

sample = {'EventTime': [pd.Timestamp('2020-09-21 00:00:00'), pd.Timestamp('2020-09-21 01:00:00'), pd.Timestamp('2020-09-21 02:00:00'), pd.Timestamp('2020-09-21 03:00:00'), pd.Timestamp('2020-09-21 04:00:00'), pd.Timestamp('2020-09-21 05:00:00'), pd.Timestamp('2020-09-21 06:00:00'), pd.Timestamp('2020-09-21 07:00:00'), pd.Timestamp('2020-09-21 08:00:00'), pd.Timestamp('2020-09-21 09:00:00'), pd.Timestamp('2020-09-21 10:00:00'), pd.Timestamp('2020-09-21 11:00:00'), pd.Timestamp('2020-09-21 12:00:00'), pd.Timestamp('2020-09-21 13:00:00'), pd.Timestamp('2020-09-21 14:00:00'), pd.Timestamp('2020-09-21 15:00:00'), pd.Timestamp('2020-09-21 16:00:00'), pd.Timestamp('2020-09-21 17:00:00'), pd.Timestamp('2020-09-22 01:00:00'), pd.Timestamp('2020-09-22 02:00:00'), pd.Timestamp('2020-09-22 03:00:00'), pd.Timestamp('2020-09-22 04:00:00'), pd.Timestamp('2020-09-22 05:00:00'), pd.Timestamp('2020-09-22 06:00:00'), pd.Timestamp('2020-09-22 07:00:00'), pd.Timestamp('2020-09-22 08:00:00'), pd.Timestamp('2020-09-22 09:00:00'), pd.Timestamp('2020-09-22 10:00:00'), pd.Timestamp('2020-09-22 11:00:00'), pd.Timestamp('2020-09-22 12:00:00'), pd.Timestamp('2020-09-22 13:00:00'), pd.Timestamp('2020-09-22 14:00:00'), pd.Timestamp('2020-09-22 15:00:00'), pd.Timestamp('2020-09-22 16:00:00'), pd.Timestamp('2020-09-22 17:00:00'), pd.Timestamp('2020-09-23 00:00:00'), pd.Timestamp('2020-09-23 01:00:00'), pd.Timestamp('2020-09-23 02:00:00'), pd.Timestamp('2020-09-23 03:00:00'), pd.Timestamp('2020-09-23 04:00:00'), pd.Timestamp('2020-09-23 05:00:00'), pd.Timestamp('2020-09-23 06:00:00'), pd.Timestamp('2020-09-23 07:00:00'), pd.Timestamp('2020-09-23 08:00:00'), pd.Timestamp('2020-09-23 09:00:00'), pd.Timestamp('2020-09-23 10:00:00'), pd.Timestamp('2020-09-23 11:00:00'), pd.Timestamp('2020-09-23 12:00:00'), pd.Timestamp('2020-09-23 13:00:00'), pd.Timestamp('2020-09-23 14:00:00'), pd.Timestamp('2020-09-23 15:00:00'), pd.Timestamp('2020-09-23 16:00:00'), pd.Timestamp('2020-09-23 17:00:00'), 
pd.Timestamp('2020-09-24 01:00:00'), pd.Timestamp('2020-09-24 02:00:00'), pd.Timestamp('2020-09-24 03:00:00'), pd.Timestamp('2020-09-24 04:00:00'), pd.Timestamp('2020-09-24 05:00:00'), pd.Timestamp('2020-09-24 06:00:00'), pd.Timestamp('2020-09-24 07:00:00'), pd.Timestamp('2020-09-24 08:00:00'), pd.Timestamp('2020-09-24 09:00:00'), pd.Timestamp('2020-09-24 10:00:00'), pd.Timestamp('2020-09-24 11:00:00'), pd.Timestamp('2020-09-24 12:00:00'), pd.Timestamp('2020-09-24 13:00:00'), pd.Timestamp('2020-09-24 14:00:00'), pd.Timestamp('2020-09-24 15:00:00'), pd.Timestamp('2020-09-24 16:00:00'), pd.Timestamp('2020-09-24 17:00:00'), pd.Timestamp('2020-09-25 00:00:00'), pd.Timestamp('2020-09-25 01:00:00'), pd.Timestamp('2020-09-25 02:00:00'), pd.Timestamp('2020-09-25 03:00:00'), pd.Timestamp('2020-09-25 04:00:00'), pd.Timestamp('2020-09-25 05:00:00'), pd.Timestamp('2020-09-25 06:00:00'), pd.Timestamp('2020-09-25 07:00:00'), pd.Timestamp('2020-09-25 08:00:00'), pd.Timestamp('2020-09-25 09:00:00'), pd.Timestamp('2020-09-25 10:00:00'), pd.Timestamp('2020-09-25 11:00:00'), pd.Timestamp('2020-09-25 12:00:00'), pd.Timestamp('2020-09-25 13:00:00'), pd.Timestamp('2020-09-25 14:00:00')],
          'SpeedKbs': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1088.48, 58282.31, 83008.37, 58044.14, 34211.61, 27468.72, 25756.96, 14090.29, 5392.43, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1008.33, 44002.72, 47254.5, 37419.96, 23934.41, 19402.93, 18192.84, 9040.67, 3842.37, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1241.15, 43260.7, 56718.99, 41968.16, 33144.51, 22361.08, 28672.93, 21182.31, 5352.42, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 946.01, 46169.63, 51720.39, 37393.39, 27732.89, 25779.79, 24790.86, 15786.72, 4202.65, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 871.7, 37196.78, 40910.71, 26758.97, 17710.98, 16024.61, 15312.96, 9529.89]}

from statsmodels.tsa.seasonal import seasonal_decompose

seasonal_decompose(pd.DataFrame(sample).set_index("EventTime"), model='additive', period=1).plot();

Tags: python, pandas, data-science

Solution


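This is hourly data with a daily pattern, so period (the number of observations in one seasonal cycle) needs to be set to 24. With period set to 1 there is no cycle to estimate, so essentially no seasonal adjustment is done at all.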

seasonal_decompose(pd.DataFrame(sample).set_index("EventTime"), model='additive', period=24).plot();

Here is the output:

[Figure: plot output]

