首页 > 解决方案 > 如何使用 R 中的基线小时计算变化?

问题描述

我有这个数据框

library(lubridate)
df <- data.frame(seq(ymd_h("2017-01-01-00"), ymd_h("2020-01-31-24"), by = "hours"))
df$close <- rnorm(nrows(df), 3000, 150)

colnames(df) <- c("date", "close")
df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
df$hour <- hour(df$date)
df$day <- day(df$date)
df$month <- month(df$date)
df$year <- year(df$date)

我想得到 16 小时以来收盘价的变化。例如,在 16 小时之后,所有数据在 18 小时的平均价格变化是……等等。我想将一小时设置为基线并获得价格变化。

这就是我所做的。首先我使用滞后,但我不确定如何将 16 小时设置为基线。但是,这甚至没有给我接近我想要的结果。我使用铅的第二种方法,但我有同样的问题:

df_2 <- df %>% group_by(year, month, day, hour) %>% 
  mutate(change = (close-lead(close)))

总之,我想计算每天从 16 小时开始的价格变化,然后得到从 16 小时到其余时间的平均价格变化。

标签: rdatetime

解决方案


如果您需要全天候差异:


setDT(df)
df[, date_number := as.numeric(as.Date(ymd_h( sprintf("%d-%d-%dT%d",year,month,day,hour) ) - hours(16))) ]
df[, delta := close - close[ hour == 16 ], .(date_number) ]

head( df, n=48 )
tail( df, n=48 )

df[, .(meanPerHour = mean(delta)), .(hour) ]

要正确执行此操作,您需要创建 Date 对象,您可以在代码中看到该对象,然后减去 16 小时(或加 8),使 16:00 成为新的 0:00 ,然后将其转换回日期,并按该日期的日期编号(您从 as.numeric 获得)分组。

前 48 行:

> head( df, n=48 )
                   date    close hour day month year date_number       delta
 1: 2017-01-01 00:00:00 2924.671    0   1     1 2017       17166          NA
 2: 2017-01-01 01:00:00 3019.730    1   1     1 2017       17166          NA
 3: 2017-01-01 02:00:00 2988.162    2   1     1 2017       17166          NA
 4: 2017-01-01 03:00:00 3133.018    3   1     1 2017       17166          NA
 5: 2017-01-01 04:00:00 3017.546    4   1     1 2017       17166          NA
 6: 2017-01-01 05:00:00 3047.795    5   1     1 2017       17166          NA
 7: 2017-01-01 06:00:00 2912.731    6   1     1 2017       17166          NA
 8: 2017-01-01 07:00:00 3107.180    7   1     1 2017       17166          NA
 9: 2017-01-01 08:00:00 2876.211    8   1     1 2017       17166          NA
10: 2017-01-01 09:00:00 2946.021    9   1     1 2017       17166          NA
11: 2017-01-01 10:00:00 3013.483   10   1     1 2017       17166          NA
12: 2017-01-01 11:00:00 3014.441   11   1     1 2017       17166          NA
13: 2017-01-01 12:00:00 2969.755   12   1     1 2017       17166          NA
14: 2017-01-01 13:00:00 3110.976   13   1     1 2017       17166          NA
15: 2017-01-01 14:00:00 3018.507   14   1     1 2017       17166          NA
16: 2017-01-01 15:00:00 2995.602   15   1     1 2017       17166          NA
17: 2017-01-01 16:00:00 2941.672   16   1     1 2017       17167    0.000000
18: 2017-01-01 17:00:00 3076.628   17   1     1 2017       17167  134.956576
19: 2017-01-01 18:00:00 2862.928   18   1     1 2017       17167  -78.743991
20: 2017-01-01 19:00:00 3346.545   19   1     1 2017       17167  404.872660
21: 2017-01-01 20:00:00 2934.287   20   1     1 2017       17167   -7.385360
22: 2017-01-01 21:00:00 3114.609   21   1     1 2017       17167  172.937229
23: 2017-01-01 22:00:00 3039.294   22   1     1 2017       17167   97.622331
24: 2017-01-01 23:00:00 3116.011   23   1     1 2017       17167  174.338827
25: 2017-01-02 00:00:00 2877.843    0   2     1 2017       17167  -63.828732
26: 2017-01-02 01:00:00 2934.232    1   2     1 2017       17167   -7.439448
27: 2017-01-02 02:00:00 2891.967    2   2     1 2017       17167  -49.705095
28: 2017-01-02 03:00:00 3034.642    3   2     1 2017       17167   92.969817
29: 2017-01-02 04:00:00 2826.341    4   2     1 2017       17167 -115.331282
30: 2017-01-02 05:00:00 3037.061    5   2     1 2017       17167   95.389536
31: 2017-01-02 06:00:00 2986.333    6   2     1 2017       17167   44.661103
32: 2017-01-02 07:00:00 3263.606    7   2     1 2017       17167  321.934480
33: 2017-01-02 08:00:00 2979.311    8   2     1 2017       17167   37.638695
34: 2017-01-02 09:00:00 2983.321    9   2     1 2017       17167   41.649113
35: 2017-01-02 10:00:00 2896.498   10   2     1 2017       17167  -45.174011
36: 2017-01-02 11:00:00 2966.731   11   2     1 2017       17167   25.059003
37: 2017-01-02 12:00:00 3027.436   12   2     1 2017       17167   85.764290
38: 2017-01-02 13:00:00 3062.598   13   2     1 2017       17167  120.926630
39: 2017-01-02 14:00:00 3159.810   14   2     1 2017       17167  218.138486
40: 2017-01-02 15:00:00 3145.530   15   2     1 2017       17167  203.858440
41: 2017-01-02 16:00:00 2984.756   16   2     1 2017       17168    0.000000
42: 2017-01-02 17:00:00 3210.481   17   2     1 2017       17168  225.724909
43: 2017-01-02 18:00:00 2733.484   18   2     1 2017       17168 -251.271959
44: 2017-01-02 19:00:00 3093.430   19   2     1 2017       17168  108.674494
45: 2017-01-02 20:00:00 2921.657   20   2     1 2017       17168  -63.098117
46: 2017-01-02 21:00:00 3198.335   21   2     1 2017       17168  213.579029
47: 2017-01-02 22:00:00 2945.484   22   2     1 2017       17168  -39.271663
48: 2017-01-02 23:00:00 3197.860   23   2     1 2017       17168  213.104247

最后 48 条记录:

> tail( df, n=48 )
     date    close hour day month year date_number         delta
 1: 18290 3170.775    1  30     1 2020       18290  201.47027428
 2: 18290 3293.403    2  30     1 2020       18290  324.09870453
 3: 18290 2940.591    3  30     1 2020       18290  -28.71382979
 4: 18290 2922.411    4  30     1 2020       18290  -46.89312915
 5: 18290 3237.419    5  30     1 2020       18290  268.11402422
 6: 18290 2989.678    6  30     1 2020       18290   20.37332637
 7: 18290 2932.777    7  30     1 2020       18290  -36.52746038
 8: 18291 3188.269    8  30     1 2020       18290  218.96474627
 9: 18291 3003.327    9  30     1 2020       18290   34.02206527
10: 18291 2969.222   10  30     1 2020       18290   -0.08292166
11: 18291 2848.911   11  30     1 2020       18290 -120.39313851
12: 18291 2892.804   12  30     1 2020       18290  -76.50054871
13: 18291 3064.894   13  30     1 2020       18290   95.58913403
14: 18291 3172.009   14  30     1 2020       18290  202.70445747
15: 18291 3373.631   15  30     1 2020       18290  404.32650780
16: 18291 3019.765   16  30     1 2020       18291    0.00000000
17: 18291 2748.688   17  30     1 2020       18291 -271.07660267
18: 18291 2718.065   18  30     1 2020       18291 -301.70056024
19: 18291 2817.891   19  30     1 2020       18291 -201.87390563
20: 18291 3086.820   20  30     1 2020       18291   67.05492016
21: 18291 2972.657   21  30     1 2020       18291  -47.10804222
22: 18291 3009.258   22  30     1 2020       18291  -10.50687269
23: 18291 2949.268   23  30     1 2020       18291  -70.49745611
24: 18291 3032.938    0  31     1 2020       18291   13.17296251
25: 18291 3267.187    1  31     1 2020       18291  247.42241735
26: 18291 2984.129    2  31     1 2020       18291  -35.63610546
27: 18291 3053.728    3  31     1 2020       18291   33.96259834
28: 18291 3290.451    4  31     1 2020       18291  270.68616991
29: 18291 2875.921    5  31     1 2020       18291 -143.84421823
30: 18291 3159.612    6  31     1 2020       18291  139.84677795
31: 18291 2798.017    7  31     1 2020       18291 -221.74778788
32: 18292 2833.522    8  31     1 2020       18291 -186.24270860
33: 18292 3184.870    9  31     1 2020       18291  165.10465470
34: 18292 3037.279   10  31     1 2020       18291   17.51427029
35: 18292 3260.309   11  31     1 2020       18291  240.54407728
36: 18292 3178.804   12  31     1 2020       18291  159.03915248
37: 18292 2905.164   13  31     1 2020       18291 -114.60150340
38: 18292 2928.120   14  31     1 2020       18291  -91.64555778
39: 18292 2975.566   15  31     1 2020       18291  -44.19924163
40: 18292 3060.792   16  31     1 2020       18292    0.00000000
41: 18292 2916.899   17  31     1 2020       18292 -143.89373840
42: 18292 3297.537   18  31     1 2020       18292  236.74429212
43: 18292 3208.996   19  31     1 2020       18292  148.20392802
44: 18292 2791.129   20  31     1 2020       18292 -269.66375428
45: 18292 2842.001   21  31     1 2020       18292 -218.79120834
46: 18292 2992.381   22  31     1 2020       18292  -68.41127630
47: 18292 3189.018   23  31     1 2020       18292  128.22565814
48: 18292 2962.099    0   1     2 2020       18292  -98.69355677

平均每小时:

> df[, .(meanPerHour = mean(delta)), .(hour) ]
    hour meanPerHour
 1:    0   3.5877077
 2:    1   1.3695897
 3:    2   0.1010658
 4:    3   1.4441742
 5:    4  -3.0837907
 6:    5  -3.1353593
 7:    6  11.3738058
 8:    7   4.7171345
 9:    8   5.0449846
10:    9   1.3226027
11:   10  -2.3716443
12:   11   1.4710920
13:   12  -4.8875706
14:   13   4.7203754
15:   14   2.3528875
16:   15   2.3075150
17:   16   0.0000000
18:   17  -2.1353366
19:   18   4.5127309
20:   19   5.2032461
21:   20   3.8043017
22:   21   3.7928297
23:   22  -3.9258290
24:   23   3.0638861

最后,一个简洁的功能:


average.by.hour.by.reference <- function( df, hrs=16 ) {

    df <- as.data.table(df)
    df[, date_number := as.numeric(as.Date(ymd_h( sprintf("%d-%d-%dT%d",year,month,day,hour) ) - hours(hrs))) ]
    df[, delta := close - close[ hour == hrs ], .(date_number) ]
    
    return( df[, .(meanPerHour = mean(delta,na.rm=TRUE)), .(hour) ] )
    
}

average.by.hour.by.reference( df, 16 ) # produces the above results

讽刺地

您最有可能获得相同或足够接近实际应用程序的信息,方法是不打扰按日期分组,只需按小时进行全局分组,然后从您想要作为参考的每个小时中减去。

(但我们不会展示所有这些花哨的代码!)


推荐阅读