r - 如何使用 R 中的基线小时计算变化?
问题描述
我有这个数据框
library(lubridate)
df <- data.frame(seq(ymd_h("2017-01-01-00"), ymd_h("2020-01-31-24"), by = "hours"))
df$close <- rnorm(nrows(df), 3000, 150)
colnames(df) <- c("date", "close")
df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
df$hour <- hour(df$date)
df$day <- day(df$date)
df$month <- month(df$date)
df$year <- year(df$date)
我想得到 16 小时以来收盘价的变化。例如,在 16 小时之后,所有数据在 18 小时的平均价格变化是……等等。我想将一小时设置为基线并获得价格变化。
这就是我所做的。首先我使用滞后,但我不确定如何将 16 小时设置为基线。但是,这甚至没有给我接近我想要的结果。我使用铅的第二种方法,但我有同样的问题:
df_2 <- df %>% group_by(year, month, day, hour) %>%
mutate(change = (close-lead(close)))
总之,我想计算每天从 16 小时开始的价格变化,然后得到从 16 小时到其余时间的平均价格变化。
解决方案
如果您需要全天候差异:
setDT(df)
df[, date_number := as.numeric(as.Date(ymd_h( sprintf("%d-%d-%dT%d",year,month,day,hour) ) - hours(16))) ]
df[, delta := close - close[ hour == 16 ], .(date_number) ]
head( df, n=48 )
tail( df, n=48 )
df[, .(meanPerHour = mean(delta)), .(hour) ]
要正确执行此操作,您需要创建 Date 对象,您可以在代码中看到该对象,然后减去 16 小时(或加 8),使 16:00 成为新的 0:00 ,然后将其转换回日期,并按该日期的日期编号(您从 as.numeric 获得)分组。
前 48 行:
> head( df, n=48 )
date close hour day month year date_number delta
1: 2017-01-01 00:00:00 2924.671 0 1 1 2017 17166 NA
2: 2017-01-01 01:00:00 3019.730 1 1 1 2017 17166 NA
3: 2017-01-01 02:00:00 2988.162 2 1 1 2017 17166 NA
4: 2017-01-01 03:00:00 3133.018 3 1 1 2017 17166 NA
5: 2017-01-01 04:00:00 3017.546 4 1 1 2017 17166 NA
6: 2017-01-01 05:00:00 3047.795 5 1 1 2017 17166 NA
7: 2017-01-01 06:00:00 2912.731 6 1 1 2017 17166 NA
8: 2017-01-01 07:00:00 3107.180 7 1 1 2017 17166 NA
9: 2017-01-01 08:00:00 2876.211 8 1 1 2017 17166 NA
10: 2017-01-01 09:00:00 2946.021 9 1 1 2017 17166 NA
11: 2017-01-01 10:00:00 3013.483 10 1 1 2017 17166 NA
12: 2017-01-01 11:00:00 3014.441 11 1 1 2017 17166 NA
13: 2017-01-01 12:00:00 2969.755 12 1 1 2017 17166 NA
14: 2017-01-01 13:00:00 3110.976 13 1 1 2017 17166 NA
15: 2017-01-01 14:00:00 3018.507 14 1 1 2017 17166 NA
16: 2017-01-01 15:00:00 2995.602 15 1 1 2017 17166 NA
17: 2017-01-01 16:00:00 2941.672 16 1 1 2017 17167 0.000000
18: 2017-01-01 17:00:00 3076.628 17 1 1 2017 17167 134.956576
19: 2017-01-01 18:00:00 2862.928 18 1 1 2017 17167 -78.743991
20: 2017-01-01 19:00:00 3346.545 19 1 1 2017 17167 404.872660
21: 2017-01-01 20:00:00 2934.287 20 1 1 2017 17167 -7.385360
22: 2017-01-01 21:00:00 3114.609 21 1 1 2017 17167 172.937229
23: 2017-01-01 22:00:00 3039.294 22 1 1 2017 17167 97.622331
24: 2017-01-01 23:00:00 3116.011 23 1 1 2017 17167 174.338827
25: 2017-01-02 00:00:00 2877.843 0 2 1 2017 17167 -63.828732
26: 2017-01-02 01:00:00 2934.232 1 2 1 2017 17167 -7.439448
27: 2017-01-02 02:00:00 2891.967 2 2 1 2017 17167 -49.705095
28: 2017-01-02 03:00:00 3034.642 3 2 1 2017 17167 92.969817
29: 2017-01-02 04:00:00 2826.341 4 2 1 2017 17167 -115.331282
30: 2017-01-02 05:00:00 3037.061 5 2 1 2017 17167 95.389536
31: 2017-01-02 06:00:00 2986.333 6 2 1 2017 17167 44.661103
32: 2017-01-02 07:00:00 3263.606 7 2 1 2017 17167 321.934480
33: 2017-01-02 08:00:00 2979.311 8 2 1 2017 17167 37.638695
34: 2017-01-02 09:00:00 2983.321 9 2 1 2017 17167 41.649113
35: 2017-01-02 10:00:00 2896.498 10 2 1 2017 17167 -45.174011
36: 2017-01-02 11:00:00 2966.731 11 2 1 2017 17167 25.059003
37: 2017-01-02 12:00:00 3027.436 12 2 1 2017 17167 85.764290
38: 2017-01-02 13:00:00 3062.598 13 2 1 2017 17167 120.926630
39: 2017-01-02 14:00:00 3159.810 14 2 1 2017 17167 218.138486
40: 2017-01-02 15:00:00 3145.530 15 2 1 2017 17167 203.858440
41: 2017-01-02 16:00:00 2984.756 16 2 1 2017 17168 0.000000
42: 2017-01-02 17:00:00 3210.481 17 2 1 2017 17168 225.724909
43: 2017-01-02 18:00:00 2733.484 18 2 1 2017 17168 -251.271959
44: 2017-01-02 19:00:00 3093.430 19 2 1 2017 17168 108.674494
45: 2017-01-02 20:00:00 2921.657 20 2 1 2017 17168 -63.098117
46: 2017-01-02 21:00:00 3198.335 21 2 1 2017 17168 213.579029
47: 2017-01-02 22:00:00 2945.484 22 2 1 2017 17168 -39.271663
48: 2017-01-02 23:00:00 3197.860 23 2 1 2017 17168 213.104247
最后 48 条记录:
> tail( df, n=48 )
date close hour day month year date_number delta
1: 18290 3170.775 1 30 1 2020 18290 201.47027428
2: 18290 3293.403 2 30 1 2020 18290 324.09870453
3: 18290 2940.591 3 30 1 2020 18290 -28.71382979
4: 18290 2922.411 4 30 1 2020 18290 -46.89312915
5: 18290 3237.419 5 30 1 2020 18290 268.11402422
6: 18290 2989.678 6 30 1 2020 18290 20.37332637
7: 18290 2932.777 7 30 1 2020 18290 -36.52746038
8: 18291 3188.269 8 30 1 2020 18290 218.96474627
9: 18291 3003.327 9 30 1 2020 18290 34.02206527
10: 18291 2969.222 10 30 1 2020 18290 -0.08292166
11: 18291 2848.911 11 30 1 2020 18290 -120.39313851
12: 18291 2892.804 12 30 1 2020 18290 -76.50054871
13: 18291 3064.894 13 30 1 2020 18290 95.58913403
14: 18291 3172.009 14 30 1 2020 18290 202.70445747
15: 18291 3373.631 15 30 1 2020 18290 404.32650780
16: 18291 3019.765 16 30 1 2020 18291 0.00000000
17: 18291 2748.688 17 30 1 2020 18291 -271.07660267
18: 18291 2718.065 18 30 1 2020 18291 -301.70056024
19: 18291 2817.891 19 30 1 2020 18291 -201.87390563
20: 18291 3086.820 20 30 1 2020 18291 67.05492016
21: 18291 2972.657 21 30 1 2020 18291 -47.10804222
22: 18291 3009.258 22 30 1 2020 18291 -10.50687269
23: 18291 2949.268 23 30 1 2020 18291 -70.49745611
24: 18291 3032.938 0 31 1 2020 18291 13.17296251
25: 18291 3267.187 1 31 1 2020 18291 247.42241735
26: 18291 2984.129 2 31 1 2020 18291 -35.63610546
27: 18291 3053.728 3 31 1 2020 18291 33.96259834
28: 18291 3290.451 4 31 1 2020 18291 270.68616991
29: 18291 2875.921 5 31 1 2020 18291 -143.84421823
30: 18291 3159.612 6 31 1 2020 18291 139.84677795
31: 18291 2798.017 7 31 1 2020 18291 -221.74778788
32: 18292 2833.522 8 31 1 2020 18291 -186.24270860
33: 18292 3184.870 9 31 1 2020 18291 165.10465470
34: 18292 3037.279 10 31 1 2020 18291 17.51427029
35: 18292 3260.309 11 31 1 2020 18291 240.54407728
36: 18292 3178.804 12 31 1 2020 18291 159.03915248
37: 18292 2905.164 13 31 1 2020 18291 -114.60150340
38: 18292 2928.120 14 31 1 2020 18291 -91.64555778
39: 18292 2975.566 15 31 1 2020 18291 -44.19924163
40: 18292 3060.792 16 31 1 2020 18292 0.00000000
41: 18292 2916.899 17 31 1 2020 18292 -143.89373840
42: 18292 3297.537 18 31 1 2020 18292 236.74429212
43: 18292 3208.996 19 31 1 2020 18292 148.20392802
44: 18292 2791.129 20 31 1 2020 18292 -269.66375428
45: 18292 2842.001 21 31 1 2020 18292 -218.79120834
46: 18292 2992.381 22 31 1 2020 18292 -68.41127630
47: 18292 3189.018 23 31 1 2020 18292 128.22565814
48: 18292 2962.099 0 1 2 2020 18292 -98.69355677
平均每小时:
> df[, .(meanPerHour = mean(delta)), .(hour) ]
hour meanPerHour
1: 0 3.5877077
2: 1 1.3695897
3: 2 0.1010658
4: 3 1.4441742
5: 4 -3.0837907
6: 5 -3.1353593
7: 6 11.3738058
8: 7 4.7171345
9: 8 5.0449846
10: 9 1.3226027
11: 10 -2.3716443
12: 11 1.4710920
13: 12 -4.8875706
14: 13 4.7203754
15: 14 2.3528875
16: 15 2.3075150
17: 16 0.0000000
18: 17 -2.1353366
19: 18 4.5127309
20: 19 5.2032461
21: 20 3.8043017
22: 21 3.7928297
23: 22 -3.9258290
24: 23 3.0638861
最后,一个简洁的功能:
average.by.hour.by.reference <- function( df, hrs=16 ) {
df <- as.data.table(df)
df[, date_number := as.numeric(as.Date(ymd_h( sprintf("%d-%d-%dT%d",year,month,day,hour) ) - hours(hrs))) ]
df[, delta := close - close[ hour == hrs ], .(date_number) ]
return( df[, .(meanPerHour = mean(delta,na.rm=TRUE)), .(hour) ] )
}
average.by.hour.by.reference( df, 16 ) # produces the above results
讽刺地
您最有可能获得相同或足够接近实际应用程序的信息,方法是不打扰按日期分组,只需按小时进行全局分组,然后从您想要作为参考的每个小时中减去。
(但我们不会展示所有这些花哨的代码!)
推荐阅读
- sql-server - 在面板内的面板内时,RadioButton 不显示
- hive - 使用 CLI 使用胶水目录创建 EMR Hive 集群
- java - Java 2D 数组不保存用户输入
- scheme - 打印符号并使用宏对其进行评估
- asp.net-core - 我们可以在 ASP.NETCORE 应用程序中托管 ASP.NET SignalR v2.4.1 吗?
- django - Django 2.2 如何在列表视图中禁用复选框
- python - smtplib.SMTPResponseException 与 Python 电子邮件
- android - 如何安全归零并释放 Android 应用程序使用的所有内存页面?
- scala - 使用 Apache Spark SQL 将表序列化为嵌套 JSON
- c# - 如何使用 gmail smtp 服务器在“发件人”字段中使用名字和姓氏发送电子邮件?