r - R data.table: rolling covariance (or any other function) between two columns of a dataset, by group
Problem description
So, I have a data.table like this, in long format:
# sample data
library(data.table)
sample_size = 10
DT0 <- data.table(
  YEAR = seq(2021, by = -1, length.out = sample_size),
  a1 = seq(5, by = 0.035, length.out = sample_size),
  a2 = seq(12, by = 0.6, length.out = sample_size),
  a3 = seq(10, by = 0.01, length.out = sample_size)
)
# melt to long format
DT <- melt(DT0,
           id.vars = c("YEAR"),
           variable.name = "ITEM",
           value.name = "VARIATION")
setkeyv(DT, cols = c("ITEM", "YEAR"))
> print(DT, 100)
     YEAR   ITEM VARIATION
    <num> <fctr>     <num>
 1:  2021     a1     5.000
 2:  2020     a1     5.035
 3:  2019     a1     5.070
 4:  2018     a1     5.105
 5:  2017     a1     5.140
 6:  2016     a1     5.175
 7:  2015     a1     5.210
 8:  2014     a1     5.245
 9:  2013     a1     5.280
10:  2012     a1     5.315
11:  2021     a2    12.000
12:  2020     a2    12.600
13:  2019     a2    13.200
14:  2018     a2    13.800
15:  2017     a2    14.400
16:  2016     a2    15.000
17:  2015     a2    15.600
18:  2014     a2    16.200
19:  2013     a2    16.800
20:  2012     a2    17.400
21:  2021     a3    10.000
22:  2020     a3    10.010
23:  2019     a3    10.020
24:  2018     a3    10.030
25:  2017     a3    10.040
26:  2016     a3    10.050
27:  2015     a3    10.060
28:  2014     a3    10.070
29:  2013     a3    10.080
30:  2012     a3    10.090
     YEAR   ITEM VARIATION
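I am trying to compute new columns holding a rolling covariance (say, n = 5), pairwise for each element of ITEM (a1, a2, a3).
I tried building each pair manually, doing a join of the form DT[DT[ITEM == (a1, a2, a3)]] and then applying cov(a, b) with the rolling function data.table::frollapply, like this:
# join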
DT2 <- DT[DT[ITEM == "a1"], on=.(YEAR)]
> print(DT2, 100)
     YEAR   ITEM VARIATION i.ITEM i.VARIATION
    <num> <fctr>     <num> <fctr>       <num>
 1:  2012     a1     5.315     a1       5.315
 2:  2012     a2    17.400     a1       5.315
 3:  2012     a3    10.090     a1       5.315
 4:  2013     a1     5.280     a1       5.280
 5:  2013     a2    16.800     a1       5.280
 6:  2013     a3    10.080     a1       5.280
 7:  2014     a1     5.245     a1       5.245
 8:  2014     a2    16.200     a1       5.245
 9:  2014     a3    10.070     a1       5.245
10:  2015     a1     5.210     a1       5.210
11:  2015     a2    15.600     a1       5.210
12:  2015     a3    10.060     a1       5.210
13:  2016     a1     5.175     a1       5.175
14:  2016     a2    15.000     a1       5.175
15:  2016     a3    10.050     a1       5.175
16:  2017     a1     5.140     a1       5.140
17:  2017     a2    14.400     a1       5.140
18:  2017     a3    10.040     a1       5.140
19:  2018     a1     5.105     a1       5.105
20:  2018     a2    13.800     a1       5.105
21:  2018     a3    10.030     a1       5.105
22:  2019     a1     5.070     a1       5.070
23:  2019     a2    13.200     a1       5.070
24:  2019     a3    10.020     a1       5.070
25:  2020     a1     5.035     a1       5.035
26:  2020     a2    12.600     a1       5.035
27:  2020     a3    10.010     a1       5.035
28:  2021     a1     5.000     a1       5.000
29:  2021     a2    12.000     a1       5.000
30:  2021     a3    10.000     a1       5.000
     YEAR   ITEM VARIATION i.ITEM i.VARIATION
# computing cov pairs for "a1": cov(a1, a1), cov(a2, a1) and cov(a3, a1)
DT2[,
    "Cov(ITEM, a1)" := frollapply(.SD, n=5, FUN=cov(x= VARIATION, y= i.VARIATION)),
    by=.(ITEM)]
But I got this result:
>Error in match.fun(FUN) :
'cov(x = VARIATION, y = i.VARIATION)' is not a function, character or symbol
Edit: I tried @IRTFM's suggestion by doing the following:
DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = VARIATION, y = i.VARIATION)}), by = .(ITEM)]
and got this error:
Error in frollapply(.SD, n = 5, FUN = function(x, y) { : x must be list, data.frame or data.table of numeric or logical types
The elements of VARIATION and i.VARIATION are all numeric, so I tried passing them as lists instead:
DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = .(VARIATION), y = .(i.VARIATION))}), by = .(ITEM)]
But it returns the same error.
Any hints or suggestions on how to do this correctly with frollapply?
Thanks in advance!
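For reference, here is a minimal sketch of one possible workaround, not a confirmed fix. It assumes that frollapply hands FUN a single numeric window at a time, so a two-argument statistic such as cov cannot receive both columns directly. The idea is to roll over row positions and use each window of positions to subset both vectors. The helper name roll_cov and the column name cov_with_a1 are made up for illustration and are not part of data.table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): rolling covariance of two
# equal-length numeric vectors over a window of n observations.
roll_cov <- function(x, y, n) {
  # Roll over the positions 1..length(x); each window of positions is
  # used to subset both x and y before calling cov().
  frollapply(seq_along(x), n, function(idx) cov(x[idx], y[idx]))
}

# Same sample data and long format as in the question
sample_size <- 10
DT0 <- data.table(
  YEAR = seq(2021, by = -1, length.out = sample_size),
  a1 = seq(5, by = 0.035, length.out = sample_size),
  a2 = seq(12, by = 0.6, length.out = sample_size),
  a3 = seq(10, by = 0.01, length.out = sample_size)
)
DT <- melt(DT0, id.vars = "YEAR",
           variable.name = "ITEM", value.name = "VARIATION")
setkeyv(DT, c("ITEM", "YEAR"))

# Join every item against a1 by YEAR, then roll within each ITEM
DT2 <- DT[DT[ITEM == "a1"], on = .(YEAR)]
DT2[, cov_with_a1 := roll_cov(VARIATION, i.VARIATION, 5), by = .(ITEM)]
```

With the default right alignment, the first n - 1 rows of each ITEM group are NA because the window is not yet full. As for the second error above, a likely cause is that .SD still contains the factor column i.ITEM, which frollapply rejects as non-numeric. Passing plain vectors to the helper avoids .SD altogether.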