R data.table: rolling covariance (or another function) between two columns of a dataset, by group

Problem description

I have a data.table like this one, in long format:

library(data.table)

# sample data
sample_size <- 10
DT0 <- data.table(
  YEAR = seq(2021, by=-1, length.out = sample_size),
  a1 = seq(5, by=0.035, length.out = sample_size),
  a2 = seq(12, by=0.6, length.out = sample_size),
  a3 = seq(10, by=0.01, length.out = sample_size)
)

# melt to long format
DT <- melt(DT0, 
           id.vars = c("YEAR"), 
           variable.name = "ITEM",
           value.name = "VARIATION")

setkeyv(DT, cols=c("ITEM", "YEAR"))
> print(DT, 100)
     YEAR   ITEM VARIATION
    <num> <fctr>     <num>
 1:  2021     a1     5.000
 2:  2020     a1     5.035
 3:  2019     a1     5.070
 4:  2018     a1     5.105
 5:  2017     a1     5.140
 6:  2016     a1     5.175
 7:  2015     a1     5.210
 8:  2014     a1     5.245
 9:  2013     a1     5.280
10:  2012     a1     5.315
11:  2021     a2    12.000
12:  2020     a2    12.600
13:  2019     a2    13.200
14:  2018     a2    13.800
15:  2017     a2    14.400
16:  2016     a2    15.000
17:  2015     a2    15.600
18:  2014     a2    16.200
19:  2013     a2    16.800
20:  2012     a2    17.400
21:  2021     a3    10.000
22:  2020     a3    10.010
23:  2019     a3    10.020
24:  2018     a3    10.030
25:  2017     a3    10.040
26:  2016     a3    10.050
27:  2015     a3    10.060
28:  2014     a3    10.070
29:  2013     a3    10.080
30:  2012     a3    10.090
     YEAR   ITEM VARIATION

I'm trying to compute new columns holding the rolling covariance (say with n = 5), pairwise between the elements of ITEM (a1, a2, a3).
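For reference, the quantity being asked for can be written out in plain base R, which makes the expected numbers easy to check: for each row (sorted by ascending YEAR), the sample covariance over that row and the four preceding ones, with NA until a full window of 5 exists. This is a minimal sketch; `roll_cov_ref` is a hypothetical helper name, and leaving incomplete windows as NA is an assumption chosen to match `frollapply()`'s default `fill = NA`.

```r
# Hypothetical reference helper (base R only): trailing-window sample covariance.
# Positions 1..(n - 1) have no complete window and are left as NA.
roll_cov_ref <- function(x, y, n = 5) {
  stopifnot(length(x) == length(y), n >= 2)
  out <- rep(NA_real_, length(x))
  for (k in seq_along(x)) {
    if (k >= n) {
      idx <- (k - n + 1):k                 # the current row and the n - 1 before it
      out[k] <- cov(x[idx], y[idx])        # sample covariance, denominator n - 1
    }
  }
  out
}

# a1 and a2 from the question's sample data, reordered to ascending YEAR (2012..2021)
a1 <- rev(seq(5,  by = 0.035, length.out = 10))
a2 <- rev(seq(12, by = 0.6,   length.out = 10))
res <- roll_cov_ref(a1, a2, n = 5)
print(res)
```

Because a1 and a2 are arithmetic sequences with steps 0.035 and 0.6, every full window gives 0.035 × 0.6 × var(1:5) = 0.035 × 0.6 × 2.5 = 0.0525, a convenient check for any faster version.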

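I tried building each pair by hand with a join of the form DT[DT[ITEM == ...]] for each of a1, a2 and a3, and then applying cov(a, b) through the rolling function data.table::frollapply, as follows: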

# join
DT2 <- DT[DT[ITEM == "a1"], on=.(YEAR)]
> print(DT2, 100)
     YEAR   ITEM VARIATION i.ITEM i.VARIATION
    <num> <fctr>     <num> <fctr>       <num>
 1:  2012     a1     5.315     a1       5.315
 2:  2012     a2    17.400     a1       5.315
 3:  2012     a3    10.090     a1       5.315
 4:  2013     a1     5.280     a1       5.280
 5:  2013     a2    16.800     a1       5.280
 6:  2013     a3    10.080     a1       5.280
 7:  2014     a1     5.245     a1       5.245
 8:  2014     a2    16.200     a1       5.245
 9:  2014     a3    10.070     a1       5.245
10:  2015     a1     5.210     a1       5.210
11:  2015     a2    15.600     a1       5.210
12:  2015     a3    10.060     a1       5.210
13:  2016     a1     5.175     a1       5.175
14:  2016     a2    15.000     a1       5.175
15:  2016     a3    10.050     a1       5.175
16:  2017     a1     5.140     a1       5.140
17:  2017     a2    14.400     a1       5.140
18:  2017     a3    10.040     a1       5.140
19:  2018     a1     5.105     a1       5.105
20:  2018     a2    13.800     a1       5.105
21:  2018     a3    10.030     a1       5.105
22:  2019     a1     5.070     a1       5.070
23:  2019     a2    13.200     a1       5.070
24:  2019     a3    10.020     a1       5.070
25:  2020     a1     5.035     a1       5.035
26:  2020     a2    12.600     a1       5.035
27:  2020     a3    10.010     a1       5.035
28:  2021     a1     5.000     a1       5.000
29:  2021     a2    12.000     a1       5.000
30:  2021     a3    10.000     a1       5.000
     YEAR   ITEM VARIATION i.ITEM i.VARIATION
# compute the covariance pairs for "a1": cov(a1, a1), cov(a2, a1) and cov(a3, a1)

DT2[, 
    "Cov(ITEM, a1)" := frollapply(.SD, n=5, FUN=cov(x= VARIATION, y= i.VARIATION)),
    by=.(ITEM)]

But I got this error:

Error in match.fun(FUN) : 
  'cov(x = VARIATION, y = i.VARIATION)' is not a function, character or symbol
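The first error most likely comes from how `FUN` is passed: as the message shows, `frollapply()` hands `FUN` to `match.fun()`, so `FUN = cov(x = VARIATION, y = i.VARIATION)` is evaluated once into a plain number rather than being a function. `frollapply()` then calls `FUN` once per window, passing that window of a *single* numeric vector as the first argument. A minimal illustration, assuming the `frollapply(x, n, FUN, ..., fill = NA, align = "right")` interface documented for data.table 1.14:

```r
library(data.table)

# FUN must be a function; it receives one window (a numeric vector of length n)
# per call. Here each window of 3 consecutive values is summed.
res <- frollapply(c(1, 2, 3, 4, 5, 6), n = 3, FUN = function(w) sum(w))
print(res)
# expected: NA NA 6 9 12 15
# (the first n - 1 = 2 positions have no complete window and get fill = NA)
```

Since each call sees only one vector, a two-argument function such as `cov(x, y)` cannot be plugged in directly; both columns have to be subset by the same window some other way.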

EDIT: I tried @IRTFM's suggestion as follows:

DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = VARIATION, y = i.VARIATION)}),    by = .(ITEM)]

and got this error:

Error in frollapply(.SD, n = 5, FUN = function(x, y) { : 
  x must be list, data.frame or data.table of numeric or logical types

VARIATION and i.VARIATION both contain numeric values, so I tried returning them as lists:

DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = .(VARIATION), y = .(i.VARIATION))}),    by = .(ITEM)]

but it returns the same error.
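The second error is most likely not about VARIATION or i.VARIATION at all. With `by = .(ITEM)`, `.SD` holds every non-grouping column of DT2, including `i.ITEM`, which is a factor, and `frollapply()` does not accept factor columns as numeric. Even with numeric-only columns, `FUN` would still be called on one column at a time. One common workaround, sketched here under the same data.table 1.14 assumption, is to roll over row positions with `seq_len(.N)` and let `FUN` subset both columns with each window of positions; `cov_a1` is an illustrative column name.

```r
library(data.table)

# Rebuild DT and DT2 as in the question
sample_size <- 10
DT0 <- data.table(
  YEAR = seq(2021, by = -1, length.out = sample_size),
  a1 = seq(5, by = 0.035, length.out = sample_size),
  a2 = seq(12, by = 0.6, length.out = sample_size),
  a3 = seq(10, by = 0.01, length.out = sample_size)
)
DT <- melt(DT0, id.vars = "YEAR", variable.name = "ITEM", value.name = "VARIATION")
setkeyv(DT, c("ITEM", "YEAR"))
DT2 <- DT[DT[ITEM == "a1"], on = .(YEAR)]

# Classes of the columns .SD contains inside DT2[, ..., by = ITEM]: i.ITEM is a factor
sd_classes <- DT2[, lapply(.SD, class), by = ITEM]
print(sd_classes[ITEM == "a1"])

# Workaround: roll over row positions. frollapply() only sees the numeric vector
# seq_len(.N); FUN uses each window of positions to subset both columns.
setorder(DT2, ITEM, YEAR)   # windows must follow time order within each ITEM
DT2[, cov_a1 := frollapply(seq_len(.N), n = 5,
                           FUN = function(i) cov(VARIATION[i], i.VARIATION[i])),
    by = ITEM]
print(DT2[ITEM == "a2"])
```

Because `FUN` only ever sees row positions, the same pattern should work for other two-column statistics (for example `cor()`), which covers the "(or another function)" part of the title.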

Any hints or suggestions on how to do this correctly with frollapply?
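To get every pair without building the joins by hand, one option (a sketch, not the only way) is a self-join of DT on YEAR, which pairs each ITEM with every ITEM, followed by the same position-window idea grouped by both items. `DTp`, `roll_cov` and `wide` are illustrative names; `allow.cartesian = TRUE` is needed because the join returns 3 × 30 = 90 rows, more than the nrow(x) + nrow(i) that data.table allows without it.

```r
library(data.table)

sample_size <- 10
DT0 <- data.table(
  YEAR = seq(2021, by = -1, length.out = sample_size),
  a1 = seq(5, by = 0.035, length.out = sample_size),
  a2 = seq(12, by = 0.6, length.out = sample_size),
  a3 = seq(10, by = 0.01, length.out = sample_size)
)
DT <- melt(DT0, id.vars = "YEAR", variable.name = "ITEM", value.name = "VARIATION")

# Self-join on YEAR: every ITEM paired with every ITEM (3 x 3 pairs, 10 years each)
DTp <- DT[DT, on = .(YEAR), allow.cartesian = TRUE]
setorder(DTp, ITEM, i.ITEM, YEAR)   # time order within each (ITEM, i.ITEM) pair

# Rolling (n = 5) covariance for each pair, via windows of row positions
DTp[, roll_cov := frollapply(seq_len(.N), n = 5,
                             FUN = function(i) cov(VARIATION[i], i.VARIATION[i])),
    by = .(ITEM, i.ITEM)]

# One column per partner item: column a1 holds Cov(ITEM, a1), and so on
wide <- dcast(DTp, YEAR + ITEM ~ i.ITEM, value.var = "roll_cov")
print(wide)
```

If speed matters, the same windowed covariance can also be assembled from `frollmean()` as (mean of x·y − mean of x × mean of y) × n / (n − 1), at the cost of possible floating-point cancellation when the values are large relative to their spread.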

Thanks in advance!
