Iterating a calculation over a time series using loops in R

Problem description

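My data consists of 30 years of observations for 17 samples. The test data frame below reduces the number of samples to three but keeps the full length of the time series. Currently, with the year anchored at 1991, my code returns the required calculation for the first column only. Note: because of the lags in the formula, 1991 is the first year for which the calculation is possible. I would like to run the calculation iteratively for every sample across all years.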

library(tibble)
test <- tibble(Year = 1988:2017, value = runif(30), value.2 = runif(30), value.3 = runif(30))
index <- 1991
x <-test[test$Year ==index,"value"]
lag.3 <-test[test$Year ==index-3,"value"]
lag.2 <-test[test$Year ==index-2,"value"]
lag.1 <-test[test$Year ==index-1,"value"]
lead.3 <-test[test$Year ==index+3,"value"]
lead.2 <-test[test$Year ==index+2,"value"]
lead.1 <-test[test$Year ==index+1,"value"]
average_lag_3 =(lag.1 +lag.2 +lag.3)/3
average_lead_3 = (lead.1 +lead.2 +lead.3)/3
var.x <- ((((lag.3 +lag.2+lag.1)*(x-average_lag_3)^2)/2)+(((lead.3 +lead.2+lead.1)*(index-average_lead_3)^2)/2))/2
dscore <- average_lag_3 - average_lead_3/(sqrt(var.x)*sqrt(2/3))

Tags: r, loops, iteration, lag, lead

Solution


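Assuming you only want dscore back, I put your code in a function. The only changes I made were replacing "value" with column and test with data, to match the argument names I chose: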

## you should choose a better name than `foo`...
foo <- function(column, data, index = 1991) {
  x <- data[data$Year == index, column]
  lag.3 <- data[data$Year == index - 3, column]
  lag.2 <- data[data$Year == index - 2, column]
  lag.1 <- data[data$Year == index - 1, column]
  lead.3 <- data[data$Year == index + 3, column]
  lead.2 <- data[data$Year == index + 2, column]
  lead.1 <- data[data$Year == index + 1, column]
  average_lag_3 = (lag.1 + lag.2 + lag.3) / 3
  average_lead_3 = (lead.1 + lead.2 + lead.3) / 3
  var.x <-
    ((((lag.3 + lag.2 + lag.1) * (x - average_lag_3) ^ 2) / 2) + 
       (((lead.3 + lead.2 + lead.1) * (index - average_lead_3) ^ 2) / 2)) / 2
  dscore <- average_lag_3 - average_lead_3 / (sqrt(var.x) * sqrt(2 / 3))
  ## return the result visibly, so a direct call also prints it
  dscore
}

## We can then apply this function to all the column names except the first one:
sapply(names(test[-1]), foo, data = test)
# $value.value
# [1] 0.4992873
# 
# $value.2.value.2
# [1] 0.1238061
# 
# $value.3.value.3
# [1] 0.8298876
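
If a plain named numeric vector is more convenient than the list shown above, one option is to pull the column out with [[ so that every intermediate value is a bare number, and to use vapply, which checks that each call returns exactly one number. The following is a minimal sketch, not the answerer's code: the name dscore_at is hypothetical, the arithmetic is copied unchanged from foo, and a base data.frame stands in for the tibble only to keep the example free of package dependencies.

```r
## Hypothetical variant of foo(): same formula, but it works on a plain
## numeric vector, so the result is a bare number rather than a 1x1 table.
dscore_at <- function(column, data, index = 1991) {
  v <- data[[column]]                      # numeric vector for this sample
  at <- function(yr) v[data$Year == yr]    # value observed in a given year
  x <- at(index)
  lag_sum <- at(index - 1) + at(index - 2) + at(index - 3)
  lead_sum <- at(index + 1) + at(index + 2) + at(index + 3)
  average_lag_3 <- lag_sum / 3
  average_lead_3 <- lead_sum / 3
  var.x <- ((lag_sum * (x - average_lag_3)^2) / 2 +
              (lead_sum * (index - average_lead_3)^2) / 2) / 2
  average_lag_3 - average_lead_3 / (sqrt(var.x) * sqrt(2 / 3))
}

## Stand-in data with the same layout as the question's test data
test <- data.frame(Year = 1988:2017, value = runif(30),
                   value.2 = runif(30), value.3 = runif(30))

## One number per sample, named after the column it came from
scores <- vapply(names(test)[-1], dscore_at, numeric(1), data = test)
scores
```

Because the data are random and no seed is set, the printed numbers will differ from run to run; only the shape of the result (three named numbers) is fixed.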

Similarly, we can iterate over both years and columns:

## Generate combinations
my_years = 1991:1994
my_cols = names(test[-1])
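## Keep the column names as character (not factor) so they index the data by name
year_col = expand.grid(year = my_years, column = my_cols, stringsAsFactors = FALSE)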
## Make a place for the results
year_col$dscore = NA

## Do the calculations
for(i in 1:nrow(year_col)) {
  year_col$dscore[i] = foo(year_col$column[i], index = year_col$year[i], data = test)  
}
year_col
#    year  column     dscore
# 1  1991   value  0.4992873
# 2  1992   value   0.389055
# 3  1993   value  0.4510184
# 4  1994   value  0.5745417
# 5  1991 value.2  0.1238061
# 6  1992 value.2 0.08960943
# 7  1993 value.2  0.2999356
# 8  1994 value.2  0.4479232
# 9  1991 value.3  0.8298876
# 10 1992 value.3  0.7607005
# 11 1993 value.3  0.6520277
# 12 1994 value.3  0.5498328
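
The question asks for every year, not just 1991 to 1994. Since the formula needs three observations on each side of the index year, the usable years run from the fourth year of the series to the fourth from last (1991 to 2014 for data covering 1988 to 2017). The sketch below shows one way to derive that range from the data and fill a year-by-sample matrix. The names valid_years, dscore_vec and all_scores are hypothetical, the seed is arbitrary, and the formula is the same one used above, written with sum and mean.

```r
set.seed(1)  # arbitrary seed, only so that this sketch is repeatable
test <- data.frame(Year = 1988:2017, value = runif(30),
                   value.2 = runif(30), value.3 = runif(30))

## Years that have three observations before and three after them
valid_years <- (min(test$Year) + 3):(max(test$Year) - 3)

## Same arithmetic as the function above, for one sample vector v
dscore_vec <- function(v, years, index) {
  lags <- v[years %in% (index - 1:3)]    # the three preceding years
  leads <- v[years %in% (index + 1:3)]   # the three following years
  x <- v[years == index]
  var.x <- ((sum(lags) * (x - mean(lags))^2) / 2 +
              (sum(leads) * (index - mean(leads))^2) / 2) / 2
  mean(lags) - mean(leads) / (sqrt(var.x) * sqrt(2 / 3))
}

## Rows are years, columns are samples
all_scores <- sapply(test[-1], function(v) {
  vapply(valid_years, function(yr) dscore_vec(v, test$Year, yr), numeric(1))
})
rownames(all_scores) <- valid_years
head(all_scores)
```

With the full data set, the same call would give one column for each of the 17 samples, provided the first column of the data frame is still Year.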
