How to loop over a list of predictors with the Synth package in R

Question

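I am using the "Synth" package in R (see ftp://cran.r-project.org/pub/R/web/packages/Synth/Synth.pdf) and I would like to know how to run all possible combinations of my predictors. I have been relying on the very helpful earlier "synth" looping questions here (looping over outcome variables) and here (saving the loop output in a list), but neither fully solves my problem and I am still stuck.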

For simplicity, I will use the toy dataset from the earlier question on looping over outcome variables:

all_data_uk <- structure(list(
  countryno = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 16, 16, 16),
  country = c("Australia", "Australia", "Australia", "Canada", "Canada", "Canada",
              "Denmark", "Denmark", "Denmark",
              "United Kingdom", "United Kingdom", "United Kingdom"),
  year = c(1971, 1972, 1973, 1971, 1972, 1973, 1971, 1972, 1973, 1971, 1972, 1973),
  top10_income_share = c(0.2657, 0.2627, 0.2546, 0.37833, 0.37807, 0.37271,
                         0.323069660453, 0.322700285165, 0.320162826601,
                         0.2929, 0.289, 0.2831),
  top5_income_share = c(0.1655, 0.1654, 0.1593, 0.24075, 0.24106, 0.23917,
                        0.211599113574, 0.21160700537, 0.209096813051,
                        0.1881, 0.1848, 0.1818),
  top1_income_share = c(0.0557, 0.0573, 0.054, 0.08866, 0.08916, 0.08982,
                        0.082392548404, 0.0824267594074, 0.07776546085945,
                        0.0702, 0.0694, 0.0699),
  gdp_growth = structure(c(4.00330835508684, 3.91178191457604, 2.59931282534502,
                           4.11765761702448, 5.44585557970514, 6.96420291945871,
                           3.00503299618597, 3.92934382503836, 4.09292523611968,
                           3.48436803631409, 4.30194591910262, 6.50872079327365),
                         label = "(annual %)", class = c("labelled", "numeric")),
  # the opening quote of this label was missing in the original dput output
  capital_quinn = structure(c(50, 37.5, 37.5, 87.5, 87.5, 75, 75, 75, 75, 50, 50, 50),
                            label = "(financial openness - capital account)",
                            class = c("labelled", "numeric"))),
  class = "data.frame",
  .Names = c("countryno", "country", "year", "top10_income_share", "top5_income_share",
             "top1_income_share", "gdp_growth", "capital_quinn"),
  row.names = c(NA, -12L))

Using Synth's dataprep() and synth(), my setup is as follows:

library(Synth)

# Control (donor) units and the treated unit, identified by countryno
control_units_top10 <- c(1, 2)
treated_unit <- 16

# Run dataprep() which returns a list of matrices
dataprep.out_top10 <- dataprep(
  foo = all_data_uk,
  predictors = c("gdp_growth", "capital_quinn"),
  predictors.op = "mean", 
  time.predictors.prior = 1971:1972,
  special.predictors = list(
    list("top10_income_share", 1971, "mean"),
    list("top10_income_share", 1972, "mean")),
  dependent = "top10_income_share",
  unit.variable = "countryno",
  unit.names.variable = "country",
  time.variable = "year",
  treatment.identifier = treated_unit,
  controls.identifier = control_units_top10,
  time.optimize.ssr = 1971:1972,
  time.plot = 1971:1973)

# Run synth() command
synth.out_top10 <- synth(data.prep.obj = dataprep.out_top10, optimxmethod = "BFGS")

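I would like to create a loop so that each of the predictor sets (1) "gdp_growth", (2) "capital_quinn", and (3) "gdp_growth" together with "capital_quinn" is run and stored in a list, so that I can compare the MSPE from the optimization of the V and W weights ('loss.v', 'loss.w'). In other words: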

predictors = c("gdp_growth")
predictors = c("capital_quinn")
predictors = c("gdp_growth", "capital_quinn")
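A minimal sketch of that loop, assuming the Synth package is installed: the question's dataprep()/synth() call is wrapped in a helper (run_synth_for is a hypothetical name) whose only varying argument is predictors, with the treated unit 16 and controls c(1, 2) hard-coded as in the question. lapply() runs it over the three predictor sets and the losses are stacked into one table. The toy data is rebuilt with plain numeric columns (only the ones used here), and tryCatch() turns a failed fit into NA so one bad combination does not stop the loop; whether the tiny toy panel optimizes cleanly is not guaranteed.

```r
# Toy data from the question, rebuilt with plain numeric columns and only the
# columns this sketch uses (the "labelled" class is dropped).
all_data_uk <- data.frame(
  countryno = rep(c(1, 2, 3, 16), each = 3),
  country = rep(c("Australia", "Canada", "Denmark", "United Kingdom"), each = 3),
  year = rep(c(1971, 1972, 1973), times = 4),
  top10_income_share = c(0.2657, 0.2627, 0.2546, 0.37833, 0.37807, 0.37271,
                         0.323069660453, 0.322700285165, 0.320162826601,
                         0.2929, 0.289, 0.2831),
  gdp_growth = c(4.00330835508684, 3.91178191457604, 2.59931282534502,
                 4.11765761702448, 5.44585557970514, 6.96420291945871,
                 3.00503299618597, 3.92934382503836, 4.09292523611968,
                 3.48436803631409, 4.30194591910262, 6.50872079327365),
  capital_quinn = c(50, 37.5, 37.5, 87.5, 87.5, 75, 75, 75, 75, 50, 50, 50),
  stringsAsFactors = FALSE
)

# The three predictor sets from the question, named so results stay labelled
predictor_sets <- list(
  gdp     = c("gdp_growth"),
  capital = c("capital_quinn"),
  both    = c("gdp_growth", "capital_quinn")
)

# Hypothetical helper: the question's dataprep() + synth() call with only
# `predictors` varying. Returns loss.v and loss.w; any error gives NAs.
run_synth_for <- function(predictors, data) {
  tryCatch({
    dp <- Synth::dataprep(
      foo = data,
      predictors = predictors,
      predictors.op = "mean",
      time.predictors.prior = 1971:1972,
      special.predictors = list(
        list("top10_income_share", 1971, "mean"),
        list("top10_income_share", 1972, "mean")),
      dependent = "top10_income_share",
      unit.variable = "countryno",
      unit.names.variable = "country",
      time.variable = "year",
      treatment.identifier = 16,
      controls.identifier = c(1, 2),
      time.optimize.ssr = 1971:1972,
      time.plot = 1971:1973)
    out <- Synth::synth(data.prep.obj = dp, optimxmethod = "BFGS")
    # the losses may come back as 1 x 1 matrices; flatten them to plain numbers
    c(loss.v = as.numeric(out$loss.v), loss.w = as.numeric(out$loss.w))
  }, error = function(e) c(loss.v = NA_real_, loss.w = NA_real_))
}

# Run every predictor set and stack the losses: one row per set
if (requireNamespace("Synth", quietly = TRUE)) {
  results <- lapply(predictor_sets, run_synth_for, data = all_data_uk)
  loss_table <- do.call(rbind, results)
  print(loss_table)
}
```

Because each list element is named, the rows of loss_table are labelled gdp, capital and both, so the MSPE values can be compared side by side.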

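In practice I have five predictors, which gives 2^5 - 1 = 31 non-empty combinations, so I need a more efficient way to run every combination of predictors.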
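With five predictors the sets can be generated rather than typed out. A minimal base-R sketch using combn(): only gdp_growth and capital_quinn come from the question; pred3, pred4 and pred5 are hypothetical placeholders for the other three predictors.

```r
# Five predictor names: the first two come from the question; the last three
# are hypothetical placeholders for the asker's other predictors.
all_predictors <- c("gdp_growth", "capital_quinn", "pred3", "pred4", "pred5")

# combn(x, k, simplify = FALSE) returns a list of all size-k subsets;
# doing this for k = 1..n and flattening one level gives every non-empty subset.
predictor_sets <- unlist(
  lapply(seq_along(all_predictors),
         function(k) combn(all_predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Name each set by its members so results are easy to identify later
names(predictor_sets) <- vapply(predictor_sets, paste, character(1), collapse = "+")

length(predictor_sets)  # 31 = 5 + 10 + 10 + 5 + 1
```

The list is ordered by subset size, single predictors first and the full five-predictor set last, and it can be handed to lapply() just like a hand-written list.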

Tags: r, loops, for-loop, lapply, synth

Answer


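If you want to combine predictors, you can use expand.grid(), which returns every combination of the values in the vectors you pass it. For example, with two vectors holding IDs and product names, the two can be combined like this: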

vec1 = c(1,4,7)
vec2 = c("ProdA", "ProdB", "ProdC")
expand.grid(vec1, vec2)
  Var1  Var2
1    1 ProdA
2    4 ProdA
3    7 ProdA
4    1 ProdB
5    4 ProdB
6    7 ProdB
7    1 ProdC
8    4 ProdC
9    7 ProdC
combination = expand.grid(vec1, vec2)
combination$combined = paste0(combination$Var2, combination$Var1)
combination
  Var1  Var2 combined
1    1 ProdA   ProdA1
2    4 ProdA   ProdA4
3    7 ProdA   ProdA7
4    1 ProdB   ProdB1
5    4 ProdB   ProdB4
6    7 ProdB   ProdB7
7    1 ProdC   ProdC1
8    4 ProdC   ProdC4
9    7 ProdC   ProdC7
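To apply expand.grid() to predictor subsets rather than to ID/product pairs, it can be fed one TRUE/FALSE vector per predictor, so that each row of the grid marks which predictors to include. This is a sketch adapting the idea above, not the answerer's own code; pred3, pred4 and pred5 are hypothetical placeholders for the asker's remaining predictors.

```r
# gdp_growth and capital_quinn come from the question; pred3..pred5 are
# hypothetical placeholders.
all_predictors <- c("gdp_growth", "capital_quinn", "pred3", "pred4", "pred5")

# One TRUE/FALSE column per predictor: 2^5 = 32 rows, including all-FALSE
include <- expand.grid(rep(list(c(TRUE, FALSE)), length(all_predictors)))
names(include) <- all_predictors

# Drop the all-FALSE row (the empty predictor set)
include <- include[rowSums(include) > 0, ]

# Turn each row into a character vector of the selected predictor names
predictor_sets <- lapply(seq_len(nrow(include)),
                         function(i) all_predictors[unlist(include[i, ])])
```

Each element of predictor_sets is a character vector suitable for the predictors argument of dataprep(); the include data frame doubles as a readable record of which predictors each run used.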
