首页 > 解决方案 > 预测空间 GAM-GEE 中循环协变量的影响

问题描述

我在{geepack}包中安装了 GAM-GEE,因为我想在已知个体的一些空间动物数据的个体残差自相关中考虑。我有一个循环协变量,我想使用循环样条来拟合,所以我通过首先在 中生成样条的基函数来做到这一点{mgcv},因为循环三次样条当前在{splines}包中不可用,然后将其用作协变量geeglm()合身。我首先这样做:

cyclic_basisfunc <- mgcv::gam(response ~s(cyclic_term, bs="cc"), fit=F, data=df, family = "binomial")$X %>%
  as_tibble() %>%
  dplyr::select(V2:ncol(.)) %>%
  as.matrix()

接着:

fit1 <- geepack::geeglm(response ~ cyclic_basisfunc +
                                   bs(beta1)+
                                   bs(lon+lat),
                                   family="binomial",
                                   data=df,
                                   id=as.factor(id),
                                   corstr = "independence")

我可以通过提取模型矩阵中与该项相对应的相关元素来估计循环平滑的部分影响(在这种情况下,循环项有 8 个基函数)。但是,在使用 predict() 函数对空间网格进行预测时,我无法弄清楚如何正确命名预测网格中的元素。循环项的输入变量是一个矩阵,任何将预测值包含为数字向量或矩阵的尝试都会返回错误。

我尝试在我的研究区域上生成一个预测网格(nb 尝试使用 cyclic_basisfunc 和 cyclic_term 的名称作为列标题):

    predgrid <- data.frame(expand.grid(lat=seq(min(df$lat-0.1), max(df$lat+0.1), length.out = 20),
                                         lon=seq(min(df$lon-0.1), max(df$lon+0.1), length.out = 20),
                                       cyclic_basis_func=seq(min(cyclic_basis_func), 
                                       max(cyclic_basisfunc), length.out=20),
                                       beta1=25)))
predict(fit1, predgrid)

并得到错误:

Error: variable 'cyclic_basisfunc' was fitted with type "nmatrix.8" but type "numeric" was supplied

这是有道理的,因为原始协变量被拟合为一个 8 列矩阵,每个基函数 1 个。已尝试将一列作为 8 列矩阵输入到预测网格中,但这似乎仍然不起作用。有没有办法为这个模型生成一个合适的预测网格?(以下数据样本的输入)。

df <- structure(list(response = c(0.117, 0.915, 0.22, 0.284, 0.551, 
0.871, 0.25, 0.261, 0.117, 0.875, 0.67, 0.533, 0.324, 0.138, 
0.154, 0.286, 0.935, 0.744, 0.118, 0.224, 0.865, 0.13, 0.889, 
0.115, 0, 0, 0.703, 0.917, 0.812, 0.14, 0.219, 0.114, 0.24, 0.163, 
0.115, 0.228, 0, 0, 0.115, 0.106, 0.243, 0.13, 0.908, 0.117, 
0.472, 0.95, 0.217, 0.133, 0.744, 0.92, 0.26, 0.138, 0.958, 0.113, 
0.147, 0.132, 0.496, 0.148, 0.119, 0.186, 0.721, 0.889, 0.162, 
0.157, 0.269, 0.25, 0.129, 0.357, 0.188, 0.361, 0.137, 0.128, 
0.872, 0.121, 0.135, 0.466, 0.15, 0.589, 0.138, 0.134, 0.122, 
0.463, 0.121, 0.369, 0.813, 0.145, 0.911, 0.17, 0.123, 0.649, 
0.476, 0.119, 0.367, 0.135, 0.923, 0.875, 0.119, 0.115, 0.895, 
0.923), cyclic_term = c(0.732222222222222, -2.28277777777778, 
2.56777777777778, -3.43333333333333, -0.89, -5.68611111111111, 
-3.47388888888889, -2.88277777777778, -1.79277777777778, 1.50333333333333, 
-0.910555555555556, -4.14944444444444, -0.379027777777778, 0.113333333333333, 
0.075, -3.99055555555556, -5.48388888888889, -1.84513888888889, 
0.286111111111111, -1.24833333333333, -2.19652777777778, -0.532222222222222, 
0.598333333333333, 5.43222222222222, -2.73222222222222, -1.11125, 
3.16833333333333, -2.88055555555556, 1.90319444444444, 2.585, 
-5.64333333333333, -3.79666666666667, 0.692083333333333, 5.80666666666667, 
-4.42333333333333, 1.95666666666667, 2.61722222222222, -5.90055555555556, 
1.63, 3.55222222222222, -4.20111111111111, -2.34, 3.39277777777778, 
-4.32944444444444, 1.32222222222222, 4.74388888888889, 0.251111111111111, 
0.0705555555555556, -1.84513888888889, 5.14305555555556, -3.92555555555556, 
-2.34277777777778, 2.55777777777778, -3.75944444444444, 2.32277777777778, 
1.82944444444444, -3.42111111111111, 3.22388888888889, -3.90055555555556, 
1.94, -5.01888888888889, 4.93902777777778, -2.97388888888889, 
4.11222222222222, 1.55055555555556, -2.29055555555556, 0.556666666666667, 
1.40375, 3.52944444444444, 4.56555555555556, 1.30611111111111, 
-2.59944444444444, 4.11166666666667, 6.005, 1.28111111111111, 
-2.35333333333333, -0.125, 0.991666666666667, -4.29055555555556, 
4.64055555555556, 1.19222222222222, -0.710555555555556, 0.125, 
3.835, -3.79722222222222, 1.46, 0.455833333333333, -5.855, 2.59277777777778, 
-1.42777777777778, 4.815, 0.508888888888889, -2.14333333333333, 
-1.47444444444444, 5.01847222222222, 3.06666666666667, 5.92388888888889, 
1.39944444444444, 5.00236111111111, 4.21666666666667), beta1 = c(32.95, 
28.8, 32.15, 32.75, 34.7, 29.7, 28.95, 28.85, 32.25, 28.5, 33.5, 
28.5, 30.75, 28.8, 32.4, 32.65, 29.7, 32.25, 29.7, 31.85, 32.15, 
31.45, 31.25, 31.05, 38.7, 35.2, 31.65, 33, 32.05, 28.7, 31.85, 
35.5, 31.25, 35.25, 33.25, 29.7, 35.5, 30.55, 35.45, 36, 33, 
29.85, 31.15, 33.35, 34.8, 28.1, 35.35, 28.8, 32.25, 29.3, 29.7, 
28.95, 28.4, 34.7, 28.5, 28.8, 28.5, 33.1, 36.35, 29, 26.95, 
33.05, 32.2, 29.2, 30.35, 36, 29.7, 34.25, 34.1, 31.9, 32.05, 
28.9, 33.3, 31.85, 32.55, 28.8, 29.1, 39.2, 28.95, 32.15, 28.8, 
33.1, 29.5, 37.95, 32.85, 28.5, 30.3, 34.55, 28.15, 33.35, 35.35, 
31.6, 35.95, 28.9, 31.1, 32.5, 35.7, 31.85, 32.95, 33.55), lon = c(-8.14386899769604, 
-8.1572279376935, -8.15157242384156, -8.11266145447895, -8.15174118001044, 
-8.15335952072546, -8.14600667978252, -8.15297882985764, -8.15568356665537, 
-8.13472008705139, -8.09368161533273, -8.10491923518749, -8.1603014305949, 
-8.15632063618518, -8.13543502792374, -8.09733172904193, -8.15868642071182, 
-8.12876868592058, -8.15690393084368, -8.10847025857788, -8.15564957894737, 
-8.16047965739412, -8.1538774272955, -8.13959002494812, -8.16031308174808, 
-8.13153629064039, -8.10225327088153, -8.11704735322503, -8.10320579591837, 
-8.09718723480212, -8.14769670963066, -8.15279598640478, -8.1536752924518, 
-8.15005347845117, -8.11959004402161, -8.15327362169542, -8.15338984397156, 
-8.15480377425017, -8.14843624352758, -8.1536150198704, -8.14265275265451, 
-8.15419676931013, -8.14388546959556, -8.12423783110794, -8.15865186565657, 
-8.12779523300791, -8.15498210353148, -8.15711005849511, -8.12876868592058, 
-8.14498268712871, -8.12777905464012, -8.14658887045715, -8.14966988563538, 
-8.15137416124342, -8.14757286777317, -8.15659830243711, -8.11739216850327, 
-8.14816670318125, -8.15283383471808, -8.15503278645551, -8.12968355281506, 
-8.11532096238244, -8.15388445232111, -8.12550097443724, -8.14214966153336, 
-8.15406262640947, -8.15366204068896, -8.16073804747475, -8.14748077293754, 
-8.10236112317726, -8.13306401111526, -8.15754008293152, -8.11496173845736, 
-8.14744857712061, -8.13097980901942, -8.15712565747807, -8.16003438931358, 
-8.16002796870351, -8.11892837135011, -8.13700008392334, -8.15721941772399, 
-8.14819490909576, -8.1561399618782, -8.16012501716613, -8.11709369783742, 
-8.13470092070733, -8.14629675, -8.15713679162397, -8.13686372838595, 
-8.12430530737705, -8.15464372671311, -8.15669989585876, -8.16044796649062, 
-8.15766701538992, -8.09362511111111, -8.13870000839233, -8.14998125100998, 
-8.15243885077905, -8.12291705479452, -8.15384632745479), lat = c(33.6622974395803, 
33.6600173368609, 33.6599819460086, 33.6598656189251, 33.6673908233704, 
33.6593042234088, 33.6572338143965, 33.6580708565473, 33.6629485478539, 
33.6547317504883, 33.6598712810572, 33.6567040290043, 33.6652851274788, 
33.6596383685524, 33.6570077561196, 33.6611995549193, 33.661593106719, 
33.6588793953069, 33.6614662478323, 33.6584457075095, 33.6641300278638, 
33.6621415089752, 33.6598426484043, 33.6570205688477, 33.6727939605693, 
33.6593126847291, 33.6591196082918, 33.6591313969, 33.6605346, 
33.6553558936485, 33.6650662271625, 33.6653484273653, 33.6600826614748, 
33.6642524699626, 33.6585006713867, 33.658733988916, 33.664975062454, 
33.6610514512704, 33.6621421965042, 33.6687249002733, 33.6576841575676, 
33.6600360803426, 33.6574947407709, 33.6584060573279, 33.6673802707071, 
33.6550894507841, 33.6654066532252, 33.6595139325288, 33.6588793953069, 
33.6568044356436, 33.6559013620991, 33.6568076288124, 33.6572189331055, 
33.6670448613725, 33.6563744930416, 33.6598780530983, 33.6554991693199, 
33.6602992307323, 33.6667773110577, 33.6591215807971, 33.6549378741848, 
33.6592876394984, 33.6624833258972, 33.6556429238423, 33.6564236265265, 
33.6680011078708, 33.6591718634038, 33.6684195434343, 33.6642363505652, 
33.6599223753418, 33.6575357831156, 33.6607284545898, 33.6602992227631, 
33.6646009116676, 33.6569601709497, 33.6597380618501, 33.6610439757783, 
33.6709440919706, 33.6553216040898, 33.6567993164063, 33.6604019538554, 
33.6604042053223, 33.6607238769535, 33.6704235076904, 33.6597095889516, 
33.6546409606935, 33.658124, 33.6659726122235, 33.6550456763148, 
33.6591343852459, 33.6648648385784, 33.6633987426758, 33.6695201510491, 
33.6609040792869, 33.655895, 33.6606597900391, 33.6629373624763, 
33.6612642187822, 33.6580512191781, 33.6635307311956), id = c(8, 
14, 12, 12, 7, 12, 8, 12, 10, 12, 14, 12, 6, 14, 10, 12, 8, 4, 
14, 14, 2, 10, 2, 12, 9, 5, 12, 12, 5, 10, 12, 14, 2, 14, 14, 
14, 12, 12, 14, 8, 12, 10, 12, 14, 6, 12, 12, 14, 4, 3, 12, 10, 
14, 8, 14, 14, 12, 10, 8, 14, 12, 3, 10, 12, 12, 12, 14, 6, 14, 
9, 12, 10, 12, 14, 14, 14, 14, 10, 12, 12, 14, 12, 14, 14, 12, 
12, 4, 8, 14, 2, 14, 14, 11, 10, 3, 12, 8, 14, 3, 12)), row.names = c("25101", 
"15358", "80097", "89024", "27479", "98805", "25425", "86335", 
"31333", "93483", "12171", "100849", "155418", "12853", "33858", 
"100851", "22470", "149448", "7443", "12167", "144934", "33938", 
"144132", "91062", "56909", "153781", "95039", "99533", "153760", 
"32687", "86913", "12298", "144133", "7402", "11672", "13939", 
"78548", "98801", "8135", "24867", "91818", "32609", "95688", 
"11675", "155218", "94268", "90367", "7440", "149447", "145506", 
"90571", "35105", "14210", "26177", "14975", "16723", "86359", 
"34450", "26139", "14198", "89237", "145503", "31062", "92665", 
"87694", "78666", "13917", "155219", "15350", "60377", "82820", 
"33174", "87056", "7406", "15370", "15356", "9330", "33533", 
"86726", "95709", "8131", "96538", "13911", "14229", "86539", 
"93482", "145837", "22101", "11305", "144939", "7391", "7445", 
"21817", "32804", "145314", "98223", "24175", "8132", "145504", 
"87396"), class = "data.frame")

标签: r

解决方案


您是否尝试过使用与原始数据相同的基础来评估新值,然后将结果用作协变量?请注意,您会收到几个警告,因为您的回答是比例/概率,而您没有给出权重或类似的东西。假设模型指定正确,看看下面的代码是否符合您的要求。

您似乎可以使用 predict.gam 中的“lpmatrix”类型从 GAM 中提取样条线。请参阅此处: 如何从 GAM (`mgcv::gam`) 中提取拟合样条曲线 老实说,我有点迷失,所以我建议只为协变量设置基础而不拟合 GAM。

确保结的数量和位置有意义并建立基础。据我了解, cSplinesDes() 在所需点评估基函数

spl_bs <- cSplineDes(x = df$cyclic_term, 
                     knots = seq(min(df$cyclic_term),
                                 max(df$cyclic_term),
                                 length.out = 8), 
                     ord = 4, derivs = 0)

给出合理的名字

colnames(spl_bs) <- paste0("cyclic.", 1:ncol(spl_bs))

您可能希望在数据框中包含这些新列。请注意,我删除了下面的拦截术语,但您可能希望将其保留在

df <- cbind(spl_bs, df)

用方便代码拟合 GEE 以指定模型

model <- reformulate(c(-1,
                       paste0("cyclic.", 1:(ncol(spl_bs))),
                       "splines::bs(beta1)", "splines::bs(lon+lat)"), 
                       response = "response")

fit1 <- geepack::geeglm(model,
                        family="binomial",
                        data=df,
                        id=as.factor(id),
                        corstr = "independence")

现在有了预测。创建新数据

predgrid <- data.frame(expand.grid(lat=seq(min(df$lat-0.1), max(df$lat+0.1), length.out = 20),
                                   lon=seq(min(df$lon-0.1), max(df$lon+0.1), length.out = 20),
                                   cyclic_term=seq(min(df$cyclic_term), 
                                                   max(df$cyclic_term), length.out=20),
                                   beta1=25))

设置与原始数据相同的基础,但现在在新点进行评估

new_spline <- cSplineDes(predgrid$cyclic_term, 
                         knots = seq(min(df$cyclic_term),
                                     max(df$cyclic_term),
                                     length.out = 8),
                         ord = 4, derivs=0)
    
colnames(new_spline) <- paste0("cyclic.", 1:ncol(new_spline))

组装新数据并根据 fit1 进行预测

newdf <- cbind(new_spline, predgrid)
predict(fit1, newdf)

请注意,纬度/经度超出原始数据范围,这是危险的


推荐阅读