首页 > 解决方案 > 循环遍历许多对物种模型以增强 R 中的相关系数

问题描述

这个问题类似于使用 cor.test() 来获取多对变量的相关系数(例如,通过引导计算相关系数,使用 cor.test 在许多类别上进行循环),但在这里我试图遍历多个模型并仅获得我关心的 1 个变量的模型之间的相关系数。我是引导程序和循环的新手,我一直试图同时做这两件事。

我有 16 种模型在渔业中捕获的物种。我只关心一个变量(渔具),我想引导每个渔具的回归系数以获得每个回归系数的稳健估计,然后计算不同物种之间的相关系数。我有一个循环为每个物种对手动执行此操作,但我想遍历所有模型并将结果输出到数据框,并用一列标记每个物种对。

library(mgcv) # for GAMs and GAM outputs

## generate random data for species models
num.caught <- sample(x=0:1000, size =50, replace = TRUE) 
year <- sample(x =2000:2010, size=50, replace=TRUE)
gear <- sample(c('net','line','trawl'), 50, replace=TRUE)
species1.dat <- data.frame(num.caught, year, gear)
species1.gam <- gam(num.caught ~ year + gear, data= species1.dat)
#summary(species1.gam)
#species1.gam$coefficients

num.caught <- sample(x=0:100, size =25, replace = TRUE) 
year <- sample(x =2000:2010, size=25, replace=TRUE)
gear <- sample(c('net','line','trawl'), 25, replace=TRUE)
species2.dat <- data.frame(num.caught, year, gear)
species2.gam <- gam(num.caught ~ year + gear, data= species2.dat)

num.caught <- sample(x=0:500, size =30) 
year <- sample(x =2000:2005, size=30, replace=TRUE)
gear <- sample(c('net','line','trawl'), 30, replace=TRUE)
species3.dat <- data.frame(num.caught, year, gear)
species3.gam <- gam(num.caught ~ year + gear, data= species3.dat)

# Make list of all models in environment
spp.names <- grep(".gam", names(.GlobalEnv), value=TRUE)

mod.1 <- species1.gam
mod.2 <- species2.gam
mod.3 <- species3.gam
# etc...

NoSamples <- 1000
CC <- rep(NA, NoSamples)
for(i in 1:NoSamples) {
#get data from a random draw for species 1
Index1 <- grep("gear",names(summary(mod.1)$p.coeff))
Sp1 <- rnorm(summary(mod.1)$p.coeff[Index1],summary(mod.1)$se[Index1])
#now species 2 
Index2 <- grep("gear",names(summary(mod.2)$p.coeff))
Sp2 <- rnorm(summary(mod.2)$p.coeff[Index2],summary(mod.2)$se[Index2])
#now get the correlation coefficient and store it
CC[i] <- cor(Sp1,Sp2)
} 
## the loop works to here, but I would have to manually re-run with every combination of the 16 species

## This should go inside the loop
quants <- tibble::rownames_to_column(data.frame(quantile(CC)), "quantile") %>% rename(quant_val="quantile.CC.") 
df.CC <- data.frame(mean(CC), quants)
# paste names for each species 
df.CC$spp_pair <- paste0(names(mod.1), "_", names(mod.2)) # This is wrong, it pastes all the col names for each model, not the name of the model itself
df.CC.wide <- pivot_wider(data=df.CC, id_cols = c(spp_pair, mean.CC.), 
                          names_from = quantile, names_prefix = "quant",
                          values_from = quant_val)
names(df.CC.wide) <- gsub(pattern = "%", replacement="", x=names(df.CC.wide))

在这里我可以手动重命名和绑定每个结果数据框,但是应该有一种方法可以遍历所有模型吗?我认为它也可以用 lapply 来完成?

# The desired output would be a dataframe with 1 row for each species pair, e.g:

spp_pair              mean.CC. quant0 quant25 quant50 quant75 quant100
1 species1_species2    0.940  0.940   0.940   0.940   0.940    0.940
2 species1_species3    0.200  0.180   0.190   0.200   0.210    0.220
3 species2_species3    0.750  0.600   0.700   0.720   0.800    0.810

标签: rfor-loopregression

解决方案


运行上述代码后summary(mod.1)$p.coeff返回NULL. 我不确定在运行代码时是否遗漏了任何内容,但总的来说,您可以使用mget+来实现您想要的lapply

使用mget我们可以获得列表中的所有模型,并lapply从中提取所需的统计数据。所以像这样的东西应该适合你。

lapply(mget(spp.names), function(x) {
  Index1 <- grep("gear",names(summary(x)$p.coeff))
  rnorm(summary(x)$p.coeff[Index1],summary(x)$se[Index1])
}) -> result

result

推荐阅读