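r - Loop through many pairs of species models to bootstrap correlation coefficients in R
Problem description
This question is similar to others about using cor.test() to get correlation coefficients for multiple pairs of variables (e.g. computing correlation coefficients by bootstrapping, and looping over many categories with cor.test), but here I am trying to loop over multiple models and get the between-model correlation coefficients for only the one variable I care about. I am new to both bootstrapping and loops, and I have been stuck trying to do both at once.
I have models for 16 species caught in a fishery. I only care about one variable (gear type). I want to bootstrap the regression coefficient for each gear type to get a robust estimate of each coefficient, and then calculate the correlation coefficients between species. I have a loop that does this manually for each species pair, but I would like to loop over all the models and output the results to a data frame, with a column labelling each species pair.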
library(mgcv) # for GAMs and GAM outputs
library(dplyr) # for %>% and rename()
library(tidyr) # for pivot_wider()
## generate random data for species models
num.caught <- sample(x=0:1000, size =50, replace = TRUE)
year <- sample(x =2000:2010, size=50, replace=TRUE)
gear <- sample(c('net','line','trawl'), 50, replace=TRUE)
species1.dat <- data.frame(num.caught, year, gear)
species1.gam <- gam(num.caught ~ year + gear, data= species1.dat)
#summary(species1.gam)
#species1.gam$coefficients
num.caught <- sample(x=0:100, size =25, replace = TRUE)
year <- sample(x =2000:2010, size=25, replace=TRUE)
gear <- sample(c('net','line','trawl'), 25, replace=TRUE)
species2.dat <- data.frame(num.caught, year, gear)
species2.gam <- gam(num.caught ~ year + gear, data= species2.dat)
num.caught <- sample(x=0:500, size =30)
year <- sample(x =2000:2005, size=30, replace=TRUE)
gear <- sample(c('net','line','trawl'), 30, replace=TRUE)
species3.dat <- data.frame(num.caught, year, gear)
species3.gam <- gam(num.caught ~ year + gear, data= species3.dat)
# Make list of all models in environment
spp.names <- grep("\\.gam$", names(.GlobalEnv), value=TRUE) # escape the dot so only names ending in ".gam" match
mod.1 <- species1.gam
mod.2 <- species2.gam
mod.3 <- species3.gam
# etc...
NoSamples <- 1000
CC <- rep(NA, NoSamples)
for(i in 1:NoSamples) {
#get data from a random draw for species 1
Index1 <- grep("gear",names(summary(mod.1)$p.coeff))
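# rnorm(n, mean, sd): the first argument is the number of draws, so pass it explicitly
Sp1 <- rnorm(length(Index1), mean = summary(mod.1)$p.coeff[Index1], sd = summary(mod.1)$se[Index1])
#now species 2
Index2 <- grep("gear",names(summary(mod.2)$p.coeff))
Sp2 <- rnorm(length(Index2), mean = summary(mod.2)$p.coeff[Index2], sd = summary(mod.2)$se[Index2])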
#now get the correlation coefficient and store it
CC[i] <- cor(Sp1,Sp2)
}
## the loop works to here, but I would have to manually re-run with every combination of the 16 species
## This should go inside the loop
quants <- tibble::rownames_to_column(data.frame(quantile(CC)), "quantile") %>% rename(quant_val="quantile.CC.")
df.CC <- data.frame(mean(CC), quants)
# paste names for each species
df.CC$spp_pair <- paste0(names(mod.1), "_", names(mod.2)) # This is wrong, it pastes all the col names for each model, not the name of the model itself
df.CC.wide <- pivot_wider(data=df.CC, id_cols = c(spp_pair, mean.CC.),
names_from = quantile, names_prefix = "quant",
values_from = quant_val)
names(df.CC.wide) <- gsub(pattern = "%", replacement="", x=names(df.CC.wide))
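At this point I could manually rename and bind each resulting data frame, but surely there is a way to loop over all the models? I think it could also be done with lapply?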
# The desired output would be a dataframe with 1 row for each species pair, e.g:
spp_pair mean.CC. quant0 quant25 quant50 quant75 quant100
1 species1_species2 0.940 0.940 0.940 0.940 0.940 0.940
2 species1_species3 0.200 0.180 0.190 0.200 0.210 0.220
3 species2_species3 0.750 0.600 0.700 0.720 0.800 0.810
Solution
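After running the code above, summary(mod.1)$p.coeff returns NULL for me. I am not sure whether I missed something when running the code, but in general you can use mget + lapply to achieve what you want.
Using mget we can get all the models in a list, and then use lapply to extract the required statistics from each of them. So something like this should work for you.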
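result <- lapply(mget(spp.names), function(x) {
  s <- summary(x)  # summarise each model once
  Index1 <- grep("gear", names(s$p.coeff))
  # rnorm(n, mean, sd): pass the number of draws explicitly
  rnorm(length(Index1), mean = s$p.coeff[Index1], sd = s$se[Index1])
})
result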
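The snippet above gives one draw per model. To get from there to the data frame asked for in the question, one possible sketch is to loop over every pair of model names with combn and bootstrap each pair in turn. The helpers fit_species, gear_terms, draw_gear and boot_pair, and the n_samples argument, are names made up for this illustration, and the toy data only mimics the question's setup. Note that with three gear types each model has only two gear coefficients, so the correlation in any single draw is either +1 or -1; the sketch shows the looping structure rather than a meaningful result.

```r
library(mgcv)  # gam() and summary.gam()

set.seed(1)  # reproducible toy data

# Hypothetical helper: simulate one species' catch data and fit the question's model
fit_species <- function(n, max_catch) {
  dat <- data.frame(
    num.caught = sample(0:max_catch, n, replace = TRUE),
    year       = sample(2000:2010, n, replace = TRUE),
    gear       = sample(rep_len(c("net", "line", "trawl"), n))  # every gear type present
  )
  gam(num.caught ~ year + gear, data = dat)
}

# Named list of models; with real models this could come from mget(spp.names)
mods <- list(
  species1 = fit_species(50, 1000),
  species2 = fit_species(25, 100),
  species3 = fit_species(30, 500)
)

# Extract the gear coefficients and their standard errors once per model
gear_terms <- function(mod) {
  s   <- summary(mod)
  idx <- grep("gear", names(s$p.coeff))
  list(est = s$p.coeff[idx], se = s$se[idx])
}
gear_est <- lapply(mods, gear_terms)

# One random draw of the gear coefficients: Normal(estimate, standard error)
draw_gear <- function(g) rnorm(length(g$est), mean = g$est, sd = g$se)

# Bootstrap the correlation for one species pair; return a one-row data frame
boot_pair <- function(pair, gear_est, n_samples = 1000) {
  CC <- replicate(n_samples,
                  cor(draw_gear(gear_est[[pair[1]]]), draw_gear(gear_est[[pair[2]]])))
  q <- quantile(CC)
  names(q) <- paste0("quant", sub("%", "", names(q)))  # "25%" becomes "quant25"
  data.frame(spp_pair = paste(pair, collapse = "_"),
             mean.CC  = mean(CC),
             as.list(q))
}

# All unordered pairs of model names, one output row per pair
pairs <- combn(names(mods), 2, simplify = FALSE)
df.CC <- do.call(rbind, lapply(pairs, boot_pair, gear_est = gear_est))
df.CC
```

Because the pair label is built from the names of the list rather than from names(mod.1), it avoids the problem noted in the question where the models' own element names were pasted together. With 16 models, combn would produce 120 pairs and the same code should apply unchanged.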