Using group_by and tidy to run multiple models and extract the results into a data frame

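Problem description

I want to use group_by %>% do(tidy(*)) to run several linear regression models and extract the model results into a data frame. The data frame for each model should include the following: outcome variable, exposure variable, sample size, β coefficient, SE and p-value.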

library(tidyverse)
data("mtcars")
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

The models should regress each outcome on each exposure, adjusted for all covariates, for example:

lm(wt ~ factor(gear)+drat+qsec, mtcars, na.action = na.omit)
lm(wt ~ factor(vs)+drat+qsec, mtcars, na.action = na.omit)
etc...

The final code might look something like this?

models <- (mtcars %>%
gather(x_var, x_value, -c(y_var, y_i, cv1:cv3)) %>%
group_by(y_var, x_var) %>%
do(broom::tidy(lm(y_i ~ x_value + cv1 + cv2 + cv3, data = .))))

Tags: r, linear-regression, data-manipulation

Solution


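Here is a solution that first creates the formula for each model you want to run and then calls the right variables from the dataset being analysed, rather than reshaping the dataset itself and applying the models: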

library(tidyverse)
library(broom)

outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

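# Build one formula string per outcome/exposure pair, then fit and tidy each model
expand.grid(outcomes, exposures, covariates) %>%
  group_by(Var1, Var2) %>%
  summarise(Var3 = paste0(Var3, collapse = "+")) %>%   # collapse covariates: "drat+qsec"
  rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
  group_by(model_id = row_number(),
           frm) %>%
  do(tidy(lm(.$frm, data = mtcars))) %>%   # one tidy coefficient table per formula
  ungroup()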

# # A tibble: 52 x 7
#   model_id frm                       term          estimate std.error statistic     p.value
#      <int> <chr>                     <chr>            <dbl>     <dbl>     <dbl>       <dbl>
# 1        1 wt~factor(gear)+drat+qsec (Intercept)      9.25     2.17       4.27  0.000218   
# 2        1 wt~factor(gear)+drat+qsec factor(gear)4   -0.187    0.493     -0.378 0.708      
# 3        1 wt~factor(gear)+drat+qsec factor(gear)5   -0.703    0.518     -1.36  0.186      
# 4        1 wt~factor(gear)+drat+qsec drat            -1.03     0.425     -2.42  0.0227     
# 5        1 wt~factor(gear)+drat+qsec qsec            -0.121    0.0912    -1.32  0.196      
# 6        2 wt~factor(vs)+drat+qsec   (Intercept)      4.35     2.28       1.91  0.0663     
# 7        2 wt~factor(vs)+drat+qsec   factor(vs)1     -1.04     0.416     -2.49  0.0189     
# 8        2 wt~factor(vs)+drat+qsec   drat            -0.918    0.263     -3.49  0.00160    
# 9        2 wt~factor(vs)+drat+qsec   qsec             0.147    0.106      1.39  0.175      
# 10        3 wt~factor(am)+drat+qsec   (Intercept)      8.29     1.31       6.33  0.000000766
# # ... with 42 more rows

If you prefer to use map from the purrr package instead of do:

expand.grid(outcomes, exposures, covariates) %>%
  group_by(Var1, Var2) %>%
  summarise(Var3 = paste0(Var3, collapse = "+")) %>%
  rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
  group_by(model_id = row_number()) %>%
  mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
  unnest(model) %>%
  ungroup()
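
The question also asks for the outcome, the exposure and the sample size as separate columns, which the output shown above does not contain. One possible way to get them is sketched below. It keeps each fitted model in a list column so that nobs() can report the number of observations used, and it assumes that only the exposure terms (those whose name starts with "factor(") are of interest. The column names outcome, exposure, n and coefs are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

outcomes   <- c("wt", "mpg", "hp", "disp")
exposures  <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

# One row per outcome/exposure pair; every model uses the same covariates
results <- expand.grid(outcome = outcomes, exposure = exposures,
                       stringsAsFactors = FALSE) %>%
  as_tibble() %>%
  mutate(
    frm   = paste0(outcome, " ~ factor(", exposure, ") + ",
                   paste(covariates, collapse = " + ")),
    fit   = map(frm, ~ lm(as.formula(.x), data = mtcars)),
    n     = map_int(fit, ~ as.integer(nobs(.x))),  # observations used in each fit
    coefs = map(fit, tidy)
  ) %>%
  select(outcome, exposure, n, coefs) %>%
  unnest(coefs) %>%
  # Assumption: keep only the exposure coefficients, e.g. "factor(gear)4"
  filter(startsWith(term, "factor(")) %>%
  select(outcome, exposure, n, term, estimate, std.error, p.value)

results
```

Because gear has three levels, it contributes two rows per outcome, while vs and am contribute one each. If the intercept and covariate coefficients are also wanted, the filter() step can simply be dropped.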

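Keep in mind that the key to this approach is creating the formulas. So if you can specify your variables in a slightly different way that lets you build the formulas with less code, the code becomes simpler: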

outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariate1 <- "drat"
covariate2 <- "qsec"

expand.grid(outcomes, exposures, covariate1, covariate2) %>%
  transmute(frm = paste0(Var1, "~factor(", Var2, ")+", Var3, "+", Var4)) %>%
  group_by(model_id = row_number()) %>%
  mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
  unnest(model) %>%
  ungroup()
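
Since building the formulas is the central step, it may also be worth noting that base R's reformulate() can assemble a formula object from a vector of term labels, which avoids pasting "~" and "+" by hand. The following is only a sketch of that alternative. The helper make_formula and the object names grid and formulas are hypothetical and do not come from the original answer.

```r
library(purrr)

outcomes   <- c("wt", "mpg", "hp", "disp")
exposures  <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

# Hypothetical helper: outcome ~ factor(exposure) + all covariates
make_formula <- function(outcome, exposure) {
  reformulate(c(paste0("factor(", exposure, ")"), covariates),
              response = outcome)
}

# All outcome/exposure combinations, kept as character columns
grid <- expand.grid(outcome = outcomes, exposure = exposures,
                    stringsAsFactors = FALSE)

# One formula object per combination
formulas <- map2(grid$outcome, grid$exposure, make_formula)

# Any of these formulas can be passed straight to lm()
fit <- lm(formulas[[1]], data = mtcars)
```

This way the covariates stay in a single character vector of any length, so adding or removing one does not require touching the formula-building code.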
