R: Automating a table of results from several multivariable logistic regressions

Problem description

structure(list(Number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15), age = c(25, 26, 27, 28, 29, 30, 31, 32, 33, 
34, 35, 36, 37, 38, 39), sex = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 
0, 1, 0, 1, 0), bmi = c(35, 32, 29, 26, 23, 20, 17, 35, 32, 29, 
26, 23, 20, 17, 21), Phenotype1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 1, 1), `Phenotype 2` = c(0, 1, 0, 1, 0, 1, 0, 1, 
0, 1, 0, 1, 1, 1, 1), `Phenotype 3` = c(1, 0, 1, 0, 1, 1, 1, 
1, 1, 1, 1, 0, 0, 0, 0), `Phenotype 4` = c(0, 0, 0, 0, 1, 1, 
0, 1, 0, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 15 x 8
   Number   age   sex   bmi Phenotype1 `Phenotype 2` `Phenotype 3` `Phenotype 4`
    <dbl> <dbl> <dbl> <dbl>      <dbl>         <dbl>         <dbl>         <dbl>
 1      1    25     0    35          0             0             1             0
 2      2    26     1    32          0             1             0             0
 3      3    27     0    29          0             0             1             0
 4      4    28     1    26          0             1             0             0
 5      5    29     0    23          0             0             1             1
 6      6    30     1    20          0             1             1             1
 7      7    31     0    17          0             0             1             0
 8      8    32     1    35          0             1             1             1
 9      9    33     0    32          0             0             1             0
10     10    34     1    29          0             1             1             1
11     11    35     0    26          0             0             1             1
12     12    36     1    23          0             1             0             1
13     13    37     0    20          1             1             0             1
14     14    38     1    17          1             1             0             1
15     15    39     0    21          1             1             0             1

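Hi everyone, I have a data set with 100 patients (15 shown here), 3 covariates, and 50 phenotypes (4 shown here). I want to run a multivariable logistic regression for each phenotype, using age, sex, and BMI as covariates, and end up with a table that gives the p-value, odds ratio (OR), and confidence interval (CI) for each covariate.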

I just don't know how to get started. Thank you very much for your help!

Best, Caro

Tags: r

Solution


I wrote a function that should do what you need. There may be more elegant, more R-like ways to do this, but this approach worked in my testing:

## Load libraries
library(broom)
library(tidyr)
library(dplyr)


## Define a function to create your summary table
##   data:       data frame holding the covariates (age, sex, bmi) and the phenotypes
##   phenotypes: names or positions of the phenotype (outcome) columns in data
summary_table <- function(data, phenotypes) {

  # Work on the outcome columns only; their names label the rows of the result
  x <- data[phenotypes]
  num_vars <- ncol(x)

  # Pre-define the list that will be populated and then collapsed by rest of function
  rows <- vector("list", length = num_vars)

  # Loop to create each row for the final table
  for (i in seq_len(num_vars)) {

    # Covariates are looked up in `data` (not in a global object); the outcome is x[[i]]
    model <- glm(x[[i]] ~ age + sex + bmi, family = "binomial", data = data)

    # Coefficient table with odds ratios and 95% Wald confidence limits
    tab <- broom::tidy(model)
    tab$OR  <- exp(tab$estimate)
    tab$CI1 <- exp(tab$estimate - (1.96 * tab$std.error))
    tab$CI2 <- exp(tab$estimate + (1.96 * tab$std.error))

    # Drop the intercept and keep only the columns needed for the table
    tab <- as.data.frame(tab[tab$term != "(Intercept)", c("term", "p.value", "OR", "CI1", "CI2")])

    # Spread the three covariate rows into a single wide row
    rows[[i]] <- tab %>%
      pivot_wider(names_from = term, values_from = c("p.value", "OR", "CI1", "CI2")) %>%
      select("p.value_age", "OR_age", "CI1_age", "CI2_age", "p.value_bmi", "OR_bmi", "CI1_bmi", "CI2_bmi",
             "p.value_sex", "OR_sex", "CI1_sex", "CI2_sex")

  }

  # Combine the rows together into a final table, labelled by the actual column names
  final_table <- as.data.frame(do.call("rbind", rows))
  final_table <- round(final_table, 3)
  row.names(final_table) <- names(x)

  return(final_table)

}

## Let "df" be your data.frame with 100 rows and 54 columns

## Use the summary_table() function, passing in the data and the positions of the 50 columns containing your Phenotype outcome vars (I assumed they're in columns 5:54)
final_table <- summary_table(df, 5:54)

## Write the final table to your working directory as a CSV
write.csv(final_table, "final_table.csv")
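
If you would rather avoid the extra packages, here is a rough base R sketch of the same idea that returns one row per phenotype and covariate (long format) instead of one wide row per phenotype. Everything in it is an assumption for illustration: the data are simulated stand-ins rather than the data from the question, and fit_phenotype(), sim and the column names CI_low / CI_high are made-up names. The intervals are the same Wald approximation as above (estimate plus or minus a normal quantile times the standard error); calling confint() on the fitted model would give profile-likelihood intervals instead, which can differ noticeably in small samples.

```r
# Simulated stand-in data (hypothetical; NOT the data from the question)
set.seed(42)
n <- 100
sim <- data.frame(
  age = round(runif(n, 25, 65)),
  sex = rbinom(n, 1, 0.5),
  bmi = round(rnorm(n, 26, 4), 1)
)
# Phenotype names deliberately contain spaces, as in the question
sim[["Phenotype 1"]] <- rbinom(n, 1, plogis(-2 + 0.04 * sim$age))
sim[["Phenotype 2"]] <- rbinom(n, 1, plogis(-1 + 0.8 * sim$sex))
sim[["Phenotype 3"]] <- rbinom(n, 1, plogis(1.5 - 0.06 * sim$bmi))

# Fit one logistic regression and return OR, Wald CI and p-value per covariate
# (fit_phenotype is a hypothetical helper name)
fit_phenotype <- function(data, outcome, covariates, level = 0.95) {
  # as.name() lets reformulate() cope with non-syntactic names like "Phenotype 2"
  fit <- glm(reformulate(covariates, response = as.name(outcome)),
             family = binomial, data = data)

  # Coefficient matrix without the intercept row
  est <- coef(summary(fit))[covariates, , drop = FALSE]

  # Normal quantile for the requested confidence level (about 1.96 for 95%)
  z <- qnorm(1 - (1 - level) / 2)

  data.frame(
    phenotype = outcome,
    term      = covariates,
    OR        = exp(est[, "Estimate"]),
    CI_low    = exp(est[, "Estimate"] - z * est[, "Std. Error"]),
    CI_high   = exp(est[, "Estimate"] + z * est[, "Std. Error"]),
    p_value   = est[, "Pr(>|z|)"],
    row.names = NULL
  )
}

covariates <- c("age", "sex", "bmi")
phenotypes <- setdiff(names(sim), covariates)

# One model per phenotype, stacked into a single long table
results <- do.call(
  rbind,
  lapply(phenotypes, function(ph) fit_phenotype(sim, ph, covariates))
)
print(results)
```

The long layout keeps the phenotype name as an ordinary column, so the result can go straight into write.csv(results, "results.csv", row.names = FALSE). With the real data you would pass your own data frame and the names of the 50 phenotype columns in place of sim and phenotypes.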
