Row-wise sum across certain columns, keeping NA if all are NA

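Problem description

I have clinical data that looks like the example below. There are several binary outcomes, but I only want to sum some of them to create a total (composite) score. My data looks like this: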

```r
patientid <- c(100, 101, 102, 103, 104, 105, 106)
outcome1 <- c(0, NA, 1, 0, 1, NA, 1)
outcome2 <- c(0, 1, 1, 0, 0, NA, 1)
outcome3 <- c(0, NA, NA, 0, 1, NA, 0)
outcome4 <- c(NA, NA, NA, 0, 1, NA, 0)
Data <- data.frame(patientid = patientid, outcome1 = outcome1, outcome2 = outcome2,
                   outcome3 = outcome3, outcome4 = outcome4)
Data
```

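Now I want to create a composite score from three of these outcomes. An NA should count as zero, unless every outcome selected for the sum is NA, in which case the result should stay NA. I assume this can be done with `rowSums`? This is what my desired data should look like (summing only outcomes 1, 2 and 4):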

```r
patientid <- c(100, 101, 102, 103, 104, 105, 106)
outcome1 <- c(0, NA, 1, 0, 1, NA, 1)
outcome2 <- c(0, 1, 1, 0, 0, NA, 1)
outcome3 <- c(0, NA, NA, 0, 1, NA, 0)
outcome4 <- c(NA, NA, NA, 0, 1, NA, 0)
composite <- c(0, 1, 2, 0, 2, NA, 2)
data.frame(patientid = patientid, outcome1 = outcome1, outcome2 = outcome2,
           outcome3 = outcome3, outcome4 = outcome4, composite = composite)
```

Tags: r, data-cleaning

Solution


library(tidyverse)

Data %>%
  rowwise() %>% # evaluate the expressions below one row at a time
  mutate(
    composite = if_else(
      all(is.na(c(outcome1, outcome2, outcome4))), # TRUE when every selected outcome is NA
      NA_real_, # rows where all selected outcomes are NA stay NA
      sum(c(outcome1, outcome2, outcome4), na.rm = TRUE) # otherwise NAs are treated as 0
    )
  ) %>%
  ungroup() # drop the row-wise grouping again

#  patientid outcome1 outcome2 outcome3 outcome4 composite
#1       100        0        0        0       NA         0
#2       101       NA        1       NA       NA         1
#3       102        1        1       NA       NA         2
#4       103        0        0        0        0         0
#5       104        1        0        1        1         2
#6       105       NA       NA       NA       NA        NA
#7       106        1        1        0        0         2
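
The question guessed that `rowSums` might do the job. A minimal base R sketch along those lines is shown here. It rebuilds the question's example data so it runs on its own, and the helper names `score_cols` and `all_missing` are illustrative assumptions, not part of the original answer:

```r
# Rebuild the example data from the question so this block is self-contained.
Data <- data.frame(
  patientid = c(100, 101, 102, 103, 104, 105, 106),
  outcome1  = c(0, NA, 1, 0, 1, NA, 1),
  outcome2  = c(0, 1, 1, 0, 0, NA, 1),
  outcome3  = c(0, NA, NA, 0, 1, NA, 0),
  outcome4  = c(NA, NA, NA, 0, 1, NA, 0)
)

# Columns that feed the composite score (outcome3 is deliberately left out).
# `score_cols` is a hypothetical name chosen for this sketch.
score_cols <- c("outcome1", "outcome2", "outcome4")

# TRUE for rows in which every selected outcome is missing.
all_missing <- rowSums(!is.na(Data[score_cols])) == 0

# Sum with NAs treated as 0, then put NA back for the all-missing rows.
Data$composite <- rowSums(Data[score_cols], na.rm = TRUE)
Data$composite[all_missing] <- NA

print(Data)
```

With this data, patient 105 (all selected outcomes missing) gets `NA`, while patient 101 (only `outcome2` observed) gets 1. Because `rowSums` is vectorised over the whole data frame, this version avoids row-by-row evaluation, which may matter for larger tables.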
