Using case_when() within mutate_at() to recode several columns with different types of NA

Question

Given the data:

df <- structure(list(cola = structure(c(5L, 9L, 6L, 2L, 7L, 10L, 3L, 
8L, 1L, 4L), .Label = c("a", "b", "d", "g", "q", "r", "t", "w", 
"x", "z"), class = "factor"), colb = c(156L, 8L, 6L, 100L, 49L, 
31L, 189L, 77L, 154L, 171L), colc = c(0.207140279468149, 0.51990159181878, 
0.402017514919862, 0.382948065642267, 0.488511856179684, 0.263168515404686, 
0.38591041485779, 0.774066215148196, 0.763264901703224, 0.474355421960354
), cold = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("a", 
"b"), class = "factor")), class = "data.frame", row.names = c(NA, 
-10L))

df
#    cola colb      colc cold
# 1     q  156 0.2071403    a
# 2     x    8 0.5199016    b
# 3     r    6 0.4020175    a
# 4     b  100 0.3829481    b
# 5     t   49 0.4885119    a
# 6     z   31 0.2631685    b
# 7     d  189 0.3859104    a
# 8     w   77 0.7740662    b
# 9     a  154 0.7632649    a
# 10    g  171 0.4743554    b

If the value in colc in a particular row is >= 0.5, I would like to replace the contents of all the other cells in that row with NA, except for the contents of cold for that row (which I would like to retain as it is).

I attempted this with dplyr::mutate_at() and base::ifelse(), and it works fine:

df %>% mutate_at(vars(-c(cold)), funs(ifelse(colc >= 0.5, NA, .)))

#    cola colb      colc cold
# 1     5  156 0.2071403    a
# 2    NA   NA        NA    b
# 3     6    6 0.4020175    a
# 4     2  100 0.3829481    b
# 5     7   49 0.4885119    a
# 6    10   31 0.2631685    b
# 7     3  189 0.3859104    a
# 8    NA   NA        NA    b
# 9    NA   NA        NA    a
# 10    4  171 0.4743554    b

But I would like to do this with dplyr::case_when(), as I might have more than one replacement condition to fulfill (e.g., replace with "foo" if colc < 0.5 & colc >= 0.3). However, case_when() does not appear to be playing nice:

df %>% mutate_at(vars(-c(cold)), funs(case_when(colc >= 0.5 ~ NA, TRUE ~ .)))

Error: must be a logical vector, not a factor object

Why is this happening and what can I do to fix it? I assume this is because I am trying to convert multiple columns with different data types to NA. I tried to look for a solution online, but I wasn't able to find one.

Edit: specifically, I would like to preserve the data types of the various columns as they are.

Tags: r, dplyr, conditional-statements, typeerror, na

Solution


library(dplyr)

df %>%
  mutate_at(vars(-c(cold)), ~ case_when(colc >= 0.5 ~ `is.na<-`(., TRUE), TRUE ~ .))

#    cola colb      colc cold
# 1     q  156 0.2071403    a
# 2  <NA>   NA        NA    b
# 3     r    6 0.4020175    a
# 4     b  100 0.3829481    b
# 5     t   49 0.4885119    a
# 6     z   31 0.2631685    b
# 7     d  189 0.3859104    a
# 8  <NA>   NA        NA    b
# 9  <NA>   NA        NA    a
# 10    g  171 0.4743554    b
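
For readers on a newer dplyr, the same idea can probably be written with across() instead of mutate_at(). This is only a sketch, assuming dplyr 1.0.0 or later; df_small is a hypothetical three-row stand-in for the question's df (same column types), not data from the original post.

```r
library(dplyr)

# Hypothetical stand-in for the question's df: same column types, fewer rows.
df_small <- data.frame(
  cola = factor(c("q", "x", "r")),
  colb = c(156L, 8L, 6L),
  colc = c(0.21, 0.52, 0.40),
  cold = factor(c("a", "b", "a"))
)

# Every column except cold is blanked in rows where colc >= 0.5.
# `is.na<-`(.x, TRUE) builds an all-NA vector of the same type as .x,
# so each column keeps its own class.
res <- df_small %>%
  mutate(across(-cold, ~ case_when(
    colc >= 0.5 ~ `is.na<-`(.x, TRUE),
    TRUE ~ .x
  )))

sapply(res, class)
```

Only the second row of df_small has colc >= 0.5, so that is the only row that should be blanked, and cold is left untouched.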

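Explanation

When you use case_when() to assign NA, you need to specify the type of NA, i.e. NA_integer_, NA_real_, NA_complex_, or NA_character_. However, mutate_at() transforms multiple columns at the same time, and these columns have different types, so you cannot apply one statement to all of them. Ideally there would be something like NA_guess that identifies the type, but I haven't found one so far. This method is a little tricky: I use is.na() to convert the input vector to NAs, which will be of the same type as the input vector. For example: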

x <- 1:5
is.na(x) <- TRUE ; x
# [1] NA NA NA NA NA
class(x)
# [1] "integer"

y <- letters[1:5]
is.na(y) <- TRUE ; y
# [1] NA NA NA NA NA
class(y)
# [1] "character"
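
The same behaviour carries over to factors, which is the column type that triggered the error in the question. The sketch below uses the functional form `is.na<-`(x, value), as in the solution; the vector f is made up for illustration.

```r
f <- factor(c("a", "b", "a"))

# Functional form: returns a modified copy and leaves f itself unchanged.
all_na <- `is.na<-`(f, TRUE)
all_na
# [1] <NA> <NA> <NA>
# Levels: a b
class(all_na)
# [1] "factor"

# A logical index sets only the selected elements to NA.
some_na <- `is.na<-`(f, c(FALSE, TRUE, FALSE))
is.na(some_na)
# [1] FALSE  TRUE FALSE
```

Because the result keeps both the class and the levels of the input, case_when() sees two right-hand sides of the same type for every column.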
