str_detect also finding NA in filter

Problem description

I want to filter out rows where a column contains a string. I am using a tidyverse solution. The problem I'm having is str_detect also seems to be finding NA results and so these are also removed by my filter:

df1 = data.frame(x1 = c("PI", NA, "Yes", "text"),
                 x2 = as.character(c(NA, 1, NA,"text")),
                 x3 = c("foo", "bar","foo", "bar"))

> df1
    x1   x2  x3
1   PI <NA> foo
2 <NA>    1 bar
3  Yes <NA> foo
4 text text bar

#remove rows which have "PI" in column `x1`:

df2 = df1%>%
  filter(!str_detect(x1, "(?i)pi"))

> df2
    x1   x2  x3
1  Yes <NA> foo
2 text text bar

How do I prevent str_detect finding NA?

Tags: r, filter, stringr

Solution


Add a condition with is.na and |. str_detect is not matching the NA elements: it returns NA for them, and filter keeps only the rows where the condition is TRUE, so rows where the condition is NA are dropped along with the FALSE ones.

library(dplyr)
library(stringr)
df1 %>%
    filter(is.na(x1) |
       str_detect(x1, regex("pi", ignore_case = TRUE), negate = TRUE))

-output

   x1   x2  x3
1 <NA>    1 bar
2  Yes <NA> foo
3 text text bar

To see this, check the output of str_detect on its own:

with(df1, str_detect(x1, regex("pi", ignore_case = TRUE), negate = TRUE))
[1] FALSE    NA  TRUE  TRUE

The NA stays NA unless we explicitly turn it into TRUE, here by combining the result with is.na(x1):

with(df1, str_detect(x1, regex("pi", ignore_case = TRUE), negate = TRUE) | is.na(x1))
[1] FALSE  TRUE  TRUE  TRUE
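
The behaviour described above can be reproduced without str_detect at all. The following is a minimal sketch, assuming a made-up three-row tibble (`demo`, with hypothetical columns `id` and `keep`) that exists only to show how filter treats an NA condition:

```r
library(dplyr)

# Hypothetical data, used only to illustrate how filter() handles NA
demo <- tibble(id = 1:3, keep = c(TRUE, NA, FALSE))

# filter() retains rows where the condition is TRUE;
# rows where it is FALSE or NA are dropped
kept <- filter(demo, keep)
kept

# Negating does not rescue the NA row, because !NA is still NA
negated <- !demo$keep
negated

kept_negated <- filter(demo, !keep)
kept_negated
```

In both calls the row with id 2 is missing from the result, which is why the NA has to be converted to TRUE explicitly.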

Another option is to wrap the str_detect call in coalesce with TRUE as the fallback, so that every NA returned by str_detect is replaced with TRUE:

df1 %>%
   filter(coalesce(str_detect(x1, regex("pi", ignore_case = TRUE),
       negate = TRUE), TRUE))

-output
    x1   x2  x3
1 <NA>    1 bar
2  Yes <NA> foo
3 text text bar
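
If the same filter is needed in several places, the coalesce approach could be wrapped in a small helper. This is only a sketch: `not_detected` is a hypothetical name, not a stringr function, and it assumes that NA values should always count as "pattern not found".

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not part of stringr): TRUE where `pattern` is absent,
# with NA input treated as "absent" so those rows survive filter()
not_detected <- function(string, pattern) {
  coalesce(str_detect(string, pattern, negate = TRUE), TRUE)
}

# Same data as in the question
df1 <- data.frame(x1 = c("PI", NA, "Yes", "text"),
                  x2 = as.character(c(NA, 1, NA, "text")),
                  x3 = c("foo", "bar", "foo", "bar"))

# Drop rows whose x1 contains "pi" (case-insensitive), keeping the NA row
df2 <- df1 %>%
  filter(not_detected(x1, regex("pi", ignore_case = TRUE)))
df2
```

Only the "PI" row is removed; the row where x1 is NA is kept.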
