Missing data in dplyr filter function (multiple conditions)

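Problem description

I am trying to filter my dataset using multiple conditions (exact + partial match), but the filter function from dplyr returns only some of the rows that meet the conditions. Here is an example: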

df1 <- structure(list(Date = c("6/24/2020", "6/24/2020", "6/24/2020", "6/24/2020", "6/25/2020", "6/25/2020"), 
                      Market = c("A", "A", "A", "B", "B", "B"), Salesman = c("MF", "RP", "FR", "FR", "MF", "MF"), 
                      Product = c("* Apple", "Apple", "* Banana", "* Orange", "* Apple", "* Banana"), Quantity = c(20L, 15L, 20L, 20L, 10L, 15L), 
                      Price = c(1L,1L, 2L, 3L, 1L, 1L), Cost = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)), 
                 class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))

The following code should return rows 1 and 3, but it returns only the first row:

library(dplyr)  # filter() comes from dplyr, not tidyr
df1 %>%
  filter(Salesman == c("MF","FR"),
         Market == "A",
         grepl("* ",Product))

It seems that grepl("* ", Product) is causing the problem, but I need it to return the rows where Product contains "* ".

Tags: r, dplyr, tidyr

Solution


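== compares element by element, so it is only safe when the right-hand side is a vector of length 1. If the vector is longer, it is recycled and gives unexpected output. Here, c("MF", "FR") is recycled to MF, FR, MF, FR, MF, FR, so row 3 (Salesman "FR") is compared with "MF" and dropped. For the first condition we need %in% instead. In grepl, * is a regex metacharacter (zero or more repetitions of the preceding element). Either escape it (\\*), place it inside square brackets ([*]), or use fixed = TRUE to evaluate it literally. fixed = TRUE may be faster, so we use it here: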

library(dplyr)
df1 %>% 
     filter(Salesman %in% c("MF", "FR"),
            Market == "A",  
            grepl("*", Product, fixed = TRUE))
#       Date Market Salesman  Product Quantity Price Cost
#1 6/24/2020      A       MF  * Apple       20     1  0.5
#3 6/24/2020      A       FR * Banana       20     2  0.5
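
To see why == returned only one row, here is a minimal base R sketch of the recycling. It copies the Salesman column out of df1 into a plain vector; the names salesman, eq_match and in_match are only for illustration.

```r
# The Salesman column from df1 above, as a plain character vector
salesman <- c("MF", "RP", "FR", "FR", "MF", "MF")

# `==` recycles c("MF", "FR") to length 6 (MF, FR, MF, FR, MF, FR)
# and compares position by position
eq_match <- salesman == c("MF", "FR")
which(eq_match)
# [1] 1 4 5

# `%in%` tests each element for membership in the set
in_match <- salesman %in% c("MF", "FR")
which(in_match)
# [1] 1 3 4 5 6
```

Only rows 1 to 3 have Market == "A", so the == version keeps row 1 alone, while the %in% version keeps rows 1 and 3.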

If the * should be matched only at the start of the string (^), then we can escape it and anchor the pattern:

df1 %>% 
     filter(Salesman %in% c("MF", "FR"),
            Market == "A",  
            grepl("^\\*", Product))
#       Date Market Salesman  Product Quantity Price Cost
#1 6/24/2020      A       MF  * Apple       20     1  0.5
#3 6/24/2020      A       FR * Banana       20     2  0.5
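
As a rough comparison of the options mentioned above, the sketch below runs each pattern on a small vector. The vector x is hypothetical: "Apple *" does not occur in df1 and is added only to show where the anchored pattern differs from the others.

```r
# Hypothetical values; "Apple *" is not in df1
x <- c("* Apple", "Apple", "Apple *")

# Three ways to match a literal asterisk anywhere in the string
escaped   <- grepl("\\*", x)              # escaped metacharacter
bracketed <- grepl("[*]", x)              # inside a character class
literal   <- grepl("*", x, fixed = TRUE)  # no regex interpretation
escaped
# [1]  TRUE FALSE  TRUE

# Anchored with ^: the asterisk must be the first character
anchored <- grepl("^\\*", x)
anchored
# [1]  TRUE FALSE FALSE
```

On the Product column of df1 all four give the same result, because every asterisk there is at the start of the string.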
