Using case_when in a pipe workflow with a large number of criteria, without rowwise

Question

I want to use dplyr::case_when in a pipe workflow that I apply regularly to datasets of about 10^4 rows, and I want to avoid rowwise because it is slow.

I want to match a given value against a large number of criteria, for example any(x == c(1:999, 3000:200000, 250000:250100)), where length(x) == 1, and apply that to every row of a data.frame.

Something like this function, but with many more criteria:

is_good_car <- function(x){
  any(
    x == c(
      "Mazda RX4",
      "Datsun 710",
      "Valiant"
    )
  )
}

I can apply it like this:

library(dplyr)

mtcars %>%
  mutate(
    car = rownames(.)
  ) %>%
  as_tibble %>%
  mutate(
    good_cars = case_when(
      is_good_car(car) ~ "good",
      TRUE ~ "rubbish"
    )
  ) %>%
  select(car, good_cars)
#> Warning in x == c("Mazda RX4", "Datsun 710", "Valiant"): longer object length is
#> not a multiple of shorter object length
#> # A tibble: 32 x 2
#>    car               good_cars
#>    <chr>             <chr>    
#>  1 Mazda RX4         good     
#>  2 Mazda RX4 Wag     good     
#>  3 Datsun 710        good     
#>  4 Hornet 4 Drive    good     
#>  5 Hornet Sportabout good     
#>  6 Valiant           good     
#>  7 Duster 360        good     
#>  8 Merc 240D         good     
#>  9 Merc 230          good     
#> 10 Merc 280          good     
#> # ... with 22 more rows

But this does not work: is_good_car returns a single TRUE for the whole column, and that one value is applied to every row.
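The failure can be reproduced in base R without dplyr. The sketch below uses a shortened, hypothetical pair of vectors (`cars` and `good` are illustrative names, not from the original post) to show that `==` compares position by position, recycling the shorter vector, and that `any()` then collapses the result to one value.

```r
# Illustrative vectors (assumed for this sketch, not from the original post)
cars <- c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive")
good <- c("Mazda RX4", "Datsun 710")

# `==` works position by position and recycles `good` to the length of `cars`,
# so "Datsun 710" in position 3 is compared with "Mazda RX4", not with itself.
elementwise <- cars == good
print(elementwise)

# any() reduces the whole vector to a single logical value
collapsed <- any(elementwise)
print(length(collapsed))
```

Because `collapsed` has length 1, a condition built this way carries no per-row information, which matches the all-"good" output shown above.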

I can use rowwise to get the right answer, but it is too slow for my purposes:

mtcars %>%
  mutate(
    car = rownames(.)
  ) %>%
  as_tibble %>%
  rowwise %>%
  mutate(
    good_cars = case_when(
      is_good_car(car) ~ "good",
      TRUE ~ "rubbish"
    )
  ) %>%
  select(car, good_cars)
#> # A tibble: 32 x 2
#> # Rowwise: 
#>    car               good_cars
#>    <chr>             <chr>    
#>  1 Mazda RX4         good     
#>  2 Mazda RX4 Wag     rubbish  
#>  3 Datsun 710        good     
#>  4 Hornet 4 Drive    rubbish  
#>  5 Hornet Sportabout rubbish  
#>  6 Valiant           good     
#>  7 Duster 360        rubbish  
#>  8 Merc 240D         rubbish  
#>  9 Merc 230          rubbish  
#> 10 Merc 280          rubbish  
#> # ... with 22 more rows

I could also use sapply, but I would like to use the function in a pipe workflow like the one above:


sapply(
  X = rownames(mtcars),
  FUN = is_good_car
)
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>                TRUE               FALSE                TRUE               FALSE 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>               FALSE                TRUE               FALSE               FALSE 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>               FALSE               FALSE               FALSE               FALSE 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>               FALSE               FALSE               FALSE               FALSE 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>               FALSE               FALSE               FALSE               FALSE 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>               FALSE               FALSE               FALSE               FALSE 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>               FALSE               FALSE               FALSE               FALSE 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>               FALSE               FALSE               FALSE               FALSE

Is there any way to use the is_good_car function inside case_when as intended, without using rowwise?

Created on 2021-09-22 by the reprex package (v2.0.1)

Tags: r, dplyr

Answer


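To compare against multiple values, you should use %in%, which is vectorised and returns one TRUE or FALSE per element of x.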

is_good_car <- function(x){
  x %in% c(
    "Mazda RX4",
    "Datsun 710",
    "Valiant"
  )
}
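As a quick sanity check, the vectorised version can be compared with the element-by-element sapply approach from the question. This is a minimal sketch; `is_good_car_scalar` is a name introduced here only to keep the two definitions apart.

```r
# Scalar version from the question (renamed here for the comparison)
is_good_car_scalar <- function(x) {
  any(x == c("Mazda RX4", "Datsun 710", "Valiant"))
}

# Vectorised version based on %in%
is_good_car <- function(x) {
  x %in% c("Mazda RX4", "Datsun 710", "Valiant")
}

cars <- rownames(mtcars)

# One call per element; sapply() keeps the names, so drop them before comparing
by_element <- unname(sapply(cars, is_good_car_scalar))

# A single call on the whole vector
vectorised <- is_good_car(cars)

print(identical(by_element, vectorised))
print(which(vectorised))
```

Both approaches flag the same rows, but the %in% version does it in one vectorised call instead of one function call per row.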

Then you can use it without rowwise:

library(dplyr)

mtcars %>%
  mutate(
    car = rownames(.)
  ) %>%
  as_tibble %>%
  mutate(
    good_cars = case_when(
      is_good_car(car) ~ "good",
      TRUE ~ "rubbish"
    )
  ) %>%
  select(car, good_cars)

#   car               good_cars
#   <chr>             <chr>    
# 1 Mazda RX4         good     
# 2 Mazda RX4 Wag     rubbish  
# 3 Datsun 710        good     
# 4 Hornet 4 Drive    rubbish  
# 5 Hornet Sportabout rubbish  
# 6 Valiant           good     
# 7 Duster 360        rubbish  
# 8 Merc 240D         rubbish  
# 9 Merc 230          rubbish  
#10 Merc 280          rubbish  
# … with 22 more rows
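The same idea should carry over to the numeric criteria mentioned in the question. The sketch below is an assumption about how that case might look: `good_values`, `is_good_value` and the sample vector `x` are hypothetical names and data, and base `ifelse` stands in for `case_when` so the block runs without dplyr.

```r
# Lookup set built from the ranges given in the question
good_values <- c(1:999, 3000:200000, 250000:250100)

# Vectorised helper: one TRUE/FALSE per element of x
is_good_value <- function(x) {
  x %in% good_values
}

# Hypothetical sample values
x <- c(500, 1000, 3000, 200001, 250050)

flags <- is_good_value(x)
labels <- ifelse(flags, "good", "rubbish")
print(data.frame(x = x, good = labels))

# For contiguous ranges, plain comparisons give the same answer on
# whole-number input and avoid materialising the full lookup vector.
by_range <- (x >= 1 & x <= 999) |
  (x >= 3000 & x <= 200000) |
  (x >= 250000 & x <= 250100)
print(identical(flags, by_range))
```

Inside a pipeline, `is_good_value(column)` could be used as the left-hand side of a `case_when` condition in the same way as `is_good_car(car)`. Note that the range comparison also accepts non-integer values such as 500.5, which `%in%` on integer ranges would reject.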
