Conditional matching between variables in dplyr

Problem description

I am trying to find the values in one column that occur together with some, or all, of the values in another column. Given this tibble:

library(dplyr)  # provides tibble(), %>%, group_split(), semi_join()

parties <- tibble(class = c("R","R","R","R","R","K","K","K","K","K","K",
                "L","L","L","L"),
       name = c("Party1", "Party2","Party3","Party4","Party5",
               "Party2", "Party4", "Party6","Party7","Party8","Party9",
                "Party2","Party3","Party4","Party10"))

I want to find all the parties that appear in all three classes, "R", "K" and "L", or more generally the parties that appear in class "X" or "Y". I found a solution for the all-three-classes case: call group_split(class), extract each table from the resulting list, and then perform two semi_joins:

parties_split <- parties %>%
  group_split(class)

parties_K <- parties_split[[1]]
parties_L <- parties_split[[2]]
parties_R <- parties_split[[3]]

semi_join(parties_K,parties_L, by = "name") %>%
  semi_join(parties_R, by = "name") %>%
  select(-class)

name
<chr>
Party2              
Party4

This works here, but it does not scale well when the number of classes (or observations) to match is much larger than three. I am looking in particular for tidyverse solutions. Any ideas? Thanks.

Tags: r, dplyr, tibble

Solution


Try this:

parties %>% 
  group_by(name) %>% 
  filter("K" %in% class, 
         "R" %in% class, 
         "L" %in% class) %>% 
  summarise()

# A tibble: 2 x 1
  name  
  <chr> 
1 Party2
2 Party4

Edit: if you need to match more than three classes, you can also use:

mask <- c("K", "R", "L")  # classes every party must appear in
parties %>% 
  group_by(name) %>% 
  filter(all(mask %in% class)) %>% 
  summarise()
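The question also asks about parties that are in class "X" or "Y", which the code above does not cover. A minimal sketch of that case, assuming the same `parties` tibble from the question: swapping `all()` for `any()` keeps a party if it appears in at least one of the listed classes. The names `wanted`, `in_any`, `all_classes` and `in_all` are illustrative only and are not part of the original answer.

```r
library(dplyr)

parties <- tibble(
  class = c("R","R","R","R","R","K","K","K","K","K","K","L","L","L","L"),
  name  = c("Party1","Party2","Party3","Party4","Party5",
            "Party2","Party4","Party6","Party7","Party8","Party9",
            "Party2","Party3","Party4","Party10")
)

# "Or" case: parties that appear in at least one of the wanted classes
wanted <- c("R", "L")
in_any <- parties %>%
  group_by(name) %>%
  filter(any(class %in% wanted)) %>%
  distinct(name) %>%
  ungroup()

# "All" case without hard-coding the classes:
# take every class that occurs in the data as the mask
all_classes <- unique(parties$class)
in_all <- parties %>%
  group_by(name) %>%
  filter(all(all_classes %in% class)) %>%
  distinct(name) %>%
  ungroup()
```

Here `distinct(name)` is used in place of the empty `summarise()`; both collapse each group to a single row, so this is a stylistic choice rather than a correction.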
