Filter a data frame based on two columns

Problem description

I have a data frame df that looks like this:

food    color   popular
apple   red     no
pear    green   no
banana  yellow  yes
apple   red     yes

How can I keep only the rows that are not duplicated, judging duplicates by just two columns (food and color)?

Expected result:

food    color   popular
pear    green   no
banana  yellow  yes

I tried:

df %>% distinct(food, color, .keep_all = TRUE)

But this does not give me the expected result.

Tags: r, dplyr

Solution
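
distinct(food, color, .keep_all = TRUE) keeps the first row of every food/color combination; it does not drop combinations that occur more than once. That is why one apple row still shows up. A minimal sketch of that behaviour, assuming the question's data is rebuilt as a tibble with popular stored as strings (the name kept is only illustrative):

```r
library(dplyr)

# Data from the question (popular kept as "yes"/"no" strings)
df <- tibble(
    food = c("apple", "pear", "banana", "apple"),
    color = c("red", "green", "yellow", "red"),
    popular = c("no", "no", "yes", "yes")
)

# distinct() collapses each food/color combination to its first row,
# so the duplicated apple/red pair becomes one row instead of vanishing
kept <- df %>% distinct(food, color, .keep_all = TRUE)
kept
```

The result has three rows (apple, pear, banana), not the two rows the question asks for.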


library(dplyr)

# Create test data
df <- tibble(
    food = c("apple", "pear", "banana", "apple"),
    color = c("red", "green", "yellow", "red"),
    popular = c(FALSE, FALSE, TRUE, TRUE)
)

df %>%
    # Make a group for each combination of food and color
    group_by(food, color) %>%
    # Keep only the groups with exactly 1 row
    # (groups with more than 1 row are duplicates)
    filter(n() == 1) %>%
    ungroup()
#> # A tibble: 2 × 3
#>   food   color  popular
#>   <chr>  <chr>  <lgl>
#> 1 pear   green  FALSE
#> 2 banana yellow TRUE
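
The same filter can also be sketched without dplyr. This is an alternative approach, not part of the answer above: base R's duplicated() flags the second and later occurrences of a key, and calling it again with fromLast = TRUE flags the earlier ones, so combining both marks every row whose food/color pair appears more than once. The names key, dup and unique_rows are illustrative.

```r
# Data from the question as a plain data frame
df <- data.frame(
    food = c("apple", "pear", "banana", "apple"),
    color = c("red", "green", "yellow", "red"),
    popular = c("no", "no", "yes", "yes"),
    stringsAsFactors = FALSE
)

# Only these two columns decide whether a row counts as a duplicate
key <- df[, c("food", "color")]

# TRUE for every row whose key occurs more than once, in either direction
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Keep the rows whose food/color pair occurs exactly once
unique_rows <- df[!dup, ]
unique_rows
```

This keeps the pear and banana rows and leaves their popular values untouched.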
