首页 > 解决方案 > R 根据其他列删除重复项

问题描述

我想根据其他列的相似或不同来删除重复项。

所有重复的 ID 都应完全删除,但前提是它们具有不同的颜色。它们是否也有不同的子组也没关系。如果它们具有相同的 ID 和相同的颜色,则应保留第一个。

最后,我想要一个所有 ID 的列表,这些 ID 仅是单色的(独立于子组)。应删除所有彩色 ID。

这里和例子:

   id colour   subgroup
1   1    red   lightred
2   2   blue  lightblue
3   2   blue   darkblue
4   3    red   lightred
5   4    red    darkred
6   4    red    darkred
7   4   blue  lightblue
8   5  green  darkgreen
9   5  green  darkgreen
10  5  green lightgreen
11  6    red    darkred
12  6   blue   darkblue
13  6  green lightgreen

最后它应该是这样的:

  id colour  subgroup
1  1    red  lightred
2  2   blue lightblue
4  3    red  lightred
8  5  green darkgreen

我用于此示例的数据:

id = c(1,2,2,3,4,4,4,5,5,5,6,6,6)
colour = c("red","blue","blue","red","red","red","blue","green","green","green","red","blue","green")
subgroup = c("lightred","lightblue","darkblue","lightred","darkred","darkred","lightblue","darkgreen","darkgreen","lightgreen","darkred","darkblue","lightgreen")
data = data.frame(cbind(id,colour,subgroup))

谢谢你的帮助!

标签: rduplicates

解决方案


library(tidyverse)
data%>%
  group_by(id)%>%
  filter(1==length(unique(colour)),!duplicated(colour))
# A tibble: 4 x 3
# Groups:   id [4]
  id    colour subgroup 
  <fct> <fct>  <fct>    
1 1     red    lightred 
2 2     blue   lightblue
3 3     red    lightred 
4 5     green  darkgreen

使用基础 R:

 subset(data,as.logical(ave(colour,id,FUN=function(x)length(unique(x))==1& !duplicated(x))))
  id colour  subgroup
1  1    red  lightred
2  2   blue lightblue
4  3    red  lightred
8  5  green darkgreen

推荐阅读