How to flag/delete specific duplicates based on other variables

Question

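I would like to know how to delete specific rows based on a particular value in a column, where the deletion depends on the other values within the same id group. If "aja" is grouped together with "ase", I want to delete the "aja" row. If a group contains only "ase" or only "aja", the script should leave it alone. I have marked the rows the script should delete.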

   id  somedata  subgroup
1  1   "aja"     okay
2  1   "aja"     okay
3  2   "ase"     okay
4  2   "aja"     delete
5  3   "aja"     delete
6  3   "ase"     okay
7  4   "aja"     okay
8  4   "aja"     okay
9  5   "ase"     okay
10 5   "ase"     okay
11 6   "aja"     delete
12 6   "ase"     okay




Code to generate the data

    id = c(1,1,2,2,3,3,4,4,5,5,6,6)
    somedata = c("aja","aja","ase","aja","aja","ase","aja","aja","ase","ase","aja","ase")
    subgroup = c("okay","okay","okay","DELETE","DELETE","okay","okay","okay","okay","okay","DELETE","okay")
    # cbind() builds a character matrix first, so id ends up as character/factor, not numeric
    proov = data.frame(cbind(id,somedata,subgroup))

Tags: r, tidyverse

Answer


You can do a simple filter, i.e.

    library(dplyr)

    proov %>% 
      group_by(id) %>% 
      # within each id, drop "aja" rows only when the group holds more than one distinct value
      filter(!(n_distinct(somedata) > 1 & somedata == 'aja'))

which gives,

# A tibble: 9 x 3
# Groups:   id [6]
  id    somedata subgroup
  <fct> <fct>    <fct>   
1 1     aja      okay    
2 1     aja      okay    
3 2     ase      okay    
4 3     ase      okay    
5 4     aja      okay    
6 4     aja      okay    
7 5     ase      okay    
8 5     ase      okay    
9 6     ase      okay    
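
Since the title also asks about flagging, here is a minimal sketch of the same condition used with mutate() instead of filter(), so the rows are marked rather than dropped. The column name `flag` is my own choice for illustration and is not from the question. The data is rebuilt with data.frame() directly and without the hand-labelled subgroup column, so the result can be compared against it.

```r
library(dplyr)

# Same id/somedata values as in the question (subgroup column left out on purpose)
proov <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  somedata = c("aja", "aja", "ase", "aja", "aja", "ase",
               "aja", "aja", "ase", "ase", "aja", "ase"),
  stringsAsFactors = FALSE
)

# Mark a row "DELETE" when its id group mixes more than one distinct value
# and the row itself is "aja"; every other row is marked "okay".
# `flag` is a hypothetical column name.
flagged <- proov %>%
  group_by(id) %>%
  mutate(flag = if_else(n_distinct(somedata) > 1 & somedata == "aja",
                        "DELETE", "okay")) %>%
  ungroup()

flagged
```

Filtering afterwards with filter(flag == "okay") should leave the same nine rows as the answer above, and keeping the flag around makes it easy to inspect what would be removed before actually removing it.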
