Keep only duplicated entries that meet a condition

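Problem description

I am cleaning a dataset and need to keep only the rows whose value is repeated 4 times (such as "a" and "b" below), but I have not been able to get this to work. Can anyone help?

Thanks!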

let <- c("a","a","a","a","b","b","b","b","c","c","c","d","d","e")
avg <- c(1,1,1,2,3,4,5,6,1,2,3,4,3,5)

sample <- data.frame(let,avg)

Tags: r

Solution


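We can use data.table. Grouping by let, .N is the number of rows in the group and .SD is the group's subset of the data, so only groups with at least 4 rows are returned: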

library(data.table)
setDT(sample)[, .SD[.N >=4], let]
#   let avg
#1:   a   1
#2:   a   1
#3:   a   1
#4:   a   2
#5:   b   3
#6:   b   4
#7:   b   5
#8:   b   6

In base R, using ave:

sample[with(sample, ave(avg, let, FUN = length)>=4),]
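
To see why the ave call works as a row filter, it may help to look at the intermediate vector it produces. The sketch below rebuilds the question's data and uses only base R; the names group_size and kept are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question (base R only).
let <- c("a","a","a","a","b","b","b","b","c","c","c","d","d","e")
avg <- c(1,1,1,2,3,4,5,6,1,2,3,4,3,5)
sample <- data.frame(let, avg)

# ave() applies FUN within each group of `let` and returns a vector
# as long as the input, so every row carries the size of its own group.
group_size <- with(sample, ave(avg, let, FUN = length))
print(group_size)

# Comparing that vector against 4 gives the logical row filter.
kept <- sample[group_size >= 4, ]
print(kept)
```

Because ave returns one value per row rather than one per group, the comparison can be used directly inside sample[ , ] without a merge or a lookup step.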

Or with table:

subset(sample, let %in% names(which(rowSums(table(sample)) >=4)))
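
The answers above keep groups with at least 4 rows, which matches the example data. If the requirement is read strictly as exactly 4 repetitions, a variant of the table approach could look like the sketch below. This is an assumption about the intent, not something the original answer states, and the names counts and exactly_four are illustrative.

```r
# Rebuild the example data from the question (base R only).
let <- c("a","a","a","a","b","b","b","b","c","c","c","d","d","e")
avg <- c(1,1,1,2,3,4,5,6,1,2,3,4,3,5)
sample <- data.frame(let, avg)

# Count how often each value of `let` occurs.
counts <- table(sample$let)

# Keep only rows whose `let` value occurs exactly 4 times
# (assumed strict reading; the original answers use >= 4).
exactly_four <- subset(sample, let %in% names(counts)[counts == 4])
print(exactly_four)
```

On the question's data both readings give the same rows, since no value occurs more than 4 times. Tabulating sample$let alone also avoids the rowSums step, which is only needed when the table is built from the whole data frame.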
