How do I remove duplicate values from a dataset and count how many times each value is repeated?

Problem description

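Every month I receive a dataset keyed by a reference ID that is supposed to be unique, but the ID column contains duplicates. I need to remove the duplicated IDs and count how many times each ID occurs.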

name <- c("A","A","A","B","B","c","D","A")
age <- c(22,23,22,32,32,54,65,70)
sex <- c("m","f","f","m","m","f","m","f")
both <- data.frame(name,age,sex)
both

# Keeps only the first row for each name; it does not give the counts
both[!duplicated(both$name),]

Expected output:

name  age  sex  count
A     70   f    4
B     32   m    2
c     54   f    1
D     65   m    1

Tags: r

Solution


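We can group by 'name', add the group size (n()) as a count, then filter to the rows whose 'sex' is the most frequent value within the group, and slice the last of those rows.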

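library(dplyr)

# Mode() is a small helper defined further down; run that definition first.
both %>%
    group_by(name) %>%
    # Add the group size as an extra grouping column
    # (dplyr >= 1.0.0 spells this .add; older versions used add = TRUE)
    group_by(n = n(), .add = TRUE) %>%
    filter(sex == Mode(sex)) %>%   # keep rows with the most frequent sex
    slice(n())                     # keep the last of those rows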
# A tibble: 4 x 4
# Groups:   name, n [4]
#  name    age sex       n
#  <fct> <dbl> <fct> <int>
#1 A        70 f         4
#2 B        32 m         2
#3 c        54 f         1
#4 D        65 m         1
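
For comparison, the same steps can be sketched in base R without dplyr. This is only an illustration under a few assumptions: the `count` column name is taken from the expected output in the question, `top_sex`, `pieces` and `result` are names made up for this sketch, and the most frequent value is found with `table()`, which breaks ties by sorted order rather than by first appearance. The order of the resulting rows depends on how the locale sorts the names.

```r
# Same toy data as in the question
both <- data.frame(
  name = c("A", "A", "A", "B", "B", "c", "D", "A"),
  age  = c(22, 23, 22, 32, 32, 54, 65, 70),
  sex  = c("m", "f", "f", "m", "m", "f", "m", "f"),
  stringsAsFactors = FALSE
)

# Handle each name separately
pieces <- lapply(split(both, both$name), function(g) {
  # Most frequent sex in this group (ties: first in sorted order)
  top_sex <- names(which.max(table(g$sex)))
  keep <- g[g$sex == top_sex, , drop = FALSE]  # rows with that sex
  last <- keep[nrow(keep), , drop = FALSE]     # last such row
  last$count <- nrow(g)                        # size of the whole group
  last
})

# Stack the one-row pieces back into a single data frame
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

The count is taken from the whole group before any filtering, which mirrors how the pipeline computes n before the filter step.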

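The `Mode` helper used in the dplyr pipeline returns the most frequent value of a vector. It is defined as: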

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
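
To see what this helper returns, here is a small sketch that applies it to the 'sex' values of name "A" from the question, and to a tied input. The variable names `sex_A`, `tied`, `mode_A` and `mode_tied` are made up for this illustration; the function body is repeated only so the snippet runs on its own.

```r
Mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]    # the value with the highest count
}

sex_A <- c("m", "f", "f", "f")   # 'sex' values for name "A" in the question's data
mode_A <- Mode(sex_A)            # "f": it occurs three times

tied <- c("m", "f")              # every value occurs exactly once
mode_tied <- Mode(tied)          # "m": which.max() returns the first maximum
```

On a tie the value that appears first in the input wins, so the row kept by the filter step can depend on row order when two values are equally frequent.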
