How to find the number of unique values for each class

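Question

My data comes from a large data set. Each unique ID contains one or more classes, and each class contains one or more unique values of X. However, the same class can appear under different IDs (for example, ID009 and ID020 both have the class CMP-002). I am trying to find the number of unique X values for each class within each ID.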

ID <- c("ID004", "ID004", "ID004", "ID004", "ID004", "ID004", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID009", "ID009", "ID009", "ID009", "ID009", "ID009", "ID009","ID020", "ID020", "ID020", "ID020", "ID020", "ID020", "ID020", "ID020", "ID023", "ID023", "ID023", "ID023", "ID023", "ID023","ID023", "ID023", "ID023", "ID023")
Class <- c("CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001","CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001","CMP-001", "CMP-001", "CMP-001", "CMP-002", "CMP-002", "CMP-002","CMP-002", "CMP-002", "CMP-005", "CMP-005", "CMP-005", "CMP-002", "CMP-002", "CMP-002", "CMP-002", "CMP-002","CMP-002", "CMP-002", "CMP-002", "CMP-002", "CMP-002", "CMP-002", "CMP-002","CMP-002", "CMP-004", "CMP-004", "CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001","CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001")
X <- c(1,1,2,3,3,3,4,4,4,4,4,4,4,4,5,5,6,6,6,7,7,8,9,9,10,10,10,10,10,11,11,12,12,13,13,14,14,15,15,15,16,16,17,17,18,18,18)
data <- data.frame(ID, Class, X)

The result should be:

ID         class       No. of X value
ID004     CMP-001           3
ID006     CMP-001           1
          CMP-002           2
          CMP-005           2
ID009     CMP-002           2
ID020     CMP-002           3
          CMP-004           1
ID023     CMP-001           4

Thanks for your help.

Tags: r, statistics

Solution

Here, n_distinct is useful after grouping by 'ID' and 'Class':

library(dplyr)
data %>% 
   group_by(ID, Class) %>%
   summarise(No_X_value = n_distinct(X), .groups = 'drop')

Output:

# A tibble: 8 x 3
#  ID    Class   No_X_value
#  <chr> <chr>        <int>
#1 ID004 CMP-001          3
#2 ID006 CMP-001          1
#3 ID006 CMP-002          2
#4 ID006 CMP-005          2
#5 ID009 CMP-002          2
#6 ID020 CMP-002          3
#7 ID020 CMP-004          1
#8 ID023 CMP-001          4
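
As a hedged alternative to n_distinct, the same counts can be obtained by first dropping duplicate (ID, Class, X) rows with distinct() and then counting the rows left in each group with count(). The sketch below rebuilds the question's data compactly with rep(); that reconstruction is an assumption made to keep the example self-contained.

```r
library(dplyr)

# Compact reconstruction of the question's data (same 47 rows)
data <- data.frame(
  ID    = rep(c("ID004", "ID006", "ID009", "ID020", "ID023"),
              times = c(6, 16, 7, 8, 10)),
  Class = rep(c("CMP-001", "CMP-002", "CMP-005", "CMP-002", "CMP-004", "CMP-001"),
              times = c(14, 5, 3, 13, 2, 10)),
  X     = c(1, 1, 2, 3, 3, 3, rep(4, 8), 5, 5, 6, 6, 6, 7, 7, 8,
            9, 9, rep(10, 5), 11, 11, 12, 12, 13, 13, 14, 14,
            15, 15, 15, 16, 16, 17, 17, 18, 18, 18)
)

res <- data %>%
  distinct(ID, Class, X) %>%              # keep one row per unique (ID, Class, X)
  count(ID, Class, name = "No_X_value")   # rows per group = distinct X values
res
```

Both forms should give the same eight rows; which one reads better is largely a matter of taste.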

Or with data.table:

library(data.table)
# uniqueN(X) counts the distinct X values within each (ID, Class) group
setDT(data)[, .(No_X_value = uniqueN(X)), by = .(ID, Class)]
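
Note that setDT() converts data to a data.table by reference, so the original object itself is changed. If that is not wanted, a minimal sketch using as.data.table(), which works on a copy, could look like this. The names dt and res, and the compact rep() reconstruction of the question's data, are illustrative assumptions.

```r
library(data.table)

# Compact reconstruction of the question's data (same 47 rows)
data <- data.frame(
  ID    = rep(c("ID004", "ID006", "ID009", "ID020", "ID023"),
              times = c(6, 16, 7, 8, 10)),
  Class = rep(c("CMP-001", "CMP-002", "CMP-005", "CMP-002", "CMP-004", "CMP-001"),
              times = c(14, 5, 3, 13, 2, 10)),
  X     = c(1, 1, 2, 3, 3, 3, rep(4, 8), 5, 5, 6, 6, 6, 7, 7, 8,
            9, 9, rep(10, 5), 11, 11, 12, 12, 13, 13, 14, 14,
            15, 15, 15, 16, 16, 17, 17, 18, 18, 18)
)

dt  <- as.data.table(data)   # copy; `data` stays a plain data.frame
res <- dt[, .(No_X_value = uniqueN(X)), by = .(ID, Class)]
res
```

With by =, groups are returned in order of first appearance, which for this data coincides with the sorted order.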

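Or with aggregate from base R. Because unique(data) drops duplicated rows, counting the remaining rows per ID/Class group gives the number of distinct X values: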

aggregate(X ~ ., unique(data), FUN = length)
#     ID   Class X
#1 ID004 CMP-001 3
#2 ID006 CMP-001 1
#3 ID023 CMP-001 4
#4 ID006 CMP-002 2
#5 ID009 CMP-002 2
#6 ID020 CMP-002 3
#7 ID020 CMP-004 1
#8 ID006 CMP-005 2
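
The unique(data) trick relies on the data frame holding only the ID, Class and X columns. If other columns were present, a hedged variant is to count the distinct values inside the aggregating function instead. The sketch below assumes the question's data, rebuilt compactly with rep(), and renames and reorders the result only so it is easier to compare with the expected table.

```r
# Compact reconstruction of the question's data (same 47 rows)
data <- data.frame(
  ID    = rep(c("ID004", "ID006", "ID009", "ID020", "ID023"),
              times = c(6, 16, 7, 8, 10)),
  Class = rep(c("CMP-001", "CMP-002", "CMP-005", "CMP-002", "CMP-004", "CMP-001"),
              times = c(14, 5, 3, 13, 2, 10)),
  X     = c(1, 1, 2, 3, 3, 3, rep(4, 8), 5, 5, 6, 6, 6, 7, 7, 8,
            9, 9, rep(10, 5), 11, 11, 12, 12, 13, 13, 14, 14,
            15, 15, 15, 16, 16, 17, 17, 18, 18, 18)
)

# Count distinct X values within each ID/Class combination
res <- aggregate(X ~ ID + Class, data = data,
                 FUN = function(v) length(unique(v)))

# Rename the count column and sort by ID, then Class
names(res)[names(res) == "X"] <- "No_X_value"
res <- res[order(res$ID, res$Class), ]
res
```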
