Summarize data by count of distinct values?

Question

I am trying to summarize my data to show how many different courses each teacher has in their schedule.

Basically, my data looks like this:

Id | Subject
123| algebra
123| geometry
123| algebra II
456| calc
456| calc
789| geometry
789| geometry
789| calc

and I need it to look like this:

Id | Subject count
123| 3
456| 1
789| 2

I have no idea where to start because I don't want it to simply count the number of courses they teach, I want the DIFFERENT courses. Please help!

Tags: r, count, aggregate

Answer


We can group by 'Id' and get the number of distinct 'Subject' values with n_distinct inside summarise:

library(dplyr)
df1 %>%
  group_by(Id) %>%
  summarise(Subject_Count = n_distinct(Subject))
# A tibble: 3 x 2
#     Id Subject_Count
#  <int>         <int>
#1   123             3
#2   456             1
#3   789             2

Or, using data.table, convert to a data.table (setDT(df1)), group by 'Id', and get the distinct count with uniqueN:

library(data.table)
setDT(df1)[,.(Subject_Count = uniqueN(Subject)), by = Id]
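If neither package is available, the same grouping idea can be sketched in base R with aggregate. This is not part of the original answer, only an illustration under the assumption that the data has the same shape as df1 in the Data section. The column name Subject_Count is reused from the examples above, and the result variable name is arbitrary.

```r
# Base R sketch (no packages required); df1 is rebuilt here so the
# snippet runs on its own.
df1 <- data.frame(
  Id = c(123L, 123L, 123L, 456L, 456L, 789L, 789L, 789L),
  Subject = c("algebra", "geometry", "algebra II", "calc",
              "calc", "geometry", "geometry", "calc")
)

# For each Id, count the unique Subject values
result <- aggregate(Subject ~ Id, data = df1,
                    FUN = function(x) length(unique(x)))

# aggregate keeps the original column name, so rename it
names(result)[2] <- "Subject_Count"

print(result)
```

aggregate returns a plain data.frame with one row per Id, so the result can be used the same way as the dplyr and data.table outputs.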

Data

df1 <- structure(list(Id = c(123L, 123L, 123L, 456L, 456L, 789L, 789L, 
789L), Subject = c("algebra", "geometry", "algebra II", "calc", 
"calc", "geometry", "geometry", "calc")), class = "data.frame",
row.names = c(NA, 
-8L))
