R dplyr: creating counts for panel data

Problem description

I have the following data frame:

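# stringsAsFactors = TRUE makes Group and Time factors (this was the default before R 4.0.0)
D <- data.frame("Id" = c("a","b","c","d","e","f","g"), "Group" = c("1","1","1","2","2","2","2"), "Time" = c("1","1","2","1","2","3","3"), stringsAsFactors = TRUE)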

  Id Group Time
1  a     1    1
2  b     1    1
3  c     1    2
4  d     2    1
5  e     2    2
6  f     2    3
7  g     2    3

I want to count the number of individuals per Group and Time while keeping the panel balanced. The classic approach with dplyr is:

D %>% group_by(Group,Time) %>% tally()
  Group Time      n
  <fct> <fct> <int>
1 1     1         2
2 1     2         1
3 2     1         1
4 2     2         1
5 2     3         2

But the result is unbalanced: Group 1 has no row for Time 3, whereas I would like that combination to appear with a count of 0, as follows:

  Group Time      n
  <fct> <fct> <int>
1 1     1         2
2 1     2         1
3 1     3         0
4 2     1         1
5 2     2         1
6 2     3         2

Is there a way to balance the result after group_by()? Has anyone run into something similar? Thanks in advance.

Tags: r, dplyr, panel

Solution


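Since Time is a factor variable, we can use count() with .drop = FALSE. By default (.drop = TRUE), count() drops groups formed by factor levels that do not appear in the data, which is why the 0 counts are missing.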

library(dplyr)
D %>% count(Group, Time, .drop = FALSE)

#  Group Time      n
#  <fct> <fct> <int>
#1 1     1         2
#2 1     2         1
#3 1     3         0
#4 2     1         1
#5 2     2         1
#6 2     3         2
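One caveat: .drop = FALSE only expands factor levels. If Group and Time are character columns (the data.frame() default since R 4.0.0), there are no unobserved levels to keep and the zero row does not appear. The minimal sketch below, assuming dplyr 1.0 or later for across(), shows the difference and converts the columns to factors first:

```r
library(dplyr)

# Same data as in the question, but with character columns
# (the data.frame() default since R 4.0.0)
D <- data.frame(Id = c("a", "b", "c", "d", "e", "f", "g"),
                Group = c("1", "1", "1", "2", "2", "2", "2"),
                Time = c("1", "1", "2", "1", "2", "3", "3"),
                stringsAsFactors = FALSE)

# Character columns have no levels, so .drop = FALSE adds nothing:
# only the 5 observed Group/Time combinations are returned
unbalanced <- D %>% count(Group, Time, .drop = FALSE)

# Converting the grouping columns to factors lets .drop = FALSE keep
# the unobserved Group 1 / Time 3 combination with n = 0
balanced <- D %>%
  mutate(across(c(Group, Time), factor)) %>%
  count(Group, Time, .drop = FALSE)

print(unbalanced)
print(balanced)
```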

We can use the same approach with tally():

D %>% group_by(Group,Time, .drop = FALSE) %>% tally()

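Or with tidyr::complete(), which fills in every combination of the Group and Time values and sets the missing counts to 0 (this also works when the columns are characters rather than factors):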

D %>%  count(Group, Time) %>% tidyr::complete(Group, Time, fill = list(n = 0))
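For comparison, here is a base-R sketch that swaps in table() instead of dplyr or tidyr. Cross-tabulating two factors produces a cell for every level combination, including empty ones, and as.data.frame() turns the table back into long format. table() converts character vectors to factors itself, so this should work whether or not the columns are factors:

```r
D <- data.frame(Id = c("a", "b", "c", "d", "e", "f", "g"),
                Group = c("1", "1", "1", "2", "2", "2", "2"),
                Time = c("1", "1", "2", "1", "2", "3", "3"),
                stringsAsFactors = TRUE)

# table() counts every Group x Time combination, including empty cells;
# responseName names the count column "n" instead of the default "Freq"
counts <- as.data.frame(table(Group = D$Group, Time = D$Time),
                        responseName = "n")

# as.data.frame() varies Group fastest; reorder to the Group-then-Time
# layout used by the dplyr output
counts <- counts[order(counts$Group, counts$Time), ]
rownames(counts) <- NULL
print(counts)
```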
