How do I group these variables with dplyr to get a grouped summary?

Problem description

Here is my input:

school <- structure(list(Students = c(300L, 1600L, 100L, 90L, 2000L, 200L, 
300L, 340L, 1500L, 500L, 360L, 820L, 150L, 1380L, NA, 360L, 400L, 
1000L, 1600L, 142L, 250L, 2000L), Students_Primary = c(150L, 
NA, 100L, 90L, 800L, NA, NA, 150L, NA, 250L, 220L, 400L, NA, 
750L, NA, NA, NA, 600L, NA, 142L, NA, 500L), Chinese_Spoken = c("Mandarin", 
"Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", 
"Mandarin", "Mandarin", "Mandarin", "Cantonese", "Mandarin", 
"Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", 
"Mandarin", "Both", "Mandarin", "Both"), Chinese_Written = c("Simplified", 
"Traditional", "Simplified", "Traditional", "Both", "Traditional", 
"Traditional", "Simplified", "Simplified", NA, "Traditional", 
"Both", NA, "Both", "Both", "Simplified", "Both", "Traditional", 
"Traditional", "Traditional", "Simplified", "Both")), class = "data.frame", row.names = c(NA, 
-22L))

I am trying to summarize how many students use each form of written Chinese, so I tried the following code:

school %>% 
  select(Chinese_Written, Students) %>%
  group_by(Chinese_Written) %>% 
  arrange(Chinese_Written) %>% 
  na.omit()

It outputs this:

   Chinese_Written Students
   <chr>              <int>
 1 Both                2000
 2 Both                 820
 3 Both                1380
 4 Both                 400
 5 Both                2000
 6 Simplified           300
 7 Simplified           100
 8 Simplified           340
 9 Simplified          1500
10 Simplified           360
11 Simplified           250
12 Traditional         1600
13 Traditional           90
14 Traditional          200
15 Traditional          300
16 Traditional          360
17 Traditional         1000
18 Traditional         1600
19 Traditional          142

Is there a reason the rows were not combined? I would like all the "Both", "Simplified", and "Traditional" rows to be collapsed into one group each.

Tags: r, dplyr, tidyverse

Solution


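group_by() on its own does not aggregate anything. It only marks the data frame as grouped, so that the verbs that follow it operate within each group. To collapse each group to a single row, follow it with summarise(), here summing Students within each value of Chinese_Written: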

library(dplyr)

school %>% 
  group_by(Chinese_Written) %>% 
  summarise(Students = sum(Students, na.rm = TRUE))

# A tibble: 4 x 2
  Chinese_Written Students
  <chr>              <int>
1 Both                6600
2 Simplified          2850
3 Traditional         5292
4 NA                   650
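
The original attempt used na.omit(), which suggests the NA row above may be unwanted. As a minimal sketch (not part of the original answer), the example below uses a small made-up data frame named toy, a stand-in for school with hypothetical values, to show that group_by() leaves the rows untouched and that filtering out missing categories before summarise() drops the NA group:

```r
library(dplyr)

# Hypothetical stand-in for `school`; these values are illustrative only
toy <- data.frame(
  Students = c(300L, 1600L, NA, 500L, 2000L),
  Chinese_Written = c("Simplified", "Traditional", "Both", NA, "Both")
)

# group_by() only records the grouping; the number of rows is unchanged
grouped <- toy %>% group_by(Chinese_Written)
nrow(grouped)      # still 5 rows
n_groups(grouped)  # 4 groups: three categories plus the NA group

# Drop rows with a missing category, then collapse each group to one row.
# na.rm = TRUE ignores missing Students counts inside a group.
totals <- toy %>%
  filter(!is.na(Chinese_Written)) %>%
  group_by(Chinese_Written) %>%
  summarise(Students = sum(Students, na.rm = TRUE))

totals
```

Filtering on Chinese_Written alone keeps rows whose Students count is missing, so a group is not lost just because one of its counts is NA. Calling na.omit() instead would drop any row with an NA in any column.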
