R dplyr: summarise complete cases of all variables by group

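Problem description

I want to summarise every variable in a data set by group using dplyr. The summarised variables should be stored under new names.

An example: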

library(dplyr)

df <- data.frame(
  group = c("A", "B", "A", "B"),
  a = c(1,1,NA,2),
  b = c(1,NA,1,1),
  c = c(1,1,2,NA),
  d = c(1,2,1,1)
)

df %>% group_by(group) %>% 
  mutate(complete_a = sum(complete.cases(a))) %>% 
  mutate(complete_b = sum(complete.cases(b))) %>%
  mutate(complete_c = sum(complete.cases(c))) %>% 
  mutate(complete_d = sum(complete.cases(d))) %>% 
  group_by(group, complete_a, complete_b, complete_c, complete_d) %>% summarise()

This gives my expected output:

# # A tibble: 2 x 5
# # Groups:   group, complete_a, complete_b, complete_c [?]
# group complete_a complete_b complete_c complete_d
# <fct>      <int>      <int>      <int>      <int>
# A              1          2          2          2
# B              2          1          1          2

How can I produce the same output without repeating a mutate statement for each variable?

I tried:

df %>% group_by(group) %>% summarise_all(funs(sum(complete.cases(.))))

This works, but it does not rename the variables.

Tags: r, dplyr

Solution


You are almost there. You need to add rename_all:

library(dplyr)

df %>% 
  group_by(group) %>% 
  summarise_all(funs(sum(complete.cases(.)))) %>% 
  rename_all(~paste0("complete_", colnames(df)))

# A tibble: 2 x 5
#  complete_group complete_a complete_b complete_c complete_d
#  <fct>               <int>      <int>      <int>      <int>
#1 A                       1          2          2          2
#2 B                       2          1          1          2
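
Note that this output also renames the grouping column to complete_group, whereas the expected output in the question keeps it as group. Below is a minimal sketch of one way to leave the grouping column alone, assuming dplyr 1.0.0 or later, where rename_with() is available. The data frame is the example from the question, and the formula ~ sum(complete.cases(.)) is used in place of funs(), which has been deprecated since dplyr 0.8.0.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  group = c("A", "B", "A", "B"),
  a = c(1, 1, NA, 2),
  b = c(1, NA, 1, 1),
  c = c(1, 1, 2, NA),
  d = c(1, 2, 1, 1)
)

result <- df %>%
  group_by(group) %>%
  # Count the non-missing values of every non-grouping column
  summarise_all(~ sum(complete.cases(.))) %>%
  # Add the prefix to every column except the grouping column
  rename_with(~ paste0("complete_", .x), .cols = -group)

result
```

The only change to the pipeline is the last step: the negative selection -group tells rename_with() to skip that column.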

Edit

Or, as @symbolrush pointed out, more directly without colnames:

df %>% 
  group_by(group) %>% 
  summarise_all(funs(sum(complete.cases(.)))) %>% 
  rename_all(~paste0("complete_", .))

# A tibble: 2 x 5
#  complete_group complete_a complete_b complete_c complete_d
#  <fct>               <int>      <int>      <int>      <int>
#1 A                       1          2          2          2
#2 B                       2          1          1          2
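
In newer dplyr releases the scoped verbs such as summarise_all() are superseded by across(). The following is a sketch rather than part of the original answer, assuming dplyr 1.0.0 or later: across() takes a .names argument, so the summary and the renaming happen in one step, and the grouping column is left untouched because across() does not select grouping variables inside summarise().

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  group = c("A", "B", "A", "B"),
  a = c(1, 1, NA, 2),
  b = c(1, NA, 1, 1),
  c = c(1, 1, 2, NA),
  d = c(1, 2, 1, 1)
)

result_across <- df %>%
  group_by(group) %>%
  summarise(
    # {.col} is replaced by each input column name
    across(everything(), ~ sum(complete.cases(.x)), .names = "complete_{.col}"),
    .groups = "drop"
  )

result_across
```

With this form the result has the columns group, complete_a, complete_b, complete_c and complete_d, which matches the layout of the expected output in the question.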
