R: using dplyr group_by/sum in a for loop, with the outputs stacked into one result

Problem description

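I'm using the dplyr package to group by a week variable and get the sums of three variables. The results should be stacked together into one table. Here is my data frame df: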

week var1 var2 var3
 1    1    2    3
 1    2    2    3
 2    4    4    5
 2    2    2    6
 3    6    6    6
 3    4    4    4

My code is:

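library(dplyr)

calculate <- function(vars){
   # vars is a column name given as a string, so look the column up with .data[[ ]];
   # sum(vars) would try to sum the string itself and fail
   x <- df %>% group_by(week) %>% summarise(summe = sum(.data[[vars]])) %>% mutate(group = vars)
   x
}
cols <- c("var1", "var2", "var3")
total <- NULL   # total must exist before the first rbind()
for (i in 1:length(cols)){
    var <- cols[i]
    cal <- calculate(var)
    total <- rbind(total, cal)
}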

The expected output is:

 week summe group
   1    3    var1
   2    6    var1 
   3    10   var1
   1    4    var2
   2    6    var2
   3    10   var2
   1    6    var3
   2    11   var3
   3    10   var3

My question: is there a better way to do this than using a for loop?

Cheers, Andy

Tags: r, dplyr

Solution


We can reshape to "long" format and then do a grouped sum:

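library(dplyr)
library(tidyr)
df %>%
    # one row per (week, original column): column names go to 'group', values to 'value'
    pivot_longer(cols = starts_with('var'), names_to = 'group') %>%
    group_by(week, group) %>%
    summarise(summe = sum(value)) %>%
    ungroup %>%
    # order by variable, then by week, to match the expected output
    arrange(group, week) %>%
    select(week, summe, group)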
# A tibble: 9 x 3
#   week summe group
#  <int> <int> <chr>
#1     1     3 var1 
#2     2     6 var1 
#3     3    10 var1 
#4     1     4 var2 
#5     2     6 var2 
#6     3    10 var2 
#7     1     6 var3 
#8     2    11 var3 
#9     3    10 var3 
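To see what the first pipeline is actually summing, it can help to look at the intermediate long table on its own. The sketch below only runs the pivot_longer() step (df is redefined so the snippet runs standalone; long is just an illustrative name): each of the 6 original rows becomes 3 rows, one per var column, with the column name in group and its value in value.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  week = c(1L, 1L, 2L, 2L, 3L, 3L),
  var1 = c(1L, 2L, 4L, 2L, 6L, 4L),
  var2 = c(2L, 2L, 4L, 2L, 6L, 4L),
  var3 = c(3L, 3L, 5L, 6L, 6L, 4L)
)

# Reshape only: 6 rows x 3 value columns -> 18 rows of (week, group, value)
long <- df %>%
  pivot_longer(cols = starts_with('var'), names_to = 'group')

print(long, n = 6)
```

The first three rows are week 1 with var1 = 1, var2 = 2 and var3 = 3, i.e. the first row of df spread out. group_by(week, group) followed by sum(value) then collapses these 18 rows into the 9 rows shown above.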

We can also compute the sum first, grouped by "week", and then reshape to "long" format:

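df %>% 
   group_by(week) %>% 
   # sum every non-grouping column within each week
   # (in dplyr >= 1.0 this can also be written summarise(across(everything(), sum)))
   summarise_at(vars(-group_cols()), sum) %>% 
   pivot_longer(cols = starts_with('var'), names_to = 'group', values_to = 'summe') %>% 
   # pivot_longer() returns rows ordered by week; sort by variable to match the expected output
   arrange(group, week) %>% 
   select(week, summe, group)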
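If you would rather keep a per-column function like the one in the question, the explicit loop and the growing total can also be replaced by lapply() plus dplyr::bind_rows(). This is a sketch, assuming dplyr >= 1.0 (for the .groups argument); sum_by_week is a hypothetical name for the question's calculate(), with the column looked up through the .data pronoun because sum(vars) on a string does not work.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  week = c(1L, 1L, 2L, 2L, 3L, 3L),
  var1 = c(1L, 2L, 4L, 2L, 6L, 4L),
  var2 = c(2L, 2L, 4L, 2L, 6L, 4L),
  var3 = c(3L, 3L, 5L, 6L, 6L, 4L)
)

# Hypothetical helper: weekly sum of one column, given its name as a string
sum_by_week <- function(col) {
  df %>%
    group_by(week) %>%
    summarise(summe = sum(.data[[col]]), .groups = "drop") %>%
    mutate(group = col)
}

cols <- c("var1", "var2", "var3")

# One small tibble per column, stacked in the order of cols (no loop, no growing object)
total <- bind_rows(lapply(cols, sum_by_week))
print(total)
```

Because bind_rows() stacks the pieces in the order of cols, the result already comes out sorted by variable and then by week, as in the expected output, without an extra arrange().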

Data

df <- structure(list(week = c(1L, 1L, 2L, 2L, 3L, 3L), var1 = c(1L, 
2L, 4L, 2L, 6L, 4L), var2 = c(2L, 2L, 4L, 2L, 6L, 4L), var3 = c(3L, 
3L, 5L, 6L, 6L, 4L)), class = "data.frame", row.names = c(NA, 
-6L))
