首页 > 解决方案 > group_by(月和年)和 summarise_if 不会留下单个月的平均值

问题描述

我很抱歉我的标题含糊不清,但我不知道我需要如何解决这个问题。

我有一个每年每月政府回复的数据框,我想获得每月每列的平均值。

现在我认为我的语法是正确的,只是我的结果不会让我每年只有一个月,而是多个月,而且我一生都无法掌握自己做错了什么。

我正在运行的代码是:

test <- ds %>%
  group_by(month, year, C1_calculated, C2_calculated, C3_calculated, C4_calculated, C5_calculated, C6_calculated, C7_calculated, C8_calculated, E1_calculated, E2_calculated) %>%
  summarise_if(is.integer, mean) %>%
  ungroup

结果如下:

test2 <- structure(list(month = c(1, 1, 1, 2, 2, 2), year = c(2020, 2021, 
2021, 2020, 2021, 2021), C1_calculated = c(0, 11.1111111111111, 
11.1111111111111, 0, 7.40740740740741, 11.1111111111111), C2_calculated = c(0, 
11.1111111111111, 11.1111111111111, 0, 11.1111111111111, 11.1111111111111
), C3_calculated = c(0, 11.1111111111111, 11.1111111111111, 0, 
11.1111111111111, 11.1111111111111), C4_calculated = c(0, 11.1111111111111, 
11.1111111111111, 0, 11.1111111111111, 11.1111111111111), C5_calculated = c(0, 
5.55555555555556, 5.55555555555556, 0, 5.55555555555556, 5.55555555555556
), C6_calculated = c(0, 3.7037037037037, 7.40740740740741, 0, 
7.40740740740741, 7.40740740740741), C7_calculated = c(0, 5.55555555555556, 
5.55555555555556, 0, 5.55555555555556, 5.55555555555556), C8_calculated = c(0, 
9.72222222222222, 9.72222222222222, 0, 9.72222222222222, 9.72222222222222
), E1_calculated = c(0, 50, 50, 0, 50, 50), E2_calculated = c(0, 
37.5, 37.5, 0, 37.5, 37.5)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

如您所见,我每年有多个同一月份的实例,这并没有真正的帮助。

这是我的数据集的 20 行的随机样本:

ds <- structure(list(C1_calculated = c(3.7037037037037, 7.40740740740741, 
3.7037037037037, 7.40740740740741, 3.7037037037037, 3.7037037037037, 
11.1111111111111, 11.1111111111111, 3.7037037037037, 7.40740740740741, 
3.7037037037037, 3.7037037037037, 3.7037037037037, 3.7037037037037, 
11.1111111111111, 7.40740740740741, 3.7037037037037, 7.40740740740741, 
0, 7.40740740740741), C2_calculated = c(7.40740740740741, 7.40740740740741, 
7.40740740740741, 11.1111111111111, 7.40740740740741, 7.40740740740741, 
7.40740740740741, 7.40740740740741, 7.40740740740741, 7.40740740740741, 
7.40740740740741, 3.7037037037037, 7.40740740740741, 7.40740740740741, 
7.40740740740741, 7.40740740740741, 7.40740740740741, 11.1111111111111, 
0, 7.40740740740741), C3_calculated = c(0, 11.1111111111111, 
11.1111111111111, 11.1111111111111, 5.55555555555556, 2.77777777777778, 
11.1111111111111, 11.1111111111111, 5.55555555555556, 11.1111111111111, 
11.1111111111111, 5.55555555555556, 0, 11.1111111111111, 11.1111111111111, 
11.1111111111111, 11.1111111111111, 11.1111111111111, 0, 11.1111111111111
), C4_calculated = c(5.55555555555556, 11.1111111111111, 11.1111111111111, 
11.1111111111111, 0, 11.1111111111111, 11.1111111111111, 11.1111111111111, 
0, 11.1111111111111, 11.1111111111111, 0, 5.55555555555556, 11.1111111111111, 
11.1111111111111, 8.33333333333333, 11.1111111111111, 11.1111111111111, 
0, 8.33333333333333), C5_calculated = c(0, 5.55555555555556, 
5.55555555555556, 5.55555555555556, 5.55555555555556, 0, 5.55555555555556, 
5.55555555555556, 0, 5.55555555555556, 5.55555555555556, 0, 0, 
5.55555555555556, 5.55555555555556, 0, 5.55555555555556, 5.55555555555556, 
0, 0), C6_calculated = c(3.7037037037037, 3.7037037037037, 3.7037037037037, 
7.40740740740741, 0, 3.7037037037037, 7.40740740740741, 7.40740740740741, 
0, 7.40740740740741, 3.7037037037037, 0, 3.7037037037037, 3.7037037037037, 
7.40740740740741, 3.7037037037037, 7.40740740740741, 7.40740740740741, 
0, 3.7037037037037), C7_calculated = c(0, 5.55555555555556, 5.55555555555556, 
5.55555555555556, 0, 0, 5.55555555555556, 5.55555555555556, 0, 
5.55555555555556, 5.55555555555556, 0, 0, 5.55555555555556, 5.55555555555556, 
5.55555555555556, 5.55555555555556, 5.55555555555556, 0, 5.55555555555556
), C8_calculated = c(8.33333333333333, 8.33333333333333, 9.72222222222222, 
9.72222222222222, 9.72222222222222, 8.33333333333333, 8.33333333333333, 
8.33333333333333, 9.72222222222222, 9.72222222222222, 9.72222222222222, 
9.72222222222222, 8.33333333333333, 9.72222222222222, 8.33333333333333, 
8.33333333333333, 9.72222222222222, 9.72222222222222, 0, 8.33333333333333
), E1_calculated = c(50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 
50, 50, 50, 50, 50, 50, 50, 50, 0, 50), E2_calculated = c(62.5, 
62.5, 37.5, 37.5, 12.5, 37.5, 62.5, 62.5, 37.5, 37.5, 37.5, 37.5, 
62.5, 37.5, 62.5, 62.5, 37.5, 37.5, 0, 62.5), month = c(8, 5, 
5, 2, 8, 9, 5, 4, 7, 4, 6, 7, 8, 6, 5, 6, 4, 2, 1, 6), year = c(2020, 
2020, 2021, 2021, 2021, 2020, 2020, 2020, 2021, 2021, 2021, 2021, 
2020, 2021, 2020, 2020, 2021, 2021, 2020, 2020)), row.names = c(227L, 
145L, 488L, 423L, 586L, 272L, 122L, 104L, 565L, 464L, 520L, 555L, 
224L, 539L, 124L, 157L, 482L, 417L, 14L, 162L), class = "data.frame")

提前感谢您的所有帮助!

标签: r

解决方案


你的数据:

test2 <- structure(list(month = c(1, 1, 1, 2, 2, 2), year = c(2020, 2021, 
 2021, 2020, 2021, 2021), C1_calculated = c(0, 11.1111111111111, 
 11.1111111111111, 0, 7.40740740740741, 11.1111111111111), C2_calculated = c(0, 
 11.1111111111111, 11.1111111111111, 0, 11.1111111111111, 11.1111111111111
 ), C3_calculated = c(0, 11.1111111111111, 11.1111111111111, 0, 
 11.1111111111111, 11.1111111111111), C4_calculated = c(0, 11.1111111111111, 
 11.1111111111111, 0, 11.1111111111111, 11.1111111111111), C5_calculated = c(0, 
 5.55555555555556, 5.55555555555556, 0, 5.55555555555556, 5.55555555555556
 ), C6_calculated = c(0, 3.7037037037037, 7.40740740740741, 0, 
 7.40740740740741, 7.40740740740741), C7_calculated = c(0, 5.55555555555556, 
 5.55555555555556, 0, 5.55555555555556, 5.55555555555556), C8_calculated = c(0, 
 9.72222222222222, 9.72222222222222, 0, 9.72222222222222, 9.72222222222222
 ), E1_calculated = c(0, 50, 50, 0, 50, 50), E2_calculated = c(0, 
 37.5, 37.5, 0, 37.5, 37.5)), row.names = c(NA, -6L), class = c("tbl_df", 
 "tbl", "data.frame"))

总结:

library(tidyverse)

test2 %>% 
  group_by(month, year) %>% 
  summarise_at(vars(starts_with("C")), mean)

输出

# A tibble: 4 x 10
# Groups:   month [2]
  month  year C1_calculated C2_calculated C3_calculated C4_calculated C5_calculated C6_calculated C7_calculated C8_calculated
  <dbl> <dbl>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
1     1  2020          0              0             0             0            0             0             0             0   
2     1  2021         11.1           11.1          11.1          11.1          5.56          5.56          5.56          9.72
3     2  2020          0              0             0             0            0             0             0             0   
4     2  2021          9.26          11.1          11.1          11.1          5.56          7.41          5.56          9.72

推荐阅读