How to sum data per month for a certain group?

Question

I have searched for similar questions but none really ask what I need to know.

My question is how to sum the values of one column per month within each group. My data set has 3 columns: Date, Province and Total reported infections. I need the total reported infections per month per province, and I am not quite sure how to do this.

I hope this makes sense.

Here is a random sample of my data set:

ds <- structure(list(Date_of_publication = c("2021-04-05", "2020-09-16", 
"2020-05-21", "2021-04-11", "2020-04-05", "2021-04-23", "2021-06-17", 
"2021-07-25", "2021-02-08", "2021-01-17", "2021-02-25", "2021-01-16", 
"2021-07-11", "2021-08-10", "2020-11-02", "2020-07-04", "2020-03-01", 
"2021-01-22", "2021-07-25", "2021-01-14"), Province = c("Noord-Brabant", 
"Limburg", "Flevoland", "Noord-Holland", "Noord-Holland", "Zuid-Holland", 
"Utrecht", "Friesland", "Drenthe", "Flevoland", "Noord-Holland", 
"Overijssel", "Zuid-Holland", "Zuid-Holland", "Utrecht", "Noord-Holland", 
"Overijssel", "Limburg", "Gelderland", "Noord-Brabant"), Total_reported = c(66L, 
3L, 0L, 26L, 1L, 16L, 0L, 18L, 6L, 15L, 24L, 19L, 1L, 8L, 0L, 
0L, 0L, 18L, 6L, 12L)), class = "data.frame", row.names = c(NA, 
-20L))

Tags: r

Solution


Use format to extract the month and year from the date, then sum Total_reported for each month and Province.
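To make the month key concrete, here is a minimal sketch on three dates taken from the sample data. The name `dates` is only for illustration. Note that the month abbreviations produced by '%b %Y' depend on the session locale, so '%Y-%m' is shown as a locale-independent alternative.

```r
# Illustrative only: three dates taken from the sample data
dates <- as.Date(c("2021-04-05", "2021-04-11", "2020-09-16"))

# '%Y-%m' gives a year-month key that does not depend on the locale
key <- format(dates, "%Y-%m")
key
# "2021-04" "2021-04" "2020-09"

# '%b %Y' gives an abbreviated month name plus the year (locale-dependent)
format(dates, "%b %Y")
```

Dates that fall in the same month map to the same key, which is what lets a grouped sum produce monthly totals.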

Using dplyr -

library(dplyr)

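ds %>%
  # Build a "month year" key from the date and group by it together with Province
  group_by(year_month = format(as.Date(Date_of_publication), '%b %Y'), Province) %>%
  # Add up the reported infections within each month/province group
  summarise(Total_reported = sum(Total_reported, na.rm = TRUE)) %>%
  ungroup()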

#  year_month Province      Total_reported
#   <chr>      <chr>                  <int>
# 1 Apr 2020   Noord-Holland              1
# 2 Apr 2021   Noord-Brabant             66
# 3 Apr 2021   Noord-Holland             26
# 4 Apr 2021   Zuid-Holland              16
# 5 Aug 2021   Zuid-Holland               8
# 6 Feb 2021   Drenthe                    6
# 7 Feb 2021   Noord-Holland             24
# 8 Jan 2021   Flevoland                 15
# 9 Jan 2021   Limburg                   18
#10 Jan 2021   Noord-Brabant             12
#11 Jan 2021   Overijssel                19
#12 Jul 2020   Noord-Holland              0
#13 Jul 2021   Friesland                 18
#14 Jul 2021   Gelderland                 6
#15 Jul 2021   Zuid-Holland               1
#16 Jun 2021   Utrecht                    0
#17 Mar 2020   Overijssel                 0
#18 May 2020   Flevoland                  0
#19 Nov 2020   Utrecht                    0
#20 Sep 2020   Limburg                    3

Or in base R -

aggregate(Total_reported ~ year_month + Province, 
      transform(ds, year_month = format(as.Date(Date_of_publication), '%b %Y')), 
      sum, na.rm = TRUE)
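One thing to be aware of: a '%b %Y' key is text, so results sort alphabetically by month name (Apr, Aug, Feb, ...) rather than by date. If chronological order matters, one possible variation is a '%Y-%m' key, which sorts the same way as the dates. The sketch below is an assumption about what may be wanted, not part of the original answer; `ds_small` and `monthly` are hypothetical names, and `ds_small` is a five-row subset of the sample data rebuilt so the block runs on its own.

```r
# Small subset of the sample data, rebuilt here so the block is self-contained
ds_small <- data.frame(
  Date_of_publication = c("2021-04-05", "2021-04-11", "2020-04-05",
                          "2021-02-25", "2020-07-04"),
  Province = c("Noord-Brabant", "Noord-Holland", "Noord-Holland",
               "Noord-Holland", "Noord-Holland"),
  Total_reported = c(66L, 26L, 1L, 24L, 0L)
)

# Sortable key: "YYYY-MM" strings order the same way as the dates themselves
ds_small$year_month <- format(as.Date(ds_small$Date_of_publication), "%Y-%m")

# Sum per month and province, as in the base R answer
monthly <- aggregate(Total_reported ~ year_month + Province, ds_small, sum)

# Order by province, then chronologically within each province
monthly <- monthly[order(monthly$Province, monthly$year_month), ]
monthly
```

The same key works in the dplyr version by swapping '%b %Y' for '%Y-%m' inside group_by.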
