Aggregating daily-level data at the month level in R

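Problem description

I have a daily-level dataset, and I am trying to create a flag for rows whose monthly aggregate has no value.

For example, how would you code this so that contract_flag is evaluated as in the data below?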

        date  contract  budget_case  actual_case   contract_flag 
1  2017-01-01   F123         200        100              1     
2  2017-01-02   F123         200        100              1     
3  2017-01-03   F123         200        100              1     
4  2017-01-04   F123         200        100              1     
5  2017-01-05   F123         200        100              1     
6  2017-01-06   F123         200        100              1     
7  2017-01-29   F123         200        100              1     
8  2017-01-30   F123         200        100              1     
9  2017-01-01   K456         0          0                0     
10 2017-01-02   K456         0          0                0    
11 2017-01-03   K456         0          0                0     
12 2017-01-04   K456         0          0                0     
13 2017-01-05   K456         0          0                0   
14 2017-01-06   K456         0          0                0     
15 2017-01-29   K456         0          0                0     
16 2017-01-30   K456         0          0                0  

Tags: r

Solution

We can use format.Date to extract the year and month from date and use the result as the group_by variable:

library(dplyr)

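# Group rows by year-month, then flag every row in a month whose
# budget_case and actual_case totals are both positive.
df %>%
  group_by(Month = format.Date(date, format = "%Y-%m")) %>%
  mutate(contract_flag = ifelse(sum(budget_case) > 0 & sum(actual_case) > 0, 1, 0))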

Or with case_when:

# Same grouping and condition; case_when() supplies the 1/0 values.
df %>%
  group_by(Month = format.Date(date, format = "%Y-%m")) %>%
  mutate(contract_flag = case_when(sum(budget_case) > 0 & sum(actual_case) > 0 ~ 1, TRUE ~ 0))

Result (groups are defined by month only, so all contracts in the same month share one flag):

# A tibble: 16 x 6
# Groups:   Month [3]
   date       contract budget_case actual_case contract_flag Month  
   <chr>      <chr>          <int>       <int>         <dbl> <chr>  
 1 2017-01-01 F123             200         100             1 2017-01
 2 2017-01-02 F123             200         100             1 2017-01
 3 2017-01-03 F123             200         100             1 2017-01
 4 2017-01-04 F123             200         100             1 2017-01
 5 2017-01-05 F123             200         100             1 2017-01
 6 2017-01-06 F123             200         100             1 2017-01
 7 2017-01-29 F123             200         100             1 2017-01
 8 2017-01-30 F123             200         100             1 2017-01
 9 2017-01-01 K456               0           0             1 2017-01
10 2017-01-02 K456               0           0             1 2017-01
11 2017-01-03 K456               0           0             1 2017-01
12 2017-01-04 K456               0           0             1 2017-01
13 2017-01-05 K456               0           0             1 2017-01
14 2017-02-06 K456               0         100             1 2017-02
15 2017-02-28 K456             200           0             1 2017-02
16 2017-03-30 K456               0           0             0 2017-03
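
In the result above, the K456 rows for 2017-01 are flagged 1 because the groups are defined by month alone, so F123's January totals count toward them. If the flag is meant to be evaluated per contract within each month, as the expected contract_flag column in the question suggests, one possible sketch groups by contract as well. The explicit as.Date() conversion, the name flagged, and the compact data.frame() rebuild of the sample data are assumptions made here for illustration; they are not part of the original answer.

```r
library(dplyr)

# Assumption: the same 16 rows as the sample data in this answer,
# rebuilt compactly so the sketch runs on its own.
df <- data.frame(
  date = c(sprintf("2017-01-%02d", c(1:6, 29L, 30L)),
           sprintf("2017-01-%02d", 1:5),
           "2017-02-06", "2017-02-28", "2017-03-30"),
  contract = rep(c("F123", "K456"), each = 8),
  budget_case = c(rep(200L, 8), rep(0L, 6), 200L, 0L),
  actual_case = c(rep(100L, 8), rep(0L, 5), 100L, 0L, 0L),
  stringsAsFactors = FALSE
)

flagged <- df %>%
  # Build the year-month key from the character date column.
  mutate(Month = format(as.Date(date), "%Y-%m")) %>%
  # Group by contract AND month so contracts do not share totals.
  group_by(contract, Month) %>%
  # 1 when both monthly totals are positive for this contract, else 0.
  mutate(contract_flag = as.integer(sum(budget_case) > 0 & sum(actual_case) > 0)) %>%
  ungroup()

print(as.data.frame(flagged))
```

With this grouping, the K456 rows come out as 0 for 2017-01, 1 for 2017-02, and 0 for 2017-03, while all F123 rows stay at 1.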

Data (date column modified to include more months as examples):

df <- structure(list(date = c("2017-01-01", "2017-01-02", "2017-01-03", 
"2017-01-04", "2017-01-05", "2017-01-06", "2017-01-29", "2017-01-30", 
"2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05", 
"2017-02-06", "2017-02-28", "2017-03-30"), contract = c("F123", 
"F123", "F123", "F123", "F123", "F123", "F123", "F123", "K456", 
"K456", "K456", "K456", "K456", "K456", "K456", "K456"), budget_case = c(200L, 
200L, 200L, 200L, 200L, 200L, 200L, 200L, 0L, 0L, 0L, 0L, 0L, 
0L, 200L, 0L), actual_case = c(100L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L, 0L, 0L, 0L, 0L, 0L, 100L, 0L, 0L), contract_flag = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("date", 
"contract", "budget_case", "actual_case", "contract_flag"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16"))
