Identify the most recent date before another date from another column, by group, using dplyr?

Problem description

I have a data frame structured as follows:

example <- data.frame(id = c(1,1,1,2,2,2,3,3,3),
                         delivereddate = c("7/20/2019","7/24/2019","7/28/2019","3/24/2019","4/13/2019","4/25/2019","11/13/2019","11/20/2019","11/27/2019"),
                         applieddate = c("7/22/2019","7/22/2019","7/22/2019",NA,NA,NA,"11/21/2019","11/21/2019","11/21/2019"))

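I am trying to add a column that identifies, for each id, the most recent delivered date before the applied date. An example of the end result I am trying to get is below: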

desiredresult <- data.frame(id = c(1,1,1,2,2,2,3,3,3),
                            delivereddate = c("7/20/2019","7/24/2019","7/28/2019","3/24/2019","4/13/2019","4/25/2019","11/13/2019","11/20/2019","11/27/2019"),
                            applieddate = c("7/22/2019","7/22/2019","7/22/2019",NA,NA,NA,"11/21/2019","11/21/2019","11/21/2019"),
                            applied = c(1,0,0,0,0,0,0,1,0))

The applied column needs to be binary (0 or 1), and each id can have only one row flagged with 1. If an id has no applied date, the applied flag is 0 for all of its rows.
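To make the rule concrete, here is a minimal base R sketch for a single id. The helper name `flag_latest_before` is hypothetical (it is not part of dplyr or of the question), and it assumes that "before" means strictly earlier; swap `<` for `<=` if a delivery on the applied date itself should count.

```r
# Hypothetical helper: flag the latest delivery strictly before the applied date.
# `delivered` is a Date vector for one id; `applied_on` is a single Date (may be NA).
flag_latest_before <- function(delivered, applied_on) {
  flag <- integer(length(delivered))         # start with all zeros
  earlier <- which(delivered < applied_on)   # which() drops NA comparisons
  if (length(earlier) > 0) {
    # among the earlier deliveries, pick the position of the latest one
    latest <- earlier[which.max(as.numeric(delivered[earlier]))]
    flag[latest] <- 1L
  }
  flag
}

# The dates are stored as "month/day/year" strings, so parse them first.
delivered_id3 <- as.Date(c("11/13/2019", "11/20/2019", "11/27/2019"),
                         format = "%m/%d/%Y")
applied_id3 <- as.Date("11/21/2019", format = "%m/%d/%Y")

flag_latest_before(delivered_id3, applied_id3)   # 0 1 0
flag_latest_before(delivered_id3, as.Date(NA))   # 0 0 0
```

Because it compares positions with `which()` rather than relying on row order, this sketch does not require the delivered dates to be sorted.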

Tags: r, dplyr

Solution


We can use findInterval:

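library(dplyr)
library(lubridate)

# Within each id, findInterval() counts the delivered dates that fall on or
# before the applied date; that count is also the row position of the latest
# such delivery (this relies on delivereddate being sorted ascending per id).
# An NA applied date gives NA and a too-early one gives 0, so no row matches.
example %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(applied = +(row_number() %in%
                              findInterval(lubridate::mdy(first(applieddate)),
                                           lubridate::mdy(delivereddate))))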
# A tibble: 9 x 4
# Groups:   id [3]
#     id delivereddate applieddate applied
#  <dbl> <chr>         <chr>         <int>
#1     1 7/20/2019     7/22/2019         1
#2     1 7/24/2019     7/22/2019         0
#3     1 7/28/2019     7/22/2019         0
#4     2 3/24/2019     <NA>              0
#5     2 4/13/2019     <NA>              0
#6     2 4/25/2019     <NA>              0
#7     3 11/13/2019    11/21/2019        0
#8     3 11/20/2019    11/21/2019        1
#9     3 11/27/2019    11/21/2019        0
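
To see why this works, here is a small base R sketch of how `findInterval` behaves on one group's dates. The variable names are illustrative only, and the dates are written in ISO format so that `as.Date` can parse them without lubridate.

```r
# Delivered dates for one id, sorted ascending (findInterval() requires this).
delivered <- as.Date(c("2019-11-13", "2019-11-20", "2019-11-27"))

# Applied date falls between the 2nd and 3rd delivery: position 2.
pos_between <- findInterval(as.Date("2019-11-21"), delivered)    # 2

# Applied date is earlier than every delivery: position 0, so no row is flagged.
pos_too_early <- findInterval(as.Date("2019-11-01"), delivered)  # 0

# A delivery on the applied date itself counts by default (vec[i] <= x).
pos_same_day <- findInterval(as.Date("2019-11-20"), delivered)   # 2

# left.open = TRUE makes the comparison strict, so a same-day delivery is skipped.
pos_strict <- findInterval(as.Date("2019-11-20"), delivered,
                           left.open = TRUE)                     # 1

# A missing applied date gives NA, which matches no row number.
pos_missing <- findInterval(as.Date(NA), delivered)              # NA

# Turning a position into the 0/1 flag, as the mutate() call does per group.
flags <- +(seq_along(delivered) %in% pos_between)                # 0 1 0
```

If the rows are not already in delivery order within each id, sorting them first (for example with `dplyr::arrange(id, lubridate::mdy(delivereddate))`) keeps the row number and the `findInterval` position aligned.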
