Detect overlapping dates by group using R

Question

Given a dataset:


df1 <- structure(list(intervention = c("Self Isolation", "Lockdown Low", 
"Lockdown Low", "Self Isolation", "Social Distancing", "Lockdown Low", 
"Social Distancing", "Handwashing"), date_start = structure(c(17897, 
17957, 18444, 17987, 17897, 17532, 17942, 18018), class = "Date"), 
    date_end = structure(c(17956, 18262, 18475, 18017, 17956, 
    18053, 18017, 18048), class = "Date")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -8L))

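How can I check whether any "intervention" has overlapping dates? In this example, all interventions are fine except "Social Distancing" and "Lockdown Low".

Ideally, the output would be a data frame with one row per intervention and a column filled with TRUE/FALSE depending on whether that intervention has any overlap.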
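One way to make "has any overlap" precise: two intervals overlap when each one starts before the other ends. The base R sketch below checks every pair of rows within each intervention, which is quadratic per group but easy to verify by hand. The helpers intervals_overlap and any_pairwise_overlap are hypothetical names introduced only for illustration, and the strict < is an assumption: intervals that merely touch at an endpoint are not counted as overlapping.

```r
# Same data as in the question, as a plain data.frame.
df1 <- data.frame(
  intervention = c("Self Isolation", "Lockdown Low", "Lockdown Low", "Self Isolation",
                   "Social Distancing", "Lockdown Low", "Social Distancing", "Handwashing"),
  date_start = structure(c(17897, 17957, 18444, 17987, 17897, 17532, 17942, 18018), class = "Date"),
  date_end   = structure(c(17956, 18262, 18475, 18017, 17956, 18053, 18017, 18048), class = "Date"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: [s1, e1] and [s2, e2] overlap when each starts before the
# other ends. Strict "<" means intervals that only touch at an endpoint do not overlap.
intervals_overlap <- function(s1, e1, s2, e2) s1 < e2 & s2 < e1

# Hypothetical helper: TRUE if any pair of intervals overlaps (brute force, O(n^2)).
any_pairwise_overlap <- function(starts, ends) {
  n <- length(starts)
  if (n < 2) return(FALSE)   # a single interval cannot overlap anything
  pairs <- combn(n, 2)       # each column holds one pair of row indices
  i <- pairs[1, ]
  j <- pairs[2, ]
  any(intervals_overlap(starts[i], ends[i], starts[j], ends[j]))
}

# One row per intervention, in alphabetical order (split() sorts the groups).
groups <- split(df1, df1$intervention)
result <- data.frame(
  intervention = names(groups),
  overlapping = unname(vapply(groups,
                              function(g) any_pairwise_overlap(g$date_start, g$date_end),
                              logical(1))),
  stringsAsFactors = FALSE
)
result
```

On the sample data this should flag "Lockdown Low" and "Social Distancing" only, matching the expectation stated above.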


(Bonus points for a tidyverse solution.)

Tags: r, date

Solution


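We can do this with summarise. Once each intervention's intervals are sorted by start date, the intervention has an overlap exactly when some interval starts before the previous one ends, so comparing adjacent rows is enough: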

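library(dplyr)

df1 %>%
    # sort each intervention's intervals by start date
    arrange(intervention, date_start, date_end) %>%
    group_by(intervention) %>%
    # compare each start with the previous row's end; lag() returns NA for
    # the first row of each group, which na.rm = TRUE drops
    summarise(overlapping = any(date_start < lag(date_end), na.rm = TRUE))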
# A tibble: 4 x 2
#  intervention      overlapping
#  <chr>             <lgl>      
#1 Handwashing       FALSE      
#2 Lockdown Low      TRUE       
#3 Self Isolation    FALSE      
#4 Social Distancing TRUE       
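The summarise above reports a single TRUE/FALSE per intervention. If you also want to see which rows are involved, a sketch along the same lines compares each start with the latest end date among all earlier rows of its group, rather than only the previous row. It assumes every interval has date_start < date_end; the column names prev_max_end and overlaps_earlier are hypothetical. Only the later row of an overlapping pair gets flagged. If dates are inclusive and an interval ending on the day another starts should count as an overlap, use <= instead of <.

```r
library(dplyr)

# Same data as in the question.
df1 <- tibble(
  intervention = c("Self Isolation", "Lockdown Low", "Lockdown Low", "Self Isolation",
                   "Social Distancing", "Lockdown Low", "Social Distancing", "Handwashing"),
  date_start = structure(c(17897, 17957, 18444, 17987, 17897, 17532, 17942, 18018), class = "Date"),
  date_end   = structure(c(17956, 18262, 18475, 18017, 17956, 18053, 18017, 18048), class = "Date")
)

flagged <- df1 %>%
  arrange(intervention, date_start, date_end) %>%
  group_by(intervention) %>%
  mutate(
    # Latest end among all earlier rows of the group (NA for the first row).
    # cummax() is not defined for Date objects, so use the numeric day counts.
    prev_max_end = lag(cummax(as.numeric(date_end))),
    # TRUE when this row starts before some earlier interval has ended.
    overlaps_earlier = !is.na(prev_max_end) & as.numeric(date_start) < prev_max_end
  ) %>%
  ungroup()

# Collapse back to one TRUE/FALSE per intervention.
overlap_by_group <- flagged %>%
  group_by(intervention) %>%
  summarise(overlapping = any(overlaps_earlier))
```

Using the running maximum matters for the row-level flags (a long interval can cover several later ones), while for the per-group any() it gives the same answer as the adjacent-row comparison.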
