Check whether any date in a group falls within any of that group's time intervals in R

Problem description

I want to create a new variable indicating whether visit_date falls within any of the date ranges listed for that id.

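I have used the code below to compare row by row, but I would like to extend it so that every row for an id is compared against every interval row listed for that id.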

df <- df %>%
  group_by(id) %>%
  mutate(between_any = ifelse(visit_date >= start & visit_date <= end, 1, 0))

I also tried creating an interval variable before the mutate and using crossing(visit_date, interval), but I could not get crossing to work with date objects.

Here is some sample data:

df <- data.frame(id = c("a","a","a","a","a","b","b","b"),
                 visit_date = c("2001-08-22","2001-09-21","2001-10-30","2001-11-10","2001-12-20","2002-12-22", "2003-04-30","2003-05-10"),
                 start = c(NA,"2001-09-21",NA,"2001-11-10",NA,"2002-12-22", "2003-04-30",NA),
                 end = c(NA, "2001-11-01",NA,"2001-11-10",NA,"2002-12-22","2003-06-01",NA))

> df
id visit_date    start        end
a 2001-08-22       <NA>       <NA>
a 2001-09-21 2001-09-21 2001-11-01
a 2001-10-30       <NA>       <NA>
a 2001-11-10 2001-11-10 2001-11-10
a 2001-12-20       <NA>       <NA>
b 2002-12-22 2002-12-22 2002-12-22
b 2003-04-30 2003-04-30 2003-06-01
b 2003-05-10       <NA>       <NA>

My desired output is as follows:

id visit_date      start       end   between_any
a 2001-08-22       <NA>       <NA>      0
a 2001-09-21 2001-09-21 2001-11-01      1
a 2001-10-30       <NA>       <NA>      1
a 2001-11-10 2001-11-10 2001-11-10      1
a 2001-12-20       <NA>       <NA>      0
b 2002-12-22 2002-12-22 2002-12-22      1
b 2003-04-30 2003-04-30 2003-06-01      1
b 2003-05-10       <NA>       <NA>      1

Thanks in advance!

Tags: r, dplyr, lubridate

Solution


The inrange function from the data.table package does exactly this...

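library(dplyr)       # for %>%, group_by() and mutate()
library(data.table)  # for inrange()

# flag each visit_date that falls inside any [start, end] interval of its id
df <- df %>%
  group_by(id) %>%
  mutate(between_any = as.numeric(inrange(visit_date, start, end)))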

#> df
#  id visit_date      start        end between_any
#1  a 2001-08-22       <NA>       <NA>           0
#2  a 2001-09-21 2001-09-21 2001-11-01           1
#3  a 2001-10-30       <NA>       <NA>           1
#4  a 2001-11-10 2001-11-10 2001-11-10           1
#5  a 2001-12-20       <NA>       <NA>           0
#6  b 2002-12-22 2002-12-22 2002-12-22           1
#7  b 2003-04-30 2003-04-30 2003-06-01           1
#8  b 2003-05-10       <NA>       <NA>           1

In data.table form...

dt <- setDT(df)      
dt[, between_any := inrange(visit_date, start, end), 
     by = id]
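
For readers who want to see what the per-group comparison amounts to without data.table, here is a minimal base R sketch. The helper name in_any_interval is hypothetical (it does not come from any package), and the sketch assumes the three date columns are converted to Date first. Rows whose start or end is NA are treated as contributing no interval.

```r
# Sample data from the question, with the date columns converted to Date.
df <- data.frame(
  id = c("a", "a", "a", "a", "a", "b", "b", "b"),
  visit_date = as.Date(c("2001-08-22", "2001-09-21", "2001-10-30", "2001-11-10",
                         "2001-12-20", "2002-12-22", "2003-04-30", "2003-05-10")),
  start = as.Date(c(NA, "2001-09-21", NA, "2001-11-10",
                    NA, "2002-12-22", "2003-04-30", NA)),
  end = as.Date(c(NA, "2001-11-01", NA, "2001-11-10",
                  NA, "2002-12-22", "2003-06-01", NA))
)

# Hypothetical helper: for each element of x, is it inside ANY closed
# interval [lower[j], upper[j]]? Intervals with an NA bound never match.
in_any_interval <- function(x, lower, upper) {
  vapply(seq_along(x), function(i) {
    any(x[i] >= lower & x[i] <= upper, na.rm = TRUE)
  }, logical(1))
}

# Apply the helper within each id, then put the results back in row order.
flags <- lapply(split(df, df$id), function(g) {
  as.integer(in_any_interval(g$visit_date, g$start, g$end))
})
df$between_any <- unsplit(flags, df$id)

df$between_any
# 0 1 1 1 0 1 1 1
```

The same helper could also be used inside the dplyr pipeline after group_by(id), as mutate(between_any = as.integer(in_any_interval(visit_date, start, end))). It compares every visit with every interval in the group, so the work grows with the square of the group size, which is usually fine for small groups.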
