首页 > 解决方案 > 查找组中个人的开始时间和结束时间之间的重叠时间

问题描述

我有

     household       person     start time   end time
          1           1          07:45:00    21:45:00
          1           2          09:45:00    17:45:00
          1           3          22:45:00    23:45:00
          1           4          08:45:00    01:45:00
          1           1          06:45:00    19:45:00
          2           1          07:45:00    21:45:00
          2           2          016:45:00   22:45:00

我想找一列来查找家庭成员之间的重叠时间。

我需要该列作为一个或多个与另一个人有/有时间交集的人的索引。

在上面的例子中,第一家庭,第一,第二和第四个人的时间有交集。

输出:

      household       person     start time   end time      overlap
          1           1          07:45:00    21:45:00           2,4
          1           2          09:45:00    17:45:00           1,4
          1           3          22:45:00    23:45:00            NA
          1           4          08:45:00    01:45:00           1,2
          1           1          18:45:00    19:45:00            NA     
          2           1          07:45:00    21:45:00            2
          2           2          016:45:00   22:45:00            1

NA 表示与其他家庭成员没有交集,可以为 0 或其他

标签: rdataframe

解决方案


左加入输入DF到自身加入同一家庭中的其他人并在重叠条件下。然后逐行将匹配的人连接成逗号分隔的字符串。

在没有解释什么构成重叠的情况下,我们尝试了三种不同的重叠定义。第三个最接近问题中显示的输出。

  1. 如果end_time < start_time那么之前end_time和之后的所有内容start_time都在要检查重叠的间隔内。然后根据连接的左右手边是否满足重叠条件,将重叠条件分解为4种情况。

  2. 如果start_time > end_time在左侧或右侧,那么我们认为两者不重叠

  3. 如果 end_time > start_time 然后反转它们并像以前一样执行重叠。

重叠的第一个重叠定义

library(sqldf)

sqldf("select a.*, group_concat(distinct b.person) as overlap
  from DF a
  left join DF b 
    on a.household = b.household and 
       a.person != b.person and
       (case 
          when a.start_time <= a.end_time and b.start_time <= b.end_time then 
               (a.start_time between b.start_time and b.end_time or
               b.start_time between a.start_time and a.end_time)
          when a.start_time <= a.end_time and b.start_time > b.end_time then
               not (a.start_time between b.end_time and b.start_time and
               a.end_time between b.end_time and b.start_time)
          when a.start_time > a.end_time and b.start_time <= b.end_time then
               not (b.start_time between a.end_time and a.start_time and
               b.end_time between a.end_time and a.start_time)
          else 1 end)
  group by a.rowid")

给予:

  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00       2
2         1      2   09:45:00 17:45:00     1,4
3         1      3   22:45:00 23:45:00       4
4         1      4   08:45:00 01:45:00     2,3
5         1      1   06:45:00 19:45:00       2
6         2      1   07:45:00 21:45:00       2
7         2      2  016:45:00 22:45:00       1

重叠的第二个重叠定义

library(sqldf)

sqldf("select a.*, group_concat(distinct b.person) as overlap
  from DF a
  left join DF b 
    on a.household = b.household and 
       a.person != b.person and              
       (case
          when a.start_time <= a.end_time and b.start_time <= b.end_time then
               (a.start_time between b.start_time and b.end_time or
               b.start_time between a.start_time and a.end_time)
          else 0 end)
  group by a.rowid")

给予:

  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00       2
2         1      2   09:45:00 17:45:00       1
3         1      3   22:45:00 23:45:00    <NA>
4         1      4   08:45:00 01:45:00    <NA>
5         1      1   06:45:00 19:45:00       2
6         2      1   07:45:00 21:45:00       2
7         2      2  016:45:00 22:45:00       1

重叠的第三个定义

sqldf("with DF2(rowid, household, person, start_time, end_time, st, en) as (
  select rowid, *, 
    min(start_time, end_time) as st,
    max(start_time, end_time) as en
  from DF)

  select a.household, a.person, a.start_time, a.end_time, 
      group_concat(distinct b.person) as overlap
    from DF2 a
    left join DF2 b 
      on a.household = b.household and 
         a.person != b.person and                  
         (a.st between b.st and b.en or
          b.st between a.st and a.en)
    group by a.rowid")

给予:

  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00     2,4
2         1      2   09:45:00 17:45:00       1
3         1      3   22:45:00 23:45:00    <NA>
4         1      4   08:45:00 01:45:00       1
5         1      1   06:45:00 19:45:00     2,4
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1

笔记

我们假设可DF重现形式的输入是:

DF <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), person = c(1L, 
2L, 3L, 4L, 1L, 1L, 2L), start_time = c("07:45:00", "09:45:00", 
"22:45:00", "08:45:00", "06:45:00", "07:45:00", "16:45:00"), 
    end_time = c("21:45:00", "17:45:00", "23:45:00", "01:45:00", 
    "19:45:00", "21:45:00", "22:45:00")), class = "data.frame", row.names = c(NA, 
-7L))

推荐阅读