r - 查找组中个人的开始时间和结束时间之间的重叠时间
问题描述
我有
household person start time end time
1 1 07:45:00 21:45:00
1 2 09:45:00 17:45:00
1 3 22:45:00 23:45:00
1 4 08:45:00 01:45:00
1 1 06:45:00 19:45:00
2 1 07:45:00 21:45:00
2 2 016:45:00 22:45:00
我想找一列来查找家庭成员之间的重叠时间。
我需要该列作为一个或多个与另一个人有/有时间交集的人的索引。
在上面的例子中,第一家庭,第一,第二和第四个人的时间有交集。
输出:
household person start time end time overlap
1 1 07:45:00 21:45:00 2,4
1 2 09:45:00 17:45:00 1,4
1 3 22:45:00 23:45:00 NA
1 4 08:45:00 01:45:00 1,2
1 1 18:45:00 19:45:00 NA
2 1 07:45:00 21:45:00 2
2 2 016:45:00 22:45:00 1
NA 表示与其他家庭成员没有交集,可以为 0 或其他
解决方案
左加入输入DF
到自身加入同一家庭中的其他人并在重叠条件下。然后逐行将匹配的人连接成逗号分隔的字符串。
在没有解释什么构成重叠的情况下,我们尝试了三种不同的重叠定义。第三个最接近问题中显示的输出。
如果
end_time < start_time
那么之前end_time
和之后的所有内容start_time
都在要检查重叠的间隔内。然后根据连接的左右手边是否满足重叠条件,将重叠条件分解为4种情况。如果
start_time > end_time
在左侧或右侧,那么我们认为两者不重叠如果 end_time > start_time 然后反转它们并像以前一样执行重叠。
重叠的第一个重叠定义
library(sqldf)
sqldf("select a.*, group_concat(distinct b.person) as overlap
from DF a
left join DF b
on a.household = b.household and
a.person != b.person and
(case
when a.start_time <= a.end_time and b.start_time <= b.end_time then
(a.start_time between b.start_time and b.end_time or
b.start_time between a.start_time and a.end_time)
when a.start_time <= a.end_time and b.start_time > b.end_time then
not (a.start_time between b.end_time and b.start_time and
a.end_time between b.end_time and b.start_time)
when a.start_time > a.end_time and b.start_time <= b.end_time then
not (b.start_time between a.end_time and a.start_time and
b.end_time between a.end_time and a.start_time)
else 1 end)
group by a.rowid")
给予:
household person start_time end_time overlap
1 1 1 07:45:00 21:45:00 2
2 1 2 09:45:00 17:45:00 1,4
3 1 3 22:45:00 23:45:00 4
4 1 4 08:45:00 01:45:00 2,3
5 1 1 06:45:00 19:45:00 2
6 2 1 07:45:00 21:45:00 2
7 2 2 016:45:00 22:45:00 1
重叠的第二个重叠定义
library(sqldf)
sqldf("select a.*, group_concat(distinct b.person) as overlap
from DF a
left join DF b
on a.household = b.household and
a.person != b.person and
(case
when a.start_time <= a.end_time and b.start_time <= b.end_time then
(a.start_time between b.start_time and b.end_time or
b.start_time between a.start_time and a.end_time)
else 0 end)
group by a.rowid")
给予:
household person start_time end_time overlap
1 1 1 07:45:00 21:45:00 2
2 1 2 09:45:00 17:45:00 1
3 1 3 22:45:00 23:45:00 <NA>
4 1 4 08:45:00 01:45:00 <NA>
5 1 1 06:45:00 19:45:00 2
6 2 1 07:45:00 21:45:00 2
7 2 2 016:45:00 22:45:00 1
重叠的第三个定义
sqldf("with DF2(rowid, household, person, start_time, end_time, st, en) as (
select rowid, *,
min(start_time, end_time) as st,
max(start_time, end_time) as en
from DF)
select a.household, a.person, a.start_time, a.end_time,
group_concat(distinct b.person) as overlap
from DF2 a
left join DF2 b
on a.household = b.household and
a.person != b.person and
(a.st between b.st and b.en or
b.st between a.st and a.en)
group by a.rowid")
给予:
household person start_time end_time overlap
1 1 1 07:45:00 21:45:00 2,4
2 1 2 09:45:00 17:45:00 1
3 1 3 22:45:00 23:45:00 <NA>
4 1 4 08:45:00 01:45:00 1
5 1 1 06:45:00 19:45:00 2,4
6 2 1 07:45:00 21:45:00 2
7 2 2 16:45:00 22:45:00 1
笔记
我们假设可DF
重现形式的输入是:
DF <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), person = c(1L,
2L, 3L, 4L, 1L, 1L, 2L), start_time = c("07:45:00", "09:45:00",
"22:45:00", "08:45:00", "06:45:00", "07:45:00", "16:45:00"),
end_time = c("21:45:00", "17:45:00", "23:45:00", "01:45:00",
"19:45:00", "21:45:00", "22:45:00")), class = "data.frame", row.names = c(NA,
-7L))
推荐阅读
- amazon-web-services - Agora 云录制不保存到 S3 并返回 404
- gcc - 疯狂寻找错误的路径
- python - Pandas pivot_table:过滤聚合函数
- javascript - 我无法使用 aws education aacount 访问 appsync 服务
- java - 为什么弹簧安全上下文不保留经过身份验证的自定义用户?
- cassandra - 为什么测试 node.js 应用程序比在 Astra CQL 控制台中运行查询慢?
- ag-grid - Ag-Grid 自动将超过 20 位的值转换为指数 e+ - 15 之后如何限制?
- asp.net-mvc - 外键未出现在视图中
- mysql - 聚合分组中的值
- jenkins - Jenkinsfile:要么转义文字美元符号“\$5”,要么将值表达式“${5}”括起来