r - mutate: select all rows except the current row in a grouped df (dplyr)
Problem description
I want to flag overlaps: players on the same team whose recovery intervals overlap.
Here is my code:
library(tidyverse)
library(lubridate)

# data
df <- data.frame(times = c("01/01/2020", "01/02/2020", "01/07/2020", "01/10/2020",
                           "01/01/2020", "01/03/2020", "01/05/2020", "01/07/2020"))
starttime <- mdy(df$times)
df <- tibble(team      = c("A", "A", "A", "A", "B", "B", "B", "B"),
             player    = c(1, 2, 3, 4, 1, 2, 3, 4),
             starttime = starttime,
             stoptime  = starttime + days(1))

# find overlaps
df %>%
  mutate(interval = lubridate::interval(starttime, stoptime)) %>% # calculate interval
  group_by(team) %>%
  mutate(overlap_flag = case_when(
    sum(starttime %within% as.list(interval)) == 0 ~ 0, # I want to choose as.list(interval[except current row])
    sum(starttime %within% as.list(interval)) > 0 ~ 1,  # I want to choose as.list(interval[except current row])
    TRUE ~ NA_real_))
This gives me:
# A tibble: 8 x 6
# Groups:   team [2]
  team  player starttime  stoptime   interval                       overlap_flag
  <chr>  <dbl> <date>     <date>     <Interval>                            <dbl>
1 A          1 2020-01-01 2020-01-02 2020-01-01 UTC--2020-01-02 UTC            1
2 A          2 2020-01-02 2020-01-03 2020-01-02 UTC--2020-01-03 UTC            1
3 A          3 2020-01-07 2020-01-08 2020-01-07 UTC--2020-01-08 UTC            1
4 A          4 2020-01-10 2020-01-11 2020-01-10 UTC--2020-01-11 UTC            1
5 B          1 2020-01-01 2020-01-02 2020-01-01 UTC--2020-01-02 UTC            1
6 B          2 2020-01-03 2020-01-04 2020-01-03 UTC--2020-01-04 UTC            1
7 B          3 2020-01-05 2020-01-06 2020-01-05 UTC--2020-01-06 UTC            1
8 B          4 2020-01-07 2020-01-08 2020-01-07 UTC--2020-01-08 UTC            1
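Every row comes back as 1 because `starttime %within% as.list(interval)` also tests each start time against the row's own interval, which always contains it. The fix is to drop the current row before comparing. A minimal base-R sketch of that leave-one-out idea, assuming plain `Date` columns and closed intervals `[start, stop]`; the helper name `flag_start_in_others` is hypothetical, and it implements the one-directional test described in the code comments (own start inside another player's interval):

```r
# Hypothetical helper: for each row i, does start[i] fall inside any *other*
# row's closed interval [start[j], stop[j]]?
flag_start_in_others <- function(start, stop) {
  vapply(seq_along(start), function(i) {
    # start[-i] / stop[-i] drop the current row before comparing
    any(start[i] >= start[-i] & start[i] <= stop[-i])
  }, logical(1))
}

start <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-07", "2020-01-10",
                   "2020-01-01", "2020-01-03", "2020-01-05", "2020-01-07"))
stop  <- start + 1
team  <- c("A", "A", "A", "A", "B", "B", "B", "B")

# Including the current row: every start lies in its own interval -> all TRUE
with_self <- vapply(seq_along(start), function(i) {
  any(start[i] >= start & start[i] <= stop)
}, logical(1))

# Excluding the current row, applied within each team
flags <- integer(length(start))
for (idx in split(seq_along(start), team)) {
  flags[idx] <- as.integer(flag_start_in_others(start[idx], stop[idx]))
}
flags
```

With this one-directional rule only player 2 of team A is flagged (its start, 2020-01-02, lies in player 1's interval), while player 1 is not, so by itself it does not reproduce the table below, which flags both players.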
What I want is:
# A tibble: 8 x 6
# Groups:   team [2]
  team  player starttime  stoptime   interval                       overlap_flag
  <chr>  <dbl> <date>     <date>     <Interval>                            <dbl>
1 A          1 2020-01-01 2020-01-02 2020-01-01 UTC--2020-01-02 UTC            1
2 A          2 2020-01-02 2020-01-03 2020-01-02 UTC--2020-01-03 UTC            1
3 A          3 2020-01-07 2020-01-08 2020-01-07 UTC--2020-01-08 UTC            0
4 A          4 2020-01-10 2020-01-11 2020-01-10 UTC--2020-01-11 UTC            0
5 B          1 2020-01-01 2020-01-02 2020-01-01 UTC--2020-01-02 UTC            0
6 B          2 2020-01-03 2020-01-04 2020-01-03 UTC--2020-01-04 UTC            0
7 B          3 2020-01-05 2020-01-06 2020-01-05 UTC--2020-01-06 UTC            0
8 B          4 2020-01-07 2020-01-08 2020-01-07 UTC--2020-01-08 UTC            0
I know there is probably a data.table solution, but I would like to know whether this can be done easily with dplyr.
Solution
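We can use row_number() to loop over the rows and use it as an index to drop the current row's "starttime" value; the remaining start times in the team are then tested against the current row's interval: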
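library(dplyr)
library(lubridate)
library(purrr)

df %>%
  # store each row's interval as a list element
  mutate(interval = as.list(interval(starttime, stoptime))) %>%
  group_by(team) %>%
  # row .x: does any *other* player's starttime (starttime[-.x]) fall
  # inside this row's interval (.y)? The leading + turns TRUE/FALSE into 1/0
  mutate(overlap_flag = +(map2_lgl(row_number(), interval,
                                   ~ sum(starttime[-.x] %within% .y) > 0)))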
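The test above is one-directional: it asks whether another player's start time falls inside the current row's interval. Player 2 of team A starts exactly where player 1's interval ends, and no other start time falls inside player 2's own interval, so by that logic player 2 would get 0, whereas the desired table flags both. A symmetric variant (a sketch, not the answer's code) keeps the same `row_number()` / `[-.x]` pattern but compares the current interval with every other interval in the team using `lubridate::int_overlaps()`, which treats a shared endpoint as an overlap. It skips the `interval` column for brevity; `res` is just an illustrative name:

```r
library(dplyr)
library(lubridate)
library(purrr)

starttime <- mdy(c("01/01/2020", "01/02/2020", "01/07/2020", "01/10/2020",
                   "01/01/2020", "01/03/2020", "01/05/2020", "01/07/2020"))
df <- tibble(team      = c("A", "A", "A", "A", "B", "B", "B", "B"),
             player    = c(1, 2, 3, 4, 1, 2, 3, 4),
             starttime = starttime,
             stoptime  = starttime + days(1))

res <- df %>%
  group_by(team) %>%
  mutate(overlap_flag = as.numeric(map_lgl(
    row_number(),
    # interval of row .x vs. the intervals of all other rows in the team;
    # int_overlaps() recycles the single interval against the others
    ~ any(int_overlaps(lubridate::interval(starttime[.x], stoptime[.x]),
                       lubridate::interval(starttime[-.x], stoptime[-.x])))
  ))) %>%
  ungroup()

res$overlap_flag
```

Because overlap is checked in both directions, players 1 and 2 of team A are flagged together, and all other players get 0, matching the desired output.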