首页 > 解决方案 > 如何过滤时间窗口内最近发生的事件?

问题描述

我有一个包含时间、ID、两种事件类型(A 和 B)和(当前)空白共现列的数据流。我想浏览数据集,对于每个 B 事件,检查前 5 秒内是否有 A。如果是这样,则该 A 事件行将在其共现列中接收来自 B 事件的 ID。在极少数情况下存在多个,第二个共现添加到第二列(或者两者都可以进入同一列以便稍后处理)。

我可以使用循环和一些逻辑来实现大部分所需的结果,但是有时在 A 的 5 秒内出现多个 B,或者在 B 之前的 5 秒内出现多个 As,因此使用当前行 -1不捕捉这些。

示例数据流如下所示:

Time     ID  Event Co1 Co2
7:47:28  X1  A
7:47:30  X2  B
7:48:02  X3  A
7:48:04  X4  A
7:48:05  X5  B
7:50:11  X1  A
7:50:12  X2  B
7:50:15  X5  B
7:55:50  X6  A
7:55:52  X2  B

并且通过正确的处理应该产生这个:

Time     ID  Event Co1 Co2
7:47:28  X1  A     X2
7:47:30  X2  B
7:48:02  X3  A     X5
7:48:04  X4  A     X5
7:48:05  X5  B
7:50:11  X1  A     X2  X5
7:50:12  X2  B
7:50:15  X5  B
7:55:50  X6  A     X2
7:55:52  X2  B

任何正确方向的帮助或指示将不胜感激!

标签: rdatabasefiltering

解决方案


这是包中具有foverlaps功能的解决方案data.table

library(data.table)
dt <- read.table(text = "Time ID Event
07:47:28 X1 A
07:47:30 X2 B
07:48:02 X3 A
07:48:04 X4 A
07:48:05 X5 B
07:50:11 X6 A
07:50:12 X7 B
07:50:15 X8 B
07:55:50 X9 A
07:55:52 X10 B", header = TRUE, sep = " ", stringsAsFactors = FALSE)


# Use data.table
setDT(dt)


# Join dataset to self over the 5 second lookback period
dt[, time := as.ITime(Time)]
dt[, time.lookback := time - as.ITime("00:00:05")]
setkey(dt, time.lookback, time)
dt.join <- foverlaps(dt, dt)
dt.join <- dt.join[order(ID)]

# You should be able to simplify this part a lot:
dt.join <- dt.join[(Event == i.Event & time == i.time) | (Event == "A" & i.Event == "B" & time < i.time)]
setorder(dt.join, ID, Event, -i.Event, i.time)
dt.join[i.Event == "A", i.ID := NA]
dt.join[i.Event == "A", i.Event := NA]
dt.join[i.Event == "B" & time == i.time, i.ID := NA]
dt.join[i.Event == "B" & time == i.time, i.Event := NA]
dt.join[, rn := cumsum(i.Event == "B"), .(ID, Event)]

# Now brining the dataset back to original granularity:
res <- dcast(
  dt.join, 
  formula = ID + Event ~ paste0("col", rn), 
  value.var = "i.ID"
)
res$colNA <- NULL
res
#     ID Event col1 col2
# 1:  X1     A   X2 <NA>
# 2: X10     B <NA> <NA>
# 3:  X2     B <NA> <NA>
# 4:  X3     A   X5 <NA>
# 5:  X4     A   X5 <NA>
# 6:  X5     B <NA> <NA>
# 7:  X6     A   X7   X8
# 8:  X7     B <NA> <NA>
# 9:  X8     B <NA> <NA>
# 10:  X9     A  X10 <NA>

推荐阅读