首页 > 解决方案 > 基于 POSIXct 值组合两个数据集

问题描述

我正在努力将两个数据集相互结合。

Dataset1 包含一个“before time”和一个“after time”,以及一个“channel”。

Dataset2 仅包含一个“时间”和一个“通道”列。

我想用这个逻辑向 Dataset1 添加一个二进制列(是/否):如果 Dataset2 中有一行,其中通道 == 通道,并且时间在“之前”和“之后”时间之内,我想要有“是”。否则“否”。

数据1

ID   Channel   before_time   after_time 
1       A1  2019-09-02 20:13:00 2019-09-02 20:33:00
2       B1  2019-09-02 20:03:00 2019-09-02 20:23:00
3       C1  2019-09-02 20:23:00 2019-09-02 20:43:00
4       D1  2019-09-02 20:23:00 2019-09-02 20:43:00

数据2

ID_B     Channel_B    Time_B
Hallo       A1        2019-09-02 20:23:00
Hi          B2        2019-09-02 20:05:00
Hoi         C1        2019-09-02 22:23:00

期望的输出

ID   Channel   before_time   after_time                     Available
1       A1  2019-09-02 20:13:00 2019-09-02 20:33:00         Yes  # Channel == Channel, Time between before & after
2       B1  2019-09-02 20:03:00 2019-09-02 20:23:00          No  # Channel != Channel
3       C1  2019-09-02 20:23:00 2019-09-02 20:43:00          No  # Time is not between before & after
4       D1  2019-09-02 20:23:00 2019-09-02 20:43:00          No  # There is no matching data where channel is D1

期望的输出 2(评论解决方案)

从第二个数据集 (Data2) 添加额外的列。

ID   Channel   before_time   after_time                     Available   ID_B     
1       A1  2019-09-02 20:13:00 2019-09-02 20:33:00          Yes        Hallo       
2       B1  2019-09-02 20:03:00 2019-09-02 20:23:00          No         x 
3       C1  2019-09-02 20:23:00 2019-09-02 20:43:00          No         x
4       D1  2019-09-02 20:23:00 2019-09-02 20:43:00          No         x

可重现的示例(数据):

ID <- c("1", "2", "3", "4")
channel <- c("A1", "B1", "C1", "D1)
#startdate <- as.POSIXct(c("2019-09-02 20:23:00", "2019-09-02 20:13:00", "2019-09-02 20:33:00", "2019-09-02 20:33:00"))
before_time <- as.POSIXct(c("2019-09-02 20:13:00", "2019-09-02 20:03:00", "2019-09-02 20:23:00", "2019-09-02 20:23:00"))
after_time  <- as.POSIXct(c("2019-09-02 20:33:00", "2019-09-02 20:23:00", "2019-09-02 20:43:00","2019-09-02 20:43:00"))
data1 <- data.frame(ID, channel,   before_time, after_time)
View(data1)


ID_B <- c("Hallo", "Hi", "Hoi")
channel_B <- c("A1", "B2", "C1")
Time_B <- as.POSIXct(c("2019-09-02 20:23:00", "2019-09-02 20:05:00", "2019-09-02 22:23:00"))
data2 <- data.frame(ID_B, channel_B, Time_B)
View(data2)

标签: rjoindplyrdata.tableposixct

解决方案


正如 arg0naut91 所提到的,这里是非 equi 加入data.table

library(data.table)
setDT(data1)
setDT(data2)
data1[, c("Available", "ID_B") :=
        data2[.SD, on=.(channel_B=channel, Time_B>=before_time, Time_B<=after_time), 
            by=.EACHI, .(.N > 0, ID_B)][, (1L:3L) := NULL]
    ]

输出:

   ID channel         before_time          after_time Available  ID_B
1:  1      A1 2019-09-02 20:13:00 2019-09-02 20:33:00      TRUE Hallo
2:  2      B1 2019-09-02 20:03:00 2019-09-02 20:23:00     FALSE  <NA>
3:  3      C1 2019-09-02 20:23:00 2019-09-02 20:43:00     FALSE  <NA>
4:  4      D1 2019-09-02 20:23:00 2019-09-02 20:43:00     FALSE  <NA>

推荐阅读