首页 > 解决方案 > R:使用 sqldf 根据 ID 和时间连接两个数据帧

问题描述

我正在尝试加入两个数据框。第一个我确实有一个StartSession时间变量Start(当df1 介于df2和df2 之间时,End我想根据每个参与者的 ID 加入两个数据帧。不是 df1 中的每个案例都有 df2 中的数据。StartSessionStartEnd

我搜索了解决方案并遇到了sqldf(),所以我尝试使用sqldf left join. 但它不起作用,我没有错误消息,但在 df1 的右侧有 df2 的列名但没有数据(NA)。你能帮助我吗?如果有不清楚的地方,我很抱歉,我真的是 R 的初学者,如果有必要我很高兴提供更多信息。

data = sqldf("SELECT * FROM df1
                   left JOIN df2
                   ON df1.ID = df2.ID
                   AND df1.StartSession BETWEEN df2.Start and df2.End",
                   method = "name__class")

重现数据集

df1 <- structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), Session = c(3, 
4, 5, 6, 7, 1, 4, 5, 6, 7), StartSession = structure(c(1601943118, 
1602022814, 1602107392, 1602634216, 1602891080, 1601930068, 1602187292, 
1602278537, 1602526192, 1602979765), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), row.names = c(NA, 10L), class = "data.frame")

df2 <- structure(list(ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2), Start = structure(c(1601935128, 
1601935317, 1602015011, 1602099592, 1602626414, 1602883278, 1601922271, 
1602099313, 1602179494, 1602270739), class = c("POSIXct", "POSIXt"
), tzone = ""), End = structure(c(1601936456, 1601936602, 1602016298, 
1602100887, 1602627696, 1602884626, 1601923578, 1602100759, 1602180771, 
1602272006), class = c("POSIXct", "POSIXt"))), row.names = c(NA, 
10L), class = "data.frame")

样本数据

df1
ID   StartSession
01   2020-10-06 20:08:11
01   2020-10-07 15:18:12
01   2020-10-08 18:23:34
02   2020-10-07 19:45:23
03   2020-10-08 20:30:24
03   2020-10-08 23:48:02

df2
ID     Start                  End
01   2020-10-06 20:05:11    2020-10-06 20:15:11
01   2020-10-07 15:14:12    2020-10-07 15:24:12
02   2020-10-07 19:42:23    2020-10-07 19:52:23
03   2020-10-08 20:26:24    2020-10-08 20:36:24
03   2020-10-08 23:44:02    2020-10-08 23:54:02

结果应该是:

data
ID     StartSession           Start                  End
01     2020-10-06 20:08:11    2020-10-06 20:05:11    2020-10-06 20:15:11
01     2020-10-07 15:18:12    2020-10-07 15:14:12    2020-10-07 15:24:12
01     2020-10-08 18:23:34    NA                     NA
02     2020-10-07 19:45:23    2020-10-07 19:42:23    2020-10-07 19:52:23
03     2020-10-08 20:30:24    2020-10-08 20:26:24    2020-10-08 20:36:24
03     2020-10-08 23:48:02    2020-10-08 23:44:02    2020-10-08 23:54:02

标签: rleft-joinsqldf

解决方案


推荐阅读