r - Doing a left join with exact values plus closest values
Problem description
I have two data sets:
table1 <- data.frame(id=c(1000,1001,1002,1003),
date=as.POSIXct(c("2012-05-13","2012-09-23","2011-04-09","2014-11-08")))
table2 <- data.frame(id2=c(1000,1000,1001,1002,1003,1003),
date2=as.POSIXct(c("2012-05-13","2012-05-16","2012-09-24","2011-04-15","2014-11-09", "2014-11-10")))
I want to do a left join on table1, matching on ID and date. Not every date has an exact match, though, so how can I join on the closest date instead? For example, for id 1001, "2012-09-23" would match "2012-09-24", since that is the only date for id2 1001; for id 1003, "2014-11-08" would match "2014-11-09", since that is the closest date for id2 1003.
Desired result:
    id       date      date2
1 1000 2012-05-13 2012-05-13
2 1001 2012-09-23 2012-09-24
3 1002 2011-04-09 2011-04-15
4 1003 2014-11-08 2014-11-09
Solution
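I would also suggest going with a non-equi data.table join, but if you want to stick with dplyr for some reason, and your data is not very large or you have enough memory, you can also try: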
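library(dplyr)

table1 %>%
  # pair every table1 row with all table2 rows that share the same id
  left_join(table2, by = c("id" = "id2")) %>%
  group_by(id) %>%
  # within each id, keep the row whose date2 is closest to date
  slice(which.min(abs(date - date2)))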
Output:
# A tibble: 4 x 3
# Groups:   id [4]
     id date                date2              
  <dbl> <dttm>              <dttm>             
1  1000 2012-05-13 00:00:00 2012-05-13 00:00:00
2  1001 2012-09-23 00:00:00 2012-09-24 00:00:00
3  1002 2011-04-09 00:00:00 2011-04-15 00:00:00
4  1003 2014-11-08 00:00:00 2014-11-09 00:00:00
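For comparison, here is a minimal sketch of the data.table route mentioned above. It uses a rolling join with roll = "nearest" rather than a non-equi join in the strict sense, which seems the closer fit for "closest date". The helper column join_date and the names dt1, dt2, nearest and result are made up for illustration and are not part of the original answer.

```r
library(data.table)

# Same sample data as in the question
table1 <- data.frame(id = c(1000, 1001, 1002, 1003),
                     date = as.POSIXct(c("2012-05-13", "2012-09-23", "2011-04-09", "2014-11-08")))
table2 <- data.frame(id2 = c(1000, 1000, 1001, 1002, 1003, 1003),
                     date2 = as.POSIXct(c("2012-05-13", "2012-05-16", "2012-09-24",
                                          "2011-04-15", "2014-11-09", "2014-11-10")))

dt1 <- as.data.table(table1)
dt2 <- as.data.table(table2)

# The join column takes its values from dt1 in the result, so keep the
# original date2 and join on a copy (join_date is a hypothetical helper name)
dt2[, join_date := date2]

# For each row of dt1: match on id exactly, then roll to the nearest join_date
nearest <- dt2[dt1, on = .(id2 = id, join_date = date), roll = "nearest"]

# Restore the column names used in the question
result <- nearest[, .(id = id2, date = join_date, date2)]
print(result)
```

Because the join is driven by dt1, every row of table1 is kept, and an id with no counterpart in table2 should come back with NA in date2. The dplyr version above would drop such an id instead, since which.min() returns nothing when every difference is NA.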