r - 合并两个具有条件日期时间的数据集并显示不匹配的值
问题描述
我有两个数据集:
df1 和 df2,如果日期时间在 df2 的 20 秒内,我希望列从 df1 合并在一起
df1
Connect Ended
4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM
3/31/2020 11:00:08 AM 3/31/2020 11:00:10 AM
4/1/2020 10:00:05 PM 4/1/2020 12:00:05 PM
df2
Start End
4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM
3/31/2020 11:00:10 AM 3/31/2020 11:00:14 AM
期望的输出:
df3
Match_Start1 Match_End1 Match_Start2 Match_End2
4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM 4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM
3/31/2020 11:00:08 AM 3/31/2020 11:00:10 AM 3/31/2020 11:00:10 AM 3/31/2020 11:00:14 AM
df4(不匹配)
Unmatched_Start Unmatched_end
4/1/2020 10:00:05 PM 4/1/2020 12:00:05 PM
输入:
df1
structure(list(Connect = structure(c(4L, 2L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "3/31/2020 11:00:08 AM",
"4/1/2020 10:00:05 PM", "4/6/2020 1:15:21 PM"), class = "factor"),
Ended = structure(c(4L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("", "3/31/2020 11:00:10 AM", "4/1/2020 12:00:05 PM",
"4/6/2020 2:05:18 PM"), class = "factor")), class = "data.frame", row.names = c(NA,
-13L))
df2
structure(list(Start = structure(2:1, .Label = c("3/31/2020 11:00:10 AM",
"4/6/2020 1:15:21 PM"), class = "factor"), End = structure(2:1, .Label = c("3/31/2020 11:00:14 AM",
"4/6/2020 2:05:18 PM"), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
我试过的:
pd.merge_asof(df1, df2, on="Connect", "Ended", by='ticker', tolerance=pd.Timedelta('20 s'), direction='backward')
但是,如何合并 20 秒的条件,以及显示不匹配的数据集?
任何建议表示赞赏
解决方案
我们可以使用crossing
创建所有的组合df1
并且df2
只保留那些在 20 秒间隔内的行。
library(tidyr)
library(dplyr)
library(lubridate)
df3 <- crossing(df1, df2) %>%
mutate_all(mdy_hms) %>%
filter(abs(difftime(Connect, Start, units = "secs")) <= 20 &
abs(difftime(Ended, End, units = "secs")) <= 20)
df3
# A tibble: 2 x 4
# Connect Ended Start End
# <dttm> <dttm> <dttm> <dttm>
#1 2020-03-31 11:00:08 2020-03-31 11:00:10 2020-03-31 11:00:10 2020-03-31 11:00:14
#2 2020-04-06 13:15:21 2020-04-06 14:05:18 2020-04-06 13:15:21 2020-04-06 14:05:18
要获得df4
,我们可以这样做:
df4 <- df1 %>% mutate_all(mdy_hms) %>% anti_join(df3, by = c('Connect', 'Ended'))
推荐阅读
- java - 在 TEIID 中合并两个 Java 对象
- python - 如何在 Python 中使用 Kullback-Leibler 方法最小化 Weibull 分布的参数?
- html - 为什么我的光标不会更改为自定义图像?
- python - 仅在特定时间启用 Google Cloud Scheduler (GCP)
- c# - MethodImplOptions.AggressiveInlining 是否可以防止昂贵的参数在不使用时进行评估?
- c++ - C++/CLI VS 错误“ClassLibrary.dll 不是有效的 win32 应用程序”。更改启动项
- swiftui - 如何从计算属性创建自定义绑定?
- flutter - 在元素之间发送图像
- javascript - 计算对象数组中的重复项
- xamarin.forms - 带有 BODYHTML 的 Xamarin EmailMessage 不起作用