r - Compare data frames to extract specific values
Problem description
I have two data frames:
df <- data.frame(Group = c("A","B","C","D","E","F"),
Date = c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00"))
df2 <- data.frame(Group = c("A","A","B","B","C","C","C","D","D","E","E","F","F"),
Date = c("2018-04-12 08:56:00","2018-04-12 10:42:00","2018-04-13 10:03:00","2018-04-13 11:21:00","2018-04-14 08:17:00","2018-04-14 10:32:00","2018-04-14 22:44:00","2018-04-15 03:10:00","2018-04-15 11:17:00","2018-04-16 16:56:00","2018-04-16 20:01:00","2018-04-17 11:15:00","2018-04-17 11:20:00"))
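I want to do two things. First, by group, I want to compare the Date column in df with the Date column in df2 and extract the exactly matching Date or, if there is no exact match, the closest Date in df2 that comes before the date in df.
Second, by group, I want to compare the Date column in df with the Date column in df2 and extract the Date if there is an exact match or, if there is no exact match, the nearest Date in df2, whether it falls before or after the date in df.
So the result for this example should look like this: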
result <- data.frame(Group = c("A","B","C","D","E","F"),
Date = c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00"),
Return1 = c("2018-04-12 08:56:00","2018-04-13 10:03:00","2018-04-14 10:32:00","2018-04-15 03:10:00",NA,"2018-04-17 11:15:00"),
Return2 = c("2018-04-12 08:56:00","2018-04-13 11:21:00","2018-04-14 10:32:00","2018-04-15 03:10:00","2018-04-16 16:56:00","2018-04-17 11:15:00"))
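The two rules can be checked by hand for a single group. The following is a minimal base R sketch, not part of the original question: the helper names `last_at_or_before`, `nearest_to`, and `to_time` are hypothetical, and the timestamps are assumed to be in UTC.

```r
# Hypothetical helpers, written only to illustrate the two rules.
to_time <- function(x) as.POSIXct(x, tz = "UTC")  # assumption: UTC timestamps

# Rule 1: exact match, otherwise the latest candidate before the target.
last_at_or_before <- function(target, candidates) {
  ok <- candidates[candidates <= target]
  if (length(ok) == 0) return(as.POSIXct(NA, tz = "UTC"))
  max(ok)
}

# Rule 2: exact match, otherwise the candidate with the smallest time gap.
nearest_to <- function(target, candidates) {
  gap <- abs(as.numeric(difftime(candidates, target, units = "secs")))
  candidates[which.min(gap)]
}

# Group B: no exact match, one candidate on each side of the target.
b_target <- to_time("2018-04-13 11:03:00")
b_cands  <- to_time(c("2018-04-13 10:03:00", "2018-04-13 11:21:00"))
b_rule1 <- last_at_or_before(b_target, b_cands)  # 10:03, one hour earlier
b_rule2 <- nearest_to(b_target, b_cands)         # 11:21, 18 minutes later

# Group E: both candidates are later than the target.
e_target <- to_time("2018-04-16 07:28:00")
e_cands  <- to_time(c("2018-04-16 16:56:00", "2018-04-16 20:01:00"))
e_rule1 <- last_at_or_before(e_target, e_cands)  # NA, nothing earlier
e_rule2 <- nearest_to(e_target, e_cands)         # 16:56
```

Groups B and E are the two rows where the rules give different answers, which is why Return1 and Return2 differ there in the expected result.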
Solution
This is what I think you are looking for.
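library(dplyr)
library(purrr)
library(lubridate)
library(data.table)
# Parse the timestamps; in df2, keep a copy of Date because the join result carries df's Date
df <- df %>% mutate(Date = parse_date_time(Date, orders = "ymd HMS"))
df2 <- df2 %>% mutate(Date = parse_date_time(Date, orders = "ymd HMS")) %>% mutate(Result1 = Date)
df3 <- df2 %>% rename(Result2 = Result1)
# Convert to data.table by reference and key on Group, Date (the rolled column must be last)
setDT(df)
setDT(df2)
setDT(df3)
setkey(df, Group, Date)
setkey(df2, Group, Date)
setkey(df3, Group, Date)
# roll = Inf: exact match or last earlier row; roll = "nearest": closest row in either direction
list(df2[df, roll = Inf], df3[df, roll = "nearest"]) %>%
  reduce(full_join, by = c("Group", "Date"))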
#   Group                Date             Result1             Result2
# 1     A 2018-04-12 08:56:00 2018-04-12 08:56:00 2018-04-12 08:56:00
# 2     B 2018-04-13 11:03:00 2018-04-13 10:03:00 2018-04-13 11:21:00
# 3     C 2018-04-14 14:30:00 2018-04-14 10:32:00 2018-04-14 10:32:00
# 4     D 2018-04-15 03:10:00 2018-04-15 03:10:00 2018-04-15 03:10:00
# 5     E 2018-04-16 07:28:00                <NA> 2018-04-16 16:56:00
# 6     F 2018-04-17 11:17:00 2018-04-17 11:15:00 2018-04-17 11:15:00
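For readers who prefer to stay within data.table, the same two rolling joins can probably be written without dplyr, purrr, or lubridate. This is a sketch under assumptions rather than the answer's own method: the timestamps are parsed as UTC with `as.POSIXct`, and the column name `Match` is a hypothetical helper that preserves the df2 timestamp, since the joined Date column carries the value from df.

```r
library(data.table)

# Same data as in the question, with Date parsed as POSIXct (UTC assumed).
df <- data.table(
  Group = c("A", "B", "C", "D", "E", "F"),
  Date = as.POSIXct(c("2018-04-12 08:56:00", "2018-04-13 11:03:00",
                      "2018-04-14 14:30:00", "2018-04-15 03:10:00",
                      "2018-04-16 07:28:00", "2018-04-17 11:17:00"), tz = "UTC")
)
df2 <- data.table(
  Group = c("A", "A", "B", "B", "C", "C", "C", "D", "D", "E", "E", "F", "F"),
  Date = as.POSIXct(c("2018-04-12 08:56:00", "2018-04-12 10:42:00",
                      "2018-04-13 10:03:00", "2018-04-13 11:21:00",
                      "2018-04-14 08:17:00", "2018-04-14 10:32:00",
                      "2018-04-14 22:44:00", "2018-04-15 03:10:00",
                      "2018-04-15 11:17:00", "2018-04-16 16:56:00",
                      "2018-04-16 20:01:00", "2018-04-17 11:15:00",
                      "2018-04-17 11:20:00"), tz = "UTC")
)

# Hypothetical helper column: keeps the df2 timestamp visible after the join.
df2[, Match := Date]
setkey(df2, Group, Date)

res <- copy(df)
# roll = Inf: exact match, else the last df2 row before df's Date (NA if none).
res[, Result1 := df2[res, on = .(Group, Date), roll = Inf, Match]]
# roll = "nearest": exact match, else the closest df2 row in either direction.
res[, Result2 := df2[res, on = .(Group, Date), roll = "nearest", Match]]
print(res)
```

Adding the columns with `:=` keeps one row per row of df, so no `full_join` step is needed to combine the two results.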