Comparing data frames to extract specific values

Problem description

I have two data frames:

df <- data.frame(Group = c("A","B","C","D","E","F"),
             Date = c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00"))

df2 <- data.frame(Group = c("A","A","B","B","C","C","C","D","D","E","E","F","F"),
              Date = c("2018-04-12 08:56:00","2018-04-12 10:42:00","2018-04-13 10:03:00","2018-04-13 11:21:00","2018-04-14 08:17:00","2018-04-14 10:32:00","2018-04-14 22:44:00","2018-04-15 03:10:00","2018-04-15 11:17:00","2018-04-16 16:56:00","2018-04-16 20:01:00","2018-04-17 11:15:00","2018-04-17 11:20:00"))

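I want to do two things. First, by group, I want to compare the Date column in df with the Date column in df2 and extract the date that matches exactly or, if there is no exact match, the closest Date in df2 that comes before the date in df.

Second, by group, I want to compare the Date column in df with the Date column in df2 and extract the Date if there is an exact match or, if there is no exact match, the Date in df2 that is nearest to the date in df, whether earlier or later.

So the result for this example should look like this: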

result <- data.frame(Group = c("A","B","C","D","E","F"),
                 Date = c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00"),
                 Return1 = c("2018-04-12 08:56:00","2018-04-13 10:03:00","2018-04-14 10:32:00","2018-04-15 03:10:00",NA,"2018-04-17 11:15:00"),
                 Return2 = c("2018-04-12 08:56:00","2018-04-13 11:21:00","2018-04-14 10:32:00","2018-04-15 03:10:00","2018-04-16 16:56:00","2018-04-17 11:15:00"))

Tags: r, date, merge, posixct

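Solution

This is what I think you are looking for: two data.table rolling joins, one with roll = Inf for the exact-or-previous match and one with roll = "nearest" for the exact-or-nearest match.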

library(dplyr)
library(purrr)
library(lubridate)
library(data.table)

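# Parse the date strings into date-times. In df2, keep a copy of Date as Result1,
# because after the join the Date column holds the values from df.
df <- df %>% mutate(Date = parse_date_time(Date, orders = "ymd HMS"))
df2 <- df2 %>% mutate(Date = parse_date_time(Date, orders = "ymd HMS")) %>% mutate(Result1 = Date)
# Same lookup table, with the copied column named Result2 for the second join.
df3 <- df2 %>% rename(Result2 = Result1)

# Convert all three data frames to data.tables by reference.
setDT(df)
setDT(df2)
setDT(df3)

# Key by Group, then Date; the roll is applied to the last key column (Date).
setkey(df,Group, Date)
setkey(df2,Group, Date)
setkey(df3,Group, Date)

# roll = Inf carries the latest earlier df2 row forward (exact or previous match);
# roll = "nearest" picks the closest df2 row in either direction.
list(df2[df, roll = Inf], df3[df, roll = "nearest"]) %>% 
    reduce(full_join, by = c("Group", "Date"))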

#   Group                Date             Result1             Result2
# 1     A 2018-04-12 08:56:00 2018-04-12 08:56:00 2018-04-12 08:56:00
# 2     B 2018-04-13 11:03:00 2018-04-13 10:03:00 2018-04-13 11:21:00
# 3     C 2018-04-14 14:30:00 2018-04-14 10:32:00 2018-04-14 10:32:00
# 4     D 2018-04-15 03:10:00 2018-04-15 03:10:00 2018-04-15 03:10:00
# 5     E 2018-04-16 07:28:00                <NA> 2018-04-16 16:56:00
# 6     F 2018-04-17 11:17:00 2018-04-17 11:15:00 2018-04-17 11:15:00
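
As a cross-check on what the two rolling joins are doing, here is a minimal base R sketch of the same two rules with no packages. It is not the answer's method, only a slow but explicit restatement of it. The helper names previous_or_exact and nearest_date are hypothetical, and parsing the dates as UTC is an assumption, since the question does not state a time zone. The columns are named Return1 and Return2 as in the question.

```r
# Example data from the question, with Date parsed as POSIXct.
# Assumption: UTC; the question does not state a time zone.
df <- data.frame(
  Group = c("A", "B", "C", "D", "E", "F"),
  Date = as.POSIXct(c("2018-04-12 08:56:00", "2018-04-13 11:03:00",
                      "2018-04-14 14:30:00", "2018-04-15 03:10:00",
                      "2018-04-16 07:28:00", "2018-04-17 11:17:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Group = c("A", "A", "B", "B", "C", "C", "C", "D", "D", "E", "E", "F", "F"),
  Date = as.POSIXct(c("2018-04-12 08:56:00", "2018-04-12 10:42:00",
                      "2018-04-13 10:03:00", "2018-04-13 11:21:00",
                      "2018-04-14 08:17:00", "2018-04-14 10:32:00",
                      "2018-04-14 22:44:00", "2018-04-15 03:10:00",
                      "2018-04-15 11:17:00", "2018-04-16 16:56:00",
                      "2018-04-16 20:01:00", "2018-04-17 11:15:00",
                      "2018-04-17 11:20:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: latest candidate at or before target, NA if there is none.
previous_or_exact <- function(target, candidates) {
  earlier <- candidates[candidates <= target]
  if (length(earlier) == 0) return(as.POSIXct(NA, tz = "UTC"))
  max(earlier)
}

# Hypothetical helper: candidate with the smallest absolute time gap to target.
# On a tie, which.min() keeps the first candidate in the order given.
nearest_date <- function(target, candidates) {
  if (length(candidates) == 0) return(as.POSIXct(NA, tz = "UTC"))
  gaps <- abs(as.numeric(difftime(candidates, target, units = "secs")))
  candidates[which.min(gaps)]
}

# Start with all-NA date-time columns, then fill them in row by row.
result <- df
result$Return1 <- as.POSIXct(rep(NA, nrow(df)), tz = "UTC")
result$Return2 <- as.POSIXct(rep(NA, nrow(df)), tz = "UTC")
for (i in seq_len(nrow(df))) {
  # Only dates from the same group are candidates.
  candidates <- df2$Date[df2$Group == df$Group[i]]
  result$Return1[i] <- previous_or_exact(df$Date[i], candidates)
  result$Return2[i] <- nearest_date(df$Date[i], candidates)
}

print(result)
```

The loop scans df2 once per row of df, so it is only meant for checking small examples like this one; for larger data the keyed rolling join above is the better fit.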
