How do I join two tables horizontally in R, matching on two differently named columns?

Problem description

I have two data frames:

data1:

ID               DateTimeUTC
 A               12/4/2019 11:30:30 PM
 A               12/4/2019 11:30:31 PM
 B               12/5/2019 11:31:00 PM
 B               12/5/2019 11:31:01 PM
 C               12/5/2019 11:31:02 PM

and data2:

 Message         DateTimeUTC
 A               12/4/2019 11:30:30 PM
 A               12/4/2019 11:30:31 PM
 B               12/5/2019 11:31:00 PM
 B               12/5/2019 11:31:01 PM

I would like to get:

ID              DateTimeUTC               Message              DateTimeUTC
A               12/4/2019 11:30:30 PM      A           12/4/2019 11:30:30 PM
A               12/4/2019 11:30:31 PM      A           12/4/2019 11:30:31 PM
B               12/5/2019 11:31:00 PM      B           12/5/2019 11:31:00 PM
B               12/5/2019 11:31:01 PM      B           12/5/2019 11:31:01 PM

I only want to show the rows where ID and Message match. I tried an inner join, but it gave me duplicate rows and dropped one of my column names.

 library('dplyr')
 inner_join(data1,  data2, by = c("ID" = "Message"))  

Goal: can someone tell me how to do an rbind to get the result above?

##pseudo_code:
 rbind(data1,data2, order_by ID & Message)

Tags: r, dplyr, rbind

Solution


Actually, inner_join is the right idea. The problem is that the join should match not only "ID" = "Message" but also DateTimeUTC, so join on both conditions:

library(dplyr)

df1 <-
  data.frame(
    ID = c("A", "A", "B", "B", "C"),
    DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM", "12/5/2019 11:31:00 PM", 
                    "12/5/2019 11:31:01 PM", "12/5/2019 11:31:02 PM"),
    stringsAsFactors = FALSE
  )

df2 <-
  data.frame(
    Message = c("A", "A", "B", "B"),
    DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM", 
                    "12/5/2019 11:31:00 PM", "12/5/2019 11:31:01 PM"),
    stringsAsFactors = FALSE
  )

df1 %>%
  inner_join(df2, by = c("ID" = "Message", "DateTimeUTC" = "DateTimeUTC"))

# ID           DateTimeUTC
# A 12/4/2019 11:30:30 PM
# A 12/4/2019 11:30:31 PM
# B 12/5/2019 11:31:00 PM
# B 12/5/2019 11:31:01 PM
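
The output above has only ID and DateTimeUTC, because a join folds each pair of key columns into one. To get the four-column layout the question asks for, one option is the keep argument of inner_join in recent versions of dplyr. The sketch below rebuilds the same df1 and df2 and also shows where the duplicates in the original attempt come from. The names id_only and both_keys are only illustrative, and newer dplyr releases may warn about a many-to-many relationship on the ID-only join.

```r
library(dplyr)

df1 <- data.frame(
  ID = c("A", "A", "B", "B", "C"),
  DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM",
                  "12/5/2019 11:31:00 PM", "12/5/2019 11:31:01 PM",
                  "12/5/2019 11:31:02 PM"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  Message = c("A", "A", "B", "B"),
  DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM",
                  "12/5/2019 11:31:00 PM", "12/5/2019 11:31:01 PM"),
  stringsAsFactors = FALSE
)

# Joining on ID = Message alone pairs every "A" row with every "A" row
# (2 x 2), and likewise for "B". That is the source of the duplicates.
id_only <- inner_join(df1, df2, by = c("ID" = "Message"))
nrow(id_only)    # 8

# Joining on both keys gives one row per exact match. keep = TRUE asks
# dplyr to retain the key columns from both inputs instead of merging them.
both_keys <- inner_join(
  df1, df2,
  by = c("ID" = "Message", "DateTimeUTC" = "DateTimeUTC"),
  keep = TRUE
)
nrow(both_keys)  # 4
both_keys
```

Because both inputs have a column called DateTimeUTC, the two copies should come back with dplyr's default .x and .y suffixes; the suffix argument can change that if other names are preferred. rbind is not the right tool here, since it stacks rows vertically rather than placing matched columns side by side.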
