Specific join of two data frames

Problem description

I have two data frames, df1 and df2:

> df1

     ID  Gender      age      cd       evnt     scr     test_dt
1 C0004    MALE       22       1          1      82    7/3/2014
2 C0004    MALE       22       1          2      76    7/3/2014
3 C0005    MALE       22       1          3    1514    7/3/2014
4 C0005    MALE       23       2          1      81   11/3/2014
5 C0006    MALE       23       2          2      75   11/3/2014
6 C0006    MALE       23       2          3     878   11/3/2014

and

> df2

     ID    hgt    wt     phys_dt
1 C0004     70   147   6/29/2015
2 C0004     70   157   6/27/2016
3 C0005     67   175   6/27/2016
4 C0005     65   171    7/2/2014
5 C0006     69   160   6/29/2015
6 C0006     64   143    7/2/2014

I want to join df1 and df2 to produce the following data frame, call it df3:

> df3

     ID   Gender      age      cd       evnt     scr     hgt     wt
1 C0004     MALE       22       1          1      82      70    147
2 C0004     MALE       22       1          2      76      70    157
3 C0005     MALE       22       1          3    1514      67    175
4 C0005     MALE       23       2          1      81      65    171
5 C0006     MALE       23       2          2      75      69    160
6 C0006     MALE       23       2          3     878      64    143

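I am trying to add df2$hgt and df2$wt to the correct ID rows. The tricky part is that, within each ID, I want to join hgt and wt from the df2 row whose df2$phys_dt is closest to df1$test_dt. I was thinking I could first sort both data frames by their respective dates and then try the join? I'm not quite sure how to approach this. Thanks.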

Tags: r, dataframe, join, dplyr

Solution


If you only want to match on df1$ID and df2$ID, do the following:

df3 <- left_join(df1, df2, by = c("ID" = "ID"))

If both the date and the ID should match exactly, you can try:

df3 <- left_join(df1, df2, by = c("ID" = "ID", "test_dt" = "phys_dt"))

left_join is in library(dplyr).
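
Neither join above picks the closest date: joining on ID alone returns every df2 row for each matching df1 row, and joining on the date as well keeps only exactly equal dates. The following is a minimal sketch of one way to do a nearest-date match with dplyr, not the only approach. It assumes the dates are month/day/year strings as shown in the question, and the helper columns row_id and gap are names made up for illustration.

```r
library(dplyr)

# Sample data copied from the question; dates are month/day/year strings.
df1 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005", "C0006", "C0006"),
  Gender  = "MALE",
  age     = c(22, 22, 22, 23, 23, 23),
  cd      = c(1, 1, 1, 2, 2, 2),
  evnt    = c(1, 2, 3, 1, 2, 3),
  scr     = c(82, 76, 1514, 81, 75, 878),
  test_dt = c("7/3/2014", "7/3/2014", "7/3/2014",
              "11/3/2014", "11/3/2014", "11/3/2014")
)
df2 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005", "C0006", "C0006"),
  hgt     = c(70, 70, 67, 65, 69, 64),
  wt      = c(147, 157, 175, 171, 160, 143),
  phys_dt = c("6/29/2015", "6/27/2016", "6/27/2016",
              "7/2/2014", "6/29/2015", "7/2/2014")
)

# Parse the date strings so they can be subtracted.
df1_d <- df1 %>%
  mutate(row_id  = row_number(),   # hypothetical helper: identifies each df1 row
         test_dt = as.Date(test_dt, format = "%m/%d/%Y"))
df2_d <- df2 %>%
  mutate(phys_dt = as.Date(phys_dt, format = "%m/%d/%Y"))

df3 <- df1_d %>%
  # Pair every df1 row with every df2 row of the same ID
  # (newer dplyr versions may warn about a many-to-many join here).
  left_join(df2_d, by = "ID") %>%
  # Absolute distance in days between test date and physical date.
  mutate(gap = abs(as.numeric(phys_dt - test_dt))) %>%
  # For each original df1 row, keep only the candidate with the smallest gap.
  group_by(row_id) %>%
  slice_min(gap, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-row_id, -gap, -test_dt, -phys_dt)

print(df3)
```

With the sample data this picks, for example, the 7/2/2014 physical for both C0005 rows. The result therefore need not agree row for row with the df3 shown in the question, which appears to pair rows by position rather than by nearest date.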

