How to match and combine dates and variables from two different dataframes in R

Problem description

I am using RStudio for this task. I've done some manipulation and now have two dataframes: one with Site Location as PRO and another with Site Location as Home. Some of these patients attended both sites on the same day and so have variables for each.

The last step I need to do is match the dates (this is the last column in the data frames) from "Home" and "PRO" for the same patients who attended both sites on the same day so I can see only this matched data for Patient ID and Date. I don't want to see any unmatched data. Please send help.

Below are the details of the two dataframes I'm dealing with.

Many thanks

   PATIENT_ID SITE_LOCATION       CREATION_DATE                IMPORT_DATE AMPEL_PARAMETER AMPEL_EXPECTED_VALUE AMPEL_EXPECTED_UNIT FEV1
1 -1234567890          HOME 07.12.2015 18:03:25 07.12.2015 18:14:11 +01:00             Fvc                    5                 [L] 4.10
2 -1234567891          HOME 08.12.2015 18:04:21 07.12.2015 18:17:32 +01:00             Fvc                    5                 [L] 4.10
3 -1234567892          HOME 09.12.2015 18:04:29 07.12.2015 18:17:32 +01:00             Fvc                    5                 [L] 4.02
4 -1234567893          HOME 10.12.2015 18:04:20 07.12.2015 18:17:32 +01:00             Fvc                    5                 [L] 3.95
5 -1234567894          HOME 11.12.2015 18:05:37 07.12.2015 18:20:16 +01:00             Fvc                    5                 [L] 3.78
6 -1234567895          HOME 12.12.2015 18:05:16 07.12.2015 18:20:16 +01:00             Fvc                    5                 [L] 3.91
  FEV1_UNIT  FVC FVC_UNIT      PEF PEF_UNIT FEF2575 FEF2575_UNIT FEV1_PREDICTED FVC_PREDICTED PEF_PREDICTED FEF2575_PREDICTED       Date
1       [L] 5.49      [L] 9.205000    [L/s]      NA      [L/min]             NA            NA            NA                NA 07.12.2015
2       [L] 5.39      [L] 8.928333    [L/s]      NA      [L/min]             NA            NA            NA                NA 08.12.2015
3       [L] 5.68      [L] 8.846667    [L/s]      NA      [L/min]             NA            NA            NA                NA 09.12.2015
4       [L] 5.61      [L] 9.268333    [L/s]      NA      [L/min]             NA            NA            NA                NA 10.12.2015
5       [L] 4.97      [L] 9.531667    [L/s]      NA      [L/min]             NA            NA            NA                NA 11.12.2015
6       [L] 5.13      [L] 9.031667    [L/s]      NA      [L/min]             NA            NA            NA                NA 12.12.2015

      PATIENT_ID SITE_LOCATION       CREATION_DATE IMPORT_DATE AMPEL_PARAMETER AMPEL_EXPECTED_VALUE AMPEL_EXPECTED_UNIT     FEV1
1205 -1234567891           PRO 08.12.2015 13:11:50        <NA>            <NA>                   NA                <NA> 4.134448
1206 -1234567891           PRO 08.12.2015 13:15:27        <NA>            <NA>                   NA                <NA> 3.913590
1207 -1234567891           PRO 08.12.2015 16:04:56        <NA>            <NA>                   NA                <NA> 4.075508
1208 -1234567891           PRO 08.12.2015 16:05:39        <NA>            <NA>                   NA                <NA> 3.877134
1209 -1234567890           PRO 08.12.2015 16:56:44        <NA>            <NA>                   NA                <NA> 4.008187
1210 -1234567890           PRO 24.12.2015 17:39:45        <NA>            <NA>                   NA                <NA> 4.024912
     FEV1_UNIT      FVC FVC_UNIT       PEF PEF_UNIT  FEF2575 FEF2575_UNIT FEV1_PREDICTED FVC_PREDICTED PEF_PREDICTED FEF2575_PREDICTED
1205       [L] 5.382417      [L]  9.813333    [L/s] 176.7417      [L/min]       4.756512      5.989738       9.94095          268.4329
1206       [L] 5.429113      [L]  9.128333    [L/s] 170.7167      [L/min]       4.756512      5.989738       9.94095          268.4329
1207       [L] 5.651814      [L] 10.103333    [L/s] 181.5170      [L/min]       4.756512      5.989738       9.94095          268.4329
1208       [L] 5.640839      [L] 10.256667    [L/s] 150.9262      [L/min]       4.756512      5.989738       9.94095          268.4329
1209       [L] 5.586417      [L]  9.298333    [L/s] 175.1896      [L/min]       4.756512      5.989738       9.94095          268.4329
1210       [L] 5.617088      [L]  9.846667    [L/s] 175.6264      [L/min]       4.754849      5.988456       9.93880          268.2625
           Date     
1205 08.12.2015
1206 08.12.2015
1207 08.12.2015 
1208 08.12.2015
1209 08.12.2015 
1210 24.12.2015 

Tags: r, dataframe, date, pattern-matching

Solution


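Here's a tidyverse solution using inner_join. If I understood the request correctly, this should give you what you want: an inner join keeps only the rows where PATIENT_ID and the date match in both homedf and PROdf, and drops everything unmatched. Columns that share a name in both dataframes come back with .x (home) and .y (PRO) suffixes.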

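library(dplyr)

# CREATION_DATE looks like "07.12.2015 18:03:25", so as.Date() needs an explicit
# format; anything after the date part is ignored.
homedf <- homedf %>% mutate(dateonly = as.Date(CREATION_DATE, format = "%d.%m.%Y"))
PROdf <- PROdf %>% mutate(dateonly = as.Date(CREATION_DATE, format = "%d.%m.%Y"))

# Keep only the patient/date combinations that occur in both dataframes
matcheddf <- inner_join(homedf, PROdf, by = c("PATIENT_ID", "dateonly"))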
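
For anyone who would rather not load dplyr, the same idea can be sketched in base R with merge(), which performs an inner join by default. This is only an illustration under stated assumptions: the toy dataframes home and pro are hypothetical stand-ins built from a few of the sample rows in the question, and the _HOME / _PRO suffixes are my own choice, not something from the original data.

```r
# Toy stand-ins for the two dataframes (a few rows copied from the question's sample)
home <- data.frame(
  PATIENT_ID    = c(-1234567890, -1234567891, -1234567892),
  CREATION_DATE = c("07.12.2015 18:03:25", "08.12.2015 18:04:21", "09.12.2015 18:04:29"),
  FEV1          = c(4.10, 4.10, 4.02),
  stringsAsFactors = FALSE
)
pro <- data.frame(
  PATIENT_ID    = c(-1234567891, -1234567891, -1234567890),
  CREATION_DATE = c("08.12.2015 13:11:50", "08.12.2015 13:15:27", "08.12.2015 16:56:44"),
  FEV1          = c(4.134448, 3.913590, 4.008187),
  stringsAsFactors = FALSE
)

# Strip the time of day: parse only the dd.mm.yyyy part of the timestamp
home$dateonly <- as.Date(home$CREATION_DATE, format = "%d.%m.%Y")
pro$dateonly  <- as.Date(pro$CREATION_DATE,  format = "%d.%m.%Y")

# Inner join on patient and day; unmatched rows from either side are dropped.
# Columns present in both inputs get the (assumed) suffixes _HOME and _PRO.
matched <- merge(home, pro,
                 by = c("PATIENT_ID", "dateonly"),
                 suffixes = c("_HOME", "_PRO"))

print(matched[, c("PATIENT_ID", "dateonly", "FEV1_HOME", "FEV1_PRO")])
```

Note that when a patient has several PRO measurements on the same day, each one is paired with the matching home row, so the home values repeat across those rows. If one row per patient and day is needed, the PRO data would have to be summarised first (for example, by taking the best or the mean value per day).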
