Using data from one dataset to extract information from a different dataset

Problem description

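I have two datasets. One contains health information for each subject; the other contains the dates before and after each subject's MRI. I am trying to extract the health information recorded on those pre and post dates.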

MRI pre/post dataset:

ID  prescan PreDate Postscan    PostDate
5006    1   5/10/2018   1   6/14/2018
5007    1   5/15/2018   1   6/13/2018
5009    1   5/9/2018    1   6/11/2018
5011    1   5/31/2018   1   7/2/2018
5013    1   5/30/2018   1   7/5/2018

Sleep data sample:

SubID   SleepDate   Day of Week RHR HRV Recovery
5007    5/12/2018   'Saturday ' 63  95  65
5007    5/13/2018   'Sunday   ' 66  72  52
5010    5/7/2018    'Monday   ' 74  40  48
5010    5/8/2018    'Tuesday  ' 68  67  59
5010    5/9/2018    'Wednesday' 75  74  82
5010    5/10/2018   'Thursday ' 71  80  89
5010    5/11/2018   'Friday   ' 71  91  95
5010    5/12/2018   'Saturday ' 68  66  58
5008    5/7/2018    'Monday   ' 60  132 85
5008    5/8/2018    'Tuesday  ' 60  123 90
5008    5/9/2018    'Wednesday' 66  105 68
5009    5/7/2018    'Monday   ' 47  148 90
5009    5/8/2018    'Tuesday  ' 45  169 87
5009    5/9/2018    'Wednesday' 46  176 75
5009    5/10/2018   'Thursday ' 50  138 54
5009    5/11/2018   'Friday   ' 46  132 42
5009    5/12/2018   'Saturday ' 47  158 60
5009    5/13/2018   'Sunday   ' 47  141 54
5006    5/7/2018    'Monday   ' 56  92  65
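To make the problem reproducible, the two samples can be entered as data frames. This is a minimal sketch, assuming the dates are stored as `m/d/Y` text exactly as shown and that the data frames are called `MRI_date` and `SleepData` (the names used in the attempts below); only the first four sleep rows for subject 5009 are included, and the padding in the day names is dropped.

```r
# MRI pre/post dates, copied from the sample above
MRI_date <- data.frame(
  ID       = c(5006, 5007, 5009, 5011, 5013),
  prescan  = 1,
  PreDate  = c("5/10/2018", "5/15/2018", "5/9/2018", "5/31/2018", "5/30/2018"),
  Postscan = 1,
  PostDate = c("6/14/2018", "6/13/2018", "6/11/2018", "7/2/2018", "7/5/2018"),
  stringsAsFactors = FALSE
)

# A subset of the sleep sample (subject 5009 only)
SleepData <- data.frame(
  SubID         = 5009,
  SleepDate     = c("5/7/2018", "5/8/2018", "5/9/2018", "5/10/2018"),
  `Day of Week` = c("Monday", "Tuesday", "Wednesday", "Thursday"),
  RHR           = c(47, 45, 46, 50),
  HRV           = c(148, 169, 176, 138),
  Recovery      = c(90, 87, 75, 54),
  check.names   = FALSE,  # keep the space in "Day of Week"
  stringsAsFactors = FALSE
)

# Convert the m/d/Y text to Date so comparisons are by calendar day
MRI_date$PreDate    <- as.Date(MRI_date$PreDate,    format = "%m/%d/%Y")
MRI_date$PostDate   <- as.Date(MRI_date$PostDate,   format = "%m/%d/%Y")
SleepData$SleepDate <- as.Date(SleepData$SleepDate, format = "%m/%d/%Y")

str(SleepData)
```

Parsing to `Date` avoids relying on string equality, where "5/9/2018" and "05/09/2018" would not match.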

What I tried (and variations of it)

SleepData %>%
  subset(SubID == 5006) %>% 
  filter(SleepDate %in% MRI_date$PreDate)

The above often returns all of the data for ID 5006.
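One likely limitation of the `%in%` approach: `SleepDate %in% MRI_date$PreDate` tests each sleep date against the pre-scan dates of every subject, not only those belonging to the subject being filtered. The base-R sketch below uses rows from the samples above (the result name `kept` is illustrative). Subject 5010 has no MRI record at all, yet two of its nights are kept because they coincide with other subjects' pre-scan dates.

```r
# Pre-scan dates for all subjects (from the MRI sample)
MRI_date <- data.frame(
  ID      = c(5006, 5007, 5009, 5011, 5013),
  PreDate = c("5/10/2018", "5/15/2018", "5/9/2018", "5/31/2018", "5/30/2018"),
  stringsAsFactors = FALSE
)

# Sleep records for subject 5010 (from the sleep sample)
SleepData <- data.frame(
  SubID     = 5010,
  SleepDate = c("5/7/2018", "5/8/2018", "5/9/2018",
                "5/10/2018", "5/11/2018", "5/12/2018"),
  stringsAsFactors = FALSE
)

# %in% checks membership in the whole PreDate column, ignoring ID
kept <- subset(SleepData, SubID == 5010 & SleepDate %in% MRI_date$PreDate)
print(kept)
# 5/9/2018 is subject 5009's pre-scan date; 5/10/2018 is subject 5006's
```

Matching dates per subject therefore needs each sleep row to be paired with that subject's own MRI row, which is what a join provides.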

SleepData %>%
  subset(SubID == 5006) %>% 
  subset(SleepDate == MRI_date$PreDate)

Returns:

longer object length is not a multiple of shorter object length
Length of logical index must be 1 or 31, not 44
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 0, 1
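The first warning comes from comparing two vectors of different lengths with `==`. R compares them position by position and recycles the shorter one, which is not a test of set membership. The resulting logical vector has the length of the longer operand, which apparently explains the follow-up complaint that the index length (44) does not match the number of rows (31). A small sketch with made-up vectors of sample dates (`msg`, `eq` and `inn` are illustrative names):

```r
sleep_dates <- c("5/7/2018", "5/8/2018", "5/9/2018")
pre_dates   <- c("5/10/2018", "5/9/2018")

# `==` compares element by element, recycling the shorter vector;
# R warns because 3 is not a multiple of 2
msg <- tryCatch(sleep_dates == pre_dates,
                warning = function(w) conditionMessage(w))
print(msg)

# Even ignoring the warning, position 3 ("5/9/2018") is compared with the
# recycled first element ("5/10/2018"), so the shared date is missed
eq <- suppressWarnings(sleep_dates == pre_dates)
print(eq)   # FALSE FALSE FALSE

# `%in%` asks whether each element occurs anywhere in the other vector
inn <- sleep_dates %in% pre_dates
print(inn)  # FALSE FALSE TRUE
```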

What I want to extract

Based on this, for example:

If ID == 5009 & (Date == 5/9/2018 & 6/11/2018)

I would like to get the corresponding sleep data:

SubID   SleepDate   Day of Week RHR HRV Recovery
5009    5/9/2018    'Wednesday' 46  176 75
5009    6/11/2018   'Wednesday' 76  196 95

[I made up the 6/11/2018 row for reference.]

Tags: r, dplyr, tidyverse

Solution


Try something like this.

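library(dplyr)

SleepData %>%
  # attach each subject's PreDate/PostDate to their sleep rows
  inner_join(MRI_date, by = c("SubID" = "ID")) %>%
  filter(SubID == 5009) %>%
  # parse the m/d/Y strings so dates compare as calendar days
  mutate(across(c(SleepDate, PreDate, PostDate), ~ as.Date(.x, "%m/%d/%Y"))) %>%
  # keep only the nights that fall on the pre- or post-scan date
  filter(SleepDate == PreDate | SleepDate == PostDate) %>%
  select(SubID, SleepDate, `Day of Week`, RHR, HRV, Recovery)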
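A variant of the same idea: reshape the MRI dates into one row per (subject, scan date) and join on both columns, so each subject is matched only against its own scan dates and no per-row comparison is needed. This is a sketch, assuming dplyr is installed; the names `scan_dates`, `phase` and `matched` are illustrative, the sleep table is trimmed to a few columns, and the 6/11/2018 row is the asker's made-up example.

```r
library(dplyr)

# MRI dates for subject 5009 (from the question's sample)
MRI_date <- data.frame(
  ID = 5009, PreDate = "5/9/2018", PostDate = "6/11/2018",
  stringsAsFactors = FALSE
)

# Sleep rows for 5009; the 6/11/2018 row is the asker's made-up example
SleepData <- data.frame(
  SubID     = 5009,
  SleepDate = c("5/8/2018", "5/9/2018", "5/10/2018", "6/11/2018"),
  RHR       = c(45, 46, 50, 76),
  HRV       = c(169, 176, 138, 196),
  Recovery  = c(87, 75, 54, 95),
  stringsAsFactors = FALSE
)

# One row per (subject, scan date): stack the pre and post dates
scan_dates <- bind_rows(
  MRI_date %>% select(SubID = ID, SleepDate = PreDate)  %>% mutate(phase = "pre"),
  MRI_date %>% select(SubID = ID, SleepDate = PostDate) %>% mutate(phase = "post")
)

# Convert m/d/Y text to Date in both tables before matching
scan_dates <- scan_dates %>% mutate(SleepDate = as.Date(SleepDate, "%m/%d/%Y"))
SleepData  <- SleepData  %>% mutate(SleepDate = as.Date(SleepDate, "%m/%d/%Y"))

# Keep sleep rows whose (SubID, SleepDate) pair is a scan date,
# labelling each with the phase it came from
matched <- SleepData %>% inner_join(scan_dates, by = c("SubID", "SleepDate"))
print(matched)
```

Because the join key includes `SubID`, the same code works for all subjects at once; use `semi_join()` instead of `inner_join()` if the `phase` label is not needed.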
