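r - How to merge two time-series data sets in R?
Problem description
I have two time-series data sets. The first contains the drug name, the start date, the stop date, and the dose; the second contains visit dates and scores: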
data1 <- data.frame("Drug Name" = c("Drug1", "Drug1", "Drug2", "Drug1", "Drug3", "Drug2",
                                    "Drug4", "Drug5", "Drug1"),
                    "Start Date" = c("7/1/2016", "1/1/2016", "8/6/2015", "2/1/2015", "6/14/2017",
                                     "6/21/2017", "1/24/2018", "7/30/2018", "7/30/2018"),
                    "Stop Date " = c("1/14/2017", "1/14/2017", "1/14/2017", "1/14/2017",
                                     "1/24/2018", "6/29/2018", "6/29/2018", "Ongoing", "Ongoing"),
                    "Dose" = c(12, 20, 32, 3, 5, 6, 6, 8, 9))
data2 <- data.frame("visitdate" = c("8/24/2016", "8/24/2016", "10/19/2016", "12/7/2016", "12/21/2016",
                                    "3/22/2017", "5/10/2017", "6/14/2017", "7/12/2017", "9/27/2017",
                                    "11/29/2017", "1/24/2018", "3/21/2018", "5/30/2018", "8/15/2018",
                                    "10/3/2018", "11/28/2018"),
                    "Score" = c(1, 2, 3, 34, 6, 7, 9, 5, 6, 8, 5, 5, 7, 9, 1, 2, 5))
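I would like to merge the two in a way that, for a visit date such as 8/24/2016, tells me how many drugs the patient was taking and at what doses, together with the clinical score.
Solution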
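There is some data preprocessing to take care of first.
First, the column names in your example contain spaces, which are best avoided. For this example I edited them and removed the spaces.
Also, you have "Ongoing" in place of a date. I suggest converting the columns to dates with as.Date. After conversion, the "Ongoing" entries become NA. These can then be set to Inf (infinity), so that they compare as later than any visit date and the date-range matching still works.
For example: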
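# Convert the month/day/year strings to Date
data1$StartDate <- as.Date(data1$StartDate, format = "%m/%d/%Y")
data1$StopDate <- as.Date(data1$StopDate, format = "%m/%d/%Y")    # "Ongoing" becomes NA
data2$VisitDate <- as.Date(data2$VisitDate, format = "%m/%d/%Y")
# Treat the ongoing prescriptions (rows 8 and 9) as having no stop date
data1$StopDate[8:9] <- Inf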
There are many other ways to handle this, depending on where your data come from.
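To see why Inf is a convenient stand-in for an open-ended stop date: a Date is stored as a number of days, so an Inf stop date compares as later than any real visit date, while any comparison with NA returns NA, so an ongoing prescription could never satisfy the date-range condition. A minimal base R sketch with one made-up visit date and two stop dates:

```r
# Two stop dates; the second is "Ongoing", which as.Date() turns into NA
stop_na <- as.Date(c("1/14/2017", "Ongoing"), format = "%m/%d/%Y")

# Replace the missing stop date with Inf: later than any finite date
stop_inf <- stop_na
stop_inf[is.na(stop_inf)] <- Inf

visit <- as.Date("8/15/2018", format = "%m/%d/%Y")

in_range_na  <- visit <= stop_na   # FALSE NA: the ongoing drug cannot match
in_range_inf <- visit <= stop_inf  # FALSE TRUE: the ongoing drug now matches
in_range_na
in_range_inf
```

A far-future sentinel date such as as.Date("9999-12-31") would behave the same way in this comparison; Inf has the advantage that it can never be mistaken for a real date.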
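After that, you can use tidyverse and fuzzyjoin as follows. fuzzy_left_join joins the two data frames, matching each visit only to the prescriptions whose start and stop dates bracket the visit date (inclusive). Because it is a left join, visits with no active drug are kept, with NA in the drug columns.
You might consider keeping the result in long format. If you want wide format, though, you can use pivot_wider. Finally, select puts the columns in numeric order, as in your example.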
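library(tidyverse)
library(fuzzyjoin)

fuzzy_left_join(data2,
                data1,
                # a visit matches a prescription when StartDate <= VisitDate <= StopDate
                by = c("VisitDate" = "StartDate",
                       "VisitDate" = "StopDate"),
                match_fun = list(`>=`, `<=`)) %>%
  select(-StartDate, -StopDate) %>%
  # one group per visit (2016-08-24 occurs twice, with different scores)
  group_by(VisitDate, Score) %>%
  mutate(rn = row_number(),
         # a visit with no matching prescription has one all-NA row: count it as 0, not 1
         NumDrugs = ifelse(all(is.na(DrugName)), 0, n())) %>%
  # one DrugName_i / Dose_i column pair per active prescription
  pivot_wider(id_cols = c(VisitDate, Score, NumDrugs), names_from = rn, values_from = c(DrugName, Dose)) %>%
  # interleave the columns by their numeric suffix: DrugName_1, Dose_1, DrugName_2, ...
  select(VisitDate, Score, NumDrugs, names(.)[-c(1:3)][order(parse_number(names(.)[-c(1:3)]))])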
Output
VisitDate Score NumDrugs DrugName_1 Dose_1 DrugName_2 Dose_2 DrugName_3 Dose_3 DrugName_4 Dose_4
<date> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 2016-08-24 1 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
2 2016-08-24 2 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
3 2016-10-19 3 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
4 2016-12-07 34 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
5 2016-12-21 6 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
6 2017-03-22 7 0 NA NA NA NA NA NA NA NA
7 2017-05-10 9 0 NA NA NA NA NA NA NA NA
8 2017-06-14 5 1 Drug3 5 NA NA NA NA NA NA
9 2017-07-12 6 2 Drug3 5 Drug2 6 NA NA NA NA
10 2017-09-27 8 2 Drug3 5 Drug2 6 NA NA NA NA
11 2017-11-29 5 2 Drug3 5 Drug2 6 NA NA NA NA
12 2018-01-24 5 3 Drug3 5 Drug2 6 Drug4 6 NA NA
13 2018-03-21 7 2 Drug2 6 Drug4 6 NA NA NA NA
14 2018-05-30 9 2 Drug2 6 Drug4 6 NA NA NA NA
15 2018-08-15 1 2 Drug5 8 Drug1 9 NA NA NA NA
16 2018-10-03 2 2 Drug5 8 Drug1 9 NA NA NA NA
17 2018-11-28 5 2 Drug5 8 Drug1 9 NA NA NA NA
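The last select relies on pivot_wider naming the new columns DrugName_1, DrugName_2 and so on, followed by Dose_1, Dose_2 and so on; sorting those names by their numeric suffix interleaves each drug with its dose. The idea in isolation, as a small base R sketch that uses sub() in place of parse_number() (the four column names are illustrative):

```r
# Column names in the order pivot_wider() produces them: all DrugName_* first, then all Dose_*
# (illustrative names for two prescriptions)
nm <- c("DrugName_1", "DrugName_2", "Dose_1", "Dose_2")

# Extract the numeric suffix; a base R stand-in for readr::parse_number()
suffix <- as.numeric(sub(".*_", "", nm))

# order() is stable, so within the same suffix DrugName_i stays before Dose_i
interleaved <- nm[order(suffix)]
interleaved
```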
Data (before converting the dates)
data1 <- structure(list(DrugName = c("Drug1", "Drug1", "Drug2", "Drug1",
"Drug3", "Drug2", "Drug4", "Drug5", "Drug1"), StartDate = c("7/1/2016",
"1/1/2016", "8/6/2015", "2/1/2015", "6/14/2017", "6/21/2017",
"1/24/2018", "7/30/2018", "7/30/2018"), StopDate = c("1/14/2017",
"1/14/2017", "1/14/2017", "1/14/2017", "1/24/2018", "6/29/2018",
"6/29/2018", NA, NA), Dose = c(12, 20, 32, 3, 5, 6, 6, 8, 9)), class = "data.frame", row.names = c(NA,
-9L))
data2 <- structure(list(VisitDate = c("8/24/2016", "8/24/2016", "10/19/2016",
"12/7/2016", "12/21/2016", "3/22/2017", "5/10/2017", "6/14/2017",
"7/12/2017", "9/27/2017", "11/29/2017", "1/24/2018", "3/21/2018",
"5/30/2018", "8/15/2018", "10/3/2018", "11/28/2018"), Score = c(1,
2, 3, 34, 6, 7, 9, 5, 6, 8, 5, 5, 7, 9, 1, 2, 5)), class = "data.frame", row.names = c(NA,
-17L))
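If you would rather not depend on fuzzyjoin, the same inclusive date-range condition can be sketched in base R: cross join every visit with every prescription using merge(by = NULL), keep the pairs where StartDate <= VisitDate <= StopDate, and count the prescriptions per visit. The sketch below repeats the data and the date conversion so it runs on its own; the VisitID column, the long and counts objects, and the choice of a cross join are assumptions for illustration, not part of the answer above.

```r
# Data as above, converted to Date; NA stop dates ("Ongoing") become Inf
data1 <- data.frame(
  DrugName  = c("Drug1", "Drug1", "Drug2", "Drug1", "Drug3", "Drug2", "Drug4", "Drug5", "Drug1"),
  StartDate = as.Date(c("7/1/2016", "1/1/2016", "8/6/2015", "2/1/2015", "6/14/2017",
                        "6/21/2017", "1/24/2018", "7/30/2018", "7/30/2018"), format = "%m/%d/%Y"),
  StopDate  = as.Date(c("1/14/2017", "1/14/2017", "1/14/2017", "1/14/2017", "1/24/2018",
                        "6/29/2018", "6/29/2018", NA, NA), format = "%m/%d/%Y"),
  Dose      = c(12, 20, 32, 3, 5, 6, 6, 8, 9)
)
data1$StopDate[is.na(data1$StopDate)] <- Inf

data2 <- data.frame(
  VisitDate = as.Date(c("8/24/2016", "8/24/2016", "10/19/2016", "12/7/2016", "12/21/2016",
                        "3/22/2017", "5/10/2017", "6/14/2017", "7/12/2017", "9/27/2017",
                        "11/29/2017", "1/24/2018", "3/21/2018", "5/30/2018", "8/15/2018",
                        "10/3/2018", "11/28/2018"), format = "%m/%d/%Y"),
  Score     = c(1, 2, 3, 34, 6, 7, 9, 5, 6, 8, 5, 5, 7, 9, 1, 2, 5)
)
# Hypothetical visit id: 2016-08-24 appears twice with different scores
data2$VisitID <- seq_len(nrow(data2))

# Cross join (every visit x every prescription), then keep in-range pairs (inclusive)
long <- merge(data2, data1, by = NULL)
long <- long[long$VisitDate >= long$StartDate & long$VisitDate <= long$StopDate, ]
long <- long[order(long$VisitID), c("VisitID", "VisitDate", "Score", "DrugName", "Dose")]
rownames(long) <- NULL

# Active prescriptions per visit; visits with no match get 0
counts <- table(factor(long$VisitID, levels = data2$VisitID))
data2$NumDrugs <- as.vector(counts)
data2
```

Here long holds one row per visit and active prescription (the long format mentioned earlier), and data2$NumDrugs matches the NumDrugs column of the output above. A cross join builds every visit-by-prescription pair before filtering, which is fine at this size; for large tables a non-equi join, for example with data.table, may be worth considering.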