R: Carry a value forward by ID, only within a given date range

Problem description

I have a data table that looks like this:

   id  firstd       lastd       treat
   1   2003-03-23   2003-03-25  1
   1   2003-03-24   2003-03-25  NA
   1   2003-03-25   2003-03-25  NA
   1   2003-05-13   2003-05-15  0
   1   2003-05-14   2003-05-15  NA
   1   2003-05-15   2003-05-15  NA
   2   2004-04-28   2004-04-30  0
   2   2004-04-29   2003-04-30  NA
   2   2004-04-30   2003-04-30  NA

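I want to carry the value of the treat column forward, by id, across the date range that runs from firstd until firstd == lastd, so that the NAs are filled with the given value.

Ideally, it would look like this: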

   id  firstd       lastd       treat
   1   2003-03-23   2003-03-25  1
   1   2003-03-24   2003-03-25  1
   1   2003-03-25   2003-03-25  1
   1   2003-05-13   2003-05-15  0
   1   2003-05-14   2003-05-15  0
   1   2003-05-15   2003-05-15  0
   2   2004-04-28   2004-04-30  0
   2   2004-04-29   2003-04-30  0
   2   2004-04-30   2003-04-30  0

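I know how to carry a value down a column, but I have not managed to handle the added complexity of a given date range. Does anyone know how to do this?

When carrying a value down a given column, this is the code I usually use: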

    one[, treat:= treat[!is.na(treat)][1], by = id]

Does anyone know how to modify this code to take the given date range into account? Or are there any other suggestions?

Tags: r, dataframe, data.table

Solution


We can group by 'id' and use fill from tidyr:

library(dplyr)
library(tidyr)
one %>%
   group_by(id) %>%
   fill(treat)
# A tibble: 9 x 4
# Groups:   id [2]
#     id firstd     lastd      treat
#  <int> <chr>      <chr>      <int>
#1     1 2003-03-23 2003-03-25     1
#2     1 2003-03-24 2003-03-25     1
#3     1 2003-03-25 2003-03-25     1
#4     1 2003-05-13 2003-05-15     0
#5     1 2003-05-14 2003-05-15     0
#6     1 2003-05-15 2003-05-15     0
#7     2 2004-04-28 2004-04-30     0
#8     2 2004-04-29 2003-04-30     0
#9     2 2004-04-30 2003-04-30     0

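If we also use the date as a grouping variable, then (rleid comes from data.table, so it is called with its namespace here):

one %>%
   group_by(id, grp = data.table::rleid(lastd)) %>%
   fill(treat)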
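
On the sample data as posted, this grouping may not give the expected result. The last two rows of id 2 have lastd = 2003-04-30 while the first row has 2004-04-30, which looks like a typo in the question. A quick check of the run ids (a sketch that only reuses the lastd values from the Data section) suggests that id 2 is split into two runs, so rows 8 and 9 would have no value to fill from:

```r
# lastd values copied from the Data section of this post
lastd <- c("2003-03-25", "2003-03-25", "2003-03-25",
           "2003-05-15", "2003-05-15", "2003-05-15",
           "2004-04-30", "2003-04-30", "2003-04-30")

# rleid() gives each run of identical consecutive values its own id
run_id <- data.table::rleid(lastd)
print(run_id)
# [1] 1 1 1 2 2 2 3 4 4
```

If the year in those two rows is corrected to 2004, the runs line up with the three date blocks.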

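Or, if we also take 'firstd' into account, create a grouping variable based on the equality between the two dates. A row where firstd == lastd closes a block, so the lagged cumulative sum of that condition starts a new group on the following row: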

one %>%
    group_by(id, grp = lag(cumsum(firstd == lastd), default = 0)) %>%
    fill(treat)
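
To see what this grouping variable contains, the same lag-of-cumsum idea can be spelled out in base R. This is only an illustrative sketch: the names block_end and grp are made up here, and the date vectors are copied from the Data section.

```r
# Date columns copied from the Data section of this post
firstd <- c("2003-03-23", "2003-03-24", "2003-03-25",
            "2003-05-13", "2003-05-14", "2003-05-15",
            "2004-04-28", "2004-04-29", "2004-04-30")
lastd  <- c("2003-03-25", "2003-03-25", "2003-03-25",
            "2003-05-15", "2003-05-15", "2003-05-15",
            "2004-04-30", "2003-04-30", "2003-04-30")

# TRUE on the row that closes a block (firstd has reached lastd)
block_end <- firstd == lastd

# Number of blocks already closed before each row:
# shift the cumulative sum down by one and start at 0
grp <- c(0L, head(cumsum(block_end), -1L))
print(grp)
# [1] 0 0 0 1 1 1 2 2 2
```

Rows 1 to 3 and rows 4 to 6 fall into separate groups within id 1, so a value can only be carried forward inside its own date block.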

With data.table, we can combine it with na.locf0 from zoo:

library(zoo)
library(data.table)
setDT(one)[, treat := na.locf0(treat), by = id][]
#   id     firstd      lastd treat
#1:  1 2003-03-23 2003-03-25     1
#2:  1 2003-03-24 2003-03-25     1
#3:  1 2003-03-25 2003-03-25     1
#4:  1 2003-05-13 2003-05-15     0
#5:  1 2003-05-14 2003-05-15     0
#6:  1 2003-05-15 2003-05-15     0
#7:  2 2004-04-28 2004-04-30     0
#8:  2 2004-04-29 2003-04-30     0
#9:  2 2004-04-30 2003-04-30     0
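
The date-block grouping can also be combined with the first-non-NA idiom from the question, using data.table alone. The following is a minimal sketch under the assumption that every block contains at least one non-NA treat value; the helper column grp is a made-up name and is dropped at the end.

```r
library(data.table)

# Same sample data as in the Data section of this post
one <- data.table(
  id     = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  firstd = c("2003-03-23", "2003-03-24", "2003-03-25",
             "2003-05-13", "2003-05-14", "2003-05-15",
             "2004-04-28", "2004-04-29", "2004-04-30"),
  lastd  = c("2003-03-25", "2003-03-25", "2003-03-25",
             "2003-05-15", "2003-05-15", "2003-05-15",
             "2004-04-30", "2003-04-30", "2003-04-30"),
  treat  = c(1L, NA, NA, 0L, NA, NA, 0L, NA, NA)
)

# grp counts how many blocks have already ended before each row
one[, grp := shift(cumsum(firstd == lastd), fill = 0L)]

# The question's first-non-NA idiom, applied per (id, grp) block
one[, treat := treat[!is.na(treat)][1L], by = .(id, grp)]

# Drop the helper column again
one[, grp := NULL]

print(one$treat)
# [1] 1 1 1 0 0 0 0 0 0
```

Unlike fill or na.locf0, this variant assigns the first non-NA value to the whole block, so it would also fill NAs that precede that value. A block with no non-NA value at all stays NA.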

Data

one <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
 firstd = c("2003-03-23", 
"2003-03-24", "2003-03-25", "2003-05-13", "2003-05-14", "2003-05-15", 
"2004-04-28", "2004-04-29", "2004-04-30"), lastd = c("2003-03-25", 
"2003-03-25", "2003-03-25", "2003-05-15", "2003-05-15", "2003-05-15", 
"2004-04-30", "2003-04-30", "2003-04-30"), treat = c(1L, NA, 
NA, 0L, NA, NA, 0L, NA, NA)), class = "data.frame", row.names = c(NA, 
-9L))
