Long-format time series data: merging rows for specific time points while ignoring the other time points for each participant

Problem description

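I am looking for a solution, preferably a tidy one, to the following. I have long-format longitudinal data from 4 time points, and the variables I am interested in (var1, var2, ...) sometimes have missing values at different times. I am looking for a way to merge the values of the t1 and t2 rows only, within each id, leaving t3 and t4 untouched.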

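I think I am on the right track with a combination of the group_by, starts_with, and summarise functions, but I cannot get the exact output I need. Several other threads here are similar, but none of them cover merging specific time points while ignoring the others.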

Example data:

df1
# A tibble: 16 x 4
   id    time   var1  var2
   <chr> <chr> <dbl> <dbl>
 1 A1    t1       NA    NA
 2 A1    t2        2     3
 3 A1    t3        2     2
 4 A1    t4        3     2
 5 A2    t1        1     2
 6 A2    t2       NA    NA
 7 A2    t3        2    NA
 8 A2    t4        1     2
 9 A3    t1        2     3
10 A3    t2       NA     1
11 A3    t3       NA    NA
12 A3    t4        2     2
13 A4    t1       NA    NA
14 A4    t2       NA     1
15 A4    t3        1     1
16 A4    t4        1     1 

Desired output:

df2
# A tibble: 16 x 4
   id    time   var1  var2
   <chr> <chr> <dbl> <dbl>
 1 A1    t1        2     3
 2 A1    t2        2     3
 3 A1    t3        2     2
 4 A1    t4        3     2
 5 A2    t1        1     2
 6 A2    t2        1     2
 7 A2    t3        2    NA
 8 A2    t4        1     2
 9 A3    t1        2     3
10 A3    t2        2     1
11 A3    t3       NA    NA
12 A3    t4        2     2
13 A4    t1       NA     1
14 A4    t2       NA     1
15 A4    t3        1     1
16 A4    t4        1     1

Reproducible data sets:

df1 <- structure(list(id = c("A1", "A1", "A1", "A1", "A2", "A2", "A2", 
"A2", "A3", "A3", "A3", "A3", "A4", "A4", "A4", "A4"), time = c("t1", 
"t2", "t3", "t4", "t1", "t2", "t3", "t4", "t1", "t2", "t3", "t4", 
"t1", "t2", "t3", "t4"), var1 = c(NA, 2, 2, 3, 1, NA, 2, 1, 2, 
NA, NA, 2, NA, NA, 1, 1), var2 = c(NA, 3, 2, 2, 2, NA, NA, 2, 
3, 1, NA, 2, NA, 1, 1, 1)), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -16L), spec = structure(list(
    cols = list(id = structure(list(), class = c("collector_character", 
    "collector")), time = structure(list(), class = c("collector_character", 
    "collector")), var1 = structure(list(), class = c("collector_double", 
    "collector")), var2 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

df2 <- structure(list(id = c("A1", "A1", "A1", "A1", "A2", "A2", "A2", 
"A2", "A3", "A3", "A3", "A3", "A4", "A4", "A4", "A4"), time = c("t1", 
"t2", "t3", "t4", "t1", "t2", "t3", "t4", "t1", "t2", "t3", "t4", 
"t1", "t2", "t3", "t4"), var1 = c(2, 2, 2, 3, 1, 1, 2, 1, 2, 
2, NA, 2, NA, NA, 1, 1), var2 = c(3, 3, 2, 2, 2, 2, NA, 2, 3, 
1, NA, 2, 1, 1, 1, 1)), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -16L), spec = structure(list(
    cols = list(id = structure(list(), class = c("collector_character", 
    "collector")), time = structure(list(), class = c("collector_character", 
    "collector")), var1 = structure(list(), class = c("collector_double", 
    "collector")), var2 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

Any help would be greatly appreciated!

Tags: r, dplyr, tidyverse

Solution


We can split out the t1 and t2 rows and use tidyr::fill(..., .direction = "updown") to create the logic you want, then bind the result to the t3 and t4 rows and sort with arrange().

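library(dplyr)
library(tidyr)

bind_rows(
  # t1/t2 rows only: within each id, fill an NA from the other row
  df1 %>%
    group_by(id) %>%
    filter(time %in% c("t1", "t2")) %>%
    fill(c("var1", "var2"), .direction = "updown"),
  # t3/t4 rows are passed through unchanged
  df1 %>%
    filter(time %in% c("t3", "t4"))
  ) %>%
  # restore the original row order
  arrange(id, time) %>%
  ungroup()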

# A tibble: 16 x 4
   id    time   var1  var2
   <chr> <chr> <dbl> <dbl>
 1 A1    t1        2     3
 2 A1    t2        2     3
 3 A1    t3        2     2
 4 A1    t4        3     2
 5 A2    t1        1     2
 6 A2    t2        1     2
 7 A2    t3        2    NA
 8 A2    t4        1     2
 9 A3    t1        2     3
10 A3    t2        2     1
11 A3    t3       NA    NA
12 A3    t4        2     2
13 A4    t1       NA     1
14 A4    t2       NA     1
15 A4    t3        1     1
16 A4    t4        1     1
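
For comparison, here is a hedged sketch of the same logic in a single pipeline, without splitting and re-binding the data. It is not the answer's method. It swaps in dplyr::coalesce() inside mutate(across(...)), and fill_early is a hypothetical helper name introduced only for this illustration. The sketch assumes each id has at most one t1 row and one t2 row, as in the example data.

```r
library(dplyr)

# Example data from the question, rebuilt as a plain tibble
df1 <- tibble(
  id   = rep(c("A1", "A2", "A3", "A4"), each = 4),
  time = rep(c("t1", "t2", "t3", "t4"), times = 4),
  var1 = c(NA, 2, 2, 3, 1, NA, 2, 1, 2, NA, NA, 2, NA, NA, 1, 1),
  var2 = c(NA, 3, 2, 2, 2, NA, NA, 2, 3, 1, NA, 2, NA, 1, 1, 1)
)

# Hypothetical helper: within one id, replace an NA in the t1/t2 rows
# with the value from the other early row; all other rows are untouched.
# Assumes at most one row per early time point within the group.
fill_early <- function(x, time, early = c("t1", "t2")) {
  idx <- time %in% early
  x[idx] <- coalesce(x[idx], rev(x[idx]))
  x
}

out <- df1 %>%
  group_by(id) %>%
  mutate(across(c(var1, var2), ~ fill_early(.x, time))) %>%
  ungroup()
```

Because the rows are never filtered out, the original row order is preserved and no arrange() step is needed. Values that are present at both t1 and t2 are kept as they are, which matches the behaviour of fill() with .direction = "updown" on two rows.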
