How to merge multiple rows with multiple columns of a dataset in R

Problem description

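I want to merge all the rows for each record_id into a single row. The exception is volunteers who have two entries in the repeat column (two "test" rows in the type column): for those I want a second row. Each record_id corresponds to one person, who took the test either once (repeat = 1) or twice, in which case there are two entries in the repeat column.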

This is what my data looks like:

structure(list(record_id = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 
4, 4, 4), type = c(NA, "data_collection", "test", NA, "data_collection", 
"test", NA, "data_collection", "test", "test", NA, "data_collection", 
"test", "test"), `repeat` = c(NA, 1, 1, NA, 1, 1, NA, 1, 1, 2, 
NA, 1, 1, 2), dt_volunteer_reg = structure(c(1597246320, NA, 
NA, 1599217080, NA, NA, 1596184500, NA, NA, NA, 1598192280, NA, 
NA, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), age = c(26, 
NA, NA, 64, NA, NA, 51, NA, NA, NA, 39, NA, NA, NA), gender = c(0, 
NA, NA, 1, NA, NA, 0, NA, NA, NA, 1, NA, NA, NA), case_type = c(NA, 
1, NA, NA, 2, NA, NA, 1, NA, NA, NA, 1, NA, NA), test_dis_dt = structure(c(NA, 
NA, 1597250220, NA, NA, 1600012980, NA, NA, 1596382080, 1601980740, 
NA, NA, 1598284020, 1603118700), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), test_dis_res = c(NA, NA, 1, NA, NA, 1, NA, 
NA, 2, 2, NA, NA, 2, 2), test_dis_in = c(NA, NA, NA, NA, NA, 
0.02, NA, NA, 6.13, 4.75, NA, NA, 7.23, 3.85), test_cont_dt = structure(c(NA, 
NA, 1597250280, NA, NA, 1608636120, NA, NA, NA, 1601980740, NA, 
NA, 1605704940, 1603205340), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
  test_cont_res = c(NA, NA, 2, NA, NA, 1, NA, NA, NA, 2, NA, 
  NA, 2, 2), test_cont_val = c(NA, NA, 123, NA, NA, 0, NA, 
  NA, NA, 40000, NA, NA, 471.6, 306.5)), row.names = c(NA, 
-14L), class = c("tbl_df", "tbl", "data.frame"))

This is what I would like to get:

  structure(list(record_id = c(1, 2, 3, 3, 4, 4), `repeat` = c(1, 
1, 1, 2, 1, 2), dt_volunteer_reg = structure(c(1597246320, 1599217080, 
1596184500, 1596184500, 1598192280, 1598192280), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), age = c(26, 64, 51, 51, 39, 39), gender = c(0, 
1, 0, 0, 1, 1), case_type = c(1, 2, 1, 1, 1, 1), test_dis_dt = structure(c(1597250220, 
1600012980, 1596382080, 1601980740, 1598284020, 1603118700), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), test_dis_res = c(1, 1, 2, 2, 2, 2), 
    test_dis_in = c(NA, 0.02, 6.13, 4.75, 7.23, 3.85), test_cont_dt = structure(c(1597250280, 
    1608636120, NA, 1601980740, 1605704940, 1603205340), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), test_cont_res = c(2, 1, NA, 2, 
    2, 2), test_cont_val = c(123, 0, NA, 40000, 471.6, 306.5)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
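Comparing the two outputs, each row of the desired result corresponds to one "test" row of the input, so a person tested twice ends up with two rows. A minimal sketch of that rule, assuming a hypothetical toy subset of the data (only record_id, type and repeat, with values taken from records 1 and 3):

```r
library(dplyr)

# Hypothetical subset of the question's data: only record_id, type and `repeat`
toy <- tibble(
  record_id = c(1, 1, 1, 3, 3, 3, 3),
  type      = c(NA, "data_collection", "test", NA, "data_collection", "test", "test"),
  `repeat`  = c(NA, 1, 1, NA, 1, 1, 2)
)

# Each test row becomes one output row, so counting the test rows per person
# gives the number of rows each record_id should have after merging
rows_per_person <- toy %>%
  filter(type == "test") %>%
  count(record_id, name = "n_tests")

print(rows_per_person)
```

Record 1 gets one row and record 3 gets two, matching the desired output.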

Tags: r, dplyr, concatenation, reshape, tidyr

Solution


Assuming the first data frame is called input and you are happy to use the tidyverse, you could do this:

library(tidyverse)

input %>%
  nest(data = c(-record_id)) %>%
  mutate(
    data = map(data, ~ replace_na(.x, as.list(head(.x, 1)))),  # Copy the volunteer details (first row) into the other rows
    data = map(data, ~ filter(.x, !is.na(`repeat`))),          # Drop the volunteer-details row
    data = map(data, ~ replace_na(.x, as.list(head(.x, 1)))),  # Copy the data-collection details into the test rows
    data = map(data, ~ filter(.x, type == "test"))             # Drop the data-collection rows
  ) %>%
  unnest(data) %>%
  select(-type)

There are more concise and/or faster ways to do this, but this version is probably more readable.
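One more concise option, sketched here as an alternative rather than the answer author's method: fill the person-level columns within each record_id using tidyr::fill(), then keep only the test rows. It assumes that each person-level column holds at most one non-missing value per record_id. The toy data and the helper name collapse_records are hypothetical. Filling only the person-level columns, rather than every column, avoids copying the first test's results into missing cells of a repeat test row.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's data (values taken from records 1 and 3)
toy <- tibble(
  record_id    = c(1, 1, 1, 3, 3, 3, 3),
  type         = c(NA, "data_collection", "test", NA, "data_collection", "test", "test"),
  `repeat`     = c(NA, 1, 1, NA, 1, 1, 2),
  age          = c(26, NA, NA, 51, NA, NA, NA),
  case_type    = c(NA, 1, NA, NA, 1, NA, NA),
  test_dis_res = c(NA, NA, 1, NA, NA, 2, 2)
)

# Hypothetical helper: spread the person-level columns to every row of the
# same record_id, then keep one row per test (i.e. per `repeat`)
collapse_records <- function(df, person_cols) {
  df %>%
    group_by(record_id) %>%
    fill(all_of(person_cols), .direction = "downup") %>%  # fill within each person only
    ungroup() %>%
    filter(type == "test") %>%                            # drop volunteer and data-collection rows
    select(-type)
}

result <- collapse_records(toy, c("age", "case_type"))
print(result)
```

On the full data this would presumably be called as collapse_records(input, c("dt_volunteer_reg", "age", "gender", "case_type")), listing every column that describes the person rather than a single test.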

