Formatting a nested data.frame grouped by id with tidyverse

Problem

I have a nested dataset with two ids per row, like this:

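library(dplyr)
library(tidyr)

df <- data.frame(
        sample = rep(paste0("s", 1:5), each = 5),
        ID1 = paste0("id1.", 1:5),
        ID2 = paste0("id2.", 1:5),
        counts = rep(1:5, each = 5)) %>%
    arrange(ID1) %>%
    group_by(ID1, ID2) %>%
    nest()   # one row per (ID1, ID2) pair; sample and counts go into the "data" list column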

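For further analysis I want a data frame whose first column gives the sample and whose remaining columns give the counts for each id pair, with the combined ids as column names: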

df3 <- data.frame(
    sample = paste0("s", 1:5),
    "id1.1|id2.1" = 1:5,
    "id1.2|id2.2" = 1:5,
    "id1.3|id2.3" = 1:5,
    "id1.4|id2.4" = 1:5,
    "id1.5|id2.5" = 1:5,
    check.names = FALSE)  # keep "|" in the names instead of converting it to "."
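Why check.names matters for the target table: data.frame() passes column names through make.names() by default, and "|" is not allowed in a syntactic R name, so it is silently replaced by ".". A small sketch of the behaviour; tibble() is shown only as one alternative that leaves such names as written:

```r
library(tibble)

# Default: make.names() turns the invalid "|" into "."
names(data.frame("id1.1|id2.1" = 1:5))
#> [1] "id1.1.id2.1"

# check.names = FALSE keeps the name exactly as written
names(data.frame("id1.1|id2.1" = 1:5, check.names = FALSE))
#> [1] "id1.1|id2.1"

# tibble() does not rewrite non-syntactic names (backticks are needed to type them)
names(tibble(`id1.1|id2.1` = 1:5))
#> [1] "id1.1|id2.1"
```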

I have started on the formatting:

library(purrr)

# extract the sample and counts vectors from each nested tibble into list columns
df2 <- df %>% 
    mutate(sample = data %>% map(pull, sample)) %>%
    mutate(counts = data %>% map(pull, counts))

However, I'm not sure what an elegant way to continue would be.

Tags: r, tidyverse

Solution


A tidyr solution that unnests the list column and then pivots to wide format.

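library(tidyr)

df %>%
  unite(ID, ID1, ID2) %>%   # combine the two ids into one column, e.g. "id1.1_id2.1"
  unnest(data) %>%          # expand each nested tibble back into sample/counts rows
  pivot_wider(names_from = ID, values_from = counts)   # one column per combined id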

# # A tibble: 5 x 6
#   sample id1.1_id2.1 id1.2_id2.2 id1.3_id2.3 id1.4_id2.4 id1.5_id2.5
#   <chr>        <int>       <int>       <int>       <int>       <int>
# 1 s1               1           1           1           1           1
# 2 s2               2           2           2           2           2
# 3 s3               3           3           3           3           3
# 4 s4               4           4           4           4           4
# 5 s5               5           5           5           5           5
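The question asks for the ids joined by "|", whereas unite() separates them with "_" by default. A minimal sketch of two ways to get the "|" form, assuming tidyr >= 1.0 (for pivot_wider()) and rebuilding the question's df so the block runs on its own: pass sep = "|" to unite(), or skip unite() and let pivot_wider() paste ID1 and ID2 together through names_sep. The object names wide_unite and wide_pivot are only illustrative.

```r
library(dplyr)
library(tidyr)

# The question's nested data, rebuilt so this block runs on its own
df <- data.frame(
  sample = rep(paste0("s", 1:5), each = 5),
  ID1 = paste0("id1.", 1:5),
  ID2 = paste0("id2.", 1:5),
  counts = rep(1:5, each = 5)
) %>%
  arrange(ID1) %>%
  group_by(ID1, ID2) %>%
  nest()

# Option 1: join the two ids with "|" instead of unite()'s default "_"
wide_unite <- df %>%
  ungroup() %>%
  unite(ID, ID1, ID2, sep = "|") %>%
  unnest(data) %>%
  pivot_wider(names_from = ID, values_from = counts)

# Option 2: no unite(); with two names_from columns, pivot_wider()
# builds each column name by pasting ID1 and ID2 together with names_sep
wide_pivot <- df %>%
  ungroup() %>%
  unnest(data) %>%
  pivot_wider(names_from = c(ID1, ID2), values_from = counts, names_sep = "|")

names(wide_unite)
names(wide_pivot)
```

Option 2 avoids the intermediate ID column, which can be convenient if ID1 and ID2 are still needed separately earlier in the pipeline.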

Or, starting from your own work:

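library(dplyr)
library(purrr)
library(tidyr)

df %>% 
  ungroup() %>% 
  # pull the sample and counts vectors out of each nested tibble;
  # .keep = "unused" then drops the data column, which was used to build them
  mutate(sample = data %>% map("sample"),
         counts = data %>% map("counts"), .keep = "unused") %>% 
  unite(ID, ID1, ID2) %>%
  unnest(-ID) %>%   # unnest every list column except ID (i.e. sample and counts)
  pivot_wider(names_from = ID, values_from = counts)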

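Note that mutate(sample = data %>% map("sample")) is a shortcut for mutate(sample = data %>% map(pull, sample)). This is a feature of purrr's map(): when it is given a character string instead of a function, it extracts the element with that name from each element of the list.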
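A small check of that equivalence, using a hypothetical two-element list of tibbles (data_col) in place of df$data: the string form and the explicit pull() form return the same list of vectors.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for the nested "data" list column
data_col <- list(
  tibble(sample = c("s1", "s2"), counts = 1:2),
  tibble(sample = c("s1", "s2"), counts = 3:4)
)

# Character .f: purrr turns "sample" into an extractor that takes the
# element named "sample" from each tibble
by_name <- map(data_col, "sample")

# Explicit function: map() forwards `sample` through ... to dplyr::pull()
by_pull <- map(data_col, pull, sample)

identical(by_name, by_pull)
#> [1] TRUE
```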

