How do I transpose a column in R while creating new data?

Problem description

I have a gene dataset that looks like this:

Pathway       Gene
Pathway1      Gene1
Pathway1      Gene2
Pathway2      Gene3
Pathway2      Gene1
Pathway3      Gene1
Pathway3      Gene4
Pathway3      Gene5

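I would like to transpose the Pathway rows into columns, using 1 and 0 to track which genes are present in each pathway, so that the output looks like this: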

Gene   Pathway1  Pathway2  Pathway3
Gene1  1         1         1
Gene2  1         0         0
Gene3  0         1         0
Gene4  0         0         1
Gene5  0         0         1

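My real data is about 3,000 rows long and I am not confident in R. I have been trying t(), but I am not sure where to start coding to get the binary counts I am looking for. Any help or suggestions on functions to try would be appreciated.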

Example input data:

structure(list(Pathway = c("Pathway1", "Pathway1", "Pathway2", 
"Pathway2", "Pathway3", "Pathway3", "Pathway3"), Gene = c("Gene1", 
"Gene2", "Gene3", "Gene1", "Gene1", "Gene4", "Gene5")), row.names = c(NA, 
-7L), class = c("data.table", "data.frame"))

Tags: r, dataframe, transpose

Solution


A quick and dirty tidyverse solution:

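library(tidyr)

# df is the example data frame from the question
# edit thanks to @Ronak Shah
# Count the rows for each Gene/Pathway pair; pairs that never occur become 0
df %>%
  pivot_wider(names_from = Pathway,
              values_from = Pathway,
              values_fn = length, values_fill = 0)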

# A tibble: 5 x 4
  Gene  Pathway1 Pathway2 Pathway3
  <chr>    <dbl>    <dbl>    <dbl>
1 Gene1        1        1        1
2 Gene2        1        0        0
3 Gene3        0        1        0
4 Gene4        0        0        1
5 Gene5        0        0        1
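
For comparison, the same table can probably be built in base R without any packages. The sketch below is not part of the original answer: it rebuilds the question's example data as a plain data.frame, and the names counts, presence and result are illustrative only. It cross-tabulates Gene against Pathway with table() and then caps each count at 1, so a Gene/Pathway pair that appears more than once in the real data would still be recorded as 1 rather than 2.

```r
# Example data from the question, rebuilt as a plain data.frame
df <- data.frame(
  Pathway = c("Pathway1", "Pathway1", "Pathway2", "Pathway2",
              "Pathway3", "Pathway3", "Pathway3"),
  Gene = c("Gene1", "Gene2", "Gene3", "Gene1", "Gene1", "Gene4", "Gene5")
)

# Cross-tabulate genes (rows) against pathways (columns);
# each cell holds the number of matching rows in df
counts <- table(df$Gene, df$Pathway)

# Turn counts into 0/1 flags so duplicated pairs do not exceed 1
presence <- (counts > 0) * 1L

# Convert to a data frame with Gene as an ordinary column
result <- data.frame(Gene = rownames(presence),
                     unclass(presence),
                     row.names = NULL)

print(result)
```

If true counts are wanted rather than presence flags, the capping step can be skipped and counts used directly.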
