首页 > 解决方案 > 如何合并两列并在 r 中保留第三列?

问题描述

我正在尝试将 2 个 ID 列合并为一个,但将每个 ID 的分数保留在另一列中。例如,我的数据如下所示:

VARIANT_ID1          VARIANT_ID2        score
01_1123425_A_G_1    01_1254436_A_G_1    0.1
02_21234356_A_G_1   02_2254436_A_G_1    0.2
03_31234356_A_G_1   03_3255436_A_G_1    0.3
10_10312345_A_G_1   10_10344745_A_G_1   0.4

我正在尝试输出这个:

VARIANT_ID1and2     score
01_1123425_A_G_1      0.1
02_21234356_A_G_1     0.2
03_31234356_A_G_1     0.3
10_10312345_A_G_1     0.4
01_1254436_A_G_1      0.1   #VARIANT_ID2 appended below VARIANT_ID1 here including their scores
02_2254436_A_G_1      0.2
03_3255436_A_G_1      0.3
10_10344745_A_G_1     0.4

我一直在尝试coalesce()从 dplyr 使用,但无法找到有关如何包含第三列的信息,我有一个生物学背景,所以不确定是否有任何其他功能可以解释这一点,对可能的功能有任何帮助将不胜感激。

输入数据:

structure(list(VARIANT_ID1 = c("01_1123425_A_G_1", "02_21234356_A_G_1", 
"03_31234356_A_G_1", "10_10312345_A_G_1", "11_1456768_A_G_1"), 
    VARIANT_ID2 = c("01_1254436_A_G_1", "02_2254436_A_G_1", "03_3255436_A_G_1", 
    "10_10344745_A_G_1", "11_11256437_A_G_1"), score = c(0.1, 
    0.2, 0.3, 0.4, 0.5)), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))

标签: rdataframedplyr

解决方案


使用pivot_longer()tidyr部分tidyverse):

library(tidyverse)

df <-
  structure(
    list(
      VARIANT_ID1 = c(
        "01_1123425_A_G_1",
        "02_21234356_A_G_1",
        "03_31234356_A_G_1",
        "10_10312345_A_G_1",
        "11_1456768_A_G_1"
      ),
      VARIANT_ID2 = c(
        "01_1254436_A_G_1",
        "02_2254436_A_G_1",
        "03_3255436_A_G_1",
        "10_10344745_A_G_1",
        "11_11256437_A_G_1"
      ),
      score = c(0.1,
                0.2, 0.3, 0.4, 0.5)
    ),
    row.names = c(NA, -5L),
    class = c("data.table",
              "data.frame")
  )

df %>% 
  pivot_longer(starts_with('VARIANT_ID'), names_to = 'Variant.ID', names_prefix = 'VARIANT_ID', values_to = 'VARIANT_ID1and2') %>% 
  arrange(Variant.ID) %>% 
  select(VARIANT_ID1and2, score, -Variant.ID)
#> # A tibble: 10 x 2
#>    VARIANT_ID1and2   score
#>    <chr>             <dbl>
#>  1 01_1123425_A_G_1    0.1
#>  2 02_21234356_A_G_1   0.2
#>  3 03_31234356_A_G_1   0.3
#>  4 10_10312345_A_G_1   0.4
#>  5 11_1456768_A_G_1    0.5
#>  6 01_1254436_A_G_1    0.1
#>  7 02_2254436_A_G_1    0.2
#>  8 03_3255436_A_G_1    0.3
#>  9 10_10344745_A_G_1   0.4
#> 10 11_11256437_A_G_1   0.5

reprex 包于 2020-05-11 创建(v0.3.0)

arrange(Variant.ID)是否仅以与您提供的输出相同的方式对其进行排序。Variant.ID列包含有关 ID 的信息。我用 将它从决赛桌中删除select(..., -Variant.ID)


推荐阅读