dplyr: sum columns X1:X5, excluding the column whose name matches the value in another column

Problem description

df1 <- tibble::tribble(
             ~cell.name, ~cluster_label, ~cluster_id, ~X1, ~X2, ~X3, ~X4, ~X5,
     "GTACTTTAGCCAGTAG",       "Div_10",         "3",   0,   1,   2,   0,   0,
     "ACACTGAAGTCTCAAC",         "CR_1",         "1",  13,   1,   0,   1,   0,
     "GACGGCTCATCCTTGC",         "CR_1",         "1",  10,   1,   0,   1,   0,
     "CTCGAAAGTATAAACG",         "CR_1",         "1",  13,   0,   0,   0,   0,
     "GACGGCTGTCGCGTGT",         "CR_1",         "1",  10,   5,   0,   1,   0
)

I want the row-wise sum of columns X1:X5, excluding in each row the column Xi for which i == cluster_id.

Edit:

Expected output:

tibble::tribble(
             ~cell.name, ~cluster_label, ~cluster_id, ~outliers,
     "GTACTTTAGCCAGTAG",       "Div_10",         "3",   1,
     "ACACTGAAGTCTCAAC",         "CR_1",         "1",   2,
     "GACGGCTCATCCTTGC",         "CR_1",         "1",   2,
     "CTCGAAAGTATAAACG",         "CR_1",         "1",   0,
     "GACGGCTGTCGCGTGT",         "CR_1",         "1",   6
)

How can I do this? Thanks!

Tags: r, dplyr

Solution


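We can reshape the data to "long" format, then, within each original row, sum the values whose column index differs from cluster_id: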

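library(dplyr)
library(tidyr)
df1 %>% 
   # keep a row id so that rows with the same cell.name are not merged
   mutate(rn = row_number()) %>% 
   # one row per (cell, Xi) pair: `name` holds "X1".."X5", `value` the count
   pivot_longer(cols = starts_with("X")) %>% 
   group_by(rn, cell.name) %>%
   summarise(cluster_id = first(cluster_id), cluster_label = first(cluster_label),
        # sum every Xi whose index i differs from this row's cluster_id
        outliers = sum(value[readr::parse_number(name) != as.numeric(cluster_id)]),
        .groups = "drop") %>%
   select(-rn)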
# A tibble: 5 x 4
#  cell.name        cluster_id cluster_label outliers
#  <chr>            <chr>      <chr>            <dbl>
#1 GTACTTTAGCCAGTAG 3          Div_10               1
#2 ACACTGAAGTCTCAAC 1          CR_1                 2
#3 GACGGCTCATCCTTGC 1          CR_1                 2
#4 CTCGAAAGTATAAACG 1          CR_1                 0
#5 GACGGCTGTCGCGTGT 1          CR_1                 6
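
As a cross-check on the output above, the same quantity can be computed without reshaping: take each row's total over X1:X5 and subtract the single entry in the column selected by cluster_id. This is only a base-R sketch, not the method used in the answer. The names m, idx and outliers are illustrative, and it assumes every cluster_id is the string form of an integer between 1 and 5.

```r
library(tibble)

df1 <- tribble(
  ~cell.name,         ~cluster_label, ~cluster_id, ~X1, ~X2, ~X3, ~X4, ~X5,
  "GTACTTTAGCCAGTAG", "Div_10",       "3",           0,   1,   2,   0,   0,
  "ACACTGAAGTCTCAAC", "CR_1",         "1",          13,   1,   0,   1,   0,
  "GACGGCTCATCCTTGC", "CR_1",         "1",          10,   1,   0,   1,   0,
  "CTCGAAAGTATAAACG", "CR_1",         "1",          13,   0,   0,   0,   0,
  "GACGGCTGTCGCGTGT", "CR_1",         "1",          10,   5,   0,   1,   0
)

# Numeric matrix of the count columns X1..X5
m <- as.matrix(df1[paste0("X", 1:5)])

# Two-column index matrix: (row number, column index taken from cluster_id)
# Assumption: cluster_id always parses to an integer in 1..5
idx <- cbind(seq_len(nrow(m)), as.integer(df1$cluster_id))

# Row total minus the excluded entry
outliers <- unname(rowSums(m)) - m[idx]
outliers
# 1 2 2 0 6
```

The values agree with the outliers column produced by the pivot_longer() approach. Matrix indexing avoids the reshape, but unlike the comparison on parsed column names it relies on the X columns being stored in order.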
