How do I extract all genes in gene_symbol that share the same start and end into a new column in R?

Problem description

I have a data frame like the one below. I want to add a column called Genes that collects every value of gene_symbol from the rows sharing the same fragment (the same start and end) into a single comma-separated entry.

chr  start    end Fragments CK BB FP i.start  i.end       gene_name            gene_symbol
1:   1 710000 715000   143  0.2662  1  0.0138   91421 762886 ENSG00000225880      LINC00115
2:   1 710000 715000   143  0.2662  1  0.0138   91421 762886 ENSG00000240453 RP11-206L10.10
3:   1 710000 715000   143  0.2662  1  0.0138  676386 762886 ENSG00000228327  RP11-206L10.2
4:   1 710000 715000   143  0.2662  1  0.0138  714172 740255 ENSG00000237491  RP11-206L10.9
5:   1 720000 725000   145  0.0000  0  0.0000   91421 762886 ENSG00000225880      LINC00115
6:   1 720000 725000   145  0.0000  0  0.0000   91421 762886 ENSG00000240453 RP11-206L10.10
                                  

I want the result to look like this:

chr  start    end Fragments CK BB FP i.start  i.end           Genes
1:   1 710000 715000   143  0.2662  1  0.0138   91421 762886      LINC00115,RP11-206L10.10,RP11-206L10.2,RP11-206L10.9
2:   1 720000 725000   145  0.0000  0  0.0000   91421 762886    LINC00115,RP11-206L10.10

Tags: r, dataframe, jupyter-notebook

Solution


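We can group by the columns that identify a fragment and collapse gene_symbol within each group using toString(), which joins the values with ", ":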

library(data.table)
dt[, .(Genes = toString(gene_symbol)),
     .(chr, start, end, Fragments, CK, BB, i.start, i.end)]
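Note that in the sample rows i.start and i.end are not constant within a fragment (rows 3 and 4 differ from rows 1 and 2), so grouping on them would split the first fragment into several rows. The following is a minimal sketch of one possible variation, not the answerer's code: it assumes the fragment is fully identified by chr, start, end, Fragments, CK, BB and FP, keeps the first i.start and i.end of each group, and joins the symbols with "," (no space) to match the desired output. The table dt is rebuilt here from the rows in the question, and the unique() call is an added assumption to avoid repeated symbols.

```r
library(data.table)

# Sample rows copied from the question (gene_name omitted, it is not needed here)
dt <- data.table(
  chr         = 1L,
  start       = c(710000, 710000, 710000, 710000, 720000, 720000),
  end         = c(715000, 715000, 715000, 715000, 725000, 725000),
  Fragments   = c(143, 143, 143, 143, 145, 145),
  CK          = c(0.2662, 0.2662, 0.2662, 0.2662, 0, 0),
  BB          = c(1, 1, 1, 1, 0, 0),
  FP          = c(0.0138, 0.0138, 0.0138, 0.0138, 0, 0),
  i.start     = c(91421, 91421, 676386, 714172, 91421, 91421),
  i.end       = c(762886, 762886, 762886, 740255, 762886, 762886),
  gene_symbol = c("LINC00115", "RP11-206L10.10", "RP11-206L10.2",
                  "RP11-206L10.9", "LINC00115", "RP11-206L10.10")
)

# Group by the fragment-level columns only; within each group keep the
# first i.start / i.end and paste the distinct gene symbols together.
res <- dt[, .(i.start = i.start[1],
              i.end   = i.end[1],
              Genes   = paste(unique(gene_symbol), collapse = ",")),
          by = .(chr, start, end, Fragments, CK, BB, FP)]

print(res)
```

If every i.start / i.end combination should instead stay on its own row, the original grouping (with those two columns in by) is the one to use.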
