How to remove column values in R where grepl does not match

Problem description

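I have a data frame named mydf. I want to check whether the value in c.change appears in Clinvar_ Name. If it does not, I want to remove the values in the columns selected by grepl("Clinvar", colnames(mydf)).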

Here is my data:

mydf <- structure(c("chr1:8045045:A:G", "chr1:8045045:A:G", "chr1:8045045:A:G", 
"chr1:17314702:C:T", "chr1:17314702:C:T", "chr1:17314702:C:T", 
"c.501A>G", "c.441A>G", "c.414A>G", "c.2775G>A", "c.2658G>A", 
"c.2790G>A", "NM_007262.5(PARK7):c.501A>G (p.Ala167=)", "NM_007262.5(PARK7):c.501A>G (p.Ala167=)", 
"NM_007262.5(PARK7):c.501A>G (p.Ala167=)", "NM_022089.4(ATP13A2):c.2790G>A (p.Ser930=)", 
"NM_022089.4(ATP13A2):c.2790G>A (p.Ser930=)", "NM_022089.4(ATP13A2):c.2790G>A (p.Ser930=)", 
"single nucleotide variant", "single nucleotide variant", "single nucleotide variant", 
"single nucleotide variant", "single nucleotide variant", "single nucleotide variant", 
"HGNC:16369", "HGNC:16369", "HGNC:16369", "HGNC:30213", "HGNC:30213", 
"HGNC:30213"), .Dim = 6:5, .Dimnames = list(NULL, c("VarID_build37", 
"c.change", "Clinvar_ Name", "Clinvar_ Type", "Clinvar_ HGNC_ID"
)))

Desired result:

VarID_build37       c.change    Clinvar_ Name                               Clinvar_ Type               Clinvar_ HGNC_ID
"chr1:8045045:A:G"  "c.501A>G"  "NM_007262.5(PARK7):c.501A>G (p.Ala167=)"    "single nucleotide variant" "HGNC:16369"
"chr1:8045045:A:G"  "c.441A>G"
"chr1:8045045:A:G"  "c.414A>G"
"chr1:17314702:C:T" "c.2775G>A"
"chr1:17314702:C:T" "c.2658G>A"
"chr1:17314702:C:T" "c.2790G>A" "NM_022089.4(ATP13A2):c.2790G>A (p.Ser930=)" "single nucleotide variant" "HGNC:30213"

Tags: r

Solution


Here is a base R solution. (If you prefer, you can replace "" with NA.)

mydf[,-(1:2)][!apply(mydf,1,function(x) grepl(x["c.change"], x["Clinvar_ Name"])),] <- ""

    VarID_build37       c.change    Clinvar_ Name                                Clinvar_ Type               Clinvar_ HGNC_ID
[1,] "chr1:8045045:A:G"  "c.501A>G"  "NM_007262.5(PARK7):c.501A>G (p.Ala167=)"    "single nucleotide variant" "HGNC:16369"    
[2,] "chr1:8045045:A:G"  "c.441A>G"  ""                                           ""                          ""              
[3,] "chr1:8045045:A:G"  "c.414A>G"  ""                                           ""                          ""              
[4,] "chr1:17314702:C:T" "c.2775G>A" ""                                           ""                          ""              
[5,] "chr1:17314702:C:T" "c.2658G>A" ""                                           ""                          ""              
[6,] "chr1:17314702:C:T" "c.2790G>A" "NM_022089.4(ATP13A2):c.2790G>A (p.Ser930=)" "single nucleotide variant" "HGNC:30213" 
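
A slightly more defensive variant of the same idea, offered as a sketch and not as the answer's own method: it selects the target columns by name with grepl("Clinvar", colnames(mydf)), as described in the question, and matches with fixed = TRUE so that the "." in values such as c.501A>G is treated as a literal character and not as a regex wildcard. It also writes NA in place of "". The names clinvar_cols and has_match are introduced here for illustration only, and the small matrix is a cut-down stand-in for mydf.

```r
# Cut-down stand-in for mydf: a character matrix, filled column by column
mydf <- matrix(
  c("chr1:8045045:A:G", "chr1:8045045:A:G", "chr1:17314702:C:T",
    "c.501A>G", "c.441A>G", "c.2790G>A",
    "NM_007262.5(PARK7):c.501A>G (p.Ala167=)",
    "NM_007262.5(PARK7):c.501A>G (p.Ala167=)",
    "NM_022089.4(ATP13A2):c.2790G>A (p.Ser930=)",
    "single nucleotide variant", "single nucleotide variant",
    "single nucleotide variant",
    "HGNC:16369", "HGNC:16369", "HGNC:30213"),
  nrow = 3,
  dimnames = list(NULL, c("VarID_build37", "c.change", "Clinvar_ Name",
                          "Clinvar_ Type", "Clinvar_ HGNC_ID"))
)

# Columns to blank out, selected by name (illustrative variable name)
clinvar_cols <- grepl("Clinvar", colnames(mydf))

# Row-wise literal substring test: is c.change contained in Clinvar_ Name?
has_match <- mapply(grepl, mydf[, "c.change"], mydf[, "Clinvar_ Name"],
                    MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Set the Clinvar columns to NA in every row without a match
mydf[!has_match, clinvar_cols] <- NA

print(mydf)
```

Selecting columns by name means the code does not depend on the Clinvar columns sitting in positions 3 to 5, which the -(1:2) index in the one-liner above implicitly assumes.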
