R: How can I check whether each element of my list partially matches a column in a data frame?

Problem description

I have a test_list:

test_list <- list("hg38:Chr12:8823762", "hg38:Chr10:50814012", "hg19:Chr12:8990070", 
        "hg38:chr1:16949", "hg38:chr9:342484")

I want to check whether each element of the list partially matches the Extra_information column of my df:

df <- structure(list(Extra_information = c("hg38:Chr10:50814012, hg19:Chr10:52573772, CpG:Mutation may have occured by deamination of methylated CpG dinucleotide", 
"hg38:Chr12:8822661, hg19:Chr12:8975257, COM:Patient is homozygous for c.706C>G p.Leu236Val in SLC26A4., dbSNP:http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs1409944554", 
"hg38:Chr12:8823729, hg19:Chr12:8976325, COM:Variant of unknown significance. Clinical features descr. in supplementary table 2. functional study., dbSNP:http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs766201825", 
"hg38:Chr12:8823762, hg19:Chr12:8976358, COM:VUS Table 2. RIT1 variant also present.", 
"hg38:Chr12:8835642, hg19:Chr12:8988238, COM:VUS Table 2. SOS1 and CBL variants also present., dbSNP:http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs11047499", 
"hg38:Chr12:8837474, hg19:Chr12:8990070, dbSNP:http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs863224952"
)), row.names = c(NA, 6L), class = "data.frame")

The goal is a data frame of my list entries, with a value of 1 for TRUE (a match) and 0 for FALSE (no match):

test_df <- structure(list(Entries = c("hg38:Chr12:8823762", "hg38:Chr10:50814012", "hg19:Chr12:8990070", 
        "hg38:chr1:16949", "hg38:chr9:342484"), Values = c(1,1,1,0,0)), row.names = c(NA, 5L), class = "data.frame")

How can I achieve this expected output?

Thanks in advance.

Tags: r

Solution

Here is a base R approach.

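# For each entry, store 1 if it occurs in any Extra_information value, else 0.
data.frame(Entries = unlist(test_list),
           Values = sapply(test_list, function(x) {
             # grep() returns the indices of matching rows; any index means a match
             as.numeric(length(grep(x, df$Extra_information)) > 0)
           }))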
#              Entries Values
#1  hg38:Chr12:8823762      1
#2 hg38:Chr10:50814012      1
#3  hg19:Chr12:8990070      1
#4     hg38:chr1:16949      0
#5    hg38:chr9:342484      0
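
One thing to be aware of: grep() treats each entry as a regular expression by default. That happens to work for these coordinates, but an entry containing characters such as "." or "(" would be interpreted as a pattern. A possible variant, offered only as a sketch, uses grepl() with fixed = TRUE so that each entry is matched as a literal substring. The Extra_information strings below are shortened copies of the question's data (the COM and dbSNP annotations are left out), and the names entries, has_match and result are illustrative.

```r
# Shortened copies of the question's data; annotations omitted for brevity.
test_list <- list("hg38:Chr12:8823762", "hg38:Chr10:50814012",
                  "hg19:Chr12:8990070", "hg38:chr1:16949", "hg38:chr9:342484")

df <- data.frame(
  Extra_information = c(
    "hg38:Chr10:50814012, hg19:Chr10:52573772",
    "hg38:Chr12:8822661, hg19:Chr12:8975257",
    "hg38:Chr12:8823729, hg19:Chr12:8976325",
    "hg38:Chr12:8823762, hg19:Chr12:8976358",
    "hg38:Chr12:8835642, hg19:Chr12:8988238",
    "hg38:Chr12:8837474, hg19:Chr12:8990070"
  ),
  stringsAsFactors = FALSE
)

entries <- unlist(test_list)

# fixed = TRUE matches each entry as a literal substring, not as a regex.
# vapply() guarantees exactly one logical value per entry.
has_match <- vapply(
  entries,
  function(x) any(grepl(x, df$Extra_information, fixed = TRUE)),
  logical(1),
  USE.NAMES = FALSE
)

result <- data.frame(Entries = entries,
                     Values = as.integer(has_match),
                     stringsAsFactors = FALSE)
print(result)
```

Both versions do substring matching, so a shorter coordinate such as "hg38:Chr12:882376" would also count as a match for "hg38:Chr12:8823762". If whole coordinates must match exactly, splitting Extra_information on ", " and comparing with %in% may be a better fit.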
