How to gsub through a list of names in a loop

Problem description

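I have a batch of samples to submit for processing on my university's cluster, more than 1,000 in total. Rather than creating each script by hand, I would like to know whether I can write a for loop that substitutes in the sample ID. Every script is essentially the same; I only need to change the sample ID and the location of the file.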

df <- structure(list(V1 = c("#!/bin/bash", "#BSUB -W 1440", "#BSUB -n 16", 
                            "#BSUB -x", "#BSUB -R \"rusage[mem=4000] span[hosts=1]\"", "#BSUB -o /gpfs_common/share01/files/abc123.out.%J.txt", 
                            "#BSUB -e /gpfs_common/share01/files/abc123.err.%J.txt", "", 
                            "", "", "", "mcli cp def456/abc123 /panfs/roc/groups/0/location/data.base", 
                            "gzip /panfs/roc/groups/0/location/data.base/abc123", "mcli mv /panfs/roc/groups/0/location/data.base/abc123.gz def456/", 
                            "", "", "#BSUB -J abc123", "\t\t\t", "", "", "", "", "")), row.names = c(NA, 
                                                                                                     -23L), class = c("data.table", "data.frame"))

names <- list(V1 = c("D00268.merged.dedup.realn.haplotypecaller.g.vcf", 
                         "D00316.merged.dedup.realn.haplotypecaller.g.vcf", "D00426.merged.dedup.realn.haplotypecaller.g.vcf", 
                         "D00432.merged.dedup.realn.haplotypecaller.g.vcf", "D00474.merged.dedup.realn.haplotypecaller.g.vcf", 
                         "D00510.merged.dedup.realn.haplotypecaller.g.vcf", "D00574.merged.dedup.realn.haplotypecaller.g.vcf", 
                         "D00607.merged.dedup.realn.haplotypecaller.g.vcf", "D00619.merged.dedup.realn.haplotypecaller.g.vcf", 
                         "D00662.merged.dedup.realn.haplotypecaller.g.vcf"))
    
locations <- list(V1 = c("s3/lab/wgs/yrkt/D00268/gvcf/", "s3/lab/wgs/dach/D00316/gvcf/", 
                         "s3/lab/wgs/mnpd/D00426/gvcf/", "s3/lab/wgs/yrkt/D00432/gvcf/", 
                         "s3/lab/wgs/ckcs/D00474/gvcf/", "s3/lab/wgs/lbrt/D00510/gvcf/", 
                         "s3/lab/wgs/shlt/D00574/gvcf/", "s3/lab/wgs/shlt/D00607/gvcf/", 
                         "s3/lab/wgs/mnsc/D00619/gvcf/", "s3/lab/wgs/gtdn/D00662/gvcf/"
))

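So df is just the master script that I want to run the for loop over. In the master script I changed the sample name to "abc123" and the sample location to "def456", so that I can use something like gsub to find those two patterns and replace them with the sample ID and the sample location. When it is done, I would like to end up with a text file that looks like this.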

#!/bin/bash
#BSUB -W 1440
#BSUB -n 16
#BSUB -x
#BSUB -R "rusage[mem=4000] span[hosts=1]"
#BSUB -o /gpfs_common/share01/files/D00268.merged.dedup.realn.haplotypecaller.g.vcf.out.%J.txt
#BSUB -e /gpfs_common/share01/files/D00268.merged.dedup.realn.haplotypecaller.g.vcf.err.%J.txt




mcli cp s3/lab/wgs/yrkt/D00268/gvcf/D00268.merged.dedup.realn.haplotypecaller.g.vcf /panfs/roc/groups/0/location/data.base
gzip /panfs/roc/groups/0/location/data.base/D00268.merged.dedup.realn.haplotypecaller.g.vcf
mcli mv /panfs/roc/groups/0/location/data.base/D00268.merged.dedup.realn.haplotypecaller.g.vcf.gz s3/lab/wgs/yrkt/D00268/gvcf/


#BSUB -J D00268.merged.dedup.realn.haplotypecaller.g.vcf
        

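I was thinking a for loop would be the simplest thing to do here, but I am open to suggestions. I hope this all makes sense. If you have any questions, please let me know.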

I have used the for loop below in the past, but I have never used a for loop to gsub through a list.

for(i in 1:nrow(df)){
  df[i,'V1'] <- gsub("abc123", "D00268.merged.dedup.realn.haplotypecaller.g.vcf", df[i,'V1'])
  df[i,'V1'] <- gsub("def456", "s3/lab/wgs/yrkt/D00268/gvcf/", df[i,'V1'])
  
}

Tags: r, for-loop

Solution


To stick with the for-loop idea and adapt the approach you proposed, you could do the following:

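# Write one job script per sample, filling in both placeholders.
for (i in seq_along(locations[["V1"]])) {
  # Start from the unmodified template each time. df$V1 is a plain
  # character vector whether df is a data.frame or a data.table.
  script <- df$V1
  script <- gsub("abc123", names[["V1"]][i], script)
  script <- gsub("def456", locations[["V1"]][i], script)
  fileConn <- file(paste0("script_", i, ".sh"))
  writeLines(script, fileConn)
  close(fileConn)
}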
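
One thing to watch: the entries in locations end with "/", and the template also has a "/" after def456, so the loop above produces paths such as s3/lab/wgs/yrkt/D00268/gvcf//D00268.merged.dedup.realn.haplotypecaller.g.vcf, with a doubled slash that is not in the desired output shown in the question. Below is a rough sketch of one way to avoid that, which also names each file after its sample. It has not been tested against mcli or LSF. The names fill_template, sample_names, sample_locations and out_dir are made up for illustration, the template is shortened to four lines, and the files go to tempdir() as a stand-in for a real output directory.

```r
# Shortened template using the two placeholders from the question.
template <- c(
  "#!/bin/bash",
  "#BSUB -o /gpfs_common/share01/files/abc123.out.%J.txt",
  "mcli cp def456/abc123 /panfs/roc/groups/0/location/data.base",
  "#BSUB -J abc123"
)

sample_names <- c("D00268.merged.dedup.realn.haplotypecaller.g.vcf",
                  "D00316.merged.dedup.realn.haplotypecaller.g.vcf")
sample_locations <- c("s3/lab/wgs/yrkt/D00268/gvcf/",
                      "s3/lab/wgs/dach/D00316/gvcf/")

# Hypothetical helper: fill both placeholders for one sample.
fill_template <- function(lines, sample_name, sample_location) {
  # Drop a trailing "/" so that "def456/abc123" does not become "...//...".
  sample_location <- sub("/$", "", sample_location)
  # fixed = TRUE matches the placeholder as a literal string, not a regex.
  lines <- gsub("abc123", sample_name, lines, fixed = TRUE)
  gsub("def456", sample_location, lines, fixed = TRUE)
}

out_dir <- tempdir()  # assumption: swap in the real output directory

# Write one script per sample and collect the paths that were written.
written <- mapply(function(nm, loc) {
  path <- file.path(out_dir, paste0(nm, ".sh"))
  writeLines(fill_template(template, nm, loc), path)
  path
}, sample_names, sample_locations)
```

Naming the files after the sample rather than the loop index makes it easier to match a script to its output and error logs later on. Note that this version only matches the desired output if every "def456" in the template is followed by "/", as it is in the template above.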
