r - How to gsub through a list of names in a loop
Problem description
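I have a batch of samples to submit for processing on my university's cluster. There are more than 1,000 samples to run. Rather than creating each script by hand, I would like to know whether I can write a for loop that substitutes in the sample ID. Every script is essentially the same; I only need to change the sample ID and the location of the file.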
df <- structure(list(V1 = c("#!/bin/bash", "#BSUB -W 1440", "#BSUB -n 16",
"#BSUB -x", "#BSUB -R \"rusage[mem=4000] span[hosts=1]\"", "#BSUB -o /gpfs_common/share01/files/abc123.out.%J.txt",
"#BSUB -e /gpfs_common/share01/files/abc123.err.%J.txt", "",
"", "", "", "mcli cp def456/abc123 /panfs/roc/groups/0/location/data.base",
"gzip /panfs/roc/groups/0/location/data.base/abc123", "mcli mv /panfs/roc/groups/0/location/data.base/abc123.gz def456/",
"", "", "#BSUB -J abc123", "\t\t\t", "", "", "", "", "")), row.names = c(NA,
-23L), class = c("data.table", "data.frame"))
names <- list(V1 = c("D00268.merged.dedup.realn.haplotypecaller.g.vcf",
"D00316.merged.dedup.realn.haplotypecaller.g.vcf", "D00426.merged.dedup.realn.haplotypecaller.g.vcf",
"D00432.merged.dedup.realn.haplotypecaller.g.vcf", "D00474.merged.dedup.realn.haplotypecaller.g.vcf",
"D00510.merged.dedup.realn.haplotypecaller.g.vcf", "D00574.merged.dedup.realn.haplotypecaller.g.vcf",
"D00607.merged.dedup.realn.haplotypecaller.g.vcf", "D00619.merged.dedup.realn.haplotypecaller.g.vcf",
"D00662.merged.dedup.realn.haplotypecaller.g.vcf"))
locations <- list(V1 = c("s3/lab/wgs/yrkt/D00268/gvcf/", "s3/lab/wgs/dach/D00316/gvcf/",
"s3/lab/wgs/mnpd/D00426/gvcf/", "s3/lab/wgs/yrkt/D00432/gvcf/",
"s3/lab/wgs/ckcs/D00474/gvcf/", "s3/lab/wgs/lbrt/D00510/gvcf/",
"s3/lab/wgs/shlt/D00574/gvcf/", "s3/lab/wgs/shlt/D00607/gvcf/",
"s3/lab/wgs/mnsc/D00619/gvcf/", "s3/lab/wgs/gtdn/D00662/gvcf/"
))
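So df is just the master script that I want to run the for loop over. In the master script I replaced the sample name with "abc123" and the sample location with "def456", so that I can use something like gsub to find these two patterns and replace them with the sample ID and sample location. When it is done, I would like it to produce a text file that looks like this.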
#!/bin/bash
#BSUB -W 1440
#BSUB -n 16
#BSUB -x
#BSUB -R "rusage[mem=4000] span[hosts=1]"
#BSUB -o /gpfs_common/share01/files/D00268.merged.dedup.realn.haplotypecaller.g.vcf.out.%J.txt
#BSUB -e /gpfs_common/share01/files/D00268.merged.dedup.realn.haplotypecaller.g.vcf.err.%J.txt
mcli cp s3/lab/wgs/yrkt/D00268/gvcf/D00268.merged.dedup.realn.haplotypecaller.g.vcf /panfs/roc/groups/0/location/data.base
gzip /panfs/roc/groups/0/location/data.base/D00268.merged.dedup.realn.haplotypecaller.g.vcf
mcli mv /panfs/roc/groups/0/location/data.base/D00268.merged.dedup.realn.haplotypecaller.g.vcf.gz s3/lab/wgs/yrkt/D00268/gvcf/
#BSUB -J D00268.merged.dedup.realn.haplotypecaller.g.vcf
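I think a for loop would be the simplest approach here, but I am open to suggestions. I hope this all makes sense. Please let me know if you have any questions.
I have used the for loop below before, but I have never used a for loop to gsub through a list.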
for(i in 1:nrow(df)){
df[i,'V1'] <- gsub("abc123", "D00268.merged.dedup.realn.haplotypecaller.g.vcf", df[i,'V1'])
df[i,'V1'] <- gsub("def456", "s3/lab/wgs/yrkt/D00268/gvcf/", df[i,'V1'])
}
Solution
To stick with the for-loop idea and adapt the approach you suggested, you can do the following:
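for (i in seq_along(locations[["V1"]])) {
  # Start from the untouched template on every iteration.
  # [["V1"]] returns a plain character vector for both data.frame and data.table.
  script <- df[["V1"]]
  # fixed = TRUE matches the placeholders as literal strings, not regular expressions
  script <- gsub("abc123", names[["V1"]][i], script, fixed = TRUE)
  script <- gsub("def456", locations[["V1"]][i], script, fixed = TRUE)
  # Write one script per sample: script_1.sh, script_2.sh, ...
  fileConn <- file(paste0("script_", i, ".sh"))
  writeLines(script, fileConn)
  close(fileConn)
}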
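Two details of the loop above may be worth adjusting. The locations in the question already end in a slash, so substituting them into "def456/abc123" gives a double slash ("gvcf//D00268..."), which differs from the desired output shown in the question. Numbered file names also make it hard to tell which script belongs to which sample. The following is only a sketch of one way to handle both, not part of the original answer: `fill_template`, `sample_ids`, `sample_locations` and `out_dir` are hypothetical names, the template is a three-line stand-in for `df$V1`, and it assumes the sample prefix is everything before the first dot in the file name.

```r
# Hypothetical helper: fill in the template for a single sample.
fill_template <- function(template, sample_id, location) {
  # Drop one trailing slash so "def456/abc123" does not become ".../gvcf//D00268..."
  location <- sub("/$", "", location)
  # fixed = TRUE treats the placeholders as literal strings
  out <- gsub("abc123", sample_id, template, fixed = TRUE)
  gsub("def456", location, out, fixed = TRUE)
}

# Small stand-ins for df$V1, names$V1 and locations$V1 from the question
template <- c(
  "#BSUB -J abc123",
  "mcli cp def456/abc123 /panfs/roc/groups/0/location/data.base",
  "mcli mv /panfs/roc/groups/0/location/data.base/abc123.gz def456/"
)
sample_ids <- c("D00268.merged.dedup.realn.haplotypecaller.g.vcf",
                "D00316.merged.dedup.realn.haplotypecaller.g.vcf")
sample_locations <- c("s3/lab/wgs/yrkt/D00268/gvcf/",
                      "s3/lab/wgs/dach/D00316/gvcf/")

# Build one filled-in script (a character vector) per sample
scripts <- lapply(seq_along(sample_ids), function(i) {
  fill_template(template, sample_ids[i], sample_locations[i])
})

# Assumption: tempdir() is a placeholder; point out_dir at the real output folder
out_dir <- tempdir()
for (i in seq_along(scripts)) {
  # Name each file after the sample prefix, e.g. script_D00268.sh
  sample_prefix <- sub("\\..*$", "", sample_ids[i])
  writeLines(scripts[[i]], file.path(out_dir, paste0("script_", sample_prefix, ".sh")))
}
```

With the real data, `template` would be `df[["V1"]]`, and the two vectors would be `names[["V1"]]` and `locations[["V1"]]`. Keeping the substitution in its own function makes it easy to check a single sample before writing more than 1,000 files.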