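input - Multiple outputs to single list input - Merging BAM files in Nextflow
Question
I am trying to merge x bam files, produced by running multiple alignments at once (on batches of y fastq files), into one single bam file in Nextflow.
So far I have the following for performing the alignment and sorting/indexing the resulting bam file: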
//Run minimap2 on concatenated fastqs
process miniMap2Bam {
    publishDir "$params.bamDir"
    errorStrategy 'retry'
    cache 'deep'
    maxRetries 3
    maxForks 10
    memory { 16.GB * task.attempt }
    input:
    val dirString from dirStr
    val runString from stringRun
    each file(batchFastq) from fastqBatch.flatMap()
    output:
    val runString into stringRun1
    file("${batchFastq}.bam") into bamFiles
    val dirString into dirStrSam
    script:
    """
    minimap2 --secondary=no --MD -2 -t 10 -a $params.genome ${batchFastq} | samtools sort -o ${batchFastq}.bam
    samtools index ${batchFastq}.bam
    """
}
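Here ${batchFastq}.bam is a bam file containing the alignments for a batch of y fastq files.
This pipeline completes just fine. However, when I try to run samtools merge on these bam files in another process (samToolsMerge), the process runs once for each alignment run (4 in this case) instead of once for all of the collected bam files: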
//Run samtools merge
process samToolsMerge {
    echo true
    publishDir "$dirString/aligned_minimap/", mode: 'copy', overwrite: 'false'
    cache 'deep'
    errorStrategy 'retry'
    maxRetries 3
    maxForks 10
    memory { 14.GB * task.attempt }
    input:
    val runString from stringRun1
    file bamFile from bamFiles.collect()
    val dirString from dirStrSam
    output:
    file("**")
    script:
    """
    samtools merge ${runString}.bam ${bamFile}
    """
}
The output is:
executor > lsf (9)
[49/182ec0] process > catFastqs (1) [100%] 1 of 1 ✔
[- ] process > nanoPlotSummary -
[0e/609a7a] process > miniMap2Bam (1) [100%] 4 of 4 ✔
[42/72469d] process > samToolsMerge (2) [100%] 4 of 4 ✔
Completed at: 04-Mar-2021 14:54:21
Duration : 5m 41s
CPU hours : 0.2
Succeeded : 9
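How can I take only the resulting bam files from miniMap2Bam and run them through samToolsMerge once, instead of having the process run multiple times?
Thanks in advance!
EDIT: Thanks to Pallie in the comments below. The problem was feeding the runString and dirString values from a previous process into miniMap2Bam and then on into samToolsMerge: miniMap2Bam emitted those values once per task, so samToolsMerge ran once for each value it received.
The solution was as simple as removing the vals from miniMap2Bam (as below):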
//Run minimap2 on concatenated fastqs
process miniMap2Bam {
    errorStrategy 'retry'
    cache 'deep'
    maxRetries 3
    maxForks 10
    memory { 16.GB * task.attempt }
    input:
    each file(batchFastq) from fastqBatch.flatMap()
    output:
    file("${batchFastq}.bam") into bamFiles
    script:
    """
    minimap2 --secondary=no --MD -2 -t 10 -a $params.genome ${batchFastq} | samtools sort -o ${batchFastq}.bam
    samtools index ${batchFastq}.bam
    """
}
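The repetition described in the edit can be reproduced without any real data. The following is only a toy sketch in the same DSL1 syntax as the question: the channel names (labelsQueue, labelsForFirst, bamsA, bamsB) and process names (mergeRepeated, mergeOnce) are made up for illustration, the BAM names are plain strings, and the tasks only echo the command they would run. It also shows one possible alternative to removing the vals, namely collapsing the label channel with first(); that is not what the asker ended up doing.

```groovy
// Toy DSL1 script: nothing is aligned or merged, the tasks only echo.
// On Nextflow releases where DSL2 is the default, DSL1 may need to be
// enabled explicitly (nextflow.enable.dsl = 1).

// Four labels stand in for the runString emitted by each of the 4 miniMap2Bam tasks.
Channel.from('run1', 'run1', 'run1', 'run1').into { labelsQueue; labelsForFirst }
// Four names stand in for the BAM files (hypothetical, no files are read).
Channel.from('a.bam', 'b.bam', 'c.bam', 'd.bam').into { bamsA; bamsB }

// Runs 4 times: labelsQueue holds 4 items, and the value channel
// returned by collect() can be read again for each of them.
process mergeRepeated {
    echo true

    input:
    val runString from labelsQueue
    val bamList from bamsA.collect()

    script:
    """
    echo "samtools merge ${runString}.bam ${bamList.join(' ')}"
    """
}

// Runs once: first() reduces the label channel to a single value,
// so every input of this process is a value channel.
process mergeOnce {
    echo true

    input:
    val runString from labelsForFirst.first()
    val bamList from bamsB.collect()

    script:
    """
    echo "samtools merge ${runString}.bam ${bamList.join(' ')}"
    """
}
```

Run with a Nextflow version that still accepts DSL1, mergeRepeated should report 4 tasks and mergeOnce a single task, which mirrors the "4 of 4" seen for samToolsMerge in the question.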
Answer
The simplest fix would probably be to stop passing the static directory string and run string around via channels:
// Instead of a hardcoded path use a parameter you passed via CLI like you did with bamDir
dirString = file("/path/to/fastqs/")
runString = file("/path/to/fastqs/").getParent()
fastqBatch = Channel.from("/path/to/fastqs/")
//Run minimap2 on concatenated fastqs
process miniMap2Bam {
    publishDir "$params.bamDir"
    errorStrategy 'retry'
    cache 'deep'
    maxRetries 3
    maxForks 10
    memory { 16.GB * task.attempt }
    input:
    each file(batchFastq) from fastqBatch.flatMap()
    output:
    file("${batchFastq}.bam") into bamFiles
    script:
    """
    minimap2 --secondary=no --MD -2 -t 10 -a $params.genome ${batchFastq} | samtools sort -o ${batchFastq}.bam
    samtools index ${batchFastq}.bam
    """
}
//Run samtools merge
process samToolsMerge {
    echo true
    publishDir "$dirString/aligned_minimap/", mode: 'copy', overwrite: 'false'
    cache 'deep'
    errorStrategy 'retry'
    maxRetries 3
    maxForks 10
    memory { 14.GB * task.attempt }
    input:
    file bamFile from bamFiles.collect()
    output:
    file("**")
    script:
    """
    samtools merge ${runString}.bam ${bamFile}
    """
}