Nextflow: input and output a tuple with a key

Question

I'm using Nextflow to process files that carry a sample ID, and I want to carry this sampleId across processes, so I'm using tuples. The relevant code snippet is:

process 'rsem_quant' {

    input:
    val genome from params.genome
    tuple val(sampleId), file(read1), file(read2) from samples_ch

    output:
    tuple sampleId , path "${sampleId}.genes.results" into rsem_ce

    script:
    """
    module load RSEM
    rsem-calculate-expression --star --keep-intermediate-files \
    --sort-bam-by-coordinate --star-output-genome-bam --strandedness reverse \
    --star-gzipped-read-file --paired-end $genome \
    $read1 $read2 $sampleId
    """
}

The problem is that when I use a tuple as the output, I get the following error:

No such variable: sampleId

If I remove the tuple and output only one of the two parts (either sampleId or the path), it works fine. Any help is appreciated.

Tags: nextflow

Answer


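I could not reproduce the error with the code provided. I suspect your output block needs to declare the sampleId variable with the val qualifier (and wrap the file pattern in path(...)):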

output:
    tuple val(sampleId) , path("${sampleId}.genes.results") into rsem_ce 
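For reference, here is the question's process with that fix applied. This is a sketch that assumes the DSL1 syntax used in the question (channels wired with from/into) and that samples_ch emits [ sampleId, read1, read2 ] items. Besides the output qualifiers, it also passes the read files before $genome: the minimal example below passes the reads first, then the reference prefix, then the sample name, which is the positional order rsem-calculate-expression expects.

```nextflow
// DSL1 sketch: assumes samples_ch emits [ sampleId, read1, read2 ]
// and params.genome is the RSEM reference prefix, as in the question.
process 'rsem_quant' {

    input:
    val genome from params.genome
    tuple val(sampleId), file(read1), file(read2) from samples_ch

    output:
    // Every tuple element needs its own qualifier: val() for the key, path() for the file
    tuple val(sampleId), path("${sampleId}.genes.results") into rsem_ce

    script:
    """
    module load RSEM
    # Positional arguments: read 1, read 2, reference prefix, sample name
    rsem-calculate-expression --star --keep-intermediate-files \\
        --sort-bam-by-coordinate --star-output-genome-bam --strandedness reverse \\
        --star-gzipped-read-file --paired-end \\
        "${read1}" "${read2}" "${genome}" "${sampleId}"
    """
}
```

Downstream processes can then declare tuple val(sampleId), path(results) from rsem_ce and receive the key together with its file.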


A minimal example of running RSEM on paired-end reads (using Conda) might look like this:

nextflow.enable.dsl=2

params.ref_name = 'GRCh38_GENCODE_v31'
params.ref_fasta = 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/GRCh38.primary_assembly.genome.fa.gz'
params.ref_gtf = 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.primary_assembly.annotation.gtf.gz'
params.strandedness = 'reverse'


include { gunzip as gunzip_fasta } from './gzip.nf'
include { gunzip as gunzip_gtf } from './gzip.nf'


process 'rsem_prepare_ref' {

    conda 'rsem star samtools'

    input:
    val ref_name
    path ref_fasta
    path ref_gtf

    output:
    path "${ref_name}"

    """
    mkdir "${ref_name}"
    rsem-prepare-reference \\
        --gtf "${ref_gtf}" \\
        --star \\
        "${ref_fasta}" \\
        "${ref_name}/${ref_name}"
    """
}

process 'rsem_calculate_expression' {

    tag { sample }

    conda 'rsem star samtools'

    input:
    tuple val(sample), path(reads) 
    path ref_name 

    output:
    tuple val(sample), path("${sample}.genes.results")

    script:
    def (read1, read2) = reads

    """
    rsem-calculate-expression \\
        --star \\
        --sort-bam-by-coordinate \\
        --star-output-genome-bam \\
        --strandedness "${params.strandedness}" \\
        --star-gzipped-read-file \\
        --paired-end \\
        "${read1}" \\
        "${read2}" \\
        "${ref_name}/${ref_name}" \\
        "${sample}"
    """
}

workflow {

    reads = Channel.fromFilePairs( './data/*_{1,2}.fastq.gz' )

    ref_fasta = gunzip_fasta( params.ref_fasta )
    ref_gtf = gunzip_gtf( params.ref_gtf )

    rsem_prepare_ref( params.ref_name, ref_fasta, ref_gtf )
    rsem_calculate_expression( reads, rsem_prepare_ref.out )
}

Contents of gzip.nf:

process gunzip {

    tag { gzfile.name }

    input:
    path gzfile

    output:
    path "${gzfile.getBaseName()}"

    when:
    gzfile.getExtension() == "gz"

    """
    gzip -dc "${gzfile}" > "${gzfile.getBaseName()}"
    """
}

Run with:

nextflow run test.nf -resume -ansi-log false

Output:

N E X T F L O W  ~  version 21.04.3
Launching `main.nf` [awesome_poincare] - revision: 51040c89cc
[cf/ffec1a] Cached process > gunzip_fasta (GRCh38.primary_assembly.genome.fa.gz)
[ce/b7a04b] Cached process > gunzip_gtf (gencode.v38.primary_assembly.annotation.gtf.gz)
[f1/bcb8e3] Cached process > rsem_prepare_ref
[de/f7906e] Submitted process > rsem_calculate_expression (HBR_Rep2)
[1e/3984da] Submitted process > rsem_calculate_expression (UHR_Rep1)
[59/907f56] Submitted process > rsem_calculate_expression (UHR_Rep3)
[26/41db23] Submitted process > rsem_calculate_expression (HBR_Rep1)
[e8/2c98fe] Submitted process > rsem_calculate_expression (UHR_Rep2)
[03/bbb42b] Submitted process > rsem_calculate_expression (HBR_Rep3)
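It can help to inspect the input channel directly to see why rsem_calculate_expression can declare tuple val(sample), path(reads) and then unpack reads with def (read1, read2) = reads. The following is a minimal sketch assuming the same ./data/*_{1,2}.fastq.gz layout. Channel.fromFilePairs emits one item per sample: the shared file-name prefix is the key, and the matched pair of files is a list.

```nextflow
nextflow.enable.dsl=2

workflow {
    // Each item has the shape: [ sampleId, [ read1_path, read2_path ] ]
    Channel
        .fromFilePairs( './data/*_{1,2}.fastq.gz' )
        .map { sample, reads ->
            // Same destructuring as in rsem_calculate_expression
            def (read1, read2) = reads
            "${sample}: ${read1.name} + ${read2.name}"
        }
        .view()
}
```

Saved as a hypothetical view_pairs.nf and run with nextflow run view_pairs.nf, it prints one line per sample key, which is the same key that rsem_calculate_expression passes through to its output tuple.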
