首页 > 解决方案 > 使用 snakemake touch 使用 bwa 和 gatk 创建参考基因组索引

问题描述

我将读取与bwa调用变体与gatk. gatk需要dict为参考基因组创建一个,并且bwa需要创建索引。当我对它们都使用触摸时,我得到了这个错误:

AmbiguousRuleException:
Rules bwa_index and gatk_refdict are ambiguous for the file ref.
Expected input files:
        bwa_index: ref.fasta
        gatk_refdict: ref.fasta

这是代码:

rule bwa_index:
    input:
        database="ref.fasta"
    output:
        done =touch("ref")
    shell:
        """
        bwa index -p ref {input.database}
        """
rule bwa_mem:
    input:
        bwa_index_done = "ref",
        fastq1="{sample}_R1.trimmed.fastq.gz",
        fastq2="{sample}_R2.trimmed.fastq.gz"
    output:
        bam = temp("{sample}.bam")
    shell:
        """
        bwa mem ref {input.fastq1} {input.fastq2} -o {output.bam}
        """
rule_gatk_refdict:
    input:
        ref="ref.fasta"
    output:
        done =touch("ref")
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.done}
        """
rule gatk:
    input:
        gatk_refdict_done = "ref",
        bam="bam_list"
    output:
        outf ="{chr}.vcf"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R ref -I {input.bam} --min-base-quality-score 20 -O {output.outf}
        """

或者,我指定了 index .dict,但它也不起作用,因为 gatk 在创建 dict 之前调用了变体,所以我收到一个错误,即没有 dict 文件:

rule_gatk_refdict:
    input:
        ref="ref.fasta"
    output:
        outf ="ref.dict"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.outf}
        """
rule gatk:
    input:
        ref = "ref.fasta",
        bam="bam_list"
    output:
        outf ="{chr}.vcf"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R {input.ref} -I {input.bam} --min-base-quality-score 20 -O {output.outf}
        """

如何解决这个问题?

标签: snakemake

解决方案


为什么不简单地将dict文件定义为gatk 规则的输入,将ìindex 定义为bwa 规则的输入?

rule bwa_index:
    input:
        database="ref.fasta"
    output:
        done =touch("ref")
    shell:
        """
        bwa index -p ref {input.database}
        """
rule bwa_mem:
    input:
        bwa_index_done = "ref",
        fastq1="{sample}_R1.trimmed.fastq.gz",
        fastq2="{sample}_R2.trimmed.fastq.gz"
    output:
        bam = temp("{sample}.bam")
    shell:
        """
        bwa mem ref {input.fastq1} {input.fastq2} -o {output.bam}
        """
rule gatk_refdict:
    input:
        ref="ref.fasta"
    output:
        done = "ref.dict"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.done}
        """
rule gatk:
    input:
        ref = "ref.fasta",
        dict = "ref.dict",
        bam="bam_list"
    output:
        outf ="{chr}.vcf"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R {input.ref} -I {input.bam} --min-base-quality-score 20 -O {output.outf}
        """

你得到的AmbiguousRuleException是因为snakemake不知道要运行哪个规则,因为两个规则具有相同的输出。不要忘记,snakemake 会尝试从所有规则开始构建 DAG。当涉及到 run 时rule gatk,您将其定义"ref"为输入。由于两个规则可以生成这个文件,snakemake 不知道它必须使用rule gatk_refdict还是rule bwa_index.

你有一个错字(“_”不应该在那里):

----v
rule_gatk_refdict:
    input:
        ref="ref.fasta"
    ...

推荐阅读