snakemake - “输入文件由另一个作业更新”但不是真的
问题描述
我的管道有问题,我不知道这对我来说是一个不好的用途还是一个错误。我正在使用蛇形 5.26.1。直到几天前,我使用相同版本的蛇形制作时才遇到任何问题。我不明白发生了什么变化。
管道部分就在检查点规则和聚合步骤之后,这会产生一个输出文件,例如foo.fasta
.
我在foo.fasta
使用它作为输入文件的生成之后有规则,并且使用 再次运行,--reason
Input files updated by another job
但情况并非如此,它们的输出比foo.fasta
.
此外,使用该--summary
选项,文件 foo.fasta 被标记为无更新,并且文件具有正确的时间戳顺序,无需再次执行规则。
我找不到再次运行以下规则的原因。检查点是否有可能导致问题使得 snakemake 认为foo.fasta
已更新而不是更新?
这是一个试运行输出:规则agouti_scaffolding
产生tros_v5.pseudohap.fasta.gz
(my foo.fasta
)
Building DAG of jobs...
Updating job 7 (aggregate_gff3).
Replace aggregate_gff3 with dynamic branch aggregate_gff3
updating depending job agouti_scaffolding
Updating job 3 (agouti_scaffolding).
Replace agouti_scaffolding with dynamic branch agouti_scaffolding
updating depending job unzip_fasta
updating depending job btk_prepare_workdir
updating depending job v6_clean_rename
updating depending job kat_comp
updating depending job assembly_stats
Job counts:
count jobs
1 kat_comp
1 kat_plot_spectra
2
[Tue Oct 27 12:08:41 2020]
rule kat_comp:
input: results/preprocessing/tros/tros_dedup_proc_fastp_R1_001.fastq.gz, results/pre
sing/tros/tros_dedup_proc_fastp_R2_001.fastq.gz, results/fasta/tros_v5.pseudohap.fasta.g
output: results/kat/tros_v5/tros_v5_comp-main.mx
log: logs/kat_comp.tros_v5.log
jobid: 1
reason: Input files updated by another job: results/fasta/tros_v5.pseudohap.fasta.gz
wildcards: sample=tros, version=v5
[Tue Oct 27 12:08:41 2020]
rule kat_plot_spectra:
input: results/kat/tros_v5/tros_v5_comp-main.mx
output: results/kat/tros_v5/tros_v5_spectra.pdf
log: logs/kat_plot.tros_v5.log
jobid: 0
reason: Input files updated by another job: results/kat/tros_v5/tros_v5_comp-main.mx
wildcards: sample=tros, version=v5
Job counts:
count jobs
1 kat_comp
1 kat_plot_spectra
2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
似乎updating depending job kat_comp
是这个问题。甚至ancient
kat_comp 输入上的标志也不会改变任何东西。
编辑
这是管道的一部分。正如@TroyComi 建议的最后一个开发版本(5.26.1+26.gc2e2b501.dirty)解决了古老指令的问题。移除远古指令规则时仍会执行。
rule all:
"results/kat/tros_v5/tros_v5_spectra.pdf"
checkpoint split_fa_augustus:
input:
"results/fasta/{sample}_v4.pseudohap.fasta.gz"
output:
"results/agouti/{sample}/{sample}_v4.pseudohap.fa",
directory("results/agouti/{sample}/split")
params:
split_size = 50000000
conda:
"../envs/augustus.yaml"
shell:
"""
zcat {input} > {output[0]}
mkdir {output[1]}
splitMfasta.pl {output[0]} \
--outputpath={output[1]} --minsize={params.split_size}
"""
rule augustus:
input:
"results/agouti/{sample}/split/{sample}_v4.pseudohap.split.{i}.fa"
output:
"results/agouti/{sample}/split/pred_{i}.gff3"
conda:
"../envs/augustus.yaml"
shell:
"""
augustus --gff3=on --species=caenorhabditis {input} > {output}
"""
def aggregate_input_gff3(wildcards):
checkpoint_output = checkpoints.split_fa_augustus.get(**wildcards).output[1]
return expand("results/agouti/{sample}/split/pred_{i}.gff3",
sample=wildcards.sample,
i=glob_wildcards(os.path.join(checkpoint_output, f"{wildcards.sample}_v4.pseudohap.split." + "{i}.fa")).i)
rule aggregate_gff3:
input:
aggregate_input_gff3
output:
"results/agouti/{sample}/{sample}_v4.pseudohap.gff3"
conda:
"../envs/augustus.yaml"
shell:
"cat {input} | join_aug_pred.pl > {output}"
#===============================
# Preprocess RNAseq data
# The RNAseq reads need to be in the folder resources/RNAseq_raw/{sample}
# Files must be named {run}_R1.fastq.gz and {run}_R2.fastq.gz for globbing to work
# globbing is done in the rule merge_RNA_bams
rule rna_rcorrector:
input:
expand("resources/RNAseq_raw/{{sample}}/{{run}}_{R}.fastq.gz",
R=['R1', 'R2'])
output:
temp(expand("results/agouti/{{sample}}/RNA_preproc/{{run}}_{R}.cor.fq.gz",
R=['R1', 'R2']))
params:
outdir = lambda w, output: os.path.dirname(output[0])
log:
"logs/rcorrector_{sample}_{run}.log"
threads:
config['agouti']['threads']
conda:
"../envs/rna_seq.yaml"
shell:
"""
run_rcorrector.pl -1 {input[0]} -2 {input[1]} \
-t {threads} \
-od {params.outdir} \
> {log} 2>&1
"""
rule rna_trimgalore:
input:
expand("results/agouti/{{sample}}/RNA_preproc/{{run}}_{R}.cor.fq.gz",
R=['R1', 'R2'])
output:
temp(expand("results/agouti/{{sample}}/RNA_preproc/{{run}}_trimgal_val_{i}.fq",
i=['1', '2']))
params:
outdir = lambda w, output: os.path.dirname(output[0]),
basename = lambda w: f'{w.run}_trimgal'
log:
"logs/trimgalore_{sample}_{run}.log"
threads:
config['agouti']['threads']
conda:
"../envs/rna_seq.yaml"
shell:
"""
trim_galore --cores {threads} \
--phred33 \
--quality 20 \
--stringency 1 \
-e 0.1 \
--length 70 \
--output_dir {params.outdir} \
--basename {params.basename} \
--dont_gzip \
--paired \
{input} \
> {log} 2>&1
"""
#===============================
# Map the RNAseq reads
rule index_ref:
input:
"results/agouti/{sample}/{sample}_v4.pseudohap.fa"
output:
multiext("results/agouti/{sample}/{sample}_v4.pseudohap.fa",
".0123", ".amb", ".ann", ".bwt.2bit.64", ".bwt.8bit.32", ".pac")
conda:
"../envs/mapping.yaml"
shell:
"bwa-mem2 index {input}"
rule map_RNAseq:
input:
expand("results/agouti/{{sample}}/RNA_preproc/{{run}}_trimgal_val_{i}.fq",
i=['1', '2']),
"results/agouti/{sample}/{sample}_v4.pseudohap.fa",
multiext("results/agouti/{sample}/{sample}_v4.pseudohap.fa",
".0123", ".amb", ".ann", ".bwt.2bit.64", ".bwt.8bit.32", ".pac")
output:
"results/agouti/{sample}/mapping/{run}.bam"
log:
"logs/bwa_rna_{sample}_{run}.log"
conda:
"../envs/mapping.yaml"
threads:
config['agouti']['threads']
shell:
"""
bwa-mem2 mem -t {threads} {input[2]} {input[0]} {input[1]} 2> {log} \
| samtools view -b -@ {threads} -o {output}
"""
def get_sample_rna_runs(w):
list_R1_files = glob.glob(f"resources/RNAseq_raw/{w.sample}/*_R1.fastq.gz")
list_runs = [re.sub('_R1\.fastq\.gz$', '', os.path.basename(f)) for f in list_R1_files]
return [f'results/agouti/{w.sample}/mapping/{run}.bam' for run in list_runs]
rule merge_RNA_bams:
input:
get_sample_rna_runs
output:
"results/agouti/{sample}/RNAseq_mapped_merged.bam"
params:
tmp_merge = lambda w: f'results/agouti/{w.sample}/tmp_merge.bam'
conda:
"../envs/mapping.yaml"
threads:
config['agouti']['threads']
shell:
"""
samtools merge -@ {threads} {params.tmp_merge} {input}
samtools sort -@ {threads} -n -o {output} {params.tmp_merge}
rm {params.tmp_merge}
"""
#===============================
# Run agouti on all that
rule agouti_scaffolding:
input:
fa = "results/agouti/{sample}/{sample}_v4.pseudohap.fa",
bam = "results/agouti/{sample}/RNAseq_mapped_merged.bam",
gff = "results/agouti/{sample}/{sample}_v4.pseudohap.gff3"
output:
protected("results/fasta/{sample}_v5.pseudohap.fasta.gz")
params:
outdir = lambda w: f'results/agouti/{w.sample}/agouti_out',
minMQ = 20,
maxFracMM = 0.05
log:
"logs/agouti_{sample}.log"
conda:
"../envs/agouti.yaml"
shell:
"""
python /opt/agouti/agouti.py scaffold \
-assembly {input.fa} \
-bam {input.bam} \
-gff {input.gff} \
-outdir {params.outdir} \
-minMQ {params.minMQ} -maxFracMM {params.maxFracMM} \
> {log} 2>&1
gzip -c {params.outdir}/agouti.agouti.fasta > {output}
"""
#===============================================================
# Now do something on output {sample}_{version}.pseudohap.fasta.gz
rule kat_comp:
input:
expand("results/preprocessing/{{sample}}/{{sample}}_dedup_proc_fastp_{R}_001.fastq.gz",
R=["R1", "R2"]),
ancient("results/fasta/{sample}_{version}.pseudohap.fasta.gz")
output:
"results/kat/{sample}_{version}/{sample}_{version}_comp-main.mx"
params:
outprefix = lambda w: f'results/kat/{w.sample}_{w.version}/{w.sample}_{w.version}_comp'
log:
"logs/kat_comp.{sample}_{version}.log"
conda:
"../envs/kat.yaml"
threads:
16
shell:
"""
kat comp -t {threads} \
-o {params.outprefix} \
'{input[0]} {input[1]}' \
{input[2]} \
> {log} 2>&1
"""
rule kat_plot_spectra:
input:
"results/kat/{sample}_{version}/{sample}_{version}_comp-main.mx"
output:
"results/kat/{sample}_{version}/{sample}_{version}_spectra.pdf"
params:
title = lambda w: f'{w.sample}_{w.version}'
log:
"logs/kat_plot.{sample}_{version}.log"
conda:
"../envs/kat.yaml"
shell:
"""
kat plot spectra-cn \
-o {output} \
-t {params.title} \
{input} \
> {log} 2>&1
"""
解决方案
推荐阅读
- android - 为 WebView 添加窗口事件监听器并调用 Java 函数
- aws-api-gateway - 如何在 CDK 中集成 Api Gateway 和 Step Functions?
- typescript - Typescript 界面和 Promise 返回问题
- android - 如何在 android 设备中录制内部音频 [root/no root]
- sql - 仅当值等于或大于 3,500 时求和?
- sql - 尝试在 VBA 中使用 ADODB 进行参数化查询时从错误消息中获取更多信息
- c# - 创建大于 150MB 的 ZIP 文件会引发 OutOfMemoryException C#
- flutter - Flutter:“bool”类型不是“double”类型的子类型
- flutter - 如何更改 Flutter 默认选项卡控制器中的选项卡?
- java - 数 N 年后的奶牛数量