bioinformatics - Snakemake 在两个不同的通配符子集上发布运行规则
问题描述
我正在尝试编写一个snakemake 规则,该规则将根据它们grouping
与其他文件的不同来处理某些文件。我的文件列表加载在 sample.tsv 文件中。
我认为这会相对容易,因为我认为在 中填充通配符rule all
会触发特定规则的执行,但情况似乎并非如此。
这是我正在研究的配对版本
示例文件列表。请注意,chip
这里的类别对于定义我的组变得很重要
tissue replicate chip file
leaf rep2 input 00.data/chip_seq/input/leaf_input_rep2.fastq
leaf rep1 input 00.data/chip_seq/input/leaf_input_rep1.fastq
leaf rep2 H3K36me3 00.data/chip_seq/H3K36me3/leaf_H3K36me3_rep2.fastq
leaf rep1 H3K36me3 00.data/chip_seq/H3K36me3/leaf_H3K36me3_rep1.fastq
leaf rep1 H3K56ac 00.data/chip_seq/H3K56ac/leaf_H3K56ac_rep1.fastq
leaf rep2 H3K56ac 00.data/chip_seq/H3K56ac/leaf_H3K56ac_rep2.fastq
在我的脚本中,我将它们分为两个子类别
broad = ['H3K36me3']
narrow = ["H3K56ac"]
rule all:
input:
#Align all reads
expand("02.unique_align/{tissue}_{chip}_{replicate}_unique_bowtie2_algn.bam", \
¦ ¦ ¦ tissue = samples['tissue'], replicate = samples['replicate'], \
¦ ¦ ¦ chip = samples['chip']),
#Should cause the expand on ONLY narrow groups, causing the below rule
# run_bcp_peak_caller_narrow to trigger
expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_region_narrow.bed",
¦ ¦ ¦ tissue = narrow_peaks['tissue'],
¦ ¦ ¦ replicate = narrow_peaks['replicate'],
¦ ¦ ¦ chip = narrow),
#Should cause the expand on ONLY narrow groups, causing the below rule
# run_bcp_peak_caller_broad to trigger
expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_region_broad.bed",
¦ ¦ ¦ tissue = samples['tissue'],
¦ ¦ ¦ replicate = samples['replicate'],
¦ ¦ ¦ chip = broad)
## Two functions, one to get the input files, defined here as `get_input` the other to retrieve the chip files
def get_input(wildcards):
z = glob.glob(os.path.join("02.unique_align/", (wildcards.tissue + "_" + \
¦ wildcards.replicate + "_" + "input_unique_bowtie2_algn.bam")))
return z
def get_chip(wildcards):
z = glob.glob(os.path.join("02.unique_align/", (wildcards.tissue + "_" + \
¦ ¦ ¦ wildcards.replicate + "_" + wildcards.chip + "_" + \
¦ ¦ ¦ "unique_bowtie2_algn.bam")))
return z
rule run_bcp_peak_caller_broad:
input:
¦ chip_input = get_input,
¦ chip_mod = get_chip
params:
¦ "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad"
output:
¦ "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad.bed"
shell:"""
peakranger bcp \
--format bam \
--verbose \
--pval .001 \
--data {input.chip_mod} \
--control {input.chip_input} \
--output {params}
"""
rule run_bcp_peak_caller_narrow:
input:
chip_input = get_input,
chip_mod = get_chip
params:
"03.called_peaks/{tissue}_{replicate}_{chip}_peaks_narrow"
output:
"03.called_peaks/{tissue}_{replicate}_{chip}_peaks_narrow.bed"
shell:"""
peakranger \
--format bam \
--verbose \
--pval .001 \
--data {input.chip_mod} \
--control {input.chip_input} \
--output {params}
"""
错误如下:
MissingInputException in line 39 of /scratch/jpm73279/04.lncRNA/02.Analysis/24.regenerate_expression_peaks/Generate_peak_lists.snake:
Missing input files for rule all:
03.called_peaks/root_rep1_H3K4me1_peaks_region_broad.bed
03.called_peaks/root_rep2_H3K36me3_peaks_region_broad.bed
03.called_peaks/leaf_rep1_H3K4me1_peaks_region_broad.bed
03.called_peaks/root_rep1_H3K36me3_peaks_region_broad.bed
03.called_peaks/leaf_rep2_H3K36me3_peaks_region_broad.bed
03.called_peaks/root_rep2_H3K4me1_peaks_region_broad.bed
我的理解是,snakemake 会填充该rule all
部分中找到的文件组合,然后确定需要在上游运行哪些步骤。
任何帮助将不胜感激
解决方案
你的理解是对的;当没有为snakemake指定输出时,它会找到第一个规则,并尝试生成它的输出。
问题是rule all
指定了不能“制定”的规则。我将显示错误的并排比较:
rule all:
03.called_peaks/root_rep1_H3K4me1_peaks_region_broad.bed
rule run_bcp_peak_caller_broad:
output:
"03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad.bed"
看到不同?您说要生成的文件以 . 结尾peaks_region_broad.bed
,但是您的规则使输出以peaks_broad.bed
.
再看一遍,rule all
可能你想删除字符串的_region部分。
推荐阅读
- spring-boot - 自定义 ObjectMapper 覆盖 SpringBoot 默认 ObjectMapper
- laravel - (laravel)在值标签中获取正确的产品
- python - 如何在 django 中保存 value ='val' where user ='username'
- node.js - IE11 使用 blob 保存下载的 .xlsx 损坏的文件(在本地工作)
- jquery - 缩放后如何保持形状图选择颜色?
- c# - 如何将此字符串转换为日期时间
- javascript - 编译 Vue NativeScript 项目时,命令 gradlew.bat 失败,退出代码为 1
- python - BST算法的实现
- php - 将数组传递给 Twig 中的宏
- swift - 快速使用 GCD 的竞争条件