python - Snakemake: how to prevent ambiguous rules from both being executed?

Problem description

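I am using snakemake in a machine learning setting. I have two rules (process_x_only and process_x_and_y) that both have processed_x.txt as a target output, so they are ambiguous. See the following code: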

rule process_x_only:
    input:
        'x.txt',
    output:
        'processed_x.txt'

rule process_x_and_y:
    input:
        'x.txt',
        'y.txt'
    output:
        'processed_x.txt',
        'processed_y.txt'

ruleorder: process_x_only > process_x_and_y

rule refit_model:
    input:
        'processed_x.txt',
        'processed_y.txt'
    output:
        'predictions_refit.txt'

rule predict_model:
    input:
        'processed_x.txt'
    output:
        'predictions.txt'

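Following the snakemake documentation, I use a ruleorder statement to specify that processing only x is preferred (that is, process_x_and_y should run only when y also needs to be processed; otherwise processing x alone is sufficient and process_x_only can run). This resolves the ambiguity but introduces another problem. When I execute: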

snakemake predictions_refit.txt

snakemake first executes process_x_only and then process_x_and_y, whereas in this case I want only process_x_and_y to be executed. How can I get snakemake to build a DAG in which only process_x_and_y is executed?

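To clarify: this is a heavy simplification of my actual problem. I know that changing the constraints of the problem statement would solve it, but I am interested in how to stop snakemake from executing both of the rules named in the ruleorder.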

Addition: when the rule refit_model is executed, the following warning is shown:

Warning: the following output files of rule process_x_only were not 
present when the DAG was created:

{'processed_x.txt'}

Tags: python, snakemake

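Solution

This is an interesting question. I was not aware that snakemake cannot resolve this on its own. One trick is to tell snakemake explicitly which rule produces the input files of refit_model: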

rule refit_model:
    input:
        rules.process_x_and_y.output
    output:
        'predictions_refit.txt'

However, you may still run into trouble in more complex workflows, or if you try to run snakemake refit_model predict_model. Does the hack solve the problem for you?
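To make the reasoning behind the hack easier to follow, here is a small toy model in plain Python. It is only a sketch and not how snakemake is actually implemented: it assumes that the producer of each input file is chosen independently, using the ruleorder as a tie-breaker, which would explain the behaviour described in the question. The names RULES, RULEORDER, producer, jobs_for and pinned are all made up for this illustration.

```python
# Toy model of per-file producer selection. This is NOT snakemake's real
# resolver; it only illustrates one plausible reason both rules get scheduled.

# Hypothetical, simplified copy of the workflow from the question.
RULES = {
    "process_x_only": {"input": ["x.txt"], "output": ["processed_x.txt"]},
    "process_x_and_y": {
        "input": ["x.txt", "y.txt"],
        "output": ["processed_x.txt", "processed_y.txt"],
    },
    "refit_model": {
        "input": ["processed_x.txt", "processed_y.txt"],
        "output": ["predictions_refit.txt"],
    },
    "predict_model": {"input": ["processed_x.txt"], "output": ["predictions.txt"]},
}

# Mirrors "ruleorder: process_x_only > process_x_and_y" (earlier = preferred).
RULEORDER = ["process_x_only", "process_x_and_y"]


def producer(path):
    """Pick the rule that creates `path`, or None if it is a source file."""
    candidates = [name for name, rule in RULES.items() if path in rule["output"]]
    if not candidates:
        return None

    def rank(name):
        # Rules not mentioned in the ruleorder sort after those that are.
        return RULEORDER.index(name) if name in RULEORDER else len(RULEORDER)

    return min(candidates, key=rank)


def jobs_for(target, pinned=None):
    """Return the rules to run for `target`, in execution order.

    `pinned` maps (consumer_rule, file) to a producer rule, imitating what
    `rules.process_x_and_y.output` does in the hack above.
    """
    pinned = pinned or {}
    scheduled = []

    def visit(path, consumer=None):
        rule = pinned.get((consumer, path)) or producer(path)
        if rule is None or rule in scheduled:
            return
        for dep in RULES[rule]["input"]:
            visit(dep, rule)
        scheduled.append(rule)

    visit(target)
    return scheduled


# Without pinning, processed_x.txt and processed_y.txt resolve to different rules.
print(jobs_for("predictions_refit.txt"))

# With the producer of processed_x.txt pinned for refit_model, one rule suffices.
pin = {("refit_model", "processed_x.txt"): "process_x_and_y"}
print(jobs_for("predictions_refit.txt", pinned=pin))
```

In this toy model the first call schedules process_x_only, process_x_and_y and refit_model, while the pinned call schedules only process_x_and_y and refit_model. Asking for predictions.txt still picks process_x_only, so the ruleorder keeps doing its job for predict_model.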

For a real solution, I think snakemake would have to prune the DAG after creating it. I suggest you open a new issue here.
