python - Snakemake: using wildcard input/output in a Python function
Question
I have a simple Python function that takes an input file and creates an output file:
import pandas as pd

def enlarge_overlapping_region(input, output):
    # Column names of the headerless, tab-separated annotated BED file
    columns = ["chr", "start", "end", "point", "score", "strand",
               "cdna_count", "lib_count", "region_type", "region_id"]
    df = pd.read_table(input, sep="\t", header=None, names=columns)
    # Collapse all rows sharing a region_id into one interval (min start, max end)
    df1 = (df.groupby("region_id", as_index=False)
             .agg({"chr": "first", "start": "min", "end": "max", "region_type": "first"})
             [["chr", "start", "end", "region_type", "region_id"]])
    # Drop rows that were not assigned to any region
    df1 = df1[df1.region_id != "."]
    df1.to_csv(output, index=False, sep="\t")
    return df1
I call this function from a Snakemake rule, but it cannot access the files and I don't know why.
I tried something like this:
rule get_enlarged_dhs:
    input:
        "data/annotated_clones/{cdna}_paste_{lib}.annotated.bed"
    output:
        "data/enlarged_coordinates/{cdna}/{cdna}_paste_{lib}.enlarged_dhs.bed"
    run:
        lambda wildcards: enlarge_overlapping_region(f"{wildcards.input}",f"{wildcards.output}")
I got this error:
Missing files after 5 seconds:
data/enlarged_coordinates/pPGK_rep1/pPGK_rep1_paste_pPGK_input.enlarged_dhs.bed
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
If I put the Python code directly into the rule, like this:
rule get_enlarged_dhs:
    input:
        "data/annotated_clones/{cdna}_paste_{lib}.annotated.bed"
    output:
        "data/enlarged_coordinates/{cdna}/{cdna}_paste_{lib}.enlarged_dhs.bed"
    run:
        fi=open(input,"r")
        fo=open(output,"w")
        df = pd.read_table(fi, delimiter='\t',header=None,names=["chr","start","end","point","score","strand","cdna_count","lib_count","region_type","region_id"])
        df1 = (df.groupby('region_id', as_index=False)
               .agg({'chr':'first', 'start':'min', 'end':'max','region_type':'first'})
               [['chr','start','end','region_type','region_id']])
        df1 = df1[df1.region_id != "."]
        df1.to_csv(fo,index=False, sep='\t')
I get this error instead:
expected str, bytes or os.PathLike object, not InputFiles
Answer
It is probably simpler than you think. This line:
    lambda wildcards: enlarge_overlapping_region(f"{wildcards.input}",f"{wildcards.output}")
should be:
    enlarge_overlapping_region(input[0], output[0])
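This also explains the "Missing files" message from the first attempt: a `lambda` expression on a line by itself only builds a function object and discards it, so `enlarge_overlapping_region` was never called and no output file was written. (Separately, `wildcards` in this rule carries only `cdna` and `lib`, not `input` or `output`.) A minimal plain-Python sketch of the first point, where `calls` and `record` are hypothetical names used only for illustration:

```python
calls = []

def record(path):
    """Hypothetical stand-in for enlarge_overlapping_region."""
    calls.append(path)

# A lambda on a line by itself defines a function and throws it away;
# the body is never executed.
lambda wildcards: record("out.bed")
calls_after_lambda = list(calls)

# Calling the function directly is what actually does the work.
record("out.bed")
calls_after_call = list(calls)
```

After the bare `lambda` line `calls` is still empty; only the direct call records anything.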
Likewise, to fix your second attempt, change:
    fi=open(input,"r")
    fo=open(output,"w")
to:
    fi=open(input[0],"r")
    fo=open(output[0],"w")
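The reason for the `[0]` is that inside `run`, `input` and `output` are list-like collections of paths rather than single strings, which is what the `not InputFiles` error is complaining about. The sketch below reproduces that behaviour in plain Python; `FakeInputFiles` is a hypothetical stand-in written for this example, not Snakemake's real class, and the file name is made up:

```python
import os
import tempfile

class FakeInputFiles(list):
    """Hypothetical stand-in for Snakemake's list-like input object."""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.annotated.bed")
    with open(path, "w") as fh:
        fh.write("chr1\t100\t200\n")

    input_files = FakeInputFiles([path])

    # Passing the whole collection to open() fails with a TypeError ...
    try:
        open(input_files, "r")
        message = None
    except TypeError as err:
        message = str(err)

    # ... while indexing the first element yields an ordinary path string.
    with open(input_files[0], "r") as fh:
        first_line = fh.readline()
```

Here `message` names the stand-in class in the same way the original error names `InputFiles`, and `first_line` holds the line read through `input_files[0]`.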
In my opinion, giving the input and output files names and using those names in the run or shell directive is less error-prone. For example:
rule get_enlarged_dhs:
    input:
        bed="...",
    output:
        bed="...",
    run:
        enlarge_overlapping_region(input.bed, output.bed)
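One way to see why names are less error-prone than positions: if another input is later added in front of the BED file, `input[0]` silently starts pointing at the new file, while `input.bed` keeps its meaning. The plain-Python sketch below illustrates this with `NamedFiles`, a hypothetical stand-in that allows both index and attribute access; the file names are invented for the example:

```python
class NamedFiles(list):
    """Hypothetical stand-in: a list whose items can also be reached by name."""

    def __init__(self, **named):
        super().__init__(named.values())
        for name, path in named.items():
            setattr(self, name, path)

# Original rule: one named input.
before = NamedFiles(bed="clones.annotated.bed")

# Later revision: a second input is added ahead of the BED file.
after = NamedFiles(genome="genome.fa", bed="clones.annotated.bed")

# Position 0 now refers to a different file; the name still refers to the BED file.
first_before, first_after = before[0], after[0]
bed_before, bed_after = before.bed, after.bed
```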