linux - Running many files with slurm without overloading the queue
Problem description
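I need to run a script in parallel over many (70,000) samples, but I don't want to submit all of them to the queue at once. How can I schedule 100 at a time, so that each time one finishes, another one gets queued?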
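SLURM can do this kind of throttling itself: a job array submitted with `--array=START-END%N` runs at most N of its tasks at once and starts the next task as soon as one finishes. The sketch below assumes the cluster keeps SLURM's default `MaxArraySize` of 1001 (array indices must then stay at or below 1000; `scontrol show config` shows the real value), so the 70,000 samples are grouped 100 per task into 700 tasks. `array_spec` is a hypothetical helper that only computes the `--array` value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the value for "sbatch --array" so that TOTAL samples
# are covered when each array task handles PER_TASK of them, with at most LIMIT
# tasks running at the same time ("%" throttle).
# Assumption: MAX_SIZE defaults to 1001, SLURM's default MaxArraySize.
array_spec() {
    local total=$1 per_task=$2 limit=$3 max_size=${4:-1001}
    local ntasks=$(( (total + per_task - 1) / per_task ))   # ceiling division
    # indices 0..ntasks-1 must stay below MaxArraySize
    if (( ntasks > max_size )); then
        echo "array_spec: $ntasks tasks exceed MaxArraySize=$max_size" >&2
        return 1
    fi
    printf '0-%d%%%d\n' "$(( ntasks - 1 ))" "$limit"
}
```

For example, `sbatch -p campus -c 1 --array="$(array_spec 70000 100 100)" worker.sh samples.csv` would submit 700 tasks with at most 100 running at once. Here `worker.sh` is a hypothetical per-task script that uses `$SLURM_ARRAY_TASK_ID` to pick its own 100 samples and processes them one after another.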
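Because my script runs another piece of software, a lot of files get written. I also need to extract the results from each of those files into a single results file.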
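The extraction can be kept separate from the queue handling. A minimal sketch, assuming (as the script's own parsing does) that PSIPRED's `.horiz` output puts the prediction on lines starting with `Pred:`, with the string as the second whitespace-separated field; the function names and the per-sample `.result` files are hypothetical:

```shell
#!/usr/bin/env bash
# Print one record for a .horiz file: ">NAME" followed by the concatenated
# secondary-structure prediction taken from every "Pred:" line.
horiz_record() {
    local name=$1 file=$2
    printf '>%s\n' "$name"
    awk '/^Pred:/ { s = s $2 } END { print s }' "$file"
}

# Concatenate every *.result file in directory $1 (default: current directory)
# into $2 (default: horiz.results). find -exec ... + batches the file names,
# so tens of thousands of files never end up on a single command line.
merge_results() {
    find "${1:-.}" -maxdepth 1 -name '*.result' -exec cat {} + > "${2:-horiz.results}"
}
```

If each sample's record is written to its own file once PSIPRED finishes (for example `horiz_record "$name" "$name.horiz" > "$name.result"`), a single `merge_results` at the end replaces the repeated scanning for `*horiz` files, and the merge does not hit the argument-length limit that `cat *.result` might.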
I came up with something like this:
# set maximum number of processes to run in SLURM
MAX_QUEUE=200
Protein_sequence='MNNAANTGTTNESNVSDAPRIEPLPSLNDDDIEKILQPNDIFTTDRTDASTTSSTAIEDIINPSLDPQSAASPVPSSSFFHDSRKPSTSTHLVRRGTPLGIYQTNLYGHNSRENTNPNSTLLSSKLLAHPPVPYGQNPDLLQHAVYRAQPSSGTTNAQPRQTTRRYQSHKSRPAFVNKLWSMLNDDSNTKLIQWAEDGKSFIVTNREEFVHQILPKYFKHSNFASFVRQLNMYGWHKVQDVKSGSIQSSSDDKWQFENENFIRGREDLLEKIIRQKGSSNNHNSPSGNGNPANGSNIPLDNAAGSNNSNNNISSSNSFFNNGHLLQGKTLRLMNEANLGDKNDVTAILGELEQIKYNQIAISKDLLRINKDNELLWQENMMARERHRTQQQALEKMFRFLTSIVPHLDPKMIMDGLGDPKVNNEKLNSANNIGLNRDNTGTIDELKSNDSFINDDRNSFTNATTNARNNMSPNNDDNSIDTASTNTTNRKKNIDENIKNNNDIINDIIFNTNLANNLSNYNSNNNAGSPIRPYKQRYLLKNRANSSTSSENPSLTPFDIESNNDRKISEIPFDDEEEEETDFRPFTSRDPNNQTSENTFDPNRFTMLSDDDLKKDSHTNDNKHNESDLFWDNVHRNIDEQDARLQNLENMVHILSPGYPNKSFNNKTSSTNTNSNMESAVNVNSPGFNLQDYLTGESNSPNSVHSVPSNGSGSTPLPMPNDNDTEHASTSVNQGENGSGLTPFLTVDDHTLNDNNTSEGSTRVSPDIKFSATENTKVSDNLPSFNDHSYSTQADTAPENAKKRFVEEIPEPAIVEIQDPTEYNDHRLPKRAKK'
# 5' primer to add at the "N" terminus (left of the sequence)
p5=${Protein_sequence:463:30}

# Append the prediction from every finished *.horiz file to horiz.results,
# then remove that sample's side files (from psipred-blast)
collect_results() {
    for prefix in *.horiz
    do
        [ -e "$prefix" ] || continue    # glob matched nothing: no results yet
        name=${prefix%.horiz}
        # extract the sequence of secondary-structure elements from the "Pred:" lines
        horiz=$(while read -r line; do if [ "${line:0:4}" == Pred ]; then printf '%s' "${line:6}"; fi; done < "$prefix")
        printf '>%s%s\n%s\n' "$p5" "$name" "$horiz" >> horiz.results
        rm -f "${name}".*
    done
}

# read column 2 of the CSV file given as $1, skipping its header line
for insert in $(awk -F, 'NR > 1 {print $2}' "$1")
do
    # write the fasta file (this is the input of psipred)
    printf '>%s\n%s%s\n' "$insert" "$p5" "$insert" > "${insert}.fasta"
    # while the queue is full, harvest finished results and check again every 30 s
    # (squeue -h omits the header line, so no "minus one" is needed)
    while [ "$(squeue -h -u aerijman | wc -l)" -ge "$MAX_QUEUE" ]
    do
        collect_results
        sleep 30
    done
    # submit only once there is room, so no sample is skipped
    sbatch -p campus -c 1 --job-name="${insert}" --wrap="runpsipred ${insert}.fasta"
done

# wait for the last jobs to finish, then collect their results as well
while [ "$(squeue -h -u aerijman | wc -l)" -gt 0 ]
do
    sleep 30
done
collect_results
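The waiting part of the loop can be factored into a small function that sleeps between `squeue` calls instead of spinning. A sketch; `count_jobs` and `wait_for_slot` are hypothetical names, the 30-second default interval is an arbitrary choice, and `$USER` stands in for the hard-coded user name:

```shell
#!/usr/bin/env bash
# Number of this user's jobs currently in the queue.
# squeue -h (--noheader) drops the header line, so no "minus one" is needed.
count_jobs() {
    squeue -h -u "$USER" | wc -l
}

# Block until fewer than $1 jobs are queued, checking every ${2:-30} seconds
# instead of calling squeue in a tight loop.
wait_for_slot() {
    local max=$1 interval=${2:-30}
    while [ "$(count_jobs)" -ge "$max" ]; do
        sleep "$interval"
    done
}
```

Inside the submission loop, `wait_for_slot "$MAX_QUEUE"` would go right before `sbatch`; after the loop, `wait_for_slot 1` blocks until none of the user's jobs remain, so the last `.horiz` files can still be collected. Keeping the count in its own function also makes it easy to swap in a different query, for example one restricted to certain job states.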
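Sorry for including so much irrelevant information in this script, but I believe my amateurish way of monitoring the queue for free slots inside the loop could be replaced with something smarter.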
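One option is to drop the monitoring loop altogether and let each task of a throttled job array find its own samples. A minimal sketch of such a per-task script; the name `worker.sh`, the `PER_TASK` and `P5` variables, and 100 samples per task are all assumptions, and it relies only on the `SLURM_ARRAY_TASK_ID` variable that SLURM sets for array tasks:

```shell
#!/usr/bin/env bash
# Hypothetical worker.sh, submitted as a SLURM job array. Each array task handles
# PER_TASK consecutive samples, so SLURM's "%" throttle replaces the squeue loop.
PER_TASK=${PER_TASK:-100}   # assumption: 100 samples per array task

# Print the sample names (column 2 of the CSV in $1) that belong to array task $2
# when each task gets $3 samples; the header line is skipped, and task 0 gets
# data rows 1..$3, task 1 the next $3 rows, and so on.
task_samples() {
    awk -F, -v id="$2" -v n="$3" 'NR > 1 { i = NR - 2; if (int(i / n) == id) print $2 }' "$1"
}

# Only do real work when SLURM has started this script as an array task.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    p5=${P5:-}   # assumption: primer passed in with sbatch --export=ALL,P5=...
    for insert in $(task_samples "$1" "$SLURM_ARRAY_TASK_ID" "$PER_TASK"); do
        printf '>%s\n%s%s\n' "$insert" "$p5" "$insert" > "${insert}.fasta"
        runpsipred "${insert}.fasta"
    done
fi
```

It could be submitted with something like `sbatch -p campus -c 1 --array=0-699%100 --export=ALL,P5="$p5" worker.sh samples.csv`: 700 tasks cover 70,000 samples, and SLURM keeps at most 100 of them running, starting the next one whenever a task ends.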
Any help would be greatly appreciated!
Solution