shell - Running simultaneous tasks, then a single task, in one Slurm script

Problem description

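I need a Slurm script that does the following:

Create a directory in scratch space for temporary storage (requested with sbatch --gres disk:1024)
Run samtools sort on hundreds of bam files and store the sorted copies in scratch space (as many at once as possible)
After sorting, run samtools index on the sorted files in scratch space (as many at once as possible)
After indexing, run a single (large) task that uses all of the sorted and indexed bam files (the more CPUs the better)
Copy the needed files back to the main storage system and delete the remaining files (including the sorted bam and index files)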
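The five steps above amount to a staged pipeline with a barrier between stages. A minimal sketch of that shape in plain shell follows. The functions `sort_one`, `index_one`, and `big_task` are hypothetical stand-ins for `samtools sort`, `samtools index`, and the final job, and the input files are fakes created on the spot. On a cluster, each background command would presumably be wrapped in `srun`.

```shell
# Sketch only: sort_one, index_one and big_task are hypothetical stand-ins
# for "samtools sort", "samtools index" and the final single job.
set -e

SCRATCH=$(mktemp -d)            # step 1: temporary working directory
RESULTS=$(mktemp -d)            # stands in for the main storage system
INPUT=$(mktemp -d)              # fake inputs so the sketch runs on its own

sort_one()  { cp "$1" "$SCRATCH/$(basename "$1")"; }
index_one() { : > "$1.bai"; }
big_task()  { ls "$SCRATCH"/*.bam | wc -l > "$SCRATCH/summary.txt"; }

for i in 1 2 3; do echo "data $i" > "$INPUT/sample_$i.bam"; done

# Step 2: start every sort in the background, then wait for ALL of them.
for f in "$INPUT"/*.bam; do sort_one "$f" & done
wait

# Step 3: index the sorted copies, again followed by a barrier.
for f in "$SCRATCH"/*.bam; do index_one "$f" & done
wait

# Step 4: one large task that needs every sorted and indexed file.
big_task

# Step 5: copy back what is needed, then delete the rest.
cp "$SCRATCH/summary.txt" "$RESULTS/"
rm -r "$SCRATCH" "$INPUT"
```

The bare `wait` after each loop is what keeps a later stage from starting while files from the earlier one are still being written.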

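Using a basic job array does not seem to work, because it mishandles the steps that only need to run once. The final single task reported that files did not exist, so I guess the script was getting ahead of itself and deleting everything before the other tasks had finished (probably running rm as many times as there are array tasks), so I tried something else.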

The following script gives me errors from samtools saying there is no such file or directory when it tries to create the sorted bam files.

[E::hts_open_format] Failed to open file /mnt/scratch/parallel_build/Sample_13-00145.bam
samtools sort: failed to create "/mnt/scratch/parallel_build/Sample_13-00145.bam": No such file or directory
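One possible reading of this error, stated as an assumption rather than a diagnosis: if /mnt/scratch is local to each node, a plain `mkdir` in the batch script creates the directory only on the node running the script, and job steps placed on the other nodes will not find it. A minimal sketch of creating the directory on every allocated node is shown here. The fallback branch and the default path are illustrative additions so that the snippet also runs outside Slurm.

```shell
# Assumption: TMP is on node-local scratch, so it has to exist on every node
# that runs a job step, not only on the node executing the batch script.
# In the real job this would be TMP=/mnt/scratch/parallel_build.
TMP=${TMP:-${TMPDIR:-/tmp}/parallel_build_demo}

if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
    # One task per allocated node; each one creates its own local directory.
    srun --nodes="$SLURM_JOB_NUM_NODES" --ntasks="$SLURM_JOB_NUM_NODES" \
         --ntasks-per-node=1 mkdir -p "$TMP/majiq_out"
else
    # Outside Slurm (for example when testing locally) create it here.
    mkdir -p "$TMP/majiq_out"
fi

# Fail early with a clear message instead of letting samtools report it later.
[ -d "$TMP/majiq_out" ] || { echo "scratch directory missing: $TMP" >&2; exit 1; }
```

Under the same assumption, files sorted on one node would still be invisible from the others, which matters for the final single task that needs all of them.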

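If I reduce --nodes to 1, samtools sort works correctly, but it only runs sequentially, and after about 50 files it jumps ahead and runs part 2 on whatever files are there, and the final single task cannot find the remaining files (it works fine with fewer than 30 files).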
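The question does not establish why the script moves on after roughly 50 files. One thing worth ruling out is launching hundreds of background steps at once. The sketch below throttles launches in fixed-size batches. `MAX_JOBS` and `process_one` are hypothetical: the former would be set to the number of steps the allocation can hold, the latter stands in for one `srun ... samtools sort` call.

```shell
OUT=$(mktemp -d)
MAX_JOBS=4                              # hypothetical cap on simultaneous steps
process_one() { : > "$OUT/$1.done"; }   # stand-in for one srun ... samtools sort

running=0
for i in 1 2 3 4 5 6 7 8 9 10; do
    process_one "item_$i" &
    running=$((running + 1))
    if [ "$running" -ge "$MAX_JOBS" ]; then
        wait            # block until the whole current batch has finished
        running=0
    fi
done
wait                    # barrier: also catches the final, partially filled batch
```

Batching is cruder than a rolling queue, since a slow file holds up its whole batch, but it uses only `wait` and keeps the number of outstanding steps bounded.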

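Any help on how to do this properly would be great. For parts 1 and 2, I would like to fit as many tasks as possible across all nodes as space becomes available. Part 3 needs to run on a single node but benefits greatly from many CPUs, so I would like to give it more CPUs than the small parallel tasks before it (which can use fewer CPUs if that means more of them run at once). Keep in mind that I do need to do all of this in one job, because the scratch space is required for various reasons.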
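The trade-off described here, few CPUs for each small step and many for the final one, can be expressed with two different `-c` values inside the same job. The sketch below only computes those values. It assumes the standard Slurm variables `SLURM_CPUS_PER_TASK` and `SLURM_CPUS_ON_NODE` are set inside the allocation, and it falls back to local values elsewhere. Whether a step may actually claim every CPU on a node depends on the site configuration.

```shell
# Slurm exports these inside an allocation; the fallbacks are for local runs only.
SMALL_CPUS=${SLURM_CPUS_PER_TASK:-1}
NODE_CPUS=${SLURM_CPUS_ON_NODE:-$(getconf _NPROCESSORS_ONLN)}

# How many small steps fit on one node at the same time.
PER_NODE=$((NODE_CPUS / SMALL_CPUS))
[ "$PER_NODE" -ge 1 ] || PER_NODE=1

echo "small steps: -c $SMALL_CPUS ($PER_NODE per node); final step: -c $NODE_CPUS"
```

With the header in the script below (4 tasks per node, 8 CPUs per task), `NODE_CPUS` would be expected to be 32, so the final step could be given `-c "$NODE_CPUS"` and a matching thread count instead of 8.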

#!/bin/sh
#SBATCH --job-name=majiq_build
#SBATCH --nodes=5
#SBATCH --tasks-per-node=4 # 4 tasks, would like more
#SBATCH --cpus-per-task=8
#SBATCH --time=30:00:00
#SBATCH --mem-per-cpu=4G
#SBATCH -A zheng_lab
#SBATCH -p exacloud
#SBATCH --error=/home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel/log_files/build_parallel.x80.%J.err
#SBATCH --output=/home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel/log_files/build_parallel.x80.%J.out

# Set variables
DIR=/home/exacloud/lustre1/zheng_lab/users/eggerj
TMP=/mnt/scratch/parallel_build
WORK=$DIR/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification

# Create temporary directory in requested scratch space to store files
mkdir $TMP
mkdir $TMP/majiq_out

##################################################################################################################
#
# PART 1: Sort bam files (in parallel) and store in scratch space
#         (wait until all are finished before part 2)
#
##################################################################################################################
while read F  ;
do
    fn="$(rev <<< "$F" | cut -d'/' -f 1 | rev)"
    echo $fn
    srun -N 1 -n 1 -c $SLURM_CPUS_PER_TASK --exclusive /opt/installed/samtools-1.6/bin/samtools sort -@ $SLURM_CPUS_PER_TASK -m 4G -o $TMP/$fn $F &
done <$WORK/test_bams/test_bam_list_x80.txt
wait

##################################################################################################################
#
# PART 2: Index bam files (in parallel) in scratch space
#         (wait until all are finished before part 3)
#
##################################################################################################################
for file in $TMP/*bam ;
do
    srun -N 1 -n 1 -c $SLURM_CPUS_PER_TASK --exclusive /opt/installed/samtools-1.6/bin/samtools index -@ $SLURM_CPUS_PER_TASK $file &
done
wait

# Check files actually made it before running MAJIQ
ls -lh $TMP

##################################################################################################################
#
# PART 3: Run MAJIQ build (single task) using all bam files (after all have been indexed)
#
##################################################################################################################

# Activate majiq virtual environment
source $DIR/majiq/bin/activate

# Run MAJIQ build using all bam files (.ini file indicates that bam files are in temp directory)
srun -N 1 -n 1 -c $SLURM_CPUS_PER_TASK --exclusive majiq build $WORK/gtfs/Homo_sapiens.GRCh37.75.gff3 -c $WORK/test_build_parallel/settings.x80.parallel.ini \
                                                          -j $SLURM_CPUS_PER_TASK --output $TMP/majiq_out --min-experiments 0.25
wait

# Move majiq output files from tmp directory to output directory on Lustre and remove
cp $TMP/majiq_out/* $WORK/test_build_parallel/majiq_out_x80/
rm -r $TMP
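Because the script ends with an unconditional `rm -r`, any earlier failure leaves scratch files behind, and any step that outruns the others has its files deleted underneath it. A common shell idiom is to register the cleanup with `trap` so it runs exactly once, when the script exits. This is a minimal sketch. The paths are temporary stand-ins, and the subshell is there only so the effect can be checked afterwards.

```shell
RESULTS=$(mktemp -d)     # stands in for the output directory on Lustre
(
    TMP=$(mktemp -d)                             # stands in for the scratch directory
    echo "$TMP" > "$RESULTS/scratch_path.txt"    # recorded only for this demo
    trap 'rm -rf "$TMP"' EXIT                    # cleanup runs once, on exit

    mkdir "$TMP/majiq_out"
    echo "result" > "$TMP/majiq_out/build.txt"   # pretend output of the final task
    cp "$TMP"/majiq_out/* "$RESULTS/"            # copy back before the trap fires
)
```

On a multi-node job with node-local scratch, the `rm` itself would need to run on every node, for example through `srun` as with directory creation.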

Tags: shell, cluster-computing, slurm, sbatch, samtools
