SLURM cannot connect to the controller on a local machine

Problem description

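I am trying to run SLURM on my local machine to test some configurations before deploying to an HPC cluster, but I am having trouble configuring it and hope to find help here.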

I am running Ubuntu in Docker and installed munge and slurm following the SLURM installation instructions.

I created this configuration file:

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=7e51ce889cd1
SlurmctldHost=7e51ce889cd1

MpiDefault=none

ProctrackType=proctrack/linuxproc

ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity

# TIMERS

InactiveLimit=0
KillWait=30

SlurmctldTimeout=120
SlurmdTimeout=300

Waittime=0

# SCHEDULING
DefMemPerCPU=8192
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core

# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
JobCompLoc=/var/log/jobcompletion
JobCompType=jobcomp/filetxt
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log

# COMPUTE NODES
NodeName=7e51ce889cd1 CPUs=1 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2
PartitionName=7e51ce889cd1 Nodes=ALL Default=YES MaxTime=INFINITE State=UP

After starting my Docker container and running

service slurmd start
service slurmctld start
sinfo

I get the following error:

slurm_load_partitions: Unable to contact slurm controller (connect failure)

Can someone help me solve this?

Have a nice day

Sebastian

Tags: slurm, hpc

Solution
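
The error says that sinfo could not open a connection to slurmctld. One way to narrow this down, offered as a sketch rather than a confirmed fix, is to read SlurmctldHost and SlurmctldPort from the configuration and test whether anything accepts TCP connections there. The helper names (parse_slurm_conf, controller_endpoint, can_connect, check) are hypothetical, and the path to slurm.conf is an assumption you must supply, since the question does not state it.

```python
import socket


def parse_slurm_conf(text):
    """Collect the leading Key=Value of each slurm.conf line (keys lowercased)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        first = line.split()[0]  # e.g. "NodeName=..." from a multi-key line
        if "=" not in first:
            continue
        key, value = first.split("=", 1)
        settings.setdefault(key.lower(), value)  # first occurrence wins
    return settings


def controller_endpoint(settings):
    """Return (host, port) for slurmctld; 6817 is Slurm's default port."""
    host = settings.get("slurmctldhost", "")
    host = host.split("(", 1)[0]  # tolerate the "name(address)" form
    port = int(settings.get("slurmctldport", 6817))
    return host, port


def can_connect(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check(conf_path):
    """Print whether the controller named in conf_path is reachable."""
    with open(conf_path) as fh:
        host, port = controller_endpoint(parse_slurm_conf(fh.read()))
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"slurmctld at {host}:{port} is {state}")


# Example call (the path is an assumption, adjust to your installation):
# check("/etc/slurm/slurm.conf")
```

If the port turns out not to be reachable, slurmctld probably did not start or exited early. The SlurmctldLogFile named in the configuration (/var/log/slurmctld.log) is the natural next place to look, and scontrol ping reports the controller state as Slurm itself sees it.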

