Slurm is an open-source workload manager designed for Linux clusters of all sizes. It’s a great system for queuing jobs for your HPC applications. I’m going to show you how to install Slurm on a CentOS 7 cluster.
- Delete failed installation of Slurm
- Install MariaDB
- Create the global users
- Install Munge
- Install Slurm
- Use Slurm
Cluster Server and Compute Nodes
I configured our nodes with the following hostnames using these steps. Our server is:
buhpc3
The clients are:
buhpc1 buhpc2 buhpc3 buhpc4 buhpc5 buhpc6
Delete failed installation of Slurm
This step is optional, in case you previously tried to install Slurm and it didn't work. We want to uninstall everything related to Slurm unless you're using the dependencies for something else.
First, I remove the database where I kept Slurm’s accounting.
yum remove mariadb-server mariadb-devel -y
Next, I remove Slurm and Munge. Munge is an authentication service that Slurm uses to verify messages passed between the machines in the cluster.
yum remove slurm munge munge-libs munge-devel -y
I check if the slurm and munge users exist.
cat /etc/passwd | grep slurm
cat /etc/passwd | grep munge
Then, I delete the users and corresponding folders.
userdel -r slurm
userdel -r munge
userdel: user munge is currently used by process 26278
kill 26278
userdel -r munge
Slurm, Munge, and MariaDB should be adequately wiped. Now, we can start a fresh installation that actually works.
Install MariaDB
You can install MariaDB to store the accounting data that Slurm provides. If you want accounting, now is the time to install it. I only install this on the server node, buhpc3. I use the server node as our SlurmDB node.
yum install mariadb-server mariadb-devel -y
We'll set up MariaDB later. We just need to install it before building the Slurm RPMs.
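Even though the accounting setup comes later, you may want the MariaDB service enabled and running now. A minimal sketch, assuming the stock CentOS 7 mariadb-server package on the server node, buhpc3:

# Enable MariaDB at boot and start it now (server node only)
systemctl enable mariadb
systemctl start mariadb
# Optional: lock down the default installation (root password, remove test database)
mysql_secure_installation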
Create the global users
Slurm and Munge require consistent UID and GID across every node in the cluster.
On all of the nodes, before you install Slurm or Munge:
export MUNGEUSER=991
groupadd -g $MUNGEUSER munge
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
export SLURMUSER=992
groupadd -g $SLURMUSER slurm
useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm
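To confirm the UIDs and GIDs really do match everywhere, you can spot-check every node from the server. A quick sketch, assuming root SSH access and that the buhpc[1-6] hostnames resolve (otherwise substitute the N.buhpc.com addresses used later):

# The UID/GID pairs printed here should be identical on every node
for host in buhpc1 buhpc2 buhpc3 buhpc4 buhpc5 buhpc6; do
    echo "== $host =="
    ssh $host "id munge; id slurm"
done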
Install Munge
Since I’m using CentOS 7, I need to get the latest EPEL repository.
yum install epel-release
Now, I can install Munge.
yum install munge munge-libs munge-devel -y
After installing Munge, I need to create a secret key on the server. My server is the node with hostname buhpc3. Choose one of your nodes to be the server node.
First, we install rng-tools to properly create the key.
yum install rng-tools -y
rngd -r /dev/urandom
Now, we create the secret key. You only need to create the secret key on the server node.
/usr/sbin/create-munge-key -r
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
After the secret key is created, you will need to send this key to all of the compute nodes.
scp /etc/munge/munge.key root@1.buhpc.com:/etc/munge
scp /etc/munge/munge.key root@2.buhpc.com:/etc/munge
scp /etc/munge/munge.key root@4.buhpc.com:/etc/munge
scp /etc/munge/munge.key root@5.buhpc.com:/etc/munge
scp /etc/munge/munge.key root@6.buhpc.com:/etc/munge
Now, we SSH into every node and correct the permissions as well as start the Munge service.
chown -R munge: /etc/munge/ /var/log/munge/
chmod 0700 /etc/munge/ /var/log/munge/
systemctl enable munge
systemctl start munge
To test Munge, we can try to access another node with Munge from our server node, buhpc3.
munge -n
munge -n | unmunge
munge -n | ssh 3.buhpc.com unmunge
remunge
If you encounter no errors, then Munge is working as expected.
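To check every compute node in one pass, you can loop the unmunge test over the whole cluster. A small sketch, assuming the same N.buhpc.com addresses used for scp above:

# Each node should report STATUS: Success (0); anything else points to a key or clock problem
for host in 1.buhpc.com 2.buhpc.com 4.buhpc.com 5.buhpc.com 6.buhpc.com; do
    echo "== $host =="
    munge -n | ssh $host unmunge | grep STATUS
done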
Install Slurm
Slurm has a few dependencies that we need to install before proceeding.
yum install openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel man2html libibmad libibumad -y
Now, we download the latest version of Slurm, preferably into our shared folder. The latest version of Slurm may differ from the one shown here.
cd /nfs
wget http://www.schedmd.com/download/latest/slurm-15.08.9.tar.bz2
If you don't have rpmbuild yet, install it first:
yum install rpm-build
Then build the Slurm RPMs from the tarball:
rpmbuild -ta slurm-15.08.9.tar.bz2
We will check the RPMs created by rpmbuild.
cd /root/rpmbuild/RPMS/x86_64
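To see exactly which RPMs were produced (the file names below are the ones we copy in the next step), a quick listing:

# List the freshly built Slurm RPMs and their sizes
ls -lh slurm-*.rpm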
Now, we will move the Slurm RPMs to the shared folder so that they can be installed on the server and compute nodes.
mkdir /nfs/slurm-rpms
cp slurm-15.08.9-1.el7.centos.x86_64.rpm slurm-devel-15.08.9-1.el7.centos.x86_64.rpm slurm-munge-15.08.9-1.el7.centos.x86_64.rpm slurm-perlapi-15.08.9-1.el7.centos.x86_64.rpm slurm-plugins-15.08.9-1.el7.centos.x86_64.rpm slurm-sjobexit-15.08.9-1.el7.centos.x86_64.rpm slurm-sjstat-15.08.9-1.el7.centos.x86_64.rpm slurm-torque-15.08.9-1.el7.centos.x86_64.rpm /nfs/slurm-rpms
On every node that will act as a server or compute node, we install those RPMs. In our case, I want every node to be a compute node.
cd /nfs/slurm-rpms
yum --nogpgcheck localinstall slurm-15.08.9-1.el7.centos.x86_64.rpm slurm-devel-15.08.9-1.el7.centos.x86_64.rpm slurm-munge-15.08.9-1.el7.centos.x86_64.rpm slurm-perlapi-15.08.9-1.el7.centos.x86_64.rpm slurm-plugins-15.08.9-1.el7.centos.x86_64.rpm slurm-sjobexit-15.08.9-1.el7.centos.x86_64.rpm slurm-sjstat-15.08.9-1.el7.centos.x86_64.rpm slurm-torque-15.08.9-1.el7.centos.x86_64.rpm
After we have installed Slurm on every machine, we will configure Slurm properly.
Visit http://slurm.schedmd.com/configurator.easy.html to make a configuration file for Slurm.
I leave everything default except:
ControlMachine: buhpc3
ControlAddr: 128.197.115.176
NodeName: buhpc[1-6]
CPUs: 4
StateSaveLocation: /var/spool/slurmctld
SlurmctldLogFile: /var/log/slurmctld.log
SlurmdLogFile: /var/log/slurmd.log
ClusterName: buhpc
After you hit Submit on the form, you will be given the full Slurm configuration file to copy.
On the server node, which is buhpc3:
cd /etc/slurm
vim slurm.conf
Copy the Slurm configuration file that the website generated and paste it into slurm.conf. We still need to change one thing in that file.
Underneath the "# COMPUTE NODES" section of slurm.conf, we see that Slurm tries to determine the IP addresses automatically with this one line:
NodeName=buhpc[1-6] CPUs=4 State=UNKNOWN
My nodes' IP addresses are not in order, so I manually delete this one line and replace it with:
NodeName=buhpc1 NodeAddr=128.197.115.158 CPUs=4 State=UNKNOWN
NodeName=buhpc2 NodeAddr=128.197.115.7 CPUs=4 State=UNKNOWN
NodeName=buhpc3 NodeAddr=128.197.115.176 CPUs=4 State=UNKNOWN
NodeName=buhpc4 NodeAddr=128.197.115.17 CPUs=4 State=UNKNOWN
NodeName=buhpc5 NodeAddr=128.197.115.9 CPUs=4 State=UNKNOWN
NodeName=buhpc6 NodeAddr=128.197.115.15 CPUs=4 State=UNKNOWN
After you explicitly put in the NodeAddr IP addresses, you can save and quit. Here is my full slurm.conf:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=buhpc3
ControlAddr=128.197.115.176
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=buhpc
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
#
#
# COMPUTE NODES
NodeName=buhpc1 NodeAddr=128.197.115.158 CPUs=4 State=UNKNOWN
NodeName=buhpc2 NodeAddr=128.197.115.7 CPUs=4 State=UNKNOWN
NodeName=buhpc3 NodeAddr=128.197.115.176 CPUs=4 State=UNKNOWN
NodeName=buhpc4 NodeAddr=128.197.115.17 CPUs=4 State=UNKNOWN
NodeName=buhpc5 NodeAddr=128.197.115.9 CPUs=4 State=UNKNOWN
NodeName=buhpc6 NodeAddr=128.197.115.15 CPUs=4 State=UNKNOWN
PartitionName=debug Nodes=buhpc[1-6] Default=YES MaxTime=INFINITE State=UP
Now that the server node has the correct slurm.conf, we need to send this file to the other compute nodes.
scp slurm.conf root@1.buhpc.com:/etc/slurm/slurm.conf
scp slurm.conf root@2.buhpc.com:/etc/slurm/slurm.conf
scp slurm.conf root@4.buhpc.com:/etc/slurm/slurm.conf
scp slurm.conf root@5.buhpc.com:/etc/slurm/slurm.conf
scp slurm.conf root@6.buhpc.com:/etc/slurm/slurm.conf
Now, we will configure the server node, buhpc3. We need to make sure that the server has all the right configurations and files.
mkdir /var/spool/slurmctld
chown slurm: /var/spool/slurmctld
chmod 755 /var/spool/slurmctld
touch /var/log/slurmctld.log
chown slurm: /var/log/slurmctld.log
touch /var/log/slurm_jobacct.log /var/log/slurm_jobcomp.log
chown slurm: /var/log/slurm_jobacct.log /var/log/slurm_jobcomp.log
Now, we will configure all the compute nodes, buhpc[1-6]. We need to make sure that all the compute nodes have the right configurations and files.
mkdir /var/spool/slurmd
chown slurm: /var/spool/slurmd
chmod 755 /var/spool/slurmd
touch /var/log/slurmd.log
chown slurm: /var/log/slurmd.log
Use the following command to make sure that slurmd is configured properly.
slurmd -C
You should get something like this:
ClusterName=(null) NodeName=buhpc3 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=7822 TmpDisk=45753
UpTime=13-14:27:52
The firewall will block connections between nodes, so I normally disable firewalld on all of the compute nodes except buhpc3, the server node.
systemctl stop firewalld
systemctl disable firewalld
On the server node, buhpc3, I usually open the default ports that Slurm uses:
firewall-cmd --permanent --zone=public --add-port=6817/udp
firewall-cmd --permanent --zone=public --add-port=6817/tcp
firewall-cmd --permanent --zone=public --add-port=6818/tcp
firewall-cmd --permanent --zone=public --add-port=7321/tcp
firewall-cmd --reload
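To verify the ports were actually opened after the reload, a quick check on the server node:

# Should list 6817/udp, 6817/tcp, 6818/tcp, and 7321/tcp
firewall-cmd --zone=public --list-ports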
If opening the ports does not work, stop firewalld while testing. Next, we need to check for out-of-sync clocks on the cluster. On every node:
yum install ntp -y
chkconfig ntpd on
ntpdate pool.ntp.org
systemctl start ntpd
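To confirm the clocks really are syncing before you start Slurm, you can query the NTP peers. A small check, run on any node:

# A peer marked with an asterisk (*) means ntpd has selected a time source and the clock is syncing
ntpq -p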
The clocks should be synced, so we can try starting Slurm! On all the compute nodes, buhpc[1-6]:
systemctl enable slurmd.service
systemctl start slurmd.service
systemctl status slurmd.service
Now, on the server node, buhpc3:
systemctl enable slurmctld.service
systemctl start slurmctld.service
systemctl status slurmctld.service
When you check the status of slurmd and slurmctld, you should see whether they started successfully. If problems happen, check the logs!
Compute node bugs: tail /var/log/slurmd.log
Server node bugs: tail /var/log/slurmctld.log
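One common symptom after restarts is a node stuck in a DOWN or DRAINED state even though slurmd is running on it. A hedged example of returning such a node to service from the server node, using buhpc1 as the example and assuming the node is actually healthy:

# Check the node's state and the reason Slurm recorded for it
scontrol show node buhpc1
# If the node looks healthy, return it to service
scontrol update NodeName=buhpc1 State=RESUME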
Use Slurm
To display the compute nodes:
scontrol show nodes
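For a more compact, partition-level view, sinfo is often more convenient:

# Shows each partition, node states, and node counts at a glance
sinfo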
-N lets you choose how many compute nodes to use. To run a job from the server node, buhpc3:
srun -N5 /bin/hostname
buhpc3
buhpc2
buhpc4
buhpc5
buhpc1
To display the job queue:
scontrol show jobs
JobId=16 JobName=hostname
   UserId=root(0) GroupId=root(0)
   Priority=4294901746 Nice=0 Account=(null) QOS=(null)
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2016-04-10T16:26:04 EligibleTime=2016-04-10T16:26:04
   StartTime=2016-04-10T16:26:04 EndTime=2016-04-10T16:26:04
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=debug AllocNode:Sid=buhpc3:1834
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=buhpc[1-5]
   BatchHost=buhpc1
   NumNodes=5 NumCPUs=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=20,node=5
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/bin/hostname
   WorkDir=/root
   Power= SICP=0
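scontrol show jobs is verbose; for day-to-day monitoring, squeue gives a one-line-per-job summary of pending and running jobs:

# List jobs still in the queue; add -u <username> to filter by user
squeue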
To submit script jobs, create a script file that contains the commands that you want to run. Then:
sbatch -N2 script-file
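Here is a minimal sketch of what such a script file might look like; the job name, node count, and output file are placeholders you would change:

#!/bin/bash
#SBATCH --job-name=hostname-test    # hypothetical job name
#SBATCH --nodes=2                   # same effect as sbatch -N2 on the command line
#SBATCH --output=hostname-test.out  # where stdout/stderr are written

# srun launches the command on every allocated node
srun /bin/hostname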
Slurm has a lot of useful commands. You may have heard of other queuing tools like Torque. Here's a useful link for the command differences: http://www.sdsc.edu/~hocks/FG/PBS.slurm.html
Accounting in Slurm
We’ll worry about accounting in Slurm with MariaDB for next time. Let me know if you encounter any problems with the above steps!