Slurm Quick Reference Guide

  • Commands
  • Slurm Scripts
  • Slurm Partitions
  • Job Submission
  • Monitoring jobs
  • Job Deletion

Commands

Table of the most commonly used Slurm commands. See the complete listing here <https://slurm.schedmd.com/pdfs/summary.pdf>

Command   Description
sacct     Display accounting data on jobs
salloc    Allocate resources required for a job
srun      Obtain a job allocation and execute a job
sbatch    Submit a job script for execution
scancel   Cancel a job
sinfo     View information about nodes and partitions
squeue    View information about jobs

Man pages exist for all Slurm commands. The command option --help also provides a brief summary of options. Note that the command options are all case sensitive.
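For example, to consult the manual page or the brief option summary (sbatch and squeue are used here, but any command from the table works the same way):

$ man sbatch        # full manual page for sbatch
$ squeue --help     # brief summary of squeue options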

Slurm Scripts
 
Slurm jobs are usually submitted via a shell script that does the following:
  • Describes the processing to be done (Input-Process-Output)
  • Requests the resources to use for processing
Example of a simple Slurm script <testslurm.sh>:
#!/bin/bash
 
# set the number of nodes
#SBATCH --nodes=4
 
# set max wallclock time
#SBATCH --time=10:00:00
 
# set name of job
#SBATCH --job-name=test123
 
# mail alert at job start, end, and failure
#SBATCH --mail-type=ALL
 
# send mail to this address
#SBATCH --mail-user=john.doe@email.edu
 
# run the application
srun hostname
 
Once the script has been saved, it can be submitted as a job using the sbatch command, e.g.
$ sbatch -o my.output ./testslurm.sh
 
Upon submission, Slurm will generate a job ID. Jobs remain in the queue until enough resources can be allocated for execution.
The squeue command allows you to see the jobs in the queue, and the scancel <job id> command allows you to cancel your job if necessary.
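For example, to check your own jobs in the queue and cancel one (the job ID 12345 below is only a placeholder; use the ID Slurm reported at submission):

$ squeue -u $USER    # show only your jobs
$ scancel 12345      # cancel the job with that ID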
 
Slurm Partitions
 
Slurm partitions are the queues to which jobs are submitted. Each partition consists of a set of nodes. SPARKS has the following partitions defined:
  • defq – the default partition, consisting of 32 compute nodes (2x 8-core Xeon)
  • gpuq – 4 compute nodes with GPU accelerators (2x NVIDIA K80)
  • trainq – 2 compute nodes (2x 8-core Xeon), used for testing jobs (ensuring that scripts work) before submission to the default partition
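Since defq is the default, jobs run there unless another partition is requested. To target a specific partition, add a --partition directive to the job script or pass it on the sbatch command line (gpuq here is just one of the partitions listed above):

#SBATCH --partition=gpuq

or equivalently at submission time:

$ sbatch --partition=gpuq ./testslurm.sh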