Slurm Quick Reference Guide

  • Commands
  • Slurm Scripts
  • Slurm Partitions
  • Job Submission
  • Monitoring jobs
  • Job Deletion

Commands

Table of the most commonly used Slurm commands. See the complete listing here <https://slurm.schedmd.com/pdfs/summary.pdf>

Command   Description
sacct     Display accounting data on jobs
salloc    Allocate resources required for a job
srun      Obtain a job allocation and execute a job
sbatch    Submit a job script for execution
scancel   Cancel a job
sinfo     View information about nodes and partitions
squeue    View information about jobs

Man pages exist for all Slurm commands. The command option --help also provides a brief summary of options. Note that the command options are all case sensitive.
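For example, to consult the manual page or the brief option summary (sbatch and squeue are used here, but any command from the table works the same way):

$ man sbatch        # full manual page for sbatch
$ squeue --help     # brief summary of squeue options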

Slurm Scripts
 
Slurm jobs are usually submitted via a shell script that does the following:
  • Describes the processing to be done (Input-Process-Output)
  • Requests the resources to use for processing
Example of a simple Slurm script <testslurm.sh>:
#!/bin/bash
 
# set the number of nodes
#SBATCH --nodes=4
 
# set max wallclock time
#SBATCH --time=10:00:00
 
# set name of job
#SBATCH --job-name=test123
 
# mail alert at job start, end, and failure
#SBATCH --mail-type=ALL
 
# send mail to this address
#SBATCH --mail-user=john.doe@email.edu
 
# run the application
srun hostname
 
Once the script has been saved, it can be submitted as a job using the sbatch command, e.g.
$ sbatch -o my.output ./testslurm.sh
 
Upon submission, Slurm will generate a job ID. Jobs remain in the queue until enough resources can be allocated for execution.
The squeue command allows you to see the jobs in the queue, and the scancel <job id> command allows you to cancel your job if necessary.
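For example, to check your own jobs in the queue and cancel one (the job ID 12345 below is only a placeholder; use the ID Slurm reported at submission):

$ squeue -u $USER    # show only your jobs
$ scancel 12345      # cancel the job with that ID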
 
Slurm Partitions
 
Slurm partitions are the queues to which jobs are submitted. Each partition consists of a set of nodes. SPARKS has the following partitions defined:
  • defq – the default partition, consisting of 32 compute nodes (2x 8-core Xeon)
  • gpuq – 4 compute nodes with GPU accelerators (2x NVIDIA K80)
  • trainq – 2 compute nodes (2x 8-core Xeon), used for testing jobs (ensuring that scripts work) before submission to the default partition
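Since defq is the default, jobs run there unless another partition is requested. To target a specific partition, add a --partition directive to the job script or pass it on the sbatch command line (gpuq here is just one of the partitions listed above):

#SBATCH --partition=gpuq

or equivalently at submission time:

$ sbatch --partition=gpuq ./testslurm.sh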