Prerequisite: To run batch or interactive jobs on Discover, you must first set up an "authorized_keys" file in your $HOME/.ssh directory. To do so, refer to Passwordless SSH/SCP Within NCCS Systems and complete Steps 1 and 2.

Slurm

Slurm is a job scheduler and resource manager that organizes scientific computing jobs on the Discover supercomputer. This video explains how to submit jobs for scheduling, how to request resources such as CPU time and memory, and which other options can help optimize a job.

NCCS provides SchedMD's Slurm software for both interactive and batch access to Discover resources. Refer to the NCCS Slurm man page for full documentation of all Slurm commands in the currently installed version. Additional Slurm documentation is available at http://slurm.schedmd.com/documentation.html, but it may describe a different version.

Unix-style man pages are available, but you must ensure that your MANPATH variable is set properly. For bash and C shell, respectively:

$ export MANPATH=$MANPATH:/usr/slurm/share/man
% setenv MANPATH $MANPATH:/usr/slurm/share/man

Slurm key concepts:

  • Partition:
A set of Discover nodes to which work can be submitted. Available partitions include the default, a general-purpose partition for computational jobs of all sizes and durations, and datamove, a special-purpose partition for single-core jobs that move data to and from the Discover cluster.
  • Quality of Service (QoS):
A Slurm feature that sets the relative priority and resource limits of every job in the queue. NCCS QoSs are available only in the default partition; they include the default QoS as well as long, debug, and serial. (Details here)
  • Queue:
Generic term for a list of jobs in any state.
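
To see how these concepts map onto a running system, the partitions, QoS limits, and a partition's queue can be inspected from a login node. The following is a minimal sketch, not an NCCS-provided tool: the function name show_slurm_layout is hypothetical, and the QoS columns chosen (Name, Priority, MaxWall) are only one possible selection.

```shell
#!/usr/bin/env bash
# Print a summary of partitions, QoS limits, and one partition's queue.
# show_slurm_layout is a hypothetical helper name used for illustration.
# It returns 1 without querying anything when Slurm is not installed.
show_slurm_layout() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "Slurm commands not found on this host" >&2
        return 1
    fi
    sinfo --summarize                                # one line per partition
    sacctmgr show qos format=Name,Priority,MaxWall   # QoS names and limits
    squeue --partition=datamove                      # jobs in the datamove partition
}

show_slurm_layout || true
```

On a host without Slurm the function only prints a notice, so the snippet is safe to try anywhere.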

Slurm key commands:

  • sbatch: submit a batch job script for queueing and execution
  • salloc: submit an interactive job request
  • srun: run a command within an existing job, on a subset of allocated resources
  • scancel: cancel a queued or running job
  • squeue: query the status of your job(s) or the job queue
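
The commands above fit together roughly as follows. This is a minimal sketch rather than a site-approved template: the file name hello_job.sh and the resource values are illustrative assumptions, and the debug QoS is used only because it is one of the QoSs named earlier.

```shell
#!/usr/bin/env bash
# Write a minimal batch script; file name and resource values are illustrative.
job_script="hello_job.sh"

cat > "$job_script" <<'EOF'
#!/usr/bin/env bash
# Job name shown by squeue
#SBATCH --job-name=hello
# Number of tasks (processes)
#SBATCH --ntasks=1
# Wall-clock limit, HH:MM:SS
#SBATCH --time=00:05:00
# QoS to run under
#SBATCH --qos=debug
# Output file; %j expands to the job ID
#SBATCH --output=hello-%j.out

# Run a command inside the allocation
srun hostname
EOF

# Submit only where Slurm is installed; elsewhere just report what was written.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$job_script"     # submit the script for queueing and execution
    squeue -u "$USER"        # list your queued and running jobs
else
    echo "sbatch not found; wrote $job_script without submitting"
fi
```

A job that is no longer needed can be removed with scancel followed by the job ID that sbatch reports.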

Slurm basic usage:

Slurm advanced usage:

  • use a subset of resources within a job allocation (srun --ntasks)
  • obtain a shell within a currently running batch job allocation
  • order a set of jobs (sbatch --depend or --nice)
  • submit a set of jobs for which only inputs and outputs vary (sbatch --array)
  • monitor your job's stdout/stderr
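
Two of the items above, job arrays and job ordering, can be sketched together. The example below rests on several assumptions: the file names array_job.sh, collect.sh, and input_N.dat are hypothetical, the array range 1-4 is arbitrary, and collect.sh is assumed to exist already when the follow-up job is submitted.

```shell
#!/usr/bin/env bash
# Sketch of a job-array script plus a dependent follow-up job.
# File names (array_job.sh, collect.sh, input_N.dat) are hypothetical.

cat > array_job.sh <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=sweep
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --array=1-4
#SBATCH --output=sweep-%A_%a.out

# Slurm sets SLURM_ARRAY_TASK_ID for each array element; default to 1
# so the script can also be run by hand outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
input="input_${task_id}.dat"
echo "task ${task_id} would process ${input}"
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print only the job ID (plus ";cluster" if set).
    array_id=$(sbatch --parsable array_job.sh | cut -d';' -f1)
    # Start collect.sh only after every array element has finished successfully.
    sbatch --dependency=afterok:"$array_id" collect.sh
else
    # No Slurm here: emulate a single array element locally.
    SLURM_ARRAY_TASK_ID=3 bash array_job.sh
fi
```

In the output file pattern, %A expands to the array's master job ID and %a to the element index, so each element writes to its own file.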

Slurm example jobs:

  • example MPI, OpenMP and PoDS job scripts

Related: Determine memory usage: