Prerequisite: to run batch or interactive jobs on Discover, you will need to set up an "authorized_keys" file under your $HOME/.ssh directory. To do so, refer to Passwordless SSH/SCP Within NCCS Systems and complete Steps 1 and 2.
Slurm is a job scheduler and resource manager dedicated to organizing scientific computing jobs on the Discover supercomputer. This video shows how to submit jobs for scheduling, how to allocate resources such as CPU time and memory, and how to set other options that optimize a job.
NCCS provides SchedMD's Slurm software for both interactive and batch access to Discover resources. Refer to the NCCS Slurm man pages for full documentation of the currently installed version of all Slurm commands. Additional Slurm documentation can be found at http://slurm.schedmd.com/documentation.html (but may reflect a different version).
Unix-style man pages are available, but you must ensure your MANPATH variable is properly set. For bash and c-shell, respectively:
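A minimal sketch of setting MANPATH in each shell follows; the directory /path/to/slurm/share/man is a placeholder, not the actual install location on Discover, so substitute the real path for your installed Slurm version.

```shell
# bash -- prepend the Slurm man directory (placeholder path) to MANPATH:
export MANPATH=/path/to/slurm/share/man:$MANPATH

# c-shell (csh/tcsh) equivalent:
#   setenv MANPATH /path/to/slurm/share/man:$MANPATH

# Afterwards, commands such as `man sbatch` will find the Slurm pages.
```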
Partition:
Set of Discover nodes to which work can be submitted. Available partitions include the default, general-purpose partition for computational jobs of all sizes and durations; and datamove, a special-purpose partition for single-core jobs that move data to and from the Discover cluster.
Quality of Service (QoS):
Slurm feature that sets relative priority and resource limits for every job in the queue. NCCS QoS levels are available only in the default partition; they include the default QoS as well as long, debug, and serial. (Details here)
Queue:
Generic term for a list of jobs in any state.
Key Slurm commands:
sbatch: submit a batch job script for queueing and execution
salloc: submit an interactive job request
srun: run a command within an existing job, on a subset of allocated resources
scancel: cancel a queued or running job
squeue: query the status of your job(s) or the job queue
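The commands above fit together into a typical workflow: write a batch script, submit it with sbatch, monitor it with squeue, and cancel it with scancel if needed. The sketch below is a hypothetical example; the job name, task count, time limit, program name, and job ID are all illustrative placeholders, not NCCS defaults.

```shell
# Write a minimal batch script (all values are illustrative placeholders).
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob          # job name shown by squeue
#SBATCH --ntasks=4                # number of tasks to allocate
#SBATCH --time=00:10:00           # wall-clock limit (HH:MM:SS)
#SBATCH --output=myjob.%j.out     # stdout/stderr file; %j expands to the job ID

srun ./my_program                 # run the program within the allocation
EOF

# Submit the script for queueing and execution:
#   sbatch myjob.sh
# Query the status of your jobs:
#   squeue -u $USER
# Cancel a queued or running job (12345 is a placeholder job ID):
#   scancel 12345
```

For an interactive session instead of a batch script, salloc requests the same kinds of resources (e.g. `salloc --ntasks=4 --time=00:10:00`) and drops you into a shell once the allocation is granted.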