Slurm Best Practices on Discover

The following approaches allow Slurm's advanced scheduling algorithm the greatest flexibility to schedule your job to run as soon as possible.

Learn how to request Cascade Lake, Skylake, or Milan nodes for your Slurm job.
Inline directives (#SBATCH) should be included at the beginning of your job script.
See “man sbatch” for the corresponding command line options.

Feel free to experiment with these, or contact support@nccs.nasa.gov, and we'll be happy to customize our recommendations for your specific use case.

DO NOT

The guiding principle here is to specify only what's necessary, to give yourself the best and earliest chance of being scheduled.

  • Do not specify any partition, unless you are trying to access specialized hardware, such as datamove or co-processor nodes. Since the default partition may need to change over time, eliminating such specifications will minimize required changes in your job scripts in the future.

DO

The guiding principle here is to specify complete, accurate, and flexible resource requirements:

  • Do specify any and all processor architectures that your job can use (e.g. "[sky|cas]", if your job can run on Skylake or Cascade Lake nodes; other related examples below). NCCS's Slurm configuration ensures that each job will only run on one type of processor architecture.

Time Limit

Specify both a preferred maximum time limit and, if your workflow performs self-checkpointing, a minimum time limit as well. In this example, if you know that your job will save its intermediate results within the first 4 hours, these specifications will cause Slurm to schedule your job in the earliest available time window of 4 hours or longer, up to 12 hours:

#SBATCH --time=12:00:00
#SBATCH --time-min=04:00:00
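
For reference, a complete job script using these settings might look like the following minimal sketch. The job name, task count, and the self-checkpointing application ./my_model are hypothetical placeholders:

#!/bin/bash
#SBATCH --job-name=ckpt-example
#SBATCH --ntasks=40
#SBATCH --time=12:00:00
#SBATCH --time-min=04:00:00

# Hypothetical self-checkpointing application: it saves restart files as it runs
# and resumes from the latest one, so any window of 4 to 12 hours is usable.
srun ./my_model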

Alternatively, specify as low a time limit as will realistically allow your job to complete; this will enhance your job's opportunity to be backfilled:

#SBATCH --time=30:00

Memory Limits

Specify memory requirements explicitly, either as memory per node, or as memory per CPU:

#SBATCH --mem=12G

or

#SBATCH --mem-per-cpu=3G

The following combination of options will let Slurm run your job on any set of nodes that together have at least 256 cores and at least 512G of total memory (256 tasks times 2G per CPU):

#SBATCH --mem-per-cpu=2G
#SBATCH --ntasks=256

Node Requirements

Specify a range of acceptable node counts. This example tells the scheduler that the job can use anywhere from 128 to 256 nodes:

#SBATCH --nodes=128-256

NOTE: Your job script must then launch the appropriate number of tasks, based on how many nodes you are actually allocated.
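
For example, a job script can read the size of the actual allocation at run time and scale its launch accordingly. The following is a minimal sketch; the value of 16 tasks per node and the application name ./mpi_app are hypothetical:

# SLURM_JOB_NUM_NODES holds the number of nodes actually allocated (128-256 here).
TASKS_PER_NODE=16
NTASKS=$(( SLURM_JOB_NUM_NODES * TASKS_PER_NODE ))
srun --ntasks="$NTASKS" ./mpi_app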

Specify the minimum number of CPUs per node that your application requires. With this example, your job will run on any node with 16 or more cores available:

#SBATCH --mincpus=16

To flexibly request large-memory nodes, you could specify a node range, the number of tasks per node, and the total memory needed per node. For example, for an application that can run on anywhere from 20 to 24 nodes, needs 8 cores per node, and uses 2G per core, you could specify the following:

#SBATCH --nodes=20-24
#SBATCH --ntasks-per-node=8
#SBATCH --mem=16G

In the above, the total task count is determined by how many nodes you are actually allocated (8 tasks per node). So your application will need to be able to run on 160, 168, 176, 184, or 192 cores, and your job script will need to launch the appropriate number of tasks based on the actual allocation, as in the sketch below.
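
As a sketch of the launch step under the assumptions above (the application name ./sim_app is a placeholder), the task count can again be computed from the actual allocation:

# 8 tasks per allocated node: 20-24 nodes yields 160-192 tasks in total.
srun --ntasks=$(( SLURM_JOB_NUM_NODES * 8 )) ./sim_app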

 

Requesting Cascade Lake Nodes

Use the proper directives. From the command line: $ sbatch --constraint=cas jobscript

Inline directives: #SBATCH --constraint=cas

It is always a good practice to ask for resources in terms of cores or tasks, rather than number of nodes. For example, 10 Cascade Lake nodes could run 460 tasks on 460 cores.

The wrong way to ask for resources: #SBATCH --nodes=10

The right way to ask for resources: #SBATCH --ntasks=460

If you need more memory per task and therefore use fewer cores per node, use the following (note: the memory value M is in megabytes):

#SBATCH --ntasks-per-node=N
#SBATCH --mem-per-cpu=M
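
As a purely hypothetical illustration, an application that needs roughly 8G per task on Cascade Lake nodes might run 12 tasks per node and request its memory per CPU in megabytes:

#SBATCH --constraint=cas
#SBATCH --ntasks-per-node=12
#SBATCH --mem-per-cpu=8000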

If you know the code can run on a Cascade Lake or Skylake node: #SBATCH --constraint="[cas|sky]"

If you know the code can run on a Cascade Lake, Skylake, or Milan node: #SBATCH --constraint="[cas|sky|mil]"

PLEASE NOTE: Square brackets and double quotes are required when specifying multiple processor types to ensure that the constraints are evaluated properly.
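
The same multi-architecture constraint can also be given on the command line; quote it so the shell does not interpret the brackets or pipe characters: $ sbatch --constraint="[cas|sky|mil]" jobscript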



Requesting Skylake Nodes

Use the proper directives. From the command line: $ sbatch --constraint=sky jobscript

Inline directives: #SBATCH --constraint=sky

Requesting Milan Nodes

Use the proper directives. From the command line: $ sbatch --constraint=mil jobscript

Inline directives: #SBATCH --constraint=mil


Specifying multiple node types along with other constraints

If you know your code can run on a Cascade Lake or Skylake node, and you also need read-only access to CSS, specify: #SBATCH --constraint="[cas|sky]&cssro"


Debugging and testing jobs

By default, when a compute node fails, Slurm re-queues the job that was running on that node, on the assumption that the node suffered a hardware failure. However, if the node actually crashed because of a problem with the user job, re-queueing sends the job to other nodes, where it causes the same failures again. This can be extremely disruptive, as multiple nodes are often made unavailable until the offending job is identified and canceled by the system administrators.

It is highly recommended, especially when debugging and testing new codes or jobs, that users specify the no-requeue option. Include the following directive in your batch script to prevent Slurm from re-queueing your job during testing and debugging:

#SBATCH --no-requeue
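
Putting this together, a test run of a new code might use a script like the following minimal sketch; the job name, resource values, and application name ./new_code are placeholders:

#!/bin/bash
#SBATCH --job-name=debug-test
#SBATCH --no-requeue
#SBATCH --time=30:00
#SBATCH --ntasks=16
#SBATCH --constraint="[cas|sky|mil]"

srun ./new_code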