Slurm Best Practices on Discover
The following approaches allow Slurm's advanced scheduling algorithm the greatest flexibility to schedule your job to run as soon as possible.
Learn how to request Cascade Lake, Skylake, or Milan nodes to run your Slurm job.
Inline directives (#SBATCH) should be included at the beginning of your job script.
See “man sbatch” for the corresponding command line options.
Feel free to experiment with these, or contact support@nccs.nasa.gov, and we'll be happy to customize our recommendations for your specific use case.
DO NOT
The guiding principle here is to specify only what's necessary, to give yourself the best and earliest chance of being scheduled.
- Do not specify any partition, unless you are trying to access specialized hardware, such as datamove or co-processor nodes. Since the default partition may need to change over time, eliminating such specifications will minimize required changes in your job scripts in the future.
DO
The guiding principle here is to specify complete, accurate, and flexible resource requirements:
- Do specify any and all processor architectures that your job can use (e.g. "[sky|cas]", if your job can run on Skylake or Cascade Lake nodes; other related examples below). NCCS's Slurm configuration ensures that each job will only run on one type of processor architecture.
Time Limit
If your workflow performs self-checkpointing, specify both a preferred maximum time limit and a minimum time limit. In this example, if you know that your job will save its intermediate results within the first 4 hours, these specifications will cause Slurm to schedule your job in the earliest available time window of 4 hours or longer, up to 12 hours:
#SBATCH --time=12:00:00
#SBATCH --time-min=04:00:00
Alternatively, specify as low a time limit as will realistically allow your job to complete; this will enhance your job's opportunity to be backfilled:
#SBATCH --time=30:00
Memory Limits
Specify memory requirements explicitly, either as memory per node, or as memory per CPU:
#SBATCH --mem=12G
or
#SBATCH --mem-per-cpu=3G
The following combination of options will let Slurm run your job on any set of nodes that together provide at least 256 cores and at least 512G of total memory (256 tasks x 2G per CPU):
#SBATCH --mem-per-cpu=2G
#SBATCH --ntasks=256
Node Requirements
Specify a range of acceptable node counts. This example tells the scheduler that the job can use anywhere from 128 to 256 nodes. (NOTE: Your job script must then launch the appropriate number of tasks, based on how many nodes you are actually allocated.)
#SBATCH --nodes=128-256
Specify the minimum number of CPUs per node that your application requires. With this example, your application will run on any available node with 16 or more available cores:
#SBATCH --mincpus=16
To flexibly request large memory nodes, you could specify a node range, maximum number of tasks (if you receive the maximum node count you request), and total memory needed per node. For example, for an application that can run on anywhere from 20-24 nodes, needs 8 cores per node, and uses 2G per core, you could specify the following:
#SBATCH --nodes=20-24
#SBATCH --ntasks-per-node=8
#SBATCH --mem=16G
In the above, --ntasks-per-node fixes the task count on each allocated node, so the total task count depends on how many nodes you actually receive. Your application will therefore need to be able to run on 160, 168, 176, 184, or 192 cores (20-24 nodes x 8 tasks per node), and will need to launch the appropriate number of tasks based on how many nodes you are actually allocated.
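One way to launch the appropriate number of tasks is to compute the total at run time from the environment variables Slurm sets inside the job. A minimal sketch (the application name ./myapp is a placeholder, and the fallback node count of 20 is for illustration only):

```shell
# SLURM_JOB_NUM_NODES is set by Slurm inside the job to the number of
# nodes actually allocated (anywhere from 20 to 24 with --nodes=20-24).
NODES="${SLURM_JOB_NUM_NODES:-20}"
TASKS_PER_NODE=8
TOTAL_TASKS=$(( NODES * TASKS_PER_NODE ))
echo "launching $TOTAL_TASKS tasks on $NODES nodes"

# Then launch with the computed count (./myapp is a placeholder executable):
# srun --ntasks="$TOTAL_TASKS" ./myapp
```

This way the same job script works correctly whether Slurm grants you 20, 22, or 24 nodes.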
Requesting Cascade Lake Nodes
Use the proper directives. From the command line:
$ sbatch --constraint=cas jobscript
Inline directives:
#SBATCH --constraint=cas
It is always good practice to request resources in terms of cores or tasks rather than number of nodes. For example, 10 Cascade Lake nodes could run 460 tasks on 460 cores.
The wrong way to ask for the resources:
#SBATCH --nodes=10
The right way to ask for resources:
#SBATCH --ntasks=460
If you need more memory per task and must therefore use fewer cores per node, use the following (note: the memory value M below is in megabytes):
#SBATCH --ntasks-per-node=N
#SBATCH --mem-per-cpu=M
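For illustration only (these numbers are hypothetical, not Discover defaults): if each task needed 8000 MB, you might run 12 tasks per node and give each task a larger share of the node's memory:

```shell
# Hypothetical values: 12 tasks per node at 8000 MB per task
# (12 x 8000 MB = 96000 MB requested per node).
#SBATCH --ntasks-per-node=12
#SBATCH --mem-per-cpu=8000
```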
If you know the code can run on a Cascade Lake or Skylake node:
#SBATCH --constraint="[cas|sky]"
If you know the code can run on a Cascade Lake or Skylake or Milan node:
#SBATCH --constraint="[cas|sky|mil]"
PLEASE NOTE: Square brackets and double quotes are required when specifying multiple processor types to ensure that the constraints are evaluated properly.
Requesting Skylake Nodes
Use the proper directives. From the command line:
$ sbatch --constraint=sky jobscript
Inline directives:
#SBATCH --constraint=sky
Requesting Milan Nodes
Use the proper directives. From the command line:
$ sbatch --constraint=mil jobscript
Inline directives:
#SBATCH --constraint=mil
Specifying multiple node types along with other constraints
If you know your code can run on a Cascade Lake or Skylake node, and you also need read-only access to CSS, specify:
#SBATCH --constraint="[cas|sky]&cssro"
Debugging and testing jobs
By default, when a compute node fails, Slurm re-queues the job that was running on it, on the assumption that the node experienced a hardware failure. However, if the node crashed because of a problem with the user job, re-queueing leads to a cascade of node failures as the job lands on different nodes and triggers the same crash. This can be extremely disruptive: multiple nodes are often made unavailable until the system administrators identify and cancel the offending job.
It is highly recommended, especially when debugging and testing new codes or jobs, that users specify the no-requeue option to prevent Slurm from re-queueing the job. Include the following directive in your batch script:
#SBATCH --no-requeue
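Putting the recommendations above together, here is a minimal sketch of a batch script; the executable name, task and memory numbers, and time limits are placeholders you must adapt to your own workload:

```shell
#!/usr/bin/env bash
# Hypothetical job script illustrating the guidance above; all values are
# placeholders. Note that no partition is specified, per the DO NOT section.
#SBATCH --constraint="[cas|sky|mil]"   # any processor type the code supports
#SBATCH --nodes=20-24                  # flexible node range
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=2G
#SBATCH --time=12:00:00
#SBATCH --time-min=04:00:00            # job self-checkpoints within 4 hours
#SBATCH --no-requeue                   # while testing and debugging

# Scale the launch to the allocation Slurm actually granted.
TOTAL_TASKS=$(( SLURM_JOB_NUM_NODES * 8 ))
srun --ntasks="$TOTAL_TASKS" ./myapp   # ./myapp is a placeholder executable
```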