Using the Discover Cluster
NOTICE: Discover Scalable Unit 17 (SCU17) Available for NCCS Users
The Discover Cluster Environment
The Discover cluster is the main compute cluster for batch jobs that require significant compute resources. It consists of several scalable compute units (SCUs) that offer a variety of processor types, with nodes dedicated to batch computing and to interactive data analysis.
Operating System
SuSE Linux Enterprise Server
Using Discover requires basic Linux skills. Before using Discover, make sure you are comfortable with the concepts and commands explained in these resources:
- Introduction to Linux Tutorial
- More Advanced Linux Tutorial
- SATERN Linux shell-scripting course code: os_doss_a01_it_enus
- Linux Handbook on file permissions and ownership
For fine-grained control of file and directory permissions, you might also need to understand Access Control Lists (ACLs) and consult the Discover systems team about your requirements.
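As a quick refresher on the permission concepts these resources cover, here is a minimal sketch that sets and inspects standard permission bits on a throwaway directory. The file name and the user `jdoe` are hypothetical, and the ACL commands are left commented out because ACL support depends on the file system and on what the Discover systems team has agreed for your project.

```shell
# Create a throwaway directory so nothing real is modified.
demo_dir=$(mktemp -d)
touch "$demo_dir/results.txt"

# Owner: read/write; group: read-only; others: no access.
chmod 640 "$demo_dir/results.txt"

# Owner: full access; group: list and enter; others: no access.
chmod 750 "$demo_dir"

# Show the resulting modes in octal (GNU stat, as found on Linux).
file_mode=$(stat -c '%a' "$demo_dir/results.txt")
dir_mode=$(stat -c '%a' "$demo_dir")
echo "file mode: $file_mode, directory mode: $dir_mode"

# Per-user access through ACLs looks like this (hypothetical user "jdoe").
# Commented out because ACL support depends on the file system:
#   setfacl -m u:jdoe:rx "$demo_dir"
#   getfacl "$demo_dir"
```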
Processor Architectures
Architecture | CPUs/GPUs | Memory per CPU/GPU | Memory per Node |
---|---|---|---|
Skylake | 40 CPUs (36 usable for multi-node jobs) | 4 GB / CPU | 192 GB |
Cascade Lake | 48 CPUs (46 usable) | 4 GB / CPU | 190 GB |
AMD Rome | 48 CPUs + 4 GPUs | 122 GB / GPU | 498 GB |
Milan | 128 CPUs (126 usable) | 4 GB / CPU | 512 GB |
Learn how to submit a Slurm job to Cascade Lake, Milan, and Skylake nodes.
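The usable CPU counts in the table determine how many nodes a job of a given size occupies. The sketch below works through that arithmetic, assuming one task per usable CPU. The function name `nodes_needed` and the example task count of 500 are illustrative, not part of any NCCS tool.

```shell
# Ceiling division: smallest node count whose usable CPUs cover all tasks.
nodes_needed() {
    tasks=$1
    usable_per_node=$2
    echo $(( (tasks + usable_per_node - 1) / usable_per_node ))
}

# Usable CPUs per node are taken from the table above.
echo "Skylake:      $(nodes_needed 500 36) nodes"
echo "Cascade Lake: $(nodes_needed 500 46) nodes"
echo "Milan:        $(nodes_needed 500 126) nodes"
```

For 500 tasks this reports 14 Skylake nodes, 11 Cascade Lake nodes, or 4 Milan nodes.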
Shells
Bash is the default shell for all users on Discover. To switch to a different default shell, contact NCCS Support.
The following shells are available on the Discover cluster. To verify your current setting, check the $SHELL environment variable.
- bash
- csh
- tcsh
- sh or ksh
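A minimal sketch of checking the $SHELL variable mentioned above. Note that $SHELL normally holds the login shell recorded for your account, which is not necessarily the shell you are typing in if you started another one by hand. The `unknown` fallback is only there so the snippet behaves sensibly where the variable is unset.

```shell
# Full path of the login shell recorded for your account, e.g. /bin/bash.
echo "Login shell: ${SHELL:-unknown}"

# Just the shell name, without the directory part.
login_shell_name=$(basename "${SHELL:-unknown}")
echo "Shell name: $login_shell_name"
```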
Log-in Information
Here you can find guidance about logging in to NCCS systems, passwords and tokens, and passwordless SSH/SCP.
File System + Storage
The NCCS provides several different types of file systems, including Home, Nobackup, Scratch, and Archive on the Discover cluster.
Compilation + Software
To accommodate the needs of a broad range of user groups, multiple versions of compilers from different vendors are provided on the Discover cluster.
File Transfers
Describes how to perform a secure file transfer between the Discover cluster and other systems.
JupyterHub
JupyterHub is a web-based portal that gives users Python, Octave, or interactive shell access to Discover for visualization, data processing and analysis, and general interaction with the cluster.
Running Jobs using Slurm
Slurm is the job scheduler and resource manager that organizes scientific computing jobs on the Discover supercomputer. This section explains how to submit, monitor, and kill jobs, among other tasks.
Learn how to take full advantage of Slurm's scheduling algorithm so that your job is scheduled as soon as possible by specifying time limits and node requirements.
- Slurm Best Practices
- Discover QoS
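To make the idea of specifying time limits and node requirements concrete, here is a minimal sketch of a Slurm batch script. The directives shown are standard Slurm options, but the job name, the values, and the program `my_program` are hypothetical. Site-specific settings such as constraints or QoS names are deliberately left out because they are not listed here. The body uses fallbacks so the script also runs harmlessly outside Slurm.

```shell
#!/bin/bash
# Hypothetical job name, shown by squeue.
#SBATCH --job-name=example_job
# Node requirement: two nodes with 36 tasks each
# (36 matches the usable CPUs on a Skylake node).
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
# Wall-clock limit (HH:MM:SS); a tight, realistic limit helps scheduling.
#SBATCH --time=00:30:00
# Output file; %j expands to the job ID.
#SBATCH --output=example_job.%j.out

# Outside Slurm these variables are unset, so fall back to placeholders.
job_id=${SLURM_JOB_ID:-none}
task_count=${SLURM_NTASKS:-1}
echo "Job $job_id running $task_count task(s) on $(uname -n)"

# Replace the echo above with the real workload, e.g. (hypothetical program):
#   srun ./my_program
#
# Typical commands around a script saved as example_job.sh:
#   sbatch example_job.sh     # submit
#   squeue -u "$USER"         # monitor your jobs
#   scancel <jobid>           # kill a job
```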
Some users may want to run a large set of sequential jobs, for example, post-processing or data archiving jobs, on Discover. Portable Distributed Scripts (PoDS) is a set of scripts created by SSSO that enables users to execute a series of independent, sequential jobs concurrently on Discover's multi-core nodes.
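The sketch below illustrates the general idea of running independent, sequential tasks concurrently on one multi-core node. It is not PoDS and does not use the PoDS interface. It swaps in the standard `xargs -P` option instead, and the task list, the concurrency limit of 2, and the output files are all hypothetical placeholders.

```shell
# Scratch directory for the demo output.
out_dir=$(mktemp -d)

# Four independent task IDs, one per line. Each one is handed to its own
# "sh -c" process, and at most 2 run at the same time (-P 2).
# Inside the command, $1 is the output directory and $2 is the task ID.
printf '%s\n' 1 2 3 4 |
    xargs -P 2 -n 1 sh -c 'echo "task $2 done" > "$1/$2.out"' sh "$out_dir"

# xargs returns only after every task has finished.
completed=$(ls "$out_dir" | wc -l | tr -d ' ')
echo "Completed tasks: $completed"
```

On a compute node, the `-P` value would normally be chosen to match the number of usable CPUs allocated to the job.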
Monitoring + Optimization
Use the following techniques and tools to analyze and optimize your programming workflows:
- Memory Tools
- Performance Tools
- Storage Tools
- Debugging Tools
GPU Partition
Documentation on accessing and using Discover GPUs
Miscellaneous
The following page covers a variety of topics that help users tailor their Discover cluster environment to their requirements:
- Using Modules to load appropriate compilers, libraries and other software.
- Cron on the Discover cluster allows users to automatically run tasks at a specified time.
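As a rough illustration of both topics, the sketch below shows common `module` subcommands and the shape of a crontab entry. The script path `$HOME/bin/nightly_task.sh` is hypothetical. No specific module name is given because available names vary by system, and the `module` calls are guarded so the snippet stays harmless on machines without a modules system.

```shell
# The "module" command exists only where Environment Modules or Lmod is
# installed, so check for it before calling it.
if command -v module >/dev/null 2>&1; then
    module avail              # list software that can be loaded
    module list               # show what is loaded right now
    # module load <name>      # load a package; pick <name> from "module avail"
    modules_present=yes
else
    echo "module command not found on this system"
    modules_present=no
fi

# A crontab entry has five time fields (minute, hour, day of month, month,
# day of week) followed by the command. This one runs a hypothetical script
# at 02:30 every day.
cron_entry='30 2 * * * $HOME/bin/nightly_task.sh'
echo "$cron_entry"

# To install it, run "crontab -e" and paste the line; "crontab -l" lists
# your current entries.
```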