Using the Discover Cluster

NOTICE: Discover Scalable Unit 16 (SCU16) Available for NCCS Users

NOTICE: The NCCS has begun deploying SLES12 on the Discover cluster to replace SLES11.
See our SLES12 Frequently Asked Questions page if you have any questions about the new environment.

The Discover Cluster Environment

The Discover cluster is the main compute cluster for processing batch jobs that require significant compute resources. It consists of several scalable compute units (SCUs) that offer a variety of processor types. Its nodes are dedicated to batch computing and to interactive data analysis.

Operating System

SuSE Linux Enterprise Server

Processor Architectures

Architecture    CPUs/GPUs                                             Memory per CPU/GPU    Memory per Node
Haswell         28 CPUs                                               4.5 GB / CPU          126 GB
Skylake         40 CPUs (36 are usable for multi-node jobs)           4 GB / CPU            192 GB
Cascade Lake    48 CPUs (46 usable)                                   4.0 GB / CPU          190 GB
AMD Rome        48 CPUs + 4 GPUs                                      123 GB / GPU          498 GB
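
The usable core counts in the table determine how many nodes a multi-node job needs. The following is a minimal sketch of that arithmetic, not an NCCS tool. The function name `nodes_needed` and the example task count are hypothetical. For Haswell, which lists no separate "usable" figure, the full core count is assumed.

```python
# Usable CPUs per node for multi-node jobs, taken from the table above.
# Assumption: Haswell has no separate "usable" figure, so all 28 cores count.
USABLE_CPUS_PER_NODE = {
    "Haswell": 28,
    "Skylake": 36,
    "Cascade Lake": 46,
}


def nodes_needed(architecture: str, tasks: int, cpus_per_task: int = 1) -> int:
    """Return the smallest node count that can hold `tasks` tasks."""
    usable = USABLE_CPUS_PER_NODE[architecture]
    if cpus_per_task > usable:
        raise ValueError("a single task does not fit on one node")
    tasks_per_node = usable // cpus_per_task
    # Integer ceiling division: round up to a whole number of nodes.
    return (tasks + tasks_per_node - 1) // tasks_per_node


# Hypothetical job with 360 single-CPU tasks.
for arch in USABLE_CPUS_PER_NODE:
    print(arch, nodes_needed(arch, 360))
```

For the same 360 tasks, the node count differs by architecture, so the node request should be chosen together with the processor type.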

Learn how to submit a Slurm job to Haswell, Skylake, and Cascade Lake nodes.

Shells

Bash is the default shell for all users on Discover. To switch to a different default shell, contact NCCS Support.

The following shells are available on the Discover cluster. To check which shell your environment uses, inspect the $SHELL environment variable.

  • bash
  • csh
  • tcsh
  • sh or ksh

Log-in Information

Here you can find guidance on logging in to NCCS systems, passwords and tokens, and passwordless SSH/SCP.

LEARN MORE

File System + Storage

The NCCS provides several different types of file systems, including Home, Nobackup, Scratch, and Archive on the Discover cluster.

LEARN MORE

Compilation + Software

To accommodate the needs of a broad range of user groups, multiple versions of compilers from different vendors are provided on the Discover cluster.

LEARN MORE

File Transfers

This section describes how to securely transfer files between the Discover cluster and other systems.

LEARN MORE

Running Jobs using Slurm

Slurm is the job scheduler and resource manager that organizes scientific computing jobs on the Discover supercomputer. This section explains how to submit, monitor, and cancel jobs, among other tasks.

LEARN MORE

Learn how to work with Slurm's scheduling algorithm so that your job starts as soon as possible, by specifying time limits and node requirements.

SLURM BEST PRACTICES

DISCOVER QOS
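
As an illustration of stating time limits and node requirements explicitly, the sketch below assembles the text of a Slurm batch script. It uses the standard sbatch options `--job-name`, `--nodes`, `--time`, and `--constraint`. The helper `make_batch_script`, the job name, and the program `./my_program` are hypothetical, and the constraint value is left unset because valid feature names are site-specific. The sketch only builds the script text; submitting it (for example with `sbatch`) is left to the user.

```python
def make_batch_script(job_name, nodes, time_limit, command, constraint=None):
    """Build the text of a Slurm batch script with explicit resource limits."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        # Wall-clock limit in HH:MM:SS.
        f"#SBATCH --time={time_limit}",
    ]
    if constraint:
        # Node feature request; valid names depend on the site configuration.
        lines.append(f"#SBATCH --constraint={constraint}")
    lines.append("")
    lines.append(command)
    return "\n".join(lines) + "\n"


# Hypothetical job: two nodes for at most one hour.
script = make_batch_script(
    job_name="postproc",
    nodes=2,
    time_limit="01:00:00",
    command="srun ./my_program",
)
print(script)
```

A request that matches the job's real needs gives the scheduler more freedom to place the job than a request padded with a generous default.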

Some users may want to run a large set of sequential jobs, for example, post-processing or data archiving jobs, on Discover. Portable Distributed Scripts (PoDS) is a set of scripts created by SSSO that enables users to execute a series of independent, sequential jobs concurrently on Discover's multi-core nodes.

PoDS
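
The general idea of running independent, sequential jobs concurrently on a multi-core node can be sketched with the Python standard library. This is not PoDS and does not reflect how PoDS is implemented or invoked. The helper names and the stand-in commands are hypothetical, and the sketch covers a single node only.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(command):
    """Run one independent command and return its exit code."""
    return subprocess.run(command).returncode


def run_all(commands, max_workers):
    """Run the commands concurrently, at most `max_workers` at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input commands.
        return list(pool.map(run_command, commands))


# Hypothetical stand-ins for post-processing or archiving jobs.
commands = [
    [sys.executable, "-c", f"print('task {i} done')"] for i in range(4)
]
exit_codes = run_all(commands, max_workers=2)
print(exit_codes)
```

Setting `max_workers` to the number of cores allocated to the job keeps each core busy without oversubscribing the node.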

Monitoring + Optimization

Use the following techniques and tools to analyze and optimize your workflows:

  • Memory Tools
  • Performance Tools
  • Storage Tools
  • Debugging Tools

LEARN MORE

GPU Partition

Documentation on accessing and using Discover GPUs.

LEARN MORE

Miscellaneous

The following page covers a variety of topics that help users adapt their Discover cluster environment to their requirements:

  • Using Modules to load appropriate compilers, libraries and other software.
  • Cron on the Discover cluster allows users to automatically run tasks at a specified time.

LEARN MORE