// Using Prism
Prism is equipped with several powerful servers built specifically for accelerating AI/ML/DL workloads with GPUs. This cutting-edge platform is easy to access, and the preinstalled software and libraries provide foundational tools that enable scientists to get the most out of their workflows.
|System|Sockets|Cores per socket|Total Cores|Memory (GB)|NVMe Storage (TB)|GPUs (NVIDIA V100)|
|---|---|---|---|---|---|---|
The systems come with the following software and libraries pre-installed. However, users will have to use Anaconda environments to access Python machine-learning packages.
- Operating System: CentOS 7
- CUDA 11.2
- Anaconda 3
To gain access to the Prism GPU Cluster, please contact NCCS User Support and request access to Prism on ADAPT. You may connect by logging in to adaptlogin.nccs.nasa.gov, then ssh to gpulogin1. Once you are connected to the login node, you will need to use SLURM in order to access the Prism GPU resources.
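For example, the login sequence looks like the following (`<username>` is a placeholder for your NCCS username):

```shell
# Log in to the ADAPT login node, then hop to the Prism GPU login node.
ssh <username>@adaptlogin.nccs.nasa.gov
ssh gpulogin1

# Once on gpulogin1, confirm that SLURM can see the GPU nodes
# before submitting any work.
sinfo
```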
For more information on SLURM, see the 'SLURM' section below.
For more information on access and login, see NCCS Account Setup.
To see gpu0 utilization on every node, go to: Prism Ganglia
All nodes have four GPUs; to see the others, select "gpuX_util" for GPUs 1, 2, and 3. Once the DGX is added, it will provide GPUs 4-7 as well.
See here for information on accessing and using the current JupyterHubs on ADAPT.
Anaconda environments have been used to install the Python machine learning frameworks. These environments can be accessed by loading the anaconda module with:
- Prism GPU Cluster: 'module load anaconda'
Once the module is loaded, run 'conda activate <ENV>' to activate the environment of your choice.
Users can inspect the complete list of packages and versions installed within an environment by running:
$ conda list
Users can also inspect other available environments by running:
$ conda env list
Users may also create Anaconda environments in their home directory. This will allow users to maintain the environment on their own.
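Putting the steps above together, a typical session might look like the following (the environment name, Python version, and package are illustrative placeholders, not specific to Prism):

```shell
# Load the Anaconda module on the Prism GPU cluster.
module load anaconda

# List the environments that are available.
conda env list

# Activate one of them (substitute a real environment name).
conda activate <ENV>

# Inspect the packages and versions installed in the active environment.
conda list

# Optionally, create a personal environment in your home directory
# that you can maintain yourself.
conda create --name my-ml-env python=3.9 scikit-learn
```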
Users are encouraged to load both the Anaconda and NVIDIA modules on Prism using 'module load nvidia' and 'module load anaconda'; use 'module spider' to see more options. For more information on how to load modules, see the Tips & Info for New NCCS ADAPT Users Tech Talk slides under "Modules".
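For instance, the module commands above can be used as follows:

```shell
# Load the NVIDIA (CUDA) and Anaconda modules in one step.
module load nvidia anaconda

# Confirm what is currently loaded.
module list

# Search the module tree for other available modules and versions.
module spider
```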
SLURM allows for more efficient resource allocation, fairer sharing, and easier management of the resources on the GPU nodes. There are three main ways to interface with SLURM on the GPU nodes:
- ‘sbatch’: Submit a batch script to SLURM. Create a job script that can be submitted to the queue and call multiple tasks from within it.
- ‘srun’: Specify resources for running a single command or execute a job step.
- ‘salloc’: Run interactively on allocated resources, or run a step by step job.
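As an illustration, a minimal batch script might look like the following (the job name, GPU count, and time limit are arbitrary examples):

```shell
#!/bin/bash
#SBATCH -J gpu-test        # job name
#SBATCH -G 1               # request one GPU
#SBATCH -t 00:10:00        # ten-minute time limit

# Report which GPU was allocated to the job.
nvidia-smi
```

Submit the script with 'sbatch script.sh', run the same command directly with 'srun -G 1 -t 00:10:00 nvidia-smi', or use 'salloc -G 1' to open an interactive session on the allocated resources.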
All three of these mechanisms share most, but not all, of the standard SLURM configuration flags. Some of the most useful are as follows:
|Flag|Description|
|---|---|
|-G <NUM>|Specifies the total number of GPUs to be allocated to the job.|
|-t <TIME>|Sets a time limit for your jobs and allocations. Acceptable formats include '-t <MINUTES>', '-t HH:MM:SS', and '-t D-HH:MM:SS'.|
|--nodelist=<NODES>|Specifies the nodes on which you would like your jobs to run. By default the pool includes all available nodes, but you can restrict your work to one node (or several, separated by commas). This is not recommended, though, as you may end up waiting in a queue for a certain system when other resources are already available.|
|-n <NUM_TASKS>|Specifies the number of tasks to run. In sbatch this is the maximum number of tasks to be run at any given time, which allows adequate resources to be allocated upon job submission.|
|-c <CPUS>|Specifies the number of processors to be allocated to each task.|
|-N <NUM_NODES>|Specifies the number of nodes to run on.|
|-J <JOB_NAME>|Allows you to name your job.|
|--mem=<MEM>|Specifies the minimum required amount of memory allocated per node.|
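For example, several of these flags can be combined in a single srun invocation (all resource values here are illustrative):

```shell
# Request 2 GPUs, 4 tasks with 2 CPUs each, one node, a 1-hour limit,
# and a named job, then report the allocated GPUs.
srun -J example-job -N 1 -n 4 -c 2 -G 2 -t 01:00:00 nvidia-smi
```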