NCCS Primer: A User's Guide

MATLAB user? Please see this important notice!
PBS Pro has been replaced by SLURM. Please consult your guide to SLURM.

The NASA Center for Climate Simulation (NCCS) provides compute nodes for batch and interactive analysis. Thousands of nodes are available for serial and parallel processing tasks. The facilities are made up of several groups of computers, each of which handles a particular aspect of data-intensive high-performance computing.

In April 2012, some of us met with some of our users for an informal brown-bag talk on topics of concern or interest. We hope it will be the first in a series. This first session covered the tape archive system, the ramifications of different patterns of use, and suggestions for improving everyone's overall experience. The slides from the session are available as a PDF under Your data on tape.

This Primer and Guide is divided into sections that focus on specific details of using the resources. Use the tabs above to access each section.

  • DISCOVER is the main compute cluster for processing batch jobs that require significant compute resources. It is made up of several so-called scalable units that offer a variety of processor types, and it consists of a mix of nodes dedicated to computing, interactive data analysis, and managing the global parallel file system (GPFS).

  • DALI nodes are special login nodes with very large physical memory, and are normally used interactively for large-scale data analysis.

  • DIRAC is the system that manages the archive. You log in to a Dirac node to manage the migration of your files in the archive.

These systems are connected by shared file systems and by network communications. It is vital to become familiar with some of these details, because overall performance and your own experience depend on them.

Software is arbitrarily divided into the following categories: Visualization and Analytics, Compilers, and Libraries. Support is provided for various flavors and versions of MPI, Fortran, C, and C++ from different vendors. The Linux modules package is used to manage users' environment variables to ensure smooth operation.
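As a rough illustration of how the modules package is typically used, the sketch below lists the loaded and available modules. The subcommands `module list` and `module avail` belong to the standard modules package; the function name `show_module_state` is hypothetical, and the script assumes nothing about which modules NCCS actually provides.

```shell
#!/usr/bin/env bash
# Sketch only: show the current module state if the modules package is
# initialized in this shell. show_module_state is a hypothetical name.
show_module_state() {
    if type module >/dev/null 2>&1; then
        module list 2>&1                 # modules currently loaded
        module avail 2>&1 | head -n 20   # first few available modules
    else
        echo "module command not available in this shell"
    fi
}

show_module_state
```

A module is then added to the environment with `module load` followed by a name taken from the `module avail` listing. Note that a `module load` issued inside a script changes only that script's environment, not the shell it was launched from.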

The next section contains important announcements about issues affecting all Discover users, and is likely to change often. It is a good idea to revisit this page regularly.

Quick Start

You can log in to a Discover or Dali node using the procedures described on the System Login page. You can now also request a Dali node with a GPU.

$ ssh dali
$ ssh dali-gpu

The first command connects you to a Dali node that may or may not have a GPU. The second command connects you only to a Dali node with a GPU.

Caveats

Archive issues

Occasionally, when there are system issues on the archive cluster machines or on the Discover or Dali nodes, file writes to the archive fail, yet ls(1) output for the archive file shows the full file size even though the file contains no data. The archive administrators usually detect this. NCCS Support then emails the file owner to explain that, despite its correct-looking size, the archive file was found to contain no data blocks and is therefore corrupt. The owner is asked to replace the file if it is needed, or to delete it if it is not.
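Because a corrupt archive file can report the expected size, comparing sizes alone will not reveal the problem. One possible user-side check, offered here as a minimal sketch and not as an NCCS-provided tool, is to compare checksums of the source file and the archived copy. The helper name `verify_copy` and the file names are hypothetical, and the script assumes GNU coreutils (`sha256sum`, `truncate`, `mktemp`). Reading an archived copy back may be slow if the file has already been migrated to tape.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not an NCCS tool: succeed only if both files
# have the same SHA-256 digest. Assumes GNU coreutils.
verify_copy() {
    local src="$1" dst="$2" src_sum dst_sum
    src_sum=$(sha256sum < "$src") || return 2   # source unreadable
    dst_sum=$(sha256sum < "$dst") || return 2   # copy unreadable
    [ "$src_sum" = "$dst_sum" ]
}

# Demonstration on temporary files (placeholder names, not archive paths).
workdir=$(mktemp -d)
printf 'model output\n' > "$workdir/source.dat"
cp "$workdir/source.dat" "$workdir/good_copy.dat"
# Simulate the failure: same size as the source, but no real data.
truncate -s "$(wc -c < "$workdir/source.dat")" "$workdir/empty_copy.dat"

for copy in good_copy.dat empty_copy.dat; do
    if verify_copy "$workdir/source.dat" "$workdir/$copy"; then
        echo "$copy: checksums match"
    else
        echo "$copy: checksum mismatch, do not trust this copy"
    fi
done
rm -rf "$workdir"
```

In real use the two arguments would be the original file and its copy in the archive; a mismatch is a signal to rewrite the file before deleting the original.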

Future Plans

This Primer is meant to be a quick introduction to the facilities for new users. Like all web pages, it is a living document that will grow as systems, configurations, and technology change, and as we learn better ways of doing things. You can help us by pointing out omissions and errors, and by letting us know which topics you would like to see included in future versions of this Primer.

Your suggestions are always welcome. Please send email to NCCS Support: support@nccs.nasa.gov, and please put 'Primer suggestions' in the subject field.
