NASA Center for Computational Sciences

Latest Changes on Discover


On Thursday, July 10, 2008, the NCCS will be upgrading the operating system on the Discover cluster from SLES-9 to SLES-10.

While the two operating systems are binary compatible and recompiling is not strictly required, users are strongly encouraged to recompile their applications so that they are built against the latest versions of the system libraries.

Below are a few things that users should be aware of regarding the upgrade to SLES-10:

  • Scali MPI has been upgraded from version 5.3 to version 5.6. The module for Scali MPI has been renamed from "scali-5.3" to "scali-5". Users will need to change their "module load" commands to load "mpi/scali-5" instead of "mpi/scali-5.3".

  • "ssh totalview" and "ssh idl" will no longer be supported. Users who have used this method to establish X-forwarding for PBS jobs should use "xsub" instead. "xsub" accepts all the same arguments as "qsub" and establishes the necessary X-forwarding for you.

  • Module changes (Compilers, Math Kernel Libraries, etc.)

  • The following modules that are currently available under SLES-9 will not be available under SLES-10. (Please contact user support if you feel you have a continuing need for any of these items.)

    comp/gcc-3.3.6 comp/intel-8.1.034 comp/intel-8.1.038
    comp/intel-9.1.038 comp/intel-9.1.039 comp/intel-9.1.042
    comp/intel-9.1.046 comp/intel-9.1.049 comp/intel-10.0.023
    comp/intel-10.0.025 comp/intel-10.1.013 comp/intel-10.1.015
    comp/nag-5.1 comp/pgi-6.1.6 comp/pgi-6.2.4
    lib/mkl-10.0.2.018 lib/mkl-8.1 lib/mkl-9.0.017
    lib/mkl-9.1.018 lib/mkl-9.1.021 tool/tview-8.0.0.0
    tool/tview-8.1.0.1

  • The following modules will be available under SLES-10:

    comp/gcc-4.1.2 (natively available without a "module load")
    comp/gcc-4.2.4 comp/gcc-4.3.1 comp/intel-9.1.052
    comp/intel-10.1.017 comp/nag-5.1-463 comp/pgi-7.1.6
    comp/pgi-7.2.1 lib/mkl-9.1.021 lib/mkl-10.0.3.020
    tool/tview-8.4.1.6 mpi/scali-5

  • Additional modules for other software will be listed as "other/". These modules will be made available as the corresponding packages are rebuilt.

  • Process limits

    • Default process limits (data size and stack size) will be set to a maximum safe limit based on the physical memory of the various nodes. Under most circumstances, users should not need to change these settings. Please contact user support if you have questions or concerns about this.

  • Some of the additional software that currently resides in /usr/local is now included as part of SLES-10 and will no longer be maintained in /usr/local.

  • The software packages in /usr/local are being rebuilt under SLES-10 and some may not be available initially. Please report any problems you find or anything that appears to be missing.
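The module changes above mean that existing job scripts may need their "module load" lines updated. The following is a minimal Python sketch of one way to check a script, not an NCCS-provided tool. It applies the "mpi/scali-5.3" to "mpi/scali-5" rename and warns about any comp/, lib/, tool/, or mpi/ module that is not in the SLES-10 list given in this notice. The function name update_script and the warning wording are hypothetical, and "other/" modules are deliberately not checked because the notice does not enumerate them.

```python
import re

# Rename stated in the notice.
RENAMED = {"mpi/scali-5.3": "mpi/scali-5"}

# Modules the notice lists as available under SLES-10.
AVAILABLE = {
    "comp/gcc-4.1.2", "comp/gcc-4.2.4", "comp/gcc-4.3.1",
    "comp/intel-9.1.052", "comp/intel-10.1.017", "comp/nag-5.1-463",
    "comp/pgi-7.1.6", "comp/pgi-7.2.1",
    "lib/mkl-9.1.021", "lib/mkl-10.0.3.020",
    "tool/tview-8.4.1.6", "mpi/scali-5",
}

# Only these module families are checked against AVAILABLE.
CHECKED_PREFIXES = ("comp/", "lib/", "tool/", "mpi/")

# Matches lines of the form: module load <name> [<name> ...]
MODULE_LOAD = re.compile(r"^(\s*module\s+load\s+)(.*)$")


def update_script(text):
    """Return (new_text, warnings) for the contents of a job script.

    Hypothetical helper: renames mpi/scali-5.3 and reports modules
    that are not in the SLES-10 list from the notice.
    """
    out, warnings = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = MODULE_LOAD.match(line)
        if match:
            names = [RENAMED.get(n, n) for n in match.group(2).split()]
            for name in names:
                if name.startswith(CHECKED_PREFIXES) and name not in AVAILABLE:
                    warnings.append(
                        f"line {lineno}: {name} is not in the SLES-10 module list"
                    )
            line = match.group(1) + " ".join(names)
        out.append(line)
    new_text = "\n".join(out)
    if text.endswith("\n"):
        new_text += "\n"  # keep the trailing newline of the original
    return new_text, warnings


# Example job script fragment (illustrative only).
script = "#!/bin/csh\nmodule load comp/intel-9.1.042 mpi/scali-5.3\n./a.out\n"
new_script, notes = update_script(script)
print(new_script)
for note in notes:
    print(note)
```

In this example the Scali module name is rewritten and comp/intel-9.1.042 is reported, since it appears only in the list of modules being retired. The sketch only edits text; it does not run "module" itself, so the result should still be tested on the upgraded system.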

On Wednesday, Feb 20th, the NCCS will be making the following changes to the discover system (downtime notice to follow):

  • Multiple login nodes will be used for interactive access.

    • Users will still connect and request the service the same way they do now. The difference is that they will be placed on one of the login nodes discover05, 06, 07, or 08 (in a round-robin fashion) instead of discover01. More login nodes may be added in the future.

    • Any user scripts that connect from a discover login node to another remote system may fail if the remote system does not allow access from all of the discover login nodes. Please have your system administrators contact the NCCS User Services Group for node address information if required.

  • Cron jobs will be run and managed from a single dedicated cron node so they don't impact interactive processes.

    • Once on discover, users may access and manage their cron jobs by connecting to discover-cron. The new login nodes will deny user cron activity. All existing cron entries will be relocated to discover-cron.

  • System-wide process virtual memory limits will be put in place.

    • To limit the impact of processes that exceed a node's memory resources, we will be setting virtual memory limits globally on discover.

This means that any single process on discover that reaches 6GB of virtual memory will be terminated. We have found that processes that reach 6GB of virtual memory continue growing until they exceed the node's memory resources. This causes the node(s) to hang, and the filesystem daemon is frequently killed. Users may see a runtime library error if their processes exceed the 6GB virtual memory limit.
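To see how the 6GB figure relates to a given job, the following minimal Python sketch reports the address-space limit visible to the current process and checks a planned allocation against the cap. It rests on two assumptions that this notice does not confirm: that the limit is exposed as the standard per-process address-space limit (RLIMIT_AS), and that "6GB" means 6 * 2**30 bytes. The helper names format_limit and exceeds_cap are hypothetical.

```python
import resource

GIB = 1024 ** 3
# The notice's 6GB figure, read here as 6 * 2**30 bytes (assumption).
VM_CAP = 6 * GIB


def format_limit(value):
    """Render an rlimit value in GB, or 'unlimited' if no limit is set."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / GIB:.1f} GB"


def exceeds_cap(requested_bytes, cap=VM_CAP):
    """True if a single process mapping requested_bytes would reach the cap."""
    return requested_bytes >= cap


if __name__ == "__main__":
    # Assumption: the site-wide limit shows up as RLIMIT_AS for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print("address-space limit: soft", format_limit(soft),
          "hard", format_limit(hard))

    # Illustrative request: four 2 GB arrays held by one process.
    print("4 x 2 GB arrays reach the cap:", exceeds_cap(4 * 2 * GIB))
```

Because the limit applies to each process rather than to a whole job, a job that spreads the same data over several MPI processes may stay under the cap where a single large process would not.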


NASA Curator: Mason Chang,
NCCS User Services Group (301-286-9120)
NASA Official: Phil Webster, High-Performance
Computing Lead, GSFC Code 606.2