Latest Changes on Discover
On Thursday, July 10, 2008, the NCCS will be upgrading the operating
system on the Discover cluster from SLES-9 to SLES-10.
While the two operating systems are binary compatible and recompiling
is not absolutely required, it is strongly recommended that users
recompile their applications to ensure that they have been
built against the latest versions of various system libraries.
Below are a few things that users should be aware of regarding
the upgrade to SLES-10:
- Scali MPI has been upgraded from version 5.3 to version 5.6.
The module for Scali MPI has been renamed from "scali-5.3"
to "scali-5". Users will need to change their "module load"
commands to load "mpi/scali-5" instead of "mpi/scali-5.3".
- "ssh totalview" and "ssh idl" will no longer be supported.
Users who have relied on this method to establish X-forwarding
for PBS jobs should use "xsub" instead. "xsub" accepts all
the same arguments as qsub and establishes the necessary
X-forwarding for you.
- Module changes (Compilers, Math Kernel Libraries, etc.)
- The following modules that are currently available under SLES-9
will not be available under SLES-10. (Please contact user support
if you feel you have a continuing need for any of these items)
    comp/gcc-3.3.6         comp/intel-8.1.034     comp/intel-8.1.038
    comp/intel-9.1.038     comp/intel-9.1.039     comp/intel-9.1.042
    comp/intel-9.1.046     comp/intel-9.1.049     comp/intel-10.0.023
    comp/intel-10.0.025    comp/intel-10.1.013    comp/intel-10.1.015
    comp/nag-5.1           comp/pgi-6.1.6         comp/pgi-6.2.4
    lib/mkl-10.0.2.018     lib/mkl-8.1            lib/mkl-9.0.017
    lib/mkl-9.1.018        lib/mkl-9.1.021        tool/tview-8.0.0.0
    tool/tview-8.1.0.1
- The following modules will be available under SLES-10:
    comp/gcc-4.1.2 (natively available without a "module load")
    comp/gcc-4.2.4         comp/gcc-4.3.1
    comp/intel-9.1.052     comp/intel-10.1.017
    comp/nag-5.1-463       comp/pgi-7.1.6
    comp/pgi-7.2.1         lib/mkl-9.1.021
    lib/mkl-10.0.3.020     tool/tview-8.4.1.6
    mpi/scali-5
- Additional modules will be made available for other software
and will be listed as "other/". These modules
will be made available as these packages are rebuilt.
- Process limits
- Default process limits (data size and stack size) will be set to
a maximum safe limit based on physical memory on the various nodes.
Under most circumstances, users should not need to change these
settings. Please contact user support if you have questions or
concerns about this.
- Some of the additional software that currently resides in
/usr/local is now included as part of SLES-10 and will no
longer be maintained in /usr/local.
- The software packages in /usr/local are being rebuilt under
SLES-10 and some may not be available initially. Please report
any problems you find or anything that appears to be missing.
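Users with many job scripts may want to apply the Scali module rename
mechanically. The following is a minimal sketch in Python, not an
NCCS-provided tool. The helper name update_scali_module is hypothetical,
and the sketch assumes scripts refer to the module exactly as
"mpi/scali-5.3". It only rewrites text; it does not check that the
remaining modules in a script are still available under SLES-10.

```python
import re

# Old and new module names, taken from the notice above.
OLD_MODULE = "mpi/scali-5.3"
NEW_MODULE = "mpi/scali-5"

# Match the old name only as a whole token, so that a longer name
# such as "mpi/scali-5.30" is left untouched.
_PATTERN = re.compile(r"(?<![\w./-])" + re.escape(OLD_MODULE) + r"(?![\w.-])")


def update_scali_module(script_text):
    """Return script_text with the old Scali module name replaced.

    Hypothetical helper for illustration only.
    """
    return _PATTERN.sub(NEW_MODULE, script_text)


if __name__ == "__main__":
    # A made-up job script fragment used only to demonstrate the rewrite.
    example = (
        "#!/bin/csh\n"
        "module load comp/intel-10.1.017 mpi/scali-5.3\n"
        "mpirun ./a.out\n"
    )
    print(update_scali_module(example))
```

Review the rewritten text before saving it over an existing job script,
since a script may mention the old name in comments or in other contexts.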
On Wednesday Feb 20th, the NCCS will be making the following changes
to the discover system (downtime notice to follow):
- Multiple login nodes will be used for interactive access.
- Users will still connect to and ask for the service
the same way they do now. The difference is that they will be placed on
the login nodes discover05, 06, 07, or 08 (in a round-robin fashion)
instead of discover01. More login nodes may be added in the future.
- Any user scripts that connect from a discover login node to another
remote system may fail if the remote system does not allow all the
discover login nodes access. Please have your system administrators
contact the NCCS User Services Group for node address information
if required.
- CRON jobs will be run and managed from a single dedicated cron node so
they don't impact interactive processes.
- Once on discover, users may access and manage their cron jobs by
connecting to discover-cron. The new login
nodes will deny user CRON activity. All existing cron entries will
be relocated to discover-cron.
- System-wide process virtual memory limits will be put in place.
- In order to limit the impact of processes that exceed a node's
memory resources, we will be setting virtual memory limits globally
on discover.
This means that any single process on discover that reaches 6GB of
virtual memory will be terminated. We have found that processes
that reach 6GB of virtual memory will continue growing until they
exceed the node's memory resources. This causes the node(s) to hang,
and the filesystem daemon is frequently killed. Users may see a runtime
library error if their processes exceed the 6GB virtual memory limit.
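To see how the 6GB virtual memory limit relates to a given process, the
limits in effect can be inspected from a script. The following is a
minimal sketch in Python. It assumes the limit is enforced as the
per-process address space limit (RLIMIT_AS, the value "ulimit -v"
reports), which this notice does not confirm, and that 6GB means
6 * 2**30 bytes. The helper names describe_limit and exceeds_vm_limit
are hypothetical.

```python
import resource

# 6GB limit from the notice. Interpreting "GB" as 2**30 bytes is an
# assumption made for this sketch.
VM_LIMIT_BYTES = 6 * 1024 ** 3


def describe_limit(value):
    """Format a resource limit given in bytes for display."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%.2f GB" % (value / float(1024 ** 3))


def exceeds_vm_limit(requested_bytes, limit_bytes=VM_LIMIT_BYTES):
    """Return True if a process of this virtual size would reach the limit."""
    return requested_bytes >= limit_bytes


if __name__ == "__main__":
    # Assumption: the limit is visible as the address space limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print("virtual memory soft limit:", describe_limit(soft))
    print("virtual memory hard limit:", describe_limit(hard))
    # Example estimate: would a process needing 8 GB reach the limit?
    print("8 GB process reaches limit:", exceeds_vm_limit(8 * 1024 ** 3))
```

The comparison uses "reaches" (greater than or equal), matching the
wording of the notice. Estimating a process's virtual size before
submitting a job is left to the user.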