HPC Systems

Refer to the Systems page for general information on the computing resources available at the NCCS.

New User Accounts

For information on requesting a new NCCS user ID, visit the Access Information page.

System Login

The NCCS Bastion host, login.nccs.nasa.gov, lets you connect securely from outside the NCCS to the following NCCS resources:
  • discover.nccs.nasa.gov: the Linux "Discover" Cluster
  • dirac.nccs.nasa.gov: the Data Management Facility (DMF) on the SGI (Dirac) computing system
  • dali.nccs.nasa.gov or dali-gpu.nccs.nasa.gov: the "Dali" Data Analysis Cluster
  • discover-nastran.nccs.nasa.gov: for the NASTRAN project

In addition to the original Login Mode, the NCCS Bastion Service now supports Proxy Mode, which can be used to streamline access to internal NCCS systems, including discover, dirac, and dali, and facilitates easier file transfer via SCP.

Use one of the two methods described below to log into the NCCS resources:

1. The Bastion Service -- Login Mode

For example, from your workstation or a resource outside the NCCS environment, access Discover using:
$ ssh -XY user_id@login.nccs.nasa.gov

or

$ ssh -XY -t user_id@login.nccs.nasa.gov discover

The -XY options enable trusted X11 forwarding, which lets X applications that run over ssh display securely on your local screen. Windows PuTTY users should select SSH protocol 2 and enable X11 forwarding in the session configuration.

You will be asked to authenticate with your RSA SecurID PASSCODE, which is your PIN followed by the six-digit TOKENCODE shown on your SecurID key. You then log in to the NCCS computing system with your NCCS password:

PASSCODE: PIN+TOKENCODE
host: discover (or dali, dirac)
password: your-NCCS-password

Note that no host prompt will appear if using the second "ssh" command shown above.

Once you are inside the system environment, you can ssh to other NCCS systems without entering the PASSCODE. For example, you can run "ssh dali" from Discover without entering a password.

2. The Bastion Service -- Proxy Mode

To use the proxied connection, you must be authorized first so that the bastion will permit the connection. Please contact the User Services Group (support@nccs.nasa.gov) for the authorization.

Note that system login via the Bastion Service Proxy Mode, as described below, applies to command-line users only.

Once you are authorized, complete the configuration step below, which enables you to connect directly to NCCS systems through the bastion.

Create a file, $HOME/.ssh/config, on your local system with the following contents. Substitute your login ID for user_id. You can leave out any host that you do not use or are not authorized to access.

host discover.nccs.nasa.gov dirac.nccs.nasa.gov dali.nccs.nasa.gov discover-nastran.nccs.nasa.gov
  User user_id
  LogLevel Quiet
  ProxyCommand ssh -l user_id login.nccs.nasa.gov direct %h
  Protocol 2
  # The ConnectTimeout line is optional
  ConnectTimeout 30

After you create the config file, make sure that the $HOME/.ssh directory is accessible only by you:

$ chmod 0700 $HOME/.ssh

Now you will be able to simply SSH or SCP to any of the hostnames following "host" in the above $HOME/.ssh/config file, for example,

$ ssh -XY user_id@discover.nccs.nasa.gov

For the passcode, make sure you enter a freshly generated SecurID token code, i.e., one that has at least a couple of bars of life left.

NCCS Password

Most NCCS computing resources use a single NCCS password.  You may change your password by one of two methods:

Please follow the new NCCS password policy, which includes the following rules:

  • password contains at least 12 characters, matching at least three of the following:
    • minimum of 1 digit;
    • minimum of 1 uppercase letter;
    • minimum of 1 lowercase letter;
    • minimum of 1 special character;
  • valid for maximum of 60 days;
  • may not be reused for 24 cycles;
  • will lock after 5 failed attempts;
  • changing a password again within 24 hours is not allowed
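
The length and character-class rules above can be checked locally before you choose a new password. The following is a minimal sketch, not an NCCS tool: the function name check_nccs_password is hypothetical, and "special character" is assumed to mean any character that is not a letter or a digit. The expiry, reuse, and lockout rules are enforced by the NCCS systems and cannot be checked this way.

```shell
# Hypothetical helper: returns 0 if the candidate password satisfies the
# length rule and at least three of the four character-class rules.
check_nccs_password() {
    local pw=$1
    local classes=0
    # Rule: at least 12 characters.
    [ "${#pw}" -ge 12 ] || return 1
    # Count how many of the four character classes are present.
    case $pw in *[[:digit:]]*) classes=$((classes + 1)) ;; esac
    case $pw in *[[:upper:]]*) classes=$((classes + 1)) ;; esac
    case $pw in *[[:lower:]]*) classes=$((classes + 1)) ;; esac
    # Assumption: anything that is not a letter or digit counts as "special".
    case $pw in *[![:alnum:]]*) classes=$((classes + 1)) ;; esac
    # Rule: at least three of the four classes must match.
    [ "$classes" -ge 3 ]
}
```

For example, check_nccs_password 'Abcdefgh1234' succeeds (12 characters with a digit, an uppercase letter, and lowercase letters), while check_nccs_password 'abcdefghijkl1' fails because only two classes are present.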

If you fail to change your NCCS password before your current one expires, the systems will prompt you to change your password immediately upon login. Please contact the NCCS User Services Group (mail to support@nccs.nasa.gov) should you need more information or request assistance to reset your NCCS password.

File Transfer via Proxy Mode

File transfer between your local workstation and Discover or the archive, Dirac

File transfer using the Bastion Service Proxy Mode is highly recommended for security reasons. You must be authorized for proxied access. First-time users should contact the User Services Group (support@nccs.nasa.gov) to request authorization.

For Command-line users:

First make sure you have already completed a configuration step for proxied connections. Then you can transfer files, such as data sets or application files, directly to Discover/Dali or the archive system, Dirac. Note that Discover and Dali share the same global file systems.

For example, from your workstation you can do the following:

$ scp ~/myfile user_id@discover.nccs.nasa.gov:~/.
$ scp -r user_id@dali.nccs.nasa.gov:~/mydir ~/.
$ scp user_id@dirac.nccs.nasa.gov:~/mydir/\*.tar ~/.

You will be asked to provide both your PASSCODE and your NCCS password for all the scp commands.

NOTE: When closing a ProxyCommand connection, you will see the warning message "Killed by signal 1". This is SSH tearing down the ProxyCommand connection and is nothing to be alarmed about.

For WinSCP users:

1. Configure WinSCP as shown below and any other settings that you desire, and then save the session.

It is important that you DO NOT populate or save any of the password fields.

Session Tab:

Hostname: NCCS-HOSTNAME (e.g., discover.nccs.nasa.gov, dali.nccs.nasa.gov, dirac.nccs.nasa.gov, or discover-nastran.nccs.nasa.gov)
Username: your user id
Password: _LEAVE_BLANK
Advanced Options: make sure to click to select


Proxy Tab:

Proxy Type: local
Local Proxy Command: "C:\Program Files\PuTTY\plink.exe" -pw %pass -l %user %proxyhost direct %host
(This assumes that plink.exe is found under C:\Program Files\PuTTY. Also note that "-l" in the command above is a "dash lowercase L".)
Username: your user id
Password: _LEAVE_BLANK
Proxy Hostname: login.nccs.nasa.gov
Do DNS name lookup at proxy end: Yes

2. Once you complete the configuration of WinSCP, here is how to launch WinSCP:
  • Open WinSCP
  • Load your saved session
  • Click on the proxy branch in the left pane
  • Enter your PASSCODE (PIN + TOKENCODE) in the password field
  • Click Login (DO NOT click Save)

Note: A WinSCP feature request asking WinSCP to prompt for the proxy password when it is left blank is tracked as bug #468. The request was filed in 2009, so it is unclear how much traction it has at the moment.

One caveat for WinSCP users: once you have configured WinSCP as described above, do NOT launch PuTTY using the button embedded within WinSCP. In that case PuTTY attempts to connect back to login.nccs.nasa.gov using the PASSCODE that was previously used, and the end result is a locked token. Launching the standalone PuTTY should always work fine.

File transfer between Discover/Dali and Dirac

One way to stage data for a large compute run on Discover is to first submit a batch job to the datamove queue that copies large data files from the archive or an external location to the Discover file systems.

Jobs in the datamove queue run on a cluster gateway node that has access to external, archive, and cluster-wide file systems. Once the data is on a file system visible to the compute nodes, the compute jobs using these data can be executed. The results of the compute jobs can be saved back out to the archive or the external user system using another datamove job.

Therefore, you can submit the three jobs (two datamove jobs and one compute job) all at once, and then use PBS's dependency attribute to make each job wait for the completion of the job before it. See the "PBS Chain Jobs with Dependencies" section for a sample script that submits jobs with dependencies.
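
The three-job chain described above might be submitted along the following lines. This is only a sketch under stated assumptions: it assumes that qsub prints the new job's ID on standard output and accepts the standard PBS "-W depend=afterok:<jobid>" option, and the function name submit_chain and the QSUB override variable are hypothetical. Check the sample script referenced above for the form actually supported on Discover.

```shell
# Submit three jobs so that each one starts only after the previous one
# has finished successfully.
# Usage: submit_chain STAGE_IN_SCRIPT COMPUTE_SCRIPT STAGE_OUT_SCRIPT
submit_chain() {
    # QSUB is a hypothetical override so another submit command can be used.
    local qsub=${QSUB:-qsub}
    local id1 id2 id3
    # Stage the input data in through the datamove queue.
    id1=$("$qsub" -q datamove "$1") || return 1
    # The compute job waits for the stage-in job to succeed.
    id2=$("$qsub" -W depend=afterok:"$id1" "$2") || return 1
    # The stage-out job waits for the compute job to succeed.
    id3=$("$qsub" -q datamove -W depend=afterok:"$id2" "$3") || return 1
    echo "$id1 $id2 $id3"
}
```

A call such as "submit_chain stage_in.sh compute.sh stage_out.sh" (the script names are placeholders) prints the three job IDs in submission order.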

For more details regarding data transfer, please refer to the Your Data page.

Passwordless ssh/scp within NCCS Systems

You can SSH or SCP within the NCCS systems without typing the NCCS password by setting up authorized keys. For example, to SCP or SSH from Discover to Dirac without a password, use the following steps:

1. On Discover, create a new key pair under $HOME/.ssh.

$ chmod 0700 $HOME/.ssh
$ cd $HOME/.ssh
$ ssh-keygen -t dsa

Press Enter at each prompt to accept the defaults. This creates a pair of private and public identity files, id_dsa and id_dsa.pub, under the .ssh directory.

2. Copy the file id_dsa.pub into authorized_keys in the same directory. If the file authorized_keys already exists on the system, append the contents of id_dsa.pub.

$ cat id_dsa.pub >> authorized_keys

3. Copy the id_dsa.pub file from Discover to Dirac:

$ scp $HOME/.ssh/id_dsa.pub \
      userid@dirac.nccs.nasa.gov:~/.ssh/id_dsa.pub.discover

4. On Dirac, append the copied file to the authorized_keys file:

$ cat $HOME/.ssh/id_dsa.pub.discover >> $HOME/.ssh/authorized_keys

Once the authorized_keys file is in place on both Dirac and the NCCS HPC systems, you can start using passwordless SCP and SSH.
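
If you repeat the append steps above, the same key ends up in authorized_keys more than once. The following sketch shows one way to append a key only when it is not already present. The function name add_key_once is hypothetical, and restricting the file to mode 600 is an assumption based on common SSH practice rather than a stated NCCS requirement.

```shell
# Append a public key to an authorized_keys file only if that exact line
# is not already in the file.
# Usage: add_key_once PUBLIC_KEY_FILE AUTHORIZED_KEYS_FILE
add_key_once() {
    local pub=$1
    local auth=$2
    [ -r "$pub" ] || return 1
    # Create the file if needed and keep it private (assumed good practice).
    touch "$auth" || return 1
    chmod 600 "$auth" || return 1
    # -x: match whole lines, -F: fixed strings, -f: read patterns from file.
    if ! grep -qxFf "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
}
```

On Dirac, step 4 could then be written as: add_key_once $HOME/.ssh/id_dsa.pub.discover $HOME/.ssh/authorized_keys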

Default Shell

BASH is the default shell on Discover. If you want to change to another shell, either contact the NCCS User Services Group or try the "chsh" command.

To work temporarily within another shell, execute the appropriate command. The shells available are, to name a few, bash, csh, ksh, sh, tcsh, and zsh.

All shells have startup script files that are executed at login time to set environment variables (such as PATH and LD_LIBRARY_PATH) and perform other environment setup tasks. The table below lists some common shells and the startup files that you might need to edit to set up your personal environment variables.

Shell        Startup file to edit
sh or ksh    $HOME/.profile
bash         $HOME/.bashrc if it exists;
             or $HOME/.bash_profile if it exists;
             or $HOME/.profile if it exists (in that order)
csh          $HOME/.cshrc
tcsh         $HOME/.tcshrc if it exists;
             or $HOME/.cshrc if there is no .tcshrc

Here we include two example shell startup files, one .bashrc and one .cshrc:

# Sample .bashrc script

# Handy aliases
alias ls='ls -a -F --color=tty '
alias ll='ls -l'
alias dali='xterm -ls -T Dali -n Dali -e ssh -XY userid@dali-gpu '

# Bash aliases cannot take arguments (the csh-style \!* and \!:1 do not
# work in bash), so these two are written as functions instead.
myfind() { find . -depth -name "$1" -print; }
emfs() { emacs -fn 9x15bold "$1"; }

# Setup basic modules
module purge
module load comp/intel-11.1.056 mpi/impi-4.0.3.008

# Example environment
export PATH=${PATH}:/usr/local/other/SLES11/imageMagick/6.3.5/bin
export PATH=${PATH}:/usr/local/other/SLES11/ncview/2.1.1/bin
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/other/SLES11/jpeg/6b/intel-11.1.069/lib
export LD_LIBRARY_PATH

# set default umask
umask 022

# set stacklimit
ulimit -S -s 12800

For changes to your .profile or .bashrc to take effect in the current session, source the file, e.g., ". .bashrc".

# Sample .cshrc script

# Feel-good prompt
if ( $?prompt ) then                              # shell is interactive.
    set history=50                                # previous commands to remember.
    alias setprompt 'set prompt = "../$cwd:t/"'   # option cwd:t for tail
    alias cd 'cd \!* ; setprompt'
    alias pushd 'pushd \!* ; setprompt'
    alias popd 'popd \!* ; setprompt'
    setprompt
endif

# Handy aliases
alias ls 'ls -a -F --color=tty '
alias myfind 'find . -depth -name \!* -print'
alias so 'source ~/.cshrc'
alias ll 'ls -l'
alias emfs 'emacs -fn 9x15bold \!:1 &'
alias dali 'xterm -ls -T Dali -n Dali -e ssh -XY userid@dali-gpu &'

# Setup basic modules
module purge
module load comp/intel-11.1.056 mpi/impi-4.0.3.008

# Example environment
setenv PATH ${PATH}:/usr/local/other/SLES11/imageMagick/6.3.5/bin
setenv PATH ${PATH}:/usr/local/other/SLES11/ncview/2.1.1/bin
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/usr/local/other/SLES11/jpeg/6b/intel-11.1.069/lib

# set default umask
umask 022

# set stacklimit
limit stacksize 12800M

After editing your .cshrc file, you must issue the command "source .cshrc" for your changes to take effect.

File Systems

The NCCS provides several different types of file systems, including Home File System, Nobackup, Scratch, and Archive system.

  • Home File Systems ($HOME)
    • /home/<username> (or /discover/home/<username>), symlinked to /gpfsm/dhome/<username>.
    • Part of the global file system GPFS, i.e., accessible from login and compute nodes.
    • Quota controlled, user quota set at 1GB.
    • Fully backed up. Full backups are performed once a week, every Sunday at midnight.
  • Nobackup File Systems ($NOBACKUP)
    • /discover/nobackup/<username>, symlinked to /gpfsm/dnbxx/<username>
      ALWAYS use the symlinks, e.g., $HOME, $NOBACKUP, /home, or /discover/nobackup, in your scripts to specify paths. NEVER use the real path, /gpfsm/dnbxx, because it could change due to disk augmentations or system events.
    • Part of the global file system GPFS, i.e., accessible from login and compute nodes.
    • Quota controlled, user quota set at 250GB.
    • Generally used to store large working files used for running applications, post-processing, analysis, etc.
    • Is not backed up. Any files that need to be saved for long periods should be copied into the mass storage directories.
    • In addition to user $NOBACKUP space, some space driven by specific project requirements is set up under /discover/nobackup/projects/<project_name>. There is also /usr/local/other, where most third-party packages reside.
  • Scratch
    • /lscratch.
    • Set up on each compute node. Node-specific local directory. NOT global.
    • Access via the $LOCAL_TMPDIR environment variable; it is the fastest performing file system.
      We recommend users consider using the local scratch file system when their jobs will create or write/read a large number of small files. The files written in the local scratch can be copied to $NOBACKUP once the jobs are complete.
    • A temporary directory is created when a PBS batch job begins running and exists only for the life of that job. Any data that needs to be saved must be copied out before the job completes.
  • Mass Storage ($ARCHIVE)
    • symlinked to /archive/<username> mounted on Discover/Dali.
    • NCCS runs the SGI Data Migration Facility (DMF) to manage long-term archive data. The user community accesses the archive data through (1) the Dirac servers via 10 Gigabit network connectivity, or (2) via NFS on Discover/Dali with 1 Gigabit connectivity.
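
Because the local scratch directory described above exists only for the life of the job, results written there have to be copied out before the job ends. The following is a minimal sketch of such a copy step. The function name save_scratch and the destination directory name in the usage note are hypothetical.

```shell
# Copy the contents of a job's local scratch directory to a destination
# directory, creating the destination if it does not exist.
# Usage: save_scratch SCRATCH_DIR DEST_DIR
save_scratch() {
    local src=$1
    local dest=$2
    [ -d "$src" ] || return 1
    mkdir -p "$dest" || return 1
    # "$src"/. copies the directory contents, including dot files.
    cp -Rp "$src"/. "$dest"/
}
```

In a batch script this could be called as the last step, for example: save_scratch "$LOCAL_TMPDIR" "$NOBACKUP/myrun_output"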

Show Quota

For users, the HOME and NOBACKUP file systems are controlled by quotas. To determine your resource usage and how it compares to your quota, type the showquota command:

$ showquota
                         Block Limits
Filesystem  type             KB       quota       limit   grace |
----------------------------------------------------------------+
dhome       USR          351312     1024000     1228800    none |
dnb31       USR      1077123680  1048576000  1258291200   4days |
Disk quotas for group fileset <dnb43_map>
dnb43       FILESET   259487392   262144000   314572800    none |

The right-hand side of the output shows this:

$ ...
 |    files    quota    limit    grace
-+------------------------------------
 |     1620        0        0     none
 |   109872   100000   150000   3hours
Disk quotas for group fileset <dnb43_map>:
 |    48964        0        0     none

This output shows the user's quota in file system "dhome" (the user's $HOME): a soft limit of 1GB, a hard limit of 1.2GB, and 351,312 KB currently in use.

In file system "dnb31" (the user's $NOBACKUP; "nb" stands for "no backup"), the user has a soft limit of 1TB and a hard limit of 1.2TB. As soon as the user exceeds the soft limit, the grace period (7 days in total) starts counting down. The output shows that the user has 4 days left to bring usage below the soft limit. If the user fails to do so within those 4 days, the soft limit becomes the hard limit and the user is not allocated any more space.

Under "dhome", 1,620 files are currently allocated to this user, and no soft or hard limit on the number of files (inodes) is enforced in $HOME. Under the nobackup fileset, however, the soft limit for files (inodes) is 100,000 and the hard limit is 150,000. A grace period appears because the user has exceeded the soft limit. The user has 3 hours left to bring the file count below 100,000. If the user fails to do so, the user is not allocated any more space.

The rest of the "showquota" output displays the quota information for group(s) the user belongs to.
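
The comparison of usage against the soft quota walked through above can be automated. The following sketch reads block-usage rows and reports the percentage of the soft quota in use. It assumes the column layout of the sample output (filesystem, type, KB, quota, limit, grace); the function name quota_report is hypothetical, and the real showquota output may need different parsing.

```shell
# Read rows of the form "filesystem type KB quota limit grace" on standard
# input and print, for each row, the percentage of the soft quota in use.
# Header and separator lines are skipped because their third field is not
# a number.
quota_report() {
    awk 'NF >= 6 && $3 ~ /^[0-9]+$/ && $4 > 0 {
        pct = int($3 * 100 / $4)
        state = ($3 > $4) ? "OVER soft quota" : "ok"
        printf "%s %d%% %s\n", $1, pct, state
    }'
}
```

Fed the two user rows from the sample output, this prints "dhome 34% ok" and "dnb31 102% OVER soft quota". If the real output matches the assumed layout, "showquota | quota_report" would produce the same kind of summary.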

To find individual usage, you can always cd to a directory and use the command "du". For example, to show the usage in KB of each file and subdirectory in the current directory:

$ du -ks *

If the limits imposed by quotas are a problem for you, please send an e-mail to the User Services Group (support@nccs.nasa.gov). Indicate how much more space you require, where the increase is required (e.g., /nobackup, /archive, /home), and why you are requesting the increase.

Cron Job

Cron is available on discover-cron. Discover-cron is an alias for the one Discover login-style node that runs cron. It has access to all of the same filesystems as the other Discover nodes (including Dali) as well as the same NFS filesystems as the Dali nodes.

From Discover:

$ ssh discover-cron

To access the crontab use this command:

$ crontab -e

This will allow you to edit your crontab.

Here are examples of crontabs.

# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) or jan,feb,mar...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0)
# |  |  |  |  |
# *  *  *  *  *  command to be executed
21 13 * * * mycron.csh 1> FULLPATH/test.out 2>/dev/null
52 * * * * showquota 1>> FULLPATH/test.out 2>&1
52 * * * * ssh dali mycron.csh 1>> FULLPATH/test.out 2>&1
21 13 * * * mycron.csh | mailx -s "Subject" User@whereever.com
0 1 * * * ssh dali mycron.csh 2>&1 | mailx -s "Subject" User@nasa.gov

The third example shows how to run a command or script on Dali nodes to take advantage of their large memory. As shown above, standard output is redirected with 1> or 1>>, and standard error is redirected with 2>. Use > to write or overwrite a file and >> to append to a file.

The last two examples show how to email standard output and standard error to the user. Be careful, however: if the standard output or standard error is large (which may happen if the job does not run as expected), the mail daemon may have problems delivering the email, which fills up /var on the node and may cause problems with the node.
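
One way to reduce the risk of oversized mail described above is to cap the output before it reaches mailx. The following is a sketch only: the function name run_capped is hypothetical and the line limit is an arbitrary value that you choose.

```shell
# Run a command, merge its standard error into standard output, and pass
# on only the last N lines.
# Usage: run_capped N command [args...]
run_capped() {
    local max=$1
    shift
    "$@" 2>&1 | tail -n "$max"
}
```

The same effect can be had directly in a crontab entry by inserting tail into the pipeline, for example: 0 1 * * * ssh dali mycron.csh 2>&1 | tail -n 200 | mailx -s "Subject" User@nasa.gov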

(Updated after SLURM transition) For batch jobs submitted via cron, you will first need to source /etc/profile to define the appropriate environment before issuing the submission, i.e.,

0 1 * * * . /etc/profile.local ; /usr/slurm/bin/qsub myjob.sh 1>> FULLPATH/submit.out 2>&1



Doris Pan (doris.pan@nasa.gov) Last updated 12/09/2013

Suggestions are always welcome. Please send mail to NCCS Support: support at nccs.nasa.gov