// Using Mass Storage  

The Primer pages have been transformed into the Using Discover and Using Mass Storage pages. If you have trouble finding technical documentation that previously lived on the Primer pages, please contact NCCS Support.

Dirac is the system that front-ends the mass storage system. It is recommended that users log into a Dirac node to manage file migration to and from mass storage.

The /archive filesystems are local to Dirac. The $HOME filesystems on Dirac and the Discover cluster are separate.

The Dirac nodes mount /archive (the cache filesystems) by CXFS using a fast Fiber Channel connection. CXFS is a proprietary distributed file system similar to GPFS. The limiting factor here is the disk and the bus speeds of the nodes themselves. From the point of view of the Dirac nodes, /archive is a local mount.

The /archive filesystems are also NFS mounted on all Discover login and gateway nodes. This makes it possible to be on Discover, and transfer data from the archive cache to your home on Discover.

How DMF works

Any file written to /archive (the cache filesystem) gets written to tape, even if the file is small, with one exception: empty files (files of zero length) are not written to tape. There are no guarantees about which tape will store a particular file. If you have a thousand files, they can be spread over a thousand tapes, and recalling them can trigger a thousand tape mounts, a mechanical process that takes a long time. This, together with the quota limit on the number of files you are allowed, is why it is better to have a few large files than many small ones.

As you determine your actual storage usage in order to manage data holdings within your assigned quotas, keep the following in mind: normal Linux commands like du only show what is actually used on the disk subsystem; they do not reflect what is offline on tape. There are DM-aware commands that perform similar functions but also account for offline files. For more details, expand the drop-down below.

// DM Commands

Files managed under DMF can be in one of several states.

State Description
UNM: the file is in the process of moving from tape to disk, also known as "unmigrating".
REG: the file is not managed by DMF; directories fall into this category.
OFL: the file is offline; it is on tape only.
MIG: the file is in the process of migrating from disk to tape.
DUL: the file is in dual state; it is available on both disk and tape.

DM Commands

Command Description Usage

dmls: Reports the DMF state of a file.
$ dmls -l $ARCHIVE/results

dmdu: Gives a reasonable estimate of how much space all files in a directory use, both on disk and offline on tape. In contrast, du accounts only for files on disk and knows nothing about the tapes.
$ cd $ARCHIVE
$ dmdu -sk ./data

dmfind: The normal find command finds all files whether they are offline or not, because the directory structure itself is always on disk; only the actual files are migrated to tape. Therefore, we suggest using the DMF-aware dmfind command, which works like find but can distinguish between DMF states.
$ dmfind $ARCHIVE -type f -state OFL
# prints all files that are offline

dmget: Recalls a file from tape to the disk cache. This can take a while. After completion, the state of the affected file changes from offline (OFL) to dual (DUL). Copying offline files to your $NOBACKUP is allowed, but you will have to wait for the implicit unmigration to complete.
$ dmget data2.tar

dmtag: Checks or sets the sitetag of a file.
$ dmtag -t 2 list-of-archive-files

If you need to recall more than a few hundred files at a time, consider asking us to organize your list by tape, so that you can recall the list in groups without overloading the tape system. Also, please be cognizant of the amount of data you recall at once. If you need to recall multiple terabytes of data, please request our help.
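For large recalls, the dmfind and dmget commands shown above can be combined to pull files back in groups. The following is a minimal sketch, assuming a hypothetical $ARCHIVE/run42 directory and a batch size of 100; for very large recalls, still ask us to sort the list by tape first.

```shell
# List all offline files under a (hypothetical) archive directory,
# then recall them in batches of 100 to avoid flooding the tape system.
dmfind "$ARCHIVE/run42" -type f -state OFL > offline.lst
split -l 100 offline.lst batch_
for b in batch_*; do
    dmget $(cat "$b")    # blocks until this batch is back on disk
done
```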

// BEST PRACTICES

The following are the most valuable recommendations to follow when using the archive system:

  • Create fewer, larger files (up to 100 GB each) on /archive instead of many small ones

  • Whenever possible, use a datamove node to copy a (large) file between /archive and $NOBACKUP

  • When you move data between /archive and non-NCCS systems, please do it by logging into Dirac. /archive is local to Dirac only and accessible via NFS on Discover

  • NFS mounts make it possible to be on Discover, and transfer data from the archive cache to your home on Discover. However, do this sparingly, as it is possible for one node to saturate the network with a few very large data transfer jobs. This is a shared resource, so when you flood the network you cause delays for everyone else.

  • It is in your best interest to organize your data so that files that are not needed in the very short term are bundled into a single file with tar and copied to your archive area. This reduces the number of files (inodes) charged to your quota, and reduces the number of tape mounts. There is no guarantee, however, that any two of your files will be on the same tape. All these actions help the performance of the filesystems and DMF database.

  • It seems clear that one big file is better than many little files. We recommend that you do not directly copy or rsync to the archives from your nobackup area. Instead, make a few tarballs of manageable sizes and copy them to the archive area. If you don't have the resources to create a large tar file on /nobackup and then move it to the archives, there is a one-line solution that creates a tar stream and, instead of saving it to a file, pipes it directly to the archives through a shell command: $ tar zvcf - ./work | ssh dirac "cat > /archive/u/myuserid/work.tar.gz"
    You must have passwordless ssh access between Discover and Dirac for this to work. There are disadvantages to this method: the intermediate tar file cannot be checked for consistency, and it can be a little slow, because ssh suffers from the same encryption overhead as scp.
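One way to mitigate the consistency concern is to checksum the stream as it is sent, so the remote file can be compared afterwards. This is a sketch assuming bash process substitution and the same hypothetical paths as above:

```shell
# Stream the tarball to the archive while recording a checksum of the
# exact bytes sent (the trailing "-" in the .md5 file stands for stdin).
tar zcf - ./work | tee >(md5sum > work.tar.gz.md5) \
    | ssh dirac "cat > /archive/u/myuserid/work.tar.gz"
# Later, compare the hash fields of the two outputs:
ssh dirac "md5sum /archive/u/myuserid/work.tar.gz"
cat work.tar.gz.md5
```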



Moving Data

Move your data from your local machine to Dirac, or from Dirac to Discover

// File Transfer from User to the Discover cluster

For security reasons, it is highly recommended that users transfer files between local workstations and either Discover or Dirac (mass storage) using the Bastion Service Direct Mode described below.

COMMAND LINE FILE TRANSFER

For WinSCP users

This instructional video describes how to perform a secure file transfer between the Discover systems and a local Windows workstation. It covers downloading and installing the PuTTY terminal emulator as well as the WinSCP secure GUI file-transfer utility needed to accomplish the transfers.



Users are highly encouraged to use the current versions of PuTTY 64-bit (0.70) and WinSCP (5.13.3)

  1. Use configuration files to automatically connect to discover, discover-nastran, etc.
    Download and use the following configuration files to automatically fill in the settings required to connect safely to the NCCS systems. These files can be imported under Tools→Import/Restore. The INI file does not include a username; supply your own NASA AUID.
    Note: These INI files assume that Putty is installed in the default location for Windows 10 64-bit.

    WinSCP requires the user to update the "password" field on the Proxy tab (found under the advanced settings section) with the current RSA "Pin+Passcode." Once you have entered the passcode, hit "OK," then "Save," then "Open." This process must be repeated every time a user logs into the systems.

    If the user doesn't edit and update the Pin+Passcode, they will lock out their TFTI token.

  2. Configure WinSCP as shown below, adding any other settings of interest, and then save the session. WINSCP session tab screen shot for demonstration
    Hostname NCCS-HOSTNAME 
    e.g., discover.nccs.nasa.gov  or dirac.nccs.nasa.gov 
    Username your user id
    Password _LEAVE_BLANK
    Advanced Options make sure to click to select
    Proxy Tab

    WINSCP proxy tab screen shot for demonstration
    Proxy Tab local
    Local Proxy Command "C:\\Program Files\\PuTTY\\plink.exe" -pw %pass -l %user %proxyhost direct %host
    (This assumes that plink.exe is found under C:\\Program Files\\PuTTY. Also note that "-l" in the command above is a "dash lowercase L". )
    Username your user id
    Password RSA "Pin+Passcode"
    Proxy Hostname login.nccs.nasa.gov
    Do DNS name lookup at proxy end Yes
  3. Once you complete the configuration of WinSCP, here is how to launch WinSCP:
    • Open WinSCP
    • Load your saved session
    • Click on the proxy branch in the left pane
    • Enter your PASSCODE (PIN + TOKENCODE) in the password field
    • Click Save, and then Login
    Note: A WinSCP feature request that asks for WinSCP to prompt for the proxy password if it is left blank is currently being tracked as bug #468, however the original bug was filed in 2009 so it's unclear how much traction it has at the moment.
    One caveat that is noteworthy to WinSCP users: once you have configured WinSCP as described above, do NOT launch PuTTY using the button embedded within WinSCP. Doing so will cause WinSCP to connect back to login.nccs.nasa.gov using a previously used PASSCODE resulting in a locked token. Launching the standalone PuTTY should always work fine.

For Command-line users

First, make sure you have already completed the configuration step for proxied connections. Then you can transfer files, such as data sets or application files, directly to Discover or Dirac.

For example, from your workstation you can do the following:

$ scp ~/myfile user_id@discover.nccs.nasa.gov:~/.
$ scp user_id@dirac.nccs.nasa.gov:~/mydir/\*.tar ~/.

You will be asked to provide both your PASSCODE and your NCCS password for all the scp commands.

When closing a ProxyCommand connection, you will see the warning message "Killed by signal 1". This is SSH closing the ProxyCommand connection and can be ignored.
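For reference, the proxied-connection setup amounts to an ~/.ssh/config stanza along the lines of the following sketch. The exact directives are an assumption based on the "direct" proxy command shown in the WinSCP section; consult the NCCS bastion documentation for the authoritative settings.

```
# Hypothetical ~/.ssh/config stanza for Bastion Direct Mode
Host discover.nccs.nasa.gov dirac.nccs.nasa.gov
    User user_id
    ProxyCommand ssh -l user_id login.nccs.nasa.gov direct %h
```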

// File transfer from Discover to Dirac

Recommended methods to transfer data between Discover and Dirac

  1. In a datamove job use cp to copy from $ARCHIVE
  2. On a login node use cp to copy from $ARCHIVE
  3. On a login node use bbscp
  4. In a datamove job use bbscp through dirac
  5. On dirac use scp to copy to discover

File transfer between Discover and Dirac using the datamove partition

One way to prepare data for a large compute run on Discover is to first submit a batch job to the datamove partition to copy large data files from the archive or an external location to the Discover file systems.

Jobs submitted to the datamove partition run on a “gateway” node. The gateway nodes have external network interfaces, which allow access to the archive (Dirac), and other external systems. They also have access to Discover’s local GPFS cluster-wide file systems. You can submit a job to the datamove partition to migrate data into the Discover environment, making it visible to Discover’s compute nodes. You can then submit a compute job to analyze the data you moved. The results of your compute job can then be transferred back out of Discover via an additional datamove job.

You can submit three jobs all at once (a datamove job to move data into the system, a compute job, and a datamove job to move data back off the system). You can then use Slurm's dependency functions to ensure job execution in the correct order.
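The three-job pipeline described above can be sketched with sbatch's --dependency option; the script names here (movein.sh, compute.sh, moveout.sh) are hypothetical:

```shell
# Submit all three jobs at once; each starts only if its predecessor succeeds.
jid_in=$(sbatch --parsable -p datamove movein.sh)
jid_run=$(sbatch --parsable --dependency=afterok:$jid_in compute.sh)
sbatch -p datamove --dependency=afterok:$jid_run moveout.sh
```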

Copying Large Files from $ARCHIVE to $NOBACKUP using SLURM

First, ensure that the files you are moving are online, not only on tape.
IMPORTANT: Retrieve your files from tape before attempting to use them! Use dmget.

Once the files in the archive are in the dual state, they are ready to be copied to your $NOBACKUP, where you will be able to use them.
IMPORTANT: Do not untar anything in the archive area!
Untarring files in /archive will create many small files in the archive area, which will then be written to many tapes.
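Putting these rules together, the safe order of operations looks like the following sketch (data2.tar is a hypothetical tarball):

```shell
# 1. Recall the tarball from tape to the archive disk cache.
dmget "$ARCHIVE/data2.tar"
# 2. Once its state is dual (DUL), copy it to nobackup.
cp "$ARCHIVE/data2.tar" "$NOBACKUP/"
# 3. Untar on nobackup, never in /archive.
cd "$NOBACKUP" && tar xf data2.tar
```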

The best way to copy from your $ARCHIVE into your $NOBACKUP is to use a datamove node, either with a batch job or interactively. The following examples assume $ARCHIVE and $NOBACKUP expand to hypothetical pathnames. Here is the Slurm script for a batch job that copies a big file from the archives to your nobackup area using the datamove queue.

Batch job datamove transfer example:

#!/bin/bash
#SBATCH -J mycopyjob
#SBATCH -t 00:01:00
#SBATCH -p datamove
#SBATCH --account=xxxx
source /usr/share/modules/init/bash
module purge
cp /archive/u/myplace/bigfile /discover/nobackup/myplace/

As a rule of thumb, set the wall time limit to 2 seconds per gigabyte, and multiply that by 2 for good measure.
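For example, the rule of thumb works out as follows for a hypothetical 50 GB file (this is plain arithmetic, not an NCCS tool):

```shell
# 2 seconds per gigabyte, doubled for good measure.
size_gb=50
secs=$(( size_gb * 2 * 2 ))                    # 200 seconds
printf -v wall '%02d:%02d:%02d' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
echo "$wall"                                   # 00:03:20
```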

Interactive datamove transfer example:

If you are logged onto a discover node, start an interactive shell on a datamove node, and then issue the copy command. The -t flag will set the time limit to 1 hour.

$ salloc -p datamove -t 01:00:00
salloc: Pending job allocation 8164701
salloc: job 8164701 queued and waiting for resources
salloc: Granted job allocation 8164701
srun.slurm: cluster configuration lacks support for cpu binding
$ cp /archive/u/myplace/bigfile /discover/nobackup/myplace/

Moving Large Files to $ARCHIVE using cp on Discover

It is better to transfer few large files than many small ones between the tape cache (/archive) and your nobackup area.

When copying large files from Discover to Dirac, we recommend using "cp" over "mv." Because mv deletes the source after copying, you risk losing your files if the system hangs due to a storage shortage or any other issue. The following example script can be used to automate the transfer process:

#!/bin/bash
#
# check the status of a command
#
if cp a b ; then
echo "Copy succeeded"
rm a
else
echo "Copy failed"
fi

The same script in csh looks like this: #!/bin/csh
#
# check the status of a command
#
if ( { cp a b } ) then
echo "Copy succeeded"
rm a
else
echo "Copy failed"
endif
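A slightly safer variant of the bash script verifies the copy byte-for-byte with cmp before deleting the source (a sketch; a and b stand for your source and destination paths):

```shell
#!/bin/bash
# Copy, verify, and only then remove the source file.
if cp a b && cmp -s a b ; then
    echo "Copy succeeded and verified"
    rm a
else
    echo "Copy failed or verification mismatch"
fi
```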

BBSCP

Bbscp is a wrapper for the bbftp utility; it generates the appropriate bbftp command so users do not have to define flags or other options themselves. Bbftp is a remote file-transfer utility that does NOT encrypt files in transit; in exchange, it achieves faster transfers between the systems. Bbscp and bbftp can only be used for file transfers between Dirac, the front end to the mass storage system, and the Discover cluster, and bbscp should be invoked from the Discover side. Follow the steps below to use the bbscp/bbftp utility:

  1. Log onto the Discover cluster, and set up passwordless ssh to Dirac from the Discover cluster.

    PASSWORDLESS SSH

  2. Once passwordless ssh has been set up, log onto both the dirac1 and dirac2 servers (from the Discover cluster) and accept the RSA fingerprint: $ userid@discover: ssh dirac1
    $ userid@discover: ssh dirac2

  3. Bbscp applies the -r flag by default, so users can transfer single files or directories without specifying any flags. To transfer a file from your current working directory on the Discover cluster to your home directory on Dirac, run:
    $ userid@discover: bbscp test_file dirac:~/
    To transfer an entire directory from your current working directory on the Discover cluster to your home directory on Dirac, run:
    $ userid@discover: bbscp test_dir/ dirac:~/

  4. To learn more about the bbscp or bbftp, use the -h flag: $ userid@discover: bbscp -h
    $ userid@discover: bbftp -h

SCP

A simple file-transfer utility like scp can also be used. The following example command transfers a file from the Discover cluster to your Dirac home directory:
$ userid@discover: scp test_file dirac:~/

For more options and help, refer to the scp man pages: $ userid@discover: man scp



Monitoring Data

Monitor your data on Dirac and archive tapes using utilities provided by NCCS

// Monitoring Data on MSS 

Monitor using Quota Command

Monitor your data on mass storage using the quota command. On Dirac, quota allows you to monitor your MSS inode consumption and your Dirac $HOME space consumption.

// QUOTA

Both Discover (compute nodes) and Dirac (mass storage) have a quota system but they are distinct from each other:

On Discover, the total volume and the number of files are restricted.

On Dirac, there is a space quota for groups and a limit on the number of files for individual users.

On Discover, the "showquota" command shows the limits on your home and nobackup areas.

The left side of the output shows the block (space) limits, in kilobytes:

                                          Block Limits
Filesystem   type         KB         quota         limit    grace
-----------------------------------------------------------------
dhome        USR        342208      1024000       1228800    none
dnb31        USR       8100448    262144000     288358400    none
Disk quotas for group fileset:
dnb43        FILESET 259487392    262144000     314572800    none
Disk quotas for group fileset:
dnb43        FILESET    571424      5242880       5242880    none

In this example, you are currently using about 342 MB in your home directory (dhome). Your home quota is one gigabyte, with 20% overhead for the hard limit. You can exceed your quota as long as you stay under the hard limit; however, any process that attempts to create enough data to exceed the hard limit is immediately terminated. The second line shows the quota in your nobackup area (in this example, dnb31) to be about 262 GB. The information on filesets reflects the resources consumed by the groups of which you are a member. In most cases, you can safely ignore it.

On the right side of the output, you will see the file-count limits:

|            File Limits
| files    quota    limit       grace
-+----------------------------------------
| 1312         0        0       none
| 15233   100000   150000       none

You are using 1,312 inodes which includes files and directories in your home directory. Notice the 100,000 inode limit in your nobackup area. If you plan to run simulations that create tens of thousands of files in one session, you might exceed the inode limit. If you do exceed this limit, please contact the NCCS Applications team so they can help you identify ways to reduce your inode use.
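To check how close you are to the inode limit yourself, you can count the files and directories you hold; this one-liner is a simple sketch, not an NCCS-provided tool:

```shell
# Count inodes (files + directories) under your nobackup area.
find "$NOBACKUP" | wc -l
```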

On Dirac, use the "quota" command instead of "showquota" to see individual user quotas:

dirac $ quota -l userid
Disk quotas for user userid (uid 1230):
Filesystem                  blocks    quota    limit  grace   files   quota    limit  grace
/dev/cxvm/ddns3_cache06       3044        0        0            129  250000   400000
/dev/cxvm/is2201_user_home 2116360  6291456  6496256          58251  250000   400000

Group Space Quota

Group data quotas were enabled on Dirac for files stored under /archive, based on the amount of tape data requested by each group’s principal investigator in the SBU allocation process. Since the quota command does not account for data that is only on tape, use /local/nccs/bin/mssgroupreport to see who is storing data in the group:

dirac $ /local/nccs/bin/mssgroupreport

Your groups are: groupname

Usage Report (GBs:# of Files) for group groupname (groupid)
                 sfa_cache01      ...      sfa_cache04       ...      sfa_cache09
userid               0.0:0            121789.8:255367             0.0:0
Totals           121789.8 (GBs)            255367
Quota            124789.8
Percentage of Quota                         97.6%





Monitor User, Group, Project Storage using MSS Commands

Use scripts on MSS to monitor storage for yourself and your team

// MSS Usage Commands

The following scripts allow users to better manage their own and their team's storage:

Script Name (Location): Purpose and Description

groupaccessreport (/local/nccs/bin/groupaccessreport): Provides a snapshot of the group's data holdings based on last access. The header numbers are days (e.g., 30 is data accessed in the last 30 days). The first data line is the number of files, and the second is the amount of data in GBs. If no options are provided, the report covers your default group; with "-g groupname", it covers groupname, provided you are a member of that group.
groupchange (/local/nccs/bin/groupchange.pl): Provides a report on a requested group showing the change in usage for each of the group's members. Options are [-g groupname] [-s Start Date] [-e End Date]; the date format is YYYYMMDD. If no group is provided, your default group is used. If either date is not specified, the current date is used.
groupgrowthreport (/local/nccs/bin/groupgrowthreport): Provides a daily total, in GBs, of data stored under a group. If no options are provided, the report covers your default group; with "-g groupname", it covers groupname, provided you are a member of that group.
useraccessreport (/local/nccs/bin/useraccessreport): Provides a snapshot of the user's data holdings based on last access. The header numbers are days (e.g., 30 is data accessed in the last 30 days). The first data line is the number of files, and the second is the amount of data in GBs.
usergrowthreport (/local/nccs/bin/usergrowthreport): Provides a daily total, in GBs, of data stored under the user running the command.





Second Copy

Make a second copy on tape systems for long term storage

// Second Copy

Only one tape copy of your archive files is generated by default. However, you can ask for a second copy to be made with the dmtag command, which is available on Discover login nodes and from the datamove partition. dmtag checks and changes the "sitetag" of files in the archive; the "sitetag" field in the inode indicates to the system that two tape copies should be made for the file. The command can be invoked as follows:

$ dmtag -t 2 list-of-archive-files

Like other dm commands, it can accept names streamed to its standard input. The only meaningful value for the sitetag (the -t option) is 2, which indicates that two copies are requested; any other value results in a single tape copy. You can check the sitetag of previously tagged archive files by running dmtag with no options:

$ dmtag test_*
0 /cxfsm/cache06/users/g05/joe/test_t10kc.10GB
0 /cxfsm/cache06/users/g05/joe/test_t10kc.2gb
0 /cxfsm/cache06/users/g05/joe/test_t10kc.2gb.2
$ dmtag -t 2 test_*
$ dmtag test_*
2 /cxfsm/cache06/users/g05/joe/test_t10kc.10GB
2 /cxfsm/cache06/users/g05/joe/test_t10kc.2gb
2 /cxfsm/cache06/users/g05/joe/test_t10kc.2gb.2

Note: When you use dmtag to increase the number of tape copies to 2, an automated process will generate the second tape copy within several days, depending on system load.

Caveats

In order to run this command from Discover, passwordless ssh must be set up just like for dmget.

Passwordless SSH/SCP

The dmtag command is inconsistent when run against symlinks. Changing the sitetag of a link generally changes the tag of the file the link points to (if that file is an archive file), but on some clients, querying the sitetag with dmtag shows the sitetag of the link itself rather than of the target file. We have reported the symlink inconsistencies to SGI.

Invoking the dmtag command from tcsh or csh can prevent some error checking from taking place in the case of non-existent files. For example, the command:

$ dmtag list* clap*
0 /cxfsm/cache06/users/g05/joe/list
2 /cxfsm/cache06/users/g05/joe/list.2

The shell simply skips files that do not exist, giving no error message. Using bash or ksh is a better option:

$ dmtag list* clap*
clap* does not exist or the file it points to does not exist
0 /cxfsm/cache06/users/g05/joe/list
2 /cxfsm/cache06/users/g05/joe/list.2