NCCS Transfer Nodes
The NCCS has configured a number of transfer nodes to improve our users’ ability to move medium and large datasets in and out of our systems, as well as between NCCS systems. Currently the nodes allow transfers to, from, and between ADAPT and CSS storage. Small file transfers can be performed on any node a user can access, using transfer protocols such as scp or sftp. We are working to add selected Discover filesystems; this should be available in the March 2025 timeframe.
Limitations:
- Transfer nodes do not have access to any internal NCCS compute systems.
- To preserve load balancing, SSH between transfer nodes is restricted once a user is logged in. The one exception: if a user has been disconnected and still has a running process such as tmux or screen on a different transfer node, they may SSH to that node to rejoin the session.
- Transfer nodes are designated only for transfer; any scripts or workflows found to be performing compute calculations will be terminated.
Internal Transfers
The NCCS transfer nodes currently offer mounts for Explore/ADAPT storage and CSS. Transferring files between the filesystems can be performed using standard Linux tools such as cp, mv or rsync.
External Transfers
The NCCS transfer nodes have the following transfer protocols available:
- scp
- rsync
- sftp
- HPN-SCP
- HPN-SFTP
- rsync (with HPN-SSH)
- fpsync (multi-threaded rsync)
See below for important tips on using the above transfer mechanisms. Note that these apply to both local and remote data transfers.
Rsync with directories with trailing slashes:
If using trailing slashes, rsync will copy the CONTENTS of the supplied path.
$ rsync -av --progress --partial /source/directory/ /target/location/
If using rsync without trailing slashes, the source directory will be created in the target location and then all files copied to the directory. The following example will result in a /target/location/directory being created and all the contents of /source/directory being copied into the new directory.
$ rsync -av --progress --partial /source/directory /target/location
Using rsync with ssh keys:
Performed on NCCS transfer nodes:
Push example:
$ rsync -av --progress --partial -e "ssh -i /location/to/ssh_key" /source/file userid@target_server:/target/location
Pull example:
$ rsync -av --progress --partial -e "ssh -i /location/to/ssh_key" userid@target_server:/source/file /target/location
Performed on server external to NCCS:
Push example:
$ rsync -av --progress --partial -e "ssh -i /location/to/ssh_key" /source/file userid@transfer.nccs.nasa.gov:/target/location
Pull example:
$ rsync -av --progress --partial -e "ssh -i /location/to/ssh_key" userid@transfer.nccs.nasa.gov:/source/file /target/location
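If you do not already have a dedicated key for transfers, one way to generate a pair is shown below. The key path and filename are our own example, not an NCCS requirement; the public key must be installed on the remote side before the rsync examples above will authenticate.

```shell
# Generate an ed25519 key pair with no passphrase at a hypothetical path.
# The matching public key (/tmp/nccs_demo_key.pub) would be added to the
# remote account's authorized_keys.
ssh-keygen -t ed25519 -N "" -f /tmp/nccs_demo_key
```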
HPN-SSH
HPN-SSH is a fork of OpenSSH provided by the Pittsburgh Supercomputing Center. It attempts to increase transfer speed for long-haul transfers by multi-threading encryption. In testing we have seen up to a 20% increase in speed; actual gains vary by situation.
Using HPN-SSH
NOTE: HPN-SSH requires a client and server for increased transfer speed. Users will need a client installed on the sending node when sending to NCCS. If sending from the NCCS, HPN-SSH clients are available on the NCCS transfer nodes. Client and server installation instructions can be found at https://www.psc.edu/hpn-ssh-home/
Rsync with HPN-SSH
Performed on NCCS transfer nodes:
Push example:
$ rsync -av --progress --partial -e "hpnssh -i /location/to/ssh_key" /source/file userid@target_server:/target/location
Pull example:
$ rsync -av --progress --partial -e "hpnssh -i /location/to/ssh_key" userid@target_server:/source/file /target/location
Performed on server external to NCCS:
Push example:
$ rsync -av --progress --partial -e "hpnssh -i /location/to/ssh_key" /source/file userid@transfer.nccs.nasa.gov:/target/location
Pull example:
$ rsync -av --progress --partial -e "hpnssh -i /location/to/ssh_key" userid@transfer.nccs.nasa.gov:/source/file /target/location
HPN-SCP
HPN-SCP and HPN-SFTP work identically to the standard Linux scp and sftp utilities.
Fpsync
Fpsync is a utility that uses its own file data crawler and scheduler to allow for multiple threads of rsync to run at the same time. Information about fpsync and its file data crawler (fpart) can be found at https://www.fpart.org/fpsync/
- Fpsync can only push data, as its file data crawler must have local access to the files it will send.
- Due to authentication, fpsync will need to use an ssh key when sending data to a remote host.
- Fpsync scans the filesystem when the process is started. Once a scan is finished, the filesystem is no longer analyzed. This means that if live updates are being made on the source, some changes could be missed and a resync will need to be run.
- Fpsync is meant for transfers with large file counts; it will not produce useful results for file counts under 1000.
- Fpsync does not support all rsync flags; avoid modifying flags where possible, as results may be unpredictable.
Usage:
The following example starts a sync with 4 rsync threads and 100 files per rsync job. The number of files per job will need to be adjusted based on the file count being sent; typically, we suggest 1000 files per job as the optimal setting.
$ fpsync -n 4 -f 100 -o "-lptgoD -v --numeric-ids -e \"ssh -i ssh_key_path\"" /source/path/ user@remote_host:/destination/path/
Note: By default, the NCCS has set fpsync to use /tmp/$USER/fpsync as its temporary directory. If you would like to change this, set the path in the FPSYNC_TMP environment variable.
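A minimal sketch of overriding the temporary directory before launching fpsync; the scratch path here is our own example, so substitute a location with sufficient space.

```shell
# Point fpsync's scratch space at a directory of your choosing (example path).
export FPSYNC_TMP="$HOME/fpsync_tmp_demo"
mkdir -p "$FPSYNC_TMP"
```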
Accessing NCCS transfer nodes
SSH transfers:
$ ssh user@transfer.nccs.nasa.gov
HPN-SSH transfers:
$ hpnssh user@hpn-transfer.nccs.nasa.gov
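As a convenience, an SSH client configuration alias can shorten the login command. The Host name below is our own choice, and we write to a demo file for illustration; in practice the stanza would go in ~/.ssh/config.

```shell
# Append a host alias (demo file; in practice this goes in ~/.ssh/config).
cat >> /tmp/ssh_config_demo <<'EOF'
Host nccs-transfer
    HostName transfer.nccs.nasa.gov
    User your_userid
EOF
# With this stanza in ~/.ssh/config, the login becomes: ssh nccs-transfer
```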
NCCS SSH Fingerprints