// BEST PRACTICES
The following are the most valuable recommendations to follow when using the archive system:
- Create a few large files (up to 100GB each) on /archive instead of many small ones
- Whenever possible, use a datamove node to copy a (large) file between /archive and $NOBACKUP
- When you move data between /archive and non-NCCS systems, please do it by logging into Dirac. The /archive filesystem is local to Dirac but accessible via NFS on Discover
- NFS mounts make it possible to be on Discover and transfer data from the /archive to your /home on Discover. However, do this sparingly, as it is possible for one node to saturate the network with a few very large data transfer jobs. This is a shared resource, so when you flood the network you cause delays for everyone else.
- Organize your data so that files not needed in the short term are bundled into a single tar file and copied to your /archive area. This reduces the number of files (inodes) charged to your quota. It also reduces the number of tape mounts, because there is no guarantee that any two of your separate files will be on the same tape. Both effects help the performance of the filesystems and the DMF database.
- One large file is better than many small files. We recommend that you do not directly copy or rsync to the archives from your nobackup area. Instead, make a few tar files of manageable sizes, then copy them to the archive area. If you find that you don't have the resources available to make a large tar file on /nobackup and then move it to the archives, there is a one-line alternative: create a tar stream and, instead of saving it in a file, pipe it directly to the archives through a shell command:
$ tar zvcf - ./work | ssh dirac "cat > /archive/u/myuserid/work.tar.gz"
You must have passwordless ssh between Discover and Dirac for this to work. There are also disadvantages to this method: the intermediate tar file cannot be checked for consistency, and the command can be a little slow due to ssh.
- For users of the GEOS software, please read the GMAO's guidance on how to control GEOS output, both to learn how to reduce the number of data collections saved and how to create a custom data collection to eliminate the need to write unneeded data to disk or tape.
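
The staged approach recommended above (make a tar file first, then copy it to the archive area) can be sketched as a small shell function. This is only an illustration under stated assumptions: `bundle_for_archive` is a hypothetical helper name, not an NCCS-provided tool, and the staging and destination paths are placeholders you would replace with your own. Unlike the streaming one-liner, the tar file exists on disk here, so it can be checked before it is copied.

```shell
# Sketch only: bundle a directory into one compressed tar file, verify it,
# then copy the single file to the destination.
# bundle_for_archive is a hypothetical helper, not part of any NCCS toolset.
bundle_for_archive() {
    src_dir=$1      # directory to bundle, e.g. $NOBACKUP/work
    stage_dir=$2    # scratch space for the tar file, e.g. $NOBACKUP
    dest_dir=$3     # archive destination, e.g. /archive/u/myuserid

    name=$(basename "$src_dir")
    tarball="$stage_dir/$name.tar.gz"

    # Build the tar file from the parent directory so stored paths are relative.
    tar -czf "$tarball" -C "$(dirname "$src_dir")" "$name" || return 1

    # Consistency check: listing the archive reads it from start to end.
    tar -tzf "$tarball" > /dev/null || return 1

    # Copy the one verified file instead of many small ones.
    cp "$tarball" "$dest_dir/" || return 1
}
```

A call might look like `bundle_for_archive "$NOBACKUP/work" "$NOBACKUP" /archive/u/myuserid`, ideally run where the copy is cheapest, such as a datamove node. The function stops at the first failing step, so a tar file that fails the listing check is never copied.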