IOPS Galore Encore: Upgrading a Supercomputer's Metadata with Non-Volatile Memory

Jordan Robertson

Overview

The most recent wave of enhancements to the NASA Center for Climate Simulation (NCCS) Discover supercomputing cluster included an upgrade of the metadata storage. The upgrade significantly enlarged the metadata capacity of Discover's storage; improved the performance of metadata operations for users and administrators, including input/output operations per second (IOPS); and introduced new technology into the General Parallel File System (GPFS) cluster.

Project Details

Discover will soon contain 3,800+ nodes (approximately 110,000 cores) and more than 40 petabytes of usable storage. To support such a dynamic and robust operation, the NCCS uses IBM’s GPFS software to manage user data. GPFS can store data and metadata either together or separately, allowing system administrators to place metadata on storage systems that are physically separate from the data storage systems.

Separating data and metadata has many benefits. One is the ability to use technologies such as flash and Non-Volatile Memory Express (NVMe) for metadata, even where deploying them across an entire large storage cluster may not be feasible. The NCCS has taken advantage of this ability by upgrading its previous SATA Multi-Level Cell (MLC) Solid-State Drive (SSD) metadata storage to an NVMe SSD solution, which delivers significantly higher IOPS and throughput.

Results and Impact

This year, system administrators successfully transitioned all of Discover's GPFS metadata to the NVMe solution, a shift of over 660 million inodes (~14 terabytes across 25 filesystems). Improvements include up to a 5x increase in the metadata performance threshold, no noticeable impact during most IOPS-intensive processes, and an increase in usable metadata capacity from 16 terabytes to ~70 terabytes. In turn, more Discover users can run IOPS-intensive code simultaneously, with significantly shorter completion times.
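
As a rough back-of-the-envelope check, the figures quoted above can be turned into a few derived ratios. The sketch below is illustrative only: it treats "over 660 million" and "~14 terabytes" as exact, assumes decimal terabytes, and assumes the migrated ~14 terabytes is the whole metadata footprint. The function and key names are hypothetical, and the bytes-per-inode value is a simple ratio of the two quoted totals, not a GPFS inode size.

```python
# Approximate figures quoted in the article; treated as exact for illustration.
INODES = 660e6              # "over 660 million inodes"
METADATA_TB = 14            # "~14 terabytes" migrated
OLD_CAPACITY_TB = 16        # previous usable metadata capacity
NEW_CAPACITY_TB = 70        # "~70 terabytes (usable)"
BYTES_PER_TB = 1e12         # assumption: decimal terabytes


def summarize():
    """Return rough ratios derived from the quoted figures."""
    return {
        # Average metadata footprint per inode (a ratio, not an inode size).
        "bytes_per_inode": METADATA_TB * BYTES_PER_TB / INODES,
        # How many times larger the new usable capacity is.
        "capacity_ratio": NEW_CAPACITY_TB / OLD_CAPACITY_TB,
        # Fraction of capacity the migrated metadata would occupy.
        "old_utilization": METADATA_TB / OLD_CAPACITY_TB,
        "new_utilization": METADATA_TB / NEW_CAPACITY_TB,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.3f}")
```

Under these assumptions, the migrated metadata averages roughly 21 kilobytes per inode, the usable capacity grows by a factor of about 4.4, and the same ~14 terabytes would occupy about 88% of the old capacity versus about 20% of the new.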

Why HPC Matters

The NCCS high-performance computing (HPC) environment gives NASA scientists and engineers a place to generate and analyze up to petabytes of data in a practical amount of time. By providing more advanced storage technologies, the NCCS enables these researchers to work with their data in a fraction of the time and to tackle scientific problems of greater size and complexity.