NCCS Debuts Centralized Storage
with Curated Earth Science Datasets

A new resource hosting notable Earth science datasets is now available to all NASA Center for Climate Simulation (NCCS) computing platforms. NCCS users can access the Centralized Storage System (CSS) from the Discover supercomputer, the ADAPT Science Cloud, the DataPortal, and a new GPU cluster that will be available on August 10, 2020, and is focused on Artificial Intelligence (AI) and Machine Learning (ML) applications. As of July 2020, CSS holds 10 petabytes of curated datasets from NASA, other agencies, and national and international research programs.

“NCCS previously provided long-term storage on tape media, which resulted in data that was difficult for users to discover and integrate into compute workflows,” said Bennett Samowich, NCCS CSS Lead. “By holding curated data products that are archived at other locations, CSS allows fast access to datasets in support of scientific research.”

“In particular, AI and Machine Learning workflows can take advantage of the availability of these datasets,” Samowich noted. “CSS is the best location for final data products that can become input to other projects.”

The Centralized Storage System extends across several hardware racks and currently holds 10 petabytes of curated Earth science datasets. Photo by Laura Carriere, NCCS.

CSS-hosted datasets span a range of types and sources, including:

The path “/css” provides access to all of the CSS data, which is stored using IBM’s Spectrum Scale General Parallel File System (GPFS). High-speed internal InfiniBand and 40 Gigabit Ethernet networks connect CSS to the NCCS compute and Data Services environments.
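Because CSS appears as an ordinary file system path, users could browse it with standard tools. The following is a minimal Python sketch, not an NCCS-provided utility: the function name list_dataset_dirs is hypothetical, and the directory layout beneath “/css” is not described here, so the code only assumes that the path contains some top-level directories.

```python
from pathlib import Path


def list_dataset_dirs(root="/css"):
    """Return the sorted names of the top-level directories under `root`.

    The default root is the "/css" path named in the article. The layout
    below that path is an assumption; this only lists whatever directories
    happen to exist there.
    """
    base = Path(root)
    if not base.is_dir():
        # The path is absent, e.g. on a machine that does not mount CSS.
        return []
    return sorted(entry.name for entry in base.iterdir() if entry.is_dir())


if __name__ == "__main__":
    for name in list_dataset_dirs():
        print(name)
```

On a system where “/css” is not mounted, the function returns an empty list rather than raising an error.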

While CSS is currently available as read-only, its 15-petabyte capacity (growing to 30 petabytes later this summer) will allow NCCS users to make their own data available to other users. Besides Earth science data, NCCS is open to hosting users’ astrophysics datasets in the future. NCCS encourages users to share their data by requesting CSS access and moving their final data products to CSS. Data must fall under an active Data Management Plan; contact the NCCS User Services Group for help setting up a plan.

In addition to CSS, “NCCS will continue to provide local storage on both Discover and ADAPT for intermediate datasets,” Samowich said. “This local storage will continue to support workflows that require fast I/O during computation as well as analysis of research results.”

Jarrett Cohen, NASA Goddard Space Flight Center