Exascale Cyberinfrastructure for NASA's Weather and Climate Models

Dr. Dan Duffy

Overview

The NASA Center for Climate Simulation (NCCS) works closely with the NASA science community to push research boundaries and further our understanding of the environment. Through partnerships with the Global Modeling and Assimilation Office (GMAO) and the Goddard Institute for Space Studies (GISS), we employ supercomputing resources to understand and predict weather and climate processes at increasingly high resolutions in both space and time. Currently, the GMAO is running operational research weather predictions at 12-kilometer (km) global resolution to support NASA missions and field campaigns and help design next-generation remote sensing systems.

Project Details

Over the past few years, the GMAO has pushed the Goddard Earth Observing System (GEOS) model down to an unprecedented 1.5-km global resolution. While the results were stunning, the model runs highlighted areas of GEOS, such as convection, physics, and chemistry, that need to be better understood to improve predictions. By harnessing supercomputing resources, the GMAO aims to increase model resolution 10-fold over the next 10 to 15 years, which would result in roughly 10 million times more data. A true high-performance, exascale cyberinfrastructure will be necessary to meet this goal, and the NCCS is already exploring how to evolve its computing systems toward that end.
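To make the data growth concrete, here is a back-of-the-envelope sketch of how a 10-fold grid refinement alone multiplies output volume, before counting added processes such as chemistry or richer diagnostics. The individual scaling factors are illustrative assumptions, not GMAO figures.

```python
# Illustrative sketch only; the refinement factors below are assumptions, not GMAO figures.
refinement = 10                  # target: ~10x finer grid spacing

horizontal = refinement ** 2     # 10x finer in both x and y -> ~100x more grid columns
vertical = refinement            # assume the vertical grid is refined comparably
output_frequency = refinement    # assume output cadence tracks the shorter time step

grid_growth = horizontal * vertical * output_frequency
print(f"grid refinement alone: ~{grid_growth:,}x more data")   # -> ~10,000x
# Additional output fields from new processes (e.g., full chemistry) and richer
# diagnostics multiply this further, toward the 10-million-fold growth cited above.
```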

Results and Impact

Perhaps even more challenging than overcoming computational barriers and making algorithmic improvements will be building the capability to analyze the resulting large datasets. We have begun exploring virtual environments as well as deploying scalable storage systems. These storage systems hold commonly used datasets in Portable Operating System Interface (POSIX)-compliant parallel file systems that support parallel analytics with MapReduce and Spark without changing the native data format. Coupled with custom Application Programming Interfaces (APIs), these systems let users quickly access and analyze massive and diverse datasets.
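Below is a minimal sketch of the kind of in-place analytics this enables, assuming PySpark and the netCDF4 library are available on the analysis cluster; the file path and variable name are hypothetical placeholders, not an NCCS interface.

```python
# Minimal sketch of parallel analytics over files left in their native format on a
# POSIX parallel file system. Assumes PySpark and netCDF4 are installed; the path
# and variable name below are hypothetical placeholders, not an NCCS API.
import glob
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("geos-analytics-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical GEOS output files sitting, unconverted, on the shared file system.
files = glob.glob("/shared/geos/output/*.nc4")

def file_mean_temperature(path):
    """Open one NetCDF file in place and reduce it to a single statistic."""
    from netCDF4 import Dataset  # imported on the worker, not the driver
    with Dataset(path) as ds:
        temp = ds.variables["T"][:]   # hypothetical air-temperature variable
        return path, float(temp.mean())

# Each Spark task reads its share of files directly from the POSIX file system.
results = sc.parallelize(files, max(1, len(files))).map(file_mean_temperature).collect()
for path, mean_t in results:
    print(f"{path}: mean T = {mean_t:.2f} K")

spark.stop()
```

The point of the design is that the data never leaves the parallel file system or changes format; only the small reduced results travel back to the driver.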

Why HPC Matters

This research is impossible without high-performance computing (HPC), as the problem sizes are too large to fit within the memory and computational capability of any single computer. For example, the current GMAO reanalysis of the Earth’s atmosphere over the satellite era (1980 to present) generates about 400 terabytes of data at 50-km resolution. Plans to add processes such as full chemistry and to increase resolution over the next decade would yield a reanalysis generating tens of petabytes of data, making it nearly impossible for end users to download the data to perform science.
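A quick, illustrative calculation shows why downloading such a reanalysis is impractical; both the dataset size and the link speed below are assumptions chosen for the sketch, not measured figures.

```python
# Back-of-the-envelope sketch: moving a tens-of-petabytes reanalysis to an end user.
# Both numbers are assumptions chosen for illustration, not measured figures.
dataset_bytes = 10e15       # assume 10 PB, the low end of "tens of petabytes"
link_bits_per_sec = 1e9     # assume a sustained 1 gigabit-per-second connection

seconds = dataset_bytes * 8 / link_bits_per_sec
days = seconds / 86400
print(f"~{days:.0f} days of continuous transfer (~{days / 365:.1f} years)")
# -> roughly 926 days, about 2.5 years, before any analysis can even begin
```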

What’s Next

The latest NCCS computational upgrade will add approximately 20,000 compute cores connected by a 100-gigabit-per-second interconnect. In addition, the Advanced Data Analytics Platform (ADAPT), a virtual machine environment designed for large-scale analytics, offers thousands of compute cores and approximately 10 petabytes of data storage. Finally, the Data Analysis Storage System (DASS) will enable embedded analytics using Spark and MapReduce across approximately 15 petabytes of data. Together, these systems will provide the NCCS with the architecture needed to enable science at even higher resolutions and accelerate our understanding of the Earth.