Understanding the Scalability and Computational Performance of GEOS

Dr. Hamid Oloso

Overview

The Goddard Earth Observing System (GEOS) is a modular family of models primarily used by NASA's Global Modeling and Assimilation Office (GMAO) for conducting research in weather analysis and prediction, seasonal-to-decadal analysis and prediction, reanalysis, global mesoscale modeling, and observing system science. To prepare GEOS for running at higher resolutions on computers with very large CPU counts, it is important to understand its scalability and computational performance. Towards this end, researchers are examining several profiling and performance analysis tools to assess their usefulness in gaining better insights into GEOS computational performance.

Project Details

GEOS is an evolving, complex system of models coupled together by the Earth System Modeling Framework (ESMF) and the Modeling, Analysis and Prediction Layer (MAPL). At higher resolutions, the contributions of computation, communication, and I/O to wall time may shift from one part of the system to another. Moreover, each model within the system may be affected differently by architectural changes in the computing environment, such as CPU design, interconnect, and I/O subsystem. To track how the scalability and overall computational performance of GEOS change over time, it is important to measure relevant performance metrics routinely. It is equally important to understand how those metrics can identify the areas of GEOS that would benefit most from additional developer effort. As a first step, we are assessing several profiling and performance analysis tools for their ability to handle a system as complex as GEOS. So far we have looked at MPIProf, Tuning and Analysis Utilities (TAU), and Likwid, and we are continuing to explore others.
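Scalability is commonly summarized as strong-scaling speedup and parallel efficiency for a fixed problem size, alongside the share of wall time spent in computation, communication, and I/O. The Python sketch below shows one way to compute these metrics from per-run timings. The `Run` record and its fields are hypothetical illustrations, not the output format of MPIProf, TAU, or Likwid, and the sketch assumes the three time categories do not overlap.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Run:
    """Timings for one run (hypothetical schema, not a real tool's output)."""
    ranks: int          # number of MPI processes
    compute_s: float    # seconds spent in computation
    comm_s: float       # seconds spent in communication
    io_s: float         # seconds spent in I/O

    @property
    def wall_s(self) -> float:
        # Assumes the categories do not overlap, so they sum to wall time.
        return self.compute_s + self.comm_s + self.io_s


def strong_scaling(base: Run, run: Run) -> Tuple[float, float]:
    """Return (speedup, parallel efficiency) of `run` relative to `base`
    for the same problem size."""
    speedup = base.wall_s / run.wall_s
    # Ideal speedup is the ratio of rank counts; efficiency is actual / ideal.
    efficiency = speedup * base.ranks / run.ranks
    return speedup, efficiency


def breakdown(run: Run) -> Dict[str, float]:
    """Fraction of wall time spent in each category."""
    total = run.wall_s
    return {
        "compute": run.compute_s / total,
        "comm": run.comm_s / total,
        "io": run.io_s / total,
    }
```

Comparing `breakdown` across rank counts is one way to see the shift in wall-time contributions described above, for example a growing communication fraction as the core count increases.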

Results and Impact

Improving GEOS scalability and computational performance enables new and improved scientific investigations aimed at better understanding Earth's climate and weather system.

Why HPC Matters

Improvements in weather forecasting capabilities depend on increasing spatial resolution. High-performance computing (HPC) is crucial both for providing enough memory to hold high-resolution representations of the state of the Earth system and for performing simulations quickly enough to be useful as forecasts. For example, running GEOS at 3-km horizontal resolution with 300 vertical layers and a 30-second time step will require upwards of 20 million compute cores.
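To give a sense of the problem size behind that core count, the following back-of-the-envelope Python sketch estimates the number of grid cells and the work per core. It assumes a uniform 3-km horizontal spacing over a spherical Earth of mean radius 6,371 km; actual GEOS grids are not exactly uniform, so the figures are approximations only. The function name `grid_estimate` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def grid_estimate(dx_km: float, levels: int, cores: int, dt_s: float) -> dict:
    """Rough size of a global grid, assuming uniform horizontal spacing dx_km."""
    surface_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2  # about 5.1e8 km^2
    columns = surface_km2 / dx_km ** 2                   # horizontal grid columns
    cells = columns * levels                             # 3-D grid cells
    return {
        "columns": columns,
        "cells": cells,
        "cells_per_core": cells / cores,
        "steps_per_day": 86400.0 / dt_s,                 # time steps per simulated day
    }


est = grid_estimate(dx_km=3.0, levels=300, cores=20_000_000, dt_s=30.0)
print(est)
```

Under these assumptions the grid has roughly 5.7 × 10^7 columns and 1.7 × 10^10 cells, so 20 million cores would each hold on the order of 850 cells, and a 30-second time step means 2,880 steps per simulated day.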

What’s Next

As detailed profiling of GEOS continues, researchers will gather performance statistics to guide decisions about where code changes are needed to improve scalability and computational performance without sacrificing scientific results.