Machine Learning in Complex Earth System Models

Dr. Craig Pelissier

Overview

There has been a surge of interest in machine learning (ML), largely due to the availability of massive amounts of data and the increased power of modern supercomputers. ML for predictive analytics has seen success in many areas, for example, in image processing and language recognition. Scientists are now considering ML as a replacement for complex physics-based models. They take this approach reluctantly, since the goal of science is to understand the underlying dynamics of physical systems. In addition, since physical processes underlie the data being modeled, a physics-based model should, in principle, exist that provides more accurate results. However, ML models offer at least two important things: (1) If ML models extract more information from the data, it implies that physics-based models can be improved; inaccuracy is not just a result of poor or insufficient data. (2) Applications such as weather forecasting that provide a public service should take advantage of the methods that produce the most accurate predictions. In this work, an ML model is used in NASA's Land Information System (LIS) and is shown to outperform currently used models. The research also involves inferring deficiencies in the physics-based models from the ML models.

Project Details

Researchers used North American Land Data Assimilation System (NLDAS) data to produce land surface parameters important for predicting the weather, droughts, and other climate-related phenomena. In this work, a Gaussian Process Regression (GPR) ML model is used within LIS to determine land surface conditions, and its results are compared with those of other state-of-the-art models currently in use.
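For readers unfamiliar with GPR, the following is a minimal sketch of the general technique, not the implementation used in LIS. It assumes a squared-exponential (RBF) kernel, fixed hyperparameters, and synthetic one-dimensional data; the function names, the noise level, and the length scale are all hypothetical choices made for illustration and have no connection to NLDAS data.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gpr_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    # Kernel matrix of the training inputs plus observation-noise variance.
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_test, length_scale)
    k_ss = rbf_kernel(x_test, x_test, length_scale)

    # Cholesky factorization gives a numerically stable solve of k @ alpha = y.
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))

    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 25)[:, None]
y_train = np.sin(x_train[:, 0]) + 0.05 * rng.standard_normal(25)
x_test = np.array([[1.0], [2.5], [4.0]])

mean, var = gpr_predict(x_train, y_train, x_test)
```

Besides a point prediction, GPR returns a predictive variance at each test input, which gives a measure of how much confidence the model has in each estimate.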

Results and Impact

The ML model produced more accurate results, demonstrating that more information can be extracted from current observational data. These results imply that improvements to physics-based models can lead to more accurate predictions with the observational datasets available today, and that ML has the potential to help infer where the current models are deficient.

Why HPC Matters

Simulations using complex Earth System models (ESMs) require large computational resources, and high-performance computing (HPC) is essential. Training ML models on large training sets can often impose an even larger computational burden. Additionally, ESMs used for weather prediction have strict time-to-solution requirements; predictions are most accurate when using recent observations to predict the near future, which limits the allowable simulation time. As a result, compute resources are often the limiting factor on simulation resolution.

What’s Next

The next steps are to investigate different ML approaches to extract the most information from observational data and produce potentially more accurate results. The ultimate goal is to develop a systematic way of learning how to improve physics-based models from ML.