Learning from the Weather

Dan'l Pierce

Overview

Weather has dramatic effects on people’s lives, whether it brings a bumper or paltry rice crop in Texas, a severe drought that heightens wildfire risk in California, or a destructive hurricane in Florida. Produce prices can rise, vacations can be unexpectedly interrupted, and homes can be damaged. Having more information about the weather, and being able to predict or identify upcoming events, helps people enjoy favorable conditions and avoid dangerous situations.

The new Data Analytics Storage System (DASS) at the NASA Center for Climate Simulation (NCCS) enables more robust analysis of weather data. DASS is a compute cluster with significant storage attached to each node. It holds many public datasets and weather reanalysis data, although it does not serve as the official source for any of them.

Project Details

With a suite of computational capabilities on DASS, scientists can apply analytics to ensembles of data files, specifying spatial and/or temporal criteria to identify weather data of interest. In the first DASS release, analytic capabilities are limited to maximum, minimum, sum, average, standard deviation, and anomaly calculations. The set of analytics will grow based on user demand and emerging capabilities. Access to DASS is through a Python interface or the web processing service (WPS). Specialized capabilities are in different stages of deployment. They range from the Super Cloud Library (SCL), which exploits the Hadoop Distributed File System (HDFS) and can identify subsetted data over time (e.g., following a hurricane track), to interfaces with NASA Climate Model Data Services (CDS) and the Earth Data Analytics System (EDAS) under development.
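
The kind of calculation described here can be illustrated with a minimal sketch. It does not use the DASS Python interface, which is not documented in this article. It assumes a synthetic NumPy array standing in for one gridded variable, and the names select and summarize are hypothetical. The anomaly is taken to mean the departure from a long-term mean over the same region, which is one common definition and an assumption here.

```python
import numpy as np

# Synthetic stand-in for one gridded variable, shaped (time, lat, lon).
# This is illustrative data, not DASS data or the DASS interface.
rng = np.random.default_rng(0)
times = np.arange("2000-01", "2003-01", dtype="datetime64[M]")  # 36 monthly steps
lats = np.linspace(-90.0, 90.0, 19)     # 10-degree spacing
lons = np.linspace(-180.0, 170.0, 36)   # 10-degree spacing
temps = 280.0 + 10.0 * rng.standard_normal((times.size, lats.size, lons.size))


def select(data, t_range, lat_range, lon_range):
    """Return the block of data inside inclusive time/lat/lon bounds."""
    t_mask = (times >= t_range[0]) & (times <= t_range[1])
    lat_mask = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lon_mask = (lons >= lon_range[0]) & (lons <= lon_range[1])
    return data[np.ix_(t_mask, lat_mask, lon_mask)]


def summarize(block, baseline):
    """Compute the six first-release style statistics for one block.

    baseline is a (lat, lon) long-term mean; the anomaly is block - baseline
    (an assumed definition, not necessarily the one DASS uses).
    """
    return {
        "max": float(block.max()),
        "min": float(block.min()),
        "sum": float(block.sum()),
        "average": float(block.mean()),
        "std": float(block.std()),
        "anomaly": block - baseline,
    }


region = dict(lat_range=(20.0, 50.0), lon_range=(-100.0, -70.0))

# Temporal criterion: calendar year 2001. Spatial criterion: the box above.
block = select(temps, (np.datetime64("2001-01"), np.datetime64("2001-12")), **region)

# Long-term mean over the full record for the same region.
baseline = select(temps, (times[0], times[-1]), **region).mean(axis=0)

stats = summarize(block, baseline)
print(block.shape, round(stats["average"], 2), stats["anomaly"].shape)
```

On a real system the array would come from many files rather than from memory, but the selection and reduction steps would follow the same pattern.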

Results and Impact

To date, DASS has been successfully demonstrated using SCL and EDAS. It provides both Portable Operating System Interface (POSIX) and Hadoop file access to the same data, along with corresponding specialized environments for tools that need these capabilities. Machine learning and deep learning methods, and the infrastructure to support them, will be added over time. The vision is to give scientists the ability to extract specific weather events, as they occur in time, from a massive ensemble of data across a multitude of files and then to use that dataset for further discovery and applications.
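
Extracting an event as it moves through time, such as following a hurricane track, amounts to cutting a small window out of each time step around a moving center. The sketch below shows that idea on synthetic data. It is not the SCL implementation; the function follow_track, the grid, and the track points are all hypothetical.

```python
import numpy as np

# Synthetic 1-degree grid and field, shaped (time, lat, lon). Illustrative only.
lats = np.arange(10.0, 41.0, 1.0)     # 10N to 40N
lons = np.arange(-100.0, -59.0, 1.0)  # 100W to 60W
rng = np.random.default_rng(1)
field = rng.standard_normal((4, lats.size, lons.size))

# Hypothetical storm track: (time index, center latitude, center longitude).
track = [(0, 20.0, -70.0), (1, 22.0, -73.0), (2, 25.0, -77.0), (3, 28.0, -80.0)]


def follow_track(data, track, half_width_deg):
    """Cut a box of +/- half_width_deg around each track point's center."""
    windows = []
    for t_idx, c_lat, c_lon in track:
        lat_mask = np.abs(lats - c_lat) <= half_width_deg
        lon_mask = np.abs(lons - c_lon) <= half_width_deg
        windows.append(data[t_idx][np.ix_(lat_mask, lon_mask)])
    return windows


windows = follow_track(field, track, half_width_deg=3.0)
for (t_idx, c_lat, c_lon), w in zip(track, windows):
    print(t_idx, (c_lat, c_lon), w.shape)
```

The resulting list of windows is the kind of event-centered dataset that could then be passed to further analysis.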

Why HPC Matters

The DASS architecture requires the coupling of high-performance computing (HPC) resources, both hardware and software, with the storage, access, protection, and manipulation of many extremely large datasets. This effort includes exploiting parallel computations as well as parallel I/O.