Achieving Expansive Biomass Estimates with Cloud Bursting

How a partnership between science, HPC, and private industry enabled complex computational processing of satellite images across Africa.

Project Goal:
NASA scientists set out to create the first-ever biomass estimate showing the amount of carbon stored in the vegetation spanning the Sub-Saharan region of Africa.

During a recent Amazon Web Services (AWS) Public Sector Summit, Intel Corporation announced a search for the most groundbreaking cloud computing research project. Intel offered to provide the ‘Head in the Clouds’ winner with software optimization support and access to AWS services. NASA High-End Computing Program Manager Tsengdar Lee, recognizing a great opportunity to expand NASA scientific research, approached NASA Earth scientist Compton Tucker for ideas.

Tucker, in partnership with University of Minnesota colleague Paul Morin, proposed processing satellite images to count all of the trees and shrubs within a coast-to-coast swath of Africa, from the Sahara south to the savanna zones (see map below). The goal of the project would be to create a biomass estimate showing the amount of carbon stored in the region’s vegetation, carbon that could be released into the atmosphere as carbon dioxide in the event of deforestation. Until now, such an estimate has been impossible to produce.
This map represents the Sub-Saharan area of Africa covered by the tree census, spanning Universal Transverse Mercator (UTM) zones 28 through 38. At nearly 10 million square kilometers, the study area is larger than the continental United States. Image by Katherine Melocik, GSFC.

Impressed by its purpose and scope, Intel pledged resources to Tucker, Morin, and Lee’s project; these principal investigators then partnered with the NASA Center for Climate Simulation (NCCS) and Cycle Computing LLC for pre-processing and cloud bursting support.

To start, the scientists tackled 260,000 satellite images of the region totaling 200 terabytes of data. The region was divided into smaller and smaller pieces for parallel processing on the NCCS Advanced Data Analytics Platform (ADAPT), which ultimately deconstructed the images into multiple 25- by 25-km sub-tiled mosaics.
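The subdivision step can be illustrated with a minimal sketch. It assumes the region is described by a bounding box in projected (UTM-style) metre coordinates and simply cuts it into 25- by 25-km tiles; the function name `iter_tiles` and the clipping of edge tiles are illustrative assumptions, not the NCCS pipeline itself.

```python
import math
from typing import Iterator, Tuple

TILE_SIZE_M = 25_000  # 25 km tile edge, as stated in the article

# (min_x, min_y, max_x, max_y) in metres; a hypothetical tile representation
Tile = Tuple[float, float, float, float]


def iter_tiles(min_x: float, min_y: float, max_x: float, max_y: float,
               size: float = TILE_SIZE_M) -> Iterator[Tile]:
    """Yield square tiles covering the extent, row by row.

    Tiles on the far edges are clipped to the extent (an assumption;
    a real pipeline might pad or overlap them instead).
    """
    if max_x <= min_x or max_y <= min_y:
        raise ValueError("extent must have positive width and height")
    n_cols = math.ceil((max_x - min_x) / size)
    n_rows = math.ceil((max_y - min_y) / size)
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = min_x + col * size
            y0 = min_y + row * size
            yield (x0, y0, min(x0 + size, max_x), min(y0 + size, max_y))


# A 100 km by 60 km extent gives 4 columns and 3 rows of tiles.
tiles = list(iter_tiles(0, 0, 100_000, 60_000))
print(len(tiles))  # 12
```

Because each tile can be processed without reference to its neighbors in this sketch, a list like `tiles` is the kind of work queue that can be spread across many nodes or virtual machines.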

To complete the biomass estimates, researchers have been continually refining the algorithms that have run, and will continue to run, in AWS. The goal is for the team-optimized algorithms to identify and count the trees and shrubs in the mosaics, measure crown area, and determine tree height from the shadows cast. A test case with UTM zone 32 (mostly the country of Niger) won the 2015 HPCwire Readers’ Choice Award for the Best HPC Collaboration Between Government and Industry.
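The article does not describe how the team's algorithms derive height from shadows. As a rough illustration only, the standard geometric relation on flat ground is height = shadow length × tan(sun elevation). The function name and the flat-terrain assumption below are hypothetical and are not taken from the project.

```python
import math


def tree_height_from_shadow(shadow_length_m: float,
                            sun_elevation_deg: float) -> float:
    """Estimate object height from its shadow length on flat ground.

    Assumes level terrain and a shadow measured along the ground;
    sun_elevation_deg is the sun's angle above the horizon.
    """
    if shadow_length_m < 0:
        raise ValueError("shadow length must be non-negative")
    if not 0 < sun_elevation_deg < 90:
        raise ValueError("sun elevation must be between 0 and 90 degrees")
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))


# With the sun 45 degrees above the horizon, shadow length equals height.
print(round(tree_height_from_shadow(10.0, 45.0), 2))  # 10.0
```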

ADAPT is a computing system at the NCCS that combines high-performance computing and virtualization technologies to create an on-site private cloud. Its customizable operating environments enable researchers to process and analyze large datasets using the analytics solution of their choice. ADAPT also houses multiple data repositories that can be accessed for research.

In addition to its own cloud computing capabilities, ADAPT is able to facilitate cloud bursting into a commercial cloud service. While a commercial platform can provide vast numbers of virtual machines for processing user data, there are significant costs associated with data retrieval and data at rest. Therefore, it is imperative that users develop an in-depth data processing plan and have the resources allocated before starting to work with a commercial cloud provider.

For this demanding workload, the NCCS leveraged several innovations. Staff enhanced ADAPT with “fat” nodes offering as much as 3 terabytes of random-access memory for the memory-intensive code and 6 terabytes of flash memory to handle the myriad input/output operations required. The NCCS and Cycle Computing also upgraded the network link to the AWS East facility from 1 to 10 gigabits per second.
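To give a sense of why the link upgrade matters, the following back-of-the-envelope sketch estimates how long it would take to move 43 terabytes (the volume the article reports processing in one cloud burst) at each link speed. It assumes decimal terabytes and a fully utilized link with no protocol overhead, so real transfers would take longer. The article does not state how much data actually crossed the link.

```python
def transfer_hours(data_terabytes: float, link_gbps: float,
                   efficiency: float = 1.0) -> float:
    """Ideal transfer time in hours for a given data volume and link rate.

    efficiency is a hypothetical factor (0-1] for the usable share of
    the nominal link rate; 1.0 means a perfectly utilized link.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    bits = data_terabytes * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


print(f"{transfer_hours(43, 1):.1f} h at 1 Gbps")    # 95.6 h at 1 Gbps
print(f"{transfer_hours(43, 10):.1f} h at 10 Gbps")  # 9.6 h at 10 Gbps
```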

The NCCS also saved the project time and money. For cloud bursting, they leveraged “spot instances” that take advantage of off-hour processing time, so that processing 43 terabytes of data in 72 hours cost less than $2,000.
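The figures above imply some simple rates, worked out in the sketch below. Because the article gives the cost only as "less than $2,000," the dollar values are upper bounds. The function name is illustrative.

```python
def burst_summary(data_tb: float, hours: float, cost_usd: float) -> dict:
    """Derive throughput and unit costs from the reported totals."""
    if data_tb <= 0 or hours <= 0:
        raise ValueError("data volume and duration must be positive")
    return {
        "tb_per_hour": data_tb / hours,
        "usd_per_tb": cost_usd / data_tb,
        "usd_per_hour": cost_usd / hours,
    }


# Reported totals: 43 TB in 72 hours for under $2,000 (cost is a ceiling).
summary = burst_summary(43, 72, 2000)
print(f"{summary['tb_per_hour']:.2f} TB/hour")           # 0.60 TB/hour
print(f"under ${summary['usd_per_tb']:.2f} per TB")      # under $46.51 per TB
print(f"under ${summary['usd_per_hour']:.2f} per hour")  # under $27.78 per hour
```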

With ADAPT in demand by other big customers, “what we processed over the weekend in AWS would have taken us months here in ADAPT given the available resources,” said Garrison Vaughan, NCCS systems engineer.

The NCCS is dedicated to ensuring that NASA scientists’ and engineers’ projects receive the best computing solutions possible. This may include partnering with other organizations and using commercial solutions when beneficial and feasible.

To learn more about the science behind this study, see the following: