Containerizing the NASA Land Information System Framework
Garrison Vaughan
Overview
Developed by the Hydrological Sciences Laboratory at NASA’s Goddard Space Flight Center (GSFC), the Land Information System (LIS) is a high-performance software framework for terrestrial hydrology modeling and data assimilation. LIS integrates satellite and ground-based observational products with advanced modeling algorithms to extract land surface states and fluxes.
The LIS framework is difficult for non-experts to install because it depends on specific versions of many software packages and compilers. This has created a significant barrier to entry for domain scientists interested in using the software on their own computing systems or in the cloud. In addition, the requirement to support multiple run-time environments across the LIS community has created a significant burden on the NASA team. To overcome these challenges, NASA has deployed LIS using two tools: Docker containers, which package an entire software installation together with all of its dependencies in a working runtime environment, and Kubernetes, which orchestrates the deployment of a cluster of containers. Installations that took weeks or months can now be completed in minutes, either in the cloud or on on-premises clusters.
Project Details
This project's goal is to lower the barrier to entry for running the LIS framework on an HPC cluster. Instead of requiring end users, who are likely domain scientists and not systems experts, to undertake the complex LIS build process on each node of their cluster, this project lets them deploy prebuilt LIS containers across their nodes using Kubernetes.
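As a rough illustration of what deploying prebuilt containers across nodes with Kubernetes can look like, the following Python sketch builds a Kubernetes StatefulSet manifest and prints it as JSON. It is not the project's actual configuration: the image name `example.registry/lis:latest`, the workload name `lis-worker`, and the choice of a StatefulSet (rather than some other workload type) are all assumptions made for illustration.

```python
import json


def lis_statefulset(replicas, image, name="lis-worker"):
    """Return a Kubernetes apps/v1 StatefulSet manifest as a plain dict.

    The name and image are hypothetical placeholders, not values
    taken from the LIS project.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # A StatefulSet refers to a governing Service by name; that
            # Service is assumed to be defined separately.
            "serviceName": name,
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": "lis", "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    manifest = lis_statefulset(replicas=24, image="example.registry/lis:latest")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -`. The point of the sketch is that the container image is built once and the replica count alone determines how many copies run.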
Results and Impact
To develop and test this project's approach to running distributed LIS jobs in Docker containers, researchers employed a small, 300-core InfiniBand-interconnected test system composed of 25 nodes of retired high-performance computing (HPC) gear from the NASA Center for Climate Simulation's Discover supercomputer. Researchers ran a 252-core, MPI-based LIS job across 24 Docker containers in this cluster, orchestrated by Kubernetes. The job successfully completed a North American Land Data Assimilation System model test run. Because the LIS framework was built inside a Docker container, the build process had to be completed only once; the resulting container could then be used across an arbitrary number of compute nodes by using Kubernetes to deploy a cluster of LIS Docker containers. The same container can now be deployed on any system capable of running Docker containers, making it easy to reproduce this environment in the cloud or on any other HPC cluster without repeating the LIS build process.
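To make the arithmetic of the test run concrete, the Python sketch below spreads 252 MPI ranks across 24 containers and writes the result in Open MPI's `host slots=N` hostfile format. The source does not say how ranks were actually distributed or which MPI implementation was used, so the even split, the hostfile format, and the `lis-worker-N` hostnames are all assumptions.

```python
def spread_ranks(total_ranks, hosts):
    """Split total_ranks across hosts as evenly as possible.

    Returns a list of (host, slots) pairs; the first hosts absorb
    any remainder, one extra rank each.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    base, extra = divmod(total_ranks, len(hosts))
    return [(h, base + (1 if i < extra else 0)) for i, h in enumerate(hosts)]


def hostfile(total_ranks, hosts):
    """Render the allocation as 'host slots=N' lines, one per host."""
    return "\n".join(f"{h} slots={n}" for h, n in spread_ranks(total_ranks, hosts))


if __name__ == "__main__":
    # Hypothetical container hostnames; not taken from the project.
    hosts = [f"lis-worker-{i}" for i in range(24)]
    print(hostfile(252, hosts))
```

Since 252 is not a multiple of 24, an even split under these assumptions gives 11 ranks to each of the first 12 containers and 10 to each of the remaining 12.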
Why HPC Matters
Depending on the model of interest and the inputs selected for use in the LIS framework, a LIS job needs anywhere from one to thousands of cores to complete in a reasonable time frame. The largest of these workloads would be impractical without HPC systems that have high-speed interconnects.
What’s Next
Researchers will explore the use of Docker and Kubernetes for deploying other NASA applications with complex build requirements.