The Only Constant Is Change—Evolution of an HPC Cluster

Bruce Pfaff

Overview

The Discover supercomputer cluster is the heart of the high-performance computing (HPC) services provided by the NASA Center for Climate Simulation (NCCS). The NCCS provides HPC and data storage services to NASA's Science Mission Directorate, serving numerous NASA and university researchers. These users have come to rely on Discover's regular technology upgrades, which give them access to the most current processor and high-speed network technologies available.

Project Details

Over the past 11 years, Discover has undergone numerous changes. Across 14 scalable compute units (SCUs) from five different HPC vendors, it has hosted eight generations of Intel processors, various GPU and co-processor technologies, five generations of high-speed network interconnects, six generations of disk storage systems, and three generations of metadata storage devices.

Maintaining a heterogeneous resource like Discover requires NCCS staff to adapt rapidly to new technologies and to develop expertise in processor technologies, high-speed interconnects, networking, rack design and layout, electrical load balancing, advanced cooling, system integration, and facilities design and planning.

Results and Impact

This year, the NCCS is expanding Discover with SCU14. It is the cluster's first SCU based on the Intel Omni-Path interconnect, and it is built on Intel Skylake processors. SCU14 will contain 20,800 Skylake cores and deliver 1.5 petaflops of performance, extending support for higher-resolution scientific models. This addition will bring Discover to more than 3,800 compute nodes and approximately 110,000 processing cores.
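As a rough sanity check, dividing the quoted 1.5 petaflops by SCU14's 20,800 cores gives the per-core rate in the first line below. The second line depends on assumptions that the article does not state. It treats 1.5 petaflops as a theoretical double-precision peak and assumes that each Skylake core has two AVX-512 FMA units, giving 2 units × 8 doubles × 2 operations = 32 flop per cycle. Under those assumptions, the figures match an effective clock rate of roughly 2.25 GHz.

```latex
\begin{aligned}
r_{\text{core}} &= \frac{1.5 \times 10^{15}\ \text{flop/s}}{20{,}800\ \text{cores}}
  \approx 7.2 \times 10^{10}\ \text{flop/s} \approx 72\ \text{GFLOPS per core},\\
f_{\text{implied}} &= \frac{r_{\text{core}}}{32\ \text{flop/cycle}}
  \approx \frac{72 \times 10^{9}\ \text{flop/s}}{32\ \text{flop/cycle}}
  \approx 2.25 \times 10^{9}\ \text{cycles/s} = 2.25\ \text{GHz}.
\end{aligned}
```

Skylake parts with only one AVX-512 FMA unit per core would manage 16 flop per cycle, which would imply twice the clock rate. The second line is therefore only a plausibility check, not a statement of SCU14's actual configuration.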

Why HPC Matters

With the latest upgrade to Discover, the NCCS will be able to give NASA High-End Computing (HEC) Program users additional HPC resources: faster, more efficient processors connected by one of the fastest high-speed interconnects available today. Like all previous upgrades, this one will enable NASA science and engineering research that would not be possible without state-of-the-art computational resources like Discover.