Discover CPUs and ADAPT GPUs Help Accelerate Findings in Astronomical Data
This innovative data science project demonstrates early results from applying artificial intelligence (AI), machine learning (ML), and high-performance computing (HPC) to different astronomical datasets to look for obscure but valuable multiple-star systems. Using HPC systems located at the NASA Center for Climate Simulation (NCCS), including the Discover supercomputer and the Advanced Data Analytics Platform (ADAPT), a NASA data scientist extracted millions of light curves from observed astronomical objects and used deep learning methods to create high-dimensional embedding spaces. In turn, these embedding spaces can be used to find light curves that exhibit similar features among the millions in the dataset.
A variety of space-observing satellites are constantly collecting vast amounts of data that scientists can investigate for potential discoveries. Each of these datasets presents unique challenges and opportunities for scientists such as NASA Goddard Space Flight Center’s Brian P. Powell, a data scientist (AST, Fields and Particles) in the High Energy Astrophysics Science Archive Research Center (HEASARC). Powell, who is also a West Point graduate and Iraq/Afghanistan veteran, is no stranger to challenges and enjoys the difficulty of those presented by astronomical datasets. “Astronomical data presents an opportunity to explore space from behind a computer,” Powell said. “This is quite literally the most exciting data in the universe.”
Powell works with a wide variety of data from several observatories. One of those, the Transiting Exoplanet Survey Satellite (TESS), is an all-sky survey mission launched in April of 2018. The spacecraft’s powerful cameras stare at each sector for at least 27 days, looking at the most promising stars for exoplanet discovery at a two-minute cadence.
The data collected by TESS allow scientists to view how the amount of optical light arriving from a star changes over time, a record called a light curve. “The richest potential for discovery lies in TESS Full-Frame Images (FFIs),” Powell observed, “which will ultimately contain the data from over one billion stars collected on a 30-minute cadence.”
With traditional methods, thoroughly investigating the terabytes of astronomical data to find rare phenomena such as multiple-star (triple, quadruple, etc.) systems would take a daunting amount of time. Moreover, the process of extracting the light curves from TESS FFIs is computationally intensive and cannot be accomplished with traditional computing methods for any meaningful quantity of data.
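To make the idea of extracting a light curve from a stack of images concrete, here is a minimal sketch in Python. It is not the pipeline used on Discover, which the article does not describe in detail. It assumes a deliberately simplified form of aperture photometry: for each image, the pixel values inside a small fixed set of pixels around the star are summed. The function name `extract_light_curve`, the tiny 3x3 frames, and the aperture are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]   # one image: rows of pixel values
Pixel = Tuple[int, int]             # (row, column) index into a frame


def extract_light_curve(
    frames: Sequence[Frame],
    times: Sequence[float],
    aperture: Sequence[Pixel],
) -> List[Tuple[float, float]]:
    """Return (time, summed flux) pairs, one per frame.

    Simplifying assumption: the star stays on the same pixels in every
    frame and no background subtraction is applied.
    """
    if len(frames) != len(times):
        raise ValueError("frames and times must have the same length")
    curve = []
    for t, frame in zip(times, frames):
        # Add up the brightness of every pixel inside the aperture.
        flux = sum(frame[r][c] for r, c in aperture)
        curve.append((t, flux))
    return curve


# Synthetic example: three frames taken 30 minutes apart.
# The central pixel dims in the second frame, as it might during an eclipse.
frames = [
    [[1, 1, 1], [1, 10, 1], [1, 1, 1]],
    [[1, 1, 1], [1, 6, 1], [1, 1, 1]],
    [[1, 1, 1], [1, 10, 1], [1, 1, 1]],
]
times = [0.0, 0.5, 1.0]  # hours since the first frame
aperture = [(1, 1), (0, 1), (1, 0), (1, 2), (2, 1)]  # center plus 4 neighbors

light_curve = extract_light_curve(frames, times, aperture)
print(light_curve)
```

Even in this toy form, the work scales with the number of stars multiplied by the number of frames, which gives a sense of why repeating it for many millions of stars calls for HPC resources.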
Thanks to the NCCS and its resources, NASA scientists are using AI, ML, and HPC tools to process these large volumes of TESS data from their raw state and accelerate the discovery of new findings.
Results and Impact
Using over 74 CPU-years of computational time on the Discover supercomputer, Powell has to date extracted more than 60 million light curves for further investigation by NASA astronomers. From these light curves, NASA scientists thus far have identified over 50 planet candidates, more than 200 potential heartbeat stars, more than 10 potential triple star systems, more than 20 potential quadruple star systems, and even a potential sextuple star system, all previously undiscovered. ADAPT GPUs have also been employed in an attempt to characterize the features of the light curves through high-dimensional representations created by neural networks.
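The following minimal Python sketch illustrates how a high-dimensional representation can be used to find light curves with similar features. It assumes each light curve has already been mapped by a neural network to an embedding vector, and it ranks the other light curves by cosine similarity to a chosen one. The article does not say which similarity measure or search method the project uses, so cosine similarity, the function names, the star identifiers, and the three-dimensional vectors are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(
    query_id: str,
    embeddings: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k light curves whose embeddings are closest to the query."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real ones would have many more dimensions.
embeddings = {
    "star_a": [0.9, 0.1, 0.0],
    "star_b": [0.8, 0.2, 0.1],
    "star_c": [0.0, 0.1, 0.9],
    "star_d": [0.1, 0.9, 0.1],
}

neighbors = most_similar("star_a", embeddings, k=2)
print(neighbors)
```

A brute-force scan like this compares the query against every other vector. For tens of millions of light curves, an approximate nearest-neighbor index would likely be needed, which is a further assumption beyond what the article states.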
These findings are only the beginning. “We have only just started exploiting these light curves through machine learning,” observed Powell. “The possibilities are vast, and NCCS provides us the resources we need to continue this effort without computational obstacles.”
As TESS continues its mission, more data will become available. Powell hopes to establish an HPC pipeline that extracts light curves from TESS data, classifies them using neural networks on ADAPT, distributes them to interested NASA astronomers, re-trains the neural networks on ADAPT using astronomer feedback, and repeats this process for new data. Old data can continue to be reprocessed as new insights are gained.
As NASA astronomers make new discoveries and expand our understanding of the universe through astronomical datasets, Powell is eager to grow the NASA Goddard scientific community’s interest and participation in the data research process. “Success of this effort will bring additional interest,” Powell said. “As more astronomers come to understand how machine learning can accelerate discovery in astronomical datasets,” he continued, “I look forward to developing new machine learning methods to support those scientists with the help of NCCS resources and personnel.”
Sean Keefe, NASA Goddard Space Flight Center