Progress in Developing a Prototype Science Pipeline and Full-Volume,
Global Hyperspectral Synthetic Data Sets for NASA's Earth System
Observatory's Upcoming Surface Biology and Geology Mission
Abstract
The Surface Biology and Geology (SBG) mission is one of the core
missions of NASA’s Earth System Observatory (ESO). SBG will acquire
high-resolution solar-reflected spectroscopy and thermal infrared
observations at a data rate of ~10 TB/day and generate
products at ~75 TB/day. Because the daily volume exceeds
NASA’s total extant airborne hyperspectral data collection,
collecting, processing/reprocessing, disseminating, and exploiting the
SBG data present new challenges. To address these challenges, we are
developing a prototype science pipeline and a full-volume global
hyperspectral synthetic data set to help prepare for SBG’s flight. Our
science pipeline is based on the science processing operations
technology developed for the Kepler and TESS planet-hunting missions.
The pipeline infrastructure, Ziggy, provides a scalable architecture for
robust, repeatable, and replicable science and application products that
can be run on a range of systems from a laptop to the cloud or an
on-site supercomputer. Our effort began by ingesting data and applying
workflows from the EO-1/Hyperion 17-year mission archive that provides
globally sampled visible through shortwave infrared spectra that are
representative of SBG data types and volumes. We have fully implemented
the first stage of processing, from the raw data (Level 0) to
top-of-the-atmosphere radiance (Level 1R). We plan to begin reprocessing
the entire 55 TB Hyperion data set by the end of 2021. Work to implement
an atmospheric correction module to convert the L1R data to surface
reflectance (Level 2) is also underway. Additionally, we have begun
developing a hybrid High Performance Computing (HPC)/cloud processing
framework to help optimize the cost, processing throughput, and
overall system resiliency of SBG’s science data system
(SDS). Separately, we have developed a method for generating full-volume
synthetic data sets for SBG based on MODIS data and have made the first
version of this data set available to the community on the data portal
of NASA’s Advanced Supercomputing Division at NASA Ames Research Center.
The synthetic data will make it possible to test parts of the pipeline
infrastructure and other software that will be used for product generation.
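
To put the quoted volumes in perspective, the following back-of-the-envelope sketch works through the arithmetic implied by the figures above (~10 TB/day acquired, ~75 TB/day of products, and the 55 TB Hyperion archive). It is illustrative only: it assumes constant daily rates, a 365-day year, and decimal units (1 PB = 1000 TB), none of which are stated in the abstract, and the function names are hypothetical.

```python
# Figures quoted in the abstract (approximate).
RAW_TB_PER_DAY = 10.0        # SBG acquisition rate
PRODUCT_TB_PER_DAY = 75.0    # SBG product generation rate
HYPERION_ARCHIVE_TB = 55.0   # full EO-1/Hyperion archive to be reprocessed

# Assumptions made for this sketch only (not from the abstract).
DAYS_PER_YEAR = 365
TB_PER_PB = 1000.0           # decimal petabytes


def days_to_match(archive_tb: float, rate_tb_per_day: float) -> float:
    """Days of data flow needed to equal an archive of the given size."""
    return archive_tb / rate_tb_per_day


def yearly_volume_pb(rate_tb_per_day: float) -> float:
    """Volume accumulated in one year at a constant daily rate, in PB."""
    return rate_tb_per_day * DAYS_PER_YEAR / TB_PER_PB


def product_to_raw_ratio() -> float:
    """How much larger the product stream is than the acquired stream."""
    return PRODUCT_TB_PER_DAY / RAW_TB_PER_DAY


if __name__ == "__main__":
    print("Days of SBG acquisition to equal the Hyperion archive:",
          days_to_match(HYPERION_ARCHIVE_TB, RAW_TB_PER_DAY))
    print("Raw data per year (PB):", yearly_volume_pb(RAW_TB_PER_DAY))
    print("Products per year (PB):", yearly_volume_pb(PRODUCT_TB_PER_DAY))
    print("Product-to-raw volume ratio:", product_to_raw_ratio())
```

Under these assumptions, the 17-year Hyperion archive corresponds to about five and a half days of SBG acquisition, which is why it serves as a test bed for workflows rather than as a full-rate stress test; the full-volume synthetic data set addresses the latter.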