Ziggy, a Portable, Scalable Infrastructure for Science Data Processing Pipelines
  • Peter Tenenbaum,
  • Bill Wohler,
  • Jon Jenkins,
  • Yohei Shinozuka,
  • Jennifer Dungan,
  • Ian Brosnan,
  • Chris Henze,
  • Mark Rose,
  • Andrew Michaelis
Peter Tenenbaum
SETI Institute

Corresponding Author: peter.g.tenenbaum@gmail.com

Bill Wohler
SETI Institute
Jon Jenkins
NASA Ames Research Center
Yohei Shinozuka
Bay Area Environmental Research Institute Sonoma
Jennifer Dungan
NASA Ames Research Center
Ian Brosnan
NASA Ames Research Center
Chris Henze
NASA Ames Research Center
Mark Rose
PSGS / NASA Ames Research Ctr
Andrew Michaelis
NASA Ames Research Center

Abstract

We describe Ziggy, an infrastructure for pipelines that process large volumes of science data. Ziggy is based on the pipeline infrastructure software that was developed to process flight data for the Kepler and TESS exoplanet missions. In that role, the software processes multiple terabytes of data every month. Ziggy provides execution control, logging, exception management, marshaling and persistence, and data accountability record management for user-defined sequences of processing steps. Users define a pipeline via a set of XML files that specify the order in which processing algorithms are applied (including optional branching, in which one step is followed by multiple algorithms that run simultaneously), the inputs and outputs of each step, and any instrument models or control parameters that each step requires. Ziggy supports heterogeneous pipelines: each processing algorithm can be written in any supported language, and each step can run locally on a server or remotely on a supercomputer or cloud computing facility. Ziggy is sufficiently lightweight to run on a laptop and sufficiently robust to run on a supercomputer; builds on Mac OS X and Linux are supported. Ziggy is currently in use as the pipeline infrastructure tool for reprocessing the full data volume of the EO-1/Hyperion mission and is a candidate for use in the upcoming Surface Biology and Geology (SBG) mission of the Earth System Observatory (ESO). Ziggy contains no proprietary or sensitive/controlled software or algorithms, and approval for its release as a NASA Open Source Software Project is underway.
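
To make the idea of an XML-defined pipeline with branching more concrete, the following is a minimal sketch in Python. It is not Ziggy's actual schema or API: the element names (`pipeline`, `step`), the attributes (`name`, `language`, `run`, `after`), the step names, and the helper functions are all hypothetical and chosen only for illustration. The sketch reads a pipeline definition and groups the steps into stages, where steps in the same stage do not depend on one another and could therefore run simultaneously.

```python
import xml.etree.ElementTree as ET

# Hypothetical pipeline definition. Element names, attribute names, and step
# names are invented for illustration and are NOT Ziggy's actual XML schema.
PIPELINE_XML = """
<pipeline name="example">
  <step name="calibrate" language="python" run="local"/>
  <step name="detect" language="cpp" run="remote" after="calibrate"/>
  <step name="photometry" language="java" run="local" after="calibrate"/>
  <step name="report" language="python" run="local" after="detect,photometry"/>
</pipeline>
"""


def parse_steps(xml_text):
    """Return a dict mapping each step name to the set of its prerequisites."""
    root = ET.fromstring(xml_text)
    deps = {}
    for step in root.findall("step"):
        after = step.get("after", "")
        deps[step.get("name")] = {n.strip() for n in after.split(",") if n.strip()}
    return deps


def execution_stages(deps):
    """Group steps into ordered stages; steps within a stage are independent."""
    remaining = {name: set(pre) for name, pre in deps.items()}
    stages = []
    while remaining:
        # A step is ready once all of its prerequisites have been scheduled.
        ready = sorted(n for n, pre in remaining.items() if not pre)
        if not ready:
            raise ValueError("cycle or unknown prerequisite in pipeline definition")
        stages.append(ready)
        for n in ready:
            del remaining[n]
        for pre in remaining.values():
            pre.difference_update(ready)
    return stages


stages = execution_stages(parse_steps(PIPELINE_XML))
print(stages)
```

In this sketch, the hypothetical `calibrate` step is followed by a branch: `detect` and `photometry` land in the same stage, and `report` waits for both. A real infrastructure would also have to handle the inputs, outputs, models, and parameters that the abstract mentions, which this sketch leaves out.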