Abstract
We describe Ziggy, an infrastructure for pipelines that process large
volumes of science data. Ziggy is based on the pipeline infrastructure
software that was developed to process flight data for the Kepler and
TESS exoplanet missions. In that role, the software processes multiple
terabytes of data every month. Ziggy provides execution control,
logging, exception management, marshaling and persistence, and data
accountability record management for user-defined sequences of
processing steps. Users define a pipeline via a set of XML files that
specify the order in which processing algorithms are applied (including
optional branching, in which one step is followed by multiple algorithms
that run simultaneously), inputs, outputs, and any instrument models or
control parameters that are required for each step. Ziggy supports
heterogeneous pipelines: each processing algorithm can be in any
supported language, and each step can run locally on a server or
remotely on a supercomputer or cloud computing facility. Ziggy is
sufficiently lightweight to run on a laptop and sufficiently robust to
run on a supercomputer; builds on Mac OS X and Linux are supported.
Ziggy is currently in use as the pipeline infrastructure tool for
reprocessing the full data volume of the EO-1/Hyperion mission and
is a candidate for use in the upcoming Surface Biology and Geology (SBG)
mission of the Earth System Observatory (ESO). Ziggy contains no
proprietary or sensitive/controlled software or algorithms, and approval
for its release as a NASA Open Source Software Project is underway.
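
As an illustration only, the ordering-with-branching idea described above can be sketched in a few lines of Python. This is not Ziggy's XML schema or API: the `Step` class, the `execution_waves` function, and the step names are hypothetical and chosen purely for the example. The sketch assumes simple tree-shaped branching, in which each step has a single predecessor.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical processing step: a name plus the steps that follow it."""
    name: str
    next_steps: list = field(default_factory=list)


def execution_waves(steps, start):
    """Group step names into waves, beginning at `start`.

    Steps in the same wave have no ordering constraint between them,
    so they could run simultaneously. Assumes tree-shaped branching.
    """
    successors = {s.name: s.next_steps for s in steps}
    waves, current, seen = [], [start], set()
    while current:
        waves.append(current)
        seen.update(current)
        following = []
        for name in current:
            for nxt in successors[name]:
                # Skip steps already scheduled so each name appears once.
                if nxt not in seen and nxt not in following:
                    following.append(nxt)
        current = following
    return waves


# Hypothetical pipeline: one step followed by two algorithms in parallel.
pipeline = [
    Step("ingest", ["calibrate"]),
    Step("calibrate", ["photometry", "flagging"]),
    Step("photometry"),
    Step("flagging"),
]
print(execution_waves(pipeline, "ingest"))
```

In this sketch, "photometry" and "flagging" fall into the same wave because both follow "calibrate", which mirrors the branching case in which one step is followed by multiple algorithms that run simultaneously.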