STARE for scalable unification of diverse data within Earth, Space, and
Planetary Science
Abstract
The variety of missions and observational campaigns in Earth and Space
Science has led to a vast number of files containing low-level datasets
with native, incompatible arrays. While higher-level datasets
map these low-level data onto common, comparable arrays, this
standardization moves scientists farther away from the observations,
constraining the analysis and science that can be performed.
SpatioTemporal Adaptive Resolution Encoding (STARE) provides a
parallelizable, scalable index for co-aligning data with different
native spatiotemporal formats efficiently across distributed computing
resources. STARE is particularly useful for opening low-level datasets
to intercomparison and integrative analysis by providing “array”
indexes that carry spatiotemporal semantics, unifying datasets with
previously incomparable native array indexing. We are developing STARE
as a software library with both C++ and Python APIs and are integrating
STARE indexing with existing data transfer tools (OPeNDAP). By
organizing data in a hierarchical format and taking advantage of the
virtualization features of the Hierarchical Data Format (HDF), STARE may
offer end users familiar HDF usability, with STARE-enhanced
performance and data unification on the back end. Furthermore, STARE’s
spatial encoding can be used to index and integrate datasets associated
with other planetary bodies, bringing scalable unification of
diverse data to planetary and space science as well.
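
The idea of an index that carries spatial semantics can be illustrated with a minimal sketch. It is not the STARE library or its API, and it does not reproduce STARE's actual encoding, which this abstract does not specify. It assumes a simple longitude-latitude quadtree instead and omits the temporal dimension. All names here (encode, coarsen, co_align, MAX_LEVEL, and the sample datasets) are hypothetical.

```python
from collections import defaultdict

MAX_LEVEL = 16  # hypothetical maximum quadtree depth for this toy index


def encode(lon, lat, level=MAX_LEVEL):
    """Return a toy hierarchical cell id for a point at the given level.

    Each level halves the lon/lat box and appends two bits, so the id of a
    coarse cell is a bit-prefix of the ids of every finer cell inside it.
    """
    x0, x1, y0, y1 = -180.0, 180.0, -90.0, 90.0
    cell = 0
    for _ in range(level):
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        quad = 0
        if lon >= xm:
            quad |= 1
            x0 = xm
        else:
            x1 = xm
        if lat >= ym:
            quad |= 2
            y0 = ym
        else:
            y1 = ym
        cell = (cell << 2) | quad
    return cell


def coarsen(cell, from_level, to_level):
    """Drop the finest bits to get the enclosing cell at a coarser level."""
    return cell >> (2 * (from_level - to_level))


def co_align(datasets, level):
    """Group values from several datasets by shared cell at a common level.

    Only cells that contain data from every dataset are returned.
    """
    buckets = defaultdict(dict)
    for name, points in datasets.items():
        for lon, lat, value in points:
            key = coarsen(encode(lon, lat), MAX_LEVEL, level)
            buckets[key].setdefault(name, []).append(value)
    return {k: v for k, v in buckets.items() if len(v) == len(datasets)}


# Two made-up datasets with unrelated native layouts: (lon, lat, value).
swath = [(10.1, 20.2, 1.0), (100.0, -45.0, 2.0)]
grid = [(10.3, 20.1, 5.0), (-60.0, 30.0, 6.0)]

# Match observations that fall in the same level-4 cell.
matches = co_align({"swath": swath, "grid": grid}, level=4)
```

Because each record is keyed only by its own cell id, the datasets can be partitioned by id and matched independently on separate workers, which is the property the abstract relies on for distributed co-alignment.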