
What Geoscientists Want: Short and Sweet Commands with Eco-friendly Data
  • Charles Zender
University of California, Irvine

Corresponding Author: zender@uci.edu


Abstract

The twin pressures to achieve mind-share and to harness available computing power drive the evolution of geoscientific data analysis tools. Such tools have enabled a remarkable progression in the atomic or fundamental unit of data they can easily analyze. In the mid-1980s we analyzed one or a few naked arrays at a time; now researchers routinely intercompare climatological ensembles, each comprising thousands of files of heterogeneous variables richly dressed in metadata. Two complementary semantic trends have empowered this analytical revolution: more intuitive and concise analysis commands that can exploit more standardized and brokered self-describing data stores. This talk highlights how tool developers can leverage these trends to imagine and build the analysis tools of tomorrow by understanding the needs of domain researchers and the power of today's domain-specific languages.

This talk also highlights recent improvements in compression speed and interoperability that geoscientists can exploit to reduce our carbon footprint. Observations and simulations to advance the Earth system sciences generate exabytes of archived data per year. Storage accounts for about 40% of datacenter power consumption, with attendant consequences for greenhouse gas emissions and environmental sustainability. Precision-preserving lossy compression can further reduce the size of losslessly compressed data by 10-25% without compromising its scientific content. Modern lossless codecs (e.g., Zstandard or Zlib-ng) accelerate compression and decompression relative to the traditional Zlib by factors of 2-5 with no penalty in compression ratio. These proven modern compression technologies can help geoscientific datacenters become significantly greener.
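
The compression idea above can be illustrated with a minimal, self-contained Python sketch. It is not NCO's actual quantization algorithm; the bit_round helper below is a simplified stand-in for precision-preserving (bit-rounding) quantization, the synthetic field is assumed data, and the third-party zstandard package is assumed to be installed. It quantizes a float32 array to roughly three significant digits and compares Zlib and Zstandard compression ratios on the raw versus quantized bytes.

    # Minimal sketch: quantize float32 data to ~nsd significant decimal digits
    # by zeroing low mantissa bits, then compare Zlib and Zstandard codecs.
    import zlib
    import numpy as np
    import zstandard as zstd  # assumed available: pip install zstandard

    def bit_round(data: np.ndarray, nsd: int = 3) -> np.ndarray:
        """Round away mantissa bits beyond ~nsd significant decimal digits."""
        keep_bits = int(np.ceil(nsd * np.log2(10)))  # mantissa bits to keep
        drop_bits = 23 - keep_bits                   # float32 has 23 mantissa bits
        ints = data.astype(np.float32).view(np.uint32)
        # Add half of the dropped range (round to nearest), then mask low bits.
        ints = (ints + (1 << (drop_bits - 1))) & ~np.uint32((1 << drop_bits) - 1)
        return ints.view(np.float32)

    rng = np.random.default_rng(0)
    # Synthetic "temperature" field in Kelvin (assumption, for illustration only).
    field = rng.normal(288.0, 10.0, size=(128, 256)).astype(np.float32)
    raw = field.tobytes()
    quantized = bit_round(field, nsd=3).tobytes()

    for label, buf in [("raw", raw), ("quantized", quantized)]:
        z1 = zlib.compress(buf, level=6)
        z2 = zstd.ZstdCompressor(level=3).compress(buf)
        print(f"{label:>9}: zlib {len(z1)/len(buf):.2%}  zstd {len(z2)/len(buf):.2%}")

On typical geophysical fields the quantized bytes compress noticeably better than the raw bytes under either codec, which is the mechanism behind the 10-25% size reduction cited in the abstract; the speed advantage of Zstandard over Zlib would show up in timing rather than in the ratios printed here.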