Abstract
The US National Center for Atmospheric Research (NCAR) has published
several large datasets in the Amazon Web Services (AWS) cloud, thanks to
support from the NCAR “Science at Scale” project, the AWS Open Data
Sponsorship program, and the Amazon Sustainability Data Initiative. In
each case we selected a subset comprising the most useful variables from
the original data and converted that subset from netCDF to Zarr before
publication. The Zarr format supports the same data model as netCDF and
is well suited to object storage and distributed computing in the cloud
using the Pangeo libraries in Python. Each dataset has an accompanying
Intake-ESM catalog to facilitate data discovery and reading via Xarray,
and each also has a sample Jupyter Notebook to illustrate how to access
and analyze the data. Egress for these data is free, but users are
encouraged to bring their compute to the data.

The datasets currently published are:

Community Earth System Model Large Ensemble (CESM LENS): https://doi.org/10.26024/wt24-5j82
North American Coordinated Regional Downscaling Experiment (NA-CORDEX): https://doi.org/10.26024/9xkm-fp8
CESM version 2 Large Ensemble (CESM2-LE): https://doi.org/10.26024/y48t-q717
Data Assimilation Research Testbed (DART) Reanalysis: https://doi.org/10.26024/sprq-2d04

This paper will provide information about the datasets and summarize
lessons learned from the data conversion and publication.
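
The access pattern described above (an Intake-ESM catalog searched for a variable, with the matching Zarr stores read lazily into Xarray) might look roughly like the following. This is a minimal sketch, not the code from the sample notebooks: the catalog URL, the query columns and values, and the helper name open_catalog_subset are all assumptions made for illustration, and the intake-esm, xarray, zarr, and s3fs packages are assumed to be installed.

```python
# Assumed catalog location (not given in the abstract); replace it with the
# catalog URL listed on the landing page of the dataset's DOI.
CATALOG_URL = (
    "https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le.json"
)

# Hypothetical query: the available column names and values depend on the
# catalog being searched.
QUERY = {"variable": "TREFHT", "frequency": "monthly"}


def open_catalog_subset(catalog_url, query):
    """Search an Intake-ESM catalog and return a dict of lazy xarray Datasets.

    Hypothetical helper written for this sketch.
    """
    # Imported here so the module can be loaded without the optional packages.
    import intake  # the intake-esm plugin registers open_esm_datastore

    catalog = intake.open_esm_datastore(catalog_url)

    # Keep only the catalog rows that match every key/value pair in the query.
    subset = catalog.search(**query)

    # Anonymous S3 access is assumed to be enough for public open data.
    # The returned Datasets are lazy: array values are fetched on demand.
    return subset.to_dataset_dict(storage_options={"anon": True})
```

Calling open_catalog_subset(CATALOG_URL, QUERY) from a machine in the same AWS region as the data would follow the "bring the compute to the data" advice, since only the chunks actually used in a computation are then read from object storage.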