4. Summary and outlook
Hydrometeorological data are critical for model development and water
resources management. In this work, we compiled hydrometeorological data
from 30 intensively monitored watersheds. In addition to standard
measurements of precipitation, air temperature, and streamflow, the
CHOSEN dataset includes soil moisture, snow water equivalent (SWE), snow
depth, isotopes, and other hydrometeorological variables. Most of the raw
data were downloaded from publicly available sources, including the Long
Term Ecological Research (LTER) and U.S. Critical Zone Observatory (CZO)
networks.
Since raw data often have errors, gaps, and inconsistent formats across
sites, we applied quality control procedures, gap-filled missing data,
and standardized the data format. The three-step gap-filling approach
consisted of interpolation, regression, and climate catalog methods. We
also generated flag tables that record whether each value is raw,
missing, or filled and, for filled values, which gap-filling method was
used. With these flag tables, data users can identify filled values and
update or replace the gap-filling techniques. Finally, we published the
synthesis product in NetCDF format, along with Jupyter Notebook examples
that demonstrate the cleaning procedures and show how to access the data.
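One plausible reading of the three-step approach is a cascade in which each gap is filled by the first method that applies, while a flag table records which method produced each value. The Python sketch below illustrates that reading under stated assumptions; it is not the CHOSEN processing code. The function names `fill_gaps` and `keep_only`, the flag labels, the `max_interp_gap` threshold, and the use of a single reference series (for example, a nearby station) in the regression step are all hypothetical. Step 3 in particular replaces the climate catalog method with a much simpler day-of-year climatology and should be read only as a placeholder for that step.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def fill_gaps(
    values: Sequence[Optional[float]],
    reference: Sequence[Optional[float]],
    day_of_year: Sequence[int],
    max_interp_gap: int = 2,
) -> Tuple[List[Optional[float]], List[str]]:
    """Hypothetical three-step gap filler; returns (filled values, per-value flags)."""
    n = len(values)
    filled: List[Optional[float]] = list(values)
    flags = ["raw" if v is not None else "missing" for v in values]

    # Step 1: linear interpolation, only for short gaps bounded by data on both sides.
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < n and filled[j] is None:
            j += 1
        gap = j - i
        if i > 0 and j < n and gap <= max_interp_gap:
            left, right = filled[i - 1], filled[j]
            for k in range(i, j):
                frac = (k - i + 1) / (gap + 1)
                filled[k] = left + frac * (right - left)
                flags[k] = "interpolation"
        i = j

    # Step 2: ordinary least-squares regression on a (hypothetical) reference
    # series, fitted on raw observations only.
    pairs = [(reference[k], values[k]) for k in range(n)
             if values[k] is not None and reference[k] is not None]
    if len(pairs) >= 2:
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
        if sxx > 0:
            slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sxx
            intercept = y_bar - slope * x_bar
            for k in range(n):
                if filled[k] is None and reference[k] is not None:
                    filled[k] = intercept + slope * reference[k]
                    flags[k] = "regression"

    # Step 3: simplified stand-in for a climate catalog: the mean of raw
    # values observed on the same day of year elsewhere in the record.
    by_doy = defaultdict(list)
    for k in range(n):
        if values[k] is not None:
            by_doy[day_of_year[k]].append(values[k])
    for k in range(n):
        if filled[k] is None and by_doy[day_of_year[k]]:
            filled[k] = mean(by_doy[day_of_year[k]])
            flags[k] = "climatology"

    return filled, flags


def keep_only(filled, flags, allowed=("raw",)):
    """Mask every value whose flag is not in `allowed` (e.g., to discard filled data)."""
    return [v if f in allowed else None for v, f in zip(filled, flags)]


if __name__ == "__main__":
    # Toy data: two 4-day "years"; None marks gaps.
    values = [2.0, None, 6.0, 8.0, None, None, None, 9.0]
    reference = [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, None, 4.5]
    doy = [1, 2, 3, 4, 1, 2, 3, 4]
    filled, flags = fill_gaps(values, reference, doy, max_interp_gap=1)
    print(filled)  # [2.0, 4.0, 6.0, 8.0, 3.0, 5.0, 6.0, 9.0]
    print(flags)
    # ['raw', 'interpolation', 'raw', 'raw', 'regression', 'regression', 'climatology', 'raw']
    print(keep_only(filled, flags))  # [2.0, None, 6.0, 8.0, None, None, None, 9.0]
```

Because the flags are stored alongside the values, a user can mask everything except raw observations (as `keep_only` does by default) and then apply a different filling method, which is the kind of workflow the flag tables are meant to support.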
Different large-sample hydrological datasets are suited to different
research questions (Addor et al., 2020). Compared with other
large-sample hydrological datasets, our dataset is intended to
facilitate hydrological and ecological studies that require a broad set
of hydrometeorological variables and time series data. For instance,
hydrological model development may benefit from including soil moisture,
snowmelt, and streamflow data in a multi-objective calibration function
(Andersen et al., 2001), while isotope data can support studies of
watershed storage and transport (Perrin et al., 2003; Sprenger et al.,
2018). Similarly, the CHOSEN dataset can support data-driven models that
predict variables beyond streamflow, making them more comprehensive than
discharge-only predictors. For the ecological community, these data
could help clarify how processes such as microbial biogeochemical
reactions and evapotranspiration respond to changing hydrological
regimes and climate variability. Beyond this release, future data
products could include physical catchment attributes such as topography
and soil characteristics. Such attributes could further support
comparative studies with both data-driven and distributed models,
potentially enabling prediction in ungauged basins and the transfer of
model parameters among catchments (Razavi & Coulibaly, 2013; Sivapalan
et al., 2003; Zelelew & Alfredsen, 2014).
While other large-sample hydrological datasets often contain more
watersheds than CHOSEN currently does, to the best of our knowledge,
CHOSEN represents one of the largest open-source collections of
hydrometeorological data from intensively and comprehensively monitored
watersheds. Just as the CAMELS initiative (Addor et al., 2017) has
encouraged other groups and countries to release similar data (e.g.,
CAMELS-UK (Coxon et al., 2020) and CAMELS-Chile (Alvarez-Garreton et
al., 2018)), we hope that this data product will encourage other groups
to publicly release comprehensive datasets that enrich hydrological
analysis.