1. Introduction
Hydrometeorological data are essential for decision making in water
resources engineering and management, and for developing predictive
models of how watersheds and ecosystems respond to perturbation.
Extreme-event analysis, flood mapping, and hydrological model building,
calibration, and validation all rely on hydrometeorological data (Borga
et al., 2011; Clark et al., 2008; Khan et al., 2011; Marchi et al.,
2010; Razavi & Coulibaly, 2013). Although input requirements differ
among models, most models could benefit from intensively measured
hydrometeorological data spanning diverse catchments. Notably, the
continued development of both data-driven models and physically based
distributed models requires comprehensive data for their execution and
validation (Andersen et al., 2001; Asong et al., 2020; Kumar et al.,
2008; Nord et al., 2017). Cross-site synthesis can also provide the core
knowledge needed to scale processes from hillslopes to the globe and thus
improve Earth system models (Fan et al., 2019).
Besides benefiting the development of hydrological and ecosystem models,
comprehensive catchment datasets could also improve site-specific and
comparative cross-site studies. Place-based studies, such as flood
prediction (Rozalis et al., 2010), dominant hydrological process
analysis (Schmocker-Fackel et al., 2007; Western et al., 2004), and
climate change impact investigations (Jha et al., 2004), are critical to
local decision making and hypothesis testing. For example, Tennant et
al. (2020) made use of multiple hydrometeorological variables to improve
the understanding of the dominant controls on catchment discharge.
In contrast, comparative hydrology aims to understand hydrological
variability and the role of catchment characteristics, and to develop
generally applicable models (Kuentz et al., 2017; Sawicz et al., 2011;
Wymore et al., 2017). For example, Wymore et al. (2017) studied
concentration-discharge relationships across 10 tropical watersheds with
different landscape characteristics. With the increasing interest in
comparative hydrology, demand for large-sample hydrological datasets has
grown (Gupta et al., 2014). Such large-sample hydrology datasets support
continental-scale hydrological studies, facilitate comparative
hydrological analysis, and help to identify hydrological patterns (Addor
et al., 2017; Duan et al., 2006). The comprehensive dataset presented in
this study, a synthesis of streamflow and hydrometeorology data across
intensively monitored catchments, will serve the hydrological research
community by providing quality-controlled, ready-to-use data with a
coordinated and standardized structure.
CHOSEN (Comprehensive Hydrologic Observatory SEnsor Network) is a
compilation of data from the Long-Term Ecological Research (LTER) and
Critical Zone Observatory (CZO) networks, and several other ecological
and hydrological observatories. Initiatives like the LTER and CZO
networks seek to create opportunities for analyses that span multiple
watersheds and ecosystem types. However, cross-network and cross-site
comparative efforts are often hampered by site-to-site differences in
which variables are measured, how they are processed and formatted, and
how they are reported. The work of finding diverse catchment datasets,
extracting them from whatever formats they are stored in, and cleaning
and harmonizing them requires a significant investment of time and
effort. CHOSEN aims to address these challenges by providing a
ready-to-use comprehensive hydrometeorological dataset, with an
accompanying open-access data processing pipeline allowing for the
incorporation of new data and the continued evolution of the dataset.
Several previous data synthesis efforts, including the MOPEX (Duan et
al., 2006) and CAMELS (Addor et al., 2017) datasets, have also sought to
facilitate large-sample hydrological studies. Compared with those
previous datasets, the CHOSEN dataset focuses strictly on intensively
monitored sites with field measurements that extend beyond just
discharge, precipitation, and weather, to include snow depth and snow
water equivalent (SWE), soil moisture, soil temperature, and isotope
data. Time series of these variables are critical to process-based
hydrological and ecological studies, such as process-oriented
benchmarking (Nearing et al., 2018) and the coupling of physical
process models with machine learning (Reichstein et al., 2019). Such
datasets can also assist in understanding the physically based
mechanisms underlying watershed behavior (Werkhoven et al., 2008) and
ecosystem resilience (Qi et al., 2016). In some catchments, soil
moisture patterns have been used to reveal the dynamics of water storage
and transport in the landscape (Bracken et al., 2013; James & Roulet,
2007; Tetzlaff et al., 2011). Snow data are essential in investigating
hydrological processes and simulating runoff in snow-dominated areas
(Rasmussen et al., 2011; Foy et al., 2015). Isotope time series facilitate the
tracing of water fluxes through watersheds (Hrachowitz et al., 2013).
Isotope data also move beyond treating basins as black boxes that convert
precipitation inputs to streamflow outputs: the age distribution of
water derived from these data provides information about storage
timescales within catchments (Kirchner et al., 2000; McDonnell et al.,
2010; Soulsby et al., 2006; Tetzlaff et al., 2014). By focusing on
intensively monitored catchments with more comprehensive data than just
discharge, precipitation, and weather, the CHOSEN dataset seeks to
facilitate the understanding of hydrological processes, development of
simulation models, and effective management of catchments and ecosystems
spanning diverse environmental conditions.