4. Summary and outlook

Hydrometeorological data are critical for model development and water resources management. In this work, we compiled hydrometeorological data from 30 intensively monitored watersheds. In addition to standard measurements of precipitation, air temperature, and streamflow, the CHOSEN dataset includes soil moisture, snow water equivalent (SWE), snow depth, isotopes, and other hydrometeorological variables. Most of the raw data were downloaded from publicly available sources, including the LTER and US CZO networks.
Since raw data often have errors, gaps, and inconsistent formats across sites, we applied quality control procedures, gap-filled missing data, and standardized the data format. The three-step gap-filling approach consisted of interpolation, regression, and climate catalog methods. We also generated flag tables denoting the different data types (raw, missing, or filled) and gap-filling methods. Data users can update or change the gap-filling techniques with the help of these flag tables. Finally, we published the synthesis product in NetCDF format along with Jupyter Notebook examples demonstrating the cleaning procedures and how to access the data.
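The three-step filling order and the accompanying flag table can be illustrated with a minimal sketch. This is not the code from the published notebooks: the flag codes, the function name `fill_gaps`, and its parameters `neighbor`, `period`, and `max_gap` are hypothetical, and the third step uses a simple same-position mean across periods as a stand-in for the climate catalog method rather than the method itself.

```python
# Hypothetical flag codes; the published flag tables may use a different coding.
RAW, MISSING, INTERP, REGRESS, CATALOG = 0, 1, 2, 3, 4


def fill_gaps(target, neighbor, period, max_gap=2):
    """Fill None entries in `target`; return (filled, flags) as two lists."""
    n = len(target)
    filled = list(target)
    flags = [MISSING if v is None else RAW for v in target]

    # Step 1: linear interpolation across interior gaps of at most max_gap steps.
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < n and filled[j] is None:
            j += 1  # the gap covers indices i .. j-1
        if i > 0 and j < n and j - i <= max_gap:
            left, right = filled[i - 1], filled[j]
            for k in range(i, j):
                frac = (k - i + 1) / (j - i + 1)
                filled[k] = left + frac * (right - left)
                flags[k] = INTERP
        i = j

    # Step 2: least-squares regression on a neighboring series (assumed input),
    # fitted only on time steps where both series have raw values.
    pairs = [(x, y) for x, y in zip(neighbor, target)
             if x is not None and y is not None]
    if len(pairs) >= 3:
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        if sxx > 0:
            slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
            intercept = my - slope * mx
            for k in range(n):
                if filled[k] is None and neighbor[k] is not None:
                    filled[k] = intercept + slope * neighbor[k]
                    flags[k] = REGRESS

    # Step 3: stand-in for the climate catalog step (NOT the paper's method):
    # mean of raw values at the same position in the other periods.
    for k in range(n):
        if filled[k] is None:
            same = [target[m] for m in range(k % period, n, period)
                    if target[m] is not None]
            if same:
                filled[k] = sum(same) / len(same)
                flags[k] = CATALOG
    return filled, flags


# Toy series: three "years" of four time steps each; None marks a gap.
target = [1.0, 2.0, None, 4.0, 1.0, None, None, None, 1.0, 2.0, 3.0, None]
neighbor = [2.0, 4.0, 6.0, 8.0, 2.0, 4.0, None, None, 2.0, 4.0, 6.0, None]

filled, flags = fill_gaps(target, neighbor, period=4, max_gap=1)
print(filled)
print(flags)

# The flags let a user discard every filled value and apply a different method.
raw_only = [v if f == RAW else None for v, f in zip(filled, flags)]
```

Because every filled value carries the code of the step that produced it, a user who prefers another technique can mask by flag, as in the last line, and refill only those entries.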
Different large-sample hydrological datasets are suited to different research questions (Addor et al., 2020). Compared with other large-sample hydrological datasets, our dataset seeks to facilitate hydrological and ecological studies that require a broad set of hydrometeorological variables and time series data. For instance, hydrological model development may benefit from including soil moisture, snowmelt, and streamflow data in a multi-objective calibration function (Andersen et al., 2001), while isotope data can support studies that focus on watershed storage estimates and transport (Perrin et al., 2003; Sprenger et al., 2018). Similarly, the CHOSEN dataset can promote data-driven models that predict variables beyond streamflow, making them more comprehensive than discharge predictors alone. For the ecological community, these data could be used to better understand how processes such as microbial biogeochemical reactions and evapotranspiration respond to changing hydrological regimes and climate variability. Beyond this data release, future data products could include catchment physical attributes such as watershed topography and soil characteristics. The availability of such attributes can further assist comparative studies using both data-driven and distributed models, potentially supporting prediction in ungauged basins and the transfer of model parameters among catchments (Razavi & Coulibaly, 2013; Sivapalan et al., 2003; Zelelew & Alfredsen, 2014).
While other large-sample hydrological datasets often contain more watersheds than CHOSEN currently does, to the best of our knowledge CHOSEN is one of the largest open-source collections of hydrometeorological data from intensively and comprehensively monitored watersheds. The CAMELS initiative (Addor et al., 2017) has encouraged the release of data from different groups and countries (CAMELS-UK (Coxon et al., 2020), CAMELS-Chile (Alvarez-Garreton et al., 2018)); similarly, we hope that this data product will encourage other groups to publicly release comprehensive datasets to enrich hydrological analysis.