Finding similarities between model parameters across different catchments has proved challenging, especially for ungauged catchments. Existing approaches struggle because of catchment heterogeneity and non-linear dynamics. In particular, attempts to correlate catchment attributes with hydrological responses have failed because of interdependencies among variables and the resulting equifinality. Machine Learning (ML), particularly the Long Short-Term Memory (LSTM) approach, has demonstrated strong predictive and spatial regionalization performance. However, understanding the nature of the regionalization relationships remains difficult. This study proposes a novel approach that partially decouples the representation learning of (a) catchment dynamics, using the HydroLSTM architecture, and (b) spatial regionalization relationships, using a Random Forest (RF) clustering approach to learn the relationships between catchment attributes and the dynamics. This coupled approach, called Regional HydroLSTM, generates a representation of “potential streamflow” using a single cell state, while the output gate corrects it given the temporal context of the hydrologic regime. RF clusters mediate the relationship between catchment attributes and dynamics, allowing the identification of spatially consistent hydrological regions and thereby providing insight into the factors driving spatial and temporal hydrological variability. Results suggest that combining the two complementary architectures can enhance the interpretability of regional machine learning models in hydrology, offering a new perspective on the “catchment classification” problem and potentially advancing streamflow prediction in ungauged basins. We conclude that an improved understanding of the underlying nature of hydrologic systems can be achieved by carefully designing ML architectures to target what we seek to learn from the data.
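The roles of a single cell state and an output gate can be illustrated with a generic one-cell LSTM. This is a minimal sketch under stated assumptions, not the HydroLSTM or Regional HydroLSTM implementation: it assumes a scalar cell state, the standard forget/input/output gates, and a hypothetical two-variable forcing (precipitation, potential evapotranspiration). The names `ScalarLSTMParams` and `run_single_cell`, the weights, and the forcing values are all illustrative, and the RF clustering stage is omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

GATES = ("f", "i", "g", "o")  # forget, input, candidate, output


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class ScalarLSTMParams:
    """Hypothetical weights for an LSTM with a single (scalar) cell state."""
    w: Dict[str, List[float]]  # gate -> weights on the current inputs
    u: Dict[str, float]        # gate -> weight on the previous output h_{t-1}
    b: Dict[str, float]        # gate -> bias


def run_single_cell(
    params: ScalarLSTMParams, inputs: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Run the cell over a forcing sequence; return (cell states, outputs)."""
    c, h = 0.0, 0.0
    cells: List[float] = []
    outputs: List[float] = []
    for x in inputs:
        # Pre-activation of each gate from current inputs and previous output.
        pre = {
            gate: sum(wk * xk for wk, xk in zip(params.w[gate], x))
            + params.u[gate] * h
            + params.b[gate]
            for gate in GATES
        }
        f = sigmoid(pre["f"])        # fraction of the stored state retained
        i = sigmoid(pre["i"])        # fraction of the candidate admitted
        cand = math.tanh(pre["g"])   # candidate update to the state
        o = sigmoid(pre["o"])        # output gate, always in (0, 1)
        c = f * c + i * cand         # single scalar state ("potential streamflow" analogue)
        h = o * math.tanh(c)         # output gate damps the squashed state
        cells.append(c)
        outputs.append(h)
    return cells, outputs


# Illustrative, made-up parameters and forcing as (precipitation, PET) pairs.
params = ScalarLSTMParams(
    w={"f": [0.0, 0.0], "i": [1.0, 0.0], "g": [1.0, -0.5], "o": [0.3, -0.3]},
    u={"f": 0.0, "i": 0.0, "g": 0.0, "o": 0.0},
    b={"f": 2.0, "i": 0.0, "g": 0.0, "o": 0.0},
)
forcing = [(5.0, 1.0), (0.0, 1.0), (0.0, 1.0), (10.0, 0.5)]
cells, flows = run_single_cell(params, forcing)
```

Because `o` lies strictly between 0 and 1, each output is a damped version of `tanh(c)`; this is the sense in which an output gate can modulate a stored state given the current inputs.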

Andrew Bennett

and 7 more

Integrated hydrologic models can simulate coupled surface and subsurface processes but are computationally expensive to run at high resolution over large domains. Here we develop a novel deep learning model to emulate continental-scale subsurface flows simulated by the integrated ParFlow-CLM model. We compare convolutional neural networks such as ResNet and UNet, run autoregressively, against our novel architecture, the Forced SpatioTemporal RNN (FSTR). The FSTR model separately encodes initial conditions, static parameters, and meteorological forcings, and fuses them in a recurrent loop to produce spatiotemporal predictions of groundwater. We evaluate the model architectures on their ability to reproduce 4D pressure heads, water table depths, and surface soil moisture over the contiguous US at 1 km resolution and daily time steps over a full water year. The FSTR model outperforms the baseline models, producing stable simulations that capture both seasonal and event-scale dynamics across a wide array of hydroclimatic regimes. The emulators provide a speedup of over 1000× compared with the original physical model, which will enable new capabilities for integrated hydrologic modeling, such as uncertainty quantification and data assimilation, that were not previously possible. Our results demonstrate the promise of specialized deep learning architectures like FSTR for emulating complex process-based models without sacrificing fidelity.
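The data flow of a forced recurrent emulator (separate encoders, recurrent fusion, step-by-step rollout) can be sketched in plain Python. This is a toy sketch under loudly stated assumptions, not the FSTR architecture: the spatial field is flattened to a 1-D list, each "encoder" is a fixed elementwise function rather than a learned network, a three-point neighbour average stands in for spatial convolution, and every function name (`encode_initial`, `encode_static`, `encode_forcing`, `fuse`, `decode`, `rollout`) and constant is hypothetical. It only shows where each input enters: the initial condition once, the static parameters once (reused at every step), and the forcings at every step.

```python
import math
from typing import List, Sequence

Grid = List[float]  # hypothetical 1-D stand-in for a 2-D/3-D spatial field


def encode_initial(state0: Sequence[float]) -> Grid:
    # Hypothetical fixed encoder; a trained model would learn this mapping.
    return [0.5 * s for s in state0]


def encode_static(params: Sequence[float]) -> Grid:
    # Encoded once and reused at every step (e.g., soil or terrain properties).
    return [math.tanh(p) for p in params]


def encode_forcing(forcing: Sequence[float]) -> Grid:
    # Encoded anew at each time step (e.g., daily meteorology).
    return [math.tanh(f) for f in forcing]


def fuse(hidden: Grid, static_feat: Grid, forcing_feat: Grid, keep: float = 0.8) -> Grid:
    """One recurrent update combining the previous state with static and forcing features."""
    n = len(hidden)
    new: Grid = []
    for k in range(n):
        left = hidden[k - 1] if k > 0 else hidden[k]       # edge cells reuse themselves
        right = hidden[k + 1] if k < n - 1 else hidden[k]
        lateral = (left + hidden[k] + right) / 3.0         # crude stand-in for convolution
        drive = math.tanh(static_feat[k] + forcing_feat[k])
        new.append(keep * lateral + (1.0 - keep) * drive)
    return new


def decode(hidden: Grid) -> Grid:
    # Map the hidden state back to the predicted variable (inverse of the 0.5 scaling).
    return [2.0 * h for h in hidden]


def rollout(state0: Sequence[float], static_params: Sequence[float],
            forcings: Sequence[Sequence[float]]) -> List[Grid]:
    hidden = encode_initial(state0)
    static_feat = encode_static(static_params)
    preds: List[Grid] = []
    for forcing in forcings:
        # Each step's state feeds the next step, driven by that step's forcing.
        hidden = fuse(hidden, static_feat, encode_forcing(forcing))
        preds.append(decode(hidden))
    return preds


# Made-up example: 4 cells, uniform initial state, 3 dry time steps.
preds = rollout([1.0] * 4, [0.2] * 4, [[0.0] * 4 for _ in range(3)])
```

In a trained emulator the encoders and the fusion step would be learned and the state would be multi-channel and three-dimensional; the loop structure is the part this sketch is meant to convey.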

Luis De la Fuente

and 2 more

A key step in model development is selection of an appropriate representational system, including both the representation of what is observed (the data) and the formal mathematical structure used to construct the input-state-output mapping. These choices are critical because they completely determine the questions we can ask, the nature of the analyses and inferences we can perform, and the answers that we can obtain. Accordingly, a representation that is suitable for one kind of investigation might be limited in its ability to support another. Arguably, it is poorly understood how different representational approaches affect what we can learn from data. This paper explores three complementary representational strategies as vehicles for understanding how catchment-scale hydrological processes vary across hydro-geo-climatologically diverse Chile. Specifically, we test a lumped water-balance model (GR4J), a data-based dynamical systems model (LSTM), and a data-based regression-tree model (Random Forest). Insights were obtained regarding system memory encoded in data, spatial transferability through the use of surrogate attributes, and informational deficiencies of the dataset that limit our ability to learn an adequate input-output relationship. As expected, each approach exhibits specific strengths, with LSTM providing the best characterization of dynamics, GR4J being the most robust under informationally deficient conditions, and RF being the most supportive of interpretation. Overall, the complementary nature of the three approaches suggests the value of adopting a multi-representational framework to extract information from the data more fully. Our results show that a multi-representational approach better supports the goals of prediction, understanding, and scientific discovery in hydrology.