Gregor Laaha

and 3 more

Environmental models typically rely on stationarity assumptions, which are often violated by process heterogeneity. This paper explores ways to incorporate process heterogeneity into statistical models to improve their performance. It considers problems from different disciplines in the environmental sciences and demonstrates the effects of process heterogeneity together with model extensions. The first problem addresses process heterogeneity in flood frequency analysis, where flood samples are generated by different processes in the catchment and the atmosphere. A mixture model that combines peak-over-threshold distributions of flood types into an overall distribution can deal with such process heterogeneity. Distributions differ greatly between event types, especially regarding tail heaviness, making this approach highly relevant for flood design. The second problem addresses frequency analysis of minimum flows, where heterogeneity arises from different generation processes in summer and winter. A mixed probability model for minima can adequately incorporate different seasonal distributions into an overall distribution, and a copula-based estimator can additionally incorporate event dependence. The performance gain of the mixed distribution approaches was found to be large, especially for extreme events with a high return period. The third problem addresses the role of process heterogeneity in rainfall models. We propose clustering multiple event characteristics derived from the rainfall series (such as duration and average intensity) to stratify the rainfall series. Partitioning using Gower's distance to include a lightning index enables us to discriminate between convective and stratiform events with different rainfall distributions. The method has the potential to improve the realism of rainfall generators. The fourth problem deals with parameter variation in temporal models of environmental variables. It uses daily streamflow series as an example and investigates the effects of process heterogeneity using a tree-based machine learning model. The prediction performance for mean and extreme values changes strongly depending on the quantile loss optimization, and the model parameters also show variation. The results suggest that different or combined models are needed for full time series in the presence of process heterogeneity. The study shows that process heterogeneity can be an obstacle in modeling and should be taken into account in an appropriate way from the very beginning of the analysis. The study should be seen as an encouragement to better understand the statistical assumptions in the models used and to enrich the physical knowledge included in environmental statistics.
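To make the idea of a mixed probability model for minima concrete, the following is a minimal sketch, not the authors' estimator. It assumes that summer and winter minima are independent (the copula-based variant mentioned above would relax exactly this assumption), so the annual minimum has the distribution F(x) = 1 - (1 - F_s(x))(1 - F_w(x)). The Weibull form and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def weibull_cdf(x, shape, scale):
    """CDF of a two-parameter Weibull distribution (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def mixed_min_cdf(x, summer, winter):
    """P(annual minimum <= x), assuming independent seasonal minima."""
    f_s = weibull_cdf(x, *summer)
    f_w = weibull_cdf(x, *winter)
    # The annual minimum exceeds x only if both seasonal minima exceed x.
    return 1.0 - (1.0 - f_s) * (1.0 - f_w)

def quantile(cdf, p, lo, hi, tol=1e-9):
    """Invert a monotone CDF on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical (shape, scale) parameters, not taken from the paper.
SUMMER = (2.0, 5.0)
WINTER = (3.5, 4.0)

def return_level(T):
    """Low flow with return period T years (non-exceedance probability 1/T)."""
    return quantile(lambda x: mixed_min_cdf(x, SUMMER, WINTER), 1.0 / T, 0.0, 50.0)
```

Under this assumption the mixed distribution always assigns at least as much probability to low values as either seasonal distribution alone, which is why ignoring one season can underestimate the severity of rare low-flow events.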
This study assesses the potential of a hierarchical space-time model for monthly low-flow prediction in Austria. The model decomposes the monthly low-flows into a mean field and a residual field, where the mean field estimates the seasonal low-flow regime augmented by a long-term trend component. We compare four statistical (learning) approaches for the mean field, and three geostatistical methods for the residual field. All model combinations are evaluated using a hydrologically diverse dataset of 260 stations in Austria, covering summer, winter, and mixed regimes. Model validation is performed by a nested 10-fold cross-validation. The best model for monthly low-flow prediction is a combination of a model-based boosting approach for the mean field and topkriging for the residual field. This model reaches a median R² of 0.73. Model performance is generally higher for stations with a winter regime (best model yields a median R² of 0.84) than for summer regimes (R² = 0.7), and lowest for the mixed regime type (R² = 0.68). The model appears especially valuable in headwater catchments, where the performance increases from 0.56 (median R² for a simple topkriging routine) to 0.67 for the best model combination. The favorable performance results from the hierarchical model structure, which effectively combines different types of information: average low-flow conditions estimated from climate and catchment characteristics, and information from adjacent catchments estimated by spatial correlation. The model is shown to provide robust estimates not only for moderate events, but also for extreme low-flow events, where predictions are adjusted based on synchronous local observations.
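The hierarchical structure described above (prediction = mean field + spatially interpolated residual) can be illustrated with a small sketch. This is not the model of the study: the mean-field value is passed in as a given number rather than estimated by boosting, and inverse-distance weighting is used as a simple stand-in for topkriging. The function names, coordinates, and values are hypothetical.

```python
import math

def idw_residual(target_xy, stations, power=2.0):
    """Interpolate residuals to target_xy by inverse-distance weighting.

    stations: list of ((x, y), residual) pairs, where residual is the
    observed low flow minus the mean-field estimate at that station.
    """
    num = 0.0
    den = 0.0
    for (x, y), resid in stations:
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        if d == 0.0:
            # Target coincides with a station: use its residual directly.
            return resid
        w = d ** -power
        num += w * resid
        den += w
    return num / den

def predict(mean_field_value, target_xy, stations):
    """Combine the mean-field estimate with the interpolated residual."""
    return mean_field_value + idw_residual(target_xy, stations)

# Hypothetical residuals for one month at two neighbouring stations.
stations = [((0.0, 0.0), 0.2), ((2.0, 0.0), -0.4)]
estimate = predict(1.5, (1.0, 0.0), stations)
```

The residual term is what lets synchronous observations at neighbouring stations adjust the prediction in a given month, while the mean field carries the information from climate and catchment characteristics.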