1.2 Artificial intelligence methods in the prediction of FDCs
With the development and maturity of data science and artificial
intelligence, the research focus of hydrological prediction models has
gradually shifted from process-driven to data-driven models
(Mohammadrezapour et al., 2019; Sharifi Garmdareh et al., 2018). A
data-driven model is based on the statistical properties of the data:
it does not consider the physical causes of runoff, and it directly
learns the relationship between the input and output of the model to
obtain hydrological prediction results. Machine learning models
typically exhibit a relatively complex model structure. By adjusting
parameters and conducting model training, the model can continuously
approach the optimal mapping relationship between the input and output,
and the predicted results usually have high accuracy. However, due to
the limitations of the “black box”, decision-makers cannot directly
know how machine learning models arrive at their results (Cortez and
Embrechts, 2013). The “black box” nature of machine learning models
simplifies model input and training, but the prediction results lack
practical physical significance, and the model cannot explain its
predictions in terms of the causes and mechanisms of runoff
formation, resulting in low credibility in practical prediction work.
Nevertheless, machine learning methods are widely used in hydrology
(Khan et al., 2016; Khan et al., 2019) because they have proven
remarkably effective when applied to real-world problems (Shen, 2018).
Hydrological systems are complex and cannot be easily represented by
simple conceptual relationships between variables, and the relationship
between watershed characteristics and hydrological characteristics is
nonlinear. As a result, traditional methods lack sufficient ability to
predict FDCs, while artificial intelligence models show application
potential (Nearing and Gupta, 2015).
In a study of 33 watersheds, SVM, ANN, and nonlinear regression (NLR)
models were used to predict flows of different durations (output
variables) from six selected basin characteristics (input variables).
The results indicated that the SVM-based regression (SVR) was the most
suitable model for estimating the FDC (Vafakhah and Khosrobeigi
Bozchaloei, 2020). A multi-output neural network model was developed to
predict the FDC of 9203 ungauged areas in the southeastern United
States over the 60-year period from 1950 to 2009. The results suggested
that, compared with single-output neural network models, multi-output
neural networks are capable of learning the monotonic relationships
between adjacent quantiles and yield better predictions (Worland et
al., 2019).
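To make the predicted quantities concrete, the following minimal sketch shows one way to derive flow quantiles of an FDC from a daily flow series and to check the monotonic relationship between adjacent quantiles. The Weibull plotting position, the chosen exceedance probabilities, the flow values, and the function names are illustrative assumptions, not the procedure of the cited studies.

```python
def flow_duration_curve(flows):
    """Return (exceedance probability, flow) pairs, flows sorted descending."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position p = rank / (n + 1) (an assumed convention).
    return [((i + 1) / (n + 1), q) for i, q in enumerate(ranked)]


def flow_quantile(flows, exceedance):
    """Flow exceeded with probability `exceedance`, by linear interpolation."""
    curve = flow_duration_curve(flows)
    # Outside the sampled probability range, fall back to the extreme flows.
    if exceedance <= curve[0][0]:
        return curve[0][1]
    if exceedance >= curve[-1][0]:
        return curve[-1][1]
    for (p0, q0), (p1, q1) in zip(curve, curve[1:]):
        if p0 <= exceedance <= p1:
            w = (exceedance - p0) / (p1 - p0)
            return q0 + w * (q1 - q0)


def is_monotonic(quantiles):
    """True if flow never increases as exceedance probability increases."""
    return all(a >= b for a, b in zip(quantiles, quantiles[1:]))


# Made-up daily flows, for illustration only.
daily_flows = [12.0, 3.5, 7.2, 50.1, 1.1, 9.8, 4.4, 22.3, 2.0]
probs = [0.1, 0.5, 0.9]  # exceedance probabilities of interest
quantiles = [flow_quantile(daily_flows, p) for p in probs]
print(quantiles, is_monotonic(quantiles))
```

Quantiles taken from one observed series are monotonic by construction. Quantiles predicted independently by separate single-output models carry no such guarantee, so a check like `is_monotonic` is one simple way to flag physically inconsistent predictions.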
Machine learning (ML) has demonstrated outstanding performance in
forecasting FDC and is extensively utilized for prediction (Ley et al.,
2023; Vaheddoost et al., 2023). Existing research has primarily
concentrated on improving FDC prediction accuracy with a single ML
model while neglecting the impact of its influencing factors, and the
accuracy of traditional prediction methods is relatively low. Moreover,
few studies have comprehensively compared multiple machine learning
algorithms or conducted regionalization research on FDC prediction
based on geographical and climatic characteristics. Explainable machine
learning (e.g., SHAP) is a rapidly developing subfield aimed at
understanding how models use inputs for prediction and at addressing
the black-box problem (Kim, 2017). Thus, the main issues studied in
this paper include (see Figure 1):
[Insert Figure 1]
Figure 1 Framework of the prediction and inference of FDC using ML
This paper utilizes a total of 645 sets of samples, made up of 22 basin
characteristic variables (including “mutable” and “immutable”
variables) over 30 years from 244 hydrometric stations located in the
middle and lower reaches of the Yangtze River basin. Using typical
characteristics of the basin, regional FDC models were established
through machine learning methods, and the performance of these methods
was compared to determine the most suitable model for predicting the
FDC. First, 22 basin characteristics were selected and divided into
mutable and immutable variables, together with 15 corresponding FDC
quantiles. Second, a database of basin characteristic variables and
flow quantiles was established, and eight typical ML models were used
to study the nonlinear relationship between the input parameters (basin
characteristics) and the fifteen flow quantiles that determine the
shape of the FDC. Each quantile was predicted, and Taylor diagrams were
applied to compare the ML models and select the best one for estimating
the FDC. Finally, the influence of each input variable on the fifteen
quantiles was quantified using SHAP to identify the key influencing
factors. How these important hydrological factors affect the results
was also discussed.
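The attribution idea behind SHAP can be illustrated with a small sketch that computes exact Shapley values by enumerating every feature coalition. This is not the SHAP library and not the implementation used in this paper: the toy model, the three feature names (area, precipitation, slope), the input values, and the use of a fixed baseline to represent "absent" features are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`."""
    n = len(x)

    def value(coalition):
        # Features outside the coalition are set to their baseline value
        # (an assumed way of representing a "missing" feature).
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical flow-quantile model with one interaction term."""
    area, precipitation, slope = features
    return 0.5 * area + 2.0 * precipitation + area * slope


x = [4.0, 3.0, 1.0]         # hypothetical basin to explain
baseline = [2.0, 1.0, 0.0]  # hypothetical reference basin
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and the prediction at `baseline`, which is the additivity property that makes SHAP-style rankings of influencing factors interpretable. Exact enumeration scales exponentially with the number of features, so with 22 basin characteristics an approximation method would be needed in practice.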