2.4.4 Extreme learning machine (ELM)
ELM is a feedforward neural network with a single hidden layer (Huang et al., 2004), which differs from the BP algorithm in that it does not adjust the hidden-layer weights through reverse iteration. Instead, the weights connecting the input layer to the hidden layer are assigned randomly (Huang et al., 2006) (see Figure 5 (d)). In the BP algorithm, the weights between all layers must be solved iteratively with gradient descent, whereas in ELM, for a given input feature vector, the weights between the input layer and the hidden layer are fixed without iteration and only the output-layer weights need to be solved.
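The one-shot weight solution can be sketched in a few lines of NumPy. This is an illustrative toy fit, not the configuration used in this study: the sigmoid activation, hidden-layer size, and sine target are all assumptions. The input-to-hidden weights are drawn at random and never updated; only the output weights are solved, via the Moore-Penrose pseudoinverse rather than gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn y = sin(x) from noisy samples (illustrative only).
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=200)

n_hidden = 50
# Step 1: input-to-hidden weights and biases are random and never updated.
W = rng.normal(size=(X.shape[1], n_hidden))
b = rng.normal(size=n_hidden)

# Step 2: hidden-layer activations (sigmoid).
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))

# Step 3: output weights solved in one shot by least squares
# (Moore-Penrose pseudoinverse), with no gradient-descent iterations.
beta = np.linalg.pinv(H) @ y

y_hat = H @ beta
train_rmse = np.sqrt(np.mean((y - y_hat) ** 2))
```

Because the hidden layer is fixed, training cost is a single linear solve, which is the source of ELM's speed advantage over BP.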

2.4.5 Extreme gradient boosting (XGB)

The extreme gradient boosting (XGB) algorithm reduces the error of the previous prediction step by continuously generating new regression trees, gradually narrowing the gap between the true and predicted values and thereby improving prediction accuracy (Chen and Guestrin, 2016) (see Figure 5 (e)). It improves the generalization ability and computational efficiency of the model by introducing regularization terms and parallel computing techniques into the original gradient boosting decision tree (GBDT) algorithm.
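The residual-fitting idea behind XGB (and gradient boosting generally) can be illustrated with a deliberately minimal boosting loop. The sketch below uses depth-one regression trees (stumps), a fixed learning rate, and a synthetic target, none of which reflect the settings of this study, and it omits XGB's regularization terms and parallelization; it only shows how each new tree shrinks the residual of the ensemble built so far.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression tree (stump) on one feature, squared error."""
    best = None
    for threshold in np.unique(x):
        left = residual[x <= threshold]
        right = residual[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= threshold, left.mean(), right.mean())
        sse = np.sum((residual - pred) ** 2)
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]  # (threshold, left_value, right_value)

def predict_stump(stump, x):
    t, lv, rv = stump
    return np.where(x <= t, lv, rv)

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + 0.1 * rng.normal(size=300)

# Boosting: each new tree fits the residual of the current ensemble, shrunk by
# a learning rate, so the gap between truth and prediction narrows step by step.
learning_rate = 0.3
prediction = np.full_like(y, y.mean())
stumps = []
for _ in range(100):
    stump = fit_stump(x, y - prediction)
    prediction += learning_rate * predict_stump(stump, x)
    stumps.append(stump)

rmse = np.sqrt(np.mean((y - prediction) ** 2))
```

In practice the `xgboost` library handles tree construction, regularization, and parallelism internally; the loop above is only the conceptual core.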

2.4.6 Radial-Basis Function (RBF)

The Radial-Basis Function (RBF) network can approximate any nonlinear function and cope with systems whose regularity is difficult to analyze (Majnooni et al., 2023) (see Figure 5 (f)). Compared with BP, RBF has only one hidden layer, whereas BP places no limit on the number of hidden layers. BP is a global approximation of a nonlinear mapping, while RBF is a local approximation, which gives it a faster training speed.
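Because the output layer of an RBF network is linear in the basis-function activations, training reduces to a linear least-squares solve once the centres and widths are fixed, which is where the speed advantage over BP comes from. In the sketch below, the evenly spaced centres, fixed Gaussian width, and synthetic target are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.tanh(X[:, 0]) + 0.05 * rng.normal(size=200)

# Centres spread over the input range; one Gaussian basis function per centre.
centres = np.linspace(-3, 3, 20).reshape(-1, 1)
width = 0.5

def design_matrix(X):
    # phi_ij = exp(-||x_i - c_j||^2 / (2 * width^2)): each basis function
    # responds only locally, near its own centre.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

# Single hidden layer: output weights follow from linear least squares.
Phi = design_matrix(X)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

rmse = np.sqrt(np.mean((Phi @ w - y) ** 2))
```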
[Insert Figure 5]
Figure 5 Principles of the six machine learning algorithms

2.5 Swarm intelligence optimization algorithm

Although BP neural networks can quickly adapt to various problems, they are prone to falling into local optima and overfitting, which affects the prediction results, whereas swarm intelligence optimization algorithms have demonstrated their effectiveness in solving complex optimization problems (Cai et al., 2018). Therefore, we apply swarm intelligence optimization algorithms to seek both globally and locally optimal solutions. These algorithms generate initial populations and use heuristic rules to carry out the search, reducing the risk of overfitting by optimizing the weight parameters and thereby improving the accuracy of FDC prediction.

2.5.1 Particle swarm optimization and grey wolf optimizer algorithm

The particle swarm optimization (PSO) algorithm originates from models of the social behavior of animal groups searching for food (Poli et al., 2007), in which birds are regarded as particles and all particle information within the group is shared to find the optimal strategy (see Figure 6 (a)). Assume a fixed search area limited to a d-dimensional space containing n particles. During the iteration process, particles track the "individual extremum" and the "global extremum" by changing their own velocity and position. The velocity and position of each particle are updated as follows:

$$v_i^{t+1} = \omega v_i^{t} + c_1 r_1 \left(p_i^{t} - x_i^{t}\right) + c_2 r_2 \left(g^{t} - x_i^{t}\right)$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1}$$

where $v_i^{t}$ and $v_i^{t+1}$ represent the velocities of particle $i$ at iterations $t$ and $t+1$, having both magnitude and direction; similarly, $x_i^{t}$ and $x_i^{t+1}$ are the positions of that particle; $p_i^{t}$ is the individual extremum and $g^{t}$ is the global extremum, i.e., the best solution found so far by the particle swarm; $c_1$ and $c_2$ denote the acceleration coefficients; $r_1$ and $r_2$ are two uniformly distributed random numbers between 0 and 1; and $\omega$ is the inertia weight.
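The velocity and position updates translate directly into code. In the following minimal sketch, the sphere test function, swarm size, and coefficient values are illustrative assumptions, not the configuration used in this study; each particle tracks its individual extremum while the swarm shares a global extremum.

```python
import numpy as np

rng = np.random.default_rng(3)

def sphere(x):
    return np.sum(x ** 2, axis=-1)  # minimum 0 at the origin

n_particles, dim = 30, 5
w, c1, c2 = 0.7, 1.5, 1.5            # inertia weight, acceleration coefficients

x = rng.uniform(-5, 5, (n_particles, dim))   # positions
v = np.zeros((n_particles, dim))             # velocities
pbest = x.copy()                             # individual extremum
pbest_val = sphere(x)
gbest = pbest[np.argmin(pbest_val)]          # global extremum

for _ in range(200):
    r1 = rng.random((n_particles, dim))
    r2 = rng.random((n_particles, dim))
    # Velocity update: inertia + pull toward individual and global extrema.
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x = x + v
    val = sphere(x)
    improved = val < pbest_val
    pbest[improved] = x[improved]
    pbest_val[improved] = val[improved]
    gbest = pbest[np.argmin(pbest_val)]

best = sphere(gbest)
```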
The Grey Wolf Optimizer (GWO) boasts several advantages, including simplicity in structure and parameters and the capability of adaptive adjustment. These features make it an effective tool for balancing local and global optimization (Dehghani et al., 2019; Seifi and Soroush, 2020). It includes three main stages: finding, surrounding, and attacking prey (Mirjalili et al., 2014). Figure 6 (b) shows the working structure. The α, β, and δ wolves and the remaining ω wolves form the social hierarchy, and the level of dominance descends from α to ω, meaning that α is the most powerful wolf category, with the α, β, and δ wolves guiding the remaining wolves toward their targets. GWO can bypass local optimal stagnation and enhances the ability to converge to the global optimum (Adnan et al., 2023). The hunting behavior of grey wolves encircling their prey is quantified using the following position update formulas (Maroufpoor et al., 2020; Zhou et al., 2021).
$$D = \left|C \cdot X_p(t) - X(t)\right|$$

$$X(t+1) = X_p(t) - A \cdot D$$

where $X_p(t)$ and $X(t)$ represent the positions of the prey and the grey wolf, respectively; $t$ represents the current iteration number; $D$ represents the distance between the wolf and its prey; $X(t+1)$ is the wolf's position at iteration $t+1$; and $C$ and $A$ are coefficient vectors. The first equation gives the distance between the grey wolf and its prey, and the second updates the grey wolf's position.
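The encircling behavior can be sketched as follows, with each wolf's new position taken as the average of the moves guided by the α, β, and δ wolves. The sphere test function, population size, and the linear decay of the control parameter a from 2 to 0 are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

rng = np.random.default_rng(4)

def sphere(x):
    return np.sum(x ** 2, axis=-1)  # minimum 0 at the origin

n_wolves, dim, n_iter = 20, 5, 200
X = rng.uniform(-5, 5, (n_wolves, dim))

for t in range(n_iter):
    fitness = sphere(X)
    order = np.argsort(fitness)
    alpha, beta, delta = X[order[0]], X[order[1]], X[order[2]]  # leading wolves

    a = 2 - 2 * t / n_iter  # a decays linearly from 2 to 0 over the iterations
    new_X = np.zeros_like(X)
    for leader in (alpha, beta, delta):
        A = 2 * a * rng.random((n_wolves, dim)) - a   # coefficient vector A
        C = 2 * rng.random((n_wolves, dim))           # coefficient vector C
        D = np.abs(C * leader - X)                    # distance to the leader
        new_X += leader - A * D                       # move pulled toward leader
    X = new_X / 3.0  # average of the three guided moves

best = sphere(X[np.argmin(sphere(X))])
```

While |A| > 1 the wolves are pushed away from the leaders (exploration); as a shrinks, |A| < 1 dominates and the pack closes in on the prey (exploitation).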
[Insert Figure 6]
Figure 6 The principles of swarm intelligence optimization algorithm: PSO and GWO

2.6 Evaluation of model performance

The performance evaluation criteria, including R-squared (R²), root mean squared error (RMSE), and Nash-Sutcliffe Efficiency (NSE) (Nash and Sutcliffe, 1970), were used to evaluate the ML performance:

$$R^2 = \left[\frac{\sum_{i=1}^{n}\left(Q_{o,i}-\bar{Q}_{o}\right)\left(Q_{p,i}-\bar{Q}_{p}\right)}{\sqrt{\sum_{i=1}^{n}\left(Q_{o,i}-\bar{Q}_{o}\right)^2}\sqrt{\sum_{i=1}^{n}\left(Q_{p,i}-\bar{Q}_{p}\right)^2}}\right]^2$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{p,i}-Q_{o,i}\right)^2}$$

$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(Q_{o,i}-Q_{p,i}\right)^2}{\sum_{i=1}^{n}\left(Q_{o,i}-\bar{Q}_{o}\right)^2}$$

where $Q_{p,i}$ and $Q_{o,i}$ are the predicted and observed values; $\bar{Q}_{o}$ and $\bar{Q}_{p}$ are the average values of the observed and predicted series; and $n$ represents the sample number.
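The three criteria are straightforward to compute; a minimal NumPy sketch follows, with R² taken as the squared Pearson correlation and a small synthetic observed/predicted pair used purely for illustration.

```python
import numpy as np

def r2(obs, pred):
    # Squared Pearson correlation between observed and predicted values.
    return np.corrcoef(obs, pred)[0, 1] ** 2

def rmse(obs, pred):
    return np.sqrt(np.mean((obs - pred) ** 2))

def nse(obs, pred):
    # Nash-Sutcliffe Efficiency: 1 is a perfect fit; 0 means the model
    # is no better than predicting the observed mean.
    return 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Illustrative values, not data from this study.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
```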

2.7 Explanation of ML’s output

SHapley Additive exPlanations (SHAP) is based on the Shapley value in game theory; it takes into account the mutual influence between all features, providing more accurate and comprehensive explanations. For each prediction sample, the model generates a corresponding prediction value, and the SHAP value represents the contribution of each feature to that prediction (Lundberg and Lee, 2017). The Shapley value is interpreted as an additive feature attribution method.
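The additive property can be demonstrated by computing exact Shapley values for a tiny model via brute-force coalition enumeration (feasible only for a handful of features; SHAP's practical estimators avoid this exponential cost). The linear model, weights, and background point below are illustrative assumptions; for a linear model with this baseline the Shapley value of feature i reduces to w_i (x_i − background_i), and the attributions sum to the difference between the prediction and the baseline.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley values: average marginal contribution of each feature
    over all coalitions; absent features take their background value."""
    n = len(x)
    phi = np.zeros(n)

    def value(subset):
        z = background.copy()
        for j in subset:
            z[j] = x[j]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# Hypothetical 3-feature linear model (weights and points are illustrative).
w = np.array([2.0, -1.0, 0.5])
f = lambda z: float(w @ z)
x = np.array([1.0, 3.0, -2.0])
background = np.array([0.5, 1.0, 0.0])

phi = shapley_values(f, x, background)
# Additivity: phi sums to f(x) - f(background).
```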

3 DATABASE DESCRIPTION AND ANALYSIS

3.1 Selection of input variables

Although multiple studies have addressed the prediction of flow rates in regions without historical flow data via the FDC, the accuracy of such predictions and their applicability across regions remain unsettled. Therefore, 224 × 3 sets of test data were used in this paper. The basin characteristics selected as input variables (Table 1) are divided into two categories: mutable variables and immutable variables. Characteristics that may change over time (such as the aridity index (AI), precipitation, etc.) are referred to as "mutable" variables. Characteristics that are fixed over time, for example elevation, drainage area, reservoir control area, average slope (dimensionless), baseflow index (dimensionless), and topographic wetness index (dimensionless), are regarded as "immutable" variables. The impact of basin characteristics on the FDC appears to be empirical (Elena Ridolf et al., 2020). By assessing the value of each parameter, 15 unique corresponding flow percentile values can be obtained.
Table 1 Basin characteristic parameters selected for basins of hydrometric stations
[Insert Table 1]

3.2 Correlation and description of basin characteristics

The results judged by the Spearman correlation coefficients (see Figure 7) show that BFI_Mean has a significant correlation with the flow data at the 15 percentiles corresponding to the exceedance probabilities. Smax, AP_max, DP_std, ATP, and SWS all exhibit a negative correlation with the streamflow percentiles. As shown in Figure 7 (c), the relationship between DA and the streamflow percentile changes from negative to positive as the percentile rises (i.e., as the flow rate and the return period decrease), while the relationship between wind and the flow percentile is reversed. Moreover, BFI_Mean correlates better with low flows (the bottom part of the FDC), whereas some other variables correlate more strongly with high flows.
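Spearman's coefficient is the Pearson correlation of rank-transformed data; in practice a library routine such as scipy.stats.spearmanr would be used, but the self-contained sketch below (with a small hypothetical flow/BFI sample) makes the computation explicit.

```python
import numpy as np

def rank(a):
    # 1-based ranks, with ties assigned the average (mid) rank.
    order = np.argsort(a, kind="stable")
    ranks = np.empty(len(a))
    ranks[order] = np.arange(1, len(a) + 1)
    for v in np.unique(a):
        mask = a == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the rank-transformed data,
    # so it captures any monotonic (not just linear) association.
    rx, ry = rank(x), rank(y)
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical sample: flow percentile values and baseflow index values.
flow = np.array([12.0, 30.0, 7.0, 45.0, 22.0])
bfi = np.array([0.30, 0.55, 0.20, 0.70, 0.45])
rho = spearman(flow, bfi)
```

Here bfi increases monotonically with flow, so rho equals 1 even though the relationship need not be linear.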
[Insert Figure 7]
Figure 7 Correlation of input and output variables. (a) Correlation between the selected characteristic variables and Q0.3. (b) Correlation between all the variables. (c) Correlation between the selected characteristic variables and the flow rates at the 15 streamflow percentiles corresponding to exceedance probabilities.
[Insert Figure 8]
Figure 8 Distribution of the seven variables with high correlation with Q0.3
As shown in Figure 8, a scatter matrix is further used to display the distribution features of these input variables. Considering the large size of the full matrix, only a small portion is shown here, selectively analyzing the seven variables (BFI_mean, Smax, ATP, DP_std, AP_max, SWS) with high correlation with Q0.3 in Figure 7. The frequency distributions of these input variables appear on the diagonal. The upper-right section portrays the scatter plots of the input basin characteristics; it can be observed that ATP and DP_std follow an approximately linear relationship. Nevertheless, the variables tend toward nonlinear distributions, so simple linear and nonlinear regression models cannot precisely describe the distribution features of the input variables and their impacts on the output variables. The lower-left section displays the probability density of the distributions, with darker colors indicating a higher probability of data occurring there. For instance, ATP is concentrated near 500 when BFI_mean takes the value 0.5. To fully evaluate these impacts, SHAP was utilized to increase the interpretability of the ML models and disclose the underlying mechanism in Section 4.4.

4 RESULTS AND DISCUSSIONS

4.1 Parametric analysis

[Insert Figure 9]
Figure 9