4. Discussion

In the present study, the performance of different models in predicting spatially biased occurrence data was evaluated. For this purpose, modeling was performed based four SDMs. A variety of correlative distribution models have been emerged in recent decades and their performance has been compared in numerous studies (Elith and Graham, 2009; Merow et al., 2014). These methods, however, are challenged by a number of methodological problems that make their comparison controversial. Among these issues are providing a balance between goodness-of-fit and model complexity (Araújo et al., 2019; Warren and Seifert, 2011), and the spatial bias of the input data and manipulating them for evaluating model performance (Hijmans, 2012; Phillips et al., 2009). Generally, for most SDMs, particularly for complex machine learning ones, using a set of default parameters have been recommended based on a comprehensive model tuning [for example see Phillips & Dudík (2008) for the MaxEnt and Elith et al., (2008) for boosted regression trees]. However, a blindfold utilization of them may come up with a poorly performing model (Muscarella et al., 2014). Equally important, is the probable SAC in the input occurrence points which may result in an inflated score of the metrics used for model valuation (Dormann et al., 2007; Radosavljevic and Anderson, 2014). Another problem is the paucity of operative procedures for tuning SDMs that are efficiently effortless and time-saving. Consequently, in most SDM efforts, while the species, scale, and area of interest are different, the modeling relies on default setting of the related model.
In the current research, we employed a novel approach that in one hand applyed model-specific parameterization, and on the other hand, manipulated a spatially imblanced-biased input data to cope with the above-mentioned issues. In general, the results showed that the models have high predictive performance based on AUC and TSS values. However, the geographical pattern of the predicted suitable habitats among models was different. For example, the area of suitable habitats was greater in GLM in comparison with other SDMs. Accordingly, for this model lower values of specificity were observed across the different gradients of suitability thresholds (red curves in Fig. 3). This is intrinsically due to the simple regression-based nature of this model in which a basic assumption is the normality of the error distribution and its constant variance (Osborne and Waters, 2002). Thus, those data that are not used as training are well-fitted in the predicted model. On the other hand, this makes GLM prone to over-prediction across non-sampled areas leading to lower specificity, i.e., true negative rate, of this model.
On the contrary, we found the best training sensitivity and specificity for decision-tree-based methods, i.e., GBM and RF. These models, due to the automatic promotion caused by the model learner, attempt to improve task classification, as much as possible, leading to the highest discrimination capacity of training presence/absence dataset (De’ath and Fabricius, 2000; Friedman, 2017). In our case, results indicated that the RF models for both random and background weighting procedures, while excellently classified the training dataset, failed to predict out-of-bag ones. Fundamentally, RF and GBM depend on bagging and boosting algorithms of the tree learner, respectively (Elith et al., 2008; Friedman, 2017). In RF the bagging algorithm, also called bootstrap aggregation, allows the tree learner repeatedly select a random sample with replacement of the training set and fits trees to these samples (Breiman, 2001). The trees in RF are run in parallel and there is no interaction between them while the trees are built. Once all the trees are built, then an average is taken across all the trees’ predictions (Cutler et al., 2012). Conversely, the trees in boosting algorithms, e.g. GBM, are trained sequentially, and accordingly, weaker results are boosted or reweighted over many iterations to have the learner focus more on areas it got wrong, and less on those observations that were correct (Elith et al., 2008; Friedman, 2017). The risk of deeply-grown trees in random forests comes at the expense of overfitting the training data set in which predictions have low bias but high variances. In our modeling approach, we grew the RF model to 1000 trees which is reasonable and comparable to the similar SDM efforts (Breiman, 2001; Shabani et al., 2016). Altough based on a cross-validated splitting of the training data we fine-tuned the RF model, it appears that this model still suffers from high variance. In fact, the strict classification algorithm of the RF model is prone to return low capabilities in predicting out-of-bag dataset (Elith et al., 2008; Merow et al., 2014). Accordingly, the use of this method in extrapolating predictions should be taken into account prudently.
Given the random background approach, our results highlight the excellent predictive performance of the GBM model in which true positive rate, i.e. sensitivity, and true negative rate, i.e. specificity, remained well-justified across accumulative thresholds. In contrast to the paralleled bagging method of the RF, it is believed that the boosting manner of the tree learner in GBM reduce the probability of overfitting and allows well-classifying out-of-bag samples (Elith et al., 2008; Shabani et al., 2016). However, our study revealed that fitting a GBM model based on a background weighting approach significantly reduces the predictive performance of this model for identifying out-of-bag data. This highlights the inefficiency of this method for being used in SDM efforts with imbalanced-biased data where the primary goal is finding probability distribution over areas that are not comparably sampled.
Unlike the decision-tree-based methods, MaxEnt resulted in a comparable prediction based on both random and background weighting approaches. This was similar to the results of the GLM model except that for the latter true negative rates were misplaced at lower thresholds. As mentioned before, the normally-distributed errors and no trends in residuals relative to the fitted values (Hardin et al., 2007) allow the GLM to be interpretably efficient for predicting out-of-bag data. In MaxEnt, as a density estimator algorithm, the species distribution is represented by a probability distribution that is closest to uniform (Phillips et al., 2006). This probability distribution is bounded by a set of constraints that are simple functions of the explanatory variables, called ‘features’, and derived from the species occurrence locations. The primary assumption of the MaxEnt model is that the mean of each feature is required to be close (within same error bounds) to the empirical average over the presence sites (Phillips and Dudík, 2008). From a general point of view, this constraint in MaxEnt could be assumed equivalent to the consistency of error variance in the GLM method, and as a consequence, bringing consistent results for out-of-bag data is also expectable in the MaxEnt model. In both GLM and MaxEnt models a maximum likelihood is used to estimate a parametric exponential distribution of linear combination of features (Phillips et al. 2004). Although GLM could be fitted by considering quadratic and interactive terms of the explanatory variables, more variation of feature types in MaxEnt allows fitting more complex models (Phillips et al. 2006). More importantly, while GBM and RF as complex machine learning methods are more prone to overfitting, the regularization multiplier in MaxEnt prevents the model to match the input data too closely. Altogether, our results highlight the efficiency of the MaxEnt model in using the benefits of different modeling methods to produce results that are both predictable (extrapolative) and complex (interpolative).
Asymmetry in spatially biased model predictions also highlights the need to evaluate model performance using threshold-dependent sensitivity (true positive rate) and specificity (true negative rate) in addition to threshold-independent AUC. As criticized by Lobo et al. (2008) and Jiménez‐Valverde (2012), modeling goals and setting highly influence the appropriateness of the AUC for measuring the performance of a model. AUC inflates the number of false absence data (Lobo et al., 2008), and accordingly, over-represents predictive performance for rare species (Phillips et al., 2009; Stolar and Nielsen, 2015), as in the case ofMontivipera species. Moreover, being only as a discrimination measure, AUC doesn’t show goodness-of-fit, i.e. classification accuracy of the model, and consequently, a model with high AUC value is not necessarily a well-fitted one (Jiménez‐Valverde, 2012). For threshold dependent measures a critical trick is selecting the best suitability threshold at which the sensitivity and specificity of the resulted model are well-balanced. Although several thresholds have been suggested to do this, for example see Liu et al. (2005) as a review, a single suitability threshold provides only a cross-section of the model performance and doesn’t provide a comprehensive perception of the classification accuracy of the model across a gradient of suitability threshold. For example, the threshold at which the sensitivity is equal to the specificity, i.e., where their curves cross on the Fig. 3, is among the widely used suitability thresholds, nevertheless, the corresponding accuracy measure doesn’t inclusively specify the performance of the model to classify presence and background data.
There are several ways to correct sampling bias, some of which cannot be used in cases where data is scarce. Spatial filtering may not be helpful when there are only a few presence points (Phillips et al., 2009). Decreasing clumpiness reduces training sample size and, depending on the heterogeneity of the surrounding environment and the selected spatial resolution, it may drop some of the information on the species occupation sites. In addition to spatial filtering and background weighting, a third method called model-based bias correction has been used (El‐Gabbas and Dormann, 2018a) to address spatial bias in occurrence data. In this method other environmental variables, used as bias covariates, characterize potential sources of sampling bias. Although this method is confirmed to be useful when dealing with sparse datasets (El‐Gabbas and Dormann, 2018b), it is highly dependent on the selected bias variable that, in turn, intensifies upstream uncertainty caused by assisting covariates.
It is worth mentioning that bias adjustment and model parameterization depending on the prevalence of the target species result in varying spatial predictions (Araújo et al., 2019; Pottier et al., 2013). This variation is most noticeable for common species, and thus, distribution models of rare species that are habitat specialists may not be very sensitive to spatially-biased occurrence data (Stolar and Nielsen, 2015). Being limited to a narrow gradient of environmental conditions, specialist species are thus more predictable as well as more distinguishable, i.e. high values of AUC of their SDMs, because of the high distinctiveness between their occurrence points and background space. In our case, this tendency was more obvious where narrow-ranged mountain vipers (Ahmadi et al., 2019) obtained high scores of AUC and TSS. This, to a high extend, justify our SDM approach where, due to the sparse data of the mountain vipers, their occurrence points were pooled into one set. Since they belong to distinct taxonomic levels, e.g. species or sub-species levels, the resulted SDMs might challenge niche equilibrium assumption (Wiens et al., 2009) and be prone to a inflated niche breadth (Pearman et al., 2010) where the resulted distribution models show higher levels of over-estimation. However, niche inflation is more challenging for general species with abundant data (Randin et al., 2006). Moreover, the narrow-ranged mountain vipers, in general, and the species/sub-species belonging to the Raddei clade, in specific, show low rates of niche evolution and high degrees of niche conservatism (Ahmadi et al., 2021) that leads to the occupation of similar ecological conditions in this species.