Quantification of variable importance
Ecological and evolutionary research deals with a large array of interacting factors influencing the phenomenon of interest. Identifying the relative importance of predictors is challenging because predictors interact, are often collinear, and do not operate in isolation. In the present study, we compared three methods for quantifying the relative importance of the main drivers of regional alien species richness worldwide, including classical regression approaches (Dawson et al., 2017). We suggest that the two variable importance measures (RF and HP) provided improved insight into the spatial patterns of alien species richness and, in turn, into their most likely drivers. Specifically, our analysis revealed that economic drivers (both relative and absolute) were more important than population variables for alien species richness.
Although it is well known in the statistical literature (see references in the Introduction and Fig. 1) that RF and HP can reflect variable importance better than partial regression coefficients, the latter remain the methods most widely used by ecologists for this purpose (Table S1). Owing to progress in statistical methods, data availability, and computing power, the use of machine-learning methods among ecologists has increased substantially (Lucas, 2020), but mostly for species-distribution modelling, where their higher predictive power is an advantage (Bradter et al., 2013). Beyond invasion biology, the use of RF and HP to identify variable importance should be extended to many other topics, such as climate change or habitat fragmentation (Mac Nally, 2000; Leng et al., 2008; Smith et al., 2009; Zheng, 2018). Partial regression coefficients and their standardized versions (i.e., beta weights) are commonly used to quantify and understand the ecological relationships between explanatory variables and a response variable. They should nevertheless be treated with caution as measures of variable importance, particularly when dealing with complex, nonlinear or high-dimensional data. A clear example is Figure 2, where RF and HP identified GDPc and HPD as the most important factors for alien plant species, and GDPc and sampling effort for alien bird species. In contrast, LMM highlighted HPD, region area, sampling effort and environmental drivers such as temperature (Fig. 2).
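The contrast between beta weights and HP can be illustrated with a minimal sketch on synthetic data. This is not the analysis or the data of the present study: the predictors x1, x2 and x3, their effect sizes, and the helper functions are hypothetical, and HP is implemented here in its simplest form, as the average gain in R² from adding a predictor to every possible submodel of an ordinary linear regression.

```python
import itertools
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an ordinary least-squares fit of y on the chosen columns of X."""
    if len(cols) == 0:
        return 0.0  # the intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def independent_contributions(X, y):
    """Average R^2 gain of each predictor over all submodels (HP-style)."""
    p = X.shape[1]
    indep = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):  # hierarchy levels: 0 .. p-1 other predictors
            gains = [r_squared(X, y, s + (j,)) - r_squared(X, y, s)
                     for s in itertools.combinations(others, size)]
            indep[j] += np.mean(gains) / p  # equal weight per level
    return indep

def beta_weights(X, y):
    """Standardized partial regression coefficients of the full model."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef

# Hypothetical data: x2 is strongly correlated with x1 but has no direct
# effect on y; x3 is independent of both and has a moderate effect.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 * x1 + 0.5 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

beta = beta_weights(X, y)
indep = independent_contributions(X, y)
print("beta weights:             ", np.round(beta, 3))
print("independent contributions:", np.round(indep, 3))
print("full-model R^2:           ", round(r_squared(X, y, (0, 1, 2)), 3))
```

In this sketch the beta weight of x2 should be close to zero, whereas its independent contribution is clearly positive, because HP credits x2 with part of the variance it shares with x1. The independent contributions also sum to the R² of the full model, which makes them directly comparable as shares of explained variance.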
Our results also showed that the relative importance of predictors that are weakly intercorrelated is overestimated in linear regression, as previous work has suggested (Galipaud et al., 2014; Giam & Olden, 2016; Lai et al., 2022). Similarly, traditional regression models can rarely cope with situations that require the inclusion of numerous explanatory variables (Breiman, 2001). Despite the recognized shortcomings of (generalized) linear regression models, such approaches are still widely used in the literature (Planque & Buffaz, 2008; Yee et al., 2008; Bolker et al., 2009; Gompert & Buerkle, 2009; Koper & Manseau, 2009). Although many variables were intercorrelated, our LMM results still showed a significant influence of the main drivers shaping the spatial patterns of alien species richness.