Quantification of variable importance
Ecological and evolutionary research deals with a large array of
interacting factors that influence the phenomenon of interest.
Identifying the relative importance of predictors is challenging because
of interactions and collinearity among them, and because variables do not
operate in isolation. In the present study, we explored three methods
(RF, HP and LMM) to quantify the relative importance of the main drivers
of regional alien species richness worldwide, one of which is a classical
regression approach (Dawson et al., 2017). We suggest that the two
variable importance measures compared (RF and HP) delivered better
insights into the spatial patterns of alien species richness and, in
turn, into its most likely drivers. Specifically, our analysis revealed
that economic drivers (both relative and absolute) were more important
than population variables in explaining alien species richness.
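As a rough illustration of how HP apportions explained variance, the Python sketch below computes, for each predictor, its gain in R² when added to every model that lacks it, averaged first within each model size and then across sizes. These independent effects sum to the R² of the full model. It assumes ordinary least squares with an intercept (no random effects), uses only the standard library, and runs on hypothetical toy data; the function names are illustrative and this is not the implementation used in our analysis.

```python
from itertools import combinations
from math import comb


def ols_r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on `columns` plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]; assumes X'X is non-singular.
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
         + [sum(X[t][i] * y[t] for t in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for row in range(k):
            if row != c:
                factor = A[row][c] / A[c][c]
                A[row] = [a - factor * b for a, b in zip(A[row], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, obs)) for obs in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_partitioning(predictors, y):
    """Independent effect of each predictor: its R^2 gain averaged over all
    models without it, first within each model size, then across sizes."""
    names = list(predictors)
    p = len(names)
    cache = {}

    def r2(subset):
        key = frozenset(subset)
        if key not in cache:
            # The intercept-only model explains nothing (R^2 = 0).
            cache[key] = 0.0 if not key else ols_r_squared(
                [predictors[m] for m in sorted(key)], y)
        return cache[key]

    effects = {}
    for name in names:
        others = [m for m in names if m != name]
        level_means = []
        for size in range(p):  # size of the model before adding `name`
            gains = [r2(set(s) | {name}) - r2(s)
                     for s in combinations(others, size)]
            level_means.append(sum(gains) / comb(p - 1, size))
        effects[name] = sum(level_means) / p
    return effects


# Hypothetical toy data (not the study's data); x2 is nearly 2 * x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
x3 = [0.5, -1.0, 0.8, -0.3, 1.2, -0.7, 0.1, 0.4]
y = [3.2, 4.1, 7.3, 7.9, 10.8, 11.9, 14.6, 16.2]

effects = hierarchical_partitioning({"x1": x1, "x2": x2, "x3": x3}, y)
full_r2 = ols_r_squared([x1, x2, x3], y)
print(effects)
print(sum(effects.values()), full_r2)  # the totals agree up to rounding
```

Because adding a predictor never lowers R² in least squares, every averaged gain is non-negative, so HP cannot assign a collinear predictor a negative share, unlike a regression coefficient.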
Although it is well known in the statistical literature (see references
in the Introduction and Fig. 1) that RF and HP can reflect variable
importance better than partial regression coefficients, the latter
remain the methods most widely used by ecologists for this purpose
(Table S1). Owing to advances in statistical methods, data availability
and computing power, the use of machine-learning methods by ecologists
has increased substantially (Lucas, 2020), but mostly for species
distribution modelling, because of their higher predictive power
(Bradter et al., 2013). Beyond invasion biology, the use of RF and HP to
identify variable importance should be extended to many other topics,
such as climate change or habitat fragmentation (Mac Nally, 2000; Leng
et al., 2008; Smith et al., 2009; Zheng, 2018).
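For readers less familiar with RF importance, one common measure is permutation importance (Breiman, 2001): the increase in prediction error after the values of one predictor are randomly shuffled. The sketch below is a simplified, model-agnostic version that accepts any prediction function. Breiman's original measure permutes out-of-bag samples tree by tree, and we do not claim this is the exact variant used in our analysis. The function names and the toy model are hypothetical.

```python
import random


def mean_squared_error(predict, rows, y):
    """Average squared difference between predictions and observed values."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Mean increase in MSE after randomly shuffling each predictor column.

    `predict` maps one row (a list of predictor values) to a prediction; in an
    RF analysis it would be the fitted forest, but any model can be plugged in.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, y)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Replace predictor j by its shuffled values, breaking its link
            # with the response while keeping its distribution unchanged.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(predict, permuted, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical model in which only predictor 0 matters.
def toy_model(row):
    return 3.0 * row[0]


rows = [[float(i), float((7 * i) % 11)] for i in range(20)]
y = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, y)
print(importances)  # predictor 1 scores exactly 0.0; predictor 0 scores > 0
```

In practice the error is usually evaluated on held-out or out-of-bag data rather than on the data used to fit the model, so that importance reflects predictive value rather than overfitting.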
Partial regression coefficients and their standardized versions (i.e.,
beta weights), which are commonly used to quantify and interpret the
ecological relationships between explanatory variables and a response
variable, should be used with caution when assessing variable
importance, particularly with complex, nonlinear or high-dimensional
data. Figure 2 provides a clear example: RF and HP identified GDPc and
HPD as the most important factors for alien plant species, and GDPc and
sampling effort for alien bird species, whereas LMM highlighted HPD,
region area, sampling effort and environmental drivers such as
temperature.
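To make the caution about beta weights concrete, the simplest case can be written out: a least-squares model (no random effects) with two standardized predictors whose mutual correlation is r_{12} and whose correlations with the response are r_{1y} and r_{2y}. The HP independent effects for two predictors average each predictor's R² gain over the two possible orders of entry. The correlation values below are hypothetical and chosen only for illustration; they are not taken from our data.

```latex
\beta_1 = \frac{r_{1y} - r_{12}\, r_{2y}}{1 - r_{12}^2}, \qquad
\beta_2 = \frac{r_{2y} - r_{12}\, r_{1y}}{1 - r_{12}^2}, \qquad
R^2_{12} = \beta_1 r_{1y} + \beta_2 r_{2y}

I_1 = \tfrac{1}{2}\left[R^2_1 + \left(R^2_{12} - R^2_2\right)\right], \qquad
I_2 = \tfrac{1}{2}\left[R^2_2 + \left(R^2_{12} - R^2_1\right)\right], \qquad
R^2_j = r_{jy}^2

% Hypothetical values: r_{12} = 0.9, r_{1y} = 0.7, r_{2y} = 0.6
\beta_1 = \frac{0.70 - 0.54}{0.19} \approx 0.842, \qquad
\beta_2 = \frac{0.60 - 0.63}{0.19} \approx -0.158, \qquad
R^2_{12} = \frac{0.094}{0.19} \approx 0.495

I_1 \approx \tfrac{1}{2}(0.490 + 0.1347) \approx 0.312, \qquad
I_2 \approx \tfrac{1}{2}(0.360 + 0.0047) \approx 0.182
```

In this hypothetical case the second predictor receives a negative beta weight although it is positively correlated with the response, and the denominator 1 - r_{12}^2 shrinks toward zero as collinearity increases. The HP independent effects, by contrast, are non-negative and sum to the full-model R² (0.312 + 0.182 ≈ 0.495).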
Our results also showed that the relative importance of weakly
intercorrelated predictors is overestimated in linear regression, as
previous studies have suggested (Galipaud et al., 2014; Giam & Olden,
2016; Lai et al., 2022). Moreover, traditional regression models rarely
cope well with situations that require numerous explanatory variables
(Breiman, 2001). Despite the recognized shortcomings of (generalized)
linear regression models, such approaches are still widely used
(Planque & Buffaz, 2008; Yee et al., 2008; Bolker et al., 2009; Gompert &
Buerkle, 2009; Koper & Manseau, 2009). Nevertheless, although many
variables were intercorrelated, our LMM results still showed a
significant influence of the main drivers shaping the spatial patterns
of alien species richness.