Data collection
We used the comprehensive dataset of Dawson et al. (2017) on the worldwide distribution of alien species richness in eight taxonomic groups, together with potential environmental and human socioeconomic predictors. The dataset covers amphibians, ants, birds, mammals, vascular plants, reptiles, spiders, and freshwater fishes. Dawson et al. (2017) curated it to fit the geographic scheme of Biodiversity Information Standards (TDWG) (Brummitt, 2001). It includes inventories of established alien species in 609 TDWG level 4 regions. These regions mostly correspond to countries, to states and provinces within larger countries, and to major islands and archipelagos. For each taxonomic group, the total number of alien species in each region and the global data coverage were compiled (Dawson et al., 2017). The dataset is of high quality because it provides information on geographical area and sampling effort. This matters because alien species richness almost certainly depends on the size of the region and on sampling intensity. Moreover, for amphibians, birds, vascular plants, ants and mammals, the dataset also reports survey completeness. This represents the degree to which the species documented in a particular region (or grid cell) reflect the actual biota of that area (see further details in Dawson et al., 2017). Global data coverage was highest for birds and mammals (609 regions). It was followed by vascular plants (449 regions, 82% of the global ice-free terrestrial area), then ants, freshwater fishes and spiders. It was lowest for amphibians and reptiles (311 and 310 regions, covering 48% and 47% of the area, respectively). Species-richness values per region differed greatly among taxonomic groups, so alien species richness was scaled from 0 to 1 within each taxonomic group (see further details in Dawson et al., 2017). Last, we removed islands following Winter et al. (2010), because invasion processes almost certainly differ between islands and continental areas (Moser et al., 2018; Essl et al., 2019).
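To make the two preprocessing steps concrete, the sketch below rescales richness to [0, 1] within each taxonomic group and then drops island regions. It assumes a simple min-max rescaling and follows the order in which the steps are described above. Dawson et al. (2017) document their own scaling procedure, which may differ. The record layout and the field names `region`, `taxon`, `richness` and `is_island` are hypothetical, and the example values are invented for illustration.

```python
from collections import defaultdict


def scale_then_drop_islands(records):
    """Min-max rescale richness within each taxon, then drop island regions.

    records: list of dicts with hypothetical keys "region", "taxon",
    "richness" (alien species count) and "is_island" (bool).
    Returns {(region, taxon): richness scaled to [0, 1]}.
    """
    # Per-taxon minimum and maximum, computed over all regions.
    values = defaultdict(list)
    for rec in records:
        values[rec["taxon"]].append(rec["richness"])
    bounds = {taxon: (min(v), max(v)) for taxon, v in values.items()}

    scaled = {}
    for rec in records:
        if rec["is_island"]:
            continue  # islands are excluded after scaling
        lo, hi = bounds[rec["taxon"]]
        span = hi - lo
        # A taxon with identical richness everywhere has no range to scale.
        scaled[(rec["region"], rec["taxon"])] = (
            (rec["richness"] - lo) / span if span else 0.0
        )
    return scaled


# Invented example data (not from Dawson et al., 2017).
records = [
    {"region": "A", "taxon": "birds", "richness": 10, "is_island": False},
    {"region": "B", "taxon": "birds", "richness": 30, "is_island": False},
    {"region": "C", "taxon": "birds", "richness": 50, "is_island": True},
    {"region": "A", "taxon": "ants", "richness": 4, "is_island": False},
    {"region": "B", "taxon": "ants", "richness": 8, "is_island": False},
]
result = scale_then_drop_islands(records)
```

In this example the island region C still sets the bird maximum. Whether islands should contribute to the scaling bounds is a design choice that the text above does not settle.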
The dataset of Dawson et al. (2017) includes a suite of environmental and socioeconomic variables that may relate to the spatial patterns of alien species. We used the same variables as Dawson et al. (2017) and added no others. Our objective was to evaluate variable importance across different statistical techniques using a comprehensive, standard ecological dataset. Among these variables, gross domestic product per capita (GDPc; US dollars) was available for each TDWG level 4 region. It was calculated as the average of values estimated in 1 km² grid cells from nighttime-light satellite data (Ghosh et al., 2010). Nighttime satellite data are a coarse proxy for actual economic data, particularly for countries with large populations and relatively low lighting (e.g., China, India). However, they can detect areas of low-intensity light and provide broader coverage of the global economy (Ghosh et al., 2010). Human population density (HPD) in the year 2000 was calculated in a similar manner from 1 km² grid-cell values obtained from the Global Rural Urban Mapping Project (GRUMP; http://sedac.ciesin.columbia.edu/data/set/grump-v1-population-density). From the data provided by Dawson et al. (2017), we also computed two absolute measures for each region. Gross domestic product (GDP) was estimated by multiplying GDPc by population density (in 2000) and geographical area (km²). Total human population (HP) in 2000 was estimated by multiplying population density by geographical area. The relative (GDPc and HPD) and absolute (GDP and HP) measures capture different properties. GDPc mainly reflects economic development or standard of living, and by extension accumulated human impacts. GDP reflects overall economic activity (Taylor & Irwin, 2004; see world maps in Fig. S2).
GDP and GDPc were only weakly correlated (Pearson’s r = 0.075; P = 0.101), whereas HP and HPD were moderately correlated (Pearson’s r = 0.533; P < 0.001).
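The derivation of the absolute measures and the correlation check can be sketched as follows. HP = HPD × area and GDP = GDPc × HP, which equals GDPc × HPD × area as described above. The region values are invented for illustration, and `absolute_measures` and `pearson_r` are hypothetical helper names, not functions from the original analysis.

```python
import math


def absolute_measures(gdpc, hpd, area_km2):
    """Return (GDP, HP) for one region.

    HP  = HPD (people per km^2) * area (km^2)
    GDP = GDPc (US$ per person) * HP  (= GDPc * HPD * area)
    """
    hp = hpd * area_km2
    gdp = gdpc * hp
    return gdp, hp


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical regions: (GDPc in US$, HPD in people/km^2, area in km^2).
regions = [(40000, 30.0, 100000), (2000, 400.0, 50000), (15000, 5.0, 800000)]
gdp, hp = zip(*(absolute_measures(*r) for r in regions))
gdpc = [r[0] for r in regions]
hpd = [r[1] for r in regions]

r_gdp = pearson_r(gdp, gdpc)  # relative vs absolute economic measure
r_hp = pearson_r(hp, hpd)     # relative vs absolute population measure
```

This sketch computes only the coefficient. For P values like those reported above, `scipy.stats.pearsonr(x, y)` returns both the coefficient and a two-sided P value.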
In Dawson et al. (2017), climate data such as mean annual temperature (MAT; bioclimatic variable BIO1) and mean annual precipitation (MAP; BIO12) were obtained from WORLDCLIM (www.worldclim.org) at 1-minute resolution and averaged for each TDWG level 4 region.
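Averaging grid-cell values within each region (a zonal mean) underlies both the climate variables and the GDPc and HPD values described above. The sketch below assumes each cell already carries a value and the ID of the region it falls in. The region labels, the nodata code and the cell values are hypothetical. The sketch also assumes an unweighted mean. Cells of a latitude-longitude grid shrink toward the poles, so an area-weighted mean may be preferable and may differ from what Dawson et al. (2017) computed.

```python
from collections import defaultdict


def zonal_mean(cell_values, cell_regions, nodata=None):
    """Unweighted mean of grid-cell values per region.

    cell_values:  sequence of numbers, one per grid cell.
    cell_regions: sequence of region IDs (None = cell outside every region).
    nodata:       value marking missing cells (hypothetical convention).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, region in zip(cell_values, cell_regions):
        if value is None or value == nodata or region is None:
            continue  # skip missing data and cells outside any region
        sums[region] += value
        counts[region] += 1
    return {reg: sums[reg] / counts[reg] for reg in sums}


# Invented example: six cells, two hypothetical regions, one nodata cell.
bio1 = [12.0, 14.0, -9999, 25.0, 27.0, 26.0]
region = ["REG1", "REG1", "REG1", "REG2", "REG2", None]
means = zonal_mean(bio1, region, nodata=-9999)
```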