2.7 Species distribution modeling
Occurrence data for each species was obtained from the specimens used in this study, “expert” identified individual occurrences from GBIF, and research grade locality records from iNaturalist (www.inaturalist.org). This resulted in a total of 43 T. blandingii localities and 30T. pulverulenta localities (Fig. S1). Duplicate records were removed, and points were thinned within a distance of 10-kilometers using the spThin package (Aiello-Lammens, Boria, Radosavljevic, Vilela, & Anderson, 2015) in R v. 3.4.4 (R Core Team, 2018). A subset of points from each data set was set aside for model calibration (75%) and internal testing (25%) following Cobos, Peterson, Barve, and Osorio-Olvera (2019).
Environmental data were obtained from the WorldClim database v. 1.4 (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005). Fifteen of the 19 bioclim variables were downloaded at a 2.5-minute resolution. We excluded bio8, bio9, bio18, and bio19 which are known to create artifacts in distribution models (Escobar, Lira-Noriega, Medina-Vogel, & Peterson, 2014). The same 15 variables were used for the Last Glacial Maximum (LGM) under three general circulation models (GCMs): CCSM4, MIROC-ESM, and MPI-ESM-P. In order to reduce spatial autocorrelation, principal component analyses (PCAs) were performed on present bioclim variables and projected to the LGM for the extent of sub-Saharan Africa.
Model calibration areas were defined as a 1000-kilometer buffer around occurrence points for each species. Model calibration, creation, projection, and evaluation were done using the R package kuenm (Cobos et al., 2019). In order to calibrate our models, we created 1479 candidate models for each species by combining three sets of environmental predictors (PCAs 1–6, 1–5, 1–4), 17 possible regularization multipliers (0.1–1.0 at intervals of 0.1, 2–6 at intervals of 1, and 8 and 10), and all combinations of five feature classes (linear = l, quadratic = q, product = p, threshold = t, and hinge = h; Cobos et al., 2019).
Candidate models were run in Maxent (Phillips, Anderson, & Schapire, 2006) and chosen based on significant partial ROC scores (Peterson, Papes, & Soberón, 2008), omission rates of E ≤ 5% (Anderson, Lew, & Peterson, 2003), and AICc scores of ≤ 2 to minimize model complexity (Warren & Seifert, 2011). These models determined the parameter set used for final model creation.
Final models were created for each species using the full set of occurrence records and the parameters chosen during model calibration. Models were run in Maxent with ten bootstrap replicates and logistic outputs. After models were run in the present, they were projected to the LGM and mid-Holocene for the three GCMs. The mobility-oriented parity (MOP) index was used to test for model extrapolation (Soberón & Peterson, 2005). Models were visualized in QGIS 3.4 and thresholded to 5% to create presence-absence maps. Models from each time period were summed to estimate potential LGM and mid-Holocene distributions as well as continuous stability maps (Devitt, Devitt, Hollingsworth, McGuire, & Moritz, 2013; Yannic et al., 2014).