Comparing JTM’s habitat usage obtained from different data sources
To test whether social media data sources included more urban records than GBIF did, we compared the log-transformed intensity of night light among records from each source.
To test whether GBIF data underestimated the urban component of JTM’s range shift, we compared social media records to habitat suitability calculated using GBIF records. GBIF HSMs were produced using bioclim in the dismo package. Bioclim is a distance-based, boxcar method for assessing habitat suitability based on the similarity of bioclimatic variables between points in space36. Bioclim is therefore simple and robust, which is ideal for comparing habitat suitability at points in different regions and time periods, when the placement of pseudo-absences might strongly affect habitat suitability estimates. First, a single historic (‘GBIF-calculated’) HSM was calculated for 2000–2009 using the average climatic variables and occurrences of JTM in GBIF from these years, and night light data from 2012. This model was a suitable baseline because it averaged out any unusual bioclimatic conditions that could occur within a single year and had a relatively large sample size (N = 775). We used a randomly selected 80% of the data points as training data to construct the historic model. Model performance (measured as the area under the receiver operating characteristic curve, AUC, and the Boyce Index51) was calculated using the remaining 20% of the data as a testing dataset. To calculate the AUC and Boyce Index, pseudo-absences were generated by selecting random points from the same study region as the presence data with a 50% prevalence. When classifying habitat as suitable or unsuitable, we used a sensitivity threshold of 0.9. This maximised the potential suitable habitat for JTM and partially accounted for underreporting. A threshold of 0.95 was also tested but discarded because it classified areas that are almost certainly unsuitable for JTM (such as the Scottish Highlands22) as suitable.
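The envelope scoring, hold-out evaluation and thresholding described above can be illustrated with a minimal sketch. It is not the dismo implementation used in the study: it assumes the commonly described bioclim rule (the percentile of each variable among the training presences, folded around the median, with the lowest value across variables taken as the score), and all function names, variables and data are hypothetical stand-ins. The Boyce Index is not covered.

```python
import math
import random
from bisect import bisect_left, bisect_right


def fit_envelope(train):
    """Store each environmental variable of the training presences, sorted."""
    n_vars = len(train[0])
    return [sorted(row[j] for row in train) for j in range(n_vars)]


def envelope_score(envelope, point):
    """Bioclim-style suitability in [0, 1] for one point."""
    score = 1.0
    for column, value in zip(envelope, point):
        # Mid-rank percentile of the value among the training presences.
        rank = bisect_left(column, value) + bisect_right(column, value)
        pct = rank / (2 * len(column))
        # Fold the percentile around the median and keep the worst variable.
        score = min(score, 2 * min(pct, 1 - pct))
    return score


def auc(presence_scores, absence_scores):
    """Probability that a presence outscores a pseudo-absence (ties count half)."""
    wins = sum((p > a) + 0.5 * (p == a)
               for p in presence_scores for a in absence_scores)
    return wins / (len(presence_scores) * len(absence_scores))


def sensitivity_threshold(presence_scores, sensitivity=0.9):
    """Highest cut-off that still classifies `sensitivity` of presences as suitable."""
    ranked = sorted(presence_scores, reverse=True)
    keep = math.ceil(sensitivity * len(ranked) - 1e-9)
    return ranked[keep - 1]


# Synthetic stand-in data: two made-up variables per record (not study data).
random.seed(0)
presences = [(random.gauss(10, 2), random.gauss(5, 1)) for _ in range(200)]
random.shuffle(presences)
split = int(0.8 * len(presences))  # 80% training, 20% testing
train, test = presences[:split], presences[split:]

# Pseudo-absences at 50% prevalence: as many random points as test presences.
absences = [(random.uniform(0, 20), random.uniform(0, 10)) for _ in test]

envelope = fit_envelope(train)
test_scores = [envelope_score(envelope, p) for p in test]
absence_scores = [envelope_score(envelope, p) for p in absences]
cutoff = sensitivity_threshold([envelope_score(envelope, p) for p in train], 0.9)

print("AUC:", round(auc(test_scores, absence_scores), 3))
print("suitability cut-off at sensitivity 0.9:", round(cutoff, 3))
```

Because the score depends only on the distribution of presences, pseudo-absences enter the sketch solely at the evaluation step, which mirrors the reason given above for preferring an envelope method.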
The historic model was then used to predict the relative habitat suitability for JTM for each year between 2010 and 2018 across the study region using the climatic and night light variables for each year (with the exceptions of 2010 and 2011, which used the night light data from 2012). We extracted habitat suitability from HSMs at the coordinates of each occurrence of JTM from each data source across the study region for the years 2016–2018. These years were selected because they had relatively large sample sizes for all data sources. To test whether different data sources recorded JTM in areas of differing habitat suitability across the study region, a linear model was constructed with predicted habitat suitability at each occurrence of JTM as the response variable and the source of the occurrence data as the predictor variable. Predicted habitat suitability data extracted from JTM occurrence locations were log-transformed to homogenise the variance and meet assumptions of linearity. Any differences between sources were then investigated via Tukey’s post-hoc test. Following this, to investigate whether any differences were due to urbanisation, a linear model was produced with night light extracted from JTM occurrence locations as the response variable and the source of the occurrence data as the predictor variable. Night light data were square-root-transformed to meet the assumptions of linearity. Any differences between sources were again investigated via Tukey’s post-hoc test.
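A linear model with a single categorical predictor is equivalent to a one-way analysis of variance, so the overall test for a difference among sources can be sketched in a few lines. The example below is an illustration only, not the code used in the study: the function name, the third source label and all values are hypothetical. It computes the F statistic after transforming the response; obtaining p-values and running Tukey’s post-hoc test would additionally require the F and studentised range distributions from a statistics library, which this sketch omits.

```python
import math
from statistics import fmean


def one_way_f(groups, transform=math.log):
    """F statistic for `response ~ source` after transforming the response.

    `groups` maps each source name to its list of response values.
    Returns the F statistic and the per-source means on the transformed scale.
    """
    data = {name: [transform(v) for v in vals] for name, vals in groups.items()}
    n_total = sum(len(vals) for vals in data.values())
    k = len(data)
    grand_mean = fmean(x for vals in data.values() for x in vals)
    means = {name: fmean(vals) for name, vals in data.items()}
    # Variation explained by source versus residual variation within sources.
    ss_between = sum(len(vals) * (means[name] - grand_mean) ** 2
                     for name, vals in data.items())
    ss_within = sum((x - means[name]) ** 2
                    for name, vals in data.items() for x in vals)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f_stat, means


# Made-up suitability values at occurrence points (not study data).
suitability = {
    "GBIF": [0.42, 0.55, 0.61, 0.38, 0.47],
    "iNaturalist": [0.21, 0.35, 0.30, 0.26, 0.40],
    "other_source": [0.18, 0.29, 0.33, 0.22, 0.25],  # hypothetical label
}
f_suit, means_suit = one_way_f(suitability, transform=math.log)
print("F (log suitability):", round(f_suit, 2))

# The night light model follows the same pattern with a square-root transform.
night_light = {
    "GBIF": [4.0, 9.0, 6.5, 3.2, 5.1],
    "iNaturalist": [16.0, 25.0, 12.3, 20.7, 18.4],
    "other_source": [30.2, 22.8, 27.5, 19.9, 25.0],
}
f_light, means_light = one_way_f(night_light, transform=math.sqrt)
print("F (sqrt night light):", round(f_light, 2))
```

Passing the transform as an argument keeps the two models structurally identical, which matches the parallel treatment of habitat suitability and night light described above.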
Any geographical area with extremes of climate could bias comparisons of predicted habitat suitability among data sources if one source happened to be overrepresented in that area. For example, if iNaturalist was overrepresented in Italy and Italy was predicted to have low habitat suitability for JTM (based on data from GBIF) due to extreme temperature, then this could confound a result suggesting that iNaturalist records were located in areas of significantly lower habitat suitability. Since Italy represented the hottest parts of JTM’s range in the study area, we repeated all the above analyses with Italy excluded from the models and then compared the output of the Italy-included and Italy-omitted analyses. We did not do this for the coldest part of JTM’s range, since the range shift into these colder climates (e.g. the UK) is foundational to our questions.