Comparing JTM’s habitat usage obtained from different data sources
To ask whether social media data sources contained more urban records than GBIF, we compared the night light intensity recorded at the locations of records from each source.
To ask whether GBIF data underestimated the urban component of JTM’s
range shift, we compared social media records with habitat suitability
calculated from GBIF records. GBIF HSMs were produced using bioclim in
the dismo package. Bioclim is a distance-based, boxcar method that
assesses habitat suitability from the similarity of bioclimatic
variables between points in space36. Because it requires presence
records only, bioclim is simple and robust. This makes it well suited to
comparing habitat suitability at points in different regions and time
periods, where the placement of pseudo-absences might otherwise strongly
affect habitat suitability estimates.
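The percentile-envelope idea behind bioclim can be sketched as follows. This is a simplified illustration, not dismo’s implementation: each variable at a location is ranked against the training presences, the upper and lower tails are treated alike, the least typical variable sets the score, and the result is doubled so that scores lie between 0 and 1. The function names and the tie handling are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_values, x):
    """Empirical percentile of x among sorted training values (ties counted as half)."""
    below = bisect_left(sorted_values, x)
    at_or_below = bisect_right(sorted_values, x)
    return (below + at_or_below) / (2 * len(sorted_values))


def bioclim_score(training_presences, point):
    """Suitability in [0, 1] for one location (hypothetical helper).

    training_presences: list of presence records, each a list of bioclimatic variables.
    point: the same variables, in the same order, at the location being scored.
    """
    folded = []
    for j, x in enumerate(point):
        column = sorted(rec[j] for rec in training_presences)
        p = percentile_rank(column, x)
        folded.append(min(p, 1.0 - p))  # 10th and 90th percentiles score alike
    # The least typical variable sets the score; doubling maps the median (0.5)
    # to 1 and any value outside the training range to 0
    return 2.0 * min(folded)


training = [[10.0, 100.0], [12.0, 110.0], [14.0, 120.0]]  # illustrative values
print(bioclim_score(training, [12.0, 110.0]))  # 1.0
```

Because only presence records enter the calculation, fitting the model requires no pseudo-absences.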
First, a single historic (‘GBIF-calculated’) HSM was calculated for
2000–2009 using the averaged climatic variables and the GBIF occurrences
of JTM from those years, together with night light data from 2012. This
model was a suitable baseline because averaging over a decade smooths out
unusual bioclimatic conditions that can occur within a single year, and
because it had a relatively large sample size (N = 775). A randomly
selected 80% of the data points were used as training data to construct
the historic model.
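A random 80/20 split like the one described can be sketched as below. The function name and seed are hypothetical, and integer indices stand in for the occurrence records.

```python
import random


def split_train_test(records, train_frac=0.8, seed=1):
    """Shuffle a copy of the records and split off train_frac of them for training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# With N = 775 records this gives 620 training and 155 testing records
train, test = split_train_test(range(775))
```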
Model performance was measured using the area under the receiver
operating characteristic curve (AUC) and the Boyce Index51, both
calculated on the remaining 20% of the data as a testing dataset. To
calculate the AUC and Boyce Index, pseudo-absences were generated by
selecting random points from the same study region as the presence data
at 50% prevalence (i.e. equal numbers of presences and pseudo-absences).
When classifying habitat as suitable or unsuitable, we used a
sensitivity threshold of 0.9. This threshold gave a generous estimate of
potentially suitable habitat for JTM and partially accounted for
underreporting. A threshold of 0.95 was also tested but discarded
because it classified areas that are almost certainly unsuitable for JTM
(such as the Scottish Highlands22) as suitable.
The historic model was then used to predict relative habitat suitability
for JTM across the study region for each year from 2010 to 2018, using
that year’s climatic and night light variables (except for 2010 and
2011, for which the 2012 night light data were used). We extracted
habitat suitability from these HSMs at the coordinates of each JTM
occurrence from each data source across the study region for 2016–2018.
These years were selected because they had relatively large sample sizes
for all data sources. To test whether different data sources recorded
JTM in areas of differing habitat suitability, we fitted a linear model
with predicted habitat suitability at each JTM occurrence as the
response variable and the source of the occurrence data as the
predictor variable.
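The per-year layer pairing and point extraction described above can be sketched as follows. The sketch assumes that each predicted HSM is held as a regular longitude/latitude grid whose first row is the northern edge. The grid layout, function names and parameters are hypothetical; in R these steps would typically be handled by dismo’s `predict` and the raster package’s `extract`.

```python
import math


def night_light_year(climate_year, first_available=2012):
    """Night light layer paired with a climate year; 2010 and 2011 fall back to 2012."""
    return max(climate_year, first_available)


def extract_at(grid, lon, lat, lon_min, lat_max, res):
    """Value of the grid cell containing (lon, lat), or None outside the grid.

    grid[row][col] holds predicted suitability; row 0 is the northern edge and
    each cell is res degrees square (hypothetical layout).
    """
    col = math.floor((lon - lon_min) / res)
    row = math.floor((lat_max - lat) / res)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Night light year used with each prediction year:
# {2010: 2012, 2011: 2012, 2012: 2012, ..., 2018: 2018}
layers = {year: night_light_year(year) for year in range(2010, 2019)}
```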
Predicted habitat suitability values were log transformed to homogenise
the variance and meet the assumptions of the linear model, and
differences between sources were then investigated using Tukey’s
post-hoc test. Next, to investigate whether any differences were due to
urbanisation, we fitted a linear model with night light extracted at
JTM occurrence locations as the response variable and the source of the
occurrence data as the predictor variable. Night light data were square
root transformed to meet the assumptions of the linear model, and
differences between sources were again investigated using Tukey’s
post-hoc test.
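With data source as the only predictor, the linear models described above are equivalent to a one-way ANOVA. The sketch below fits such a model to transformed responses and computes the Tukey-Kramer studentized range statistic q for each pair of sources. It uses the standard library only, and the function names are hypothetical. Converting q to p-values requires the studentized range distribution, which is not computed here; SciPy, for example, provides `scipy.stats.tukey_hsd` for the full procedure.

```python
import math
import statistics


def one_way_lm(groups):
    """Fit response ~ source by least squares (equivalent to one-way ANOVA).

    groups maps each data source to its transformed responses. Returns the
    F statistic, residual mean square, group means and group sizes.
    """
    values = [v for vals in groups.values() for v in vals]
    grand_mean = statistics.fmean(values)
    means = {g: statistics.fmean(vals) for g, vals in groups.items()}
    sizes = {g: len(vals) for g, vals in groups.items()}
    ss_between = sum(sizes[g] * (means[g] - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - means[g]) ** 2 for g, vals in groups.items() for v in vals)
    ms_within = ss_within / (len(values) - len(groups))
    f_stat = (ss_between / (len(groups) - 1)) / ms_within
    return f_stat, ms_within, means, sizes


def tukey_kramer_q(means, sizes, ms_within):
    """Studentized range statistic q for every pair of sources."""
    names = sorted(means)
    q = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            se = math.sqrt(ms_within / 2 * (1 / sizes[a] + 1 / sizes[b]))
            q[(a, b)] = abs(means[a] - means[b]) / se
    return q


def transform(raw, fn):
    """Apply math.log (suitability) or math.sqrt (night light) to every value.
    math.log requires strictly positive values."""
    return {g: [fn(v) for v in vals] for g, vals in raw.items()}
```

With only two sources, F equals t² and q equals √2·|t|, so the pairwise and overall tests agree. With more sources, Tukey’s procedure controls the family-wise error rate across all pairwise comparisons.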
A geographical area with an extreme climate could bias comparisons of
predicted habitat suitability between data sources if one source
happened to be overrepresented there. For example, suppose iNaturalist
records were overrepresented in Italy, and Italy was predicted to have
low habitat suitability for JTM (based on GBIF data) because of extreme
temperatures. This could produce a spurious result suggesting that
iNaturalist records were located in areas of significantly lower
habitat suitability. Because Italy represented the hottest part of JTM’s
range in the study area, we repeated all the above analyses with Italy
omitted and compared the outputs of the Italy-included and Italy-omitted
analyses. We did not do this for the coldest part of JTM’s range,
because the range shift into colder climates (e.g. the UK) is central to
our questions.
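The confounding scenario described above can be illustrated with a toy example. The records are made up purely to show the mechanism and are not data from this study. The country codes and the function name are hypothetical.

```python
import statistics


def source_means(records, exclude_country=None):
    """Mean response per data source, optionally dropping one country's records."""
    by_source = {}
    for source, country, value in records:
        if country == exclude_country:
            continue
        by_source.setdefault(source, []).append(value)
    return {s: statistics.fmean(v) for s, v in by_source.items()}


# Made-up (source, country, log suitability) records: iNaturalist is
# overrepresented in a hot region (IT) that the model scores as less suitable
records = [
    ("GBIF", "UK", -1.0), ("GBIF", "FR", -1.0), ("GBIF", "IT", -3.0),
    ("iNaturalist", "UK", -1.0), ("iNaturalist", "IT", -3.0), ("iNaturalist", "IT", -3.0),
]
with_italy = source_means(records)                          # iNaturalist appears lower
without_italy = source_means(records, exclude_country="IT")  # both means are -1.0
```

In this toy case, the apparent difference between sources disappears once Italy is removed. Comparing the Italy-included and Italy-omitted analyses is designed to detect this pattern.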