Data Collection
Occurrence data were collected from five sources: GBIF, iNaturalist,
Twitter, Instagram, and Flickr. iNaturalist, Twitter, Instagram, and
Flickr were selected because biological records could be extracted with
relative ease. Records from each source were collected for the period
2000 to 2018, as these were the years for which comparable environmental
data could be gathered and in which JTM had been sufficiently sampled
(> 50 occurrences per year) in GBIF across the selected
study region.
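The sampling criterion above can be illustrated with a minimal sketch. It assumes the GBIF records have already been reduced to a list of observation years; the function name `sufficiently_sampled_years` and its parameters are hypothetical and are not taken from the study's own scripts.

```python
from collections import Counter

# Threshold stated in the text: more than 50 occurrences per year.
MIN_PER_YEAR = 50


def sufficiently_sampled_years(record_years, start=2000, end=2018,
                               min_per_year=MIN_PER_YEAR):
    """Return the years in [start, end] with more than min_per_year records.

    record_years is an iterable of integer years, one per occurrence record
    (an assumed input format, not the study's actual data structure).
    """
    counts = Counter(record_years)
    return [year for year in range(start, end + 1)
            if counts[year] > min_per_year]
```

A study period would be considered adequately sampled under this sketch if every year from 2000 to 2018 appears in the returned list.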
The study region included the UK, Republic of Ireland, France, Belgium,
the Netherlands, Luxembourg, Switzerland, Czech Republic, Austria,
Germany, Denmark, and Italy (Fig. 1). This region represents a large
proportion of the known distribution of JTM and includes nations from
which biological records were reported to GBIF throughout 2000 – 2018.
Although the region does not encompass the hottest component of the
species’ climate niche (as records in this region were too sparse), this
should not affect predictions of habitat suitability for the range shift
of JTM into the UK, where conditions are cooler.
Data from iNaturalist were removed from the GBIF dataset to avoid
duplication between the two datasets. Search terms (Table 1) were applied
for Twitter, Instagram, and Flickr to both original posts and their
subsequent comments to account for individuals who were unable to
identify JTM and were seeking identification. We only used occurrences
derived from posts and tweets that included an image of adult JTMs.
Duplicates from social media data arising from people sharing the same
information on different platforms were removed. All occurrence data
derived from social media platforms were manually checked to ensure that
identification of adult JTM was correct. For Flickr, Twitter, and
Instagram, only occurrences that fell within the months of the year when
JTM adults fly were retained. All larval records were removed from our
GBIF dataset. We obtained only 16 records from Twitter, so we did not
include this data source in any further analyses. For data from Flickr,
georeferences were automatically extracted using a custom script, but
for Instagram and Twitter, georeferenced data were manually collected
from individual posts where such information was provided. Where
georeferences were absent, no data were collected.
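The filtering steps described in this paragraph could be expressed roughly as follows. This is a sketch under stated assumptions, not the procedure actually used: the record fields (`source`, `date`, `lat`, `lon`, `life_stage`) are hypothetical, the flight months must be supplied by the caller because the text does not list them, and the duplicate key (same date and near-identical coordinates) is an assumed stand-in for the duplicate removal described above. The manual identification check cannot be automated and is omitted.

```python
from datetime import date

# Sources treated as social media in this sketch (names are assumptions).
SOCIAL_SOURCES = {"flickr", "twitter", "instagram"}


def filter_occurrences(records, flight_months, start_year=2000, end_year=2018):
    """Apply the record-level filters to a list of occurrence dicts.

    flight_months is a set of month numbers supplied by the caller; the
    source text does not state which months were used.
    """
    kept = []
    seen = set()
    for rec in records:
        # No georeference: the record is not collected.
        if rec.get("lat") is None or rec.get("lon") is None:
            continue
        # Keep only records inside the study period.
        if not (start_year <= rec["date"].year <= end_year):
            continue
        # Drop larval records (in the text, this step applies to GBIF).
        if rec.get("life_stage") == "larva":
            continue
        if rec["source"] in SOCIAL_SOURCES:
            # Social media records must fall within the adult flight months.
            if rec["date"].month not in flight_months:
                continue
            # Assumed duplicate key for cross-platform reposts.
            key = (rec["date"], round(rec["lat"], 3), round(rec["lon"], 3))
            if key in seen:
                continue
            seen.add(key)
        kept.append(rec)
    return kept
```

For example, calling `filter_occurrences(records, flight_months={7, 8, 9})` would apply a placeholder flight window; the month set shown here is illustrative only and should be replaced with the window used in the study.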
Table 1 | Summary of the search terms and processes used to collect
biological records of JTM across the study region. Hits refers to the
number of occurrences that contained all of the information required
for the study and that fell within the time span of the study
(2000 – 2018) and within the study region (the UK, Republic of Ireland,
France, Belgium, the Netherlands, Luxembourg, Switzerland, Czech
Republic, Austria, Germany, Denmark, and Italy). Note that searches on
Instagram are limited to hashtags rather than caption text. Data are
available from https://figshare.com/s/94529defd9aa93d18426, except
GBIF data, which are available from the link in the table.