Data Collection
Occurrence data were collected from five sources: GBIF, iNaturalist, Twitter, Instagram, and Flickr. iNaturalist, Twitter, Instagram, and Flickr were selected because biological records could be extracted from them with relative ease. Records from each source were restricted to 2000 – 2018, as these were the years for which comparable environmental data could be gathered and in which JTM had been sufficiently sampled (> 50 occurrences per year) in GBIF across the selected study region.
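The sampling criterion above (> 50 GBIF occurrences per year within 2000 – 2018) can be illustrated with a minimal sketch. The function name and its input (a plain list of record years) are hypothetical and are not taken from the study's own scripts.

```python
from collections import Counter


def sufficiently_sampled_years(record_years, min_per_year=50, start=2000, end=2018):
    """Return the years in [start, end] with more than `min_per_year` records.

    `record_years` is a hypothetical input: one integer year per occurrence record.
    """
    counts = Counter(record_years)
    # Strictly greater than the threshold, matching "> 50 occurrences per year".
    return [year for year in range(start, end + 1) if counts[year] > min_per_year]
```

A year with exactly 50 records would not pass this check, and years outside the study period are ignored regardless of how many records they contain.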
The study region included the UK, Republic of Ireland, France, Belgium, the Netherlands, Luxembourg, Switzerland, Czech Republic, Austria, Germany, Denmark, and Italy (Fig. 1). This region represents a large proportion of the known distribution of JTM and includes nations from which biological records were reported to GBIF throughout 2000 – 2018. Although the region does not encompass the hottest component of the species’ climate niche (as records in this region were too sparse), this should not affect predictions of habitat suitability for the range shift of JTM into the UK, where conditions are cooler.
Data from iNaturalist were removed from the GBIF dataset to avoid duplication between the two datasets. For Twitter, Instagram, and Flickr, search terms (Table 1) were applied to both original posts and their subsequent comments to account for individuals who were unable to identify JTM and were seeking identification. We only used occurrences derived from posts and tweets that included an image of adult JTMs. Duplicates in the social media data arising from people sharing the same information on different platforms were removed. All occurrence data derived from social media platforms were manually checked to ensure that the identification of adult JTM was correct. For Flickr, Twitter, and Instagram, only occurrences that fell within the months of the year when JTM adults fly were retained. All larval records were removed from our GBIF dataset. We obtained only 16 records from Twitter, so we did not include this data source in any further analyses. For data from Flickr, georeferences were automatically extracted using a custom script, but for Instagram and Twitter, georeferenced data were manually collected from individual posts where such information was provided. Where georeferences were absent, no data were collected.
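The automated parts of the social media filtering described above could be sketched as follows. This is an illustration under stated assumptions, not the custom script used in the study: the record fields (`platform`, `date`, `lat`, `lon`, `has_adult_image`) are hypothetical, the flight months are supplied by the caller because they are not listed here, and the duplicate rule (same date and same coordinates rounded to three decimal places) is an assumed stand-in for the cross-platform duplicate removal. Manual verification of identifications is not represented.

```python
from datetime import date


def filter_social_media_records(records, flight_months, start_year=2000, end_year=2018):
    """Keep georeferenced, in-period, in-season records that show an adult.

    All field names are hypothetical. `flight_months` is a set of month
    numbers (1-12) chosen by the caller.
    """
    seen = set()
    kept = []
    for rec in records:
        # Where georeferences were absent, no data were collected.
        if rec.get("lat") is None or rec.get("lon") is None:
            continue
        # Only posts that included an image of an adult were used.
        if not rec.get("has_adult_image"):
            continue
        # Restrict to the study period and to the adult flight months.
        if not (start_year <= rec["date"].year <= end_year):
            continue
        if rec["date"].month not in flight_months:
            continue
        # Assumed duplicate rule: same day and same rounded coordinates.
        key = (rec["date"], round(rec["lat"], 3), round(rec["lon"], 3))
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Because the first record encountered for a given key is the one retained, the order of the input determines which platform a cross-posted observation is attributed to.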
Table 1 | Summary of the search terms and processes used to collect biological records of JTM across the study region. Hits refers to the number of occurrences that contained all of the information required for the study, fell within the time span of the study (2000 – 2018), and fell within the study region (the UK, Republic of Ireland, France, Belgium, the Netherlands, Luxembourg, Switzerland, Czech Republic, Austria, Germany, Denmark, and Italy). Note that searches on Instagram are limited to hashtags rather than caption text. Data are available from https://figshare.com/s/94529defd9aa93d18426, except GBIF data, which are available from the link in the table.