Discussion
In this study, we demonstrate that occurrence records from Instagram and iNaturalist are found in different and more urbanised locations than occurrences from traditional datasets, such as GBIF. We therefore highlight the utility of these platforms as additional and complementary sources of data to traditional databases. However, contrary to our predictions, occurrence data from Flickr offer a somewhat similar outlook to that provided by GBIF records. There was a notable difference in the environments surveyed by the different social media platforms: Flickr data occurred in more rural locations than iNaturalist data, and Instagram data occurred in the most urbanised locations.
As predicted, the majority of post-2009 GBIF occurrence records for JTM within our study region fell within more rural areas, likely because most GBIF data originate from scientific surveys and formal recording efforts, which largely operate in rural zones. In contrast, and also as predicted, Instagram largely contains data from highly urban zones, likely because urban areas are densely populated and have good internet connections, so that geographic patterns in human behaviour affect where data are uploaded to social media. However, data from Flickr are more rural than data from iNaturalist and Instagram. Flickr is tailored towards individuals with an interest in the quality of photography, which may attract wildlife photographers intent on capturing wildlife in relatively rural environments rather than in urban zones. Overall, this demonstrates the utility of platforms such as iNaturalist and Instagram to fill a void in the occurrence records provided by traditionally used data sources such as GBIF. Although we studied only one species, we think it is likely that the urban-rural differences between databases would hold for similar (colourful, eye-catching) species that are likely to be uploaded to social media.
Our results also highlight the importance of accounting for recorder effort. The strong positive effect of recorder effort on GBIF occurrences indicates that JTM is detected where recorders are searching for it. Thus, predictions of low suitability in urban areas based on GBIF data are not necessarily trustworthy. Likewise, iNaturalist and Flickr data occur in areas where recorder effort on those platforms is high, indicating that these data sources alone may not contain occurrences in all areas where the species is found. In contrast to iNaturalist and Flickr, there was a much shallower relationship between the location of JTM records from Instagram and recorder effort. This suggests that Instagram is better at detecting a species in areas where the majority of users are not looking for it. Instagram’s ability to detect JTM in areas of both lower GBIF-calculated habitat suitability and higher urbanisation, together with the relatively shallow effect of recorder effort, makes it an ideal complement to traditional occurrence data for range shifters. The utility of Flickr and iNaturalist should not be discounted, though, since both may make species records publicly available more rapidly than GBIF.
It should be noted that recorder effort was particularly geographically uneven for social media sources, and our results could be affected by this patchiness. It is possible that blackbird recorder effort does not reflect JTM recorder effort, particularly within GBIF, since surveys for different taxa (birds and insects, in this instance) are likely to employ differing sampling techniques and audiences38. However, abundance data for insects with which to calculate recorder effort are rarely available. Moreover, using the blackbird, which is well represented in all data sources, allows recorder effort to be compared between localities and time periods in a way that could be applied to a wide range of taxa. There may also be a novelty bias towards range-shifting species, causing geographical and temporal variation in recorder effort. Given the varying, but broadly important, effect of recorder effort, developing improved recorder effort metrics could be particularly important to the use of social media data in biogeography and range-shift ecology. Even if it is not a precise, quantitative metric of recorder effort, the approach we developed is a useful tool for comparison between data sources, locations, and time periods. This is particularly important when dealing with social media data, which are prone to temporal and spatial trends and uneven geographical use.
Range-shifting and invasive species have previously been found to be human-associated, persisting in urban parks and gardens10. Although the extent of this association remains unknown, our results highlight the potential for social media data to track and understand range-shifting species in urban zones. Since Instagram’s focus is on photography, it could be used to track the arrival of eye-catching or charismatic taxa in urban areas. However, a less recognisable or visually appealing species than JTM could generate fewer occurrences, and thus the repeatability of the use of Instagram data across different taxa requires further investigation. In addition, collection of ad hoc social media data may present opportunities for researchers to assess wildlife management practices in urban and suburban areas. Records of bug hotels, bird feeders, and mutualists gathered from social media could be used to identify hotspots of positive management in cities, as well as areas that are deficient in their capacity to support biodiversity. Furthermore, social media data could be used to assess the persistence of endangered species in urban and suburban areas, adding to the work already compiled regarding the importance of gardens in supporting threatened or keystone taxa39. Our study also suggests that there may even be scope for assessing the potential for urban spaces to propagate range shifts and invasions further, in a similar way to forest corridors40. It is clear that, if robust and repeatable methodologies can be applied, social media data sources have considerable potential to provide high-quality data at speed. Furthermore, it is likely that these methods will only increase in importance as urbanisation rises globally41. It is also noteworthy that social media platforms such as Twitter have been used to promote uptake of the UK ladybird survey, yielding insights into the spread of the Harlequin ladybird42.
Scientific and policy-maker interest in community science in urban areas is growing, given that urban environments are expanding, most people live in urban environments, and most nature experiences are close to home43. Noticing urban wildlife can improve mental and physical wellbeing44,45, and increasing engagement with urban nature offers an opportunity for improved ecological literacy and nature connectedness, particularly amongst social groups that have historically had inequitable access to nature46,47. Our results further reinforce recent findings that social media platforms could be harnessed to assist in urban nature engagement and conservation48,49.
While our results highlight a promising avenue for future studies and offer novel sources of data with new information, a fundamental area for improvement is the establishment of a rigorous and consistent methodology19. A source of uncertainty for this study is that the search terms and the access to and use of APIs could not be made consistent across all social media data sources. The process by which data are obtained would benefit from greater consistency; the main barrier here is the expense of using the API services supplied by Instagram and Twitter. Both services impose recency constraints and query limits on their free-to-use APIs, and the cost of more expansive API usage, which was up to £2000 per month depending on the service at the time this study was conducted, was outside the budget of this study. This could be overcome with additional studies highlighting the importance of access to these data for scientists, thus prompting social media companies to produce an API service that is accessible to scientists. Alternatively, automation software such as UiPath could provide a more affordable and consistent method to gather data from online sources50.
Implementation of alternative methodologies and different focal species is likely to increase the utility of Twitter, which was omitted from analyses due to a low sample size, and could permit use of other social media sources not considered here due to data accessibility, such as TikTok or Facebook.
A further potential issue with social media use is that these sources are not necessarily used equally across all nations, particularly those outside Europe and North America. We have attempted to account for unequal usage in our study by using three different sources of social media data, but ideally more would be included. Search terms should also be considered with caution. We have included the search terms that yielded the most occurrences of JTM. However, there may be an English bias here, since social media users in non-English-speaking countries will likely submit potential occurrences using non-English and colloquial terminology. Although alternative common names are not always simple to incorporate (as was the case here, with German common names that translate as “Spanish flag” and “Russian bear”, which yielded large numbers of non-moth results when searched), this is certainly worthy of consideration. Social media data sources are also driven by trends, which may contribute to varying usefulness of different sources over time as the popularity and novelty of range-shifting species wax and wane. Such an effect seemed to be apparent with JTM, where the inclusion of the moth on postage stamps in the Channel Islands was associated with an increase in GBIF and iNaturalist occurrences in 2012 and 2013 (which also illustrates that even GBIF is not resistant to trends). Nonetheless, such trends could also be a potential advantage of social media data sources. In theory, governments and scientists could highlight species of interest to the public, thus generating a trend around focal organisms that could be used to generate social media occurrence records. Such strategies could increase the use of social media to record biological phenomena, potentially producing large quantities of community science data.
The results presented here support the combined use of traditional (GBIF) and social media (particularly Instagram and iNaturalist) data sources to generate a more complete understanding of the habitat use of range-shifting species. Our study suggests that traditional and social media biodiversity data can contain different, but complementary, information regarding habitat usage of a range-shifting species. While GBIF captures the rural range of JTM across the study region, Instagram demonstrated that JTM also occupies highly urbanised environments. Social media data may be particularly prone to variation in recorder effort, and we propose a method that can account for this. We suggest that data from social media should be added to occurrence datasets when tracking range-shifting species. Incorporating occurrence records from social media could be particularly important given the human-associated nature of some range shifters, which often occupy parks and gardens in urban zones as well as rural spaces.