2. Materials and Methods
2.1 Study sites and field data collection
Fieldwork was conducted in the period March to July of 2016-2019 at 11 woodland sites in the UK. Sites selected were pre-existing Hawfinch ringing studies study areas within the Wye Valley, Dolgellau, Cardiff, the New Forest and Norfolk (Figure 1). The artificial feed sites used to attract Hawfinches for capture have been operational for a number of years within regions of Hawfinch population strongholds (Clements, 2013; Kirby et al., 2018). Study s ites were broadly typical of British mixed broadleaved woodland, with sites in the Wye Valley and north Wales dominated by beech, oak and ash (Fraxinus excelsior , Linnaeus). The study site located in Norfolk was a mixed woodland consisting of lime (Tilia sp., Linnaeus), ash and maples (Acer sp., Linnaeus), while the New Forest site was dominated by oak, with an understorey flora comprising of Holly (Ilex sp., Linnaeus) and bramble (Rubus sp., Linnaeus). All site locations are approximate for anonymity.
Hawfinches were caught by trained bird ringers, operating under British Trust for Ornithology (BTO) approved licences using either mist or whoosh nets. For each bird captured morphometric data and time of capture were recorded, including maximum chord wing length, sex and body mass (Svensson, 1992). Hawfinches were individually placed within a clean, paper bag which was then placed inside a cotton bag and left for 10-20 minutes until the bird defecated. To avoid excessive stress, if birds had not defecated within this time frame they were processed and released. Each faecal sample was removed from the paper bags using a new, clean plastic toothpick to minimise contamination. Samples were placed in separate 2ml Eppendorf tubes, and frozen to -20oC at 1-8h after collection.
2.2 Dietary analysis
DNA was extracted from a total of 365 faecal samples using the Qiagen QIAamp DNA Stool Mini Kit (Qiagen, UK) with modifications designed to improve DNA yield from avian faeces, following Shutt et al.(2020). Samples were extracted in batches of 23 with an additional negative control containing no DNA. We used primers UniplantF, 5′-TGTGAATTGCARRATYCMG-3′ and UniPlantR 5′-CCCGHYTGAYYTGRGGTCDC-3′ to amplify a 187-387-bp fragment covering the ITS2 region of plant nuclear DNA (Moorhouse-Gann et al., 2018). For amplification of invertebrate DNA, mlCOIintF, 5’–GGWACWGGWTGAACWGTWTAYCCYCC-3’ (Leray et al.2013) and Nancy 5’-ACTAGCAGTACCCGGTAAAATTAAAATATAAACTTC-3’, (Simonet al. 1992) were used following selection and modification by Stockdale (2018) for amplification of a 306-bp fragment of the COI region. Primer sets were validated by Stockdale (2018) to ensure DNA amplification of the expected range of taxa. A two-stage PCR process involved initial amplification of the target regions followed by the addition of a unique combination of 10-bp molecular identifier tags (MID-tags), with samples having a unique pairing of forward and reverse tags for subsequent sample identification. Within each PCR 96-well plate, 12 negative (extraction and PCR) and two positive controls were included following Taberlet et al. (2018). Amplicons were multiplexed into five pools, each containing between 63 and 186 samples. Library preparation for Illumina sequencing was undertaken via NEXTflex Rapid DNA-Seq kit (Bioo Scientific, Austin, USA), with a unique adapter added to each pool. Pools were diluted to 4nM and quantified using Qubit dsDNA High‐sensitivity Assay Kits. Finally, the diluted pools were combined equimolarly and sequenced on a MiSeq desktop sequencer via a v2 chip with 2 x 250bp paired-end reads (expected capacity 24-30,000,000 reads). Due to the unbalanced nature of the amplicon libraries, a 15% PhiX buffer was added to the sequencing run in order to improve cross-talk and phasing calculations.
2.3 Identification of dietary taxa
The Illumina run generated 6,328,388 and 12,307,560 ITS2 and COI reads respectively. All reads were checked for truncation, quality checked, filtered and assigned taxonomic identification following Drake et al., (2021). All read counts less than the maximum in unused-MID tag combinations and negative controls for each respective zero radius OTU (zOTU) were removed prior to statistical analysis. Read counts of non-positive control taxa detected within positive controls were calculated as a percentage of the maximum read count for that taxon. The greatest of these percentages was then applied as a measure of read percentage to be removed from each taxon. This accounts for over-represented taxa tag jumping or “leaking” into other samples during sequencing. Optimal thresholds of 3% and 1% were applied to ITS2 and COI data respectively. No known lab contaminants such as German cockroach (Blatella germanica , Linnaeus) or non-target taxa that could be identified as lab contamination were detected in either library. Data from respective ITS2 and COI libraries were aggregated together to form a single taxon list for each marker, and non-target taxa such as fungi and gastrotrichs were removed. In order to standardise the taxonomic level and create evenness for analysis, all taxa were converted to genus-level, as some zOTUs could not be resolved to species.
2.4 Statistical analysis
For all statistical analysis, the presence/absence of each taxonomic unit within a sample was used instead of read count, as the latter is not an accurate representation of abundance due to amplification biases (Clare, Symondson, & Fenton, 2014; Yu et al., 2012). Control samples were excluded from the analyses. All statistical analysis were carried out in R version 3.6.3 (R Core Team, 2020) unless otherwise stated.
To evaluate the most prevalent plant and invertebrate taxa within Hawfinch diet, the number of samples in which a dietary zOTU occurred (frequency of occurrence), was calculated. In order to estimate the total dietary niche breadth, the specpool function in R’s vegan package (Oksanen et al., 2019) was used to calculate Chao’s incidence-based estimator of richness (Chao & Jost, 2012; Oksanen et al., 2019). This is based upon presence/absence of dietary taxa found in sample sites, and gives a single estimate for a collection of sampling sites (Oksanen et al., 2019). Observed species richness divided by the Chao estimate gave the proportion of total dietary diversity detected. Species accumulation curves were also produced using the poolaccumfunction within the vegan package, relating the overall dietary diversity captured to the number of faecal samples analysed. To determine whether plant or invertebrate taxonomic richness in Hawfinch diet was greater, a Wilcoxon matched-pairs test was undertaken to test for a significant difference in the median species richness of plant and invertebrate taxa detected. Only Hawfinch samples which provided taxonomic results for both invertebrate and plant taxa were included. Based upon ringing recapture data, no individual Hawfinch that had tested positive for both plant and invertebrate DNA was included more than once.
To test our hypotheses that diet varied among locations and between the sexes we compared dietary composition between sampling regions and sexes using multivariate generalised linear models (MGLMs) using the functionmanyglm within the package mvabund (Wang, Naumann, Eddelbuettel, John, & Warton, 2012). Regions were broadly categorised into: Wye Valley, Dolgellau, north Cardiff, New Forest and Norfolk, as some regions contained multiple catching sites. Where an individual was sampled more than once, data was used from the first capture only to avoid pseudo replication and subsequent biases. The functionanova.manyglm in mvabund was used to test the significance of each term within the overall model and the p.uni = “adjusted” argument was implemented to perform tests for each species in turn, adjusting the p -values with Holm’s step down resampling algorithm (Westfall & Young, 1993). Parametric bootstrap (Monte Carlo) resampling was applied to test for dietary differences, ensuring inferences took into account correlation between variables (Wang et al., 2012). When necessary, pairwise comparisons were performed using thepairwise.comp function of anova.manyglm . Models were simplified using the step function based upon the lowest Akaike’s information criterion (AIC) value. For all models, diagnostic plots were checked to ensure model assumptions were met. Dietary differences were visualised using non-metric multidimensional scaling analysis (nMDS) via the function metaMDS in the vegan package (Oksanen et al., 2019). The nMDS was performed with Jaccard dissimilarities in two-dimensional space (k=2). Spider plots were produced using nMDS results via ordispider and plotted through ggplot2 (Wickham, 2016). Singletons and outliers were removed to visually improve the ordination.