Introduction
Increasingly, ecological studies can leverage DNA metabarcoding to detect and identify the species present in an environment or in a complex mixture of tissues. Frequent applications in aquatic ecosystems include plankton surveys, food web analyses, and forensic identification of harvested species. Accurate and effective analysis of the DNA content in these sample types depends on a series of critical methodological decisions, foremost of which is the choice of barcoding primers (Aizpurua et al., 2018; Alberdi et al., 2019; Alberdi, Aizpurua, Gilbert, & Bohmann, 2018; Zhang, Zhao, & Yao, 2020; Zinger et al., 2019). Primer selection influences which taxa can be detected and the taxonomic resolution to which they can be identified. Barcoding primers amplify gene regions chosen to balance two requirements: enough sequence divergence to distinguish species, and enough conservation to allow amplification across major taxonomic groups. So-called universal primers, which rely on highly conserved nucleotide binding sites, are attractive because a single marker can amplify a wide range of taxa. However, the broader the range of taxa covered (e.g., all metazoa or all teleost fishes), the less likely it is that species-level identification will be possible, because regions flanked by priming sites conserved across divergent taxonomic groups often lack sufficient sequence resolution. A further obstacle to species identification is amplification failure caused by mismatches between primers and template sequences. These mismatches can lead to poor taxon recovery or cause less competitive taxa to drop out if sequencing depth is insufficient (Aizpurua et al., 2018).
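To make the mismatch problem concrete, the following minimal sketch counts primer-template mismatches while honoring IUPAC degenerate bases (e.g., R = A or G), which are commonly built into primers to tolerate known polymorphisms. It is an illustration, not an amplification model: the sequences are hypothetical, the primer and binding site are assumed to be pre-aligned on the same strand, and the 3'-end window size is an illustrative parameter. Mismatches near a primer's 3' end are generally considered more disruptive to amplification than those near the 5' end, so the sketch reports them separately.

```python
# Minimal sketch: count primer-template mismatches, treating IUPAC degenerate
# bases in the primer as matching any nucleotide they encode.
# Sequences are hypothetical; primer and binding site are assumed to be
# aligned, written 5'->3' on the same strand, and of equal length.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, site: str) -> int:
    """Number of positions where the template base is not encoded by the primer base."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(1 for p, t in zip(primer.upper(), site.upper()) if t not in IUPAC[p])


def three_prime_mismatches(primer: str, site: str, window: int = 5) -> int:
    """Mismatches within the last `window` bases at the primer's 3' end.

    The default window of 5 bp is an illustrative assumption.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return count_mismatches(primer[-window:], site[-window:])


if __name__ == "__main__":
    primer = "ACGTRYAC"  # hypothetical degenerate primer
    for site in ("ACGTAGAC", "ACGTGCAT"):  # hypothetical template binding sites
        print(site, count_mismatches(primer, site),
              three_prime_mismatches(primer, site, window=2))
```

Both hypothetical binding sites carry one mismatch overall, but only the second places it at the primer's terminal 3' base, which is the situation most likely to cause a taxon to drop out.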
Primer-template mismatches are more common in diverse samples (Elbrecht et al., 2019); thus, researchers can improve recovery of constituent taxa by combining universal primers that selectively amplify each focal taxonomic group with additional primers that offer species-level resolution (e.g., Thomsen et al., 2012), or by using multiple primers optimized for different taxonomic groups (Aizpurua et al., 2018; Berry et al., 2017; Carroll et al., 2019; Evans et al., 2016; Jeunen et al., 2019; Koziol et al., 2019; Silva et al., 2019). Yet, even when using multiple primers, many studies do not obtain species-level assignments because of the challenge of balancing taxonomic breadth and resolution (Djurhuus et al., 2020; Leray & Knowlton, 2015; Locatelli, McIntyre, Therkildsen, & Baetscher, 2020).
Because the DNA in tissue mixtures of interest is often degraded, primers that target short DNA fragments, or minibarcodes, may achieve more complete amplification across taxa in such samples. Short fragments are more readily amplified than longer ones, and they are more likely to persist in environmental samples (Shokralla et al., 2015; Staats et al., 2016) and stomach contents (Devloo-Delva et al., 2019). Studies comparing full-length barcodes and minibarcodes for mitochondrial cytochrome c oxidase I (COI) found that minibarcodes of 200–300 bp provide resolution comparable to that of the full-length 658 bp barcode (Hajibabaei et al., 2006; Yeo, Srivathsan, & Meier, 2020). Moreover, full-length barcodes failed to amplify degraded samples (processed fish products), whereas minibarcodes recovered species-level sequences (Marín et al., 2018; Yeo, Srivathsan, & Meier, 2020). Short barcodes are also more economical to sequence than full-length barcoding genes, because current low-cost, high-throughput sequencing platforms tend to produce read lengths of ≤ 300 bp. For barcodes shorter than this length, researchers can therefore obtain greater read depth for a given sequencing investment, which matters because greater sequencing depth can reveal more rare taxa (Singer, Fahner, Barnes, McCarthy, & Hajibabaei, 2019; Smith & Peay, 2014).
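The read-length argument can be made concrete with simple arithmetic. The minimal sketch below assumes paired-end sequencing (not specified above), in which a forward and a reverse read are sequenced from opposite ends of each amplicon, so an amplicon is fully covered only if the two reads meet and overlap. The 20 bp minimum overlap, the run size, and the sample and marker counts are illustrative assumptions, not values from this study, and the 658 bp figure is treated as the amplicon length for simplicity.

```python
# Minimal sketch of read-length and read-depth arithmetic for amplicon sequencing.
# Assumes paired-end reads of equal length; all numeric values are illustrative.


def pair_overlap(amplicon_bp: int, read_bp: int) -> int:
    """Overlap (bp) between forward and reverse reads spanning one amplicon.

    The forward read covers [0, read_bp) and the reverse read covers
    [amplicon_bp - read_bp, amplicon_bp). A negative value is the size of
    the unsequenced gap in the middle of the amplicon.
    """
    return min(read_bp, amplicon_bp) - max(0, amplicon_bp - read_bp)


def fully_covered(amplicon_bp: int, read_bp: int, min_overlap: int = 20) -> bool:
    """True if the read pair overlaps by at least `min_overlap` bp (assumed threshold)."""
    return pair_overlap(amplicon_bp, read_bp) >= min_overlap


def mean_reads_per_library(run_reads: int, n_samples: int, n_markers: int = 1) -> float:
    """Mean reads per sample-marker library when a run's output is split evenly."""
    return run_reads / (n_samples * n_markers)


if __name__ == "__main__":
    for amplicon in (658, 250):  # full-length COI barcode vs. a hypothetical minibarcode
        print(amplicon, pair_overlap(amplicon, 300), fully_covered(amplicon, 300))
    print(mean_reads_per_library(19_200_000, n_samples=96, n_markers=4))
```

Under these assumptions, 2 × 300 bp reads leave a 58 bp gap in the middle of a 658 bp amplicon but span a 250 bp minibarcode completely, while the per-library figure shows how adding samples or markers to a fixed run dilutes the depth available to each.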
While initial barcoding efforts for animals primarily leveraged variation within the COI gene (e.g., Barcode of Life; Ratnasingham & Hebert, 2007), several other mitochondrial genes have become attractive alternatives (e.g., Deagle, Jarman, Coissac, Pompanon, & Taberlet, 2014; Machida & Knowlton, 2012; Miya et al., 2015). The popularity of certain barcoding genes has made extensive high-quality reference data available via the NCBI and BOLD databases to support taxonomic assignments. Because the availability and accuracy of suitable reference data for particular taxonomic groups vary among barcoding genes (Leray, Knowlton, Ho, Nguyen, & Machida, 2019), reference data are a key factor in choosing primers.
Aquatic habitats, both marine and freshwater, have become popular targets for metabarcoding studies, likely because of the logistical challenges and considerable expense of traditional sampling and survey methods (e.g., Salter, Joensen, Kristiansen, Steingrund, & Vestergaard, 2019). These studies have produced dozens of primer sets for fishes and other aquatic taxa, which offer researchers abundant reference data for interpreting metabarcoding results; yet choosing the optimal primer portfolio also requires assessing amplification biases and potential sample degradation. To this end, some studies evaluate primers in silico and/or in the laboratory, but these comparisons have been largely ad hoc and limited in geographic and taxonomic extent. Notably, the results of in silico assessments, which frequently guide primer selection, sometimes differ from those of in vivo tests (Alberdi et al., 2019; Zhang et al., 2020). For example, the most comprehensive comparison of eDNA and metabarcoding primers for fishes to date (Zhang et al., 2020) assessed primers using only freshwater fishes from waterbodies in Beijing. Although such an assessment is valuable, its results may have limited application to marine species or to endemic species outside this region; more empirical testing and comparison of the performance and complementarity of metabarcoding markers is therefore needed.
Despite the proliferation of studies using multiple metabarcoding markers, few have experimentally tested the additive benefit of a portfolio of markers (each of which amplifies a single locus) for obtaining high-resolution (species- or genus-level) taxonomic assignments (but see Corse et al., 2019). Instead, many studies that rely on multiple primer sets use each one to identify different taxonomic groups (Berry et al., 2017) or to balance the trade-off between sequence identification at a high taxonomic rank and resolution of taxa within a rank (e.g., Carroll et al., 2019; Djurhuus et al., 2018). However, even within a single taxonomic group, different primer pairs may amplify different subsets of species because of polymorphisms in the primer-binding regions, resulting in complementarity for the detection of even closely related taxa. Further complementarity can arise from varying levels of sequence divergence within the amplified targets, so that different markers may allow species-level resolution for different subsets of taxa. Species-level identification is often important, for example when samples may include closely related species that must be distinguished for biodiversity accounting, fishery and wildlife management, or species conservation. Accordingly, careful design of primer portfolios can boost both the detection rates and the resolution of metabarcoding studies, but little empirical testing has explored this potential.
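Complementarity among markers can be framed as a coverage problem: given the set of species that each marker identifies to species level, choose a small set of markers whose union covers as many species as possible. The minimal sketch below uses a greedy heuristic, repeatedly adding the marker that identifies the most not-yet-identified species; it is simple but not guaranteed to find the optimal portfolio. The marker names and species sets are hypothetical placeholders, not results.

```python
# Minimal sketch: greedily assemble a marker portfolio that maximizes the number
# of species identified. Marker names and species sets are hypothetical.


def greedy_portfolio(ids_by_marker: dict[str, set[str]], max_markers: int) -> list[str]:
    """Repeatedly add the marker that identifies the most not-yet-covered species."""
    covered: set[str] = set()
    portfolio: list[str] = []
    remaining = dict(ids_by_marker)
    while remaining and len(portfolio) < max_markers:
        # Largest marginal gain; iterating in sorted order makes ties resolve
        # to the alphabetically first marker, for reproducibility.
        best = max(sorted(remaining), key=lambda m: len(remaining[m] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining marker adds a new species
        portfolio.append(best)
        covered |= gain
        del remaining[best]
    return portfolio


if __name__ == "__main__":
    ids = {
        "COI_A": {"sp1", "sp2", "sp3", "sp4"},
        "12S_B": {"sp1", "sp2", "sp5"},
        "16S_C": {"sp3", "sp6"},
        "12S_D": {"sp1", "sp2"},
    }
    print(greedy_portfolio(ids, max_markers=4))
```

In this hypothetical example the greedy search stops after three markers, because the fourth identifies only species already covered by the others.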
To assess primer complementarity arising from amplification bias, reference data, and trade-offs between taxonomic resolution and breadth, we empirically assess 22 markers, some of them universal fish primers and others taxon-specific, for their ability to recover species-level identifications from a diverse reference DNA pool of >100 species. The pool consists primarily of marine and freshwater fishes but also includes a few representatives of other marine organisms (elasmobranchs, crustaceans, and cephalopods) to evaluate species recovery beyond the target taxonomic group. We then explore the utility of a portfolio approach using complementary markers that amplify sections of the COI, 16S, and 12S genes. Marker performance is assessed based on the integrated effect of primer specificity and of the availability and resolution of reference sequences for the particular taxa in our DNA mixture; in this framework, markers are valuable when they contribute species identifications for taxa that are not identified by any other marker. Finally, we test the optimal portfolio from our initial analysis on a set of different tissue mixtures to assess 1) the tissue input threshold needed to ensure detection; 2) how read depth scales with tissue abundance; and 3) the effect of non-target material in the mixture on recovery of target taxa (marker performance).
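The valuation criterion described above, that a marker is valuable when it identifies taxa no other marker identifies, amounts to a set difference. The minimal sketch below computes each marker's unique species identifications from a mapping of markers to identified species; the marker names and species are hypothetical, and the sketch deliberately ignores read counts and assignment confidence.

```python
# Minimal sketch: species identified by each marker and by no other marker.
# Marker names and species labels are hypothetical placeholders.


def unique_contributions(ids_by_marker: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each marker to the species that only it identifies."""
    unique: dict[str, set[str]] = {}
    for marker, species in ids_by_marker.items():
        # Union of species identified by all other markers.
        others = set().union(*(s for m, s in ids_by_marker.items() if m != marker))
        unique[marker] = species - others
    return unique


if __name__ == "__main__":
    ids = {
        "COI_A": {"sp1", "sp2", "sp3", "sp4"},
        "12S_B": {"sp1", "sp2", "sp5"},
        "16S_C": {"sp3", "sp6"},
        "12S_D": {"sp1", "sp2"},
    }
    for marker, species in unique_contributions(ids).items():
        print(marker, sorted(species))
```

Here the hypothetical 12S_D marker identifies two species but contributes no unique identifications, so under this criterion it would add little to a portfolio that already contains the other three markers.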
Our study was designed to optimize tools for forensic assessment of aquaculture feed composition. Accordingly, our DNA pools were composed of aquatic taxa that might be found in fishmeal or other complex tissue mixtures derived from marine and inland fisheries (Mo, Man, & Wong, 2018; Tacon & Metian, 2008), and our tissue mixtures were designed to emulate aquaculture feeds. However, these mock feeds are very similar in nature to other types of tissue mixtures studied broadly in ecology, including stomach contents, fecal samples, and plankton tows. Hence, our overall findings and approach should be transferable to many applications of metabarcoding analysis of heterogeneous tissues.