Experimental tissue mixture samples
To explore how a metabarcoding primer portfolio performs on heterogeneous tissue mixtures with varying amounts of each constituent, such as for applications to aquaculture ingredient tracing, we created a complex fishmeal mixture that was further diluted with fillers. Across these experimental feeds, the proportion of sequencing reads recovered did not reliably reflect the relative amounts of input tissue, although taxa input at <1% of the fishmeal mixture consistently received a smaller proportion of reads and went undetected by markers more frequently than taxa that comprised at least 1% of the fishmeal. Squid (Loliolus sp.) was the exception, likely because of inefficient DNA extractions from cephalopod tissues using traditional methods (Fig. 6; Lee, McFall-Ngai, Callaerts, & de Couet, 2009). Our results are therefore in line with the conclusions of a recent review suggesting that a weak quantitative relationship may exist between relative DNA input amounts and sequence yields, albeit with a large degree of uncertainty (Lamb et al., 2019). Ecological studies of tissue mixtures or diets are typically not accompanied by sensitivity analyses of metabarcoding assays to assess detection limits, but food science applications using untargeted deep sequencing of genomic DNA also identified mixture constituents at >1% of an experimental composition (Haiminen et al., 2019; Ripp et al., 2014), concordant with our PCR-based results.
Poor recovery of Loliolus sp. highlights a key observation: although primer-binding and PCR amplification biases receive considerable attention in the metabarcoding literature (e.g. Deagle et al., 2019; Elbrecht & Leese, 2015), variation in DNA extraction efficiency among tissue types and taxa may, in some cases, be the factor that undermines quantitative inferences from sequencing reads. One line of evidence for tissue extraction bias in our study comes from the consistent performance of certain taxa added to the fishmeal mixture across four markers that amplify three different barcoding genes (COI, 12S, 16S; Fig. 6). If PCR bias were the dominant effect, the amplification bias would presumably favor different taxa for each primer set. Instead, we conclude that discrepancies in DNA extraction efficiency are the most likely explanation for our result that Scomber scombrus and Salmo salar acquire the largest share of sequencing reads across all three genes (four markers).
Variation in read counts among the five species added in equal mass to our fish mixture suggests that idiosyncrasies of particular taxa are at least as important as the actual tissue input amount. Variation among taxa in low-input categories (0.01% and 0.1%) could also arise from heterogeneity within our fishmeal mixture. Inconsistent read proportions per taxon among markers, among fishmeal percentages in experimental feeds (Fig. 4), and between filler types (Fig. 5), as well as taxon drop-out (Fig. 6), highlight the uncertainty associated with accurately recovering constituents that comprise the smallest proportions of a mixture. Further, PCR repeatability becomes less reliable for very low DNA inputs (SI, Fig. S7). Although metabarcoding studies often filter data based on taxon occurrence in multiple technical replicates (Alberdi et al., 2018), this practice is conservative and biases results against the detection of truly rare species.
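To illustrate why replicate-based filtering is conservative toward rare taxa, the following minimal sketch applies a simple occurrence threshold across technical replicates. The function name `filter_by_replicates`, the `min_replicates` parameter, and the read counts are hypothetical and chosen only for illustration; they are not data or code from this study, and published pipelines may apply different criteria.

```python
from collections import Counter


def filter_by_replicates(replicates, min_replicates=2):
    """Return the set of taxa with reads > 0 in at least `min_replicates` replicates.

    `replicates` is a list of dicts mapping taxon name -> read count,
    one dict per technical (PCR) replicate.
    """
    detections = Counter()
    for rep in replicates:
        for taxon, reads in rep.items():
            if reads > 0:
                detections[taxon] += 1
    return {taxon for taxon, n in detections.items() if n >= min_replicates}


# Hypothetical read counts for three PCR replicates (illustrative only).
replicates = [
    {"abundant_taxon": 5200, "rare_taxon": 3},
    {"abundant_taxon": 4800, "rare_taxon": 0},
    {"abundant_taxon": 5100, "rare_taxon": 0},
]

# The rare taxon is genuinely present but amplifies in only one replicate,
# so a two-replicate threshold removes it.
retained = filter_by_replicates(replicates, min_replicates=2)
print(sorted(retained))
```

Under these assumed counts, the rare taxon is retained only when the threshold is relaxed to a single replicate, which is the trade-off between false positives and drop-out of low-input constituents described above.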
The filler composition of experimental feed samples did not impact the recovery of reference taxa or proportion of sequencing reads attributed to those taxa, likely because the filler was sufficiently taxonomically divergent that universal fish (teleost) primers preferentially amplified the target DNA. Yet if filler competes with target DNA during PCR amplification – either because of lack of locus-specificity or taxonomic similarity between target taxa and matrix – then shallow sequencing depth could affect recovery of target taxa, especially those added in very small amounts. Here, we recovered most of the lowest tissue-input taxa (0.01%) with one or more markers, suggesting that sequencing depth is not a limiting factor in the present study.
Although PCR-free and untargeted approaches appear better suited to quantitative inference of relative representation (Haiminen et al., 2019), these also require more starting material (Haynes, Jimenez, Pardo, & Helyar, 2019) and genome reference data, which are currently unavailable for many of the 30 reference taxa added to our experimental feeds. Further, the challenges of tissue mixtures – low inputs of fragmented DNA and the presence of inhibitors – will likely remain problematic for accurate quantification (Haynes et al., 2019), particularly as mixture complexity and heterogeneity increase.