Experimental tissue mixture samples
To explore how a metabarcoding primer portfolio performs on
heterogeneous tissue mixtures with varying amounts of each constituent,
as in aquaculture ingredient tracing, we created a
complex fishmeal mixture that was further diluted with fillers. Across
these experimental feeds, the proportion of sequencing reads recovered
did not reliably reflect the relative amount of input tissue,
although taxa input at <1% of the fishmeal mixture
consistently received a smaller proportion of reads and went undetected
by markers more frequently than taxa that comprised at least 1% of the
fishmeal. Squid (Loliolus sp.) was the exception, likely because
of inefficient DNA extractions from cephalopod tissues using traditional
methods (Fig. 6; Lee, McFall-Ngai, Callaerts, & de Couet, 2009).
Our results are therefore consistent with a recent review concluding
that a weak quantitative relationship may exist between relative DNA
input amounts and sequence yields, albeit with a large degree of
uncertainty (Lamb et al., 2019). Ecological studies of
tissue mixtures or diets are typically not accompanied by sensitivity
analyses of metabarcoding assays to assess detection limits, but food
science applications using untargeted deep sequencing of genomic DNA
also identified mixture constituents at >1% of an
experimental composition (Haiminen et al., 2019; Ripp et al., 2014),
concordant with our PCR-based results.
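
The comparison between input amounts and read yields discussed above can be illustrated with a minimal sketch. The taxon names, input proportions, and read counts are hypothetical and are not data from this study; the function names are likewise assumptions made for illustration. The sketch converts per-taxon read counts to read proportions and reports the ratio of observed read proportion to input proportion, where a ratio near 1 would indicate proportional recovery.

```python
def read_proportions(read_counts):
    """Convert per-taxon read counts into proportions of the total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: count / total for taxon, count in read_counts.items()}


def recovery_ratios(read_counts, input_props):
    """Ratio of observed read proportion to input tissue proportion.

    Taxa with no reads (drop-outs) get a ratio of 0.0.
    """
    props = read_proportions(read_counts)
    return {taxon: props.get(taxon, 0.0) / input_props[taxon]
            for taxon in input_props}


# Hypothetical values for illustration only (not data from this study).
input_props = {"taxon_a": 0.50, "taxon_b": 0.49, "taxon_c": 0.01}
read_counts = {"taxon_a": 700, "taxon_b": 290, "taxon_c": 10}

ratios = recovery_ratios(read_counts, input_props)
for taxon, ratio in ratios.items():
    print(f"{taxon}: observed/input = {ratio:.2f}")
```

In this toy case two taxa added at nearly equal mass receive very different read shares, which is the kind of taxon-specific deviation that weakens quantitative inference.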
Poor recovery of Loliolus sp. highlights a key point:
although primer-binding and PCR amplification biases receive
considerable attention in the metabarcoding literature (e.g. Deagle et
al., 2019; Elbrecht & Leese, 2015), variation in DNA extraction
efficiency from different tissue types and taxa may, in some cases, be
the factor that undermines quantitative inferences from sequencing
reads. One line of evidence for tissue extraction bias in our study
comes from the consistent performance of certain taxa added to the
fishmeal mixture across four markers that amplify three different
barcoding genes (COI, 12S, 16S, Fig. 6). If PCR bias were the dominant
effect, presumably the amplification bias would favor different taxa for
each primer set. Instead, we conclude that differences in DNA
extraction efficiency most likely explain why Scomber scombrus and
Salmo salar received the largest share of sequencing reads across all
three genes (four markers).
Variation in read counts among the five species added in equal mass to
our fish mixture suggests that idiosyncrasies of particular taxa are at
least as important as the actual tissue input amount. Variation among
taxa in low-input categories (0.01% and 0.1%) also could arise from
heterogeneity within our fishmeal mixture. Inconsistent read proportions
per taxon among markers, among fishmeal percentages in experimental
feeds (Fig. 4), and between filler types (Fig. 5), as well as taxon
drop-out (Fig. 6), highlight the uncertainty associated with accurately
recovering constituents that comprise the smallest proportions of a
mixture. Further, PCR repeatability becomes less reliable for very low
DNA inputs (SI, Fig. S7). Although metabarcoding studies often retain
only taxa that occur in multiple technical replicates (Alberdi
et al., 2018), this filtering is conservative and biases results against
truly rare species.
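
The effect of replicate-based filtering on rare taxa can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the thresholds (`min_replicates`, `min_reads`), and the read counts are hypothetical and are not taken from this study or from Alberdi et al. (2018).

```python
def filter_by_replicates(replicate_counts, min_replicates=2, min_reads=1):
    """Keep taxa detected (>= min_reads) in at least min_replicates replicates."""
    kept = {}
    for taxon, counts in replicate_counts.items():
        n_detected = sum(1 for c in counts if c >= min_reads)
        if n_detected >= min_replicates:
            kept[taxon] = counts
    return kept


# Hypothetical read counts across three PCR replicates (illustration only).
replicate_counts = {
    "abundant_taxon": [120, 98, 143],
    "rare_taxon": [0, 3, 0],  # truly present, but detected in one replicate
}

kept = filter_by_replicates(replicate_counts, min_replicates=2)
dropped = sorted(set(replicate_counts) - set(kept))
print("kept:", sorted(kept))
print("dropped:", dropped)
```

Here the rare taxon is removed even though it was genuinely added to the mixture, which is the cost of the filter: it guards against false positives at the expense of low-input constituents.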
The filler composition of experimental feed samples did not impact the
recovery of reference taxa or proportion of sequencing reads attributed
to those taxa, likely because the filler was sufficiently taxonomically
divergent that universal fish (teleost) primers preferentially amplified
the target DNA. Yet if filler competes with target DNA during PCR
amplification – either because of lack of locus-specificity or
taxonomic similarity between target taxa and matrix – then shallow
sequencing depth could affect recovery of target taxa, especially those
added in very small amounts. Here, we recovered most of the lowest
tissue-input taxa (0.01%) with one or more markers, suggesting that
sequencing depth is not a limiting factor in the present study.
Although PCR-free and untargeted approaches appear better suited to
quantitative inference of relative representation (Haiminen et al.,
2019), they also require more starting material (Haynes, Jimenez,
Pardo, & Helyar, 2019) and genome reference data, which are currently
unavailable for many of the 30 reference taxa added to our experimental
feeds. Further, the challenges of tissue mixtures – low inputs of
fragmented DNA and the presence of inhibitors – will likely remain
problematic for accurate quantification (Haynes et al., 2019),
particularly as mixture complexity and heterogeneity increase.