Detection success and multiple primer set complementarity
To allow fair comparison, we subsampled reads to improve evenness of coverage across markers and removed three markers with insufficient data (<1,000 reads or <0.1% of the mean number of reads per locus) from further analyses (Teleo, Crust2, and 16Sfish). One additional marker, 16Svar, had low yield (<10,000 reads, <1% of the mean) but provided sufficient data for decontamination and so was retained for analysis. Of the 19 retained markers, a combination of four identified all 60 families of marine and freshwater taxa in the full reference DNA pool. These four markers (FishminiA, nsCOIFo, MiFish, and CEP; for details, see Table 1) provided sufficient taxonomic resolution to correctly recover the genus of 90.9% of taxa and to identify 58.6% of input taxa to species (Fig. 1). All but one of the species in our reference pool (Petrochromis kazumbe) had reference data for at least one of the four markers, so the frequent lack of species-level detection resulted from insufficient sequence variation within the amplified target rather than from gaps in database representation. Two additional markers (aquaF2 and either aquaF3 or shark474) allowed recovery to genus level of three more reference taxa (83 of 88 genera; 94.3%), and two additional markers (aquaF2 and Fish_COILBC) identified two of the remaining known taxa to species level, but the remaining 13 markers did not identify any further taxa. Genus-level assignments were more successful than species-level assignments because BLAST hits to multiple unique species within the top 2% of hits were aggregated to the genus level.
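The aggregation rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Hit` record, the `assign_taxon` function, the use of a single numeric score to rank hits, and the reading of "top 2%" as "within 2% of the best score" are all assumptions made for illustration, and the taxon names in the usage example are placeholders.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One BLAST hit for a query sequence (hypothetical record layout)."""
    species: str
    genus: str
    family: str
    score: float  # assumed ranking metric, e.g. bit score


def assign_taxon(hits: list[Hit], window: float = 0.02) -> tuple[str, Optional[str]]:
    """Assign the lowest rank on which all near-best hits agree.

    Hits scoring within `window` (2% by default) of the best score are kept.
    If they name one species, the assignment is to species; if they name
    several species of one genus, it is aggregated to genus; and so on.
    """
    if not hits:
        return ("unassigned", None)
    best = max(h.score for h in hits)
    top = [h for h in hits if h.score >= best * (1 - window)]
    for rank in ("species", "genus", "family"):
        names = {getattr(h, rank) for h in top}
        if len(names) == 1:
            return (rank, names.pop())
    return ("unassigned", None)


# Placeholder taxa: two congeners score within 2% of each other,
# so the query is resolved only to genus.
example_hits = [
    Hit("Aus bus", "Aus", "Aidae", 200.0),
    Hit("Aus cus", "Aus", "Aidae", 198.0),
    Hit("Bus dus", "Bus", "Aidae", 150.0),
]
result = assign_taxon(example_hits)
```

Under these assumptions, a marker whose amplified region varies little among congeners will often return several near-identical species hits and fall back to a genus-level assignment, which is the pattern reported above.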
Marker performance was broadly consistent across taxonomic levels (species, genus, family), with COI markers generally performing better than markers targeting other barcoding genes. This success was at least partially attributable to the more extensive coverage of our focal taxa in the reference database for COI (SI, Fig. S5). The two top-performing markers target adjacent but non-overlapping regions of the COI gene, and of these, the single best marker identified 90% of reference taxa to family, 78.4% to genus, and 41.7% to species (Fig. 2). In combination, these two COI markers identified 95% of families, 85% of them to genus level, and just under 50% to species level. The best 16S and 12S markers recovered fewer taxa to species level (~25% each) but contributed taxa not identified by any other primer, supporting the value of a multi-marker portfolio (Fig. 2).
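Marker complementarity of this kind can be explored with a short Python sketch. It is offered only as an illustration: the source does not state how marker combinations were evaluated, so the greedy selection below is a swapped-in heuristic (it does not guarantee the smallest possible set), and the function names, marker labels, and taxa are hypothetical.

```python
def unique_contributions(detections: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each marker, return the taxa that no other marker detected."""
    unique = {}
    for marker, taxa in detections.items():
        others = set().union(
            *(t for m, t in detections.items() if m != marker)
        )
        unique[marker] = set(taxa) - others
    return unique


def greedy_marker_set(
    detections: dict[str, set[str]], targets: set[str]
) -> tuple[list[str], set[str]]:
    """Greedily pick markers until all target taxa are covered.

    At each step the marker adding the most still-undetected taxa is chosen.
    Returns the chosen markers (in order) and any taxa left undetected.
    """
    remaining = set(targets)
    chosen: list[str] = []
    while remaining:
        best = max(detections, key=lambda m: len(detections[m] & remaining))
        gained = detections[best] & remaining
        if not gained:
            break  # no marker detects any of the remaining taxa
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


# Toy data with placeholder marker and taxon labels (not study results).
toy = {
    "m1": {"A", "B", "C"},
    "m2": {"C", "D"},
    "m3": {"D", "E"},
    "m4": {"A"},
}
chosen, missed = greedy_marker_set(toy, {"A", "B", "C", "D", "E"})
unique = unique_contributions(toy)
```

In the toy data, `m3` recovers fewer taxa than `m1` but is still selected because it alone detects taxon `E`, mirroring how a lower-resolution marker can earn its place in a portfolio.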
Among the taxon-specific markers, the COI markers for elasmobranchs and plankton identified nearly as many reference taxa (the majority of which were teleosts) as the top-performing COI marker, with similar taxonomic resolution, and thus were of more general use. However, the crustacean and cephalopod markers had limited use outside their targeted groups (Fig. 2). In contrast, the more general fish markers successfully identified the few representative elasmobranch, crustacean, and cephalopod samples included in our DNA pool, suggesting a broader taxonomic reach for those primers.