4.1. Perspectives on improvement
Taking full advantage of the potential of CBH for biodiversity inventories based on long, high-resolution fragments will require several important improvements, in particular adapting the sequencing depth and testing de novo DNA reconstruction.
In fact, the reconstruction was done here by mapping sequences to the existing SILVA databases using the program EMIRGE (Miller et al., 2011). This program, developed for 16S, proved extremely efficient, with 2224 full barcodes reconstructed, leading to 6132 detections in ten samples. Theoretically, a similar result could be expected for eukaryotic 18S, yet the number of identifications obtained with EMIRGE was less than 6% of that obtained from short CBH fragments with Kraken2. To exclude the possibility that EMIRGE is not sufficient for 18S reconstruction, we also tested the program MAFFT (Katoh, Misawa, Kuma, & Miyata, 2002), which has also been shown to be suitable for reconstruction, and obtained similar results (data not presented). Thus, increasing the sequencing depth is the main solution to improving the reconstruction of long fragments based on CBH.
For eDNA analysis in general, high sequencing depth is crucial (Singer, Fahner, Barnes, McCarthy, & Hajibabaei, 2019) and will allow de novo assembly instead of mapping algorithms, improving species identification (Deiner et al., 2017) and the detection of taxa or groups absent from the reference dataset. In this first study, both gene regions were pooled for sequencing, and the results suggest that i) the 16S probes were more efficient than the newly designed 18S probes in uncovering community richness, ii) the sequencing output reflected the dominance of prokaryotic biomass, or iii) both factors contributed. The imbalance in biomass could be addressed by sequencing the libraries for each rDNA gene separately. However, most bacterial diversity may be revealed with a lower sequencing depth for these unicellular organisms, while the greater variation in body size and in the number of 18S rDNA copies among metazoans may result in a more uneven distribution of the number of fragments among taxa. In any case, a higher sequencing depth will be needed to unravel the diversity of metazoan communities through full barcode reconstruction. Setting a generic number of sequences would not be realistic because the optimal number strongly depends on the diversity and biomass of the standing stock in the sediment. When studying new areas, pilot metabarcoding studies may help tune the sequencing depth, allowing the optimal inventory of full-length metabarcodes in the studied ecosystems.
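As an illustration of how pilot data might inform the choice of sequencing depth, the following minimal sketch applies classical rarefaction (Hurlbert's expected richness under sampling without replacement) to a vector of per-taxon read counts. It is not the procedure used in this study: the function names `expected_richness` and `depth_for_fraction`, the target fraction, and the pilot counts are hypothetical and serve only to show the reasoning.

```python
from math import comb


def expected_richness(counts, depth):
    """Expected number of taxa observed when `depth` reads are drawn
    without replacement from a pilot sample (Hurlbert rarefaction)."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the total read count")
    denom = comb(total, depth)
    # For each taxon: 1 - P(no read of that taxon is drawn).
    return sum(1 - comb(total - c, depth) / denom for c in counts if c > 0)


def depth_for_fraction(counts, fraction):
    """Smallest depth at which the expected richness reaches `fraction`
    of the richness observed in the pilot sample (binary search; the
    expected richness never decreases as depth increases)."""
    observed = sum(1 for c in counts if c > 0)
    target = fraction * observed
    lo, hi = 0, sum(counts)
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_richness(counts, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical pilot counts: one dominant taxon and a long tail of rare ones.
pilot_counts = [9000, 600, 250, 80, 40, 15, 8, 4, 2, 1]
depth_90 = depth_for_fraction(pilot_counts, 0.9)
```

With uneven counts such as these, the depth needed to recover a given fraction of the taxa is driven by the rarest taxa, which is consistent with the expectation that communities with strongly uneven fragment distributions require deeper sequencing. The estimate only concerns taxa already present in the pilot data and says nothing about taxa that were missed entirely.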
We tested here for the first time a new set of DNA probes that proved useful and efficient, yet the method still needs improvement to capture several taxa, such as nematodes and some other meiofauna taxa with low levels of detection. In fact, meiofauna of deep-sea sediments are generally dominated by nematodes and copepods in terms of biomass, abundance, and species richness (Zeppilli et al., 2018). Neither CBH nor MTB consistently reflects this, suggesting that not only sequencing depth but also the versatility of the set of probes needs to be enhanced.
Finally, the huge gaps in knowledge of marine biodiversity, magnified in the deep sea, result in a paucity of deep-sea sequences in nucleotide reference databases (Sinniger et al., 2016), limiting the efficiency of reference-based bioinformatic reconstructions (Mendoza, Sicheritz-Pontén, & Gilbert, 2014). This likely explains the very uneven reconstruction success observed across phyla, with rather good results for Porifera, Annelida, and some Arthropoda, while for other phyla such as Deuterostomia, Mollusca, Nematoda, Cnidaria, and Platyhelminthes, no reconstruction could be obtained despite numerous identified sequences.