Simone Cardoni and 7 more

Amplicon sequencing of the nuclear ribosomal 5S RNA gene arrays is highly promising for genotaxonomy, for resolving species’ genetic resources and for tracing evolution. However, the large volume of data retrieved with this approach is difficult to manage and prone to redundancy, errors, and computational difficulties. Reducing the amount of data per sample without losing the conveyed molecular-phylogenetic signal is therefore a crucial step for downstream analyses. In this work, we compared Operational Taxonomic Units (OTUs) and Amplicon Sequence Variants (ASVs) from 5S intergenic spacer (5S-IGS) amplicons of seven beech species (Fagus spp.), obtained with two widely used and competing bioinformatics tools, MOTHUR and DADA2. We assessed qualitative and quantitative differences among the sample profiles obtained with the two methods, and the capacity of the derived phylogenetic inferences to capture the pivotal 5S-IGS variant types. Over 70% of processed reads were shared between OTUs and ASVs. Despite a strong reduction (>80%) in the number of representative sequences, DADA2-ASVs identified all main 5S-IGS variants known for Fagus, fully reflecting the overall genetic diversity patterns within each sample. In contrast, large proportions of low-abundance representative amplicons appeared in the MOTHUR-OTU and -ASV profiles and were redundant for phylogenetic inference. We conclude that the differences in sequence variation detected by the two pipelines are minimal and provide no exclusive phylogenetic information. DADA2 ASVs are easier to handle and may thus efficiently replace OTUs in future 5S-IGS studies aimed at deciphering complex bio-ecological phenomena such as hybridisation, polyploidisation and drift, and at inferring the evolutionary pathways of species systems, especially when using increasingly large sample sets.
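The two summary figures above (share of reads common to OTUs and ASVs, and percentage reduction of representative sequences) can be illustrated with a minimal sketch. It assumes each sample profile is a hypothetical mapping from representative sequence to read count, and it uses a simplified, assumed definition of "shared": reads assigned to sequences that occur identically in both profiles. The authors' exact definition may differ, and the toy profiles below are invented for illustration only.

```python
def percent_reduction(n_before: int, n_after: int) -> float:
    """Percentage drop in the number of representative sequences."""
    if n_before <= 0:
        raise ValueError("n_before must be positive")
    return 100.0 * (n_before - n_after) / n_before


def shared_read_fraction(profile_a: dict[str, int], profile_b: dict[str, int]) -> float:
    """Fraction of all pooled reads that belong to sequences present in both profiles.

    Assumption: 'shared' means the representative sequence is identical in both
    profiles; this is a simplification, not the published method.
    """
    shared = profile_a.keys() & profile_b.keys()
    total = sum(profile_a.values()) + sum(profile_b.values())
    if total == 0:
        return 0.0
    shared_reads = sum(profile_a[s] + profile_b[s] for s in shared)
    return shared_reads / total


# Hypothetical toy profiles (sequence -> read count), not real Fagus data.
otu_profile = {"AAAC": 50, "AAAG": 30, "AAAT": 5, "AACC": 2, "ACCC": 1}
asv_profile = {"AAAC": 60, "AAAG": 28}

print(percent_reduction(len(otu_profile), len(asv_profile)))  # 60.0
print(round(shared_read_fraction(otu_profile, asv_profile), 3))  # 0.955
```

In this toy case the low-abundance OTU-only sequences carry few reads, so most reads are shared even though the number of representative sequences drops sharply, which mirrors the qualitative pattern described above.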

Roberta Piredda and 4 more

Measuring biological diversity is a crucial but difficult undertaking, as exemplified by oaks, where complex morphological, ecological, biogeographic and genetic differentiation patterns collide with traditional taxonomy, which measures biodiversity in numbers of species (or higher taxa). In this pilot study, we generated High-Throughput Sequencing (HTS) amplicon data of the intergenic spacer of the 5S nuclear ribosomal DNA cistron (5S-IGS) in oaks, using six mock samples that differ in geographic origin, species composition, and pool complexity. The potential of the marker for automated geno-taxonomy applications was assessed using a reference dataset of 1770 cloned 5S-IGS sequences, covering the entire taxonomic breadth and distribution range of western Eurasian Quercus, and applying similarity-based (BLAST) and evolutionary (ML trees and EPA) approaches. Both methods performed equally well, correctly identifying species of sections Ilex and Cerris in both pure and mixed samples, as well as the main genotypes shared by species of sect. Quercus. Applying different cut-off thresholds revealed that medium- to high-abundance sequences (>10 or >25) suffice for a clear species identification of samples containing one or a few individuals. Lower thresholds identify phylogenetic correspondence with all target species in highly mixed samples (analogous to environmental bulk samples) and include rare variants pointing towards reticulation, incomplete lineage sorting, pseudogenic 5S units, and in situ (natural) contamination. Our pipeline is highly promising for future assessments of intra-specific and inter-population diversity, and of the genetic resources of natural ecosystems, which are fundamental to enabling fast and robust biodiversity conservation programs worldwide.
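The abundance cut-off step followed by similarity-based assignment can be sketched as below. This is a minimal illustration under stated assumptions: variant abundances are hypothetical read counts, the reference set is a tiny invented stand-in for the cloned 5S-IGS dataset, and `difflib.SequenceMatcher.ratio()` is swapped in as a crude similarity score. It is not BLAST and not the authors' pipeline; the helper names `filter_by_abundance` and `best_reference` are hypothetical.

```python
from difflib import SequenceMatcher


def filter_by_abundance(variants: dict[str, int], min_reads: int) -> dict[str, int]:
    """Keep variants whose abundance is strictly greater than min_reads (e.g. >10 or >25)."""
    return {seq: n for seq, n in variants.items() if n > min_reads}


def best_reference(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return (label, similarity) of the most similar reference sequence.

    SequenceMatcher.ratio() is a simple stand-in for a BLAST identity score.
    """
    label, ref = max(
        references.items(),
        key=lambda item: SequenceMatcher(None, query, item[1]).ratio(),
    )
    return label, SequenceMatcher(None, query, ref).ratio()


# Hypothetical toy data: variant -> abundance, and label -> reference sequence.
variants = {"ACGTACGTAA": 40, "ACGTTCGTAA": 12, "TTGTACGAAA": 3}
references = {"sect. Ilex": "ACGTACGTAA", "sect. Cerris": "TTGTACGAAA"}

for cutoff in (25, 10, 0):
    kept = filter_by_abundance(variants, cutoff)
    hits = [best_reference(seq, references)[0] for seq in kept]
    print(cutoff, hits)
```

With the stricter cut-offs only the abundant variants survive, while lowering the threshold lets the rare variant through, where it matches a different reference. This loosely parallels how lower thresholds expose rare variants in mixed samples.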