Vasilii Shapkin

and 7 more

The nuclear ribosomal DNA Internal Transcribed Spacer (ITS) region is used as the universal fungal barcode marker, but it often lacks a significant DNA barcoding gap between sister taxa. Here we tested the reliability of protein-coding low-copy genes as alternative markers. Mock communities of three unrelated agaric genera (Dermoloma, Hodophilus and Russula), representing lineages of closely related species, were sequenced on the Illumina platform, targeting ITS1, ITS2, the gene for the second largest subunit of RNA polymerase II (rpb2) and the translation elongation factor 1-alpha gene (ef1-α). The representation of species and their relative abundances were similar across all tested barcode regions, despite the lower copy number of the protein-coding markers. ITS1 and ITS2 required more sophisticated sequence filtering: they produced a high number of chimeric sequences, which required reference-based chimera removal, and they had more sequence variants per species. Clustering of the filtered ITS sequences yielded, on average, more correctly clustered units at the best-fitting similarity thresholds, but these thresholds differed widely among genera. The best-fitting thresholds of the low-copy markers were more consistent among genera, but species were frequently not resolved because of low interspecific variability. At some thresholds we observed multiple species lumped together while, at the same time, other species were split into multiple partial clusters; this should be taken into account when assessing the best clustering thresholds and the taxonomic identity of clusters. For the best taxonomic resolution and improved species detection, we recommend combining different markers and applying additional reference-based sorting of clusters.
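The lumping and splitting described above can be made concrete with a small sketch. It is not the authors' pipeline: the function name `assess_clusters`, the input layout, and the species and cluster labels are all hypothetical, chosen only to illustrate how a mock community with known species identities lets each species be labelled as correctly clustered, split, lumped, or both at a given threshold.

```python
from collections import defaultdict


def assess_clusters(assignments):
    """Label each species as 'correct', 'split', 'lumped' or 'split and lumped'.

    assignments: iterable of (sequence_id, species, cluster_id) tuples,
    where species is the known identity from the mock community and
    cluster_id is the cluster the sequence fell into at one threshold.
    (Hypothetical input layout, assumed for this illustration.)
    """
    species_in_cluster = defaultdict(set)
    clusters_of_species = defaultdict(set)
    for _seq_id, species, cluster in assignments:
        species_in_cluster[cluster].add(species)
        clusters_of_species[species].add(cluster)

    report = {}
    for species, clusters in clusters_of_species.items():
        # Split: sequences of one species are spread over several clusters.
        split = len(clusters) > 1
        # Lumped: at least one of its clusters also holds another species.
        lumped = any(len(species_in_cluster[c]) > 1 for c in clusters)
        if split and lumped:
            report[species] = "split and lumped"
        elif split:
            report[species] = "split"
        elif lumped:
            report[species] = "lumped"
        else:
            report[species] = "correct"
    return report


# Made-up toy data: four placeholder species, five clusters.
mock = [
    ("s1", "sp_A", "c1"), ("s2", "sp_A", "c1"),
    ("s3", "sp_B", "c2"), ("s4", "sp_B", "c3"),
    ("s5", "sp_C", "c4"), ("s6", "sp_D", "c4"),
    ("s7", "sp_D", "c5"),
]
report = assess_clusters(mock)
for name in sorted(report):
    print(name, report[name])
```

Running such a check once per similarity threshold and counting the "correct" labels is one plausible way to compare thresholds. The toy data also shows why a single count can mislead: a species can be split and lumped at the same threshold.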

Joao Saraiva

and 4 more

An estimated 8.7 million eukaryotic species exist on our planet. However, recent tools for the taxonomic classification of eukaryotes have only 734 reference genomes at their disposal. As most eukaryotic genomes are yet to be sequenced, the mechanisms underlying their contribution to different ecosystem processes remain unexplored. Although approaches to recover prokaryotic genomes have become common in genome biology, few studies have tackled the recovery of eukaryotic genomes from metagenomes. This study assessed the reconstruction of eukaryotic genomes from 215 metagenomes from diverse environments with the EukRep pipeline. We obtained 447 eukaryotic bins from 15 classes (e.g., Saccharomycetes, Sordariomycetes, and Mamiellophyceae) and 16 orders (e.g., Mamiellales, Saccharomycetales, and Hypocreales). More than 73% of the eukaryotic bins were recovered from samples whose biomes were classified as host-associated, aquatic, or anthropogenic terrestrial. However, only 93 bins could be classified to genus level (9 unique genera) and 17 bins to species level (6 unique species). Completeness and contamination estimates were available for 193 bins. Average completeness and contamination were 44.64% (σ=27.41%) and 3.97% (σ=6.53%), respectively. Micromonas commoda was the most frequently found taxon, while Saccharomyces cerevisiae showed the highest completeness, possibly because of its larger number of reference genomes. However, mapping the eukaryotic bins to the chromosomes of the reference genomes suggests that completeness measures should consider both single-copy genes and chromosome coverage. The recovery of eukaryotic genomes will benefit significantly from long-read sequencing, intron removal after assembly, and improved reference genome databases.