Genotyping discordances? Empirical comparison of base-selective adaptors
impact in 2b-RAD studies
Abstract
Population genomic studies have increased over the last decade and show
great potential for understanding evolutionary patterns in a wide
variety of organisms. Most rely on RAD sequencing techniques to
obtain reduced representations of genomes. Among these techniques,
2b-RAD can provide a secondary reduction to adjust study costs by using
base-selective adaptors, although the impact of these adaptors on
genotyping is unknown.
Here we empirically compare genotyping and genetic differentiation
between libraries built with fully degenerate and base-selective
adaptors, and assess the impact of missing data. We built libraries with
both types of adaptors for the same individuals and generated
independent and combined datasets under three missingness filters,
retaining loci present in 100%, at least 75% or at least 50% of the
samples. In a locus-by-locus comparison, 92% of genotypes were identical
between the two libraries of the same individual when using loci present
in 100% of the samples, which decreased to 35% when using loci present
in at least 50% of them. We show that missing data are a major source of
individual genetic differentiation.
Loci with discordant genotypes were infrequent (7.67%) in all
filtered files. Only 0.96% were directly attributable to base-selective
adaptors, and 6.44% underestimated heterozygosity in NN libraries, of
which ca. 70% had <10 reads per locus, indicating that
sufficient read depth should be ensured for correct genotyping. Our
work confirms that 2b-RAD libraries built with base-selective adaptors
are a robust tool for population genomics of species with large genome
sizes.
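
The presence filters and the locus-by-locus comparison described above
can be illustrated with a minimal sketch. It is not the pipeline used in
this study: the toy genotype table, the function names
filter_by_presence and compare_pair, and the choice to report missing
calls as a separate category are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: one row per locus, one column per library.
# Columns 0-1 are the two libraries of individual A, columns 2-3 those
# of individual B. None marks a missing genotype call.
Calls = Dict[str, List[Optional[str]]]

calls: Calls = {
    "locus1": ["AA", "AA", "AG", "GA"],
    "locus2": ["CT", "CC", "CC", None],
    "locus3": ["GG", None, None, "GG"],
    "locus4": [None, None, None, "TT"],
}


def filter_by_presence(calls: Calls, min_presence: float) -> Calls:
    """Keep loci genotyped in at least `min_presence` (0 to 1) of the libraries."""
    kept = {}
    for locus, row in calls.items():
        present = sum(g is not None for g in row)
        if present / len(row) >= min_presence:
            kept[locus] = row
    return kept


def compare_pair(calls: Calls, i: int, j: int) -> Dict[str, int]:
    """Count identical, discordant and missing loci between libraries i and j."""
    counts = {"identical": 0, "discordant": 0, "missing": 0}
    for row in calls.values():
        a, b = row[i], row[j]
        if a is None or b is None:
            # Assumption: a locus missing in either library is tallied
            # separately rather than scored as a genotype mismatch.
            counts["missing"] += 1
        elif sorted(a) == sorted(b):  # allele order is ignored: "AG" == "GA"
            counts["identical"] += 1
        else:
            counts["discordant"] += 1
    return counts


# Compare the two libraries of individual A under each presence filter.
for threshold in (1.0, 0.75, 0.5):
    kept = filter_by_presence(calls, threshold)
    print(threshold, len(kept), compare_pair(kept, 0, 1))
```

Relaxing the presence threshold retains more loci but also admits loci
that are missing in one of the two libraries, which is why missing calls
are tallied separately here rather than merged with true genotype
discordances.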