Rebekah Mohn and three coauthors

The increasing affordability of whole-genome resequencing over the past five years, together with the growing number of published reference genomes, has enabled multispecies population genomic and phylogenomic studies of non-model organisms. These studies, however, raise new questions: which reference genomes should be used for read mapping in comparative studies, and which mapping methods introduce the most and the least bias into comparative genomic analyses? Focusing on Eastern North American white oaks (Quercus sect. Quercus), which have an estimated divergence time of 36 Ma, we compared the effects of mapping resequencing data to four Quercus reference genomes using three read-mapping methods: Bowtie2 --end-to-end, Bowtie2 --local, and BWA. We combined the reference genomes and read-mapping methods in a fully factorial design to call variant and invariant sites for nine Quercus whole-genome resequencing samples, then used the resulting datasets to test how each combination of reference genome and mapping method influences genotyping accuracy and bias. We found that the genetic distance of the reference genome from the ingroup samples and the mapping method jointly affected sample heterozygosity, tree topology, and branch lengths. Specifically, only with Bowtie2 --end-to-end was the heterozygosity of closely related sample/reference genome pairs not significantly different from the average heterozygosity of samples conspecific with the reference genome. Mapping to the outgroup reference genome resulted in low base-pair recovery, low heterozygosity, and unbalanced phylogenies. We conclude that a closely related but non-conspecific reference genome is ideal for minimizing reference bias, and that Bowtie2 --end-to-end minimizes mismapping, enabling the most accurate genotype calls.
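The fully factorial design and the heterozygosity comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The reference labels are hypothetical placeholders, because the four Quercus references are not named here. The method labels are shorthand only. Heterozygosity is assumed to be the fraction of called sites (variant plus invariant) that carry a heterozygous diploid genotype, with missing calls excluded from the denominator.

```python
from itertools import product

# Hypothetical placeholder labels: the actual four Quercus reference genomes
# are not named in this abstract.
REFERENCES = ["ref_A", "ref_B", "ref_C", "ref_outgroup"]
# Shorthand for the three mapping methods compared (Bowtie2 --end-to-end,
# Bowtie2 --local, BWA); these strings are labels, not command-line flags.
METHODS = ["bowtie2_end_to_end", "bowtie2_local", "bwa"]


def factorial_design(references, methods):
    """Return every (reference, method) pair, i.e. a fully factorial design."""
    return list(product(references, methods))


def heterozygosity(genotypes):
    """Fraction of called sites that are heterozygous (assumed definition).

    genotypes: iterable of VCF-style diploid GT strings such as '0/0',
    '0/1', or '1|1'. Both variant and invariant sites count as called;
    missing calls containing '.' are skipped entirely.
    """
    called = 0
    het = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call: not in numerator or denominator
        called += 1
        if len(set(alleles)) > 1:
            het += 1
    # NaN signals "no called sites" rather than a misleading 0.0.
    return het / called if called else float("nan")


if __name__ == "__main__":
    design = factorial_design(REFERENCES, METHODS)
    print(len(design))  # 4 references x 3 methods = 12 combinations
    print(heterozygosity(["0/0", "0/1", "1/1", "./.", "0|1"]))  # 2 het / 4 called = 0.5
```

In a real analysis each of the 12 combinations would be applied to all nine samples, and the per-sample heterozygosity values would then be compared across combinations, for example against samples conspecific with the reference.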