
Read mapping stringency and genetic relatedness to the reference genome significantly impact multispecies population genetic and phylogenetic analyses
Rebekah Mohn
The Morton Arboretum

Corresponding Author: rebekah.mohn@gmail.com

Mira Garner
Field Museum of Natural History
Paul Manos
Duke University
Andrew Hipp
The Morton Arboretum

Abstract

The increasing affordability of whole-genome resequencing over the past five years, together with the growing number of published reference genomes, has enabled multispecies population genomic and phylogenomic studies of non-model organisms. These studies raise new questions: which reference genomes should be used for read mapping in comparative studies, and which mapping methods introduce the most and least bias in comparative genomics? Focusing on Eastern North American white oaks (Quercus sect. Quercus), whose divergence is estimated at 36 Ma, we compared the effects of mapping resequencing data to four Quercus reference genomes using three read-mapping methods: Bowtie2 --end-to-end, Bowtie2 --local, and BWA. We crossed the four reference genomes with the three read-mapping methods in a fully factorial design (12 combinations) to call variant and invariant sites for nine Quercus whole-genome resequencing samples, then used the resulting datasets to test how each combination of reference genome and mapping method influences genotyping accuracy and bias. We found that the genetic distance between the reference genome and the ingroup samples and the mapping method jointly affected sample heterozygosity, tree topology, and tree branch lengths. Specifically, only with Bowtie2 --end-to-end was the heterozygosity of closely related sample/reference genome pairs not significantly different from the average heterozygosity of samples belonging to the same species as the reference. Mapping to the outgroup reference genome resulted in low base-pair recovery, low heterozygosity, and unbalanced phylogenies. We conclude that a closely related, but not conspecific, reference genome is ideal for minimizing reference bias, and that Bowtie2 --end-to-end minimizes mismapping, enabling the most accurate genotype calls.
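The fully factorial design described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the reference, sample, and file names are hypothetical placeholders, "BWA" is assumed to mean BWA-MEM, and observed heterozygosity is assumed to be the fraction of called sites (variant plus invariant) whose genotype is heterozygous. The sketch enumerates the 4 × 3 reference-by-mapper combinations, builds one paired-end mapping command per sample using only standard Bowtie2 (--end-to-end, --local, -x, -1, -2, -S) and bwa mem arguments, and computes heterozygosity from VCF-style genotype strings.

```python
import itertools
import shlex

# Hypothetical placeholders: the abstract does not name the four Quercus
# reference assemblies or the nine samples.
REFERENCES = ["ref_A", "ref_B", "ref_C", "ref_outgroup"]
MAPPERS = ["bowtie2_end_to_end", "bowtie2_local", "bwa_mem"]
SAMPLES = [f"sample{i}" for i in range(1, 10)]  # nine resequenced samples


def mapping_command(mapper: str, ref: str, sample: str) -> list[str]:
    """Return an illustrative paired-end mapping command (assumed file layout)."""
    r1, r2 = f"{sample}_R1.fq.gz", f"{sample}_R2.fq.gz"
    if mapper in ("bowtie2_end_to_end", "bowtie2_local"):
        # Same aligner, two alignment modes: global (end-to-end) vs soft-clipping (local).
        mode = "--end-to-end" if mapper == "bowtie2_end_to_end" else "--local"
        out = f"{sample}.{ref}.{mapper}.sam"
        return ["bowtie2", mode, "-x", ref, "-1", r1, "-2", r2, "-S", out]
    if mapper == "bwa_mem":
        # Assumes BWA-MEM against an indexed FASTA; bwa writes SAM to stdout.
        return ["bwa", "mem", f"{ref}.fa", r1, r2]
    raise ValueError(f"unknown mapper: {mapper}")


def observed_heterozygosity(genotypes: list[str]) -> float:
    """Fraction of called sites (variant + invariant) that are heterozygous.

    Genotypes are VCF-style strings such as "0/0", "0/1", "1|1";
    calls containing "." (missing) are excluded from the denominator.
    """
    called = [g for g in genotypes if "." not in g]
    het = [g for g in called if len(set(g.replace("|", "/").split("/"))) > 1]
    return len(het) / len(called) if called else float("nan")


# Fully factorial design: every reference genome crossed with every mapper.
design = list(itertools.product(REFERENCES, MAPPERS))
runs = [mapping_command(m, r, s) for r, m in design for s in SAMPLES]

if __name__ == "__main__":
    print(len(design), "reference x mapper combinations;", len(runs), "mapping runs")
    print(shlex.join(runs[0]))
    # Toy genotype calls for one sample, including invariant "0/0" sites.
    print(observed_heterozygosity(["0/0", "0/0", "0/1", "1/1", "./.", "0/0"]))
```

On the toy genotypes the function returns 0.2; if the three invariant "0/0" sites were dropped, the same function would return 0.5 on the remaining two calls, which illustrates why invariant sites need to be called alongside variants when heterozygosity is compared across reference/mapper combinations.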