The Critical Role of Choosing Appropriate Genotype Caller and Reference
Genome for Population Genomic Inference: A Demonstration Study with Five
Pterocarya Species
Abstract
Contemporary population genomic studies typically involve mapping raw
reads to a reference genome and analyzing single nucleotide polymorphism
(SNP) data obtained from variant calling. The genotype caller GATK is
widely used for variant calling, but because it was designed primarily
for human data, it has limitations when applied to non-human species.
Recently, ATLAS has emerged as a promising alternative caller. It shows
lower false-positive and false-negative rates and significantly affects
phylogenomic inferences. However, the extent to which the choice between
ATLAS and GATK influences downstream population genomic analyses remains
largely unexplored. To address this gap, we conducted a population
genomic study of five Pterocarya species using GATK and ATLAS with two
reference genomes, P. stenoptera and P. macroptera. Across the four
resulting datasets (two callers × two reference genomes), we evaluated
mapping depth, coverage rate, linkage disequilibrium (LD), nucleotide
diversity (π), population structure, and demographic history. Notably,
using P. stenoptera as the reference genome resulted in less variation
in mapping depth and coverage rate across species than using
P. macroptera. With both reference genomes, ATLAS consistently
identified more SNPs and yielded higher nucleotide diversity and lower
LD than GATK.
Population structure results were more sensitive to the choice of
reference genome than to the choice of caller, whereas both the
reference genome and the caller significantly influenced inference of
demographic history. Our study emphasizes the critical impact of
genotype caller and reference genome selection on downstream analyses.
Based on current evidence, we recommend selecting a closely related
reference genome and using ATLAS for SNP calling to improve the accuracy
and reliability of population genomic studies.
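The link between caller sensitivity and nucleotide diversity can be illustrated with a minimal sketch. It assumes that per-site π is estimated as the average pairwise difference among sampled chromosomes, summed over called SNPs and divided by the number of callable sites (variant plus invariant). This is a standard estimator, not necessarily the exact pipeline used in this study. Under that assumption, a caller that misses true variants (false negatives) lowers π, whereas spurious SNPs (false positives) would raise it. The function names and toy allele counts below are hypothetical and are not outputs of ATLAS or GATK.

```python
from typing import Sequence, Tuple


def site_diversity(alt_count: int, n_chrom: int) -> float:
    """Fraction of distinct chromosome pairs that differ at one biallelic site.

    alt_count: sampled chromosomes carrying the alternate allele.
    n_chrom: sampled chromosomes with a called genotype at this site.
    """
    if n_chrom < 2:
        return 0.0
    ref_count = n_chrom - alt_count
    # k * (n - k) differing pairs out of n * (n - 1) / 2 total pairs.
    return 2.0 * alt_count * ref_count / (n_chrom * (n_chrom - 1))


def nucleotide_diversity(snps: Sequence[Tuple[int, int]], callable_sites: int) -> float:
    """Per-site nucleotide diversity (pi).

    snps: (alt_count, n_chrom) for each called SNP.
    callable_sites: all sites passing filters, including monomorphic ones;
        this is the denominator, so invariant sites must be counted.
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    return sum(site_diversity(k, n) for k, n in snps) / callable_sites


# Hypothetical toy data: 10 diploid individuals (20 chromosomes), 1,000 callable sites.
called_by_caller_a = [(5, 20), (10, 20), (2, 20)]
# Same sample, but one true SNP was missed (a false negative).
called_by_caller_b = [(5, 20), (10, 20)]

pi_a = nucleotide_diversity(called_by_caller_a, 1000)
pi_b = nucleotide_diversity(called_by_caller_b, 1000)
print(f"{pi_a:.6f} {pi_b:.6f}")
```

Because the denominator (callable sites) is held fixed, every true SNP a caller fails to report removes its contribution from the numerator. In practice, callers and filters also change the set of callable sites, so real comparisons must track both terms.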