GenoPop-Impute: Efficient and accurate whole-genome genotype imputation in non-model species for evolutionary genomic research
  • Marie Gurke,
  • Frieder Mayer
Marie Gurke
Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung

Corresponding Author: margurke@gmail.com

Frieder Mayer
Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung

Abstract

Missing genotypes in DNA sequence data are a problem in many evolutionary genomic studies, especially those of non-model organisms. Genotype imputation can address this problem. However, few algorithms can both impute without additional reference genotype data, which are often unavailable for non-model taxa, and handle large whole-genome data sets. We therefore developed a new algorithm, GenoPop-Impute, which imputes the whole genome in separate batches and employs a random forest algorithm for imputation of correlated data sets. The batch-wise approach utilizes linkage disequilibrium to increase imputation accuracy and allows computational parallelization, which improves efficiency. Tests on simulated data demonstrate that linkage disequilibrium between SNPs has a positive effect on imputation accuracy, due to correlation that originates from a shared evolutionary history. Compared with two alternative algorithms, GenoPop-Impute is more accurate and is the only one computationally applicable to whole-genome data sets. In addition, we found that GenoPop-Impute increases the accuracy of commonly estimated population genomic metrics and mitigates biases due to missing data in demographic modeling experiments. We conclude that genotype imputation can be a valuable tool for evolutionary genomic studies of non-model taxa and that GenoPop-Impute is a highly suitable algorithm for this purpose.
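The batch-wise idea described in the abstract can be illustrated with a minimal sketch. This is not the GenoPop-Impute implementation: the function names (`split_batches`, `impute_batch`, `impute_genome`), the 0/1/2 genotype coding with `None` for missing calls, and the batch size are all assumptions made for illustration. To keep the example dependency-free, the per-batch imputer is a simple nearest-sample donor rule standing in for the random forest step named in the abstract.

```python
def split_batches(n_snps, batch_size):
    """Return (start, end) index pairs covering consecutive SNPs."""
    return [(s, min(s + batch_size, n_snps)) for s in range(0, n_snps, batch_size)]


def distance(a, b):
    """Mean genotype difference over sites observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum(abs(x - y) for x, y in shared) / len(shared)


def impute_batch(batch):
    """Fill missing genotypes in one batch (one genotype list per sample).

    Stand-in imputer (assumption): copy the genotype of the most similar
    sample within the batch that has an observed call at the missing site.
    Nearby SNPs in linkage disequilibrium make such within-batch similarity
    informative. A random forest would replace this step.
    """
    result = [row[:] for row in batch]
    for i, row in enumerate(batch):
        for j, genotype in enumerate(row):
            if genotype is not None:
                continue
            donors = [
                (distance(row, other), k)
                for k, other in enumerate(batch)
                if k != i and other[j] is not None
            ]
            if donors:  # leave the call missing if no sample has data here
                result[i][j] = batch[min(donors)[1]][j]
    return result


def impute_genome(genotypes, batch_size):
    """Impute each batch of neighbouring SNPs independently, then rejoin."""
    n_snps = len(genotypes[0])
    imputed = [[] for _ in genotypes]
    # Batches do not depend on each other, so this loop could be
    # distributed over worker processes.
    for start, end in split_batches(n_snps, batch_size):
        batch = [row[start:end] for row in genotypes]
        for out_row, filled in zip(imputed, impute_batch(batch)):
            out_row.extend(filled)
    return imputed


# Toy data: 4 samples x 6 SNPs, genotypes coded 0/1/2, None = missing.
genotypes = [
    [0, 0, 2, 2, 1, None],
    [0, None, 2, 2, 1, 1],
    [2, 2, 0, None, 0, 0],
    [2, 2, 0, 0, None, 0],
]
imputed = impute_genome(genotypes, batch_size=3)
```

Because each batch is processed without reference to the others, memory use is bounded by the batch size rather than by the genome length, which is the property that makes parallel processing straightforward in this kind of design.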
15 Aug 2024 Submitted to Molecular Ecology Resources
22 Aug 2024 Submission Checks Completed
22 Aug 2024 Assigned to Editor
22 Aug 2024 Review(s) Completed, Editorial Evaluation Pending
02 Sep 2024 Reviewer(s) Assigned