
Missing genotype imputation in non-model species using Self-Organizing Maps
  • Fernando Mora-Márquez,
  • Juan Carlos Nuño,
  • Álvaro Soto,
  • Unai López de Heredia
Fernando Mora-Márquez
Universidad Politécnica de Madrid
Juan Carlos Nuño
Universidad Politécnica de Madrid
Álvaro Soto
Universidad Politécnica de Madrid
Unai López de Heredia
Universidad Politécnica de Madrid

Corresponding Author: unai.lopezdeheredia@upm.es


Abstract

Current methodologies for genome-wide Single Nucleotide Polymorphism (SNP) genotyping produce large amounts of missing data that may affect statistical inference and bias the outcome of experiments. Genotype imputation is routinely used in well-studied species to buffer the impact of missing data on downstream analyses, and several algorithms are available to fill in missing genotypes. However, the lack of reference haplotype panels precludes the use of these methods in genomic studies of non-model organisms. As an alternative, machine learning algorithms can be used to explore the genotype data and estimate the missing genotypes. Here, we propose an imputation method based on Self-Organizing Maps (SOM), a widely used type of neural network formed by spatially distributed neurons that maps similar inputs to nearby neurons. We follow a classical approach: for each query missing SNP genotype, the method explores the genotype dataset to select SNP loci and build a training set, initializes and trains the neural network, and finally uses the SOM-derived clustering to impute the best genotype. To automate the imputation process, we have implemented GTIMPUTATION, an open-source application written in Python 3 with a user-friendly GUI. The performance of the method was validated by comparing its accuracy, precision and sensitivity with those of other available imputation algorithms on several benchmark genotype datasets. Our approach produced highly accurate and precise genotype imputations and outperformed other algorithms, especially for datasets from mixed populations with unrelated individuals.
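The workflow outlined in the abstract (select loci, build a training set, train a SOM, impute from the resulting clusters) can be illustrated with a minimal Python sketch. It is not the GTIMPUTATION implementation. Everything specific here is an assumption made for illustration: genotypes coded as 0/1/2 with `None` for missing, loci ranked by simple genotype agreement with the target SNP, a one-dimensional chain of neurons with linear initialization, a fixed presentation order, and the hypothetical function names `train_som`, `bmu` and `impute`.

```python
import math
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bmu(weights, x, allowed=None):
    """Index of the best matching unit: the neuron closest to x."""
    idx = range(len(weights)) if allowed is None else allowed
    return min(idx, key=lambda i: sq_dist(weights[i], x))

def train_som(vectors, n_neurons=4, epochs=50, lr0=0.5):
    """Train a 1-D SOM (a chain of neurons) and return its weight vectors."""
    dim = len(vectors[0])
    lo = [min(v[k] for v in vectors) for k in range(dim)]
    hi = [max(v[k] for v in vectors) for k in range(dim)]
    # Linear initialization: spread neurons evenly between data min and max.
    weights = [[lo[k] + (hi[k] - lo[k]) * i / (n_neurons - 1) for k in range(dim)]
               for i in range(n_neurons)]
    sigma0 = n_neurons / 2
    for epoch in range(epochs):
        decay = math.exp(-epoch / epochs)
        lr, sigma = lr0 * decay, sigma0 * decay
        for x in vectors:  # fixed presentation order keeps the sketch deterministic
            b = bmu(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood: neurons near the BMU move more.
                h = math.exp(-((i - b) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

def impute(genotypes, sample, snp, n_loci=3):
    """Impute the missing genotypes[sample][snp] (None) via SOM clustering."""
    row = genotypes[sample]
    # Candidate predictor loci must be observed in the query sample.
    candidates = [j for j in range(len(row)) if j != snp and row[j] is not None]

    def agreement(j):
        # Fraction of samples whose genotype at locus j equals the target SNP's.
        pairs = [(g[j], g[snp]) for g in genotypes
                 if g[j] is not None and g[snp] is not None]
        return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0

    loci = sorted(candidates, key=agreement, reverse=True)[:n_loci]
    # Training set: samples observed at the target SNP and at all chosen loci.
    train = [g for g in genotypes
             if g[snp] is not None and all(g[j] is not None for j in loci)]
    vectors = [[g[j] for j in loci] for g in train]
    weights = train_som(vectors)
    # Record which target genotypes land on each neuron.
    votes = {}
    for g, v in zip(train, vectors):
        votes.setdefault(bmu(weights, v), Counter())[g[snp]] += 1
    query = [row[j] for j in loci]
    # Impute the most common genotype on the closest populated neuron.
    return votes[bmu(weights, query, allowed=list(votes))].most_common(1)[0][0]

# Toy data: two groups of samples; the last sample lacks a genotype at SNP 0.
genotypes = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [2, 2, 2, 2],
    [2, 2, 2, 2],
    [None, 2, 2, 2],
]
print(impute(genotypes, 5, 0))  # 2
```

In this toy dataset the query sample clusters with the two samples that carry genotype 2 at every locus, so the sketch imputes 2. A realistic application would need a two-dimensional map, a better-founded locus selection criterion, and handling of training sets that turn out empty.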
01 Jun 2023: Submitted to Molecular Ecology Resources
05 Jun 2023: Submission Checks Completed
05 Jun 2023: Assigned to Editor
05 Jun 2023: Review(s) Completed, Editorial Evaluation Pending
09 Jun 2023: Reviewer(s) Assigned