Popfinder: A highly effective artificial neural network package for genetic population assignment
  • Katie Birchard,
  • Chris Boccia,
  • Heather Lounder,
  • Lila Colston-Nepali,
  • Vicki Friesen
Katie Birchard
Queen's University
Chris Boccia
Queen's University
Heather Lounder
Queen's University
Lila Colston-Nepali
Queen's University
Vicki Friesen
Queen's University

Corresponding Author: vlf@queensu.ca


Abstract

The ability to assign biological samples to source populations based on genetic variation with high accuracy and precision is important for numerous applications, from ecological studies through wildlife conservation to epidemiology. However, population assignment is challenging when genetic differentiation is low, and methods to address this problem are lacking. The application of artificial neural networks to population assignment using genomic data is highly promising. Here we present popfinder: a new, easy-to-use, Python-based artificial neural network pipeline for genetic population assignment. We tested popfinder both with simulated genetic data from populations connected by varying levels of gene flow, and with reduced-representation sequence data for three species of seabirds with weak to no population genetic structure. Popfinder assigned individuals to their source populations with high accuracy, precision, and recall for most simulated and empirical datasets; the exception was the empirical dataset with the weakest population structure, where the comparator programs also performed poorly. Compared to other available software, popfinder was slower on the simulated datasets because it tunes hyperparameters and does not reduce the dimensionality of the data; however, all programs ran in seconds on the empirical datasets. Additionally, popfinder provides a perturbation ranking method to help develop optimized SNP panels for genetic population assignment, and is designed to be user-friendly. Finally, we caution users of all assignment programs to watch both for leakage of data during model training and for unequal detection probabilities.
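
To illustrate the general idea behind perturbation ranking, the following is a minimal sketch and not popfinder's implementation. It assumes a toy nearest-centroid classifier standing in for the neural network, genotypes coded as 0/1/2, and hypothetical function names (`fit_centroids`, `predict`, `perturbation_ranking`). Each SNP column of a held-out set is shuffled in turn, and SNPs are ranked by the resulting mean drop in assignment accuracy. The classifier is fit on training samples only, so the held-out samples never leak into training.

```python
import random


def fit_centroids(X, y):
    """Return {population: mean genotype vector}, computed from training data only."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, X):
    """Assign each sample to the population with the nearest centroid."""
    def dist(row, lab):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[lab]))
    return [min(centroids, key=lambda lab: dist(row, lab)) for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def perturbation_ranking(centroids, X, y, n_repeats=20, seed=0):
    """Rank SNPs by the mean accuracy drop after shuffling each column."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(centroids, X))
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only column j with its shuffled values.
            X_pert = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - accuracy(y, predict(centroids, X_pert))
        drops.append(total / n_repeats)
    ranking = sorted(range(len(drops)), key=lambda j: drops[j], reverse=True)
    return ranking, drops


# Toy data (made up for illustration): SNP 0 separates populations A and B;
# SNPs 1 and 2 vary within populations but carry no population signal.
X_train = [[0, 0, 1], [0, 2, 1], [0, 1, 1], [2, 0, 1], [2, 2, 1], [2, 1, 1]]
y_train = ["A", "A", "A", "B", "B", "B"]
X_test = [[0, 0, 2], [0, 2, 0], [2, 0, 2], [2, 2, 0]]
y_test = ["A", "A", "B", "B"]

centroids = fit_centroids(X_train, y_train)           # fit on training samples only
ranking, drops = perturbation_ranking(centroids, X_test, y_test)
print(ranking, drops)
```

In this toy case only SNP 0 should show a nonzero accuracy drop, which is the kind of signal one might use to shortlist markers for a reduced SNP panel.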
05 Nov 2024: Submitted to Molecular Ecology Resources
05 Nov 2024: Submission Checks Completed
05 Nov 2024: Assigned to Editor
05 Nov 2024: Review(s) Completed, Editorial Evaluation Pending
15 Nov 2024: Reviewer(s) Assigned