Popfinder: A highly effective artificial neural network package for
genetic population assignment
Abstract
The ability to assign biological samples to source populations based on
genetic variation with high accuracy and precision is important for
numerous applications from ecological studies through wildlife
conservation to epidemiology. However, population assignment is
challenging when genetic differentiation is low, and methods to address
this problem are lacking. The application of artificial neural networks
to population assignment using genomic data is highly promising. Here we
present popfinder: a new, easy-to-use, Python-based artificial neural
network pipeline for genetic population assignment. We tested popfinder
both with simulated genetic data from populations connected by varying
levels of gene flow, and with reduced-representation sequence data for
three species of seabirds with weak to no population genetic structure.
Popfinder assigned individuals to their source populations with high
accuracy, precision and recall in most cases across both simulated and
empirical datasets; the exception was the empirical dataset with the
weakest population structure, where the comparator programs also
performed poorly. Compared to other available software, popfinder was
slower on the simulated datasets because it tunes hyperparameters and
does not reduce the dimensionality of the data; however, all programs
ran in seconds on the empirical datasets.
Additionally, popfinder provides a perturbation ranking method to help
develop optimized SNP panels for genetic population assignment, and is
designed to be user-friendly. Finally, we caution users of all
assignment programs to watch for both data leakage during model
training and unequal detection probabilities.
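
The idea behind a perturbation ranking can be sketched in a few lines:
shuffle one SNP at a time across held-out samples and record how much
assignment accuracy drops; SNPs whose shuffling hurts most are the most
informative candidates for a reduced panel. The sketch below is an
illustration under stated assumptions, not popfinder's implementation
or API: the toy genotypes are invented, a nearest-centroid classifier
stands in for the neural network, and the names fit_centroids, predict,
accuracy and perturbation_ranking are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Mean genotype vector per population (stand-in for a trained model)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, g in enumerate(row):
            acc[j] += g
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(centroids, row):
    """Assign a sample to the population with the nearest centroid."""
    def dist(c):
        return sum((g - m) ** 2 for g, m in zip(row, c))
    return min(sorted(centroids), key=lambda lab: dist(centroids[lab]))

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)

def perturbation_ranking(centroids, X, y, n_repeats=20, seed=0):
    """Rank SNP indices by mean accuracy drop when that SNP is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between SNP j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - accuracy(centroids, X_perm, y)
        drops.append((total / n_repeats, j))
    # Largest drop first; ties broken by SNP index.
    return [j for _, j in sorted(drops, key=lambda t: (-t[0], t[1]))]

# Invented toy genotypes (0/1/2 allele counts): SNP 0 separates the two
# populations, while SNPs 1 and 2 have the same distribution in both.
X_train = [[0, 1, 2], [0, 2, 1], [0, 0, 0], [0, 1, 1],
           [2, 1, 2], [2, 2, 1], [2, 0, 0], [2, 1, 1]]
y_train = ["A"] * 4 + ["B"] * 4
# Held-out samples, kept separate from training to avoid data leakage.
X_test = [[0, 2, 0], [0, 1, 2], [0, 0, 1], [0, 1, 1],
          [2, 2, 0], [2, 1, 2], [2, 0, 1], [2, 1, 1]]
y_test = ["A"] * 4 + ["B"] * 4

model = fit_centroids(X_train, y_train)
ranking = perturbation_ranking(model, X_test, y_test)
print("SNPs ranked by importance:", ranking)
```

The ranking is computed on samples that were not used to fit the
model, in keeping with the caution about data leakage: importance
scores estimated on training samples would tend to be inflated.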