Effective sample size and population assignment
To assess our ability to accurately assign individuals of unknown origin to breeding populations, we first estimated assignment accuracy for individuals of known breeding origin using the leave-one-out approach implemented in WGSassign (DeSaix et al., in review). Leave-one-out assignment avoids the bias that arises when an individual contributes to the allele frequencies of its own reference population. Each individual is removed in turn from its source population, the allele frequencies of that population are re-estimated, and the likelihood of the individual's assignment to each population is then calculated. A second source of bias in assignment tests is variation among populations in the precision of allele frequency estimates. This variation arises when populations differ in the number of sampled individuals, in the sequencing depth of those individuals, or both. To mitigate this bias, we tested two alternative source population sampling designs: 1) we reduced the number of samples per breeding population to match the population with the fewest samples (size-standardized breeding populations; SSBPs), and 2) we followed the guidelines in DeSaix et al. (in review) to standardize the effective sample sizes of the breeding populations (effective-size-standardized breeding populations; ESSBPs).
Effective sample size is a Fisher information metric: it is the number of individuals with known genotypes that would produce the same variance in estimated allele frequency as the sampled low-coverage individuals (DeSaix et al., in review). The ESSBP design equalizes effective sample size among populations by removing individuals from the populations with the highest effective sample sizes, so that allele frequencies are estimated with similar precision in every population. We used WGSassign to calculate the effective sample size of each breeding population in the SSBPs and ESSBPs, and performed leave-one-out assignment for both designs. Breeding individuals excluded from the standardized sets were assigned with standard assignment, using the standardized sets as the reference populations.
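The leave-one-out procedure and the effective sample size metric can be illustrated with a minimal sketch. This is not WGSassign's code, and every function name is hypothetical. It assumes genotype likelihoods stored as a NumPy array of shape (individuals, loci, 3), Hardy-Weinberg genotype priors, and a standard EM estimator of allele frequency from genotype likelihoods. The per-locus effective sample size is taken here as the observed Fisher information for the allele frequency, rescaled by the information that known genotypes would give, 2n / (f(1 - f)), so that n_eff = I(f) f(1 - f) / 2. This is our reading of the definition above; it reduces to n when genotypes are known. The clipping constant used to avoid log(0) is also an assumption.

```python
import numpy as np


def em_allele_freq(gl, n_iter=100, tol=1e-8):
    """EM estimate of per-locus allele frequency from genotype likelihoods.

    gl: (n_individuals, n_loci, 3) array of P(data | 0, 1 or 2 copies of the allele).
    """
    f = np.full(gl.shape[1], 0.25)  # arbitrary starting frequency (assumption)
    for _ in range(n_iter):
        # Hardy-Weinberg genotype priors, shape (n_loci, 3)
        prior = np.stack([(1 - f) ** 2, 2 * f * (1 - f), f ** 2], axis=-1)
        post = gl * prior  # unnormalised genotype posteriors per individual and locus
        post /= post.sum(axis=-1, keepdims=True)
        # expected allele count per individual, averaged and divided by ploidy
        f_new = (post[..., 1] + 2 * post[..., 2]).mean(axis=0) / 2
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f


def effective_sample_size(gl, f):
    """Per-locus effective sample size: observed Fisher information for f,
    rescaled so that known genotypes give n_eff = n (hypothetical helper)."""
    a0, a1, a2 = gl[..., 0], gl[..., 1], gl[..., 2]
    lik = a0 * (1 - f) ** 2 + a1 * 2 * f * (1 - f) + a2 * f ** 2
    d1 = -2 * a0 * (1 - f) + a1 * (2 - 4 * f) + 2 * a2 * f  # first derivative in f
    d2 = 2 * a0 - 4 * a1 + 2 * a2                            # second derivative in f
    info = ((d1 / lik) ** 2 - d2 / lik).sum(axis=0)  # observed information per locus
    return info * f * (1 - f) / 2


def assignment_loglik(gl_ind, f, eps=1e-6):
    """Log-likelihood of one individual's genotype likelihoods (n_loci, 3)
    given a population's allele frequencies."""
    f = np.clip(f, eps, 1 - eps)  # avoid log(0) at fixed loci (assumed constant)
    prior = np.stack([(1 - f) ** 2, 2 * f * (1 - f), f ** 2], axis=-1)
    return np.log((gl_ind * prior).sum(axis=-1)).sum()


def leave_one_out_loglik(pops):
    """pops: dict of population name -> genotype-likelihood array.

    Returns a list of (source population, {population: log-likelihood}).
    """
    freqs = {k: em_allele_freq(g) for k, g in pops.items()}
    results = []
    for src, g in pops.items():
        for i in range(g.shape[0]):
            # re-estimate the source population's frequencies without individual i
            f_loo = em_allele_freq(np.delete(g, i, axis=0))
            ll = {k: assignment_loglik(g[i], f_loo if k == src else freqs[k])
                  for k in pops}
            results.append((src, ll))
    return results
```

With perfectly known genotypes (one-hot genotype likelihoods), `effective_sample_size` returns the number of individuals at every polymorphic locus. Uncertain genotypes lower it, which is what makes it useful for comparing populations sequenced at different depths. How per-locus values are summarised into one population-level number is not specified here; a mean across loci would be one option.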
We compared assignment accuracy across all three source population sampling designs, using leave-one-out assignment for the full data set and the combined leave-one-out and standard assignment results for the SSBPs and ESSBPs. The posterior probability of assignment was calculated by dividing the likelihood of assignment to the most likely population by the sum of the likelihoods across all populations. A posterior probability cut-off of 0.8 was used to determine whether an individual was confidently assigned to a population.
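The posterior calculation and cut-off can be sketched as follows. This sketch assumes equal prior probabilities for all populations, so that each population's posterior is its likelihood divided by the sum of likelihoods. It also assumes an inclusive cut-off (posterior >= 0.8). Both assumptions, and the summary names below, are illustrative and are not WGSassign output. Log-likelihoods are normalised by subtracting each row's maximum before exponentiating, which avoids underflow when whole-genome log-likelihoods are large and negative.

```python
import numpy as np


def posterior_probabilities(loglik):
    """Convert per-population log-likelihoods (rows = individuals,
    columns = populations) into posteriors under equal priors."""
    shifted = loglik - loglik.max(axis=1, keepdims=True)  # numerical stability
    lik = np.exp(shifted)
    return lik / lik.sum(axis=1, keepdims=True)


def summarise_assignment(loglik, true_pop, cutoff=0.8):
    """Hypothetical accuracy summaries using a posterior cut-off."""
    post = posterior_probabilities(loglik)
    best = post.argmax(axis=1)              # most likely population
    confident = post.max(axis=1) >= cutoff  # inclusive cut-off (assumption)
    correct = best == true_pop
    return {
        "accuracy_all": correct.mean(),
        "fraction_confident": confident.mean(),
        "accuracy_confident": correct[confident].mean() if confident.any() else float("nan"),
    }


# Usage: three individuals, two populations, all truly from population 0.
loglik = np.array([[-100.0, -102.0],   # confidently and correctly assigned
                   [-50.0, -50.5],     # correct, but below the cut-off
                   [-10.0, -3.0]])     # confidently but wrongly assigned
summary = summarise_assignment(loglik, np.array([0, 0, 0]))
```

Reporting accuracy both for all individuals and for confidently assigned individuals shows the trade-off the cut-off introduces: a higher threshold leaves fewer individuals assigned but tends to make those assignments more reliable.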