Population genetic structure
Genetic clustering and admixture analyses of populations of A. alalia and A. mantiqueira were performed using three alternative methods to identify population structure. The SNP matrix generated in the population program was used as input for all three methods, with sampling locations as groups. First, a Discriminant Analysis of Principal Components (DAPC) was performed in the R package adegenet (Jombart, 2008) to estimate genetic structure. The number of retained PCs was chosen using the a-score, which optimizes the trade-off between discrimination power and overfitting. Clusters were identified with the find.clusters function, and the number of clusters (K) was evaluated using the Bayesian Information Criterion (BIC). To detect structuring among the populations of A. mantiqueira, K values close to the optimum were also investigated (Supplementary Material).
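The logic of choosing K by BIC, and of flagging K values close to the optimum, can be illustrated with a minimal Python sketch. It is not the adegenet implementation: it assumes a BIC of the form n ln(WSS_K / n) + K ln(n), where WSS_K is the within-cluster sum of squares for K clusters, and the function names, the tolerance and the example values are all hypothetical.

```python
import math


def kmeans_bic(n_individuals, wss_by_k):
    """Map each K to BIC = n * ln(WSS_K / n) + K * ln(n) (assumed form)."""
    n = n_individuals
    return {
        k: n * math.log(wss / n) + k * math.log(n)
        for k, wss in wss_by_k.items()
    }


def near_optimal_k(bic_by_k, tolerance):
    """Return the K with the lowest BIC and every K within `tolerance` of it."""
    best = min(bic_by_k, key=bic_by_k.get)
    close = sorted(
        k for k, bic in bic_by_k.items() if bic - bic_by_k[best] <= tolerance
    )
    return best, close


# Hypothetical within-cluster sums of squares for K = 1..4 and 100 individuals.
example_wss = {1: 5000.0, 2: 3000.0, 3: 2800.0, 4: 2750.0}
example_bic = kmeans_bic(100, example_wss)
best_k, candidate_ks = near_optimal_k(example_bic, tolerance=3.0)
print(best_k, candidate_ks)
```

In this sketch the tolerance only formalizes what "close to the optimum" means; in practice the BIC curve is usually inspected visually.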
The second method used sparse non-negative matrix factorization (sNMF) to estimate individual admixture coefficients, as implemented in the snmf function of the R package LEA (Frichot & François, 2015). The resulting ancestry coefficients are similar to those obtained with the program STRUCTURE (Pritchard et al., 2000) for out-crossing species, and can be more accurate in the presence of inbreeding (Frichot et al., 2014). We tested K values from 1 to 8, with 200 repetitions per K value and all other options set to their defaults. The best-fit value of K was selected using the cross-entropy criterion, as detailed in the LEA package manual.
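The selection step can be sketched in a few lines of Python. This is an illustration, not the LEA code: it assumes that each repetition yields one cross-entropy value, that lower values indicate a better fit, and that the best repetition of each K is the one compared across K. The function name and the example values are hypothetical.

```python
def select_k_by_cross_entropy(ce_runs):
    """Pick the K, and the repetition within it, with the lowest cross-entropy.

    ce_runs maps each K to a list of cross-entropy values, one per repetition.
    Returns (best K, zero-based index of the best repetition for that K).
    """
    # Index of the lowest-entropy repetition for every K.
    best_run = {
        k: min(range(len(values)), key=values.__getitem__)
        for k, values in ce_runs.items()
    }
    # K whose best repetition has the lowest cross-entropy overall.
    best_k = min(ce_runs, key=lambda k: ce_runs[k][best_run[k]])
    return best_k, best_run[best_k]


# Hypothetical cross-entropy values for three repetitions of K = 1..3.
example_runs = {
    1: [0.62, 0.63, 0.61],
    2: [0.55, 0.54, 0.56],
    3: [0.57, 0.58, 0.56],
}
chosen_k, chosen_run = select_k_by_cross_entropy(example_runs)
print(chosen_k, chosen_run)
```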
The third method incorporated spatial information into individual ancestry estimates using the R package Tess3r (Caye et al., 2016). We used the program's default values, except for the maximum number of iterations of the optimization algorithm, which was increased to 1,000. The optimal value of K is usually inferred from the point at which the cross-validation curve reaches a plateau or starts increasing; in our data, however, the cross-validation criterion exhibited neither a minimum nor a plateau (Supplementary Material). Therefore, the K values obtained in the other cluster analyses (DAPC and sNMF) were investigated.
Overall genetic structure was estimated by a nonhierarchical analysis of molecular variance (AMOVA) using the software Arlequin v. 3.5 (Excoffier & Lischer, 2010). Hierarchical AMOVA was then conducted with groups defined as (i) the two species, A. alalia and A. mantiqueira, and (ii) the clusters of sampling localities identified by the three clustering methods (DAPC, sNMF, Tess3r). Genetic structure was interpreted from the Φ statistics associated with the different hierarchical levels at which variation is distributed (Excoffier et al., 1992). The significance of the ΦST values was evaluated with 10,000 permutations, using a distance matrix computed from pairwise differences and a gamma value of 0. Slatkin's pairwise FST values were also estimated in Arlequin (Slatkin, 1995).
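For readers unfamiliar with the Φ statistics, the following Python sketch shows how they relate to the variance components of a hierarchical AMOVA. It is an illustration under stated assumptions, not Arlequin's implementation: it uses the standard definitions of ΦCT, ΦSC and ΦST, and takes Slatkin's pairwise measure to be the linearized form FST / (1 - FST). The function names and the variance components are hypothetical.

```python
def phi_statistics(var_among_groups, var_among_pops, var_within_pops):
    """Compute Phi statistics from three hierarchical variance components.

    var_among_groups: variance among groups
    var_among_pops:   variance among populations within groups
    var_within_pops:  variance within populations
    """
    total = var_among_groups + var_among_pops + var_within_pops
    within_groups = var_among_pops + var_within_pops
    if total <= 0 or within_groups <= 0:
        raise ValueError("variance components must sum to a positive value")
    return {
        "phi_ct": var_among_groups / total,
        "phi_sc": var_among_pops / within_groups,
        "phi_st": (var_among_groups + var_among_pops) / total,
    }


def slatkin_linearized(fst):
    """Linearized distance FST / (1 - FST); undefined for FST >= 1."""
    if fst >= 1:
        raise ValueError("FST must be below 1")
    return fst / (1 - fst)


# Hypothetical variance components, chosen only to make the arithmetic easy.
example_phi = phi_statistics(2.0, 1.0, 7.0)
example_distance = slatkin_linearized(0.2)
print(example_phi, example_distance)
```

With these definitions the three statistics satisfy (1 - ΦST) = (1 - ΦSC)(1 - ΦCT), which is a convenient check on reported values.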