Population genetic structure
Genetic clustering and admixture analyses of populations of A.
alalia and A. mantiqueira were performed using three alternative
methods to identify population structure. The SNP matrix generated in
the population program was used as input for all three methods, with the
sampling locations as groups. First, a Discriminant Analysis of Principal
Components (DAPC) was performed in the R package adegenet (Jombart, 2008)
to estimate the genetic structure. The number of retained PCs was
estimated by the a-score to optimize the trade-off between the power of
discrimination and overfitting of the data. Clusters were identified
with the find.clusters function, and the number of clusters (K) was
evaluated using the Bayesian Information Criterion (BIC).
To detect structuring among the populations of A. mantiqueira, K
values close to the optimum were also investigated (Supplementary
Material).
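As an illustration of how K can be evaluated with BIC, the following minimal Python sketch runs k-means for several K values and compares a BIC computed from the within-cluster sum of squares. It is not the adegenet implementation: the BIC expression used here (n log(WSS/n) + K log(n)) is a common form for k-means and is assumed rather than taken from the package, and the data points and function names (sq_dist, kmeans_wss, bic) are hypothetical.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_wss(points, k, n_iter=100):
    """Run a simple k-means and return the within-cluster sum of squares."""
    # Deterministic farthest-first initialisation (an assumption of this sketch).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))

def bic(wss, n, k):
    """BIC from the within-cluster sum of squares (assumed form, see lead-in)."""
    return n * math.log(max(wss, 1e-12) / n) + k * math.log(n)

# Hypothetical coordinates of 8 individuals on two retained PCs.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
bic_by_k = {k: bic(kmeans_wss(points, k), len(points), k) for k in range(1, 5)}
best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k, best_k)
```

In practice the BIC curve is inspected rather than minimised blindly, which is why K values close to the optimum can also be worth examining.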
The second method used sparse non-negative matrix factorization (sNMF)
to estimate individual admixture coefficients, as implemented in the
snmf function of the R package LEA (Frichot & François, 2015). The
resulting ancestry coefficients are similar to those obtained with the
program STRUCTURE (Pritchard et al., 2000) for out-crossing species, and
the estimates can be more accurate in the presence of inbreeding
(Frichot et al., 2014). The sNMF
algorithm tested K values from 1 to 8, with 200 repetitions per K value
and all other options set to their default values. The best-fit value
of K was selected using the cross-entropy criterion as detailed in the
manual of the LEA package.
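The selection step can be sketched as follows, assuming the cross-entropy of every repetition has already been exported from LEA. The sketch keeps the lowest cross-entropy per K and then picks the K with the smallest value; the numbers and the function name best_k_by_cross_entropy are hypothetical, and only three repetitions per K are shown instead of 200.

```python
def best_k_by_cross_entropy(runs):
    """Pick K and repetition with the lowest cross-entropy.

    runs maps each K to a list of cross-entropy values, one per repetition.
    Returns (best K, index of the best repetition for that K, minimum per K).
    """
    min_per_k = {k: min(values) for k, values in runs.items()}
    best_k = min(min_per_k, key=min_per_k.get)
    best_run = runs[best_k].index(min_per_k[best_k])
    return best_k, best_run, min_per_k

# Hypothetical cross-entropy values (not results from this study).
runs = {
    1: [0.62, 0.62, 0.62],
    2: [0.55, 0.54, 0.56],
    3: [0.57, 0.58, 0.56],
}
best_k, best_run, min_per_k = best_k_by_cross_entropy(runs)
print(best_k, best_run)
```

A strict minimum is only one reading of the criterion; when the curve flattens, the K at which it stops decreasing appreciably may be preferred.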
The third method incorporated spatial information into the individual
ancestry estimates using the R package Tess3r (Caye et al., 2016). We
used the default settings, except for the maximum number of iterations
of the optimization algorithm, which was increased to 1,000. The optimal
value of K is usually taken as the point at which the cross-validation
curve reaches a plateau or starts increasing; in our data, however, the
cross-validation criterion exhibited neither a minimum nor a plateau
(Supplementary Material). Therefore, the K values obtained in the other
clustering analyses (DAPC and sNMF) were investigated.
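The decision rule for the cross-validation curve can be made explicit with a small sketch. It returns the K at which the score starts increasing or stops decreasing appreciably, and None when the curve keeps falling, which corresponds to the situation described here. The relative tolerance of 1% and the scores are assumptions of this example, not values used by Tess3r or obtained in this study.

```python
def pick_k_from_cv(scores, tol=0.01):
    """Return the K where the cross-validation curve plateaus or turns upward.

    scores is a list of (K, cv_score) pairs sorted by K, with positive scores.
    Returns None if the curve keeps decreasing by more than tol (relative)
    at every step, i.e. no minimum and no plateau is found.
    """
    for (k_prev, s_prev), (_, s_next) in zip(scores, scores[1:]):
        if s_next >= s_prev:
            return k_prev  # curve starts increasing: minimum reached
        if (s_prev - s_next) / s_prev < tol:
            return k_prev  # decrease is negligible: plateau reached
    return None

# Hypothetical curves (not results from this study).
with_minimum = [(1, 0.50), (2, 0.40), (3, 0.38), (4, 0.39)]
with_plateau = [(1, 0.50), (2, 0.40), (3, 0.399)]
still_falling = [(1, 0.50), (2, 0.45), (3, 0.40), (4, 0.36), (5, 0.32)]

print(pick_k_from_cv(with_minimum))
print(pick_k_from_cv(with_plateau))
print(pick_k_from_cv(still_falling))
```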
Overall genetic structure was estimated by a nonhierarchical analysis of
molecular variance (AMOVA) using the software Arlequin v. 3.5 (Excoffier
& Lischer, 2010). Hierarchical AMOVAs were conducted (i) between A.
alalia and A. mantiqueira and (ii) among the clusters of sampling
localities identified by the three clustering methods (DAPC, sNMF,
Tess3r). Genetic structure was interpreted from the Φ statistics
associated with different hierarchical levels at which variation is
distributed (Excoffier et al., 1992). The significance of the ΦST values
was evaluated using 10,000 permutations, with the distance matrix
computed from pairwise differences and a gamma value of 0. Slatkin's
pairwise FST values were also estimated in Arlequin (Slatkin, 1995).
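For reference, the relation between the AMOVA variance components and the Φ statistics (Excoffier et al., 1992) can be sketched as below, together with the linearization FST / (1 - FST) proposed by Slatkin (1995). The variance components are hypothetical, the function names are illustrative, and whether the Slatkin values reported here correspond to this linearized form is an assumption of the example.

```python
def phi_statistics(sigma_a, sigma_b, sigma_c):
    """Phi statistics from hierarchical AMOVA variance components.

    sigma_a: variance among groups
    sigma_b: variance among populations within groups
    sigma_c: variance within populations
    """
    total = sigma_a + sigma_b + sigma_c
    return {
        "phi_CT": sigma_a / total,                 # among groups
        "phi_SC": sigma_b / (sigma_b + sigma_c),   # among populations within groups
        "phi_ST": (sigma_a + sigma_b) / total,     # among populations overall
    }

def slatkin_linearized(fst):
    """Slatkin's linearized distance, FST / (1 - FST)."""
    if fst >= 1:
        raise ValueError("FST must be smaller than 1")
    return fst / (1 - fst)

# Hypothetical variance components (not results from this study).
phi = phi_statistics(sigma_a=2.0, sigma_b=1.0, sigma_c=7.0)
print(phi)
print(slatkin_linearized(phi["phi_ST"]))
```

The three statistics are linked by (1 - ΦST) = (1 - ΦSC)(1 - ΦCT), which gives a quick consistency check on reported values.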