3.3 Origin of medfly infestation based on demographic history and phylogenetic analysis
In the DIYABC-RF analysis testing six hypothetical evolutionary scenarios (Fig. 5), the classification votes (i.e. the number of times a scenario is selected in a random forest) for Scenarios 1 to 6 were 599, 411, 123, 467, 267 and 133, respectively. Based on the classification votes and posterior probabilities, the best fit was Scenario 1, with a posterior probability of 0.596 and global and local error rates of 0.499 and 0.404, respectively (Fig. 5). The projection of the observed data set and the training set onto the linear discriminant analysis (LDA) axes indicated low power to discriminate among Scenarios 1, 2 and 4, because the observed data set was located within the clouds of their simulated data (Fig. S5-C, Supplementary information). To improve prediction quality and power of differentiation, we ran a new analysis only for Scenarios 2, 4 and 5 (selected based on the previous test results). The classification votes were 778, 1025 and 197, respectively. The best-fit scenario was Scenario 4, with a posterior probability of 0.697 and global and local error rates of 0.246 and 0.303, respectively (Fig. 5). In the LDA projection for this second test, the observed data set was located within the cloud of Scenario 2, indicating substantial power to discriminate among the tested scenarios (Fig. S5-C, Supplementary information). Overall, the demographic colonisation scenarios suggested a long and interconnected history of invasions of C. capitata in the studied sites. Both best-fit scenarios predicted that the Brazil population diverged from the ancestral South African population. However, Scenario 1 predicted direct colonisation from Brazil to the other sampling sites, while Scenario 4 predicted that the Spain-Guatemala group originated from admixture between lineages from South Africa and Brazil.
According to these results, the Brazil population was established by direct colonisation from South Africa, and admixture events likely led to the establishment of the remaining lineages (i.e. Spain-Guatemala and Greece-Australia).
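The vote counts and error rates reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch and not part of the DIYABC-RF workflow itself: the function names are hypothetical, and the relation "local error = 1 - posterior probability" is an assumption that happens to match the reported values. Note that the share of votes is not the posterior probability, which DIYABC-RF estimates in a separate step.

```python
# Classification votes as reported in the text (scenario number -> votes).
votes_run1 = {1: 599, 2: 411, 3: 123, 4: 467, 5: 267, 6: 133}
votes_run2 = {2: 778, 4: 1025, 5: 197}


def summarise_votes(votes):
    """Return total votes, per-scenario vote share, and scenarios ranked by votes."""
    total = sum(votes.values())
    shares = {scenario: count / total for scenario, count in votes.items()}
    ranked = sorted(votes, key=votes.get, reverse=True)
    return total, shares, ranked


def local_error(posterior):
    """Assumed relation: local error rate = 1 - posterior probability."""
    return round(1.0 - posterior, 3)


if __name__ == "__main__":
    for label, votes in (("first run", votes_run1), ("second run", votes_run2)):
        total, shares, ranked = summarise_votes(votes)
        print(label, "total votes:", total, "ranking:", ranked)
        for scenario in ranked:
            print(f"  Scenario {scenario}: {shares[scenario]:.3f} of votes")

    # Compare with the local error rates reported in the text (0.404 and 0.303).
    print(local_error(0.596), local_error(0.697))
```

Under these assumptions, both runs sum to the same number of votes, and the ranking by votes places Scenario 1 first in the six-scenario run and Scenario 4 first in the three-scenario run, in line with the text.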
SNAPP recovered a total of 15 consensus tree topologies. Consensus tree 1 covered 37.18% of the trees (Fig. 6), with cumulative coverage increasing to 67.04% when consensus trees 2 and 3 were included. The consensus tree topologies were consistent across the independent runs in which different individuals were sampled from each location (Fig. 6; Fig. S6, Supplementary information), indicating that subsampling did not significantly affect the topology of the SNAPP trees. The species tree revealed three highly supported lineages (PP=1) corresponding to South Africa, Brazil and a third lineage comprising all other regions, whereas two nodes consistently showed moderate support, corresponding to the divergence of the Greece lineage (PP=0.81) and of the Guatemala and Australia lineages (PP=0.84). These results are consistent with the genetic clusters found in the DAPC and Structure analyses (Fig. 2 and Fig. 3). Effective population size, inferred from theta estimates and represented by branch thickness in the consensus tree, was highest in South Africa, followed by Spain, Brazil and Greece with intermediate values (Fig. 6).