SNP population genomic analyses
Results of PCA and Structure were concordant and supported hierarchical population structure within this dataset. In the PCA, the first and second principal components (PCs) of the 16 localities recovered two highly divergent populations from the eastern edge of the sampled region in Manitoba: Swan River and Portage la Prairie (Fig. 2A). Two Albertan localities on the western edge of our sampling region, Athabasca and Sangudo, were less distinct but the combined effect of PC 1 and PC 2 clustered them apart from the remaining 12 central localities. These western and eastern sampling edges broadly coincide with the boundaries of canola production in the Canadian Prairies, excluding the Peace River Region of Alberta, a geographically disparate region in the Boreal Plains northwest of the rest of the prairies (western-most cluster of survey points in Fig. 1); we did not recover any CFM larvae from this region in our 2017 or 2018 surveys. Hierarchical PCA omitting the divergent Manitoba localities (i.e. “14 localities”) separated the two aforementioned western Alberta localities along PC 1 and PC 2 (Fig. 2B). Further hierarchical PCA omitting the divergent Manitoba and Alberta localities (i.e. “12 localities”) recovered little additional substructure, although three localities, Fairy Glen, Preeceville, and Dauphin, had some individuals that appeared to be genetically distinct along PC 1 and PC 2 and others that clustered with the remaining central localities (Fig. 2C).
In Structure analyses, the use of sampling location as a prior (locprior) did not produce substantial differences in cluster assignments when compared to the analyses that did not incorporate this information (nolocprior); thus, we focus only on the latter here. We found variable support for an optimal value of K: LnPr(X|K) displayed only a gradual plateau starting at K = 5 to 7; ΔK values were generally low (maximum ΔK = 21.8) but supported K = 2, 5, 7, and 9; and the Puechmaille statistics supported K = 5, 6, and 7 (Fig. S1). Visualization of bar charts for all values of K indicated hierarchical structure that matched the results of the PCA: K = 2 and 3 separated the two eastern-most Manitoba localities, and K = 4 separated the two western-most Alberta localities. At K = 5 and 6, some individuals from two Saskatchewan localities (Fairy Glen and Preeceville) formed a distinct cluster, as was observed in the PCA (Fig. 2). Beyond K = 6 there was little meaningful structure, and additional clusters were generally represented by low Q-ratios (all bar charts presented in Fig. S1). Additionally, independent hierarchical Structure analyses of the large central cluster (12 localities) supported the same divisions as the K = 6 results (Fig. 2D, Fig. S1), further supporting K = 6 as the optimal value of K. Finally, two specimens sampled in Sangudo and Athabasca clustered with the central population rather than with their collection locality and likely represent migrants (Fig. 2D).
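For readers unfamiliar with the ΔK statistic, the following minimal sketch shows how it is commonly derived from replicate Structure runs. It assumes the convention in which ΔK is the absolute second difference of the mean LnPr(X|K) divided by the standard deviation of LnPr(X|K) across replicates at that K. The function name `evanno_delta_k` and the toy likelihood values are hypothetical and are not taken from this study.

```python
import numpy as np

def evanno_delta_k(lnp_by_k):
    """Compute Delta K for each K that has neighbours K-1 and K+1.

    lnp_by_k: dict mapping K -> list of LnPr(X|K) values from replicate runs.
    Assumed convention: |mean L(K+1) - 2*mean L(K) + mean L(K-1)| / sd L(K).
    """
    ks = sorted(lnp_by_k)
    mean = {k: float(np.mean(lnp_by_k[k])) for k in ks}
    # Sample standard deviation across replicate runs at each K.
    sd = {k: float(np.std(lnp_by_k[k], ddof=1)) for k in ks}

    delta_k = {}
    for k in ks:
        # Delta K is undefined at the smallest and largest K tested.
        if (k - 1) not in mean or (k + 1) not in mean:
            continue
        # Skip K values whose replicates are identical (division by zero).
        if sd[k] == 0:
            continue
        second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
        delta_k[k] = second_diff / sd[k]
    return delta_k

# Hypothetical replicate log-likelihoods, for illustration only.
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}
toy_delta_k = evanno_delta_k(toy_runs)
best_k = max(toy_delta_k, key=toy_delta_k.get)
```

Because ΔK cannot be evaluated at the smallest K tested and responds mainly to the uppermost level of hierarchical structure, it is usually read alongside other criteria, as is done above with LnPr(X|K) and the Puechmaille statistics.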
IBD analysis using Euclidean distance and pairwise FST/(1 - FST) values for all 16 localities was highly significant (r² = 0.33, p-value = 0.004, Fig. 3A), and remained significant after removing the eastern Portage la Prairie and Swan River localities (r² = 0.26, p-value = 0.03, Fig. 3B). However, pairwise point densities indicated “islands” of data points rather than a single cline tracking the regression line as would be expected if genetic divergence increased linearly with geographic distance. After additionally removing the Sangudo and Athabasca localities, IBD analysis of the remaining 12 central localities was not significant (r² = 0.04, p-value = 0.38, Fig. 3C), suggesting that the four divergent localities were the primary drivers of the aforementioned relationships.
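As an illustration of how such a relationship can be tested, the sketch below linearizes pairwise FST as FST/(1 - FST) and applies a one-sided Mantel permutation test against geographic distance. The use of a Mantel test here is an assumption made for illustration; the exact procedure and software behind the reported values are not specified in this section. The function names and the toy data are hypothetical.

```python
import numpy as np

def linearize_fst(fst):
    """Transform FST to FST / (1 - FST)."""
    fst = np.asarray(fst, dtype=float)
    return fst / (1.0 - fst)

def mantel_test(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test between two square, symmetric distance matrices.

    Returns the Pearson correlation of the upper triangles and a permutation
    p-value for the hypothesis that the correlation is positive.
    """
    genetic = np.asarray(genetic, dtype=float)
    geographic = np.asarray(geographic, dtype=float)
    n = genetic.shape[0]
    upper = np.triu_indices(n, k=1)  # each locality pair counted once
    geo_vec = geographic[upper]
    r_obs = np.corrcoef(geo_vec, genetic[upper])[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        # Permute locality labels (rows and columns together) in one matrix.
        order = rng.permutation(n)
        permuted = genetic[np.ix_(order, order)]
        r_perm = np.corrcoef(geo_vec, permuted[upper])[0, 1]
        if r_perm >= r_obs:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return r_obs, p_value

# Hypothetical example: six localities on a line, FST rising with distance.
positions = np.arange(6, dtype=float)
geo = np.abs(positions[:, None] - positions[None, :])
fst = 0.05 * geo
r, p = mantel_test(linearize_fst(fst), geo)
r_squared = r ** 2
```

Permuting locality labels, rather than individual pairwise values, respects the non-independence of pairwise comparisons that share a locality. Rerunning such a test on nested subsets of localities corresponds to the 16, 14, and 12 locality comparisons reported above.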
Values of expected and observed heterozygosity were moderate and generally similar within each population, except for Swan River and Portage la Prairie, which both had slight heterozygote excess (Ho = 0.24, He = 0.16 in both populations, Table 1), and North Battleford, which had lower observed values of heterozygosity (Ho = 0.14, He = 0.21). We note, however, that the North Battleford population had far higher levels of missing data than the other populations (average missing data of North Battleford population = 45%; average missing data across remaining populations = 9%). Pairwise FST values ranged from 0 to 0.39 (Table 1); they were lower between central populations (0 to 0.17) and higher in comparisons including at least one of the four divergent populations (Swan River, Portage la Prairie, Sangudo, and Athabasca) recovered in the PCA and Structure analyses (0.13 to 0.39).
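To make the heterozygosity and missing-data summaries concrete, the following minimal sketch computes mean observed heterozygosity, mean expected heterozygosity, and the proportion of missing genotypes for one population. It assumes biallelic SNPs coded as 0, 1, or 2 copies of the alternate allele with missing calls stored as NaN, and it uses the uncorrected expectation 2p(1 - p); the coding scheme, the function name, and the toy matrix are assumptions for illustration and may differ from the software used for Table 1.

```python
import numpy as np

def heterozygosity_summary(genotypes):
    """Summarize one population's genotype matrix (individuals x loci).

    Genotypes are assumed to be 0, 1, or 2 alternate-allele copies,
    with np.nan marking a missing call.
    Returns (mean Ho, mean He, proportion of missing genotypes).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)          # genotyped individuals per locus
    keep = n_called > 0                    # drop loci with no calls at all

    het_counts = (g == 1).sum(axis=0)      # NaN compares False, so it is ignored
    alt_counts = np.nansum(g, axis=0)      # alternate-allele copies per locus

    ho = het_counts[keep] / n_called[keep]
    p = alt_counts[keep] / (2 * n_called[keep])
    he = 2 * p * (1 - p)                   # no small-sample correction applied

    missing = 1.0 - called.mean()
    return float(ho.mean()), float(he.mean()), float(missing)

# Hypothetical population: three individuals, two loci, one missing call.
toy = [
    [0, 1],
    [1, 1],
    [2, np.nan],
]
mean_ho, mean_he, prop_missing = heterozygosity_summary(toy)
```

In this toy example the mean observed value exceeds the mean expected value, the pattern described as heterozygote excess. A high proportion of missing calls reduces the number of genotypes behind each per-locus estimate, which is one reason to interpret values from a population with much missing data cautiously.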