SNP population genomic analyses
Results of PCA and Structure were concordant and supported hierarchical
population structure within this dataset. In the PCA, the first and
second principal components (PCs) of the 16 localities recovered two
highly divergent populations from the eastern edge of the sampled region
in Manitoba: Swan River and Portage la Prairie (Fig. 2A). Two Albertan
localities on the western edge of our sampling region, Athabasca and
Sangudo, were less distinct but the combined effect of PC 1 and PC 2
clustered them apart from the remaining 12 central localities. These
western and eastern sampling edges broadly coincide with the boundaries
of canola production in the Canadian Prairies, excluding the Peace River
Region of Alberta, a geographically disparate region in the Boreal
Plains northwest of the rest of the prairies (western-most cluster of
survey points in Fig. 1); we did not recover any CFM larvae from this
region in our 2017 or 2018 surveys. Hierarchical PCA omitting the
divergent Manitoba localities (i.e. “14 localities”) separated the two
aforementioned western Alberta localities along PC 1 and PC 2 (Fig. 2B).
Further hierarchical PCA omitting the divergent Manitoba and Alberta
localities (i.e. “12 localities”) recovered little additional
substructure, although three localities, Fairy Glen, Preeceville, and
Dauphin, had some individuals that appeared to be genetically distinct
along PC 1 and PC 2 and others that clustered with the remaining central
localities (Fig. 2C).
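The hierarchical approach described above (run a PCA, set aside the most divergent localities, then rerun on the remainder) can be sketched as follows. This is a minimal illustration only. It assumes genotypes coded as 0/1/2 alternate-allele counts with missing calls stored as NaN and imputed by the per-locus mean. The text does not specify these details, and the function names and locality labels "A", "B" and "C" are hypothetical.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Return PC scores for an individuals x loci genotype matrix.

    Assumes 0/1/2 allele-count coding with np.nan for missing calls,
    and that every locus has at least one called genotype.
    """
    g = np.asarray(genotypes, dtype=float)
    # Mean-impute missing genotypes per locus (an assumption of this sketch).
    locus_mean = np.nanmean(g, axis=0)
    g = np.where(np.isnan(g), locus_mean, g)
    # Center each locus, then take the SVD; scores are U * singular values.
    g = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def hierarchical_pca(genotypes, localities, drop):
    """Rerun the PCA after removing individuals from the localities in `drop`."""
    genotypes = np.asarray(genotypes, dtype=float)
    localities = np.asarray(localities)
    keep = ~np.isin(localities, list(drop))
    return pca_scores(genotypes[keep]), localities[keep]

# Toy data: "B" is fixed for the alternate allele and stands in for a divergent locality.
toy_genotypes = np.array([[0, 0, 0, 0]] * 3 + [[2, 2, 2, 2]] * 3 + [[1, 1, 1, 1]] * 2,
                         dtype=float)
toy_localities = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 2)

all_scores = pca_scores(toy_genotypes)
sub_scores, sub_localities = hierarchical_pca(toy_genotypes, toy_localities, drop={"B"})
```

Rerunning the decomposition on the reduced set lets the leading PCs describe variation among the remaining localities rather than the contrast with the localities that were removed.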
In Structure analyses, the use of sampling location as a prior
(locprior) did not produce substantial differences in cluster
assignments when compared to the analyses that did not incorporate this
information (nolocprior); thus, we focus only on the latter here.
We found variable support for an optimal value of K:
LnPr(X|K) displayed only a gradual plateau
starting at K = 5 to 7, ΔK values were generally low
(maximum ΔK = 21.8) but supported K = 2, 5, 7, and 9, and
the Puechmaille statistics supported K = 5, 6, and 7 (Fig. S1).
Visualization of bar charts for all values of K indicated
hierarchical structure that matched the results of the PCA: K = 2
and 3 separated the two eastern-most Manitoba localities and K =
4 separated the two western-most Alberta localities. At K = 5 and
6, some individuals from two Saskatchewan localities (Fairy Glen and
Preeceville) formed a distinct cluster, as was observed in the PCA (Fig.
2). Beyond K = 6 there was little meaningful structure and
additional clusters were generally represented by low Q-ratios
(all bar charts presented in Fig. S1). Additionally, independent
hierarchical Structure analyses of the large central cluster (12
localities) supported the same divisions as the K = 6 results
(Fig. 2D, Fig. S1), further supporting K = 6 as the optimal value
of K. Finally, two specimens sampled in Sangudo and Athabasca
clustered with the central population rather than with their collection
localities and likely represent migrants (Fig. 2D).
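For readers unfamiliar with the ΔK statistic mentioned above, the sketch below shows one common way to compute it from replicate Structure runs: the absolute second-order difference of the mean LnPr(X|K), divided by the standard deviation of LnPr(X|K) across replicates at that K. It assumes the variant that works from per-K means. The text does not state which implementation produced the reported values, and the function name and the likelihood values are hypothetical.

```python
import numpy as np

def delta_k(lnpr_by_k):
    """Compute ΔK for each K that has both neighbours (K - 1 and K + 1).

    lnpr_by_k maps K -> list of LnPr(X|K) values from replicate runs.
    K values whose replicates have zero standard deviation are skipped.
    """
    ks = sorted(lnpr_by_k)
    mean = {k: float(np.mean(lnpr_by_k[k])) for k in ks}
    sd = {k: float(np.std(lnpr_by_k[k], ddof=1)) for k in ks}
    result = {}
    for k in ks:
        if (k - 1) in mean and (k + 1) in mean and sd[k] > 0:
            # |L(K+1) - 2 L(K) + L(K-1)| computed on the replicate means
            second_diff = abs(mean[k + 1] - 2.0 * mean[k] + mean[k - 1])
            result[k] = second_diff / sd[k]
    return result

# Invented likelihoods for illustration only (not values from this study).
toy_runs = {
    1: [-100.0, -102.0],
    2: [-50.0, -52.0],
    3: [-49.0, -51.0],
}
toy_delta_k = delta_k(toy_runs)
```

Because ΔK needs both neighbouring K values, it is undefined for the smallest and largest K tested, which is one reason it is usually read alongside LnPr(X|K) and other statistics rather than alone.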
IBD analysis using Euclidean distance and pairwise FST/(1 - FST) values for all
16 localities was highly significant (r² =
0.33, p-value = 0.004, Fig. 3A), and remained significant after removing
the eastern Portage la Prairie and Swan River localities
(r² = 0.26, p-value = 0.03, Fig. 3B). However,
pairwise point densities indicated “islands” of data points rather
than a single cline tracking the regression line as would be expected if
genetic divergence increased linearly with geographic distance. After
additionally removing the Sangudo and Athabasca localities, IBD analysis
of the remaining 12 central localities was not significant
(r² = 0.04, p-value = 0.38, Fig. 3C),
suggesting that the four divergent localities were the primary drivers
of the aforementioned relationships.
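The kind of test summarized above can be sketched as a correlation between a geographic distance matrix and linearized FST, with significance assessed by permuting localities (a Mantel-type test). This is an illustration under stated assumptions, not the implementation used for the reported values: the text gives only the resulting statistics, and the function names, coordinates and FST values are hypothetical.

```python
import numpy as np

def linearize_fst(fst):
    """Return FST / (1 - FST), the linearized genetic distance."""
    fst = np.asarray(fst, dtype=float)
    return fst / (1.0 - fst)

def mantel(geo, gen, n_perm=999, seed=0):
    """One-sided Mantel test between two square, symmetric distance matrices.

    Returns (r, p): the Pearson correlation of the upper triangles and a
    permutation p-value for a positive association.
    """
    geo = np.asarray(geo, dtype=float)
    gen = np.asarray(gen, dtype=float)
    upper = np.triu_indices_from(geo, k=1)
    r_obs = np.corrcoef(geo[upper], gen[upper])[0, 1]
    rng = np.random.default_rng(seed)
    n = geo.shape[0]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns together so locality identities are shuffled.
        gen_perm = gen[np.ix_(perm, perm)]
        if np.corrcoef(geo[upper], gen_perm[upper])[0, 1] >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical coordinates (arbitrary units) and FST values for six localities.
xy = np.array([[0.0, 0.0], [1.0, 0.5], [3.0, 1.0], [6.0, 2.0], [10.0, 2.5], [15.0, 4.0]])
geo_dist = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
toy_fst = geo_dist / (4.0 * geo_dist.max())   # invented: FST rising with distance
r, p = mantel(geo_dist, linearize_fst(toy_fst))
r_squared = r ** 2
```

A significant correlation alone does not distinguish a continuous cline from a few divergent groups, which is why repeating the test on nested subsets of localities is informative.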
Values of expected and observed heterozygosity were moderate and
generally similar within each population, except for Swan River and
Portage la Prairie, which both had slight heterozygote excess
(Ho = 0.24, He = 0.16 in
both populations, Table 1), and North Battleford, which had lower
observed than expected heterozygosity (Ho = 0.14, He = 0.21). We note, however, that the North
Battleford population had far higher levels of missing data than the
other populations (average missing data of North Battleford population =
45%; average missing data across remaining populations = 9%). Pairwise FST values ranged from 0 to 0.39 (Table 1) and
were lower between central populations (0 to 0.17) and higher in
comparisons including at least one of the four divergent populations
(Swan River, Portage la Prairie, Sangudo, and Athabasca) recovered in
the PCA and Structure analyses (0.13 to 0.39).
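As a reminder of how the heterozygosity values compared above are defined, the sketch below computes observed heterozygosity (the proportion of heterozygous calls) and expected heterozygosity (2pq under Hardy-Weinberg proportions) per locus and averages them across loci, ignoring missing calls. It assumes biallelic SNPs coded 0/1/2 with NaN for missing data and applies no small-sample correction. The software used for Table 1 may differ, and the function name and toy genotypes are hypothetical.

```python
import numpy as np

def heterozygosity(genotypes):
    """Return (Ho, He) averaged over loci for one population.

    genotypes: individuals x loci array of alternate-allele counts (0, 1, 2),
    with np.nan marking missing calls. Loci with no called genotypes are dropped.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)
    keep = n_called > 0
    g, n_called = g[:, keep], n_called[keep]
    # Observed heterozygosity: heterozygous calls / called genotypes, per locus.
    ho = (g == 1).sum(axis=0) / n_called
    # Expected heterozygosity: 2pq from the alternate-allele frequency p.
    p = np.nansum(g, axis=0) / (2.0 * n_called)
    he = 2.0 * p * (1.0 - p)
    return float(ho.mean()), float(he.mean())

# Toy population of four individuals at two loci; one call is missing.
toy = np.array([
    [0, 1],
    [1, 1],
    [2, 1],
    [1, np.nan],
])
ho, he = heterozygosity(toy)
```

In this toy case Ho exceeds He, the same direction as a heterozygote excess. Because each locus is averaged only over called genotypes, a population with many missing calls contributes estimates based on fewer individuals per locus.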