Results
Genotyping-by-sequencing . Using GBS technology, 839 Z. palustris samples representing the Natural Stand, Cultivated, and Temporal collections along with 50 samples of Z. aquatic a, which served as an outgroup, were sequenced to a depth of 2.5 million reads per sample. A total of 1,833,504,458 reads were generated with an average of 2,185,345 reads per sample. From these data, a total of 5,955 SNP markers were identified that met our filtering criteria, 3,005 (50.1%), of which, reside in genes. Basic marker statistics are shown in Table 1. The SNP density ranged from 1.15 - 6.37 SNPs/Mb per chromosome with a genome-wide average of 4.34 SNPs/Mb using 1Mb bins. In genic regions, the SNP density ranged from 10-20 SNPs/Mb per chromosome. The genome-wide TsTv ratio was 3.15 with a minimum of 1.50 (ZPchr0458) and a maximum of 4.18 (ZPchr0016) (Table S4). PIC ranged from 0.030 - 0.313, with an average of 0.217 (Table S5).
Genetic Diversity of NWR Collections. To visualize the variation within the diversity collection, we first performed PCoA on both the Natural Stand and Cultivated collections together (Figure 2a). The first three principal coordinates explained 12.5%, 7.7%, and 7.3% of the variance, respectively (27.5% total). Within the first coordinate, samples were primarily split into two clusters, including a cluster of all Z. aquatica samples and the majority of Natural Stand samples, and a second cluster including all Cultivated samples and Bass and Decker Lake populations from the UMR watershed (Figure 2a). Upper Rice Lake genotypes appeared to blend between these two main groups but trended more heavily toward the group containing the UMR watershed genotypes. Within the first two principal coordinates, a large range of continuous variation can be seen, with samples primarily grouped by their germplasm type and geographic origin. Overall, these two principal coordinates failed to fully separate individual populations, though mostZ. aquatica samples did separate from those of Z. palustris (Figure 2a). The Cultivated collection largely formed its own group with only a small number of samples overlapping with those from Bass, Upper Rice, Decker, Dahler, and Phantom (WI) Lakes. The third principal coordinate, in conjunction with coordinate 1, isolatedZ. aquatica from all Z. palustris samples, Natural Stand and Cultivated collections, while maintaining the separation of the two main groups defined by coordinates 1 and 2 (Figure S2).
When analyzed separately from the Cultivated collection, the first two coordinates of the Natural Stand collection explained 19% and 13% of the variance, respectively (Figure 2b). The two main clusters were consistent with Figure 2a, as was Upper Rice Lake bridging the two clusters (Figure 2b), while Z. aquatica formed a more distinct cluster in principal coordinate 2. Natural Stand populations from the SCR fell between Z. aquatica and the rest of the MN Z. palustris populations, with only one Z. palustris sample from Dahler Lake overlapping with each of these groups. Other populations appear to cluster along a geographical gradient rather than by watershed. For example, the samples from Clearwater and Ottertail Rivers in the RRN, and Plantagenet and Shell Lakes in the UMR, which are located near one another, also grouped closely together (Figure 2b; Table S2). There was also a more defined population structure among UMR populations with several smaller grouping patterns within the larger two clusters including Necktie River with Garfield Lake, and Bass Lake with Dahler and Decker Lakes populations.
Compared to the Natural Stand collection, there was far less genetic structure evident among genotypes of the 4 cultivars and 10 breeding lines of the Cultivated collection (Figure 2a and c). When the Cultivation collection was analyzed alone, the first and second principal coordinates explained 10% and 7% of the variation, respectively (Figure 2c). The most distinct cluster within the Cultivated collection consisted of samples of ‘K2’, which is a cNWR cultivar released in 1972 that has been kept spatially isolated from other cNWR populations for at least the last 20+ years. While some grouping was present between individuals of a variety or breeding population, there was an overall lack of population structure among Cultivated materials.
The UPGMA clustering analysis (Figure S3) was consistent with some of the clustering observed in the PCoA (Figure 2a-c). Monophyletic clades could be defined for many of the sampling locations, which included the majority of the samples from those populations. Notably, all samples from Clearwater River clustered, as did 98% of the samples from Ottertail River and 92% of the samples from Lake Plantagenet. In total, over 80% of samples within individual populations clustered together except for Upper Rice Lake, Shell Lake and Phantom Lake, which were distributed throughout the tree. Decker Lake and Bass Lake samples each formed two separate clusters distributed through the tree.
Four primary clades were identified in the UPGMA tree. To begin, aZ. aquatica clade was identified, which contained the vast majority of the Z. aquatica samples but had low bootstrap support (<50%). An additional 25 samples from 8 different Natural Stand populations and 2 Cultivated populations did not cluster with anything else and were found at the base of the tree. The lack of clustering within these samples is likely unknown at this time. The remaining three clades included a clade with samples from Necktie (UMR), Garfield (UMR), and Plantagenet (UMR) Lakes; a clade with samples from Upper Rice (RRN), Phantom (SCR), Dahler (UMR), Decker (UMR) and Bass (UMR) Lakes as well as Cultivated samples; and a clade with samples from Clearwater River (RRN), Shell Lake (UMR) and Ottertail River (RRN).
Within the tree, 91% of Cultivated samples grouped together and formed a cluster with 40% of samples from Upper Rice Lake, 14% each from Phantom and Bass Lakes, 8% from Dahler Lake, 6% each from Shell and Decker Lakes, and a single sample each from Lake Plantagenet andZ. aquatica populations. Consistent with the PCoA, there was minimal structure within Cultivated samples. However, unlike the PCoA, samples from Shell Lake, Necktie River, and Lake Plantagenet as well as a single sample of Z. aquatica fell within the clade containing Cultivated samples. A second Cultivated cluster at the base of the tree consisted entirely of a subset of K2 samples (10/20).
Using STRUCTURE analyses from K =2 -14 , evidence of admixture can be seen across species, geographic origin, and germplasm type (Figures 3 and S4), similar to the results of the PCoA and UPGMA analyses. Although there was a significant decrease in the DeltaK statistic between K=2 and K=3 , the lowest value was found at K=5 (Figure S5). Most notably, Z. aquatica separated from the Natural Stand populations for the first time at K=5 , while the Cultivated collection formed its own cluster and the Natural Stands grouped similarly to the PCoA and UPGMA plots (Figure 3). The vast majority of the collections showed limited admixture (<1%) between different populations with the exceptions of Upper Rice Lake and Phantom Lake. Upper Rice Lake showed heavy admixture with Decker Lake, Dahler Lake, Bass Lake, Ottertail River, and Shell Lake populations. Phantom Lake showed an average of 21.43% admixture with the Cultivated materials. Other population groupings were also informative, for example at K=3 , we observed the Cultivated collection separate into its own cluster, while the other two clusters consisted of the Natural Stand populations and Z. aquatica (Figure 3).
Interesting patterns were found at higher K values as well. AtK=10 , Z. aquatica and the Cultivated collection remained largely unchanged from K=5 as did Bass Lake, Clearwater River, and Necktie River populations (Figure S4). Populations with high admixture including Phantom and Upper Rice Lakes also remained largely unchanged. However, the Mud Hen Lake population, which showed high admixture at K=5 , formed a unique cluster at K=10 , possibly owing to its distance from other sampling sites, as one of only two lakes collected in Wisconsin. Mud Hen Lake was further delineated atK=14 , splitting into two unique clusters. The results ofK=14 were otherwise consistent with K=10 (Figure S4).
Analysis of Molecular Variance. Analysis of Molecular Variance (AMOVA) was conducted between several different groupings including: 1) Natural Stand populations vs. Natural Stand populations; 2) Cultivated lines vs. Cultivated lines; 3) Natural Stand populations vs. Cultivated lines; and 4) Z. palustris individuals vs. Z. aquatica individuals. The AMOVA results revealed more variation within rather than between groups (Table 2). All the comparative groups identified that 3.37%-8.10% of the variation could be attributed to differences among the groups, rather than within. The highest and lowest variation among groups were identified in the Natural Stand vs Natural Stand analysis (8.10%) and the Cultivated vs Cultivated analysis (3.37%), respectively (Table 2).
Gene Flow. FST values were calculated to compare the genetic differentiation within and among the Natural Stand and Cultivated collections (Figure 4). Overall, genetic differentiation between the Natural Stand populations was 0.09, with a range of 0.04 - 0.16. Pairwise comparisons that included Mud Hen Lake had the highestFST values, most above 0.13, indicating high differentiation from other Natural Stand populations. The Mud Hen Lake population was most similar to Phantom Lake (0.09) and Upper Rice Lake (0.01) populations. Overall, Necktie River and Garfield Lake (FST =0.04) and Decker and Upper Rice Lakes (FST =0.04) were the most similar populations. As seen with the results of the PCoA, Upper Rice Lake appeared similar to a number of other Natural Stand populations including Bass, Decker, and Phantom Lake populations, which all had FST values below 0.05. Although most pairwise comparisons between Z. aquatica and the Natural Stand populations hadFST values above 0.1, Upper Rice Lake and Phantom Lake values were slightly lower at FST = 0.08 and 0.07, respectively.
The pairwise comparisons between Cultivated populations had much lowerFST values, with an average of 0.03 and a range of 0.01 - 0.06, indicating less differentiation than the Natural Stand populations. For pairwise comparisons between Natural Stand and Cultivated populations, the average FST value was 0.10, with a range of 0.05 - 0.16. Phantom lake showed the highest similarity to the Cultivated populations, withFST values below 0.06 for all comparisons except that with K2 (FST =0.07). Mud Hen Lake comparisons with Cultivated populations had the highest FST values, all falling above 0.13, similar to its comparisons with other Natural Stand populations. Z. aquatica pairwise comparisons also had relatively high FST values, with a range of 0.11 - 0.13.
To further examine the basis for the observed genetic differentiation, we used a Mantel test to look for correlation between genetic and geographic distances of the Natural Stand collection. A positive correlation between the two was observed (r2 = 0.4011) and the best fit line yielded a regression equation of y = 0.1 + 0.0002x (Figure S6). Permutation tests showed that the observed results were unlikely to have occurred by chance (p -value < 0.05) (Figure S7). We were also interested in whether we could detect admixture between our Natural Stand and Cultivated collections. UsingD -statistics, we analyzed three groups including one Cultivated group, two Natural Stand groups based on PCoA and STRUCTURE analyses, along with Z. aquatica , which served as an outgroup, and found that there was no significant admixture between the groups (Z=1.66,p =0.098; Table S6).
The Temporal Collection . We examined the effect of time on the genetic diversity of two Natural Stand populations collected in 2010 and 2018, using the same approaches used to compare Natural Stand and Cultivated collections. The first principal coordinate of the PCoA for the Temporal collection separated Garfield and Shell Lakes, demonstrating population level differences. Meanwhile, the second coordinate, which explained 7% of the variation, was largely attributed to the temporal variation between Shell Lake samples collected in 2010 vs 2018 (Figure 2d). Within the first two principal components, the Garfield Lake population did not separate by time points. The Shell Lake population, on the other hand, demonstrated a wider range of variation in 2010 than it did in 2018. Additionally, a small unique cluster, not identified in 2010, was identified in 2018 (Figure 2d). Similar patterns were found in the UPGMA tree with monophyletic clades defined for both lakes (Figure S8). There was an even distribution of clustering patterns between Garfield Lake samples collected in 2010 and 2018. For Shell Lake, we observed one unique cluster consisting of 76% of the 2010 samples and 22% of the 2018 samples. The remaining samples showed minimal clustering with few differences between samples collected at different time points (Figure S8). The pairwise comparisons of each lake’s genetic differentiation over time further substantiated these findings as Garfield Lake 2010 vs. 2018 comparison had a lower FST  value of 0.0004 and Shell Lake in the same two time periods had a higher value of 0.0125 (Figure 4).
Scanning for Signatures of Selection . To identify genomic regions potentially subjected to selection or other genetic bottlenecks in cNWR, we calculated genome-wide statistics for nucleotide diversity (π),FST , and XP-CLR tests (Figure 5). While we observed considerable similarity between π values of the Natural Stand and Cultivated collections across the genome (Figure 5a), we identified 25 bins with negative Tajima’s D values in the Cultivated collection (Table S7). The most negative Tajima’s D was -0.192 at ~50 Mb on ZPchr0009. Genome-wide scans of FST values calculated between Natural Stand and Cultivated collections identified 166 individual pairwise comparisons above 0.10; 63 above 0.20; 24 above 0.30; 9 above 0.40; and 3 above 0.50 (Figure 5b). Genomic positions for these hits are listed in Table S7. The top 1% ofFST scores were identified on ZPchr0007, ZPchr0009, and ZPchr0013. Finally, we identified 9 regions of the genome with XP-CLR scores above 40 and 4 regions with scores above 60 (Figure 5c). The top 1% of these scores included 468 SNPs, including 141 hits on ZPchr0005 between 8.5-9.7 Mb, 112 hits on ZPchr0006 between 1.2-1.4 Mb, 22 hits on ZPchr0011 between 1.3-1.4 Mb, and 189 hits on ZPchr0013 between 6.2-6.8Mb (Table S7). Allele frequencies of SNPs in these regions were all fixed in the Cultivated collection and
No putative regions of interest were identified across all three tests. However, 11 5-Mb regions with the top 1% of Tajima’s D andFST scores were identified, including 2 regions on ZPchr0001; 2 regions on ZPchr0002; 2 on ZPchr0008; 2 regions on ZPchr0009; 1 on ZPchr0010; 1 on ZPchr0012; and 1 on ZPchr0015 (Table S7). Two regions, less than 1 Mb each, with overlapping top 1% ofFST and XP-CLR scores were identified on ZPchr0011 and ZPchr0013.