Results
Genotyping-by-sequencing . Using GBS technology, 839 Z.
palustris samples representing the Natural Stand, Cultivated, and
Temporal collections along with 50 samples of Z. aquatic a, which
served as an outgroup, were sequenced to a depth of 2.5 million reads
per sample. A total of 1,833,504,458 reads were generated with an
average of 2,185,345 reads per sample. From these data, a total of 5,955
SNP markers were identified that met our filtering criteria, 3,005
(50.1%), of which, reside in genes. Basic marker statistics are shown
in Table 1. The SNP density ranged from 1.15 - 6.37 SNPs/Mb per
chromosome with a genome-wide average of 4.34 SNPs/Mb using 1Mb bins. In
genic regions, the SNP density ranged from 10-20 SNPs/Mb per chromosome.
The genome-wide TsTv ratio was 3.15 with a minimum of 1.50 (ZPchr0458)
and a maximum of 4.18 (ZPchr0016) (Table S4). PIC ranged from 0.030 -
0.313, with an average of 0.217 (Table S5).
Genetic Diversity of NWR Collections. To visualize the variation
within the diversity collection, we first performed PCoA on both the
Natural Stand and Cultivated collections together (Figure 2a). The first
three principal coordinates explained 12.5%, 7.7%, and 7.3% of the
variance, respectively (27.5% total). Within the first coordinate,
samples were primarily split into two clusters, including a cluster of
all Z. aquatica samples and the majority of Natural Stand
samples, and a second cluster including all Cultivated samples and Bass
and Decker Lake populations from the UMR watershed (Figure 2a). Upper
Rice Lake genotypes appeared to blend between these two main groups but
trended more heavily toward the group containing the UMR watershed
genotypes. Within the first two principal coordinates, a large range of
continuous variation can be seen, with samples primarily grouped by
their germplasm type and geographic origin. Overall, these two principal
coordinates failed to fully separate individual populations, though mostZ. aquatica samples did separate from those of Z.
palustris (Figure 2a). The Cultivated collection largely formed its own
group with only a small number of samples overlapping with those from
Bass, Upper Rice, Decker, Dahler, and Phantom (WI) Lakes. The third
principal coordinate, in conjunction with coordinate 1, isolatedZ. aquatica from all Z. palustris samples, Natural Stand
and Cultivated collections, while maintaining the separation of the two
main groups defined by coordinates 1 and 2 (Figure S2).
When analyzed separately from the Cultivated collection, the first two
coordinates of the Natural Stand collection explained 19% and 13% of
the variance, respectively (Figure 2b). The two main clusters were
consistent with Figure 2a, as was Upper Rice Lake bridging the two
clusters (Figure 2b), while Z. aquatica formed a more distinct
cluster in principal coordinate 2. Natural Stand populations from the
SCR fell between Z. aquatica and the rest of the MN Z.
palustris populations, with only one Z. palustris sample from
Dahler Lake overlapping with each of these groups. Other populations
appear to cluster along a geographical gradient rather than by
watershed. For example, the samples from Clearwater and Ottertail Rivers
in the RRN, and Plantagenet and Shell Lakes in the UMR, which are
located near one another, also grouped closely together (Figure 2b;
Table S2). There was also a more defined population structure among UMR
populations with several smaller grouping patterns within the larger two
clusters including Necktie River with Garfield Lake, and Bass Lake with
Dahler and Decker Lakes populations.
Compared to the Natural Stand collection, there was far less genetic
structure evident among genotypes of the 4 cultivars and 10 breeding
lines of the Cultivated collection (Figure 2a and c). When the
Cultivation collection was analyzed alone, the first and second
principal coordinates explained 10% and 7% of the variation,
respectively (Figure 2c). The most distinct cluster within the
Cultivated collection consisted of samples of ‘K2’, which is a cNWR
cultivar released in 1972 that has been kept spatially isolated from
other cNWR populations for at least the last 20+ years. While some
grouping was present between individuals of a variety or breeding
population, there was an overall lack of population structure among
Cultivated materials.
The UPGMA clustering analysis (Figure S3) was consistent with some of
the clustering observed in the PCoA (Figure 2a-c). Monophyletic clades
could be defined for many of the sampling locations, which included the
majority of the samples from those populations. Notably, all samples
from Clearwater River clustered, as did 98% of the samples from
Ottertail River and 92% of the samples from Lake Plantagenet. In total,
over 80% of samples within individual populations clustered together
except for Upper Rice Lake, Shell Lake and Phantom Lake, which were
distributed throughout the tree. Decker Lake and Bass Lake samples each
formed two separate clusters distributed through the tree.
Four primary clades were identified in the UPGMA tree. To begin, aZ. aquatica clade was identified, which contained the vast
majority of the Z. aquatica samples but had low bootstrap support
(<50%). An additional 25 samples from 8 different Natural Stand
populations and 2 Cultivated populations did not cluster with anything
else and were found at the base of the tree. The lack of clustering
within these samples is likely unknown at this time. The remaining three
clades included a clade with samples from Necktie (UMR), Garfield (UMR),
and Plantagenet (UMR) Lakes; a clade with samples from Upper Rice (RRN),
Phantom (SCR), Dahler (UMR), Decker (UMR) and Bass (UMR) Lakes as well
as Cultivated samples; and a clade with samples from Clearwater River
(RRN), Shell Lake (UMR) and Ottertail River (RRN).
Within the tree, 91% of Cultivated samples grouped together and formed
a cluster with 40% of samples from Upper Rice Lake, 14% each from
Phantom and Bass Lakes, 8% from Dahler Lake, 6% each from Shell and
Decker Lakes, and a single sample each from Lake Plantagenet andZ. aquatica populations. Consistent with the PCoA, there was
minimal structure within Cultivated samples. However, unlike the PCoA,
samples from Shell Lake, Necktie River, and Lake Plantagenet as well as
a single sample of Z. aquatica fell within the clade containing
Cultivated samples. A second Cultivated cluster at the base of the tree
consisted entirely of a subset of K2 samples (10/20).
Using STRUCTURE analyses from K =2 -14 , evidence of
admixture can be seen across species, geographic origin, and germplasm
type (Figures 3 and S4), similar to the results of the PCoA and UPGMA
analyses. Although there was a significant decrease in the DeltaK
statistic between K=2 and K=3 , the lowest value was found
at K=5 (Figure S5). Most notably, Z. aquatica separated
from the Natural Stand populations for the first time at K=5 ,
while the Cultivated collection formed its own cluster and the Natural
Stands grouped similarly to the PCoA and UPGMA plots (Figure 3). The
vast majority of the collections showed limited admixture (<1%)
between different populations with the exceptions of Upper Rice Lake and
Phantom Lake. Upper Rice Lake showed heavy admixture with Decker Lake,
Dahler Lake, Bass Lake, Ottertail River, and Shell Lake populations.
Phantom Lake showed an average of 21.43% admixture with the Cultivated
materials. Other population groupings were also informative, for example
at K=3 , we observed the Cultivated collection separate into its
own cluster, while the other two clusters consisted of the Natural Stand
populations and Z. aquatica (Figure 3).
Interesting patterns were found at higher K values as well. AtK=10 , Z. aquatica and the Cultivated collection remained
largely unchanged from K=5 as did Bass Lake, Clearwater River,
and Necktie River populations (Figure S4). Populations with high
admixture including Phantom and Upper Rice Lakes also remained largely
unchanged. However, the Mud Hen Lake population, which showed high
admixture at K=5 , formed a unique cluster at K=10 ,
possibly owing to its distance from other sampling sites, as one of only
two lakes collected in Wisconsin. Mud Hen Lake was further delineated atK=14 , splitting into two unique clusters. The results ofK=14 were otherwise consistent with K=10 (Figure S4).
Analysis of Molecular Variance. Analysis of Molecular Variance
(AMOVA) was conducted between several different groupings including: 1)
Natural Stand populations vs. Natural Stand populations; 2) Cultivated
lines vs. Cultivated lines; 3) Natural Stand populations vs. Cultivated
lines; and 4) Z. palustris individuals vs. Z. aquatica individuals. The AMOVA results revealed more variation within
rather than between groups (Table 2). All the comparative groups
identified that 3.37%-8.10% of the variation could be attributed to
differences among the groups, rather than within. The highest and lowest
variation among groups were identified in the Natural Stand vs Natural
Stand analysis (8.10%) and the Cultivated vs Cultivated analysis
(3.37%), respectively (Table 2).
Gene Flow. FST values were calculated to
compare the genetic differentiation within and among the Natural Stand
and Cultivated collections (Figure 4). Overall, genetic differentiation
between the Natural Stand populations was 0.09, with a range of 0.04 -
0.16. Pairwise comparisons that included Mud Hen Lake had the highestFST values, most above 0.13, indicating high
differentiation from other Natural Stand populations. The Mud Hen Lake
population was most similar to Phantom Lake (0.09) and Upper Rice Lake
(0.01) populations. Overall, Necktie River and Garfield Lake
(FST =0.04) and Decker and Upper Rice Lakes
(FST =0.04) were the most similar populations. As
seen with the results of the PCoA, Upper Rice Lake appeared similar to a
number of other Natural Stand populations including Bass, Decker, and
Phantom Lake populations, which all had FST values below 0.05. Although most pairwise comparisons between Z.
aquatica and the Natural Stand populations hadFST values above 0.1, Upper Rice Lake and Phantom
Lake values were slightly lower at FST = 0.08 and
0.07, respectively.
The pairwise comparisons between Cultivated populations had much lowerFST values, with an average of 0.03 and a range
of 0.01 - 0.06, indicating less differentiation than the Natural Stand
populations. For pairwise comparisons between Natural Stand and
Cultivated populations, the average FST value was
0.10, with a range of 0.05 - 0.16. Phantom lake showed the highest
similarity to the Cultivated populations, withFST values below 0.06 for all comparisons except
that with K2 (FST =0.07). Mud Hen Lake comparisons
with Cultivated populations had the highest FST values, all falling above 0.13, similar to its comparisons with other
Natural Stand populations. Z. aquatica pairwise comparisons also
had relatively high FST values, with a range of
0.11 - 0.13.
To further examine the basis for the observed genetic differentiation,
we used a Mantel test to look for correlation between genetic and
geographic distances of the Natural Stand collection. A positive
correlation between the two was observed (r2 = 0.4011)
and the best fit line yielded a regression equation of y = 0.1 + 0.0002x
(Figure S6). Permutation tests showed that the observed results were
unlikely to have occurred by chance (p -value < 0.05)
(Figure S7). We were also interested in whether we could detect
admixture between our Natural Stand and Cultivated collections. UsingD -statistics, we analyzed three groups including one Cultivated
group, two Natural Stand groups based on PCoA and STRUCTURE analyses,
along with Z. aquatica , which served as an outgroup, and found
that there was no significant admixture between the groups (Z=1.66,p =0.098; Table S6).
The Temporal Collection . We examined the effect of time on the
genetic diversity of two Natural Stand populations collected in 2010 and
2018, using the same approaches used to compare Natural Stand and
Cultivated collections. The first principal coordinate of the PCoA for
the Temporal collection separated Garfield and Shell Lakes,
demonstrating population level differences. Meanwhile, the second
coordinate, which explained 7% of the variation, was largely attributed
to the temporal variation between Shell Lake samples collected in 2010
vs 2018 (Figure 2d). Within the first two principal components, the
Garfield Lake population did not separate by time points. The Shell Lake
population, on the other hand, demonstrated a wider range of variation
in 2010 than it did in 2018. Additionally, a small unique cluster, not
identified in 2010, was identified in 2018 (Figure 2d). Similar patterns
were found in the UPGMA tree with monophyletic clades defined for both
lakes (Figure S8). There was an even distribution of clustering patterns
between Garfield Lake samples collected in 2010 and 2018. For Shell
Lake, we observed one unique cluster consisting of 76% of the 2010
samples and 22% of the 2018 samples. The remaining samples showed
minimal clustering with few differences between samples collected at
different time points (Figure S8). The pairwise comparisons of each
lake’s genetic differentiation over time further substantiated these
findings as Garfield Lake 2010 vs. 2018 comparison had a lower
FST value of 0.0004 and Shell Lake in the same
two time periods had a higher value of 0.0125 (Figure 4).
Scanning for Signatures of Selection . To identify genomic regions
potentially subjected to selection or other genetic bottlenecks in cNWR,
we calculated genome-wide statistics for nucleotide diversity (π),FST , and XP-CLR tests (Figure 5). While we
observed considerable similarity between π values of the Natural Stand
and Cultivated collections across the genome (Figure 5a), we identified
25 bins with negative Tajima’s D values in the Cultivated collection
(Table S7). The most negative Tajima’s D was -0.192 at ~50
Mb on ZPchr0009. Genome-wide scans of FST values
calculated between Natural Stand and Cultivated collections identified
166 individual pairwise comparisons above 0.10; 63 above 0.20; 24 above
0.30; 9 above 0.40; and 3 above 0.50 (Figure 5b). Genomic positions for
these hits are listed in Table S7. The top 1% ofFST scores were identified on ZPchr0007,
ZPchr0009, and ZPchr0013. Finally, we identified 9 regions of the genome
with XP-CLR scores above 40 and 4 regions with scores above 60 (Figure
5c). The top 1% of these scores included 468 SNPs, including 141 hits
on ZPchr0005 between 8.5-9.7 Mb, 112 hits on ZPchr0006 between 1.2-1.4
Mb, 22 hits on ZPchr0011 between 1.3-1.4 Mb, and 189 hits on ZPchr0013
between 6.2-6.8Mb (Table S7). Allele frequencies of SNPs in these
regions were all fixed in the Cultivated collection and
No putative regions of interest were identified across all three tests.
However, 11 5-Mb regions with the top 1% of Tajima’s D andFST scores were identified, including 2 regions
on ZPchr0001; 2 regions on ZPchr0002; 2 on ZPchr0008; 2 regions on
ZPchr0009; 1 on ZPchr0010; 1 on ZPchr0012; and 1 on ZPchr0015 (Table
S7). Two regions, less than 1 Mb each, with overlapping top 1% ofFST and XP-CLR scores were identified on
ZPchr0011 and ZPchr0013.