Correspondence:
Elizabeth Kierepka
North Carolina Museum of Natural Sciences
11 W Jones Street
Raleigh, NC 27601
EMAIL: emkierep@ncsu.edu
Word Count: 5,589
ABSTRACT
Pleistocene glaciation events had a dramatic impact on temperate taxa by
displacing animal and plant populations south of ice sheets into glacial
refugia. Genetic variation often reflects these histories of isolation
within glacial refugia and subsequent recolonization. The highly
speciose rodent genus Peromyscus , in particular, is well known
for its rapid diversification during the Pleistocene. Peromyscus are also significant reservoirs for a myriad of zoonoses, and many
cosmopolitan species are undergoing range expansions due to human land
use and climate change. This study focused on the range-wide
phylogeography of the white-footed mouse (Peromyscus leucopus ), a
common species found in eastern North America that is one the primary
reservoirs for Lyme Disease (Borrelia burgdorferi ). We used two
mitochondrial genes, cytochrome b and control region, to identify
evolutionary lineages of white-footed mice and characterize patterns of
expansion of each lineage across their geographic range. Overall, we
found evidence for four evolutionary lineages with a Southwest lineage
largely restricted to grassland and desert habitats. Time since recent
common ancestors placed all lineages diverging within the Last Glacial
Maximum (~19-25k years ago). All lineages exhibited
signatures of expansion, particularly the two northern lineages known to
host Lyme Disease. Overall, white-footed mice underwent rapid
diversification similar to other Peromyscus species and
potentially exhibit habitat-based divergence within the Southwest
lineage. Signatures of expansion also indicate that white-footed mice
will continue to facilitate increased spread of zoonoses like Lyme
Disease, but further study is needed to clarify how these evolutionary
dynamics interact with other factors associated with human disease
incidence.
Keywords: phylogeography, white-footed mouse, Pleistocene,Peromyscus leucopus , Lyme Disease
INTRODUCTION
Quaternary climatic cycles greatly influenced numerous species across
the northern hemisphere by forcing populations into refugia south of
glaciated areas. Contemporary populations often retain the genetic
signatures of isolation within separate refugia and subsequent
recolonization after glacial recession (Hewitt 2004). Considerable
diversity exists among responses to glaciation, owing to an interplay
between species’ life history (e.g., dispersal capability, degree of
ecological specialization, geographic distribution; Shafer et al. 2010,
Kierepka and Latch 2016) and landscape factors (e.g., presence of
contemporary barriers and number of occupied refugia). This interaction
is particularly evident in wide-ranging, generalist species where they
potentially could have occupied multiple refugia and utilized distinct
colonization routes, making it difficult to infer how such species
responded to Pleistocene glaciation events. Indeed, wide-ranging
generalist species exhibit diverse responses to glaciation, ranging from
a single lineage (Kierepka and Latch 2016) and simple eastern and
western lineages (Reding et al. 2012, Kierepka et al. 2023, Klicka et
al. 2023) to multiple refugial lineages (Puckett et al. 2015) and
formation of cryptic species (Burbrink et al. 2022, McDonough et al.
2022).
The murid genus Peromyscus exemplifies the difficulty in
inferring responses to Pleistocene glaciation due to their complex
evolutionary histories. Peromyscus is a highly speciose group
that occupies every terrestrial biome in North America, resulting in
complex patterns of speciation across numerous refugia and geographic
barriers. Despite these evolutionary dynamics, Peromyscus mice
exhibit very similar morphologies, which makes it difficult to identify
individual species and evolutionary lineages. Indeed, many taxa require
genetic identification to conclusively differentiate species when they
occur in sympatry. With genetic data, multiple cryptic species and
thirteen species complexes have been found within previously thought
cosmopolitan species (e.g., Bradley et al. 2007, Castaneda-Rico et al.
2014). Many of these newly described species have relatively narrow
geographic ranges within western biogeographic hotspots like Mexico
(Bradley et al. 2022, Leon-Tapia et al. 2022), Pacific Northwest (Boria
and Blois 2023), and California (Riddle et al. 2000, Boria and Blois
2023) while several species are still considered generalists with large
geographic ranges.
While many Peromyscus species have small geographic ranges,
others are highly cosmopolitan and have experienced range expansion due
to human land use and climate change. These expansions are of particular
concern in eastern North America because Peromyscus are the
primary reservoirs of multiple tick-borne illnesses, principallyBorrelia burgdorferi , the most common causative agent of Lyme
disease. The most competent reservoir for B. burgdorferi is the
white-footed mouse, Peromyscus leucopus , an abundant rodent
throughout central and eastern North America. P. leucopus is
found in many habitats including highly urban habitats with a
distribution range stretching from central Mexico and Arizona in the
west to southern coastal Canada in the east. Numerous studies have
documented the northward expansion of P. leucopus and associatedB. burgdorferi into Canada and the Great Lakes region
(Roy-Dufresne et al. 2013, Fiset et al. 2015, Garcia-Elfring et al.
2017).
Previous work with P. leucopus in eastern North America has
identified two monophyletic evolutionary lineages in eastern North
America: a broadly distributed eastern lineage (East) and another
lineage in the Upper Midwest and north of the St. Lawrence River
(Midwest; Rowe et al. 2006, Fiset et al. 2015). These lineages contain 8
morphological subspecies that correspond to five islands and three
mainland groups (Hall 1981). Strong genetic signatures of expansion have
been detected in both lineages along their northern border in Michigan
and Canada (Fiset et al. 2015, Prado et al. 2022). These lineages appear
to have formed in allopatry during the Pleistocene (Rowe et al. 2006,
Moscarella et al. 2019, Prado et al. 2022). Several studies have
recorded additional lineages west of these lineages, but had low support
in phylogenetic analysis or were based on a few individuals (e.g.,
Shipp-Pennock et al. 2005, Rowe et al. 2006, Herrera 2021).
Western P. leucopus populations occur across multiple grassland
and desert habitats unlike the largely forest-associated East and
Midwest lineages. The westernmost extent of P. leucopus occurs in
the desert southwest, a well-recognized glacial refugium for many
species (Riddle and Hafner 2006, McDonnough et al. 2022). WesternP. leucopus also contain three chromosomal inversions that
separate them from eastern populations, and thus, have been considered
chromosomal “races” (Baker et al. 1983). The three inversion
difference between western and eastern chromosomal races also represents
a higher divergence than other Peromyscus species pairs (Baker et
al. 1983). However, these two chromosomal races meet in Oklahoma and
appear to inbreed extensively (Stangl 1986), making it unclear if these
races correspond to evolutionary lineages like East and Midwest. Taken
together, it is likely there is further divergence found in P.
leucopus beyond the well-recognized East and Midwest lineages with at
least one more lineage in the west. Thus, this study sampled across the
range of Peromyscus leucopus to examine the species-wide effects
of Pleistocene glaciation on observed genetic variation.
Collectively, Pleistocene glaciation and large water barriers (e.g.,
Atlantic Ocean, Great Lakes, and St. Lawrence River) appear to have had
the largest impact on P. leucopus within the eastern portion of
their range (e.g., Shipp-Pennock et al. 2005, Rowe et al. 2006, Fiset et
al. 2015, Moscarella 2019). Much of the habitat within these areas are
forested, which allows high gene flow. More habitat heterogeneity and
geographic barriers exist in the western portion of P. leucopus ’
range, increasing the potential for higher genetic differentiation.
Indeed, western P. leucopus is separated into 11 morphological
subspecies (9 mainland, 2 island; Hall 1981). Peromyscus spp .
often speciate due to habitat heterogeneity in biodiversity hotspots
(e.g., León-Tapia et al. 2022), and western P. leucopus occur
within areas known for high endemism (e.g., desert southwest and
Mexico). We predict that Pleistocene glaciation influenced the western
portion of P. leucopus range and that the western populations
form a separate evolutionary lineage from East and Midwest. Furthermore,
given the known species’ ability to colonize variety of habitats and
subsequently expand its range, we predict strong signatures of
population expansion in all evolutionary lineages.
METHODS
Sample Collection
We collected putative P. leucopus tissues (n = 157) from museums,
NSF NEON tissue collection, and field sampled animals from across their
range (Table S1). Collection emphasized animals caught west of the
Mississippi River including Kansas, Mexico, Nebraska, New Mexico, North
Dakota, Oklahoma, and Texas (n = 85; Fig 1). The remaining individuals
originated from areas east of the Mississippi River including previously
glaciated areas like Canada, New England, and the Great Lakes regions.
All samples were georeferenced, and stored in a -20°C until DNA
extraction.
Laboratory Methods
All tissues were extracted via Zymo Miniprep tissue kits following the
manufacturer guidelines. We amplified two mitochondrial genes: 750 bp of
cytochrome b and the entire control region (920 bp) using published
mammal (MTCB-F: 5’-CCHCCATAAATAGGNGAAGG-3’, MTCB-R:
(5’-WAGAAYTTCAGCTTTGGG-3’; Naidu et al. 2012) and custom (F:
5’-CCAAAGCTGATATTCTATTTAAAC-3’, R: 5’-ATAAGGCTAGGACCAAACCT-3’, Internal:
5’-ACATATCTGCGTTATCTTACATAC-3’) primer sets. We chose these two regions
as they are the most commonly used in phylogeographic literature onP. leucopus , allowing for inclusion of Genbank sequences.
Polymerase chain reactions (10 µL total volume) contained 5-10 ng of
extracted DNA, 10 nM of forward and reverse primer, 2.0 mM of
MgCl2, 5.0 µL of 1:5 mix of Takara Taq Polymerase and
Promega GoTaq MasterMix, and 2.9 µL nuclease-free water. PCRs began with
initiation at 95 oC for 2 min then 35 cycles of
denaturing at 95 oC for 15 seconds, annealing at 52
(cyt b) or 54oC (control region) for 30 seconds, and
extension at 72 oC for 1 min, completed by an
extension at 72 oC for 10 min. We sequenced the
forward primer for cytochrome b and the forward and internal primers for
control region to obtain the final dataset. Sequencing reactions
initiated at 96 oC for 3 min then underwent 30 cycles
at 96 oC for 10 seconds, 50 oC for 5
seconds, and 60 oC for 2.5 min. Sequencing reactions
were then cleaned via an ethanol precipitation method (Latch and Rhodes
2005) and sequenced on an ABI 3500. All sequences were aligned and
concatenated in Geneious Prime (Kearse et al. 2012).
Species Identification
Peromyscus spp. can be extremely difficult to distinguish based
on morphology, and P. leucopus ’ range overlaps with multiple
other species including both wide-ranging species (e.g., P.
maniculatus species group) and range-restricted species
(Peromyscus eremicus species group). Therefore, we confirmed
species identification via BLAST within Geneious Prime (Kearase et al.
2012). Of the 157 tissue samples, 27 were genetically identified as
different species (P. maniculatus, pectoralis, attwateri,
gossypinus, and eremicus ) and 12 failed to amplify at both
mitochondrial regions, leaving a total of 118 P. leucopus samples
for phylogeographic analysis.
Combined Mitochondrial Dataset Analysis
To reconstruct phylogenetic relationships among P. leucopus , we
used Bayesian coalescent models in BEAST v 2.6.6 (Bouckaert et al.
2019). Model parameters were set in BEAUti and included the following
priors: coalescent constant population size, clock rate of 5% and 20%
per million years for cytochrome b and control region respectively, and
a strict molecular clock. With these priors, we performed Bayesian
Markov chain Monte Carlo searches to estimate the most recent common
ancestor for each major lineage. We sampled trees and divergence dates
every 10,000 iterations for 10,000,000 generations, and checked model
convergence in Tracer v.1.7.1 (Rambaut et al. 2018). Final consensus
trees were visualized in FigTree v.1.4.4 (Rambaut 2018).
To complement the tree-building analysis, we constructed a median
joining haplotype network within PopArt version 1.7 (Leigh and Bryant
2015). Large gaps of mutational steps are expected to separate
Pleistocene lineages, allowing comparison to the BEAST analysis.
Additionally, we performed a principal coordinates analysis (PCoA) with
individual-based Kimura’s 2-parameters distance (Kimura 1980). These
distances were calculated in the R package ape via the “dist.dna”
function (Paradis et al. 2004). We tested other evolutionary models for
genetic distance between individuals, and the results were similar
across all distances. We expected lineages to be most distant from one
another within the PCoA and correspond to the BEAST analysis and
haplotype network.
For each lineage and the total dataset, we calculated measures of
diversity within DNAsp v. 6 (Rozas et al. 2017). These metrics included
number of haplotypes (h), haplotype diversity (hD),
nucleotide diversity (π), and mean number of pairwise nucleotide
differences (k). We also calculated FST between all
lineages in Arlequin v. 3.5.2.2 (Excoffier and Lischer 2010). Finally,
we used Arlequin to estimate two metrics, Fu’s Fs (Fu 1997) and Tajima’s
D (Tajima 1989), as both are expected to be significantly negative if a
lineage underwent rapid expansion.
Reduced Control Region Dataset Analysis
We also downloaded 216 Genbank sequences of the control region to
increase our geographic coverage. These sequences were georeferenced
based on their original studies. Our control region sequences were
trimmed to 636 bp to maximize power while allowing for additional
sequences. In total, this reduced control region dataset included 334
sequences. We repeated the analyses described above (BEAST, haplotype
network, PCoA, diversity, Fst, Fu’s Fs, and Tajima’s D) for the reduced
dataset.
In addition to Fu’s Fs and Tajima’s D, we used Arlequin to calculate
raggedness index and sum of square differences (SSD) under both
demographic and spatial expansion. These metrics are derived from
mismatch distributions where the distribution is expected to be unimodal
if populations experienced recent demographic or spatial expansion.
Multimodal or ragged distributions indicate a stable population over
time (Rogers and Harpending 1992). Raggedness indices, therefore,
describe the shape of the mismatch distributions and sum of SSD serves
as a goodness of fit. Non-significant tests indicate that lineages
either underwent demographic and/or spatial expansion. These metrics
were not possible in the combined dataset as the markers evolve at
different rates.
RESULTS
Combined Mitochondrial Dataset
For the combined dataset, we recovered 107 unique haplotypes among the
118 individuals. We trimmed the sequences to 1652 bp, which included 240
polymorphic sites (7 indels and 233 substitutions). Of the 240
polymorphic sites, 95 occurred once while 145 were parsimony
informative.
For the BEAST analysis, the consensus tree highly supported three
lineages (all BPP = 1.0; East, Midwest, and Southwest; Fig. 1). A fourth
group was sister to East and was found to be distinct in all other
analyses (see below), so we defined this group as another lineage
denoted Central (Fig 1). The first split occurred between the Southwest
lineage and the remaining three (time to most recent common ancestor
[TMRCA] = 0.2511 ma, 95% HPD = 0.2100 – 0.2907 ma). BEAST
estimated the next split was between the Midwest and East lineages to be
0.2026 ma (95% HPD = 0.1705 – 0.2364 ma). Although not statistically
supported in the BEAST analysis (BPP = 0.652), the final split between
East and Central was the newest divergence estimated at 0.1942 ma (95%
HPD = 0.1627 – 0.2267 ma).
All lineages grouped geographically across North America with multiple
areas of overlap (Fig 2a). East was broadly distributed east of the
Mississippi River while the Midwest lineage occurred in the upper
Midwest including the Upper Peninsula of Michigan, Wisconsin, Minnesota,
and North Dakota as recorded in numerous other studies (e.g., Rowe et
al. 2006, Fiset et al. 2015, Perno et al. 2022). Central occurred
primarily in the southern Midwest and the eastern Great Plains while
Southwest was found across Texas, Mexico, and New Mexico (Fig 2a).
Significant overlap exists between the Central and Southwest lineages.
The PCoA separated the four lineages with Axis 1 (variance explained =
25.78%) separating Southwest from the remaining three lineages. East
and Midwest were most distant on Axis 2 (variance explained = 21.71%)
with Central in the middle of all three lineages (Fig 2b). Similarly,
all lineages are separated by at least 10 mutational steps within the
haplotype network, and Central remains in the middle of all three
lineages as observed in the PCoA (Fig 2c).
We found high diversity in all lineages (Table 1), with the majority of
individuals in each lineage constituting a unique haplotype
(81.57-96.88%). Haplotype diversity was similar across lineages
(0.980-0.998) and nucleotide diversity ranged from 0.0073-0.0136 across
lineages (all lineages combined = 0.0202). Pair-wise differences ranged
from 12.106 (Midwest) to 22.595 (Central). Fu’s Fs were significant in
three (East, Midwest, and Southwest: -11.619-18.248, all p <
0.001) of the four lineages (Central = -1.998, p = 0.421) while no
Tajima’s D was significant (D = -1.207 - 1.657, all p >
0.05; Table 1). Fst values ranged from 0.424 (Central vs. Southwest) to
0.647 (Midwest vs. Southwest), and all were significant.
Reduced Control Region Dataset Analysis
Our reduced control region dataset consisted of 634 bp for a total of
276 haplotypes across 334 individuals. These haplotypes contained 167
polymorphic sites and 10 indels. Of the 167 polymorphic sites, 45 were
singletons and 122 informative sites.
The reduced control region analyses largely agreed with the combined
dataset despite lower power. With increased geographic coverage, it is
apparent that there are areas of overlap between the lineages. East,
Central, and Midwest individuals occur in Illinois and Indiana whereas
Central and Southwest individuals occur throughout Texas and New Mexico
(Fig 3a). The PCoA is largely similar to the combined dataset (Fig 3b),
but Axis 1 (variance explained = 29.67%) separates East and Midwest
first while Southwest, Midwest, and Central are spread along Axis 2
(variance explained = 13.42%). Lineages were less distinct in the
haplotype network as expected with reduced power, but each lineage is
separated by at least 4 mutations (Fig 3c).
All four lineages exhibited high diversity metrics as observed in the
combined dataset (Table 2). Both haplotype (0.993-0.995) and nucleotide
diversities (0.0144-0.0218) were similar across lineages whereas
pairwise differences were highest in the Central (13.801) and Southwest
(13.004) lineages compared to East (9.068) and Midwest (7.898). Sample
sizes were much higher in East and Midwest (n = 166 and 93 respectively)
compared to the other lineages (Central n = 44 and Southwest n = 32),
which likely explains these differences. Fst values ranged from 0.448
(Central vs. Southwest) to 0.620 (East vs. Southwest) with all Fsts
being significantly different from 0 (all p < 0.001).
All Fu’s Fs metrics were significant (Fs = -13.132- -191.893, all p
< 0.001) while all Tajima’s D were not (D = -1.009- -1.905,
all p > 0.05; Table 2). Tests for expansion found evidence
for demographic expansion in all lineages (all Raggedness indices
< 0.008, p > 0.064, SSD < 0.008, p
> 0.078). No other tests for expansion were significant
(Table 3).
DISCUSSION
Based on the combined and reduced control region datasets, we found
strong evidence for three evolutionary lineages of white-footed mouse
(East, Midwest, and Southwest) with a likely fourth lineage (Central)
that received less support. These lineages disagree with the recognized
morphological subspecies (Hall 1981). East, Midwest, and Southwest
formed strongly supported monophyletic lineages within the BEAST
analysis and were the most strongly differentiated groups in the
haplotype networks, PCoA, and Fst analyses. Central lineage was
monophyletic, but did not receive strong support in the phylogeny.
However, it was differentiated from the other three lineages in the
haplotype network, PCoA, and Fst analyses. Three of the four lineages
crossed the Mississippi River, providing little evidence for the
Mississippi River acting as a barrier for white-footed mice.
Collectively, these results are consistent with additional Pleistocene
lineages beyond the well characterized East and Midwest lineages.