Correspondence:
Elizabeth Kierepka
North Carolina Museum of Natural Sciences
11 W Jones Street
Raleigh, NC 27601
EMAIL: emkierep@ncsu.edu
Word Count: 5,589
ABSTRACT
Pleistocene glaciation events had a dramatic impact on temperate taxa by displacing animal and plant populations south of ice sheets into glacial refugia. Genetic variation often reflects these histories of isolation within glacial refugia and subsequent recolonization. The highly speciose rodent genus Peromyscus , in particular, is well known for its rapid diversification during the Pleistocene. Peromyscus are also significant reservoirs for a myriad of zoonoses, and many cosmopolitan species are undergoing range expansions due to human land use and climate change. This study focused on the range-wide phylogeography of the white-footed mouse (Peromyscus leucopus ), a common species found in eastern North America that is one the primary reservoirs for Lyme Disease (Borrelia burgdorferi ). We used two mitochondrial genes, cytochrome b and control region, to identify evolutionary lineages of white-footed mice and characterize patterns of expansion of each lineage across their geographic range. Overall, we found evidence for four evolutionary lineages with a Southwest lineage largely restricted to grassland and desert habitats. Time since recent common ancestors placed all lineages diverging within the Last Glacial Maximum (~19-25k years ago). All lineages exhibited signatures of expansion, particularly the two northern lineages known to host Lyme Disease. Overall, white-footed mice underwent rapid diversification similar to other Peromyscus species and potentially exhibit habitat-based divergence within the Southwest lineage. Signatures of expansion also indicate that white-footed mice will continue to facilitate increased spread of zoonoses like Lyme Disease, but further study is needed to clarify how these evolutionary dynamics interact with other factors associated with human disease incidence.
Keywords: phylogeography, white-footed mouse, Pleistocene,Peromyscus leucopus , Lyme Disease
INTRODUCTION
Quaternary climatic cycles greatly influenced numerous species across the northern hemisphere by forcing populations into refugia south of glaciated areas. Contemporary populations often retain the genetic signatures of isolation within separate refugia and subsequent recolonization after glacial recession (Hewitt 2004). Considerable diversity exists among responses to glaciation, owing to an interplay between species’ life history (e.g., dispersal capability, degree of ecological specialization, geographic distribution; Shafer et al. 2010, Kierepka and Latch 2016) and landscape factors (e.g., presence of contemporary barriers and number of occupied refugia). This interaction is particularly evident in wide-ranging, generalist species where they potentially could have occupied multiple refugia and utilized distinct colonization routes, making it difficult to infer how such species responded to Pleistocene glaciation events. Indeed, wide-ranging generalist species exhibit diverse responses to glaciation, ranging from a single lineage (Kierepka and Latch 2016) and simple eastern and western lineages (Reding et al. 2012, Kierepka et al. 2023, Klicka et al. 2023) to multiple refugial lineages (Puckett et al. 2015) and formation of cryptic species (Burbrink et al. 2022, McDonough et al. 2022).
The murid genus Peromyscus exemplifies the difficulty in inferring responses to Pleistocene glaciation due to their complex evolutionary histories. Peromyscus is a highly speciose group that occupies every terrestrial biome in North America, resulting in complex patterns of speciation across numerous refugia and geographic barriers. Despite these evolutionary dynamics, Peromyscus mice exhibit very similar morphologies, which makes it difficult to identify individual species and evolutionary lineages. Indeed, many taxa require genetic identification to conclusively differentiate species when they occur in sympatry. With genetic data, multiple cryptic species and thirteen species complexes have been found within previously thought cosmopolitan species (e.g., Bradley et al. 2007, Castaneda-Rico et al. 2014). Many of these newly described species have relatively narrow geographic ranges within western biogeographic hotspots like Mexico (Bradley et al. 2022, Leon-Tapia et al. 2022), Pacific Northwest (Boria and Blois 2023), and California (Riddle et al. 2000, Boria and Blois 2023) while several species are still considered generalists with large geographic ranges.
While many Peromyscus species have small geographic ranges, others are highly cosmopolitan and have experienced range expansion due to human land use and climate change. These expansions are of particular concern in eastern North America because Peromyscus are the primary reservoirs of multiple tick-borne illnesses, principallyBorrelia burgdorferi , the most common causative agent of Lyme disease. The most competent reservoir for B. burgdorferi is the white-footed mouse, Peromyscus leucopus , an abundant rodent throughout central and eastern North America. P. leucopus is found in many habitats including highly urban habitats with a distribution range stretching from central Mexico and Arizona in the west to southern coastal Canada in the east. Numerous studies have documented the northward expansion of P. leucopus and associatedB. burgdorferi into Canada and the Great Lakes region (Roy-Dufresne et al. 2013, Fiset et al. 2015, Garcia-Elfring et al. 2017).
Previous work with P. leucopus in eastern North America has identified two monophyletic evolutionary lineages in eastern North America: a broadly distributed eastern lineage (East) and another lineage in the Upper Midwest and north of the St. Lawrence River (Midwest; Rowe et al. 2006, Fiset et al. 2015). These lineages contain 8 morphological subspecies that correspond to five islands and three mainland groups (Hall 1981). Strong genetic signatures of expansion have been detected in both lineages along their northern border in Michigan and Canada (Fiset et al. 2015, Prado et al. 2022). These lineages appear to have formed in allopatry during the Pleistocene (Rowe et al. 2006, Moscarella et al. 2019, Prado et al. 2022). Several studies have recorded additional lineages west of these lineages, but had low support in phylogenetic analysis or were based on a few individuals (e.g., Shipp-Pennock et al. 2005, Rowe et al. 2006, Herrera 2021).
Western P. leucopus populations occur across multiple grassland and desert habitats unlike the largely forest-associated East and Midwest lineages. The westernmost extent of P. leucopus occurs in the desert southwest, a well-recognized glacial refugium for many species (Riddle and Hafner 2006, McDonnough et al. 2022). WesternP. leucopus also contain three chromosomal inversions that separate them from eastern populations, and thus, have been considered chromosomal “races” (Baker et al. 1983). The three inversion difference between western and eastern chromosomal races also represents a higher divergence than other Peromyscus species pairs (Baker et al. 1983). However, these two chromosomal races meet in Oklahoma and appear to inbreed extensively (Stangl 1986), making it unclear if these races correspond to evolutionary lineages like East and Midwest. Taken together, it is likely there is further divergence found in P. leucopus beyond the well-recognized East and Midwest lineages with at least one more lineage in the west. Thus, this study sampled across the range of Peromyscus leucopus to examine the species-wide effects of Pleistocene glaciation on observed genetic variation.
Collectively, Pleistocene glaciation and large water barriers (e.g., Atlantic Ocean, Great Lakes, and St. Lawrence River) appear to have had the largest impact on P. leucopus within the eastern portion of their range (e.g., Shipp-Pennock et al. 2005, Rowe et al. 2006, Fiset et al. 2015, Moscarella 2019). Much of the habitat within these areas are forested, which allows high gene flow. More habitat heterogeneity and geographic barriers exist in the western portion of P. leucopus ’ range, increasing the potential for higher genetic differentiation. Indeed, western P. leucopus is separated into 11 morphological subspecies (9 mainland, 2 island; Hall 1981). Peromyscus spp . often speciate due to habitat heterogeneity in biodiversity hotspots (e.g., León-Tapia et al. 2022), and western P. leucopus occur within areas known for high endemism (e.g., desert southwest and Mexico). We predict that Pleistocene glaciation influenced the western portion of P. leucopus range and that the western populations form a separate evolutionary lineage from East and Midwest. Furthermore, given the known species’ ability to colonize variety of habitats and subsequently expand its range, we predict strong signatures of population expansion in all evolutionary lineages.
METHODS
Sample Collection
We collected putative P. leucopus tissues (n = 157) from museums, NSF NEON tissue collection, and field sampled animals from across their range (Table S1). Collection emphasized animals caught west of the Mississippi River including Kansas, Mexico, Nebraska, New Mexico, North Dakota, Oklahoma, and Texas (n = 85; Fig 1). The remaining individuals originated from areas east of the Mississippi River including previously glaciated areas like Canada, New England, and the Great Lakes regions. All samples were georeferenced, and stored in a -20°C until DNA extraction.
Laboratory Methods
All tissues were extracted via Zymo Miniprep tissue kits following the manufacturer guidelines. We amplified two mitochondrial genes: 750 bp of cytochrome b and the entire control region (920 bp) using published mammal (MTCB-F: 5’-CCHCCATAAATAGGNGAAGG-3’, MTCB-R: (5’-WAGAAYTTCAGCTTTGGG-3’; Naidu et al. 2012) and custom (F: 5’-CCAAAGCTGATATTCTATTTAAAC-3’, R: 5’-ATAAGGCTAGGACCAAACCT-3’, Internal: 5’-ACATATCTGCGTTATCTTACATAC-3’) primer sets. We chose these two regions as they are the most commonly used in phylogeographic literature onP. leucopus , allowing for inclusion of Genbank sequences. Polymerase chain reactions (10 µL total volume) contained 5-10 ng of extracted DNA, 10 nM of forward and reverse primer, 2.0 mM of MgCl2, 5.0 µL of 1:5 mix of Takara Taq Polymerase and Promega GoTaq MasterMix, and 2.9 µL nuclease-free water. PCRs began with initiation at 95 oC for 2 min then 35 cycles of denaturing at 95 oC for 15 seconds, annealing at 52 (cyt b) or 54oC (control region) for 30 seconds, and extension at 72 oC for 1 min, completed by an extension at 72 oC for 10 min. We sequenced the forward primer for cytochrome b and the forward and internal primers for control region to obtain the final dataset. Sequencing reactions initiated at 96 oC for 3 min then underwent 30 cycles at 96 oC for 10 seconds, 50 oC for 5 seconds, and 60 oC for 2.5 min. Sequencing reactions were then cleaned via an ethanol precipitation method (Latch and Rhodes 2005) and sequenced on an ABI 3500. All sequences were aligned and concatenated in Geneious Prime (Kearse et al. 2012).
Species Identification
Peromyscus spp. can be extremely difficult to distinguish based on morphology, and P. leucopus ’ range overlaps with multiple other species including both wide-ranging species (e.g., P. maniculatus species group) and range-restricted species (Peromyscus eremicus species group). Therefore, we confirmed species identification via BLAST within Geneious Prime (Kearase et al. 2012). Of the 157 tissue samples, 27 were genetically identified as different species (P. maniculatus, pectoralis, attwateri, gossypinus, and eremicus ) and 12 failed to amplify at both mitochondrial regions, leaving a total of 118 P. leucopus samples for phylogeographic analysis.
Combined Mitochondrial Dataset Analysis
To reconstruct phylogenetic relationships among P. leucopus , we used Bayesian coalescent models in BEAST v 2.6.6 (Bouckaert et al. 2019). Model parameters were set in BEAUti and included the following priors: coalescent constant population size, clock rate of 5% and 20% per million years for cytochrome b and control region respectively, and a strict molecular clock. With these priors, we performed Bayesian Markov chain Monte Carlo searches to estimate the most recent common ancestor for each major lineage. We sampled trees and divergence dates every 10,000 iterations for 10,000,000 generations, and checked model convergence in Tracer v.1.7.1 (Rambaut et al. 2018). Final consensus trees were visualized in FigTree v.1.4.4 (Rambaut 2018).
To complement the tree-building analysis, we constructed a median joining haplotype network within PopArt version 1.7 (Leigh and Bryant 2015). Large gaps of mutational steps are expected to separate Pleistocene lineages, allowing comparison to the BEAST analysis. Additionally, we performed a principal coordinates analysis (PCoA) with individual-based Kimura’s 2-parameters distance (Kimura 1980). These distances were calculated in the R package ape via the “dist.dna” function (Paradis et al. 2004). We tested other evolutionary models for genetic distance between individuals, and the results were similar across all distances. We expected lineages to be most distant from one another within the PCoA and correspond to the BEAST analysis and haplotype network.
For each lineage and the total dataset, we calculated measures of diversity within DNAsp v. 6 (Rozas et al. 2017). These metrics included number of haplotypes (h), haplotype diversity (hD), nucleotide diversity (π), and mean number of pairwise nucleotide differences (k). We also calculated FST between all lineages in Arlequin v. 3.5.2.2 (Excoffier and Lischer 2010). Finally, we used Arlequin to estimate two metrics, Fu’s Fs (Fu 1997) and Tajima’s D (Tajima 1989), as both are expected to be significantly negative if a lineage underwent rapid expansion.
Reduced Control Region Dataset Analysis
We also downloaded 216 Genbank sequences of the control region to increase our geographic coverage. These sequences were georeferenced based on their original studies. Our control region sequences were trimmed to 636 bp to maximize power while allowing for additional sequences. In total, this reduced control region dataset included 334 sequences. We repeated the analyses described above (BEAST, haplotype network, PCoA, diversity, Fst, Fu’s Fs, and Tajima’s D) for the reduced dataset.
In addition to Fu’s Fs and Tajima’s D, we used Arlequin to calculate raggedness index and sum of square differences (SSD) under both demographic and spatial expansion. These metrics are derived from mismatch distributions where the distribution is expected to be unimodal if populations experienced recent demographic or spatial expansion. Multimodal or ragged distributions indicate a stable population over time (Rogers and Harpending 1992). Raggedness indices, therefore, describe the shape of the mismatch distributions and sum of SSD serves as a goodness of fit. Non-significant tests indicate that lineages either underwent demographic and/or spatial expansion. These metrics were not possible in the combined dataset as the markers evolve at different rates.
RESULTS
Combined Mitochondrial Dataset
For the combined dataset, we recovered 107 unique haplotypes among the 118 individuals. We trimmed the sequences to 1652 bp, which included 240 polymorphic sites (7 indels and 233 substitutions). Of the 240 polymorphic sites, 95 occurred once while 145 were parsimony informative.
For the BEAST analysis, the consensus tree highly supported three lineages (all BPP = 1.0; East, Midwest, and Southwest; Fig. 1). A fourth group was sister to East and was found to be distinct in all other analyses (see below), so we defined this group as another lineage denoted Central (Fig 1). The first split occurred between the Southwest lineage and the remaining three (time to most recent common ancestor [TMRCA] = 0.2511 ma, 95% HPD = 0.2100 – 0.2907 ma). BEAST estimated the next split was between the Midwest and East lineages to be 0.2026 ma (95% HPD = 0.1705 – 0.2364 ma). Although not statistically supported in the BEAST analysis (BPP = 0.652), the final split between East and Central was the newest divergence estimated at 0.1942 ma (95% HPD = 0.1627 – 0.2267 ma).
All lineages grouped geographically across North America with multiple areas of overlap (Fig 2a). East was broadly distributed east of the Mississippi River while the Midwest lineage occurred in the upper Midwest including the Upper Peninsula of Michigan, Wisconsin, Minnesota, and North Dakota as recorded in numerous other studies (e.g., Rowe et al. 2006, Fiset et al. 2015, Perno et al. 2022). Central occurred primarily in the southern Midwest and the eastern Great Plains while Southwest was found across Texas, Mexico, and New Mexico (Fig 2a). Significant overlap exists between the Central and Southwest lineages. The PCoA separated the four lineages with Axis 1 (variance explained = 25.78%) separating Southwest from the remaining three lineages. East and Midwest were most distant on Axis 2 (variance explained = 21.71%) with Central in the middle of all three lineages (Fig 2b). Similarly, all lineages are separated by at least 10 mutational steps within the haplotype network, and Central remains in the middle of all three lineages as observed in the PCoA (Fig 2c).
We found high diversity in all lineages (Table 1), with the majority of individuals in each lineage constituting a unique haplotype (81.57-96.88%). Haplotype diversity was similar across lineages (0.980-0.998) and nucleotide diversity ranged from 0.0073-0.0136 across lineages (all lineages combined = 0.0202). Pair-wise differences ranged from 12.106 (Midwest) to 22.595 (Central). Fu’s Fs were significant in three (East, Midwest, and Southwest: -11.619-18.248, all p < 0.001) of the four lineages (Central = -1.998, p = 0.421) while no Tajima’s D was significant (D = -1.207 - 1.657, all p > 0.05; Table 1). Fst values ranged from 0.424 (Central vs. Southwest) to 0.647 (Midwest vs. Southwest), and all were significant.
Reduced Control Region Dataset Analysis
Our reduced control region dataset consisted of 634 bp for a total of 276 haplotypes across 334 individuals. These haplotypes contained 167 polymorphic sites and 10 indels. Of the 167 polymorphic sites, 45 were singletons and 122 informative sites.
The reduced control region analyses largely agreed with the combined dataset despite lower power. With increased geographic coverage, it is apparent that there are areas of overlap between the lineages. East, Central, and Midwest individuals occur in Illinois and Indiana whereas Central and Southwest individuals occur throughout Texas and New Mexico (Fig 3a). The PCoA is largely similar to the combined dataset (Fig 3b), but Axis 1 (variance explained = 29.67%) separates East and Midwest first while Southwest, Midwest, and Central are spread along Axis 2 (variance explained = 13.42%). Lineages were less distinct in the haplotype network as expected with reduced power, but each lineage is separated by at least 4 mutations (Fig 3c).
All four lineages exhibited high diversity metrics as observed in the combined dataset (Table 2). Both haplotype (0.993-0.995) and nucleotide diversities (0.0144-0.0218) were similar across lineages whereas pairwise differences were highest in the Central (13.801) and Southwest (13.004) lineages compared to East (9.068) and Midwest (7.898). Sample sizes were much higher in East and Midwest (n = 166 and 93 respectively) compared to the other lineages (Central n = 44 and Southwest n = 32), which likely explains these differences. Fst values ranged from 0.448 (Central vs. Southwest) to 0.620 (East vs. Southwest) with all Fsts being significantly different from 0 (all p < 0.001).
All Fu’s Fs metrics were significant (Fs = -13.132- -191.893, all p < 0.001) while all Tajima’s D were not (D = -1.009- -1.905, all p > 0.05; Table 2). Tests for expansion found evidence for demographic expansion in all lineages (all Raggedness indices < 0.008, p > 0.064, SSD < 0.008, p > 0.078). No other tests for expansion were significant (Table 3).
DISCUSSION
Based on the combined and reduced control region datasets, we found strong evidence for three evolutionary lineages of white-footed mouse (East, Midwest, and Southwest) with a likely fourth lineage (Central) that received less support. These lineages disagree with the recognized morphological subspecies (Hall 1981). East, Midwest, and Southwest formed strongly supported monophyletic lineages within the BEAST analysis and were the most strongly differentiated groups in the haplotype networks, PCoA, and Fst analyses. Central lineage was monophyletic, but did not receive strong support in the phylogeny. However, it was differentiated from the other three lineages in the haplotype network, PCoA, and Fst analyses. Three of the four lineages crossed the Mississippi River, providing little evidence for the Mississippi River acting as a barrier for white-footed mice. Collectively, these results are consistent with additional Pleistocene lineages beyond the well characterized East and Midwest lineages.