Results
Ten individuals of Arabian gray mangrove (A. marina var.marina ) from each of 19 Arabian Peninsula populations plus an eastern Australian population to be used as an outgroup (A. marina var. australasica ) were sampled for resequencing, for a total of 200 sequenced individuals (Fig. 1; Table 1; Table S1, Supplementary Information). Geographic coordinates of each sampled tree were recorded to enable genotype-environment analysis using georeferenced environmental data. SNP calling was carried out with the GATK Genome Analysis Toolkit (GATK; 28 ) to produce a set of genome-wide SNPs for downstream genetic analyses. The genome coverage per individual was 16.8X on average, ranging from 6.1X to 27.1X, after quality and missing data filtering.
Population structure and genetic diversity analyses The study aimed at evaluating how the array of spatial settings and temporally volatile geographic barriers of the Arabian Peninsula shaped the diversity of Arabian mangrove populations. To explore patterns of population structure, we produced a set of independent, selectively neutral SNPs after applying quality and missing data filters. A t-distributed stochastic neighbor embedding (t-SNE) analysis revealed marked levels of population structure, highly congruent with the geographic distribution of the sampled populations. Populations from the Red Sea and the PAG clustered apart, while those from the Arabian Sea occupied a central, more scattered position, relatively closer to the PAG cluster yet differentiated from it (Fig. 2A). Structure was also observable within each of the water bodies defining the three biogeographic regions, with little to no overlapping among the sampling locations, revealing considerable differentiation at the population level (Fig. 2A).
We also examined population divergence in Arabian mangroves using a sparse non-negative matrix factorization method (SNMF; 29 ), which recovered variability patterns consistent with the t-SNE results (Fig. 2B). At K = 2, populations clustered into two groups separating the Red Sea from the Arabian Sea, the Sea of Oman and the PAG. The plot for K = 3 also revealed the more isolated, geographically adjacent populations from the southern Arabian Sea (Salalah and Taqah) as a differentiated group, with populations along the northeast Oman coast and into the PAG showing decreasing degrees of shared ancestry. At K = 4, populations from the northern basin of the PAG (Bahrain and Dammam) appeared as a differentiated cluster. At K = 5, populations from northern Red Sea (Duba, Al Wajh 1 and Al Wajh 2) also appeared as a genetically differentiated cluster, yet showing a signal of coancestry decreasing with latitude with proximal populations from the central Red Sea (Al Kharrar and King Abdullah Economic City, hereafter KAEC) and to a lesser extent, Al Lith. In the plot for K = 6, populations from the Sea of Oman, Qurm and Shinas, clustered together with Filim, yet Shinas showed intermediate levels of shared ancestry with the genetic cluster of the southern PAG basin, which included Ras Al Khaimah (hereafter RAK), Umm Al Quawain (hereafter UAQ), Ras Ghurab and Suweihat. At K = 7, populations from southern Red Sea (Al Lith, Farasan Banks 1 and Farasan Banks 2; hereafter FB1 and FB2) clustered apart from the central Red Sea populations, yet Al Lith presented intermediate levels of coancestry with the central Red Sea population KAEC.
Pairwise F ST and Nei’s genetic distances based on neutral loci were generally congruent with the t-SNA and SNMF results, showing high correlation with the geographic distribution of the mangrove populations. Both indices showed the highest differentiation scores when comparing populations from the Red Sea to those of the PAG. Within-region F ST comparisons were higher for the Red Sea and Nei’s were higher for the PAG, suggesting divergent population histories (Table S2, Supplementary Material). Heterozygosity and nucleotide diversity indices revealed overall similar levels of genetic variability, with more admixed populations at Al Kharrar, KAEC, Filim, Qurm and Shinas (see SNMF results above) generally showing the highest scores (Table 1).
Phylogenetic analysis To explore how evolutionary relationships fitted a scenario of recolonization after the last glacial maximum (LGM) versus isolation in putative glacial refugia in the Red Sea and the PAG, we conducted a phylogenetic analysis based on resequencing data. A maximum likelihood phylogenetic reconstruction was consistent with the t-SNE and SNMF analyses, showing marked differentiation between and within biogeographic regions, with almost all the populations sampled resolving in monophyletic clades with high node support. Arabian mangrove populations grouped in three major, reciprocally monophyletic clades or phylogenetic lineages: the Red Sea lineage (Major Clade I), the southern Arabian Sea lineage (Major Clade II) and the PAG plus the Sea of Oman lineage (Major Clade III) (Fig. 3). Within the Red Sea’s Major Clade I, the phylogenetic analysis revealed a sequence of cladogenetic events consistent with the geographic distribution of the sampled populations, with the first split separating northern (Duba, Al Wajh 1 and Al Wajh 2) and central populations (Al Kharrar and KAEC) from those of the south (Al Lith, FB1 and FB2); adjacent Al Kharrar and KAEC populations from central Red Sea showed little differentiation and grouped into a single clade. The Arabian Sea’s Major Clade II encompassed the populations Salalah, Taqah and Filim, all occurring in southern Oman, and appeared as the sister group of the Major Clade III, which includes all Sea of Oman and PAG populations. Major Clade III was further divided into two clades separating the populations of the northern basin of the PAG (Bahrain and Dammam) from those of the southern basin (RAK, UAQ, Ras Ghurab and Suweihat) and the Sea of Oman (Qurm and Shinas). Populations of these two latter subregions, which occur on each side of the Strait of Hormuz, appeared as reciprocally monophyletic sister groups. The Brisbane population belonging to the Australian variety of the gray mangrove (A. marina var. australasica ) was used as outgroup (Fig. 3).
Population and demographic history of the Arabian gray mangrove We performed model comparisons under the likelihood framework developed in fastSIMCOAL2 (30 ) to test the competing hypotheses of colonization after the LGM versus potential isolation in glacial refugia in the Red Sea and the PAG. Informed by the previously inferred phylogenetic relationships and using three populations from each biogeographic region as lineage representatives, two sets of models were independently compared for the Red Sea and the PAG plus the Sea of Oman lineages (Fig. S1, Supplementary Material). The Brisbane population of Australian mangroves was included as outgroup and to calibrate times of divergence (26, 31 ). For the Red Sea, the model with the greatest likelihood score (ΔAIC = 1,707.58; Table S3, Fig. S1, Supplementary Information) revealed two cladogenetic events and splitting times dating back to 89,950 years for Duba and Al Kharrar (95% CI = [55,910 - 221,093]); and 166,975 years for FB2 (95% CI = [94,333 - 290,121]), consistent with a process of lineage differentiation in geographic isolation during glacial periods of low marine connectivity. No gene exchange among extant lineages was detected (MIG = 0, 95% CI = [0 – 0]). Potential gene flow was detected among ancestral lineages, yet the 95% confidence intervals varied by orders of magnitude and included the zero (MIG anc. pop. 1 → FB2 = 36, 95% CI = [0 – 78]; MIG FB2 → anc. pop. 1 = 0, 95% CI = [0 - 5,979] migrants per generation), revealing little impact of these parameters in the SFS (Table 2, Fig. 4). In the case of the PAG, the model that better fitted our data (ΔAIC = 7,769.98; Table S3, Fig. S1, Supplementary Information) revealed a scenario of simultaneous cladogenesis dating back 32,725 years (95% CI = [19,411-92,498]), prior to the LGM. Signs of some gene flow were detected among PAG lineages (MIG Dammam → Ras Ghurab = 13, 95% CI = [0 – 15]; MIG Ras Ghurab → Dammam = 0, 95% CI = [0 – 9]; MIG Ras Ghurab → Shinas = 16, 95% CI = [14 – 30]; MIG Shinas → Ras Ghurab = 0, 95% CI = [0 – 7] migrants per generation). While narrower than in the Red Sea estimates, the ranges also included zero in most cases, reducing the certainty in the model (Table 2, Fig. 4).
We used TreeMix (32 ) to further explore patterns of historical gene flow among mangrove populations within the Red Sea and the PAG, separately. TreeMix generates a maximum likelihood tree based on allele frequencies and then estimates migration events among branches to account for excess of covariance. The Salalah population from the Arabian Sea was used to root the trees in both models. In the case of the Red Sea, a tree model with no migration events accounted for 99.9% of variation in the SNP dataset, revealing no signs of gene flow among Red Sea populations (32 ) (Fig, 5A), congruently with the fastSIMCOAL2 analysis. In the PAG plus the Sea of Oman, a tree model with three migration events between ancestral (Dammam + Bahrain → Ras Ghurab + Suweihat) and terminal branches (Shinas → RAK; Suweihat → Dammam) explained up to 99.8% of the SNP variance, revealing signs of both historical and contemporary gene flow among mangrove populations in the northern and southern basins of PAG, and between the PAG and Sea of Oman populations (Fig. 5B).
To examine changes in the effective population size of the Arabian gray mangroves, we used the MSMC2 program (33 ) for a reduced set of populations representative of each one of the marine biogeographic regions of the peninsula. The results showed similar patterns for all lineages, revealing an ancestral demographic expansion two to five million years ago, followed by a gradual decrease of the population sizes, slightly more pronounced in the lineages from the Red Sea. The analysis also revealed a moderate demographic recovery 20,000 to 10,000 years ago for all lineages, overlapping with the end of the Last Glacial Maximum and rising sea levels (Fig. S2, Supplementary Information).
Adaptive divergence in Arabian gray mangroves We conducted a genotype-environment association analysis to identify candidate loci potentially involved in differential selection among Arabian gray mangrove populations. We implemented a redundancy analysis (RDA) based on population allele frequencies and georeferenced environmental data approximating maximum air temperature of the warmest month, minimum air temperature of the coldest month, maximum of sea surface salinity (averaged over months) and annual precipitation as explanatory variables. The environmental predictors explained 18.3% of the total genetic variance (Adjusted R2 = 0.183, p-value < 0.001). Per population RDA scores revealed distinct association patterns between biogeographic regions. Red Sea mangroves showed high, negative loading values on RDA axis (RD) 1, while the remaining populations correlated positively with this axis, and heavily in the case of the Sea of Oman and PAG southern basin. In contrast, the Arabian Sea populations showed high negative association values with RD2, while Red Sea and PAG mangroves showed positive association scores along this axis, yet the former with much lower correlation values (Table S4, Supplementary Information). The analysis revealed 668 genetic variants showing significant associations with one or more of the four environmental predictors (α = 1x10-4; Fig. S3, Supp. Inf.). Of the 668 outliers, 79 were located within up to 51 functionally annotated genes, from which at least 11 (21.6%) presented functions potentially relevant for adaptation to the extreme and variable habitats of the seas around the Arabian Peninsula (See Supplementary Information for details and a comprehensive list of candidate genes with reported functions).
To visually explore how adaptive variability is structured at the individual level, we conducted a complementary redundancy analysis based on individual genotypes of the 668 candidate loci alone. The plot of the first two RDA axes revealed distinctive associations patterns: Populations from the northern Red Sea (Duba, Al Wajh 1 and Al Wajh 2), the central Red Sea (Al Kharrar and KAEC) and the southern Red Sea (Al Lith, FB1 and FB2) differentiated along a gradient loading heavily on minimum temperature of the coldest month and annual precipitation. Populations from the Arabian Sea (Taqah and Salalah) and Qurm, the proximal population from the Sea of Oman, presented similar environment-genotype association patterns, correlating negatively with maximum sea surface salinity and to a lesser extent, with maximum temperature of the warmest month. PAG populations from the northern (Dammam and Bahrain) and southern (Suweihat, Ras Ghurab, RAK and UAQ) basins clustered together and with northern Red Sea populations, loading negatively on minimum temperature of the coldest month and positively on maximum sea surface salinity. The population of Shinas in the Sea of Oman also presented similar correlation patterns to the PAG/northern Red Sea cluster, yet showed weaker association with maximum sea surface salinity and loaded negatively on maximum temperature of the warmest month (Fig. 6A). Individual Euclidean distances averaged over populations in the RDA ordination space showed general similar trends (Fig. 6B), with within-region, highest differences among association patterns occurring along the Red Sea and outrunning the Northern Red Sea versus PAG scores. Populations FB1 and especially FB2 showed the most dissimilar association scores of all pairwise population comparisons of the RDA based on candidate loci (Fig. 6B).