Introduction

Polyploidy is as an important mechanism of speciation and diversification in plants (Stebbins 1971; Wood et al. 2009). Allopolyploid species, derived from hybridization and whole genome duplication of diploids, are particularly widespread across both geographical and environmental spaces, but underpinnings of their large distribution ranges remain elusive (te Beest et al. 2012). Allopolyploids arise in sympatry with one or both of their progenitor species and nascent populations therefore have to struggle against a frequency-dependent disadvantage to reproduce and establish in face of locally adapted diploids (i.e. ‘minority cytotype exclusion’; Levin 1975; Fowler & Levin 1984). An influential prediction of this model is that successfully established polyploids have shifted their ecological niche to stably coexist with their diploid progenitors. Novel genetic and environmental traits exhibited by polyploids were accordingly often discussed as the result of such an adaptive path (Levin 2002; Soltis et al. 2016). Despite the hypothesis that non-equilibrium climates of the Quaternary have promoted polyploid speciation (Stebbins 1985), neutral models of allopolyploid evolution remain largely to be explored. To what extent increased availability of suitable space during climate-enforced range shifts may lessen the competition from diploid progenitors and thus support allopolyploid speciation without niche shift is unknown.
Several studies having contrasted geographical and ecological ranges of diploid vs polyploid species supported the climatic niche divergence predicted under competitive exclusion (e.g. Theodoridis et al. 2013; Ramsey & Ramsey 2014). However, they rarely considered historical factors or tested specific hypotheses regarding the origin(s) of polyploids from multiple progenitor species. Despite recent work using ecological modeling across species to address the success of polyploids (e.g. Martin & Husband 2009; Glennon et al. 2014; Blaine Marchant et al. 2016; Baniaga et al. 2019), the evolution of their ecological niches following the combination of diploids remains elusive. To account for the impact of hybridization between differentially adapted taxa, Parisod & Broennimann (2016) suggested to not only rely on adequate phylogenies but to go beyond ambiguous pairwise comparisons by testing the null hypothesis that allopolyploids represent the additivity of their diploid progenitors. Allopolyploid speciation is indeed initiated by the combination of divergent suites of diploid loci that expectedly sum up as fixed heterozygosity in the derived allopolyploids (i.e. genetic additivity). Allopolyploid species correspondingly unite ecological tolerances from their progenitors and expectedly fully occupy their ecological niches (i.e. ecological additivity). Only significant deviation beyond such combined ranges of conditions is considered as characteristic of potential novelties supporting the establishment of allopolyploids in a distinct ecological niche.
This work uses wild wheats (Aegilops sp) to assess the eco-evolutionary drivers of range dynamics among four diploid species (2x = 14 chromosomes) that differentially combined into four independent allopolyploid species (4x = 28; Feldman & Levy 2015). Selected species are all annual selfers that share a ruderal life history (van Slageren 1994). Diploids are however typically distributed across restricted ranges, whereas allopolyploid wild wheats are widespread throughout the Mediterranean area (Kilian et al. 2011). Here, we use this excellent model system to investigate the genetic and environmental additivity of diploid progenitors in allopolyploids and identify drivers of their successful establishment and large distribution ranges. Despite prior cytogenetic and phylogenetic studies (e.g. Meimberg et al. 2009; Bernhardt et al. 2017), diploid progenitors and the recurrent origins of allopolyploid wild wheats were not precisely identified. Accordingly, we first infer the reticulate phylogeny of allopolyploids using plastid and nuclear loci to provide a robust comparative framework. Nuclear loci among samples collected across native species ranges then address the genetic additivity of diploids in allopolyploids and compare their phylogeography to assess main processes having shaped genetic variation. Environmental modeling is finally used to compare climatic niches of species and test to what extent allopolyploids differ from the expected ecological additivity of their diploid progenitors. Our integration of genetic and environmental characteristics supports an overall additivity of locally adapted diploid progenitors in allopolyploid species that successfully expend during environmental changes.
 

Material and Methods

Sampling across distribution ranges of diploid and allopolyploid species
Four diploid species – Ae. caudata (genome CC), Ae. comosa (MM), Ae. tauschii (DD) and Ae. umbellulata (UU) – and four of their derived allopolyploid species – Ae. crassa (DDMM), Ae. cylindrica (DDCC), Ae. geniculata (UUMM) and Ae. triuncialis (UUCC) – were investigated. Selected species show clear morphological differences and are representative of the diversity of existing polyploid complexes in wild wheats (Kilian et al. 2011).
A total of 402 geo-localized accessions of the eight Aegilops spp. were selected from germplasms to represent genetic variation across distribution ranges with 30 and 80 accessions per diploid and polyploid species, respectively (Table S1). For Ae. crassa, only 10 accessions were available. DNA was extracted using DNAeasy kit from Qiagen® following the manufacturer protocol. Samples were randomized and 26 accessions were replicated among 96-well plates to ensure genotyping accuracy. DNA concentration was measured by Nanodrop and normalized at 10ng/μL and 50ng/μL for amplification of plastid and nuclear loci, respectively.
Occurrences of the eight Aegilops species were collected from the GBIF database (http://www.gbif.org/) and filtered out from replicates and non-native samples from America (Lyons et al. 2010). Including genotyped samples, a total of 3667 occurrences georeferenced with a precision of 1 km (i.e. at least 4-decimal coordinates; Table S1) were kept for niche analyses.
 
Genetic analyses of plastid and nuclear loci to infer origin(s) of allopolyploids
To address the reticulate evolutionary relationships among species and identify corresponding progenitor(s) of polyploid individuals, all accessions were sequenced at two maternally-inherited plastid loci and 30 bi-parentally-inherited nuclear loci (Petit et al. 2005; Li et al. 2019). Intergenic loci of the plastid genomes, ndhF-rpl32 and trnT-trnF, were sequenced and concatenated following Meimberg et al. (2009) to infer the Median-Joining haplotype network shown in Fig S1 using SplitsTree4 (Huson & Bryant 2006). Nuclear loci (Table S1) were genotyped by sequencing multiplexed amplicons encompassing non-coding introns of independent low-copy genes, using 48.48 Access Arrays from Fluidigm coupled with Illumina MiSeq (Text S1). Allopolyploid samples were conservatively genotyped based on high-coverage sequences (2x250 bp) as AAAA, AABB, ABC- and ABCD for samples presenting one, two, three and four alleles, respectively.
To address the expected genetic additivity of progenitors in allopolyploids, nuclear haplotypes were assigned to their corresponding subgenomes and the relative contribution of each diploid species was estimated. At each locus, haplotypes segregating in polyploids that were identical to those in diploid progenitor species were considered as “conserved haplotypes” assigned to the corresponding subgenome or considered as shared when reported among diploids. Haplotypes specific to an allopolyploid were considered as “derived haplotypes” and assigned to the subgenome showing shorter branch lengths than with other diploids (Text S1). Derived haplotypes showing unclear phylogenetic relationships with putative progenitors were considered of unknown origin and lost, and coded as missing data. Under the null hypothesis that allopolyploids are the genetic additivity of progenitor species, haplotypes from each subgenome are expected to segregate in similar proportions. After normalization per loci, proportion tests assessed such predictions for conserved, derived and lost haplotypes segregating within the species.
 
Phylogenetic inferences and origins of allopolyploid species
To infer a reticulate species network, allopolyploid multi-labeled species trees were assessed from haplotype sequences at 30 nuclear loci (Text S1) using starBEAST in BEAST 2.5 (Drummond et al. 2012). The species trees were based on a subset of 39 representative samples selected as two random individuals from each genetic cluster within species (see Genetic structure) and two outgroups. Each allopolyploid subgenome was analyzed as a distinct diploid individual within the species and run up to 800 million generations of Markov Chain Monte Carlo (MCMC), until stable MCMC mixing after 10 % burnin and tree sampling every 100,000 generations. Convergence was checked on Tracer v1.7 (Rambaut et al. 2018) and Maximum Credibility Clade trees were summarized by TreeAnnotator available in BEAST2 and viewed in FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/). A secondary calibration based on divergence of diploid Aegilops species following Huynh et al. (2019) was used to date each allopolyploid species tree, considering issues inherent to molecular dating of polyploidy events (Doyle & Egan 2010).
 
Genetic structure within and among species
Allelic richness (AR) within species was estimated for both plastid and nuclear loci as the expected number of alleles among k gene copies (i.e. the minimum sample size of 10 and 40 gene copies reported in Ae. crassa for plastid and nuclear loci, respectively). Observed heterozygosity (Ho) was estimated as the average proportion of heterozygotes among nuclear loci within each species. To account for possible phylogenetic structure, gene diversity within species was estimated after correction for sample size following Nei (1978) from both unordered alleles (h; i.e. all alleles considered equidistant) and ordered alleles (v; i.e. considering phylogenetic distances as their proportion of nucleotide substitutions) using SPAGeDi (Hardy & Vekemans 2002).
To highlight phylogeographic signals in each species, intraspecific genetic structure was assessed from nuclear loci using Bayesian clustering as implemented in STRUCTURE 2.3.4 (Pritchard et al. 2000). Based on the admixture model with correlated alleles, 500,000 MCMC generations with 100,000 burnin were repeated ten times for each number of genetic clusters (K) between one and ten, and the optimal K was selected through ad hoc deltaK procedure estimating the rate of change in the log probability of data between successive K values (Evanno et al. 2005) as implemented in STRUCTURE HARVESTER (Earl & vonHoldt 2012). The average individual assignment probabilities under the optimal K were estimated over 20 iterations of 1,000,000 MCMC generations and 100,000 burnin with the FullSearch algorithm in CLUMPP 1.1.2 (Jakobsson & Rosenberg 2007) and represented by stacked bar plot generated in R using strataG v0.9.4 package (Archer et al. 2017).
 
Climatic niches quantifications
We compared the climatic niche of diploid and allopolyploid species in a simplified climatic space summarizing the gradients present in the 19 high resolution bioclimatic variables from the CHELSA database (30 arc sec, i.e. 1 km precision; Karger et al. 2017). On top of those 19 variables, similarly high resolution monthly ‘soil water balance’  variables from CGIAR database (Trabucco & Zomer 2010) were averaged over months and included as a 20th variable (swc) in the analyses. In order to allow direct comparisons of niches among all species, we defined a study area as the adjacent biomes (WWF; Olson et al. 2001) where occurrences of Aegilops species were recorded. Large geographical areas without occurrences were further excluded (e.g. Russia, the desert biome in North Africa; Fig. S1). The resulting area was considered as the accessible area available to all species (sensu Barve et al. 2011) and corresponding values for all 20 bioclimatic variables were used to produce a common background for all species in subsequent climatic niche analyses (Broennimann et al. 2012). We then run a principal component analysis (PCA) of a random sample of 1,000,000 raster cells across the study area (i.e. representing the whole realized environment, see PCA-env in Broennimann et al. 2012) and used the two first PCA axes (PC1 and PC2) as climatic space.
Using the niche analysis functions of the R-package ecospat (Di Cola et al. 2017), the realized niche of each Aegilops species was modeled in a common gridded climatic space (each pixel of that gridded space representing a unique combination of climatic conditions) using Gaussian kernel density analyses that smooth the density of occurrences and produce niche models independent from the sampling effort or the geographic resolution of occurrences (Broennimann et al. 2012). We also quantified the niche of the pool of each pair of known diploid progenitors as the expected niche of allopolyploids. Niche centroid and breadth (i.e. dispersion) were estimated according to density of the modeled niche along principal components (PC1 and PC2).
To what extent species occupy all available sites with suitable conditions (range filling; Svenning and Skov 2004) was estimated as the ratio of the effectively occupied area to the suitable area for each species. Our occurrence dataset was not exhaustive (i.e. did not covered every pixel occupied by the species) and the occupied area was therefore estimated using a kernel density function with 5% threshold around occurrences in the R-package ks (Duong 2019). The suitable area of each species was calculated based on the projection of its climatic niche onto the geography across the study area (using the function ‘ecospat.niche.zProjGeo’ in ecospat) and delimited by discarding 10% of the gridded space with lowest occurrence densities. Range filling was estimated as the proportion of occupied out of suitable cells.
 
Climatic niches comparisons
Different metrics were used to compare inferred climatic niches. First, we compared their overlap using the D metric of Schoener (1968), which measures the similarity between niches of any species based on the density of occurrence modeled in each pixel of the background climatic space. The significance of niche overlaps D was tested through similarity tests following Warren et al. (2008), by randomly reallocating niche centroid of both species and recalculating D through 100 repetitions following Broennimann et al. (2012). The similarity test thus assessed whether the range of climatic conditions (i.e. climatic niche) occupied by one species was more similar to the niche occupied by the other species than would be expected by chance (a=0.05). Using the USE framework (niche unfilling, stability and expansion) of Guisan et al. (2014), we further quantified the proportion of the climate space that was used only by the first species, only by the second species and by both species.
Niche stability S is a metric of niche overlap that considers the presence/absence of species and not their occurrence density as with the D metric. Both the D and USE metrics are empirical approaches relying kernel density estimations on existing environments and which calculate non-parametric niche shapes. These two metrics are thus robust to possibly non-convex niches and expectedly outperform alternative methods assuming niche convexity such as analyses of variance (Treier et al. 2009) or range bagging (Drake 2015). However, we further assessed niche differences among species based on 1000 bootstrap of niche centroids and breadths along PC1 and PC2 and comparisons using one-way ANOVA and post-hoc Tukey tests in the R-package vegan (Oksanen et al. 2019). Each bioclimatic factor was similarly compared among species.