Results
Preliminary delimitation of Carob Evolutionary Units (CEUs) using microsatellite data
Based on genotypes from 17 microsatellite loci (including SSR and SNPs) and geographical coordinates, we grouped 1,020 carob trees into CEUs (Fig. S1 in supplementary material). The Ward cluster method, which considers geographic and genetic distances, found seven CEUs grouped in two clusters. The western cluster included populations from South Morocco (SM), South-West Spain (SWS) and South-East Spain (SES). The eastern cluster included the remaining four CEUs: North Morocco (NM), Central Mediterranean (CM), North-East Mediterranean (NEM) and South-East Mediterranean (SEM). Based on microsatellite markers, the overall genetic differentiation among these seven CEUs is 16% (Gst). Pairwise Gst values visualized by a neighbor-joining tree (Fig. S1C in supplementary material) confirmed the Ward clustering scheme, but placed SEM and SM in intermediate positions along the NJ tree instead of grouping these two CEUs together.
Assembly of RADseq markers and identification of outliers
The de novo assembly conducted on 36 samples with a clustering threshold of 0.94 and a minimum number of samples by loci of 30, produced a mean number of 48,828 loci by sample (Tab. S1) reduced to 13,371 loci retained after ipyrad filtering. The sequences of these loci were used as a reference for the assembly of 350 genotypes, which resulted in 10,012 loci with an overall missing data rate of 64%. The structure of genetic diversity revealed by this RADseq data set is congruent with the microsatellite dataset (Fig. S1) and with a west-east pattern of differentiation (Fig. S2A in supplementary material), even picturing a more pronounced longitudinal structure. However, SEM and NM exchange their positions in the RADseq dataset: NM appears in an intermediate position between west and east CEUs close to CM whereas SEM forms a group with NEM (Fig. S2B).
The detection of the genotypes and loci having the highest missing rate by Matrix condenser, and their subsequent removal, produced a matrix with 190 samples and 12,767 loci, with 14% missing rate. After filtering to keep one SNP by locus, the dataset included 190 samples for 3,613 loci (or SNPs), with 9.5% overall missing data and 3.5% median missing rate by genotype. The Outflank method detected 56 outlier markers within the dataset of 190 samples by 3,613 SNPs (Fig. S3 in supplementary material). The global differentiation computed for these outliers was six times higher than for other SNPs (Gst 0.74 vs. 0.12). Nucleotide BLAST searches revealed that 27 outliers were assigned to plastid genome, 14 to the mitochondrial genome, 11 to nuclear genomes and 4 were not assigned. Excluding the outliers detected by Outflank, the GWN dataset included 3,557 SNPs. The nuclear matrix produced for phylogenetic inference after filtering out samples with missing data rates above 20%, contained 171 C. siliqua and 4 C. oreothauma genotypes for 3,284 unlinked SNPs and 8% missing data rate. The MH matrix contained 190 C. siliqua genotypes for 946 nucleotides, 17 variable sites and 60% missing data. The PH matrix contained 190 C. siliqua genotypes for 1,897 nucleotides, 31 variables sites and 47 % missing data.
Phylogenetic and genetic structure of carob trees across the Mediterranean based on RADseq data
Phylogenetic analysis of MH and PH concatenated in two alignments revealed the same pattern with two clearly differentiated haplogroups with branch support over 95% (Fig. S4 in supplementary material). In the Western Mediterranean CEUs (SES, SWS, NM and SM) a single haplogroup represented 85 % and 94.5 % of plastid and mtDNA haplotypes respectively. These western mtDNA and pDNA haplogroups were observed in only 11 % of the samples from central and eastern CEUs. A central-eastern haplogroup was found across CM, NEM and SEM CEUs, and was very rare in Western CEUs. Although strictly matching otherwise, mtDNA and pDNA haplogroups were not congruent in the northern part of SM (population from Imouzzer Ida Ou Tanane area, in Morocco). In this area, most samples (8/13) belong to the western haplogroup for mtDNA, but to the central-eastern one for pDNA, the others have matching haplogroups belonging to the central-eastern one (4/13) or the western one (1/13).
The strong West-East differentiation observed for the 350 genotypes dataset (Fig. S2 in supplementary material) and for MH and PH datasets (Fig. S4) was confirmed in the analysis conducted using GWN data. The SVDquartets reconstruction of 171 samples of C. siliqua and four samples of C. oreothauma as an outgroup, confirmed the monophyly of C. siliqua and resolved a strongly supported first split of the SM clade from the remaining C. siliqua lineages (Fig. 2A). A subsequent split separated the remaining Western CEUs (SWS, SES and NM) from Central and Eastern CEUs (CM, NEM, SEM). This phylogenetic topology based on GWN data is congruent with MH and PH data in supporting southwest Morroco as the ancestral area in the carob evolutionary history.
SVDquartets phylogenetic topology is highly congruent with the K=4 genetic structure obtained with SSR data (Fig. 2A). The four genetic groups match the following clades: SM, SWS + SES, NEM + SEM, CM. The relative position of NM in the SVDquartets tree, although sister to the other three western CEUs, is congruent with its mixed assignation to different genetic clusters for SSR and RADseq data (Fig. 2A,D).
The genetic clustering estimated with snmf based on RADseq data was repeated from K=2 to K=7 ancestral populations. The cross-entropy criterion suggested two optimal solutions for 5 and 7 groups (Fig. S5B in supplementary material), of which K=7 had the highest probability. The five groups estimated with snmf (Fig. S5A) are congruent with SSR data and the SVDquartets reconstruction for the Western CEUs: three genetic groups have a good match with SM, SWS + SES and NEM + SEM, respectively, whereas CM have ancestry mainly from the two remaining genetic groups and NM shows admixture from the five genetic groups. The most likely genetic clustering, K=7 solution (Fig. 2A,D), resolved the differentiation of four ancestral populations roughly corresponding to south Morocco (SM), south Spain (SWS and SES) and the eastern CEUs (NEM and SEM). However, departures from a good match between microsatellite-based CEUs and RADseq genetic groups are substantial too. NEM exhibits admixture from SEM. CM includes individuals with admixture from three ancestral groups, two of them (in orange and yellow) almost restricted to this CEU and the last one, coloured in turquoise, present in all CEUs although with higher proprotions in NM, SWS and CM. NM was resolved as admixed with the contribution of all ancestral populations. The PCA results are consistent with the main genetic groups found with snmf, showing a clear delimitation of SM, SWS + SES and NEM+SEM groups, but partial overlap between the NM and CM groups (Figure 2B, C). RADseq genetic differentiation among the seven CEUs (Tab. S2. in supplementary material) was moderate with an overall Gst of 11%; all values above the mean involved a west-east differentiation, the highest differentiation being SES between and SEM (Gst = 19%).
Impact of dispersals on the genetic diversity and structure of