Figure 2. A : Map with sampling sites of Labeobarbussympatric pairs (generalized and thick-lipped ecomorphs) from rivers of the Ethiopian Highlands; the map was created using QGIS v.3.16.4.B : Scheme of lips measurements: LLL – lower lip length, MLL – middle lobe length, ULW – upper lip width.
Morphology. The head length (C ), middle lobe length (MLL ), lower lip length (LLL ), and upper lip width (ULW ) were measured in CorelDRAW 2017 (v. 19.1.0.419) using photographs. Scheme of measurements is given on Fig. 2B. The indexes ofMLL , LLL , and ULW to head length (C ) were used for subsequent analyses.
Gut Length. Gut length (GL ) was measured with a ruler (to the nearest 1 mm). A ratio GL to SL expressed as % was used for analyses.
Diet . Gut content was dried on filter paper and weighed using a Pioneer PX84/E balance with 0.0001 g accuracy. The diet particles were identified using Olympus CX41 microscope (100–1000× magnification) and Motic DMW-143-N2GG stereomicroscope (100–400× magnification). The diet components were grouped into: (i) detritus, (ii) invertebrates, (iii) macrophytes and (iv) fish. The group ‘Invertebrates’ included the larvae of amphibiotic insects, and their fragments. The group ‘Macrophytes’ included any fragments of plants—such as leaves, stems or seeds. A composite measure of diet, an index of relative importance (IR ) [Natarajan, Jhingran, 1961; Popova, Reshetnikov, 2011], was used to assess the contribution of different components to the diet. TheIR index was calculated as follows:\(IR\ =\ \ \frac{Fi\ \times\ Pi}{\sum(Fi\ \times\ Pi)}\ \times\ 100\%\), where Fi = the frequency of occurrence of each food group, andPi = its part by weight; the value of i itself changes from 1 to n (n = the part of food organisms in the food bolus).
Stable Isotope Composition. For stable isotope (SI) analyses, white muscle tissue from the dorsal side of the body under the dorsal fin was sampled from freshly collected specimens. White muscle samples were dried at 60 °C for subsequent SI analyses. The samples were weighed using a Mettler Toledo MX5 microbalance (Mettler Toledo, Columbus, OH, United States) with 2 μg accuracy, and were wrapped in tin capsules. The weight of the fish tissue samples varied from 250 to 500 μg. SI analysis was conducted at the Joint Usage Center of the A.N. Severtsov Institute of Ecology and Evolution RAS, Moscow. Briefly, a Thermo Delta V Plus continuous-flow IRMS was coupled to an elemental analyzer (Flash 1112) equipped with a Thermo No-Blank device. The isotopic composition of N and C was expressed in the ‰. Along with the isotopic analysis, the nitrogen and carbon content (as %) and C/N ratios were determined. In total, 237 white-muscle samples were analyzed.
Statistical Analyses of Morphological and Ecological Data.Several R packages and functions were used for the statistical analyses and plot construction. Basal descriptive statistics was obtained using the summarytools library [Comtois, 2022]. The Mann-Whitney U test was applied for pairwise comparison of the lipped and generalized ecomorphs in lip characters MLL , LLL , ULW , gut length GL , and SI composition within each locality using the function wilcox.test (in package stats ) [R Core Team, 2021]. The Pearson correlation and the violin boxplots were obtained using the ggplot2 library [Wickham, 2022]. Principal component analysis (PCA) was done using the prcomp function. For visualization of PCA results, the packages factoextra library [Kassambara, Mundt, 2020], ggfortify library [Tang et al., 2016], and ggplot2 library [Wickham, 2022] were used. The package SIBER v.2.1.6 [Jackson et al., 2011] was used to assess the differences in the isotopic trophic niche features. The total convex hull areas (TA), core trophic niche breadths, and sample size-corrected standard ellipse area (SEAc) were estimated. The trophic overlap for 95% TA was estimated using nicheROVER [Lysy et al., 2021], a method that is insensitive to the sample size and incorporates statistical uncertainty using Bayesian approach [Swanson et al., 2015].
DNA sampling, extraction, amplification, sequencing, and analysis – mtDNA data. DNA samples (n  = 213) were collected from both generalized and lipped ecomorphs of Labeobarbus  from the same six localities in Ethiopian Highlands (Fig. 2B; Supplementary Table S1 for details). Total genomic DNA was extracted from ethanol-preserved fin tissues using the QIAamp DNA Micro kit (Qiagen). Sequences of the mitochondrial gene cytochrome b  (cytb ), 1038 bp in length, were amplified (polymerase chain reaction [PCR] conditions were taken from: Palumbi, 1996; Perdices & Doadrio, 2001). PCR products were visualized on 1.5% agarose gels, purified with ExoSAP-IT and sequenced using an ABI 3500 sequencer. Some sequences were obtained previously (Levin et al., 2020; GenBank accession nos. are given in Table S1) while new sequences obtained in this study were deposited in GenBank under accession nos. OQ604627-OQ604649 (see Table S1 for details). All sequences were aligned and edited using the muscle algorithm (Edgar, 2004) as implemented in mega 6.0 (Tamura et al., 2013). The data final set is comprised of 213 cytb  sequences. A haplotype network was constructed using the median joining algorithm (Bandelt et al., 1999) in popart 1.7 (Leigh & Bryant, 2015) with the default value of epsilon (0).
ddRAD-seq library preparation . High-molecular-weight DNA was isolated from fin tissue preserved in ethanol using a QIAamp DNA Mini Kit (Qiagen) or obtained with a salt-based DNA extraction method (Aljanabi & Martinez, 1997) followed by purification using a CleanUp Standard kit (Evrogen). The quantity of dsDNA was measured using a dsDNA HS Assay Kit for fluorometer Qubit 3 (Life Technologies). A ddRAD-library was constructed following the quaddRAD protocol (Franchini et al., 2017) using restriction enzymes Pst I and Msp I. In total, 63 DNA samples of Labeobarbus  ecomorphs from five riverine basins (see Table 2) were sequenced by two runs of Illumina HiSeq2500 and Illumina X Ten (2 × 150 bp paired-end reads). The raw sequencing data from 63 samples were demultiplexed by the sequencing provider using outer Illumina TruSeq dual indexes.
Processing of RAD-seq data. Read quality was assessed with fastqc 0.11.7 (Andrews & Krueger, 2010) and multiqc 1.13 (Ewels et al., 2016). Further demultiplexing of individually barcoded samples, construction and cataloging of RAD-loci and single nucleotide polymorphism (SNP) calling were done with stacks 2.62 (Rochette et al., 2019). Identification and removal of PCR duplicates were done using the “clone_filter”  module of stacks. The stacks module “process_radtags” was used to demultiplex reads by the dual index inner barcodes and obtain separate fastq files for each individual. Demultiplexed reads were trimmed at their 5′- and 3′- ends by 5 bp to a uniform length of 140 bp using fastp 0.20.1 (Chen et al., 2018) to reduce the influence of sequencing error (due to decreased base quality at both ends). Samples that failed to produce more than 100,000 reads were excluded from further processing. The ‘integrated’ approach of stacks was used to assemble loci de novo and perform genotype calling after mapping assembled loci to a heterospecific high-quality reference genome of Barbus barbus (GenBank assembly accession: GCA_936440315.1) and filtering. We selected optimal parameters for de novo loci assembly using the approach suggested by Paris et al. (2017). Following the aforementioned procedure, we found that a minimum stack depth (-m ) of 6, distance allowed between stacks (-M ) of 1 and maximum distance required to merge catalog loci (-n ) of 1 provided the best balance between data quality and quantity for our data set. We also set –max_locus_stacks parameter to 7 to improve binning and avoid paralogs (Stobie et al. 2017). We align the loci catalog generated in the de novo assembly to the reference genome using bwa mem v.0.7.17 (Li & Durbin, 2009) with default settings. To avoid spurious alignments of de novo loci ‘stacks-integrate-alignments’ was run with minimum alignment coverage and percent identity are both set to 0.8.
Population genomic and phylogenomic analyses . To prove that sympatric ecomoprhs of Labeobarbus from different basins are independent evolutionary units we used maximum likelihood phylogenetic analysis based on SNPs in iq-tree 1.6.12 (Minh et al., 2020). Multiple sequence alignment (MSA) of SNPs were created using the “–phylip-var” option of the “populations ” module of stacks with retention of loci genotyped in at least 70% of all samples and SNPs with a minor allele count above 3. Heterozygous sites within each individual were encoded using IUPAC notation. Invariant sites (arising due to missing data) were excluded from the MSA by IQ-TREE, resulting in 15,820 nucleotide sites. To take into account the absence of constant sites an ascertainment bias correction (+ASC) model (Lewis, 2001) was applied to all substitution models in a best-fit model selection process with ModelFinder (REF). Branch support values were obtained using an ultrafast bootstrap procedure (Hoang et al., 2018) with 1,000 replicates. The phylogenetic tree was visualized using figtree 1.4.4 (Rambaut, 2014).
STRUCTURE 2.3.4 (Pritchard et al., 2000) was used to examine population structure of the whole dataset and within each pair of ecomorphs (i.e. basin). First, all individual genotypes were filtered and tested for deviations from Hardy–Weinberg Equilibrium (HWE) using the “populations ” module of stacks with the following settings: (i) loci genotyped in at least 80% of all samples were kept; (ii) SNPs with a minor allele count (–min-mac) less than 3 were pruned. Next, we whitelisted loci that are in HWE for all populations and created a structure-formatted file with “populations ” while retaining a single random SNP per locus to avoid inclusion of closely linked SNPs. This dataset consisted of 2,411 SNPs. We performed 16 independent runs (100,000 chains as burn-in plus 100,000 MCMC chains) of STRUCTURE for each K between 1 and 8 using the admixture model with correlated allele frequencies. Structure_threader (Pina-Martins et al., 2017) was used to parallelize the runs. Results of the Structure runs were summarized using CLUMPAK webserver (Kopelman et al., 2015) with default settings. The optimal K-value was determined by the approaches of Pritchard et al. (2000) and Evanno et al. (2005). The same protocol was followed for consecutive hierarchical STRUCTURE runs for the identified clusters until no subdivision was revealed (i.e., K = 1). For these runs we created new SNP sets using the same procedure as described above but limiting genotypes to the individuals from identified genetic clusters. In addition, Principal Component Analysis (PCA) was performed using the glPca function of the ADEGENET 2.1.1 R-package (Jombart and Ahmed, 2011). Pairwise Reich–Patterson F ST values (Reich et al., 2009) with respective 95% confidence intervals for ecomorphs/genetic clusters were calculated using the R script from Junker et al. (2020). The PCA and Reich–Patterson F ST calculations were done on the aforementioned 2,411 SNPs dataset.