Figure 2. A : Map with sampling sites of Labeobarbussympatric pairs (generalized and thick-lipped ecomorphs) from rivers of
the Ethiopian Highlands; the map was created using QGIS v.3.16.4.B : Scheme of lips measurements: LLL – lower lip length, MLL –
middle lobe length, ULW – upper lip width.
Morphology. The head length (C ), middle lobe length
(MLL ), lower lip length (LLL ), and upper lip width
(ULW ) were measured in CorelDRAW 2017 (v. 19.1.0.419) using
photographs. Scheme of measurements is given on Fig. 2B. The indexes ofMLL , LLL , and ULW to head length (C ) were
used for subsequent analyses.
Gut Length. Gut length (GL ) was measured with a ruler (to
the nearest 1 mm). A ratio GL to SL expressed as % was
used for analyses.
Diet . Gut content was dried on filter paper and weighed using a
Pioneer PX84/E balance with 0.0001 g accuracy. The diet particles were
identified using Olympus CX41 microscope (100–1000× magnification) and
Motic DMW-143-N2GG stereomicroscope (100–400× magnification). The diet
components were grouped into: (i) detritus, (ii) invertebrates, (iii)
macrophytes and (iv) fish. The group ‘Invertebrates’ included the larvae
of amphibiotic insects, and their fragments. The group ‘Macrophytes’
included any fragments of plants—such as leaves, stems or seeds. A
composite measure of diet, an index of relative importance (IR )
[Natarajan, Jhingran, 1961; Popova, Reshetnikov, 2011], was used to
assess the contribution of different components to the diet. TheIR index was calculated as follows:\(IR\ =\ \ \frac{Fi\ \times\ Pi}{\sum(Fi\ \times\ Pi)}\ \times\ 100\%\),
where Fi = the frequency of occurrence of each food group, andPi = its part by weight; the value of i itself changes
from 1 to n (n = the part of food organisms in the food
bolus).
Stable Isotope Composition. For stable isotope (SI) analyses,
white muscle tissue from the dorsal side of the body under the dorsal
fin was sampled from freshly collected specimens. White muscle samples
were dried at 60 °C for subsequent SI analyses. The samples were weighed
using a Mettler Toledo MX5 microbalance (Mettler Toledo, Columbus, OH,
United States) with 2 μg accuracy, and were wrapped in tin capsules. The
weight of the fish tissue samples varied from 250 to 500 μg. SI analysis
was conducted at the Joint Usage Center of the A.N. Severtsov Institute
of Ecology and Evolution RAS, Moscow. Briefly, a Thermo Delta V Plus
continuous-flow IRMS was coupled to an elemental analyzer (Flash 1112)
equipped with a Thermo No-Blank device. The isotopic composition of N
and C was expressed in the ‰. Along with the isotopic analysis, the
nitrogen and carbon content (as %) and C/N ratios were determined. In
total, 237 white-muscle samples were analyzed.
Statistical Analyses of Morphological and Ecological Data.Several R packages and functions were used for the statistical analyses
and plot construction. Basal descriptive statistics was obtained using
the summarytools library [Comtois, 2022]. The Mann-Whitney U
test was applied for pairwise comparison of the lipped and generalized
ecomorphs in lip characters MLL , LLL , ULW , gut
length GL , and SI composition within each locality using the
function wilcox.test (in package stats ) [R Core Team,
2021]. The Pearson correlation and the violin boxplots were obtained
using the ggplot2 library [Wickham, 2022]. Principal
component analysis (PCA) was done using the prcomp function. For
visualization of PCA results, the packages factoextra library
[Kassambara, Mundt, 2020], ggfortify library [Tang et al.,
2016], and ggplot2 library [Wickham, 2022] were used. The
package SIBER v.2.1.6 [Jackson et al., 2011] was used to assess the
differences in the isotopic trophic niche features. The total convex
hull areas (TA), core trophic niche breadths, and sample size-corrected
standard ellipse area (SEAc) were estimated. The trophic overlap for
95% TA was estimated using nicheROVER [Lysy et al., 2021], a method
that is insensitive to the sample size and incorporates statistical
uncertainty using Bayesian approach [Swanson et al., 2015].
DNA sampling, extraction, amplification, sequencing, and analysis
– mtDNA data. DNA samples
(n = 213) were collected from both generalized and lipped
ecomorphs of Labeobarbus from the same six localities in
Ethiopian Highlands (Fig. 2B; Supplementary
Table S1 for
details). Total genomic DNA was extracted from ethanol-preserved fin
tissues using the QIAamp DNA Micro kit (Qiagen). Sequences of the
mitochondrial gene
cytochrome b (cytb ),
1038 bp in length, were amplified
(polymerase chain reaction [PCR] conditions were taken from:
Palumbi, 1996; Perdices & Doadrio, 2001). PCR products were visualized
on 1.5% agarose gels, purified with ExoSAP-IT and sequenced using an
ABI 3500 sequencer. Some sequences were obtained previously (Levin et
al., 2020; GenBank accession nos. are given in Table S1) while new
sequences obtained in this study were deposited in GenBank under
accession nos. OQ604627-OQ604649 (see Table S1 for details). All
sequences were aligned and edited using the muscle algorithm
(Edgar, 2004) as implemented in mega 6.0 (Tamura et al., 2013).
The data final set is comprised of 213 cytb sequences. A
haplotype network was constructed using the median joining algorithm
(Bandelt et al., 1999) in popart 1.7 (Leigh & Bryant, 2015)
with the default value of epsilon (0).
ddRAD-seq library preparation . High-molecular-weight DNA was
isolated from fin tissue preserved in ethanol using a QIAamp DNA Mini
Kit (Qiagen) or obtained with a salt-based DNA extraction method
(Aljanabi & Martinez, 1997) followed by purification using a CleanUp
Standard kit (Evrogen). The quantity of dsDNA was measured using a dsDNA
HS Assay Kit for fluorometer Qubit 3 (Life Technologies). A
ddRAD-library was constructed following the quaddRAD protocol (Franchini
et al., 2017) using restriction enzymes Pst I and Msp I. In
total, 63 DNA samples of Labeobarbus ecomorphs from five riverine
basins (see
Table 2)
were sequenced by two runs of Illumina HiSeq2500 and Illumina X Ten
(2 × 150 bp paired-end reads). The raw sequencing data from 63 samples
were demultiplexed by the sequencing provider using outer Illumina
TruSeq dual indexes.
Processing of RAD-seq data. Read quality was assessed
with fastqc 0.11.7 (Andrews & Krueger, 2010)
and multiqc 1.13 (Ewels et
al., 2016).
Further demultiplexing of individually barcoded samples, construction
and cataloging of RAD-loci and single nucleotide polymorphism (SNP)
calling were done with stacks 2.62 (Rochette et
al., 2019).
Identification and removal of PCR duplicates were done using
the “clone_filter” module of stacks. The
stacks module “process_radtags” was used to
demultiplex reads by the dual index inner barcodes and obtain separate
fastq files for each individual. Demultiplexed reads were trimmed at
their 5′- and 3′- ends by 5 bp to a uniform length of 140 bp using fastp
0.20.1 (Chen et al., 2018) to reduce the influence of sequencing error
(due to decreased base quality at both ends). Samples that failed to
produce more than 100,000 reads were excluded from further processing.
The ‘integrated’ approach of stacks was used to assemble
loci de novo and perform genotype calling after mapping assembled
loci to a heterospecific high-quality reference genome of Barbus
barbus (GenBank assembly accession: GCA_936440315.1) and filtering. We
selected optimal parameters for de novo loci assembly using the
approach suggested by Paris et al. (2017). Following the aforementioned
procedure, we found that a minimum stack depth (-m ) of 6,
distance allowed between stacks (-M ) of 1 and maximum distance
required to merge catalog loci (-n ) of 1 provided the best
balance between data quality and quantity for our data set. We also set
–max_locus_stacks parameter to 7 to improve binning and avoid
paralogs (Stobie et al. 2017). We align the loci catalog generated in
the de novo assembly to the reference genome using bwa mem
v.0.7.17 (Li & Durbin, 2009) with default settings. To avoid spurious
alignments of de novo loci ‘stacks-integrate-alignments’ was run
with minimum alignment coverage and percent identity are both set to
0.8.
Population genomic and phylogenomic analyses . To prove that
sympatric ecomoprhs of Labeobarbus from different basins are
independent evolutionary units we used maximum likelihood phylogenetic
analysis based on SNPs in iq-tree 1.6.12 (Minh et al., 2020).
Multiple sequence alignment (MSA) of SNPs were created using the
“–phylip-var” option of the “populations ” module
of stacks with retention of loci genotyped in at least 70% of
all samples and SNPs with a minor allele count above 3. Heterozygous
sites within each individual were encoded using IUPAC notation.
Invariant sites (arising due to missing data) were excluded from the MSA
by IQ-TREE, resulting in 15,820 nucleotide sites. To take into account
the absence of constant sites an ascertainment bias correction (+ASC)
model (Lewis, 2001) was applied to all substitution models in a best-fit
model selection process with ModelFinder (REF). Branch support values
were obtained using an ultrafast bootstrap procedure (Hoang et
al., 2018) with 1,000 replicates. The phylogenetic tree was visualized
using figtree 1.4.4 (Rambaut, 2014).
STRUCTURE 2.3.4 (Pritchard et al., 2000) was used to examine population
structure of the whole dataset and within each pair of ecomorphs (i.e.
basin). First, all individual genotypes were filtered and tested for
deviations from Hardy–Weinberg Equilibrium (HWE) using the
“populations ” module of stacks with the following
settings: (i) loci genotyped in at least 80% of all samples were kept;
(ii) SNPs with a minor allele count (–min-mac) less than 3 were
pruned. Next, we whitelisted loci that are in HWE for all populations
and created a structure-formatted file with “populations ” while
retaining a single random SNP per locus to avoid inclusion of closely
linked SNPs. This dataset consisted of 2,411 SNPs. We performed 16
independent runs (100,000 chains as burn-in plus 100,000 MCMC chains) of
STRUCTURE for each K between 1 and 8 using the admixture model with
correlated allele frequencies. Structure_threader (Pina-Martins et al.,
2017) was used to parallelize the runs. Results of the Structure runs
were summarized using CLUMPAK webserver (Kopelman et al., 2015) with
default settings. The optimal K-value was determined by the approaches
of Pritchard et al. (2000) and Evanno et al. (2005). The same protocol
was followed for consecutive hierarchical STRUCTURE runs for the
identified clusters until no subdivision was revealed (i.e., K = 1). For
these runs we created new SNP sets using the same procedure as described
above but limiting genotypes to the individuals from identified genetic
clusters. In addition, Principal Component Analysis (PCA) was performed
using the glPca function of the ADEGENET 2.1.1 R-package (Jombart and
Ahmed, 2011). Pairwise
Reich–Patterson F ST values (Reich et al., 2009)
with respective 95% confidence intervals for ecomorphs/genetic clusters
were calculated using the R script from Junker et al. (2020). The PCA
and Reich–Patterson F ST calculations were done
on the aforementioned 2,411 SNPs dataset.