Phylogenetic analysis and divergence time estimation
To determine the evolutionary position of T. dalaica, its gene models
were compared with those of six related fish species in
Cypriniformes: two in Cobitidae (T. siluroides and T. tibetana) and four in Cyprinidae (Anabarilius grahami, D. rerio, Danionella translucida, and Megalobrama
amblycephala). Gene models of these species were downloaded from the
NCBI database and used to identify putative orthologous gene
families with the OrthoMCL v2.0.9 pipeline (L. Li, Stoeckert, &
Roos, 2003) under default settings.
Single-copy orthologous genes were used for both molecular
phylogenetic analysis and subsequent divergence time estimation.
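
As an illustration of how single-copy families can be selected, the following minimal sketch filters a list of gene families written one per line as "group: taxon|gene taxon|gene ...", the layout generally used in OrthoMCL group files. The function name, the taxon codes, and the example families are hypothetical and are not taken from the analysis described here.

```python
from collections import Counter


def single_copy_groups(lines, species):
    """Return {group_id: {taxon: gene_id}} for families that contain
    exactly one gene from every taxon in `species`."""
    wanted = set(species)
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        # Each member is assumed to be written as "taxon|gene_id".
        pairs = [member.split("|", 1) for member in members.split()]
        counts = Counter(taxon for taxon, _ in pairs)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            result[group_id.strip()] = {taxon: gene for taxon, gene in pairs}
    return result


# Hypothetical families for three taxa (codes are illustrative only).
example = [
    "OG1: tdal|g1 tsil|g7 drer|g3",
    "OG2: tdal|g2 tdal|g9 tsil|g8 drer|g4",  # two copies in tdal
    "OG3: tdal|g5 drer|g6",                  # absent from tsil
]
kept = single_copy_groups(example, ["tdal", "tsil", "drer"])
print(sorted(kept))
```

Only families that pass both conditions (every species present, one copy each) would be carried forward to alignment.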
Briefly, deduced protein sequences were aligned using MUSCLE (Edgar,
2004) and then back-translated into nucleotide sequences, and
highly variable regions were filtered out using Gblocks v0.91b (Talavera &
Castresana, 2007). The alignments were then concatenated and analyzed with RAxML
v8.2.11 (Stamatakis, 2014) to infer the phylogeny under the
GTRGAMMA model. To assess topological robustness, 100 bootstrap
replicates were performed. Divergence times for T. dalaica and related
fishes were estimated using MCMCTree, included
in the PAML package v4.9e (Yang, 2007), with known divergence times
downloaded from TimeTree (http://www.timetree.org/) set as calibration
points. The MCMCTree parameters were set as follows: clock = 2, RootAge
≤ 1.73, model = 7, BDparas = 1 1 0, kappa_gamma = 6 2, alpha_gamma = 1 1,
rgene_gamma = 2 3.18, and sigma2_gamma = 1 4.5.
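
The back-translation and concatenation steps can be sketched as follows. This is a minimal illustration, not the scripts used in the study: it assumes each coding sequence is in frame, carries no stop codon, and matches its protein residue for residue. The function names, species codes, and sequences are hypothetical.

```python
def thread_codons(aligned_protein, cds):
    """Map an unaligned coding sequence onto its aligned protein.
    Every gap in the protein alignment becomes a '---' codon gap."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein sequence")
    remaining = iter(codons)
    return "".join("---" if aa == "-" else next(remaining)
                   for aa in aligned_protein)


def concatenate(alignments):
    """Join per-gene alignments ({species: sequence}) into one supermatrix.
    All genes are assumed to contain the same set of species."""
    species = sorted(alignments[0])
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


# Two hypothetical genes for two hypothetical species codes.
gene1 = {
    "tdal": thread_codons("MK-L", "ATGAAACTG"),
    "drer": thread_codons("MKAL", "ATGAAGGCTCTG"),
}
gene2 = {
    "tdal": thread_codons("M-", "ATG"),
    "drer": thread_codons("MS", "ATGTCC"),
}
supermatrix = concatenate([gene1, gene2])
for name, seq in supermatrix.items():
    print(name, seq)
```

Because gaps are inserted as whole codons, the reading frame is preserved in the nucleotide alignment, which matters for any downstream codon-based analysis.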
Gene family dynamics and positively selected T. dalaica candidate genes
Based on the divergence times estimated for T. dalaica and related
species, and on the gene families identified using OrthoMCL,
expansions and contractions of gene families in these
genomes were detected using Computational Analysis of gene Family
Evolution (CAFE) v4.0.1 (De Bie, Cristianini, Demuth, & Hahn, 2006).
To detect positively selected genes more precisely, specifically those likely
to be related to alkaline acclimation, a new set of orthologous gene
groups was identified using a reciprocal best hits strategy: the gene
models of the three publicly available Triplophysa species were compared
using BLAST v2.2.30, with an E-value cutoff ≤ 1e-05.
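
The reciprocal best hits criterion can be illustrated for a single pair of species with the minimal sketch below. It assumes hits have already been parsed into (query, subject, E-value, bit score) tuples and that the best hit is the one with the highest bit score; the function names and example hits are hypothetical, and how ties or the three-species comparison were handled in the actual analysis is not specified here.

```python
def best_hits(rows, max_evalue=1e-5):
    """Best subject per query from (query, subject, evalue, bitscore) rows,
    keeping only hits at or below the E-value cutoff."""
    best = {}
    for query, subject, evalue, bitscore in rows:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-5):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b, max_evalue)
    backward = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical hits between species A and species B.
a_vs_b = [
    ("a1", "b1", 1e-50, 200.0),
    ("a1", "b2", 1e-20, 90.0),
    ("a2", "b2", 1e-30, 120.0),
    ("a3", "b3", 1e-3, 30.0),   # fails the E-value cutoff
]
b_vs_a = [
    ("b1", "a1", 1e-48, 198.0),
    ("b2", "a1", 1e-20, 90.0),
    ("b3", "a3", 1e-3, 30.0),   # fails the E-value cutoff
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

For three species, one plausible extension is to keep only groups in which all three pairwise comparisons agree, although this is an assumption rather than a description of the procedure used.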
Multiple sequence alignments were performed using GUIDANCE2 (Sela,
Ashkenazy, Katoh, & Pupko, 2015) for all ortholog groups, with
parameters set as follows: seqType = codon, seqCutoff = 0.3, and
msaProgram = muscle. Next, codeml, included in PAML v4.9e (Yang, 2007),
was used to estimate dN/dS ratios (ω) and to infer the selection pressures
acting on the T. dalaica lineage. To achieve
this, branch-site models (model = 2 and NSsites = 2) were used. For the
null hypothesis, parameters ‘fix_omega’ and ‘omega’ were each set to 1,
while for the alternative hypothesis, ‘fix_omega’ and ‘omega’ were set
to 0 and 1.5, respectively. To check convergence, analyses were
performed twice for each ortholog group, with final p-values
obtained by comparing the likelihood ratio test (LRT) statistic, i.e.
twice the log-likelihood difference between the two models, against a χ2 distribution.
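
The likelihood ratio test can be sketched as follows. The degrees of freedom are not stated above, so this example assumes one degree of freedom (the two models differ in a single parameter, ω), for which the χ2 tail probability can be computed with the standard library as erfc(sqrt(x / 2)). The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test between nested models.
    Returns (statistic, p-value), assuming a chi-square null
    distribution with 1 degree of freedom (an assumption of this sketch)."""
    # The statistic is twice the log-likelihood difference; it is clipped
    # at zero in case numerical noise makes the alternative slightly worse.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with df = 1.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for one ortholog group.
stat, p = lrt_pvalue(lnl_null=-1234.56, lnl_alt=-1230.12)
print(f"2*dlnL = {stat:.2f}, p = {p:.4g}")
```

When many ortholog groups are tested, the resulting p-values would normally also be corrected for multiple testing.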