Phylogenetic analysis and divergence time estimation
To ascertain the evolutionary position of T. dalaica, gene models from its genome were compared with those of six related fishes in Cypriniformes: two in Cobitidae (T. siluroides and T. tibetana) and four in Cyprinidae (Anabarilius grahami, D. rerio, Danionella translucida, and Megalobrama amblycephala). Gene models of these species were downloaded from the NCBI database and used to identify putative orthologous gene families with the OrthoMCL v2.0.9 pipeline (L. Li, Stoeckert, & Roos, 2003) under default settings.
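As an illustration of how single-copy families can be picked out of such a clustering, the following minimal Python sketch parses OrthoMCL-style group lines (`group_id: species|gene ...`) and keeps the groups that contain exactly one gene from each of the seven species. The species codes, gene identifiers, and function names are hypothetical placeholders, not the identifiers used in the actual analysis.

```python
from collections import Counter


def parse_groups(lines):
    """Parse OrthoMCL-style lines of the form 'group_id: sp1|geneA sp2|geneB ...'."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = members.split()
    return groups


def single_copy_groups(groups, species):
    """Keep groups that hold exactly one gene from every species in `species`."""
    wanted = set(species)
    selected = {}
    for group_id, members in groups.items():
        counts = Counter(m.split("|", 1)[0] for m in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected[group_id] = members
    return selected


# Hypothetical species codes and gene IDs, for illustration only.
SPECIES = ["tdal", "tsil", "ttib", "agra", "drer", "dtra", "mamb"]
example = [
    "G1: tdal|g1 tsil|g7 ttib|g3 agra|g9 drer|g2 dtra|g5 mamb|g8",
    "G2: tdal|g4 tdal|g6 tsil|g1 ttib|g2 agra|g3 drer|g4 dtra|g6 mamb|g1",
    "G3: tdal|g9 tsil|g2 drer|g7",
]
single_copy = single_copy_groups(parse_groups(example), SPECIES)
print(sorted(single_copy))
```

In this toy input only G1 qualifies: G2 carries two T. dalaica genes and G3 lacks four of the species.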
Single-copy orthologous genes were used for both the molecular phylogenetic analysis and the subsequent divergence time estimation. Briefly, deduced protein sequences were aligned using MUSCLE (Edgar, 2004), the alignments were converted back into the corresponding nucleotide sequences, and highly variable regions were filtered out using Gblocks v0.91b (Talavera & Castresana, 2007). The filtered alignments were then concatenated and passed to RAxML v8.2.11 (Stamatakis, 2014) for phylogenetic analysis under the GTRGAMMA model. Topological robustness was assessed with 100 bootstrap replicates. Divergence times for T. dalaica and the related fishes were estimated using MCMCTree, included in the PAML package v4.9e (Yang, 2007), with known divergence times obtained from TimeTree (http://www.timetree.org/) set as calibration points. The MCMCTree parameters were set as follows: clock = 2, RootAge ≤ 1.73, model = 7, BDparas = 1 1 0, kappa_gamma = 6 2, alpha_gamma = 1 1, rgene_gamma = 2 3.18, and sigma2_gamma = 1 4.5.
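The conversion of a protein alignment back into nucleotides, followed by concatenation, can be sketched as below. This is a simplified illustration under stated assumptions rather than the tool used in the study: each coding sequence is assumed to be in frame and to start at the first residue of its protein, the translation is not verified, and the function names and toy sequences are hypothetical.

```python
def back_translate(aligned_protein, cds):
    """Thread an unaligned CDS onto its aligned protein, codon by codon."""
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is shorter than the protein it should encode")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")  # a protein gap becomes a three-base gap
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


def concatenate(alignments, taxa):
    """Join per-gene codon alignments into one supermatrix row per taxon."""
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


# Toy example with two taxa and two genes (hypothetical sequences).
gene1 = {
    "a": back_translate("M-KT", "ATGAAAACC"),
    "b": back_translate("MGKT", "ATGGGTAAAACA"),
}
gene2 = {"a": "CCC", "b": "CCT"}
supermatrix = concatenate([gene1, gene2], ["a", "b"])
print(supermatrix)
```

Keeping gaps in multiples of three preserves the reading frame, which matters for any downstream codon-based analysis.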
Gene family dynamics and positively selected T. dalaica candidate genes
Based on the divergence times estimated for T. dalaica and related species, together with the gene families identified using OrthoMCL, expansions and contractions of gene families in these genomes were detected using CAFE (Computational Analysis of gene Family Evolution) v4.0.1 (De Bie, Cristianini, Demuth, & Hahn, 2006).
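CAFE operates on a table of gene counts per family and species. A minimal sketch of how such a table could be derived from clustered gene families is given below; the tab-delimited layout (a description column, a family ID column, then one count column per species) follows the commonly used CAFE input convention, and the species codes, family IDs, and function name are hypothetical.

```python
from collections import Counter


def family_count_table(groups, species):
    """Return tab-delimited rows: Desc, Family ID, then one count per species."""
    rows = ["\t".join(["Desc", "Family ID"] + list(species))]
    for family_id in sorted(groups):
        # Members are assumed to be written as 'species|gene'.
        counts = Counter(m.split("|", 1)[0] for m in groups[family_id])
        row = ["(null)", family_id] + [str(counts.get(sp, 0)) for sp in species]
        rows.append("\t".join(row))
    return rows


# Hypothetical families and species codes, for illustration only.
families = {
    "G1": ["tdal|g1", "tdal|g2", "tsil|g7"],
    "G2": ["tsil|g3"],
}
rows = family_count_table(families, ["tdal", "tsil"])
print("\n".join(rows))
```

Species absent from a family receive a count of zero, so that every row has the same number of columns.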
To detect positively selected genes, particularly those likely to be related to alkaline acclimation, a new set of orthologous gene groups was identified using a reciprocal best hits strategy. Gene models of three Triplophysa species with publicly available gene models were compared using BLAST v2.2.30 with an E-value cutoff of ≤ 1e-05. Multiple sequence alignments were performed for all ortholog groups using GUIDANCE2 (Sela, Ashkenazy, Katoh, & Pupko, 2015), with parameters set as follows: seqType = codon, seqCutoff = 0.3, and msaProgram = muscle. Next, codeml, included in PAML v4.9e (Yang, 2007), was used to estimate dN/dS ratios (ω) and thereby infer the selection pressures that have shaped the evolution of T. dalaica. Branch-site models (model = 2 and NSsites = 2) were used for this purpose. For the null hypothesis, the parameters ‘fix_omega’ and ‘omega’ were each set to 1, while for the alternative hypothesis, ‘fix_omega’ and ‘omega’ were set to 0 and 1.5, respectively. To check convergence, analyses were performed twice for each ortholog group. Final p-values were obtained by comparing the likelihood ratio test (LRT) statistic, that is, twice the difference in log-likelihood between the two models, with a χ2 distribution.
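The likelihood ratio test can be illustrated with a short Python sketch. The degrees of freedom are not stated above; the sketch assumes one degree of freedom, which is a common choice for the branch-site test, and uses the closed form of the χ2 survival function for that case. The log-likelihood values and the function name are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test assuming a chi-squared distribution with 1 d.f.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value.
    """
    # Small negative values can arise from optimisation noise; clamp them to zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for one ortholog group.
stat, p_value = lrt_pvalue(lnl_null=-1234.56, lnl_alt=-1231.06)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.5f}")
```

When many ortholog groups are tested, the resulting p-values would normally also be subjected to a multiple-testing correction; whether and how this was done is not specified here.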