Sample collection and sequencing
A female T. dalaica (Figure 1A) was collected from Lake Dali Nur in Inner Mongolia (43°22’43”N, 116°39’24”E; sampling site 1 in Figure 1B) and used for DNA sequencing. The fish was anesthetized with tricaine (MS-222) and dissected immediately. Total genomic DNA was extracted from muscle tissue using the standard phenol/chloroform extraction method. A paired-end library with an insert size of 400 bp was constructed according to standard Illumina procedures (Illumina, San Diego, CA, USA) and sequenced on a HiSeq 2500 system in 150 bp PE mode. The extracted DNA was also used to construct two 20 kb libraries according to the manufacturer's protocols (Pacific Biosciences, CA, USA). These libraries were then sequenced using one cell of the PacBio Sequel II sequencing platform. For comparison, another freshwater T. dalaica, from the Hai River in Henan province (35°54’28.0”N, 113°51’26.0”E; sampling site 2 in Figure 1B), was also used for Illumina library construction and sequencing on the HiSeq 2500 platform, generating millions of 150 bp paired-end reads.
To obtain as much expression evidence as possible for gene prediction, total RNA was extracted from eight tissues (intestine, liver, brain, heart, muscle, gill, spleen, and ovary) of the same individual used for de novo sequencing, using a total RNA purification kit (Takara Bio). For each tissue, one RNA-sequencing library with an insert size of 350 bp was constructed and sequenced on the Illumina HiSeq 2500 platform in 150 bp PE mode.