Scaffolding of the T. dalaica genome
Chromatin interactions are thought to occur more frequently between loci that lie close together on a chromosome than between loci that lie far apart. Using next-generation sequencing, a total of 106 Gb of data, corresponding to nearly 200X coverage of the estimated genome size, was generated for the Hi-C library. Based on the interaction relationships detected by Hi-C, 141 contigs, covering approximately 96.3% of the whole genome length, were anchored and oriented onto 25 chromosomes, resulting in a scaffold N50 length of 23.6 Mb. The interaction relationships along each chromosome are shown in Figure 1. Notably, every chromosome except chromosome 5 had fewer than 10 introduced gaps. Moreover, our assembly captured long stretches of telomeric sequence (TAACCC/TTAGGG)n at both ends of 14 chromosomes and at a single end of eight chromosomes (Table 1). To assess the accuracy of the Hi-C-based scaffolding, gene collinearity was compared between T. dalaica and two related species, D. rerio and T. tibetana, and the results indicated nearly perfect collinearity (Figure 2). Taken together, the high anchoring rate, the small number of gaps, the presence of telomeric sequences on most chromosomes, and the nearly perfect collinearity all indicate the high quality of the present T. dalaica genome assembly.
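For readers unfamiliar with the two summary statistics reported above, the following minimal sketch shows how a scaffold N50 and an anchoring rate are conventionally computed from a list of sequence lengths. The function names `n50` and `anchoring_rate` and the toy lengths are illustrative assumptions; they are not the pipeline or the data used for the T. dalaica assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def anchoring_rate(anchored_lengths, all_lengths):
    """Fraction of the total assembly length placed onto chromosomes."""
    return sum(anchored_lengths) / sum(all_lengths)


# Toy example with made-up lengths (arbitrary units), not real assembly data.
toy_all = [50, 40, 30, 20, 10]
toy_anchored = [50, 40, 30, 20]  # assume the shortest sequence stays unplaced

print("N50:", n50(toy_all))
print("Anchoring rate:", round(anchoring_rate(toy_anchored, toy_all), 3))
```

In this toy case the two longest sequences (50 + 40) already cover at least half of the total of 150, so the N50 is 40, and the anchoring rate is 140/150. Applied to real data, the inputs would be the scaffold lengths of the assembly and the lengths of the contigs anchored onto chromosomes.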