Scaffolding of the T. dalaica genome
Chromatin interactions are thought to occur more frequently between loci
that are close together on a chromosome than between loci that are far
apart. Using next-generation sequencing, a total of 106 Gb of data,
covering nearly 200X of the estimated genome size, were generated for
the Hi-C library. Based
on interaction relationships detected using Hi-C technology, 141
contigs, covering approximately 96.3% of the whole genome length, were
anchored and orientated onto 25 chromosomes, resulting in a scaffold N50
length of 23.6 Mb. The interaction relationships along each chromosome
are shown in Figure 1. Notably, each chromosome except chromosome 5 had
fewer than 10 introduced gaps. Moreover, our assembly captured long
stretches of telomeric sequence (TAACCC/TTAGGG)n at both ends of 14
chromosomes and at one end of eight chromosomes (Table 1). To
assess the accuracy of the Hi-C scaffolding, gene collinearity between
T. dalaica and two related species, D. rerio and T. tibetana, was
compared, with results indicating nearly perfect collinearity
(Figure 2). Taken together, the high anchoring rate, the small number of
gaps, the presence of telomeric sequences on most chromosomes, and the
nearly perfect collinearity all indicate the high quality of the
present T. dalaica genome assembly.
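
Two of the quality metrics reported above, the scaffold N50 and the presence of terminal telomeric repeats, can be illustrated with a minimal Python sketch. This is not the pipeline used in this study. The function names, the 10 kb terminal window, and the 60-base repeat threshold are all hypothetical choices made for illustration, and only the repeat motif (TAACCC/TTAGGG)n is taken from the text.

```python
import re


def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    The N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Tandem runs (two or more copies) of the telomeric hexamer or its
# reverse complement, matching the (TAACCC/TTAGGG)n motif in the text.
TELOMERE_RE = re.compile(r"(?:TAACCC){2,}|(?:TTAGGG){2,}")


def telomeric_ends(seq, window=10_000, min_bases=60):
    """Report whether each end of a sequence carries telomeric repeats.

    Returns a (left, right) pair of booleans. `window` and `min_bases`
    are illustrative assumptions, not values from the study.
    """
    def has_repeat(fragment):
        fragment = fragment.upper()
        covered = sum(m.end() - m.start()
                      for m in TELOMERE_RE.finditer(fragment))
        return covered >= min_bases

    return has_repeat(seq[:window]), has_repeat(seq[-window:])
```

Under these assumptions, a chromosome counted as having telomeric sequence at both ends would return (True, True), and one with telomeric sequence at a single end would return (True, False) or (False, True). In practice, the thresholds would need tuning against the read length and the repeat lengths observed in the assembly.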