Genome annotations
Generally, annotations of a newly assembled genome include repeat, gene
model, and gene function annotations. For repeat annotations, a total of
197,396 SSRs were identified using MISA. Combining the homology-based and
de novo results, repeat sequences accounted for 35.01% of the T. dalaica
genome assembly (Table 2), of which DNA transposons made up the greatest
proportion (16.01%), followed by LTRs (8.9%) and LINEs (4.24%).
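As an illustration of what perfect SSR detection involves, the following minimal Python sketch scans a sequence for tandemly repeated motifs of 1 to 6 bp. It is not MISA's implementation; the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration and are not the thresholds reported for this study.

```python
import re

# Hypothetical thresholds (motif length -> minimum number of repeats).
# These are illustrative assumptions, not the settings used in the paper.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return perfect SSRs as (start, end, motif, repeat_count).

    Coordinates are 0-based and half-open. Only A/C/G/T motifs are considered.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A motif of length k followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); those are reported at the shorter length.
            if motif in (motif + motif)[1:-1]:
                pos = m.start() + 1
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
            pos = m.end()
    return sorted(hits)


if __name__ == "__main__":
    demo = "GG" + "A" * 12 + "CC" + "AT" * 6 + "TT" + "CAG" * 5 + "T"
    for hit in find_ssrs(demo):
        print(hit)
```

The sketch handles only perfect repeats on a single in-memory sequence; compound or interrupted SSRs, which dedicated tools can also report, are outside its scope.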
The final set of protein-coding genes was obtained by integrating the
results of ab initio, homology-based, and RNA-seq-based predictions. This
set consisted of 23,925 genes, with an average gene length of 12,128.58 bp,
an average CDS length of 1,715.77 bp, and an average of 9.9 exons per
gene. The distributions of these parameters were similar between
T. dalaica and the species used for annotation (Figure S1),
suggesting both gene conservation and annotation robustness. In
addition, the homology-based ncRNA annotation identified a total of 1,664
miRNAs, 11,504 tRNAs, 684 rRNAs, and 1,207 snRNAs in the genome.
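Summary statistics such as average gene length, average CDS length, and exons per gene can be derived from a gene-model file. The sketch below shows one way to do so from GFF3 lines. It assumes a simple gene > mRNA > exon/CDS hierarchy with one transcript per gene and single-valued `Parent` attributes; the function names are hypothetical, and this is not necessarily how the reported values were computed.

```python
from collections import defaultdict


def parse_attrs(field):
    """Parse a GFF3 attribute column ("ID=x;Parent=y") into a dict."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if "=" in kv)


def gene_model_stats(gff3_lines):
    """Return mean gene length, mean CDS length, and mean exons per gene.

    Assumes one mRNA per gene and that exon/CDS features list that mRNA
    as their single Parent (an assumption made for this sketch).
    """
    gene_len = {}
    tx_to_gene = {}
    children = []  # (feature_type, parent_transcript_id, span_in_bp)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attrs(cols[8])
        span = end - start + 1  # GFF3 coordinates are 1-based and inclusive
        if ftype == "gene":
            gene_len[attrs["ID"]] = span
        elif ftype == "mRNA":
            tx_to_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype in ("exon", "CDS"):
            children.append((ftype, attrs.get("Parent"), span))

    cds_len = defaultdict(int)
    exon_count = defaultdict(int)
    # Resolve children after the full pass so feature order does not matter.
    for ftype, parent, span in children:
        gene = tx_to_gene.get(parent)
        if gene is None:
            continue
        if ftype == "CDS":
            cds_len[gene] += span
        else:
            exon_count[gene] += 1

    n = len(gene_len)
    if n == 0:
        raise ValueError("no gene features found")
    return {
        "mean_gene_length": sum(gene_len.values()) / n,
        "mean_cds_length": sum(cds_len[g] for g in gene_len) / n,
        "mean_exons_per_gene": sum(exon_count[g] for g in gene_len) / n,
    }
```

In practice the function could be fed an open file handle, for example `gene_model_stats(open("genes.gff3"))`, where the file name is a placeholder.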
Functional annotations, including function descriptions, KEGG pathways,
and GO term assignments, as well as database summaries, are shown in
Table S5. In total, 23,594 of the 23,925 predicted genes (98.62%) could
be assigned a putative function.
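The annotated fraction follows directly from the two gene counts given above. The short Python check below reproduces it; the helper name `percent` is introduced here only for illustration.

```python
def percent(part, whole, digits=2):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


annotated_genes = 23_594  # genes with at least one functional annotation
total_genes = 23_925      # final protein-coding gene set

annotated_pct = percent(annotated_genes, total_genes)
unannotated_genes = total_genes - annotated_genes

print(f"{annotated_pct}% annotated, {unannotated_genes} genes without annotation")
```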