3.2 Genome annotation
Repeat sequences totaling 526.92 Mb, or 50.47% of the assembly, were
identified in the C. sonnerati genome. Transposable elements (TEs)
accounted for 493.11 Mb, or 47.23% of the assembled genome (Table 4).
This percentage is higher than that of Plectropomus leopardus (30.74%)
(Zhou et al., 2020) and Epinephelus akaara (43.02%) (Ge et al., 2019).
Among the TEs, DNA transposons, LINEs, and LTRs were the three most
abundant categories of repetitive elements, accounting for 24.82%,
13.74%, and 6.72%, respectively.
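
As a quick consistency check on the repeat statistics above, the reported lengths and percentages can be cross-checked against each other. The sketch below is illustrative only and is not part of the annotation pipeline. The helper name `implied_assembly_size` is hypothetical, and the code assumes that the three class percentages are expressed relative to the whole assembly, as the TE total is.

```python
# Figures quoted in the text (lengths in Mb, shares in percent of the assembly).
REPEAT_MB, REPEAT_PCT = 526.92, 50.47
TE_MB, TE_PCT = 493.11, 47.23
# Assumption: these class shares are relative to the whole assembly.
TOP_THREE_PCT = {"DNA transposons": 24.82, "LINEs": 13.74, "LTRs": 6.72}


def implied_assembly_size(length_mb: float, percent: float) -> float:
    """Return the assembly size (Mb) implied by a feature's length and share."""
    return length_mb / (percent / 100.0)


# Both pairs of figures should imply the same total assembly size.
size_from_repeats = implied_assembly_size(REPEAT_MB, REPEAT_PCT)
size_from_tes = implied_assembly_size(TE_MB, TE_PCT)

# The three largest classes cannot exceed the overall TE share.
top_three_total = sum(TOP_THREE_PCT.values())

print(f"Assembly size implied by repeats: {size_from_repeats:.2f} Mb")
print(f"Assembly size implied by TEs:     {size_from_tes:.2f} Mb")
print(f"Top three TE classes combined:    {top_three_total:.2f}%")
```

Both pairs of figures imply an assembly of roughly 1,044 Mb, and the three largest classes together stay below the overall TE share of 47.23%, so the reported values are mutually consistent.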
We predicted protein-coding genes in the C. sonnerati genome using
three methods: de novo, homology-based, and transcriptome
sequencing-based gene prediction. A total of 26,130 protein-coding
genes were predicted in the C. sonnerati genome (Supplementary
Table S2). The statistics of the predicted gene models were then
compared with those of eight closely related teleost species
(E. lanceolatus, P. leopardus, E. akaara, O. niloticus, L. calcarifer,
G. acuticeps, P. georgianus, and C. lumpus). C. sonnerati displayed
similar distribution patterns in exon and intron number, gene and CDS
length, exon and intron length, and gene and CDS gene content
(Figure 3). In total, 24,629 genes (approximately 94.26%) were
functionally annotated in at least one of the databases (Table 5),
which is higher than in E. akaara (23,808) (Ge et al., 2019) and
P. leopardus (24,364) (Zhou et al., 2020), but lower than in
E. lanceolatus (24,794) (Zhou et al., 2019).
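
The annotation rate and the cross-species comparison above follow from simple arithmetic on the reported counts. The sketch below reproduces them using only the numbers quoted in this section. The variable names are illustrative, and the code does not reflect how the annotation itself was performed.

```python
# Gene counts quoted in the text.
PREDICTED_GENES = 26_130  # protein-coding genes predicted in C. sonnerati
ANNOTATED_GENES = {
    "C. sonnerati": 24_629,
    "E. akaara": 23_808,
    "P. leopardus": 24_364,
    "E. lanceolatus": 24_794,
}

# Share of predicted C. sonnerati genes with at least one functional annotation.
annotated_pct = 100.0 * ANNOTATED_GENES["C. sonnerati"] / PREDICTED_GENES

# Species ordered from most to fewest functionally annotated genes.
ranking = sorted(ANNOTATED_GENES, key=ANNOTATED_GENES.get, reverse=True)

print(f"Functionally annotated: {annotated_pct:.2f}%")
for species in ranking:
    print(f"{species}: {ANNOTATED_GENES[species]:,}")
```

The computed share rounds to the reported 94.26%, and the ordering places C. sonnerati below E. lanceolatus and above P. leopardus and E. akaara, matching the comparison in the text.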
Among non-coding RNA genes, 373 miRNAs, 2,232 tRNAs, 169 rRNAs, and
515 snRNAs were also identified in the C. sonnerati genome
(Supplementary Table S3).