3.1 Genome assembly
In this study, we generated a high-quality chromosome-level genome
assembly of C. sonnerati using a combination of PacBio sequencing
and Hi-C sequencing technologies. We obtained 56.98 Gb of clean
short-read sequencing data from the genome of C. sonnerati (Figure 1).
These clean reads were then used to estimate the genome size with a
k-mer-based method (Liu et al., 2013). Accordingly,
the genome size of C. sonnerati was estimated to be 1015 Mb, with
the heterozygosity rate and the proportion of repeat sequences
determined to be 0.84% and 42.99%, respectively (Figure 2, Table 1).
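As an illustration of the k-mer-based estimate described above, the
following minimal sketch applies the commonly used relation
G = (total number of k-mers) / (depth of the main k-mer peak). It is not
the pipeline used in this study: the function name, the toy histogram,
and the min_depth cutoff for discarding low-depth (likely erroneous)
k-mers are all assumptions made for the example.

```python
def estimate_genome_size(kmer_histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    kmer_histogram maps depth -> number of distinct k-mers seen at that depth.
    min_depth is a hypothetical cutoff below which k-mers are treated as
    sequencing errors and ignored.
    """
    solid = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth of the main peak: the depth shared by the most distinct k-mers.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences contributed by the retained k-mers.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (invented numbers): an error tail at depths 1-2 and a
# main peak centred on depth 20.
toy_histogram = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

In practice the histogram would come from a k-mer counter run on the
clean short reads, and heterozygous or repeat-rich genomes show
additional peaks that a single-peak estimate like this one ignores.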
Using SMRT cells on the PacBio Sequel platform, we generated
~100X subreads after removing adaptor sequences. The longest 150X
subreads were used for genome assembly of C. sonnerati. The draft
assembly was then generated using mecat2 (Xiao et al., 2017) with
default parameters. To correct errors in the primary assembly, we
polished it with gcpp (v1.9.0)
(https://github.com/PacificBiosciences/gcpp). In addition, we used
Illumina-derived short reads to correct any remaining errors with
Pilon (v1.22) (Walker et al., 2014). Finally, we
obtained an assembly with a total length of 1043.66 Mb, comprising 795
contigs with an N50 length of 2.49 Mb (Table 2); the genome size
estimated by k-mer analysis corresponds to 97.3% of this assembled
length. Moreover, the genome of C. sonnerati was larger than that of
the leopard coral grouper Plectropomus leopardus (881.55 Mb) (Zhou et
al., 2020) but smaller than that of the red spotted grouper Epinephelus
akaara (1135 Mb) (Ge et al., 2019). Furthermore, the assembled genome
was evaluated for completeness using BUSCO (Benchmarking Universal
Single-Copy Orthologs) v3.0.2 with OrthoDB. Overall, 95.8% and 95.6%
of the complete BUSCOs were identified in the assembled and annotated
genome, respectively (Supplementary Table S1). These results indicate
that the genome assembly is highly complete.
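For readers unfamiliar with the contig N50 statistic reported above, it
is the length of the contig at which the cumulative length, summing
from the longest contig downward, first reaches half of the total
assembly length. A minimal sketch follows; the function name and the
example lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Illustrative contig lengths (arbitrary units), not real data.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

The same function applied to the scaffold or chromosome lengths would
give the corresponding scaffold-level N50.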
To anchor the contigs, 801,816,224 clean read pairs were generated from
the Hi-C library and mapped to the polished C. sonnerati genome using
BWA (v0.7.17) with default parameters. This yielded 324,980,877
uniquely mapped paired-end reads, which were used for Hi-C-associated
scaffolding. We then clustered the 795 contigs into 24 groups with the
agglomerative hierarchical clustering method (Burton et al., 2013).
Subsequently, the clustered contigs were ordered and oriented within
each group. In total, 767 contigs, with a combined length of 1.02 Gb,
were successfully ordered and oriented. Finally, we obtained the first
chromosome-level, high-quality assembly of C. sonnerati, with
chromosome lengths ranging from 2.52 to 44.48 Mb and containing 98.01%
of the total sequence (Table 3).
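The clustering step described above can be pictured with a simplified
sketch of agglomerative hierarchical clustering with average linkage:
each contig starts as its own group, and the two groups with the
highest average Hi-C link density are merged repeatedly until the
target number of groups is reached. This is an illustration of the
general idea only, not the implementation of Burton et al. (2013); the
function name, the toy link matrix, and the assumption that link counts
are already normalized are all hypothetical.

```python
def cluster_contigs(links, n_groups):
    """Group contigs by average-linkage agglomerative clustering.

    links is a symmetric matrix (list of lists) in which links[i][j] is an
    assumed, already-normalized Hi-C link density between contigs i and j.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    clusters = [[i] for i in range(len(links))]

    def average_linkage(a, b):
        # Mean link density over all contig pairs spanning the two groups.
        return sum(links[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_groups:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = average_linkage(clusters[x], clusters[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        # Merge the most strongly linked pair of groups.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy matrix (invented): contigs 0 and 1 share many links, as do 2 and 3.
toy_links = [
    [0, 10, 1, 1],
    [10, 0, 1, 1],
    [1, 1, 0, 8],
    [1, 1, 8, 0],
]
print(cluster_contigs(toy_links, n_groups=2))
```

For a real assembly, n_groups would be set to the expected number of
chromosomes (24 here), and a dedicated scaffolding tool would be used
rather than this quadratic-time loop.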