3.2 Genome assembly and assessment
A total of 350.81 Gb of polymerase reads were obtained from the PacBio
platform. After adapter removal and quality control, we obtained 220.53
Gb of subreads (coverage depth of approximately 159×), with an average
read length of 16,922 bp (Table S2). The genome assembled from these
subreads was 1,465.32 Mb in size and contained 9,015 contigs with an
N50 length of 472,841 bp (Table 1).
According to the Benchmarking Universal Single-Copy Orthologs (BUSCO)
analysis using 1,066 single-copy orthologous genes,
95.3% of the genes were assembled (complete: 94%, fragmented: 1.3%,
missing: 4.7%) (Table S3). Furthermore, Core Eukaryotic Genes Mapping
Approach (CEGMA) analysis revealed that 234 of the 248 Core Eukaryotic
Genes (CEGs) were assembled, accounting for 94.35% (Table
S4). Both analyses indicated that the genome assembly was relatively
complete. The mapping rate of short reads from the Illumina platform was
approximately 89.62%, and the coverage rate reached 96.15%, indicating
good consistency between the short reads and the assembled genome
(Table S5).
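The contig N50 reported above follows the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal Python sketch of that calculation, not the tool used to produce Table 1; the function name `n50` and the toy contig lengths are hypothetical and serve only as illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    # Walk from the longest sequence downwards, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of
        # the total assembly length is the N50.
        if running >= half_total:
            return length


# Hypothetical toy contig lengths, not data from this study.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

For the toy input, the total is 1,500 bp, and the two longest contigs (500 and 400 bp) already cover more than half of it, so the function returns 400.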
We obtained 203.8 Gb of clean, non-duplicated Hi-C data from the
Illumina HiSeq platform. The contigs were anchored to 1,063 scaffolds
with an N50 of 36.87 Mb (Table 1,
Table S6), including 43 pseudochromosomes (Figure 1b, c) and 1,020
unplaced scaffolds. The total length of the 43 pseudochromosomes was
1,451.52 Mb (Table S7), covering 99.00% of the assembly, whereas the
length of the unplaced scaffolds was 14.60 Mb (Table S8).
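The anchored fraction can be checked from the two lengths given above. The sketch below assumes that the denominator is the sum of the pseudochromosome and unplaced scaffold lengths (1,466.12 Mb) rather than the contig-level assembly size of 1,465.32 Mb; this assumption is ours and is not stated in the text. The function name `anchored_percentage` is hypothetical.

```python
def anchored_percentage(placed_mb, unplaced_mb):
    """Percentage of the scaffold-level length placed on pseudochromosomes."""
    # Assumption: total = placed + unplaced scaffold length.
    total_mb = placed_mb + unplaced_mb
    return 100 * placed_mb / total_mb


pseudochromosome_mb = 1451.52  # total length of the 43 pseudochromosomes
unplaced_mb = 14.60            # total length of the unplaced scaffolds

print(f"{anchored_percentage(pseudochromosome_mb, unplaced_mb):.2f}%")
```

Under this assumption the result rounds to 99.00%, which matches the reported value.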