3.2 Genome assembly and assessment
A total of 350.81 Gb of polymerase reads were obtained from the PacBio platform. After adapter removal and quality control, 220.53 Gb of subreads remained (coverage depth approximately 159×), with an average read length of 16,922 bp (Table S2). The genome assembled from these subreads was 1,465.32 Mb in size and comprised 9,015 contigs with an N50 of 472,841 bp (Table 1). Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis against 1,066 single-copy orthologous genes showed that 95.3% of the genes were present in the assembly (complete: 94%; fragmented: 1.3%), with 4.7% missing (Table S3). Furthermore, Core Eukaryotic Genes Mapping Approach (CEGMA) analysis identified 234 of the 248 core eukaryotic genes (CEGs) in the assembly, i.e., 94.35% (Table S4). Both analyses indicated that the genome assembly was relatively complete. Approximately 89.62% of the short reads from the Illumina platform mapped to the assembly, with a coverage rate of 96.15%, indicating good consistency between the short reads and the assembled genome (Table S5).
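For readers unfamiliar with the contiguity and completeness metrics reported above, the following is a minimal illustrative sketch of how an N50 value and a CEGMA-style recovery percentage can be computed. The function names `n50` and `percent_recovered` are hypothetical and the toy contig lengths are invented for illustration; this is not the pipeline used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together make up at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent_recovered(found, expected):
    """Percentage of expected genes found in the assembly, to 2 decimals."""
    return round(found / expected * 100, 2)


# Toy contig lengths (invented, for illustration only).
toy_contigs = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_contigs)

# CEGMA figures taken from the text: 234 of 248 CEGs.
cegma_pct = percent_recovered(234, 248)
```

With the CEGMA counts from the text, `percent_recovered(234, 248)` reproduces the reported 94.35%.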
Hi-C sequencing on the Illumina HiSeq platform yielded 203.8 Gb of clean, non-duplicate data. Using these data, the contigs were anchored into 1,063 scaffolds with an N50 of 36.87 Mb (Table 1, Table S6), comprising 43 pseudochromosomes (Figure 1b, c) and 1,020 unplaced scaffolds. The 43 pseudochromosomes had a total length of 1,451.52 Mb (Table S7) and covered 99.00% of the assembly, whereas the unplaced scaffolds totaled 14.60 Mb (Table S8).
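The anchoring rate can be checked with simple arithmetic. The sketch below assumes that the denominator is the sum of the pseudochromosome and unplaced-scaffold lengths; the text does not state this explicitly, so the calculation is an illustration rather than the authors' exact method. The variable names are hypothetical.

```python
# Lengths in Mb, taken from the text (Tables S7 and S8).
pseudochromosome_mb = 1451.52
unplaced_mb = 14.60

# Assumption: the scaffold-level assembly length is the sum of both parts.
scaffold_total_mb = pseudochromosome_mb + unplaced_mb

# Fraction of the scaffold-level assembly placed on pseudochromosomes.
anchored_pct = round(pseudochromosome_mb / scaffold_total_mb * 100, 2)

# Scaffold counts from the text should also add up.
total_scaffolds = 43 + 1020
```

Under this assumption the anchored fraction rounds to the reported 99.00%, and the scaffold counts sum to 1,063.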