2.2 Genome survey and assembly
The genome size, proportion of repetitive sequences and heterozygosity were estimated from the K-mer frequency distribution with K = 17. The estimated genome size was then corrected for sequencing errors: Revised size = Genome size × (1 - error rate), where the error rate is the proportion of K-mers with a depth of 1.
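The correction above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function name `estimate_genome_size` and the histogram layout (K-mer depth mapped to the number of distinct K-mers at that depth) are hypothetical. The sketch also assumes that the uncorrected genome size is the total K-mer count divided by the main peak depth, that the main peak is the most populated depth above 1, and that the error rate is taken as a fraction of all K-mer occurrences.

```python
def estimate_genome_size(histogram):
    """Estimate genome size from a K-mer depth histogram.

    histogram maps depth -> number of distinct K-mers seen at that depth
    (an assumed layout, similar to a typical K-mer counter's histogram).
    Returns (genome_size, error_rate, revised_size).
    """
    # Total number of K-mer occurrences across all depths.
    total_kmers = sum(depth * count for depth, count in histogram.items())

    # Assumption: the main peak is the most populated depth above 1,
    # since depth-1 K-mers are treated as sequencing errors.
    peak_depth = max((d for d in histogram if d > 1), key=lambda d: histogram[d])

    # Uncorrected estimate: total K-mers divided by the peak depth.
    genome_size = total_kmers / peak_depth

    # Assumption: error rate = depth-1 K-mers as a share of all K-mer occurrences.
    error_rate = histogram.get(1, 0) / total_kmers

    # Revised size = Genome size x (1 - error rate), as stated in the text.
    revised_size = genome_size * (1 - error_rate)
    return genome_size, error_rate, revised_size
```

With a made-up histogram such as `{1: 100, 10: 50, 20: 500, 30: 40}`, the peak depth is 20, the uncorrected size is 590, and the revised size is approximately 585. These numbers are illustrative only and are unrelated to the genome analysed here.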
After adapter removal and filtering to a minimum length of 50 bp, the PacBio subreads were assembled using wtdbg2 (Ruan & Li, 2020), which uses the fuzzy Bruijn graph (FBG) approach. The FBG is less sensitive to small duplications than the de Bruijn graph, and gapped sequence alignment was used to cope with the high error rate. After quality control, the Hi-C sequencing data were mapped to the draft genome with BWA (H. Li & Durbin, 2010), and Samtools (H. Li et al., 2009) was used to remove duplicate and unmapped reads. Next, the reads near the restriction sites were extracted for Hi-C-assisted assembly (Burton et al., 2013).
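The length filter applied to the subreads before assembly can be sketched in Python as follows. This is only an illustration, not the tool actually used. The helper names `parse_fasta` and `filter_by_length` are hypothetical, the input is assumed to be FASTA-formatted text, and the 50 bp threshold is assumed to be inclusive.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record.
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(records, min_length=50):
    """Keep records whose sequence is at least min_length bp (inclusive, by assumption)."""
    for name, seq in records:
        if len(seq) >= min_length:
            yield name, seq
```

For example, `list(filter_by_length(parse_fasta(open("subreads.fasta"))))` would return the retained subreads, where `subreads.fasta` is a placeholder file name.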