2.2 Genome survey and assembly
The genome size, proportion of repetitive sequences and heterozygosity
were estimated using the K-mer frequency distribution method with
K = 17. The genome size was then revised by the error rate: Revised
size = Genome size × (1 − error rate), where the error rate refers to
the proportion of K-mers with a depth of 1.
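The revision can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the pipeline used here: the function name and the toy histogram are hypothetical, the unrevised genome size is taken as the total K-mer count divided by the main peak depth (the usual K-mer survey estimate), and the error rate is computed as the share of all K-mer occurrences that have a depth of 1.

```python
def estimate_genome_size(histogram):
    """Estimate genome size from a K-mer depth histogram.

    histogram maps depth -> number of distinct K-mers observed at that depth.
    Returns (genome_size, error_rate, revised_size).
    """
    # Total number of K-mer occurrences across all depths.
    total_kmers = sum(depth * count for depth, count in histogram.items())

    # Main peak: the most frequent depth, ignoring depth 1, which is
    # treated as sequencing-error K-mers (assumption).
    peak_depth = max((d for d in histogram if d > 1), key=lambda d: histogram[d])

    # Assumed survey estimate: total K-mers divided by the peak depth.
    genome_size = total_kmers / peak_depth

    # Assumption: error rate = share of K-mer occurrences with depth 1.
    error_rate = histogram.get(1, 0) / total_kmers

    # Revised size = Genome size x (1 - error rate), as in the text.
    revised_size = genome_size * (1 - error_rate)
    return genome_size, error_rate, revised_size


# Toy histogram (made-up numbers, for illustration only).
toy_histogram = {1: 100, 9: 50, 10: 200, 11: 40}
size, err, revised = estimate_genome_size(toy_histogram)
```

With a real dataset, the histogram would come from a K-mer counting tool run with K = 17. The choice of peak and the exact definition of the error rate may differ between survey tools.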
After adapter removal and filtering by a minimum length of 50 bp, the
subreads from the PacBio platform were assembled using wtdbg2 (Ruan &
Li, 2020), which uses the fuzzy Bruijn graph (FBG) approach. The FBG is
less sensitive to small duplications than the de Bruijn graph. To
address the high error rate of the subreads, gapped sequence alignment
was used. After quality control, the high-quality Hi-C sequencing data were
mapped to the draft genome using BWA (H. Li & Durbin, 2010), and
Samtools (H. Li et al., 2009) was used to remove duplicate and unmapped
reads. Next, the reads near the restriction sites were extracted for
assisted assembly (Burton et al., 2013).
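The filtering logic described above can be sketched in Python as follows. This is an illustration only, not the commands actually run: the helper names, the GATC motif and the window size are hypothetical, since neither the restriction enzyme nor the distance cutoff is stated here. The two flag bits are those defined by the SAM format for unmapped reads (0x4) and duplicates (0x400).

```python
import bisect

UNMAPPED = 0x4      # SAM flag bit: read is unmapped
DUPLICATE = 0x400   # SAM flag bit: read is a PCR or optical duplicate


def keep_read(flag):
    """Return True when the read is mapped and not marked as a duplicate."""
    return not (flag & (UNMAPPED | DUPLICATE))


def site_positions(sequence, motif):
    """Return sorted 0-based start positions of every motif occurrence."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def near_site(pos, sites, window):
    """Return True when pos lies within `window` bp of a site in sorted `sites`."""
    i = bisect.bisect_left(sites, pos)
    # Only the nearest site on each side of pos needs to be checked.
    neighbours = sites[max(i - 1, 0):i + 1]
    return any(abs(pos - s) <= window for s in neighbours)


# Hypothetical contig and motif (GATC is an assumed recognition site).
contig = "AAGATCAAAAAAAAAAGATCAA"
sites = site_positions(contig, "GATC")

# Hypothetical (flag, mapping position) pairs standing in for alignments.
reads = [(0, 14), (4, 14), (1024, 3), (99, 9)]
selected = [(f, p) for f, p in reads if keep_read(f) and near_site(p, sites, 3)]
```

In practice, the flag filter corresponds to excluding reads by flag bits with Samtools, and the window would be chosen to suit the Hi-C library rather than the 3 bp used for this toy sequence.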