2.3 Genome assembly and quality control
We used NextDenovo v2.4.0 (https://github.com/Nextomics/NextDenovo) to
assemble the genome de novo from ONT long reads (100×). First,
the NextCorrect module was applied to correct the raw reads, and then the
preliminary genome assembly was generated by the NextGraph module. Purge
Haplotigs (Roach et al., 2018) was used to identify and remove
candidate duplicate haplotypes as part of manual curation of the heterozygous
assembly. Racon (Vaser, Sović, Nagarajan, & Šikić, 2017) v1.4.20 was
then employed to polish the assembly for two rounds with the corrected
ONT long reads (Figures S2 and S3). Finally, we used NextPolish (J. Hu,
Fan, Sun, & Liu, 2020) v1.3.1 for two rounds of assembly polishing
based on Illumina short reads (100×) to generate the final
genome assembly.
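The 100× figures quoted above are sequencing depths, that is, total sequenced bases divided by genome size. A minimal sketch of that arithmetic follows; the function name and the example numbers are hypothetical and are not taken from this study.

```python
def sequencing_depth(total_bases: int, genome_size: int) -> float:
    """Return mean sequencing depth (fold coverage) for a read set."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Hypothetical example: 50 Gb of reads over a 500 Mb genome.
depth = sequencing_depth(50_000_000_000, 500_000_000)
print(f"{depth:.0f}x")
```

The same calculation applies to both the ONT and the Illumina data sets, each with its own total base count.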
We anchored the genome assembly to the chromosome level using Hi-C
data. HiC-Pro (Servant et al., 2015) was employed for quality control of the raw
Hi-C data with default parameters. Bowtie2 (Langmead & Salzberg, 2012) was
used to map the Hi-C reads to the assembled genome. Uniquely mapped
reads were then extracted by HiC-Pro, with duplicates excluded. Finally, we
used LACHESIS (Burton et al., 2013) to cluster, order, and orient
the corrected contigs onto pseudo-chromosomes based on their Hi-C
interaction levels.
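The read-filtering step described above (keep uniquely mapped pairs, drop duplicates) can be illustrated with a toy sketch. This is not HiC-Pro's implementation: the `ReadPair` record, the `filter_unique_dedup` function, and the use of a mapping-quality threshold as a stand-in for "uniquely mapped" are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical record for one aligned Hi-C read pair."""
    name: str
    contig1: str
    pos1: int
    strand1: str
    mapq1: int
    contig2: str
    pos2: int
    strand2: str
    mapq2: int


def filter_unique_dedup(pairs: Iterable[ReadPair], min_mapq: int = 10) -> List[ReadPair]:
    """Keep pairs whose two mates both pass min_mapq, dropping coordinate duplicates."""
    seen = set()
    kept = []
    for p in pairs:
        # Assumption: low MAPQ is treated as a proxy for non-unique mapping.
        if p.mapq1 < min_mapq or p.mapq2 < min_mapq:
            continue
        # Pairs with identical coordinates and strands are treated as duplicates.
        key = (p.contig1, p.pos1, p.strand1, p.contig2, p.pos2, p.strand2)
        if key in seen:
            continue
        seen.add(key)
        kept.append(p)
    return kept


# Hypothetical toy data: "b" duplicates "a", "c" has a multi-mapped mate.
pairs = [
    ReadPair("a", "ctg1", 100, "+", 40, "ctg2", 500, "-", 40),
    ReadPair("b", "ctg1", 100, "+", 40, "ctg2", 500, "-", 40),
    ReadPair("c", "ctg1", 200, "+", 0, "ctg3", 900, "+", 40),
    ReadPair("d", "ctg2", 300, "-", 30, "ctg3", 50, "+", 35),
]
kept = filter_unique_dedup(pairs)
print([p.name for p in kept])
```

Only the pairs that survive this kind of filtering contribute to the contact counts used for scaffolding.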
To assess the quality of our assembly, whole-genome sequencing (NGS)
reads and assembled transcripts were mapped to the genome with BWA (H. Li,
2013) v0.7.17 and HISAT2 (D. Kim, Langmead, & Salzberg, 2015) v2.1.0,
respectively. Benchmarking Universal Single-Copy Orthologs (BUSCO)
(Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) was also
employed to assess the completeness of the assembly using the
embryophyta_odb10 dataset.
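BUSCO completeness is conventionally reported as the percentage of orthologs that are complete (single-copy plus duplicated), fragmented, or missing. The sketch below shows that arithmetic only; the function name and the counts are hypothetical and do not represent results from this assembly.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> dict:
    """Convert BUSCO category counts into percentages of all searched orthologs."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one ortholog must be counted")

    def pct(n: int) -> float:
        return round(100 * n / total, 1)

    return {
        "C": pct(single + duplicated),  # complete = single-copy + duplicated
        "S": pct(single),
        "D": pct(duplicated),
        "F": pct(fragmented),
        "M": pct(missing),
        "n": total,
    }


# Hypothetical counts, for illustration only.
summary = busco_summary(single=900, duplicated=50, fragmented=30, missing=20)
print(summary)
```

A high duplicated fraction in such a summary can indicate haplotigs that were retained in the assembly.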