Abstract
Sophora japonica is a medium-sized deciduous tree belonging to the
Leguminosae family and known for its high ecological, economic, and
medicinal value. Here, we report a draft genome of S. japonica,
approximately 511.49 Mb long (contig N50 of 16.15 Mb), assembled
from Illumina, Nanopore and Hi-C data. Using the Hi-C data, we anchored 110
contigs onto 14 chromosomes, representing 91.62% of the total genome,
and improved the N50 to 31.32 Mb. Further
investigation identified 271.76 Mb (53.13%) of repetitive sequences and
31,000 protein-coding genes, of which 30,721 (99.1%) were functionally
annotated. Phylogenetic analysis indicates that S. japonica
diverged from Arabidopsis thaliana and Glycine max about
107.53 and 61.24 million years ago, respectively. We detected evidence
of species-specific and legume-shared whole-genome duplication (WGD)
events in S. japonica. We further found that multiple transcription
factor (TF) families (e.g., BBX and PAL) have
expanded in S. japonica, which might have led to its enhanced
tolerance to abiotic stress. In addition, S. japonica harbors
more genes involved in the lignin and cellulose biosynthesis pathways
than the other two species. Finally, population genomic analyses
revealed no obvious differentiation among geographical groups and showed
that the effective population size has declined continuously since 2 Ma.
Our genomic data provide a powerful comparative framework for studying
adaptation, evolution and active-ingredient biosynthesis in S. japonica.
More importantly, this high-quality genome will help elucidate the
biosynthesis of the species' main bioactive components and improve its
production and/or processing.
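
For readers unfamiliar with the contiguity statistic quoted above, the following is a minimal sketch of how an N50 value is conventionally computed from a list of contig or scaffold lengths. It is not the pipeline used in this study; the function name `n50` and the toy lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (in Mb), NOT taken from the S. japonica assembly.
toy_contigs = [40, 30, 20, 5, 5]
print(n50(toy_contigs))  # 30: 40 + 30 = 70 covers at least half of 100
```

Joining contigs into longer chromosome-scale sequences shifts the length distribution upward, which is why the N50 reported after Hi-C anchoring is larger than the contig N50.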