Results
Summary of the genome assembly and annotation data
Our genome survey was based on a total of 44.23 Gb of short reads generated
on the Illumina platform. The peak depth was 54, and the total number of
17-mers was 42,252,144,258 (see Fig. 1). Dividing the total 17-mer number
by the peak depth, the genome size of striped catfish was therefore
calculated to be about 782 Mb. In addition, the heterozygosity rate and
repeat rate were estimated to be 0.26% and 43.39%, respectively (Fig. 1).
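The genome size estimate above follows from dividing the total k-mer count by the peak k-mer depth. The short Python sketch below reproduces that arithmetic with the values reported in the text; the function name `estimate_genome_size` is hypothetical and is not taken from any tool used in this study.

```python
def estimate_genome_size(total_kmers: int, peak_depth: float) -> float:
    """Return the estimated genome size in bp (total k-mers / peak depth)."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


# Values reported in the text for the 17-mer analysis.
size_bp = estimate_genome_size(42_252_144_258, 54)
size_mb = size_bp / 1e6
print(f"Estimated genome size: {size_mb:.1f} Mb")
```

With the reported inputs, the result rounds to the 782 Mb given in the text.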
A total of 63.07 Gb of long reads
generated on the Nanopore platform were assembled into a 742.6-Mb
draft genome, with a contig N50 of 3.5 Mb and a GC content of 38.89%
(Table 1). Additionally, a total of
682 million raw reads, with a total length of about 90 Gb, generated from
the Hi-C library were used to identify contacts among contigs, of
which 304 million read pairs (89.3% of raw reads) were mapped to
the assembled genome. The mapped reads were then used to assemble
contigs into scaffolds, resulting in a 731.7-Mb genome assembly with 30
chromosomes and a scaffold N50 of 29.5 Mb (Fig. 2a, Table 1). The
chromosome-level genome assembly of striped catfish was mapped as a
Circos atlas, with genome-wide distributions of gene density, genomic GC
content, and the internal syntenic blocks (Fig. 2b).
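For readers less familiar with the N50 statistic reported for contigs and scaffolds, the following Python sketch shows one common way to compute it from a list of sequence lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not the assembly data or the software used in this study.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, not real contigs).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

A larger N50 means that half of the assembly is contained in longer sequences, which is why it is used as a measure of continuity.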
By repeat annotation, we predicted a total of 274 Mb of repeat sequences,
covering 36.9% of the total assembled genome (Table S1). Among them,
172.8 Mb of DNA repeat elements, 63.1 Mb of long interspersed nuclear
elements (LINEs), 48.8 Mb of long terminal repeats (LTRs) and 5.4 Mb of
short interspersed nuclear elements (SINEs) were identified (Table S2).
After integrating the results from both homology- and transcriptome-based
annotations, we predicted 18,895 protein-coding genes in the striped
catfish genome (Table S3). Finally, using four public databases, we
functionally annotated 18,604 (98.46%) of the predicted protein-coding
genes (Table S4).
Genome quality evaluation
The final chromosome-level genome assembly of striped catfish, 731.7 Mb
in length, accounted for 98.5% of the assembled genome. To evaluate the
genome quality, we compared our genome assembly with the previously
published genome of striped catfish (GenBank assembly accession:
GCA_003671635.1; Kim et al., 2018) and those of other Siluriformes species. On
account of the integrative strategy combining shotgun sequencing,
long-read Nanopore sequencing and Hi-C data, our
chromosome-level genome assembly of striped catfish has higher contig
and scaffold N50 values than most previously published genome
assemblies. In particular, among all published Siluriformes genomes, both
the contig and scaffold N50 values of our striped catfish assembly are
comparatively high (Table 2), indicating considerable continuity of the
assembly. Subsequently, BUSCO (Simão et al., 2015) was used to estimate
the completeness of our assembled genome with the actinopterygii_odb9
database. Of the 4,584 BUSCO groups searched, 4,279 (93.3%) were
identified as complete (Table S5).
Gene family and gene clustering
The CDS predicted from the assembled genomes of striped catfish and
seven other species were used for gene family clustering. The
protein-coding genes of striped catfish were clustered into 13,391 gene
families (containing 18,220 genes), of which 1,581 were identified as
single-copy orthologous gene families (Table S6). With these single-copy
orthologous gene families, we constructed a phylogenetic tree based on
the maximum likelihood method, predicting that the divergence of striped
catfish from others occurred 54.9 Mya (Fig. 3).
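To illustrate what is meant by a single-copy orthologous gene family, the Python sketch below selects families that contain exactly one gene in every species. It is a simplified illustration under assumed inputs: the function name `single_copy_families`, the species labels and the gene counts are all hypothetical, and the clustering software actually used is not specified in this section.

```python
def single_copy_families(families, species):
    """Return IDs of families with exactly one gene in every listed species."""
    return [
        fam_id
        for fam_id, counts in families.items()
        if all(counts.get(sp, 0) == 1 for sp in species)
    ]


# Hypothetical species labels and per-family gene counts.
species = ["striped_catfish", "species_B", "species_C"]
families = {
    "fam1": {"striped_catfish": 1, "species_B": 1, "species_C": 1},
    "fam2": {"striped_catfish": 3, "species_B": 1, "species_C": 1},  # expanded
    "fam3": {"striped_catfish": 1, "species_B": 1},  # missing in species_C
}
result = single_copy_families(families, species)
print(result)
```

Only families passing this filter give an unambiguous one-to-one correspondence across species, which is why they are commonly preferred for phylogenetic tree construction.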
In addition, we found 414 expanded gene families and 4,212 contracted
gene families in striped catfish (Fig. 3). By KEGG enrichment
analysis, the genes in the expanded gene families were enriched in
several critical metabolic pathways, including the calcium signaling
pathway, salivary secretion, the apelin signaling pathway and the
oxytocin signaling pathway (Table S7). Subsequently, we examined the
expanded genes related to lipid metabolism and observed an interesting
expansion of the fatty acid-binding protein 1 gene (fabp1) in the
striped catfish genome.