Results
Summary of the genome assembly and annotation data
Our genome survey was based on a total of 44.23 Gb of short reads generated on the Illumina platform. The peak 17-mer depth was 54, and the total number of 17-mers was 42,252,144,258 (Fig. 1). Dividing the total 17-mer number by the peak depth, we estimated the genome size of striped catfish to be about 782 Mb. In addition, the heterozygosity rate and repeat rate were estimated to be 0.26% and 43.39%, respectively (Fig. 1).
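The genome size estimate follows from the usual k-mer relation, genome size ≈ total k-mer number / peak k-mer depth. The snippet below is only a minimal sketch of that arithmetic using the two values reported above; the function name is hypothetical and is not taken from any survey tool used in this study.

```python
def estimate_genome_size(total_kmers: int, peak_depth: int) -> float:
    """Estimate genome size (bp) as total k-mer count divided by peak k-mer depth."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


# Values reported in the genome survey (17-mers, peak depth 54).
size_bp = estimate_genome_size(42_252_144_258, 54)
size_mb = size_bp / 1e6
print(f"Estimated genome size: {size_mb:.1f} Mb")
```

The result rounds to the reported figure of about 782 Mb.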
A total of 63.07 Gb of long reads generated on the Nanopore platform were assembled into a 742.6-Mb draft genome, with a contig N50 of 3.5 Mb and a GC content of 38.89% (Table 1). Additionally, a total of 682 million raw reads (about 90 Gb in total length) generated from the Hi-C library were used to identify contacts among contigs, of which 304 million pairs of reads (89.3% of raw reads) were mapped to the assembled genome. The mapped reads were then used to assemble contigs into scaffolds, resulting in a 731.7-Mb genome assembly with 30 chromosomes and a scaffold N50 of 29.5 Mb (Fig. 2a, Table 1). The chromosome-level genome assembly of striped catfish was visualized as a Circos atlas showing the genome-wide distributions of gene density and GC content, together with the internal syntenic blocks (Fig. 2b).
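For readers unfamiliar with the N50 statistic reported above, it is the length L such that sequences of length L or longer contain at least half of the total assembly length. The following is a minimal sketch of that definition; the function name and the toy lengths are hypothetical and do not come from this assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example (hypothetical lengths, not real contigs).
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

In the toy case the total length is 20, and the two longest sequences (6 and 5) already cover at least 10, so the N50 is 5.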
By repeat annotation, we predicted a total of 274 Mb of repeat sequences, covering 36.9% of the total assembled genome (Table S1). Among them, 172.8 Mb of DNA repeat elements, 63.1 Mb of long interspersed nuclear elements (LINEs), 48.8 Mb of long terminal repeats (LTRs) and 5.4 Mb of short interspersed nuclear elements (SINEs) were identified (Table S2). After integrating the results from both homology-based and transcriptome-based annotations, we predicted 18,895 protein-coding genes in the striped catfish genome (Table S3). Finally, using four public databases, we functionally annotated 18,604 (98.46%) of the predicted protein-coding genes (Table S4).
Genome quality evaluation
The final chromosome-level genome assembly of striped catfish, 731.7 Mb in length, accounted for 98.5% of the assembled genome. To evaluate the genome quality, we compared our genome assembly with the previously published genome of striped catfish (GenBank assembly accession: GCA_003671635.1; Kim et al., 2018) and with those of other Siluriformes species. Owing to the integrative strategy combining shotgun sequencing, long-read Nanopore sequencing and Hi-C data, our chromosome-level genome assembly of striped catfish has higher contig and scaffold N50 values than most previously published genome assemblies. In particular, among all published Siluriformes genomes, both the contig and scaffold N50 values of the striped catfish assembly in the present study were comparatively high (Table 2), indicating considerable continuity of our genome assembly. Subsequently, BUSCO (Simão et al., 2015) was used with the actinopterygii_odb9 database to estimate the completeness of our assembled genome. Of the 4,584 BUSCO groups searched, 4,279 (93.3%) were identified as complete (Table S5).
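The two percentages in this paragraph can be reproduced directly from the reported counts. The snippet below is only an arithmetic check under the assumption that the anchoring rate is the chromosome-level length divided by the 742.6-Mb draft assembly length; the variable and function names are hypothetical.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(part / whole * 100, 1)


# Chromosome-level assembly length relative to the draft assembly (Mb).
anchored_pct = percent(731.7, 742.6)

# Complete BUSCO groups relative to all groups searched.
busco_complete_pct = percent(4_279, 4_584)

print(f"Anchored: {anchored_pct}%  Complete BUSCOs: {busco_complete_pct}%")
```

Both values agree with the figures stated in the text (98.5% and 93.3%).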
Gene family and gene clustering
The coding sequences (CDS) predicted from the assembled genomes of striped catfish and seven other species were used for gene family clustering. The protein-coding genes of striped catfish were clustered into 13,391 gene families (containing 18,220 genes), of which 1,581 were identified as single-copy orthologous gene families (Table S6). With these single-copy orthologous gene families, we constructed a phylogenetic tree using the maximum likelihood method, which placed the divergence of striped catfish from the other species at 54.9 Mya (Fig. 3).
In addition, we found 414 expanded gene families and 4,212 contracted gene families in striped catfish (Fig. 3). KEGG enrichment analysis assigned the genes in the expanded gene families to several critical pathways, with the greatest expansion associated with the calcium signaling, salivary secretion, apelin signaling and oxytocin signaling pathways (Table S7). Subsequently, we examined the expanded genes related to lipid metabolism, among which we observed an interesting expansion of the fatty acid binding protein 1 gene (fabp1) in the striped catfish genome.