3.1 Genome sequencing and de novo assembly
The k-mer (K=17) analysis indicated that the heterozygosity of S.
chinensis was 0.786% and the estimated genome size was 274,512,001 bp
(Figure S3). Sequencing of the fundatrigenia genome on the PacBio
Sequel II platform generated 130 Gb of raw data with a read N50 length
of 21,033 bp. The raw contig-level assembly comprised 304,774,269 bp in
1,409 contigs, with a contig N50 of 2,961,835 bp (Table 1). After
removal of heterozygous sequences, the final contig-level assembly was
271,416,320 bp in 378 contigs, with a contig N50 of 4,333,385 bp
(Table 1).
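For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed from a list of contig (or read) lengths. The function name `n50` and the toy lengths are illustrative assumptions; this is not the pipeline used to produce Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases in the collection.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half of the
    # total bases have been accumulated.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (not data from this study).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Under this definition, merging or removing many short contigs raises the N50, which is consistent with the increase reported after heterozygous sequences were removed.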
The chromosome-level assembly was generated using Hi-C data (Table S1)
and had a total length of 271,524,833 bp, with a scaffold N50 of
20,405,002 bp (Table 1). More than 97% of the total genome bases were
successfully anchored to the 13 chromosomes (Figure 2). The remaining
2.8% of the sequence comprised 341 small scaffolds (Table 1). Chromosome lengths ranged
from 14,859,000 bp to 10,104,278 bp. As revealed by BUSCO analyses
against the Eukaryota, Arthropoda, Insecta and Hemiptera datasets, the
S. chinensis genome assembly contained a higher number of
conserved single-copy Arthropoda genes than any other published aphid
genome, suggesting the completeness and high quality of our genome
assembly (Figure 4A). The genomic short reads were mapped to the
assembled genome sequences, resulting in a 97.79% mapping rate and 60
Gb average sequence depth (Table S2). RNA-seq was performed on seven
samples (fundatrix, fundatrigeniae, autumn migrants, nymphs, spring
migrants (sexuparae), and male and female sexuales); a total of 124.22
Gb of raw data were generated using the Illumina platform, and more than
86% of the assembled RNA-seq transcripts were mapped to the genome
(Table S3). In total, 260,508 transcripts (280,520,495 bp)
were assembled by Trinity (Table S4).
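As a rough consistency check, a few derived quantities can be computed directly from the figures reported in this section. The sketch below is illustrative arithmetic only; the variable names are assumptions, and the unanchored base count is approximate because it relies on the rounded 2.8% value.

```python
# Figures as reported in this section.
transcripts = 260_508              # Trinity transcripts
transcript_bases = 280_520_495     # total transcript length (bp)
scaffold_total_bp = 271_524_833    # chromosome-level assembly length (bp)
unanchored_fraction = 0.028        # rounded share not anchored to chromosomes
small_scaffolds = 341              # number of unanchored scaffolds

# Mean transcript length in bp.
mean_transcript_len = transcript_bases / transcripts

# Approximate unanchored sequence and mean size of the small scaffolds.
unanchored_bp = unanchored_fraction * scaffold_total_bp
mean_small_scaffold = unanchored_bp / small_scaffolds

print(f"mean transcript length: {mean_transcript_len:.1f} bp")
print(f"approx. unanchored sequence: {unanchored_bp:,.0f} bp")
print(f"approx. mean small-scaffold size: {mean_small_scaffold:,.0f} bp")
```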