RNA-seq and transcriptome assembly
Transcriptome sequencing yielded on average 42,928,473 ± 14,030,906 paired-end reads with a GC content of 38.2 ± 1.6%. De novo assembly statistics are summarized in Table 1. Filtering with TransRate significantly improved assembly quality, as evidenced by substantially fewer duplicated BUSCO hits (Table 2; two-sample Wilcoxon rank-sum test: W = 199, p < 0.001), but it decreased completeness: raw Agalma assemblies had significantly more complete BUSCO hits (W = 198, p < 0.001), fewer fragmented hits (W = 31, p < 0.001) and fewer missing BUSCOs (W = 8.5, p < 0.001). However, we observed no significant difference between the raw Agalma assemblies and the TransRate-filtered assemblies in the number of complete single-copy BUSCO hits (W = 132, p = 0.436). The clustered
supertranscriptome contained 988,460 contigs in total, and BUSCO analysis against the Metazoa_odb9 database indicated that it was highly complete but also highly redundant. Prediction and clustering of ORFs retained 131,503 ORFs and effectively reduced the number of duplicated BUSCO hits, and thereby redundancy (Table 2; Ortiz-Sepulveda et al., 2022).
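The W statistics reported above come from the two-sample Wilcoxon rank-sum test, which can be illustrated with a minimal sketch. We assume, without confirmation from the text, that W follows the convention of R's `wilcox.test` (rank sum of the first sample minus n1(n1 + 1)/2, equivalent to the Mann-Whitney U). The helper names `average_ranks` and `rank_sum_test` and the example counts are hypothetical. The p-value uses the normal approximation with tie and continuity corrections; R's `wilcox.test` instead computes an exact p-value for small samples without ties, so the two can differ.

```python
import math
from collections import Counter
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..N; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sample Wilcoxon rank-sum test (hypothetical helper).

    W = (rank sum of x) - n_x(n_x + 1)/2, the convention assumed here to match
    R's wilcox.test. The two-sided p-value uses the normal approximation with
    tie and continuity corrections (R uses an exact p for small untied samples).
    """
    nx, ny = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:nx]) - nx * (nx + 1) / 2
    n = nx + ny
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    ties = sum(t**3 - t for t in Counter(pooled).values())
    var = nx * ny / 12 * ((n + 1) - ties / (n * (n - 1)))
    diff = w - nx * ny / 2
    # Continuity correction shifts the statistic 0.5 toward its null mean.
    z = (diff - math.copysign(0.5, diff)) / math.sqrt(var) if diff != 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p


if __name__ == "__main__":
    # Hypothetical duplicated-BUSCO counts, NOT data from this study.
    raw = [120, 135, 150, 142, 128]
    filtered = [40, 52, 47, 61, 38]
    w, p = rank_sum_test(raw, filtered)
    print(f"W = {w}, p = {p:.4f}")
```

With these hypothetical counts every raw value exceeds every filtered value, so W reaches its maximum n1·n2 = 25. A W near zero indicates the opposite ordering, which is how the small W values for fragmented and missing BUSCOs can be read, assuming the raw assemblies were entered as the first sample.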
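The link drawn above between duplicated BUSCO hits and redundancy can be made concrete by parsing a BUSCO one-line summary (notation `C:x%[S:x%,D:x%],F:x%,M:x%,n:N`, where complete = single-copy + duplicated) and expressing duplicated BUSCOs as a share of complete ones. This is a sketch only: the helpers `parse_busco` and `duplication_ratio` are hypothetical, the summary strings are invented and are not the values in Table 2, and we assume the standard BUSCO notation rather than the specific files produced in this study.

```python
import re
from typing import NamedTuple


class BuscoSummary(NamedTuple):
    complete: float    # C: complete BUSCOs (single-copy + duplicated), %
    single: float      # S: complete, single-copy, %
    duplicated: float  # D: complete, duplicated, %
    fragmented: float  # F: fragmented, %
    missing: float     # M: missing, %
    n: int             # number of BUSCO groups searched


# Matches the standard one-line notation, e.g. "C:95.0%[S:60.0%,D:35.0%],F:2.0%,M:3.0%,n:978".
_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(line: str) -> BuscoSummary:
    """Parse a BUSCO one-line summary (hypothetical helper)."""
    m = _SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    return BuscoSummary(
        float(m["C"]), float(m["S"]), float(m["D"]),
        float(m["F"]), float(m["M"]), int(m["n"]),
    )


def duplication_ratio(s: BuscoSummary) -> float:
    """Fraction of complete BUSCOs found in more than one copy (0.0 if none complete)."""
    return s.duplicated / s.complete if s.complete else 0.0


if __name__ == "__main__":
    # Invented summaries for illustration only, NOT results from this study.
    raw = parse_busco("C:96.0%[S:40.0%,D:56.0%],F:2.0%,M:2.0%,n:978")
    filtered = parse_busco("C:88.0%[S:40.0%,D:48.0%],F:6.0%,M:6.0%,n:978")
    for name, s in [("raw", raw), ("filtered", filtered)]:
        print(f"{name}: complete {s.complete}%, duplicated/complete = {duplication_ratio(s):.3f}")
```

In these invented numbers, filtering lowers completeness and the duplicated share while single-copy BUSCOs stay constant, the same qualitative pattern the text reports for TransRate filtering; actual values would be read from the BUSCO short-summary output of each assembly.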