RNA-seq and transcriptome assembly
Transcriptome sequencing yielded on average 42,928,473 ± 14,030,906 paired-end reads with a GC content of 38.2 ± 1.6%. De novo assembly statistics are summarized in Table 1. Filtering with TransRate significantly improved assembly quality, as evidenced by substantially fewer duplicated BUSCO hits (Table 2; two-sample Wilcoxon rank sum test: W = 199, p < 0.001), but it decreased completeness. Raw Agalma assemblies had significantly more complete BUSCO hits (W = 198, p < 0.001), fewer fragmented hits (W = 31, p < 0.001) and less missing data (W = 8.5, p < 0.001). However, we observed no significant difference between the raw Agalma assemblies and the TransRate-filtered assemblies in the number of complete single-copy BUSCO hits (W = 132, p = 0.436). The clustered supertranscriptome contained 988,460 contigs in total, and BUSCO analysis against the Metazoa_odb9 database indicated that it is highly complete but has a high level of redundancy. Prediction and clustering of ORFs resulted in the retention of 131,503 ORFs and effectively decreased the number of duplicated BUSCO hits, and thereby redundancy (Table 2; Ortiz-Sepulveda et al., 2022).
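For readers unfamiliar with the W statistic reported above, the following minimal Python sketch shows one common way to compute it for two independent samples. It assumes the convention used by R's `wilcox.test`, in which W counts the pairs where a value from the first sample exceeds a value from the second, with ties counted as one half. The text does not state which software or convention was used, so this is an assumption. The function name `rank_sum_w` and the input values are hypothetical placeholders, not data from this study.

```python
def rank_sum_w(x, y):
    """Return W for two independent samples.

    W is the number of (x_i, y_j) pairs with x_i > y_j, where each tie
    contributes 0.5. Assumption: this matches the statistic reported by
    R's wilcox.test for the two-sample case.
    """
    w = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                w += 1.0
            elif xi == yj:
                w += 0.5
    return w


# Hypothetical duplicated-BUSCO counts per assembly (placeholders only,
# NOT values from Table 2).
raw_duplicated = [120, 135, 150, 160]
filtered_duplicated = [40, 55, 60, 135]

w = rank_sum_w(raw_duplicated, filtered_duplicated)
print(f"W = {w}")
```

Under this convention W ranges from 0 to the product of the two sample sizes, so values close to either extreme indicate strong separation between the groups. A fractional value such as W = 8.5 can only arise when tied observations occur across the two samples.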