Gene prediction and annotation of genome assemblies
We modeled repetitive sequences with RepeatModeler v2.0 (part of Dfam TE Tools v1.0, Dfam Consortium, 2019) and masked these repetitive regions in the assemblies with RepeatMasker v4.0.7_4 (Smit et al., 2020).

We then ran three rounds of Maker v2.31.10_2 (Holt & Yandell, 2011) gene prediction on scaffolds longer than 5 kb. For the first round, we used NCBI Coleopteran RefSeq proteins (downloaded 2019-11-12) as protein homology evidence. As EST evidence, we used NCBI D. ponderosae ESTs (downloaded 2019-11-12); D. ponderosae GenBank full-length cDNAs (FLcDNAs) BT126413-BT128693 and JQ855638-JQ855707; and NCBI Dendroctonus spp. TSA accessions GABX00000000, GAFI00000000, GAFW00000000, GAFX00000000, GDAR00000000, and GGKQ00000000. Alignments were performed with exonerate v2.4.0 (Slater & Birney, 2005) and BLAST+ v2.9.0 (Camacho et al., 2009). We used the first-round results with BUSCO v3.0.2_2 to train AUGUSTUS v3.3.3 (Stanke et al., 2006). Round two used AUGUSTUS, SNAP v2013-11-29_1 (Korf, 2004), and GeneMark-ES v2019-05 (Lomsadze et al., 2005) for ab initio gene prediction. For the third and final round of Maker, we trained AUGUSTUS directly on the round-two results and ran it again alongside SNAP and GeneMark-ES.

We then functionally annotated the predicted proteins using InterProScan v5.40-77.0 (Jones et al., 2014) and BLAST+ v2.10.0 searches against the UniProtKB/Swiss-Prot 2020_01 database. We ran BUSCO v4.1.4 with the insecta_odb10.2020-09-10 lineage dataset on the predicted proteins from the first transcript variant of each gene model.

The annotated assemblies are available from NCBI under BioProjects PRJNA638274 (male) and PRJNA638278 (female), with genome accessions JAFETF000000000 (male) and JAFETG000000000 (female).
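Two small bookkeeping steps surround the tool runs described above: restricting gene prediction to scaffolds longer than 5 kb, and selecting the first transcript variant of each gene model before the BUSCO protein analysis. The Python sketch below illustrates both under stated assumptions. It is a hypothetical example, not the procedure used here (the text does not say how these steps were carried out): the function names, the strict "> 5,000 bp" reading of "longer than 5 kb", and the MAKER-style `-mRNA-N` transcript-ID suffix are all assumptions.

```python
"""Hypothetical helpers for two bookkeeping steps: keeping scaffolds longer
than 5 kb, and picking the first transcript variant of each gene model."""
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: "longer than 5 kb" is read as strictly more than 5,000 bp.
MIN_SCAFFOLD_LEN = 5_000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (ID, sequence) pairs; the ID is the first word of each header."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may be wrapped over many lines
    if header is not None:
        yield header, "".join(chunks)


def long_scaffolds(records: Iterable[Tuple[str, str]],
                   min_len: int = MIN_SCAFFOLD_LEN) -> List[Tuple[str, str]]:
    """Keep only scaffolds whose length exceeds min_len."""
    return [(sid, seq) for sid, seq in records if len(seq) > min_len]


def first_transcripts(transcript_ids: Iterable[str],
                      sep: str = "-mRNA-") -> Dict[str, str]:
    """Map each gene ID to its lowest-numbered transcript variant.

    Assumption: IDs look like 'gene-0.1-mRNA-1', where the integer after
    `sep` is the variant number. IDs without `sep` are treated as the sole
    (first) transcript of a gene with the same name.
    """
    best: Dict[str, Tuple[int, str]] = {}
    for tid in transcript_ids:
        gene, found, variant = tid.rpartition(sep)
        if not found:
            gene, rank = tid, 1
        else:
            rank = int(variant)  # compare numerically, so 2 sorts before 10
        if gene not in best or rank < best[gene][0]:
            best[gene] = (rank, tid)
    return {gene: tid for gene, (_, tid) in best.items()}


if __name__ == "__main__":
    fasta = [">scaf1 short", "ACGT" * 1000, ">scaf2", "A" * 3000, "A" * 3001]
    kept = long_scaffolds(read_fasta(fasta))
    print([sid for sid, _ in kept])
    print(first_transcripts(["g1-mRNA-2", "g1-mRNA-1", "g2-mRNA-1"]))
```

Ranking variants by their integer suffix rather than by string order avoids treating `-mRNA-10` as earlier than `-mRNA-2`; a real pipeline would instead read gene and transcript relationships from the Maker GFF3 output.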