Genome assembly, gene function annotation and phylogenetic analyses
A total of 7,225,123 raw Illumina sequences were obtained and quality-filtered with Trim Galore v0.6.7 (https://github.com/FelixKrueger/TrimGalore). Of these, 7,206,640 sequences for which both mates of the pair were retained were used for genome assembly. De novo assembly was performed with the multi-platform genome assembly pipeline (MpGAP) v3.1, run under Nextflow v21.10.6 with MaSuRCA v4.0.5 as the assembler. Assembly quality was assessed with QUAST v5.0.2 and with BUSCO v5.4.2 against the Saccharomycetales_odb10 dataset. The assembly with the best metrics (highest N50 and largest contig size, in kbp) was then annotated with the Funannotate pipeline v1.8.14. Contigs shorter than 500 bp were removed, and repetitive sequences were masked using default settings. Ab initio gene prediction was carried out with Augustus, HiQ, GlimmerHMM, SNAP, and GeneMark. tRNA genes were predicted with tRNAscan-SE v2.0.9, which is included in Funannotate. Genes were functionally annotated with InterProScan, and the KEGG and KofamKOALA web servers were used to predict the gene functions of N. atacamensis ATA-11A-B. The average nucleotide identity (ANI) between Nakazawaea genomes was estimated from the available assemblies using OrthoANI (Lee et al., 2016).
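The proportion of sequences kept after trimming follows directly from the two counts reported in the paragraph above. The arithmetic below assumes that both counts refer to the same unit (sequences before and after filtering):

```latex
\frac{7\,206\,640}{7\,225\,123} \approx 0.9974, \qquad 7\,225\,123 - 7\,206\,640 = 18\,483
```

That is, roughly 99.74% of the sequences were retained and 18,483 were discarded.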
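The assembly-selection step relies on N50 and the largest contig. The sketch below shows how these two values can be computed from a list of contig lengths and used to rank candidate assemblies. It is a minimal illustration under stated assumptions. The helper names (`n50`, `best_assembly`, `drop_short_contigs`) are hypothetical. Ties on N50 are assumed to be broken by the largest contig. The code does not reproduce how QUAST or Funannotate compute or filter these values internally.

```python
def n50(lengths: list[int]) -> int:
    """Return the length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def best_assembly(assemblies: dict[str, list[int]]) -> str:
    """Pick the assembly with the highest N50; ties broken by largest contig (assumed ordering)."""
    return max(
        assemblies,
        key=lambda name: (n50(assemblies[name]), max(assemblies[name], default=0)),
    )


def drop_short_contigs(lengths: list[int], min_len: int = 500) -> list[int]:
    """Discard contigs shorter than min_len bp, mirroring the 500 bp cut-off described in the text."""
    return [length for length in lengths if length >= min_len]


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) for illustration only; not data from this study.
    candidates = {"run_a": [1200, 800, 450, 300], "run_b": [2500, 600, 200]}
    chosen = best_assembly(candidates)
    print(chosen, n50(candidates[chosen]), drop_short_contigs(candidates[chosen]))
```

Ranking on a tuple makes the tie-break explicit. The 500 bp filter is a separate step because, in the workflow described, contig removal happens during annotation rather than during assembly selection.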
A maximum-likelihood phylogenetic tree of the available genomes of the genus Nakazawaea was generated with SANS serif version 2.3_7A (Rempel & Wittler, 2021). The analysis used the default k-mer length of 31 to capture sequence patterns shared among the studied genomes. Pachysolen tannophilus was used as the outgroup.
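SANS serif compares genomes through shared k-mers rather than through a multiple alignment. The sketch below illustrates that idea in simplified form, assuming the genomes are supplied as plain nucleotide strings. It extracts canonical 31-mers from each genome and counts how many k-mers are shared by each subset of genomes. Each such subset, together with its complement, corresponds to a candidate split of the taxa. The function names are hypothetical, and this is not SANS serif's implementation, which additionally weights and filters the splits to build the tree.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int = 31) -> set[str]:
    """Return canonical k-mers: the lexicographic minimum of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):  # skip k-mers containing ambiguous bases such as N
            continue
        rc = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, rc))
    return kmers


def color_set_counts(genomes: dict[str, str], k: int = 31) -> Counter:
    """Count how many canonical k-mers are present in exactly each subset of genomes."""
    colors: dict[str, set[str]] = {}
    for name, seq in genomes.items():
        for kmer in canonical_kmers(seq, k):
            colors.setdefault(kmer, set()).add(name)
    # Each frozenset of genome names is a candidate split side; its count is the k-mer support.
    return Counter(frozenset(names) for names in colors.values())
```

Splits supported by many k-mers carry the most weight when a tree is assembled from them. In practice `k=31` would be applied to whole-genome assemblies rather than the short toy strings one might use for testing.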