2.5 | Data analysis
To determine how the libraries should be normalized and pooled for
HiSeq sequencing, we used initial read counts from the MiSeq data mapped to
fungal and cyanobacterial reference genes: β-tubulin (AFJ45056.1) and
glycerol-3-phosphate dehydrogenase (AFJ45057.1) for the fungus, and the
protein translocase subunit secA gene (CP026681.1; region 4141894–4142180)
and the RNase P RNA gene rnpB (CP001037.1; region 1485004–1485242)
for cyanobacteria. The quality of the MiSeq and HiSeq data was assessed
using FastQC version 0.11.5
(https://www.bioinformatics.babraham.ac.uk/projects/fastqc/; accessed:
15.12.2019) and MultiQC version 1.1 (https://multiqc.info/; accessed:
25.02.2019). Reads with poor-quality bases were removed with the FASTX-toolkit
version 0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/; accessed:
18.02.2019). Adapter sequences were trimmed with Trimmomatic version
0.36 (http://www.usadellab.org/cms/?page=trimmomatic; accessed:
20.02.2019). The processed paired-end MiSeq data was used for de
novo transcriptome assembly with Trinity software version 2.4.0 (Haas
et al., 2013). The quality of the assembly was assessed with the Trinity
Perl script TrinityStats.pl. The HiSeq data was pseudoaligned to the de novo transcriptome assembly with the RNA-seq quantification
program kallisto version 0.45.0 (Bray et al., 2016). Coding regions were
identified with TransDecoder (http://transdecoder.github.io; accessed:
20.03.2019). The parameter settings used for each tool are listed in Electronic
Supplementary Table S1.
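The pooling decision based on MiSeq read counts can be illustrated with a minimal sketch. The text does not state the formula that was applied; this example assumes, hypothetically, that each library's share of the HiSeq pool is made inversely proportional to the fraction of its MiSeq reads that mapped to the reference genes, so that every library is expected to yield a similar number of target reads. The function name `pooling_fractions` and the counts are illustrative only.

```python
def pooling_fractions(target_counts, total_counts):
    """Share of the HiSeq pool assigned to each library (hypothetical scheme).

    target_counts: MiSeq reads per library that mapped to the reference genes
    total_counts:  all MiSeq reads per library
    """
    weights = {}
    for lib, mapped in target_counts.items():
        if mapped == 0:
            raise ValueError(f"library {lib!r} has no reads on the reference genes")
        target_share = mapped / total_counts[lib]
        # Assumption: a library whose reads hit the reference genes less often
        # receives a proportionally larger share of the sequencing pool.
        weights[lib] = 1.0 / target_share
    total_weight = sum(weights.values())
    # Normalize so that the shares sum to 1.
    return {lib: w / total_weight for lib, w in weights.items()}


fractions = pooling_fractions({"lib1": 100, "lib2": 50}, {"lib1": 1000, "lib2": 1000})
print(fractions)
```

Here lib2 has half the target-read fraction of lib1 and therefore receives twice its share of the pool. In practice, fungal and cyanobacterial counts might be weighed separately; this sketch pools them into one target count for simplicity.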
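The removal of reads with poor-quality bases can be sketched in the spirit of the FASTX-toolkit's `fastq_quality_filter`, which keeps a read only if a minimum percentage of its bases reach a minimum Phred score. The thresholds below (Phred 20 for 80% of bases) and the Phred+33 quality encoding are assumptions for illustration; the values actually used are given in Table S1, and the function names are hypothetical.

```python
def passes_quality_filter(quality, min_quality=20, min_percent=80, offset=33):
    """Keep a read if at least min_percent of its bases have Phred >= min_quality."""
    scores = [ord(ch) - offset for ch in quality]  # decode Phred+33 ASCII scores
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_quality)
    return 100 * good / len(scores) >= min_percent


def filter_fastq(lines, **thresholds):
    """Yield (header, sequence, plus, quality) FASTQ records that pass the filter."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if passes_quality_filter(qual, **thresholds):
            yield header, seq, plus, qual


reads = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "II##"]
print([rec[0] for rec in filter_fastq(reads)])  # ['@r1']
```

Because this filters each read independently, the two files of a paired-end run would need to be re-synchronized afterwards so that mates stay in the same order; the sketch ignores pairing.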
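Adapter trimming can be illustrated with a deliberately simplified 3' clipping function. It is not Trimmomatic's algorithm: ILLUMINACLIP tolerates mismatches and offers a palindrome mode for paired reads, whereas this hypothetical `trim_3prime_adapter` uses exact matches only, clipping either a full adapter occurrence or a partial adapter at the read end. The adapter string is the common start of the Illumina TruSeq adapter and serves only as an example.

```python
def trim_3prime_adapter(seq, adapter, min_overlap=3):
    """Clip an adapter (full or partial, exact match only) from the 3' end of a read."""
    # Case 1: the full adapter occurs in the read; clip from its first base.
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Case 2: the read ends with a prefix of the adapter (adapter runs off the end).
    # Try the longest possible overlap first, down to min_overlap bases.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix, not taken from the study
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGATC", ADAPTER))             # ACGTACGT
```

A small `min_overlap` removes more adapter remnants but also clips genuine sequence that happens to match the first few adapter bases by chance.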
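Among its assembly statistics, TrinityStats.pl reports the contig N50, both over all transcripts and over the longest isoform per gene. A minimal sketch of that calculation follows; it assumes Trinity's identifier convention, in which an isoform suffix `_i<n>` follows the gene identifier (e.g. `TRINITY_DN1_c0_g1_i2`), and the helper names are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def longest_isoform_lengths(transcript_lengths):
    """Keep only the longest isoform per Trinity gene (e.g. ..._g1_i1, ..._g1_i2)."""
    best = {}
    for transcript_id, length in transcript_lengths.items():
        gene_id = transcript_id.rsplit("_i", 1)[0]  # strip the isoform suffix
        best[gene_id] = max(length, best.get(gene_id, 0))
    return list(best.values())


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

The gene-level N50 is usually the more conservative figure, because many short alternative isoforms can otherwise dominate the transcript-level statistic.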
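Pseudoalignment, as used by kallisto, assigns a read to the set of transcripts compatible with all of its k-mers instead of computing a base-level alignment. The sketch below shows the idea with a plain dictionary index and strand-independent (canonical) k-mers. kallisto itself indexes a transcriptome de Bruijn graph (default k = 31) and applies further optimizations, so this is a conceptual illustration only, with hypothetical function names and toy sequences.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def build_index(transcripts, k):
    """Map each canonical k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[canonical(seq[i:i + k])].add(name)
    return index


def pseudoalign(read, index, k):
    """Intersect the transcript sets of all indexed k-mers in the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(canonical(read[i:i + k]))
        if hits is None:
            continue  # k-mer absent from the index: ignored in this sketch
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


index = build_index({"t1": "ACGTACGGTTAC", "t2": "ACGTACGGAAAA"}, k=5)
print(sorted(pseudoalign("TACGGTTAC", index, k=5)))  # ['t1']
```

A read drawn from the region shared by both toy transcripts would be compatible with both; kallisto resolves such ambiguous assignments afterwards with an expectation-maximization step when estimating abundances.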
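TransDecoder identifies candidate coding regions by first extracting long open reading frames (TransDecoder.LongOrfs, with a default minimum protein length of 100 amino acids) and then scoring them. The sketch below covers only the first idea, finding the longest complete ORF (ATG to stop codon) across all six reading frames of a transcript. It ignores the 5'- and 3'-partial ORFs and the coding-potential scoring that TransDecoder also uses, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq, min_protein_length=100):
    """Longest complete ORF (ATG ... stop) on either strand, or '' if too short."""
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens an ORF in this frame
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]  # include the stop codon
                    if len(orf) > len(best):
                        best = orf
                    start = None
    protein_length = len(best) // 3 - 1  # codons minus the stop codon
    return best if best and protein_length >= min_protein_length else ""


print(longest_orf("CCATGAAACCCTAGGG", min_protein_length=2))  # ATGAAACCCTAG
```

With the default threshold of 100 amino acids, the toy transcript above yields no ORF, which mirrors how a length cut-off discards short spurious ORFs that arise by chance in non-coding sequence.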