Sequence data processing and symbiont genus determination
Detailed descriptions of the data processing pipeline are available on
GitHub (https://github.com/evelynabbott/codominant_symbiosis.git). FASTQ
files from both studies were downloaded using the SRA Toolkit. Adapter
trimming was performed in paired-end mode using cutadapt, with a minimum
read length of 20 bp and a Phred quality cutoff of 20. FastQC (Andrews,
2010) was used to assess the quality of a subset of 10,000 reads before
and after trimming. Reads were then mapped with bowtie2 to a combined
reference comprising the Cladocopium transcriptome, the Durusdinium
transcriptome (Ladner et al., 2012), and the Acropora millepora genome
(Fuller et al., 2018). The reads in the resulting SAM files were then
split into three separate SAM files, one per organism. PCR duplicates
were removed after alignment using MarkDuplicates from the Picard Toolkit
(Broad Institute, 2019). Samtools (Li et al., 2009) was used to sort the
files and convert them from SAM to BAM format.
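The per-organism split of the SAM output can be illustrated with a short Python sketch. It assumes, hypothetically, that every sequence name in the combined reference was given an organism prefix (clad_, durus_, amil_) before the bowtie2 index was built; the actual naming scheme and scripts are those in the GitHub repository above and may differ. Alignment records are routed by the reference name in SAM column 3 (RNAME), @SQ header lines follow the sequence they describe, other header lines are copied to every output, and unmapped records are dropped.

```python
from typing import Dict, Iterable, List, Optional

# HYPOTHETICAL prefixes: assumes each reference sequence name in the combined
# FASTA was tagged with its source organism before indexing with bowtie2.
PREFIXES = {
    "clad_": "Cladocopium",
    "durus_": "Durusdinium",
    "amil_": "Acropora_millepora",
}


def organism_for(ref_name: str, prefixes: Dict[str, str]) -> Optional[str]:
    """Return the organism whose prefix matches a reference name, else None."""
    for prefix, organism in prefixes.items():
        if ref_name.startswith(prefix):
            return organism
    return None


def split_sam(lines: Iterable[str], prefixes: Dict[str, str]) -> Dict[str, List[str]]:
    """Route SAM lines into one list per organism.

    Non-@SQ header lines go to every output; @SQ lines go only to the organism
    owning that sequence (SN tag); alignment records are routed by RNAME
    (column 3). Records with RNAME '*' (unmapped) match no prefix and are dropped.
    """
    out: Dict[str, List[str]] = {org: [] for org in prefixes.values()}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                # Extract the sequence name from the SN:<name> tag.
                name = next((f[3:] for f in fields[1:] if f.startswith("SN:")), "")
                org = organism_for(name, prefixes)
                if org is not None:
                    out[org].append(line)
            else:
                for org_lines in out.values():
                    org_lines.append(line)
            continue
        org = organism_for(fields[2], prefixes)
        if org is not None:
            out[org].append(line)
    return out


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6\tSO:unsorted",
        "@SQ\tSN:clad_c1\tLN:1000",
        "@SQ\tSN:durus_d1\tLN:900",
        "r1\t0\tclad_c1\t10\t42\t50M\t*\t0\t0\t*\t*",
        "r2\t0\tdurus_d1\t20\t42\t50M\t*\t0\t0\t*\t*",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    for org, recs in split_sam(demo, PREFIXES).items():
        # Count alignment records (non-header lines) per organism.
        print(org, sum(1 for r in recs if not r.startswith("@")))
```

Because records are routed one at a time, the two mates of a pair that align to different organisms end up in different files, and secondary alignments are routed like primary ones; how the original pipeline handled those cases is not stated here.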
featureCounts (Liao, Smyth, & Shi, 2014) was used to count reads mapping
within annotated gene boundaries.
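The counting rule can be sketched in Python as follows. This is a naive linear scan written for illustration, not featureCounts' implementation; it mimics one featureCounts default, under which a read overlapping more than one gene is left unassigned rather than counted. Coordinates are assumed to be 1-based and inclusive, as in GTF annotations, and each read is reduced to a single aligned span (spliced alignments and paired-end fragment counting are ignored). The `__ambiguous` and `__no_feature` keys are hypothetical labels for unassigned reads.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (sequence name, start, end), 1-based and inclusive, as in GTF annotations.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same sequence share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_reads(reads: Iterable[Interval], genes: Dict[str, Interval]) -> Counter:
    """Count reads per gene; reads hitting several genes stay unassigned."""
    counts: Counter = Counter()
    for read in reads:
        # Naive scan over every gene; fine for a sketch, slow for a genome.
        hits = [gid for gid, span in genes.items() if overlaps(read, span)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1   # overlaps more than one gene
        else:
            counts["__no_feature"] += 1  # overlaps no annotated gene
    return counts


if __name__ == "__main__":
    genes = {"g1": ("chr1", 100, 200), "g2": ("chr1", 180, 300)}
    reads = [("chr1", 110, 160), ("chr1", 190, 240), ("chr1", 250, 290)]
    print(dict(count_reads(reads, genes)))
```

Scanning every gene for every read costs time proportional to reads times genes; dedicated counters index the annotation so that each lookup touches only nearby features.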