2.1. Sequence search, alignment and phylogenetic analysis
Cellulase sequences of S. quadricauda were identified from the genome sequence data of our previous study 9 by protein-folding homology analysis with Phyre2 12 and a BLASTN similarity search 13, taking Monoraphidium neglectum as the reference; their details are given in Table 1. The other Scenedesmus sequences analyzed were taken from PhycoCosm 14 or NCBI (https://www.ncbi.nlm.nih.gov/), and their accession numbers are shown in Table 2. Conserved domains, signal peptides, and GH-family assignments were identified with Prosite patterns 15, DeepLoc 16 and PredAlgo 17. The sequences were aligned and processed with Clustal Omega 18 and visualized with ESPript 3.0 19. To construct the phylogenetic trees, all the sequences were aligned with sequences of phylogenetically distant β-1,4-endoglucanases, β-glucosidases or exocellulases (respectively) from microalgae, fungi, plants, invertebrates and bacteria, and the alignments were processed with Gblocks v0.91b before being analyzed in MEGA 6.06 20,21. The signal peptides of the enzymes were excluded from the phylogenetic analysis. The phylogenetic trees were built by the Maximum Likelihood method in MEGA 6.06, using the model and the restrictions suggested by the program. Branch support was assessed by bootstrap analysis with 100 replicates.
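For readers unfamiliar with the bootstrap step, the following minimal Python sketch illustrates the column resampling that underlies a nonparametric bootstrap of an alignment. It is an illustration only, not the procedure implemented in MEGA, which performs the resampling and tree inference internally. The function names (`bootstrap_alignment`, `bootstrap_replicates`) and the toy sequences are hypothetical; only the number of replicates (100) is taken from the text.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Alignment columns are sampled with replacement, and the same sampled
    columns are applied to every sequence, so the replicate keeps the
    original number of sequences and columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates pseudo-replicates (100, as in the text)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment (hypothetical sequences, for illustration only).
toy = {"seqA": "MKT-LLA", "seqB": "MKSALLA", "seqC": "MRT-LIA"}
replicates = bootstrap_replicates(toy)
```

In a full analysis, a tree would be inferred from each pseudo-replicate, and the support for a branch would be the percentage of replicate trees in which that branch is recovered.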