3.2.1. Endo-β-glucanases
Among the selected species we found thirty genes that encode different endoglucanases. Their catalytic domains belong to the GH5 and GH9 families of CAZymes (Carbohydrate-Active Enzymes) 24, and seven of these proteins carry CBMs from families 1 and 2. While CBM2 was found only in GH5 endoglucanases, CBM1 was associated with some GH9 endoglucanases (Figure 1 and Table 2).
Our results showed that three GH9 endoglucanases, Sceob1|18668, SceobE4|579152 and SceobDOE|757385, carry Big_1 domains at their C-terminus. Big_1 is a bacterial immunoglobulin (Ig)-like domain that is usually present in GH9 endoglucanases. The functions of these Ig-like modules are not clear; however, they are thought to contribute to the catalytic efficiency or the structural stability of GH9 endoglucanases 25,26. In contrast, KAF8065624.1 presents an LPMO domain in its N-terminal region. LPMO is a lytic polysaccharide monooxygenase domain that usually acts synergistically with GH9 domains, increasing cellulolytic enzyme activity 27.
Most of the GH9 endoglucanases analyzed are secreted or anchored to the cell membrane (with the catalytic domain localized to the outer surface of the plasma membrane). In contrast, only three of the identified GH5 endoglucanases are predicted to be secreted, while the other four would be cytoplasmic enzymes (Table 2).
The analysis performed with the Prosite Database showed three highly conserved regions in GH9 endoglucanases, which contain conserved residues important for their catalytic activity 5,28 (Figure 2). The first region comprises the DAGD motif, where the first Asp (Asp54, Nasta 1KS8 numbering) is an active site residue. The second region contains a conserved RPHHR sequence, where the first His (His359, Nasta 1KS8 numbering) is also part of the active site of GH9 endoglucanases. Finally, Region III contains two residues, Asp399 and Glu412 (Nasta 1KS8 numbering), that would be involved in catalysis. Thus, all the proteins identified and analyzed in this study contain four conserved residues (D, H, D and E) in the mentioned regions, which would be part of the active site of GH9 endoglucanases and would be involved in catalysis. The exceptions are the ScsoPA4|KAF8062061.1 and SceobDOE|1035052 proteins, which lack the catalytic D from Region I and the H residue from Region II, respectively (Fig. 2A).
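As an illustration of the motif check described above, the following minimal Python sketch scans a protein sequence for exact occurrences of the DAGD and RPHHR motifs. It is not the Prosite procedure used in this study: real Prosite signatures are degenerate patterns, the function name `find_motifs` is hypothetical, and the toy sequence is made up for the example.

```python
import re

# Motifs named in the text for GH9 Regions I and II (exact strings only;
# the actual Prosite signatures are more permissive).
GH9_MOTIFS = {"Region I (DAGD)": "DAGD", "Region II (RPHHR)": "RPHHR"}


def find_motifs(seq, motifs=GH9_MOTIFS):
    """Return {label: [1-based start positions]} for exact motif matches."""
    seq = seq.upper()
    hits = {}
    for label, motif in motifs.items():
        # A lookahead reports overlapping occurrences as well.
        pattern = f"(?={re.escape(motif)})"
        hits[label] = [m.start() + 1 for m in re.finditer(pattern, seq)]
    return hits


# Made-up toy sequence, not one of the proteins analyzed in this study.
toy_seq = "MKTLDAGDYAAGRPHHRSW"
print(find_motifs(toy_seq))
```

A protein lacking the catalytic Asp of Region I, as reported for ScsoPA4|KAF8062061.1, would return an empty list for the first motif in a check of this kind.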
S. quadricauda LWG002611 possesses another two hypothetical GH9 endoglucanase proteins (Scequ2611|3068 and Scequ2611|4665) that show a high similarity to the analyzed Scenedesmus endoglucanases (Scequ2611|3068, 34-36% with KAF6265438.1 and KAF8066338.1; Scequ2611|4665, 30% with KAF6264795.1), but they lack the essential Glu residue from Region III (Fig. 2B). Further studies are needed to determine whether Scequ2611|3068 and Scequ2611|4665 are catalytically inactive enzymes or whether they use a mechanism different from that of traditional glycosyl hydrolases.
Cellulase linkers usually range in length from 6–14 residues up to more than 100 residues 5,29. Among the Scenedesmus endoglucanases analyzed, most of the CBM1-containing enzymes showed putative P/S-rich or polyQ linkers, mainly located between the GH9 and the CBM regions (Fig. 2A). These proline- or glutamine-rich linkers are of a rigid type and are thought to act as spacers that prevent non-native interactions between domains, which could otherwise affect the correct folding of the proteins 30. In addition, the linkers would allow cellulases to push forward on the exterior of the polysaccharide with a caterpillar-like movement 5,31.
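One simple way to flag candidate P/S-rich or polyQ stretches is a sliding-window composition scan, sketched below in Python. This is only an illustrative heuristic and not necessarily the method used here; the function name `rich_windows`, the window size of 10 and the 60% threshold are assumptions chosen for the example, and the toy sequence is made up.

```python
def rich_windows(seq, residues="PS", window=10, min_fraction=0.6):
    """Return merged 1-based (start, end) spans covered by windows in which
    the fraction of `residues` is at least `min_fraction`."""
    seq = seq.upper()
    spans = []
    for i in range(len(seq) - window + 1):
        count = sum(1 for aa in seq[i:i + window] if aa in residues)
        if count / window >= min_fraction:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                # Overlapping or adjacent to the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Made-up toy protein: 10 flanking residues, a 12-residue P/S stretch
# (positions 11-22), then 10 more flanking residues.
toy = "MKVLAAGGWT" + "PSPSPPSSPSPP" + "CQTHWGQCGG"
print(rich_windows(toy))                 # P/S-rich candidates
print(rich_windows(toy, residues="Q"))   # polyQ candidates
```

Because whole windows are reported, the returned spans overshoot the true linker boundaries by a few residues on each side, so hits from a scan like this would still need to be inspected against the domain annotation.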
In contrast to what was described for GH9 endoglucanases from other algae, such as Chlamydomonas, Volvox and Gonium 5, CBM1 domains are located at either the C- or the N-terminus of some of the studied Scenedesmus GH9 endoglucanases. Notably, both the N- and the C-terminal CBM1 domains analyzed are cysteine-rich, as previously described in Chlamydomonas.
A phylogenetic tree was also constructed (Fig. 3) using Gblocks and the MEGA 6.06 software, from the alignment of the amino acid sequences of the Scenedesmus GH9 enzymes together with homologous sequences from invertebrates, fungi, plants and bacteria identified with Blast-P. The organization of the tree branches suggests that Scenedesmus GH9 endoglucanases are evolutionarily closer to the GH9 cellulases of termites, worms, sea urchins and bivalves (red branches of the tree) than to the enzymes from higher plants, fungi and bacteria.
Figure 4 shows the 3D model of the Sceobl1|32711 GH9 and CBM1 domains constructed with RaptorX Contact Prediction. The model presents an (α/α)6-barrel fold similar to that described for previously characterized GH9 endoglucanases 5,32. In addition, the catalytic amino acid residues occupy similar spatial positions when the Sceobl1|32711 model is superposed on the 1KS8 template (an endocellulase from the termite Nasutitermes takasagoensis, 40.24% identical to Sceobl1|32711 with 79% cover). Its N-terminal CBM1, in turn, showed a high sequence identity (99.6%, cover 35%) with the cellulose-binding domain of endoglucanase I from Trichoderma reesei (PDB entry: 4BMF), and its model presents good spatial conservation when both models were superposed.
The GH5 endoglucanases present the consensus pattern [LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-{SENR}-N-E-[PV]-[RHDNSTLIVFY] 15. The C-terminal Glu is an active site residue. The predicted catalytic residues, Glu168 and Glu309 (Pyrho 3W6M numbering), are strictly conserved in all the GH5 endoglucanases analyzed (Figure 5).
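To make the consensus pattern above easier to read, the following Python sketch converts PROSITE-style notation into a regular expression and applies it to a sequence. It is a minimal illustration under stated assumptions: the helper name `prosite_to_regex` is hypothetical, only the syntax elements used in this pattern (plus `x` wildcards and `(n)` or `(n,m)` repeats) are handled, and the test peptides are made up rather than taken from the proteins analyzed.

```python
import re

GH5_PATTERN = (
    "[LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-{SENR}-N-E-[PV]-[RHDNSTLIVFY]"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string.

    [ABC] = any of A, B, C; {ABC} = any residue except A, B, C;
    x = any residue; (n) or (n,m) after an element = repetition.
    """
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        # Separate the element from an optional repetition suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        elif core.lower() == "x":
            core = "."                       # any residue
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


gh5_regex = re.compile(prosite_to_regex(GH5_PATTERN))

# Made-up peptides: the first satisfies every position of the pattern,
# the second has Ser at the position where S, E, N and R are excluded.
hit = gh5_regex.search("MMLAADLKNEPRGG")
miss = gh5_regex.search("LAADLSNEPR")
print(gh5_regex.pattern, hit.group() if hit else None, miss)
```

In a match, the Glu of the fixed N-E pair is the eighth residue of the matched segment, which gives a quick way to locate the corresponding residue in a candidate sequence.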
Figure 6 shows the GH5 endoglucanase phylogenetic tree. The branching organization of the tree suggests that most of the GH5 proteins analyzed are evolutionarily closer to those of other microalgae and higher plants. However, the group of enzymes containing a CBM2 is closer to fungal and bacterial endoglucanases, suggesting a microbial origin.
The homology model of the Sceobl1|14060 GH5 domain presents the (α/β)8 TIM barrel fold characteristic of the GH5 family 33 (Figure 7). Superposition with the Pyrho 3W6M PDB structure (an endocellulase from the hyperthermophile Pyrococcus horikoshii) showed that the spatial location of the catalytic amino acids is conserved.
Regarding its N-terminal CBM2 domain, Sceobl1|14060 has a high sequence identity (97.3% identity, 68% cover) with the 2RTT PDB structure (a chitin-binding domain of Chi18aC from Streptomyces coelicolor); moreover, both models showed high structural and spatial conservation.