3.2.1. Endo-β-glucanases
Among the selected species we found thirty genes that encode different
endoglucanases. Their catalytic domains belong to the GH5 and GH9
families of CAZymes (Carbohydrate-Active Enzymes) 24, and seven of these
proteins present CBMs from families 1 and 2. While CBM2 domains were
found only in GH5 endoglucanases, CBM1 domains were associated with some
GH9 endoglucanases (Figure 1 and Table 2).
Our results showed that three GH9 endoglucanases, Sceob1|18668,
SceobE4|579152 and SceobDOE|757385, present Big_1 domains at their
C-terminus. Big_1 is a bacterial immunoglobulin (Ig)-like domain, usually
present in GH9 endoglucanases. The functions of these Ig-like modules
are not clear; however, they are thought to contribute to the catalytic
efficiency or the structural stability of GH9 endoglucanases 25,26. On
the other hand, KAF8065624.1 presents an LPMO domain in its N-terminal
region. LPMO is a lytic polysaccharide monooxygenase domain that usually
acts synergistically with GH9 domains, increasing the cellulolytic
enzyme activity 27.
Most of the GH9 endoglucanases analyzed are secreted or anchored to the
cell membrane (with the catalytic domain localized to the outer surface
of the plasma membrane). In contrast, only three of the identified GH5
endoglucanases are predicted to be secreted, while the other four would
be cytoplasmic enzymes (Table 2).
The analysis performed with the Prosite Database showed three highly
conserved regions in GH9 endoglucanases, which contain conserved
residues important for their catalytic activity 5,28 (Figure 2). The
first region (Region I) comprises the DAGD motif, where the first Asp
(Asp54, Nasta 1KS8 numbering) is an active site residue. The second
region (Region II) contains a conserved RPHHR sequence, where the first
His (His359, Nasta 1KS8 numbering) is also part of the active site of
GH9 endoglucanases. Finally, Region III contains an Asp and a Glu
residue (Asp399 and Glu412, Nasta 1KS8 numbering) that would be involved
in catalysis. Thus, all the proteins identified and analyzed in this
study contain the four conserved residues D, H, D and E in the mentioned
regions, which would be part of the active site of GH9 endoglucanases
and would be involved in catalysis. The exceptions are the
ScsoPA4|KAF8062061.1 and SceobDOE|1035052 proteins, which lack the
catalytic D from Region I and the H residue from Region II, respectively
(Fig. 2A).
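The check described above for Regions I and II can be illustrated with a minimal sketch. It assumes exact matches to the DAGD and RPHHR motifs and uses invented toy sequences, not the Scenedesmus proteins; the function name `check_regions` is hypothetical, and the actual analysis relied on the Prosite Database and the alignment in Figure 2, which tolerate more variation than an exact string match.

```python
import re

# Motifs for Region I and Region II as quoted in the text.
# Exact matching is a simplifying assumption of this sketch.
REGION_PATTERNS = {
    "Region I (DAGD)": re.compile(r"DAGD"),
    "Region II (RPHHR)": re.compile(r"RPHHR"),
}


def check_regions(seq):
    """Return {region name: 1-based start of the motif, or None if absent}."""
    result = {}
    for name, pattern in REGION_PATTERNS.items():
        match = pattern.search(seq)
        result[name] = match.start() + 1 if match else None
    return result


# Hypothetical toy sequences (illustration only).
toy_sequences = {
    "toy_complete": "MKTLDAGDYAAWRPHHRGS",
    "toy_no_region1": "MKTLNAGDYAAWRPHHRGS",  # first Asp of DAGD replaced
    "toy_no_region2": "MKTLDAGDYAAWRPHQRGS",  # RPHHR motif disrupted
}

report = {name: check_regions(seq) for name, seq in toy_sequences.items()}
for name, regions in report.items():
    print(name, regions)
```

A sequence for which a region is reported as `None` would be a candidate for closer inspection in the alignment, in the same way as the two exceptions noted above.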
S. quadricauda LWG002611 possesses two further hypothetical GH9
endoglucanases (Scequ2611|3068 and Scequ2611|4665) showing high
similarity to the analyzed Scenedesmus endoglucanases (Scequ2611|3068,
34-36% with KAF6265438.1 and KAF8066338.1; Scequ2611|4665, 30% with
KAF6264795.1), but they lack the essential Glu residue from Region III
(Fig. 2B). Further studies are needed to determine whether
Scequ2611|3068 and Scequ2611|4665 are catalytically inactive enzymes or
whether they use a mechanism different from that of traditional
glycosyl hydrolases.
Cellulases usually present linkers ranging in length from 6–14 residues
up to >100 residues 5,29. Among the Scenedesmus endoglucanases analyzed,
most of the CBM1-containing enzymes showed putative P/S-rich or polyQ
linkers, mainly located between the GH9 and the CBM regions (Fig. 2A).
These proline- or glutamine-rich linkers are of a rigid type and are
thought to act as spacers that avoid non-native interactions between
domains, which could affect the correct folding of proteins 30. In
addition, the linkers would allow cellulases to push forward on the
exterior of the polysaccharide with a caterpillar-like movement 5,31.
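One simple way to flag putative P/S-rich or polyQ stretches is a sliding-window composition count. The sketch below is an illustration only: the window size, the minimum count, the function name `rich_windows` and the toy sequence are all assumptions, not the criteria used in this study.

```python
def rich_windows(seq, residues, window=10, min_count=6):
    """Return 0-based starts of windows holding at least `min_count` of `residues`."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        count = sum(segment.count(r) for r in residues)
        if count >= min_count:
            hits.append(start)
    return hits


# Hypothetical toy protein: domain-like flank, P/S-rich linker, domain-like flank.
toy_protein = "MKAWLLGDEA" + "PSPSPPSSPS" + "GTCCQACGKW"

ps_hits = rich_windows(toy_protein, "PS")  # windows enriched in Pro/Ser
q_hits = rich_windows(toy_protein, "Q")    # windows enriched in Gln

print("P/S-rich window starts:", ps_hits)
print("Q-rich window starts:", q_hits)
```

Overlapping hits can be merged into a single candidate linker region; with real sequences the thresholds would need tuning against annotated domain boundaries.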
In contrast to what was described for GH9 endoglucanases from other
algae, such as Chlamydomonas, Volvox and Gonium 5, CBM1 domains are
located at either the C- or the N-terminus of some of the studied
Scenedesmus GH9 endoglucanases. Notably, both the N- and the C-terminal
CBM1 domains analyzed are cysteine-rich, as previously described in
Chlamydomonas.
A phylogenetic tree was also constructed (Fig. 3) using Gblocks and the
MEGA 6.06 software from the alignment of the amino acid sequences of
Scenedesmus GH9 enzymes together with homologous sequences identified
with BLASTP from invertebrates, fungi, plants and bacteria. The
organization of the tree branches suggests that Scenedesmus GH9
endoglucanases are evolutionarily closer to the GH9 cellulases of
termites, worms, sea urchins and bivalves (red branches of the tree)
than to the enzymes from higher plants, fungi and bacteria.
Figure 4 shows the 3D model of the Sceobl1|32711 GH9 and CBM1 domains
constructed with RaptorX Contact Prediction. The model presents a fold
similar to that described for previously characterized GH9
endoglucanases 5,32, an (α/α)6-barrel. In addition, the catalytic amino
acid residues occupy similar spatial positions when the Sceobl1|32711
model is superposed with the 1KS8 template (an endocellulase from the
termite Nasutitermes takasagoensis, 40.24% identical to Sceobl1|32711
with 79% cover). On the other hand, its N-terminal CBM1 showed a high
sequence identity (99.6%, cover 35%) with the cellulose-binding domain
of endoglucanase I from Trichoderma reesei (PDB entry: 4BMF), and its
model presents good spatial conservation when both models were
superposed.
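For readers unfamiliar with the identity and cover figures quoted above, the following minimal sketch shows one common, BLAST-style way such values are derived from a gapped pairwise alignment. The text does not state which definition was used, so the denominators here are an assumption; the function name `identity_and_coverage` and the toy alignment are hypothetical.

```python
def identity_and_coverage(aligned_query, aligned_subject, query_length):
    """Return (percent identity, percent query cover) for one gapped alignment.

    Identity is computed over all alignment columns and cover over the
    full query length; other tools may use different denominators.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    aligned_query_residues = sum(1 for q in aligned_query if q != "-")
    identity = 100.0 * identical / len(aligned_query)
    coverage = 100.0 * aligned_query_residues / query_length
    return identity, coverage


# Toy alignment (illustration only): 9 columns, query of 16 residues in total.
identity, coverage = identity_and_coverage("ACDE-GHIK", "ACDQFGH-K", query_length=16)
print(f"identity = {identity:.2f}%, cover = {coverage:.1f}%")
```

Under these definitions a very high identity can coexist with a low cover when only a short domain, such as a CBM, aligns to the template.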
Regarding GH5 endoglucanases, they present the consensus pattern
[LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-{SENR}-N-E-[PV]-
[RHDNSTLIVFY] 15. The Glu near the C-terminal end of this pattern is an
active site residue. The predicted catalytic residues, Glu168 and Glu309
(Pyrho 3W6M numbering), are strictly conserved in all the GH5
endoglucanases analyzed (Figure 5).
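The GH5 consensus pattern above is written in PROSITE syntax, which can be translated into a regular expression for a quick scan of candidate sequences. The sketch below is a minimal converter under the standard PROSITE conventions (`-` separates positions, `[...]` lists allowed residues, `{...}` lists forbidden residues, `(n)` is a repeat count, `x` is any residue); the helper names and the toy sequences are hypothetical, and this is not necessarily how the search in this study was run.

```python
import re

GH5_PATTERN = (
    "[LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-{SENR}-N-E-[PV]-[RHDNSTLIVFY]"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into Python regex syntax."""
    regex = pattern.replace("-", "")                      # drop position separators
    regex = regex.replace("{", "[^").replace("}", "]")    # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")     # repeat counts
    regex = regex.replace("x", ".")                       # any residue
    regex = regex.replace("<", "^").replace(">", "$")     # terminus anchors
    return regex


GH5_REGEX = re.compile(prosite_to_regex(GH5_PATTERN))


def find_active_site_glu(seq):
    """Return the 1-based position of the pattern's conserved Glu, or None."""
    match = GH5_REGEX.search(seq)
    if match is None:
        return None
    # The Glu is the 8th position of this fixed-length (10-residue) pattern.
    return match.start() + 8


# Hypothetical toy sequences (illustration only).
glu_position = find_active_site_glu("MKTVIYELLNEPRGA")
no_match = find_active_site_glu("MKTVIYELSNEPRGA")  # Ser at the {SENR} position
print(prosite_to_regex(GH5_PATTERN))
print(glu_position, no_match)
```

The fixed offset used to locate the Glu holds only because this particular pattern has no variable-length elements.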
Figure 6 shows the GH5 endoglucanase phylogenetic tree. The branching of
the tree suggests that most of the GH5 proteins analyzed are
evolutionarily closer to those of other microalgae and higher plants.
However, the group of enzymes containing a CBM2 is closer to fungal and
bacterial endoglucanases, suggesting a microbial origin.
The homology model of the Sceobl1|14060 GH5 domain presents the (α/β)8
TIM-barrel fold typical of the GH5 family 33 (Figure 7). The
superposition with the Pyrho 3W6M PDB structure (a hyperthermophilic
endocellulase from Pyrococcus horikoshii) showed conservation of the
spatial location of the catalytic amino acids. Regarding its N-terminal
CBM2 domain, Sceobl1|14060 has a high sequence identity (97.3% identity,
68% cover) with the 2RTT PDB structure (a chitin-binding domain of
Chi18aC from Streptomyces coelicolor); moreover, both models showed high
structural and spatial conservation.