3.2.2. β-glucosidases
In the analyzed genomes we found twenty-nine putative
β-glycosidases, all of them belonging to the GH1 family of CAZymes
(Figure 8). The most common enzymatic activities reported for
glycoside hydrolases of this family are β-glucosidase and
β-galactosidase.
It has been previously described that one of the highly conserved
regions in GH1 sequences contains a glutamic acid residue and is classified
as GH1_1 34. This region, located between positions 388 and 392
(Nannochloropsis β-glucosidase GH1 numbering, PDB code: 5YJ7),
presents the conserved sequence (V/I)TENG. The Glu residue would
participate in the cleavage of the glycosidic bond by acting as a
nucleophile 34. This catalytic nucleophile was first
identified as Glu358 in a β-glucosidase from Agrobacterium 35.
In our work, the conserved region (V/I)TENG was found in all the GH1
sequences analyzed (Figure 9.A, GH1_1). The extended region defined as
consensus is:
[LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN]. All of the
proteins chosen in this work possess in this region the Glu involved in
catalysis followed by Asn and Gly, as can be seen in Figure 9. However,
the four amino acids upstream of the Glu residue differ among the
proteins, with Ile-Trp-Ile-Thr being predominant.
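As an illustration of how a consensus of this kind can be searched for, the following minimal Python sketch translates a PROSITE-style pattern into a regular expression and scans a sequence with it. It is not the procedure used in this work; the function names and the two toy peptides are hypothetical and do not correspond to any of the proteins analyzed. The second toy peptide carries the Ile-Trp-Ile-Thr context and shows that a sequence can contain the (V/I)TENG core without satisfying the extended consensus.

```python
import re

# Patterns quoted in the text, written in PROSITE syntax.
GH1_1 = "[LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN]"
CORE = "[VI]-T-E-N-G"  # the (V/I)TENG core


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Split an element such as "[GSTA](2)" or "x(2,4)" into core and repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":                # wildcard: any residue
            core = "."
        elif core.startswith("{"):     # {ABC}: any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# Hypothetical toy peptides, for illustration only.
toy_a = "MKAGLYITENGMADDR"   # satisfies the extended consensus
toy_b = "MKAGIWITENGVADDR"   # Trp at the second position is not in [LIVFYS]

print(find_motif(GH1_1, toy_a))  # one hit covering LYITENGMA
print(find_motif(GH1_1, toy_b))  # no hit for the extended consensus
print(find_motif(CORE, toy_b))   # the ITENG core is still found
```

Scanning with both the extended pattern and the core separates sequences that keep the catalytic E-N-G triad from those that also conserve the flanking positions.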
As a second signature pattern, the conserved region GH1_2 (Figure 8.A)
was chosen for our analysis. This region, defined as:
F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA],
is located at the N-terminus of GH1 β-glucosidases; however, it may not
be present in some proteins of this family. The alignment of the
Sceobl|9031 and Sceobl|10236 sequences showed that
these proteins do not contain this region (Figure 9.A). In contrast,
the protein SceoblEN4|617109 possesses only nine of the
fifteen amino acids established as consensus, equivalent to 60 %;
this domain would therefore be considered present, but
with some variations in the amino acid sequence. Similarly, the
Sceobl1|35463 and Scequ2611|9833 proteins have only 46
and 40 % of the consensus sequence, respectively.
Scesp1|1644545 possesses only the last four amino acids of this
consensus sequence, while the other eleven do not correspond to the
established consensus. The sequence of KAF8060308.1 BGLU11 shows some
particular characteristics in this region. Although it has 86 %
sequence identity with respect to the established consensus sequence, it
presents an insertion of 34 amino acids in the downstream part of the
consensus, before x-E-x-[GSTA]. This protein also contains an
additional domain in its N-terminal region, a protein disulfide
isomerase domain (cl36828: ER_PDI_fam
superfamily 36), previously implicated in protein
folding 37.
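The percentages above can be read as the fraction of the fifteen consensus positions that a protein satisfies (for example, 9/15 = 60 %). The following minimal Python sketch shows one way to obtain such a value. The text does not state how the positions were scored, so two assumptions are made here: the segment is a gap-free 15-residue window already aligned to the pattern, and wildcard (x) positions always count as satisfied. The function names and both toy segments are hypothetical.

```python
import re

# GH1_2 consensus quoted in the text (PROSITE syntax, 15 positions).
GH1_2 = "F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]"


def expand_positions(pattern):
    """Expand a pattern into one allowed-residue set per position (None = any)."""
    positions = []
    for element in pattern.split("-"):
        # Split an element such as "[GSTA](2)" into core and fixed repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, count = m.group(1), int(m.group(2) or 1)
        allowed = None if core == "x" else set(core.strip("[]"))
        positions.extend([allowed] * count)
    return positions


def consensus_match(pattern, segment):
    """Return (matched positions, total positions, percentage) for a segment."""
    positions = expand_positions(pattern)
    if len(segment) != len(positions):
        raise ValueError("segment must be gap-free and as long as the pattern")
    # Assumption: wildcard positions (None) are always counted as satisfied.
    matched = sum(
        1 for allowed, residue in zip(positions, segment)
        if allowed is None or residue in allowed
    )
    return matched, len(positions), 100.0 * matched / len(positions)


# Hypothetical toy segments, for illustration only.
full = "FRYGTATSAYQIEGA"     # satisfies every position
partial = "LRKPTDTLKYQIEGA"  # six defined positions substituted

print(consensus_match(GH1_2, full))
print(consensus_match(GH1_2, partial))
```

Under a different convention, for instance scoring only the ten defined positions, the same segments would give different percentages, so the scoring rule should be stated alongside any reported value.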
Interestingly, only five of the proteins analyzed contain the first Phe
residue, which is characteristic of the GH1_2 pattern. It is also
worth noting that none of the Scenedesmaceae β-glycosidases
analyzed in this study presents CBMs or linkers.
On the other hand, S. quadricauda LWG002611 has a hypothetical
GH1 β-glucosidase protein, named Scequ2611|3544, with high
similarity to other Scenedesmus endoglucanases (86 % coverage and
29.45 % identity with KAF8059426.1) but lacking the region
containing the catalytic Glu residue (Figure 9.B).
The phylogenetic tree shows four β-glucosidase subgroups: (i)
the plant GH1 subgroup, (ii) the GH1 from algae, (iii) the GH1 from
fungi, and (iv) the GH1 from bacteria (Figure 10). The
Scesp1|1509300, SceobDOE|17466,
SceoblEN4|575894, and SceobDOE|32074 proteins are
grouped in a branch close to the bacterial enzyme, which suggests the
possible acquisition of these genes by horizontal transfer. On the other
hand, the Scequ2611|9833 protein was found within a large group
of GH1 enzymes from algae. This result supports the correct inference of
its sequence and suggests that the genes coding for the proteins on
nearby branches are its orthologs.
Each branch of the group of proteins from algae contains at least one
representative from each species of a given genus. This result
suggests that the β-glucosidases grouped in the same branch could fulfill
the same function in the different species, and that this function would
not be redundant within the genus.
The homology model of the Sceob1|9434 β-glucosidase, constructed
with RaptorX Contact Prediction, is shown in Figure 11. Its superposition
with the Nannochloropsis oceanica BGLN1 β-glucosidase crystal
structure (PDB code: 5YJ7) shows positional conservation of the catalytic
amino acids in the central region. The model also presents an overall
TIM-barrel structure and conserved ENG residues, which suggests a
reliable functional assignment of this new protein in