3.2.2. β-glucosidases
In the analyzed genomes we found twenty-nine putative β-glycosidases, all of them belonging to the GH1 family of CAZymes (Figure 8). The most common enzymatic activities reported for glycoside hydrolases of this family are β-glucosidase and β-galactosidase.
It has been previously described that one of the highly conserved regions in GH1 sequences, classified as GH1_1, contains a glutamic acid residue 34. This region, located between positions 388 and 392 (Nannochloropsis β-glucosidase GH1 numbering, PDB code: 5YJ7), presents the conserved sequence (V/I)TENG. The Glu residue is proposed to participate in the cleavage of the glycosidic bond by acting as the nucleophile 34. This catalytic nucleophile was first identified as Glu358 in a β-glucosidase from Agrobacterium 35.
In our work, the conserved region (V/I)TENG of GH1_1 was found in all the GH1 sequences analyzed (Figure 9.A). The extended consensus for this region is defined as: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN]. All of the proteins chosen in this work possess in this region the Glu involved in catalysis followed by Asn and Gly, as can be seen in Figure 9. However, the four amino acids upstream of the Glu residue differ among the proteins, with Ile-Trp-Ile-Thr being predominant.
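As an illustration of how a consensus of this kind can be searched for in a protein sequence, the following minimal sketch translates the GH1_1 pattern into a regular expression and reports where it occurs. It is not the procedure used in this work: the function names and the toy sequence are hypothetical, and only the pattern syntax elements that appear in this section (single residues, bracketed alternatives, x, and repeat counts) plus the {..} exclusion form are handled.

```python
import re

# GH1_1 extended consensus as quoted in the text.
GH1_1 = "[LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN]"

# One pattern element: a residue, x, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (2) or (2,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a regex string."""
    pieces = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        pieces.append(core)
    return "".join(pieces)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# Hypothetical toy sequence, not one of the proteins analyzed in this work.
toy_sequence = "MKQLYVTENGMADDR"
print(find_motif(toy_sequence, GH1_1))
```

A strict match of this kind requires every position to agree with the pattern, so sequences that keep the E-N-G core but vary in the upstream positions would need the shorter core motif (for example the regular expression `[VI]TENG`) or a per-position comparison instead.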
As a second signature pattern, the conserved region GH1_2 (Figure 8.A) was chosen for our analysis. This region, defined as F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA], is located in the N-terminal region of GH1 β-glucosidases; however, it may be absent from some proteins of this family. The alignment of the Sceobl|9031 and Sceobl|10236 sequences showed that these proteins do not contain this region (Figure 9.A). On the other hand, the protein SceoblEN4|617109 possesses only nine of the fifteen amino acids established as consensus, which is equivalent to 60 %; this domain is therefore considered to be present, with some variation in the amino acid sequence. Similarly, the Sceobl1|35463 and Scequ2611|9833 proteins match only 46 and 40 % of the consensus sequence, respectively. Scesp1|1644545 possesses only the last four amino acids of the consensus sequence, while the other eleven do not correspond to the established consensus. The sequence of KAF8060308.1 BGLU11 shows some particular characteristics in this region. Although it shares 86 % sequence identity with the established consensus sequence, it presents an insertion of several amino acids (34 residues) toward the downstream end of the consensus, before x-E-x-[GSTA]. This protein also contains an additional domain in its N-terminal region, a protein disulfide isomerase domain (cl36828: ER_PDI_fam superfamily 36), previously implicated in protein folding 37.
Interestingly, only five of the proteins analyzed contain the first Phe residue, which is characteristic of the GH1_2 pattern. It is also worth noting that none of the Scenedesmaceae β-glycosidases analyzed in this study presents CBMs or linkers.
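The consensus percentages quoted for GH1_2 (for example, nine of fifteen positions, or 60 %) can be reproduced in principle by comparing an aligned fifteen-residue window against the pattern position by position. The sketch below is one possible way to do this and is not taken from the methods of this work: the function names and the two example windows are hypothetical, and it assumes that an x position counts as a match for any residue, which may differ from the counting convention actually applied.

```python
import re

# GH1_2 consensus as quoted in the text (15 positions once (2) is expanded).
GH1_2 = "F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]"

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?")


def expand_pattern(pattern: str) -> list:
    """Expand the pattern into one entry per position.

    Each entry is the set of allowed residues, or None for x (any residue).
    """
    positions = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        allowed = None if core == "x" else set(core.strip("[]"))
        positions.extend([allowed] * int(repeat or 1))
    return positions


def consensus_hits(window: str, pattern: str = GH1_2) -> tuple[int, int]:
    """Return (matching positions, total positions) for an aligned window.

    Assumption: x positions are counted as matches for any residue.
    """
    positions = expand_pattern(pattern)
    if len(window) != len(positions):
        raise ValueError("window length must equal the number of pattern positions")
    hits = sum(
        1
        for residue, allowed in zip(window, positions)
        if allowed is None or residue in allowed
    )
    return hits, len(positions)


# Hypothetical windows, not taken from the sequences analyzed in this work.
for window in ("FKFGAATSAFQVEGA", "LKKPAPTPPFQVEGA"):
    hits, total = consensus_hits(window)
    print(f"{window}: {hits}/{total} = {100 * hits / total:.0f} %")
```

Because the window must already be aligned to the pattern, a sequence carrying an insertion inside the region, such as the one described for KAF8060308.1 BGLU11, would have to be scored on the alignment columns rather than on a contiguous stretch of residues.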
On the other hand, S. quadricauda LWG002611 has a hypothetical GH1 β-glucosidase protein, named Scequ2611|3544, that shows high similarity to other Scenedesmus endoglucanases (86 % coverage and 29.45 % identity with KAF8059426.1) but lacks the region containing the catalytic Glu residue (Figure 9.B).
The phylogenetic tree shows four β-glucosidase subgroups: (i) the plant GH1 subgroup, (ii) the GH1 from algae, (iii) the GH1 from fungi, and (iv) the GH1 from bacteria (Figure 10). The Scesp1|1509300, SceobDOE|17466, SceoblEN4|575894, and SceobDOE|32074 proteins are grouped in a branch close to the bacterial enzyme, which suggests the possible acquisition of these genes by horizontal transfer. On the other hand, the Scequ2611|9833 protein was found within a large group of GH1 enzymes from algae. This result supports the correct inference of its sequence and suggests that the genes coding for the proteins found on nearby branches are orthologous.
Each branch of the group of proteins from algae contains at least one representative from each species of a given genus. This result suggests that the β-glucosidases found on the same branch could fulfill the same function in the different species, and that this function would not be redundant within the genus.
The homology model of the Sceob1|9434 β-glucosidase, constructed with RaptorX Contact Prediction, is shown in Figure 11. Its superposition with the crystal structure of the Nannochloropsis oceanica BGLN1 β-glucosidase (PDB code: 5YJ7) shows positional conservation of the catalytic amino acids in the central region. The model also presents an overall TIM-barrel structure and conserved ENG residues, which suggests a reliable functional assignment for this new protein in