3. Results and discussion
3.1 Sequence-based analysis of putative GH57 GBE sequences
Using the keywords “DUF1957 domain-containing protein” or “Glycoside hydrolase family 57 protein”, 2,497 amino acid sequences were retrieved from the NCBI database. These sequences varied in length between 418 and 1,184 amino acids. All but 50 sequences had the nucleophile and the acid/base catalyst and contained the five conserved sequence regions (CSRs) typical for GH57 members (Fig. 1). The 50 exceptions lacked one or both catalytic residues and showed large variation in four of the five conserved sequence regions. These sequences were excluded from further analysis, as they were assumed to be inactive. The first four conserved sequence regions are located within the A-domain, which contains the (β/α)7 barrel. Conserved region 5 is located on the second α-helix of the C-domain.
When the sequence logo for all the GH57 GBEs of this study is compared with the logos recently published for 1,602 GH57 sequences [25], two GBE-specific fingerprints become clear. The first is a quintet of amino acids, HxHLP, with x being A, S or T, found in CSRI of almost all GBE sequences; in a small number of GBE sequences the L at position 4 is replaced by an I or M. In all other GH57 enzymes a Q is present instead of an L at position 4, whereas in α-galactosidases and -related proteins there is an L at position 4, but this is followed by a Q or A/M and not by a P, as is the case for GH57 GBEs. The second GBE fingerprint is the sextet ELF(Y)GHW present in CSRIV. The first position of this fingerprint, the E, is conserved among all proteins assigned to a functional GH57 enzyme subfamily. This E is not conserved in the proteins that are categorized as -like proteins. These proteins lack one or both of the catalytic residues and are very likely not active. In the 4-α-glucanotransferase of Thermococcus litoralis the E is only 5.1 Å from the acid/base catalyst D (Fig. 2A) and is involved in binding the -1 subsite residue through a water molecule [31]. In the other GH57 crystal structures, the conserved E is 4 Å (T. maritima AmyC) to 7 Å (T. thermophilus GBE) from the acid/base catalyst (Fig. 2B and 2C). In GH13 enzymes, a catalytic triad consisting of a catalytic nucleophile D, a general acid/base catalyst E and a transition state stabilizer D plays a key role in catalysis [32-34]. As the CSRIV E is completely conserved in all GH57 proteins assigned to a functional subfamily and is positioned close to the acid/base catalyst in all available GH57 crystal structures, it is reasonable to assume that this E plays a role similar to that of the transition state stabilizer D in GH13.
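The two fingerprints described above can be written as simple patterns for screening candidate sequences. The following is a minimal sketch only, not the procedure used in this study: the pattern strings are derived from the fingerprint descriptions in the text, while the function name `find_fingerprints`, the dictionary `PATTERNS` and the toy sequences are hypothetical. A pattern hit alone does not establish that the match lies within CSRI or CSRIV; that would require an alignment.

```python
import re

# Assumed encodings of the fingerprints described in the text.
PATTERNS = {
    # CSRI quintet HxHLP: x is A, S or T; the L is occasionally I or M.
    "CSRI": re.compile(r"H[AST]H[LIM]P"),
    # CSRIV sextet ELF(Y)GHW: the third position is F or, rarely, Y.
    "CSRIV": re.compile(r"EL[FY]GHW"),
}


def find_fingerprints(seq):
    """Return the 0-based start of the first match of each fingerprint,
    or None when the fingerprint is not found in the sequence."""
    seq = seq.upper()
    hits = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(seq)
        hits[name] = match.start() if match else None
    return hits


# Toy sequences (invented for illustration, not real GH57 proteins).
gbe_like = "MKHAHLPGGGELFGHWAA"
other_gh57_like = "MKHAHQPGGGELYGHW"  # Q instead of L at position 4 of the quintet

print(find_fingerprints(gbe_like))
print(find_fingerprints(other_gh57_like))
```

In this sketch a sequence carrying Q at position 4 of the quintet, as reported for the other GH57 enzymes, does not match the CSRI pattern.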
The other positions of the sextet are completely conserved in all GBE proteins analyzed in this study, with the exception of the third position, the F, which in the K. pacifica GBE is replaced by a Y, another amino acid with a hydrophobic side chain. Although it was previously reported that the C at position 16 (CSRIII) is conserved among GH57 GBEs [6], this position is not absolutely invariant: 72 of the 2,497 sequences (2.9%) have a different amino acid at this position, a feature also noted by Martinovičová and Janeček [25]. The majority of these have an M (56; 2.2%), nine have an S, five an L and two an F. This almost fully conserved C can still be regarded as a fingerprint, as none of the other GH57 enzymes and -like proteins have a C at this position.
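The substitution counts at CSRIII position 16 can be checked with a short tally. This is a sketch of the arithmetic only; the counts are those given in the text, the denominator is assumed to be the full set of 2,497 sequences as stated there, and the names `TOTAL`, `substitutions` and `pct` are hypothetical.

```python
from collections import Counter

TOTAL = 2497  # number of sequences, as used for the percentages in the text

# Residues found instead of C at position 16 of CSRIII (counts from the text).
substitutions = Counter({"M": 56, "S": 9, "L": 5, "F": 2})


def pct(count, total=TOTAL):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


non_cys = sum(substitutions.values())
cys = TOTAL - non_cys

print(f"non-C: {non_cys} ({pct(non_cys)}%)")
print(f"C:     {cys} ({pct(cys)}%)")
for residue, count in substitutions.most_common():
    print(f"{residue}: {count} ({pct(count)}%)")
```

The four substitution counts sum to the 72 sequences quoted, which leaves 2,425 sequences with a C at this position.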
In addition, five residues in the vicinity of the active site were identified as fully conserved in all sequences: three tryptophans (W274, W404 and W413), one histidine (H146) and one arginine (R265) (T. thermophilus numbering). In T. kodakarensis, these three tryptophans, together with the one at position 28, have been defined as the aromatic gatekeepers [8]. This group of four aromatic gatekeeper tryptophans is highly conserved in all GH57 GBEs, except for W28, which is a threonine in the GBEs of Thermus and Meiothermus species. In the T. thermophilus GBE, W274 is positioned to the side and W404 at the bottom of the positive subsites [6]. Both are involved in substrate binding, by aromatic stacking (W274) and hydrogen bonding (W404), respectively. In the T. maritima GBE, the W274 equivalent (W246) is buried such that aromatic stacking is very unlikely to occur, while the position of the W413 equivalent (W411) is difficult to predict [6]. The roles of H146 and R265 are not clear.
In the P. horikoshii GBE, a tryptophan (W22) at the bottom of the active site groove is involved in substrate recognition. Changing this W into an A resulted in an almost complete loss of activity [19]. This W is also found at the same position in the crystal structures of the T. thermophilus and T. kodakarensis GBEs. In the GH57 GBEs from T. maritima, P. mexicana, P. mobilis and K. pacifica, this W is replaced by D, E or P. Besides the bottom W, four other aromatic amino acids are found in close vicinity of the active sites of the T. kodakarensis, T. thermophilus, P. horikoshii or T. maritima GBEs: F23, F289, W360 and F461 (T. thermophilus numbering). In all the other sequences of this study, three of these four aromatic amino acids are functionally conserved, while F23 is not conserved. Zhang et al. [8] reported another three important amino acids near the active site: H11, S462 and D463 (T. thermophilus numbering). These are conserved at the respective positions in all 2,497 sequences.
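The distinction between fully conserved and functionally conserved positions can be illustrated with a per-column check on a multiple sequence alignment. This is a minimal sketch under stated assumptions: the alignment is given as a list of equal-length strings, "functionally conserved" is taken here to mean that every residue in the column is aromatic (F, W or Y), and the helper names and the toy alignment are hypothetical. Mapping T. thermophilus residue numbers to alignment columns is not covered.

```python
AROMATIC = frozenset("FWY")  # assumed definition of the aromatic group


def column(alignment, idx):
    """Residues found at alignment column idx (0-based), one per sequence."""
    return [seq[idx] for seq in alignment]


def is_strictly_conserved(alignment, idx):
    """True if every sequence has the same residue (and no gap) at idx."""
    residues = set(column(alignment, idx))
    return len(residues) == 1 and "-" not in residues


def is_functionally_conserved(alignment, idx, group=AROMATIC):
    """True if every residue at idx belongs to the given residue group."""
    return all(res in group for res in column(alignment, idx))


# Toy alignment (invented): column 0 is invariant W, column 1 is F or Y,
# column 2 varies between unrelated residues.
toy = ["WFA", "WYA", "WFD"]

for i in range(3):
    print(i, is_strictly_conserved(toy, i), is_functionally_conserved(toy, i))
```

Under these assumptions an F-to-Y exchange counts as functionally conserved, whereas a replacement of W by D, E or P does not.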