Figure 7. The crystal structure of dimeric Af1503 (grey) is
shown in a superposition with the best AlphaFold2 model (green,
monomer). The only noteworthy difference between the prediction and the
crystal structure lies in a loop in the PAS domain, which coordinates
an ion in the crystal structure.
Model accuracy
Although the AlphaFold2 predictions were modeled as monomers that are
not fully compatible with the dimeric state of Af1503, the predicted
models superimpose closely on the final crystal structure (Fig. 7). Of
the five AlphaFold2 models, four are in a conformation that closely
matches the dimeric state, and all of them superimpose with an RMSD
below 2.5 Å over their full length on all chains of the crystal
structure. Consequently, more focused, local superimpositions yield RMSD
values far below 2 Å. In short, the model accuracy is fairly close to
what one would expect for another crystal structure of the same protein.
Only one region deviates from the crystal structure: the
electron density revealed that an elongated loop within the PAS domain
coordinates a metal ion, which has a pronounced impact on
its structure. AlphaFold2 did not predict the presence
and coordination of that ion, yet it predicted this loop
in a conformation that is at least close to the ion-bound state.
AlphaFold2 models aid in crystal structure determination of
the bacterial exo-sialidase Sia24 (CASP: T1089) by molecular
replacement – by SDR and GAC.
From email to the CASP Prediction Center: "Models 1, 2, 3, and 5
worked quite well as an ensemble for molecular replacement, and quite
well on their own. We eventually achieved similar results with an
ensemble of current PDB models, but this one scored much higher in MR
from the beginning."
Steven Rees
Brief description of the target
Sialidase enzymes (or neuraminidases) cleave sialic acid (SA) moieties
found on mucin glycoproteins of the gastrointestinal (GI) tract, and are
utilized by microbial communities for the sequestration of SAs as
metabolic substrates, or (in the case of some pathogenic species) a
means of biofilm formation, surface adhesion, and revealing
toxin-binding sites54,55.
Exo-sialidases, which cleave terminal SAs, are typically classified in
the carbohydrate-active enzymes database (CAZy) as GH family 33 (GH33)
and are the most common sialidases identified54,56,57,
typically utilizing a two-step catalytic mechanism where a conserved Glu
activates a spatially proximal Tyr for nucleophilic attack of the anomeric
C2 of the SA, prompting acid-base catalysis at C2 by an Asp residue58,59.
While most sialidases characterized to date are ambivalent towards the
mammalian SAs Neu5Ac and Neu5Gc (differing only by a hydroxyl group at
the acetamido C5 on the latter), we and others characterized a series of
Neu5Gc-favoring sialidases in both the microbial communities of mice fed
Neu5Gc-enriched diets and a human population during Neu5Gc-enriched
dietary seasons59.
This study identified an upregulation of Sia24, a Neu5Gc-favoring
sialidase likely from Bacteroides acidifaciens with low sequence
homology to published sialidase structures.
Methodology
Sia24 was purified and concentrated to 10-12 mg/mL59, and crystallized in
100 mM Bis-Tris pH 6.5 and 20% polyethylene glycol monomethyl ether
5,000. Crystals in the P4₁ space group typically
diffracted to 2.2-2.6 Å, with a single high-resolution dataset collected
at 2.0 Å. A more detailed description of the protein production and
crystallization is provided in the Supplementary Material, and will be
presented in a future study.
Our initial molecular replacement attempts used cross-species homolog
structures identified by sequence-based searches in the PDB. These
searches focused on using the catalytic domain of exo-sialidase models
derived from the GH33 family, as Sia24 lacks the carbohydrate-binding
motif found in some members. Various identified catalytic domain search
models (from PDB accession codes 1DIL, 1EUR, 1WCQ, 2VK5, 4FJ6, 4J9T,
4BBW, 4Q6K, and 5TSP) initially failed to find a reasonable phasing
solution by molecular replacement regardless of model modification
(e.g., poly-alanine, CCP4’s Chainsaw-mediated side-chain pruning and
mutagenesis, and removal of flexible loop regions outside of canonical
beta-propeller domain secondary structure). Ab initio models
generated by Robetta (https://robetta.bakerlab.org/) and I-TASSER60-62 did not yield a
solution by molecular replacement. Phyre263 offered reasonable
solutions (TFZ=14.1, LLG=194), as did using PHENIX.ENSEMBLER to generate
an ensemble of the nine models mentioned above (TFZ=16.9, LLG=256). Both
of these approaches struggled during subsequent refinement and manual
building steps, and the latter ensemble models lacked much of the Sia24
sequence because of low homology. Concurrently, we tried models of Sia24
generated by the AlphaFold2 team and provided by the CASP14 organizers.
Four of their five coordinate models were quickly successful in initial
phase estimation by molecular replacement after removal of flexible N-
and C-terminal regions, with model 2 (T1089TS427_2) showing the highest
performance (TFZ=62.5, LLG=3791).
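The terminal trimming applied to the search models before molecular replacement can be illustrated with a minimal sketch. It assumes a model in standard fixed-column PDB format; the function name `trim_termini` and the residue range in the usage comment are hypothetical, since the exact residues removed are not stated here.

```python
def trim_termini(pdb_text, first_keep, last_keep):
    """Drop ATOM/HETATM records outside the residue range [first_keep, last_keep].

    Residue numbers are read from PDB columns 23-26 (string slice 22:26).
    All other records (e.g. CRYST1, TER, END) are passed through unchanged.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            resseq = int(line[22:26])
            if not first_keep <= resseq <= last_keep:
                continue  # residue belongs to a trimmed terminus
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical usage: file names and the range 25-380 are placeholders.
# with open("model_2.pdb") as fh:
#     trimmed = trim_termini(fh.read(), 25, 380)
# with open("model_2_trimmed.pdb", "w") as fh:
#     fh.write(trimmed)
```

In practice the cut points would be chosen from the model itself, for example from per-residue confidence estimates or from visual inspection of extended, unpacked termini.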
Model accuracy
AlphaFold2’s model has high coordinate similarity (RMSD=1.08 Å on all
atoms, 0.55 Å after 5 cycles of outlier rejection) to the
crystallographic structure (PDB code 7MHU), and displays the
beta-propeller structure of the canonical exo-sialidase catalytic domain
(Fig. 8). Most side-chains are also reasonably oriented. The largest
deviations in the models were localized to the N- and C-termini and
regions between anti-parallel beta strand propeller motifs. The
low-homology model ensemble described above has a similar consistency,
albeit lacking most information on side-chain and flexible loop
placement. Similarly, models from other ab initio methods display
reasonable overlap, but were not successful in initial molecular
replacement attempts.
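The two RMSD values quoted above (all atoms versus after cycles of outlier rejection) can be made concrete with a short sketch of least-squares superposition with iterative rejection. This is an illustration under stated assumptions, not the procedure used for the reported numbers: it takes two already-paired coordinate arrays, uses the Kabsch algorithm for the fit, and rejects pairs deviating by more than `cutoff` times the current RMSD. The function names and the default `cutoff=2.0` are assumptions; only the five cycles come from the text.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Return rotation R and translation t that best map mobile onto target."""
    mob_c = mobile.mean(axis=0)
    tgt_c = target.mean(axis=0)
    H = (mobile - mob_c).T @ (target - tgt_c)  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ mob_c
    return R, t


def rmsd_with_rejection(mobile, target, cycles=5, cutoff=2.0):
    """Return (RMSD over all pairs, RMSD over retained pairs, pairs retained)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    keep = np.ones(len(mobile), dtype=bool)

    R, t = kabsch_fit(mobile, target)
    dist = np.linalg.norm(mobile @ R.T + t - target, axis=1)
    rmsd_all = float(np.sqrt(np.mean(dist ** 2)))
    rmsd = rmsd_all

    for _ in range(cycles):
        new_keep = keep & (dist <= cutoff * rmsd)
        # Stop if nothing more is rejected or too few pairs remain for a fit.
        if new_keep.sum() == keep.sum() or new_keep.sum() < 3:
            break
        keep = new_keep
        R, t = kabsch_fit(mobile[keep], target[keep])  # refit on the core only
        dist = np.linalg.norm(mobile @ R.T + t - target, axis=1)
        rmsd = float(np.sqrt(np.mean(dist[keep] ** 2)))

    return rmsd_all, rmsd, int(keep.sum())
```

Because the refit uses only the retained pairs, the second value reflects agreement in the well-matched core, while the first remains sensitive to divergent termini and loops.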