Figure 7. The crystal structure of dimeric Af1503 (grey) is shown in a superposition with the best AlphaFold2 model (green, monomer). The only noteworthy difference between the prediction and the crystal structure is found in a loop in the PAS domain, which was found to coordinate an ion in the crystal structure.
Model accuracy
Although the AlphaFold2 predictions were modeled as monomers that are not fully compatible with the dimeric state of Af1503, the predicted models superimpose closely on the final crystal structure (Fig. 7). Of the five AlphaFold2 models, four are in a conformation that closely matches the dimeric state, and all of them superimpose with an RMSD below 2.5 Å over their full length on all chains of the crystal structure. More focused, local superimpositions yield RMSD values far below 2 Å. In short, the model accuracy is fairly close to what one would expect for another crystal structure of the same protein. Only one region deviates from the crystal structure: the electron density revealed that an elongated loop within the PAS domain coordinates a metal ion, which has a pronounced impact on the loop's structure. AlphaFold2 did not predict the presence and coordination of that ion, but it nevertheless predicted this loop in a conformation that is at least close to the ion-bound state.
AlphaFold2 models aid in crystal structure determination of the bacterial exo-sialidase Sia24 (CASP: T1089) by molecular replacement
By SDR and GAC.
From an email to the CASP Prediction Center (Steven Rees): "Models 1, 2, 3, and 5 worked quite well as an ensemble for molecular replacement, and quite well on their own. We eventually achieved similar results with an ensemble of current PDB models, but this one scored much higher in MR from the beginning."
Brief description of the target
Sialidase enzymes (or neuraminidases) cleave sialic acid (SA) moieties found on mucin glycoproteins of the gastrointestinal (GI) tract. Microbial communities use them to sequester SAs as metabolic substrates or, in the case of some pathogenic species, as a means of biofilm formation, surface adhesion, and revealing toxin-binding sites [54,55]. Exo-sialidases, which cleave terminal SAs, are typically classified in the carbohydrate-active enzymes database (CAZy) as GH family 33 (GH33) and are the most common sialidases identified [54,56,57]. They typically use a two-step catalytic mechanism in which a conserved Glu activates a spatially proximal Tyr for nucleophilic attack on C2 of the SA, prompting acid-base catalysis at C2 by an Asp residue [58,59]. While most sialidases characterized to date do not discriminate between the mammalian SAs Neu5Ac and Neu5Gc (which differ only by an additional hydroxyl group on the C5 acetamido group of the latter), we and others characterized a series of Neu5Gc-favoring sialidases in both the microbial communities of mice fed Neu5Gc-enriched diets and a human population during Neu5Gc-enriched dietary seasons [59]. This study identified an upregulation of Sia24, a Neu5Gc-favoring sialidase likely from Bacteroides acidifaciens with low sequence homology to published sialidase structures.
Methodology
Sia24 was purified and concentrated to 10-12 mg/mL [59], and crystallized in 100 mM Bis-Tris pH 6.5 and 20% polyethylene glycol monomethyl ether 5,000. Crystals in the P41 space group typically diffracted to 2.2-2.6 Å, with a single high-resolution dataset collected at 2.0 Å. A more detailed description of the protein production and crystallization is provided in the Supplementary Material, and will be presented in a future study.
Our initial molecular replacement attempts used cross-species homolog structures identified by sequence-based searches in the PDB. These searches focused on the catalytic domain of exo-sialidase models from the GH33 family, as Sia24 lacks the carbohydrate-binding motif found in some members. The catalytic domain search models identified (PDB accession codes 1DIL, 1EUR, 1WCQ, 2VK5, 4FJ6, 4J9T, 4BBW, 4Q6K, and 5TSP) initially failed to yield a reasonable phasing solution by molecular replacement regardless of model modification (e.g., conversion to poly-alanine, side-chain pruning and mutagenesis with CCP4’s Chainsaw, and removal of flexible loop regions outside the secondary structure of the canonical beta-propeller domain). Ab initio models generated by Robetta (https://robetta.bakerlab.org/) and I-TASSER [60-62] did not yield a solution by molecular replacement. Phyre2 [63] offered reasonable solutions (TFZ=14.1, LLG=194), as did using PHENIX.ENSEMBLER to generate an ensemble of the nine models mentioned above (TFZ=16.9, LLG=256). Both of these approaches struggled during subsequent refinement and manual building steps, and the ensemble models lacked much of the Sia24 sequence because of low homology. Concurrently, we tried models of Sia24 generated by the AlphaFold2 team and provided by the CASP14 organizers. Four of their five coordinate models were quickly successful in initial phase estimation by molecular replacement after removal of flexible N- and C-terminal regions, with model 2 (T1089TS427_2) showing the highest performance (TFZ=62.5, LLG=3791).
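The model-preparation step described above (removing flexible N- and C-terminal regions before molecular replacement) can be illustrated with a minimal Python sketch. The function name `trim_termini` and the residue bounds passed to it are hypothetical; the text does not state which residues were removed or which tool was used, so this shows only the general idea for a PDB-format coordinate file.

```python
def trim_termini(pdb_text, first_kept, last_kept):
    """Drop ATOM/HETATM records outside residues [first_kept, last_kept].

    Hypothetical helper: the residue bounds are chosen by the user,
    e.g. after inspecting which terminal residues look flexible.
    All other records (REMARK, TER, END, ...) are passed through.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # PDB format: columns 23-26 hold the residue sequence number.
            resseq = int(line[22:26])
            if not first_kept <= resseq <= last_kept:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

In practice one would read the predicted model from disk, call the function with bounds chosen by inspection, and write the trimmed coordinates to a new file that is then passed to the molecular replacement program as the search model.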
Model accuracy
AlphaFold2’s model has high coordinate similarity (RMSD=1.08 Å on all atoms, 0.55 Å after 5 cycles of outlier rejection) to the crystallographic structure (PDB code 7MHU), and displays the beta-propeller structure of the canonical exo-sialidase catalytic domain (Fig. 8). Most side-chains are also reasonably oriented. The largest deviations in the models are localized to the N- and C-termini and to regions between anti-parallel beta-strand propeller motifs. The low-homology model ensemble described above has a similar consistency, albeit lacking most information on side-chain and flexible loop placement. Similarly, models from other ab initio methods display reasonable overlap, but were not successful in initial molecular replacement attempts.
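To make the two RMSD figures quoted above concrete, the following is a minimal sketch of a least-squares (Kabsch) superposition with iterative outlier rejection, written in Python with NumPy. It assumes the atoms of model and crystal structure are already paired one-to-one. The rejection rule (drop pairs deviating by more than `cutoff` times the current RMSD) and the function names are assumptions made for illustration; the text does not state which program or criterion was used.

```python
import numpy as np

def kabsch_fit(mobile, target):
    """Return (R, t) such that mobile @ R.T + t best matches target."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    H = (mobile - mc).T @ (target - tc)           # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, tc - R @ mc

def fit_rmsd(mobile, target, keep):
    """Fit on the kept pairs; return all per-pair deviations and kept-pair RMSD."""
    R, t = kabsch_fit(mobile[keep], target[keep])
    dev = np.linalg.norm(mobile @ R.T + t - target, axis=1)
    return dev, float(np.sqrt(np.mean(dev[keep] ** 2)))

def rmsd_with_rejection(mobile, target, cycles=5, cutoff=2.0):
    """Return (all-pair RMSD, RMSD after rejection, mask of kept pairs).

    Assumed rule: in each cycle, pairs deviating by more than
    cutoff * current RMSD are dropped and the fit is repeated.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    keep = np.ones(len(mobile), dtype=bool)
    dev, rmsd_all = fit_rmsd(mobile, target, keep)
    rmsd = rmsd_all
    for _ in range(cycles):
        new_keep = keep & (dev <= cutoff * rmsd)
        if new_keep.sum() == keep.sum() or new_keep.sum() < 3:
            break                                  # converged, or too few pairs left
        keep = new_keep
        dev, rmsd = fit_rmsd(mobile, target, keep)
    return rmsd_all, rmsd, keep
```

The first returned value corresponds to an all-atom figure, the second to a figure after rejection. Because the rejected pairs are typically those in flexible termini and loops, the second value describes the agreement of the well-ordered core rather than of the whole chain.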