Figure 4. Inaccuracies in AlphaFold2 models. Cryo-EM-derived structures and AlphaFold2 models of several AR9 nvRNAP subunits are superimposed. Regions where the conformation of the AlphaFold2 model deviates significantly from the cryo-EM-derived structure are indicated with a dashed line, and their boundary residues are labeled. Note that the folds of both the N- and C-terminal domains of gp226 were predicted correctly, but the structure of the interdomain linker and the relative orientation of the two domains were incorrect.
The overall accuracy of AlphaFold2 models on multidomain targets was lower than that on individual domains, albeit still remarkably good (Fig. 4), and the structures of the four multidomain proteins that comprise the β-β′ core of the AR9 nvRNAP were predicted correctly. The model of the gp226 interdomain linker and, as a consequence, the complete model of gp226 were incorrect, although this is hardly surprising given that the interdomain linker does not have a well-defined secondary structure.
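As an illustration of how deviating regions such as those marked in Fig. 4 might be located, the following minimal sketch flags contiguous stretches of residues whose Cα positions differ by more than a cutoff between two already superimposed structures. The function name, the 3.0 Å cutoff, and the minimum run length are assumptions chosen for illustration; they are not the criteria used to prepare the figure.

```python
import numpy as np

def deviating_segments(model_ca, ref_ca, threshold=3.0, min_len=3):
    """Return inclusive, 0-based (start, end) index pairs of contiguous runs
    in which the per-residue C-alpha distance exceeds `threshold` (in Å).

    Both inputs are (N, 3) coordinate arrays for the same N residues and are
    assumed to be superimposed already. `threshold` and `min_len` are
    illustrative defaults, not values taken from the text.
    """
    model_ca = np.asarray(model_ca, dtype=float)
    ref_ca = np.asarray(ref_ca, dtype=float)
    if model_ca.shape != ref_ca.shape:
        raise ValueError("coordinate arrays must have the same shape")

    # Per-residue distance between equivalent C-alpha atoms.
    dist = np.linalg.norm(model_ca - ref_ca, axis=1)
    flagged = dist > threshold

    segments = []
    start = None
    for i, bad in enumerate(flagged):
        if bad and start is None:
            start = i                      # a deviating run begins
        elif not bad and start is not None:
            if i - start >= min_len:       # keep only sufficiently long runs
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(flagged) - start >= min_len:
        segments.append((start, len(flagged) - 1))
    return segments
```

The first and last indices of each returned pair correspond to the boundary residues of a deviating region, once mapped back to the residue numbering of the structure.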
Besides collecting cryo-EM data on the AR9 nvRNAP holoenzyme in complex with the promoter-containing DNA oligonucleotide, we also crystallized this complex and collected X-ray diffraction data to 3.4 Å resolution. The crystals had a solvent content of ~64% and contained one molecule of the complete holoenzyme-DNA complex in the asymmetric unit of the C2 space group. As a final test of the accuracy of the AlphaFold2 models, we examined whether they could serve as search models for solving the phase problem of this dataset by molecular replacement (MR). The models of gp105, gp089, gp270, and gp154 were used as is, without any modification. The gp226 model consisted of two spatially separated globular domains (NTD and CTD) connected by a long linker, so we treated the two domains as independent entities. We then used Phaser [30] to perform an automatic molecular replacement procedure with these six sets of coordinates as search models. The four proteins comprising the β-β′ core of the enzyme (gp105, gp089, gp270, and gp154) were placed correctly, while the placement of both gp226 domains was incorrect. Manual inspection of the map showed that electron density for both domains of gp226 was present, although weak, and that the density of a peripheral domain of gp154 was slightly shifted compared to its location in the AlphaFold2 model. We proceeded with fitting the AlphaFold2 models of both gp226 domains into the density and adjusting the location of the peripheral gp154 domain (all as rigid bodies) using Coot [19]. A subsequent 20-cycle restrained refinement run with Refmac5 [20] brought R-free to 39% and resulted in a much better and cleaner electron density, in which many of the minor model inaccuracies (some of which are shown in Fig. 4) became obvious and could be easily corrected using the long segment refine/morph procedure implemented in Coot.
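For readers less familiar with the R-free statistic quoted above, the sketch below computes the conventional crystallographic R-factor, the sum of absolute differences between observed and scaled calculated structure-factor amplitudes divided by the sum of observed amplitudes, separately for the working reflections and for the test reflections set aside from refinement. The single linear scale factor and the function name are simplifying assumptions; refinement programs such as Refmac5 and Phenix apply more elaborate scaling (for example, bulk-solvent and anisotropic corrections), so this is not their implementation.

```python
import numpy as np

def r_work_and_free(f_obs, f_calc, free_mask):
    """Return (R-work, R-free) for observed and calculated amplitudes.

    f_obs, f_calc : 1-D arrays of structure-factor amplitudes, one entry
                    per reflection, in the same order.
    free_mask     : boolean array, True for reflections in the test set.

    Assumption: one linear scale factor k, derived from the working set
    only, puts f_calc on the scale of f_obs.
    """
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    free = np.asarray(free_mask, dtype=bool)
    work = ~free

    # Scale calculated amplitudes using working reflections only, so the
    # test set stays independent of every fitted parameter.
    k = f_obs[work].sum() / f_calc[work].sum()

    def r_factor(selection):
        diff = np.abs(f_obs[selection] - k * f_calc[selection])
        return diff.sum() / f_obs[selection].sum()

    return r_factor(work), r_factor(free)
```

Because the test reflections are excluded from refinement, R-free is less susceptible to overfitting than R-work, which is why it is the value typically monitored while a model is being corrected.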
Further corrections and refinement of the atomic model with Refmac5 [20] and Phenix [36] improved the density and revealed the presence of the DNA oligonucleotide. Subsequent rounds of refinement and model building made the AlphaFold2-derived structure indistinguishable (within the expected accuracy) from that obtained by an MR procedure that used the complete cryo-EM-derived holoenzyme complex structure as a search model.
In conclusion, we note that the AlphaFold2 team has clearly developed a methodology that accurately predicts the tertiary structure of individual domains, not only for proteins for which deep sequence alignments can be built but even for unique proteins such as AR9 gp226. Furthermore, the structures of multidomain proteins, such as those comprising individual subunits of the β-β′ core of the AR9 nvRNAP enzyme, were also predicted with astounding accuracy. This places the AlphaFold2 team within reach of predicting the quaternary structure of larger complexes, and one can argue that they have already demonstrated this: their models of the individual subunits of the AR9 nvRNAP β-β′ core were accurate enough to be assembled into a complex that closely resembles the experimentally determined structure.
AlphaFold2 helped correct cis and trans proline assignments and the subsequent tracing of 20 amino acid residues in the crystal structure of the baseplate anchor and partner TSP assembly region of TSP4 from Bacteriophage CBA120 (CASP T1070) – by OH, KC, XS, JG, SBL and DCN
From email to the CASP Prediction Center: Unbelievable. They predicted residues 16-75 correctly with an RMS of 1.26 Å. Also, the prediction includes a different assignment of a cis proline (P236) than my original assignment. It turned out that the predicted version is correct because it enables repositioning of a tyrosine residue (Y247) in the right place. The change, together with another adjustment, ultimately results in a 2-residue shift of 20 residues (237-256). Osnat Herzberg
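An RMS deviation of the kind quoted in the email is commonly computed after optimally superimposing the predicted coordinates on the experimental ones. The sketch below uses the Kabsch algorithm for that superposition. It is a generic illustration under the assumption that equivalent atoms (for example, the Cα atoms of residues 16-75) have been paired beforehand, and it is not necessarily the procedure used to obtain the value above.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between paired coordinate sets after optimal rigid superposition.

    P, Q : (N, 3) arrays of equivalent atoms in the same order; P is
           rotated and translated onto Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape:
        raise ValueError("coordinate arrays must have the same shape")

    # Remove the translational component by centering both sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Apply the rotation to the centered P (row vectors) and measure.
    P_rot = Pc @ R.T
    return float(np.sqrt(((P_rot - Qc) ** 2).sum() / len(P)))
```

The reflection guard matters for protein coordinates: without it, the best least-squares fit could be a mirror image, which would understate the true deviation.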