Figure 4. Inaccuracies in AlphaFold2 models. Cryo-EM-derived
structures and AlphaFold2 models of several AR9 nvRNAP subunits are
superimposed and regions where the conformation of the AlphaFold2 model
deviates significantly from the cryo-EM-derived structure are indicated
with a dashed line and their boundary residues are labeled. Note that
the folds of both the N- and C-terminal domains of gp226 were predicted
correctly, but the structure of the interdomain linker and the relative
orientation of the two domains were incorrect.
The overall accuracy of the AlphaFold2 models on multidomain targets was
lower than on individual domains, albeit still remarkably good
(Fig. 4), and the structures of the four multidomain proteins that
comprise the β-β′ core of the AR9 nvRNAP were predicted correctly. The
model of the gp226 interdomain linker and, as a consequence, the
complete model of gp226 were incorrect, although this is hardly
surprising given that the interdomain linker does not have a
well-defined secondary structure.
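Figure 4 marks regions where an AlphaFold2 model departs from the cryo-EM structure after superposition. A minimal sketch of that kind of comparison is a Kabsch least-squares superposition followed by per-residue deviations. It assumes matched Cα coordinates for model and reference are already available as NumPy arrays; the function names, the 3.0 Å cutoff and the 3-residue minimum run length are illustrative assumptions, not criteria stated in the text.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return `mobile` (N x 3) after the least-squares rotation + translation onto `target`."""
    mob_centroid = mobile.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    p = mobile - mob_centroid
    q = target - tgt_centroid
    # SVD of the 3x3 covariance matrix gives the optimal rotation (Kabsch algorithm).
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + tgt_centroid


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two matched N x 3 coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def deviating_segments(model_ca, ref_ca, resnums, cutoff=3.0, min_len=3):
    """Superpose model onto reference and report runs of residues deviating by > cutoff (Å).

    Note: the fit uses all residues, so large outliers slightly bias it; an
    iterative fit that excludes outliers would be more robust.
    """
    fitted = kabsch_superpose(model_ca, ref_ca)
    dev = np.linalg.norm(fitted - ref_ca, axis=1)  # per-residue distance in Å
    segments, start = [], None
    for i, above in enumerate(dev > cutoff):
        if above and start is None:
            start = i  # a deviating run begins
        elif not above and start is not None:
            if i - start >= min_len:  # keep only runs of min_len or more residues
                segments.append((resnums[start], resnums[i - 1]))
            start = None
    if start is not None and len(dev) - start >= min_len:
        segments.append((resnums[start], resnums[-1]))  # run reaching the chain end
    return segments
```

Superposing each domain separately (for example, only the gp226 NTD residues) separates errors in the fold itself from errors in the relative orientation of domains, which is the distinction the caption draws for gp226.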
Besides collecting cryo-EM data on the AR9 nvRNAP holoenzyme in complex
with the promoter-containing DNA oligonucleotide, we crystallized it
separately and collected X-ray diffraction data to 3.4 Å resolution.
The crystals had a solvent content of ~64% and
contained one molecule of the complete holoenzyme-DNA complex in the
asymmetric unit of space group C2. As a final test of the
accuracy of the AlphaFold2 models, we examined whether they could serve
as search models for solving the phase problem of this dataset by
molecular replacement. The models of gp105, gp089, gp270, and gp154 were
used as is, without any modification. The gp226 model consisted of two
spatially separated globular domains (NTD and CTD) connected by a long
linker, so we treated the two domains as independent entities. We then
used Phaser 30 to
perform an automatic molecular replacement procedure with these six sets
of coordinates as search models. The four proteins comprising the β-β′
core of the enzyme (gp105, gp089, gp270, and gp154) were placed
correctly, whereas both gp226 domains were placed incorrectly.
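Treating the two gp226 domains as independent search models amounts to writing each domain's coordinates to its own file and dropping the linker. A minimal sketch in plain Python, assuming a single-chain PDB-format model without insertion codes and reading the residue number from the fixed columns 23-26 of ATOM/HETATM records; the function name is hypothetical, and the actual domain boundaries are not given in the text.

```python
from typing import Dict, Iterable, List, Tuple


def split_domains(
    pdb_lines: Iterable[str],
    domains: Dict[str, Tuple[int, int]],
) -> Dict[str, List[str]]:
    """Split coordinate records into per-domain lists by residue-number range.

    `domains` maps a label (e.g. "NTD") to an inclusive (first, last) residue range.
    Residues outside every range (e.g. the interdomain linker) are dropped.
    """
    out: Dict[str, List[str]] = {name: [] for name in domains}
    for line in pdb_lines:
        # Only keep coordinate records; headers, TER and END are discarded.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        resseq = int(line[22:26])  # residue sequence number, PDB columns 23-26
        for name, (first, last) in domains.items():
            if first <= resseq <= last:
                out[name].append(line)
    # Terminate each domain's coordinate set so it can be written as its own PDB file.
    for records in out.values():
        records.append("END")
    return out
```

Called with the real NTD and CTD boundary residues, this would yield two coordinate sets that could each be supplied to the molecular replacement program as a separate search model.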
Manual inspection of the map showed that electron density for both
gp226 domains was present, although weak, and that the density for
a peripheral domain of gp154 was slightly shifted relative to its
position in the AlphaFold2 model. We then fitted the
AlphaFold2 models of both gp226 domains into the density and adjusted
the position of the peripheral gp154 domain, all as rigid bodies,
using Coot 19. A
subsequent 20-cycle restrained refinement run with Refmac5 20 brought
R-free to 39% and produced a much better and cleaner electron
density map in which many of the minor model inaccuracies (some of which are
shown in Fig. 4) became obvious and could be easily corrected using a
long-segment refine/morph procedure implemented in Coot. Further
corrections and refinement of the atomic model with Refmac5 20 and Phenix 36 improved the density
and revealed the presence of the DNA oligonucleotide. Subsequent rounds
of refinement and model building made the AlphaFold2-derived structure
indistinguishable (within the expected accuracy) from that obtained by
an MR procedure that used the complete cryo-EM-derived holoenzyme
complex structure as a search model.
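The R-free value quoted above measures agreement between observed and model-calculated structure-factor amplitudes on a small test set of reflections excluded from refinement, R = Σ||F_obs| − k|F_calc|| / Σ|F_obs|. A simplified sketch follows, assuming amplitudes are already matched reflection by reflection and using a single least-squares scale factor k fitted on the working set. Refmac5 and Phenix also model bulk solvent and anisotropic scaling, so this will not reproduce their numbers exactly; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def r_work_r_free(
    f_obs: Sequence[float],
    f_calc: Sequence[float],
    is_free: Sequence[bool],
) -> Tuple[float, float]:
    """Return (R_work, R_free) for matched |F_obs| and |F_calc| amplitudes."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]

    # Single overall scale k minimising sum((Fo - k*Fc)^2) over the working set only,
    # so the test reflections play no part in fitting.
    k = sum(fo * fc for fo, fc in work) / sum(fc * fc for _, fc in work)

    def r_factor(pairs):
        return sum(abs(fo - k * fc) for fo, fc in pairs) / sum(fo for fo, _ in pairs)

    return r_factor(work), r_factor(test)
```

Because the test reflections never influence the fit, R-free rises when a model is over-fitted to the working data, which is why it is the figure tracked as the AlphaFold2-derived model was corrected.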
In conclusion, we note that the AlphaFold2 team has clearly developed a
methodology to accurately predict the tertiary structure of individual
domains not only for proteins for which deep sequence alignments could
be built but even for unique proteins, such as AR9 gp226. Furthermore,
the structures of multidomain proteins, such as those comprising
individual subunits of the β-β′ core of the AR9 nvRNAP enzyme, were also
predicted with astounding accuracy. This places the AlphaFold2 team
within reach of predicting the quaternary structure of larger complexes;
one could argue that they have already demonstrated this, since their
predicted models of the individual AR9 nvRNAP β-β′ core subunits could
be assembled into a complex that closely resembles the experimentally
determined structure.
AlphaFold2 helped correct cis and trans proline assignments
and the subsequent tracing of 20 amino acid residues in the crystal
structure of the baseplate anchor and partner TSP assembly region of
TSP4 from Bacteriophage CBA120 (CASP T1070) – by OH, KC, XS, JG, SBL
and DCN
From an email to the CASP Prediction Center: Unbelievable. They
predicted residues 16-75 correctly with an RMS of 1.26 Å. Also, the
prediction includes a different assignment of a cis proline (P236) than
my original assignment. It turned out that the predicted version is
correct because it enables repositioning of a tyrosine residue (Y247) in
the right place. The change, together with another adjustment, ultimately
results in a 2-residue shift of 20 residues (237-256). Osnat Herzberg