Figure 2. Workflow of FoxB structure determination. The
structure was determined by MR-SAD using the AlphaFold2 model and
experimental phases. (A) Anomalous difference map with Se and Fe sites
at 2σ. (B) Overall map of FoxB after refinement (2σ). (C) Superposition
of the final model (green) and AlphaFold2 model (cyan) shows excellent
agreement. Density for heme groups (not present in AlphaFold2 model) is
shown.
Model accuracy
The AlphaFold2 model used for the study (T1058TS427_3) is remarkably
similar to the final
structure17. The
overall RMSD is 1.17 Å for all atoms and 0.973 Å for Cα atoms. Not only
were all transmembrane helices built and registered correctly, but also
the periplasmic domains containing several loops were modelled with high
accuracy. There was no density for the cytoplasmic loop connecting TM
helices 2 and 3 (residues 172-188), and it was therefore omitted from
the final model. Molecular replacement succeeded only with the
AlphaFold2 model and not with server models from the CASP14 experiment
(>30 models tried, many of them with the correct overall fold).
The success of the AlphaFold2 models seems to be due to their
“getting the details right”, which was required for a clear MR
solution. As one example of the accuracy of the AlphaFold2 model, the
His residues coordinating the two heme groups in FoxB were positioned
correctly, although the model did not contain heme groups (as we only
provided the protein sequence to CASP14). This fact, however, also
highlights a current limitation of the AlphaFold2 model: while it
provides an astonishingly good model for the apo protein, it
still lacks the functional groups (two heme groups in the case of FoxB)
that are responsible for the biological function.
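For readers unfamiliar with the RMSD figures quoted in this section, the following is a minimal sketch of how an all-atom and a Cα-only RMSD can be computed. It assumes the two structures have already been superposed and that atoms are paired one-to-one in the same order; the superposition step itself is not shown. The function names and the toy coordinates are hypothetical and are not taken from the FoxB structure or from the software used in the study.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the coordinate sets are already superposed and paired in order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def rmsd_by_atom_name(atoms_a, atoms_b, atom_name=None):
    """RMSD over paired (name, xyz) atoms, optionally restricted to one atom name."""
    if len(atoms_a) != len(atoms_b):
        raise ValueError("atom lists must be of equal length")
    selected_a, selected_b = [], []
    for (name_a, xyz_a), (name_b, xyz_b) in zip(atoms_a, atoms_b):
        if name_a != name_b:
            raise ValueError("atoms are not paired in the same order")
        if atom_name is None or name_a == atom_name:
            selected_a.append(xyz_a)
            selected_b.append(xyz_b)
    return rmsd(selected_a, selected_b)

# Hypothetical toy data: one residue backbone in a predicted and a refined model.
model_atoms = [("N", (0.0, 0.0, 0.0)), ("CA", (1.0, 0.0, 0.0)), ("C", (2.0, 0.0, 0.0))]
final_atoms = [("N", (0.0, 0.0, 2.0)), ("CA", (1.0, 0.0, 1.0)), ("C", (2.0, 0.0, 2.0))]

all_atom_rmsd = rmsd_by_atom_name(model_atoms, final_atoms)
ca_rmsd = rmsd_by_atom_name(model_atoms, final_atoms, atom_name="CA")
print(f"all-atom RMSD: {all_atom_rmsd:.3f} A, CA RMSD: {ca_rmsd:.3f} A")
```

In this toy case the Cα deviation is smaller than the all-atom deviation, which mirrors the usual pattern: the Cα trace is generally better defined than the full set of atoms, which includes side chains.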
The astounding accuracy of AlphaFold2 models of all subunits
of phage AR9 non-virion RNA polymerase (CASP: T1092-T1096) – by AF,
MLS and PGL.
From email to the CASP Prediction Center: We are shocked…
stunned… by the quality of the model. You would not believe how
much effort we have put into getting this structure. Years of
work… Both cryo-EM and crystallography… I mean, this is
really shocking. Petr Leiman
Brief description of the target
A group of large or “jumbo” bacteriophages, with genomes larger than
200 kbp, encode two distinct DNA-dependent RNA polymerases (RNAPs),
allowing these phages to assemble independently from the host RNAP21-24. One of these
phage-encoded RNAPs is packaged into the phage capsid and hence is
called the virion RNAP (vRNAP). Following the attachment to the host
cell, the virus injects the vRNAP together with its DNA into the host
cytoplasm. After injection, the vRNAP transcribes early phage genes,
including those of the second RNAP (the non-virion RNAP, nvRNAP). The
latter transcribes late genes, including those that encode the
vRNAP, which is then packaged into newly assembled phage particles. The
exact mechanism of this temporal and spatial activation/regulation of
transcription is unclear, but it is known that v- and nvRNAPs recognize
different promoters23.
Both v- and nvRNAPs are distantly related to multi-subunit RNAPs
(msRNAPs) of bacteria, eukaryotes, and archaea23. The universally
conserved core of cellular msRNAPs contains five subunits,
α2ββ′ω, and the catalytic cavity is formed by β and β′25. However, neither v-
nor nvRNAPs contain homologs of the α or ω subunits, and their β and β′
subunits are split into two or three separate genes that are located in
different regions of the phage genome. For sequence-specific initiation
of transcription, the phage AR9 nvRNAP core is required to form a
complex with a promoter specificity subunit gene product 226 (gp226)
that shows no sequence similarity to any known bacterial, eukaryotic, or
archaeal transcription initiation factor. In fact, the amino acid
sequence of gp226 was a singleton in the GenBank database at the time of
the CASP14 experiment.
Besides employing a unique transcription factor, the AR9 nvRNAP
possesses a number of other distinct properties. Unlike any known
msRNAP, the AR9 nvRNAP recognizes the promoter in the template strand of
double-stranded DNA and can initiate promoter-specific transcription on
single-stranded DNA26.
Furthermore, as the genomic DNA of bacteriophage AR9 contains
deoxyuridine instead of thymidine21, the AR9 nvRNAP is
critically sensitive to the presence of uracils in two key positions of
its promoter sequence, and promoters with thymines in these positions
are not recognized26.
To understand the novel and unusual mechanism of promoter recognition by
the AR9 nvRNAP, we decided to determine the structure of this enzyme in
various states: in complex with the specificity subunit and without it,
and in DNA template-bound and DNA-free forms. For the template, we used
a short DNA oligonucleotide that contained a promoter recognized by the
AR9 nvRNAP in vivo and in vitro.
How AlphaFold2 models helped solve the structure
The most feature-rich and continuous electron density map of the AR9
nvRNAP was initially obtained by cryo-electron microscopy (cryo-EM)
imaging of the nvRNAP holoenzyme (i.e. containing the specificity
subunit) in complex with the promoter-containing DNA oligonucleotide.
This complex contained five polypeptide chains – the specificity
subunit gp226, the N- and C-terminal parts of the β subunit gp105 and
gp089 (respectively), and the N- and C-terminal parts of the β′ subunit
gp270 and gp154 (respectively) – and the DNA oligonucleotide, the
structure of which will be described elsewhere. The cryo-EM
reconstruction was calculated using cryoSPARC27 and had a resolution
of 3.8 Å.
In parallel, several maps of the AR9 nvRNAP β-β′ core (i.e. without the
specificity subunit) of varying quality and resolutions were obtained
using X-ray crystallography. The dataset that produced the best electron
density also extended to 3.8 Å resolution, although this map was
significantly worse (poorer connectivity and quality of side chain
features) than the cryo-EM map. The phases for this dataset were
obtained by eight-fold non-crystallographic
averaging28,29 of molecular replacement phases30 calculated with the
help of a partial model. The latter was built using a single-wavelength
anomalous dispersion map of a dataset with a smaller unit cell31-33.
According to HHpred analysis at the time34, the most similar
RNAP with a known atomic structure was that of Mycobacterium
tuberculosis (PDB code 5ZX3; ref. 35). The AR9 nvRNAP
gp089, gp270, and gp154 proteins could all be aligned – with a 20-24%
sequence identity and 100% probabilities – to continuous stretches of
the M. tuberculosis RNAP β and β′ subunits. Gp105 was a more
difficult target, with only its C-terminal half being predicted to be
similar to a fragment of the M. tuberculosis RNAP β subunit with
an 80% probability and an E value of 2.3. The structure of gp226, as it
was a unique sequence in the entire GenBank, could not be reliably
predicted by any tool.
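To make the sequence identity figures above concrete, the sketch below shows one common way to compute percent identity from a gapped pairwise alignment: identical residues divided by the number of columns in which both sequences have a residue. This is an illustration under stated assumptions, not the HHpred procedure; other tools use different denominators (for example, the full alignment length or the shorter sequence). The function name and the two short sequences are hypothetical and are not fragments of the AR9 or M. tuberculosis proteins.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must be rows of the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    identical_columns = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        aligned_columns += 1
        if res_a == res_b:
            identical_columns += 1
    if aligned_columns == 0:
        raise ValueError("no aligned (gap-free) columns")
    return 100.0 * identical_columns / aligned_columns

# Hypothetical toy alignment: 6 gap-free columns, 4 of them identical.
query_row = "MKT-AYLV"
template_row = "MRTGA-LI"
identity = percent_identity(query_row, template_row)
print(f"sequence identity: {identity:.1f}%")
```

Identity alone does not indicate how reliable a match is, which is why the probability and E value are reported alongside it in the text.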
Using both the best cryo-EM and X-ray maps of the AR9 nvRNAP and the
structure of the M. tuberculosis RNAP as a chain-tracing guide in
stretches of high sequence similarity, we manually built
~90% of the AR9 nvRNAP structure19. Some peripheral
domains of gp105, gp154, and gp226 and regions for which no homology
models existed were particularly challenging. Fortunately, while we were
working on improving the cryo-EM map and X-ray phases to make the
structure building process for these regions possible, the models of all
five proteins produced by the AlphaFold2 team were made available to us
by the CASP14 organizers. To our amazement, the AlphaFold2 models were
of excellent quality and fit the cryo-EM and X-ray maps nearly perfectly
almost everywhere, including the no-homology regions (Fig. 3). This made
the completion of the structure building process nearly trivial.