3.1. Brief description of the target
As a member of the recently defined Kuttervirus genus, the Escherichia coli O157:H7 bacteriophage CBA120 infects multiple hosts using four tailspike proteins (TSP1-4). Each TSP has a distinct endo-glycosidase activity specific to the lipopolysaccharide of a different bacterial host. The four phage CBA120 TSPs are the best characterized to date and have therefore served as a paradigm for understanding the infection mechanism and host range expansion characteristic of the Kuttervirus genus. All four TSPs assemble into trimers, and their catalytic domains share the same overall fold (trimers of β-helix subunits). Within this common fold, however, different active site architectures confer different endo-glycosidase substrate specificities, which in turn enable the host range expansion of the phage37-40. The four TSPs form a complex, seen in negatively stained electron micrographs as a branched appendage emanating from the phage tail41. The N-terminal 335 amino acids of TSP4 mediate this assembly and anchoring function. The sequence of this region (hereafter termed TSP4-N) constitutes the target submitted for CASP14 structure prediction (target T1070). The crystal structure of TSP4-N was initially determined at 3.2 Å resolution by single-wavelength anomalous dispersion (SAD) at the Se absorption edge, using crystals of selenomethionine (SeMet)-substituted protein. This structure then served as the molecular replacement search model for determining the structure of wild-type TSP4-N from crystals that diffracted to 2.6 Å resolution. Refinement of this crystal form yielded R = 0.206 and Rfree = 0.229.
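For reference, the two refinement statistics quoted above are, assuming the conventional crystallographic definitions (the section does not state how the test set was chosen or what fraction of reflections it contains), computed as follows. Here W is the working set of reflections used in refinement, T is the test set excluded from refinement, and k is a scale factor between observed and calculated amplitudes; a small gap between R and Rfree is generally taken to indicate that the model is not overfitted to the data.

```latex
R = \frac{\sum_{hkl \in W} \bigl|\, |F_{\mathrm{obs}}(hkl)| - k\,|F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl \in W} |F_{\mathrm{obs}}(hkl)|},
\qquad
R_{\mathrm{free}} = \frac{\sum_{hkl \in T} \bigl|\, |F_{\mathrm{obs}}(hkl)| - k\,|F_{\mathrm{calc}}(hkl)| \,\bigr|}
                         {\sum_{hkl \in T} |F_{\mathrm{obs}}(hkl)|}
```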
Like full-length TSP4, TSP4-N assembles into trimers. The structure revealed four domains connected by flexible linkers. The N-terminal 75 amino acids form the domain that anchors TSP4 to the phage tail baseplate (hereafter termed AD). Of these, the first ~50 residues of the three subunits intertwine into a triple β-helix; the chains then disengage, and the ensuing 25 residues form an antiparallel β-prism II, with each subunit contributing a three-stranded antiparallel β-sheet to the trimeric prism (Fig. 5A). This was the most challenging region for structure prediction because it lacks sequence homology to any protein of known structure. Following a short linker, the polypeptide chain folds into three domains (hereafter termed XD1-3) that recruit the partner TSPs. XD1 shows low but clear sequence identity to a domain of gp9 from the phage T4 baseplate (18% identity over 95 of 100 aligned residues), whereas XD2 and XD3 share only remote homology with proteins of known structure, detectable with hidden Markov model (HMM)-based methods. XD1 adopts a mixed β-sandwich fold, while XD2 and XD3 both adopt a jellyroll fold. In the crystal structures, all domains share the same 3-fold axis, regardless of whether the trimer's 3-fold symmetry is crystallographic or non-crystallographic. The XD1 and XD3 subunits form closely packed trimers. In contrast, the XD2 subunits splay apart and do not contact one another, although they remain related by the 3-fold axis. This separation of the XD2 subunits would prevent binding of a trimeric partner TSP and is probably a crystal-packing artifact. Indeed, the crystal structure of a construct lacking XD3 revealed closely packed XD2 subunits, as required for binding a trimeric partner TSP.