Andriy Kryshtafovych

Public Documents 14

Modeling Alternative Conformational States in CASP16

Namita Dube

and 6 more

September 04, 2025

The CASP16 Ensemble Prediction experiment assessed advances in methods for modeling proteins, nucleic acids, and their complexes in multiple conformational states. Targets included systems with experimental structures determined in two or three states, evaluated by direct comparison to experimental coordinates, as well as domain–linker–domain (D–L–D) targets assessed against statistical models from NMR and SAXS data. This paper focuses on the former class of multi-state targets. Ten ensembles were released as community challenges, including ligand-induced conformational changes, protein–DNA complexes, a trimeric protein, a stem-loop RNA, and multiple oligomeric states of a single RNA. For five targets, some groups produced reasonably accurate models of both reference states (best TM-score >0.75). However, with the exception of one protein–ligand complex (T1214), where an apo structure was available as a template, predictors generally failed to capture key structural details distinguishing the states. Overall, accuracy was significantly lower than for single-state targets in other CASP experiments. The most successful approaches generated multiple AlphaFold2 models using enhanced multiple sequence alignments and sampling protocols, followed by model-quality-based selection. While the AlphaFold3 server performed well on several targets, individual groups outperformed it in specific cases. By contrast, predictions for one protein–DNA complex, three RNA targets, and multiple oligomeric RNA states consistently fell short (TM-score <0.75). These results highlight both progress and persistent challenges in multi-state prediction. Despite recent advances, accurate modeling of conformational ensembles, particularly RNA and large multimeric assemblies, remains a critical frontier for structural biology.

Assessment of Pharmaceutical Protein-Ligand Pose and Affinity Predictions in CASP16

Michael Gilson

and 5 more

April 26, 2025

The protein-ligand component of the 16th Critical Assessment of Structure Prediction (CASP16) challenged participants to predict both binding poses and affinities of small molecules to protein targets, with a focus on drug-like compounds from pharmaceutical discovery projects. Thirty research groups submitted predictions for 229 protein-ligand pose targets and 140 affinity targets across five protein systems. Template-based pose-prediction methods did particularly well, with the best groups achieving mean LDDT-PLI values of 0.69 (scale of 0-1 with 1 best). For comparison, we also ran a set of automated baseline pose-prediction methods, including ones using deep neural networks. Of these, AlphaFold 3 did particularly well, with a mean LDDT-PLI of 0.8, thus outscoring the best CASP16 predictor. The CASP affinity predictions showed modest correlation with experimental data (maximum Kendall’s τ = 0.42), well below the theoretical maximum possible given experimental uncertainty. As seen in prior challenges, providing experimental structures did not improve affinity predictions in the second stage of the challenge, suggesting that the scoring functions used here are a key limiting factor. Overall, the accuracy achieved by CASP participants is similar to that observed in the prior Drug Design Data Resource (D3R) blinded prediction challenges. The present results highlight the progress and persistent challenges in computational protein-ligand modeling and provide valuable benchmarks for the field of computer-aided drug design.

Updates to the CASP infrastructure in 2024

Andriy Kryshtafovych

and 6 more

May 05, 2025

CASP (Critical Assessment of Structure Prediction) conducts community experiments to determine the state of the art in calculating macromolecular structures. The CASP data management system is continually evolving to address the changing needs of the experiments. For CASP16, we expanded the infrastructure to enable data handling of newly introduced categories and fully support pilot categories introduced in CASP15. This technical note also documents integration of the CASP and CAPRI (Critical Assessment of PRedicted Interactions) systems.

Protein target highlights in CASP16: insights from the structure providers

Leila T. Alexander

and 34 more

May 05, 2025

This article presents an in-depth analysis of selected CASP16 targets, with a focus on their biological and functional significance. The authors highlight the most relevant features of the target proteins and discuss how well these were reproduced in the submitted predictions. While the overall performance of structure prediction methods remains impressive, challenges persist, particularly in modeling rare structural motifs, flexible regions, small molecule interactions, post-translational modifications, and biologically important interfaces. Addressing these limitations can strengthen the role of structure prediction in complementing experimental efforts and advancing both basic research and biomedical applications.

CASP15 cryoEM protein and RNA targets: refinement and analysis using experimental map...

Thomas Mulvaney

and 7 more

June 22, 2023

CASP assessments primarily rely on comparing predicted coordinates with experimental reference structures. However, errors in the reference structures can potentially reduce the accuracy of the assessment. This issue is particularly prominent in cryoEM-determined structures, and therefore, in the assessment of CASP15 cryoEM targets, we directly utilized density maps to evaluate the predictions. A method for ranking the quality of protein chain predictions based on rigid fitting to experimental density was found to correlate well with the CASP assessment scores. Overall, the evaluation against the density map indicated that the models are of high accuracy although local assessment of predicted side chains in a 1.52 Å resolution map showed that side-chains are sometimes poorly positioned. The top 136 predictions associated with 9 protein target reference structures were selected for refinement, in addition to the top 40 predictions for 11 RNA targets. To this end, we have developed an automated hierarchical refinement pipeline in cryoEM maps. For both proteins and RNA, the refinement of CASP15 predictions resulted in structures that are close to the reference target structure, including some regions with better fit to the density. This refinement was successful despite large conformational changes and secondary structure element movements often being required, suggesting that predictions from CASP-assessed methods could serve as a good starting point for building atomic models in cryoEM maps for both proteins and RNA. Loop modeling continued to pose a challenge for predictors with even short loops failing to be accurately modeled or refined at times. The lack of consensus amongst models suggests that modeling holds the potential for identifying more flexible regions within the structure.

Critical Assessment of Methods of Protein Structure Prediction (CASP) – Round XV

Andriy Kryshtafovych

and 4 more

October 06, 2023

Computing protein structure from amino acid sequence information has been a long-standing grand challenge. CASP (Critical Assessment of Structure Prediction) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every two years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.

Tertiary structure assessment at CASP15

Daniel Rigden

and 7 more

May 24, 2023

The results of tertiary structure assessment at CASP15 are reported. For the first time, recognising the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single chain predictions were assessed together, irrespective of whether a template was available. At CASP15 there was no single stand-out group, with most of the best-scoring groups - led by PEZYFoldings, UM-TBM and Yang Server - employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues: small size, high α-helical content and monomeric structure were other likely aggravating factors. Local divergence between prediction and target correlated with localisation at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, but should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups, including those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic atlas, produced high quality predictions for most targets which are valuable for experimental structure determination, functional analysis and many other tasks across biology.

Breaking the conformational ensemble barrier: Ensemble structure modeling challenges...

Andriy Kryshtafovych

and 5 more

August 14, 2023

For the first time, the 2022 CASP (Critical Assessment of Structure Prediction) community experiment included a section on computing multiple conformations for protein and RNA structures. There was full or partial success in reproducing the ensembles for four of the nine targets, an encouraging result. For protein structures, enhanced sampling with variations of the AlphaFold2 deep learning method was by far the most effective approach. One substantial conformational change caused by a single mutation across a complex interface was accurately reproduced. In two other assembly modeling cases, methods succeeded in sampling conformations near to the experimental ones even though environmental factors were not included in the calculations. An experimentally derived flexibility ensemble allowed a single accurate RNA structure model to be identified. Difficulties included how to handle sparse or low-resolution experimental data and the current lack of effective methods for modeling RNA/protein complexes. However, these and other obstacles appear addressable.

RNA target highlights in CASP15: Evaluation of predicted models by structure provider...

Rachael C. Kretsch

and 15 more

May 23, 2023

To split or not to split: CASP15 targets and their processing into tertiary structure...

Andriy Kryshtafovych

and 1 more

March 13, 2023

Processing of CASP15 targets into evaluation units (EUs) and assigning them to evolutionary-based prediction classes is presented in this study. The targets were first split into structural domains based on compactness and similarity to other proteins. Models were then evaluated against these domains and their combinations. The domains were joined into larger EUs if predictors’ performance on the combined units was similar to that on individual domains. Alternatively, if most predictors performed better on the individual domains, then they were retained as EUs. As a result, 112 evaluation units were created from 77 tertiary structure prediction targets. The EUs were assigned to four prediction classes roughly corresponding to target difficulty categories in previous CASPs: TBM (template-based modeling, easy or hard), FM (free modeling), and the TBM/FM overlap category. More than a third of CASP15 EUs were attributed to the historically most challenging FM class, where homology or structural analogy to proteins of known fold cannot be detected.

New prediction categories in CASP15

Andriy Kryshtafovych

and 16 more

May 10, 2023

Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.

Assessment of Prediction Methods for Protein Structures Determined by NMR in CASP14:...

Yuanpeng Janet Huang

and 14 more

July 27, 2021

NMR studies can provide unique information about protein conformations in solution. In CASP14, three reference structures provided by solution NMR methods were available (T1027, T1029, and T1055), as well as a fourth data set of NMR-derived contacts for an integral membrane protein (T1088). For the three targets with NMR-based structures, the best prediction results ranged from very good (GDT_TS = 0.90, for T1055) to poor (GDT_TS = 0.47, for T1029). We explored the basis of these results by comparing all CASP14 prediction models against experimental NMR data. For T1027, the NMR data reveal extensive internal dynamics, presenting a unique challenge for protein structure prediction. The analysis of T1029 motivated exploration of a novel method of “inverse structure determination”, in which an AF2 model was used to guide NMR data analysis. NMR data provided to CASP predictor groups for target T1088, a 238-residue integral membrane porin, was also used to assess several NMR-assisted prediction methods. Most groups involved in this exercise generated similar beta-barrel models, with good agreement with the experimental data. However, as was also observed in CASP13, some pure prediction groups that did not use the NMR data generated structures for T1088 that better fit the NMR data than the models generated using these experimental data. These results demonstrate the remarkable power of modern methods to predict structures of proteins with accuracies rivaling solution NMR structures, and that it is now possible to reliably use prediction models to guide and complement experimental NMR data analysis.