Namita Dube

and 6 more

The CASP16 Ensemble Prediction experiment assessed advances in methods for modeling proteins, nucleic acids, and their complexes in multiple conformational states. Targets included systems with experimental structures determined in two or three states, evaluated by direct comparison to experimental coordinates, as well as domain–linker–domain (D–L–D) targets assessed against statistical models from NMR and SAXS data. This paper focuses on the former class of multi-state targets. Ten ensembles were released as community challenges, including ligand-induced conformational changes, protein–DNA complexes, a trimeric protein, a stem-loop RNA, and multiple oligomeric states of a single RNA. For five targets, some groups produced reasonably accurate models of both reference states (best TM-score >0.75). However, with the exception of one protein–ligand complex (T1214), where an apo structure was available as a template, predictors generally failed to capture key structural details distinguishing the states. Overall, accuracy was significantly lower than for single-state targets in other CASP experiments. The most successful approaches generated multiple AlphaFold2 models using enhanced multiple sequence alignments and sampling protocols, followed by selection based on model quality. While the AlphaFold3 server performed well on several targets, individual groups outperformed it in specific cases. By contrast, predictions for one protein–DNA complex, three RNA targets, and multiple oligomeric RNA states consistently fell short (TM-score <0.75). These results highlight both progress and persistent challenges in multi-state prediction. Despite recent advances, accurate modeling of conformational ensembles, particularly for RNA and large multimeric assemblies, remains a critical frontier for structural biology.

Yuanpeng Janet Huang

and 14 more

NMR studies can provide unique information about protein conformations in solution. In CASP14, three reference structures provided by solution NMR methods were available (T1027, T1029, and T1055), as well as a fourth data set of NMR-derived contacts for an integral membrane protein (T1088). For the three targets with NMR-based structures, the best prediction results ranged from very good (GDT_TS = 0.90, for T1055) to poor (GDT_TS = 0.47, for T1029). We explored the basis of these results by comparing all CASP14 prediction models against experimental NMR data. For T1027, the NMR data reveal extensive internal dynamics, presenting a unique challenge for protein structure prediction. The analysis of T1029 motivated exploration of a novel method of “inverse structure determination”, in which an AF2 model was used to guide NMR data analysis. NMR data provided to CASP predictor groups for target T1088, a 238-residue integral membrane porin, were also used to assess several NMR-assisted prediction methods. Most groups involved in this exercise generated similar beta-barrel models that agreed well with the experimental data. However, as was also observed in CASP13, some pure prediction groups that did not use the NMR data generated structures for T1088 that fit the NMR data better than the models generated using these experimental data. These results demonstrate the remarkable power of modern methods to predict structures of proteins with accuracies rivaling solution NMR structures, and show that it is now possible to reliably use prediction models to guide and complement experimental NMR data analysis.