Navigating the Pre- and Post-AlphaFold Divide: CAPRI 8th evaluation meeting, February 12-14, Grenoble, FR

Alexandre M.J.J. Bonvin (1) and M. F. Lensink (2)

(1) Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, The Netherlands
(2) University of Lille, CNRS UMR8576 UGSF, Unité de Glycobiologie Structurale et Fonctionnelle, F-59000 Lille, France
* Corresponding authors: marc.lensink@univ-lille.fr and a.m.j.j.bonvin@uu.nl

Biomolecular interactions govern a broad spectrum of cellular and intercellular processes. Characterising these interactions at atomic resolution is fundamental for elucidating molecular mechanisms in health and disease and for guiding the development of novel therapeutics. In addition to classical structural biology methods—including X-ray crystallography, NMR, cryo-electron microscopy and tomography—as well as a wide array of complementary experimental techniques, computational approaches for predicting the structures of protein complexes have long played an essential and synergistic role. Over time, these computational methods have advanced to model increasingly large and intricate assemblies, often integrating diverse experimental and bioinformatics data within the framework of integrative structural biology. More recently, the advent of artificial intelligence (AI) has introduced powerful new predictive strategies, further expanding the toolkit of computational structural biology.

Progress in this field has been closely monitored and continues to be catalysed by the Critical Assessment of Predicted Interactions (CAPRI) initiative (capri-docking.org; https://www.ebi.ac.uk/pdbe/complex-pred/capri/), established in 2001. CAPRI provides a community-wide platform for blind assessment of modelling algorithms on experimentally determined protein complexes (called "targets"), made available to the organisers prior to publication. Prediction Rounds are launched as new targets arise and typically last 3–6 weeks.
Registered participants are invited to predict the three-dimensional structure of target assemblies, using either sequence information alone or, when available, the structures of unbound components. CAPRI Rounds also include a Scoring challenge, in which participants must discriminate near-native solutions from ensembles of models gathered from all submissions. Groups may participate as Predictors, Scorers, or both. Comprehensive details on the organisation of CAPRI Rounds, evaluation procedures, and group performance rankings are available at the CAPRI websites.

To date, 59 CAPRI Rounds have been completed, comprising more than 300 targets. Results from Rounds 1–45 were presented at seven assessment meetings between 2002 and 2018. Initially centred on homo- and hetero-protein complexes, CAPRI has progressively expanded its scope to encompass protein–peptide, protein–nucleic acid, and protein–carbohydrate assemblies. Several Rounds have further explored the prediction of binding affinities and the explicit modelling of interfacial water molecules. Since 2014, CAPRI has also conducted joint experiments with the Critical Assessment of Structure Prediction (CASP), fostering deeper integration across modelling methodologies and research communities. While CASP originally focused on the prediction of individual protein structures (http://predictioncenter.org/), it has since broadened its scope to include protein assemblies, protein–nucleic acid, and protein–ligand complexes. Outcomes from joint CAPRI–CASP challenges have been published in dedicated Special Issues of Proteins.1–7

This CAPRI Special Issue reports on the prediction results for 16 targets from Rounds 47–55 (2020–2024). The extended evaluation period reflects both a scarcity of suitable targets and the disruption caused by the COVID-19 pandemic.
During this period, the CAPRI community also contributed to the urgent modelling of SARS-CoV-2 complexes (the COVID effort, Round 51), which is not included in the present analysis. The current evaluation also excludes the CASP-linked Rounds 50 and 54, described elsewhere.8,9

A total of 64 groups participated in these six Rounds, including 61 predictor and 53 scorer groups. Results were presented at the eighth CAPRI evaluation meeting, held jointly with the Integrative Computational Biology Workshop in February 2024 at the Institut de Biologie Structurale (IBS) in Grenoble, France (https://workshops.ill.fr/event/392/; http://www.capri-docking.org/events/). The meeting was organised by Dr. Martin Blackledge (IBS, Grenoble), Dr. Frédéric Cazals (INRIA, Sophia Antipolis), Dr. Sergei Grudinin (Université Grenoble Alpes), Dr. Anne Martel (Institut Laue-Langevin, European Neutron Source), Dr. Sylvain Prévost (Institut Laue-Langevin, European Neutron Source), Dr. Guy Schoehn (IBS, Grenoble), Dr. Mark Tully (ESRF, Grenoble), Dr. Martin Weik (IBS, Grenoble), Dr. Alexandre Bonvin (Utrecht University), and Dr. Shoshana Wodak (VIB-VUB Center for Structural Biology, Brussels).

Marc Lensink and 112 co-authors

We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homo-dimers, 3 homo-trimers, 13 hetero-dimers (3 of them antibody-antigen complexes), and 7 large assemblies. On average, ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21,941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score, which consolidates these measures. Prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their 5 best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets, compared to 8% two years earlier, a remarkable improvement resulting from the wide use of the AlphaFold2 and AlphaFold-Multimer software. Creative use was made of the deep learning inference engines, affording the sampling of a much larger number of models and enriching the multiple sequence alignments with sequences from various sources. Wide use was also made of the AlphaFold confidence metrics to rank models, permitting top-performing groups to exceed the results of the public AlphaFold-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
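The quality binning and weighted ranking described above can be sketched as follows. The DockQ thresholds (acceptable ≥ 0.23, medium ≥ 0.49, high ≥ 0.80) are the published DockQ quality bands; the per-quality weights here are illustrative assumptions, not the exact weighting used in the Round 54 evaluation.

```python
def quality(dockq):
    """Bin a DockQ score into the standard CAPRI quality classes."""
    if dockq >= 0.80:
        return "high"
    elif dockq >= 0.49:
        return "medium"
    elif dockq >= 0.23:
        return "acceptable"
    return "incorrect"

def group_score(dockq_top5, weights=None):
    """Weighted count of acceptable-or-better models among a group's
    5 best submissions. The weights are placeholder values chosen so
    that higher-quality models count more."""
    if weights is None:
        weights = {"high": 3, "medium": 2, "acceptable": 1}
    return sum(weights.get(quality(q), 0) for q in dockq_top5[:5])

# Example: one high, one medium, one acceptable, two incorrect models
print(group_score([0.85, 0.50, 0.30, 0.10, 0.05]))  # → 6
```

Ranking groups by such a score rewards both the number and the quality of near-native models, which is the spirit of the assessment described above.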

Hugo Schweke and 36 co-authors

Reliably scoring and ranking candidate models of protein complexes, and assigning their oligomeric state from the structure of the crystal lattice, represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset of 1677 homodimer protein crystal structures, comprising a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces, previously developed by 13 groups, were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score, generated using the best-performing score from each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with areas under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming the individual scores developed by the different groups. Additionally, AlphaFold2 engines were shown to recall the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the pertinence of our benchmark dataset. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
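The evaluation protocol above can be illustrated with a minimal sketch: a consensus built from several per-group scores (shown here as a plain average; the paper's exact combination rule may differ) and the ROC AUC computed via the rank-sum (Mann-Whitney U) formulation. All names and data below are placeholders, not the benchmark's actual scoring functions.

```python
def consensus(per_group_scores):
    """Toy consensus: average of the per-group scores for one complex
    (higher is assumed to mean more likely physiological)."""
    return sum(per_group_scores) / len(per_group_scores)

def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum formulation.
    labels: 1 = physiological, 0 = non-physiological.
    Ties in the scores are ignored for brevity."""
    pairs = sorted(zip(scores, labels))          # ascending by score
    pos = sum(labels)
    neg = len(labels) - pos
    # 1-based rank sum of the positive (physiological) examples
    rank_sum = sum(i + 1 for i, (_, y) in enumerate(pairs) if y == 1)
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)

# Toy data: a perfectly separating score gives AUC = 1.0
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))  # → 1.0
```

An AUC of 0.5 corresponds to random discrimination and 1.0 to perfect separation, so the 0.93–0.94 values reported above indicate strong discrimination between physiological and non-physiological dimers.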