A. Clay Richard and one co-author

The last few years have seen the rapid proliferation of machine learning (ML)-based binding protein design methods. Although these methods have substantially increased experimental success rates compared with prior approaches, the majority of their predictions still fail when tested experimentally. Computational methods evidently still struggle to distinguish the features of real protein binding interfaces from those of false predictions. To identify the interactions that should occur in protein binding interfaces, short molecular dynamics simulations of 20 antibody-protein complexes were conducted. Intermolecular salt bridges, hydrogen bonds, and hydrophobic interactions were evaluated for their persistence, energy, and stability during the simulations. Only hydrogen bonds in which both residues are stabilized in the bound complex are expected to persist and contribute meaningfully to binding between proteins. In contrast, stabilization was not required for salt bridges and hydrophobic interactions to persist, although those in which both residues are stabilized in the bound complex persist significantly longer and have significantly stronger interaction energies. Using a dataset of 220 real antibody-protein complexes and 8,194 false complexes generated by docking, a random forest classifier was trained and tested on features of the expected persistent interactions and compared with a classifier that used only the complex-level features: interaction energy (IE), buried surface area (BSA), IE/BSA, and shape complementarity. Including the features of the expected persistent interactions reduced the classifier's false positive rate two- to fivefold across a range of true positive classification rates.
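The abstract does not spell out how interaction persistence is measured from a trajectory. One common way to quantify it, shown here only as a minimal sketch, is to flag each simulation frame in which a residue pair satisfies a geometric criterion and then report the fraction of frames in which the interaction is present and its longest uninterrupted stretch. The 4.0 Å distance cutoff and the helper names `interaction_present`, `persistence_fraction`, and `longest_run` are hypothetical illustrations, not the authors' criteria; real hydrogen bond detection, for instance, usually also checks donor-hydrogen-acceptor angles.

```python
from typing import Sequence


def interaction_present(distances: Sequence[float], cutoff: float = 4.0) -> list[bool]:
    """Flag frames where a residue pair lies within `cutoff` angstroms.

    Hypothetical criterion: a single distance cutoff stands in for whatever
    geometric test (distance, angle, contact area) a real analysis would use.
    """
    return [d <= cutoff for d in distances]


def persistence_fraction(present: Sequence[bool]) -> float:
    """Fraction of frames in which the interaction is present."""
    if not present:
        raise ValueError("need at least one frame")
    return sum(present) / len(present)


def longest_run(present: Sequence[bool]) -> int:
    """Length, in frames, of the longest uninterrupted stretch of presence."""
    best = current = 0
    for flag in present:
        # Extend the current stretch, or reset it when the interaction breaks.
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


# Toy per-frame distances for one residue pair (made-up values, not study data).
frames = interaction_present([3.1, 3.4, 4.6, 3.2, 3.3, 3.0, 5.2, 3.9])
print(persistence_fraction(frames), longest_run(frames))
```

With these toy distances the interaction is present in 6 of 8 frames (a fraction of 0.75) and its longest unbroken stretch is 3 frames. Multiplying the run length by the trajectory's frame interval would express persistence as a time.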
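The headline comparison is a false positive rate (FPR) read off at a fixed true positive rate (TPR). The sketch below shows one way to compute that quantity from any classifier's scores, for example the predicted probabilities that a random forest (such as scikit-learn's `RandomForestClassifier.predict_proba`) assigns to real versus docked complexes. The function name `fpr_at_tpr` and the toy scores are hypothetical, and the fold-reduction printed at the end comes from made-up numbers, not from the study's results.

```python
from typing import Sequence


def fpr_at_tpr(scores: Sequence[float], labels: Sequence[int], target_tpr: float) -> float:
    """Lowest FPR reachable by thresholding `scores` while keeping TPR >= target_tpr.

    labels: 1 for a real complex, 0 for a false (docked) complex.
    Higher scores are assumed to mean "more likely real".
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both real and false complexes")

    # Lower the threshold one score at a time; FP counts only grow, so the
    # first threshold that reaches the target TPR gives the minimum FPR.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # A threshold cannot separate tied scores, so evaluate only after a tie block.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / n_pos >= target_tpr:
            return fp / n_neg
    return fp / n_neg  # not reached: the last step always has TPR == 1


# Toy scores for 4 real and 6 false complexes (illustrative values only).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
baseline = [0.9, 0.8, 0.4, 0.3, 0.85, 0.7, 0.6, 0.5, 0.35, 0.1]  # complex-level features only
with_interactions = [0.9, 0.8, 0.6, 0.5, 0.7, 0.4, 0.3, 0.2, 0.55, 0.1]

fpr_base = fpr_at_tpr(baseline, labels, 0.75)
fpr_full = fpr_at_tpr(with_interactions, labels, 0.75)
print(f"FPR at TPR 0.75: {fpr_base:.3f} vs {fpr_full:.3f}; fold reduction {fpr_base / fpr_full:.1f}")
```

Repeating the call over several target TPR values gives the kind of "across a range of true positive rates" comparison the abstract describes. Evaluating on held-out complexes, rather than on the training set, keeps the comparison honest.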