4 Conclusion
With combined prediction of protein-ligand complexes forming the next
frontier for deep learning in computational structural biology, we need
approaches for independent, comprehensive, and blind assessment of
prediction methods to better characterize the advantages and
shortcomings of classical and novel approaches. Two complementary
strategies can be employed for this purpose: weekly continuous
evaluation of structures newly released in the PDB, and the creation of
a representative, diverse dataset for benchmarking.
In this study, we examined three challenges essential for establishing
such systems in an automated and unsupervised manner: determining
whether an experimentally solved PLC can be used as ground truth,
assessing how interesting or difficult a PLC is to predict, and
automating the scoring of predicted PLCs. In the process, we defined
quality criteria for PLC pockets, assessed novelty in the PDB over the
years, and built an automated workflow for PLC prediction and
assessment using newly developed scoring metrics. Ligand preparation is
a known challenge in docking, and throughout our research we faced
obstacles in automating it, in particular in molecule parsing and
protonation.
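As a concrete illustration, the minimal sketch below (assuming RDKit)
separates the two failure points we refer to: parsing/sanitization and
protonation. The function name and error handling are ours, and
explicit-hydrogen addition is only a crude stand-in for pH-dependent
protonation, which typically requires an external tool.

```python
# Minimal sketch (assuming RDKit) of the two ligand-preparation steps
# that most often break in an automated pipeline.
from rdkit import Chem

def prepare_ligand(sdf_path: str) -> Chem.Mol:
    # sanitize=False so parsing and sanitization errors are reported separately
    mol = Chem.MolFromMolFile(sdf_path, sanitize=False, removeHs=False)
    if mol is None:
        raise ValueError(f"parsing failed: {sdf_path}")
    try:
        # valence/aromaticity perception; a frequent failure point for
        # ligands extracted from PDB entries
        Chem.SanitizeMol(mol)
    except Chem.rdchem.MolSanitizeException as exc:
        raise ValueError(f"sanitization failed for {sdf_path}: {exc}")
    # crude stand-in for protonation: add explicit hydrogens with coordinates;
    # a true protonation-state assignment at a given pH needs a dedicated tool
    return Chem.AddHs(mol, addCoords=True)
```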
The PDBBind dataset has frequently been used to train deep-learning-based
docking methods and to evaluate their accuracy. Many deep learning
methods retained 363 PDBBind PLCs released after 2019 as a test set.
However, this selection is not ideal for benchmarking: only half of the
structures meet the quality criteria, making the ground truth of the
remainder unreliable; redundancy removal was not performed; and
diversity was not considered when choosing the PLCs. Consequently,
there is a need for a representative dataset that follows the concepts
presented in this study.