Hist-i-fy: Multiple histidine function prediction based on protein sequences using deep neural network
  • Debashree Bandyopadhyay,
  • Abhishek Jalan,
  • Dibyansu Diptiman,
  • Rishabh Pal,
  • Sachin Dodwani
Debashree Bandyopadhyay
Birla Institute of Technology & Science Pilani - Hyderabad Campus

Corresponding Author: banerjee.debi@hyderabad.bits-pilani.ac.in

Abhishek Jalan
Birla Institute of Technology & Science Pilani - Hyderabad Campus
Dibyansu Diptiman
Birla Institute of Technology & Science Pilani - Hyderabad Campus
Rishabh Pal
Birla Institute of Technology & Science Pilani - Hyderabad Campus
Sachin Dodwani
Birla Institute of Technology & Science Pilani - Hyderabad Campus

Abstract

Histidine (His) is the most reactive amino acid at enzyme active sites, and multiple post-translational modifications (referred to here as functions) are reported for His side chains. High-throughput sequencing techniques produce large numbers of protein sequences without functional annotation at the amino acid level. Experimental characterization of His functions in proteins is laborious and time-consuming, and computational characterization based on protein sequences can complement it. Only a handful of histidine function prediction tools are available, and each of them annotates only a single function. Here we curated a dataset of active histidines with known functions from protein sequences in the UniProt database (sample size n = 1584) and used it to train four machine learning models. The convolutional neural network (CNN) model (“Hist-i-fy”) performed best, with 75% overall accuracy. External validation of Hist-i-fy on phosphorylated histidine data (sample size n = 34) showed 94.1% prediction accuracy. For the first time, we report prediction of multiple His functions from protein sequences using deep neural networks. The inputs to the model are (i) a protein sequence containing His and (ii) the residue number of the His of interest. The model predicts one of eight histidine functions: acetylation, ribosylation, glycosylation, hydroxylation, methylation, oxidation, phosphorylation, and protein splicing. The novelty of this work is that it predicts the largest number of histidine functions at a time with optimal performance. There is scope to improve the model as larger datasets become available. The model is available as a web application ([https://histify.streamlit.app/](https://histify.streamlit.app/)) and as stand-alone code ([https://github.com/dibyansu24-maker/Histify](https://github.com/dibyansu24-maker/Histify)).
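
To make the two model inputs concrete, the sketch below shows one plausible way a protein sequence and a His residue number could be turned into a fixed-size input for a CNN. It is an illustration only, not the Hist-i-fy implementation: the function names `extract_window` and `one_hot`, the flank length of 10 residues, the `X` padding character, and the one-hot encoding are all assumptions that the abstract does not specify. Only the list of eight output classes is taken from the text.

```python
# Hypothetical preprocessing sketch; NOT the authors' code.
# Assumed: window size, padding symbol, and one-hot encoding.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# The eight histidine functions named in the abstract.
HIS_FUNCTIONS = [
    "acetylation", "ribosylation", "glycosylation", "hydroxylation",
    "methylation", "oxidation", "phosphorylation", "protein splicing",
]


def extract_window(sequence, his_position, flank=10, pad="X"):
    """Return the residues around a His, padded to length 2 * flank + 1.

    his_position is 1-based, matching the usual residue numbering.
    """
    seq = sequence.upper()
    if not 1 <= his_position <= len(seq):
        raise ValueError("residue number is outside the sequence")
    center = his_position - 1
    if seq[center] != "H":
        raise ValueError("residue at the given position is not histidine")
    left = seq[max(0, center - flank):center]
    right = seq[center + 1:center + 1 + flank]
    # Pad on either side when the His sits near a terminus.
    return pad * (flank - len(left)) + left + "H" + right + pad * (flank - len(right))


def one_hot(window):
    """Encode a window as a list of 20-element rows; unknown symbols stay all-zero."""
    rows = []
    for aa in window:
        row = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    return rows


if __name__ == "__main__":
    window = extract_window("MKTAYIAKQRHISFVKSHFSRQ", 11, flank=5)
    features = one_hot(window)
    print(window, len(features), len(features[0]))
```

A classifier trained on such windows would emit one score per entry of `HIS_FUNCTIONS`, and the highest-scoring entry would be reported as the predicted function. Checking that the indexed residue is actually `H` guards against off-by-one errors in the residue number.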