Introduction
Mass spectrometry (MS) is an indispensable tool in proteomics. Because of its high-throughput nature, a typical proteomics experiment generates large volumes of mass spectral data, and manual interpretation of these data is time-consuming and cumbersome. Consequently, several software tools, including web applications and standalone programs based on various algorithms, have been developed to annotate mass spectrometric data and thereby simplify data analysis and interpretation [1-12]. Thus far, many programs have been developed and widely used for the well-established bottom-up proteomics (BUP) approach [13, 14]. Software has likewise been developed for top-down proteomics (TDP) (https://www.topdownproteomics.org/resources/software/). For middle-down proteomics (MDP), only a few tools, such as YADA, XDIA, isoScale, and Histone coder (https://middle-down.github.io/Software/), are available, mainly for histone and antibody characterization [15-17]. All of these tools require a protein sequence database as input for identifying proteins. The protein sequences in the database are used to calculate the m/z values of precursor ions and peptide fragment ions. These calculated m/z values are stored in a second database, which is then used to annotate the spectra obtained from tandem mass spectrometry (MS/MS) and ultimately to identify proteolytic peptides and/or proteins. At the end of the database search, therefore, the user sees only the ‘matched hits’ in the output, viz., the agreement between the experimental MS/MS spectra and the relevant database entries. This is how most proteomics software for protein identification typically operates.
In all these cases, the user cannot view the database containing the m/z values of precursor ions and fragment ions before the database search. In other words, the user knows the protein sequence database that he/she supplies as the input file but cannot ‘view’, and is therefore unaware of, the database of precursor and fragment ion m/z values that is generated from those sequences before the search. Thus, the user does not know what happens to the ‘sequence database’ that he/she uploads to the search engine.
Because the choice of an ‘optimal database’ is critical for reliable protein identification from MS/MS data [18], we developed a new standalone software tool, ‘Database Creator for Protein/Peptide Mass Analysis (DC-PPMA)’, in which the user can ‘view’ the database containing the calculated m/z values of precursor ions and fragment ions before the database search. The user is thus aware of the ‘custom’ database of precursor and fragment ion m/z values that he/she will subsequently use for the MS/MS-based search and for further analysis.
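To make the idea of a viewable m/z database concrete, the following is a minimal sketch, not the DC-PPMA implementation, of how precursor and fragment ion m/z values can be calculated from a peptide sequence. It assumes unmodified peptides, standard monoisotopic residue masses, protonated precursors, and singly charged b and y ions; the function names `neutral_mass`, `precursor_mz`, and `fragment_table` are hypothetical and chosen only for illustration.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added once per linear peptide
PROTON = 1.007276   # mass of one charge-carrying proton


def neutral_mass(sequence):
    """Monoisotopic mass of the neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mz(sequence, charge):
    """m/z of the protonated precursor ion [M + zH]z+."""
    return (neutral_mass(sequence) + charge * PROTON) / charge


def fragment_table(sequence):
    """Rows of (ion label, m/z) for singly charged b and y ions."""
    rows = []
    n = len(sequence)
    for i in range(1, n):
        # b ion: first i residues plus one proton
        b = sum(RESIDUE_MASS[aa] for aa in sequence[:i]) + PROTON
        # y ion: last i residues plus water plus one proton
        y = sum(RESIDUE_MASS[aa] for aa in sequence[n - i:]) + WATER + PROTON
        rows.append((f"b{i}", round(b, 4)))
        rows.append((f"y{i}", round(y, 4)))
    return rows


# Illustrative usage: print the rows a user could inspect before any search.
for z in (1, 2, 3):
    print(f"precursor z={z}", round(precursor_mz("PEPTIDE", z), 4))
for label, mz in fragment_table("PEPTIDE"):
    print(label, mz)
```

Printing the table in this way is the point of the sketch: every row that would later be matched against an experimental spectrum can be inspected beforehand.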
In DC-PPMA, the ‘database’ can be created and tailored to the proteomic approach that the user follows. DC-PPMA can also be used to analyse post-translational modifications (PTMs), isoforms, and user-defined (custom/new) modifications of targeted peptides/proteins. Furthermore, DC-PPMA is suited to analysing intact peptides, e.g., natural product polypeptides or synthetic peptides, whose sequences can be entered in an input file. For MDP analysis, two features are included in DC-PPMA: (i) the specialized enzymes used for MDP are provided in a Python dictionary, and (ii) a ‘mass range’ filter is provided for creating databases containing longer proteolytic/truncated peptides. Additionally, TDP analysis can be performed in DC-PPMA by creating a database containing multiply charged ions of intact protein sequences, for which no protease needs to be selected. DC-PPMA is therefore applicable to any proteomic approach, be it MDP, BUP, or TDP. Altogether, DC-PPMA can be used to identify and characterize: (i) sequences derived from transcriptomic data, (ii) targeted proteins of the user’s interest, (iii) peptides of any length, and (iv) custom-modified peptides/proteins. It can thus be used for mass spectral data analysis not only in proteomics but also in peptidomics. The detailed workflow of DC-PPMA, which comprises three modules, is shown in Figure 1.
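The enzyme dictionary, the ‘mass range’ filter, and the protease-free TDP mode described above can be illustrated with a short sketch. This is a sketch under stated assumptions rather than the actual DC-PPMA code: the `ENZYMES` dictionary and its simplified cleavage rules (cutting C-terminal to the listed residues, with no missed cleavages and no proline exception), as well as the names `digest` and `build_database`, are hypothetical.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

# Hypothetical enzyme dictionary: residues after which each enzyme cuts.
# These simplified rules are assumptions, not DC-PPMA's actual entries.
ENZYMES = {
    "trypsin": "KR",
    "lys-c": "K",
    "glu-c": "E",
}


def neutral_mass(sequence):
    """Monoisotopic mass of the neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def digest(sequence, enzyme=None):
    """Cleave C-terminal to the enzyme's residues; no enzyme keeps it intact."""
    if enzyme is None:
        return [sequence]
    cut_after = ENZYMES[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in cut_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def build_database(sequence, enzyme=None,
                   mass_range=(0.0, float("inf")), charges=(1,)):
    """Rows of (peptide, charge, m/z) for peptides inside the mass range."""
    low, high = mass_range
    rows = []
    for peptide in digest(sequence, enzyme):
        mass = neutral_mass(peptide)
        if low <= mass <= high:
            for z in charges:
                rows.append((peptide, z, round((mass + z * PROTON) / z, 4)))
    return rows


# Illustrative usage with a made-up sequence.
protein = "AAKGGRCC"
print(build_database(protein, "trypsin", mass_range=(250.0, 300.0)))  # filtered digest
print(build_database(protein, charges=(1, 2, 3)))  # intact sequence, TDP-style
```

Keeping the cleavage rules in a dictionary means that a new protease can be added by editing one entry, and omitting the enzyme yields the multiply charged ions of the intact sequence.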