Introduction
Mass spectrometry (MS) is an indispensable tool in proteomics. Because
of its high-throughput nature, any typical proteomics experiment
generates large volumes of mass spectral data, which makes manual
interpretation time-consuming and cumbersome. Consequently, several
software tools, including web applications and standalone programs based
on various algorithms, have been developed to annotate mass
spectrometric data and thereby simplify data analysis and
interpretation [1-12]. Thus far, many software programs have been
developed and widely used for the well-established bottom-up proteomics
(BUP) approach [13, 14]. Similarly, software has also been developed
for top-down proteomics (TDP)
(https://www.topdownproteomics.org/resources/software/). For the
middle-down proteomics (MDP) approach, only a few software tools, such
as YADA, XDIA, isoScale, and Histone coder
(https://middle-down.github.io/Software/), are available, mainly for
histone and antibody characterization [15-17]. All of these tools
require a protein sequence database as input for identifying proteins.
The protein sequences in the database are used to calculate the m/z
values of precursor ions and peptide fragment ions. These calculated
m/z values are stored in a second database, which is subsequently used
to annotate the spectra obtained from tandem mass spectrometry (MS/MS)
and, eventually, to identify proteolytic peptides and/or proteins.
Therefore, at the end of the database search
process, the user sees only the ‘matched hits’ in the output, viz., the
agreement between the experimental MS/MS spectra and the relevant
database entries. This is how several proteomics software tools
typically perform protein identification. In all these cases, the user
cannot view the database containing the m/z values of precursor ions
and fragment ions prior to the database search. In other words, the
user knows the protein sequence database supplied as the input file but
cannot ‘view’, and is therefore unaware of, the database of precursor
and fragment ion m/z values generated from those sequences before the
search. Thus, the user does not know what happens to the ‘sequence
database’ uploaded to the search engine.
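To make the hidden intermediate database concrete, the following is a
minimal Python sketch of how precursor and fragment ion m/z values can
be calculated from a peptide sequence. It is an illustration only and
not the implementation of any search engine named above: the function
names (neutral_mass, mz, fragment_ions) are hypothetical, only
unmodified peptides are handled, and only singly charged b and y ions
are generated. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O (Da)
PROTON = 1.007276   # mass of a proton (Da)


def neutral_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz(mass, charge):
    """m/z of the [M + zH]z+ ion for a given neutral mass."""
    return (mass + charge * PROTON) / charge


def fragment_ions(seq):
    """Singly charged b and y ions (hypothetical helper, no modifications)."""
    ions = {}
    n = len(seq)
    for i in range(1, n):
        # b ion: first i residues plus one proton.
        ions[f"b{i}"] = sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
        # y ion: remaining n - i residues plus water plus one proton.
        ions[f"y{n - i}"] = (
            sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON
        )
    return ions


peptide = "PEPTIDE"
mass = neutral_mass(peptide)
print("precursor 1+:", round(mz(mass, 1), 4))
print("precursor 2+:", round(mz(mass, 2), 4))
for name, value in fragment_ions(peptide).items():
    print(name, round(value, 4))
```

A table of this kind, computed for every peptide derived from the
sequence database, is what the user normally never sees.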
Since the choice of an ‘optimal database’ is critical for reliable
protein identification from MS/MS data [18], we developed a new
standalone software tool called ‘Database Creator for Protein/Peptide
Mass Analysis (DC-PPMA)’, in which the user can ‘view’ the database
containing the calculated m/z values of precursor ions and fragment
ions before the database search. The user is therefore aware of the
‘custom’ database of precursor and fragment ion m/z values that will
subsequently be used for the MS/MS-based search and further analysis.
In DC-PPMA, the ‘database’ can be created and tailored according to the
proteomic approach that the user follows. DC-PPMA can be used for
analysing PTMs, isoforms and user-defined (custom/new) modifications of
targeted peptides/proteins. It is also suited for analysing sequences
of intact peptides, e.g., natural product polypeptides or synthetic
peptides, whose sequences can be entered in an input file. For MDP
analysis, two features have been included in DC-PPMA: (i) the
specialized enzymes used for MDP are provided in a Python dictionary
and (ii) a ‘mass range’ filter is provided for creating databases
containing longer proteolytic/truncated peptides.
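The two MDP features described above can be sketched as follows. This
is a hedged illustration rather than DC-PPMA's actual code: the
dictionary name ENZYME_RULES, its entries, and the helper functions are
assumptions made for this example, the cleavage rules are the commonly
cited simplified ones, and missed cleavages are not considered. Each
enzyme is mapped to a zero-width regular expression that marks its
cleavage sites, and the resulting peptides are then kept only if their
neutral mass falls within a user-supplied range.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O (Da)

# Hypothetical enzyme dictionary: name -> zero-width cleavage-site pattern.
ENZYME_RULES = {
    "Trypsin": r"(?<=[KR])(?!P)",  # C-terminal to K/R, not before P
    "Glu-C": r"(?<=E)",            # C-terminal to E (simplified rule)
    "Asp-N": r"(?=D)",             # N-terminal to D
}


def neutral_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def digest(seq, enzyme):
    """Fully cleave seq with the named enzyme (no missed cleavages)."""
    sites = [m.start() for m in re.finditer(ENZYME_RULES[enzyme], seq)]
    # Add both termini and remove duplicates so no empty peptide is produced.
    cuts = sorted({0, len(seq), *sites})
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


def filter_by_mass(peptides, min_mass, max_mass):
    """Keep (peptide, mass) pairs whose neutral mass lies in the range."""
    kept = []
    for pep in peptides:
        mass = neutral_mass(pep)
        if min_mass <= mass <= max_mass:
            kept.append((pep, mass))
    return kept


# Toy sequence and a narrow mass window, chosen only for illustration.
protein = "GAEKLDEPVR"
for pep, mass in filter_by_mass(digest(protein, "Glu-C"), 300.0, 400.0):
    print(pep, round(mass, 4))
```

For a middle-down experiment the mass window would be set much higher,
so that only the longer peptides are written to the database.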
Additionally, TDP analysis can be performed in DC-PPMA by creating a
database containing multiply charged ions of intact protein sequences,
for which no protease needs to be selected. DC-PPMA is therefore
applicable to any proteomic approach, be it MDP, BUP or TDP.
Altogether, DC-PPMA can be utilized for the identification and
characterization of: (i) sequences derived from transcriptomic data,
(ii) targeted proteins of the user’s interest, (iii) peptide(s) of any
length and (iv) custom modified peptides/proteins. It can thus be used
for mass spectral data analysis not only in proteomics but also in
peptidomics. The detailed workflow of DC-PPMA, comprising three
modules, is shown in Figure 1.
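For the TDP case mentioned above, where the database holds multiply
charged ions of an intact protein, the underlying arithmetic is a
charge-state series computed from a single neutral mass. The sketch
below illustrates that calculation only. The function name
charge_state_series and the 10 kDa example mass are hypothetical, and
protonation is assumed to be the only source of charge.

```python
PROTON = 1.007276  # mass of a proton (Da)


def charge_state_series(neutral_mass, z_min, z_max):
    """Return {z: m/z} for the [M + zH]z+ ions with z_min <= z <= z_max."""
    if z_min < 1 or z_max < z_min:
        raise ValueError("charge range must satisfy 1 <= z_min <= z_max")
    return {
        z: (neutral_mass + z * PROTON) / z
        for z in range(z_min, z_max + 1)
    }


# Hypothetical intact protein of 10 kDa, charge states 5+ to 12+.
for z, value in charge_state_series(10000.0, 5, 12).items():
    print(f"{z}+  m/z {value:.4f}")
```

Because no protease is involved, the only inputs are the mass of the
intact sequence and the range of charge states to tabulate.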