Fragterminomics: extracting information on proteolytic processing from
shotgun proteomics data processed by FragPipe
Abstract
State-of-the-art mass spectrometers, combined with modern bioinformatics
algorithms for peptide-to-spectrum matching (PSM) with robust
statistical scoring, allow more variable features (i.e.,
post-translational modifications) to be reliably identified from
(tandem) mass spectrometry data, often without the need for biochemical
enrichment. Semi-specific proteome searches, which enforce theoretical
enzymatic digestion at only the N- or C-terminal end of a peptide, allow
the identification of native protein termini or those arising from
endogenous proteolytic activity (also referred to as ‘neo-N-termini’
analysis or ‘N-terminomics’). Nevertheless, deriving biological meaning
from these search outputs can be challenging in terms of data mining
and analysis.
Thus, we introduce Fragterminomics, a data analysis approach for the (1)
annotation of peptides according to their enzymatic cleavage
specificity, (2) differential abundance and enrichment analysis of
N-terminal sequence patterns, (3) visualization of neo-N-termini
location, and (4) mapping of neo-N-termini to known protein processing
features. We illustrate the use of Fragterminomics by applying it to
tandem mass tag (TMT)-based proteomics data from a mouse model of
polycystic kidney disease, and we assess the use of semi-specific
searches for the biological interpretation of cleavage events and the
variable contribution of proteolytic products to general protein
abundance. The Fragterminomics approach and example data are available
as an R package at https://github.com/MiguelCos/Fragterminomics.
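
As an illustration of what annotating a peptide by its enzymatic
cleavage specificity (step 1) can involve, the following is a minimal
Python sketch. It is not the Fragterminomics R package's
implementation: the function name, the category labels, and the
simplified trypsin rule (cleavage C-terminal to K or R, with no proline
exception) are assumptions made for this example only.

```python
def annotate_specificity(protein: str, peptide: str, cleave_after: str = "KR") -> str:
    """Classify a peptide by how many of its termini match the enzyme rule.

    Hypothetical helper for illustration; labels are not taken from any package.
    Uses the first occurrence of the peptide in the protein sequence.
    """
    start = protein.find(peptide)
    if start == -1:
        raise ValueError("peptide not found in protein sequence")
    end = start + len(peptide)

    # N-terminus is enzyme-consistent if the peptide starts at the protein
    # N-terminus or the residue just before it is a cleavage residue.
    n_ok = start == 0 or protein[start - 1] in cleave_after
    # C-terminus is enzyme-consistent if the peptide ends at the protein
    # C-terminus or its last residue is a cleavage residue.
    c_ok = end == len(protein) or peptide[-1] in cleave_after

    if n_ok and c_ok:
        return "specific"
    if c_ok:
        # Non-enzymatic N-terminus: a candidate neo-N-terminus.
        return "semi_Nterm"
    if n_ok:
        # Non-enzymatic C-terminus.
        return "semi_Cterm"
    return "unspecific"


# Toy sequence, invented for this example.
toy_protein = "MKTAYIAKQRQISFVKSHFSR"
labels = {p: annotate_specificity(toy_protein, p)
          for p in ["QISFVK", "ISFVK", "QISFV", "ISFV"]}
```

In this sketch, a peptide labelled "semi_Nterm" has an enzyme-consistent
C-terminus but an N-terminus that the digestion enzyme does not
explain, which is the kind of peptide a semi-specific search reports as
a candidate product of endogenous proteolysis.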