Precursor deconvolution error estimation: the missing puzzle piece in
false discovery rate in top-down proteomics
Abstract
Top-down proteomics (TDP) directly analyzes intact proteins and thus
provides more comprehensive qualitative and quantitative
proteoform-level information than conventional bottom-up proteomics, which
relies on digested peptides and protein inference. Despite significant
advances in TDP sample preparation, separation, instrumentation, and
data analysis, reliable and reproducible data analysis remains one of
the major bottlenecks in TDP. A key step
for robust data analysis is establishing an objective estimate of the
proteoform-level false discovery rate (FDR) in proteoform
identification. The most widely used FDR estimation scheme is based on
the target-decoy approach (TDA), which has primarily been established
for bottom-up proteomics. We present evidence that TDA-based FDR
estimation may not work at the proteoform level because of an overlooked
factor: erroneous deconvolution of precursor masses, which leads to
incorrect FDR estimates. We argue that the conventional
TDA-based FDR in proteoform identification is in fact a protein-level FDR
rather than a proteoform-level FDR unless the precursor deconvolution
error rate is taken into account. To address this issue, we propose a
formula that corrects the bias in proteoform-level FDR by combining the
TDA-based FDR with the precursor deconvolution error rate.
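
The abstract does not state the correction formula itself, so the following is only an illustrative sketch of how a TDA-based FDR and a precursor deconvolution error rate might be combined. It assumes, purely for illustration, that the two error sources are independent, so that an identification is correct at the proteoform level only if both the database match and the deconvolved precursor mass are correct. The function names and the independence assumption are hypothetical and are not taken from the paper; the decoy-to-target ratio is the commonly used simple TDA estimate.

```python
def tda_fdr(n_decoy: int, n_target: int) -> float:
    """Simple target-decoy FDR estimate: decoy hits divided by target hits."""
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    if n_decoy < 0:
        raise ValueError("n_decoy must be non-negative")
    # Cap at 1.0 so the estimate stays a valid rate.
    return min(1.0, n_decoy / n_target)


def combined_proteoform_fdr(fdr_tda: float, deconv_error_rate: float) -> float:
    """Hypothetical combination, ASSUMING the two error sources are independent.

    An identification counts as correct only if the database match is correct
    (probability 1 - fdr_tda) and the precursor mass was deconvolved correctly
    (probability 1 - deconv_error_rate). This is an illustrative assumption,
    not the formula proposed in the paper.
    """
    for name, value in (("fdr_tda", fdr_tda),
                        ("deconv_error_rate", deconv_error_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return 1.0 - (1.0 - fdr_tda) * (1.0 - deconv_error_rate)
```

Under this assumed model, a 1% TDA-based FDR paired with a hypothetical 5% deconvolution error rate yields a combined rate of about 5.95%, which illustrates why a TDA-only estimate could understate the proteoform-level FDR.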