
Exploration of Machine Learning-Generated Spectral Libraries for Data Independent Acquisition in Complex Ocean Metaproteomic Analyses
Margaret Mars Brisbin
University of South Florida

Corresponding Author: mmarsbrisbin@usf.edu

Matthew Mcilvin
Woods Hole Oceanographic Institution
Damien Wilburn
The Ohio State University
Jaclyn Saunders
University of Georgia
Natalie Cohen
University of Georgia
Maya Bhatia
University of Alberta
Elizabeth Kujawinski
Woods Hole Oceanographic Institution
Brian Searle
The Ohio State University Medical Center
Mak Saito
Woods Hole Oceanographic Institution

Abstract

Ocean metaproteomics provides valuable insights into the structure and function of marine microbial communities. Yet ocean samples are challenging to analyze because their extensive biological diversity produces a very large number of peptides spanning a wide dynamic range. This study characterized the capabilities of data independent acquisition (DIA) for ocean metaproteomic samples. Spectral libraries were constructed from discovered peptides and proteins, using machine learning algorithms to keep false positives out of the libraries. DIA, both with and without gas phase fractionation, outperformed one-dimensional and two-dimensional data dependent acquisition (DDA) analyses. We found that larger spectral libraries built from discovered proteins performed better, regardless of the geographic distance between the sites where library samples and test samples were collected. Moreover, the spectral library containing all unique proteins in the Ocean Protein Portal outperformed smaller libraries generated from individual sampling campaigns. However, a spectral library constructed from all open reading frames in a metagenome was too large to be workable: maintaining a low false discovery rate against such a large database proved difficult, which resulted in few peptide identifications. Given sufficient sequencing depth and validation studies, spectral libraries generated from previously discovered proteins can serve as a community resource and spare the effort of resequencing. The spectral libraries generated in this study are available through the Ocean Protein Portal for this purpose.
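Why a very large search database can depress identifications at a fixed false discovery rate can be illustrated with generic target-decoy counting. The following is a minimal sketch under stated assumptions and is not the authors' pipeline, because the abstract does not specify the FDR procedure or software used. It assumes each match has a single score (higher is better) and a target/decoy label. It estimates FDR at each cutoff as decoys divided by targets and accepts targets whose q-value falls at or below the chosen FDR. The function name `targets_accepted_at_fdr` and all scores are hypothetical toy values, not data from this study.

```python
from typing import Iterable, Tuple


def targets_accepted_at_fdr(psms: Iterable[Tuple[float, bool]], fdr: float = 0.01) -> int:
    """Count target matches accepted at a given FDR via target-decoy counting.

    Hypothetical helper for illustration only.
    psms: (score, is_decoy) pairs; a higher score means a better match.
    At each score cutoff, FDR is estimated as (#decoys at or above cutoff) /
    (#targets at or above cutoff). A target is accepted if its q-value (the
    lowest FDR estimate at its cutoff or any more permissive cutoff) is <= fdr.
    Score ties are not handled specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    n_targets = n_decoys = 0
    accepted = 0  # targets above the most permissive cutoff that passes fdr
    for _, is_decoy in ranked:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        # max(..., 1) avoids division by zero when a decoy ranks first
        if n_decoys / max(n_targets, 1) <= fdr:
            accepted = n_targets
    return accepted


# Toy scores only: the same correct target matches in both cases.
targets = [9.5, 9.0, 8.7, 8.1, 7.9, 7.2, 6.8, 6.0]
small_db = [(s, False) for s in targets] + [(3.1, True), (2.4, True)]
# A larger search space gives random (decoy) matches more chances to score highly.
large_db = small_db + [(8.9, True), (7.5, True)]

print(targets_accepted_at_fdr(small_db, fdr=0.05))  # 8
print(targets_accepted_at_fdr(large_db, fdr=0.05))  # 2
```

In the toy "large database" case, two high-scoring decoys force the accepted cutoff up to the top two matches, even though the correct matches are unchanged. This mirrors, in simplified form, the difficulty described above for the metagenome-wide library.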
16 Nov 2024: Submitted to PROTEOMICS
21 Nov 2024: Submission Checks Completed
21 Nov 2024: Assigned to Editor
22 Nov 2024: Review(s) Completed, Editorial Evaluation Pending
22 Nov 2024: Reviewer(s) Assigned
16 Dec 2024: Editorial Decision: Revise Minor