An approach to integrate metagenomics, metatranscriptomics and
metaproteomics data in public resources
Abstract
The amount of metaproteomics, metagenomics and metatranscriptomics
data available in public resources such as MGnify (for
metagenomics/metatranscriptomics) and the PRIDE database (for
metaproteomics) continues to increase. When these omics techniques are
applied to the same samples, their integration offers new opportunities
to understand the structure (metagenome) and functional expression
(metatranscriptome and metaproteome) of the microbiome. Here, we
describe a pilot study aimed at integrating public multi-meta-omics
datasets from studies based on human gut and marine hatchery samples.
Reference search databases (search DBs) were built from assembled
metagenomic (and metatranscriptomic, where available) sequence data
followed by de novo gene calling, using data both from the same sampling
event and from independent samples. The resulting protein sets were
evaluated for their utility in metaproteomics analysis. In agreement
with previous studies, the highest number of peptide identifications was
generally obtained when using search DBs created from the same samples.
Data integration of the multi-omics results was performed in MGnify. For
that purpose, the MGnify website was extended to enable the
visualisation of the resulting peptide/protein information from three
reanalysed metaproteomics datasets. A workflow
(https://github.com/PRIDE-reanalysis/MetaPUF) has been developed that
allows researchers to perform equivalent data integration using paired
multi-omics datasets. This is the first time that a data integration
approach for multi-omics datasets has been implemented using public
data available in the world-leading MGnify and PRIDE databases.