2.5 | Pre-processing of mass spectrometric data and
statistical analysis
Raw mass spectrometric data from the LC-MS/MS were converted to mzXML
format using the MSConvert tool (Version 3.0) of the open-source
ProteoWizard software
(http://proteowizard.sourceforge.net/).
Peak picking, retention time correction and peak grouping were performed
using the XCMS package in the R statistical programming language. After
annotating the isotopes and adducts using the CAMERA package, the
filtered peak lists were normalized by the mass of the leaves used for
metabolite extraction. The peak lists were imported into the MetaboAnalyst
platform
(https://www.metaboanalyst.ca/),
filtered using the interquartile range (IQR) to remove metabolite features
that do not provide useful information (e.g. metabolites whose
concentrations are close to the background noise, that are constant in
all samples and/or have low repeatability), normalized by the median,
log-transformed and scaled prior to statistical comparison.
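The filtering, normalization, transformation and scaling steps described above can be illustrated with a minimal sketch in Python. It is not the MetaboAnalyst implementation. The text does not state the fraction of features removed by the IQR filter, the base of the logarithm, or the scaling method, so the 25% cut-off, the base-10 logarithm and autoscaling (mean-centering and division by the standard deviation) are assumptions, as are the feature names and intensities.

```python
import math
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_by_iqr(table, drop_fraction):
    """Drop the given fraction of features with the smallest IQR.

    `table` maps feature name -> list of intensities (one per sample).
    """
    ranked = sorted(table, key=lambda name: iqr(table[name]))
    n_drop = int(len(ranked) * drop_fraction)
    dropped = set(ranked[:n_drop])
    return {name: vals for name, vals in table.items() if name not in dropped}


def normalize_by_sample_median(table):
    """Divide every intensity by the median intensity of its sample."""
    names = list(table)
    n_samples = len(table[names[0]])
    medians = [
        statistics.median([table[name][j] for name in names])
        for j in range(n_samples)
    ]
    return {name: [v / medians[j] for j, v in enumerate(table[name])]
            for name in names}


def log_transform(table):
    """Base-10 log transform (assumed; the base is not given in the text)."""
    return {name: [math.log10(v) for v in vals] for name, vals in table.items()}


def autoscale(table):
    """Mean-center each feature and divide by its standard deviation.

    Autoscaling is an assumption; the text only says the data were scaled.
    """
    scaled = {}
    for name, vals in table.items():
        mean = statistics.fmean(vals)
        sd = statistics.stdev(vals)
        # A constant feature has sd == 0; map it to zeros instead of dividing.
        scaled[name] = [(v - mean) / sd if sd > 0 else 0.0 for v in vals]
    return scaled


# Hypothetical peak table: four features measured in four samples.
peaks = {
    "M101T35": [1200.0, 1500.0, 900.0, 2100.0],
    "M203T80": [50.0, 52.0, 51.0, 50.0],  # nearly constant across samples
    "M305T120": [300.0, 2400.0, 350.0, 2600.0],
    "M410T200": [800.0, 760.0, 1900.0, 2000.0],
}

filtered = filter_by_iqr(peaks, drop_fraction=0.25)  # assumed cut-off
processed = autoscale(log_transform(normalize_by_sample_median(filtered)))
```

In this toy table the nearly constant feature has the smallest IQR and is the one removed, which mirrors the stated purpose of the filter.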
Principal component analysis (PCA) was conducted using the normalized
peak areas to assess the overall relationship of the samples in an
unsupervised manner. To identify the top 15 metabolite features whose
accumulation differed significantly among the plant species and/or
treatments, we conducted partial least squares-discriminant analysis
(PLS-DA). We also conducted hierarchical cluster analysis (HCA) to
identify the top 25 significant metabolite features, followed by
Pearson correlation analysis to assess the relatedness of the samples.
To probe and visualize the metabolite-based relationship of the samples,
we constructed a dendrogram using Ward’s clustering algorithm and Pearson
correlation.
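The dendrogram construction can be sketched as follows, under stated assumptions. The sketch takes the distance between two samples to be 1 - r, where r is their Pearson correlation, and merges clusters with Ward's criterion through the Lance-Williams update applied to squared distances (the variant labelled ward.D2 in R's hclust). The text does not state which Ward variant or which distance conversion was used, and the sample names and values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ward_linkage(profiles):
    """Agglomerative clustering with Ward's criterion on 1 - r distances.

    `profiles` maps sample name -> feature vector. Returns the merge
    history as (cluster_a, cluster_b, merge_height) tuples, where each
    cluster is a tuple of sample names.
    """
    clusters = {(name,): 1 for name in profiles}  # cluster -> size
    names = list(clusters)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a[0]], profiles[b[0]])
            dist[frozenset((a, b))] = 1.0 - r  # assumed distance conversion

    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters that is currently closest.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        d_ab = dist.pop(pair)
        na, nb = clusters.pop(a), clusters.pop(b)
        merged = tuple(sorted(a + b))
        for k, nk in clusters.items():
            d_ka = dist.pop(frozenset((k, a)))
            d_kb = dist.pop(frozenset((k, b)))
            # Lance-Williams update for Ward's method on squared distances.
            total = na + nb + nk
            d2 = ((na + nk) * d_ka ** 2
                  + (nb + nk) * d_kb ** 2
                  - nk * d_ab ** 2) / total
            dist[frozenset((k, merged))] = math.sqrt(max(d2, 0.0))
        clusters[merged] = na + nb
        merges.append((a, b, d_ab))
    return merges


# Hypothetical scaled feature profiles for four samples.
samples = {
    "ctrl_1": [1.0, 2.0, 3.0, 4.0],
    "ctrl_2": [1.1, 2.1, 2.9, 4.2],
    "trt_1": [4.0, 3.0, 2.0, 1.0],
    "trt_2": [4.2, 2.8, 2.1, 0.9],
}

history = ward_linkage(samples)
for left, right, height in history:
    print(left, right, round(height, 4))
```

Each entry in the merge history corresponds to one node of the dendrogram, with the merge height giving the position of that node on the distance axis.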