2.5 | Pre-processing of mass spectrometric data and statistical analysis
Raw mass spectrometric data from the LC-MS/MS were converted to mzXML format using the MSConvert tool (Version 3.0) of the open-source ProteoWizard software (http://proteowizard.sourceforge.net/). Peak picking, retention time correction, and peak grouping were performed using the XCMS package in the R statistical programming language. After annotating the isotopes and adducts using the CAMERA package, the filtered peak lists were normalized by the mass of the leaves used for metabolite extraction. The peak lists were then imported into the MetaboAnalyst platform (https://www.metaboanalyst.ca/) and filtered by interquartile range (IQR) to remove metabolite features that do not provide useful information (e.g., metabolites whose concentrations are close to the background noise, that are constant across all samples, and/or that have low repeatability). The remaining features were normalized by the median, log-transformed, and scaled before statistical comparison.
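The filtering and normalization sequence described above can be illustrated with a minimal Python sketch. It is not the MetaboAnalyst implementation: the feature names, intensities, and the `keep_fraction` parameter are hypothetical, a plain log10 transform stands in for the platform's log transformation, and autoscaling (mean-centering and division by the standard deviation) is assumed because the scaling method is not specified here.

```python
import math
import statistics

def iqr(values):
    """Interquartile range (Q3 - Q1) of one feature across samples."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def filter_by_iqr(table, keep_fraction):
    """Keep the fraction of features with the largest IQR (hypothetical cutoff)."""
    ranked = sorted(table, key=lambda f: iqr(table[f]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = set(ranked[:n_keep])
    return {f: v for f, v in table.items() if f in kept}

def normalize_by_median(table):
    """Divide every intensity by the median intensity of its sample."""
    features = list(table)
    n_samples = len(table[features[0]])
    medians = [statistics.median(table[f][j] for f in features)
               for j in range(n_samples)]
    return {f: [table[f][j] / medians[j] for j in range(n_samples)]
            for f in features}

def log_transform(table):
    """Plain log10; assumes strictly positive intensities."""
    return {f: [math.log10(x) for x in v] for f, v in table.items()}

def autoscale(table):
    """Mean-center each feature and divide by its standard deviation (assumed)."""
    scaled = {}
    for f, v in table.items():
        mean = statistics.fmean(v)
        sd = statistics.stdev(v)
        scaled[f] = [(x - mean) / sd for x in v]
    return scaled

# Hypothetical peak table: feature -> intensities in four samples.
peaks = {
    "M101T60": [100.0, 120.0, 400.0, 420.0],
    "M203T95": [50.0, 50.0, 50.0, 50.0],   # constant, so its IQR is zero
    "M305T130": [10.0, 40.0, 20.0, 80.0],
    "M417T210": [200.0, 210.0, 90.0, 100.0],
}

filtered = filter_by_iqr(peaks, keep_fraction=0.75)
processed = autoscale(log_transform(normalize_by_median(filtered)))
```

In this toy table the constant feature has the smallest IQR and is the one removed; each remaining feature ends with zero mean and unit standard deviation across samples.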
Principal component analysis (PCA) was conducted on the normalized peak areas to assess the overall relationship among the samples in an unsupervised manner. To identify the top 15 metabolite features that accumulated significantly differently among the plant species and/or treatments, we conducted partial least squares-discriminant analysis (PLS-DA). We also conducted hierarchical cluster analysis (HCA) to identify the top 25 significant metabolite features, followed by Pearson correlation analysis to assess the relatedness of the samples. To probe and visualize the metabolite-based relationships among the samples, we constructed a dendrogram using Ward’s clustering algorithm and Pearson correlation.
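The dendrogram construction can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's code: the distance between two samples is taken as 1 minus their Pearson correlation, clusters are merged with the Lance-Williams update for Ward's minimum-variance criterion (the variant that squares distances before updating), and the sample names and values are hypothetical. Ward's criterion is derived for Euclidean distances, so its use with a correlation-based distance is heuristic.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two sample profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ward_linkage(profiles):
    """Agglomerate samples; returns a list of (members_a, members_b, height)."""
    names = list(profiles)
    clusters = {i: (name,) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by the pair of ids.
    dist = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(profiles[names[i]], profiles[names[j]])
            dist[frozenset((i, j))] = 1.0 - r
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        i, j = sorted(pair)
        d_ij = dist.pop(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            # Lance-Williams update for Ward's criterion on squared distances.
            new_sq = ((ni + nk) * d_ki ** 2 + (nj + nk) * d_kj ** 2
                      - nk * d_ij ** 2) / (ni + nj + nk)
            dist[frozenset((k, next_id))] = math.sqrt(max(new_sq, 0.0))
        merges.append((clusters[i], clusters[j], d_ij))
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges

# Hypothetical scaled feature values for four samples.
profiles = {
    "speciesA_rep1": [1.0, 2.0, 3.0, 4.0],
    "speciesA_rep2": [1.1, 2.1, 2.9, 4.2],
    "speciesB_rep1": [4.0, 3.0, 2.0, 1.0],
    "speciesB_rep2": [4.2, 2.8, 2.1, 0.9],
}

merges = ward_linkage(profiles)
```

Each entry of `merges` records the two groups joined and the height of the join, which is the information a dendrogram plots. With these toy profiles the replicates of each species are joined first, and the two species are joined last at a much greater height.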