MStractor: R Workflow Package for Enhancing Metabolomics Data Pre-Processing and Visualization
Abstract
1. Introduction
2. Discussion and Results
2.1. General Overview
- Functions developed by the authors to provide the user with GUIs for parameter input and data QC monitoring (in green in Figure 1).
- Wrappers of XCMS and CAMERA functions that are implemented with additional code to automate routine operations and graphical output generation (in blue in Figure 1).
- Native CAMERA and XCMS functions. All the arguments and parameters required by these functions are automatically generated along the workflow (in orange in Figure 1).
2.2. MStractor Performance Evaluation
2.2.1. Data Input
- Project() allows two analytical replicates to be defined that are used in the early steps of the workflow to evaluate peak detection parameters. It also generates a QC directory where all the QC graphical outputs and data tables generated along the intermediate steps are stored.
- DefineClassAttributes() automatically defines symbols, colours, and identifiers for each sample class, so that samples belonging to different classes are labelled with different colours and symbols in the graphical outputs. This makes the generated plots easier to interpret and provides an additional quality check on the loaded files.
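The class-labelling idea can be sketched in a few lines. The following Python snippet is only an illustration of the concept and is not MStractor's R implementation: the function name `define_class_attributes`, the palettes, and the sample list are all hypothetical.

```python
from itertools import cycle

# Hypothetical palettes; the colours and symbols used by MStractor may differ.
COLOURS = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]
SYMBOLS = ["circle", "square", "triangle", "diamond"]


def define_class_attributes(sample_classes):
    """Map each distinct sample class to an identifier, colour, and symbol."""
    attributes = {}
    colours, symbols = cycle(COLOURS), cycle(SYMBOLS)
    for cls in sample_classes:
        if cls not in attributes:  # one entry per class, in first-seen order
            attributes[cls] = {
                "id": len(attributes) + 1,
                "colour": next(colours),
                "symbol": next(symbols),
            }
    return attributes


# Invented example: one class label per loaded sample file.
samples = ["Treatment", "Treatment", "Mix", "Mix", "Treatment"]
class_attributes = define_class_attributes(samples)
```

A class label that appears unexpectedly in the resulting mapping (for example, because of a misspelled folder name) would show up as an extra colour in the plots, which is the quality-control benefit described above.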
2.2.2. Data Processing Parameter Input
2.2.3. Functionalities for Workflow QC
2.2.4. Results
- Condensing peak groups by retaining only the most intense feature for each peak group (FilterDM()). This feature selection assumes that features in an assigned group belong to the same chemical entity. After filtering, the matrix was reduced to 343 peaks. The same data-reduction step was manually performed for comparison purposes on the XCMS Online dataset, which was reduced to 1000 features.
- Performing manual curation via CollectBP_EICs() and BasePks_Curated(). This step was aimed at removing background signals and peaks that were not well resolved (described in Section 2.2.3). During the curation step, 31 features were removed, and the final data table contained 312 features. This further curation step, however, was not available in XCMS Online, since EICs are not generated for all the features. The MStractor and XCMS Online results are summarized in Figure 3.
- Median normalization, which was carried out to minimize possible inter-run instrument variability.
- Calculation of descriptive statistics via the statsByClass() function, which returned separate data tables containing the average, standard deviation, and coefficient of variation (%CV) for each sample class (Treatment and Mix in the present dataset).
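The data-reduction, normalization, and summary steps listed above can be illustrated with a minimal Python sketch. It is not the MStractor code (which is written in R), and it rests on assumptions the text does not spell out: "most intense" is taken as the highest mean intensity across samples, median normalization is taken as dividing each sample by its own median intensity, and the standard deviation is the sample standard deviation. The feature table and all function names are invented.

```python
from statistics import mean, median, stdev

# Hypothetical feature table: a peak-group id and one intensity per sample.
features = [
    {"name": "F1", "group": 1, "intensities": [100.0, 110.0, 220.0, 200.0]},
    {"name": "F2", "group": 1, "intensities": [10.0, 12.0, 20.0, 22.0]},
    {"name": "F3", "group": 2, "intensities": [50.0, 55.0, 100.0, 110.0]},
]
sample_classes = ["Treatment", "Treatment", "Mix", "Mix"]


def keep_most_intense(features):
    """Retain, per peak group, the feature with the highest mean intensity."""
    best = {}
    for f in features:
        current = best.get(f["group"])
        if current is None or mean(f["intensities"]) > mean(current["intensities"]):
            best[f["group"]] = f
    return list(best.values())


def median_normalize(features):
    """Divide each sample's intensities by that sample's median intensity."""
    n_samples = len(features[0]["intensities"])
    medians = [median(f["intensities"][j] for f in features) for j in range(n_samples)]
    return [
        {**f, "intensities": [x / m for x, m in zip(f["intensities"], medians)]}
        for f in features
    ]


def stats_by_class(feature, sample_classes):
    """Mean, sample standard deviation, and %CV of one feature per class."""
    result = {}
    for cls in dict.fromkeys(sample_classes):  # distinct classes, input order
        values = [x for x, c in zip(feature["intensities"], sample_classes) if c == cls]
        avg, sd = mean(values), stdev(values)
        result[cls] = {"mean": avg, "sd": sd, "cv_percent": 100.0 * sd / avg}
    return result


filtered = keep_most_intense(features)      # F2 is dropped in favour of F1
normalized = median_normalize(filtered)     # each sample now has median 1
summary = stats_by_class(normalized[0], sample_classes)
```

Manual curation (the CollectBP_EICs() and BasePks_Curated() step) is deliberately left out of the sketch, since it depends on visual inspection of the EICs.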
2.2.5. Data Analysis and Visualization
- Extracted Ion Chromatograms (EICs) and Box Plots
  EIC visualization and box plots are available in XCMS Online depending on the output of the t-test (displayed in the result table). If the feature result is not significant, the plot is not generated. This is quite limiting, as the user cannot visualize all the metabolites in the biological sample. On the contrary, MStractor does not perform a significance test, but provides extracted ion chromatograms and box plots for every feature. A t-test was not included among MStractor features because it was designed to be able to accommodate pairwise and multigroup comparisons without the need of selecting dedicated workflows, as required by XCMS Online. Extracted ion chromatogram plots are automatically generated at different stages of the workflow and stored in dedicated folders.

  In regard to box plots, MStractor provides more advanced visualization compared to XCMS Online. Using the bpSel() function, a dedicated GUI enables the user to select the classes to be represented in the box plot (an option that is particularly valuable in case of multigroup experiments). Both individual box plots and group box plot (Figure 4) visualizations are saved as .html files. This allows performing immediate visual comparisons of the analyte differences among sample classes. In addition, all the plots are interactive, allowing zooming in, as well as source-data display upon hovering.
- Heat map, Principal Component Analysis and Cloud Plot
  Heat maps generated in MStractor and XCMS Online are very similar, as both provide interactive visualization (Heatmaply package for MStractor [18]). However, the heat map in XCMS Online can only display a limited number of features (first 1000). Thanks to the data-reduction steps, the MStractor user is provided with a heat map for the whole dataset, avoiding a partial visualization. Using the present case study, the heatmap could not be generated in XCMS Online. This, however, could be related to the speed of the internet connection, rather than XCMS Online’s computational power. An example of the heat map produced in MStractor is reported in Figure 5.

  On the other hand, XCMS Online provides PCA and cloud plots. Specifically, interactive cloud plot functionality is useful, since it represents the feature fold change along the retention time domain as bubble plots. A number of interactive settings are also available to filter the results based on intensity, m/z, rt, p-value, and fold change. An example is reported in Figure 6. This type of feature is not available in MStractor.
- Library Search, Putative Identifications
  Library search and putative identifications represent another point of divergence between MStractor and XCMS Online.

  Within XCMS Online, parameters for putative identification are entered via the “Identification” tab, which allows users to define the mass tolerance and select possible adducts.

  Library search is performed via the METLIN database using the feature mass and the corresponding fragmentation pattern. On one hand, this option proves very valuable for datasets containing MS/MS data. On the other hand, when MS1 data are used (such as in the example dataset considered), the confidence of the putative identification performed might drop significantly, as it is based only on the accurate mass of the molecular ion and its isotopic pattern.

  In MStractor, the spectrum of each chemical entity is saved as a list of m/z vs. intensities that are stored as .msp files. This type of file can be uploaded in the NIST software and searched for matches against a given spectral library.

  Using GUIs, StoreRefFeat() and spectraFromScan() functions extract the raw spectra from a selected reference file, nistEntryFromScan() creates individual NIST compatible entries for each compound, while createSearchList() lists all compound spectra with unique identifiers in the .msp file. In this way, the library search can be performed for a single compound of interest or, alternatively, for all the compounds at once.
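To make the .msp output more concrete, the sketch below writes spectra as plain-text entries with a `Name:` line, a `Num Peaks:` line, and one m/z and intensity pair per line, with entries separated by a blank line. This is a minimal Python illustration under stated assumptions, not the output of nistEntryFromScan() or createSearchList(): the function names, feature identifiers, peak values, and number formatting are hypothetical, and the entries MStractor writes may carry additional fields.

```python
def msp_entry(name, peaks):
    """Format one spectrum (list of (m/z, intensity) pairs) as an .msp entry."""
    lines = [f"Name: {name}", f"Num Peaks: {len(peaks)}"]
    lines += [f"{mz:.4f} {intensity:.0f}" for mz, intensity in peaks]
    return "\n".join(lines) + "\n"


def msp_library(spectra):
    """Concatenate entries, separated by a blank line, into one search list."""
    return "\n".join(msp_entry(name, peaks) for name, peaks in spectra.items())


# Invented spectra keyed by a unique feature identifier.
spectra = {
    "Feature_001": [(179.0561, 15200.0), (180.0595, 1010.0)],
    "Feature_002": [(341.1089, 8400.0)],
}
library_text = msp_library(spectra)
```

Writing `msp_entry(...)` for a single identifier corresponds to searching one compound of interest, whereas `library_text` holds all compounds for a batch search.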
3. Materials and Methods
3.1. Samples and Sample Preparation
3.2. HPLC-QToF MS Analytical Platform
3.3. Dataset Preparation
3.4. Hardware and Software
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Johnson, C.H.; Ivanisevic, J.; Benton, P.H.; Siuzdak, G. Bioinformatics: The Next Frontier of Metabolomics. Anal. Chem. 2015, 87, 147–156.
- Smith, R.; Mathis, A.D.; Ventura, D.; Prince, J.T. Proteomics, lipidomics, metabolomics: A mass spectrometry tutorial from a computer scientist’s point of view. BMC Bioinform. 2014, 15, 1–14.
- Katajamaa, M.; Oresic, M. Data processing for mass spectrometry-based metabolomics. J. Chromatogr. A 2007, 1158, 318–328.
- Wan, E.; Masson, P. Processing and analysis of GC/LC-MS-based metabolomics data. Methods Mol. Biol. 2011, 708, 277–298.
- Sugimoto, M.; Kawakami, M.; Robert, M.; Soga, T.; Tomita, M. Bioinformatics Tools for Mass Spectroscopy-Based Metabolomics Data Processing and analysis. Curr. Bioinform. 2012, 7, 96–108.
- Johnsen, L.G.; Skou, P.B.; Khakimov, B.; Bro, R. Gas chromatography-mass spec data processing made easy. J. Chromatogr. A 2017, 1503, 57–64.
- Cambiaghi, A.; Ferrario, M.; Masseroli, M. Analysis of metabolomic data: Tools, current strategies and future challenges for omics data integration. Brief. Bioinform. 2016, 18, 498–510.
- Stanstrup, J.; Broeckling, C.; Helmus, R.; Hoffmann, N.; Mathe, E.; Naake, T.; Nicolotti, L.; Peters, K.; Rainer, J.; Salek, R.; et al. The metaRbolomics Toolbox in Bioconductor and beyond. Metabolites 2019, 9, 200.
- Pang, Z.; Chong, J.; Li, S.; Xia, J. MetaboAnalystR 3.0: Toward an Optimized Workflow for Global Metabolomics. Metabolites 2020, 10, 186.
- Huang, S.M.; Toh, W.; Benke, P.I. MetaboNexus: An interactive platform for integrated metabolomics analysis. Metabolomics 2014, 10, 1084–1093.
- Smith, C.A.; Want, E.J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification. Anal. Chem. 2006, 78, 779–787.
- Pluskal, T.; Castillo, S.; Villar-Briones, A. MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinform. 2010, 11, 395–405.
- Clasquin, M.F.; Melamud, E.; Rabinowitz, J.D. LC-MS Data Processing with MAVEN: A Metabolomic Analysis and Visualization Engine. Curr. Protoc. Bioinform. 2012, 37, 1–23.
- Tautenhahn, R.; Boettcher, C.; Neumann, S. Highly sensitive feature detection for high resolution LC/MS. BMC Bioinform. 2008, 9, 504–519.
- Tautenhahn, R.; Patti, G.J.; Rinehart, D.; Siuzdak, G. XCMS Online: A web-based platform to process untargeted metabolomic data. Anal. Chem. 2012, 84, 5035–5039.
- Gowda, H.; Ivanisevic, J.; Johnson, C.H.; Kurczy, M.E.; Benton, P.H.; Rinehart, D.; Nguyen, T.; Ray, J.; Kuehl, J.; Arevalo, B.; et al. Interactive XCMS Online: Simplifying Advanced Metabolomic Data Processing and Subsequent Statistical Analyses. Anal. Chem. 2014, 86, 6931–6939.
- Kuhl, C.; Tautenhahn, R.; Boettcher, C.; Larson, T.R.; Neumann, S. CAMERA: An integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal. Chem. 2012, 84, 283–289.
- Galili, T.; O’Callaghan, A.; Sidi, J.; Sievert, C. heatmaply: An R package for creating interactive cluster heatmaps for online publishing. Bioinformatics 2017, 34, 1600–1602.
Parameter | XCMS Online | MStractor |
---|---|---|
Feature Detection CentWave | ||
ppm | 10 | 10 |
Min peak width (seconds) | 10 | 10 |
Max peak width (seconds) | 20 | 20 |
Signal-to-noise threshold | 100 | 100 |
m/z difference | 0.01 | 0.01 |
Integration method | 1 | 1 |
Prefilter peaks | 100 | 100 |
Prefilter intensity | 750 | 750 |
Noise filter | None | Not applicable |
Integration threshold | Not applicable | 2000 |
Sensitivity | Not applicable | 0.7 |
Fit Gaussian | Not applicable | FALSE |
Retention Time Correction | ||
Method | loess | loess |
Extra peaks | 1 | 1 |
Missing | 3 | 3 |
Bw (seconds) | 20 | 20 |
Mzwid | 0.1 | 0.1 |
Minfrac | 0.5 | 0.5 |
Span | 0.6 | 0.6 |
Family | Gaussian | Gaussian |
Alignment | ||
Bw (seconds) | 20 | 20 |
Minfrac | 0.3 | 0.3 |
Mzwid | 0.1 | 0.1 |
Minsamp | 2 | 2 |
Max | 50 | 50 |
Peak annotation | ||
Sigma | Not applicable | 6 |
Percentage of FWHM | Not applicable | 1 |
Intensity | Not applicable | maxo |
Max number of expected isotopes | Not applicable | 4 |
ppm error | 10 | 10 |
m/z absolute error | 0.005 | Not applicable |
Group-correlation threshold | Not applicable | 0.7 |
Intensity-correlation threshold | Not applicable | 0.7 |
Correlation-threshold significance | Not applicable | 0.1 |
Identification | ||
ppm | 10 | Not applicable |
adducts | [M − H]−, [M + FA-H]− | Not applicable |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nicolotti, L.; Hack, J.; Herderich, M.; Lloyd, N. MStractor: R Workflow Package for Enhancing Metabolomics Data Pre-Processing and Visualization. Metabolites 2021, 11, 492. https://doi.org/10.3390/metabo11080492