Variable Selection in Untargeted Metabolomics and the Danger of Sparsity
Abstract
1. Introduction
2. Results
2.1. Vitamin B3 Pathway-Related Metabolic Features
2.2. Treatment of A–T Patients with Nicotinamide Riboside
2.3. Pathway Analysis with Mummichog Based on Features Selected with Wilcoxon or sMC
2.4. Correlation Analysis with Only the Significant Annotated Metabolites
3. Discussion
4. Materials and Methods
4.1. Measurements and Conversion to the Feature Intensity Table
4.2. Additional Preprocessing of the Large Feature Intensity Table
4.3. Data Analysis, Variable Selection, and Validation
4.4. Annotation and Pathway Analysis with Mummichog
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
| Assigned Metabolite | Adduct | m/z | Retention Time | Fold Change | Wilcoxon p-Value | sMC F-Value |
|---|---|---|---|---|---|---|
| | M + H[1+] | 124.0329 | 1.20 | 7.00 | 1.2 × 10⁻⁴ | 302.90 |
| | M + H[1+] | 124.0329 | 1.01 | 2.02 | 1.2 × 10⁻⁴ | 118.44 |
| | M − H2O + H[1+] | 105.0445 | 3.52 | 1.06 | 0.9 | 0.0149 |
| | M + H[1+] | 123.0552 | 1.38 | 2.76 | 1.2 × 10⁻⁴ | 193.84 |
| | M + H[1+] | 123.0553 | 0.97 | 1.94 | 1.2 × 10⁻⁴ | 254.34 |
| | M − NH3 + H[1+] | 136.0394 | 2.86 | 9.50 | 1.2 × 10⁻⁴ | 361.59 |
| | M − H[−] | 151.0511 | 2.97 | 6.77 | 1.2 × 10⁻⁴ | 359.63 |
| | M + H[1+] | 153.0659 | 2.96 | 7.75 | 1.2 × 10⁻⁴ | 377.00 |
| | M + Na[1+] | 175.0479 | 2.86 | 6.98 | 1.2 × 10⁻⁴ | 539.53 |
| | M + Cl[−] | 187.0276 | 2.97 | 7.20 | 1.2 × 10⁻⁴ | 187.52 |

Method | Wilcoxon Sign Test with False Discovery Rate Correction | OPLS-DA VIP | WRT-PLS sMC α = 0.01 F > 7.06 | WRT-PLS sMC α = 3 × 10⁻⁷ F > 32.99 | CARS-PLS-DA | sPLS-DA |
---|---|---|---|---|---|---|
Accuracy | N/A | 100% | 100% | 100% | 100% | 100% |
Number of features | 612 | 842 | 770 | 214 | 15 | 48 |
Annotated features | 51 | 41 | 73 | 23 | 3 | 14 |

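
The first method column of the table above relies on a false discovery rate correction of per-feature p-values. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure is the correction intended, the adjustment can be written in plain Python. The function name `benjamini_hochberg`, the raw p-values, and the 0.05 threshold are illustrative assumptions and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five features (not data from the article).
raw = [0.001, 0.008, 0.039, 0.041, 0.9]
adj = benjamini_hochberg(raw)

# Features kept at an assumed FDR threshold of 0.05.
selected = [i for i, q in enumerate(adj) if q <= 0.05]
print(adj)
print(selected)
```

In this toy input, the third and fourth features have raw p-values below 0.05 but are not selected after adjustment, which is the kind of filtering that determines how many features a univariate test passes on to annotation.
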
Tolerance | Features | Metabolites | Ratio | Wilcoxon Sign Test | sMC α = 0.01 F > 7.06 | sMC α = 3 × 10⁻⁷ F > 32.99 |
---|---|---|---|---|---|---|
1 ppm | 1033 | 618 | 2.41 | 0 | 0 | 0 |
3 ppm | 2465 | 959 | 2.68 | 0.99 | 1 | 0.46 |
5 ppm | 3268 | 1131 | 2.68 | 1 | 1 | 0.88 |

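
The tolerance in the table above is a relative mass error expressed in parts per million (ppm). The following minimal sketch shows how such a window behaves when an observed m/z is matched against reference masses. The helper names and the reference entries `candidate_A` to `candidate_C` are hypothetical, and this is not a description of how Mummichog performs matching internally; only the observed value 123.0552 is borrowed from the feature table in this article.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error of an observed m/z relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def candidates(observed_mz, reference_table, tolerance_ppm):
    """Names of reference entries whose m/z lies within the ppm tolerance."""
    return [
        name
        for name, reference_mz in reference_table.items()
        if abs(ppm_error(observed_mz, reference_mz)) <= tolerance_ppm
    ]


# Hypothetical reference m/z values chosen only to illustrate the window.
reference_table = {
    "candidate_A": 123.0553,
    "candidate_B": 123.0557,
    "candidate_C": 123.0560,
}

observed = 123.0552
for tolerance in (1, 3, 5):
    print(tolerance, candidates(observed, reference_table, tolerance))
```

Widening the window from 1 ppm to 5 ppm admits a second candidate for the same feature in this toy table, which mirrors the growth in matched features and metabolites across the rows above.
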
Pathway | Wilcoxon Sign Test | sMC α = 0.01 | sMC α = 3 × 10⁻⁷ |
---|---|---|---|
Vitamin B3 | 5 * | 5 * | 4 * |
Arachidonic acid | 3 | 6 * | 3 * |
Methionine and cysteine | 4 | 9 * | 2 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Tinnevelt, G.H.; Engelke, U.F.H.; Wevers, R.A.; Veenhuis, S.; Willemsen, M.A.; Coene, K.L.M.; Kulkarni, P.; Jansen, J.J. Variable Selection in Untargeted Metabolomics and the Danger of Sparsity. Metabolites 2020, 10, 470. https://doi.org/10.3390/metabo10110470