Denoising Autoencoder Normalization for Large-Scale Untargeted Metabolomics by Gas Chromatography–Mass Spectrometry
Abstract
1. Introduction
2. Materials and Methods
2.1. Reagents
2.2. Sample Preparations for GC–MS
2.3. Gas Chromatography/Mass Spectrometry Conditions
2.4. Data Processing and Data Normalization Scheme
2.5. SERDA Implementation
- (1) Apply a generalized log transformation to the training data (e.g., QC samples in a compound) and to the target data (e.g., study samples in a compound).
- (2) Draw noise from a Gaussian distribution, . Here, is determined by , where is the estimated standard deviation of and is of .
- (3) Obtain the corrupted input by adding the Gaussian noise to the training data.
- (4) Optionally, oversample by adding different random Gaussian noise to each of the training data.
- (5) Apply auto-scaling to the training data and the target data.
- (6) Split the training data, , into two parts, and , with proportions of 80% and 20%, respectively.
- (7) Initialize the weights with the Glorot uniform initializer and the biases with zeros.
- (8) For each neural network training epoch:
  - i. randomly set (i.e., the dropout rate) of the elements in to zero;
  - ii. randomly select samples from as a mini batch of samples;
  - iii. update the parameters by backpropagation with the Adam algorithm [35] so that the average absolute error of the mini-batch samples is reduced as much as possible;
  - iv. calculate the average absolute error on with the updated parameters.
- (9) Repeat (8) i–iv until the average absolute error on does not decrease for 50 epochs. Mark the number of epochs iteratively processed as .
- (10) Apply (8) i–iii on the whole training set for the number of epochs recorded in (9). The result is the final trained model.
- (11) Apply the trained model to the target data to obtain the predicted systematic error.
- (12) Calculate the normalized values, , by removing the predicted systematic error with subtraction, , where is the mean of the predicted systematic error.
- (13) Optionally, apply median normalization to the normalized values to remove leftover inter-batch effects.
- (14) Scale and exponentially transform the data back to the original scale to obtain the final normalized dataset.
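The steps above can be sketched in code. The following is a minimal NumPy illustration and not the authors' implementation. It assumes a single hidden layer with ReLU activation, dropout applied to the input, a noise level equal to the per-compound standard deviation of the training data, separate auto-scaling statistics for training and target data, and a glog offset `lam`. The function names (`glog`, `fit`, `serda_sketch`) and all default hyperparameters are hypothetical choices made for illustration. The optional steps (4) and (13) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def glog(x, lam=1.0):
    """Generalized log transform; the offset `lam` is an assumed constant."""
    return np.log((x + np.sqrt(x ** 2 + lam)) / 2.0)

def inv_glog(y, lam=1.0):
    """Exact inverse of `glog`."""
    return np.exp(y) - lam / (4.0 * np.exp(y))

def init(n_in, n_hid):
    """Glorot-uniform weights and zero biases, as in step (7)."""
    lim = np.sqrt(6.0 / (n_in + n_hid))
    return {"W1": rng.uniform(-lim, lim, (n_in, n_hid)), "b1": np.zeros(n_hid),
            "W2": rng.uniform(-lim, lim, (n_hid, n_in)), "b2": np.zeros(n_in)}

def forward(p, x):
    h = np.maximum(x @ p["W1"] + p["b1"], 0.0)  # ReLU hidden layer (assumed)
    return h, h @ p["W2"] + p["b2"]

def mae(p, x_in, x_out):
    return float(np.mean(np.abs(forward(p, x_in)[1] - x_out)))

def adam_step(p, st, x_in, x_out, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update on the mean absolute error of a mini batch."""
    h, y = forward(p, x_in)
    g_y = np.sign(y - x_out) / y.size            # gradient of MAE w.r.t. output
    g_h = (g_y @ p["W2"].T) * (h > 0)            # backpropagate through ReLU
    grads = {"W2": h.T @ g_y, "b2": g_y.sum(0), "W1": x_in.T @ g_h, "b1": g_h.sum(0)}
    st["t"] += 1
    for k, g in grads.items():
        st["m"][k] = b1 * st["m"][k] + (1 - b1) * g
        st["v"][k] = b2 * st["v"][k] + (1 - b2) * g ** 2
        m_hat = st["m"][k] / (1 - b1 ** st["t"])
        v_hat = st["v"][k] / (1 - b2 ** st["t"])
        p[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)

def fit(x_in, x_out, n_hid, max_epochs, val=None, drop=0.05, batch=16, patience=50):
    """Train up to `max_epochs`; stop early if `val` error stalls for `patience` epochs."""
    p = init(x_in.shape[1], n_hid)
    st = {"t": 0, "m": {k: np.zeros_like(v) for k, v in p.items()},
          "v": {k: np.zeros_like(v) for k, v in p.items()}}
    best, wait, epoch = np.inf, 0, 0
    for epoch in range(1, max_epochs + 1):
        idx = rng.choice(len(x_in), size=min(batch, len(x_in)), replace=False)
        keep = rng.random((len(idx), x_in.shape[1])) >= drop   # input dropout mask
        adam_step(p, st, x_in[idx] * keep, x_out[idx])
        if val is not None:
            err = mae(p, *val)
            best, wait = (err, 0) if err < best else (best, wait + 1)
            if wait >= patience:
                break
    return p, epoch

def serda_sketch(qc, study, n_hid=8, max_epochs=2000):
    """Rows are samples, columns are compounds; returns normalized study data."""
    x, xt = glog(qc), glog(study)                              # step (1)
    noisy = x + rng.normal(size=x.shape) * x.std(0, ddof=1)    # steps (2)-(3), assumed noise level
    mu, sd = x.mean(0), x.std(0, ddof=1)                       # step (5), auto-scaling
    mu_t, sd_t = xt.mean(0), xt.std(0, ddof=1)
    z, z_noisy, zt = (x - mu) / sd, (noisy - mu) / sd, (xt - mu_t) / sd_t
    perm = rng.permutation(len(z))                             # step (6), 80/20 split
    cut = int(0.8 * len(z))
    tr, va = perm[:cut], perm[cut:]
    _, n_epochs = fit(z_noisy[tr], z[tr], n_hid, max_epochs,   # steps (8)-(9)
                      val=(z_noisy[va], z[va]))
    model, _ = fit(z_noisy, z, n_hid, n_epochs)                # step (10)
    err = forward(model, zt)[1]                                # step (11)
    z_norm = zt - err + err.mean(0)                            # step (12)
    return inv_glog(z_norm * sd_t + mu_t)                      # step (14)
```

Calling `serda_sketch(qc, study)` with two positive intensity matrices that share the same compound columns returns a matrix shaped like `study`. Training twice, once with a held-out 20% to find the epoch count and once on all training data, mirrors steps (9) and (10).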
2.6. Samples and Datasets
3. Results
3.1. GC–MS-Based Metabolomics: Data Normalization for Small Sample Sets
3.2. GC–MS-Based Metabolomics: Data Normalization for Very Large Sample Sets
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sakaguchi, C.A.; Nieman, D.C.; Signini, E.F.; Abreu, R.M.; Catai, A.M. Metabolomics-Based Studies Assessing Exercise-Induced Alterations of the Human Metabolome: A Systematic Review. Metabolites 2019, 9, 164.
- Wishart, D.S.; Tzur, D.; Knox, C.; Eisner, R.; Guo, A.C.; Young, N.; Cheng, D.; Jewell, K.; Arndt, D.; Sawhney, S.; et al. HMDB: The Human Metabolome Database. Nucleic Acids Res. 2007, 35 (Suppl. S1), D521–D526.
- Zeki, Ö.C.; Eylem, C.C.; Reçber, T.; Kır, S.; Nemutlu, E. Integration of GC–MS and LC–MS for Untargeted Metabolomics Profiling. J. Pharm. Biomed. Anal. 2020, 190, 113509.
- De Livera, A.M.; Dias, D.A.; De Souza, D.; Rupasinghe, T.; Pyke, J.; Tull, D.; Roessner, U.; McConville, M.; Speed, T.P. Normalizing and Integrating Metabolomics Data. Anal. Chem. 2012, 84, 10768–10776.
- Scholz, M.; Gatzek, S.; Sterling, A.; Fiehn, O.; Selbig, J. Metabolite Fingerprinting: Detecting Biological Features by Independent Component Analysis. Bioinformatics 2004, 20, 2447–2454.
- Borrego, S.L.; Fahrmann, J.; Datta, R.; Stringari, C.; Grapov, D.; Zeller, M.; Chen, Y.; Wang, P.; Baldi, P.; Gratton, E.; et al. Metabolic Changes Associated with Methionine Stress Sensitivity in MDA-MB-468 Breast Cancer Cells. Cancer Metab. 2016, 4, 9.
- Redestig, H.; Fukushima, A.; Stenlund, H.; Moritz, T.; Arita, M.; Saito, K.; Kusano, M. Compensation for Systematic Cross-Contribution Improves Normalization of Mass Spectrometry Based Metabolomics Data. Anal. Chem. 2009, 81, 7974–7980.
- Sysi-Aho, M.; Katajamaa, M.; Yetukuri, L.; Orešič, M. Normalization Method for Metabolomics Data Using Optimal Selection of Multiple Internal Standards. BMC Bioinform. 2007, 8, 93.
- Boysen, A.K.; Heal, K.R.; Carlson, L.T.; Ingalls, A.E. Best-Matched Internal Standard Normalization in Liquid Chromatography-Mass Spectrometry Metabolomics Applied to Environmental Samples. Anal. Chem. 2018, 90, 1363–1369.
- Dunn, W.B.; Wilson, I.D.; Nicholls, A.W.; Broadhurst, D. The Importance of Experimental Design and QC Samples in Large-Scale and MS-Driven Untargeted Metabolomic Studies of Humans. Bioanalysis 2012, 4, 2249–2264.
- Li, B.; Tang, J.; Yang, Q.; Li, S.; Cui, X.; Li, Y.; Chen, Y.; Xue, W.; Li, X.; Zhu, F. NOREVA: Normalization and Evaluation of MS-Based Metabolomics Data. Nucleic Acids Res. 2017, 45, W162–W170.
- De Livera, A.M.; Sysi-Aho, M.; Jacob, L.; Gagnon-Bartsch, J.A.; Castillo, S.; Simpson, J.A.; Speed, T.P. Statistical Methods for Handling Unwanted Variation in Metabolomics Data. Anal. Chem. 2015, 87, 3606–3615.
- Fan, S.; Kind, T.; Cajka, T.; Hazen, S.L.; Tang, W.H.W.; Kaddurah-Daouk, R.; Irvin, M.R.; Arnett, D.K.; Barupal, D.K.; Fiehn, O. Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data. Anal. Chem. 2019, 91, 3590–3596.
- Viant, M.R.; Ebbels, T.M.D.; Beger, R.D.; Ekman, D.R.; Epps, D.J.T.; Kamp, H.; Leonards, P.E.G.; Loizou, G.D.; MacRae, J.I.; van Ravenzwaay, B.; et al. Use Cases, Best Practice and Reporting Standards for Metabolomics in Regulatory Toxicology. Nat. Commun. 2019, 10, 3041.
- Law, K.P.; Han, T.L.; Yang, Y.; Zhang, H. Analytical Challenges of Untargeted GC-MS-Based Metabolomics and the Critical Issues in Selecting the Data Processing Strategy. F1000Research 2017, 6, 967.
- Zhao, Y.; Hao, Z.; Zhao, C.; Zhao, J.; Zhang, J.; Li, Y.; Li, L.; Huang, X.; Lin, X.; Zeng, Z.; et al. A Novel Strategy for Large-Scale Metabolomics Study by Calibrating Gross and Systematic Errors in Gas Chromatography-Mass Spectrometry. Anal. Chem. 2016, 88, 2234–2242.
- Duan, L.; Ma, A.; Meng, X.; Shen, G.A.; Qi, X. QPMASS: A Parallel Peak Alignment and Quantification Software for the Analysis of Large-Scale Gas Chromatography-Mass Spectrometry (GC-MS)-Based Metabolomics Datasets. J. Chromatogr. A 2020, 1620, 460999.
- Bijlsma, S.; Bobeldijk, I.; Verheij, E.R.; Ramaker, R.; Kochhar, S.; Macdonald, I.A.; Van Ommen, B.; Smilde, A.K. Large-Scale Human Metabolomics Studies: A Strategy for Data (Pre-) Processing and Validation. Anal. Chem. 2005, 78, 567–574.
- Adeola, H.A.; Papagerakis, S.; Papagerakis, P. Systems Biology Approaches and Precision Oral Health: A Circadian Clock Perspective. Front. Physiol. 2019, 10, 399.
- Fiehn, O. Metabolomics by Gas Chromatography–Mass Spectrometry: Combined Targeted and Untargeted Profiling. Curr. Protoc. Mol. Biol. 2016, 114, 30.4.1–30.4.32.
- Beale, D.J.; Pinu, F.R.; Kouremenos, K.A.; Poojary, M.M.; Narayana, V.K.; Boughton, B.A.; Kanojia, K.; Dayalan, S.; Jones, O.A.H.; Dias, D.A. Review of Recent Developments in GC–MS Approaches to Metabolomics-Based Research. Metabolomics 2018, 14, 152.
- Khodadadi, M.; Pourfarzam, M. A Review of Strategies for Untargeted Urinary Metabolomic Analysis Using Gas Chromatography-Mass Spectrometry. Metabolomics 2020, 16, 66.
- Curtius, H.C.; Wolfensberger, M.; Steinmann, B.; Redweik, U.; Siegfried, J. Mass Fragmentography of Dopamine and 6-Hydroxydopamine: Application to the Determination of Dopamine in Human Brain Biopsies from the Caudate Nucleus. J. Chromatogr. A 1974, 99, 529–540.
- Šťávová, J.; Beránek, J.; Nelson, E.P.; Diep, B.A.; Kubátová, A. Limits of Detection for the Determination of Mono- and Dicarboxylic Acids Using Gas and Liquid Chromatographic Methods Coupled with Mass Spectrometry. J. Chromatogr. B 2011, 879, 1429–1438.
- Rahn, W.; König, W.A. GC/MS Investigations of the Constituents in a Diethyl Ether Extract of an Acidified Roast Coffee Infusion. J. High Resolut. Chromatogr. 1978, 1, 69–71.
- Lamoureux, G.; Agüero, C. A Comparison of Several Modern Alkylating Agents. Arkivoc 2009, 2009, 251–264.
- Liebeke, M.; Puskás, E. Drying enhances signal intensities for global GC–MS metabolomics. Metabolites 2019, 9, 68.
- Fiehn, O.; Kopka, J.; Dörmann, P.; Altmann, T.; Trethewey, R.N.; Willmitzer, L. Metabolite profiling for plant functional genomics. Nat. Biotechnol. 2000, 18, 1157–1161.
- Piergiovanni, M.; Termopoli, V. Derivatization Strategies in Flavor Analysis: An Overview over the Wine and Beer Scenario. Chemistry 2022, 4, 1679–1695.
- Barupal, D.K.; Zhang, Y.; Shen, T.; Fan, S.; Roberts, B.S.; Fitzgerald, P.; Wancewicz, B.; Valdiviez, L.; Wohlgemuth, G.; Byram, G.; et al. A Comprehensive Plasma Metabolomics Dataset for a Cohort of Mouse Knockouts within the International Mouse Phenotyping Consortium. Metabolites 2019, 9, 101.
- Yu, S.; Fan, J.; Zhang, L.; Qin, X.; Li, Z. Assessment of Biphasic Extraction Methods of Mouse Fecal Metabolites for Liquid Chromatography-Mass Spectrometry-Based Metabolomic Studies. J. Proteome Res. 2021, 20, 4487–4494.
- Badawy, A.A.B.; Morgan, C.J.; Turner, J.A. Application of the Phenomenex EZ:Faast™ Amino Acid Analysis Kit for Rapid Gas-Chromatographic Determination of Concentrations of Plasma Tryptophan and Its Brain Uptake Competitors. Amino Acids 2008, 34, 587–596.
- Liang, J.; Liu, R. Stacked Denoising Autoencoder and Dropout Together to Prevent Overfitting in Deep Neural Network. In Proceedings of the 2015 8th International Congress on Image and Signal Processing, CISP 2015, Shenyang, China, 14–16 October 2015; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2016; pp. 697–701.
- Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.A.; Bottou, L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Simón-Manso, Y.; Lowenthal, M.S.; Kilpatrick, L.E.; Sampson, M.L.; Telu, K.H.; Rudnick, P.A.; Mallard, W.G.; Bearden, D.W.; Schock, T.B.; Tchekhovskoi, D.V.; et al. Metabolite Profiling of a NIST Standard Reference Material for Human Plasma (SRM 1950): GC-MS, LC-MS, NMR, and Clinical Laboratory Analyses, Libraries, and Web-Based Resources. Anal. Chem. 2013, 85, 11725–11731.
- Ballman, K.V.; Grill, D.E.; Oberg, A.L.; Therneau, T.M. Faster Cyclic Loess: Normalizing RNA Arrays via Linear Models. Bioinformatics 2004, 20, 2778–2786.
- Dunn, W.B.; Broadhurst, D.; Begley, P.; Zelena, E.; Francis-Mcintyre, S.; Anderson, N.; Brown, M.; Knowles, J.D.; Halsall, A.; Haselden, J.N.; et al. Procedures for Large-Scale Metabolic Profiling of Serum and Plasma Using Gas Chromatography and Liquid Chromatography Coupled to Mass Spectrometry. Nat. Protoc. 2011, 6, 1060–1083.
- Lange, S.; Riedmiller, M. Deep Auto-Encoder Neural Networks in Reinforcement Learning. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; pp. 1–8.
- Parsons, H.M.; Ekman, D.R.; Collette, T.W.; Viant, M.R. Spectral Relative Standard Deviation: A Practical Benchmark in Metabolomics. Analyst 2009, 134, 478–485.
Statistic | SERDA Pool QC | SERDA Cross-Valid. | SERDA BioIVT Valid. | SERRF Pool QC | SERRF Cross-Valid. | SERRF BioIVT Valid. | fTIC Pool QC | fTIC BioIVT Valid. | iTIC Pool QC | iTIC BioIVT Valid. | mTIC Pool QC | mTIC BioIVT Valid. | Raw Pool QC | Raw BioIVT Valid. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Median | 5% | 16% | 19% | 13% | 19% | 34% | 53% | 51% | 59% | 63% | 53% | 60% | 58% | 56% |
Mean | 15% | 25% | 24% | 15% | 21% | 53% | 74% | 67% | 83% | 80% | 75% | 83% | 83% | 74% |
GC–MS Study | Raw Data | SERRF | SERDA |
---|---|---|---|
GeneBank | 55% | 25% | 21% |
T2D | 58% | 19% | 16% |
MPA | 50% | 28% | 22% |

GC–MS Study | Raw Data | SERRF | SERDA |
---|---|---|---|
T2D | 56% | 34% | 17% |
Dataset | Raw Data | SERRF | SERDA |
---|---|---|---|
GeneBank | 27% | 61% | 76% |
T2D | 15% | 81% | 98% |
MPA | 17% | 50% | 72% |

Dataset | Raw Data | SERRF | SERDA |
---|---|---|---|
T2D | 12% | 67% | 91% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Y.; Fan, S.; Wohlgemuth, G.; Fiehn, O. Denoising Autoencoder Normalization for Large-Scale Untargeted Metabolomics by Gas Chromatography–Mass Spectrometry. Metabolites 2023, 13, 944. https://doi.org/10.3390/metabo13080944