Machine Learning Applications in Metabolomics Analysis

A special issue of Metabolites (ISSN 2218-1989). This special issue belongs to the section "Bioinformatics and Data Analysis".

Deadline for manuscript submissions: closed (30 June 2024) | Viewed by 13730

Special Issue Editors


Guest Editor
Department of Computer Science, Málaga University, 29071 Málaga, Spain
Interests: artificial intelligence; biomedicine; deep learning

Guest Editor
Leicester School of Pharmacy, Faculty of Health and Life Sciences, De Montfort University, Leicester LE1 9BH, UK
Interests: chemical pathology; clinical chemistry; NMR-based metabolomics; disease diagnosis and prognostic monitoring; metabolic pathway analysis; bioinorganic chemistry; drug design; development and synthesis; artificial intelligence; machine learning; research ethics

Special Issue Information

Dear Colleagues,

Metabolomics research is growing in popularity because it enables the study of biological problems at the biochemical level and can help us to understand the induction, development and mechanisms of many diseases, complementing information from other ‘omics technologies. Like other high-throughput biological technologies, metabolomics can produce large volumes of data. Machine learning strategies can therefore facilitate its application through the discovery of new biomolecular signatures, which in turn support the diagnosis and prognostic monitoring of diseases, including rare metabolic disorders.

This Special Issue aims to attract publications focused on the application of machine learning techniques to the analysis of multidimensional metabolomics data, including the development of methods, data augmentation procedures, preprocessing techniques, comparisons of different methods, the interpretability of results, and the identification of new signatures.

Prof. Dr. Leonardo Franco
Prof. Dr. Martin Grootveld
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Metabolites is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • metabolites
  • biofluids
  • Artificial Intelligence
  • metabolomics
  • machine learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (9 papers)


Research

15 pages, 1886 KiB  
Article
Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes
by Erik D. Huckvale and Hunter N. B. Moseley
Metabolites 2024, 14(11), 582; https://doi.org/10.3390/metabo14110582 - 27 Oct 2024
Viewed by 546
Abstract
Background/Objectives: Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to solely predict based on metabolic pathways. However, there are many other types of pathways in cells and organisms that are of interest to biologists. Methods: While several publications have made use of the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all the compound entries with pathway annotations available in the KEGG. From these data, we constructed a dataset where each entry contained features representing compounds combined with features representing pathways, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset. Results: The models trained on 6485 KEGG compounds and 502 pathways scored an overall mean Matthews correlation coefficient (MCC) performance of 0.847, a median MCC of 0.848, and a standard deviation of 0.0098. Conclusions: This performance on all 502 KEGG pathways represents a roughly 6% improvement over the performance of models trained on only the 184 KEGG metabolic pathways, which had a mean MCC of 0.800 and a standard deviation of 0.021. These results demonstrate the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in the performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways. Full article
(This article belongs to the Special Issue Machine Learning Applications in Metabolomics Analysis)
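
The abstracts in this issue report performance mainly as the Matthews correlation coefficient (MCC). As a reading aid, the following minimal sketch computes the MCC from the four cells of a binary confusion matrix using its standard definition. The counts are hypothetical illustrations and are not taken from any of the listed papers; returning 0.0 for an empty marginal total is a common convention, assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the MCC for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, for illustration only.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
imbalanced = matthews_corrcoef(tp=5, tn=90, fp=5, fn=0)

print(perfect, chance, round(imbalanced, 3))
```

In the imbalanced case the accuracy is 95%, yet the MCC is noticeably lower, which is one reason the metric is often preferred when positive entries are rare.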

14 pages, 2401 KiB  
Article
Predicting the Association of Metabolites with Both Pathway Categories and Individual Pathways
by Erik D. Huckvale and Hunter N. B. Moseley
Metabolites 2024, 14(9), 510; https://doi.org/10.3390/metabo14090510 - 21 Sep 2024
Viewed by 793
Abstract
Metabolism is a network of chemical reactions that sustain cellular life. Parts of this metabolic network are defined as metabolic pathways containing specific biochemical reactions. Products and reactants of these reactions are called metabolites, which are associated with certain human-defined metabolic pathways. Metabolic knowledgebases, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), contain metabolites, reactions, and pathway annotations; however, such resources are incomplete due to current limits of metabolic knowledge. To fill in missing metabolite pathway annotations, past machine learning models showed some success at predicting the KEGG Level 2 pathway category involvement of metabolites based on their chemical structure. Here, we present the first machine learning model to predict metabolite association to more granular KEGG Level 3 metabolic pathways. We used a feature and dataset engineering approach to generate over one million metabolite-pathway entries in the dataset used to train a single binary classifier. This approach produced a mean Matthews correlation coefficient (MCC) of 0.806 ± 0.017 SD across 100 cross-validation iterations. The 172 Level 3 pathways were predicted with an overall MCC of 0.726. Moreover, metabolite association with the 12 Level 2 pathway categories was predicted with an overall MCC of 0.891, representing significant transfer learning from the Level 3 pathway entries. These are the best metabolite pathway prediction results published so far in the field. Full article

15 pages, 3898 KiB  
Article
Explainable AI to Facilitate Understanding of Neural Network-Based Metabolite Profiling Using NMR Spectroscopy
by Hayden Johnson and Aaryani Tipirneni-Sajja
Metabolites 2024, 14(6), 332; https://doi.org/10.3390/metabo14060332 - 14 Jun 2024
Cited by 1 | Viewed by 1291
Abstract
Neural networks (NNs) are emerging as a rapid and scalable method for quantifying metabolites directly from nuclear magnetic resonance (NMR) spectra, but the nonlinear nature of NNs precludes understanding of how a model makes predictions. This study implements an explainable artificial intelligence algorithm called integrated gradients (IG) to elucidate which regions of input spectra are the most important for the quantification of specific analytes. The approach is first validated in simulated mixture spectra of eight aqueous metabolites and then investigated in experimentally acquired lipid spectra of a reference standard mixture and a murine hepatic extract. The IG method revealed that, like a human spectroscopist, NNs recognize and quantify analytes based on an analyte’s respective resonance line-shapes, amplitudes, and frequencies. NNs can compensate for peak overlap and prioritize specific resonances most important for concentration determination. Further, we show how modifying a NN training dataset can affect how a model makes decisions, and we provide examples of how this approach can be used to de-bug issues with model performance. Overall, results show that the IG technique facilitates a visual and quantitative understanding of how model inputs relate to model outputs, potentially making NNs a more attractive option for targeted and automated NMR-based metabolomics. Full article
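
To make the integrated gradients (IG) idea in the abstract above concrete, here is a minimal sketch that approximates IG attributions with a midpoint Riemann sum along the straight path from a baseline to the input. The quadratic toy model, its weights, the all-zero baseline and the three-point "spectrum" are assumptions chosen so the result can be checked by hand; they are stand-ins and not the authors' neural network or data.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG attributions with a midpoint Riemann sum."""
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    # Attribution = (input - baseline) * average gradient along the path.
    return [di * gi for di, gi in zip(diff, avg_grad)]


# Hypothetical differentiable model standing in for a trained network.
WEIGHTS = [0.5, -1.0, 2.0]


def model(x):
    return sum(w * xi ** 2 for w, xi in zip(WEIGHTS, x))


def model_grad(x):
    return [2.0 * w * xi for w, xi in zip(WEIGHTS, x)]


spectrum = [1.0, 2.0, 3.0]   # toy input intensities
baseline = [0.0, 0.0, 0.0]   # assumed all-zero reference input
attributions = integrated_gradients(model_grad, spectrum, baseline)

print(attributions)
print(sum(attributions), model(spectrum) - model(baseline))
```

The two values on the last line agree, which illustrates the completeness property of IG: the attributions sum to the difference between the model output at the input and at the baseline.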

16 pages, 1281 KiB  
Article
A Chemical Structure and Machine Learning Approach to Assess the Potential Bioactivity of Endogenous Metabolites and Their Association with Early Childhood Systemic Inflammation
by Mario Lovrić, Tingting Wang, Mads Rønnow Staffe, Iva Šunić, Kristina Časni, Jessica Lasky-Su, Bo Chawes and Morten Arendt Rasmussen
Metabolites 2024, 14(5), 278; https://doi.org/10.3390/metabo14050278 - 10 May 2024
Viewed by 1703
Abstract
Metabolomics has gained much attention due to its potential to reveal molecular disease mechanisms and present viable biomarkers. This work uses a panel of untargeted serum metabolomes from 602 children from the COPSAC2010 mother–child cohort. The annotated part of the metabolome consists of 517 chemical compounds curated using automated procedures. We created a filtering method for the quantified metabolites using predicted quantitative structure–bioactivity relationships for the Tox21 database on nuclear receptors and stress response in cell lines. The metabolites measured in the children’s serums are predicted to affect specific targeted models, known for their significance in inflammation, immune function, and health outcomes. The targets from Tox21 have been used as targets with quantitative structure–activity relationships (QSARs). They were trained for ~7000 structures, saved as models, and then applied to the annotated metabolites to predict their potential bioactivities. The models were selected based on strict accuracy criteria surpassing random effects. After application, 52 metabolites showed potential bioactivity based on structural similarity with known active compounds from the Tox21 set. The filtered compounds were subsequently used and weighted by their bioactive potential to show an association with early childhood hs-CRP levels at six months in a linear model supporting a physiological adverse effect on systemic low-grade inflammation. Full article

21 pages, 2911 KiB  
Article
Predicting the Pathway Involvement of Metabolites Based on Combined Metabolite and Pathway Features
by Erik D. Huckvale and Hunter N. B. Moseley
Metabolites 2024, 14(5), 266; https://doi.org/10.3390/metabo14050266 - 7 May 2024
Cited by 2 | Viewed by 1224
Abstract
A major limitation of most metabolomics datasets is the sparsity of pathway annotations for detected metabolites. It is common for less than half of the identified metabolites in these datasets to have a known metabolic pathway involvement. Trying to address this limitation, machine learning models have been developed to predict the association of a metabolite with a “pathway category”, as defined by a metabolic knowledge base like KEGG. Past models were implemented as a single binary classifier specific to a single pathway category, requiring a set of binary classifiers for generating the predictions for multiple pathway categories. This past approach multiplied the computational resources necessary for training while diluting the positive entries in the gold standard datasets needed for training. To address these limitations, we propose a generalization of the metabolic pathway prediction problem using a single binary classifier that accepts the features both representing a metabolite and representing a pathway category and then predicts whether the given metabolite is involved in the corresponding pathway category. We demonstrate that this metabolite–pathway features pair approach not only outperforms the combined performance of training separate binary classifiers but demonstrates an order of magnitude improvement in robustness: a Matthews correlation coefficient of 0.784 ± 0.013 versus 0.768 ± 0.154. Full article
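
The single-classifier formulation described in the abstract above can be sketched as follows: every metabolite is paired with every pathway category, the two feature vectors are concatenated, and a binary label records whether the pair is a known association. The identifiers, feature values and associations below are hypothetical toy data, and `build_pair_dataset` is an illustrative helper name, not a function from the authors' code.

```python
from itertools import product

# Hypothetical toy features; the published work derives its features from KEGG.
metabolite_features = {"M1": [1.0, 0.0, 2.0], "M2": [0.0, 3.0, 1.0]}
pathway_features = {"P1": [0.5, 0.5], "P2": [1.0, 0.0]}
known_associations = {("M1", "P1"), ("M2", "P2")}


def build_pair_dataset(metabolites, pathways, associations):
    """Build one row per (metabolite, pathway) pair with a binary label."""
    rows, labels = [], []
    for (m_id, m_vec), (p_id, p_vec) in product(metabolites.items(), pathways.items()):
        rows.append(m_vec + p_vec)  # concatenate the two feature vectors
        labels.append(1 if (m_id, p_id) in associations else 0)
    return rows, labels


X, y = build_pair_dataset(metabolite_features, pathway_features, known_associations)

for row, label in zip(X, y):
    print(row, label)
```

Because all pairs feed one model, the dataset grows to (number of metabolites) × (number of pathways) rows, and any binary classifier can then be trained on `X` and `y`.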

15 pages, 1929 KiB  
Article
Diagnostics of Thyroid Cancer Using Machine Learning and Metabolomics
by Alyssa Kuang, Valentina L. Kouznetsova, Santosh Kesari and Igor F. Tsigelny
Metabolites 2024, 14(1), 11; https://doi.org/10.3390/metabo14010011 - 22 Dec 2023
Cited by 2 | Viewed by 1974
Abstract
The objective of this research is, with the analysis of existing data of thyroid cancer (TC) metabolites, to develop a machine-learning model that can diagnose TC using metabolite biomarkers. Through data mining, pathway analysis, and machine learning (ML), the model was developed. We identified seven metabolic pathways related to TC: Pyrimidine metabolism, Tyrosine metabolism, Glycine, serine, and threonine metabolism, Pantothenate and CoA biosynthesis, Arginine biosynthesis, Phenylalanine metabolism, and Phenylalanine, tyrosine, and tryptophan biosynthesis. The ML classifications’ accuracies were confirmed through 10-fold cross validation, and the most accurate classification was 87.30%. The metabolic pathways identified in relation to TC and the changes within such pathways can contribute to more pattern recognition for diagnostics of TC patients and assistance with TC screening. With independent testing, the model’s accuracy for other unique TC metabolites was 92.31%. The results also point to a possibility for the development of using ML methods for TC diagnostics and further applications of ML in general cancer-related metabolite analysis. Full article
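
Several abstracts in this issue, including the one above, validate their models with 10-fold cross-validation. The sketch below shows one generic way to produce the train/test index splits: shuffle the sample indices once, deal them into k folds, and hold each fold out in turn. The sample count, seed and the helper name `k_fold_indices` are assumptions for illustration; the papers' own splitting procedures may differ (for example, by stratifying on class labels).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # deal indices into k folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 25 samples.
splits = list(k_fold_indices(n_samples=25, k=10))

for train, test in splits[:2]:
    print(len(train), len(test))
```

Each sample lands in exactly one test fold, so averaging a metric over the ten held-out folds uses every sample for evaluation once.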

16 pages, 2267 KiB  
Article
Prediction of Clinical Remission with Adalimumab Therapy in Patients with Ulcerative Colitis by Fourier Transform–Infrared Spectroscopy Coupled with Machine Learning Algorithms
by Seok-Young Kim, Seung Yong Shin, Maham Saeed, Ji Eun Ryu, Jung-Seop Kim, Junyoung Ahn, Youngmi Jung, Jung Min Moon, Chang Hwan Choi and Hyung-Kyoon Choi
Metabolites 2024, 14(1), 2; https://doi.org/10.3390/metabo14010002 - 19 Dec 2023
Viewed by 1935
Abstract
We aimed to develop prediction models for clinical remission associated with adalimumab treatment in patients with ulcerative colitis (UC) using Fourier transform–infrared (FT–IR) spectroscopy coupled with machine learning (ML) algorithms. This prospective, observational, multicenter study enrolled 62 UC patients and 30 healthy controls. The patients were treated with adalimumab for 56 weeks, and clinical remission was evaluated using the Mayo score. Baseline fecal samples were collected and analyzed using FT–IR spectroscopy. Various data preprocessing methods were applied, and prediction models were established by 10-fold cross-validation using various ML methods. Orthogonal partial least squares–discriminant analysis (OPLS–DA) showed a clear separation of healthy controls and UC patients, applying area normalization and Pareto scaling. OPLS–DA models predicting short- and long-term remission (8 and 56 weeks) yielded area-under-the-curve values of 0.76 and 0.75, respectively. Logistic regression and a nonlinear support vector machine were selected as the best prediction models for short- and long-term remission, respectively (accuracy of 0.99). In external validation, prediction models for short-term (logistic regression) and long-term (decision tree) remission performed well, with accuracy values of 0.73 and 0.82, respectively. This was the first study to develop prediction models for clinical remission associated with adalimumab treatment in UC patients by fecal analysis using FT–IR spectroscopy coupled with ML algorithms. Logistic regression, nonlinear support vector machines, and decision tree were suggested as the optimal prediction models for remission, and these were noninvasive, simple, inexpensive, and fast analyses that could be applied to personalized treatments. Full article
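
The abstract above names area normalization and Pareto scaling as the preprocessing used before OPLS–DA. The sketch below follows the common textbook definitions of both steps: each spectrum is divided by its total intensity, and each variable is then mean-centered and divided by the square root of its sample standard deviation. The toy intensity matrix is hypothetical, and the exact variant used in the study (for example, how the area is integrated) is not stated in the abstract, so this is an assumption.

```python
import math


def area_normalize(spectra):
    """Divide each spectrum (row) by its total intensity so that it sums to 1."""
    normalized = []
    for row in spectra:
        total = sum(row)  # assumes non-negative intensities with a nonzero total
        normalized.append([v / total for v in row])
    return normalized


def pareto_scale(matrix):
    """Mean-center each variable (column) and divide by the square root of its SD."""
    n = len(matrix)
    scaled_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        divisor = math.sqrt(sd) if sd > 0 else 1.0  # leave constant columns centered
        scaled_cols.append([(v - mean) / divisor for v in col])
    return [list(row) for row in zip(*scaled_cols)]


# Hypothetical intensities: 3 samples (rows) x 3 spectral variables (columns).
spectra = [[2.0, 6.0, 2.0], [1.0, 1.0, 2.0], [3.0, 3.0, 6.0]]
normalized = area_normalize(spectra)
scaled = pareto_scale(normalized)

print(normalized)
print(scaled)
```

Pareto scaling shrinks the dominance of intense signals less aggressively than unit-variance scaling, which is why it is a frequent choice for spectral data.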

24 pages, 3968 KiB  
Article
Benchmark Dataset for Training Machine Learning Models to Predict the Pathway Involvement of Metabolites
by Erik D. Huckvale, Christian D. Powell, Huan Jin and Hunter N. B. Moseley
Metabolites 2023, 13(11), 1120; https://doi.org/10.3390/metabo13111120 - 1 Nov 2023
Cited by 6 | Viewed by 1699
Abstract
Metabolic pathways are a human-defined grouping of life sustaining biochemical reactions, metabolites being both the reactants and products of these reactions. But many public datasets include identified metabolites whose pathway involvement is unknown, hindering metabolic interpretation. To address these shortcomings, various machine learning models, including those trained on data from the Kyoto Encyclopedia of Genes and Genomes (KEGG), have been developed to predict the pathway involvement of metabolites based on their chemical descriptions; however, these prior models are based on old metabolite KEGG-based datasets, including one benchmark dataset that is invalid due to the presence of over 1500 duplicate entries. Therefore, we have developed a new benchmark dataset derived from the KEGG following optimal standards of scientific computational reproducibility and including all source code needed to update the benchmark dataset as KEGG changes. We have used this new benchmark dataset with our atom coloring methodology to develop and compare the performance of Random Forest, XGBoost, and multilayer perceptron with autoencoder models generated from our new benchmark dataset. Best overall weighted average performance across 1000 unique folds was an F1 score of 0.8180 and a Matthews correlation coefficient of 0.7933, which was provided by XGBoost binary classification models for 11 KEGG-defined pathway categories. Full article

27 pages, 1512 KiB  
Article
Urinary Metabolic Distinction of Niemann–Pick Class 1 Disease through the Use of Subgroup Discovery
by Cristóbal J. Carmona, Manuel German-Morales, David Elizondo, Victor Ruiz-Rodado and Martin Grootveld
Metabolites 2023, 13(10), 1079; https://doi.org/10.3390/metabo13101079 - 13 Oct 2023
Cited by 2 | Viewed by 1357
Abstract
In this investigation, we outline the applications of a data mining technique known as Subgroup Discovery (SD) to the analysis of a sample size-limited metabolomics-based dataset. The SD technique utilized a supervised learning strategy, which lies midway between classificational and descriptive criteria, in which given the descriptive property of a dataset (i.e., the response target variable of interest), the primary objective was to discover subgroups with behaviours that are distinguishable from those of the complete set (albeit with a differential statistical distribution). These approaches have, for the first time, been successfully employed for the analysis of aromatic metabolite patterns within an NMR-based urinary dataset collected from a small cohort of patients with the lysosomal storage disorder Niemann–Pick class 1 (NPC1) disease (n = 12) and utilized to distinguish these from a larger number of heterozygous (parental) control participants. These subgroup discovery strategies discovered two different NPC1 disease-specific metabolically sequential rules which permitted the reliable identification of NPC1 patients; the first of these involved ‘normal’ (intermediate) urinary concentrations of xanthurenate, 4-aminobenzoate, hippurate and quinaldate, and disease-downregulated levels of nicotinate and trigonelline, whereas the second comprised ‘normal’ 4-aminobenzoate, indoxyl sulphate, hippurate, 3-methylhistidine and quinaldate concentrations, and again downregulated nicotinate and trigonelline levels. Correspondingly, a series of five subgroup rules were generated for the heterozygous carrier control group, and ‘biomarkers’ featured in these included low histidine, 1-methylnicotinamide and 4-aminobenzoate concentrations, together with ‘normal’ levels of hippurate, hypoxanthine, quinolinate and hypoxanthine. 
These significant disease group-specific rules were consistent with imbalances in the combined tryptophan–nicotinamide, tryptophan, kynurenine and tyrosine metabolic pathways, along with dysregulations in those featuring histidine, 3-methylhistidine and 4-hydroxybenzoate. In principle, the novel subgroup discovery approach employed here should also be readily applicable to solving metabolomics-type problems of this nature which feature rare disease classification groupings with only limited patient participant and sample sizes available. Full article
