1. Introduction
NMR spectroscopy has been successfully applied over the last 40 years to investigate the metabolic content of biofluids such as urine and plasma, in order to discover biomarkers and reveal hidden biochemical mechanisms underlying complex diseases [1,2]. Owing to its high reproducibility and stability, simple and non-destructive sample preparation, short analysis time and the possibility of simultaneously identifying and quantifying a large number of metabolites, high-resolution high-field (HF) NMR spectroscopy has become a standard approach in both targeted and untargeted metabolomics. Unfortunately, the high cost of the instrumentation, its large size, the use of cryogenic systems and the need for specialist technical staff have limited its accessibility to many researchers. In the last 10 years, progress in magnet technology has led to permanent magnets with a field homogeneous enough to allow NMR spectroscopy at low field (LF) [3,4]. A new family of NMR spectrometers operating below 2 T has thus become available on the market. Interestingly, these instruments are benchtop, do not require complex set-up operations and, therefore, dedicated expert users; they are easy to maintain and have low running costs. Despite their lower sensitivity and resolution compared with HF instruments, benchtop LF NMR instruments seem suitable for metabolic fingerprinting and for the targeted quantification of specific metabolites, fitting the needs of clinical applications [5,6].
Another standard approach to metabolomics is based on mass spectrometry (MS) [7,8]. A large number of techniques combining chromatography and MS have been developed to identify and quantify metabolites in complex biological samples. These techniques outperform HF NMR spectroscopy in sensitivity, using moderately expensive instrumentation with a small footprint, but they require complex sample preparation, show only moderate reproducibility and provide only partial structural information.
The aim of our proof-of-concept study was to evaluate whether a benchtop LF NMR instrument is promising for the early detection of sepsis by urinary metabolic fingerprinting at birth, and to compare the classifier based on its fingerprint with that obtained using the more complex MS-based platform.
Neonatal sepsis is an infection-induced systemic inflammatory response syndrome common in both preterm and term neonates [9,10,11]. It is one of the leading causes of neonatal death and morbidity and is believed to play a key role in most of the inflammatory disorders that cause or aggravate the main morbidities affecting preterm infants (bronchopulmonary dysplasia, white matter injury, necrotizing enterocolitis and retinopathy of prematurity). Sepsis in newborns is typically classified as early-onset sepsis (EOS), when the infection occurs within three days of birth, or late-onset sepsis (LOS), when it develops afterward. Early detection of neonatal sepsis and prompt administration of broad-spectrum antibiotic therapy can prevent progression to septic shock and death, but neonatal sepsis is difficult to diagnose early. Blood culture is still considered the gold standard, even though results take time, and false-negative findings are not uncommon because neonatal bacteremia is often intermittent and intrapartum antibiotic treatment may limit the diagnostic value of the culture [12]. Neonatal sepsis is therefore mainly suspected on the grounds of non-specific clinical signs and symptoms; moreover, none of the most widely used biomarkers is an entirely reliable indicator of sepsis in newborns [13,14].
In Mardegan et al. [15], the possibility of applying untargeted MS-based metabolomics to the early detection of sepsis was investigated. The study showed that neonates with EOS present a perturbation of the urinary metabolome at birth that clearly distinguishes them from neonates without sepsis. Specifically, some metabolites belonging to the glutathione and tryptophan pathways emerged as promising new biomarkers of neonatal sepsis. However, mass profiling requires highly trained personnel and complex analytical instrumentation, which hinder the direct translation of the method into a clinical environment.
Here, the urine samples of that study were re-analyzed using a benchtop LF NMR instrument. Specifically, a fingerprinting approach was applied to demonstrate the feasibility of building a tool able to detect EOS, with potential application in a clinical environment. We did not investigate the complex biochemical mechanisms underlying EOS, as was done with MS, because 1D NMR spectra from LF instruments lack the structural detail needed for a comprehensive untargeted metabolomics analysis and allow only a small set of high-concentration metabolites to be identified: many resonances overlap and spectral crowding occurs. However, we took advantage of the high reproducibility of the spectra and of their relatively high sensitivity to a large number of metabolites to build representative fingerprints of the samples. Moreover, the reduced need for maintenance and for expertise in sample preparation and instrument use, together with the short time and low cost of analysis, make LF NMR-based classification models easy to translate into a clinical environment.
3. Results
For one neonate of the sepsis group enrolled in the study of Mardegan et al. [15], the volume of collected urine was insufficient for NMR analysis, since the previous MS-based analysis had required the whole sample. As a result, the control group consisted of 10 neonates and the sepsis group of 8 neonates. The demographic and perinatal characteristics and the laboratory findings at birth of the 18 neonates are reported in Table 1. At a significance level of 0.05, no significant differences were found between the two groups.
Untargeted MS-based metabolomics yielded two datasets: one of 2394 variables acquired in negative ionization mode (NEG dataset) and one of 3224 variables acquired in positive ionization mode (POS dataset). For each dataset, no outliers were detected when each group of neonates was analyzed by PCA, applying the T2-test and the Q-test at a significance level of 0.05.
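For readers less familiar with the two outlier tests, the sketch below computes Hotelling's T2 (distance of a sample within the PCA model plane) and the Q statistic (squared residual distance from that plane) for every sample. It is a minimal illustration assuming mean-centred data and PCA by singular value decomposition; the function name `pca_t2_q` and the default of two components are hypothetical. The critical limits at α = 0.05 are deliberately not computed here: they would typically come from an F distribution for T2 and, for example, the Jackson–Mudholkar approximation for Q, both of which a statistics library can supply.

```python
import numpy as np


def pca_t2_q(X, n_components=2):
    """Hotelling's T2 and Q (squared prediction error) for every row of X.

    Sketch only: X is mean-centred here (any scaling is assumed to have been
    applied beforehand) and PCA is computed by singular value decomposition.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # PCA scores
    P = Vt[:n_components].T                       # PCA loadings
    score_var = s[:n_components] ** 2 / (n - 1)   # sample variance of each score
    t2 = np.sum(T ** 2 / score_var, axis=1)       # distance within the model plane
    residuals = Xc - T @ P.T
    q = np.sum(residuals ** 2, axis=1)            # squared distance to the model plane
    return t2, q


# Example on random data shaped like one group (10 samples x 84 variables).
rng = np.random.default_rng(0)
t2, q = pca_t2_q(rng.normal(size=(10, 84)))
print(np.round(t2, 2), np.round(q, 2))
```

A sample would be flagged as an outlier when its T2 or Q exceeds the corresponding critical limit.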
Applying PLS2C, the best model obtained for the NEG dataset had two score components, with a Matthews correlation coefficient in fitting (MCC) of 0.892 (p = 0.012), a Matthews correlation coefficient calculated by 20 repetitions of fivefold cross-validation (MCCcv) of 0.433 (p = 0.032), an area under the receiver operating characteristic curve in fitting (AUC) of 1.000 (p = 0.005) and an area under the receiver operating characteristic curve calculated by 20 repetitions of fivefold cross-validation (AUCcv) of 0.725 (p = 0.023). For the POS dataset, the best model had two score components, MCC equal to 0.892 (p = 0.015), MCCcv equal to 0.316 (p = 0.041), AUC equal to 1.000 (p = 0.009) and AUCcv equal to 0.663 (p = 0.025). Both models misclassified one sample of the sepsis group in fitting, while in cross-validation two errors were obtained for the control group and three for the sepsis group. The Matthews correlation coefficients of the out-of-bag predictions calculated during stability selection were 0.333 and 0.217 for the NEG and POS datasets, respectively.
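As a check on the figures above, the Matthews correlation coefficient can be computed directly from a 2×2 confusion matrix. The minimal sketch below takes sepsis as the positive class (8 neonates) and controls as the negative class (10); the helper name `matthews_cc` is hypothetical. With one septic neonate misclassified in fitting it gives 0.892, and with the cross-validation error counts (two controls, three septic neonates) it gives 0.433, the MCCcv reported for the NEG model.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four cells of a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Sepsis = positive class (8 neonates), controls = negative class (10 neonates).
# Fitting: one septic neonate misclassified, all controls correct.
print(round(matthews_cc(tp=7, tn=10, fp=0, fn=1), 3))  # 0.892
# Cross-validation: two controls and three septic neonates misclassified.
print(round(matthews_cc(tp=5, tn=8, fp=2, fn=3), 3))   # 0.433
```

Unlike accuracy, MCC uses all four cells, which makes it informative for the unbalanced 10 versus 8 design.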
Pre-processing of the LF NMR spectra yielded a dataset (hereafter, the Fourier 80 dataset) of 84 variables (ROIs). Figure 1 shows the 18 NMR spectra and the intervals used to bucket the spectra into the 84 ROIs. At a significance level of 0.05, no outliers were detected by applying the T2-test and the Q-test to the PCA model built on each of the two groups.
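The bucketing step can be illustrated with a minimal sketch that integrates a spectrum over a list of ppm intervals (ROIs) with the trapezoidal rule, giving one variable per ROI. The function name `bucket_spectrum`, the example intervals and the absence of any normalization are assumptions for illustration; the actual pre-processing of the Fourier 80 spectra (e.g., referencing, phasing, baseline correction, normalization) is not reproduced here.

```python
import numpy as np


def bucket_spectrum(ppm, intensity, rois):
    """Integrate a 1D spectrum over each ROI given as a (low, high) ppm interval.

    ppm may be in descending order, as NMR axes usually are; the integral of
    each ROI is computed with the trapezoidal rule on an ascending axis.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    order = np.argsort(ppm)
    ppm, intensity = ppm[order], intensity[order]
    integrals = []
    for low, high in rois:
        inside = (ppm >= low) & (ppm <= high)
        x, y = ppm[inside], intensity[inside]
        # trapezoidal rule written out explicitly (0.0 if fewer than 2 points)
        integrals.append(float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)))
    return np.array(integrals)


# Hypothetical example: a synthetic Gaussian peak at 1.33 ppm on a 0-10 ppm axis,
# bucketed into two illustrative ROIs (one containing the peak, one empty).
ppm_axis = np.linspace(10.0, 0.0, 4001)
spectrum = np.exp(-((ppm_axis - 1.33) / 0.01) ** 2)
print(bucket_spectrum(ppm_axis, spectrum, [(1.25, 1.40), (3.00, 3.10)]))
```

Applying the function to each of the 18 spectra and stacking the results row by row gives an 18 × 84 matrix when 84 intervals are supplied.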
Before addressing the classification problem, the data variation of the Fourier 80 dataset was investigated by OPLS-W2A and by oCPCA to evaluate, respectively, the data variation shared with the NEG and POS datasets and the unique data variation. Specifically, the Fourier 80 dataset was compared with the predictive part of the post-transformed PLS2C models obtained for the NEG and POS datasets, which contains the data variation useful for distinguishing the two groups under investigation. Considering the NEG dataset, the OPLS-W2A model showed one parallel score component explaining 9.8% of the total variance of the Fourier 80 dataset and two orthogonal score components, whereas the oCPCA model built on the sum of the residuals and the orthogonal part of the model presented two score components explaining a unique data variation equal to 77.6% of the total variance. The correlation between the parallel score component and the predictive component was 0.76. For the POS dataset, the model of the Fourier 80 dataset showed one parallel score component explaining 7.3% of the total variance, whereas the analysis of the unique data variation revealed two score components explaining 80.1% of the total variance. The correlation between the parallel score component and the predictive component was 0.70. In Figure 2, the parallel score components of the two OPLS-W2A models are shown, colored according to the group of each sample. Interestingly, the components seem suitable for distinguishing the two groups, since most of the controls showed positive values, while most of the neonates developing sepsis showed negative values. Consequently, we can expect the Fourier 80 dataset to lead to classification models that perform similarly to those obtained with the NEG and POS datasets or, if the unique part of the Fourier 80 dataset is also suitable for distinguishing the two groups, better.
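Two kinds of quantity quoted above, the share of total variance captured by a score component and the correlation between two score components, can be sketched as follows. This is not an implementation of OPLS-W2A or oCPCA: it only assumes that a score vector t is already available, measures the fraction of the variance of the column-centred data matrix reproduced by regressing it on t, and computes a Pearson correlation between score vectors. Function names are hypothetical.

```python
import numpy as np


def variance_explained_by_score(X, t):
    """Fraction of the total variance of column-centred X reproduced by score t.

    X is regressed on t column by column: X_hat = t p^T with p = X^T t / (t^T t).
    """
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    t = np.asarray(t, dtype=float)
    t = t - t.mean()
    p = Xc.T @ t / (t @ t)   # loading vector associated with the score
    X_hat = np.outer(t, p)
    return float(np.sum(X_hat ** 2) / np.sum(Xc ** 2))


def score_correlation(t_a, t_b):
    """Pearson correlation between two score vectors over the same samples."""
    return float(np.corrcoef(t_a, t_b)[0, 1])


# Example on random data (18 samples x 84 variables) with the first PCA score.
rng = np.random.default_rng(1)
X = rng.normal(size=(18, 84))
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
t1 = U[:, 0] * s[0]
print(variance_explained_by_score(X, t1), score_correlation(t1, -t1))
```

For a principal component score the first function returns that component's share of the total variance, which gives a simple sanity check.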
Considering the Fourier 80 dataset, the best PLS2C model able to distinguish the two groups had two score components, MCC equal to 0.775 (p = 0.031), MCCcv equal to 0.325 (p = 0.040), AUC equal to 0.950 (p = 0.018) and AUCcv equal to 0.700 (p = 0.028). One sample of the control group and one of the sepsis group were misclassified in fitting, while in cross-validation the same errors as for the NEG and POS models were observed. The score scatter plot of the post-transformed model is reported in Figure 3. As can be observed, the points representing urine samples of different groups lie in different regions of the plot.
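The AUC values can be read through the rank formulation of the ROC curve: the AUC equals the probability that a randomly chosen septic neonate receives a higher classifier score than a randomly chosen control, with ties counting one half. The sketch below assumes the classifier outputs one continuous score per sample (higher meaning sepsis); the function name and the scores in the example are invented for illustration.

```python
def auc_from_scores(scores_pos, scores_neg):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count one half, as in the Mann-Whitney U formulation.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented scores: higher means "more likely sepsis".
sepsis_scores = [0.9, 0.8, 0.7]
control_scores = [0.1, 0.2, 0.75]
print(round(auc_from_scores(sepsis_scores, control_scores), 3))  # 0.889
```

An AUC of 1.000 in fitting, as for the NEG and POS models, means every septic neonate scored above every control even though the MCC, which depends on a classification threshold, was below 1.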
Figure 4 reports the distributions of MCCcv and AUCcv for the classification models obtained with the NEG, POS and Fourier 80 datasets. As can be seen, the three datasets led to models with similar cross-validated performance; consequently, none of them is expected to outperform the others in predicting new observations.
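Distributions such as those in Figure 4 presumably arise from repeating fivefold cross-validation 20 times with different random splits, pooling the out-of-fold predictions of each repetition into one MCCcv and one AUCcv value. A minimal sketch of the stratified splitting, assuming integer-coded class labels and a hypothetical function name, is shown below; it is a generic scheme, not necessarily the exact splitting used in the study.

```python
import random


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated stratified k-fold CV.

    In every repetition the indices of each class are shuffled and dealt in
    turn to the folds, so each fold keeps roughly the class proportions.
    """
    rng = random.Random(seed)
    n = len(labels)
    classes = sorted(set(labels))
    for r in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        position = 0
        for c in classes:
            idx = [i for i, y in enumerate(labels) if y == c]
            rng.shuffle(idx)
            for i in idx:
                folds[position % n_splits].append(i)
                position += 1
        for f, test in enumerate(folds):
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            yield r, f, train, sorted(test)


# 10 controls (0) and 8 septic neonates (1): 20 repetitions x 5 folds = 100 splits.
labels = [0] * 10 + [1] * 8
splits = list(repeated_stratified_kfold(labels))
print(len(splits), splits[0][3])
```

Within each repetition every sample is predicted exactly once, so one confusion matrix, and hence one MCC, can be formed per repetition.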
Applying stability selection at a significance level of 0.05, a subset of 16 relevant ROIs was obtained (Table 2). Interestingly, the signals in the relevant ROIs showed a high signal-to-noise ratio, as can be observed in Figure S1 of the Supplementary Materials. The median Matthews correlation coefficient calculated on the out-of-bag predictions was 0.433. Thus, also when the out-of-bag predictions during stability selection are considered, the three datasets showed similar results.
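Stability selection repeatedly applies a variable selector to random subsamples and retains the variables that are selected often; the samples left out of each subsample provide out-of-bag predictions. The minimal sketch below replaces the PLS-based criterion and significance test of the study with a hypothetical stand-in selector (the top_k variables ranked by absolute correlation with the class label); the subsample fraction, number of rounds and function name are likewise assumptions.

```python
import numpy as np


def stability_selection(X, y, n_rounds=200, frac=0.5, top_k=10, seed=0):
    """Selection frequency of each variable over class-stratified subsamples.

    Hypothetical stand-in selector: the top_k variables with the largest
    absolute correlation with the class label on the subsample. Also returns,
    for every round, the out-of-bag indices (samples left out of the subsample).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    counts = np.zeros(X.shape[1])
    out_of_bag = []
    for _ in range(n_rounds):
        in_bag = np.concatenate([
            rng.choice(np.flatnonzero(y == c),
                       size=max(1, int(frac * np.sum(y == c))), replace=False)
            for c in np.unique(y)
        ])
        Xs = X[in_bag] - X[in_bag].mean(axis=0)
        ys = y[in_bag].astype(float)
        ys = ys - ys.mean()
        denom = np.sqrt(np.sum(Xs ** 2, axis=0) * np.sum(ys ** 2))
        corr = np.abs(Xs.T @ ys) / np.where(denom == 0, 1.0, denom)
        counts[np.argsort(corr)[-top_k:]] += 1
        out_of_bag.append(np.setdiff1d(np.arange(len(y)), in_bag))
    return counts / n_rounds, out_of_bag


# Synthetic data: 18 samples x 84 variables, the first 3 shifted in class 1.
rng = np.random.default_rng(42)
y = np.array([0] * 10 + [1] * 8)
X = rng.normal(size=(18, 84))
X[y == 1, :3] += 10.0
freq, oob = stability_selection(X, y, top_k=3)
print(np.flatnonzero(freq >= 0.9))
```

In a full procedure, a model refitted on each in-bag subsample with the selected variables would predict the corresponding out-of-bag samples, and the MCC of those predictions, summarized across rounds (for instance by its median), gives a figure comparable to the one reported above.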
Because LF NMR was used within a fingerprinting approach, the structure of the spectra is not adequate for extracting clear information about the metabolites responsible for discriminating neonates developing sepsis from controls. Therefore, to interpret the decisions of the classifier in terms of metabolites, the signals of the relevant ROIs in the LF NMR spectra were annotated by analyzing the corresponding regions of the HF NMR data. This was possible because the two sets of spectra were highly correlated, as the heatmap in Figure 5 shows. The diagonal of the heatmap corresponds to the same spectral regions detected by both instruments, albeit with different resolution and signal-to-noise ratio. Thanks to the high resolution of the HF NMR spectra, the characteristic signals of 2,3,4-trihydroxybutyric acid, 3,4-dihydroxybutanoic acid, D-glucose, D-serine, gluconate, hippuric acid, lactate, L-threonine, N-glycine, pseudouridine, ribitol, kynurenic acid, myo-inositol, taurine and phenylalanine were found in the relevant ROIs.
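A heatmap like the one described above can be built from the Pearson correlation, computed across samples, between every LF variable and every HF variable. The minimal sketch below assumes that both sets of spectra have already been bucketed into matrices with the same sample order and that constant columns have been removed; for each LF ROI, the most correlated HF bucket then points to candidate signals for annotation. Function and variable names are hypothetical.

```python
import numpy as np


def cross_correlation(A, B):
    """Pearson correlation between every column of A and every column of B.

    A: (n_samples, p), e.g. LF ROI integrals; B: (n_samples, q), e.g. HF buckets.
    Rows of A and B must refer to the same samples in the same order.
    Returns a (p, q) matrix, i.e. the values a correlation heatmap displays.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    Az = (A - A.mean(axis=0)) / A.std(axis=0)   # population std, matches / n below
    Bz = (B - B.mean(axis=0)) / B.std(axis=0)
    return Az.T @ Bz / A.shape[0]


def best_hf_match(C):
    """Index of the most positively correlated HF bucket for each LF variable."""
    return np.argmax(C, axis=1)


# Toy example: HF columns are the LF columns in reverse order plus small noise.
rng = np.random.default_rng(7)
lf = rng.normal(size=(18, 4))
hf = lf[:, ::-1] + 0.05 * rng.normal(size=(18, 4))
print(best_hf_match(cross_correlation(lf, hf)))
```

If the two bucketings share the same ppm ordering, high values concentrate along the diagonal of the matrix, which is the pattern the heatmap shows.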
4. Discussion
Mardegan et al. [15] demonstrated that the complex infection-induced systemic inflammatory response syndrome of sepsis produces a perturbation of the urinary metabolome. Specifically, untargeted MS-based metabolomics was applied to discover the differences in the urinary metabolome between neonates developing sepsis and controls, highlighting that some metabolic pathways were perturbed. Thousands of metabolites were quantified to discover a small set of relevant ones. Since the reproducibility of the analytical performance is only moderate and usually limited to a small number of independent analytical sessions, untargeted data are in general unsuitable for clinical applications. Consequently, a specific targeted method would have to be developed for each relevant metabolite discovered before the approach could be tested in a clinical environment.
LF NMR spectra map a smaller chemical space than that explored by untargeted MS-based metabolomics, and only metabolites with concentrations greater than 10⁻⁵ M can be detected [23]. However, the high reproducibility of the NMR fingerprint and the robustness of LF NMR instrumentation make LF NMR a candidate analytical technique for clinical applications, provided that the perturbations at the metabolomic level are captured by the fingerprint.
In principle, there is no guarantee that a perturbation discovered by MS can also be measured by LF NMR; this must be demonstrated experimentally, as in our study. It is worth noting that the two approaches do not need to measure the same metabolites, since metabolic pathways are closely interconnected and a perturbation is usually not confined to a single pathway but affects the concentrations of many metabolites, even in distant pathways. Moreover, in a multifactorial disease such as sepsis, several pathways are perturbed, producing a large perturbation of the urinary metabolome. Consequently, analytical techniques detecting different sets of metabolites may provide sample representations that differ from a biochemical point of view but are equally effective in distinguishing classes of subjects and, therefore, may help in solving clinical problems.
This was the case in the present study. We showed that the dataset obtained by LF NMR had predictive performance comparable to that of the more complex datasets generated by untargeted MS-based metabolomics when used to discriminate between neonates developing sepsis and controls. The chemical space mapped by LF NMR was not the same as that explored by MS. Owing to the strong correlation between LF and HF NMR data, the relevant ROIs could be interpreted in terms of metabolites, revealing that 2,3,4-trihydroxybutyric acid, 3,4-dihydroxybutanoic acid, D-glucose, D-serine, hippuric acid, lactate, L-threonine, glycine, pseudouridine, ribitol, kynurenic acid, myo-inositol, taurine and phenylalanine were involved in determining the differences at the urinary metabolic level between neonates developing sepsis and controls. Only taurine and phenylalanine were found to be relevant in both the MS and the NMR analyses. However, in our previously published study [15], glycine and kynurenic acid, which were not found to be relevant in urine, were relevant in distinguishing between controls and the EOS group when plasma samples were analyzed.
Several studies have compared the urinary metabolome of neonates developing sepsis (EOS or LOS) with that of controls [24,25,26,27]. However, none of them was suitable for clinical application. Interestingly, all the annotated metabolites were also reported in those studies. Specifically, Fanos et al. [24] found 2,3,4-trihydroxybutyric acid, 3,4-dihydroxybutanoic acid, pseudouridine and ribitol using GC-MS, and glycine, lactate and D-glucose by HF NMR, to be related to the differences between healthy neonates and neonates developing sepsis. Sarafidis et al. [25] found differences between the sepsis and control groups in the levels of D-glucose and myo-inositol by HF NMR and in the levels of hippuric acid, phenylalanine and taurine by LC-MS. In Dessì et al. [26], D-serine and L-threonine were identified using GC-MS. The reader can refer to these references for a detailed discussion of the putative role played by these metabolites in the development of sepsis.
Another interesting result of our study is the strong correlation between LF and HF NMR spectra. A similar result was reported by Leenders et al. [6], who observed and discussed the close relationship between LF and HF NMR data in a metabolomics investigation of type 2 diabetes.
Interpretability is an important requirement for any clinical decision-making procedure. When an algorithm or, more generally, a classifier is used to make a decision, how and why that decision was made must be understood. In other words, the mechanisms underlying decision making should be clear and understandable, not hidden in a black box. This can only be achieved if both the sample representation and the classifier are interpretable. The fingerprint based on LF NMR is in principle interpretable, since each ROI can be associated with a single metabolite or a set of metabolites, and it therefore meets this requirement for clinical application.
The main limitation of our study is that predictive performance was estimated by cross-validation and by the out-of-bag predictions during stability selection, without an independent test set, because a test set was not included in the experimental design. Moreover, the small number of recruited subjects limits the power of the study. Nevertheless, this proof-of-concept study is necessary to justify the design of new investigations in which a larger number of neonates is recruited and one or more test sets are included to better evaluate the real predictive performance of the classifier based on LF NMR spectra.
More sophisticated classifiers could be considered once an adequate number of training samples is available. We used PLS2C as the classifier to avoid overfitting, since PLS-based techniques are suitable for datasets with few observations and overfitting can be controlled by a randomization test; with a larger number of samples, however, random forests could be trained, so that non-linearity and more complex substructures within the observations could be taken into account.
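The randomization test mentioned above compares a model's performance on the true labels with its performance after the labels are randomly permuted and the model is refitted. The minimal sketch below is generic: `statistic` is any function that refits a model on the given labels and returns a performance value. As a hypothetical stand-in for refitting PLS2C and computing MCC or AUC, the example uses the absolute difference of group means of a single invented feature; the function names are likewise illustrative.

```python
import random


def permutation_p_value(statistic, labels, n_perm=999, seed=0):
    """Randomization test for a performance statistic.

    statistic(labels) must refit the model on the given labels and return a
    performance value (higher is better). The p-value is the share of label
    permutations, counting the observed labels, that score at least as well.
    """
    rng = random.Random(seed)
    observed = statistic(list(labels))
    hits = 0
    for _ in range(n_perm):
        permuted = list(labels)
        rng.shuffle(permuted)
        if statistic(permuted) >= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_perm)


def mean_difference_statistic(x):
    """Hypothetical stand-in: absolute difference of group means of one feature."""
    def statistic(labels):
        pos = [xi for xi, yi in zip(x, labels) if yi == 1]
        neg = [xi for xi, yi in zip(x, labels) if yi == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return statistic


x = [0.0] * 10 + [10.0] * 8   # invented feature values
y = [0] * 10 + [1] * 8        # 10 controls, 8 septic neonates
print(permutation_p_value(mean_difference_statistic(x), y))
```

With 999 permutations the smallest attainable p-value is 0.001; a model that fits permuted labels as well as the true ones would give a large p-value, signalling overfitting.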
Author Contributions
Conceptualization, M.S., E.D. and G.G.; methodology, M.S., C.N. and C.C.; software, M.S.; validation, M.S., C.N. and C.C.; formal analysis, M.S.; investigation, C.N. and C.C.; resources, E.D. and E.B.; data curation, M.S., C.N. and C.C.; writing—original draft preparation, M.S.; writing—review and editing, E.D., C.N. and G.G.; visualization, C.C.; supervision, E.D.; project administration, E.D. and E.B.; funding acquisition, E.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of Padua Hospital (protocol 3636/AO/15, date of approval 15 October 2015).
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
The data presented in this study are available on request from the corresponding author as specified in the previously published paper. Data are provided on request because the author must explain how to read the data.
Conflicts of Interest
The authors declare no conflict of interest. Claire Cannet, Claudia Napoli and Elena Demetrio are employees of Bruker. The paper reflects the views of the scientists, and not the company.
References
- Gowda, G.A.N.; Raftery, D. NMR metabolomics methods for investigating disease. Anal. Chem. 2023, 95, 83–99.
- Wishart, D.S. NMR metabolomics: A look ahead. J. Magn. Reson. 2019, 306, 155–161.
- Blümich, B. Introduction to compact NMR: A review of methods. TrAC Trends Anal. Chem. 2016, 83, 2–11.
- Blümich, B. Low-field and benchtop NMR. J. Magn. Reson. 2019, 306, 27–35.
- Grootveld, M.; Percival, B.; Gibson, M.; Osman, Y.; Edgar, M.; Molinari, M.; Mather, M.L.; Casanova, F.; Wilson, P.B. Progress in low-field benchtop NMR spectroscopy in chemical and biochemical analysis. Anal. Chim. Acta 2019, 1067, 11–30.
- Leenders, J.; Grootveld, M.; Percival, B.; Gibson, M.; Casanova, F.; Wilson, P.B. Benchtop Low-Frequency 60 MHz NMR analysis of urine: A comparative metabolomics investigation. Metabolites 2020, 10, 155.
- Zou, W.; She, J.; Tolstikov, V.V. A comprehensive workflow of mass spectrometry-based untargeted metabolomics in cancer metabolic biomarker discovery using human plasma and urine. Metabolites 2013, 3, 787–819.
- Schrimpe-Rutledge, A.C.; Codreanu, S.G.; Sherrod, S.D.; McLean, J.A. Untargeted metabolomics strategies—Challenges and emerging directions. J. Am. Soc. Mass. Spectrom. 2016, 27, 1897–1905.
- Fleischmann-Struzek, C.; Goldfarb, D.M.; Schlattmann, P.; Schlapbach, L.J.; Reinhart, K.; Kissoon, N. The global burden of paediatric and neonatal sepsis: A systematic review. Lancet Respir. Med. 2018, 6, 223–230.
- Shane, A.L.; Sánchez, P.J.; Stoll, B.J. Neonatal sepsis. Lancet 2017, 390, 1770–1780.
- Strunk, T.; Inder, T.; Wang, X.; Burgner, D.; Mallard, C.; Levy, O. Infection-induced inflammation and cerebral injury in preterm infants. Lancet Infect. Dis. 2014, 14, 751–762.
- Dong, H.; Cao, H.; Zheng, H. Pathogenic bacteria distributions and drug resistance analysis in 96 cases of neonatal sepsis. BMC Pediatr. 2017, 17, 44.
- Chauhan, N.; Tiwari, S.; Jain, U. Potential biomarkers for effective screening of neonatal sepsis infections: An overview. Microb. Pathog. 2017, 107, 234–242.
- Das, A.; Shukla, S.; Rahman, N.; Gunzler, D.; Abughali, N. Clinical Indicators of Late-Onset Sepsis Workup in Very Low-Birth-Weight Infants in the Neonatal Intensive Care Unit. Am. J. Perinatol. 2016, 33, 856–860.
- Mardegan, V.; Giordano, G.; Stocchero, M.; Pirillo, P.; Poloniato, G.; Donadel, E.; Salvadori, S.; Giaquinto, C.; Priante, E.; Baraldi, E. Untargeted and targeted metabolomic profiling of preterm newborns with early onset sepsis: A case-control study. Metabolites 2021, 11, 115.
- Dona, A.C.; Jiménez, B.; Schäfer, H.; Humpfer, E.; Spraul, M.; Lewis, M.R.; Pearce, J.T.M.; Holmes, E.; Lindon, J.C.; Nicholson, J.K. Precision high-throughput proton NMR spectroscopy of human urine, serum, and plasma for large-scale metabolic phenotyping. Anal. Chem. 2014, 86, 9887–9894.
- Wider, G.; Dreier, L. Measuring protein concentrations by NMR spectroscopy. J. Am. Chem. Soc. 2006, 128, 2571–2576.
- Vu, T.N.; Valkenborg, D.; Smets, K.; Verwaest, K.A.; Dommisse, R.; Lemière, F.; Verschoren, A.; Goethals, B.; Laukens, K. An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data. BMC Bioinform. 2011, 12, 405.
- Jolliffe, I.T. Principal Component Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2002.
- Stocchero, M.; De Nardi, M.; Scarpa, B. PLS for classification. Chemom. Intell. Lab. Syst. 2021, 216, 104374.
- Stocchero, M. Relevant and irrelevant predictors in PLS2. J. Chemometr. 2020, 34, e3237.
- Stocchero, M.; Riccadonna, S.; Franceschi, P. Projection to latent structures with orthogonal constraints for metabolomics data. J. Chemometr. 2018, 32, e2987.
- Percival, B.C.; Grootveld, M.; Gibson, M.; Osman, Y.; Molinari, M.; Jafari, F.; Sahota, T.; Martin, M.; Casanova, F.; Mather, M.L.; et al. Low-Field, benchtop NMR spectroscopy as a potential tool for point-of-care diagnostics of metabolic conditions: Validation, protocols and computational models. High Throughput 2019, 8, 2.
- Fanos, V.; Caboni, P.; Corsello, G.; Stronati, M.; Gazzolo, D.; Noto, A.; Lussu, M.; Dessì, A.; Giuffrè, M.; Lacerenza, S.; et al. Urinary 1H-NMR and GC-MS metabolomics predicts early and late onset neonatal sepsis. Early Hum. Dev. 2014, 90, S78–S83.
- Sarafidis, K.; Chatziioannou, A.C.; Thomaidou, A.; Gika, H.; Mikros, E.; Benaki, D.; Diamanti, E.; Agakidis, C.; Raikos, N.; Drossou, V.; et al. Urine metabolomics in neonates with late-onset sepsis in a case-control study. Sci. Rep. 2017, 7, 45506.
- Dessì, A.; Liori, B.; Caboni, P.; Corsello, G.; Giuffrè, M.; Noto, A.; Serraino, F.; Stronati, M.; Zaffanello, M.; Fanos, V. Monitoring neonatal fungal infection with metabolomics. J. Matern. Fetal Neonatal Med. 2014, 27, 34–38.
- Bjerkhaug, A.U.; Granslo, H.N.; Klingenberg, C. Metabolic responses in neonatal sepsis—A systematic review of human metabolomic studies. Acta Paediatr. 2021, 110, 2316–2325.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).