1. Introduction
Inflammatory bowel disease (IBD) is characterized by inflammatory activity of the intestines and comprises two types: ulcerative colitis (UC) and Crohn’s disease (CD) [1]. UC affects only the colon, whereas CD can affect any part of the gastrointestinal (GI) tract. Compared to UC, CD carries significantly higher morbidity since it is more aggressive and causes transmural inflammation, which extends through the wall of the affected GI segment [2]. Thus, the two types of IBD are distinguished by the location and depth of GI inflammation, as well as by their complications and prevalence. However, the clinical symptoms of UC and CD are very similar and include abdominal pain, haematochezia, and diarrhea [3]. The disease is complex and is characterized by occasional periods of acute aggravation of the clinical symptoms and diagnostic signs, such as worsening abdominal pain or diarrhea, followed by periods of remission. The course of IBD is thus determined by alternating periods of activity (relapse) and remission [4].
Regarding IBD diagnosis, several techniques and medical tests are available to assist the physician. Diagnosis begins with an assessment of clinical symptoms, which include diarrhea, abdominal pain, rectal bleeding, weight loss, loss of appetite, fever, and fatigue [5]. Examination for diagnostic signs is then performed using a combination of capsule endoscopy or colonoscopy and imaging studies, including magnetic resonance imaging (MRI), contrast radiography, or X-ray computed tomography (CT). Doctors might also examine fecal samples to ensure that an infection is not causing the symptoms and perform blood tests or biopsies of the colon to confirm the diagnosis [6].
C-reactive protein (CRP) is used as a possible screening biomarker for IBD. Specifically, CD is associated with a marked CRP response, whereas UC is accompanied by a lower (or non-existent) elevation in CRP synthesis. This can be explained by the fact that inflammation in UC is limited to the colonic mucosa, whereas in CD it is transmural and therefore has a theoretically greater systemic impact [7,8,9]. However, CRP values are inconclusive since CRP measures only the inflammatory response, and its levels are greatly affected by other factors, including viral and bacterial infections and other diseases [10].
Fecal markers have the theoretical advantage of greater specificity for the diagnosis of gastrointestinal diseases such as IBD, as they do not rise in extra-digestive processes [7,8,9,11,12]. Fecal calprotectin is one of the most important of these biomarkers: it correlates with active IBD and has been shown to be useful in predicting relapse in patients with IBD in remission. Calprotectin is a protein present in the cytoplasm of neutrophils and accounts for 60% of the cytosolic proteins of granulocytes [13]. Therefore, the presence of calprotectin in the stool is directly proportional to the migration of neutrophils into the intestinal tract [7,8,9]. However, the great drawback of this biomarker is that its optimal cut-off points are currently not clearly defined, leaving wide intervals [14]. Thus, monitoring calprotectin levels is not currently a relevant strategy for IBD management.
The application of metabolomic strategies opens a new field in the search for biomarkers that help diagnose IBD and its different types [15]. Among the techniques currently used for metabolomic analysis in IBD-related studies, liquid chromatography with tandem mass spectrometry (LC-MS/MS) and nuclear magnetic resonance spectroscopy (1H-NMR) are the most common. 1H-NMR has previously been used for the analysis of urine, feces, serum, plasma, and colon biopsy samples from adult or pediatric patients with either UC or CD [16,17,18,19,20,21,22]. LC-MS/MS, in turn, has been used for the analysis of samples from children and adults, including urine, feces, serum, and colon and mucosa biopsies, with objectives such as characterizing the inflammatory pattern to better understand the pathogenesis of pediatric IBD, describing the mucosal lipid profile of UC patients compared with healthy subjects, evaluating the gut microbiome or tryptophan metabolism associated with IBD, and associating specific bacteria and metabolites with the fecal microbiota in UC patients [23,24,25,26,27,28,29].
Other analytical techniques such as gas chromatography coupled to time-of-flight mass spectrometry (GC-TOF-MS) [24,30,31], two-dimensional GC coupled to high-resolution TOF-MS (GCxGC-HRTOF-MS) [32], and multisegment injection capillary electrophoresis-MS (MSI-CE-MS) [33] have also previously been used for diverse metabolomic approaches in the study of IBD in breath, feces, mucosa biopsies, serum, and urine samples.
Ion mobility spectrometry (IMS) has proven effective for monitoring and characterizing volatile organic compounds (VOCs) in biological analysis, as it offers high sensitivity and selectivity and short response times [34]. IMS separates VOCs on the basis of their different mobilities within a drift tube subjected to a constant electric field at ambient atmospheric pressure. By combining GC with IMS detection, a two-dimensional separation of the VOCs (based on GC retention time and IMS drift time) can be rapidly accomplished for many applications with high sensitivity and selectivity [35].
GC-IMS has previously been used for the simultaneous assessment of urinary and fecal VOCs in pediatric IBD patients, in a cohort of 10 IBD subjects (five UC and five CD) and 10 controls, with a 10-fold cross-validation approach applied for data classification [36]. In breath samples, the technique has distinguished IBD patients from healthy controls and, within IBD patients, CD from UC. The dataset consisted of 39 samples (thirty IBD and nine controls), and data analysis was based on supervised feature selection and k-fold cross-validation. In addition, tentative identification of the breath VOCs that contribute to the classification was performed using the NIST database [37]. Finally, Bosch et al. (2019) used GC-IMS with a 10-fold cross-validation approach to explore the potential of fecal VOC patterns to predict the course of IBD in adults. No alterations in fecal VOC profiles preceding a change in clinical disease were observed [38].
Within this context, this work proposes the use of GC-IMS with headspace (HS) injection for the metabolomic study of IBD through the VOCs present in serum and urine samples. Urine offers advantages such as easy collection and is one of the main routes of excretion of water-soluble metabolites and xenobiotics. Blood serum, in turn, is the second most widely used biofluid in metabolomics studies, following urine. Blood interacts with many tissues of the human body and thus provides a snapshot of overall metabolism, although with a lower specificity level compared to urine [39]. Thus, a non-targeted metabolomic tool will be developed with the aim of not only enabling the diagnosis of IBD but also assessing the course of the disease (active or remission) in case of a positive diagnosis, something that has not been done to date using IMS. GC-IMS is an emerging analytical technique whose applications in the clinical field are increasingly frequent due to its advantages over other analytical techniques. IMS operates at atmospheric pressure with minimal or no sample preparation, and GC-IMS performs a two-dimensional separation, since analytes are separated based on both their interaction with the GC column and their drift behavior under the influence of an electric field, which is an advantage over GC-MS. Therefore, the GC-IMS coupling increases the peak capacity and resolving power of the separation system, enabling the determination of a greater number of compounds, which is practically mandatory in metabolomics studies. To the best of our knowledge, this study is the first to jointly investigate urine and serum samples from IBD patients and healthy volunteers and to create robust chemometric models using VOC data obtained by HS-GC-IMS, including a larger number of samples than in previous studies. In addition, a targeted strategy is developed in parallel that allows the identification and quantification of volatiles in the biological samples of interest to determine which compounds contribute the most to each of the classes (patients with active IBD, patients with IBD in remission, or healthy volunteers), which had not been achieved so far.
2. Materials and Methods
2.1. Standards and Reagents
For the characterization of the volatile profile of serum and urine samples, a total of 19 analytical standards were used: two alcohols (1-propanol, 1-pentanol), five ketones (2-heptanone, 2-hexanone, 2-pentanone, 2-butanone, acetone), four aldehydes (heptanal, hexanal, pentanal, benzaldehyde), one acid (acetic acid), one ester (ethyl acetate), two aromatic hydrocarbons (o-xylene, toluene) and four terpenes (alpha-pinene, beta-pinene, linalool, limonene). All of them were supplied by Sigma-Aldrich (St. Louis, MO, USA). Separate stock solutions of the analytical standards were prepared at 1000 µg mL−1 in artificial urine and stored at 4 °C.
For the preparation of artificial urine, the procedure described by Farias et al. [35] was followed with slight modifications. Thus, urea (0.411 M), potassium chloride (0.0268 M), sodium chloride (3 g L−1), disodium sulphate (0.0513 M), potassium phosphate (0.014 M), and ammonium chloride (0.0186 M) were dissolved in Milli-Q water. All of them were supplied by Sigma-Aldrich.
A Milli-Q Plus system (Millipore, Bedford, MA, USA) supplied ultrapure water (resistivity 18.2 MΩ cm). Nitrogen gas (99.99% purity, Air Liquide, Madrid) was also employed.
A standard solution of ketones (2-octanone, 2-heptanone, 2-hexanone, 2-pentanone, and 2-butanone) at 500 μg L−1 in water was used as quality control (QC) throughout the analyses. The QC was injected prior to each sample sequence to check that the equipment was in optimal condition. The use of a mixture of ketones as a QC is common in GC-IMS. The ketones analyzed are distributed throughout the topographic map (in both retention time and drift time), which makes it possible to evaluate the quality of the results at different moments of the analysis.
In addition, a serum alkaline solution, Isofundin from B. Braun Medical (Barcelona, Spain), was used to perform calibrations of the compounds identified in the serum.
2.2. Samples
Two sets of clinical samples were collected for analysis, one of urine and one of serum, from patients with confirmed IBD and healthy control volunteers. The study population consisted of 166 individuals, of whom only 74 provided both serum and urine samples. Specifically, a total of 118 serum samples were analyzed: 37 from patients diagnosed with active IBD, 41 from patients with IBD in remission, and 40 from healthy controls. In addition, 123 urine samples were analyzed: 34 from patients with active IBD, 43 from patients with IBD in remission, and 46 from healthy controls. Both serum and urine were obtained from the Hospital Universitario Rafael Méndez (Lorca, Murcia, Spain). Ethical committees from the University of Murcia (Favourable Report ID:2908/2020) and from the Hospital Universitario Rafael Méndez (Lorca, Spain) approved this study. Informed consent was obtained from all individuals, and samples were used according to hospital guidelines. A demographic summary of the IBD patient and healthy control data is shown in Table 1. The mean, median, and interquartile range (IQR, 25th to 75th percentile) of the available parameters were calculated.
As shown in Table 1, the mean age of the IBD patients in the serum cohort was 47.1 years, with 26 males and 11 females in the active phase and 17 males and 24 females in the remission phase. Furthermore, while the mean values of hemoglobin (HGB), Fe, and cholesterol are similar in IBD patients in the active and remission phases, large differences are found in the values of calprotectin (mean 1099.7 µg g−1 for active and 396.4 µg g−1 for remission) and CRP (1.3 mg L−1 for active and 0.4 mg L−1 for remission). Healthy volunteers, conversely, had a mean calprotectin value of 17.6 µg g−1, which is below the established limit (>50 µg g−1) above which patients are considered more likely to have IBD.
Regarding the urine sample set, the mean age of the patients with active IBD was 48.3 years (20 males and 14 females), that of the patients in remission was 48.9 years (15 males and 28 females), and that of the controls was 44 years (19 males and 27 females). In relation to the values of calprotectin, CRP, Fe, HGB, and cholesterol, the same behavior was observed between the groups as in the serum set. These variables were collected because they have been previously described as abnormal in patients with IBD [40,41].
Smoking habits of the IBD patients were also recorded. Among patients with active IBD, there were 2 smokers and 3 ex-smokers in the serum sample set and 1 smoker and 2 ex-smokers in the urine sample set. Among patients with IBD in remission, there were 1 smoker and 8 ex-smokers in the serum sample set and 3 smokers and 1 ex-smoker in the urine sample set.
Concerning sample collection, each blood sample was collected in a tube containing a separating gel, which allows the serum to be separated from the blood cells after centrifugation at 3000 rpm for 10 min at room temperature. Urine samples were collected in 50 mL sterile bottles. Both serum and urine samples were then aliquoted into clean tubes and immediately frozen at −20 °C for storage prior to analysis.
For analysis, serum and urine samples were allowed to reach room temperature for approximately thirty minutes. Then, the serum (400 µL) or urine (2 mL) was placed into 20 mL glass vials fitted with 18 mm aluminum screw caps and silicone septa and submitted to HS-GC-IMS analysis.
2.3. Instrumentation and Software
The analyses were performed using a GC-IMS system consisting of a 6890N GC from Agilent Technologies (Waldbronn, Germany) coupled to an IMS module from G.A.S. (Dortmund, Germany) equipped with a tritium source and a drift tube of 98 mm length.
A 2.5 mL syringe (Gerstel GmbH, Mühlheim, Germany) was used for headspace sampling, and a non-polar GC column HP-5MS UI (30 m × 0.25 mm, 0.25 µm) also from Agilent was used for GC separation.
LAV software (G.A.S.) was employed for data acquisition. Data processing was carried out using VOCal software (G.A.S.), SIMCA software (Umetrics, Sweden, version 14.1), and Statgraphics Centurion XV (StatPoint Technologies Inc., Warrenton, VA, USA).
2.4. Analytical Procedure for Serum and Urine Samples
The amount of sample was 2 mL and 400 μL for urine and serum, respectively. Incubation was performed at 70 °C for 4 min with agitation at 750 rpm. Using splitless mode, a total of 750 μL of the vial headspace was injected. The temperature of the syringe was set to 80 °C, and the injector temperature was 100 °C. Nitrogen was used as carrier gas with a flow rate of 1 mL min−1. The oven temperature program started at 50 °C, which was held for 4 min, then increased at a rate of 10 °C min−1 to 130 °C, which was held for 8 min. The total GC runtime of the analysis was 20 min.
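As a quick consistency check on the oven program above, the total runtime can be recomputed from the hold times and the ramp. The following is only an illustrative sketch; the function name `gc_runtime_min` is hypothetical and not part of any instrument software.

```python
# Oven program taken from the text: 50 °C held 4 min,
# ramp at 10 °C/min to 130 °C, then held 8 min.
def gc_runtime_min(start_c, initial_hold_min, ramp_c_per_min, final_c, final_hold_min):
    """Total runtime (min) of a single-ramp GC oven program."""
    ramp_min = (final_c - start_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min

total = gc_runtime_min(50, 4, 10, 130, 8)
print(total)  # 20.0
```

The ramp from 50 to 130 °C takes 8 min, so the two holds plus the ramp give the stated 20 min.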
The IMS module was operated at atmospheric pressure in positive ion mode, using nitrogen as the drift gas (150 mL min−1). The drift tube temperature was set at 90 °C, operating at a fixed electric field of 500 V cm−1. The blocking and drift voltages used were 50 and 241 V, respectively. The acquisition parameters were as follows: repetition rate of 30 ms, grid pulse width of 150 μs, and an average of 32 scans.
2.5. Data Processing
For data processing, all topographic maps for each sample type (serum or urine) were initially aligned using LAV software. After a visual screening of all the spectra, 209 markers were selected for serum samples and 193 for urine samples.
Intensity values above the baseline were selected as an analytical response, and normalization was performed using the reactive ion peak (RIP) signal.
All urine and serum samples and all selected markers were used to form the datasets, which were then randomly divided into two groups: the training set, comprising 80% of the samples, and the validation set consisting of the remaining 20% of samples.
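The random 80/20 division described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name, the fixed seed, and the rounding rule (truncation of the training-set size) are all assumptions.

```python
import random

def split_train_validation(sample_ids, train_fraction=0.8, seed=0):
    """Randomly split sample identifiers into training and validation sets.

    The seed and the truncation of the training-set size are assumptions
    made for this sketch.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 123 urine samples, identified here simply by index
train, validation = split_train_validation(range(123))
print(len(train), len(validation))  # 98 25
```

With 123 urine samples, this rule yields 98 training and 25 validation samples.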
Chemometric models were constructed with the training set using orthogonal partial least squares-discriminant analysis (OPLS-DA) and unit variance (UV) scaling. The success of the model was assessed by the following parameters: Q2(cum) (cumulative predictive ability), R2X(cum) and R2Y(cum) (cumulative explained variation in X and Y, respectively), sensitivity, classification rate (CR) and precision [42].
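The two preprocessing steps mentioned in this section, normalization to the RIP signal and UV scaling, can be illustrated with a short sketch. The intensity values are invented for illustration, and the UV scaling shown (mean-centering each marker and dividing by its sample standard deviation) is the usual definition, assumed here rather than taken from the SIMCA configuration used in the study.

```python
from statistics import mean, stdev

def rip_normalize(intensities, rip_intensity):
    """Divide every marker intensity of one sample by that sample's RIP intensity."""
    return [value / rip_intensity for value in intensities]

def uv_scale(columns):
    """Unit-variance scale each marker: subtract its mean, divide by its sample SD."""
    scaled = []
    for column in columns:
        m, s = mean(column), stdev(column)
        scaled.append([(v - m) / s if s > 0 else 0.0 for v in column])
    return scaled

# Hypothetical data: 3 samples (rows) x 2 markers (columns), plus one RIP value per sample
raw = [[120.0, 30.0], [200.0, 45.0], [160.0, 60.0]]
rip = [1000.0, 1000.0, 800.0]

normalized = [rip_normalize(row, r) for row, r in zip(raw, rip)]
marker_columns = [list(col) for col in zip(*normalized)]
scaled = uv_scale(marker_columns)
```

After scaling, every marker has zero mean and unit standard deviation, so intense and weak signals carry equal weight in the model.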
Accordingly, R2X(cum) and R2Y(cum) indicate the cumulative fraction of the variance in X and Y explained by the model components. Both parameters take values between zero and one, with values closer to one indicating a better fit of the model. The predictive ability of the chemometric model is indicated by Q2(cum), for which a value greater than 0.4 is recommended in metabolomics [43]. The CR (%) was assessed for the validation and training sets, while the sensitivity (Σ true positives/(Σ true positives + Σ false negatives) × 100%) and precision (Σ true positives/(Σ true positives + Σ false positives) × 100%) were assessed for the validation set [44].
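The sensitivity and precision formulas above translate directly into code. In this sketch the classification rate is taken to be the percentage of correctly classified samples, which is an assumption since the text does not define it explicitly. The example counts correspond to a validation set of 16 diseased and 9 healthy samples in which every sample is classified correctly.

```python
def sensitivity(true_pos, false_neg):
    """Sensitivity (%) = TP / (TP + FN) x 100."""
    return 100.0 * true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    """Precision (%) = TP / (TP + FP) x 100."""
    return 100.0 * true_pos / (true_pos + false_pos)

def classification_rate(n_correct, n_total):
    """Classification rate (%), assumed here to be the share of correctly classified samples."""
    return 100.0 * n_correct / n_total

# 16 diseased (positive class) and 9 healthy validation samples, all classified correctly
print(sensitivity(16, 0), precision(16, 0), classification_rate(25, 25))  # 100.0 100.0 100.0
```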
3. Results and Discussion
3.1. Optimisation of the HS-GC-IMS Method
The parameters of sample volume, sample incubation time and temperature, the addition of NaCl, and the temperature of the drift tube were investigated to optimize the HS-GC-IMS method for both biological matrices. The optimization was carried out using serum and urine samples from the same patient during all experiments. The optimum settings were adopted on the basis of the highest sum of intensities achieved for all compounds in the topographic map.
Initially, the sample quantity was studied. In the case of serum samples, a very limited sample volume was available, with a maximum of 400 µL of each sample, so the volume of serum to be used was optimized between 100 and 400 µL. The urine sample volume was optimized between 0.5 and 3 mL. For serum samples, the intensity of the signals increases with increasing sample volume, so 400 µL was selected as the optimum serum volume. The same pattern is observed when increasing the urine sample volume. No significant differences were observed when using 2 and 3 mL of sample. Therefore, it was decided to work with 2 mL, given the limitations in the amount of sample available from some patients.
The incubation temperature, incubation time, and NaCl addition were then jointly optimized using a multivariate approach to account for possible interactions between the parameters. To generate the response surface, a face-centered central composite design (2^3 + star) was used, with the peak area as the analytical response. With three central points, the design comprised a total of 17 runs. The parameters were studied in the following ranges: incubation temperature (60–80 °C), incubation time (1–15 min), and NaCl percentage (0–10% m/v). Based on these conditions, two parallel multivariate experimental designs were carried out, one for each type of sample.
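The run count of the design described above can be reproduced with a short sketch: a face-centered central composite design for three factors has 8 factorial points, 6 star points placed on the faces (coded level ±1), and here 3 center points, giving 17 runs. The function names are hypothetical, and the run order generated is not the order used in the study.

```python
from itertools import product

def face_centered_ccd(n_factors=3, n_center=3):
    """Coded runs (-1, 0, +1) of a face-centered central composite design."""
    factorial = [list(p) for p in product((-1, 1), repeat=n_factors)]
    star = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level  # star point on the face of the cube (alpha = 1)
            star.append(point)
    center = [[0] * n_factors for _ in range(n_center)]
    return factorial + star + center

def decode(coded, low, high):
    """Convert a coded level to the real value within [low, high]."""
    return (low + high) / 2 + coded * (high - low) / 2

# Factor ranges from the text: temperature (°C), time (min), NaCl (% m/v)
ranges = [(60, 80), (1, 15), (0, 10)]
runs = [[decode(c, lo, hi) for c, (lo, hi) in zip(run, ranges)]
        for run in face_centered_ccd()]
print(len(runs))  # 17
```

The center point of this design corresponds to 70 °C, 8 min, and 5% NaCl.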
Optimization of the multiple responses was carried out using the desirability function, and the response surface plots obtained are shown in Figure 1. For the serum design of experiments, the coefficient of determination (R2) was 84.92%, demonstrating the suitability of the design. The optimal conditions were as follows: incubation temperature = 70 °C, incubation time = 4 min, and percentage of sodium chloride = 0% (Figure 1A). For the urine design of experiments, a value of 90.34% was obtained, with optimal conditions identical to those for serum samples (Figure 1B).
After the multivariate study, the temperature of the drift tube was optimized between 60 and 90 °C using a univariate method. Four experiments were performed, setting the temperature at 60, 70, 80, and 90 °C and using the previously optimized parameters for urine and serum. For both serum and urine, the sum of the areas of all markers increased continuously with the drift tube temperature. The best results were therefore obtained at a drift tube temperature of 90 °C.
The volatile profiles obtained with the optimized method are shown in Figure 2 for both a serum sample and a urine sample of an IBD subject.
3.2. Identification of Compounds
The HS-GC-IMS method was employed to analyze urine or serum samples from patients with active IBD, patients with IBD in remission, and healthy controls, in order to differentiate the topographic maps of each group and to identify some biomarkers.
3.2.1. Urine Samples
Urine samples from 123 volunteers (34 IBD-active patients, 43 IBD-remission patients, and 46 healthy IBD patients) were analyzed.
Supplementary Figure S1 shows the spectra obtained for each IBD category. As can be seen, the initial part of the spectrum (between 230 and 400 s) is the richest in the number of signals, collecting both the most volatile and polar compounds. The three categories showed very similar spectra, but slight differences can be appreciated by visual exploration. For instance, the active IBD urine sample showed fewer polar compounds in the early part of the spectrum, and also fewer non-polar compounds, than the samples from healthy and remission-stage patients. On the other hand, the intensity of the volatile compounds in the middle of the spectrum is noticeably higher in remission-stage samples compared to the other categories.
To investigate the spectra more thoroughly, some of the volatile compounds detected by HS-GC-IMS in the analyzed urine samples were identified. For this purpose, a preliminary tentative identification was made using VOCal software, with which a total of 27 compounds were identified. To confirm this tentative identification, the 16 available standards were injected separately: 1-propanol, o-xylene, 2-heptanone, heptanal, benzaldehyde, alpha-pinene, beta-pinene, hexanal, 2-hexanone, 1-pentanol, pentanal, 2-pentanone, ethyl acetate, 2-butanone, acetic acid, and acetone. Individual and mixed standards of the above compounds were prepared at 10 µg mL−1 in artificial urine.
The proposed HS-GC-IMS method enables the separation, monitoring, and identification of 12 of these compounds, as shown in Figure 3A. All identified compounds had been previously reported as urine biomarkers. Table 2 shows the retention and drift times and the reduced mobility constant (K0) of each identified compound. It should be noted that both proton-bound dimer and protonated monomer were detected for all compounds.
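For readers unfamiliar with the reduced mobility constant, the sketch below shows the standard textbook relation K0 = (L / (E · td)) · (P / 1013.25 hPa) · (273.15 K / T), using the drift tube length (98 mm), field (500 V cm−1), and temperature (90 °C) given in the method section. The drift time and the ambient pressure are hypothetical, and instrument software may compute K0 differently (for example, relative to the RIP position), so this is not necessarily how the tabulated values were obtained.

```python
STANDARD_PRESSURE_HPA = 1013.25
STANDARD_TEMPERATURE_K = 273.15

def reduced_mobility(drift_time_ms, tube_length_cm=9.8, field_v_per_cm=500.0,
                     drift_temp_c=90.0, pressure_hpa=1013.25):
    """Reduced mobility K0 in cm^2 V^-1 s^-1 from a measured drift time.

    Defaults for tube length, field, and temperature come from the method
    section; the pressure default is an assumption (standard atmosphere).
    """
    drift_time_s = drift_time_ms / 1000.0
    mobility = tube_length_cm / (field_v_per_cm * drift_time_s)
    temp_k = drift_temp_c + 273.15
    return (mobility
            * (pressure_hpa / STANDARD_PRESSURE_HPA)
            * (STANDARD_TEMPERATURE_K / temp_k))

# Hypothetical drift time of 8 ms, chosen only for illustration
k0 = reduced_mobility(drift_time_ms=8.0)
print(round(k0, 3))
```

Normalizing to standard temperature and pressure is what makes K0 comparable between instruments and operating conditions, unlike the raw drift time.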
Figure 4A shows the comparison of these features, including proton-bound dimer and protonated monomer signals, in the different types of IBD samples studied. A visual exploration of this figure reveals that 1-propanol and ethyl acetate mainly appear in active IBD urine. In contrast, the ketones 2-butanone, 2-pentanone, and 2-hexanone appear mainly in healthy urine samples. The content of 2-heptanone was higher in urine from patients with IBD in remission.
3.2.2. Serum Samples
A total of 118 serum samples from 37 patients diagnosed with active IBD, 41 patients with IBD in remission, and 40 healthy controls were analyzed. The spectral differences between the categories can be seen in Supplementary Figure S2. The number of signals is higher than in the case of urine, with the richest area also being the initial part of the spectrum. For these samples, differentiation by visual inspection is complicated. Again, the spectra of the three categories are very similar, but a higher intensity of markers is observed between 230 and 300 s of measurement time in the remission serum samples.
To further investigate the spectra, the VOCs detectable by HS-GC-IMS in the analyzed serum samples were identified. To this end, as with the urine samples, a tentative identification was performed by means of VOCal software, and 15 compounds were identified. For confirmation of identification, the 10 available standards were analyzed separately: benzaldehyde, linalool, limonene, 2-heptanone, toluene, 2-hexanone, hexanal, 2-pentanone, 1-propanol, and heptanal. Individual and mixed standards of the above compounds were prepared at 10 µg mL−1 in commercial serum alkaline solution.
Thus, six of the ten available compounds were separated, monitored, and identified using the proposed HS-GC-IMS method, as shown in Figure 3B. The retention and drift times, together with the reduced mobility constant (K0) of each identified compound, are shown in Table 2. The proton-bound dimer and protonated monomer were detected for all compounds.
Figure 4B shows the differences between IBD samples, comparing the proton-bound dimer and protonated monomer signals between them. Visual examination of this figure reveals that 1-propanol and benzaldehyde mainly appear in individuals with IBD in remission, while the opposite occurs with 2-pentanone. Healthy controls showed the highest content of heptanal, while diseased patients had the lowest. The opposite is observed for 2-hexanone, with a higher content in diseased patients than in individuals without IBD.
3.3. Method Characterization and Quantification of Identified Volatile Compounds
To quantify the content of the volatiles identified in the serum and urine samples, calibration curves were established using artificial urine or serum alkaline solution spiked at eight concentration levels between 0.01 and 2 µg mL−1. Each concentration level was injected twice. Because IMS analysis may produce several signals for one compound, different analytical responses were examined to obtain the best calibration curve: protonated monomer intensity, proton-bound dimer intensity, or the sum of both intensities. Additionally, linear (least squares) and logarithmic regressions were attempted. Data were, in all cases, fitted to a logarithmic regression in accordance with previous studies [45]. With the logarithmic regression, the sum of the protonated monomer and proton-bound dimer gave the best coefficients of determination (R2 > 0.97 for urine and R2 > 0.98 for serum samples), as can be seen in Table 3. The slopes were also higher for the sum of the protonated monomer and proton-bound dimer, demonstrating an increase in the sensitivity of the method. The increase in sensitivity was statistically verified for each compound using a t-test to determine whether the slope of the regression using the sum of the protonated monomer and proton-bound dimer was greater than the slope of the regression using the monomer or dimer individually, obtaining p-values lower than 0.05 in all cases. The limits of quantification (LOQs), calculated as the concentration giving a signal-to-noise ratio (S/N) of 10, are also shown in Table 3. For the compounds identified in urine, LOQs ranged between 0.007 µg mL−1 (acetone) and 0.046 µg mL−1 (o-xylene). For the compounds identified in serum, LOQs ranged between 0.009 µg mL−1 (hexanal and heptanal) and 0.014 µg mL−1 (1-propanol and 2-pentanone).
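A logarithmic calibration of the kind described above can be sketched as follows. The text does not give the exact functional form, so the model signal = a + b · ln(concentration) is an assumption, as are the eight concentration levels and the synthetic responses (which stand in for the summed monomer and dimer intensities).

```python
import math

def fit_log_calibration(concentrations, signals):
    """Least-squares fit of signal = a + b * ln(concentration); returns (a, b, R2)."""
    x = [math.log(c) for c in concentrations]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(signals) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, signals))
    ss_tot = sum((yi - mean_y) ** 2 for yi in signals)
    return intercept, slope, 1.0 - ss_res / ss_tot

def predict_concentration(signal, intercept, slope):
    """Invert the calibration to estimate a concentration from a measured signal."""
    return math.exp((signal - intercept) / slope)

# Eight hypothetical levels between 0.01 and 2 µg/mL with noise-free synthetic responses
levels = [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 1.5, 2.0]
responses = [5.0 + 0.8 * math.log(c) for c in levels]

a, b, r2 = fit_log_calibration(levels, responses)
estimate = predict_concentration(5.0, a, b)
```

With real data, the same fit could be repeated for the monomer, the dimer, and their sum, comparing slopes and R2 values as the text describes.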
Once the method was characterized, it was applied to the quantification of the compounds in urine and serum samples. Table 4 shows the average contents, together with their standard deviations, for each category in urine and serum. This information was further used to perform a one-way analysis of variance (ANOVA) and a least significant difference (LSD) test to compare the different categories. The samples were assembled into three groups: two according to disease stage (patients with active disease and patients in remission) and a third consisting of healthy individuals (without IBD).
The results showed that, among the compounds quantified in urine, 1-propanol allowed differentiation of active IBD patients from the other groups, since its concentration is significantly higher (0.5 ± 0.8 µg mL−1) in urine from patients in the active stage of the disease than in urine from patients in remission and healthy individuals. In contrast, acetone and ethyl acetate showed the highest abundance in samples from patients in remission, with average contents of 1.5 ± 0.9 and 1.3 ± 1.8 µg mL−1, respectively. Urine samples from healthy individuals showed the lowest abundance of acetone and ethyl acetate, while high variability was found in the group of patients with active disease. The contents of the other compounds did not show significant differences between the categories. Furthermore, heptanal and o-xylene could not be quantified in any sample, as their contents were below the LOQ.
In the case of the compounds quantified in serum, the results showed that hexanal and benzaldehyde differentiated healthy individuals from diseased patients (in the active and remission phases), with mean concentrations of 1.7 ± 1.6 and 0.028 ± 0.005 µg mL−1, respectively. Hexanal showed its highest concentration in healthy individuals, whereas benzaldehyde showed the reverse pattern, with a lower concentration in healthy individuals than in diseased individuals. Heptanal could only be quantified in healthy individuals, as its concentration in IBD patients was below the LOQ. 2-Hexanone is the only compound of those studied that allowed differentiation between the three groups, showing its highest content in individuals with the disease in remission (0.06 ± 0.04) and its lowest in healthy individuals (0.010 ± 0.014). Finally, no significant differences were found for 1-propanol and 2-pentanone.
As demonstrated, it was possible to assign certain compounds to a specific category. Nevertheless, given the considerable variability found within each group, it was not possible to establish a concentration threshold for each category to create classification guidelines, and consequently, chemometric models are required.
3.4. Chemometric Models for Classification According to IBD-Diagnosis
3.4.1. Chemometric Models Built with Urine Samples
As described in Section 2.5, the alignment of the samples was first performed by overlaying them on a reference urine sample. The data matrix obtained, including all samples and markers, had the following dimensions: 123 (urine samples) × 193 (markers). Although some of the variables employed do not provide any useful information and only add noise to the model, no data were removed, with the aim of providing a rapid screening method to find differences between IBD states (healthy individuals and diseased individuals, either in remission or active) while avoiding tedious chemometric treatments. Thus, all metabolites were monitored together for the differentiation of the individual’s IBD status.
Prior to the construction of the models, the residual normal probability plot was obtained to check whether data were normally distributed. As can be seen in the diagram (
Supplementary Figure S3), all the points were arranged in a straight line, indicating that the residuals were random and normally distributed. Given this normal distribution of the data, the chemometric analysis was conducted using raw data normalized with respect to the RIP intensity and without performing any transformation. Specifically, OPLS-DA chemometric models were constructed using unit variance (UV) scaling, also known as autoscaling.
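The two preprocessing steps mentioned above can be illustrated with a minimal sketch. It assumes that normalization to the RIP means dividing each marker intensity by the RIP intensity of the same sample, and that UV scaling means centering each marker and dividing by its sample standard deviation; the function names and the toy values are hypothetical and are not taken from the study or from the software used by the authors.

```python
from statistics import mean, stdev


def normalize_to_rip(intensities, rip_intensity):
    """Divide the marker intensities of one sample by that sample's RIP intensity."""
    if rip_intensity <= 0:
        raise ValueError("RIP intensity must be positive")
    return [value / rip_intensity for value in intensities]


def uv_scale(matrix):
    """Autoscale each column (marker): subtract its mean, divide by its sample std."""
    columns = list(zip(*matrix))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    return [
        [
            (value - center) / spread if spread > 0 else 0.0
            for value, center, spread in zip(row, centers, spreads)
        ]
        for row in matrix
    ]


# Toy data (hypothetical): 3 samples x 2 markers and one RIP intensity per sample.
raw = [[2.0, 10.0], [4.0, 30.0], [9.0, 60.0]]
rip = [2.0, 2.0, 3.0]

normalized = [normalize_to_rip(row, r) for row, r in zip(raw, rip)]
scaled = uv_scale(normalized)
```

After UV scaling, every marker has zero mean and unit standard deviation, so high-intensity and low-intensity markers contribute on the same scale to the model.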
First, a binary OPLS-DA model was built to differentiate between healthy individuals and IBD patients. The dataset was divided into 80% of the samples for model training (calibration) and 20% for validation. In this case, the calibration set consisted of a total of 98 samples (37 samples from healthy individuals and 61 samples from IBD patients), and the validation set consisted of a total of 25 samples (9 samples from healthy individuals and 16 from IBD patients). Data obtained in this model are shown in
Supplementary Table S1, with the best results obtained with data normalized to the RIP intensity and adjusted to the UV scale (Q2 = 0.692). The optimal OPLS-DA model is shown in
Figure 5A, for which both the training and validation success rates were 100%, correctly classifying all samples and therefore obtaining 100% sensitivity and precision.
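As a worked check of the validation figures quoted above, the sketch below computes sensitivity and precision from a confusion matrix. It assumes that IBD is treated as the positive class (the text does not state which class is positive) and uses the validation set sizes reported for the urine model (9 healthy, 16 IBD) with every sample classified correctly; the helper names are illustrative.

```python
def confusion_counts(true_labels, predicted_labels, positive="IBD"):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive class."""
    tp = fp = fn = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == positive and truth == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of true positives among all actual positives."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of true positives among all predicted positives."""
    return tp / (tp + fp)


# Validation set of the urine model: 9 healthy and 16 IBD samples, all correct.
truth = ["healthy"] * 9 + ["IBD"] * 16
predicted = list(truth)  # assumption: perfect classification, as reported

tp, fp, fn, tn = confusion_counts(truth, predicted)
print(sensitivity(tp, fn), precision(tp, fp))  # prints: 1.0 1.0
```

With no false negatives and no false positives, both ratios reduce to 16/16, which is the 100% figure reported for the validation set.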
In a second approach, a chemometric model was created to determine whether the patient was in the active phase of IBD or in remission. In this case, the dataset was composed only of urine samples from diseased patients (77 urine samples × 193 markers). The calibration set consisted of a total of 61 samples, including 27 in the active stage and 34 in the remission stage. The validation set consisted of 7 active and 9 remission IBD stage patients, giving a total of 16 validation samples. As in the previous model, both data normalized with respect to RIP intensity and unnormalized data, adjusted to the UV scale, were studied. The results obtained using the different model fits are shown in
Supplementary Table S2. The best results were obtained with data normalized to the RIP intensity and adjusted to the UV scale. The score plot of the optimal OPLS-DA model is shown in
Figure 5B, which achieves the best calibration and validation success rates (100% in both cases) and the highest Q2 value (Q2 = 0.190). However, it cannot be considered a valid model, as the parameter Q2, which measures the predictive ability of the model, is lower than 0.4, the minimum recommended value to consider the model valid in metabolomics.
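To make the validity criterion concrete, the following sketch computes Q2 in its usual form, Q2 = 1 − PRESS/SS, where PRESS is the sum of squared differences between observed and cross-validated predicted responses and SS is the total sum of squares of the mean-centered response. This is a simplified illustration under that assumption; chemometric software may report a cumulative value over components, and the toy data below are hypothetical. The 0.4 threshold and the two Q2 values are those quoted in the text for the urine models.

```python
def q2(observed, cv_predicted):
    """Q2 = 1 - PRESS / SS_total, using cross-validated predictions."""
    mean_observed = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, cv_predicted))
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    return 1.0 - press / ss_total


Q2_THRESHOLD = 0.4  # minimum recommended value cited in the text


def is_valid(q2_value, threshold=Q2_THRESHOLD):
    """A model is considered valid when its Q2 is not below the threshold."""
    return q2_value >= threshold


# Toy example (hypothetical): class membership coded as 0/1 and CV predictions.
toy_q2 = q2([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])

# Q2 values reported for the two urine models.
reported = {
    "urine: healthy vs. IBD": 0.692,
    "urine: active vs. remission": 0.190,
}
validity = {name: is_valid(value) for name, value in reported.items()}
```

Under this criterion the healthy-versus-IBD urine model passes, while the active-versus-remission urine model does not, even though both reach 100% classification rates.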
3.4.2. Chemometric Models Built with Serum Samples
Initially, G.A.S LAV software was used to align the topographic maps by overlaying the samples on a reference serum sample. A total of 209 markers (identified by drift and retention times) were then selected manually by visual inspection of the topographic plots, resulting in a raw data matrix with the following dimensions: 118 serum samples × 209 markers.
Then, the normal distribution of data was evaluated by means of a normal residual probability plot. The plot shows that all data fit a straight line, so they are normally distributed, and therefore, models can be built using the raw data without transforming them (
Supplementary Figure S4). Specifically, OPLS-DA models were constructed using unit variance (UV) scaling, also known as autoscaling, working with both data normalized with respect to RIP intensity and unnormalized data.
As for urine samples, two types of models were constructed for serum samples: one to differentiate between healthy individuals and patients with IBD, and another to differentiate between patients with IBD in remission and those in the active period.
To differentiate between healthy individuals and patients with IBD, the 118 samples were first split into two groups: 80% of the samples for the training set (94 samples, including 62 from patients with IBD and 32 from healthy individuals) and the remaining 20% for the validation set (24 samples, including 16 from patients with IBD and 8 from healthy individuals). OPLS-DA models adjusted to the UV scale were then built with both data normalized with respect to RIP intensity and unnormalized data, and the results obtained are shown in
Supplementary Table S3.
Figure 6A shows the score plot of the optimal OPLS-DA model (R2X = 0.870, R2Y = 0.953, Q2 = 0.901), adjusted to the UV scale using data normalized with respect to the RIP, in which there is a clear separation between healthy individuals and IBD patients. The calibration and validation success rates were 100% in both cases, correctly classifying all serum samples.
Secondly, different OPLS-DA models were constructed to differentiate between IBD phases (active or in remission). The dataset dimensions were 78 serum samples × 209 markers, and the samples were split into training (63 samples) and validation (15 samples) sets, corresponding to approximately 80% and 20% of the samples, respectively. Specifically, the training set was composed of 30 and 33 samples from patients in the active and remission IBD stages, respectively, and the validation set consisted of 7 active and 8 remission IBD stage patients. For OPLS-DA model construction, both data normalized with respect to the RIP intensity and raw data without normalization were tested. The optimal OPLS-DA model is shown in
Figure 6B, which achieves the best calibration and validation success rates (100% in both cases) and the highest Q2 value (Q2 = 0.424), using normalized data with respect to the RIP (
Supplementary Table S4).