1. Introduction
Salmonella is a Gram-negative, facultatively anaerobic, flagellated bacillus and is therefore motile [
1]—except for
S. Gallinarum, which is not motile [
2]—comprising 2659 serovars [
3]. Due to this variability and numerousness,
Salmonella’s nomenclature has been debated for many years. In 2005, the Judicial Commission of the International Committee on the Systematics of Prokaryotes issued an opinion to resolve all of the discrepancies [
4]. According to this, the
Salmonella genus contains only two species:
Salmonella enterica and
Salmonella bongori.
Salmonella enterica is divided into six different subspecies that present different biochemical characteristics, named (or assigned Roman numerals) as follows: enterica (I), salamae (II), arizonae (IIIa), diarizonae (IIIb), houtenae (IV), and indica (VI).
Salmonella bongori has no subspecies [
1].
S. enterica subsp.
enterica and subsp.
salamae serovars are principally related to warm-blooded animals, while the other
S. enterica subspecies serovars, along with
S. bongori serovars, are mainly associated with cold-blooded animals and the environment [
1]. However, this distinction is not always clear, due to the ubiquity of this pathogen in the environment.
The important antigenic characteristics of
Salmonella for serological tests are divided into three main types: the O-antigen (also called the somatic antigen), the H-antigen (also called the flagellar antigen), and the Vi-antigen (also called the capsular antigen). All
Salmonella strains, regardless of the species or subspecies to which they belong, therefore need to be serotyped in order to identify first the serogroup (i.e., the serological group that includes
Salmonella strains sharing the same somatic antigen (O-antigen)) and then the serovar (i.e.,
Salmonella strains sharing the same somatic antigen (O-antigen) with other strains of the same serogroup that differ from one another in their combination of flagellar antigens (first- and second-phase H-antigens)). The Vi-antigen, which confers more virulence to the strains that possess it than to those without it, may be present in only three
Salmonella serovars:
S. Typhi (9,12,[Vi]:d:-),
S. Paratyphi C (6,7,[Vi]:c:1,5), and
S. Dublin (1,9,12,[Vi]:g,p:-). Therefore, the antigenic formula of
Salmonella spp. consists of these three types of antigens, reported in the following way: O-antigen, Vi-antigen (if present), first-phase H-antigens, and second-phase H-antigens. A total of 2659 serovars of
Salmonella have been identified to date, distributed as
S. enterica (2637 serovars: subsp.
enterica (1586), subsp.
salamae (522), subsp.
arizonae (102), subsp.
diarizonae (338), subsp.
houtenae (76), subsp.
indica (13)), and
S. bongori (22 serovars), according to Supplement 2008–2010 (no. 48) [
3] to the White–Kauffmann–Le Minor (WKL, formerly “Kauffmann–White”) scheme [
1], which is the current gold-standard reference method to determine
Salmonella serovars.
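The reporting order described above (O-antigen, Vi-antigen if present, first-phase H-antigens, second-phase H-antigens) can be illustrated with a small parser. This is a minimal sketch, not part of the WKL scheme itself: the names `AntigenicFormula` and `parse_formula` are hypothetical, and the sketch assumes the colon-separated notation used in the examples above, with "-" marking an absent flagellar phase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AntigenicFormula:
    o_antigens: List[str]      # somatic (O) factors, brackets kept as written
    has_vi: bool               # True if the capsular (Vi) antigen is listed
    h_phase1: Optional[str]    # first-phase flagellar (H) antigens, None if absent
    h_phase2: Optional[str]    # second-phase flagellar (H) antigens, None if absent


def parse_formula(formula: str) -> AntigenicFormula:
    """Split 'O[,Vi]:H1:H2' into its parts; '-' marks a missing phase."""
    parts = formula.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three ':'-separated fields, got {formula!r}")
    somatic, h1, h2 = (p.strip() for p in parts)
    # Only the somatic field is split on commas; H fields such as 'g,p' stay whole.
    factors = [f.strip() for f in somatic.split(",") if f.strip()]
    has_vi = any(f.strip("[]") == "Vi" for f in factors)
    o_factors = [f for f in factors if f.strip("[]") != "Vi"]
    return AntigenicFormula(
        o_antigens=o_factors,
        has_vi=has_vi,
        h_phase1=None if h1 == "-" else h1,
        h_phase2=None if h2 == "-" else h2,
    )
```

For instance, `parse_formula("9,12,[Vi]:d:-")` separates the S. Typhi formula into the O factors 9 and 12, a Vi flag, a first phase "d", and no second phase.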
Based on the human clinical syndromes caused by
Salmonella, it is possible to identify two groups, typhoidal
Salmonella and non-typhoidal
Salmonella (NTS), which require the adoption of different therapeutic approaches and different prophylactic measures for patients by a competent health authority. Strains belonging to the first group, including the serovars Typhi and Paratyphi (A, B d-tartrate negative, C), are responsible for enteric fever and are associated with a high number of fatal cases; therefore, they always need to be treated with antibiotics [
5]. The latter group, composed of the remaining strains, including the d-tartrate positive Paratyphi B variant Java, and referred to as minor
Salmonella, is associated with different clinical syndromes of variable severity, from generally self-limiting gastroenteric symptoms that generally require only supportive therapy to rarer invasive diseases such as bacteraemia, endovascular infection, focal infection [
6], meningitis, and osteomyelitis, especially in young infants and in the immunocompromised (both adults and children) [
7]. In addition, according to a circular from the Italian Ministry of Health [
8], in cases of typhoidal
Salmonella infection, enteric precautions must be applied until negative results are obtained from three consecutive stool cultures from faeces collected no less than 24 h apart and no less than 48 h from the suspension of any antibiotic. If even a single stool culture tests positive, the entire procedure must be repeated after one month. The infected individual should be removed from activities involving food handling, healthcare, or childcare until full clearance of the infection is confirmed.
In cases of minor
Salmonella (NTS) infection, isolation of the infected patient must be applied until their clinical recovery (solid faeces) and/or until obtaining negative results of two consecutive stool cultures from faeces collected no less than 24 h apart and no less than 48 h after the suspension of any antimicrobial treatment. Even the measures against cohabitants and contacts of the infected patient are more stringent in the first than in the second case [
8]. Therefore, it follows that the discrimination between typhoidal
Salmonella infection and minor
Salmonella (NTS) infection is essential.
According to the European Union One Health 2022 Zoonoses Report [
9,
10], salmonellosis was the second-most commonly reported foodborne gastrointestinal infection in humans in the European Union and a major cause of foodborne outbreaks in European Union member states as well as non-member states. A total of 65,208 confirmed human salmonellosis cases were reported by 27 member states (MSs) in 2022, corresponding to an EU notification rate of 15.3 cases per 100,000 population. As in previous years, the top five acquired
Salmonella serovars involved in human infections in the European Union were distributed as follows:
S. Enteritidis (67.3%),
S. Typhimurium (13.1%), monophasic variant of
S. Typhimurium (1,4,[5],12:i:-) (4.3%),
S. Infantis (2.3%), and
S. Derby (0.89%). In Italy, in contrast to the European trend, salmonellosis has always been the most commonly reported foodborne gastrointestinal infection in humans, with 3302 reported cases in 2022 and a notification rate of 5.6 cases per 100,000 population [
9,
10].
It follows that clinical laboratories need easy-to-use and reliable identification systems that allow for the quickest possible diagnosis and for discrimination between typhoidal Salmonella infection, which always requires antibiotic therapy, and minor Salmonella (NTS) infection, which is generally self-limiting and requires only supportive therapy (i.e., administration of oral rehydration solutions, lactic ferments, and probiotics), except in newborns under 3 months of age and in subjects with chronic degenerative diseases. Moreover, the serotyping of Salmonella strains plays an important role in epidemiological surveillance, and rapid responses could help the competent authorities to manage individual cases or outbreaks more promptly.
In clinical microbiology laboratories,
Salmonella identification is generally performed using biochemical and serological tests, including automated systems such as the VITEK2 system (bioMérieux, Lyon, France) [
11].
In recent years, matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) has been introduced as a routine identification system due to its speed and accuracy [
12]. Two common MALDI-TOF MS systems, the MALDI Biotyper® (Bruker Daltonik GmbH, Bremen, Germany) and the VITEK MS system (bioMérieux, Lyon, France), are routinely used in clinical laboratories.
According to Bastin et al. [
13], the MALDI Biotyper could identify 100% of
Salmonella isolates at the genus level; however, it failed to correctly identify typhoidal
Salmonella, making it an unsuitable tool for the identification of
Salmonella at the serovar level [
14]. In the same way, according to Gyu Ri Kim et al. [
15], VITEK MS is suitable for the identification of
Salmonella at the genus level, with 100% sensitivity. However, additional tests, such as the VITEK2 system, are required to identify typhoidal
Salmonella spp. (
Salmonella Typhi and
Salmonella Paratyphi A). Nevertheless,
Salmonella Paratyphi B cannot be correctly identified at the serovar level by either the VITEK2 system or VITEK MS, and additional tests, such as traditional serological typing, are needed [
15]. The other NTS serovars lack prompt characterisation methods beyond serology. Therefore,
Salmonella typing always requires at least two different steps, the identification of the genus and the separate typing, leading to higher costs, expertise, and maintenance requirements and lengthening the wait for results.
Real-time PCR methods that can distinguish between the most common
Salmonella serovars by detecting unique serovar-specific gene markers [
16], along with multiple in silico tools to determine
Salmonella serovars from whole-genome sequence data, have been developed [
17,
18]. However, these techniques, requiring specific expertise from the personnel [
19] and incurring high costs in terms of the reagents and laboratory equipment used, are generally employed in second-level laboratories (i.e., regional or national) rather than in clinical first-level microbiology laboratories.
In recent years, vibrational spectroscopy techniques (e.g., infrared spectroscopy), coupled with chemometrics and multivariate machine learning algorithms, have become a highly promising tool for the rapid and accurate physicochemical characterisation and differentiation of microbes at several taxonomic levels [
20,
21,
22,
23,
24,
25].
Attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopy analyses the absorption of infrared light by biomolecules within microbial cells, providing insights into the sample’s chemical composition. ATR refers to the sampling technique, in which samples (liquid, solid, gel-like, etc.) are spread directly, in tiny amounts, on a mounted crystal with a high refractive index. The system allows for fast and easy measurements without extensive sample preparation. The instrument’s geometry is compact and does not require additional accessories, such as a microscope, as the focal point is intrinsically set. Briefly, the IR beam travels from the source to the sample through the optical path and the ATR crystal. The beam is then partially absorbed by the sample at the interface with the crystal, and the resulting spectrum reflects the chemical composition of the sample itself. The crystal can then be easily cleaned, if necessary, in preparation for the next analysis, which makes the procedure ergonomic and safe for the operator.
The ATR-FTIR spectrum of an intact microorganism provides the metabolic fingerprint that reflects the macromolecular composition of cells in terms of nucleic acids, proteins, lipids, and carbohydrate levels (700–1500 cm⁻¹, Figure 1) [
23,
26,
27]. The qualitative interpretation of the spectral pattern of this region plays a significant role in the discrimination of specimens and, therefore, in the identification process.
For this purpose, I-dOne software (Alifax S.r.l., Polverara, Italy) has recently been placed on the market; it is the first CE-IVD-marked tool for the identification of 56 different microbial species or genera, including Salmonella spp., from colonies grown on agar media of clinical interest.
Indeed,
Salmonella enterica strains have shown distinctive features across several serogroups and serovars under the lens of infrared spectroscopy, considering several sampling methods [
24,
27] and culture media [
23,
26,
28], positioning this technique as a potential serotyping method [
22]. In this regard, in addition to the actual relationship between the spectral fingerprint and the strain or the species to which it belongs, the metabolism of the isolate and its FTIR spectra can also be slightly altered by the culture media from which it originates [
28]; hence, growth conditions must also be adequately considered when using FTIR for identification purposes.
The literature does not report any vibrational spectroscopic options combining species and subspecies identification. In this paper, we present the results of a feasibility study for the detection of NTS belonging to four Salmonella enterica serogroups (B, C1, D1, and E1) and typhoidal salmonellae (referring to S. Typhi from solid culture of human origin), using ATR-FTIR technology coupled with the I-dOne software v2.2, based on machine learning prediction models. This study aims to propose the potential use of this methodology in first-level clinical laboratories, which often cannot subtype Salmonella, as a tool to identify the prevalent serogroups in humans.
2. Materials and Methods
Four
Salmonella enterica subsp.
enterica serogroups (B, C1, D1, and E1) of major clinical interest were chosen from among the total of 47 included in the WKL scheme [
1]. The study included 225 strains referable to different serovars of minor
Salmonella (NTS), including, among others, the top five acquired
Salmonella serovars involved in human infections in the European Union (
S. Enteritidis,
S. Typhimurium, monophasic variant of
S. Typhimurium (1,4,[5],12:i:-),
S. Infantis, and
S. Derby). Moreover, a total of 23 strains belonging to typhoidal
Salmonella and referable to
S. Typhi were added (
Table 1).
Only Salmonella strains with a complete antigenic formula, i.e., belonging to a known serogroup, were included. Overall, the database was built focusing on five main classes: the four serogroups plus the serovar S. Typhi (part of serogroup D1).
All of the selected Salmonella strains, both non-typhoidal and typhoidal, are available from the human-origin frozen culture collection of the Centro di Riferimento Regionale Patogeni Enterici, Marche region (CRRPE) of IZSUM and were obtained via the Enter-Net surveillance network, which sends Salmonella strains isolated from clinical samples at hospitals or private laboratories to the CRRPE for further characterisations, beginning with serotyping.
Therefore, serotyping was previously performed on the strains used in this study, in accordance with ISO/TR 6579-3:2014 [
29].
Salmonella isolates grown as pure 18–24 h cultures in tryptic soy agar (TSA) tubes from the original culture and confirmed to be
Salmonella through biochemical characterisation using BioMérieux strips (rapid ID 32 E, system for identification of
Enterobacteriaceae in 4 h), comprising a series of miniaturised biochemical tests, were first tested by saline drop slide agglutination to screen for O-rough isolates (i.e., auto-agglutinating strains).
Antigenic formulae and serovars were determined using the WKL scheme by slide agglutination of the individual strain with commercially available Salmonella antisera (Statens Serum Institut, Copenhagen, Denmark) against the somatic (O), capsular (Vi), and flagellar (H) antigens. Where required, phase inversion was induced to determine the second phase: Sven Gard medium (previously dissolved in a boiling water bath) was poured into a Petri dish and left to solidify, 3–4 drops of flagellar serum against the already defined phase were placed at the centre of the dish, and the dish was inoculated with a loop of culture. After incubation at 37 °C for 24 h, the culture was tested with the flagellar sera to determine the presence of the other flagellar phase, if any.
Each selected Salmonella strain was subcultured from the frozen culture collection as a pure 18–24 h culture on five different agar media, following a random pattern in order to reduce potential systematic error. The following agar plates were used: tryptic soy agar (TSA, IZSUM-made production, Perugia, Italy), blood agar +5% sheep blood (BA, IZSUM-made production, Perugia, Italy), MacConkey agar (MCK, IZSUM-made production, Perugia, Italy), chromogenic agar for Salmonella (CROM, IZSUM-made production, Perugia, Italy), and Columbia agar +5% sheep blood (COS, BioMérieux, Marcy-l’Étoile, France). Among the 248 samples in the list, 42 were acquired only on TSA, MCK, and COS at the Alifax S.r.l. facility (Nimis, Italy), providing a second location in the dataset.
2.1. Spectrum Acquisition
Spectra were acquired using an ATR-FTIR spectrometer (5500a Series, Agilent, Santa Clara, CA, USA) working in the mid-infrared range (4000–650 cm⁻¹). The data were recorded at room temperature (25 ± 2 °C) via 64 scans at a resolution of 4 cm⁻¹. Spectrum collection was carried out using I-dOne IVD software v2.2 (Alifax S.r.l., IT) to store data according to the proprietary database’s requirements using the acquisition wizard, which helps the user in the process (
Figure 2). The complete procedure flows automatically and involves (i) crystal cleanliness check, (ii) background acquisition, (iii) sample deposition and spectral profile check, and (iv) spectrum acquisition. After each measurement, the ATR crystal was cleaned with a few drops of 70% v/v ethanol and wiped with tissue paper to avoid inter-sample cross-contamination. Data were collected from pure isolates on solid culture. Briefly, a small amount of sample (preferably an isolated colony) was picked with a 1 μL loop from the fourth quadrant of a culture plate and evenly spread on the ATR crystal during the sample deposition phase, followed by spectrum collection. The software itself checks the quality of the sample signal and allows for the automatic acquisition of the spectrum if it complies with the criteria defined by the manufacturer. Each acquisition takes less than two minutes. The purpose of the wizard is to guarantee standardised analytical procedures, limiting operator-dependent bias. For this reason, all of the acquired spectra were considered eligible for the data analysis, as they were compatible with the I-dOne software’s quality standard.
2.2. Dataset Construction and Metrics
The measurements were organised according to a random scheme across media and serogroups from three different operators and on three different instruments at two locations. The goal was to limit environment-, time-, instrument-, and operator-dependent biases in the dataset and the predictive algorithm, enabling better tuning of the bias–variance trade-off and generalisation of the model to unseen data [
30]. Overall, about 4500 spectra were acquired, with at least 3 technical replicates for each strain in each culture medium. Raw data were reprocessed using R software (v4.0.3) [
31] to build a new algorithm capable of differentiating the four
Salmonella enterica serogroups. Preliminary exploratory data analysis for visualisation purposes involved spectral processing with the Savitzky–Golay filter to obtain the smoothed second derivative, along with vector normalisation. The algorithm pipeline was fully elaborated under patent [
32], from the raw data up to the optimised model for the identification of unknown samples.
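The exploratory preprocessing mentioned above (smoothed second derivative via the Savitzky–Golay filter, followed by vector normalisation) can be sketched as follows. The study used R and does not state the filter window or polynomial order, so the 5-point quadratic window below is an assumption chosen only for illustration, as is the reading of vector normalisation as scaling to unit Euclidean length; the function names are hypothetical. In routine Python work, `scipy.signal.savgol_filter` with `deriv=2` would normally replace the hand-written filter.

```python
import math
from typing import List, Sequence

# Savitzky-Golay convolution coefficients for the second derivative,
# 5-point window, quadratic/cubic polynomial, unit point spacing.
SG2_COEFFS = (2 / 7, -1 / 7, -2 / 7, -1 / 7, 2 / 7)


def sg_second_derivative(y: Sequence[float], spacing: float = 1.0) -> List[float]:
    """Smoothed second derivative; the two points at each edge are dropped."""
    half = len(SG2_COEFFS) // 2
    if len(y) < len(SG2_COEFFS):
        raise ValueError("spectrum shorter than the filter window")
    out = []
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]
        out.append(sum(c * v for c, v in zip(SG2_COEFFS, window)) / spacing ** 2)
    return out


def vector_normalise(y: Sequence[float]) -> List[float]:
    """Scale a spectrum to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm == 0.0:
        raise ValueError("cannot normalise an all-zero spectrum")
    return [v / norm for v in y]
```

Applying `vector_normalise(sg_second_derivative(absorbances))` to each spectrum puts all spectra on a common scale before visual comparison.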
Building a multivariate predictive algorithm implies establishing and verifying a model and validating it on unknown spectra. This is generally achieved using different portions of the dataset.
Table 1 reports the sample size of the databases used for training and validating the prediction model, describing the serovars included in each part. Briefly, 30 strains from each of the four serogroups and 15 from
S. Typhi were included for the algorithm’s training, all sourced from each culture medium (135 isolates). In each class, the strains were chosen according to a proportional stratified random sampling, in order to include the greatest diversity with respect to the serovars. Where only one item was available, the distribution was random. Training the algorithm with diverse and balanced classes prevents bias towards one or more classes and allows for better model generalisation. In addition, class weights were used as tuning parameters to steer the model towards
S. Typhi versus the other cases. The remaining 113 strains served as the validation set. Since not all of the validation strains were analysed from all of the culture media, the sample size was unbalanced across this factor: 113 strains on COS, MCK, and TSA, including 71 that were also measured on both BA and CROM. It should be noted that no strains from the training set were included in the validation set, which prevents information leakage between the two sets and supports unbiased performance results.
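The proportional stratified random sampling described above can be sketched in a few lines. This is only one plausible reading of the procedure, not the authors' implementation: the function name `stratified_split`, the largest-remainder rounding rule, and the random tie-breaking among small serovars are all assumptions. The sketch is applied to one class (serogroup) at a time and keeps the training and validation strains disjoint by construction.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(strains: List[Tuple[str, str]], n_train: int, seed: int = 0):
    """Pick n_train strains with per-serovar quotas proportional to serovar size.

    strains: (strain_id, serovar) pairs belonging to one class.
    Returns (training_ids, validation_ids), which never overlap.
    """
    if not 0 <= n_train <= len(strains):
        raise ValueError("n_train must be between 0 and the number of strains")
    rng = random.Random(seed)
    by_serovar: Dict[str, List[str]] = defaultdict(list)
    for strain_id, serovar in strains:
        by_serovar[serovar].append(strain_id)

    total = len(strains)
    # Largest-remainder allocation of the training quota across serovars.
    exact = {s: n_train * len(ids) / total for s, ids in by_serovar.items()}
    quota = {s: int(q) for s, q in exact.items()}
    leftover = n_train - sum(quota.values())
    serovars = list(by_serovar)
    rng.shuffle(serovars)  # random tie-breaking, e.g. among single-strain serovars
    serovars.sort(key=lambda s: exact[s] - quota[s], reverse=True)
    for s in serovars[:leftover]:
        quota[s] += 1

    train: List[str] = []
    for s, ids in by_serovar.items():
        train.extend(rng.sample(ids, quota[s]))
    chosen = set(train)
    validation = [sid for sid, _ in strains if sid not in chosen]
    return train, validation
```

Splitting at the strain level, rather than at the spectrum level, is what guarantees that replicates of one strain never appear on both sides of the split.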
A 10-fold cross-validation stratified sampling scheme was applied to train and test the algorithm in order to estimate the model parameters that would maximise the identification performances on the test sets. Moreover, according to the I-dOne standard workflow, if the first identification of an unknown sample is not reliable, i.e., the spectrum is dissimilar to the database, a second or even a third independent acquisition might be required. Results were given after the comparison between up to three acquisitions. In the event of low agreement, the sample was deemed to be unidentifiable. The rationale behind the acceptance criteria is covered by the patent [
32]. The number of spectra per result was a performance metric that mirrors the effort required to achieve the final identification. Validation results were then obtained by running the optimised model on the validation set and following this workflow.
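The repeat-acquisition workflow above can be pictured with a toy decision rule. The actual acceptance criteria are covered by the patent and are not disclosed in the text, so everything in this sketch is hypothetical: the similarity threshold `MIN_SCORE`, the two-out-of-three agreement rule, and the `acquire` callback that stands in for one acquisition plus model prediction. It only illustrates how a result and the number of spectra per result could be tracked together.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical values: the real acceptance criteria are patented and undisclosed.
MIN_SCORE = 0.8
MAX_ACQUISITIONS = 3


def identify(acquire: Callable[[], Tuple[str, float]]) -> Tuple[Optional[str], int]:
    """Return (label, or None if unidentifiable; number of spectra used).

    acquire() returns (predicted_class, similarity_score) for one
    independent acquisition of the same sample.
    """
    labels: List[str] = []
    for n in range(1, MAX_ACQUISITIONS + 1):
        label, score = acquire()
        labels.append(label)
        if n == 1 and score >= MIN_SCORE:
            return label, 1                 # first spectrum judged reliable
        if n > 1:
            top, votes = Counter(labels).most_common(1)[0]
            if votes >= 2:
                return top, n               # two acquisitions agree
    return None, MAX_ACQUISITIONS           # low agreement: unidentifiable
```

The second element of the returned tuple corresponds to the "number of spectra per result" metric, that is, the effort needed to reach a final answer.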
To benchmark the model’s performance on the validation dataset, sensitivity and accuracy percentages were retrieved for each class from the multiclass confusion matrix [
33]. In a multiclass classification problem, the confusion matrix is a table that is used to evaluate the performance of a classification model by showing the numbers of true positives, false positives, false negatives, and true negatives for each class. To demonstrate the algorithm’s efficiency across the different culture media, each strain–medium combination (that is, each plate) was treated as an individual sample, simulating the results attainable through a routine evaluation method. The class-wise sensitivity was computed as the number of correctly identified samples divided by the overall number of samples in each class. The accuracy was computed as the ratio of the correctly identified samples to the total number of samples. In a multiclass scenario with unbalanced classes, reporting the sensitivity of each individual class (here, the serogroups) avoided the sample size bias and ensured that the performance of the majority classes did not overshadow that of the minority ones. Additionally, presenting the actual number of correctly identified isolates, stratified by culture medium, highlighted potential issues related to specific conditions, so that any misclassifications can be discussed with this factor in mind.
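The two metrics defined above follow directly from the multiclass confusion matrix. The sketch below is a minimal illustration with hypothetical function names; it treats each entry of `y_true` and `y_pred` as one sample (one strain–medium plate), and a sample reported as unidentifiable would simply carry a prediction label that matches no true class.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ConfusionMatrix = Dict[Tuple[str, str], int]


def confusion_matrix(y_true: List[str], y_pred: List[str]) -> ConfusionMatrix:
    """Count samples for each (true class, predicted class) pair."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts: ConfusionMatrix = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[(t, p)] += 1
    return dict(counts)


def class_sensitivity(cm: ConfusionMatrix, cls: str) -> float:
    """TP / (TP + FN): correct samples of cls over all samples of cls."""
    total = sum(n for (t, _), n in cm.items() if t == cls)
    if total == 0:
        raise ValueError(f"no samples of class {cls!r}")
    return cm.get((cls, cls), 0) / total


def accuracy(cm: ConfusionMatrix) -> float:
    """Correctly identified samples over all samples."""
    correct = sum(n for (t, p), n in cm.items() if t == p)
    return correct / sum(cm.values())
```

Reporting `class_sensitivity` per class alongside `accuracy` is what keeps a small class, such as S. Typhi here, from being hidden behind the larger ones.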
2.3. Data Clustering Visualisation
A prior step in data classification is the visualisation of relationships among spectra. Several multivariate clustering techniques make it possible to assess whether similar spectra naturally cluster with respect to classes (serogroups). When dealing with a complex dataset containing hundreds of variables per spectrum, distilling it into a clear, visually interpretable 2- or 3-dimensional plot is challenging. To this end, t-SNE (t-distributed stochastic neighbour embedding) is a nonlinear, unsupervised dimensionality reduction technique that is used specifically for visualising complex datasets in lower dimensions (in this case, 2D, referred to as features) [
34]; it maps data based on similarity or underlying implicit structure in the dataset and identifies patterns, often revealing the natural tendency of data to form clusters based on shared features. This method can capture more complex, nonlinear patterns that linear methods, such as principal component analysis, might miss. Briefly, each point represents a spectrum, and the t-SNE algorithm computes the coordinates so that objects that are very similar in the original space lie close together in the new space, and vice versa. The quality of the t-SNE clustering can be inspected visually by checking whether the low-dimensional projection of the high-dimensional data aligns with the class distinction.
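For readers who want the idea of "similar spectra stay close" in symbols, the standard t-SNE formulation can be summarised as below. The notation is an assumption introduced here for illustration: x_i is a preprocessed spectrum, y_i its 2D coordinate, N the number of spectra, and σ_i a per-point bandwidth set by the perplexity parameter, which the text does not report.

```latex
% Similarity of spectrum j to spectrum i in the original space
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Similarity in the 2D map, using a heavy-tailed Student t kernel
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% The map coordinates y_i minimise the Kullback-Leibler divergence
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because the cost penalises placing similar spectra far apart much more than placing dissimilar spectra close together, local neighbourhoods are preserved well, whereas distances between well-separated clusters in the map should not be over-interpreted.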
4. Discussion
FTIR spectroscopy is known to be capable of subtyping
Salmonella at different taxonomic levels [
37,
38] and even at the phagotype level [
39]. In this context, FTIR is counted among the state-of-the-art methods from a phenotypic point of view [
40] while also finding indirect relationships from a genotypic point of view [
37,
41] or at the clonal level [
22]. For instance, Novais et al. [
42] described the potential use of FTIR for
Klebsiella pneumoniae typing, which is useful for monitoring outbreaks and supporting the control of nosocomial infections. Baldauf [
37,
43], Preisner [
38], and Cordovana [
40,
44] successfully developed algorithms for the identification of several
Salmonella enterica serogroups or serovars using transmission or attenuated total reflectance FTIR spectroscopy, discussing the role played by culture media in the spectral profile and the need to expand the database according to the expected use and the experimental framework. However, these approaches always require sample pretreatment or time-consuming spectrum collection, and they rely on offline identification algorithms that can be run only after data collection is complete.
The present study aimed at evaluating the capability of ATR-FTIR spectroscopy coupled with I-dOne (Alifax S.r.l.) software to subtype among four
Salmonella serogroups (B, C1, D1, and E1) and the serovar Typhi. The choice of serogroups included in this study was based on clinical incidence [
9,
45] and public health impact [
46].
Figure 3 and
Figure 4 highlight how ATR-FTIR can effectively distinguish among serogroups, focusing on the 900–1200 cm⁻¹ range, reflecting the structural variations in the polysaccharide portion [
47,
48]. This spectral region can correspond to the O-antigen’s chemical structure, in which the lipopolysaccharide chains underpin the specificity of each serogroup, allowing for class attribution, as in the reference typing methods. Bacterial lipopolysaccharides consist of a lipid portion (called lipid A), responsible for the toxicity of the bacterium, and a polysaccharide portion (comprising the core and the O-antigen), responsible for the somatic antigenic specificity. Lipid A is embedded within the membrane and has a similar structure across different Gram-negative bacterial species. Conversely, the polysaccharide portion is exposed on the outer membrane of Gram-negative bacteria, including
Salmonella. The diversity and uniqueness of each O-antigen characterising a specific
Salmonella serogroup depends on this external polysaccharide portion included in the O-antigen, along with its peculiar sugar sequence [
35,
49]. Several studies suggest that structural variations in the O-antigen are among the primary factors enabling FTIR spectroscopy to distinguish bacterial strains at the infra-species level and for other genera. For instance, Beutin was able to discriminate
Escherichia coli O4 from O123 via the O-antigen signature in the FTIR spectra [
50], Kuhm highlighted the role of the polysaccharide region in differentiating
Yersinia enterocolitica subtypes [
51], and Vogt found high concordance between genetic clustering and the 900–1200 cm⁻¹ plus 700–900 cm⁻¹ FTIR regions for typing the
E. cloacae complex strains for real-time surveillance and outbreak analysis [
48].
The identification algorithm was built with balanced classes to ensure that every class was adequately represented. An exception was made for S. Typhi, due to the rarity of the strains. A balanced database allows the performance of the model to be fairly evaluated in all classes. Conversely, following the natural epidemiology of the Salmonella serogroups would result in a severely unbalanced dataset, biasing the model against the minority classes and in favour of the most represented one.
The sample size distribution, favouring the training set over the validation set, allowed intra- and interclass variability to be included in the algorithm as best as possible. The larger the training database, the greater the possibility of recognising new and unknown strains. It must be said, however, that misclassifications cannot be excluded a priori, no matter what phenotypic identification method is considered.
Robust sensitivity was obtained across the different culture media, with the class sensitivity always higher than 97% and an overall accuracy of 98.3%. Since only four strains were misclassified, and the errors were spread over one or more culture media, no specific strain- or media-driven bias was detected. The mistakes may have been due to peculiar features that the individual strains displayed at the infrared level; these specific cases warrant deeper characterisation. Indeed, no diagnostic test for microbial identification is perfect. Although many sensitive and specific tests are available for identifying microorganisms, none can ensure 100% accuracy, not even more established techniques such as MALDI-TOF, which is still unable to distinguish
E. coli from
Shigella due to their intrinsic similarity [
52]. Moreover, the Statens Serum Institut (SSI) sera, commonly used for
Salmonella serotyping and widely regarded as a reliable reference for typing different serovars according to the Kauffmann–White scheme, do not offer 100% sensitivity and specificity; in fact, they can successfully type 99% of the known
Salmonella serovars, as described on the SSI Diagnostica website.
We chose to use two blood-based agar media (BA, IZSUM-made production; COS, BioMérieux) that, in principle, should not differ in composition; however, depending on the manufacturer, they may differ in the proportions of the recipe or in the raw materials used. Including a commercial substrate in the design allowed us to monitor possible drifts in the identification due to media not included in I-dOne’s use specifications. Indeed, differences in identification outcomes for the same strain were found especially when comparing these two agar media, which is likely explained by these differences in recipe, however slight: they may induce metabolic profiles and ATR-FTIR spectra that overlap with the IR signatures of related classes. This issue is strain specific. As the ATR-FTIR spectrum is a combined reflection of both phenotypes and growth media, predicting the spectral profile of a brand-new sample is challenging. Amiali et al. [
41] emphasised that, for a robust classification algorithm, a database should account for all key variables: strain origin, culture medium, environmental conditions, and instrumental drift, among others. All of these possible foreseen factors should be introduced to the framework in a non-confounded fashion. Only by expanding the training dataset over time with diverse strains and media is it possible to fully capture the statistical behaviour of each class, improving the algorithm’s discriminative capacity. In this study, class variance was expanded using serovars of diverse phenotypes, along with several culture media. The choice of the culture media was suitable to design an algorithm primarily intended for clinical microbiology routines, where standard screening culture media can be used, without the need for a medium-specific identification workflow.
This study also aimed to distinguish between NTS and typhoidal
Salmonella strains, given the different therapeutic approaches that these two types of infection require and the different prophylactic measures that must be adopted for patients by the competent health authority [
8].
We focused first on
S. Typhi due to its higher incidence in Italy than the Paratyphi serovars (A, d-tartrate-negative B, C) (in the period 2016–2021: 214 strains of
S. Typhi, 47 of
S. Paratyphi A, 22 of
S. Paratyphi B, and 18 of
S. Paratyphi C), and to its higher rate of multidrug resistance (MDR) compared with the paratyphoidal serovars (29.7% vs. 12.0%) [
45].
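To make the relative weight of each serovar explicit, the isolate counts quoted above can be converted into shares of the typhoidal isolates reported for 2016–2021. This is only a worked piece of arithmetic on the numbers given in the text; the variable names are illustrative.

```python
# Isolate counts for 2016-2021, as quoted in the text.
counts = {
    "S. Typhi": 214,
    "S. Paratyphi A": 47,
    "S. Paratyphi B": 22,
    "S. Paratyphi C": 18,
}

total = sum(counts.values())

# Share of each serovar among the typhoidal isolates, in percent.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

S. Typhi accounts for roughly 71% of the 301 isolates, which is consistent with the decision to address it first.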
Although the sample size of S. Typhi was unbalanced with respect to the other classes, the results were all consistent, except for one case predicted as a member of serogroup D1 (formally correct) in MCK medium. A similar misclassification occurred for another D1 strain, which was flagged as S. Typhi in COS medium. These errors, although severe, can be explained by the fact that S. Typhi belongs to serogroup D1, so high physicochemical similarity at the FTIR level is plausible; hence, to date, overlap between the spectral regions of the two classes cannot be ruled out. In this sense, the misclassifications may be linked not to systematic errors related to growth conditions but to the peculiar behaviour of individual strains in the media. Moreover, it is important to recall the interaction between the isolates and the growth media, as the latter inevitably alter the metabolism of the former. Despite efforts to include the greatest possible cross-factor variability, outliers cannot be excluded a priori.
In the absence of additional S. Typhi strains, the algorithm could be retrained and tested using oversampling techniques or synthetic data in order to balance the dataset across classes and refine the algorithm’s performance. Although this is a common choice in machine learning, it is not straightforward to reproduce biological variability in synthetic data, and reusing the same spectra multiple times does not introduce realistic or beneficial variability. For this reason, this route was not taken. Actual clinical strains always remain the proper reference for training and validation; hence, misclassification might be overcome by retraining the algorithm on an enriched database with more strains and growth conditions.
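The point about reused spectra can be illustrated with a minimal sketch of random oversampling, assuming a toy dataset: the function `random_oversample`, the two-point "spectra", and the class sizes are all hypothetical and are not data or code from this study. Balancing the class counts by duplication leaves the number of distinct minority-class spectra unchanged.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Extra items are drawn with replacement from the existing ones.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Hypothetical toy "spectra" (tuples of absorbances), not study data.
spectra = [(0.10, 0.20), (0.11, 0.19), (0.12, 0.21), (0.10, 0.22), (0.50, 0.40)]
classes = ["D1", "D1", "D1", "D1", "Typhi"]

balanced_spectra, balanced_classes = random_oversample(spectra, classes)
counts = Counter(balanced_classes)
unique_typhi = {s for s, c in zip(balanced_spectra, balanced_classes) if c == "Typhi"}

print(counts)             # class sizes after balancing
print(len(unique_typhi))  # distinct minority spectra after balancing
```

The classes end up equal in size, yet the minority class still contains a single distinct spectrum, so the classifier sees no new biological variability.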
In addition, it is worth mentioning the possibility of further refining the identification of the serovar Paratyphi B variant Java from among the other strains of serogroup B. This serovar displays spectral features similar to those of serogroup B, as mentioned by Cordovana et al. [
40], and preliminary results from the current database indicate a sensitivity of 80% in the discrimination of the Paratyphi B variant Java versus the other serovars of group B. However, the spectral overlap between Paratyphi B and other serovars of serogroup B would lead to misclassifications in both directions, owing to the high degree of similarity between the spectra of these classes. This is explained by the similarity in terms of capsular chemistry within serogroup B, which prevents a clear distinction, possibly related to the diversity within
S. Paratyphi B clones and the high genetic heterogeneity within this serovar, making it similar to other serovars [
40,
53,
54,
55]. Accordingly, it is premature to include the Paratyphi B variant Java in a potential list of reliably identifiable serogroups. On the other hand, because the d-tartrate-positive Paratyphi B variant Java is NTS, the ability to rule out d-tartrate-negative Paratyphi B (which is among the typhoidal salmonellae) could help clinicians with patient management [
56,
57]. Therefore, further database enrichment should be carried out in order to better define the spectral features of this class and improve its identification performance.
An additional purpose of this study was to design an algorithm that could be implemented in an all-in-one system that streamlines the full process from data acquisition to analysis and simply returns the identification result, as the I-dOne CE-IVD software v2.2 does. Including the developed algorithm within, or downstream of, I-dOne’s workflow has several major potential benefits. On the one hand, it makes it possible to take advantage of the automated quality control of the spectral profile prior to any acquisition; on the other, strain identification at the species level and subsequent typing can be combined on the same instrument and in the same pipeline within minutes, a spectroscopy-based solution not yet proposed in the state of the art.
In addition to the implementation of the Salmonella identification route, the ATR-FTIR technique coupled with the I-dOne software merits additional comments. It ensures rapid sample processing without the need for pretreatment, reagents, or concerns about carryover, as the I-dOne software automatically verifies the crystal’s cleanliness before and after each measurement. As ATR-FTIR analyses one sample at a time, the progress of each measurement can be monitored in real time. The software’s built-in wizard enables the operator to visually track the spectrum’s development immediately after sample deposition and, if necessary, to adjust the deposit so that the sample is homogeneous on the ATR crystal and meets I-dOne’s stringent standards. With the present algorithm, a single spectrum per strain was enough to return a reliable identification. Considering that a complete identification takes less than 2 min, this system qualifies as quick and easy to use, speeding up the workflow.
5. Conclusions
For Salmonella spp., serotyping through the WKL scheme remains the gold-standard reference method; however, since it is expensive and time-consuming and requires considerable expertise and visual interpretation of the results by the operator, it is rightly restricted to regional or national reference laboratories.
Nevertheless, preliminary discrimination at the serogroup level, performed by routine clinical laboratories through rapid and user-friendly methods, represents the first important indication for epidemiological investigations and for the control of foodborne outbreaks, as well as for the clinical management of salmonellosis.
In this study, we found that the ATR-FTIR system could represent a reliable and promising method for the discrimination of Salmonella spp. at the serogroup level.
The advantages of this system’s use are related both to the user-friendly nature of the automated software—which provides fast results in minutes directly from the pure culture on agar plates, avoiding any operator-dependent bias—and to the equipment, which requires almost no maintenance and does not require reagents.
For these reasons, the ATR-FTIR methodology seems to be easily implementable within routine laboratory activities as an alternative and rapid method for initial Salmonella typing at the serogroup level. The I-dOne suite offers the possibility to combine species and subspecies identification within the same workflow and with the same instrument, streamlining the process. Moreover, proper data collection could result in a flexible method for use in various laboratory setups.
Further studies will be needed to improve the discrimination of S. Typhi from other serovars of serogroup D1, as well as the identifiability of the paratyphoidal serovars (S. Paratyphi A, S. Paratyphi B d-tartrate-negative, and S. Paratyphi C) and of other serogroups.
Limitations remain with regard to the refined identification of other serovars, since every machine learning-based method requires a large and well-described database of isolates; this limitation could be overcome in the future as the database grows. At the same time, growth conditions (e.g., incubation setup, culture medium) must be considered and included in the study design in order to identify the boundary conditions for the application of the method. This study highlights that expanding the cross-factors across several setups is feasible, making the ATR-FTIR approach scalable.
Concurrently, further studies should be conducted on how the antibiotic resistance profiles of the Salmonella strains could influence the ability of the ATR-FTIR method to identify them.