Article

Digital-Tier Strategy Improves Newborn Screening for Glutaric Aciduria Type 1

1
Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, 69120 Heidelberg, Germany
2
Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), 69118 Heidelberg, Germany
3
Division of Pediatric Neurology and Metabolic Medicine, Department of Pediatrics I, Center for Pediatric and Adolescent Medicine, Medical Faculty of Heidelberg, Heidelberg University, 69120 Heidelberg, Germany
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work and should be considered joint first author.
These authors contributed equally to this work and should be considered joint last author.
Int. J. Neonatal Screen. 2024, 10(4), 83; https://doi.org/10.3390/ijns10040083
Submission received: 11 October 2024 / Revised: 20 November 2024 / Accepted: 10 December 2024 / Published: 21 December 2024

Abstract

Glutaric aciduria type 1 (GA1) is a rare inherited metabolic disease increasingly included in newborn screening (NBS) programs worldwide. Because of the broad biochemical spectrum of individuals with GA1 and the lack of reliable second-tier strategies, NBS for GA1 is still confronted with a high rate of false positives. In this study, we aim to increase the specificity of NBS for GA1 and, hence, to reduce the rate of false positives through machine learning methods. To this end, we studied NBS profiles from 1,025,953 newborns screened between 2014 and 2023 at the Heidelberg NBS Laboratory, Germany. We identified a significant sex difference, with twice as many false-positive results in male as in female newborns. Moreover, the proposed digital-tier strategy based on logistic regression analysis, ridge regression, and support vector machine reduced the false-positive rate by over 90% compared to regular NBS while identifying all confirmed individuals with GA1 correctly. An in-depth analysis of the profiles revealed that, in particular, false-positive results with high associated follow-up costs could be reduced significantly. In conclusion, understanding the origin of false-positive NBS results and implementing a digital-tier strategy to enhance the specificity of GA1 testing may significantly reduce the burden on newborns and their families from false-positive NBS results.

1. Introduction

Worldwide, newborn screening (NBS) programs aim to identify newborns with treatable severe rare diseases at an early stage, ideally pre-symptomatically, to enable an early start of treatment. Therefore, blood samples from newborns are collected on the first days of life (i.e., in Germany at 36–72 h of life) and sent to an NBS center for analysis. Hence, NBS is considered a highly effective public health program for secondary prevention [1,2]. Glutaric aciduria type 1 (GA1; OMIM: #231670) is a rare autosomal recessive inherited metabolic disease and part of NBS, with an estimated birth prevalence of 1:135,000 newborns in Germany [1]; however, it is found at a significantly higher prevalence in known high-risk populations [3,4,5]. The disease is characterized by an accumulation of glutaric acid, 3-hydroxyglutaric acid, and glutarylcarnitine (Glut) [6]. Untreated, 80% to 90% of patients with GA1 experience a metabolic decompensation and neurological complications due to striatal injury, particularly a complex movement disorder with predominant dystonia [6,7,8]. Despite its severe clinical manifestation, GA1 is considered a treatable disease in screened populations, since early diagnosis and the start of treatment improve the outcome [6,9,10,11,12]. Based on their amount of urinary excretion of glutaric acid, high (>100 mmol/molCrea) and low excretors (≤100 mmol/molCrea) are distinguished [13]. Importantly, low excretors have a similar risk for movement disorders, but often show only slight elevation or even normal concentrations of Glut, the primary NBS parameter. This has led to the need for relatively low cut-off values for Glut in NBS to avoid false-negative results [2,14,15]. This allowed a sensitivity of 95.6% for GA1 in Germany during 1999–2016 [2] at the cost of a high number of false positives, lower specificity and positive predictive value (PPV). 
Moreover, other factors influencing Glut levels and leading to false-positive NBS results have been reported, e.g., renal insufficiency of the newborn [16,17]. Due to the relatively high number of false positives, confirmatory diagnostics for suspected GA1 cases in NBS programs are frequently employed and associated with follow-up costs. They involve reanalysis of acylcarnitines, analysis of urinary 3-hydroxyglutaric acid (3-OH-GA) concentration, and in some cases genetic and/or enzymatic testing to further support or rule out the diagnosis [12]. In recent decades, second-tier or multiple-tier testing has been established to improve specificity in NBS and to reduce the number of false-positive NBS results while maintaining 100% sensitivity, thereby minimizing the harm that false positives cause to newborns and their families [18,19,20]. However, biochemical second-tier methods for GA1 are not established and, hence, new methods need to be developed.
Recently, there has been a notable rise in the application of data-driven methodologies, specifically machine learning (ML) and deep learning, aimed at refining classification tasks within medical data sets and NBS contexts [21]. Numerous studies have demonstrated that these methods have the potential to enhance classification accuracy by minimizing false-positive rates and to unveil previously unidentified metabolic patterns by making use of the information in all metabolite concentrations screened for in NBS [22,23]. Among the previously applied ML methods, logistic regression (LR) and support vector machines (SVMs) showed good performance for NBS classification in single and comparative studies [21]. Moreover, more complex methods such as Feed Forward Neural Networks (FFNN) [24], boosting methods [22], and Random Forest (RF) [23] showed good results in reducing the number of false positives, but did not always maintain 100% sensitivity. Furthermore, the efficacy of a digital-tier strategy, which acts as a second screening step after regular NBS, was previously demonstrated for isovaleric aciduria, reducing the high number of false positives by about 70% through a combination of regular NBS with ML classification [25].
This study aims to improve the understanding of false-positive NBS results in regular NBS for GA1, to develop a full data and digital-tier strategy for NBS for GA1 based on ML classification methods, and, finally, to increase the specificity of NBS for GA1 and reduce the burden on falsely suspected newborns and their families.

2. Materials and Methods

2.1. NBS Data Set-Composition, Extraction and Data Cleaning

Heidelberg University Hospital’s (UKHD) NBS laboratory screens about 20% of newborns in Germany, i.e., approximately 140,000 newborns annually. The set of NBS variables was anonymized, and data extraction and evaluation were performed in accordance with the European General Data Protection Regulation (GDPR). Thus, the approval of the local ethics committee was not necessary.
The data set for NBS encompasses a total of 61 variables, including 52 metabolite concentrations alongside birth weight, sex, gestational age, birth year, age at blood sample collection, sample arrival time, indicators specifying suspected and subsequently confirmed diagnoses, and comments on clinically relevant information (Supplementary Table S1A). To ensure the validity of the data set, it was meticulously narrowed down. It includes only the initial NBS results of newborns born at a minimum gestational age of 32 weeks, aged at least 36 h at the time of sampling, and possessing unremarkable NBS reports, herein referred to as ‘normal’. Additionally, all profiles of newborns flagged with suspected GA1, whether confirmed or later excluded as false positives, were isolated. The NBS data set comprises profiles from a total of 1,055,885 newborns. This includes 604 cases flagged as suspected GA1, and nine with confirmed GA1, spanning births between 2014 and 2021. In this period, no false negative GA1 case was reported. For Heidelberg GA1 screening, this leads to a sensitivity of 100%, a specificity of 99.94%, a false-positive rate of 0.06%, and a PPV of 1.5% between 2014 and 2021. Data cleaning procedures were executed on the extracted data set to uphold high data quality standards. Supplementary Figure S1 shows the data extraction and data cleaning steps performed on the data set. To achieve this, specific ranges were defined to exclude data sets containing implausible values: birth weight: 1000–6000 g; gestational age: 32–42 weeks; age at sampling: 36–120 h; age at sample arrival: 0–20 days; and metabolite concentrations: 0–50,000 μmol/L. Moreover, categorical values underwent conversion into numerical representations. 
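The screening performance figures quoted above for 2014 to 2021 can be reproduced from the reported counts. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the true-negative count assumes that every profile that was neither flagged nor missed counts as a true negative.

```python
# Counts reported for Heidelberg GA1 screening, 2014-2021 (taken from the text).
screened = 1_055_885   # all newborns in the extracted NBS data set
flagged = 604          # profiles flagged as suspected GA1
confirmed = 9          # flagged profiles later confirmed as GA1
missed = 0             # no false-negative GA1 case was reported

tp = confirmed
fp = flagged - confirmed
fn = missed
# Assumption: all remaining profiles are true negatives.
tn = screened - flagged - missed

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
false_positive_rate = fp / (fp + tn)
ppv = tp / (tp + fp)

print(f"sensitivity:         {sensitivity:.2%}")
print(f"specificity:         {specificity:.2%}")
print(f"false-positive rate: {false_positive_rate:.2%}")
print(f"PPV:                 {ppv:.1%}")
```

With 595 false positives among more than one million unaffected newborns, the specificity is very close to 100%, yet fewer than two in a hundred flagged profiles are true GA1 cases, which is why the PPV is the more telling figure here.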
In addition, three metabolite variables, namely glutamine, succinylacetone, and immune reactive trypsin, were excluded from the data set due to a substantial number of missing values, attributed to their intermittent measurement within the designated time frame.
Finally, the total data set for analysis (hereafter “full data set”) contained 1,025,953 NBS profiles (including 494 cases with the suspected diagnosis GA1, hereafter “suspected diagnosis data set”); see Supplementary Figure S1. The suspected diagnosis data set included nine subsequently confirmed GA1 cases, with six low excretors, three high excretors, and 485 confirmed false positives. For newborns with suspected GA1, comments indicating symptoms or treatments of newborns (e.g., parenteral nutrition, medication, transfusion, severe comorbidities, etc.) on the initial NBS card were retrieved. Moreover, results of quantitative analysis of urinary 3-OH-GA with stable isotope (normal, elevated) and results of further testing to disprove or confirm the suspected diagnosis of GA1 were retrieved in cases where the information was returned to the UKHD NBS laboratory. If no information on further testing was available, profiles were marked as lost to follow-up.
Additionally, an independent test data set from the NBS laboratory in Heidelberg was extracted after the ML algorithms’ initial training and validation. The test data comprised screening profiles from January 2022 to October 2023, and consisted of 257,414 NBS profiles. The test data set was curated based on the same data cleaning and exclusion criteria as the original GA1 data set.
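The plausibility ranges used for data cleaning, which were applied to both the original and the test data set, can be expressed as a simple filter. The sketch below is only an illustration: the field names are hypothetical and do not reflect the laboratory's data schema, and treating a missing covariate as implausible is an assumption made here.

```python
# Plausibility ranges from the data-cleaning step (bounds treated as inclusive).
# Field names are hypothetical placeholders, not the laboratory's schema.
PLAUSIBLE_RANGES = {
    "birth_weight_g": (1000, 6000),
    "gestational_age_weeks": (32, 42),
    "age_at_sampling_h": (36, 120),
    "age_at_arrival_days": (0, 20),
}
METABOLITE_RANGE_UMOL_L = (0, 50_000)


def is_plausible(profile, metabolites):
    """Return True if all covariates and metabolite values lie in range."""
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = profile.get(field)
        # Assumption: a missing covariate makes the profile implausible.
        if value is None or not (low <= value <= high):
            return False
    low, high = METABOLITE_RANGE_UMOL_L
    return all(low <= conc <= high for conc in metabolites.values())


# Illustrative profile with made-up values.
example = {
    "birth_weight_g": 3400,
    "gestational_age_weeks": 39,
    "age_at_sampling_h": 48,
    "age_at_arrival_days": 2,
}
print(is_plausible(example, {"Glut": 0.16, "C10": 0.08}))
```

Applying one shared filter to training and test data keeps both sets within the same value ranges, which is the precondition the text states for the test data set.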

2.2. Data Analysis Methods

Data analysis methods aim to learn information and patterns from large amounts of data which support an ML-based classification method. In a binary classification problem, the data points $x_i \in \mathbb{R}^m$ belong to one of two classes, $x_i \in C_0$ or $x_i \in C_1$.

ANOVA

Analysis of Variance (ANOVA) [26] is a statistical method that assumes the underlying data $X \in \mathbb{R}^{m \times n}$ with $m$ features and $n$ data points to be normally distributed. ANOVA tests whether the means $\mu_0, \mu_1$ of different classes $C_0, C_1$ are significantly different. The resulting p-value is compared against a predefined significance level. The null hypothesis $H_0$ assumes that the means of the two classes are equal,
$$H_0 : \mu_0 = \mu_1.$$
The alternative hypothesis $H_A$ assumes that the means are different,
$$H_A : \mu_0 \neq \mu_1.$$
By this, the ANOVA can be applied to test whether the mean values of a feature differ between classes.
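To make the test statistic concrete, the following sketch computes the one-way ANOVA F value for a single feature as the ratio of between-class to within-class mean squares. The concentrations are made-up illustrative numbers, not study data, and the function name is hypothetical. The p-value would additionally require the F distribution with $(k - 1, N - k)$ degrees of freedom, which `scipy.stats.f_oneway` provides together with the statistic.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Made-up concentrations of one feature in two classes (illustration only).
class_0 = [0.15, 0.17, 0.14, 0.16]
class_1 = [0.50, 0.55, 0.48, 0.53]
f_value = one_way_anova_f(class_0, class_1)
print(f"F = {f_value:.1f}")
```

A large F value indicates that the class means differ by much more than the spread within each class, which is the property used later when ANOVA serves as a feature selection step.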

2.3. Machine Learning Classification Methods

ML classification algorithms attempt to learn a pattern for a classification task based on a labeled data set by updating internal model parameters. For the ML evaluation, the best-performing algorithms in comparative studies on ML-based NBS were applied [21,25].

2.3.1. Logistic Regression

Logistic regression (LR) stands as a discriminating approach, focusing on modeling the posterior probability distribution $P(Y \mid X)$ of the target variable $Y$ given the features $X$. This method draws from linear regression principles centered on capturing linear relationships within the data by identifying the most suitable linear model,
$$\hat{y} = \beta_0 + \beta_1 \cdot x_1 + \beta_2 \cdot x_2 + \dots + \beta_m \cdot x_m,$$
through adjusting the regression coefficients $\beta_0, \dots, \beta_m$. Since LR is designed to estimate the probability of a data point belonging to a particular class, it employs a probability measure of class membership of the feature vector,
$$P(y = 1 \mid X = x_i) = \frac{1}{1 + e^{-(x_i^T \beta)}}.$$
During training, the regression coefficients are fitted using a maximum log-likelihood method to maximize the probability of obtaining the observed results with the fitted coefficients [24,27,28].
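The two formulas above can be sketched in a few lines. The coefficients and feature values below are arbitrary placeholders chosen for illustration and are not the fitted model of this study; the function names are hypothetical as well.

```python
import math


def predict_probability(x, beta):
    """P(y = 1 | x) for a logistic regression model.

    beta[0] is the intercept, beta[1:] are the feature coefficients.
    """
    linear = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-linear))


def log_likelihood(X, y, beta):
    """Log-likelihood of the labels y (0 or 1) given the features X."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_probability(xi, beta)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


# Placeholder coefficients for two features (illustration only).
beta = [-4.0, 6.0, -10.0]
profile = [0.5, 0.1]
print(f"P(y = 1 | x) = {predict_probability(profile, beta):.4f}")
```

Training searches for the coefficients that maximize `log_likelihood` on the labeled data; a profile is then assigned to the positive class when its predicted probability exceeds the chosen decision threshold.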

2.3.2. Ridge Logistic Regression

Ridge logistic regression (RR) enhances the LR model by imposing penalties on the complexity of the resulting model. To this end, a regularization parameter $\lambda > 0$ is introduced, and a penalty term proportional to $\lambda \lVert \beta \rVert^2$ is subtracted from the log-likelihood $l(\beta)$ [29,30]. In the RR optimization, the coefficients are thus constrained by the square of their Euclidean norm. Hence, the regularized log-likelihood is
$$l_r(\beta) = l(\beta) - \frac{\lambda}{2} \sum_{k=1}^{m} \beta_k^2,$$
where $\beta$ are the regression coefficients and the penalty parameter $\lambda$ regulates the degree of shrinkage towards zero [31].
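As a minimal sketch of the regularized objective, the function below subtracts the ridge penalty from the ordinary log-likelihood. Following the sum over $k = 1, \dots, m$, the intercept $\beta_0$ is left unpenalized. All names and numbers are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(X, y, beta):
    """Unpenalized log-likelihood l(beta); beta[0] is the intercept."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
        p = sigmoid(z)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def ridge_log_likelihood(X, y, beta, lam):
    """l_r(beta) = l(beta) - (lam / 2) * sum of squared feature coefficients."""
    penalty = 0.5 * lam * sum(b * b for b in beta[1:])
    return log_likelihood(X, y, beta) - penalty


# Toy data with one feature (illustration only).
X = [[1.0], [2.0]]
y = [0, 1]
beta = [0.0, 2.0]
print(ridge_log_likelihood(X, y, beta, lam=0.0))
print(ridge_log_likelihood(X, y, beta, lam=0.5))
```

Because the penalty grows with the squared coefficients, larger values of `lam` favor smaller coefficients even when they fit the training data slightly worse.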

2.3.3. Support Vector Machines

Support vector machines (SVMs) attempt to find a separating hyperplane between two classes by transforming the features of a data point $x \in \mathbb{R}^m$ into a higher dimensional space [24,28]. A linear hyperplane can be written as
$$w^T x + b = 0,$$
where $w \in \mathbb{R}^m$ is the vector orthogonal to the hyperplane, and $b \in \mathbb{R}$ is the offset that determines the distance of the hyperplane from the origin. A margin can be defined for each data point $x \in \mathbb{R}^m$. A true positive $x$ has a margin of $(w \cdot x + b) > 0$, and a true negative $x$ has a margin of $(w \cdot x + b) < 0$. The data points $x$ nearest to the decision boundary are called support vectors. An SVM determines the decision boundary as a linear combination of support vectors. In the case of a hard-margin linear SVM classifier, maximizing the margin entails solving a quadratic, constrained optimization problem to determine the optimal parameters $w$ and $b$. However, in many scenarios data points from distinct classes cannot be effectively separated by a linear decision boundary. Therefore, kernel functions $K(\cdot, \cdot) : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ are applied for non-linear SVM. These implicitly transform the input data into a high-dimensional feature space in which the data can become linearly separable.
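The linear decision rule and the idea of a kernel can be sketched as follows. The weights, offset, and data points are arbitrary illustrative values, and the radial basis function (RBF) kernel is chosen here only as a common example; the text does not state which kernel was used.

```python
import math


def decision_value(w, x, b):
    """Signed value w . x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Return 1 for the positive side of the hyperplane, otherwise 0."""
    return 1 if decision_value(w, x, b) > 0 else 0


def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) (example kernel)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Arbitrary hyperplane and points in two dimensions (illustration only).
w, b = [1.0, -1.0], -1.0
print(classify(w, [2.0, 0.5], b))
print(classify(w, [0.5, 0.5], b))
print(rbf_kernel([0.0, 0.0], [1.0, 0.0]))
```

A kernel returns a similarity between two points in the original feature space, so the high-dimensional transformation never has to be computed explicitly.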

2.4. Experimental Setup

For all evaluations, we applied the programming language Python (Python Software Foundation; Python Language Reference, version 3.9.2, available at http://www.python.org (accessed on 12 March 2023)) and the Python libraries scikit-learn [32] (version 1.0.2) and scipy [33] (version 1.10.1). Each ML method was evaluated on the full data set and in a digital-tier strategy. The digital-tier strategy combines regular NBS with the subsequent application of ML methods to the suspected diagnosis data set. To address the data imbalance, the optimal class weight parameters $w_0, w_1$ for each classification method were identified to penalize misclassification of true positives more heavily in the cost function during optimization [32]. To this end, the majority class weight parameter $w_0$ was set to 1, and a grid search was applied to determine the optimal value for the minority class weight parameter $w_1$.
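The effect of the class weights on the cost function can be illustrated with a class-weighted cross-entropy. This is a simplified sketch with made-up probabilities and a hypothetical function name, not the internal implementation of scikit-learn, whose classifiers accept comparable weights through the `class_weight` parameter.

```python
import math


def weighted_log_loss(y_true, p_pred, w0=1.0, w1=1.0):
    """Sum of cross-entropy terms, weighted by w1 for class 1 and w0 for class 0."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        if y == 1:
            total -= w1 * math.log(p)
        else:
            total -= w0 * math.log(1.0 - p)
    return total


# One affected newborn scored with a low probability, i.e., a likely miss.
missed_case = ([1], [0.1])
print(weighted_log_loss(*missed_case, w1=1.0))
print(weighted_log_loss(*missed_case, w1=100.0))
```

Raising $w_1$ makes each overlooked positive case far more costly than a false alarm, so the optimizer shifts the decision boundary towards higher sensitivity at the expense of specificity.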

2.5. Validation

The suspected diagnosis and the full data sets were randomly split into 80% training and 20% validation sets for evaluation. The classification performance on both data sets was evaluated with the confusion matrix $C$,
$$C = \begin{pmatrix} TN & FP \\ FN & TP \end{pmatrix},$$
with true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP). These results were then validated with ten repeats of 5-fold cross-validation on the two objectives, maintaining 100% sensitivity $S_n$ and maximizing specificity $S_p$,
$$S_n = \frac{TP}{TP + FN}, \quad \text{and} \quad S_p = \frac{TN}{TN + FP}.$$
For the digital-tier strategy, the specificity is reported as combined specificity using ML as an additional step to traditional NBS for comparability with the results on the full data set. Furthermore, the ML algorithms were tested on an independent test data set which was extracted after the algorithms’ initial training and validation.
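To illustrate why maintaining 100% sensitivity in cross-validation is demanding for this data set, the sketch below distributes the nine confirmed cases and 485 false positives of the suspected diagnosis data set over five stratified folds. It is a simplified stand-in for a stratified splitter such as scikit-learn's `RepeatedStratifiedKFold`, not the exact procedure used in the study.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, balancing each class across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin to the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Suspected diagnosis data set: 9 confirmed GA1 cases, 485 false positives.
labels = [1] * 9 + [0] * 485
folds = stratified_folds(labels, k=5, seed=0)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)  # [2, 2, 2, 2, 1]
```

Each validation fold contains only one or two confirmed cases, so a single misclassified case lowers the sensitivity of that fold to 50% or 0%. Repeating the split with different seeds changes which cases share a fold, which is what the ten repeats probe.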

3. Results

3.1. Data Analysis

The data analysis revealed a sex imbalance in the suspected diagnosis data set (Figure 1). The false positives were divided into 326 (67%) males and 159 (33%) females, whereas GA1 was similarly prevalent in male (n = 4) and female (n = 5) newborns. This finding was stable across the birth years 2014 to 2021, with a larger number of false-positive males than females in every year investigated (Figure 1). This imbalance could be caused by differences in the distribution of the Glut value in male and female newborns (ANOVA p < 0.0001), as the mean Glut value of newborns without GA1 is 0.16 ± 0.058 μmol/L in males and 0.15 ± 0.055 μmol/L in females. An evaluation of false-positive NBS profiles on the test data set from 2022 to 2023 also revealed sex-specific differences, as the data set consisted of 135 (57%) males and 100 (43%) females.
While for 93% (n = 452) of newborns with false-positive screening results, no symptoms or treatments were indicated at NBS sampling, for 2% (n = 10) kidney insufficiency was reported, and for 5% (n = 23) other major abnormalities potentially interfering with NBS results were reported, such as postnatal transfusion, medication, or sepsis (Figure 2A). In the false-positive data set, GA1 was excluded by quantitative analysis of urinary 3-OH-GA in 90% of the profiles (n = 435). In 7% (n = 34) of all suspected cases that were finally excluded as false positives, urinary analysis revealed elevated 3-OH-GA levels, prompting further evaluations such as genetic testing, enzymatic testing, or both to rule out the diagnosis. For 3% (n = 16), no follow-up data on urinary 3-OH-GA concentration were available (Figure 2B). For three patients in the false-positive data set, the UKHD NBS laboratory received feedback on the presence of a heterozygous pathogenic GCDH variant after genetic testing. All newborns heterozygous for GCDH showed elevated urinary 3-OH-GA, whereas none of the newborns with renal insufficiency had elevated 3-OH-GA levels (Supplementary Table S2).
Furthermore, we applied ANOVAs to compare the mean and standard deviation of measured metabolites in dried blood samples across three distinct groups: normal NBS profiles, false-positive NBS profiles, and newborns diagnosed with GA1 (Supplementary Table S1B).
As expected, the mean (±standard deviation) concentration of Glut was highly elevated in newborns with GA1 (2.698 ± 1.548 μmol/L) compared to newborns with suspected, but not confirmed GA1 (0.526 ± 0.106 μmol/L, ANOVA p < 0.0001) and newborns with normal NBS profiles (0.157 ± 0.057 μmol/L, ANOVA p < 0.0001); see Supplementary Table S1B. The mean and standard deviation of Glut in GA1 patients was 1.9 ± 1.28 μmol/L in the six low excretors, and 4.3 ± 0.2 μmol/L in the three high excretors.
Then, we performed ANOVAs with a significance level of 5% (p-value < 0.05) on the full GA1 data set, comparing the metabolic profiles of patients with GA1 and all other NBS profiles (Table 1A). Furthermore, we compared the metabolic profiles of individuals with confirmed GA1 to those of individuals with false-positive NBS results using ANOVA with a significance level of 5% (p-value < 0.05) (Table 1B), and confirmed the known biomarker Glut as a significant feature in both data sets (Table 1). However, further significant features (e.g., homocitrulline (Hci), isovalerylcarnitine (C5)) were present in the full data set. In contrast, other parameters (e.g., decanoylcarnitine (C10), tetradecenoylcarnitine (C14:1)) were identified as significant features with the highest F values by the ANOVA in the suspected diagnosis data set (Table 1). The box plots (Supplementary Figure S3) provide a detailed view of these features for both data sets. Overall, the mean and the median concentrations of most acylcarnitines, e.g., C10, are higher in the false-positive group than in newborns with normal NBS profiles or confirmed GA1 (Supplementary Table S1B and Supplementary Figure S3). The ANOVA was then used as a feature selection method to reduce the dimensionality of the data set for the ML methods.

3.2. Machine Learning Results for Full and Suspected Diagnosis Data Set

The digital-tier strategy describes an additional step after the first newborn screening. Here, the newborn screening profiles of all newborns that are suspected positive by the initial newborn screening are classified, in analogy to biochemical second-tier methods, with a machine learning method into ‘normal’ and ‘suspected GA1’ profiles, which reduces the number of false positives that need to be further analyzed. For the evaluation of the machine learning approach on the full data set, as well as on the suspected diagnosis data set (digital-tier strategy), we compared the classification results of different classification methods (LR, RR, and SVM; Table 2). Overall, the best methods on both data sets decreased the number of false positives while having no false negatives in the respective training and validation data set (Table 2), which is important since all newborns with GA1 need to be detected correctly. For most algorithms, basing the evaluations on Glut and C10 led to the best results.
On both data sets, the RR method performed worse than LR and SVM in terms of overall false-positive rate reduction. The LR, when only based on the features Glut and C10, presented the best-performing algorithm for reducing false positives while minimizing the number of false negatives for both data sets. LR overall reduced the number of false positives by 93.61% on the full data set, and by 95.05% on the suspected diagnosis data set for training and validation. Despite the LR, RR, and SVM algorithms demonstrating no false negatives on a randomly stratified split training and validation set, the 5-fold cross-validation using stratified splitting revealed that none of the algorithms achieved 100% sensitivity.
Therefore, a grid search with five-fold stratified cross-validation over the applied class weight parameters $w_0, w_1$ was applied to achieve 100% cross-validation sensitivity. The parameter $w_0$ was set to $w_0 = 1$, and the optimal parameter $w_1$ was searched in the interval $I_F = [1, 50{,}000]$ for the full data set, and in the interval $I_S = [1, 500]$ for the suspected diagnosis data set. The best-performing methods in terms of highest mean sensitivity and specificity are presented in Table 2D. Since, on the full data set, no method achieved 100% sensitivity, only the results of the suspected diagnosis data set are shown (Table 2D). All three methods (LR, RR, and SVM) achieved 100% sensitivity in cross-validation, by increasing the class weight $w_1$ to values between 180 and 183. However, this weight adaptation also reduced the specificity of all three methods. The LR classification achieved the best results on the training and validation set with 147 false positives, while SVM (164 FP) and RR (235 FP) had higher false-positive rates (Table 2D). Hence, the false positives were reduced by 69.69% with LR, 66.19% with SVM, and 51.55% with RR, compared to traditional NBS, while maintaining 100% sensitivity.
To validate the ML classification results on GA1, we extracted an independent test data set, including data from January 2022 to October 2023, from the NBS laboratory at UKHD. On the test data set, the LR classification method that was initially trained on the full data set achieved a reduction in false positives of 93.19% compared to traditional NBS (reduction from 235 to 16 FP results; Table 2B). The LR classification method that was initially trained on the suspected diagnosis data set achieved a reduction in false positives of 92.34% compared to traditional NBS (reduction from 235 to 18 FP results; Table 2C). The LR method with optimized class weights incorrectly classified 115 NBS profiles as GA1, translating to a false-positive reduction of 51%. Overall, for the suspected diagnosis data set, all ML methods identified the patients with GA1 correctly, resulting in 100% sensitivity on the test data set (Table 2).
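The reported reductions follow directly from the false-positive counts before and after the digital tier. The short sketch below recomputes them from the numbers given in this section; the function and dictionary names are illustrative.

```python
def fp_reduction(fp_before, fp_after):
    """Relative reduction in false positives, in percent."""
    return 100.0 * (fp_before - fp_after) / fp_before


# Training and validation: 485 false positives in regular NBS,
# class-weight-optimized methods (Table 2D).
train_val = {"LR": 147, "SVM": 164, "RR": 235}
for method, fp_after in train_val.items():
    print(f"{method}: {fp_reduction(485, fp_after):.2f}%")

# Independent test data set: 235 false positives in regular NBS.
test = {
    "LR trained on full data set": 16,
    "LR trained on suspected diagnosis data set": 18,
    "LR with optimized class weight": 115,
}
for method, fp_after in test.items():
    print(f"{method}: {fp_reduction(235, fp_after):.2f}%")
```

The comparison makes the trade-off explicit: the class-weight-optimized LR is the only variant with 100% sensitivity in cross-validation, but it removes roughly half of the false positives on the test data instead of more than 90%.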

3.3. Machine Learning Results for False-Positive Subgroups

Data analysis identified different subgroups within the false-positive NBS profiles. To evaluate whether the reduction in false positives using ML corresponds to one of these subgroups specifically, we analyzed how these subgroups are divided by the LR classification in the 485 false-positive profiles used for training and validation, as well as in the 235 false-positive profiles used for testing. The digital-tier strategy with LR, which was trained on the suspected diagnosis data set (Table 2C), was applied, and NBS profiles of newborns with and without kidney insufficiency, as well as newborns with and without elevated urinary 3-OH-GA, were investigated. Analysis of the urinary 3-OH-GA results revealed a higher proportion of elevated profiles in the original training and validation data set (7%) than in the test data set (2%). Kidney insufficiency was reported in 2% of profiles in the original data and 1% of profiles in the test data set (Figure 3). There was no consistent distinction of newborns with kidney insufficiency or other indicated abnormalities using the LR method: 50% of the profiles with kidney insufficiency in the training and validation data set were classified as GA1, and 50% were classified as normal by the LR method (Figure 3). In the test data set, all profiles with kidney insufficiency were predicted to be normal (Figure 3). Similarly, we did not find a complete overlap between the false-positive group identified by the LR prediction and the group with elevated 3-OH-GA, since two of the profiles with elevated 3-OH-GA were falsely classified as GA1, and 21 of the profiles without elevated 3-OH-GA were falsely classified as GA1 (Figure 4). In the test data set, all profiles falsely classified as GA1 were profiles without elevated 3-OH-GA levels (Figure 4).

4. Discussion

NBS is a highly successful instrument of secondary disease prevention. However, NBS for rare diseases such as GA1 faces a number of challenges, including false-positive screening results [34]. In this study, an in-depth analysis of false-positive NBS profiles for GA1 was performed, and a digital-tier strategy in which an ML classification method is applied as a second step after regular NBS was developed to decrease the number of false positives.
In the NBS training and validation data set, a false-positive rate of 0.047% (1,026,447 NBS profiles and 485 FP) was observed, which is higher than previously reported rates for other metabolic NBS target diseases [35], highlighting the necessity for improvement. Further investigation of all 720 false-positive screening results (including training/validation and test data set) revealed a sex-specific imbalance. This imbalance appears to stem from differences in the distribution of Glut values between male and female newborns. Future studies are needed to explore the possibilities of sex-specific differences in GA1 screening. Sex-specific ML models for example could be developed to improve classification accuracy. Nonetheless, the low number of confirmed GA1 cases among male and female patients presents a significant challenge for validating these models [36,37].
A comprehensive assessment of the false-positive screening profiles revealed that in the majority of cases, diagnosis of GA1 was ruled out following the analysis of 3-OH-GA in urine samples. However, in accordance with current international guideline recommendations [12], for 34 newborns (7% of the false-positive data set) with elevated 3-OH-GA results, initiation of metabolic treatment and further genetic and/or enzymatic tests was necessary to confirm or exclude the diagnosis, constituting a possible burden for affected families [38]. Additionally, the analysis identified that only 2% of the false-positive screening results were associated with kidney insufficiency, a known cause of false-positive NBS for GA1 results, although previous reports attributed a larger proportion of false-positive screening results in GA1 to kidney insufficiency [16,17]. While it is unclear if this elevation of Glut levels in newborns with kidney insufficiency is caused by disturbances in specific renal transporter systems of glutaric acid and its derivatives, or a general reduction in acylcarnitine excretion, it is important to note that the prevalence of kidney insufficiency in this study might be underestimated, as the condition might not have been diagnosed at the time point of screening and therefore not indicated on NBS cards.
Interestingly, the best-performing ML strategy developed in this study similarly reduced the number of false-positive screening profiles associated with subsequently normal and elevated urinary 3-OH-GA levels, as well as with and without kidney insufficiency. This indicates that the reduction in false-positive results through this strategy is not due to a pattern in the Glut and C10 values used by the algorithm correlating to urinary 3-OH-GA levels or kidney insufficiency. Moreover, it provides evidence that the strategy not only reduces the number of false-positive results in general, but eventually reduces the need for genetic and/or enzymatic testing and associated costs as well, thereby further lessening the impact of false-positive NBS on newborns and their families.
The best-performing ML methods on both the full and suspected diagnosis data sets utilized Glut and C10 as crucial features for classification, confirming well-established knowledge [12]. This consistent importance of the Glut parameter across data sets suggests that reevaluating current cut-off values for GA1 in NBS could benefit future studies. In addition to Glut, the ANOVA on the suspected diagnosis data set identified C10 as a significant feature. Importantly, C10 is known to be elevated in other inherited metabolic disorders, such as multiple acyl-CoA dehydrogenase deficiency, but not in GA1 [39]. In line with this, analysis of acylcarnitine measurements revealed that C10 and other acylcarnitines showed higher mean and median values in the false-positive group compared to those in the unremarkable NBS profiles and newborns with GA1 (see Supplementary Table S1B and Supplementary Figure S1). The high levels of several acylcarnitine measurements could be a potential reason for the high number of false-positive screening results for GA1. These findings suggest that Glut could be complemented by other parameters such as C10 or an overall increase in acylcarnitine profiles to serve as a more effective indicator of potential false-positive screening results for GA1. Furthermore, it has to be considered that the metabolites found to be elevated in NBS might actually represent other compounds that are isobaric and might explain some of the false positives.
ML methods can help to model such complex data relationships, and have recently been shown to improve NBS results [21,22,23,25]. In the training and validation data sets, the LR method demonstrated excellent performance, reducing the number of false positives by 93.61% on the full data set and by 95.05% on the suspected diagnosis data set, simulating a use as a digital-tier after traditional NBS, while correctly identifying all patients with GA1. This high level of performance was maintained on an additionally extracted test data set from newborns born between 2022 and 2023. However, only the LR method with cross-entropy loss function weight w 1 optimized for cross-validation achieved 100% sensitivity in cross-validation. This method reduced the false-positive rate on the suspected diagnosis data set by nearly 70% in training and validation and by 51% on the test data set (Table 2D). The grid search for optimal class weight parameters was crucial for increasing specificity while maintaining 100% sensitivity, which is essential for NBS to ensure all affected newborns are identified. These results suggest that ML methods can increase specificity, i.e., reducing the number of false positives, in NBS for GA1, especially if used in a digital-tier strategy. Importantly, these methods can be applied automatically within a few minutes at minimal costs. False-positive NBS results incur additional costs and efforts, including the need for physicians to communicate the remarkable results to local hospitals and families, clinical evaluations of the newborns, and sampling for confirmatory diagnostics by pediatricians. These processes also involve expenses for metabolic and genetic analyses. Therefore, ML methods can, by increasing the specificity of the screening, alleviate the burden on infants and their families while enhancing the cost-effectiveness of NBS.
However, due to the small number of true positives (nine affected patients in the training and validation data set), the relevance and significance of the results are challenging to estimate. An unfavorable splitting of the data set in cross-validation could decrease sensitivity. Despite this, both LR methods correctly identified all patients with GA1 in the training, validation, and test data sets (eleven patients with GA1). Moreover, since the LR model was trained only on data from the Heidelberg screening laboratory in Germany, it may be applicable to other NBS laboratories only after retraining on center-specific data, owing to variations in screening procedures, equipment, and materials. Nevertheless, other screening centers could adopt the presented experimental setup and the identified features to train an LR model optimized for their specific data sets, thereby improving the false-positive rate for GA1 in their patient populations. For the application of these methods in clinical practice, it must be ensured that the same data ranges, such as the exclusion of newborns with a gestational age of less than 36 weeks, are applied to the new data samples, analogous to the different cut-offs and algorithms for preterm and term newborns in biochemical NBS. Samples that do not lie within these ranges are then analyzed according to current NBS and not with a digital tier.
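The range check described above could be expressed as a simple routing step. The sketch below is hypothetical: the 36-week limit is taken from the text, whereas the class and function names, the pathway labels, and the handling of a missing gestational age (routed to conventional NBS) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MIN_GESTATIONAL_AGE_WEEKS = 36.0  # range limit named in the text


@dataclass
class ScreeningSample:
    sample_id: str                          # hypothetical identifier
    gestational_age_weeks: Optional[float]  # None if not recorded


def route_sample(sample: ScreeningSample) -> str:
    """Decide which pathway evaluates a sample with a positive first-tier result."""
    ga = sample.gestational_age_weeks
    # Assumption: samples outside the training range, or with missing
    # gestational age, fall back to the conventional NBS algorithm.
    if ga is None or ga < MIN_GESTATIONAL_AGE_WEEKS:
        return "conventional NBS"
    return "digital-tier"


for s in [ScreeningSample("a", 34.5), ScreeningSample("b", 39.0), ScreeningSample("c", None)]:
    print(s.sample_id, "->", route_sample(s))
```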
In the future, a prospective parallel evaluation of the algorithms as a potential clinical decision support system would be beneficial to determine which algorithm yields the best results. Additionally, the findings should be validated on new data sets that include more positive GA1 cases. Moreover, post-analytical tools such as the Collaborative Laboratory Integrated Reports (CLIR) tool, which is based on continuous adjustment of covariates instead of traditional cut-off values, have recently been applied in newborn screening [40]. Due to data protection laws and privacy restrictions, a comparison between the CLIR tool and the presented method was not possible; however, this could be investigated in future studies. Future studies could also investigate other state-of-the-art ML methods for classification tasks, such as neural networks [41], XGBoost [42], and LightGBM [43], to improve NBS for GA1. However, clinicians benefit from an explanation when estimating the reliability of the system's decision [44]. Therefore, when more complex methods are applied, their interpretability should also be addressed, for example with explainable artificial intelligence methods, which have shown promising results in NBS for isovaleric aciduria [22] and in clinical decision support systems [45].
In general, ML methods are expected to be applied more frequently in the clinical context [46]. However, it is not yet clear how they can be applied in critical areas such as decision support systems. Therefore, it needs to be determined which ethical and legal requirements must be met to apply ML methods [47]. Additionally, addressing patients' fears of discrimination by the algorithms and securing data protection will be important steps toward integrating artificial intelligence into the medical domain. In personalized treatment and precision medicine in particular, data-based methods could enhance clinical work; one example is the digital metabolic twins for newborns and infants, which predicted known biomarkers and responses to treatment strategies of inherited metabolic diseases [48].
Overall, this study provides evidence that ML methods can be implemented to increase the specificity of NBS for GA1, thereby helping to reduce the possible burden of false-positive screening results on newborns and their families, and offers new perspectives on NBS in the future.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijns10040083/s1, Figure S1: Data extraction and data cleaning flow chart for newborn screening data; Figure S2: Box plots with distribution of Glut, Hci, and C10 in normal, false positive and GA1 newborn screening profiles; Table S1: Data overview; Table S2: Patient subgroup analysis; Table S3: Feature selection overview.

Author Contributions

Conceptualization: E.Z., J.T., N.B., S.H., S.F.G., U.M., S.K. and V.H.; methodology: E.Z., J.T., N.B., S.F.G., S.H. and U.M.; software: E.Z.; writing—original draft preparation: E.Z., J.T. and U.M.; writing—review and editing: E.Z., J.T., N.B., S.F.G., S.H., P.F., G.F.H., S.K., U.M. and V.H.; supervision: U.M., S.K. and V.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Klaus Tschira Foundation through the Informatics for Life project and the Dietmar Hopp Foundation, St. Leon Rot, Germany (grant numbers 2311221, 1DH2011117 and 1DH1911376 to G.F.H. and S.K.). The authors confirm independence from the sponsors; the content of the article has not been influenced by the sponsors.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Patient consent was waived because the data protection officer of Heidelberg University Hospital (UKHD) checked that the set of NBS variables was anonymized and that data extraction and evaluation were in accordance with the European General Data Protection Regulation (GDPR).

Data Availability Statement

The NBS data that support the findings of this study are not publicly available due to privacy restrictions. Supplementary data that support the findings of this study are available in the supplementary material of this article (Supplementary Tables S1–S3).

Acknowledgments

For the publication fee we acknowledge financial support by Heidelberg University. The present contribution is supported by the Helmholtz Association under the joint research school HIDSS4Health—Helmholtz Information and Data Science School for Health.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
3-OH-GA    3-hydroxyglutaric acid
ANOVA      analysis of variance
C14:1      tetradecenoylcarnitine
C5         isovalerylcarnitine
C10        decanoylcarnitine
GA1        glutaric aciduria type 1
Glut       glutarylcarnitine
GDPR       general data protection regulation
Hci        homocitrulline
LR         logistic regression
ML         machine learning
NBS        newborn screening
RR         logistic ridge regression
SVM        support vector machine
UKHD       Heidelberg University Hospital

References

  1. Mütze, U.; Garbade, S.; Gramer, G.; Lindner, M.; Freisinger, P.; Grünert, S.C.; Hennermann, J.; Ensenauer, R.; Thimm, E.; Zirnbauer, J.; et al. Long-Term Outcomes of Individuals with Metabolic Diseases Identified Through Newborn Screening. Pediatrics 2020, 146, e20200444.
  2. Boy, N.; Mengler, K.; Thimm, E.; Schiergens, K.; Marquardt, T.; Weinhold, N.; Marquardt, I.; Das, A.; Freisinger, P.; Grünert, S.; et al. Newborn screening: A disease-changing intervention for glutaric aciduria type 1. Ann. Neurol. 2018, 83, 970–979.
  3. Strauss, K.; Puffenberger, E.; Robinson, D.; Morton, D. Type I glutaric aciduria, part 1: Natural history of 77 patients. Am. J. Med. Genet. Part C Semin. Med. Genet. 2003, 121C, 38–52.
  4. Rockman-Greenberg, C.; Prasad, A.; Dilling, L.; Thompson, J.; Haworth, J.; Martin, B.; Wood-Steiman, P.; Seargeant, L.; Seifert, B.; Booth, F.; et al. Outcome of the First 3-Years of a DNA-Based Neonatal Screening Program for Glutaric Acidemia Type 1 in Manitoba and Northwestern Ontario, Canada. Mol. Genet. Metab. 2002, 75, 70–78.
  5. van der Watt, G.; Owen, E.; Berman, P.; Meldau, S.; Watermeyer, N.; Olpin, S.; Manning, N.; Baumgarten, I.; Leisegang, F.; Henderson, H. Glutaric aciduria type 1 in South Africa-high incidence of glutaryl-CoA dehydrogenase deficiency in black South Africans. Mol. Genet. Metab. 2010, 101, 178–182.
  6. Kölker, S.; Christensen, E.; Leonard, J.V.; Greenberg, C.R.; Boneh, A.; Burlina, A.B.; Burlina, A.P.; Dixon, M.; Duran, M.; García Cazorla, A.; et al. Diagnosis and management of glutaric aciduria type I—Revised recommendations. J. Inherit. Metab. Dis. 2011, 34, 677–694.
  7. Heringer, J.; Valayannopoulos, V.; Lund, A.; Wijburg, F.; Freisinger, P.; Barić, I.; Baumgartner, M.; Burgard, P.; Burlina, A.; Chapman, K.; et al. Impact of age at onset and newborn screening on outcome in organic acidurias. J. Inherit. Metab. Dis. 2016, 39.
  8. Boy, N.; Mengler, K.; Heringer-Seifert, J.; Hoffmann, G.; Garbade, S.; Kölker, S. Impact of newborn screening and quality of therapy on the neurological outcome in glutaric aciduria type 1: A meta-analysis. Genet. Med. 2021, 23, 13–21.
  9. Kölker, S.; Christensen, E.; Leonard, J.; Rockman-Greenberg, C.; Burlina, A.; Burlina, A.; Dixon, M.; Duran, M.; Goodman, S.; Koeller, D.; et al. Guideline for the diagnosis and management of glutaryl-CoA dehydrogenase deficiency (glutaric aciduria type I). J. Inherit. Metab. Dis. 2007, 30, 5–22.
  10. Boy, N.; Mühlhausen, C.; Maier, E.M.; Heringer, J.; Assmann, B.; Burgard, P.; Dixon, M.; Fleissner, S.; Greenberg, C.R.; Harting, I.; et al. Proposed recommendations for diagnosing and managing individuals with glutaric aciduria type I: Second revision. J. Inherit. Metab. Dis. 2017, 40, 75–101.
  11. Heringer, J.; Boy, N.; Ensenauer, R.; Assmann, B.; Zschocke, J.; Harting, I.; Lücke, T.; Maier, E.; Mühlhausen, C.; Haege, G.; et al. Use of Guidelines Improves the Neurological Outcome in Glutaric Aciduria Type I. Ann. Neurol. 2010, 68, 743–752.
  12. Boy, N.; Mühlhausen, C.; Maier, E.M.; Ballhausen, D.; Baumgartner, M.R.; Beblo, S.; Burgard, P.; Chapman, K.A.; Dobbelaere, D.; Heringer-Seifert, J.; et al. Recommendations for diagnosing and managing individuals with glutaric aciduria type 1: Third revision. J. Inherit. Metab. Dis. 2023, 46, 482–519.
  13. Baric, I.; Wagner, L.; Feyh, P.; Liesert, M.; Buckel, W.; Hoffmann, G. Sensitivity and specificity of free and total glutaric acid and 3-hydroxyglutaric acid measurements by stable-isotope dilution assays for the diagnosis of glutaric aciduria type I. J. Inherit. Metab. Dis. 1999, 22, 867–882.
  14. Spenger, J.; Maier, E.M.; Wechselberger, K.F.; Bauder, F.; Kocher, M.; Sperl, W.; Preisel, M.; Schiergens, K.A.; Konstantopoulou, V.; Röschinger, W.; et al. Glutaric Aciduria Type I Missed by Newborn Screening: Report of Four Cases from Three Families. Int. J. Neonatal Screen. 2021, 7, 32.
  15. Guenzel, A.; Hall, P.; Scott, A.; Lam, C.; Chang, I.; Thies, J.; Ferreira, C.; Pichurin, P.; Laxen, W.; Raymond, K.; et al. The low excretor phenotype of glutaric acidemia type I is a source of false negative newborn screening results and challenging diagnoses. JIMD Rep. 2021, 60, 67–74.
  16. Hennermann, J.B.; Roloff, S.; Gellermann, J.; Grüters, A.; Klein, J. False-positive newborn screening mimicking glutaric aciduria type I in infants with renal insufficiency. J. Inherit. Metab. Dis. 2009, 32, 355–359.
  17. Matsumoto, M.; Awano, H.; Bo, R.; Nagai, M.; Tomioka, K.; Nishiyama, M.; Ninchouji, T.; Nagase, H.; Yagi, M.; Morioka, I.; et al. Renal insufficiency mimicking glutaric acidemia type 1 on newborn screening. Pediatr. Int. 2018, 60, 67–69.
  18. Monostori, P.; Klinke, G.; Richter, S.; Barath, A.; Fingerhut, R.; Baumgartner, M.R.; Kölker, S.; Hoffmann, G.F.; Gramer, G.; Okun, J.G. Simultaneous determination of 3-hydroxypropionic acid, methylmalonic acid and methylcitric acid in dried blood spots: Second-tier LC-MS/MS assay for newborn screening of propionic acidemia, methylmalonic acidemias and combined remethylation disorders. PLoS ONE 2017, 12, e0184897.
  19. Murko, S.; Aseman, A.D.; Reinhardt, F.; Gramer, G.; Okun, J.G.; Mütze, U.; Santer, R. Neonatal screening for isovaleric aciduria: Reducing the increasingly high false-positive rate in Germany. JIMD Rep. 2023, 64, 114–120.
  20. Sommerburg, O.; Hammermann, J.; Lindner, M.; Stahl, M.; Muckenthaler, M.; Kohlmueller, D.; Happich, M.; Kulozik, A.E.; Stopsack, M.; Gahr, M.; et al. Five years of experience with biochemical cystic fibrosis newborn screening based on IRT/PAP in Germany. Pediatr. Pulmonol. 2015, 50, 655–664.
  21. Zaunseder, E.; Haupt, S.; Mütze, U.; Garbade, S.; Kölker, S.; Heuveline, V. Opportunities and challenges in machine learning-based newborn screening—A systematic literature review. JIMD Rep. 2022, 63, 250–261.
  22. Zaunseder, E.; Mütze, U.; Garbade, S.F.; Haupt, S.; Kölker, S.; Heuveline, V. Deep Learning and Explainable Artificial Intelligence for Improving Specificity and Detecting Metabolic Patterns in Newborn Screening. In Proceedings of the 2023 IEEE Symposium Series on Computational Intelligence (SSCI), Mexico City, Mexico, 5–8 December 2023; pp. 1566–1571.
  23. Peng, G.; Tang, Y.; Cowan, T.; Enns, G.; Zhao, H.; Scharfe, C. Reducing False-Positive Results in Newborn Screening Using Machine Learning. Int. J. Neonatal Screen. 2020, 6, 16.
  24. Baumgartner, C.; Baumgartner, D. Biomarker Discovery, Disease Classification, and Similarity Query Processing on High-Throughput MS/MS Data of Inborn Errors of Metabolism. J. Biomol. Screen. 2006, 11, 90–99.
  25. Zaunseder, E.; Mütze, U.; Garbade, S.F.; Haupt, S.; Feyh, P.; Hoffmann, G.F.; Heuveline, V.; Kölker, S. Machine Learning Methods Improve Specificity in Newborn Screening for Isovaleric Aciduria. Metabolites 2023, 13, 304.
  26. Girden, E.R. ANOVA: Repeated measures; Number 84 in 1; Sage: Thousand Oaks, CA, USA, 1992.
  27. Hosmer, D.; Lemeshow, S. Introduction to the Logistic Regression Model; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2000; Chapter 1; pp. 1–30.
  28. Flach, P. Machine Learning: The Art and Science of Algorithms That Make Sense of Data; Cambridge University Press: Cambridge, UK, 2012; pp. 215–286.
  29. Le Cessie, S.; Van Houwelingen, J. Ridge Estimators in Logistic Regression. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1992, 41, 191–201.
  30. Van den Bulcke, T.; Vanden Broucke, P.; Van Hoof, V.; Wouters, K.; Broucke, S.V.; Smits, G.; Smits, E.; Proesmans, S.; Genechten, T.V.; Eyskens, F. Data Mining Methods for Classification of Medium-Chain Acyl-CoA Dehydrogenase Deficiency (MCADD) Using Non-Derivatized Tandem MS Neonatal Screening Data. J. Biomed. Inform. 2011, 44, 319–325.
  31. Šinkovec, H.; Heinze, G.; Blagus, R.; Geroldinger, A. To tune or not to tune, a case study of ridge logistic regression in small or sparse datasets. BMC Med. Res. Methodol. 2021, 21, 199.
  32. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  33. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272.
  34. Malvagia, S.; Forni, G.; Ombrone, D.; La Marca, G. Development of Strategies to Decrease False Positive Results in Newborn Screening. Int. J. Neonatal Screen. 2020, 6, 84.
  35. Lüders, A.; Blankenstein, O.; Brockow, I.; Ensenauer, R.; Lindner, M.; Schulze, A.; Nennstiel, U. Neonatal Screening for Congenital Metabolic and Endocrine Disorders. Dtsch. Arztebl. Int. 2021, 118, 101–108.
  36. Lin, B.; Yin, J.; Shu, Q.; Deng, S.; Li, Y.; Jiang, P.; Yang, R.; Pu, C. Integration of Machine Learning Techniques as Auxiliary Diagnosis of Inherited Metabolic Disorders: Promising Experience with Newborn Screening Data. Collab. Comput. Netw. Appl. Work. 2019, 292, 334–349.
  37. Cabitza, F.; Campagner, A.; Soares, F.; García de Guadiana-Romualdo, L.; Challa, F.; Sulejmani, A.; Seghezzi, M.; Carobene, A. The importance of being external. methodological insights for the external validation of machine learning models in medicine. Comput. Methods Programs Biomed. 2021, 208, 106288.
  38. Waisbren, S.E.; Albers, S.; Amato, S.; Ampola, M.; Brewster, T.G.; Demmer, L.; Eaton, R.B.; Greenstein, R.; Korson, M.; Larson, C.; et al. Effect of Expanded Newborn Screening for Biochemical Genetic Disorders on Child Outcomes and Parental Stress. JAMA 2003, 290, 2564–2572.
  39. Chace, D.; Kalas, T.; Naylor, E. Use of Tandem Mass Spectrometry for Multianalyte Screening of Dried Blood Specimens from Newborns. Clin. Chem. 2003, 49, 1797–1817.
  40. Mørkrid, L.; Rowe, A.D.; Elgstoen, K.B.P.; Olesen, J.H.; Ruijter, G.; Hall, P.L.; Tortorelli, S.; Schulze, A.; Kyriakopoulou, L.; Wamelink, M.M.C.; et al. Continuous Age- and Sex-Adjusted Reference Intervals of Urinary Markers for Cerebral Creatine Deficiency Syndromes: A Novel Approach to the Definition of Reference Intervals. Clin. Chem. 2015, 61, 760–768.
  41. Zhang, Z.C.; Zhao, X.; Dong, G.; Zhao, X.M. Improving Alzheimer’s Disease Diagnosis With Multi-Modal PET Embedding Features by a 3D Multi-Task MLP-Mixer Neural Network. IEEE J. Biomed. Health Inform. 2023, 27, 4040–4051.
  42. Xgboost Developers. XGBoost. Available online: https://pypi.org/project/xgboost/ (accessed on 12 March 2023).
  43. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; pp. 3146–3154.
  44. Antoniadi, A.M.; Du, Y.; Guendouz, Y.; Wei, L.; Mazo, C.; Becker, B.A.; Mooney, C. Current Challenges and Future Opportunities for XAI in Machine Learning-Based Clinical Decision Support Systems: A Systematic Review. Appl. Sci. 2021, 11, 5088.
  45. Ghanvatkar, S.; Rajan, V. Evaluating Explanations From AI Algorithms for Clinical Decision-Making: A Social Science-Based Approach. IEEE J. Biomed. Health Inform. 2024, 28, 4269–4280.
  46. Budde, K.; Dasch, T.; Kirchner, E.; Ohliger, U.; Schapranow, M.; Schmidt, T.; Schwerk, A.; Thoms, J.; Zahn, T.; Hiltawsky, K. Künstliche Intelligenz: Patienten im Fokus. Dtsch. Ärzteblatt 2020, 117, A2407. Available online: https://www.aerzteblatt.de/archiv/216998/Kuenstliche-Intelligenz-Patienten-im-Fokus (accessed on 12 March 2023).
  47. Arnold, M. Teasing out Artificial Intelligence in Medicine: An Ethical Critique of Artificial Intelligence and Machine Learning in Medicine. J. Bioethical Inq. 2021, 18, 121–139.
  48. Zaunseder, E.; Mütze, U.; Okun, J.G.; Hoffmann, G.F.; Kölker, S.; Heuveline, V.; Thiele, I. Personalized metabolic whole-body models for newborns and infants predict growth and biomarkers of inherited metabolic diseases. Cell Metab. 2024, 36, 1882–1897.e7.
Figure 1. False-positive newborn screening results for GA1. Sex-specific differences in false-positive newborn screening results for GA1 from 2014 to 2021.
Figure 2. Further analysis of 485 false-positive screening results in GA1. (A) Reports of kidney insufficiency (n = 10) and transfusion, medication, or sepsis (n = 23) in false-positive newborn screening results for GA1. (B) Evaluation of urinary 3-OH-GA analysis in false-positive newborn screening results for GA1 including 34 (7%) newborns with elevated 3-OH-GA.
Figure 3. Detailed analysis of classification by LR algorithm of subgroups of false-positive newborn screening results in the training and validation data set (n = 485) and the test data set (n = 235) for GA1 for newborns with kidney insufficiency (n = 10 patients with kidney insufficiency in the training and validation data set, n = 3 patients with kidney insufficiency in the test data set).
Figure 4. Detailed analysis of classification by LR algorithm of subgroups of false-positive newborn screening results in the training and validation data set (n = 485) and the test data set (n = 235) for GA1. Evaluation for newborns with elevated urinary 3-OH-GA in the training and validation data set (n = 34) and in the test data set (n = 5).
Table 1. ANOVA results for features with a p-value of p < 0.05, showing the mean (μ) and standard deviation (σ) of these features in the normal and GA1 NBS profiles. All methods were applied to the full newborn screening (NBS) data set (A) and the suspected diagnosis data set (B). Shown are the five features with the largest F values, with a binary target variable (normal or GA1). Abbreviations: C10, decanoylcarnitine; C12, dodecanoylcarnitine; C14:1, tetradecenoylcarnitine; C18:1, octadecenoylcarnitine; C5, isovalerylcarnitine; C8, octanoylcarnitine; Hci, homocitrulline; Glut, glutarylcarnitine; Glu, glutamic acid.

(A) Full NBS Data Set
Feature | Normal (μ ± σ) | GA1 (μ ± σ) | F Value
Glut | 0.2 ± 0.1 | 2.7 ± 1.5 | 17,270.68
Hci | 1.6 ± 0.8 | 2.7 ± 1.4 | 17.4
C5 | 0.1 ± 0.1 | 0.2 ± 0.1 | 13.2
Glu | 400 ± 105 | 524 ± 108 | 12.6
C18:1 | 1 ± 0.3 | 1.2 ± 0.4 | 10.1

(B) Suspected Diagnosis Data Set
Feature | Normal (μ ± σ) | GA1 (μ ± σ) | F Value
Glut | 0.5 ± 0.1 | 2.7 ± 1.5 | 757.7
C10 | 0.2 ± 0.1 | 0.1 ± 0 | 10.1
C14:1 | 0.3 ± 0.1 | 0.1 ± 0.1 | 5.7
C8 | 0.2 ± 0.1 | 0.1 ± 0.1 | 5.6
C12 | 0.3 ± 0.1 | 0.1 ± 0.1 | 5.3
Table 2. Machine learning classification results for GA1, comparing (A) traditional screening results, (B) the full data set, (C) the suspected diagnosis data set, and (D) algorithms optimized for 100% sensitivity on the suspected diagnosis data set. The best-performing features were selected with ANOVA. The methods were evaluated by sensitivity (S_n) and specificity (S_p), calculated from the mean results of ten repeats of 5-fold cross-validation (CV), as well as by the number of false positives (FP), false negatives (FN), true negatives (TN), and true positives (TP) (real numbers are rounded up). For (A,B), the train and validation set consists of 1,025,944 normal and 9 GA1 profiles and the test set consists of 257,414 normal and 2 GA1 profiles. For (C,D), the train and validation set consists of 485 normal and 9 GA1 profiles and the test set consists of 235 normal and 2 GA1 profiles. For (C,D), the specificity and sensitivity were calculated based on the full data set to allow comparability between data sets. Abbreviations: C10, decanoylcarnitine; C14:1, tetradecenoylcarnitine; Glut, glutarylcarnitine; M, million.

Panel | Method | Features | Train + Validation FN | Train + Validation FP | Train + Validation TN | Train + Validation TP | CV S_n (%) | CV S_p (%) | Test FN | Test FP | Test TN | Test TP
(A) Traditional newborn screening | NBS | Glut | 0 | 485 | 1.03 M | 9 | 100 | 99.9527 | 0 | 235 | 0.26 M | 2
(B) Full data set | LR | Glut, C10 | 0 | 31 | 1.03 M | 9 | 99.11 | 99.996 | 0 | 16 | 0.26 M | 2
(B) Full data set | RR | Glut, C10, C14:1 | 3 | 735 | 1.03 M | 6 | 92.44 | 99.25 | 1 | 71 | 0.26 M | 1
(B) Full data set | SVM | Glut, C10 | 0 | 87 | 1.03 M | 9 | 92.44 | 99.999 | 0 | 23 | 0.26 M | 2
(C) Suspected diagnosis data set | LR | Glut, C10 | 0 | 24 | 461 | 9 | 86.67 | 99.999 | 0 | 18 | 217 | 2
(C) Suspected diagnosis data set | RR | Glut, C10 | 0 | 69 | 416 | 9 | 84.67 | 99.997 | 0 | 35 | 200 | 2
(C) Suspected diagnosis data set | SVM | Glut, C10 | 0 | 33 | 452 | 9 | 90.89 | 99.998 | 0 | 20 | 215 | 2
(D) Suspected diagnosis data set optimized (100% S_n) | LR-100 | Glut, C10 | 0 | 147 | 338 | 9 | 100 | 99.989 | 0 | 115 | 120 | 2
(D) Suspected diagnosis data set optimized (100% S_n) | RR-100 | Glut, C10 | 0 | 235 | 250 | 9 | 100 | 99.981 | 0 | 146 | 89 | 2
(D) Suspected diagnosis data set optimized (100% S_n) | SVM-100 | Glut, C10 | 0 | 164 | 321 | 9 | 100 | 99.987 | 0 | 123 | 112 | 2
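The false-positive reductions quoted in the Discussion can be recomputed from the FP counts reported in Table 2, taking traditional NBS (485 false positives in training and validation, 235 in the test set) as the baseline. The short sketch below shows this arithmetic; the function name and the dictionary layout are illustrative choices, not part of the original analysis code.

```python
def fp_reduction(fp_method: int, fp_baseline: int) -> float:
    """Percentage reduction in false positives relative to a baseline."""
    return 100.0 * (1.0 - fp_method / fp_baseline)


# FP counts as reported in Table 2 (baseline: traditional NBS).
baseline_train, baseline_test = 485, 235
cases = {
    "LR, full data set (train + validation)": (31, baseline_train),
    "LR, suspected diagnosis data set (train + validation)": (24, baseline_train),
    "LR-100 (train + validation)": (147, baseline_train),
    "LR-100 (test)": (115, baseline_test),
}

reductions = {name: fp_reduction(fp, base) for name, (fp, base) in cases.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}%")
# LR, full data set (train + validation): 93.61%
# LR, suspected diagnosis data set (train + validation): 95.05%
# LR-100 (train + validation): 69.69%
# LR-100 (test): 51.06%
```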
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zaunseder, E.; Teinert, J.; Boy, N.; Garbade, S.F.; Haupt, S.; Feyh, P.; Hoffmann, G.F.; Kölker, S.; Mütze, U.; Heuveline, V. Digital-Tier Strategy Improves Newborn Screening for Glutaric Aciduria Type 1. Int. J. Neonatal Screen. 2024, 10, 83. https://doi.org/10.3390/ijns10040083
