1. Introduction
Focal segmental glomerulosclerosis (FSGS) is a complex and heterogeneous glomerular disease that poses significant challenges in diagnosis, treatment, and long-term management. Characterized by scarring of the kidney’s filtering units, FSGS can lead to progressive kidney dysfunction and, if left untreated, may result in end-stage renal disease. The disease’s multifactorial nature, coupled with its variable clinical presentations, makes it particularly difficult to manage effectively without personalized treatment plans. Traditional clinical decision-making in FSGS often relies on a combination of patient history, laboratory results, and expert judgment, but such approaches can be prone to inconsistencies and delayed interventions.
To address these challenges, the integration of machine learning (ML) and expert systems offers promising solutions in nephrology. In particular, the development of modular decision support systems can significantly enhance the ability to predict disease progression, recommend tailored treatment strategies, and monitor patient outcomes in real time. By leveraging large volumes of clinical data—ranging from laboratory test results to biopsy findings—these systems can assist clinicians in making evidence-based decisions that improve patient care and outcomes.
This paper focuses on a specific module within a larger modular expert system designed for nephrology: the decision support module for focal segmental glomerulosclerosis. The system was given the working name FSGS Nephro Decision Support System (FNDSS). The module utilizes AI-driven models and expert-guided algorithms to streamline the management of FSGS. By integrating clinical data, patient-specific biomarkers, and dynamic treatment protocols, it enables timely risk stratification, precise treatment recommendations, and continuous monitoring of treatment efficacy. This approach not only helps in managing the disease more effectively but also contributes to a more personalized, patient-centered care model.
The following sections delve into the design and functionality of the FSGS decision support module, exploring its key components, including risk classification, treatment guidance, and response monitoring. Through a detailed analysis of these elements, we aim to demonstrate how such a system can significantly enhance the clinical management of FSGS, ultimately leading to better patient outcomes and more efficient healthcare delivery.
The contributions of this work are twofold. First, we present the design and implementation of the FSGS Nephro Decision Support System, highlighting its innovative approach to integrating machine learning models with expert algorithms to support clinical decision-making in FSGS. Second, we provide an in-depth evaluation of the system’s performance in predicting disease progression, guiding treatment strategies, and monitoring patient outcomes, demonstrating its potential to improve clinical practice in nephrology. By leveraging advanced ML techniques and personalized data, this work paves the way for more efficient, precise, and individualized care for patients suffering from FSGS.
2. State of the Art and Related Works
Recent advancements in nephrology have demonstrated the increasing role of expert-guided systems, machine learning models, and artificial intelligence in enhancing diagnostic accuracy, predicting outcomes, and supporting clinical decision-making. A notable application of rule-based AI systems, particularly those utilizing fuzzy logic, is in predicting chronic kidney disease (CKD). For instance, one such system achieved high accuracy (92.13%) and sensitivity (95.37%) in predicting CKD, showcasing the potential of AI to improve early diagnosis and patient outcomes [1].
Other studies have leveraged machine learning to uncover key biomarkers and predictors in focal segmental glomerulosclerosis. One study applied ML to plasma metabolomic profiling, identifying dysmetabolism in the sphingomyelin–ceramide axis and plasmalogen metabolites as markers for FSGS. ML models, including logistic regression and random forests, were used to stratify these biomarkers based on CKD causes [2].
Another study combined clinical, genetic, and pathology data using ridge regression to predict FSGS outcomes. ML models showed excellent discrimination (iAUC = 0.95) and identified risk and protective factors, such as high-risk APOL1 genotype and serum albumin levels [3]. Further research identified NR4A1 and DUSP1 as immune-related biomarkers through gene expression profiling and ML, revealing insights into the immune mechanisms behind FSGS [4].
A separate study utilized ML to analyze histopathologic features from biopsies, uncovering novel descriptors predictive of outcomes in MCD/FSGS. This highlights ML’s role in improving biopsy reporting and prognosis prediction [5]. Finally, a study discovered ApoA-Ib, a misprocessed form of ApoA-I, as a potential urinary biomarker for recurrent FSGS, offering new molecular insights [6].
Machine learning models have also been employed to predict the progression of kidney diseases such as diabetic kidney disease (DKD) and membranous nephropathy. For instance, a machine learning risk score derived from biomarker data and electronic health records demonstrated strong predictive power for the progression of DKD, aiding in early intervention and personalized treatment plans [7]. Similarly, a fuzzy expert system for diagnosing primary membranous nephropathy showed a high sensitivity of 98%, accuracy of 97.8%, and an area under the curve (AUC) of 0.93, suggesting its robustness in clinical settings [8].
The integration of ML with expert systems (AL) has proven particularly effective in predicting kidney disease progression. For example, a study combining fuzzy logic with AI algorithms to predict CKD using clinical indicators such as age, blood pressure, and serum creatinine levels yielded favorable diagnostic outcomes [9]. In addition, fuzzy logic-based clinical decision support systems (CDSSs) have been explored in post-transplant renal function monitoring, achieving over 90% accuracy in assessing renal health and optimizing drug dosages [10]. The application of such expert-guided AI (AL) solutions in analyzing complex medical data is further demonstrated by a predictive model for kidney disease based on symptoms, where a fuzzy soft expert system (AL) showed high reliability and efficiency, emphasizing its potential for early detection and treatment optimization [11]. Meanwhile, advancements in deep learning (DL) have led to automated systems for detecting kidney stones and other nephrological conditions from medical imaging, providing clinicians with a powerful diagnostic tool [12].
Similarly, Ref. [13] discusses AI applications in dialysis, covering areas such as intradialytic hypotension prediction, anemia management, and treatment optimization. Nevertheless, challenges such as data privacy and model interpretability must be addressed for successful implementation. In CKD management, Ref. [14] explores AI’s applications in continuous kidney replacement therapy (CKRT), emphasizing the importance of accurate data handling, ethical considerations, and prospective validation. Moreover, Ref. [15] advocates cautious optimism for AI in acute kidney injury (AKI) care, highlighting the need for rigorous evaluation and unbiased model development to ensure clinical effectiveness.
The integration of artificial intelligence and machine learning in nephrology diagnostics is further evidenced by studies such as [16], which address challenges in developing robust risk prediction models. These challenges include the need for high-quality data and comprehensive performance measurement strategies. Additionally, Ref. [17] highlights the potential biases in AI-driven clinical decisions, proposing strategies to mitigate these biases to ensure fair and equitable AI utilization. In chronic kidney disease (CKD) detection, studies such as [7,18] employ ML models to predict disease progression and enhance risk stratification. Separately, Ref. [19] demonstrates the application of ML in predicting tacrolimus blood concentration, underscoring its potential in personalized medication management.
Furthermore, ML techniques such as support vector machines (SVMs) and ensemble methods have shown significant promise in early CKD detection. Studies like [20,21,22,23] emphasize the role of these methods in improving diagnostic accuracy, enabling timely intervention and better patient outcomes. The integration of these models into clinical practice marks a transformative shift in nephrology care. Accurate measurements are pivotal to the success of predictive models in biomedicine. These measurements are vital for predicting and analyzing laboratory test results in nephrology [24,25,26], as well as for broader applications in biomedicine, including spatial modeling and EEG analysis [27,28,29]. Such data also support decision-making in kidney diseases [30,31,32].
Challenges in the development of predictive models extend beyond diagnostics. Issues such as modeling complexity [33], parameter estimation [34], and healthcare monitoring support systems [35] underscore the need for ongoing advancements in ML methodologies. Addressing these challenges is essential to ensure the successful integration of these technologies into clinical workflows and their reliable application in nephrology.
The study [36] employs ML models to predict short-term prognosis for severe acute kidney injury (AKI) patients undergoing prolonged intermittent renal replacement therapy (PIRRT). By analyzing 493 hospitalized AKI patients, the study identifies key factors, such as electrolyte levels and comorbidities, which influence survival and renal recovery. Various ML algorithms, including Naive Bayes, random forest, and K-nearest neighbors, effectively predicted these outcomes, underscoring the importance of electrolyte management for improving prognosis. Additionally, the work [37] introduces a multiple linear regression model using Sugeno’s fuzzy inference system, a type of rule-based AI, which outperforms traditional methods even with limited datasets. A review also highlights the use of health–disease phase diagrams (HDPDs) for precision medicine, utilizing AI techniques to visualize disease onset probabilities based on biomarkers. HDPDs are identified as a powerful tool for identifying intervention targets and preventing disease onset in many cases [38]. Furthermore, CKD.Net, a hybrid model combining S-MTL, SimpleRNN, and MLP, predicts chronic kidney disease (CKD) stages with remarkable accuracy (99.2–99.8%) and represents a step forward in real-time, non-invasive diagnosis in clinical practice [39]. This highlights the importance of Artificial Neural Network (ANN) applications.
AI also features prominently in medical imaging and diagnostic tools. For example, one study [40] uses ultrasonography to measure kidney volume in children, outperforming traditional methods. Other studies explore ML’s role in detecting biomarkers for Papillary Renal Cell Carcinoma (PRCC) [41] and predicting complications in diabetic kidney disease [42]. The integration of AI with urinalysis for disease diagnosis and treatment is also discussed [43], emphasizing its revolutionary impact on healthcare. Additionally, predictive models for AKI using ML highlight the importance of considering baseline serum creatinine (sCr) levels, as performance varies with different estimation methods [44]. Another study [45] integrates deep learning with 1D-CNNs and LSTM for diagnosing Pancreatic Ductal Adenocarcinoma (PDAC), achieving high accuracy (97%) and AUC (98%) using urine proteomic biomarkers. Similarly, ML models, particularly XGBoost, have been applied to predict end-stage renal disease (ESRD) risk in type 2 diabetes patients using clinical data [46].
AI’s capacity to predict postoperative acute kidney injury (AKI) after cardiothoracic surgery using recurrent neural networks (RNNs), a subset of ANNs, is also explored in [47], demonstrating superior prediction accuracy (AUC of 0.893) compared to clinicians. The integration of such AI models into electronic health records (EHRs) can facilitate real-time patient monitoring and early intervention. The study [48] further underscores the importance of accurate diagnostic information in improving patient outcomes and reducing healthcare costs. An ML model predicting 5-year kidney transplant survival achieved an AUC of 89.7%, showcasing its potential for early detection of graft status [49].
The application of AI continues to expand with automated systems for diagnosing kidney stones from CT images, marking significant advancements in AI-driven medical imaging interpretation [50]. Moreover, studies [51,52] explore smartphone-based systems for diagnosing microalbuminuria and quantifying albuminuria, demonstrating high accuracy across various conditions.
The integration of AI with various healthcare domains is also explored in [53,54]. These studies demonstrate how AI optimizes resource allocation and improves kidney disease diagnosis through advanced algorithms and models. Electrochemical energy mechanisms for early kidney failure detection are explored in [55], showcasing AI’s role in streamlining data analysis and enhancing diagnostic accuracy for preemptive interventions. The literature also emphasizes AI’s potential in supporting clinicians in diagnosing, prognosticating, and treating kidney diseases, and stresses the need for further advancements in AI to address the significant burdens posed by acute kidney injury and chronic kidney disease [56,57]. Several studies [58,59,60] highlight the growing use of AI in diagnosing kidney diseases, especially AKI and CKD, employing various approaches such as machine learning ensembles, deep learning, and federated learning. Medicine, and nephrology in particular, is not the only application area of AI and ML; other applications of these engineering solutions are presented in [61,62,63,64,65].
A compact comparative overview of AI applications in nephrology is presented in tabular form (see Table 1). In sum, artificial intelligence is transforming nephrology by providing more accurate diagnostic tools, predictive models, and decision support systems. As AI continues to evolve, its integration into nephrology promises to improve patient care, enhance diagnostic capabilities, and streamline healthcare practices, marking a significant advancement in the field. These works collectively illustrate the growing influence of expert systems, fuzzy logic, and machine learning in nephrology, with notable applications in risk prediction, diagnosis, and treatment planning. However, they also highlight the need for rigorous validation, ethical considerations, and unbiased model development to ensure the effective and equitable application of AI and ML in nephrology.
4. Research and Project Aimed at Developing Modules of a Classification and Expert System
The expert system for FSGS (FNDSS) is modular, guiding clinicians through key steps of patient management. It incorporates structured protocols for diagnosis, treatment initiation, evaluation, and follow-up, ensuring adherence to clinical standards while allowing flexibility for individual cases. The core modules are described below:
Diagnosis: This module encompasses the initial steps for confirming FSGS and assessing its severity, including:
- Patient history: systematic collection of clinical data to identify risk factors, secondary causes, and symptoms suggestive of FSGS.
- Diagnostic tests: recommendations for laboratory investigations and imaging studies, including assessments of kidney function.
- Kidney biopsy: a decision-making pathway for interpretation of histopathological findings.
- Risk assessment: evaluation of disease progression risk based on clinical indicators.
Management and treatment: This module provides structured guidance for patient care based on disease severity and clinical characteristics:
- Basic management and treatment: emphasizes nephroprotective strategies, including lifestyle modifications, dietary adjustments, and pharmacological interventions.
- Induction therapy with periodic evaluation: proposes immunosuppressive treatments tailored to disease severity, with regular monitoring to assess effectiveness and side effects.
Final classification: at designated intervals, this module evaluates the patient’s response to treatment. Based on clinical markers and outcomes, the system categorizes the disease into remission, partial response, or resistance.
Treatment continuation: this module offers guidance for adjusting or continuing therapy. It includes strategies for maintaining remission, addressing partial responses, and managing relapses or resistance to first-line treatments.
FSGS analysis: This module facilitates an in-depth review of FSGS cases, including analysis of disease patterns, treatment outcomes, and progression trends. It serves as a decision-support tool for complex or atypical cases, ensuring an evidence-based approach.
The expert system’s modular design (see Figure 1) ensures a clear and logical progression through each stage of patient management. A user-friendly graphical interface enhances its utility, presenting clinicians with interactive decision trees, data visualization tools, and step-by-step guidance for diagnosis, treatment, and follow-up. By standardizing processes and incorporating current clinical evidence, the system aims to optimize outcomes for FSGS patients.
In the following sections, each module is discussed in detail, highlighting its methodology, functionality, and integration within the broader framework of FSGS management.
4.1. Diagnosis of FSGS Module
The diagnosis module of the expert system (see Figure 2) is designed to provide a structured framework for the accurate and timely identification of focal segmental glomerulosclerosis in patients.
It integrates various diagnostic elements, including clinical assessment, laboratory results, and histopathological evaluation, to ensure comprehensive disease evaluation. The module’s primary function is to assist clinicians in confirming the diagnosis of FSGS, stratifying the disease severity, and identifying any potential secondary causes or associated risk factors:
Patient history: The first step in the diagnosis process involves a thorough patient history to identify clinical factors suggestive of FSGS. This step is essential as it helps establish a baseline understanding of the patient’s overall health and potential underlying conditions that could predispose them to glomerular diseases. Key elements in the history include:
- Infections: chronic bacterial or viral infections can lead to glomerulopathy and contribute to FSGS.
- Chronic inflammatory diseases: conditions like rheumatoid arthritis can trigger kidney inflammation and glomerular damage.
- Autoimmune disorders: diseases such as systemic lupus erythematosus (SLE) are linked to kidney inflammation and nephritis.
- Cancer: some cancers are associated with secondary glomerulopathies, either through direct kidney involvement or treatment-related nephrotoxicity.
Diagnostic tests: After gathering a comprehensive patient history, the next critical step involves performing diagnostic tests to confirm the presence of FSGS and assess its severity. The system recommends a set of standard tests to evaluate renal function and detect biomarkers indicative of FSGS. These include:
- Kidney function tests: serum creatinine, eGFR, and urea levels to assess kidney function and any decline in filtration capacity.
- Proteinuria assessment: quantification of proteinuria, which is a hallmark feature of FSGS, through 24-h urine collection or urine protein-to-creatinine ratio (PCR).
- Biomarkers: specific biomarkers such as anti-PLA2R antibodies, which can help differentiate between primary FSGS and secondary forms of glomerulonephritis.
Kidney biopsy: Kidney biopsy remains the gold standard for diagnosing FSGS. In cases of atypical presentation or when non-invasive tests yield inconclusive results, a biopsy allows for direct visualization of glomerular changes. The histopathological evaluation typically reveals characteristics such as segmental sclerosis, foot process effacement, and podocyte injury, which are diagnostic of FSGS. The expert system includes guidelines on biopsy interpretation, helping clinicians differentiate between primary and secondary forms of FSGS based on histopathological features, and determining whether further testing for secondary causes, such as viral infections or autoimmune diseases, is warranted.
Risk assessment: An important aspect of the diagnosis module is the evaluation of the patient’s risk for progression to end-stage renal disease (ESRD) or chronic kidney disease stage 5 (CKD5). The system incorporates clinical factors such as proteinuria levels, eGFR, and other relevant biomarkers to assess the likelihood of rapid disease progression.
The diagnosis module is integrated into the broader framework of the expert system, allowing for a smooth progression from initial assessment through to the confirmation of FSGS. An example system screen for biopsy decision support is shown in Figure 3. The system’s modular design ensures that each diagnostic step is logically sequenced, with clear decision points that direct the clinician towards the most appropriate diagnostic test or treatment intervention. The user-friendly interface facilitates the clinician’s workflow, providing step-by-step guidance and immediate feedback based on input data.
4.2. Management and Treatment Module
The management and treatment module of the expert system provides clinicians with structured, evidence-based guidance for treating FSGS [66]. This module is divided into two main sub-modules: (i) basic management and treatment and (ii) induction therapy with periodic evaluation, each offering specific recommendations tailored to the patient’s clinical needs. An overview of the module’s operating principles is presented in Figure 4, while detailed recommendations are described later in this section.
The basic management and treatment sub-module (see Figure 5) emphasizes nephroprotective strategies, targeting both the underlying pathophysiology of FSGS and associated comorbidities. The system recommends a combination of non-pharmacological and pharmacological interventions, detailed in Table 3.
For patients requiring immunosuppressive therapy, the induction therapy with periodic evaluation sub-module outlines strategies for induction therapy, along with protocols for regular evaluation. This includes:
Glucocorticoids: recommendations for high-dose glucocorticoid therapy, including dosing schedules, treatment duration, and tapering strategies, are based on clinical response and tolerance.
Calcineurin inhibitors (CNIs): guidance on the use of cyclosporine or tacrolimus, including dose adjustments based on therapeutic drug monitoring (TDM) to minimize nephrotoxicity and optimize efficacy.
Symptomatic treatment: use of diuretics for edema and antihypertensives to manage blood pressure.
Periodic evaluation: monthly monitoring of clinical and laboratory markers, such as proteinuria, renal function, and blood counts, to assess therapeutic response and detect adverse effects.
Table 4 summarizes the key elements of induction therapy.
The management and treatment module integrates clinical guidelines with an interactive decision-support framework, enabling clinicians to tailor treatment strategies to individual patients. By addressing both nephroprotective and immunosuppressive interventions, the module ensures comprehensive care for FSGS patients.
The next part of the article describes disease state classification in periodic assessments using AI and ML tools.
4.3. Final Classification After Treatment Module
This section provides a detailed overview of the classification process used to evaluate the final outcomes after six months of treatment for FSGS. This module plays a critical role in categorizing patients into seven distinct outcome classes based on thirteen clinical features. The classification system operates as a Multi-Input Multi-Output (MIMO) framework, with 13 input features informing the assignment to one of the 7 outcome classes. This design ensures a comprehensive evaluation and accurate monitoring of treatment efficacy while guiding subsequent therapeutic decisions.
4.3.1. Input–Output MIMO Framework
The seven distinct outcome classes are defined by clinical markers such as proteinuria, serum albumin, and serum creatinine levels, as well as the patient’s response to treatment. These categories are described in Table 5. Expert knowledge from the literature [66] was used to specify explicitly the thresholds on feature ranges that determine class membership; the data were then manually divided for training and testing.
The classification model operates using a MIMO structure. The inputs include clinical data collected at six months post-treatment, such as proteinuria levels, serum albumin concentration, and serum creatinine levels. The output is the assigned outcome category, which reflects the patient’s response to treatment.
Figure 6 illustrates the classification model’s architecture, including the inputs and outcome categories. The classification operates within a MIMO structure, where 13 clinical features serve as input variables and the system generates one of seven output categories. The features encompass critical markers of renal function, disease activity, and therapeutic response. The system relies on predefined clinical thresholds to ensure consistency with nephrology guidelines.
The MIMO structure ensures flexibility and robustness in the classification process. The 13 inputs were evaluated using predefined thresholds from Table 5, which enabled the manual classification of training and test data for model building. The outputs were mapped to the appropriate output class. The MIMO structure enables the system to handle complex relationships between inputs while providing clear, actionable classifications for clinicians.
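The threshold-based labeling described above can be illustrated with a minimal sketch. It assumes that each class is defined by a set of per-feature ranges and that a record receives the first class whose ranges all match. The class names, the two features shown, and every numeric bound are placeholders invented for illustration; they are not the values of Table 5 and carry no clinical meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Half-open interval [low, high) for one feature (hypothetical helper)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value < self.high


# PLACEHOLDER rules: class names and bounds are invented for this sketch and
# are NOT the Table 5 thresholds. A full rule set would cover 13 features and
# 7 classes.
RULES = {
    "class_A": {"proteinuria": Range(0.0, 1.0), "serum_albumin": Range(5.0, 10.0)},
    "class_B": {"proteinuria": Range(1.0, 5.0), "serum_albumin": Range(2.0, 10.0)},
}


def assign_class(record, rules=RULES):
    """Return the first class whose feature ranges all contain the record's values."""
    for label, feature_ranges in rules.items():
        if all(rng.contains(record[name]) for name, rng in feature_ranges.items()):
            return label
    return None  # no rule matched; such a record would need manual review


label_a = assign_class({"proteinuria": 0.5, "serum_albumin": 6.0})
label_b = assign_class({"proteinuria": 2.0, "serum_albumin": 3.0})
label_none = assign_class({"proteinuria": 9.0, "serum_albumin": 1.0})
```

Returning `None` for unmatched records, rather than forcing a class, is a design choice of this sketch that keeps gaps in the rule set visible.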
The classification system supports clinicians in evaluating treatment efficacy and determining the next steps in patient management. By automating the classification process, the system reduces variability and improves decision-making consistency.
The integration of interactive elements allows clinicians to view detailed explanations of each category and adjust management plans accordingly. Clinicians input patient data, and the system automatically categorizes the patient into one of the seven classes. The results are displayed on an intuitive interface, highlighting the assigned outcome category, relevant input data supporting the classification as well as recommendations for further management based on the classification.
Section 4.3.4 describes the application interface, showcasing the module’s design and functionality. The subsequent sections provide insights into the machine learning models employed for predictive analysis and the system’s testing and validation results.
4.3.2. The Process of Data Preparation and Model Training
In this section, we provide a detailed account of the data preparation process, the steps taken to train the classification model, and the evaluation methodology used in the study of FSGS patient classification. The classification framework, as outlined in Section 4.3.1, relied on a robust dataset generated to reflect the seven outcome categories defined in Table 5.
The dataset was prepared to encompass a comprehensive representation of FSGS outcomes, ensuring equal distribution across all seven categories. To ensure the model’s robustness and to address potential class imbalances, data augmentation techniques were employed. These techniques aimed to generate new synthetic data points based on the distribution of the original dataset, thereby improving the model’s generalization capabilities. Synthetic data generation was performed in Python v24.0, adhering to the clinical thresholds and characteristics specified in Table 5. For each class, 200 data records were generated, resulting in a total dataset of 1400 instances. Each instance consisted of 13 clinical features, including key markers such as proteinuria, serum albumin, and serum creatinine levels.
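A balanced synthetic dataset of this shape (7 classes, 200 records each, 13 features) could be generated along the following lines. This is only a sketch under stated assumptions: the sampling scheme (uniform draws within per-class ranges) and the ranges themselves are hypothetical stand-ins, since the actual generation procedure and the Table 5 thresholds are not reproduced here.

```python
import numpy as np

N_CLASSES = 7            # outcome classes (from the text)
N_FEATURES = 13          # clinical features per record (from the text)
RECORDS_PER_CLASS = 200  # records generated per class (from the text)


def generate_synthetic_dataset(seed=0):
    """Draw RECORDS_PER_CLASS samples per class from per-class feature ranges."""
    rng = np.random.default_rng(seed)
    features, labels = [], []
    for class_id in range(N_CLASSES):
        # Hypothetical per-class feature ranges; NOT the Table 5 thresholds.
        low = rng.uniform(0.0, 1.0, size=N_FEATURES) + class_id
        high = low + rng.uniform(0.5, 1.5, size=N_FEATURES)
        # low/high broadcast across rows, so each column has its own range.
        samples = rng.uniform(low, high, size=(RECORDS_PER_CLASS, N_FEATURES))
        features.append(samples)
        labels.append(np.full(RECORDS_PER_CLASS, class_id))
    return np.vstack(features), np.concatenate(labels)


X, y = generate_synthetic_dataset()
print(X.shape)  # (1400, 13)
```

Fixing the random seed makes the generated dataset reproducible between runs.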
The process began with data preprocessing, where missing values in the feature columns (excluding the target variable) were handled using median imputation. This was performed with the SimpleImputer, ensuring that no feature contained null values. The target variable was then encoded, converting the categorical labels into numeric form suitable for model training. Next, the features were standardized so that each had a mean of zero and a standard deviation of one, which aids the convergence of many machine learning algorithms.
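The preprocessing steps above can be sketched with scikit-learn as follows. Only `SimpleImputer` is named in the text; the use of `LabelEncoder` and `StandardScaler` for the encoding and standardization steps is an assumption of this sketch, as are the toy data and the `preprocess` helper.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(X_raw, y_raw):
    """Impute missing features, encode labels, and standardize features."""
    # Median imputation of missing feature values (named in the text).
    X_imputed = SimpleImputer(strategy="median").fit_transform(X_raw)
    # Label encoding of the categorical target (assumed: LabelEncoder).
    encoder = LabelEncoder()
    y_encoded = encoder.fit_transform(y_raw)
    # Standardization to zero mean and unit variance (assumed: StandardScaler).
    X_scaled = StandardScaler().fit_transform(X_imputed)
    return X_scaled, y_encoded, encoder


# Toy data for illustration only: two features, one missing value.
X_raw = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
y_raw = np.array(["remission", "resistance", "remission", "partial"])

X_scaled, y_encoded, encoder = preprocess(X_raw, y_raw)
```

When a separate test set is held out, the imputer and scaler are usually fitted on the training split only and then applied to the test split, so that no test-set statistics leak into training.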
Each model was evaluated using 5-fold cross-validation to estimate its performance on the training set. Accuracy was calculated for each fold, and the mean accuracy score was stored for comparison. Once the cross-validation was complete, each classifier was trained on the entire training set using the fit method, and predictions were made on the test set.
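The evaluation loop described above might look like the following sketch. It assumes scikit-learn's `cross_val_score` for the per-fold accuracy, uses `make_classification` as a stand-in for the study's dataset, picks an 80/20 stratified split, and shows only two of the compared model families; all of these choices are illustrative assumptions rather than the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Stand-in data with the same shape as the study's dataset (1400 x 13, 7 classes).
X, y = make_classification(
    n_samples=1400, n_features=13, n_informative=8,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
# The 80/20 stratified split is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

cv_means = {}
test_predictions = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy on the training set only.
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_means[name] = scores.mean()
    # Refit on the entire training set, then predict the held-out test set.
    model.fit(X_train, y_train)
    test_predictions[name] = model.predict(X_test)
```

Keeping the test set out of cross-validation means that `cv_means` and the test-set predictions give two separate views of generalization.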
To assess the performance of each model, a classification report was generated, providing precision, recall, and F1-score for each class, as well as the weighted averages of these metrics. In addition, confusion matrices were calculated and visualized to display the true positives, false positives, true negatives, and false negatives for each model, giving further insight into the models’ behavior and areas where they may struggle. Finally, to facilitate model comparison, precision, recall, and F1-score for each class were visualized in bar charts, allowing for a clear side-by-side comparison of model performance across the different classes. This detailed approach helped identify the strengths and weaknesses of each model and provided insights into the data and features that contributed most to accurate predictions.
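A classification report and a confusion matrix of the kind described above can be obtained with scikit-learn's `classification_report` and `confusion_matrix`. The sketch below uses a small invented three-class example rather than the study's predictions; in a multi-class setting, the true and false positives and negatives for a given class are read from the matrix in a one-vs-rest fashion.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Invented labels for illustration: three classes, seven samples.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

# Per-class precision, recall, and F1-score plus macro and weighted averages.
report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# One-vs-rest counts for class 1, derived from the confusion matrix.
tp = cm[1, 1]
fp = cm[:, 1].sum() - tp
fn = cm[1, :].sum() - tp
tn = cm.sum() - tp - fp - fn
```

For plotting, `sklearn.metrics.ConfusionMatrixDisplay` can render `cm` directly; it is omitted here to keep the sketch free of a plotting dependency.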
Table 6 details the settings for each model used during development.
The training data were based on predefined thresholds specified in Table 5 and Figure 6 for initial data labeling and validation of system performance against clinical standards. Specific model performance results and algorithm evaluation are presented in the next section.
Section 5 presents a detailed evaluation of the integration of this classification model into the broader framework of the proposed decision support system for FSGS.
4.3.3. Achieved Results
The precision scores for the evaluated machine learning models, shown in Figure 7, highlight key differences in classification performance. Precision, which measures the proportion of true positive predictions among all positive predictions, is crucial for minimizing false positives, particularly in clinical applications. Here is a summary of the results:
Top performers: Bagging, LightGBM, logistic regression, and random forest achieved the highest precision scores (0.91 to 0.93), demonstrating robust classification performance.
Boosting algorithms: XGBoost, gradient boosting, and CatBoost showed strong performance, with precision scores ranging from 0.90 to 0.91. AdaBoost performed slightly worse, with a precision score of around 0.89, suggesting that it may need further tuning for this dataset.
Traditional and linear models: SVM, decision trees, and linear SVC demonstrated reliable performance, achieving precision scores of around 0.92.
Lower performers: the MLP classifier and Naive Bayes achieved precision scores of around 0.89, suggesting limitations in handling the dataset’s structure.
Ensemble methods, especially Bagging and LightGBM, excelled in precision, making them suitable for clinical decision-making tasks. In contrast, the somewhat weaker performance of K-nearest neighbors and other models highlights the importance of proper model selection and parameter optimization.
The recall scores for the machine learning models, depicted in Figure 8, provide insights into the ability of each model to identify true positive cases among all actual positives. High recall is essential in clinical settings to minimize false negatives, ensuring critical conditions are not overlooked. The results were as follows:
Top performers: Bagging and LightGBM achieved the highest recall scores (0.91 to 0.93), reflecting superior sensitivity in detecting positive cases.
Consistently high recall: random forest, logistic regression, SVM, and gradient boosting showed strong recall scores (0.91 to 0.90), indicating reliable detection of positive cases across different classes.
Moderate performance: K-nearest neighbors (KNN), AdaBoost, and the MLP classifier achieved recall scores of around 0.87 to 0.89, suggesting moderate effectiveness in identifying true positives.
The highest recall scores were achieved by ensemble models such as LightGBM and Bagging (and several other models), highlighting their suitability for applications requiring high sensitivity. The slightly lower performance of KNN and Naive Bayes suggests the need for further optimization or alternative strategies to improve their recall capability. These results highlight the importance of balancing recall with other metrics for comprehensive model evaluation.
The F1-score, presented in Figure 9, combines precision and recall into a single metric (their harmonic mean), providing a balanced measure of a model’s accuracy in both identifying true positives and avoiding false positives. It is particularly useful when dealing with imbalanced datasets. A summary of the results is as follows:
Top performer: LightGBM achieved the highest F1-score (around 0.93), demonstrating exceptional balance between precision and recall.
Strong contenders: models such as random forest, logistic regression, SVM, gradient boosting, and CatBoost all achieved high F1-scores in the range from 0.91 to 0.90.
Moderate scores: AdaBoost, K-nearest neighbors (KNN), and Naive Bayes showed lower scores (from around 0.88 to 0.87), suggesting moderate trade-offs in precision and recall.
The highest F1-scores were achieved by ensemble models such as LightGBM and random forest (and several other models), confirming their robustness for applications requiring a balanced trade-off between precision and recall. The suboptimal performance of KNN, Naive Bayes, and AdaBoost suggests that these models may require further optimization to improve their effectiveness in handling complex data. These observations reinforce the utility of ensemble methods in delivering better overall performance.
The confusion matrices, presented in Figure 10, provide detailed insights into the classification performance of the evaluated machine learning models. By illustrating the distribution of true positives, false positives, true negatives, and false negatives across all seven outcome classes, the confusion matrices enable a deeper understanding of the models’ strengths and weaknesses. Key observations were as follows:
Random forest: The random forest classifier demonstrated high accuracy across most classes, with minimal misclassifications. Notable challenges included occasional confusion with Class 5, likely due to overlapping clinical features in these categories.
KNN: KNN had slightly more difficulty with accurate predictions, showing misclassification in some classes. These results were consistent with its lower precision, recall, and F1 scores compared to the other models, indicating some limitations of the model in handling complex data distributions.
Bagging: The Bagging classifier showed robust performance, with relatively balanced classification across all classes. Misclassifications were rare and mostly occurred between adjacent classes, reflecting its ability to handle minor ambiguities effectively.
LightGBM: LightGBM achieved the most accurate predictions, with the confusion matrix showing strong diagonal dominance, indicating excellent classification performance. Misclassifications were minimal.
Misclassifications between specific classes (see Figure 10) are particularly concerning in clinical contexts, as these distinctions inform treatment strategies. The confusion observed in KNN’s matrix highlights the importance of selecting models that prioritize precision and recall in critical clinical categories. The confusion matrix analysis confirmed that some ensemble models, such as Bagging and LightGBM, outperformed less robust methods such as KNN. This strengthens the conclusion that some ensemble techniques are better suited to classifying FSGS outcomes, offering greater accuracy and reliability in real-world clinical applications.
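To make the reading of a confusion matrix concrete, the following sketch builds one from label lists and extracts the most frequent off-diagonal cells, that is, the class pairs a model confuses most often. It is an illustration only: the helper names and the toy labels are assumptions for this example and are not taken from Figure 10.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def top_confusions(matrix, labels, n=3):
    """Return the n most frequent (true, predicted, count) off-diagonal cells."""
    cells = [
        (labels[i], labels[j], matrix[i][j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)[:n]


# Toy labels for illustration only (not study data).
labels = [3, 4, 5]
y_true = [3, 3, 3, 3, 5, 5, 5, 4]
y_pred = [3, 3, 5, 5, 5, 5, 3, 4]

matrix = confusion_matrix(y_true, y_pred, labels)
confusions = top_confusions(matrix, labels)
# Share of samples on the diagonal, i.e. overall accuracy.
diagonal_share = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)

for row in matrix:
    print(row)
print(confusions, diagonal_share)
```

Strong diagonal dominance corresponds to a diagonal share close to 1, while a short list of recurring off-diagonal pairs points to the classes whose clinical features overlap.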
This part provides an overview of the cross-validation results for the machine learning models designed for classifying FSGS outcomes. Cross-validation was performed using 5-fold splitting to ensure a robust estimation of model performance. Each model’s mean accuracy score across the folds is presented in Table 7. The results highlight key insights into the models’ generalization capabilities and reliability.
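The 5-fold procedure can be illustrated with a short, self-contained sketch. Several details here are assumptions rather than facts from the study: the folds are shuffled with a fixed seed and are not stratified, the classifier is a simple nearest-centroid stand-in instead of the evaluated models, and the data are synthetic.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Hypothetical stand-in model: one mean vector (centroid) per class."""
    grouped = {}
    for row, label in zip(X, y):
        grouped.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in grouped.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def cross_val_accuracy(X, y, k=5, seed=0):
    """Return the mean and standard deviation of accuracy over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return mean(scores), stdev(scores)


# Synthetic, well-separated two-class data for illustration only.
X = [[i * 0.1, i * 0.1] for i in range(10)] + [[10 + i * 0.1, 10 + i * 0.1] for i in range(10)]
y = [0] * 10 + [1] * 10

folds = k_fold_indices(len(X), k=5)
mean_accuracy, std_accuracy = cross_val_accuracy(X, y, k=5)
print(mean_accuracy, std_accuracy)
```

Reporting the standard deviation next to the mean, as in this sketch, is what allows statements about consistency across folds.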
The cross-validation results revealed the following key insights:
Top performers: LightGBM and random forest demonstrated the highest mean accuracy (around 0.93), closely followed by logistic regression and Bagging (both with an accuracy of around 0.92). These models showed strong consistency with low standard deviation, indicating reliable performance across different folds.
Ensemble methods: Bagging, random forest, and LightGBM performed well, confirming the effectiveness of ensemble techniques in handling complex data. These models were not only accurate but also demonstrated robust performance, with small variability across folds.
Moderate performers: Models such as gradient boosting, Extra Trees and CatBoost showed competitive performance with a slight decrease in accuracy compared to the top models (from around 0.87 to 0.88). These results suggest that these models perform well but require further fine-tuning or adjustments.
Traditional and linear models: Logistic regression and SVC achieved mean accuracy scores between 0.89 and 0.92. These models performed reliably but with slightly less accuracy than the ensemble methods.
K-nearest neighbors, AdaBoost, and the MLP classifier also showed decent performance (from 0.85 to 0.87), but their results indicated limitations in handling the dataset’s complexity.
The results suggest that ensemble methods, particularly LightGBM, random forest, and Bagging, offer the best performance for this classification task. Their high accuracy and low variability make them suitable candidates for clinical decision support systems where reliability and precision are crucial. On the other hand, AdaBoost was among the weakest performers, highlighting the importance of selecting and tuning models appropriately for the dataset. The findings also emphasize that traditional methods like logistic regression and SVC can perform well in many scenarios, although they may not be as robust as ensemble methods. These insights guided the selection and further optimization of models in the clinical decision-making framework for FSGS treatment outcomes.
4.3.4. Implementation of Results in the System Module
In the final outcome assessment module, the implementation incorporates visualization tools to enhance interpretability and facilitate medical staff’s decision-making process. Figure 11 and Figure 12 represent two examples of key components included in the module interface.
Figure 11 displays the probability distribution for a specific instance across all potential classification outcomes. The system computes and presents the likelihood of each class, allowing medical professionals to assess the confidence associated with the predicted outcome. In this example, the random forest classifier assigned the dominant probability to Class 3 while attributing a smaller probability to Class 5, reflecting a nuanced differentiation between similar categories. The remaining classes received probabilities close to zero, indicating minimal ambiguity in the classification for these outcomes.
Figure 12 provides insights into the importance of individual features in the classification process, as determined by the random forest algorithm. Features are ranked based on their contribution to the model’s decision-making. The top features (feature12, feature1, and feature2) exhibit the highest importance scores, while lower-ranked features (feature3 and feature4) have minimal impact. Such insights are invaluable for understanding which clinical parameters play a critical role in the predictive model, aiding in interpretability and potential model refinement.
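As an illustration of how such outputs might be prepared for display, the sketch below ranks a class probability vector and a set of feature importances. All numeric values are placeholders invented for this example and are not read from Figure 11 or Figure 12; the function names are likewise hypothetical. In scikit-learn, values of this kind are typically obtained from a fitted random forest through `predict_proba` and `feature_importances_`, although the source does not state which library was used.

```python
def summarize_probabilities(class_probabilities):
    """Pick the predicted class and report how far it leads the runner-up."""
    ranked = sorted(class_probabilities.items(), key=lambda item: item[1], reverse=True)
    (top_class, top_p), (next_class, next_p) = ranked[0], ranked[1]
    return {
        "predicted_class": top_class,
        "confidence": top_p,
        "runner_up": next_class,
        "margin": top_p - next_p,
    }


def rank_features(importances, top_n=3):
    """Return the names of the top_n features by importance, highest first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Placeholder values for illustration only (NOT taken from the paper's figures).
probabilities = {1: 0.01, 2: 0.01, 3: 0.78, 4: 0.02, 5: 0.16, 6: 0.01, 7: 0.01}
importances = {
    "feature1": 0.22,
    "feature2": 0.19,
    "feature3": 0.04,
    "feature4": 0.03,
    "feature12": 0.27,
}

summary = summarize_probabilities(probabilities)
top_features = rank_features(importances)
print(summary)
print(top_features)
```

A small margin between the predicted class and the runner-up is a natural cue for the clinician to review the case more closely.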
Both figures, integrated into the application interface (Figure 13), provide a comprehensive overview to support medical professionals in evaluating system predictions. By presenting both the probabilistic outputs and the underlying feature contributions, the module ensures transparency, allowing domain experts to validate and trust the system’s recommendations in real-world scenarios. This approach enhances the system’s utility by bridging the gap between automated predictions and clinical expertise.
Figure 13 illustrates the user interface of the final outcome assessment module within the application. This graphical window integrates multiple functionalities to support the medical personnel in reviewing and validating the system’s predictions. On the right side of the interface, two key visualizations are displayed. The feature importance analysis bar chart (top-right) highlights the contribution of individual features to the classification process. The class probability distribution chart (bottom-right) provides the likelihood of the prediction for each class, enabling the clinician to assess the confidence of the assigned classification.
On the left side of the interface, direct clinical questions are presented. Below these questions, two options (Yes or No) are provided, allowing the user to confirm or deny, based on the patient’s clinical condition. The system then presents a definition of a given final assessment class along with its justification. The module’s layout is designed for clarity and usability, ensuring that clinicians can interact with the system effortlessly while interpreting predictions in real time.
4.4. Treatment Continuation Module
Once the initial treatment phase is complete and clinical outcomes are assessed, the system provides a clear path for decision-making based on the patient’s response to treatment. The system classifies the patient into one of several categories, which are discussed in Section 4.3.
The treatment continuation module is an integral part of the expert system designed to guide clinicians through the management of patients with FSGS following the initial treatment phase. This module provides evidence-based recommendations on how to proceed with therapy after six months, after assessing whether the patient has achieved remission, is in partial remission, or has experienced a relapse. The module ensures that the treatment plan remains tailored to the patient’s evolving needs. The flowchart of the described approach is presented in Figure 14. The key steps in the workflow are as follows:
Assessment of remission status: The clinician is asked whether the patient has achieved complete or partial remission. If the answer is “Yes”, the system displays further treatment options based on remission status, leading to the recommendations shown in Table 3.
Therapeutic adjustments for relapse: If relapse is detected, the system prompts the clinician to choose from alternative treatment regimens. These may include second-line therapies such as calcineurin inhibitors or more intensive immunosuppressive treatments.
Disease resistance to first-line treatment: In cases where steroid resistance is observed, the system provides guidance on potential second-line therapies. Options include calcineurin inhibitors, rituximab, or other immunosuppressive agents.
Monitoring and follow-up: The system integrates periodic follow-up assessments to ensure that the treatment remains effective. It adjusts recommendations based on the patient’s clinical response over time, with regular monitoring of key parameters such as proteinuria, serum creatinine, and albumin.
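The branching described in these steps can be sketched as a small routing function. This is an illustrative outline only: the `Status` enumeration, the function name, and the message wording are assumptions made for the example, the therapy options are limited to those named in the text above, and the actual recommendations of the module (Table 3 and Figure 14) are not reproduced here.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical labels for the clinician's answer about treatment response."""
    COMPLETE_REMISSION = "complete remission"
    PARTIAL_REMISSION = "partial remission"
    RELAPSE = "relapse"
    STEROID_RESISTANCE = "steroid resistance"


# Parameters the text lists for periodic follow-up.
MONITORED_PARAMETERS = ("proteinuria", "serum creatinine", "albumin")


def next_step(status):
    """Map the reported status to the branch of the workflow that handles it."""
    if status in (Status.COMPLETE_REMISSION, Status.PARTIAL_REMISSION):
        return f"Show continuation options for {status.value} (see Table 3)."
    if status is Status.RELAPSE:
        return (
            "Prompt for an alternative regimen: calcineurin inhibitors "
            "or more intensive immunosuppressive treatment."
        )
    if status is Status.STEROID_RESISTANCE:
        return (
            "Offer second-line options: calcineurin inhibitors, rituximab, "
            "or other immunosuppressive agents."
        )
    raise ValueError(f"Unknown status: {status!r}")


for status in Status:
    print(status.value, "->", next_step(status))
print("Follow-up monitoring:", ", ".join(MONITORED_PARAMETERS))
```

Keeping the routing in one function makes each branch easy to review against the flowchart and to update when recommendations change.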
The treatment continuation decision process is guided by an interactive interface (see Figure 15), where clinicians are prompted with specific questions based on the patient’s progress. Clinicians must provide answers regarding the patient’s remission status, after which the system proposes an appropriate treatment strategy. The interface is designed to allow easy navigation between different treatment options, providing detailed information about each one.
4.5. FSGS Analysis Module
The FSGS analysis module (see Figure 16) serves as a comprehensive diagnostic tool for categorizing and analyzing the various forms of FSGS. This module is designed to provide clinicians with detailed insights into the genetic and secondary causes of FSGS, enabling precise identification and treatment planning. The system offers two primary categories for analysis: genetic FSGS and secondary FSGS.
The module presents a user-friendly interface where clinicians can select between different forms of FSGS to obtain detailed information. Upon selection, the system provides an in-depth explanation of each category, including key clinical considerations, diagnostic indications, and treatment implications. The two main categories of FSGS analyzed in this module are as follows:
Genetic FSGS: This category focuses on inherited forms of FSGS, which include familial, sporadic, and syndromic variants. The system provides guidance on genetic testing, appropriate clinical indications, and considerations for clinical management.
Secondary FSGS: This category includes FSGS arising from secondary causes such as infections, medication-induced damage, and adaptive changes associated with glomerular hypertension. The module outlines various conditions that lead to secondary FSGS, including viral infections, certain medications, and systemic diseases.
The FSGS analysis module integrates seamlessly into the broader clinical decision-making process. By providing detailed and structured information about the genetic and secondary causes of FSGS, it supports clinicians in making informed decisions regarding the need for genetic testing or further investigation into secondary causes.
The next section discusses the research results in the context of the decision support and classification system as a whole.
5. Practical Application Research and Discussion of Results
5.1. Material and Methods
The effectiveness and efficiency of the application were analyzed using the database of patients with focal segmental glomerulosclerosis and nephrotic syndrome who were diagnosed and treated at the University Clinical Hospital in Opole from 2012 to 2023. Records of 181 patients were reviewed; because of incomplete documentation, data from 127 patients were included in the analysis. The decision-making process undertaken by the physicians was examined against recommendations from scientific societies such as the International Society of Nephrology, the European Renal Association, and the Polish Society of Nephrology, as well as specialized textbooks. Subsequently, a similar decision-making process was carried out using the application, which analyzed ordered laboratory tests, the results of histopathological examinations of kidney biopsies, anthropometric data, risk factors for the development of cardiovascular and oncological diseases, and potential complications associated with the treatment applied, particularly immunosuppression. An evaluation of treatment methods, dosing of individual medications, and further monitoring of patients during hospitalization and in the nephrology outpatient clinic was also conducted.
5.2. Results
Based on the recorded data, errors in decision-making were identified in 17 out of 127 patients. Errors concerned the initial diagnostic and treatment processes in 7 out of 127 patients, while in 10 out of 127 patients, errors were noted in the further stages of treatment, both in the nephrology ward and in the nephrology outpatient clinic. In 7 out of 127 patients, errors during the initial diagnostic phase involved the ordering of laboratory tests, as not all required tests were performed, such as lipid profiles, protein measurements in 24 h urine collections, glucose concentration, and hemoglobin A1c as screening tests for the presence of diabetes. In 2 out of 127 patients, too low a dose of steroids was applied, and in one patient, the appropriate dose of nephroprotective drugs (ACE inhibitors) was not administered. In the later stages of treatment, errors were found in the decisions made for 10 out of 127 patients; 7 of those errors were related to the failure to perform all recommended laboratory tests, including creatinine levels with eGFR measurements, serum protein levels, and 24 h urine protein quantification, as well as hemoglobin A1c measurement to screen for post-steroid diabetes. Additionally, the weights and blood pressure readings of 8 patients were not recorded in the medical history. In 8 out of 127 patients, the steroid dose was reduced too slowly, and in 5 out of 127 patients, either an inappropriate dose of nephroprotective drugs was applied or they were not used at all.
5.3. Discussion
The comparison of decision-making based on standard methods (specialist recommendations and specialized textbooks) with decision-making supported by the application revealed errors in the diagnosis and treatment of kidney disease, specifically focal segmental glomerulosclerosis with nephrotic syndrome, in 17 out of 127 patients. Both the number of errors and their clinical significance were low. A comparative analysis of outcome measures between patients with and without errors was therefore not conducted, mainly because of the small number of affected patients and the lack of statistical significance regarding the achievement of outcome measures. Traditional decision-making was associated with an error rate of 13.4% (17 of 127 patients), whereas application-assisted decision-making showed potential for improvement, underscoring the tool’s capability to reduce errors and enhance diagnostic and therapeutic efficiency. The study showed that the use of the application could assist the physician in the diagnostic and therapeutic process, reduce the time needed for accurate diagnosis and treatment, and make the verification of treatment effectiveness in kidney diseases more efficient. Further prospective studies with detailed analysis of the treatment process in groups with and without the application are required.
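The reported error rate follows directly from the counts in Section 5.2, as the short calculation below shows. The 95% Wilson score interval is an addition made for this illustration and was not reported in the study; it is included only to indicate how much uncertainty a sample of 127 patients leaves around the point estimate.

```python
from math import sqrt


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


n_patients = 127
errors_total = 17    # patients with any decision-making error
errors_initial = 7   # errors in the initial diagnostic and treatment phase
errors_later = 10    # errors in the further stages of treatment

rate_total = errors_total / n_patients
rate_initial = errors_initial / n_patients
rate_later = errors_later / n_patients
low, high = wilson_interval(errors_total, n_patients)

print(f"Overall error rate: {rate_total:.1%}")
print(f"Initial phase: {rate_initial:.1%}, later stages: {rate_later:.1%}")
print(f"Approximate 95% interval for the overall rate: {low:.1%} to {high:.1%}")
```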
6. Conclusions and Future Work
This study presented a novel expert system (FNDSS) designed to enhance the management of focal segmental glomerulosclerosis by integrating advanced ML techniques. The system utilized a modular structure that incorporated diagnostic workflows, personalized risk stratification, treatment recommendations, and continuous outcome monitoring. The integration of ML algorithms within the system significantly improved the accuracy and consistency of clinical decision-making, automating complex aspects of the diagnostic and therapeutic process, which traditionally rely on human expertise and can be prone to inconsistencies.
The ML models employed in the system, particularly ensemble methods such as LightGBM and random forest, demonstrated superior performance in classifying FSGS outcomes with high precision and recall. These models efficiently handled large, complex datasets, providing valuable insights for clinicians by leveraging data-driven, real-time predictions. The system’s ability to continuously learn from new data ensured that its diagnostic and treatment recommendations remained adaptive, thereby facilitating personalized care for patients.
The key findings from the evaluation phase highlighted that the expert system effectively reduced diagnostic errors, streamlined treatment protocols, and improved patient outcomes. By automating the classification of disease progression and response to treatment, the system not only accelerated the decision-making process but also mitigated the risk of human error. This enhanced both the efficiency and effectiveness of clinical workflows, offering a robust tool for nephrologists managing FSGS.
Despite these advancements, several challenges remain. The system’s reliance on standardized, high-quality data underscores the necessity for continuous data curation and validation. Furthermore, the predefined thresholds used for classification must be periodically updated to align with the latest clinical research and evolving biomarkers. The system’s performance is also contingent on the availability and accuracy of clinical data inputs, which may vary across healthcare settings. Future work will focus on the following key areas:
Dataset expansion and heterogeneity: To further enhance the generalizability of the system, it is crucial to integrate larger and more diverse datasets, including multi-center clinical data. This will ensure that the system can adapt to the wide variability found in real-world patient populations and medical practices.
AI-driven real-time decision support: A more integrated approach with electronic health records (EHRs) is essential to enable real-time data analysis and predictive decision support. This integration will facilitate the seamless flow of clinical information and enhance the system’s responsiveness to dynamic patient conditions.
Advanced predictive modeling with deep learning: The current ensemble learning models demonstrated promising results; however, further exploration into deep learning techniques, such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), could improve the system’s ability to detect subtle patterns and make predictions for more complex cases of FSGS, especially in rare or atypical forms of the disease.
Automated model training and updating: Implementing continuous learning protocols will allow the system to autonomously update its models based on new patient data, enhancing its adaptability and predictive capabilities. This will further optimize treatment strategies and improve the system’s long-term accuracy.
Ethical and regulatory compliance: Given the potential of AI and ML to impact clinical decision-making, it is imperative to ensure that the system adheres to ethical guidelines and regulatory standards, particularly concerning data privacy, transparency, and bias mitigation. Addressing these aspects will ensure that the system can be deployed in clinical practice without compromising patient safety or care quality.
Clinical validation and prospective trials: To validate the clinical utility of the expert system, prospective randomized trials are required. These trials will assess the system’s impact on patient outcomes, treatment efficacy, and healthcare resource utilization, providing empirical evidence of its effectiveness in real-world settings.
In conclusion, the integration of ML in the management of FSGS represents a transformative shift in nephrology. By automating critical aspects of the diagnostic and treatment process, the proposed expert system offers a promising tool to enhance clinical decision-making, personalize patient care, and ultimately improve health outcomes. Continued research, model refinement, and clinical validation will be essential to fully realize the potential of AI-driven decision support in nephrology.