1. Introduction
CKD presents a significant global health challenge, affecting approximately 850 million people worldwide [
1]. The kidneys, vital organs situated on both sides of the spine just below the ribcage, play a crucial role in maintaining the body’s internal environment by filtering the blood and removing waste products, excess fluids, and toxins through urine. Additionally, they regulate electrolyte levels, blood pressure, and the acid–base balance, while producing hormones that control calcium metabolism and stimulate red blood cell production [
2,
3].
CKD is characterized by a progressive and long-term decline in kidney function, leading to an inability to effectively filter waste and maintain fluid and electrolyte balance, resulting in the accumulation of waste products and fluid retention. The burden of CKD is immense, contributing to complications like electrolyte imbalances, bone disorders, anemia, and cardiovascular diseases [
4,
5,
6]. If left untreated, CKD can progress to end-stage renal disease, necessitating dialysis or kidney transplantation [
7,
8]. Early detection and proper management of CKD are pivotal to preserving kidney function, slowing down the disease progression, and improving patient outcomes [
9].
Despite its global prevalence and impact on public health, detecting CKD early and ensuring access to quality kidney care pose significant challenges, particularly in low- and middle-income countries with limited resources [
10,
11,
12]. Traditional methods for CKD detection, such as blood tests and urinalysis, may have limitations in identifying the early stages of kidney damage and might not capture fluctuations in kidney health over time. Invasive procedures like kidney biopsy are unsuitable for routine screening, and imaging tests can be both expensive and time-consuming [
13,
14,
15].
ML methods offer promising solutions to these challenges. ML algorithms can analyze large and complex datasets, improving the accuracy in CKD detection by identifying subtle patterns and trends that may go unnoticed with traditional methods. These models can incorporate various variables, enabling personalized risk assessments and tailored treatment plans. The efficiency of ML algorithms allows for quick processing of new patient data, facilitating timely diagnosis and intervention. Moreover, ML can predict CKD development in high-risk individuals, enabling early preventive measures [
16,
17,
18].
In this paper, we investigate the feasibility and potential benefits of using ML for early CKD diagnosis. Our objective is to develop an ML model that incorporates data imputation, data scaling methods, split ratio, and optimal parameters, while evaluating classifiers based on their classification accuracy. The goal is to effectively detect CKD using ML algorithms such as the k-nearest neighbor and naive Bayes. Missing values are handled using iterative imputation, and a novel sequential data scaling method is introduced by combining robust scaling, z-standardization, and min–max scaling. Boruta feature selection is applied to identify important features, and the hyperparameters are tuned using grid-search CV. The testing accuracy of our proposed work is evaluated by comparing it to the results of various other studies.
The remaining sections of this paper are structured as follows: In Section 2, we conduct a comprehensive review of the existing literature and highlight the novelty of our work. Section 3 outlines the methodologies employed and presents the proposed system model. The experimental results are analyzed in Section 4. In Section 5, we engage in a discussion and compare our proposed model with other studies. Finally, the paper concludes in Section 6 by exploring potential avenues for future research.
2. Literature Review
In recent times, there has been a notable advancement in applying ML techniques to the field of healthcare, with a specific focus on early diagnosis and preventive measures [
19,
20,
21]. This progress has also extended to the field of CKD, where numerous noteworthy studies have contributed to advancements in CKD research [
17,
22]. In this literature review, we provide a comprehensive overview of the current state of CKD research by thoroughly discussing the relevant studies. Our analysis includes a detailed examination of the methodologies employed, the findings obtained, and the limitations identified in each study. By doing so, we aim to present a comprehensive and unbiased understanding of the progress and challenges in CKD research.
A study by Debabrata et al. (2023) aimed to develop an ML model for early CKD detection using the UCI CKD dataset. The researchers employed imputation techniques, a sampling technique for data balancing, and data normalization. They selected nine features based on the chi-square test and used support vector machines for classification. However, the study had limitations, such as the exclusion of advanced imputation algorithms and the potential information loss from reducing the feature set [
23].
In a study by Z. Ullah and M. Jamjoom (2023), the researchers aimed to predict CKD progression using a DT-based missing value imputation method. They performed feature selection using the filter method and employed the k-nearest neighbor algorithm for classification. However, the study did not utilize data scaling methods or hyperparameter optimization techniques [
24].
A study conducted by A. Farjana et al. (2023) focused on CKD prediction using ML algorithms on the UCI CKD dataset. The researchers filled the missing data with mean values and employed hold-out validation. Light GBM demonstrated superior performance, but the study lacked advanced imputation techniques, outlier handling, data scaling, feature selection, and model optimization [
25].
In a study by M. A. Islam et al. (2023), the researchers predicted CKD using ML algorithms. They used mean and mode techniques for missing data imputation and employed recursive feature elimination and principal component analysis for feature selection. However, the study did not utilize scaling methods or hyperparameter optimization techniques [
26].
A study by M. M. Hassan (2023) focused on CKD prediction using ML on patients’ clinical records. The researchers used predictive mean matching for missing data imputation and performed data clustering using K-means. They employed the XGBoost approach with SHAP value analysis for feature selection. However, the study did not incorporate scaling methods or hyperparameter optimization [
27].
In a study conducted by C. Kaur et al. (2023), the researchers utilized machine learning for CKD prediction. They employed Little’s MCAR test for missing data analysis and the Ant Colony Optimization algorithm for feature selection. They used ensemble methods and found that bagging produced the best results. However, the study did not employ scaling methods, cross validation, or hyperparameter optimization techniques [
28].
Through the review of these studies, it is evident that several research gaps and limitations need to be addressed to further improve the field of CKD prediction. This study aims to specifically target these limitations and contribute novel approaches to the existing body of research. The key novelties of our work are as follows:
An advanced imputation method is employed to iteratively estimate missing values in the dataset. By implementing this technique, the completeness and quality of the dataset can be improved, leading to enhanced accuracy in the CKD prediction models.
A sequential approach to scaling the variables in the dataset is proposed. Robust scaling is initially used to adjust for outliers, ensuring that their influence is minimized. Subsequently, z-standardization is applied to further normalize the variables. Finally, min–max scaling is utilized to bring all features within a similar range.
To ensure the inclusion of only relevant and informative features, a robust feature selection algorithm called Boruta is utilized.
Various ML models are explored and evaluated using grid-search CV to identify the most suitable algorithm for accurately classifying CKD.
The performance of the proposed model is rigorously validated using a range of evaluation metrics, including accuracy, precision, recall, F1-score, and curve analysis.
By addressing these limitations and incorporating these novel approaches, we aim to contribute to the advancement of CKD prediction models and provide more accurate and reliable predictions for the early detection and prevention of CKD.
3. Methodology
This work presents a precise system for the detection of CKD through the utilization of a robust model. The proposed approach leverages ML techniques to construct a prediction model that is both effective and accurate. To visually depict the various stages of the proposed system,
Figure 1 provides a schematic representation.
3.1. Data Collection
In order to validate our proposed ML model, we obtained the CKD dataset from the UCI ML Repository. The dataset contains a total of 400 samples, which we used for evaluating and validating our ML model in this study [
29]. Each sample comprises 24 predictive variables, including 11 numerical variables and 13 categorical (nominal) variables. The dataset also includes a categorical response variable called ‘class’, which indicates the presence or absence of CKD. The ‘class’ variable has two distinct values: ‘ckd’ for samples diagnosed with CKD and ‘notckd’ for samples without CKD. To provide additional insights, a descriptive summary of the attributes involved in our comprehensive analysis is presented in
Table 1.
3.2. Preprocessing
Medical datasets are prone to various issues that can have a negative impact on the performance of ML models. Therefore, it is crucial to address these challenges to improve the quality of the data. The preprocessing stage plays a vital role in enhancing data quality by tackling key issues such as data encoding, missing values, and outliers [
30].
3.2.1. Data Encoding
To handle the combination of categorical and numeric features in the dataset, the label encoder module from the Scikit-learn library was used. This module transformed the categorical features into numeric representations, allowing for the improved performance of the machine learning model.
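As a minimal sketch of this encoding step, the snippet below applies scikit-learn's `LabelEncoder` to a single categorical column. The sample values are hypothetical and only illustrate the mapping; the original preprocessing code is not shown in the paper.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values for one categorical column (here, the 'class' target)
raw_labels = ["ckd", "notckd", "ckd", "ckd", "notckd"]

encoder = LabelEncoder()
# Classes are sorted alphabetically, then mapped to integers 0..n_classes-1
encoded = encoder.fit_transform(raw_labels)

# inverse_transform recovers the original strings from the integer codes
decoded = encoder.inverse_transform(encoded)
```

In a dataset with several categorical columns, one encoder per column would typically be fitted so that each mapping can be inverted independently.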
3.2.2. Data Imputation
Handling missing data requires choosing appropriate statistical methods based on the extent of missing data and the significance of the missing feature. Traditional techniques like mean, maximum, and mode work well with a low proportion of missing values [
31]. In our study, we encountered a substantial amount of missing data, as illustrated in
Figure 2.
To tackle this issue, we utilized iterative imputation, a statistical approach that iteratively estimates the missing values based on the observed data while considering the relationships between variables. This iterative process progressively refines the imputed values over multiple iterations, leading to a comprehensive and accurate estimation [
32]. Algorithm 1 outlines the steps involved in constructing the iterative imputation process.
Algorithm 1 The iterative imputation pseudocode.
Input: dataset X; features with missing values F; maximum iterations T_max; convergence threshold ε
Output: imputed dataset X_imp
procedure IterativeImputation(X, F, T_max, ε)
    Initialize X_imp ← X
    for each feature f in F do
        Initialize missing mask for feature f
        Initialize model (linear regression) for feature f
        Initialize convergence ← False
        Initialize iterations ← 0
        while not convergence and iterations < T_max do
            Fit model on X_imp
            Predict missing values using model
            Update X_imp with predicted values
            Check for convergence using mean absolute change
            if CheckConvergence(X_imp, f, ε) then
                convergence ← True
            end if
            Increment iterations
        end while
    end for
    return X_imp
end procedure
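The procedure in Algorithm 1 can be approximated with scikit-learn's `IterativeImputer`, as in the sketch below. This is an illustration under stated assumptions, not the authors' code: the toy data, the missing-value rate, and the values of `max_iter` and `tol` are hypothetical, and a linear regression estimator is chosen only because Algorithm 1 names it.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical toy data: three correlated numeric columns
base = rng.normal(size=(50, 1))
X = np.hstack([
    base,
    2.0 * base + rng.normal(scale=0.1, size=(50, 1)),
    -base + rng.normal(scale=0.1, size=(50, 1)),
])

# Knock out roughly 15% of the entries at random
mask = rng.random(X.shape) < 0.15
X_missing = X.copy()
X_missing[mask] = np.nan

# Each feature with missing values is regressed on the others, round after round,
# until the change between rounds falls below tol or max_iter is reached
imputer = IterativeImputer(
    estimator=LinearRegression(), max_iter=10, tol=1e-3, random_state=0
)
X_imputed = imputer.fit_transform(X_missing)
```

Observed entries are left unchanged; only the masked entries are replaced by model predictions.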
3.2.3. Data Scaling
To address outliers and achieve data normalization, a sequential approach of scaling techniques was employed, as outlined in Algorithm 2. The process began with robust scaling, which reduces the impact of extreme values and enhances robustness. It involved subtracting the median ($\mathrm{median}(X)$) and dividing by the interquartile range ($\mathrm{IQR}(X)$). This can be represented by the following equation:
$$X_{\mathrm{robust}} = \frac{X - \mathrm{median}(X)}{\mathrm{IQR}(X)}$$
Next, z-score standardization was applied, resulting in a standardized distribution by subtracting the mean ($\mu$) and dividing by the standard deviation ($\sigma$). This can be represented by the following equation:
$$X_{z} = \frac{X_{\mathrm{robust}} - \mu}{\sigma}$$
Finally, to bring the features within a specific range (typically 0 to 1), min–max scaling was performed by subtracting the minimum value ($X_{\min}$) and dividing by the range ($X_{\max} - X_{\min}$). This can be represented by the following equation:
$$X_{\mathrm{scaled}} = \frac{X_{z} - X_{\min}}{X_{\max} - X_{\min}}$$
Algorithm 2 The sequential approach of scaling techniques.
Input: dataset X
Output: scaled dataset X_scaled
procedure SequentialScaling(X)
    Initialize X_scaled
    Apply robust scaling to X and store the result in X_scaled
    Apply z-score standardization to X_scaled and update X_scaled
    Apply min–max scaling to X_scaled and update X_scaled
    return X_scaled
end procedure
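The three steps of Algorithm 2 can be written out per feature in a few lines of NumPy. The sketch below is illustrative only: the function name `sequential_scale` and the toy matrix are hypothetical, and it assumes every column has a non-zero interquartile range, standard deviation, and range.

```python
import numpy as np

def sequential_scale(X):
    """Apply robust, z-score, and min-max scaling column by column, in that order."""
    X = np.asarray(X, dtype=float)

    # Step 1: robust scaling (subtract the median, divide by the interquartile range)
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    X_scaled = (X - median) / (q3 - q1)

    # Step 2: z-score standardization of the robust-scaled values
    X_scaled = (X_scaled - X_scaled.mean(axis=0)) / X_scaled.std(axis=0)

    # Step 3: min-max scaling to the range [0, 1]
    col_min = X_scaled.min(axis=0)
    col_max = X_scaled.max(axis=0)
    return (X_scaled - col_min) / (col_max - col_min)

# Hypothetical toy data; the first column contains one extreme value
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
    [100.0, 50.0],
])
X_scaled = sequential_scale(X)
```

After the final step, every column has a minimum of 0 and a maximum of 1.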
3.2.4. Feature Selection
Feature selection is a crucial step in ML, as it helps extract a subset of important features from the dataset. This process offers several benefits, including improved prediction accuracy, reduced model complexity, and enhanced interpretability.
In this study, we utilized the Boruta feature selection technique, which leverages random shadow features and an ML model. Boruta compares the importance of each feature to that of the shadow features iteratively, categorizing features as confirmed, tentative, or rejected based on their significance. Ultimately, Boruta provides a subset of the most significant features from the dataset. We implemented the technique using a random forest classifier as the base model to evaluate the feature importance. This classifier was trained on the dataset, including both original and shadow features, using measures such as the mean decrease in accuracy. The combination of the Boruta algorithm and the random forest classifier enabled us to identify the most relevant features for our analysis [
33,
34].
Algorithm 3 provides a concise overview of the Boruta feature selection algorithm, outlining the steps of initialization, iteration, feature evaluation, and the selection of confirmed features.
Algorithm 3 The Boruta feature selection pseudocode.
Input: dataset X with n samples and m features; target variable y with n labels; random forest classifier with … and …; number of iterations N for the Boruta algorithm
Output: selected features
Initialize a set of tentative features T with all m features
Initialize an empty set of confirmed features C
Initialize an empty set of rejected features R
for i = 1 to N do
    Fit the random forest classifier on X using features from T
    Perform a permutation test for each feature in T to evaluate its importance
    for each feature f in T do
        if the feature importance of f is significantly higher than random then
            Move f from T to C
        else
            Move f from T to R
        end if
    end for
    if T is empty then
        break
    end if
end for
Selected features ← C
return C
The Boruta feature selection technique was applied to the UCI CKD dataset, resulting in the selection of 19 features, while 5 features were rejected. The features that were rejected include pus cell clumps, bacteria, potassium, coronary artery disease, and anemia. The selected 19 features were considered important for the classification task and were used for further analysis and model building. These selected variables are also clinically relevant to CKD, as supported by the previous literature [
23,
24,
26,
27]. The incorporation of these relevant features enhances the model’s ability to accurately identify and predict cases of CKD, making it a valuable tool for early detection and effective management of the condition.
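The shadow-feature comparison at the core of Boruta can be sketched as a single round, shown below. This is a deliberately simplified illustration, not the full algorithm and not the authors' implementation: Boruta repeats such rounds and applies a statistical test to the accumulated results, and this sketch uses scikit-learn's impurity-based `feature_importances_` rather than the mean decrease in accuracy. The function name `shadow_feature_round`, the forest size, and the toy data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def shadow_feature_round(X, y, seed=0):
    """Return a boolean mask of features that beat the best shadow feature once."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Shadow features: each column shuffled independently, which breaks any link to y
    X_shadow = np.column_stack([rng.permutation(col) for col in X.T])

    # Train one forest on the original and shadow features together
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(np.hstack([X, X_shadow]), y)

    importances = forest.feature_importances_
    best_shadow = importances[n_features:].max()

    # A real feature scores a "hit" if it is more important than every shadow
    return importances[:n_features] > best_shadow

# Hypothetical toy data: only the first of four features determines the label
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

hits = shadow_feature_round(X, y)
```

Features that collect hits consistently over many rounds would be confirmed, and those that rarely do would be rejected.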
3.3. Data Splitting
Data splitting is a crucial step in machine learning for reliable model evaluation and generalization [35]. It involves dividing the dataset into training and testing subsets.
In this study, we used an 80:20 split ratio, where 80% of the dataset was allocated for training and the remaining 20% for testing. This ensures that the model learns from a significant portion of the data and is then evaluated on unseen data to assess its generalization performance.
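For a dataset of 400 samples, an 80:20 split yields 320 training and 80 testing samples. The sketch below shows one way to draw such a split with shuffled indices; the helper name `split_indices` and the random seed are hypothetical, and the paper does not state whether the split was stratified.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and cut them into training and testing parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]

# 400 samples, as in the dataset described in Section 3.1
train_idx, test_idx = split_indices(400)
```

The two index arrays are disjoint, so no sample is used for both fitting and evaluation.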
3.4. Model Training
During the model training phase, we employed two highly efficient ML classifiers: naïve Bayes and k-nearest neighbor. To optimize their performance, we utilized the hyperparameter optimization technique to tune the parameters of both algorithms.
3.4.1. Hyperparameter Optimization
Hyperparameter optimization is a critical step in ML to optimize the model performance by selecting the best combination of hyperparameters. In our study, we employed the widely used technique of grid-search CV. This approach systematically explores predefined grids of the hyperparameter values, evaluating the model’s performance for each combination using CV. By exhaustively searching through the hyperparameter space, it allows for a comprehensive exploration and selection of the optimal hyperparameter configuration [
36,
37]. The workflow of grid search CV for the selection of the hyperparameters is illustrated in
Figure 3.
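A grid search with cross-validation of this kind can be sketched with scikit-learn's `GridSearchCV`. The example below is an assumption-laden illustration: it uses synthetic data from `make_classification` in place of the CKD dataset, a reduced grid for the k-nearest neighbor classifier, and five folds, none of which are taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (hypothetical): 400 samples, 19 features
X, y = make_classification(
    n_samples=400, n_features=19, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Reduced grid for illustration; every combination is evaluated with 5-fold CV
param_grid = {
    "n_neighbors": [3, 5, 7],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# The best configuration is refitted on the full training set by default
best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
```

Because the search only sees the training portion, the held-out 20% remains available for an unbiased accuracy estimate.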
3.4.2. Naïve Bayes
Naïve Bayes is a supervised algorithm that assumes feature independence during classification. It is particularly useful for datasets with a high number of input features. The algorithm considers all features, including those with weak effects on the prediction. The probabilistic model is represented by the equation:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
In this equation, A and B represent events, and the equation gives the probability of event A occurring given that event B has occurred. By applying this model, naïve Bayes predicts the class with the highest probability.
In our study, we utilized the Gaussian naïve Bayes (NB) algorithm for classification. This variant assumes a Gaussian distribution for the features. It estimates the likelihood of observing specific feature values given a class label using the Gaussian probability density function.
The step-by-step procedure and essential hyperparameter choices for constructing the Gaussian NB model in this research are outlined in the pseudocode provided in Algorithm 4. The hyperparameters include the training data, smoothing parameter, and priors, which are utilized to build the model. The algorithm begins by calculating the prior probability for each class and then estimates the mean and variance of features for each class. Using Bayes’ theorem, it computes the posterior probability for each class given a new data point. Finally, the algorithm assigns the class with the highest posterior probability as the predicted class for the new data point [
38].
Algorithm 4 The Gaussian NB pseudocode.
Input: training dataset D; new data point x; priors: [None, [0.5, 0.5], [0.3, 0.7]]; var_smoothing: […, …, …]
Output: predicted class label y
procedure GaussianNBs(D, x, priors, var_smoothing)
    Calculate the prior probability P(c) for each class c using priors
    for each feature x_i do
        if feature x_i is discrete then
            Calculate the proportion of occurrences of x_i in class c
        else if feature x_i is continuous then
            for each class c do
                Estimate the mean and variance of x_i using examples in class c
            end for
        end if
    end for
    Calculate the posterior probability for each class using Bayes’ theorem:
        P(c | x) ∝ P(c) ∏_i P(x_i | c)
    Assign the class label with the highest posterior probability as the predicted class:
        y ← argmax_c P(c | x)
    return y
end procedure
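The continuous-feature branch of Algorithm 4 can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code: the function names, the toy data, and the default `var_smoothing` value are assumptions, and the smoothing term is modelled on scikit-learn's convention of adding a fraction of the largest feature variance.

```python
import numpy as np

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors and per-class feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    # Assumed smoothing: add a small share of the largest variance for stability
    variances = variances + var_smoothing * X.var(axis=0).max()
    return classes, priors, means, variances

def predict_gaussian_nb(model, X_new):
    """Pick the class with the highest log-posterior for each row of X_new."""
    classes, priors, means, variances = model
    diff = X_new[:, None, :] - means[None, :, :]
    # Sum of per-feature Gaussian log-densities, shape (n_samples, n_classes)
    log_likelihood = -0.5 * (
        np.log(2.0 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    ).sum(axis=2)
    log_posterior = np.log(priors)[None, :] + log_likelihood
    return classes[np.argmax(log_posterior, axis=1)]

# Hypothetical toy data: two well-separated classes
X = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = fit_gaussian_nb(X, y)
predictions = predict_gaussian_nb(model, np.array([[0.1, 0.0], [5.1, 5.0]]))
```

Working in log space avoids numerical underflow when the product over many features becomes very small.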
3.4.3. K-Nearest Neighbor
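K-nearest neighbor is a simple and widely used supervised ML algorithm. It predicts the class of an observation by considering the classes of its k nearest neighbors, determined using a distance metric such as the Euclidean, Minkowski, or Manhattan distance. The equations for these distance metrics are as follows:
$$d_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_{k=1}^{d} (x_k - y_k)^2}$$
$$d_{\mathrm{Minkowski}}(x, y) = \left(\sum_{k=1}^{d} |x_k - y_k|^{p}\right)^{1/p}$$
$$d_{\mathrm{Manhattan}}(x, y) = \sum_{k=1}^{d} |x_k - y_k|$$
In these equations, $x_k$ and $y_k$ represent the kth features of $x$ and $y$ in a d-dimensional space, respectively.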
Using these distance metrics, it identifies the k nearest neighbors of a data point and determines its class based on the majority class among those neighbors. It is a straightforward and intuitive algorithm, making it applicable to various classification tasks.
The step-by-step procedure and essential hyperparameter choices for constructing the k-nearest neighbor model in this research are outlined in the pseudocode provided in Algorithm 5. The hyperparameters, such as the training data, leaf size, p parameter, weight function, algorithm, number of neighbors, and distance metric, are utilized to build the model. The algorithm predicts the class label of a test instance by considering the majority class among its k nearest neighbors. It accomplishes this by calculating the distances between the test instance and training instances, selecting the k nearest neighbors and determining the predicted class label through a majority voting process [39,40].
Algorithm 5 The k-nearest neighbors pseudocode.
Input: training dataset D; test instance x; algorithm: [‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’]; leaf_size: [20, 30, 40]; metric: [‘euclidean’, ‘manhattan’, ‘minkowski’]; n_neighbors: [3, 5, 7]; p: [3, 4]; weights: [‘uniform’, ‘distance’]
Output: predicted class label ŷ for x
procedure KNN(D, x, algorithm, leaf_size, metric, n_neighbors, p, weights)
    distances ← empty list
    for each (x_i, y_i) in D do
        Calculate the distance d_i between x and x_i using metric
        Add (d_i, y_i) to distances
    end for
    Sort distances in ascending order based on distance
    neighbors ← first n_neighbors elements from distances
    labels ← extract labels from neighbors
    if weights is ‘distance’ then
        Compute weights w based on the distance for neighbors
    end if
    ŷ ← perform a weighted majority vote of labels using w
    return ŷ
end procedure
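The voting procedure of Algorithm 5 can be sketched directly in NumPy. The code below is illustrative only: the function names and toy data are hypothetical, the Minkowski distance is used as the general form (p = 1 gives the Manhattan and p = 2 the Euclidean distance), and the small constant added to each distance is an assumption that guards against division by zero.

```python
from collections import defaultdict

import numpy as np

def minkowski_distance(a, b, p=2):
    """Minkowski distance; p=1 is the Manhattan and p=2 the Euclidean distance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

def knn_predict(X_train, y_train, x_test, k=3, p=2, weights="uniform"):
    """Predict the label of x_test by a (weighted) vote of its k nearest neighbors."""
    distances = np.array([minkowski_distance(x, x_test, p) for x in X_train])
    nearest = np.argsort(distances)[:k]  # indices of the k smallest distances

    votes = defaultdict(float)
    for idx in nearest:
        # 'distance' weighting lets closer neighbors count more; 'uniform' counts each once
        weight = 1.0 / (distances[idx] + 1e-12) if weights == "distance" else 1.0
        votes[y_train[idx]] += weight
    return max(votes, key=votes.get)

# Hypothetical toy data: two small clusters
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]

label_near_origin = knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3)
label_far = knn_predict(X_train, y_train, np.array([5.5, 5.5]), k=3, weights="distance")
```

Library implementations additionally use tree-based indexes (the `algorithm` and `leaf_size` options) to avoid computing every pairwise distance; the brute-force loop here is only meant to show the logic.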
3.5. Performance Metrics
The effectiveness and accuracy of the developed ML models in this research were evaluated using various performance metrics. These metrics, including the accuracy, recall, precision, and F1-score, provided valuable insights into different aspects of the classifiers’ performance. The evaluation relied on a confusion matrix, which is presented in
Table 2. The confusion matrix allowed for a comprehensive examination of the classification results. True positives (TP) represented instances correctly predicted as the positive class, while true negatives (TN) represented instances correctly predicted as the negative class. False positives (FP) were instances incorrectly predicted as the positive class, and false negatives (FN) were instances incorrectly predicted as the negative class. This evaluation approach facilitated a thorough assessment of the accuracy and effectiveness of the model in the early detection of CKD.
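The four metrics follow directly from the confusion-matrix counts, as the sketch below shows using their standard definitions. The function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for an 80-sample test set
metrics = classification_metrics(tp=48, tn=30, fp=1, fn=1)
```

The sketch assumes at least one predicted positive and one actual positive; otherwise precision or recall would be undefined.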
4. Results
An experimental study was conducted on the UCI CKD dataset, where the categorical features were encoded. The missing values were addressed using iterative imputation. A novel sequential approach was implemented for data scaling, involving robust scaling, z-standardization, and min–max scaling in that order. To perform feature selection, we utilized the Boruta algorithm. The dataset was divided into training and testing sets using an 80:20 ratio. For constructing the models, we employed ML techniques such as k-nearest neighbor and Gaussian NB. To optimize the model parameters, a grid-search CV was utilized. All preprocessing, visualization, and analysis tasks were carried out using Python programming.
In
Figure 4, the confusion matrices are presented, depicting the performance of the models.
Table 3 provides the optimal hyperparameters obtained through the grid-search CV, along with the performance metrics including the accuracy, precision, recall, and F1-score. It shows that the k-nearest neighbors model achieved a 100% accuracy, precision, recall, and F1-score, indicating excellent performance.
Figure 5 displays the evaluation of the model through the area under the ROC curve and the precision–recall curve. The k-nearest neighbor model achieved the higher performance of the two, making it the preferred model for the early detection of CKD.
To assess the generalization capability of the trained models, a rigorous 15-fold CV technique was employed. The results, as depicted in
Figure 6, illustrate the accuracy of both models on each fold, providing valuable insights into their performance. The k-nearest neighbor algorithm demonstrated remarkable consistency across diverse folds, achieving an exceptional accuracy of 99.37%. This high score highlights the model’s impressive performance and robustness, indicating its ability to generalize well to unseen data. In contrast, the Gaussian NB achieved a slightly lower CV accuracy of 97.05%.
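A 15-fold evaluation of this kind can be sketched with scikit-learn's `cross_val_score`. The example below is illustrative only: it uses synthetic data in place of the CKD dataset and a default Gaussian NB model, so the resulting scores bear no relation to the accuracies reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (hypothetical): 400 samples, 19 features
X, y = make_classification(
    n_samples=400, n_features=19, n_informative=8, random_state=0
)

# With an integer cv and a classifier, scikit-learn uses stratified folds
scores = cross_val_score(GaussianNB(), X, y, cv=15, scoring="accuracy")
mean_accuracy = scores.mean()
```

Each of the 15 scores is the accuracy on one held-out fold, and their mean summarizes performance across folds.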
5. Discussion
CKD is a critical condition, and accurate diagnosis plays a pivotal role in improving patient outcomes. To address this, our paper focuses on proposing a comprehensive ML model for CKD prediction. However, in implementing ML techniques for medical diagnosis, we must be mindful of the potential risks and ethical considerations. Complex ML models may lack interpretability, raising concerns about trust and accountability in the healthcare domain. Additionally, biases in training data can lead to discriminatory outcomes, exacerbating healthcare disparities, and handling sensitive patient information raises privacy and data security issues. Despite these challenges, ML models offer benefits like accurate and personalized diagnoses, identifying rare conditions, and adapting to changing scenarios. Therefore, striking a balance between the risks and benefits is essential to harness ML’s potential for improved medical diagnosis while upholding ethical standards and patient wellbeing.
As we embark on improving CKD prediction, it is crucial to address the existing challenges in the field of ML-based medical diagnosis. Commonly used sampling techniques in existing studies to balance datasets and improve accuracy may introduce artificial data, limiting real-world applicability. Handling missing data is another significant challenge in medical datasets, with mean or mode imputation methods potentially introducing biases. Some studies focus on using a reduced set of features to improve accuracy, but this approach may not generalize well in real-world scenarios. Additionally, data scaling, often overlooked, is a critical preprocessing step that can significantly impact model performance. In our approach, we systematically address these challenges to enhance CKD prediction accuracy. By utilizing iterative imputation for missing data, introducing a novel sequential data scaling method, and employing the Boruta algorithm for feature selection, we aim to create a robust and reliable model for CKD prediction. Through grid-search CV, we optimize the k-nearest neighbor and Gaussian NB algorithms, further refining the model’s performance.
To evaluate the efficacy of our proposed model, we conducted extensive validation on the UCI CKD dataset. Remarkably, our approach achieved an outstanding accuracy, precision, recall, and F1-score, all reaching 100%. Additionally, we compared our model with existing ML models that were developed on the same dataset. The comparison presented in
Table 4 demonstrates the superiority of our proposed model, showcasing its higher accuracy compared to previous studies.
Table 5 focuses on comparing our k-nearest neighbor and naïve Bayes models with other studies that also employed k-nearest neighbor and naïve Bayes algorithms. We evaluated the models’ performance using the same dataset to validate the effectiveness of our preprocessing steps. The results show that our models consistently outperformed the previous studies, highlighting the impact of our preprocessing techniques in enhancing prediction accuracy.
6. Conclusions
This study successfully developed a robust ML model for the early detection of CKD. The model’s exceptional performance, achieving 100% accuracy, precision, recall, and F1-score on the UCI CKD dataset, validates its reliability and potential for clinical application. By incorporating various preprocessing steps and the Boruta algorithm for feature selection, our proposed model demonstrates its robustness in accurately identifying CKD cases. The results obtained through multiple performance metrics further strengthen the confidence in its accuracy. The implementation of this model as a reliable and accurate tool for early CKD detection holds great promise for improving clinical decision making and ultimately enhancing patient outcomes. The potential impact of this research in advancing early diagnosis and management of CKD highlights its significance in addressing a critical global health challenge.
Limitations
The main limitation of this study was the reliance on a single dataset, the UCI CKD dataset, which contains a substantial amount of missing values. While we employed iterative imputation to estimate the missing data, it is crucial to acknowledge the uncertainty introduced by imputation methods, which may influence the model’s predictive capability. Additionally, the generalizability of our findings to other populations and real-world scenarios needs further investigation. The model’s adaptability to handle diverse data sources and missing data patterns should be carefully examined in future research. Furthermore, the retrospective nature of the performance evaluation raises questions about the model’s ability to predict CKD in real-time or prospective settings. Addressing these limitations will strengthen the model’s reliability and applicability for early CKD detection, making it a more effective tool for clinical use.
Author Contributions
D.A., conceptualization, data curation, methodology, software, validation, visualization, writing—original draft; M.S.A., conceptualization, methodology, validation, project administration, visualization, writing—original draft; A.M., funding acquisition, supervision, writing—review and editing. All authors have read and agreed to the published version of the manuscript.
Funding
The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Acknowledgments
We would like to extend our gratitude to the Prince Sultan University, Riyadh, Saudi Arabia, for facilitating the publication of this paper through the Theoretical and Applied Sciences Lab.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
References
- New Global Kidney Health Report Sheds Light on Current Capacity around the World to Deliver Kidney Care. Available online: https://www.theisn.org/blog/2023/03/30/new-global-kidney-health-report-sheds-light-on-current-capacity-around-the-world-to-deliver-kidney-care/ (accessed on 20 June 2023).
- Wadei, H.M.; Textor, S.C. The role of the kidney in regulating arterial blood pressure. Nat. Rev. Nephrol. 2012, 8, 602–609.
- Mukoyama, M.; Nakao, K. Hormones of the kidney. Basic Clin. Princ. 2005, 353–365.
- Webster, A.C.; Nagler, E.V.; Morton, R.L.; Masson, P. Chronic kidney disease. Lancet 2017, 389, 1238–1252.
- Kalantar-Zadeh, K.; Jafar, T.H.; Nitsch, D.; Neuen, B.L.; Perkovic, V. Chronic kidney disease. Lancet 2021, 398, 786–802.
- Hall, M.E.; do Carmo, J.M.; da Silva, A.A.; Juncos, L.A.; Wang, Z.; Hall, J.E. Obesity, hypertension, and chronic kidney disease. Int. J. Nephrol. Renov. Dis. 2014, 75–88.
- Ghaderian, S.B.; Hayati, F.; Shayanpour, S.; Mousavi, S.S.B. Diabetes and end-stage renal disease; a review article on new concepts. J. Ren. Inj. Prev. 2015, 4, 28.
- Parmar, M.S. Chronic renal disease. BMJ 2002, 325, 85–90.
- Wagner, L.A.; Tata, A.L.; Fink, J.C. Patient safety issues in CKD: Core curriculum 2015. Am. J. Kidney Dis. 2015, 66, 159–169.
- Luyckx, V.A.; Al-Aly, Z.; Bello, A.K.; Bellorin-Font, E.; Carlini, R.G.; Fabian, J.; Garcia-Garcia, G.; Iyengar, A.; Sekkarie, M.; Van Biesen, W.; et al. Sustainable development goals relevant to kidney health: An update on progress. Nat. Rev. Nephrol. 2021, 17, 15–32.
- Hoste, E.A.; Kellum, J.A.; Selby, N.M.; Zarbock, A.; Palevsky, P.M.; Bagshaw, S.M.; Goldstein, S.L.; Cerdá, J.; Chawla, L.S. Global epidemiology and outcomes of acute kidney injury. Nat. Rev. Nephrol. 2018, 14, 607–625.
- Lin, M.Y.; Chiu, Y.W.; Lin, Y.H.; Kang, Y.; Wu, P.H.; Chen, J.H.; Luh, H.; Hwang, S.J.; iH3 Research Group. Kidney Health and Care: Current Status, Challenges, and Developments. J. Pers. Med. 2023, 13, 702.
- Chen, T.K.; Knicely, D.H.; Grams, M.E. Chronic kidney disease diagnosis and management: A review. JAMA 2019, 322, 1294–1304.
- Ferguson, M.A.; Waikar, S.S. Established and emerging markers of kidney function. Clin. Chem. 2012, 58, 680–689.
- Lopez-Giacoman, S.; Madero, M. Biomarkers in chronic kidney disease, from kidney function to kidney damage. World J. Nephrol. 2015, 4, 57.
- Shehab, M.; Abualigah, L.; Shambour, Q.; Abu-Hashem, M.A.; Shambour, M.K.Y.; Alsalibi, A.I.; Gandomi, A.H. Machine learning in medical applications: A review of state-of-the-art methods. Comput. Biol. Med. 2022, 145, 105458.
- Sanmarchi, F.; Fanconi, C.; Golinelli, D.; Gori, D.; Hernandez-Boussard, T.; Capodici, A. Predict, diagnose, and treat chronic kidney disease with machine learning: A systematic literature review. J. Nephrol. 2023, 36, 1101–1117.
- Ibrahim, I.; Abdulazeez, A. The role of machine learning algorithms for diagnosing diseases. J. Appl. Sci. Technol. Trends 2021, 2, 10–19.
- Ghazal, T.M.; Hasan, M.K.; Alshurideh, M.T.; Alzoubi, H.M.; Ahmad, M.; Akbar, S.S.; Al Kurdi, B.; Akour, I.A. IoT for smart cities: Machine learning approaches in smart healthcare—A review. Future Internet 2021, 13, 218.
- Asif, D.; Bibi, M.; Arif, M.S.; Mukheimer, A. Enhancing Heart Disease Prediction through Ensemble Learning Techniques with Hyperparameter Optimization. Algorithms 2023, 16, 308.
- Siddique, S.; Chow, J.C. Machine learning in healthcare communication. Encyclopedia 2021, 1, 220–239.
- Krisanapan, P.; Tangpanithandee, S.; Thongprayoon, C.; Pattharanitima, P.; Cheungpasitporn, W. Revolutionizing Chronic Kidney Disease Management with Machine Learning and Artificial Intelligence. J. Clin. Med. 2023, 12, 3018.
- Swain, D.; Mehta, U.; Bhatt, A.; Patel, H.; Patel, K.; Mehta, D.; Acharya, B.; Gerogiannis, V.C.; Kanavos, A.; Manika, S. A Robust Chronic Kidney Disease Classifier Using Machine Learning. Electronics 2023, 12, 212.
- Ullah, Z.; Jamjoom, M. Early detection and diagnosis of chronic kidney disease based on selected predominant features. J. Healthc. Eng. 2023, 2023, 3553216.
- Farjana, A.; Liza, F.T.; Pandit, P.P.; Das, M.C.; Hasan, M.; Tabassum, F.; Hossen, M.H. Predicting Chronic Kidney Disease Using Machine Learning Algorithms. In Proceedings of the 2023 IEEE 13th Annual Computing and Communication Workshop and Conference, Las Vegas, NV, USA, 8–11 March 2023; pp. 1267–1271.
- Islam, M.A.; Majumder, M.Z.H.; Hussein, M.A. Chronic kidney disease prediction based on machine learning algorithms. J. Pathol. Inform. 2023, 14, 100189.
- Hassan, M.M.; Hassan, M.M.; Mollick, S.; Khan, M.A.R.; Yasmin, F.; Bairagi, A.K.; Raihan, M.; Arif, S.A.; Rahman, A. A Comparative Study, Prediction and Development of Chronic Kidney Disease Using Machine Learning on Patients Clinical Records. Hum.-Centric Intell. Syst. 2023, 3, 92–104.
- Kaur, C.; Kumar, M.S.; Anjum, A.; Binda, M.B.; Mallu, M.R.; Al Ansari, M.S. Chronic Kidney Disease Prediction Using Machine Learning. J. Adv. Inf. Technol. 2023, 14, 384–391.
- Rubini, L.; Soundarapandian, P.; Eswaran, P. Chronic Kidney Disease. UCI Machine Learning Repository. 2015. Available online: https://archive.ics.uci.edu/dataset/336/chronic+kidney+disease (accessed on 10 June 2023).
- García, S.; Luengo, J.; Herrera, F. Data Preprocessing in Data Mining; Intelligent Systems Reference Library; Springer: Cham, Switzerland, 2015; Volume 72, pp. 59–139.
- Dong, Y.; Peng, C.Y.J. Principled missing data methods for researchers. SpringerPlus 2013, 2, 222.
- Hoque, G. A Better Way to Handle Missing Values in your Dataset: Using IterativeImputer (PART I). Towards Data Sci. 2021. Available online: https://towardsdatascience.com/a-better-way-to-handle-missing-values-in-your-dataset-using-iterativeimputer-9e6e84857d98 (accessed on 20 June 2023).
- Kursa, M.B.; Rudnicki, W.R. Feature selection with the Boruta package. J. Stat. Softw. 2010, 36, 1–13.
- Python Implementations of the Boruta All Relevant Feature Selection Method. Available online: https://github.com/scikit-learn-contrib/boruta_py (accessed on 20 June 2023).
- Joseph, V.R. Optimal ratio for data splitting. Stat. Anal. Data Mining ASA Data Sci. J. 2022, 15, 531–538.
- Agrawal, T. Hyperparameter optimization using scikit-learn. In Hyperparameter Optimization in Machine Learning: Make Your Machine Learning and Deep Learning Models More Efficient; Apress: Berkeley, CA, USA, 2021; pp. 31–51.
- Liashchynskyi, P.; Liashchynskyi, P. Grid search, random search, genetic algorithm: A big comparison for NAS. arXiv 2019, arXiv:1912.06059.
- Alfaiz, N.S.; Fati, S.M. Enhanced credit card fraud detection model using machine learning. Electronics 2022, 11, 662.
- Kataria, A.; Singh, M.D. A review of data classification using k-nearest neighbour algorithm. Int. J. Emerg. Technol. Adv. Eng. 2013, 3, 354–360.
- Cunningham, P.; Delany, S.J. k-Nearest neighbour classifiers-A Tutorial. ACM Comput. Surv. (CSUR) 2021, 54, 1–25.
- Nishat, M.M.; Faisal, F.; Dip, R.R.; Nasrullah, S.M.; Ahsan, R.; Shikder, F.; Asif, M.A.A.R.; Hoque, M.A. A comprehensive analysis on detecting chronic kidney disease by employing machine learning algorithms. EAI Endorsed Trans. Pervasive Health Technol. 2021, 7, e1.
- Khalid, H.; Khan, A.; Zahid Khan, M.; Mehmood, G.; Shuaib Qureshi, M. Machine Learning Hybrid Model for the Prediction of Chronic Kidney Disease. Comput. Intell. Neurosci. 2023, 2023, 9266889.
- Chittora, P.; Chaurasia, S.; Chakrabarti, P.; Kumawat, G.; Chakrabarti, T.; Leonowicz, Z.; Jasiński, M.; Jasiński, Ł.; Gono, R.; Jasińska, E.; et al. Prediction of chronic kidney disease-a machine learning perspective. IEEE Access 2021, 9, 17312–17334.
- Ekanayake, I.U.; Herath, D. Chronic kidney disease prediction using machine learning methods. In Proceedings of the 2020 Moratuwa Engineering Research Conference (MERCon), Moratuwa, Sri Lanka, 28–30 July 2020; Volume 9, pp. 260–265.
- Almustafa, K.M. Prediction of chronic kidney disease using different classification algorithms. Inform. Med. Unlocked 2021, 24, 100631.
- Poonia, R.C.; Gupta, M.K.; Abunadi, I.; Albraikan, A.A.; Al-Wesabi, F.N.; Hamza, M.A. Intelligent diagnostic prediction and classification models for detection of kidney disease. Healthcare 2022, 10, 371.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).