Machine Learning-Based Stacking Ensemble Model for Prediction of Heart Disease with Explainable AI and K-Fold Cross-Validation: A Symmetric Approach
Abstract
1. Introduction
- We propose NCDG, a stacking ensemble model that combines NB, CatBoost, DT, and GB for the prediction of HD.
- The SMOTE and BorderLineSMOTE balancing procedures are applied to obtain consistent model accuracy.
- We performed hard and soft voting using the VotingClassifier and compared the results with our proposed stacking model NCDG.
- An eXplainable Artificial Intelligence (XAI) technique, SHAP, is used to quantify the contribution of each feature to the heart disease predictions of our proposed model.
- We applied the K-Fold Cross-Validation method to validate our results, showing that our approach is symmetric.
2. Related Work
3. Material and Methods for Heart Disease Prediction
3.1. Role of Decision Tree (DT) for Heart Disease Prediction
3.2. Role of Naive Bayes (NB) for Heart Disease Prediction
3.3. Role of Gradient Boosting (GB) for Heart Disease Prediction
3.4. Categorical Boosting (CatBoost) for Heart Disease Prediction
4. Proposed Methodology for Heart Disease Prediction
4.1. Data Preprocessing
4.2. Heart Disease Dataset Description
4.3. Data Balancing Using BorderLineSMOTE
4.4. NCDG Stacking Model for Heart Disease Prediction
Algorithm 1: NCDG Stacking Model for Heart Disease Prediction
4.5. Voting Classifier for Heart Disease Prediction
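As the highlights note, hard and soft voting are compared against the stacking model. A minimal sketch of that comparison, assuming a scikit-learn VotingClassifier over a subset of the base learners (CatBoost omitted to keep the dependency to scikit-learn; estimator names and data are ours):

```python
# Hard voting: majority vote on predicted labels.
# Soft voting: argmax of the averaged class probabilities.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=13, random_state=0)
estimators = [("nb", GaussianNB()),
              ("dt", DecisionTreeClassifier(random_state=0)),
              ("gb", GradientBoostingClassifier(random_state=0))]

hard = VotingClassifier(estimators, voting="hard")
soft = VotingClassifier(estimators, voting="soft")  # needs predict_proba

hard_acc = cross_val_score(hard, X, y, cv=5).mean()
soft_acc = cross_val_score(soft, X, y, cv=5).mean()
print(f"hard={hard_acc:.3f} soft={soft_acc:.3f}")
```

Soft voting can outperform hard voting when the base learners produce well-calibrated probabilities, which is consistent with the gap between the two voting rows in the results tables.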
5. Results and Discussion of Proposed Models for Heart Disease Prediction
5.1. Performance Metrics
1. Accuracy;
2. Precision;
3. F1 Score;
4. Recall;
5. AUC–ROC;
6. Confusion Matrix.
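The first four metrics can all be derived from the confusion matrix counts. A worked example with illustrative counts (not taken from the paper's tables):

```python
# Illustrative confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 80, 10, 20, 90

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # fraction of all predictions that are correct
precision = tp / (tp + fp)                   # of predicted positives, how many are real
recall    = tp / (tp + fn)                   # of real positives, how many are found (= TPR)
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"acc={accuracy:.4f} prec={precision:.4f} rec={recall:.4f} f1={f1:.4f}")
# acc=0.8500 prec=0.8889 rec=0.8000 f1=0.8421
```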
5.2. Simulation Results for Heart Disease Prediction
5.3. Comparison of Different Stacking Models with the Proposed Stacking Model
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
AI | Artificial Intelligence
ANN | Artificial Neural Network
ASM | Attribute Selection Measures
AUC–ROC | Area Under the Curve–Receiver Operating Characteristics Curve
CatBoost | Categorical Boosting
CVD | Cardiovascular Disease
DT | Decision Tree
DL | Deep Learning
FCV | Fold Cross-Validation
GA | Gaussian Algorithm
GB | Gradient Boosting
HD | Heart Disease
KNN | K-Nearest Neighbors
ML | Machine Learning
NB | Naive Bayes
NCDG | Stacking of Naive Bayes, CatBoost, Decision Tree, Gradient Boosting
RF | Random Forest
SHAP | SHapley Additive exPlanations
SMOTE | Synthetic Minority Oversampling TEchnique
TPR | True Positive Rate
XAI | eXplainable Artificial Intelligence
A | Attribute
D | Dataset
y | Observed value
L | Loss function
p | Predicted probability
S | Root Node
ŷ | Predicted value
Classifier | Accuracy | F1-Score | Precision | Recall | Time (s) | AUC–ROC | TPR
---|---|---|---|---|---|---|---
Naive Bayes | 0.7401 | 0.7345 | 0.7390 | 0.7338 | 0.5 | 0.8134 | 0.7306
CatBoost | 0.8832 | 0.8821 | 0.8814 | 0.8809 | 114 | 0.9612 | 0.8817
Decision Tree | 0.8815 | 0.8796 | 0.8789 | 0.8875 | 7 | 0.8779 | 0.8903
Gradient Boosting | 0.8421 | 0.8437 | 0.8600 | 0.8035 | 101 | 0.9236 | 0.8612
NCDG | 0.9120 | 0.9123 | 0.9121 | 0.9115 | 653 | 0.9732 | 0.9118
AdaBoost | 0.8105 | 0.8198 | 0.8050 | 0.8361 | 20 | 0.8011 | 0.8725
Random Forest | 0.8542 | 0.8526 | 0.8541 | 0.8543 | 99 | 0.9623 | 0.8802
GNCD | 0.8342 | 0.8325 | 0.8403 | 0.8230 | 972 | 0.8319 | 0.8217
DGNC | 0.8944 | 0.8928 | 0.8892 | 0.9134 | 512 | 0.9546 | 0.9115
CDGN | 0.9031 | 0.9025 | 0.8987 | 0.9109 | 995 | 0.9617 | 0.9107
Hard Voting | 0.8712 | 0.8624 | 0.8821 | 0.8537 | 291 | 0.8715 | 0.8518
Soft Voting | 0.8833 | 0.8817 | 0.8640 | 0.9116 | 203 | 0.8832 | 0.9101
Classifier | Accuracy | F1-Score | Precision | Recall | Time (s) | AUC–ROC | TPR
---|---|---|---|---|---|---|---
Decision Tree | 0.8644 | 0.2398 | 0.2325 | 0.2475 | 4.8 | 0.3454 | 0.5412
Naive Bayes | 0.8479 | 0.3249 | 0.2635 | 0.4235 | 0.3 | 0.4324 | 0.4325
Gradient Boosting | 0.9147 | 0.1428 | 0.0823 | 0.5404 | 47 | 0.2343 | 0.3421
CatBoost | 0.9142 | 0.1606 | 0.5208 | 0.0949 | 58 | 0.3476 | 0.5623
AdaBoost | 0.9032 | 0.2247 | 0.3221 | 0.07436 | 75 | 0.4974 | 0.6743
Random Forest | 0.9143 | 0.2612 | 0.3192 | 0.0843 | 43 | 0.5346 | 0.6238
NCDG | 0.9144 | 0.0956 | 0.5522 | 0.0524 | 352 | 0.5423 | 0.5234
GNCD | 0.8672 | 0.2381 | 0.2361 | 0.2402 | 451 | 0.4356 | 0.6238
DGNC | 0.9144 | 0.1365 | 0.5328 | 0.0783 | 205 | 0.4823 | 0.4571
CDGN | 0.8756 | 0.3734 | 0.3307 | 0.4288 | 461 | 0.6234 | 0.4352
Technique | Hyperparameter | Value |
---|---|---|
SMOTE | sampling_strategy | auto |
BorderLineSMOTE | random_state | 42 |
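The paper applies SMOTE and BorderLineSMOTE via standard libraries (the table lists `sampling_strategy="auto"` and `random_state=42`). The toy NumPy function below only illustrates the core interpolation idea behind these oversamplers: a synthetic minority sample is placed on the line segment between a minority point and one of its k nearest minority neighbours. It is a sketch, not the imbalanced-learn implementation.

```python
# Toy SMOTE-style oversampling: interpolate between minority neighbours.
import numpy as np

rng = np.random.default_rng(42)  # mirrors random_state = 42 from the table

def smote_like(X_min, n_new, k=5, rng=rng):
    """Create n_new synthetic points by moving a sampled minority point
    a random fraction of the way toward one of its k nearest neighbours."""
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                       # position on the segment
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(out)

X_min = rng.normal(size=(20, 13))  # 20 minority samples, 13 features
X_new = smote_like(X_min, n_new=30)
print(X_new.shape)  # (30, 13)
```

BorderLineSMOTE refines this scheme by oversampling only minority points near the class boundary, which is why the two balancing procedures can yield different accuracies in the results tables.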
Classifier | Accuracy | F1-Score | Precision | Recall | Time (s) | AUC–ROC | TPR |
---|---|---|---|---|---|---|---|
Naive Bayes | 0.7236 | 0.7215 | 0.7240 | 0.7153 | 66 | 0.7912 | 0.7127 |
CatBoost | 0.8721 | 0.8718 | 0.8734 | 0.8721 | 122 | 0.9537 | 0.8749 |
Decision Tree | 0.8742 | 0.8741 | 0.8645 | 0.8793 | 7 | 0.8769 | 0.8824 |
Gradient Boosting | 0.8315 | 0.8357 | 0.8134 | 0.8517 | 83 | 0.9105 | 0.8428 |
NCDG | 0.9032 | 0.9023 | 0.9015 | 0.9037 | 455 | 0.9615 | 0.8938 |
AdaBoost | 0.7912 | 0.8034 | 0.7826 | 0.8254 | 17 | 0.7632 | 0.7835 |
Random Forest | 0.8543 | 0.8543 | 0.8547 | 0.8523 | 94 | 0.9628 | 0.8821 |
GNCD | 0.8242 | 0.8221 | 0.8329 | 0.8156 | 1005 | 0.8234 | 0.8137 |
DGNC | 0.8842 | 0.8815 | 0.8739 | 0.8876 | 522 | 0.9536 | 0.8937 |
CDGN | 0.8951 | 0.8914 | 0.8821 | 0.9043 | 948 | 0.9623 | 0.9024 |
Hard Voting | 0.8532 | 0.8547 | 0.8834 | 0.8225 | 182 | 0.8542 | 0.8231 |
Soft Voting | 0.8744 | 0.8726 | 0.8624 | 0.8941 | 187 | 0.8715 | 0.8946 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sultan, S.Q.; Javaid, N.; Alrajeh, N.; Aslam, M. Machine Learning-Based Stacking Ensemble Model for Prediction of Heart Disease with Explainable AI and K-Fold Cross-Validation: A Symmetric Approach. Symmetry 2025, 17, 185. https://doi.org/10.3390/sym17020185