Effectively Predicting the Presence of Coronary Heart Disease Using Machine Learning Classifiers
Abstract
1. Introduction
2. Materials and Methods
- (1) First, the authors address the datasets, which they refine and standardize. The datasets are then used to train and test classifiers in order to determine which ones provide the best accuracy.
- (2) Second, the authors identify the most relevant features using the correlation matrix.
- (3) Third, the authors apply the ML classifiers to the preprocessed dataset, using parameter tuning to obtain the maximum accuracy.
- (4) Fourth, the proposed classifiers are evaluated on accuracy, precision (specificity), recall (sensitivity), and F-Measure.
- (5) Finally, the proposed classifiers achieve better accuracy than the state-of-the-art approaches listed in Table 1.
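The correlation-based feature step in the list above can be sketched as follows. This is a minimal illustration, assuming a Pearson correlation between each feature column and the target; the function names `pearson` and `rank_features` and the toy records are hypothetical and are not taken from the paper or its dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(columns, target):
    """Sort features by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up records for illustration only (NOT the paper's data).
toy_columns = {
    "exang": [0, 1, 1, 0, 1, 0],
    "fbs": [0, 1, 0, 0, 1, 1],
    "oldpeak": [0.0, 2.3, 1.8, 0.2, 3.1, 0.5],
}
toy_target = [0, 1, 1, 0, 1, 0]

ranking = rank_features(toy_columns, toy_target)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Ranking by absolute value treats strongly negative and strongly positive correlations as equally informative, which is one common choice; whether the authors ranked on signed or absolute values is not stated here.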
3. Proposed Methodology
3.1. Dataset
3.2. Correlation Matrix
4. Result and Analysis
4.1. Disease Status
4.2. Analyzing Sex
4.3. Analyzing Age
4.4. Analyzing Chest Pain
4.5. Analyzing Fasting Blood Sugar
4.6. Analyzing Resting ElectroCardioGraphic
4.7. Analyzing Exercise-Induced Angina
4.8. Analyzing Slope
4.9. Analyzing Coronary Artery
4.10. Analyzing Thalassemia Affects the Heart
5. Result and Discussions
5.1. Evaluating Parameters
- Accuracy: the number of correctly classified instances divided by the total number of instances in the dataset, as in Equation (1).
- Precision: the proportion of instances predicted as positive that are actually positive, as indicated in Equation (2).
- Recall: the proportion of actual positive instances that are correctly identified, as defined in Equation (3).
- F-Measure: once the precision and recall for the classification problem have been calculated, the two scores are combined into a single value. The conventional F-Measure is their harmonic mean, computed as shown in Equation (4).
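The four measures above can be computed from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions, which are assumed (not confirmed) to match Equations (1) to (4); the function name `classification_metrics` and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F-Measure from confusion counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives for a binary classifier.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up counts for illustration only (not results from the paper).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F-Measure is a harmonic mean, it always lies between the precision and the recall it is computed from.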
5.2. Performance of ML Classifiers
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Notations | Description |
---|---|
ML | Machine Learning |
UCI | University of California, Irvine |
EDA | Exploratory Data Analysis |
CVDs | CardioVascular Diseases |
kNN | k-Nearest Neighbors |
LR | Logistic Regression |
RF | Random Forest |
DT | Decision Tree |
NB | Naïve Bayes |
XGB | XGBoost |
NN | Neural Network |
ANN | Artificial Neural Network |
SVM | Support Vector Machine |
EVF | Evimp functions |
CART | Classification and Regression Trees |
MARS | Multivariate Adaptive Regression Splines |
FBS | Fasting Blood Sugar |
Resting ECG | Resting ElectroCardioGraphic |
PEM | Proposed Ensemble Model |
References
- World Health Organization. Cardiovascular Diseases (CVDs). Available online: https://www.who.int/health-topics/cardiovascular-diseases/#tab=tab_1 (accessed on 10 January 2022).
- World Health Organization. Cardiovascular Diseases (CVDs). Available online: https://www.afro.who.int/health-topics/cardiovascular-diseases (accessed on 10 January 2022).
- Available online: https://www.heart.org/en/health-topics/high-blood-pressure/why-high-blood-pressure-is-a-silent-killer/know-your-risk-factors-for-high-blood-pressure (accessed on 10 January 2022).
- Balla, C.; Pavasini, R.; Ferrari, R. Treatment of Angina: Where Are We? Cardiology 2018, 140, 52–67.
- Rumsfeld, J.S.; Joynt, K.E.; Maddox, T.M. Big data analytics to improve cardiovascular care: Promise and challenges. Nat. Rev. Cardiol. 2016, 13, 350–359.
- Johnson, K.W.; Torres Soto, J.; Glicksberg, B.S.; Shameer, K.; Miotto, R.; Ali, M.; Ashley, E.; Dudley, J.T. Artificial intelligence in cardiology. J. Am. Coll. Cardiol. 2018, 71, 2668–2679.
- Gomathi, K.; Shanmugapriyaa, D. Heart disease prediction using data mining classification. Int. J. Res. Appl. Sci. Eng. Technol. 2016, 4, 60–63.
- Alom, Z.; Azim, M.A.; Aung, Z.; Khushi, M.; Car, J.; Moni, M.A. Early Stage Detection of Heart Failure Using Machine Learning Techniques. In Proceedings of the International Conference on Big Data, IoT, and Machine Learning, Cox’s Bazar, Bangladesh, 23–25 September 2021.
- Juhola, M.; Joutsijoki, H.; Penttinen, K.; Shah, D.; Pölönen, R.P.; Aalto-Setälä, K. Data analytics for cardiac diseases. Comput. Biol. Med. 2022, 142, 105218.
- Gour, S.; Panwar, P.; Dwivedi, D.; Mali, C. A Machine Learning Approach for Heart Attack Prediction. In Intelligent Sustainable Systems; Springer: Singapore, 2022; pp. 741–747.
- Available online: https://www.potentiaco.com/what-is-machine-learning-definition-types-applications-and-examples/ (accessed on 10 January 2022).
- Maryam, A.; Mahmoud, Q.; Mohammad, H. Machine Learning Classification Techniques for Heart Disease Prediction: A Review. Int. J. Eng. Technol. 2018, 7, 5373–5379.
- Marimuthu, M.; Abinaya, M.; Hariesh, K.; Madhankumar, K.; Pavithra, V. A Review on Heart Disease Prediction using Machine Learning and Data Analytics Approach. Int. J. Comput. Appl. 2018, 181, 975–8887.
- Golande, A.; Pavan Kumar, T. Heart disease prediction using effective machine learning techniques. Int. J. Recent Technol. Eng. IJRTE 2019, 8, 944–950, ISSN 2277-3878.
- Sharma, S.; Parmar, M. Heart diseases prediction using deep learning neural network model. Int. J. Innov. Technol. Explor. Eng. IJITEE 2020, 9, 124–137, ISSN 2278-3075.
- Arunpradeep, N.; Niranjana, G. Different Machine Learning Models Based Heart Disease Prediction. Int. J. Recent Technol. Eng. IJRTE 2020, 8, 544–548, ISSN 2277-3878.
- Ravindhar, N.V.; Anand, H.S.; Ragavendran, G.W. Intelligent diagnosis of cardiac disease prediction using machine learning. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 1417–1421, ISSN 2278-3075.
- Latha, C.B.C.; Jeeva, S.C. Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform. Med. Unlocked 2019, 16, 100203.
- Mishra, J.; Tarar, S. Chronic Disease Prediction Using Deep Learning; Springer: Singapore, 2020; pp. 201–211.
- Abdeldjouad, F.Z.; Brahami, M.; Matta, N. A Hybrid Approach for Heart Disease Diagnosis and Prediction Using Machine Learning Techniques; Springer: Cham, Switzerland, 2020; pp. 299–306.
- Tarawneh, M.; Embarak, O. Hybrid approach for heart disease prediction using data mining techniques. Acta Sci. Nutr. Health 2019, 3, 147–151.
- Javid, I.; Alsaedi, A.K.Z.; Ghazali, R. Enhanced accuracy of heart disease prediction using machine learning and recurrent neural networks ensemble majority voting method. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 540–551.
- Kumar, N.; Sikamani, K. Prediction of chronic and infectious diseases using machine learning classifiers-A systematic approach. Int. J. Intell. Eng. Syst. 2020, 13, 1120.
- Saqlain, S.M.; Sher, M.; Shah, F.A.; Khan, I.; Ashraf, M.U.; Awais, M.; Ghani, A. Fisher score and Matthews correlation coefficient-based feature subset selection for heart disease diagnosis using support vector machines. Knowl. Inf. Syst. 2019, 58, 139–167.
- Mohan, S.; Thirumalai, C.; Srivastava, G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 2019, 7, 81542–81554.
- Miao, F.; Cai, Y.P.; Zhang, Y.X.; Fan, X.M.; Li, Y. Predictive modeling of hospital mortality for patients with heart failure by using an improved random survival forest. IEEE Access 2018, 6, 7244–7253.
- Raju, C.; Philipsy, E.; Chacko, S.; Suresh, L.P.; Rajan, S.D. A survey on predicting heart disease using data mining techniques. In Proceedings of the 2018 Conference on Emerging Devices and Smart Systems (ICEDSS), Tiruchengode, India, 2–3 March 2018; pp. 253–255.
- Chicco, D.; Jurman, G. Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Med. Inform. Decis. Mak. 2020, 20, 16.
- Ahmad, E.; Tiwari, A.; Kumar, A. Cardiovascular Diseases (CVDs) Detection using Machine Learning Algorithms. Int. J. Res. Appl. Sci. Eng. 2020, 8, 2341–2346.
- Wang, L.; Zhou, W.; Chang, Q.; Chen, J.; Zhou, X. Deep ensemble detection of congestive heart failure using short-term RR intervals. IEEE Access 2019, 7, 69559–69574.
- Gupta, A.; Kumar, R.; Arora, H.S.; Raman, B. MIFH: A machine intelligence framework for heart disease diagnosis. IEEE Access 2019, 8, 14659–14674.
- Rashmi, G.O.; Kumar, U.M.A. Machine learning methods for heart disease prediction. Int. J. Eng. Adv. Technol. 2019, 8, 220–223.
- Nadakinamani, R.G.; Reyana, A.; Kautish, S.; Vibith, A.S.; Gupta, Y.; Abdelwahab, S.F.; Mohamed, A.W. Clinical Data Analysis for Prediction of Cardiovascular Disease Using Machine Learning Techniques. Comput. Intell. Neurosci. Vol. 2022.
- Hossen, M.D.; Tazin, T.; Khan, S.; Alam, E.; Sojib, H.A.; Monirujjaman Khan, M.; Alsufyani, A. Supervised machine learning-based cardiovascular disease analysis and prediction. Math. Probl. Eng. 2021, 2021, 1792201.
- Saboor, A.; Usman, M.; Ali, S.; Samad, A.; Abrar, M.F.; Ullah, N. A Method for Improving Prediction of Human Heart Disease Using Machine Learning Algorithms. Mob. Inf. Syst. 2022, 1–9.
- Arumugam, K.; Naved, M.; Shinde, P.P.; Leiva-Chauca, O.; Huaman-Osorio, A.; Gonzales-Yanac, T. Multiple disease prediction using Machine learning algorithms. Mater. Today Proc. 2021.
- Gupta, C.; Saha, A.; Reddy, N.S.; Acharya, U.D. Cardiac Disease Prediction using Supervised Machine Learning Techniques. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2022; Volume 2161, p. 012013.
- Truong, V.T.; Nguyen, B.P.; Nguyen-Vo, T.H.; Mazur, W.; Chung, E.S.; Palmer, C.; Tretter, J.T.; Alsaied, T.; Pham, V.T.; Do, H.Q.; et al. Application of machine learning in screening for congenital heart diseases using fetal echocardiography. Int. J. Cardiovasc. Imaging 2022, 38, 1007–1015.
- Abdalrada, A.S.; Abawajy, J.; Al-Quraishi, T.; Islam, S.M.S. Machine learning models for prediction of co-occurrence of diabetes and cardiovascular diseases: A retrospective cohort study. J. Diabetes Metab. Disord. 2022, 21, 251–261.
- Singh, N.; Bhatnagar, S. Machine Learning for Prediction of Drug Targets in Microbe Associated Cardiovascular Diseases by Incorporating Host-pathogen Interaction Network Parameters. Mol. Inform. 2022, 41, 2100115.
- Doppala, B.P.; Bhattacharyya, D.; Janarthanan, M.; Baik, N. A Reliable Machine Intelligence Model for Accurate Identification of Cardiovascular Diseases Using Ensemble Techniques. J. Healthc. Eng. 2022, 2022, 2585235.
- Kondababu, A.; Siddhartha, V.; Kumar, B.B.; Penumutchi, B. A comparative study on machine learning based heart disease prediction. Mater. Today Proc. 2021.
- Gudmundsson, E.F.; Björnsdottir, G.; Sigurdsson, S.; Andersen, K.; Thorsson, B.; Aspelund, T.; Gudnason, V. Carotid plaque is strongly associated with coronary artery calcium and predicts incident coronary heart disease in a population-based cohort. Atherosclerosis 2022, 346, 117–123.
- Bharti, R.; Khamparia, A.; Shabaz, M.; Dhiman, G.; Pande, S.; Singh, P. Prediction of heart disease using a combination of machine learning and deep learning. Comput. Intell. Neurosci. 2021, 2021, 8387680.
- Kishor, A.; Jeberson, W. Diagnosis of heart disease using internet of things and machine learning algorithms. In Proceedings of the Second International Conference on Computing, Communications, and Cyber-Security, Ghaziabad, India, 3–4 October 2020; pp. 691–702.
- Gupta, G.; Adarsh, U.; Reddy, N.S.; Rao, B.A. Comparison of various machine learning approaches uses in heart ailments prediction. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2022; Volume 2161, p. 012010.
- Marco, L.; Farinella, G.M. (Eds.) Computer Vision for Assistive Healthcare; Academic Press: Cambridge, MA, USA, 2018.
- Siuly, S.; Zhang, Y. Medical big data: Neurological diseases diagnosis through medical data analysis. Data Sci. Eng. 2016, 1, 54–64.
- Available online: https://archive.ics.uci.edu/ml/datasets/heart+disease(Dataset) (accessed on 10 January 2022).
Author | Year | Methods/Classifiers | Datasets | Evaluation Parameters | Highest Accuracy% |
---|---|---|---|---|---|
[36] | 2022 | LR, NB, RF REP, M5P Tree, J48, JRIP | Hungarian and Statlog (heart) dataset | RMSE, MAE | RF 99.81% |
[37] | 2021 | RF, DT, LR | UCI Cleveland database | Accuracy | LR 92.10% |
[38] | 2021 | AB, ET, LR, MNB, CART, LDA, SVM, RF, XGB | Heart Dataset (UCI repository) | Accuracy | AB 90% |
[39] | 2021 | SVM, NB, DT | Heart Dataset (UCI repository) | Accuracy | DT 90% |
[40] | 2022 | KNN, DT, LR, NB, SVM | Heart Dataset (UCI repository) | Accuracy, Specificity, Sensitivity, F1-Score | LR 92% |
[41] | 2022 | RF into fetal echocardiography | Congenital heart disease database of 3910 Singleton Fetuses | Sensitivity, Specificity | Sensitivity 0.85, specificity 0.88 |
[42] | 2022 | LR, Evimp functions, Multivariate adaptive regression | DiScRi dataset | Accuracy, Sensitivity, Specificity | 94.09% |
[43] | 2022 | LR, KNN, SVM, RF | Pathogen, Host feature | Accuracy | RF 99% |
[44] | 2022 | DT, LR, XGB, NB, GB, RF, SVM, PEM | Cardiovascular disease dataset (Mendeley Data Center) | Accuracy | EM 96.75% |
[45] | 2021 | NB, LM, LR, DT, RF, SVM, HRFLM | Heart Cleveland (UCI repository) | Accuracy, Precision, Specificity, Sensitivity, F-Measure | HRFLM 88.4% |
[47] | 2021 | RF, LR, KNN, SVM, DT, XGB | Public Health Dataset | Accuracy, Specificity, Sensitivity | SVM 84% |
[48] | 2022 | K-NN, DT, RF, MLP, NB, L-SVM | IoT based Produced Data | Accuracy | L-SVM 92.30%, RF 92.30% |
[49] | 2022 | DT, NB, KNN, RF, ANN, Ada, GBA | Heart Disease (Kaggle Repository) | Accuracy, Precision, recall, f1-score | RF 86.89% |
In our Proposed Scheme | |||||
Proposed Methodology | 2022 | LR, SVM, NB, RF, XGB, DT, NN, RBF, KNN, GBT, MLP | Heart Disease (UCI Repository) | Accuracy, Precision (specificity), Recall (sensitivity), F-Measure | RF 96.28% |
Dataset Details | |||
---|---|---|---|
No. | Features | Description | Value |
1. | Age | Age is an important aspect of health care. | Its value is an integer. |
2. | Sex | Gender | Female = 0, Male = 1 |
3. | Chest pain (cp) | The type of chest pain the patient is suffering from. | Typical angina = 1, atypical angina = 2, non-anginal pain = 3, asymptomatic = 4 |
4. | Resting Blood Pressure (trestbps) | Resting blood pressure; high blood pressure combined with other factors increases the risk. | It has either an integer or float value. |
5. | Cholesterol (chol) | Serum cholesterol | It has either an integer or float value. |
6. | Fasting Blood Sugar (fbs) | Whether fasting blood sugar is more than 120 mg/dL | 0 = false; 1 = true |
7. | Resting ECG (restecg) | Resting electrocardiographic results | ST-T wave abnormality = 2, Normal = 0, Left ventricular hypertrophy = 1 |
8. | Max Heart Rate Achieved (thalach) | The maximum heart rate achieved by the patient. | It has either an integer or float value. |
9. | Exercise-Induced Angina (exang) | Angina induced by exercise | no = 0, yes = 1 |
10. | Oldpeak | Exercise-induced ST depression relative to rest | It has either an integer or float value. |
11. | Slope | Slope of the peak exercise ST segment | Upsloping = 0, flat = 1, downsloping = 2 |
12. | Coronary Artery (ca) | Number of major vessels colored by fluoroscopy. | It has either an integer or float value. |
13. | Thalassemia (thal) | Normal, fixed defect, or reversible defect | 3 = normal; 6 = fixed defect; 7 = reversible defect |
14. | Num (target: heart disease predicting attribute) | Heart disease diagnosis (angiographic disease status) | 0 indicates a diameter narrowing of less than 50%; 1 indicates a diameter narrowing of more than 50%. |
Attributes | Value |
---|---|
Age | 0.225439 |
Sex | 0.280937 |
Chest Pain | 0.433798 |
Fasting Blood Sugar | 0.028046 |
Resting Blood Pressure | 0.144931 |
Cholesterol | 0.085239 |
Exercise-Induced Angina | 0.436757 |
Max Heart Rate Achieved | 0.421741 |
Resting ECG | 0.137230 |
Oldpeak | 0.430696 |
Slope | 0.345877 |
Coronary Artery | 0.391724 |
Thalassemia | 0.344029 |
Heart Disease Diagnosis | 1.000000 |
Classifiers | Accuracy | Precision | Recall | F-Measure |
---|---|---|---|---|
Logistic Regression | 88.25% | 0.8791 | 0.8825 | 0.8865 |
Support Vector Regression | 84.97% | 0.8407 | 0.8496 | 0.8437 |
Naive Bayes | 88.25% | 0.8825 | 0.8854 | 0.8825 |
Random Forest | 96.28% | 0.9628 | 0.9537 | 0.9668 |
XGBoost | 88.25% | 0.8786 | 0.8810 | 0.8815 |
Decision Tree | 84.97% | 0.8497 | 0.8475 | 0.8527 |
Neural Network | 84.33% | 0.8433 | 0.8501 | 0.8413 |
k-Nearest Neighbors | 70.21% | 0.7021 | 0.6901 | 0.7101 |
Gradient Boosted Tree | 95.83% | 0.9493 | 0.9583 | 0.9613 |
Radial Basis Function | 86.35% | 0.8635 | 0.8644 | 0.8635 |
Multilayer perceptron | 94.96% | 0.9516 | 0.9506 | 0.9506 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hassan, C.A.u.; Iqbal, J.; Irfan, R.; Hussain, S.; Algarni, A.D.; Bukhari, S.S.H.; Alturki, N.; Ullah, S.S. Effectively Predicting the Presence of Coronary Heart Disease Using Machine Learning Classifiers. Sensors 2022, 22, 7227. https://doi.org/10.3390/s22197227