Predicting Heart Disease Using Sensor Networks, the Internet of Things, and Machine Learning: A Study of Physiological Sensor Data and Predictive Models †
Abstract
1. Introduction
2. Literature Review
3. Proposed Model
- Phase 1: We developed the proposed model for heart disease prediction using machine-learning classification algorithms. The first step was data collection: data were gathered from different sources through sensors. For testing, we attached sensors to the patient’s body, namely heart rate and blood pressure sensors, pulse oximeters, activity sensors, temperature sensors, respiration sensors, and electrocardiogram (ECG) sensors.
- Phase 2: The major steps were data cleansing, normalization, and feature engineering, together referred to as data preparation and analysis (DPA). The objective of this step was to prepare the data for training the model. During DPA, we also identified relevant features in the dataset and then performed standard statistical tests and correlation analysis. When feature engineering was completed, we used machine learning classification algorithms (RF, DTC, K-NN, SVM, GNB, AdaBoost, Bagging, KNN, and LR) for CVD prediction. Figure 1 presents the structural healthcare monitoring model for cardiovascular prediction. The classifiers used in our proposed model are random forest, decision tree, K-NN, support vector machine, Gaussian Naive Bayes, AdaBoost, bagging, and logistic regression. Random forest (RF) is an ensemble learning algorithm that generates multiple decision trees during training and aggregates the class predictions of the individual trees into its output. Normally, this classifier is satisfactory when dealing with high-dimensional data, as well as when the dataset contains missing values. A decision tree (DT) is a classifier that splits the data into homogeneous subsets based on the most informative features. It is simple to use and can handle both numerical and categorical data. K-NN serves the same purpose: it classifies an unseen heart disease instance of a patient according to the labels of its nearest neighbors in the training data. A support vector machine (SVM) is a powerful classifier that finds the best hyperplane separating the data into different classes. It performs well on high-dimensional data and is capable of handling both linear and non-linear data. Gaussian Naive Bayes (GNB) is a probabilistic method that computes the probability of each class given the input data and chooses the class with the highest probability. It is simple and quick, and can handle data with multiple dimensions.
AdaBoost is another ensemble learning technique, which combines several weak classifiers into one strong classifier. As our dataset was large, we used this algorithm to handle the class imbalance issue, since boosting can enhance the performance of weak classifiers. Bagging is also an ensemble learning technique: it generates different subsets of the training data, trains a classifier on each subset, and combines their predictions to achieve better generalization to new, unseen data. We used bagging classifiers to address overfitting and enhance the overall performance of our model. Logistic regression (LR) was also used for classification, to estimate the probability of a binary outcome. Mutual information feature selection (MIFS) is a technique that chooses relevant features by evaluating their mutual information with the target variable, that is, how much information each feature provides about the target. MIFS can be particularly beneficial when working with sensor data because it can identify features that have a strong dependence on the target variable. Using MIFS, it is possible to select the most relevant features and reduce the dimensionality of a dataset, which can improve the accuracy and efficiency of heart disease prediction models that use sensor data.
- Phase 3: In this phase, we trained the model and tested its effectiveness. The best model was identified during the training phase and, once trained, was tested on unseen data. Performance metrics such as accuracy, precision, recall, F1-score, and area under the receiver operating characteristic (ROC) curve were evaluated. The hyperparameters were tuned as required to enhance CVD prediction accuracy.
- Phase 4: The model was validated using the performance evaluation parameters on the test dataset, which we originally collected from patients as sensor data, to ensure that it was not overfitting to the training data.
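The mutual information feature selection step described in Phase 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes the sensor features have already been discretized into bins, and the helper names (`mutual_information`, `select_top_k`), the feature names (`heart_rate_bin`, `noise`), and the toy values are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += p_joint * math.log2(count * n / (px[x] * py[y]))
    return mi


def select_top_k(features, target, k):
    """Rank features by mutual information with the target; keep the top k."""
    scores = {name: mutual_information(col, target)
              for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical, already-discretized toy data (not taken from the study).
target = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "heart_rate_bin": [1, 1, 1, 1, 0, 0, 0, 0],  # fully determines the target
    "noise":          [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the target
}
selected, scores = select_top_k(features, target, k=1)
print(selected, scores)
```

In this toy example the informative feature receives a score of 1 bit and the independent one a score of 0, so only the former is kept. Continuous sensor readings would need binning or a continuous estimator before a score like this is meaningful.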
4. Results and Discussion
The Critical Observation
5. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | TNR (%) | TPR (%) |
---|---|---|---|---|---|---|
RF | 89.90 | 88.47 | 87.07 | 85.11 | 78.4 | 84 |
DTC | 88.87 | 85.62 | 86.82 | 82.30 | 82.5 | 54 |
K-NN | 87.16 | 86.34 | 85.99 | 88.40 | 89.2 | 82 |
SVM | 93.87 | 93.33 | 93.67 | 93.35 | 77.6 | 82 |
GNB | 87.25 | 89.20 | 87.54 | 91.20 | 89.2 | 86 |
AdaBoost | 85.20 | 88.45 | 85.56 | 87.20 | 78.6 | 84 |
Bagging | 89.54 | 89.99 | 90.20 | 81.25 | 87.7 | 86 |
KNN | 87.55 | 89.25 | 88.89 | 90.20 | 89.7 | 87 |
LR | 92.25 | 90.20 | 89.99 | 90.09 | 90.4 | 88 |
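The quantities in the table above are conventionally derived from the four counts of a binary confusion matrix. The sketch below shows those standard definitions only; the function name `binary_metrics` and the toy labels are illustrative assumptions and do not reproduce the study's data or code.

```python
def binary_metrics(y_true, y_pred):
    """Standard binary-classification metrics from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # true positive rate (sensitivity)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "tnr": ratio(tn, tn + fp),  # true negative rate (specificity)
    }


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting TNR alongside recall is useful in a screening setting because accuracy alone can hide a model that misses most positive cases on imbalanced data.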
Model | Accuracy (%) | AUC | TNR (%) | TPR (%) | F1-Score (%) |
---|---|---|---|---|---|
RF | 89.90 | 0.88 | 78.4 | 84 | 85.11 |
DTC | 88.87 | 0.87 | 82.5 | 54 | 82.30 |
K-NN | 87.16 | 0.86 | 89.2 | 82 | 88.40 |
SVM | 93.87 | 0.91 | 77.6 | 82 | 93.35 |
GNB | 87.25 | 0.88 | 89.2 | 86 | 91.20 |
AdaBoost | 85.20 | 0.85 | 78.6 | 84 | 87.20 |
Bagging | 89.54 | 0.90 | 87.7 | 86 | 81.25 |
KNN | 87.55 | 0.87 | 89.7 | 87 | 90.20 |
LR | 92.25 | 0.93 | 90.4 | 88 | 90.09 |
Model | MSE | RMSE |
---|---|---|
RF | 0.37322 | 0.47185 |
DTC | 0.33547 | 0.57805 |
K-NN | 0.45281 | 0.59233 |
SVM | 0.18752 | 0.32589 |
GNB | 0.32558 | 0.50785 |
AdaBoost | 0.45287 | 0.60890 |
Bagging | 0.32897 | 0.54780 |
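For reference, MSE is conventionally the mean of the squared prediction errors and RMSE is its square root. The sketch below is a minimal illustration that assumes the errors are measured between true 0/1 labels and predicted probabilities, which the source does not specify; the function names and values are illustrative only.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical labels and predicted probabilities (not from the study).
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4]
print(mse(y_true, y_prob), rmse(y_true, y_prob))
```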
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Padhy, N. Predicting Heart Disease Using Sensor Networks, the Internet of Things, and Machine Learning: A Study of Physiological Sensor Data and Predictive Models. Eng. Proc. 2023, 58, 73. https://doi.org/10.3390/ecsa-10-16239