1. Introduction
Cardiovascular Disease (CVD) often results from risk factors such as high cholesterol and high blood pressure (hypertension). Many studies have focused on detecting CVD in its early stages. Computer technologies have been used to diagnose diseases in patients at early stages, preventing them from becoming life-threatening [1]. CVD is considered one of the most critical health concerns, and many people suffer from it. It affects not only older individuals but people across almost all age groups. The most common symptoms of CVD include physical weakness, shortness of breath, dizziness, fatigue, sweating, and swollen feet, as shown in Figure 1 [2]. Predicting CVD at early stages is much more effective because the disease can then be treated in time, which reduces the death rate [3]. However, even with advanced technology, a shortage of medical experts can hinder the early diagnosis of CVD.
CVD is not a new term [4], and people around the globe are familiar with this chronic disease. CVD has become one of the most important concerns in everyday life. It encompasses all types of diseases related to the heart and blood vessels. A heart attack usually occurs when blood flow through the coronary arteries is blocked. Early detection is key to treating CVD effectively. Health experts suggest consulting a doctor if any symptoms appear. However, physicians still face many challenges in treating this disease [5].
Heart failure is a serious condition that can be caused by various factors. Physicians and medical scientists divide these risk factors into two main categories: those that cannot be changed, such as age, sex, and family history, and those that can be changed, e.g., high blood pressure and smoking. Several methods exist for diagnosing CVD, such as angiography. However, this method has some limitations. One of its main drawbacks is cost, because when diagnosing the patient, the physician has to consider many factors, such as high cholesterol, high blood pressure, cancer, and kidney and liver disease. The entire procedure is time-consuming, and there is no guarantee that the disease can be cured completely.
Table 1 shows common types of CVD [6]. To predict CVD, some of the common attributes to consider are as follows [6]:
The healthcare domain has faced significant challenges due to low accuracy in predicting various diseases. Artificial Intelligence (AI) techniques, such as Deep Learning (DL) [7] and Machine Learning (ML) [8], analyze complex medical data to identify trends and risk factors; this analysis is used in CVD prediction to facilitate early diagnosis and to develop treatment plans [9]. ML techniques eventually made their way into industry. However, to enhance accuracy and achieve efficient results, an automated system is required. Over the past few years, researchers have found that ML techniques perform well in making predictions [6]. ML techniques work well on small datasets; however, problems arise when managing large amounts of data. DL [10] emerged to deal with these challenges [11].
DL is a subfield of ML that builds on it [6]. The main difference between ML and DL is depth: DL models stack many layers of processing units, which lets them learn features directly from data. DL is highly efficient for tasks such as pattern recognition, classification, and identification. It performs best with large datasets and can be applied to many forms of data (e.g., images and text) [11]. DL provides superior accuracy and performance, especially in feature extraction and complex pattern learning [6]. DL models are built on Artificial Neural Networks (ANNs) and, like them, organize data representations hierarchically. ML and DL models are used in different domains, such as healthcare, fraud detection, anomaly detection, fractional input non-linear exogenous auto-regressive (FINARX) systems [12], and smart grids [13]. DL can reduce the time, resources, and effort needed for regression and can increase accuracy in classification. Depending on the architecture, DL models are composed of memory cells, gates, different layers (convolution, pooling, and fully connected), and activation functions.
Data imbalance is considered a major problem in datasets. If the data are not balanced, the model will be biased towards the majority class. In [14], the authors did not specify any technique for balancing the dataset, even though the dataset is imbalanced. In our work, we have used the Proximity Weighted Random Affine Shadow Sampling (ProWRAS) balancing technique, which oversamples the minority class until it equals the majority class in size. In [15], the authors proposed the Synthetic Minority Over-sampling Technique (SMOTE) for data balancing. However, SMOTE may not be the best choice, as it can lead to overfitting and bridging. The authors in [16] proposed a hybrid CNN and LSTM model for CVD prediction. However, CNNs can suffer from vanishing gradient problems, and LSTM is computationally expensive. To address these problems, we propose different DL models.
Table 2 highlights the research gaps identified from the literature.
In our proposed work, we utilize DL models to accurately predict CVD, because DL models have proven useful in the healthcare field for predicting various diseases, such as diabetes, heart attacks, and lung cancer. The abbreviations are listed in Table 3.
Contributions
The main motivation behind this work is that traditional DL models often fail to predict CVD at early stages. To overcome the flaws of existing models, we present the Dense Belief Network (DB-Net) and the Deep Vanilla Recurrent Network (DVR-Net), each of which combines different DL models. Class imbalance is one of the leading issues in the dataset. During classification, if the data are not balanced, the model tends to be biased toward the majority class and performs poorly on the minority class. Data balancing is achieved using oversampling or undersampling. In oversampling, minority class instances are increased to equal the number of majority class instances. In undersampling, majority class instances are reduced, which can lead to information loss. Many balancing techniques have been proposed to address the data imbalance issue. The contributions of this paper are as follows.
For data balancing, we used the ProWRAS technique to improve model accuracy.
We proposed two models, DB-Net and DVR-Net, that use DL architectures to efficiently predict CVD.
To validate our proposed models' results, we implemented 10-Fold Cross Validation (10-FCV).
To identify each feature's contribution in DB-Net and DVR-Net, we used an eXplainable Artificial Intelligence (XAI) technique, SHapley Additive exPlanations (SHAP).
To assess model generalizability, we performed cross-dataset evaluation.
In Section 2, the literature on CVD prediction is discussed. In Section 3, we discuss the DL techniques used in the proposed models. In Section 4, an overview of the proposed models is presented. The results are presented in Section 5. In Section 6, cross-dataset evaluation is discussed. Finally, the work is concluded in Section 7.
2. Related Work
In [2], the authors proposed a model that accurately predicts heart disease using various ML techniques, such as Logistic Regression (LR), ANN, Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Decision Tree (DT). The most important features were selected using different feature selection techniques. The simulation results indicated that the proposed feature selection algorithm, Fast Conditional Mutual Information (FCMIM), is compatible with the SVM classifier, which achieved 85% classification accuracy.
Yuanyuan et al. [17] introduced an Enhanced-DL-assisted Convolutional NN (EDCNN) to help doctors predict CVD using the Internet of Things (IoT). The EDCNN model contains an MLP model. Validation was performed with both the full feature set and a reduced feature set; for the reduced set, both the model's accuracy and its processing time were recorded. The proposed model was compared with other DL models, such as ANN, DNN, Ensemble DL-based Smart Healthcare System (EDL-SHS), Recurrent NN (RNN), and NN Ensemble (NNE). The test results show that the proposed model achieved a precision of 99.1%.
In [3], the authors introduced a DNN for predicting heart disease. Different techniques, such as cross validation and the Matthews Correlation Coefficient (MCC), were used to evaluate the architectures. The dataset used in this study is publicly available. The proposed model scored 99% accuracy and outperformed the base models.
Ban Salman Shukur and Maad M. Mijwil [18] applied various ML techniques, such as LR, Random Forest (RF), ANN, SVM, and KNN, to diagnose heart disease using the Cleveland Clinic dataset, and compared their performance. SVM gave the best result, with an accuracy of about 90%.
A Smart Healthcare Monitoring System (SHMS) was proposed in [19] for predicting heart disease. The authors used ensemble DL and Feature Fusion (FF) approaches. FF was used to generate healthcare data by combining features from electronic medical records and sensor data. Irrelevant and redundant features were eliminated using the information gain technique, and important features were selected. The authors also compared their proposed system with other state-of-the-art models. The proposed system achieved an accuracy of 98.5%.
In [20], the authors applied ML algorithms, such as ANN and SVM, to predict heart disease. The authors used the Cleveland dataset for training. The final results showed that the ANN and SVM models performed best in detecting chronic heart disease.
In [21], the authors proposed a new model based on two SVM models to predict heart disease at early stages. The first SVM was used to remove redundant features, and the second SVM was used for prediction. For optimization, the authors applied the Hybrid Grid Search Algorithm (HGSA). After training and testing, the model achieved 3.3% higher accuracy than the standard SVM model.
Karthick K. et al. proposed different ML models, such as LR, Gaussian Naive Bayes (GNB), SVM, RF, and eXtreme Gradient Boosting (XGBoost), for understanding and reducing heart disease symptoms. The Chi-square feature selection technique was used to select specific features. RF obtained an accuracy of 88.5% during validation.
Badal et al. [22] addressed the problem of detecting CVD in medical science. The authors performed multiple comparisons of the predictions of different classifiers. The simulation results showed that the DT classifier outperformed other ML models, achieving an accuracy of 98%.
In [23], the authors applied various exploratory data analysis techniques to extract hidden patterns. Different ML algorithms (GNB, DT, and LR) were used to predict heart disease and to seek better prediction performance. LR and GNB achieved the same accuracy of 82.75%; however, GNB's Area Under the Receiver Operating Characteristic curve (AUC-ROC) was higher than that of LR.
In [24], the mortality of patients from heart failure was predicted using ML algorithms. The authors proposed a stacking model that outperformed all the base models, such as RF. RF achieved an accuracy of 88.89%, while the stacking model gave 90% accuracy and performed best on the Heart Failure Prediction dataset.
Liaqat et al. [25] proposed a model comprising a DNN and a statistical model. The DNN was used for classification, while the statistical model was used for feature extraction. The proposed model was applied to the Cleveland dataset and outperformed a conventional ANN, achieving an accuracy of 93.33%.
In [26], the authors proposed a method for detecting Chronic Heart Failure (CHF) based on heart sounds. The method combined ML and DL: the DL component learned from the temporal representation of the signal, while the ML component learned from important features. Applied to the CHF dataset, the method achieved an accuracy of 89.3%, 9.1% higher than the base model.
In [27], the authors introduced a DL approach to identify heart failure patients at high risk. The proposed model used LSTM memory layers. For comparison, different ML techniques, such as LR, RF, and XGBoost, were used. The results showed that the proposed model outperformed all the base ML techniques, with an AUC of 0.861.
In [28], the authors used a traditional LR model to predict heart failure. After training and testing, the outcomes modeled with LR were compared with those of DL models and a Gradient Boosting Model using sequential and non-sequential inputs. The DL models outperformed traditional LR.
Mohamed et al. [29] proposed DL models, such as LSTM and CNN, for the automatic detection of arrhythmia in IoT applications. ECG signals were converted into two-dimensional images and fed into the DL models for classification. The proposed model was found to be efficient and robust on noisy data.
In [30], the authors introduced a framework comprising SVM, DT, and RF for diabetes prediction, which they named the Intelligent Diabetes Mellitus Prediction Framework (IDMPF). Using this framework, the authors described different assessment strategies, training procedures, and issues in predicting diabetes. With an accuracy of 83%, the proposed model performed the best.
Rayan et al. [31] proposed combining a DL model, CNN, with an ML technique, KNN. The CNN was used for feature extraction and disease prediction, while KNN was used to find the closest match in the dataset and predict the result. The proposed model was compared with GNB, LR, and DT and, with an accuracy of 97%, was deemed the best.
Kumar et al. [32] introduced ensemble learning techniques to predict Parkinson's disease. The proposed model outperformed traditional ML models, such as SVM, KNN, RF, DT, Multilayer Perceptron (MLP), Stacking Classifier (SC), and LR, yielding 94.87% accuracy.
In [33], the authors proposed a model to predict diabetes in individuals. After preparing the dataset, the authors compared different ML models: two-class decision jungle, two-class LR, two-class boosted DT, and two-class NN. The two-class boosted DT performed the best, with an accuracy of 99%, outperforming the other three models.
Table 4 presents the summarized related work.
3. Deep Learning Techniques for Cardiovascular Disease Prediction
This section discusses the individual deep models used in the proposed models.
3.1. Role of Deep Belief Network for Cardiovascular Disease Prediction
Deep Belief Network (DBN) is a popular DL model, mainly because of its deep architecture. A DBN is an alternative to a DNN that consists of an input (visible) layer and multiple hidden layers. Adjacent layers are interconnected, and the connections between them have weights that are learned from the input data. An unsupervised DBN is used for feature detection, while a supervised DBN is used for classification. DBNs have many real-world applications, including object detection, computer vision, and Natural Language Processing (NLP) [34]. A DBN is a generative model built from a stack of Restricted Boltzmann Machines (RBMs), in which the output of the first layer is the input of the second layer, and so on. RBMs are trained using input data. An RBM has two layers: a visible layer (v), which holds the input data, and a hidden layer (h), which captures features. The primary distinction between a Boltzmann Machine and an RBM is that an RBM has no connections between nodes within the same layer. RBMs are used to discover significant characteristics in raw data. The DBN model functions in the following two phases.
Step 1: Training. Each RBM in the DBN is trained individually, one layer at a time. The bottom RBM, which receives the input data, is trained first and captures low-level features. After one RBM is trained, its hidden layer serves as the visible layer for the next RBM.
Step 2: Fine-Tune
Once the entire DBN model is pretrained, it is fine-tuned with backpropagation, which updates the weights to minimize the model's error:
where x is the input and h denotes the hidden layers. A DBN can be implemented using either RBMs or autoencoders as building blocks. We have used RBMs in the DBN layers for the following reason.
The vanishing gradient is one of the biggest challenges in training Deep Neural Networks (DNNs). During training, weights are updated using gradients, and if the gradients become very small, the vanishing gradient problem occurs. During pretraining, an RBM learns a data representation that does not change with minor changes in the weights. This provides a good starting point for fine-tuning, so the gradients used to update the weights are significantly larger and the model's accuracy improves [35]. In contrast, in an autoencoder, the encoder and decoder have their own gradients, which can make the vanishing gradient problem worse, leading to poor accuracy and performance.
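To make the greedy layer-wise pretraining of Step 1 concrete, the following is a minimal NumPy sketch of a Bernoulli RBM trained with one-step contrastive divergence (CD-1) and stacked into a DBN. CD-1, the learning rate, the layer sizes, and the binary toy inputs are assumptions for illustration; this is not the authors' implementation, it omits the supervised fine-tuning of Step 2, and standardized real-valued features would normally call for a Gaussian-Bernoulli RBM instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible-layer bias
        self.b_h = np.zeros(n_hidden)    # hidden-layer bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step down to the visible layer and back up.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)   # reconstruction error

def pretrain_dbn(X, layer_sizes, epochs=20, seed=0):
    """Greedy layer-wise pretraining: each RBM is trained on the output of the RBM below."""
    rbms, inputs = [], X
    for i, n_hidden in enumerate(layer_sizes):
        rbm = RBM(inputs.shape[1], n_hidden, seed=seed + i)
        for _ in range(epochs):
            rbm.cd1_step(inputs)
        rbms.append(rbm)
        inputs = rbm.hidden_probs(inputs)   # this hidden layer is the next RBM's visible layer
    return rbms, inputs

# Usage: 100 toy binary records with 21 features, two stacked RBMs (sizes are illustrative).
rng = np.random.default_rng(42)
X = (rng.random((100, 21)) < 0.3).astype(float)
rbms, features = pretrain_dbn(X, layer_sizes=[16, 8])
```

The pretrained weights would then initialize a network that is fine-tuned with backpropagation on the labels, as described in Step 2.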
3.2. Role of Deep Neural Network for Cardiovascular Disease Prediction
The DNN is the foundation of many AI applications. A DNN is an advanced form of NN that is capable of extracting features. A typical NN [36] is composed of an input layer, a hidden layer, and an output layer, whereas a DNN has more than one hidden layer. Each layer contains neurons.
Input Layer: it receives data from the user, which could be text, images, or any other type of data.
Hidden Layer: this layer is placed between the input and output layers. Each hidden layer has its own nodes. Hidden layers process the data received from the input layer, perform weighted calculations, apply an activation function, and produce an output.
Output Layer: it produces the predicted outcome:
where Y is the output, i is the unit, l is the layer associated with the output x, k is the prior layer, w is the weight, and b is the bias in Equation (2). In a DNN, each node in a hidden layer combines its inputs to determine its output [37].
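The layer-by-layer computation described above can be sketched in NumPy as follows. This is a minimal forward pass under stated assumptions: ReLU activations in the hidden layers, a single sigmoid output unit for binary CVD classification, and illustrative layer sizes; none of these choices is taken from the source, and the weights are random rather than trained.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_dnn(layer_sizes, seed=0):
    """One (W, b) pair per layer: small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def dnn_forward(X, params):
    """Each unit takes a weighted sum of the prior layer's outputs, adds a bias, and applies an activation."""
    a = X
    for W, b in params[:-1]:
        a = relu(a @ W + b)               # hidden layers
    W_out, b_out = params[-1]
    return sigmoid(a @ W_out + b_out)     # output layer: one probability per record

# Usage: 21 input features, two hidden layers, one output unit (sizes are illustrative).
params = init_dnn([21, 64, 32, 1])
X = np.random.default_rng(1).normal(size=(5, 21))
probs = dnn_forward(X, params)
```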
3.3. Role of Vanilla Recurrent Neural Network for Cardiovascular Disease Prediction
Vanilla Recurrent Neural Network (VRNN) is the simplest variant of the RNN. It consists of three units (an input unit, a hidden unit, and an output unit), along with a context unit in its hidden layer. The VRNN differs from a traditional feedforward NN in that it includes a feedback loop, which lets it retain information from previous inputs and increases its learning capability.
In a VRNN, information flows through this loop. Before making any decision, the VRNN considers both the previous and the current input to predict the next output. During backpropagation, the weights of the neurons are updated. Several elements need to be considered, such as the activation function, transfer function, bias, learning rate, and error. The sigmoid activation function is employed in the output layer, which outputs a number between 0 and 1.
To process time series data, the VRNN uses three layers, i.e., input, hidden, and output. The functionality of each layer is described below.
Input Layer: at each time step $t$, the VRNN takes the time series data as input. The input layer is not trained; it simply passes the input to the hidden layer.
Hidden Layer: the primary function of this layer is to store past information and to capture dependencies. In Equation (3), $h_t$ is the hidden state at time step $t$, $x_t$ is the input at time step $t$, $W_{xh}$ is the input connection weight matrix, $W_{hh}$ is the recurrent weight matrix, $b_h$ is the bias for the hidden state, and $\sigma$ represents the sigmoid activation function. To capture dependencies, the hidden state combines information from the current input $x_t$ and the previous hidden state $h_{t-1}$:
$$h_t = \sigma\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right) \qquad (3)$$
Output Layer: the main function of the output layer is to produce the output $y_t$ at each time step $t$. Equation (4) presents the output layer, where $y_t$ is the output, $W_{hy}$ is the weight matrix, and $b_y$ is the bias [38,39]:
$$y_t = \sigma\left(W_{hy} h_t + b_y\right) \qquad (4)$$
In Equation (5), we present the sigmoid activation function formula through which we have computed our results:
$$\sigma(z) = \frac{1}{1 + e^{-z}} \qquad (5)$$
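The recurrence in Equations (3) to (5) can be traced step by step with a minimal NumPy sketch. It assumes a recurrent weight matrix W_hh for the previous hidden state alongside the input weights W_xh, sigmoid activations in both the hidden and output layers as described above, and a zero initial hidden state; the matrix names, shapes, and random weights are illustrative only.

```python
import numpy as np

def sigmoid(z):
    """Eq. (5): squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def vrnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a vanilla RNN over x_seq of shape (T, input_dim); return all hidden states and outputs."""
    h = np.zeros(W_hh.shape[0])                    # initial hidden state h_0
    hs, ys = [], []
    for x_t in x_seq:
        h = sigmoid(W_xh @ x_t + W_hh @ h + b_h)   # Eq. (3): current input + previous hidden state
        y = sigmoid(W_hy @ h + b_y)                # Eq. (4): output at this time step
        hs.append(h)
        ys.append(y)
    return np.array(hs), np.array(ys)

# Usage: a sequence of T = 21 steps with one value per step (shapes are illustrative).
rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 21, 1, 8, 1
x_seq = rng.normal(size=(T, d_in))
W_xh = rng.normal(0.0, 0.1, size=(d_h, d_in))
W_hh = rng.normal(0.0, 0.1, size=(d_h, d_h))
W_hy = rng.normal(0.0, 0.1, size=(d_out, d_h))
hs, ys = vrnn_forward(x_seq, W_xh, W_hh, np.zeros(d_h), W_hy, np.zeros(d_out))
```

Because the same weights are reused at every step, the hidden state at step t depends on all earlier inputs, which is what lets the VRNN capture dependencies across time steps.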
3.4. Role of Densely Connected Convolutional Network for Cardiovascular Disease Prediction
Densely Connected Convolutional Network (DenseNet) is a DL architecture designed for computer vision tasks such as image processing and object detection. In traditional CNNs, identifying and utilizing the optimal parameters is quite challenging. DenseNet addresses this issue by reusing features across layers, which makes efficient use of its parameters. DenseNet is composed of an input layer, an initial convolution layer, dense blocks, transition layers, a global feature pooling layer, and a fully connected layer.
Input Layer: This layer is responsible for receiving input for DenseNet from the previous layer, which can be either a feature map or an image.
Initial Convolution Layer: Similar to many traditional NNs, DenseNet includes an initial convolutional layer, which is followed by batch normalization and an activation function. This layer extracts low-level features.
Dense Blocks: These blocks consist of multiple convolutional layers. Each convolutional layer’s output is concatenated with the output of the preceding layers within the same block.
Transition Layer: Due to the increasing number of feature maps, dense blocks become computationally expensive. To address this problem, transition layers are introduced after each dense block. Transition layers are combinations of convolutional layers and pooling layers.
Global Feature Pooling: At the end of the DenseNet architecture, global average pooling is applied before the final fully connected classification layer.
Fully Connected Layer: this layer performs the classification and is followed by a softmax activation function [40].
Within each dense block, the $l$-th layer receives the feature maps of all preceding layers:
$$x_l = H_l\left([x_0, x_1, \ldots, x_{l-1}]\right) \qquad (6)$$
where $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps produced in the preceding layers and $H_l$ is the composite function applied by the $l$-th layer [41].
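DenseNet itself uses convolutions on images; for tabular CVD features, the same dense connectivity pattern can be sketched in NumPy with a fully connected layer plus ReLU standing in for the composite function H_l (an assumption for illustration). Every layer receives the concatenation of all earlier outputs, so the block's width grows by a fixed growth rate per layer. The weights are random and untrained, and n_layers and growth_rate are hypothetical parameters.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense_block(x0, n_layers=3, growth_rate=8, seed=0):
    """x_l = H_l([x_0, x_1, ..., x_{l-1}]) with H_l = ReLU(fully connected layer)."""
    rng = np.random.default_rng(seed)
    features = [x0]
    for _ in range(n_layers):
        concat = np.concatenate(features, axis=1)           # all preceding feature maps
        W = rng.normal(0.0, 0.1, size=(concat.shape[1], growth_rate))
        b = np.zeros(growth_rate)
        features.append(relu(concat @ W + b))                 # new feature map x_l
    return np.concatenate(features, axis=1)

# Usage: 4 records with 21 features -> output width 21 + 3 * 8 = 45.
out = dense_block(np.ones((4, 21)))
```

Keeping the input x_0 inside the concatenated output is what gives later layers (and the final classifier) direct access to early features.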
4. Proposed System Models for Cardiovascular Disease Prediction
The overall flow of the work is discussed in this section. DB-Net and DVR-Net are proposed in our work, as shown in Figure 2. Before the input dataset is passed to the DL models, it needs to be preprocessed. The dataset is clean and does not contain missing values or outliers. However, it is not balanced: class 0 has a significantly higher number of instances than class 1. To address this class imbalance, we employed the ProWRAS balancing technique.
4.1. Heart Disease Health Indicator Dataset
We used the Heart Disease Health Indicator (HDHI) dataset for predicting CVD [42]. The Centers for Disease Control (CDC) conducts an annual health-related telephone survey known as the Behavioral Risk Factor Surveillance System (BRFSS). The survey has been conducted every year since 1984 and collects the responses of over 400,000 Americans. The HDHI dataset is in tabular form, with each row representing one respondent's information. It contains 253,680 instances and 22 features that are used for the binary classification of CVD. These features are derived from individual responses to questions asked directly of participants. Removing outliers is an essential step in data cleaning; however, our dataset is already clean and does not have missing values or outliers. For scaling, we applied a standard scaler. The dataset is highly imbalanced, with class 0 comprising 91% of the instances and class 1 only 9%; therefore, preprocessing is essential.
4.2. Standard Scaler
A popular feature scaling method for preparing data for DL models is the standard scaler. Each feature in the dataset is transformed by subtracting its mean and dividing by its standard deviation, resulting in a distribution with a mean of zero and a standard deviation of one:
$$Z = \frac{X - \mu}{\sigma} \qquad (7)$$
where $X$ denotes the feature value, $\mu$ is the mean of each feature, $\sigma$ is its standard deviation, and $Z$ is the scaled value.
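A minimal NumPy sketch of the standard scaler, computed column-wise so that each feature is scaled with its own mean and standard deviation. The guard for constant features is an assumption added to avoid division by zero; it is not described in the text. The population standard deviation (ddof=0) is used, which is also what scikit-learn's StandardScaler uses.

```python
import numpy as np

def standard_scale(X):
    """Z = (X - mu) / sigma, computed per feature (column)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)            # population standard deviation (ddof=0)
    sigma[sigma == 0] = 1.0          # constant features would divide by zero; they map to 0 instead
    return (X - mu) / sigma, mu, sigma

# Usage: two varying features and one constant feature.
X = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [3.0, 30.0, 5.0]])
Z, mu, sigma = standard_scale(X)
```

In a cross-validation setting, mu and sigma would typically be computed on the training folds only and then reused to scale the validation fold, so that no statistics leak from the validation data.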
4.3. Data Balancing Using Proximity Weighted Random Affine Shadow Sampling
Data balancing is one of the most important steps in preprocessing. Data imbalance occurs when one class has many more instances than the other. There are two classes in the dataset: the majority class, which contains more instances, and the minority class, which contains fewer, resulting in an imbalanced dataset. If the dataset is not balanced, the model performs poorly and inefficiently, particularly on the minority class.
Data imbalance is one of the major issues in real-world datasets [43]. Two types of data balancing techniques are used to address it: undersampling and oversampling. In undersampling, majority class instances are reduced to equal the number of minority class instances, which can discard much of the information. In oversampling, minority class instances are increased by creating synthetic samples.
Various methods are used for data balancing; however, balancing must be performed carefully to avoid information loss. The HDHI dataset is highly imbalanced: class 0 has 229,787 participants who do not have heart disease, and class 1 has 23,893 participants who have heart disease. These numbers show that only a small fraction of participants have heart disease. In this paper, we have used the ProWRAS data balancing technique to balance the ratio of minority and majority classes.
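The reported class counts can be checked against the 91%/9% split of the HDHI dataset with a few lines of Python. The snippet also computes how many synthetic minority samples a full 1:1 oversampling would need, assuming exact parity between the classes is the target.

```python
counts = {0: 229_787, 1: 23_893}                 # HDHI class counts reported above
total = sum(counts.values())                     # 253,680 instances
shares = {c: n / total for c, n in counts.items()}
imbalance_ratio = counts[0] / counts[1]          # majority instances per minority instance
synthetic_needed = counts[0] - counts[1]         # minority samples to add for a 1:1 balance

print(total, round(shares[0] * 100, 2), round(shares[1] * 100, 2),
      round(imbalance_ratio, 2), synthetic_needed)
```

The printed shares (about 90.58% and 9.42%) round to the 91%/9% split, the majority class is roughly 9.6 times larger than the minority class, and 205,894 synthetic samples would be needed for parity.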
Proximity Weighted Random Affine Shadow Sampling
ProWRAS is an oversampling technique that clusters the data points of the minority class. The clusters are formed based on the distance of the data points from the majority class, and a weight is assigned to each cluster, with larger weights given to clusters closer to the majority class. The weights determine the number of synthetic samples generated from each cluster. To avoid overlap with the majority class, synthetic samples with low variance are generated in borderline clusters [44].
ProWRAS offers four oversampling schemes, depending on the values of two parameters, max-conv and net-conv. max-conv selects the number of shadowsamples from which one synthetic data point is generated. The oversampling is designed so that the data are balanced between both classes, which allows multiple classifiers to produce efficient and accurate results [45].
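The mechanism described above (proximity levels, decaying level weights, low-variance shadow samples near the border, and convex combinations of shadow samples) can be illustrated with a simplified, self-contained NumPy sketch. It is not the reference ProWRAS implementation: proximity levels are approximated by distance quantiles, and the parameters n_levels, decay, n_neighbors, and base_std are hypothetical; max_conv only loosely mirrors the max-conv parameter as the number of shadow samples combined into one synthetic point.

```python
import numpy as np

def prowras_like_oversample(X_min, X_maj, n_levels=3, decay=1.0, n_neighbors=5,
                            max_conv=3, base_std=0.1, seed=0):
    """Simplified proximity-weighted affine shadow sampling (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n_new = len(X_maj) - len(X_min)            # synthetic samples needed for a 1:1 balance
    if n_new <= 0:
        return np.empty((0, X_min.shape[1]))

    # 1) Proximity: distance from each minority point to its nearest majority point
    #    (brute force; a spatial index such as scipy.spatial.cKDTree suits large data).
    diffs = X_min[:, None, :] - X_maj[None, :, :]
    d_maj = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

    # 2) Proximity levels via distance quantiles: level 0 is closest to the majority class.
    edges = np.quantile(d_maj, np.linspace(0.0, 1.0, n_levels + 1)[1:-1])
    levels = np.digitize(d_maj, edges)

    # 3) Exponentially decaying level weights: levels near the majority class get more samples.
    sizes = np.array([np.sum(levels == lvl) for lvl in range(n_levels)])
    weights = np.exp(-decay * np.arange(n_levels)) * sizes
    quota = np.floor(n_new * weights / weights.sum()).astype(int)
    quota[np.argmax(weights)] += n_new - quota.sum()   # assign the rounding remainder

    synthetic = []
    for lvl in range(n_levels):
        pts = X_min[levels == lvl]
        if quota[lvl] == 0 or len(pts) == 0:
            continue
        # 4) Low shadow-sample variance near the border limits overlap with the majority class.
        std = base_std * (lvl + 1) / n_levels
        k = min(n_neighbors, len(pts))
        for _ in range(quota[lvl]):
            anchor = pts[rng.integers(len(pts))]
            neighbors = np.argsort(((pts - anchor) ** 2).sum(axis=1))[:k]  # includes the anchor
            # 5) Shadow samples: randomly chosen neighbors perturbed with Gaussian noise.
            chosen = pts[rng.choice(neighbors, size=max_conv)]
            shadows = chosen + rng.normal(0.0, std, size=chosen.shape)
            # 6) Random convex combination (weights sum to 1) of max_conv shadow samples.
            alpha = rng.dirichlet(np.ones(max_conv))
            synthetic.append(alpha @ shadows)
    return np.array(synthetic)

# Usage on toy 2-D data: 60 majority points and 15 minority points.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(60, 2))
X_min = rng.normal(2.0, 0.5, size=(15, 2))
X_syn = prowras_like_oversample(X_min, X_maj)
X_bal = np.vstack([X_maj, X_min, X_syn])
y_bal = np.concatenate([np.zeros(60), np.ones(15 + len(X_syn))])
```

Because each synthetic point is a convex combination of noisy copies of nearby minority points, it stays inside the local minority neighborhood instead of being interpolated across distant clusters.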
4.4. Description of DB-Net and DVR-Net for Cardiovascular Disease Prediction
The proposed models combine multiple base models by concatenating the features extracted by each into a single representation. Such combined models have been used for various purposes, such as detecting CVD and detecting theft in smart grids. They incorporate the advantages of the individual models and give enhanced performance. The main objective of the proposed models is to overcome the limitations of individual models and to produce improved classification and prediction results. In both DB-Net and DVR-Net, two base models are combined. The main points to note are as follows.
The proposed models take more execution time than the individual models;
Both models are computationally expensive, as they require more resources;
DB-Net and DVR-Net require a large dataset for training to prevent the models from overfitting.
There are two ways to create a model that combines the predictions of different models. In a sequential model, the output of the first base model is fed as input to the second base model, which generates a single output. In a parallel model, the base models work independently: the input is fed to both models, and their outputs are then concatenated to produce a single output.
4.4.1. Dense Belief Network for Cardiovascular Disease Prediction
DB-Net processes input data independently through dense layers and dense blocks in parallel, as described in Algorithm 1. The dense layers and dense blocks, operating independently, extract different feature representations: the dense layers concentrate on capturing fundamental feature patterns, while the dense blocks, which are made up of interconnected layers, refine more intricate representations. To create an accurate prediction, the outputs of the dense layers and dense blocks are concatenated at the last layer. Because of this parallel design, DB-Net can efficiently learn a wide variety of feature abstractions that are essential for CVD prediction. Densely connected layers allow direct connections between early and later layers, which improves information and gradient flow within the dense blocks, reduces the risk of vanishing gradients, and promotes more stable training. This makes DB-Net well suited to high-dimensional health data. DB-Net is also effective at extracting deep abstract features from both structured and unstructured data, which can be used to find complex patterns in CVD datasets. This is particularly helpful in healthcare, as the model's predictions may depend on subtle relationships between features such as blood pressure, cholesterol levels, and other health indicators.
Algorithm 1: Dense Belief Network for Cardiovascular Disease Prediction
Require: Dataset X, labels y
Ensure: Predicted labels for CVD classification
1: Input: CVD data X
2: Initialize dense layers and dense blocks
3: Pass X through a series of dense layers to extract fundamental feature patterns:
4:     F_L ← DenseLayers(X)
5: Pass X through a series of dense blocks to capture complex representations:
6:     F_B ← DenseBlocks(X)
7: Concatenate the outputs of the dense layers and dense blocks:
8:     F ← Concat(F_L, F_B)
9: Pass the concatenated output through the final prediction layer for CVD classification:
10: return ŷ ← PredictionLayer(F)
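The data flow of Algorithm 1 can be traced with a minimal NumPy sketch. The layer widths, the growth rate, the ReLU activations, the sigmoid output, and the use of fully connected layers inside the dense block are all assumptions for illustration, and the weights are drawn at random and never trained, so the sketch only shows how the two parallel paths are built and concatenated. A trainable version would normally be built in a DL framework, for example with a concatenation layer in the Keras functional API.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, n_out):
    """Fully connected layer + ReLU with freshly drawn, untrained weights (data flow only)."""
    W = rng.normal(0.0, 0.1, size=(x.shape[1], n_out))
    return relu(x @ W)

def dense_layers_path(X, widths=(64, 32)):
    """Steps 3-4: a plain stack of dense layers for fundamental feature patterns."""
    h = X
    for w in widths:
        h = dense(h, w)
    return h

def dense_blocks_path(X, n_layers=3, growth_rate=16):
    """Steps 5-6: a dense block; every layer sees the concatenation of all earlier outputs."""
    feats = [X]
    for _ in range(n_layers):
        feats.append(dense(np.concatenate(feats, axis=1), growth_rate))
    return np.concatenate(feats, axis=1)

def db_net_forward(X):
    """Steps 7-10: concatenate both paths and map them to a CVD probability."""
    f = np.concatenate([dense_layers_path(X), dense_blocks_path(X)], axis=1)
    W_out = rng.normal(0.0, 0.1, size=(f.shape[1], 1))
    return sigmoid(f @ W_out)

# Usage: 4 records with 21 standardized features.
probs = db_net_forward(np.random.default_rng(1).normal(size=(4, 21)))
```

With these sizes, the dense-layer path yields 32 features and the dense-block path 21 + 3 × 16 = 69, so the prediction layer sees 101 concatenated features per record.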
4.4.2. Deep Vanilla Recurrent Network for Cardiovascular Disease Prediction
DVR-Net is a parallel model in which data flow independently through two distinct processing paths, one with fully connected (dense) layers and another with recurrent layers, as shown in Algorithm 2. Along the first path, data pass through several dense layers, which gradually extract foundational feature representations and capture hidden patterns using weighted connections and activation functions. Concurrently, the data are processed by recurrent layers, each of which uses connections that capture dependencies across time steps, combining past and present inputs to construct meaningful feature representations. The final layer creates a complete feature set by concatenating the outputs of the dense and recurrent paths, and this combined representation is passed through an output layer to generate DVR-Net's prediction. The parallel processing structure enables effective gradient flow in both pathways, which helps to avoid the common vanishing gradient problem and to ensure stable training. Because its two separate pathways capture different feature types, DVR-Net provides richer and more varied representations, making it more suitable for spotting hidden patterns.
Algorithm 2: Deep Vanilla Recurrent Network for Cardiovascular Disease Prediction
Require: Dataset X, labels y
Ensure: Predicted labels for CVD classification
1: Input: CVD data X
2: Initialize dense and recurrent layers
3: Pass X through multiple dense layers to capture static feature representations:
4:     F_D ← DenseLayers(X)
5: Pass X through multiple recurrent layers to capture sequential dependencies:
6:     F_R ← RecurrentLayers(X)
7: Concatenate the outputs of dense and recurrent layers:
8:     F ← Concat(F_D, F_R)
9: Pass the concatenated output through the final prediction layer for CVD classification:
10: return ŷ ← PredictionLayer(F)
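Similarly, the two parallel paths of Algorithm 2 can be sketched in NumPy. Because HDHI records are tabular rather than sequential, this sketch assumes that the recurrent path reads each record's features one at a time, treating the feature index as the time step; that choice, the layer sizes, and the random untrained weights are illustrative assumptions, not details from the source. The recurrent update follows the VRNN hidden-state equation of Section 3.3 in row-vector form.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_path(X, widths=(64, 32)):
    """Steps 3-4: stacked dense layers for static feature representations."""
    h = X
    for w in widths:
        h = relu(h @ rng.normal(0.0, 0.1, size=(h.shape[1], w)))
    return h

def recurrent_path(X, hidden=16):
    """Steps 5-6: a vanilla RNN reading each record's features in order (feature index = time step)."""
    n, T = X.shape
    W_xh = rng.normal(0.0, 0.1, size=(1, hidden))
    W_hh = rng.normal(0.0, 0.1, size=(hidden, hidden))
    b_h = np.zeros(hidden)
    h = np.zeros((n, hidden))
    for t in range(T):
        x_t = X[:, t:t + 1]                         # (n, 1): feature t of every record
        h = sigmoid(x_t @ W_xh + h @ W_hh + b_h)    # combine present input and past state
    return h                                        # final hidden state summarizes the sequence

def dvr_net_forward(X):
    """Steps 7-10: concatenate both paths and map them to a CVD probability."""
    f = np.concatenate([dense_path(X), recurrent_path(X)], axis=1)
    W_out = rng.normal(0.0, 0.1, size=(f.shape[1], 1))
    return sigmoid(f @ W_out)

# Usage: 4 records with 21 standardized features.
probs = dvr_net_forward(np.random.default_rng(1).normal(size=(4, 21)))
```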
5. Simulation and Results of Cardiovascular Disease Prediction
This section discusses the results of our proposed models in terms of different performance metrics.
5.1. Proposed Model Performance Evaluation
This section discusses the performance of the proposed models using a variety of performance measures, including accuracy, F1-score, precision, recall, and execution time.
5.2. Metrics
In this work, “positive” refers to the heart disease class and “negative” refers to the normal class. The following quantities are used in the performance evaluation:
TP = a heart disease case is classified as heart disease;
TN = a normal person is classified as normal;
FN = a heart disease case is classified as normal;
FP = a normal person is classified as having heart disease.
Accuracy: It measures the proportion of all instances that the classifier classifies correctly:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision: It calculates the proportion of TP instances among all instances predicted as positive. It can be calculated by Equation (9):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$
Recall: It calculates the proportion of actually positive instances that are predicted as positive. It can be calculated by Equation (10):
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$
F1-score: It calculates the harmonic mean of precision and recall. It can be calculated by Equation (11):
$$\mathrm{F1} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$
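The accuracy formula and Equations (9) to (11) can be computed directly from the confusion counts defined above. The sketch below is illustrative: the function names and the counts in the usage lines are hypothetical, not results from this paper, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` (heart disease) as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0          # Equation (9)
    recall = tp / (tp + fn) if tp + fn else 0.0             # Equation (10)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0  # Equation (11)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))   # (2, 1, 1, 1)
m = classification_metrics(tp=90, tn=85, fp=15, fn=10)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.875, 'precision': 0.857, 'recall': 0.9, 'f1': 0.878}
```

Note that F1 can also be written as 2TP / (2TP + FP + FN), which gives 180 / 205 for the hypothetical counts above.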
5.3. 10-Fold Cross Validation
10-FCV is used to evaluate the overall performance of our models. The dataset is split into 10 folds [46], and the models are trained and tested 10 times, which reduces the risk that the reported results depend on a single, possibly favorable, split. For 10-fold validation, we split the dataset into training, testing, and validation sets, and a different fold is selected for validation each time: in each iteration, nine folds are used for training, while the remaining fold is used for validation. After training and testing, we obtain the accuracy, F1-score, precision, recall, and execution time. 10-FCV makes the reported results more reliable.
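The fold assignment described above can be sketched in plain Python. This is a minimal sketch that assumes a simple shuffled (unstratified) split with a fixed seed; the paper does not state how the folds were formed, and scikit-learn's `StratifiedKFold` is a common alternative that preserves the class ratio in every fold. The function name is hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=42):
    """Yield (train_idx, val_idx) pairs; every sample is used for validation exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)   # spread the remainder so fold sizes differ by at most one
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]   # the other k - 1 folds
        yield train_idx, val_idx
        start += size

for fold, (train_idx, val_idx) in enumerate(k_fold_indices(25, k=10)):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation")
```

With 25 samples and k = 10, the first five folds hold three validation samples and the remaining five hold two; every index appears in exactly one validation fold.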
5.4. Validation of Dense Belief Network for Cardiovascular Disease Prediction
DB-Net outperforms the state-of-the-art models with an accuracy, F1-score, precision, and recall of 92%, 92%, 92%, and 92%, respectively, as shown in Table 5. The combination of dense layers, used for their feature extraction capabilities, and densely connected layers produces DB-Net's superior results. Because each layer directly uses the outputs of the previous layers, the architecture, in particular the dense block structure, helps avoid the vanishing gradient problem, thus improving gradient flow and model stability during training. Furthermore, dense blocks minimize parameter usage, increasing computational efficiency without compromising performance. DenseNet and DBN, the baseline models, perform poorly because they are not able to generalize across all features. While DenseNet alone is effective, it might not capture enough of the abstract patterns essential for CVD prediction. DBN, on the other hand, requires a lot of processing time and can suffer from overfitting.
Despite DB-Net's superior performance, the baseline models (DBN and DenseNet) have significantly lower training time, inference time, and memory usage. The DB-Net architecture integrates the outputs of dense blocks and consists of several densely connected layers. Since each layer is directly connected to the outputs of the preceding layers, there are many parameters to store and update during training and inference. The baseline models, DBN and DenseNet, have fewer layers and consequently fewer parameters, so they use less memory. Because of its dual-path design and layered structure, DB-Net has a longer inference time and higher memory usage than the other models, whereas the baseline models, which each employ a single architecture, are faster during inference and use less memory.
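The memory argument above can be made concrete by counting parameters. The sketch below assumes fully connected layers with hypothetical sizes (22 inputs, 4 layers, 32 units or growth rate) and compares a densely connected block, in which every layer receives the concatenation of the block input and all earlier outputs, with an ordinary stack of the same width. It does not reproduce DB-Net's actual configuration, and the function names are hypothetical.

```python
def dense_layer_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def dense_block_params(n_in, growth, n_layers):
    """Each layer sees the block input concatenated with all earlier layer outputs."""
    total, width = 0, n_in
    for _ in range(n_layers):
        total += dense_layer_params(width, growth)
        width += growth          # the new output is concatenated onto the running input
    return total

def plain_stack_params(n_in, units, n_layers):
    """Ordinary stack: each layer sees only the previous layer's output."""
    return dense_layer_params(n_in, units) + (n_layers - 1) * dense_layer_params(units, units)

print(dense_block_params(22, 32, 4))   # 9088
print(plain_stack_params(22, 32, 4))   # 3904
```

The gap comes entirely from the growing input width (22, 54, 86, 118) of the densely connected layers; the block's output is also wider (22 + 4 × 32 = 150 values versus 32), which adds activation memory.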
DB-Net is compiled with the ‘Adam’ optimizer and a learning rate of 0.0001. The complete architecture of the base models and DB-Net is illustrated in Table 6. The DB-Net model is trained on the training data. Table 7 lists the hyperparameters used in the individual models and the DB-Net model. The results confirm that DB-Net outperforms the base models, with an accuracy of 92%.
Figure 3 shows the performance metrics of the individual models and DB-Net for 10, 20, and 30 epochs.
Figure 4 shows that the execution time of both DB-Net and DVR-Net is higher than that of the individual models; higher execution time is the trade-off for improved accuracy. For instance, DB-Net can take up to 2756 s to train over 30 epochs, while DenseNet and DBN take only about 2356 and 2377 s, respectively. The integration of dense layers and dense blocks, which provides stable training and effective gradient flow, increases the computational load. Figure 5 shows the DB-Net confusion matrix, which summarizes the classification performance on the CVD prediction task by displaying the numbers of False Positives (FP), False Negatives (FN), True Positives (TP), and True Negatives (TN). DB-Net demonstrates high predictive accuracy, with a good balance between correctly classified positive and negative cases. This outcome shows that DB-Net successfully identifies patterns in the data, leading to a low number of misclassifications. By concentrating on borderline samples, ProWRAS reduces FP and improves the precision and recall of the model. Figure 6 shows the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) for DB-Net, demonstrating the model's capacity to discriminate between positive and negative classes at different threshold values. An AUC near 1.0 reflects the high discriminative power of DB-Net in predicting CVD.
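The AUC discussed above equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The quadratic-time sketch below computes it directly from that definition, counting ties as one half; it is for illustration only (the function name is hypothetical), and for a dataset of the HDHI's size a rank-based implementation such as scikit-learn's `roc_auc_score` is more practical.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 1.0 would mean every positive sample is scored above every negative sample, regardless of the decision threshold.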
Table 8, Table 9 and Table 10 show the 10-FCV results of DB-Net and the individual models for different performance parameters at 30, 20, and 10 epochs. The results confirm that DB-Net outperforms the base models, with an accuracy of 91%.
5.5. Validation of Deep Vanilla Recurrent Network for Cardiovascular Disease Prediction
The results show that DVR-Net outperforms the base models with an accuracy of 91%, F1-score of 90%, recall of 90%, and precision of 91%, as shown in Table 11. Compared with DNN (about 81.5%) and VRNN (about 83.8%), DVR-Net achieves a higher accuracy (about 90.5%) and a higher F1-score. This suggests that DVR-Net offers a better trade-off between recall and precision and predicts CVD more accurately. Because DNN cannot capture sequential dependencies, its accuracy and F1-score are comparatively lower. This lessens its efficacy on complex datasets that exhibit temporal relationships, such as health indicator data, where prediction relies heavily on sequential trends. DNN is also computationally expensive and requires specialized hardware to train on large datasets. With a recall of up to 92%, VRNN outperforms DNN in detecting positive cases (heart disease patients). Nevertheless, its precision is lower (about 79–80%), which means there are more false positives, and this imbalance lowers its overall F1-score and accuracy. DVR-Net is compiled with the ‘Adam’ optimizer together with a loss function, and it is then trained to generate the final output. Architecture details of the baseline models and DVR-Net are shown in Table 12, and the hyperparameters used in the individual models and the DVR-Net model are shown in Table 13. The Stochastic Gradient Descent (SGD) optimizer is used, as it reduces fluctuations and greatly accelerates convergence [47].
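This part of the text mentions both Adam and SGD. The sketch below shows their standard update rules on a hypothetical one-dimensional objective, f(x) = (x - 3)^2, so that the mechanics are visible; the objective, learning rates, and step counts are illustrative, not the paper's settings. SGD is shown with classical momentum because the stated benefit (fewer fluctuations, faster convergence) is usually associated with momentum; whether the paper used momentum is an assumption, and plain SGD corresponds to `momentum=0`. The function names are hypothetical.

```python
import math

def adam_minimize(grad, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam on a single scalar parameter."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias corrections for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def sgd_momentum_minimize(grad, x0, lr=0.01, momentum=0.9, steps=1000):
    """SGD with classical (heavy-ball) momentum on a single scalar parameter."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)
        x += velocity
    return x

def grad_toy(x):
    # Gradient of the toy objective f(x) = (x - 3)^2; the minimum is at x = 3.
    return 2.0 * (x - 3.0)

print(adam_minimize(grad_toy, 0.0, lr=0.1, steps=3000))
print(sgd_momentum_minimize(grad_toy, 0.0, lr=0.01, steps=1000))
```

Both runs end close to the minimum at x = 3. Adam rescales each step by a running estimate of the gradient's magnitude, so its step size stays on the order of `lr` regardless of the gradient's scale, whereas the SGD step is proportional to the raw gradient.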
Table 14, Table 15 and Table 16 show the 10-FCV results of DVR-Net for 10, 20, and 30 epochs. The findings demonstrate that DVR-Net, with an accuracy of 91%, performs better than the base models.
Figure 7 shows the different metrics of the individual models and DVR-Net.
As shown in Figure 4, DVR-Net has a substantially longer execution time (training and inference) than DNN and VRNN. For instance, DVR-Net can take up to 8712 s to train for 30 epochs, whereas DNN and VRNN take only about 742 and 2393 s, respectively. This longer execution time is caused by the complexity of DVR-Net, which processes data with both recurrent and dense layers. Because DVR-Net contains both dense and recurrent layers, its parameters have some redundancy. Compared with a single architecture such as DNN or VRNN, the model requires roughly twice as much memory because it must store the weights and biases of both the dense and the recurrent components. DVR-Net also keeps the intermediate activations of both the dense and recurrent paths during training and inference; because the model must store activations from two distinct architectures for backpropagation and gradient updates, the memory requirement rises further.
The confusion matrix for DVR-Net is shown in Figure 8, which summarizes the classification performance of the model by displaying the counts of FP, FN, TP, and TN. High values in the TP and TN cells indicate that DVR-Net detects both positive cases (patients with CVD) and negative cases (healthy individuals) with few misclassifications. This balanced accuracy shows how well DVR-Net differentiates between the two classes of the dataset. Figure 9 depicts the ROC curve and the AUC score, which indicate how well the model discriminates. The ROC-AUC is a commonly used metric for assessing a model's ability to distinguish between positive and negative classes across a range of thresholds. The AUC near 1.0 in Figure 9 suggests that DVR-Net does a good job of differentiating between healthy people and those at risk of CVD.
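The memory discussion for DVR-Net earlier in this section can be checked with a rough count of parameters and stored activations. The sketch assumes hypothetical layer sizes (two dense layers of 64 and 32 units, two vanilla RNN layers of 64 and 32 units), a recurrent path that reads the 22 features as a 22-step sequence of scalars, and a single sigmoid output unit on the concatenated paths. The actual ratio between the two paths depends entirely on these choices, and the function names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def simple_rnn_params(n_in, units):
    """Input kernel, recurrent kernel and bias of a vanilla RNN layer."""
    return n_in * units + units * units + units

def dvr_net_param_breakdown(n_features, dense_units, rnn_units, rnn_input_dim=1):
    dense_total, width = 0, n_features
    for u in dense_units:
        dense_total += dense_params(width, u)
        width = u
    rnn_total, width = 0, rnn_input_dim
    for u in rnn_units:
        rnn_total += simple_rnn_params(width, u)
        width = u
    head = dense_params(dense_units[-1] + rnn_units[-1], 1)  # output unit on the concatenation
    return {"dense": dense_total, "recurrent": rnn_total, "head": head,
            "total": dense_total + rnn_total + head}

def activations_per_sample(seq_len, dense_units, rnn_units):
    """Rough count of stored values: one vector per dense layer, one per RNN layer per time step."""
    return {"dense": sum(dense_units), "recurrent": seq_len * sum(rnn_units)}

print(dvr_net_param_breakdown(22, dense_units=[64, 32], rnn_units=[64, 32]))
# {'dense': 3552, 'recurrent': 7328, 'head': 65, 'total': 10945}
print(activations_per_sample(22, [64, 32], [64, 32]))
# {'dense': 96, 'recurrent': 2112}
```

Under these assumptions, the recurrent path holds about twice as many parameters as the dense path, and because backpropagation through time keeps one hidden state per time step, its stored activations grow with the sequence length.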
5.6. SHapley Additive exPlanations for Cardiovascular Disease Prediction
SHAP is a technique that helps us understand our model's outputs in more depth. It explains the contribution of each feature to a prediction, and its plots make these contributions easy to visualize.
SHAP decomposes a model's output into the contributions of the individual features. It calculates a value for each feature, which shows how important that feature is, and it can handle models with complex behavior.
Determining the significance or contribution of each feature in a DL model is challenging. SHAP values make the contribution of each feature to the model's output visible. We employed SHAP [48] to explain how each feature affects the prediction of CVD.
SHAP values are crucial for improving the interpretability of model outputs, particularly for the complex models used in CVD prediction. By assigning each feature a SHAP value, we can determine how much each feature contributes to a particular prediction. SHAP values originate in cooperative game theory: they divide the prediction output among all features according to their respective impacts, thereby determining the role of each feature. This approach makes it easier to understand how health indicators such as heart rate, blood pressure, and cholesterol affect the final CVD risk assessment. We applied SHAP to view the internal workings of our proposed models, using SHAP's Kernel Explainer with summarized background data. With the Kernel Explainer, we generated SHAP values for all 22 features of our dataset, and these SHAP values were then used to generate different visualization plots.
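To make the game-theoretic definition concrete, the sketch below computes exact Shapley values for a hypothetical three-feature risk score by enumerating every coalition, with features outside a coalition set to a single baseline value. This is a simplification for illustration: SHAP's `KernelExplainer` instead estimates these values with a weighted linear regression over sampled coalitions and averages over a background dataset, because exact enumeration grows as 2^n and is impractical for 22 features. The model and function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    n = len(x)

    def value(coalition):
        # Features in the coalition keep the patient's value; the rest use the baseline.
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))   # weighted marginal contribution
    return phi

def risk_score(z):
    # Hypothetical model: two linear effects plus an interaction between features 0 and 2.
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(risk_score, x, baseline)
print([round(v, 6) for v in phi])                                          # [0.8, 0.6, 0.3]
print(round(sum(phi), 6), round(risk_score(x) - risk_score(baseline), 6))  # 1.7 1.7
```

The values sum to f(x) minus f(baseline), which is the additivity property that force and waterfall plots display; the interaction term 0.2·x0·x2 is split equally between features 0 and 2.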
Force Plot: The SHAP force plots of DB-Net and DVR-Net, shown in Figure 10 and Figure 11, illustrate individual predictions by demonstrating how each feature pushes the model output toward or away from a particular class in CVD prediction. Each force plot is centered on the base value, i.e., the average model output across the dataset. Red (positive SHAP values) indicates features pushing the prediction toward higher risk, while blue (negative SHAP values) indicates features pulling it toward lower risk. In both DB-Net and DVR-Net, for example, high blood pressure or high cholesterol may push the prediction toward a positive risk classification, whereas normal values for these characteristics may move it away from a risk indication. DVR-Net displays more dynamic changes in SHAP values because of its focus on temporal or sequential patterns, while DB-Net's force plot displays a more stable gradient because its DenseNet and DBN components handle individual feature importance differently.
Summary Plot: The SHAP summary plot, illustrated in Figure 12, provides a broad picture of which features are most important for the predictions across all instances. Each dot represents an instance in the dataset, and its color indicates the feature value. The features are listed in order of importance, with the most significant at the top. For example, characteristics such as age, blood pressure, and cholesterol level may rank highly in both models for CVD prediction. The recurrent component of DVR-Net emphasizes sequentially influenced features such as blood pressure trends or glucose levels over time, while the emphasis of DB-Net's dense blocks on feature richness results in a high concentration of critical features such as cholesterol.
Waterfall Plot: SHAP waterfall plots break down each feature's contribution to a single prediction, which makes it possible to trace the path from the base value to the final prediction. The waterfall plots for DB-Net and DVR-Net, shown in Figure 13 and Figure 14, display the features in descending order of their effect on the prediction, with each feature having either a positive or a negative effect on the outcome. For example, for a high-risk prediction, the plot might show that high cholesterol adds a large positive SHAP value, while a young age lowers the risk score.
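A waterfall plot is essentially a running sum. The sketch below orders hypothetical SHAP values for one patient by magnitude and accumulates them from the base value, which is the arithmetic such a plot visualizes; the feature names and numbers are illustrative, not taken from the paper's figures, and the function name is hypothetical.

```python
def waterfall_steps(base_value, contributions):
    """Order features by |SHAP value| and accumulate them starting from the base value."""
    ordered = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    running = base_value
    steps = []
    for name, value in ordered:
        running += value
        steps.append((name, value, running))
    return steps

base_value = 0.30   # hypothetical average model output
shap_values = {"HighChol": 0.25, "Age": -0.10, "HighBP": 0.15, "BMI": 0.05}
for name, value, running in waterfall_steps(base_value, shap_values):
    print(f"{name:>8}  {value:+.2f}  -> {running:.2f}")
```

The last running total equals the base value plus the sum of all SHAP values, i.e., the model's output for this patient.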
Dependence Plot: The SHAP dependence plots shown in Figure 15 and Figure 16 plot the SHAP value of a feature against the value of that feature for each instance in the dataset. For instance, a dependence plot for age displays how the SHAP value of age changes as age varies. Every point on the plot represents a distinct individual in the dataset, and the color of each point indicates the value of a second feature that strongly interacts with the main feature. This helps to determine whether the anticipated CVD risk rises or falls as a feature value (such as age) increases.
5.7. Ablation Study
In the ablation study, we compared our proposed models, DB-Net and DVR-Net, with different DL models.
Table 17 compares the performance metrics of different DL models with those of the proposed models. The results show that our proposed models are more robust and efficient than the state-of-the-art models.
DBNs may be difficult and computationally costly to train, especially on big datasets. Like many other DL systems, DBNs are sensitive to hyperparameters, so selecting proper initialization techniques and hyperparameters is critical. VRNNs are primarily intended to handle sequential data and are unsuitable for our dataset, which is not time-dependent. DNNs can overfit when dealing with large or high-dimensional datasets. HighwayNet does not perform well on this dataset because its gating mechanism makes it complex and is better suited to sequential data, which leads to lower accuracy. ShuffleNet also performs poorly on this dataset because it overfits; the results show that the ShuffleNet model is biased and does not outperform the proposed models. ResNet is well known for its depth and its capacity to train very deep architectures thanks to skip connections; however, these skip connections fail to adequately reflect the complexity of the data. DenseNet's dense connection architecture allows each layer to directly access the features of all preceding layers. While this promotes feature reuse and information flow, it can also duplicate learned representations. Redundant features increase the model's memory footprint and computational complexity without necessarily enhancing performance, particularly in deeper networks.
DB-Net and DVR-Net outperform all the baseline models because they combine the strengths of different base models: the features learned by the base models are concatenated in the final layer and passed to the subsequent layers. Combining the representations of several models helps reduce overfitting and improves generalization, and it allows the proposed models to capture a more diverse representation of the input data.
SMOTE and Random Oversampling Examples (ROSE) are thoroughly tested on the HDHI dataset, with an emphasis on how well each technique works with our proposed models, DB-Net and DVR-Net. These findings, along with the effects of each technique, are shown in Table 18. SMOTE and ROSE pose a challenge to DB-Net and DVR-Net mainly because of the way these methods generate synthetic data. Both are oversampling strategies intended to rectify class imbalance by producing synthetic samples for the minority class; however, they have drawbacks that affect DL models that are sensitive to subtle patterns in the data, such as DB-Net and DVR-Net. SMOTE creates synthetic instances by interpolating between existing minority samples, yielding data points that lie between two existing minority class points. However, it disregards the distribution of these samples in the feature space and treats every minority sample equally. This may result in overfitting, especially if the synthetic samples do not closely match the real decision boundary. ROSE randomly creates samples around existing instances in the feature space, within a radius of each minority point. This may introduce noise and sampling instability, especially in regions of the feature space where the decision boundaries of the two classes are tight. To capture complex feature hierarchies, DB-Net depends on dense connections between layers and the DenseNet structure, which makes it sensitive to even small changes and patterns in the data. When trained on data synthesized using SMOTE or ROSE, DB-Net might overfit to artificial patterns that do not accurately represent the relationships in the original data. DVR-Net's recurrent structure, which is intended to capture dependencies between features, depends on precisely learning the relationships among complex patterns. Oversampling with SMOTE or ROSE does not always preserve the natural spatial relationships within the feature space, particularly close to decision boundaries where classes intersect.
The synthetic samples may introduce noise or irrelevant patterns that are inconsistent with the underlying data structure, which decreases DVR-Net's capacity to generalize effectively. ProWRAS helps DB-Net and DVR-Net by specifying a more accurate boundary in borderline regions, which avoids overfitting to minority samples. This improves model performance because there is a mix of both positive and negative instances, which benefits DB-Net with its dense and complex layer connections and DVR-Net with its recurrent connections.
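The difference between the two oversampling strategies described above can be sketched in a few lines. The functions below are simplified, hypothetical versions: `smote` interpolates between a minority sample and one of its k nearest minority neighbours, and `rose` draws a smoothed bootstrap sample by adding Gaussian noise with a single scalar bandwidth around a resampled minority point (the original ROSE method uses a kernel density with per-feature smoothing). Neither is the implementation used in the paper; imbalanced-learn's `SMOTE` class is a maintained implementation of the former.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style: interpolate between a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]   # Euclidean k-NN
        nn = rng.choice(neighbours)
        gap = rng.random()                                               # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic

def rose(minority, n_new, bandwidth=0.5, seed=0):
    """ROSE-style smoothed bootstrap: resample a minority point, then add Gaussian noise around it."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        synthetic.append([xi + rng.gauss(0.0, bandwidth) for xi in x])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.3], [1.1, 1.4]]  # hypothetical 2-D minority samples
print(smote(minority, n_new=3, k=2))
print(rose(minority, n_new=3, bandwidth=0.2))
```

Every SMOTE point lies on a segment between two real minority samples, so it stays inside their bounding box, while ROSE points can fall anywhere around a sample; near a tight class boundary, the latter is what can introduce the noise discussed above.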
6. Cross-Dataset Evaluation for Cardiovascular Disease Prediction
In DL, a model is usually trained and tested on the same dataset, where it learns the patterns and correlations specific to that dataset. However, it is critical to verify that the model's performance extends beyond the training data and generalizes to new, unseen data. Cross-dataset evaluation validates the model's robustness and applicability across datasets.
In our work, we evaluated our models across different datasets: the HDHI dataset is used for model training, and the CVD dataset is used for testing. In 2021, the CDC provided CVD risk prediction data through the Behavioral Risk Factor Surveillance System (BRFSS) [49]. The CVD dataset presents all patient data in tabular form and contains 308,854 samples and 19 features. The BRFSS dataset is first cleaned and preprocessed. During preprocessing, two columns are renamed: Weight_(kg) to weight and Height_(cm) to height. Most of the dataset's columns are categorical, and DL models cannot directly handle categorical values, so we encoded them in a numeric format. The training dataset, HDHI, contains 253,680 samples and 22 features. Using a different dataset for evaluation raises a number of issues, because the testing dataset may differ from the training dataset in feature representation, data distribution, and class distribution. To mitigate this issue, we removed three less important features (sex, fruits, and education) from the training dataset, leaving 19 features, the same number as in the CVD dataset. The training dataset is then split into 80% training and 20% testing data, and DB-Net and DVR-Net are trained on the training portion. The CVD dataset is then imported from Google Drive, preprocessed by factorizing its categorical columns, and split into training and testing data. The cross-dataset evaluation results of DB-Net and DVR-Net are shown in Table 19. The results indicate that the models generalize well to unseen data and maintain good performance, with no notable difference in results when the models are trained and tested on different datasets.
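The preprocessing steps described above map onto a few pandas calls. The sketch below is a minimal version under assumptions: the column names `Weight_(kg)` and `Height_(cm)` come from the text, while `Sex`, `Fruits`, and `Education` as the HDHI column names and the choice to encode every non-numeric column are assumptions; the function names are hypothetical.

```python
import pandas as pd

def preprocess_cvd(df: pd.DataFrame) -> pd.DataFrame:
    """Rename the two measurement columns and integer-encode every non-numeric column."""
    out = df.rename(columns={"Weight_(kg)": "weight", "Height_(cm)": "height"})
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            codes, _categories = pd.factorize(out[col])  # codes follow order of first appearance
            out[col] = codes
    return out

def align_hdhi(df: pd.DataFrame, dropped=("Sex", "Fruits", "Education")) -> pd.DataFrame:
    """Drop the three HDHI features removed before cross-dataset evaluation (names assumed)."""
    return df.drop(columns=list(dropped))

# Tiny hypothetical frame in the CVD dataset's column style.
raw = pd.DataFrame({
    "Height_(cm)": [170.0, 160.0, 180.0],
    "Weight_(kg)": [70.0, 55.0, 90.0],
    "Smoking_History": ["Yes", "No", "Yes"],
})
print(preprocess_cvd(raw))
```

`pd.factorize` assigns codes in order of first appearance, so the same category can receive different codes in two separately factorized datasets; passing an explicit mapping to `Series.map` keeps the encoding consistent across the training and testing data.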
7. Conclusions
CVD has become an increasingly serious health issue in recent years, with many cases reported annually. Predicting and controlling CVD at early stages requires a systematic approach, and different DL models have been proposed for CVD prediction in healthcare. In this work, the ProWRAS technique is used to balance the HDHI dataset, which is highly imbalanced. For classification, two models, DB-Net and DVR-Net, are proposed for CVD prediction. The performance of the proposed models is tested using different performance metrics, and the results show that DB-Net outperforms all the base models, achieving an accuracy of 91%, F1-score of 91%, precision of 93%, recall of 89%, and execution time of 1883 s over 30 epochs with a batch size of 32. DVR-Net outperforms the state-of-the-art models with an accuracy of 90%, F1-score of 90%, precision of 90%, recall of 90%, and execution time of 2853 s. 10-FCV is performed to fine-tune the parameters of the proposed models and to estimate their predictive performance. To view the contribution of each feature to the models' predictions, we used an XAI technique, SHAP, whose values decompose the model output into the contributions of the individual features. Cross-dataset evaluation is performed to check the models' robustness. In the future, feature selection techniques may further improve CVD prediction performance. The models' performance is assessed on a specific dataset, which may limit their generalizability to other datasets with different features; testing the models on such datasets would help determine their robustness and generalizability. Additionally, no hyperparameter-tuning techniques were used. Hyperparameter-tuning techniques and optimization algorithms could be used to boost model performance while reducing model complexity and execution time.