DB-Net and DVR-Net: Optimized New Deep Learning Models for Efficient Cardiovascular Disease Prediction

Javed, Aymin; Javaid, Nadeem; Alrajeh, Nabil; Aslam, Muhammad

doi:10.3390/app142210516

by Aymin Javed ¹, Nadeem Javaid ¹,*, Nabil Alrajeh ² and Muhammad Aslam ³

¹ ComSens Lab, International Graduate School of Artificial Intelligence, National Yunlin University of Science and Technology, Douliu 64002, Taiwan

² Department of Biomedical Technology, College of Applied Medical Sciences, King Saud University, Riyadh 11633, Saudi Arabia

³ Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3FL, UK

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(22), 10516; https://doi.org/10.3390/app142210516

Submission received: 14 October 2024 / Revised: 5 November 2024 / Accepted: 8 November 2024 / Published: 15 November 2024

Abstract:

Cardiovascular Disease (CVD) has been one of the leading causes of death in recent years. Deep learning has been used to overcome the challenges of diagnosing CVD at an early stage, and with advances in technology, clinical practice in the healthcare industry is likely to transform significantly. To predict CVD, we constructed two models: Dense Belief Network (DB-Net) and Deep Vanilla Recurrent Network (DVR-Net). The Proximity Weighted Random Affine Shadow sampling technique is used to balance the highly imbalanced Heart Disease Health Indicator dataset. SHapley Additive exPlanations is used to visualize each feature's contribution to the output of DB-Net and DVR-Net in CVD prediction. Furthermore, 10-Fold Cross Validation is performed to evaluate the performance of the proposed models, and cross-dataset evaluation is conducted to see how well they generalize to unseen data. Various evaluation measures are used to assess the models. The proposed DB-Net outperforms all the base models, achieving an accuracy of 91%, an F1-score of 91%, a precision of 93%, a recall of 89%, and an execution time of 1883 s over 30 epochs with a batch size of 32. DVR-Net beats the state-of-the-art models with an accuracy of 90%, an F1-score of 90%, a precision of 90%, a recall of 90%, and an execution time of 2853 s over 30 epochs with a batch size of 32.

Keywords:

Cardiovascular Disease; Deep Learning; Heart Failure; 10-Fold Cross Validation; SHapley Additive exPlanations

1. Introduction

Cardiovascular Disease (CVD) often results from conditions such as high cholesterol and high blood pressure (hypertension). Many studies have focused on detecting CVD in its early stages. Computer technologies have been used to diagnose diseases in patients at early stages, preventing them from becoming life-threatening [1]. CVD is considered to be one of the most critical health concerns, and many people suffer from it. It affects not only older individuals but also people across almost all age groups. The most common symptoms of CVD include physical weakness, shortness of breath, dizziness, fatigue, sweating, and swollen feet, as shown in Figure 1 [2]. Predicting CVD at an early stage is much more effective because the disease can then be treated in time, which reduces the death rate [3]. However, even with advanced technology, the absence of medical experts can affect the early diagnosis of CVD.

CVD is not a new term [4]. People around the globe are familiar with this chronic disease, and it has become a major concern in everyday life. CVD encompasses all types of diseases that are related to the heart. A heart attack usually occurs when blood flow through the coronary arteries is blocked. Early detection is the most effective route to treating CVD. Health experts suggest consulting a doctor if any symptoms appear. However, many challenges remain for physicians as they strive to treat this disease [5].

Heart failure is a serious risk and can be caused by various factors. Physicians and medical scientists have divided these risk factors into two main categories: those that cannot be changed, such as age, sex, and family history, and those that can be changed, e.g., high blood pressure and smoking. Several methods exist for treating CVD, such as angiography. However, this method has some limitations. One of the main drawbacks is that it is costly, because when diagnosing the patient, the physician has to consider many factors, such as high cholesterol, high blood pressure, cancer, kidney disease, and liver disease. The entire procedure is time consuming, and there is no guarantee that the disease can be cured completely. Table 1 shows common types of CVD [6]. To predict CVD, some of the common attributes to consider are as follows [6]:

High blood pressure;
High cholesterol;
Stroke;
Physical activity;
Age;
Heart rate;
Electrocardiogram (ECG) results;
Sex;
Physical health.

The healthcare domain has faced significant challenges due to low accuracy in predicting various diseases. Artificial Intelligence (AI) techniques, including Deep Learning (DL) [7] and Machine Learning (ML) [8], analyze complex medical data to identify trends and risk factors, which are used in CVD prediction to facilitate early diagnosis and develop treatment plans [9]. Eventually, ML techniques made their way to the industry. However, to enhance accuracy and achieve efficient results, an automated system is required. Over the past few years, researchers have found that ML techniques work well in making predictions [6]. ML techniques perform well when dealing with a small dataset. However, problems arise when managing large amounts of data. To deal with the challenges faced by ML, DL [10] came into existence [11].

DL is the successor to ML [6]. The main difference between the two is that DL models have more depth than ML models. DL is highly efficient at tasks such as pattern recognition, classification, and identification. It performs best with large datasets and can be applied to any form of data (e.g., images or text) [11]. DL provides superior accuracy and performance, especially in feature extraction and complex pattern learning [6]. The representation of DL is similar to an Artificial Neural Network (ANN) in terms of the hierarchical organization of data. ML and DL models are used in different domains such as healthcare, fraud detection, anomaly detection, fractional input non-linear exogenous auto-regressive (FINARX) systems [12], and smart grids [13]. DL can reduce time, resources, and effort in regression and increase accuracy in classification. DL models are composed of memory cells, gates, different layers (convolution, pooling, and fully connected), and activation functions.

Data imbalance is considered to be a major problem in datasets. If the data are not balanced, the model will be biased towards a specific class. In [14], the authors did not specify any technique for balancing the dataset, even though the dataset is imbalanced. In our work, we have used the Proximity Weighted Random Affine Shadow Sampling (ProWRAS) balancing technique. This technique oversamples the minority class samples to make them equal in number to the majority class. In [15], the authors proposed the Synthetic Minority OverSampling Technique (SMOTE) for data balancing. However, SMOTE may not be the best choice as it often leads to overfitting and bridging. The authors in [16] have proposed a hybrid model of CNN and LSTM for CVD prediction. However, CNN can lead to vanishing gradient problems and LSTM is a computationally expensive model. To address these problems, we have proposed different DL models. Table 2 highlights the research gaps identified from the literature.

In our proposed work, we have utilized DL models that would accurately predict CVD. This is because DL models have proven to be useful in the healthcare field for predicting various diseases such as diabetes, heart attacks, lung cancer, and so on. The abbreviations are mentioned in Table 3.

Contributions

The main motivation behind this work is that the traditional DL models are unable to predict CVD at early stages. To overcome the flaws of existing models, we present Dense Belief Network (DB-Net) and Deep Vanilla Recurrent Network (DVR-Net) by combining different DL models. Class imbalance is one of the leading issues in the dataset. During classification, if data are not balanced, the model's accuracy tends to be very low. Data balancing is achieved using oversampling and undersampling. In oversampling, minority class instances are increased to make them equal to majority class instances. In undersampling, majority class instances are reduced, which can lead to information loss. Many balancing techniques have been proposed to address the data imbalance issue. The contributions made in this paper are listed below.

For data balancing, we used the ProWRAS technique to improve model accuracy.
We proposed two models, DB-Net and DVR-Net, using DL architectures to efficiently predict CVD.
To validate the results of our proposed models, we implemented 10-Fold Cross Validation (10-FCV).
To identify each feature's contribution in DB-Net and DVR-Net, we used an eXplainable Artificial Intelligence (XAI) technique, SHapley Additive exPlanations (SHAP).
To assess model generalizability, we performed cross-dataset evaluation.

In Section 2, the literature on CVD prediction is discussed. In Section 3, we discuss the DL techniques used in the proposed model. In Section 4, an overview of the proposed model is discussed. The results are presented in Section 5. In Section 6, cross-dataset evaluation is discussed. Finally, the work is concluded in Section 7.

2. Related Work

In [2], the authors proposed a model that accurately predicts heart disease using various ML techniques, such as Logistic Regression (LR), ANN, Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Decision Tree (DT). The most important features are selected using different feature learning techniques. The simulation results indicated that the proposed feature selection algorithm, Fast Conditional Mutual Information (FCMIM), is compatible with the SVM classifier and achieved 85% classification accuracy.

Yuanyuan et al. [17] introduced an Enhanced-DL-assisted Convolutional NN (EDCNN) to help doctors predict CVD using the Internet of Things (IoT). The EDCNN model contains an MLP model. The validation is performed with the full feature set and with a reduced feature set. With the reduced number of features, both the accuracy and the processing time of the model are recorded. The proposed model is compared with other DL models such as ANN, DNN, Ensemble DL-based Smart Healthcare System (EDL-SHS), Recurrent NN (RNN), and NN Ensemble (NNE). The test results show that the proposed model achieved a 99.1% precision score.

In [3], the authors introduced a DNN for predicting heart disease. Different techniques, such as cross validation and the Matthews Correlation Coefficient (MCC), are used to evaluate the architectures. The dataset on which this study is performed is publicly available. The proposed model scored 99% accuracy and outperformed the base models.

Shukur, Ban Salman, and Maad M. Mijwil introduced various ML techniques in [18], such as LR, Random Forest (RF), ANN, SVM, and KNN, to diagnose heart disease using the Cleveland Clinic dataset. They also performed a comparison among these techniques. They found that after applying ML techniques, SVM gave the best result, with an accuracy of about 90%.

A Smart Healthcare Monitoring System (SHMS) was proposed in [19] for predicting heart disease. The authors used ensemble DL and Feature Fusion (FF) approaches. FF was used to generate healthcare data by combining features from electronic medical records and sensor data. Features that were irrelevant and redundant were eliminated using the information gain technique, and important features were selected. The authors of this paper also compared their proposed system with other state-of-the-art models. The proposed system achieved an accuracy of 98.5%.

In [20], the authors introduced learning algorithms, such as ANN and SVM, to predict heart disease. The authors used the Cleveland dataset for training. The final results showed that the ANN and SVM models performed best in terms of detecting chronic heart disease.

In [21], the authors proposed a new model based on two SVM models to predict heart disease at early stages. The first SVM was used to remove the redundant features and the second SVM was used for prediction. For optimization, the authors applied the Hybrid Grid Search Algorithm (HGSA). After training and testing, the model achieved a 3.3% higher accuracy than the standard SVM model.

Karthick K. et al. proposed different ML models such as LR, Gaussian Naive Bayes (GNB), SVM, RF, and eXtreme Gradient Boosting (XGBoost) for understanding and reducing heart disease symptoms. A chi-square technique was used to select specific features. It was found that RF obtained an accuracy of 88.5% during validation.

Badal et al. [22] addressed the problem faced by medical sciences to detect CVD. The authors proposed multiple comparisons to record different predictions. The simulation results showed that the DT classifier outperformed other ML models and achieved an accuracy of 98%.

In [23], the authors applied various exploratory data analysis techniques to extract hidden patterns. Different ML algorithms were used to predict heart disease and seek better prediction performance. The techniques used were GNB, DT, and LR. It was found that LR and GNB achieved the same accuracy of 82.75%. However, for the Area under the Receiver Operating Characteristic (AUC-ROC) curve, GNB's value was higher than that of LR.

In [24], the mortality of patients from heart failure was detected via ML algorithms. The authors proposed a stacking model that outperformed all the base models, such as RF. RF achieved an accuracy of 88.89%, while the stacking model gave 90% accuracy and performed the best on Heart Failure Prediction dataset.

Liaqat et al. [25] proposed a model comprising a DNN and a $\chi^{2}$ statistical model. The DNN was used for classification, while the $\chi^{2}$ statistical model was used for feature extraction. The proposed model was applied to the Cleveland dataset. The proposed model beat ANN, with an accuracy of 93.33%.

In [26], the authors proposed a method for Chronic Heart Failure (CHF) based on heart sounds. The proposed method was composed of ML and DL. DL learned from signal temporal representation and ML learned from important features. This method was applied to the CHF dataset and scored an accuracy of 89.3%, which beats the base model by 9.1%.

In [27], the authors introduced a DL approach to predict heart failure patients with high risk. LSTM memory layers were utilized in the proposed model. For comparison, different ML techniques, such as LR, RF, and XGBoost were used. The results showed that the proposed model outperformed all the base ML techniques with an AUC of 0.861.

In [28], the authors proposed a traditional LR model to predict heart failure. After training and testing, the outcomes were modeled with LR and compared with DL models and a Gradient Boosting Model using sequential and non-sequential inputs. It turned out that the proposed DL models outperformed traditional LR.

Mohamed et al. [29] proposed DL models, such as LSTM and CNN, for automatic detection of arrhythmia in IoT applications. The images were obtained from ECG signals, represented in a two-dimensional format, and fed into the DL models for classification. The proposed model was found to be efficient and robust on noisy data.

In [30], the authors introduced a framework comprising SVM, DT, and RF for diabetes prediction. The authors named their proposed model the Intelligent Diabetes Mellitus Prediction Framework (IDMPF). Using this model, the authors described different assessment strategies, training procedures, and issues in predicting diabetes. With an accuracy rate of 83%, the suggested model performed the best.

Rayan et al. [31] proposed using a DL model, CNN, and an ML technique, KNN. CNN was used for accurate disease prediction and feature extraction, while KNN was used to find an accurate match in the dataset and predict the result. The proposed model's performance was compared against GNB, LR, and DT. Since the proposed model had a 97% accuracy rate, it was deemed to be the best.

Kumar et al. [32] introduced ensemble learning techniques to predict Parkinson disease. The proposed model beat all the traditional ML models, such as SVM, KNN, RF, DT, Multilayer Perceptron (MLP), Stacking Classifier (SC), and LR. The proposed model outperformed all the models, yielding 94.87% accuracy.

In [33], the authors proposed a model to predict diabetes in an individual. After sorting out the complete dataset, the authors used different models to predict diabetes. They compared a two-class decision jungle, two-class LR, a two-class boosted DT, and a two-class NN. The two-class boosted DT performed the best, with an accuracy of 99%, beating the other models.

Table 4 presents the summarized related work.

3. Deep Learning Techniques for Cardiovascular Disease Prediction

Individual deep models that are utilized in the proposed model are discussed in this section.

3.1. Role of Deep Belief Network for Cardiovascular Disease Prediction

Deep Belief Network (DBN) is a popular DL model. The main reason for its popularity is its deep architecture. DBN is basically an alternative to a DNN in which there are input, output, and multiple hidden layers. Each layer is connected to the next. The connections between these layers have weights that are derived from the input. An unsupervised DBN is used for feature detection, while a supervised DBN is used for classification. DBN has many real-time applications, including object detection, computer vision, NLP, and more [34]. The design of the generative unsupervised algorithm, DBN, consists of a stack of Restricted Boltzmann Machines (RBMs). The output of the first layer is the input of the second layer, and so on. RBM training is conducted using the input data. An RBM has two layers: the visible layer (v) and the hidden layer (h). While the hidden layer captures features, the visible layer holds the input data. There are no connections between the nodes within a single layer, which is the primary distinction between an RBM and a general Boltzmann machine. The RBM is used to discover significant characteristics in raw data. The DBN model functions in the following two phases.

Step 1: Training

In DBN, each RBM is trained individually. The RBM closest to the input is trained first and captures low-level features from the data. After one RBM is trained, its hidden layer becomes the visible layer for the next RBM.

Step 2: Fine-Tune

Once the entire DBN model is pretrained, it is fine-tuned with backpropagation, which updates the weights to minimize the error in the model:

$$P(x, h^{1}, \ldots, h^{l}) = \left( \prod_{k=0}^{l-2} P(h^{k} \mid h^{k+1}) \right) P(h^{l-1}, h^{l}) \tag{1}$$

where $x = h^{0}$ is the input and $h^{1}, \ldots, h^{l}$ are the hidden layers. DBN can be implemented using either RBMs or autoencoders. We have used RBMs in the DBN layers for the following reason.

The vanishing gradient is one of the biggest challenges in a Deep Neural Network (DNN). During training, weights are updated using the gradient, and if the gradient becomes very small, the weights effectively stop updating. During pretraining, the RBM learns a data representation that does not change with minor changes in the weights. As a result, the gradient used to update the weights during fine-tuning is significantly larger, which improves the accuracy of the model [35]. In an autoencoder, on the other hand, the encoder and the decoder have their own gradients, which can make the vanishing gradient problem worse and lead to poor accuracy and performance.
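To make the RBM pretraining step more concrete, the following minimal sketch trains a tiny binary RBM in plain Python. It assumes one-step contrastive divergence (CD-1) as the update rule, and the layer sizes, learning rate, and toy data are hypothetical; the paper does not state which RBM training rule or hyperparameters were used.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, W, c):
    # P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # P(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j)
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def sample(probs, rng):
    # Draw a binary vector from independent Bernoulli probabilities.
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single binary visible vector (assumed update rule)."""
    ph0 = hidden_probs(v0, W, c)       # positive phase
    h0 = sample(ph0, rng)
    pv1 = visible_probs(h0, W, b)      # reconstruction of the visible layer
    ph1 = hidden_probs(pv1, W, c)      # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
    for i in range(len(b)):
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return pv1

# Hypothetical toy setup: 4 visible units, 3 hidden units, two binary patterns.
rng = random.Random(0)
n_visible, n_hidden = 4, 3
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
for _ in range(200):
    for v in data:
        cd1_update(v, W, b, c, lr=0.1, rng=rng)

# In a stacked DBN, these hidden activations would be the input of the next RBM.
features = hidden_probs(data[0], W, c)
```

In a full DBN, this procedure would be repeated layer by layer before the supervised fine-tuning phase.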

3.2. Role of Deep Neural Network for Cardiovascular Disease Prediction

The foundation of many AI applications is the DNN. A DNN is capable of extracting features and is an advanced form of NN. Usually, an NN [36] is composed of an input layer, an output layer, and a hidden layer. However, a DNN has more than one hidden layer. Each layer contains neurons.

Input Layer: it receives data from the user, which could be text, images, or any other type of data.

Hidden Layer: this layer is placed between the input and the output layers. Each hidden layer has its own nodes. Hidden layers process the data received from the input layer, perform weighted calculations, apply an activation function, and produce an output.

Output Layer: it is used to produce the predicted outcome:

$$y_{i}^{l} = f \left( \sum_{k=1}^{K} w_{i,k}^{l} \, x_{k}^{l-1} + b_{i}^{l} \right) \tag{2}$$

where $y_{i}^{l}$ is the output of unit i in layer l, $x_{k}^{l-1}$ is the output of unit k in the prior layer, K is the number of units in the prior layer, $w_{i,k}^{l}$ is the weight, $b_{i}^{l}$ is the bias, and f is the activation function in Equation (2). In a DNN, each node in a hidden layer combines its inputs to determine the output [37].
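The layer computation described above (a weighted sum of the inputs plus a bias, passed through an activation function) can be sketched in a few lines of plain Python. The weights, biases, and the choice of ReLU for the hidden layer are hypothetical values picked only for illustration.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_forward(x, W, b, f):
    """Compute f(sum_k W[i][k] * x[k] + b[i]) for every unit i of one layer."""
    return [f(sum(w_ik * x_k for w_ik, x_k in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

# Hypothetical 2-3-1 network: one hidden layer (ReLU) and a sigmoid output unit.
x = [1.0, 2.0]
W1 = [[0.5, -1.0], [1.0, 1.0], [0.0, 0.25]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, -0.5, 0.5]]
b2 = [0.5]

hidden = dense_forward(x, W1, b1, relu)           # [0.0, 2.0, 1.0]
output = dense_forward(hidden, W2, b2, sigmoid)   # [0.5]
```

Stacking more calls to `dense_forward` corresponds to adding more hidden layers.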

3.3. Role of Vanilla Recurrent Neural Network for Cardiovascular Disease Prediction

Vanilla Recurrent Neural Network (VRNN) is a simplified variant of RNN that consists of three units: an input unit, a hidden unit, and an output unit, along with a context unit in its hidden layer. The VRNN differs from a traditional NN because it includes a feedback loop that increases the model's learning capability.

In a VRNN, there is a loop through which information flows. Before making any decision, the VRNN considers both the previous and the current input to make a prediction (for example, the next word in a sequence). During backpropagation, the weights of the neurons are updated. Several elements need to be considered, such as the activation function, transfer function, bias, learning rate, and error. The sigmoid activation function is employed in the output layer, which outputs a number between 0 and 1.

To process time series data, VRNN uses three layers, i.e., input, hidden and output. The functionality of each layer is described below.

Input Layer: at each time step t, the VRNN takes the time series data $x_{t}$ as input. The input layer is not responsible for training. It simply passes the input to the hidden layer.

Hidden Layer: the primary function of this layer is to store past information and to capture dependencies. In Equation (3), $h_{t}$ is the hidden state at time step t, $x_{t}$ is the input at time step t, $W_{hr}$ is the recurrent weight matrix applied to the previous hidden state, $W_{hx}$ is the input connection weight matrix, $b_{h}$ is the bias for the hidden state, and $\sigma$ represents the sigmoid activation function. To capture dependencies, the hidden state combines the information of both the current input $x_{t}$ and the previous state $h_{t-1}$:

$$h_{t} = \sigma (W_{hr} \cdot h_{t-1} + W_{hx} \cdot x_{t} + b_{h}) \tag{3}$$

Output Layer: the main function of the output layer is to produce the output y at each time step t. Equation (4) presents the equation for the output layer, where $y_{t}$ is the output, $W_{yh}$ is the weight matrix, and $b_{y}$ is the bias [38,39]:

$$y_{t} = W_{yh} \cdot h_{t} + b_{y} \tag{4}$$

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{5}$$

Equation (5) gives the sigmoid activation function through which we have computed our results.
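The recurrence in Equations (3) to (5) can be traced with a short plain-Python sketch. The weight values and the three-step input sequence are hypothetical, and applying the sigmoid to the linear read-out of Equation (4) follows the statement above that the output layer uses a sigmoid activation.

```python
import math

def sigmoid(z):
    # Equation (5)
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    # Matrix-vector product for lists of lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x_t, h_prev, W_hx, W_hr, b_h):
    """Equation (3): new hidden state from the previous state and current input."""
    rec = matvec(W_hr, h_prev)
    inp = matvec(W_hx, x_t)
    return [sigmoid(r + i + b) for r, i, b in zip(rec, inp, b_h)]

def rnn_output(h_t, W_yh, b_y):
    """Equation (4) followed by the sigmoid of Equation (5)."""
    return [sigmoid(z + b) for z, b in zip(matvec(W_yh, h_t), b_y)]

# Hypothetical parameters: 2 inputs, 2 hidden units, 1 output.
W_hx = [[0.1, -0.2], [0.3, 0.0]]
W_hr = [[0.0, 0.5], [-0.5, 0.0]]
b_h = [0.0, 0.0]
W_yh = [[1.0, -1.0]]
b_y = [0.0]

h = [0.0, 0.0]                    # initial hidden state
outputs = []
for x_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h = rnn_step(x_t, h, W_hx, W_hr, b_h)   # the state carries past information
    outputs.append(rnn_output(h, W_yh, b_y)[0])
```

Because `h` is passed from one iteration to the next, each output depends on the whole input history and not only on the current `x_t`.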

3.4. Role of Densely Connected Convolutional Network for Cardiovascular Disease Prediction

Densely Connected Convolutional Network (DenseNet) is a DL architecture specifically designed for computer vision tasks, image processing, and object detection. In a traditional CNN, identifying and utilizing the optimal parameters is quite challenging. DenseNet addresses this issue by using parameters efficiently. DenseNet is composed of an input layer, an initial convolution layer, dense blocks, transition layers, global feature pooling, and a fully connected layer.

Input Layer: This layer is responsible for receiving the input for DenseNet, which can be either a feature map or an image.

Initial Convolution Layer: Similar to many traditional NNs, DenseNet includes an initial convolutional layer. This layer is followed by an activation function and batch normalization. Low-level features are extracted using this layer.

Dense Blocks: These blocks consist of multiple convolutional layers. Each convolutional layer's output is concatenated with the outputs of the preceding layers within the same block.

Transition Layer: Due to the increasing number of feature maps, dense blocks become computationally expensive. To address this problem, transition layers are introduced after each dense block. Transition layers are combinations of convolutional layers and pooling layers.

Global Feature Pooling: At the end of the DenseNet architecture, there is a global average pooling layer, followed by a fully connected layer and a classifier.

Fully Connected Layer: this layer is used for the classification task and is followed by a softmax activation function [40].

The dense connectivity within a block is expressed as:

$$x_{l} = H_{l}([x_{0}, x_{1}, \ldots, x_{l-1}]) \tag{6}$$

where $H_{l}$ is the transformation applied by the l-th layer, which receives the feature maps from all the preceding layers, and $[x_{0}, x_{1}, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps produced in the preceding layers [41].
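The connectivity pattern of Equation (6) can be illustrated with plain Python lists standing in for feature maps. The layer function `make_layer` and its `growth_rate` parameter are hypothetical stand-ins for a real convolutional layer; the sketch only shows how each layer sees the concatenation of everything produced before it.

```python
def dense_block(x0, layers):
    """Apply x_l = H_l([x_0, ..., x_{l-1}]) and return all features concatenated."""
    features = [x0]
    for H in layers:
        concatenated = [v for f in features for v in f]  # [x_0, ..., x_{l-1}]
        features.append(H(concatenated))                 # x_l
    return [v for f in features for v in f]

def make_layer(growth_rate):
    # Hypothetical H_l: averages its input and emits `growth_rate` ReLU features.
    def H(inputs):
        mean = sum(inputs) / len(inputs)
        return [max(0.0, mean + k) for k in range(growth_rate)]
    return H

block_output = dense_block([1.0, 3.0], [make_layer(2), make_layer(2)])
# block_output == [1.0, 3.0, 2.0, 3.0, 2.25, 3.25]
```

The output width grows by `growth_rate` with every layer, which is why transition layers are needed to keep the number of feature maps manageable.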

4. Proposed System Models for Cardiovascular Disease Prediction

The overall flow of the work is discussed in this section. DB-Net and DVR-Net are proposed in our work, as shown in Figure 2. Before the input dataset is passed to the DL models, it needs to be preprocessed. The dataset is clean and has no missing values or outliers. However, it is not balanced: class 0 has a significantly higher number of instances than class 1. To address the class imbalance, we employed the ProWRAS balancing technique.

4.1. Heart Disease Health Indicator Dataset

We used the Heart Disease Health Indicator (HDHI) dataset for predicting CVD [42]. The Centers for Disease Control and Prevention (CDC) conducts an annual health-related telephone survey known as the Behavioral Risk Factor Surveillance System (BRFSS). The survey has been conducted each year since 1984 and collects the responses of over 400,000 Americans. The HDHI dataset is in the form of a table with rows and columns. Each row holds the complete information of one participant. The dataset contains 253,680 instances and 22 features that are used for binary classification of CVD. These features are derived from individual responses to questions asked directly of the participants. Removing outliers is an essential step in data cleaning. Our dataset is already cleaned and has no missing values or outliers. For scaling, we applied a standard scaler. The dataset is highly imbalanced, with class 0 having 91% of the instances and class 1 having 9%; therefore, preprocessing is essential.

4.2. Standard Scaler

A popular feature scaling method for preparing data for DL models is the standard scaler. Each feature in the dataset is transformed by subtracting the mean and dividing by the standard deviation, resulting in a new distribution with a standard deviation of one and a mean of zero:

$$X_{\text{scaled}} = \frac{X - \mu}{\sigma} \tag{7}$$

where the feature value is denoted by X, the mean of each feature is represented by $\mu$, and the standard deviation is denoted by $\sigma$.
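Equation (7) can be checked on a single feature column with a few lines of plain Python. The sample values below are made up for illustration, and the population standard deviation (dividing by the number of samples) is assumed.

```python
import math

def standard_scale(column):
    """Return (x - mean) / std for each value, using the population std."""
    mu = sum(column) / len(column)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / len(column))
    if sigma == 0.0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

values = [20.0, 25.0, 30.0, 35.0]   # hypothetical feature values
scaled = standard_scale(values)

scaled_mean = sum(scaled) / len(scaled)
scaled_std = math.sqrt(sum((s - scaled_mean) ** 2 for s in scaled) / len(scaled))
```

After scaling, `scaled_mean` is zero and `scaled_std` is one up to floating-point error, matching the description of the transformed distribution.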

4.3. Data Balancing Using Proximity Weighted Random Affine Shadow Sampling

Data balancing is one of the most important steps in preprocessing. Data imbalance occurs when one class has more instances than the other. There are two classes in the dataset: the majority class and the minority class. The majority class contains more data instances than the minority class, resulting in an imbalanced dataset. If the dataset is not balanced, the model performs poorly and inefficiently.

Data imbalance is one of the major issues in real-world datasets [43]. Two different types of data balancing techniques are used to address the data imbalance issue, i.e., undersampling and oversampling. In undersampling, majority class instances are reduced to make them equal to the minority class instances. In this case, much of the information is lost. In oversampling, minority class instances are increased by creating synthetic samples.

Various methods are used for data balancing. However, it is crucial to perform balancing carefully to avoid information loss. The HDHI dataset is highly imbalanced: class 0 has 229,787 participants who do not have heart disease, and class 1 has 23,893 participants who have heart disease. These numbers highlight that only a small share of participants have heart disease. In this paper, we have used the ProWRAS data balancing technique to balance the ratio of minority and majority classes.
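The class counts above make the degree of imbalance easy to quantify. The short calculation below uses only the reported counts; the assumption that oversampling targets an exact 1:1 ratio is ours, made for illustration.

```python
majority = 229_787   # class 0: no heart disease (reported count)
minority = 23_893    # class 1: heart disease (reported count)

total = majority + minority                      # 253680 instances
minority_share = 100.0 * minority / total        # about 9.4 %
majority_share = 100.0 * majority / total        # about 90.6 %
imbalance_ratio = majority / minority            # about 9.6 : 1

# Synthetic minority samples needed for a 1:1 balance (assumed target).
synthetic_needed = majority - minority           # 205894
```

Under that assumption, oversampling would add roughly 8.6 synthetic minority samples for every real one, which is why the quality of the generated samples matters.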

Proximity Weighted Random Affine Shadow Sampling

ProWRAS is an oversampling technique that clusters the data points of the minority class. The clusters are formed based on the distance of the data points from the majority class. Weights are assigned to each cluster. A large weight is given to the cluster that is close to the majority class. Weights determine the number of synthetic samples generated from each cluster. To avoid overlap with the majority class, synthetic samples with low variance are generated in borderline clusters [44].

max-conv and net-conv are two parameters; depending on their values, the technique offers four oversampling schemes. max-conv is used to select the number of shadow samples from which one synthetic data point is generated. The oversampling technique is designed so that data are balanced between both classes. Multiple classifiers can produce efficient and accurate results [45].
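Two ideas from this description can be illustrated in isolation: splitting the synthetic-sample budget across clusters in proportion to their weights, and building one synthetic point as a convex combination of shadow samples (noisy copies of minority points). The sketch below is a deliberately simplified illustration and not the ProWRAS algorithm itself; the function names, the Gaussian noise model, and all parameter values are assumptions.

```python
import random

def allocate(weights, n_synthetic):
    """Split n_synthetic across clusters in proportion to their weights."""
    total = sum(weights)
    counts = [int(n_synthetic * w / total) for w in weights]
    # Hand out any rounding remainder to the highest-weight clusters first.
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    for i in order[: n_synthetic - sum(counts)]:
        counts[i] += 1
    return counts

def synthetic_point(cluster, n_shadow, noise_std, rng):
    """Convex combination of n_shadow noisy copies of minority points."""
    shadows = []
    for _ in range(n_shadow):
        base = rng.choice(cluster)
        shadows.append([x + rng.gauss(0.0, noise_std) for x in base])
    raw = [rng.random() for _ in range(n_shadow)]
    coeffs = [r / sum(raw) for r in raw]          # non-negative, sum to 1
    dim = len(cluster[0])
    return [sum(c * s[d] for c, s in zip(coeffs, shadows)) for d in range(dim)]

rng = random.Random(42)
# Hypothetical clusters: the first is assumed closer to the majority class.
cluster_weights = [3.0, 1.0]
counts = allocate(cluster_weights, 8)             # [6, 2]
borderline_cluster = [[0.0, 0.0], [1.0, 1.0]]
# A small noise_std keeps the variance low in the borderline cluster.
new_points = [synthetic_point(borderline_cluster, 3, 0.01, rng)
              for _ in range(counts[0])]
```

Here `n_shadow` plays the role the text assigns to max-conv: the number of shadow samples combined into one synthetic point.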

4.4. Description of DB-Net and DVR-Net for Cardiovascular Disease Prediction

The proposed models concatenate the features extracted by multiple models into a single model. Such models are used for various purposes, such as detecting CVD, theft detection in smart grids, and so on. They incorporate the advantages of the individual models and give enhanced performance. The main objective of the proposed models is to overcome the limitations of the individual models and to produce improved classification and prediction results. In both DB-Net and DVR-Net, two base models are combined. The main points that need to be noted are as follows.

The proposed models take more execution time than the individual models;
Both models are computationally expensive, as they require more resources;
DB-Net and DVR-Net require a large dataset for training to prevent the model from overfitting.

There are two ways to create a model that combines the prediction of different models. In a sequential model, the output of the first base model is fed as input to the second base model and a single output is generated. In the case of a parallel model, both the models work independently. Input is fed to both models. At the end, the outputs of both models are then concatenated to produce a single output.
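The difference between the two arrangements can be expressed as two small combinators. The names `sequential` and `parallel`, and the toy base models used to exercise them, are hypothetical and serve only to contrast the data flow.

```python
def sequential(model_a, model_b):
    """Feed the output of model_a into model_b; a single output is produced."""
    return lambda x: model_b(model_a(x))

def parallel(model_a, model_b, head):
    """Run both models on the same input, concatenate, then apply a final head."""
    return lambda x: head(model_a(x) + model_b(x))   # list concatenation

# Toy base models operating on lists of numbers.
double = lambda x: [2 * v for v in x]
increment = lambda x: [v + 1 for v in x]

seq_out = sequential(double, increment)([1, 2])      # [3, 5]
par_out = parallel(double, increment, sum)([1, 2])   # sum([2, 4, 2, 3]) == 11
```

In the sequential case the second model never sees the raw input, whereas in the parallel case both models do and their outputs only meet at the final head.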

4.4.1. Dense Belief Network for Cardiovascular Disease Prediction

DB-Net processes input data independently through dense layers and dense blocks in parallel. The working of DB-Net is provided in Algorithm 1. Different feature representations are extracted by the dense layers and dense blocks operating independently. While the dense layers concentrate on capturing fundamental feature patterns, the dense blocks, which are made up of interconnected layers, enhance and refine more intricate representations. To create an accurate prediction, the outputs from the dense layers and dense blocks are concatenated at the last layer. Because of this parallel methodology, DB-Net is able to efficiently learn a wide variety of feature abstractions that are essential for the prediction of CVD. By enhancing gradient flow within the dense blocks, the parallel design helps to avoid vanishing gradients and promotes more stable training. DB-Net is very good at extracting deep abstract features from both structured and unstructured data, which can be used to find complex patterns in CVD datasets. This is particularly helpful in the healthcare industry, as the model's predictions may be influenced by subtle relationships between features such as blood pressure, cholesterol levels, and other indicators. DB-Net is well suited to handling high-dimensional health data in CVD prediction because it reduces the risk of the vanishing gradient problem, improving stability during training. The densely connected layers in DB-Net facilitate effective information and gradient flow by allowing direct connections between early and subsequent layers. As a result, DB-Net learns complex data dependencies more efficiently and captures the relationships between different health indicators in CVD datasets.

Algorithm 1: Dense Belief Network for Cardiovascular Disease Prediction

Require: Dataset X, labels y
Ensure: Predicted labels $\hat{y}$ for CVD classification
1: Input: CVD data X
2: Initialize dense layers and dense blocks
3: Pass X through a series of dense layers to extract fundamental feature patterns:
4: $\mathrm{DenseLayerOutput} \leftarrow \mathrm{DenseLayers}(X)$
5: Pass X through a series of dense blocks to capture complex representations:
6: $\mathrm{DenseBlockOutput} \leftarrow \mathrm{DenseBlocks}(X)$
7: Concatenate the outputs of the dense layers and dense blocks:
8: $\mathrm{ConcatenatedOutput} \leftarrow \mathrm{Concatenate}(\mathrm{DenseLayerOutput}, \mathrm{DenseBlockOutput})$
9: Pass the concatenated output through the final prediction layer for CVD classification:
10: $\hat{y} \leftarrow \mathrm{PredictionLayer}(\mathrm{ConcatenatedOutput})$
return $\hat{y}$
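
The steps of Algorithm 1 can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the layer widths, the growth rate of the dense block, the ReLU and sigmoid activations, the random weights, and the helper names (`init_layer`, `dense_relu`, `build_db_net`) are all assumptions chosen for illustration, and no training is performed.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_in x n_out) and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def dense_relu(x, layer):
    """Fully connected layer followed by ReLU."""
    weights, biases = layer
    return [
        max(0.0, sum(xi * row[j] for xi, row in zip(x, weights)) + biases[j])
        for j in range(len(biases))
    ]


def dense_layers(x, layers):
    """Path 1 (steps 3-4): a plain stack of dense layers."""
    for layer in layers:
        x = dense_relu(x, layer)
    return x


def dense_blocks(x, layers):
    """Path 2 (steps 5-6): each layer sees the input plus all earlier outputs."""
    features = list(x)
    for layer in layers:
        features = features + dense_relu(features, layer)
    return features


def db_net_forward(x, params):
    """Steps 7-10: concatenate both paths, then apply a sigmoid output unit."""
    merged = dense_layers(x, params["path"]) + dense_blocks(x, params["block"])
    z = sum(m * w for m, w in zip(merged, params["out_w"])) + params["out_b"]
    return 1.0 / (1.0 + math.exp(-z))


def build_db_net(n_features, rng, width=8, growth=4, n_block_layers=3):
    """Hypothetical layer sizes, chosen only to keep the sketch small."""
    path = [init_layer(n_features, width, rng), init_layer(width, width, rng)]
    block, n_in = [], n_features
    for _ in range(n_block_layers):
        block.append(init_layer(n_in, growth, rng))
        n_in += growth  # the block output grows by `growth` units per layer
    out_w = [rng.uniform(-0.5, 0.5) for _ in range(width + n_in)]
    return {"path": path, "block": block, "out_w": out_w, "out_b": 0.0}


rng = random.Random(0)
params = build_db_net(n_features=22, rng=rng)
x = [rng.random() for _ in range(22)]  # one synthetic, already scaled record
y_hat = db_net_forward(x, params)
print(f"predicted CVD probability: {y_hat:.3f}")
```

The two paths receive the same input and never exchange information before the concatenation step, which is what distinguishes the parallel design from a sequential one.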

4.4.2. Deep Vanilla Recurrent Network for Cardiovascular Disease Prediction

DVR-Net is a parallel model in which data flow independently through two distinct processing paths: one with recurrent layers and another with fully connected (dense) layers, as shown in Algorithm 2. In the first path, the data pass through several dense layers, which gradually extract foundational feature representations using weighted connections and activation functions and thereby capture hidden patterns in the data. Concurrently, the data are processed by recurrent layers. To construct meaningful feature representations, each recurrent layer uses connections that capture dependencies across time steps, utilizing both past and present inputs. The final layer creates a complete feature set by concatenating the outputs of the dense and recurrent paths. DVR-Net's prediction is then generated by passing this combined representation through an output layer. The parallel processing structure enables effective gradient flow in both pathways, which mitigates the common vanishing gradient problem and helps to ensure stable training. Because its two separate pathways capture different feature types, DVR-Net provides richer and more varied representations and is more suitable for spotting hidden patterns.

Algorithm 2: Deep Vanilla Recurrent Network for Cardiovascular Disease Prediction

Require: Dataset X, labels y
Ensure: Predicted labels $\hat{y}$ for CVD classification
1: Input: CVD data X
2: Initialize dense and recurrent layers
3: Pass X through multiple dense layers to capture static feature representations:
4: $\mathrm{DenseOutput} \leftarrow \mathrm{DenseLayers}(X)$
5: Pass X through multiple recurrent layers to capture sequential dependencies:
6: $\mathrm{RecurrentOutput} \leftarrow \mathrm{RecurrentLayers}(X)$
7: Concatenate the outputs of dense and recurrent layers:
8: $\mathrm{ConcatenatedOutput} \leftarrow \mathrm{Concatenate}(\mathrm{DenseOutput}, \mathrm{RecurrentOutput})$
9: Pass the concatenated output through the final prediction layer for CVD classification:
10: $\hat{y} \leftarrow \mathrm{PredictionLayer}(\mathrm{ConcatenatedOutput})$
return $\hat{y}$
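
Algorithm 2 can likewise be sketched as a forward pass in plain Python. The sketch below is illustrative only and makes several assumptions that are not stated in the text: each feature of a record is treated as one time step of a scalar sequence, the recurrent path is a single vanilla (Elman) layer with tanh activation, and all sizes, weights, and helper names are hypothetical.

```python
import math
import random


def dense_relu(x, weights, biases):
    """Fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(xi * row[j] for xi, row in zip(x, weights)) + biases[j])
        for j in range(len(biases))
    ]


def vanilla_rnn(sequence, w_in, w_rec, biases):
    """Elman recurrence h_t = tanh(w_in * x_t + W_rec h_{t-1} + b); returns h_T."""
    h = [0.0] * len(biases)
    for x_t in sequence:
        h = [
            math.tanh(
                w_in[j] * x_t
                + sum(h[i] * w_rec[i][j] for i in range(len(h)))
                + biases[j]
            )
            for j in range(len(biases))
        ]
    return h


def uniform_matrix(n_rows, n_cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_cols)] for _ in range(n_rows)]


def dvr_net_forward(x, p):
    # Steps 3-4: dense path over the whole feature vector.
    dense_out = dense_relu(dense_relu(x, p["w1"], p["b1"]), p["w2"], p["b2"])
    # Steps 5-6: recurrent path, reading the features one step at a time.
    recurrent_out = vanilla_rnn(x, p["w_in"], p["w_rec"], p["b_rec"])
    # Steps 7-10: concatenate both outputs and apply a sigmoid output unit.
    merged = dense_out + recurrent_out
    z = sum(m * w for m, w in zip(merged, p["out_w"])) + p["out_b"]
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(1)
n_features, width, hidden = 22, 8, 6  # hypothetical sizes
p = {
    "w1": uniform_matrix(n_features, width, rng), "b1": [0.0] * width,
    "w2": uniform_matrix(width, width, rng), "b2": [0.0] * width,
    "w_in": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
    "w_rec": uniform_matrix(hidden, hidden, rng), "b_rec": [0.0] * hidden,
    "out_w": [rng.uniform(-0.5, 0.5) for _ in range(width + hidden)],
    "out_b": 0.0,
}
x = [rng.random() for _ in range(n_features)]  # one synthetic, scaled record
y_hat = dvr_net_forward(x, p)
print(f"predicted CVD probability: {y_hat:.3f}")
```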

5. Simulation and Results of Cardiovascular Disease Prediction

This section discusses the results of our proposed models using different performance metrics.

5.1. Proposed Model Performance Evaluation

This section discusses the performance of the proposed models using a variety of performance measures, including accuracy, F1-score, precision, recall, and execution time.

5.2. Metrics

Here, “positive” refers to the heart disease class and “negative” refers to the normal class. The following measures are used in the performance evaluation:

TP = Heart disease is categorized as heart disease;
TN = Normal person is categorized as normal;
FN = Heart disease is categorized as normal;
FP = Normal person is categorized as heart disease.

Accuracy: The proportion of all instances that the classifier has classified correctly. It can be calculated by Equation (8):

$$\mathrm{Accuracy} = \frac{TN + TP}{TN + FN + TP + FP} \tag{8}$$

Precision: The proportion of instances predicted as positive that are actually positive. It can be calculated by Equation (9):

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$

Recall: The proportion of actually positive instances that are predicted as positive. It can be calculated by Equation (10):

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$

F1-score: The harmonic mean of recall and precision. It can be calculated by Equation (11):

$$\mathrm{F1\text{-}score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$
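
As a worked example of Equations (8)–(11), the short sketch below computes the four measures from confusion-matrix counts. The counts are hypothetical and are not taken from the paper's results; the function name `classification_metrics` is likewise only illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tn + tp) / (tn + fn + tp + fp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 200 test instances.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 175/200, precision is 90/105, recall is 90/100, and the F1-score equals 180/205, which agrees with the equivalent form 2TP/(2TP + FP + FN).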

5.3. 10-Fold Cross Validation

10-FCV is used for evaluating the overall performance of our models. The dataset is split into 10 folds [46], and the model is then trained and tested 10 times to reduce the risk of overfitting. For 10-fold validation, we split the dataset into three sets: training, testing, and validation. During training, the model is iterated over the 10 folds: in each iteration, nine folds are used for training while the remaining fold is used for validation, and a different fold is selected for validation each time. After training and testing, we obtain an accuracy score along with the F1-score, precision, recall, and execution time. 10-FCV makes the reported results more reliable, because every sample is used for validation exactly once.
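
The fold rotation described above can be sketched with a small index generator. This is a generic k-fold split written for illustration, not the authors' code; the shuffling seed, the strided fold assignment, and the name `k_fold_indices` are assumptions, and the split is not stratified by class.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Strided slicing partitions the shuffled indices into k near-equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=25, k=10))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(validation)} validation")
```

In practice, the model would be re-initialized and trained on `train` in each iteration, evaluated on `validation`, and the ten scores averaged.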

5.4. Validation of Dense Belief Network for Cardiovascular Disease Prediction

DB-Net outperforms the state-of-the-art models with an accuracy, F1-score, precision, and recall of 92% each, as shown in Table 5. DB-Net's superior results stem from combining dense layers, used for their feature extraction capabilities, with densely connected layers. By directly using the outputs from the previous layers, the architecture, in particular the dense block structure, helps avoid the vanishing gradient issue, thus improving gradient flow and model stability during training. Furthermore, dense blocks minimize parameter usage, increasing computational efficiency without compromising performance. The baseline models, DenseNet and DBN, perform poorly because they are not able to generalize across all features. While DenseNet alone is effective, it might not capture enough of the abstract patterns essential for the prediction of CVD. DBN, on the other hand, takes a lot of processing time and can face overfitting issues.

Despite DB-Net's superior performance, the baseline models (DBN and DenseNet) have significantly lower training time, inference time, and memory usage. The DB-Net architecture consists of several densely connected layers and integrates the outputs of dense blocks. Since each layer is directly connected to the outputs of the preceding layers, there are many parameters to store and update throughout training and inference. The baseline models, with fewer layers and consequently fewer parameters, use less memory. Because of its dual-path design and layered structure, DB-Net has a longer inference time and higher memory usage than the other models, whereas the baseline models are faster during inference and use less memory because they employ a single architecture.

DB-Net is compiled with the ‘Adam’ optimizer, using a learning rate of 0.0001. The complete architecture used in the base models and DB-Net is given in Table 6. DB-Net is trained on the training data. Table 7 lists the hyperparameters used in the individual models and DB-Net. The results validate that DB-Net beats the base models with an accuracy of 92%. Figure 3 shows the performance metrics of the individual models and DB-Net for 10, 20, and 30 epochs. Figure 4 shows that the execution time of both DB-Net and DVR-Net is higher than that of the individual models. Higher execution time is the tradeoff for improved accuracy. For instance, DB-Net can take up to 2756 s to train over 30 epochs, while DenseNet and DBN take only about 2356 and 2377 s, respectively. This increased computational load arises because dense layers and dense blocks are integrated to provide stable training and effective gradient flow. The DB-Net confusion matrix, which shows the classification performance on the CVD prediction task, is given in Figure 5. The matrix displays the number of False Positives (FP), False Negatives (FN), True Positives (TP), and True Negatives (TN). DB-Net demonstrates high predictive accuracy, with a good balance between correctly classified positive and negative cases. This outcome shows that DB-Net successfully identifies patterns in the data, leading to a low number of incorrect classifications. By concentrating on borderline samples, ProWRAS reduces FP and improves the precision and recall of the model. The Receiver Operating Characteristic curve and its Area Under the Curve (ROC-AUC) for DB-Net are shown in Figure 6, demonstrating the model's capacity to discriminate between positive and negative classes at different threshold values. An AUC score near 1.0 reflects the high discriminative power of DB-Net in predicting CVD.
Table 8, Table 9 and Table 10 show the results of 10-FCV for DB-Net with the individual models for different performance parameters against 30, 20, and 10 epochs. According to the results, it is validated that our DB-Net beats the base models with an accuracy of 91%.
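
To make the AUC discussed in this section concrete, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical and unrelated to Figure 6, and the pairwise loop is meant for clarity rather than efficiency.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for six instances.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Here eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9; a value of 1.0 would mean that every positive instance is scored above every negative one.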

5.5. Validation of Deep Vanilla Recurrent Network for Cardiovascular Disease Prediction

The results show that DVR-Net outperforms the base models with an accuracy of 91%, F1-score of 90%, recall of 90%, and precision of 91%, as shown in Table 11. DVR-Net achieves a higher accuracy (about 90.5%) than DNN (about 81.5%) and VRNN (about 83.8%), as well as a higher F1-score. This suggests that DVR-Net offers a better trade-off between recall and precision and is able to predict CVD more accurately. Because DNN is unable to capture sequential dependencies, its accuracy and F1-score are comparatively lower. This lessens its efficacy when working with complex datasets that exhibit temporal relationships, like data from health indicators, where prediction relies heavily on sequential trends. DNN is also computationally expensive and requires specialized hardware to train on large datasets. With a recall rate of up to 92%, VRNN outperforms DNN, indicating its superiority in detecting positive cases (heart disease patients). Nevertheless, it has a lower precision (about 79–80%), which means there are more false positives. The overall F1-score and accuracy are impacted by this imbalance. DVR-Net is compiled with the ‘Adam’ optimizer and a loss function, then trained, and the final output is generated. Architecture details for the baseline models and DVR-Net are shown in Table 12. The hyperparameters used in the individual models and DVR-Net are shown in Table 13. The Stochastic Gradient Descent (SGD) optimizer is used, as it performs better by reducing fluctuations and greatly accelerating convergence [47]. Table 14, Table 15 and Table 16 show the results of 10-FCV for DVR-Net with 10, 20, and 30 epochs. The findings demonstrate that, with an accuracy of 91%, our DVR-Net model performs better than the base models. Figure 7 shows the different metrics of the individual models and DVR-Net. Figure 4 shows that the execution time of both DB-Net and DVR-Net is higher than that of the individual models.
DVR-Net has a substantially longer execution time (training and inference) than DNN and VRNN. For instance, DVR-Net can take up to 8712 s to train for 30 epochs, whereas DNN and VRNN are significantly faster, taking only about 742 and 2393 s, respectively. This longer execution time is caused by the complexity of DVR-Net, which processes data using both recurrent layers and dense layers. The parameters in DVR-Net have some redundancy because the model contains both dense and recurrent layers. In comparison to a single architecture like DNN or VRNN, the model requires twice as much memory because it must store the weights and biases for both the dense and recurrent components. DVR-Net also keeps the intermediate activations from both the dense and recurrent paths during training and inference. Because the model must store activations from two distinct architectures for backpropagation and gradient updates, the memory requirement rises further. The confusion matrix for DVR-Net is shown in Figure 8, which summarizes the classification performance of the model by displaying the counts of FP, FN, TP, and TN. The high values in the TP and TN cells indicate that DVR-Net successfully detects both positive cases (patients with CVD) and negative cases (healthy individuals) with few misclassifications. This balanced accuracy shows how well DVR-Net can differentiate between the two classes in the dataset. Figure 9 depicts the ROC curve and the AUC score, which indicate how well the model discriminates. The ROC-AUC is a commonly used metric for assessing a model's ability to distinguish between positive and negative classes across a range of thresholds. Figure 9 shows an AUC near 1.0, suggesting that DVR-Net differentiates very well between people who are healthy and those who are at risk of CVD.

5.6. SHapley Additive exPlanations for Cardiovascular Disease Prediction

SHAP is a technique that helps us to understand our models' outcomes in more depth. It is an explanation technique whose plots let us visualize the behavior of a model, and it explains the contribution of each feature to the prediction.

SHAP decomposes the model output into per-feature contributions: it computes a value for each feature that quantifies the feature's effect on the outcome. SHAP is used to understand the importance of each feature, and it is well suited to models with complex behavior.

Determining the significance or contribution of each feature in DL is challenging. SHAP values make the contribution of each feature to the model's output visible. We employed SHAP [48] to examine each feature's behavior in the prediction of CVD.

SHAP values are crucial for improving the interpretability of model outputs, particularly in complex models used for CVD predictions. By giving each feature a SHAP value, we can determine how much each feature contributes to a particular prediction. SHAP values originate from cooperative game theory: they divide the prediction output among all features according to their respective impacts, thereby determining the role of each feature. This approach makes it easier to understand how various health indicators like heart rate, blood pressure, and cholesterol affect the final risk assessment for CVD. We applied a SHAP explainer to inspect the internal working of our proposed models. Specifically, the kernel explainer is used with summarized background data. With the kernel explainer, we used all 22 features from our dataset to generate SHAP values, from which different visualization graphs are generated.
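
The game-theoretic definition behind SHAP values can be illustrated by computing exact Shapley values for a tiny model. The sketch below is not the kernel explainer used in this work, which approximates these values by sampling feature coalitions; it enumerates all coalitions directly, which is only feasible for a handful of features. The toy risk function, its coefficients, and the zero baseline standing in for the background data are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features are replaced by baseline values."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


def toy_risk(v):
    """Hypothetical score: two interacting features and one irrelevant feature."""
    blood_pressure, cholesterol, _unused = v
    return 0.5 * blood_pressure + 0.3 * cholesterol + 0.2 * blood_pressure * cholesterol


x = [1.0, 1.0, 1.0]         # instance to explain (scaled, synthetic)
baseline = [0.0, 0.0, 0.0]  # stand-in for the summarized background data
phi = shapley_values(toy_risk, x, baseline)
print([round(p, 3) for p in phi])
```

The interaction term is shared equally between the two interacting features, the irrelevant feature receives a value of zero, and the values sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that force and waterfall plots rely on.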

Force Plot: The SHAP force plots of DB-Net and DVR-Net, shown in Figure 10 and Figure 11, illustrate individual predictions by demonstrating how each feature pushes the model output toward or away from a particular class in CVD prediction. The base value, i.e., the average model output across the dataset, is the starting point of each force plot. Red (positive SHAP values) indicates features that push the prediction toward higher risk, while blue (negative SHAP values) indicates features that pull it toward lower risk. In both DB-Net and DVR-Net, for example, high blood pressure or high cholesterol may cause the prediction to lean toward a positive risk classification, whereas normal values for these characteristics may move it away from a risk indication. DVR-Net displays more dynamic changes in SHAP values because of its focus on temporal or sequential patterns, while DB-Net's force plot displays a more stable gradient because the DenseNet and DBN components handle individual feature importance differently.

Summary Plot: The SHAP summary plot, as illustrated in Figure 12, provides a broad picture of which features are most important for the predictions across all instances. Each dot represents an instance in the dataset, and the color of the dot indicates the feature value. The features are arranged according to importance, with the most significant ones at the top. For example, characteristics like age, blood pressure, and cholesterol levels may rank highly in both models for CVD predictions. The recurrent component of DVR-Net emphasizes sequentially influenced features like blood pressure trends or glucose levels over time, while the dense blocks' emphasis on feature richness in DB-Net results in a high concentration of critical features like cholesterol.

Waterfall Plot: The path from the base value to the final prediction can be traced with the aid of SHAP waterfall plots, which show a breakdown of each feature's contribution to a single prediction. The waterfall plots for DB-Net and DVR-Net, shown in Figure 13 and Figure 14, display the features in descending order of their effect on the prediction, with each feature having either a positive or a negative effect on the outcome. For example, in a high-risk prediction, the plot might indicate that high cholesterol adds a significant positive SHAP value, while factors like young age reduce the risk score.

Dependence Plot: The SHAP dependence plots shown in Figure 15 and Figure 16 plot the SHAP value of a feature against the value of that feature for each instance in the dataset. If, for instance, the dependence plot is for the feature age, it displays how the SHAP value changes with age. Every point on this plot represents a distinct individual in the dataset, and the color of each point indicates the value of a feature that strongly interacts with the main feature. This aids in determining whether the anticipated CVD risk rises or falls as a feature value (such as age) changes.

5.7. Ablation Study

In the ablation study, we compared our proposed models, DB-Net and DVR-Net, with different DL models. Table 17 shows the performance metrics’ comparison of different DL models with proposed models. From the results, it is clear that our proposed models are more robust and efficient as compared to the state-of-the-art models.

DBNs may be difficult and computationally costly to train, especially on big datasets. DBNs, like many other DL systems, are sensitive to hyperparameters. Therefore, selecting proper initialization techniques and hyperparameters is critical. VRNNs are primarily intended to handle sequential data and are unsuitable for our dataset, which is not time-dependent. DNNs can lead to overfitting issues when dealing with large datasets or high-dimensional data. HighwayNet does not give good accuracy on this dataset: its gating mechanism adds complexity, and it is better suited to sequential data. ShuffleNet also does not perform well on this dataset, as it leads to overfitting issues. From the results, we can see that the ShuffleNet model is biased and does not perform better than the proposed models. ResNet is well-known for its depth and capacity to train very deep architectures due to skip connections. However, these skip connections fail to adequately reflect the data's complexity. DenseNet's dense connection architecture allows each layer to directly access features from the layers that came before it. While this promotes feature reuse and information flow, it can also result in the duplication of learned representations. Redundant features increase the model's memory footprint and computational complexity while not necessarily enhancing performance, particularly in deeper systems.

DB-Net and DVR-Net outperform all the baseline models. The reason is that the proposed models combine the strengths of different base models. They concatenate the features learned by the base models in the final layer. It helps reduce overfitting issues and increase generalization by combining predictions from many models. The features extracted by base models are then combined and passed to subsequent layers. This allows the proposed models to capture a more diverse representation of input data.

SMOTE and Random Oversampling Examples (ROSE) are thoroughly tested on the HDHI dataset, with an emphasis on how well each technique worked with our proposed models, DB-Net and DVR-Net. These findings are shown in Table 18, along with the effects of each technique. SMOTE and ROSE pose a challenge to DB-Net and DVR-Net mainly because of the way these methods generate synthetic data. ROSE and SMOTE are two oversampling strategies intended to rectify class imbalance by producing synthetic samples for the minority class. These approaches do, however, have certain drawbacks that affect DL models that are sensitive to subtle patterns in data, such as DB-Net and DVR-Net. Through interpolation between existing minority samples, SMOTE creates synthetic instances, yielding data points that fall between two existing minority class points. However, it disregards their distribution in the feature space and treats every minority sample equally. Overfitting may result from this method, especially if the synthetic samples produced do not closely match the real decision boundary. ROSE randomly creates samples within a radius of each existing minority point in the feature space. This may lead to noise and sampling instability, especially in regions where the decision boundaries of the two classes are tight. In order to capture complex feature hierarchies, DB-Net depends on dense connections between layers and the DenseNet structure, which makes it sensitive to even the smallest changes and patterns in the data. DB-Net might overfit to artificial patterns that do not accurately represent the relationships in the original data when it is trained on data that have been synthesized using SMOTE or ROSE. DVR-Net's recurrent structure, which is intended to capture dependencies between features, depends on precisely learning relationships among complex patterns.
The natural spatial relationships within the feature space are not always preserved by oversampling using SMOTE or ROSE, particularly close to decision boundaries where classes intersect. The synthetic samples may introduce noise or irrelevant patterns that are inconsistent with the underlying data structure, which decreases DVR-Net's capacity to generalize effectively. ProWRAS contributes to DB-Net and DVR-Net by specifying a more accurate boundary in borderline regions, which avoids overfitting to minority samples. This makes the models perform better because there is a mix of both positive and negative instances, which helps DB-Net with its dense and complex layer connections and DVR-Net with its recurrent connections.
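
The interpolation step of SMOTE described above can be sketched in a few lines. This is a simplified illustration of SMOTE only, not of ProWRAS and not of any particular library implementation; the toy minority points, the number of neighbours, and the function names are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote_like_sample(minority, k, rng):
    """Create one synthetic point between a minority sample and a near neighbour."""
    base_idx = rng.randrange(len(minority))
    base = minority[base_idx]
    others = [p for i, p in enumerate(minority) if i != base_idx]
    # The k nearest minority neighbours of the chosen base sample.
    neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment, in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


rng = random.Random(42)
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]  # toy minority class
synthetic = [smote_like_sample(minority, k=2, rng=rng) for _ in range(5)]
for point in synthetic:
    print([round(v, 3) for v in point])
```

Every synthetic point lies on a line segment between two existing minority points, and every minority point is equally likely to be chosen as the base, regardless of how close it is to the majority class. This is the behavior that the text identifies as a limitation near tight decision boundaries.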

6. Cross-Dataset Evaluation for Cardiovascular Disease Prediction

In DL, a model is usually trained and tested on the same dataset. DL models are often trained on a specific dataset to identify patterns and correlations within it. However, it is critical to ensure that the model's performance extends beyond the training data and carries over to new, unseen data. Cross-dataset evaluation validates the model's robustness and applicability across different datasets.

In our work, we evaluated our models on different datasets. For model training, we used the HDHI dataset, and for testing, the CVD dataset. In 2021, the CDC provided CVD risk prediction data through the Behavioral Risk Factor Surveillance System (BRFSS) [49]. The CVD dataset presents all patient data in tabular form and contains 308,854 samples and 19 features. The BRFSS dataset is first preprocessed and cleaned. In preprocessing, two columns are renamed, i.e., Weight_(kg) to weight and Height_(cm) to height. The majority of the dataset columns are categorical. Dealing with categorical values is challenging for DL models; therefore, we have encoded them so that DL models can handle them easily. The training dataset, HDHI, contains 253,680 samples and 22 features. Using a different dataset to evaluate a model raises a number of issues: the testing dataset may differ from the training dataset in terms of feature representation, data distribution, and class distribution. To avoid this issue, we have removed three less important features, i.e., sex, fruits, and education, from the training dataset. The training dataset is then split into 80% training data and 20% testing data. DB-Net and DVR-Net are trained on the training dataset. Then, the CVD dataset is imported from Google Drive and is first preprocessed by factorizing the categorical columns. The CVD dataset is then split into training and testing data. Cross-dataset evaluation results for DB-Net and DVR-Net are shown in Table 19. The results indicate that the models generalize well to unseen data and maintain good performance. From the results, it is clear that there is no difference in results when the model is trained and tested on different datasets.
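
The two preprocessing operations mentioned above, factorizing categorical columns and removing features, can be sketched as follows. The rows, category labels, and helper names are hypothetical and only the two renamed columns and the three dropped feature names come from the text; the integer codes are assigned in order of first appearance, which is one common convention and an assumption here.

```python
def factorize(values):
    """Map each distinct category to an integer code in order of first appearance."""
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


def drop_columns(rows, columns_to_drop):
    """Return copies of the rows without the listed columns."""
    return [{k: v for k, v in row.items() if k not in columns_to_drop} for row in rows]


# Hypothetical rows in the style of the test dataset.
cvd_rows = [
    {"General_Health": "Poor", "Exercise": "No", "weight": 72.0, "height": 170.0},
    {"General_Health": "Good", "Exercise": "Yes", "weight": 65.5, "height": 162.0},
    {"General_Health": "Poor", "Exercise": "Yes", "weight": 80.1, "height": 181.0},
]
health_codes, health_mapping = factorize([r["General_Health"] for r in cvd_rows])
print(health_codes, health_mapping)

# Hypothetical row in the style of the training dataset, with three features removed.
hdhi_rows = [{"HighBP": 1, "sex": 0, "fruits": 1, "education": 4, "BMI": 27.0}]
reduced = drop_columns(hdhi_rows, {"sex", "fruits", "education"})
print(reduced)
```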

7. Conclusions

CVD has become an increasingly pressing health issue in recent years, and many CVD cases are reported annually. To predict and control CVD at early stages, we need a systematic approach. Different DL models are proposed for CVD prediction in healthcare. The ProWRAS balancing technique is used for balancing the HDHI dataset, as the dataset is highly imbalanced. For classification, two models, DB-Net and DVR-Net, are constructed for CVD prediction. The proposed models' performance is tested using different performance metrics, and the results show that the proposed DB-Net beats all the base models by achieving an accuracy of 91%, F1-score of 91%, precision of 93%, recall of 89%, and execution time of 1883 s on 30 epochs with a batch size of 32. DVR-Net beats the state-of-the-art models with an accuracy of 90%, F1-score of 90%, precision of 90%, recall of 90%, and execution time of 2853 s. 10-FCV is performed to fine-tune the parameters of the proposed models and to estimate the models' predictive performance. To view the contribution of each feature to the model output, we have used an XAI technique, SHAP. SHAP values sum up the behavior of each feature and decompose the output. Cross-dataset evaluation is performed to check the models' robustness. In the future, feature selection techniques may yield better performance in predicting CVD. The models' performance is assessed on a specific dataset, which may limit their generalizability to other datasets with different features. The models can be tested on other datasets with different features to determine their robustness and generalizability. Additionally, no hyperparameter-tuning techniques were utilized. To boost model performance, different hyperparameter-tuning techniques and optimization algorithms can be used to minimize model complexity and execution time.

Author Contributions

Methodology, Writing—review & editing, Conceptualization, Validation A.J.; Validation, Supervision, Formal analysis, Resources N.J.; Investigation, Finding, Data curation, N.A.; Formal analysis, Software, Resources M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This project is funded by the Researchers Supporting Project (number RSPD2024R648), King Saud University, Riyadh, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. The datasets can be found here: https://www.kaggle.com/datasets/alexteboul/heart-disease-health-indicators-dataset and https://www.kaggle.com/datasets/alphiree/cardiovascular-diseases-risk-prediction-dataset.

Acknowledgments

The authors extend their appreciation to the Researchers Supporting Program (project number RSPD2024R648), at King Saud University for supporting this research project.

Conflicts of Interest

The authors declare no conflict of interest.

References

Bharti, R.; Khamparia, A.; Shabaz, M.; Dhiman, G.; Pande, S.; Singh, P. Prediction of heart disease using a combination of machine learning and deep learning. Comput. Intell. Neurosci. 2021, 2021, 8387680.
Tomov, N.S.; Tomov, S. On deep neural networks for detecting heart disease. arXiv 2018, arXiv:1808.07168.
Li, J.P.; Haq, A.U.; Din, S.U.; Khan, J.; Khan, A.; Saboor, A. Heart disease identification method using machine learning classification in e-healthcare. IEEE Access 2020, 8, 107562–107582.
Noor, A.; Javaid, N.; Alrajeh, N.; Mansoor, B.; Khaqan, A.; Bouk, S.H. Heart disease prediction using stacking model with balancing techniques and dimensionality reduction. IEEE Access 2023, 11, 116026–116045.
Bakar, W.A.W.A.; Josdi, N.L.N.B.; Man, M.B.; Zuhairi, M.A.B. A review: Heart disease prediction in machine learning & deep learning. In Proceedings of the 2023 19th IEEE international colloquium on signal processing & its applications (CSPA), Kedah, Malaysia, 3–4 March 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 150–155.
Katarya, R.; Srinivas, P. Predicting heart disease at early stages using machine learning: A survey. In Proceedings of the 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2–4 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 302–305.
Javed, A.; Javaid, N.; Hasnain, M.; Sarfraz, U.; Ahmed, I.; Shafiq, M.; Choi, J.G. Applying Advanced Data Analytics on Pregnancy Complications to Predict Miscarriage with eXplainable AI. IEEE Access 2024, 4, 1–19.
Qureshi, H.; Shah, Z.; Raja, M.A.Z.; Alshahrani, M.Y.; Khan, W.A.; Shoaib, M. Machine learning investigation of tuberculosis with medicine immunity impact. Diagn. Microbiol. Infect. Dis. 2024, 110, 116472.
Trimarchi, G.; Pizzino, F.; Paradossi, U.; Gueli, I.A.; Palazzini, M.; Gentile, P.; Di Spigno, F.; Ammirati, E.; Garascia, A.; Tedeschi, A.; et al. Charting the unseen: How non-invasive imaging could redefine cardiovascular prevention. J. Cardiovasc. Dev. Dis. 2024, 11, 245.
Khan, T.A.; Chaudhary, N.I.; Hsu, C.C.; Mehmood, K.; Khan, Z.A.; Raja, M.A.Z.; Shu, C.M. A gazelle optimization expedition for key term separated fractional nonlinear systems with application to electrically stimulated muscle modeling. Chaos Solitons Fractals 2024, 185, 115111.
Rana, M.; Bhushan, M. Advancements in healthcare services using deep learning techniques. In Proceedings of the 2022 International mobile and embedded technology conference (MECON), Noida, India, 10–11 March 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 157–161.
Khan, T.A.; Chaudhary, N.I.; Khan, Z.A.; Mehmood, K.; Hsu, C.C.; Raja, M.A.Z. Design of Runge–Kutta optimization for fractional input nonlinear autoregressive exogenous system identification with key-term separation. Chaos Solitons Fractals 2024, 182, 114723.
Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90.
Shrivastava, P.K.; Sharma, M.; Kumar, A. HCBiLSTM: A hybrid model for predicting heart disease using CNN and BiLSTM algorithms. Meas. Sens. 2023, 25, 100657.
Balakrishnan, M.; Christopher, A.A.; Ramprakash, P.; Logeswari, A. Prediction of cardiovascular disease using machine learning. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2021; Volume 1767, p. 012013.
Sudha, V.K.; Kumar, D. Hybrid CNN and LSTM network for heart disease prediction. SN Comput. Sci. 2023, 4, 172.
Pan, Y.; Fu, M.; Cheng, B.; Tao, X.; Guo, J. Enhanced deep learning assisted convolutional neural network for heart disease prediction on the internet of medical things platform. IEEE Access 2020, 8, 189503–189512.
Shukur, B.S.; Mijwil, M.M. Involving machine learning techniques in heart disease diagnosis: A performance analysis. Int. J. Electr. Comput. Eng. 2023, 13, 2177.
Ali, F.; El-Sappagh, S.; Islam, S.R.; Kwak, D.; Ali, A.; Imran, M.; Kwak, K.S. A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion. Inf. Fusion 2020, 63, 208–222.
Gangadhar, M.S.; Sai, K.V.S.; Kumar, S.H.S.; Kumar, K.A.; Kavitha, M.; Aravinth, S.S. Machine learning and deep learning techniques on accurate risk prediction of coronary heart disease. In Proceedings of the 2023 7th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 23–25 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 227–232.
Ali, L.; Niamat, A.; Khan, J.A.; Golilarz, N.A.; Xingzhong, X.; Noor, A.; Nour, R.; Bukhari, S.A.C. An optimized stacked support vector machines based expert system for the effective prediction of heart failure. IEEE Access 2019, 7, 54007–54014.
Parmar, B.; Patel, S.; Kanani, J.; Vaghasia, M.; Patel, K. Cardio-Vascular Risk Detection System using different Machine Learning Techniques. In Proceedings of the 2022 IEEE 2nd International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC), Gunupur, Odisha, India, 15–17 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
Ananey-Obiri, D.; Sarku, E. Predicting the presence of heart diseases using comparative data mining and machine learning algorithms. Int. J. Comput. Appl. 2020, 176, 17–21.
Kedia, S.; Bhushan, M. Prediction of mortality from heart failure using machine learning. In Proceedings of the 2022 2nd International Conference on Emerging Frontiers in Electrical and Electronic Technologies (ICEFEET), Patna, India, 24–25 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
Ali, L.; Rahman, A.; Khan, A.; Zhou, M.; Javeed, A.; Khan, J.A. An automated diagnostic system for heart disease prediction based on χ² statistical model and optimally configured deep neural network. IEEE Access 2019, 7, 34938–34945. [Google Scholar] [CrossRef]
Gjoreski, M.; Gradišek, A.; Budna, B.; Gams, M.; Poglajen, G. Machine learning and end-to-end deep learning for the detection of chronic heart failure from heart sounds. IEEE Access 2020, 8, 20313–20324. [Google Scholar] [CrossRef]
Wang, Z.; Chen, X.; Tan, X.; Yang, L.; Kannapur, K.; Vincent, J.L.; Kessler, G.N.; Ru, B.; Yang, M. Using deep learning to identify high-risk patients with heart failure with reduced ejection fraction. J. Health Econ. Outcomes Res. 2021, 8, 6. [Google Scholar] [CrossRef]
Lewis, M.; Elad, G.; Beladev, M.; Maor, G.; Radinsky, K.; Hermann, D.; Litani, Y.; Geller, T.; Pines, J.M.; Shapiro, N.L.; et al. Comparison of deep learning with traditional models to predict preventable acute care use and spending among heart failure patients. Sci. Rep. 2021, 11, 1164. [Google Scholar] [CrossRef] [PubMed]
Hammad, M.; Abd El-Latif, A.A.; Hussain, A.; Abd El-Samie, F.E.; Gupta, B.B.; Ugail, H.; Sedik, A. Deep learning models for arrhythmia detection in IoT healthcare applications. Comput. Electr. Eng. 2022, 100, 108011. [Google Scholar] [CrossRef]
Maniruzzaman, M.; Rahman, M.J.; Ahammed, B.; Abedin, M.M. Classification and prediction of diabetes disease using machine learning paradigm. Health Inf. Sci. 2020, 8, 1–14. [Google Scholar] [CrossRef]
Alanazi, R. Identification and prediction of chronic diseases using machine learning approach. J. Healthc. Eng. 2022, 2022, 2826127. [Google Scholar] [CrossRef]
Mall, P.K.; Yadav, R.K.; Rai, A.K.; Narayan, V.; Srivastava, S. Early Warning Signs Of Parkinson’s Disease Prediction Using Machine Learning Technique. J. Pharm. Negat. Results 2022, 13, 4784–4792. [Google Scholar]
Chou, C.Y.; Hsu, D.Y.; Chou, C.H. Predicting the onset of diabetes with machine learning methods. J. Pers. Med. 2023, 13, 406. [Google Scholar] [CrossRef]
Naskath, J.; Sivakamasundari, G.; Begum, A.A.S. A study on different deep learning algorithms used in deep neural nets: MLP SOM and DBN. Wirel. Pers. Commun. 2023, 128, 2913–2936. [Google Scholar] [CrossRef]
Manimurugan, S.; Al-Mutairi, S.; Aborokbah, M.M.; Chilamkurti, N.; Ganesan, S.; Patan, R. Effective attack detection in internet of medical things smart environment using a deep belief neural network. IEEE Access 2020, 8, 77396–77404. [Google Scholar] [CrossRef]
Raja, M.A.Z.; Shoaib, M.; Khan, Z.; Zuhra, S.; Saleel, C.A.; Nisar, K.S.; Islam, S.; Khan, I. Supervised neural networks learning algorithm for three dimensional hybrid nanofluid flow with radiative heat and mass fluxes. Ain Shams Eng. J. 2022, 13, 101573. [Google Scholar] [CrossRef]
Johnson, J. What is a Deep Neural Network? Deep Nets Explained, BMC Blogs. 2020. Available online: https://www.bmc.com/blogs/deep-neural-network/ (accessed on 6 August 2023).
Kumar, N.P.; Vijayabaskar, S.; Murali, L.; Ramaswamy, K. Design of optimal Elman Recurrent Neural Network based prediction approach for biofuel production. Sci. Rep. 2023, 13, 8565. [Google Scholar] [CrossRef]
Ab Aziz, M.F.; Mostafa, S.A.; Foozy, C.F.M.; Mohammed, M.A.; Elhoseny, M.; Abualkishik, A.Z. Integrating Elman recurrent neural network with particle swarm optimization algorithms for an improved hybrid training of multidisciplinary datasets. Expert Syst. Appl. 2021, 183, 115441. [Google Scholar] [CrossRef]
Baldha, S. Introduction to DenseNets (Dense CNN), Analytics Vidhya. 2022. Available online: https://www.analyticsvidhya.com/blog/2022/03/introduction-to-densenets-dense-cnn/ (accessed on 9 August 2023).
Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
Bajpai, A.; Sinha, S.; Yadav, A.; Srivastava, V. Early prediction of cardiac arrest using hybrid machine learning models. In Proceedings of the 2023 17th International Conference on Electronics Computer and Computation (ICECCO), Kaskelen, Kazakhstan, 1–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7. [Google Scholar]
Ammar, M.; Javaid, N.; Alrajeh, N.; Shafiq, M.; Aslam, M. A Novel Blending Approach for Smoking Status Prediction in Hidden Smokers to Reduce Cardiovascular Disease Risk. IEEE Access 2024, 12, 2169–3536. [Google Scholar] [CrossRef]
Kim, M.; Hwang, K.B. An empirical evaluation of sampling methods for the classification of imbalanced data. PLoS ONE 2022, 17, e0271260. [Google Scholar] [CrossRef] [PubMed]
Schultz, K.; Bej, S.; Hahn, W.; Wolfien, M.; Srivastava, P.; Wolkenhauer, O. ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets. arXiv 2022, arXiv:2206.09812. [Google Scholar] [CrossRef]
Shaheen, I.; Javaid, N.; Alrajeh, N.; Asim, Y.; Aslam, S. Hi-Le and HiTCLe: Ensemble Learning Approaches for Early Diabetes Detection using Deep Learning and eXplainable Artificial Intelligence. IEEE Access 2024, 12, 66516–66538. [Google Scholar] [CrossRef]
Khan, Z.A.; Chaudhary, N.I.; Khan, T.A.; Farooq, U.; Pinto, C.M.; Raja, M.A.Z. Enhanced fractional prediction scheme for effective matrix factorization in chaotic feedback recommender systems. Chaos Solitons Fractals 2023, 176, 114109. [Google Scholar] [CrossRef]
Shahzadi, N.; Javaid, N.; Akbar, M.; Aldegheishem, A.; Alrajeh, N.; Bouk, S.H. A novel data driven approach for combating energy theft in urbanized smart grids using artificial intelligence. Expert Syst. Appl. 2024, 253, 124182. [Google Scholar] [CrossRef]
Alphiree. Cardiovascular Diseases Risk Prediction Dataset, Kaggle. 2023. Available online: https://www.kaggle.com/datasets/alphiree/cardiovascular-diseases-risk-prediction-dataset (accessed on 2 November 2024).

Figure 1. Causes of cardiovascular disease.

Figure 2. Proposed system model for CVD prediction.

Figure 3. Simulation representation of DB-Net on different epochs for cardiovascular disease prediction.

Figure 4. Execution time graph for cardiovascular disease prediction.

Figure 5. Confusion matrix of DB-Net for cardiovascular disease prediction.

Figure 6. ROC-AUC curve of DB-Net for cardiovascular disease prediction.

Figure 7. Simulation representation of DVR-Net on different epochs for cardiovascular disease prediction.

Figure 8. Confusion matrix of DVR-Net for cardiovascular disease prediction.

Figure 9. ROC-AUC curve of DVR-Net for cardiovascular disease prediction.

Figure 10. Force plot of DB-Net for cardiovascular disease prediction.

Figure 11. Force plot of DVR-Net for cardiovascular disease prediction.

Figure 12. Summary plot for cardiovascular disease prediction.

Figure 13. Waterfall plot of DB-Net for cardiovascular disease prediction.

Figure 14. Waterfall plot of DVR-Net for cardiovascular disease prediction.

Figure 15. Dependence plot of DB-Net for cardiovascular disease prediction.

Figure 16. Dependence plot of DVR-Net for cardiovascular disease prediction.

Table 1. Types of cardiovascular disease.

Types of Disease	Description
Cardiomyopathy	A type of CVD in which the heart muscles become stretched and stiff. The heart does not work well, leading to weakness and impaired heart function.
Congenital CVD	Congenital CVD occurs when the baby’s heart is developing abnormally in the mother’s womb.
Heart Valve Disease	The heart is composed of four valves that open and close to regulate blood flow. Any abnormality in these valves can cause heart valve disease.
Heart Rhythm Disorder	This disorder involves an abnormal heart rhythm, causing the heart to beat too fast or too slow.

Table 2. Identified research gaps.

Existing Methodology	Limitations	Proposed Solution
The authors have not proposed any technique for data balancing [14].	The imbalanced dataset will not provide efficient results.	Proposed ProWRAS oversampling technique to balance the dataset.
Proposed SMOTE for data balancing [15].	SMOTE does not deal with borderline instances.	ProWRAS deals with borderline instances by giving higher weight to those close to the borderline.
The authors have proposed a hybrid of Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) for CVD prediction [16].	CNN and LSTM are not good choices as CNN leads to vanishing gradient problems and LSTM is a computationally expensive model.	The proposed Dense Belief Network (DB-Net) can be used to overcome the vanishing gradient problem.

Table 3. List of abbreviations.

Abbreviation	Description
ANN	Artificial Neural Network
AI	Artificial Intelligence
CNN	Convolutional Neural Network
DL	Deep Learning
DT	Decision Tree
DenseNet	Densely Connected Convolutional Network
DBN	Deep Belief Network
DB-Net	Dense Belief Network
DNN	Deep Neural Network
DVR-Net	Deep Vanilla Recurrent Network
FF	Feature Fusion
GNB	Gaussian Naive Bayes
IoT	Internet of Things
KNN	K-Nearest Neighbor
LASSO	Least Absolute Shrinkage Selection Operator
LR	Logistic Regression
ML	Machine Learning
NLP	Natural Language Processing
NN	Neural Network
ProWRAS	Proximity Weighted Random Affine Shadowsampling
RF	Random Forest
RBM	Restricted Boltzmann Machine
RNN	Recurrent Neural Network
SVM	Support Vector Machine
VRNN	Vanilla Recurrent Neural Network

Table 4. Related work summary.

Limitations	Methodology	Evaluation Metrics
Feature selection problem	Proposed an FCMIM feature selection technique [2]	Accuracy, Sensitivity, Precision, MCC
Prone to overfitting problem	Proposed an EDCNN technique [17]	Accuracy, Sensitivity, Precision, Specificity
The existing techniques have low accuracy for predicting heart disease	Proposed a five-layered DNN architecture named HEARO-5 to improve model accuracy [3]	Accuracy
The existing techniques have parameter fine-tuning problems and are time consuming	Proposed a model comprising LR, RF, ANN, KNN, and SVM to accurately predict heart disease [18]	Accuracy, Sensitivity, Specificity, F1-Score, Precision
State-of-the-art ML models cannot handle high-dimensional datasets	Used an ensemble DL technique to predict heart disease and feature fusion to extract important features [19]	Accuracy
Feature scaling problem	ANN, DT, RF, SVM, and KNN [20]	Accuracy
Computationally expensive	Proposed a Stacked SVM model [21]	Accuracy, Sensitivity, Specificity, MCC
Existing techniques are not ideal for heart disease risk prediction	Chi-square statistical test along with SVM, GNB, LR, XGBoost, and RF algorithms	Accuracy, ROC
ML techniques lead to overfitting problems, and thus, they are only suitable for large data samples	KNN, DT, LR, Simple Vector Classifier, and GNB [22]	Accuracy
Existing techniques have scalability issues	SVD feature selection technique along with Decision tree classifier model CART, LR, and GNB [23]	Accuracy, Precision, Recall
Prior techniques are inefficient for heart failure prediction	Proposed stack model to beat the other single models [24]	Accuracy, Precision, Recall, F1-score
Underfitting and overfitting issues faced by existing models	Proposed a $\chi^{2}$-DNN hybrid model [25]	Accuracy, Sensitivity, Specificity, MCC
Heart sounds are not efficiently detected by existing techniques	Proposed ML and end-to-end DL models to detect heart sounds [26]	Accuracy, Sensitivity, Specificity
Patients having heart failure risk are not well identified using DL	Proposed a sequential model based on Bi-LSTM [27]	AUC, Precision, Recall
Existing techniques have poor performance in detecting heart failure patients	Proposed a traditional LR based model for prediction [28]	AUC-ROC
Existing techniques might not extract high-level features	Proposed hybrid model using CNN and ConvLSTM [29]	TPR, FPR, FNR, TNR, Accuracy
Hyperparameter tuning problem	Proposed IDMPF using ML, and the Grid search algorithm is used [30]	ROC, Precision, test score
Current techniques are not suitable for extracting features	Proposed CNN for feature extraction and disease prediction [31]	Accuracy, Precision, Recall, F1-score
Poor accuracy of prior techniques	Proposed model based on ensemble techniques [32]	Accuracy, MCC, F1-score
Existing techniques cannot handle big data and are prone to overfitting training data	Proposed two-class decision jungle, two-class LR, two-class boosted DT, and two-class NN for prediction [33]	Accuracy, Precision, Recall, F1-score

Table 5. Results of DB-Net for cardiovascular disease prediction.

Models	Accuracy	F1-Score	Recall	Precision	Training Time (s)	Inference Time (s)	Memory Usage (MB)	Epochs
DBN	0.844	0.844	0.846	0.841	833	20	13	10
DenseNet	0.865	0.865	0.865	0.865	826	20	13	10
DB-Net	0.905	0.910	0.948	0.875	850	20	23	10
DBN	0.853	0.849	0.841	0.829	1802	20	13	20
DenseNet	0.864	0.861	0.865	0.881	1610	20	13	20
DB-Net	0.914	0.909	0.929	0.890	1705	20	28	20
DBN	0.857	0.859	0.845	0.875	2377	20	13	30
DenseNet	0.861	0.856	0.828	0.885	2356	20	13	30
DB-Net	0.916	0.910	0.932	0.893	2756	20	30	30
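The F1-scores in Table 5 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below is only a sanity check on the published DB-Net rows; the helper name `f1_score` and the dictionary layout are illustrative and not taken from the paper. The recomputed values land within a few thousandths of the reported ones.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# epochs -> (precision, recall, reported F1), copied from the DB-Net rows of Table 5
db_net_rows = {
    10: (0.875, 0.948, 0.910),
    20: (0.890, 0.929, 0.909),
    30: (0.893, 0.932, 0.910),
}

for epochs, (precision, recall, reported) in db_net_rows.items():
    recomputed = f1_score(precision, recall)
    print(f"{epochs} epochs: recomputed F1 = {recomputed:.3f}, reported F1 = {reported:.3f}")
```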

Table 6. Deep learning model and dense belief network architecture for cardiovascular disease prediction.

Models	Architecture
DBN	Dense layer (Neurons = 512, activation function = ‘relu’)
	Dense layer (Neurons = 256, activation function = ‘relu’)
	Dense layer (Neurons = 128, activation function = ‘relu’)
	Dense layer (Neurons = 64, activation function = ‘relu’)
	Dense layer (Neurons = 32, activation function = ‘relu’)
	Dropout (0.2)
	Dense layer (Neurons = 1, activation function = ‘sigmoid’)
DenseNet	Dense layer (Neurons = 256, activation function = ‘relu’)
	Dense layer (Neurons = 128, activation function = ‘relu’)
	Dense layer (Neurons = 64, activation function = ‘relu’)
	Dense layer (Neurons = 32, activation function = ‘relu’)
	Dropout (0.2)
	Dense layer (Neurons = 1, activation function = ‘sigmoid’)
DB-Net	Dense layer (Neurons = 512, activation function = ‘relu’)
	Dense layer (Neurons = 256, activation function = ‘relu’)
	Dense layer (Neurons = 128, activation function = ‘relu’)
	Dense layer (Neurons = 64, activation function = ‘relu’)
	Dense layer (Neurons = 32, activation function = ‘relu’)
	Dense layer (Neurons = 2, activation function = ‘sigmoid’)
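To give a feel for the size of the fully connected stacks in Table 6, the following sketch counts trainable parameters (weights plus biases) for a chain of dense layers. It assumes 21 input features, which is not stated in the table and should be adjusted to the actual preprocessed input width; the function name `dense_param_count` is hypothetical. Dropout layers add no parameters and are therefore omitted.

```python
from typing import Sequence


def dense_param_count(n_inputs: int, layer_widths: Sequence[int]) -> int:
    """Trainable parameters of a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for width in layer_widths:
        total += fan_in * width + width  # weight matrix plus bias vector
        fan_in = width
    return total


N_FEATURES = 21  # assumption: input width is not given in Table 6

# Layer widths as listed in Table 6 (Dropout contributes no parameters)
dbn_widths = [512, 256, 128, 64, 32, 1]
densenet_widths = [256, 128, 64, 32, 1]

print("DBN stack parameters:", dense_param_count(N_FEATURES, dbn_widths))
print("DenseNet stack parameters:", dense_param_count(N_FEATURES, densenet_widths))
```

Because the first layer dominates only when the input is wide, most of the parameters here sit in the 512-to-256 transition, which is one reason the DBN-style stack is noticeably larger than the DenseNet-style one.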

Table 7. Hyperparameters utilized in DB-Net for cardiovascular disease prediction.

Common Parameters:
Loss Function	Binary cross-entropy
Metrics	Accuracy
Epoch	10–30
Batch Size	32
DBN:
Hidden Layers	5
Activation Function in every Hidden layer	sigmoid
Optimizer	Adam
Number of Neurons	32–512
Activation Function on Output Layer	Sigmoid
DenseNet:
Dense Blocks	4
kernel_regularizer rate	0.001
Optimizer	Adam
Dropout Rate	0.2
Number of Neurons	32–256
Activation Function in Dense Block	ReLU
Activation Function on Output Neuron	softmax
DB-Net:
Optimizer	Adam
kernel_regularizer rate	0.001
Dropout Rate	0.2
Hidden Layers in DBN	5
Dense Blocks in DenseNet	4
Number of Neurons	32–512
Activation Function	ReLU
Activation Function on Output Neuron	Sigmoid
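Table 7 names binary cross-entropy as the loss for all three models. As a minimal illustration of what that loss computes for a sigmoid output, the sketch below implements the standard formula in plain Python; the clipping constant `eps` is an assumption added for numerical safety and is not a value reported by the authors.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0); eps is an assumption
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # predictions close to the labels
uncertain = [0.5, 0.5, 0.5, 0.5]   # uninformative predictions

print("confident:", binary_cross_entropy(labels, confident))
print("uncertain:", binary_cross_entropy(labels, uncertain))
```

The uninformative predictor scores ln 2 per sample, which is a convenient baseline when reading training-loss curves for a balanced binary task.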

Table 8. Results of 10-FCV DB-Net (10 epochs) for cardiovascular disease prediction.

Models	Accuracy	F1 Score	Recall	Precision	Execution Time (s)
DBN	0.844	0.844	0.846	0.841	557
DenseNet	0.865	0.865	0.865	0.865	468
DB-Net	0.911, 0.920, 0.921, 0.925, 0.924, 0.925, 0.925, 0.923, 0.925, 0.926, 0.923, 0.905	0.909, 0.910, 0.910, 0.920, 0.921, 0.923, 0.923, 0.923, 0.923, 0.919, 0.921, 0.910	0.909, 0.910, 0.911, 0.912, 0.921, 0.922, 0.924, 0.921, 0.925, 0.921, 0.921, 0.948	0.908, 0.906, 0.911, 0.912, 0.915, 0.921, 0.921, 0.924, 0.921, 0.921, 0.922, 0.875	221, 222, 321, 313, 375, 442, 412, 223, 315, 312, 312, 875

The first bold value represents the result of DB-Net before applying 10-FCV, while the second bold value represents the DB-Net value with 10-FCV applied.

Table 9. Results of 10-FCV DB-Net (20 epochs) for cardiovascular disease prediction.

Models	Accuracy	F1 Score	Recall	Precision	Execution Time (s)
DBN	0.853	0.849	0.841	0.829	1654
DenseNet	0.864	0.861	0.865	0.881	1161
DB-Net	0.920, 0.920, 0.926, 0.928, 0.921, 0.922, 0.924, 0.923, 0.923, 0.924, 0.922, 0.914	0.920, 0.920, 0.926, 0.928, 0.923, 0.924, 0.921, 0.924, 0.923, 0.922, 0.921, 0.909	0.920, 0.920, 0.926, 0.928, 0.922, 0.924, 0.922, 0.922, 0.925, 0.921, 0.921, 0.929	0.920, 0.920, 0.921, 0.928, 0.925, 0.921, 0.920, 0.922, 0.922, 0.924, 0.923, 0.890	1586, 1524, 1584, 1524, 1523, 1524, 1526, 1527, 1527, 1524, 1523, 1524

Table 10. Results of 10-FCV DB-Net (30 epochs) for cardiovascular disease prediction.

Models	Accuracy	F1 Score	Recall	Precision	Execution Time (s)
DBN	0.857	0.859	0.845	0.875	1687
DenseNet	0.861	0.856	0.828	0.885	1468
DB-Net	0.918, 0.921, 0.926, 0.928, 0.924, 0.921, 0.924, 0.923, 0.921, 0.923, 0.921, 0.916	0.918, 0.921, 0.926, 0.928, 0.924, 0.921, 0.924, 0.923, 0.921, 0.923, 0.921, 0.910	0.918, 0.921, 0.926, 0.928, 0.924, 0.921, 0.924, 0.923, 0.921, 0.923, 0.921, 0.932	0.919, 0.921, 0.926, 0.929, 0.925, 0.922, 0.921, 0.922, 0.923, 0.924, 0.923, 0.893	2548, 2604, 2604, 2564, 2532, 2531, 2527, 2524, 2527, 2542, 2522, 1883
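Tables 8–10 report per-fold results from 10-fold cross validation. The sketch below shows one plain (non-stratified) way to generate the train/test index splits; whether the authors shuffled or stratified the folds is not stated, so both choices here are assumptions, as are the function name `k_fold_indices` and the seed.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are shuffled once, then dealt round-robin into k folds, so fold
    sizes differ by at most one. Each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, test_idx


# Usage: report fold sizes for a toy dataset of 103 samples
for fold_number, (train_idx, test_idx) in enumerate(k_fold_indices(103), start=1):
    print(f"fold {fold_number}: train={len(train_idx)}, test={len(test_idx)}")
```

In practice the model is retrained from scratch on each `train_idx` and scored on the matching `test_idx`, and the per-fold scores are then listed or averaged.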

Table 11. Results of DVR-Net for cardiovascular disease prediction.

Models	Accuracy	F1-Score	Recall	Precision	Training Time (s)	Inference Time (s)	Memory Usage (MB)	Epochs
DNN	0.813	0.814	0.814	0.814	319	8	15	10
VRNN	0.808	0.830	0.940	0.793	829	20	15	10
DVR-Net	0.905	0.905	0.904	0.908	4428	20	25	10
DNN	0.816	0.816	0.817	0.816	667	9	15	20
VRNN	0.838	0.844	0.898	0.798	1570	41	15	20
DVR-Net	0.904	0.904	0.904	0.904	5515	43	27	20
DNN	0.815	0.815	0.815	0.814	931	12	15	30
VRNN	0.838	0.850	0.921	0.789	1921	46	15	30
DVR-Net	0.906	0.904	0.904	0.908	8712	47	31	30

Table 12. Deep learning models and DVR-Net architecture for cardiovascular disease prediction.

Models	Architecture
DNN	Dense layer (Number of neurons = 64, activation function = ‘linear’)
	Dense layer (num_classes, activation function = ‘sigmoid’)
VRNN	SimpleRNN (Number of neurons = 64, return_sequences = True)
	SimpleRNN (Number of neurons = 64, return_sequences = True)
	SimpleRNN (Number of neurons = 64, return_sequences = True)
	SimpleRNN (Number of neurons = 64, return_sequences = True)
	SimpleRNN (Number of neurons = 64, return_sequences = True)
	Dropout (0.2)
	Dense layer (Number of neurons = 2, activation function = ‘sigmoid’)
DVR-Net	Dense layer (Number of neurons = 64, activation function = ‘linear’)
	Dense layer (num_classes, activation function = ‘sigmoid’)
	SimpleRNN (Number of neurons = 64, return_sequences = True)
	Dropout (0.2)
	Dense layer (Number of neurons = 2, activation function = ‘sigmoid’)
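The SimpleRNN layers in Table 12 apply the vanilla recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b), where tanh is the Keras default activation for this layer type. The following plain-Python sketch unrolls that recurrence and returns every hidden state, mirroring `return_sequences = True`. The weights, the toy input, and the function name `simple_rnn_forward` are illustrative assumptions; how the tabular records were shaped into sequences is not specified in the table.

```python
import math


def simple_rnn_forward(inputs, w_x, w_h, bias):
    """Unroll a vanilla RNN over a sequence and return all hidden states.

    inputs: list of T input vectors, each of length d
    w_x:    units x d input weight matrix
    w_h:    units x units recurrent weight matrix
    bias:   list of length units
    """
    units = len(bias)
    h = [0.0] * units  # initial hidden state
    states = []
    for x_t in inputs:
        # The comprehension reads the previous h; the name is rebound afterwards.
        h = [
            math.tanh(
                sum(w_x[u][i] * x_t[i] for i in range(len(x_t)))
                + sum(w_h[u][j] * h[j] for j in range(units))
                + bias[u]
            )
            for u in range(units)
        ]
        states.append(h)
    return states


# Toy example: one hidden unit, two time steps (all values are illustrative)
states = simple_rnn_forward(
    inputs=[[1.0], [0.0]],
    w_x=[[1.0]],
    w_h=[[0.5]],
    bias=[0.0],
)
print(states)
```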

Table 13. Hyperparameters utilized in DVR-Net for cardiovascular disease prediction.

Common Parameters:
Loss Function	Binary cross-entropy
Metrics	Accuracy
Epoch	10–30
Batch Size	32
DNN:
Hidden Layers	1
Activation Function in every Hidden layer	Linear
Number of Neurons	32–64
Optimizer	sgd
Activation Function on Output Layer	Sigmoid
VRNN:
Hidden Layers	5
Optimizer	sgd
Dropout Rate	0.2
Number of Neurons	64
Activation Function on Output Neuron	Sigmoid
DVR-Net:
Optimizer	sgd
Dropout Rate	0.2
Hidden Layers in DNN	1
Dense Blocks in VRNN	5
Number of Neurons	32–512
Activation Function on Output Neuron	Sigmoid

Table 14. Results of 10-FCV DVR-Net (10 epochs) for cardiovascular disease prediction.

Models	Accuracy	F1 Score	Recall	Precision	Execution Time (s)
DNN	0.813	0.814	0.814	0.814	452
VRNN	0.808	0.830	0.940	0.793	746
DVR-Net	0.893, 0.911, 0.913, 0.915, 0.919, 0.920, 0.917, 0.915, 0.920, 0.923, 0.911, 0.905	0.893, 0.911, 0.912, 0.915, 0.918, 0.920, 0.916, 0.915, 0.919, 0.923, 0.910, 0.905	0.893, 0.911, 0.913, 0.915, 0.919, 0.920, 0.917, 0.915, 0.920, 0.923, 0.915, 0.904	0.893, 0.912, 0.915, 0.921, 0.922, 0.923, 0.924, 0.922, 0.926, 0.926, 0.927, 0.908	805, 804, 750, 805, 765, 765, 804, 805, 804, 744, 805, 866

Table 15. Results of 10-FCV DVR-Net (20 epochs) for cardiovascular disease prediction.

Models	Accuracy	F1 Score	Recall	Precision	Execution Time (s)
DNN	0.816	0.816	0.817	0.816	694
VRNN	0.838	0.844	0.898	0.798	1466
DVR-Net	0.881, 0.912, 0.919, 0.920, 0.920, 0.918, 0.922, 0.923, 0.923, 0.924, 0.924, 0.904	0.881, 0.912, 0.919, 0.922, 0.920, 0.918, 0.922, 0.924, 0.923, 0.924, 0.923, 0.904	0.881, 0.912, 0.919, 0.920, 0.922, 0.918, 0.922, 0.922, 0.925, 0.924, 0.924, 0.904	0.882, 0.914, 0.921, 0.923, 0.925, 0.922, 0.924, 0.922, 0.922, 0.927, 0.929, 0.904	1175, 1167, 1046, 1166, 1166, 1106, 1288, 1527, 1527, 1408, 1183, 1913

Table 16. Results of 10-FCV DVR-Net (30 epochs) for cardiovascular disease prediction.

Models	Accuracy	F1 Score	Recall	Precision	Execution Time (s)
DNN	0.815	0.815	0.815	0.814	742
VRNN	0.838	0.850	0.921	0.789	2393
DVR-Net	0.910, 0.914, 0.920, 0.922, 0.926, 0.925, 0.927, 0.925, 0.926, 0.925, 0.924, 0.906	0.909, 0.913, 0.920, 0.922, 0.925, 0.925, 0.926, 0.925, 0.925, 0.925, 0.924, 0.904	0.909, 0.914, 0.920, 0.922, 0.926, 0.925, 0.927, 0.925, 0.926, 0.925, 0.924, 0.904	0.912, 0.920, 0.925, 0.927, 0.930, 0.929, 0.930, 0.928, 0.930, 0.926, 0.923, 0.908	1411, 1467, 1586, 1527, 1466, 1405, 1526, 1586, 1406, 1504, 1408, 2853

Table 17. Ablation comparison of baseline models and proposed models.

Model	Accuracy	F1-score	Recall	Precision
HighwayNet	0.857	0.859	0.861	0.869
ShuffleNet	0.500	0.334	0.500	0.250
ResNet	0.873	0.871	0.883	0.901
DBN	0.844	0.844	0.846	0.841
DenseNet	0.865	0.865	0.865	0.865
DB-Net	0.911	0.909	0.909	0.908
VRNN	0.808	0.830	0.940	0.793
DNN	0.813	0.814	0.814	0.814
DVR-Net	0.893	0.893	0.893	0.893
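The ShuffleNet row in Table 17 (accuracy 0.500, F1 0.334, recall 0.500, precision 0.250) has the pattern one would expect from a classifier that predicts a single class on a balanced test set when metrics are macro-averaged. The paper does not state the averaging mode or that the model collapsed, so the sketch below is only a consistency check under those two assumptions; `macro_metrics` is a hypothetical helper.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Assumed scenario: balanced labels, model always predicts the positive class
y_true = [0] * 50 + [1] * 50
y_pred = [1] * 100

accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```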

Table 18. Ablation results for DB-Net and DVR-Net for cardiovascular disease prediction using ROSE and SMOTE balancing.

Model	Balancing Method	Accuracy	Precision	Recall	F1-Score	Execution Time (s)
DBN	ROSE	0.7720	0.7493	0.8166	0.7815	295
DenseNet	ROSE	0.7660	0.7190	0.8700	0.7870	168
DB-Net	ROSE	0.7760	0.7506	0.8256	0.7863	279
DBN	SMOTE	0.8330	0.8120	0.8370	0.8440	245
DenseNet	SMOTE	0.8000	0.7990	0.8220	0.7850	177
DB-Net	SMOTE	0.8320	0.8570	0.8440	0.8510	279
VRNN	ROSE	0.7270	0.6590	0.9370	0.7740	175
DNN	ROSE	0.7700	0.7700	0.7710	0.7690	44
DVR-Net	ROSE	0.7710	0.7779	0.7710	0.7700	231
VRNN	SMOTE	0.7930	0.7310	0.9250	0.8170	158
DNN	SMOTE	0.8110	0.8130	0.8120	0.8110	62
DVR-Net	SMOTE	0.8160	0.8200	0.8160	0.8150	170
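For readers unfamiliar with the SMOTE baseline in Table 18, its core step is linear interpolation between a minority sample and one of its minority-class neighbours. The sketch below is a deliberately simplified version that always uses the single nearest neighbour, whereas standard SMOTE picks at random among the k nearest; it is not the authors' pipeline and does not implement ROSE or ProWRAS. The function names and the toy data are assumptions.

```python
import math
import random


def nearest_neighbor(x, candidates):
    """Return the candidate closest to x in Euclidean distance, excluding x itself."""
    others = [c for c in candidates if c is not x]
    return min(others, key=lambda c: math.dist(x, c))


def smote_like_oversample(minority, n_new, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbor = nearest_neighbor(x, minority)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)])
    return synthetic


# Toy minority class with two features (values are illustrative)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_oversample(minority, n_new=5)
print(new_points)
```

Every synthetic point lies on a segment joining two real minority samples, which is why plain SMOTE treats borderline and interior samples alike.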

Table 19. DB-Net and DVR-Net tested on different datasets.

DB-Net
Dataset	Accuracy	F1-score	Precision	Recall	Test Time (s)
HDHI	0.911	0.909	0.908	0.909	13
CVD	0.918	0.880	0.849	0.918	15
DVR-Net
Dataset	Accuracy	F1-score	Precision	Recall	Test Time (s)
HDHI	0.893	0.893	0.893	0.893	8
CVD	0.881	0.855	0.852	0.850	5

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
