1. Introduction
Millions of people worldwide are affected by cardiovascular disease (CVD) [1]. According to a World Health Organization study [2], cardiovascular diseases account for 30% of all deaths worldwide. The most common cause of death from cardiovascular disease is cardiac arrest, which is primarily triggered by cardiac arrhythmias. An irregular pulse can signal an arrhythmia, a class of heart-related illnesses [3,4]. Arrhythmias produce abnormal cardiac rhythms by making the heart beat too quickly, too slowly or too irregularly [5]. When the heart fails to pump enough blood, flow to the heart, brain and other parts of the body is reduced. Arrhythmias fall into two groups: those that are life-threatening and those that are not. Because life-threatening arrhythmias can lead to cardiac arrest, it is critical to monitor heartbeat activity at regular intervals, and detecting CVD and identifying arrhythmias is therefore required to provide appropriate medical therapy [6,7].
For this reason, the electrocardiogram (ECG) is the gold standard for monitoring and diagnosing arrhythmias. An ECG signal is collected from electrodes placed on the skin and captures the changes in electrical potential produced by the contraction and relaxation phases of the heart. Analyzing an ECG signal repeatedly over long periods is difficult and time-consuming for any clinician, so computer-aided diagnostic technologies are frequently used to analyze ECG data and identify arrhythmias owing to their reliability and accuracy. ECG signals contain 15 different beat types, and the presence of abnormal beats in the data strongly suggests cardiac arrhythmia [8]. Classifying the heartbeat signal is therefore an essential step in detecting arrhythmias.
Many machine learning algorithms have been used in the literature to identify arrhythmias from ECG data. Methods such as random forests [9], artificial neural networks (ANNs) [10] and support vector machines [11] have been presented in various studies. Standard machine learning techniques [9,10,11] cannot be applied without first performing feature extraction: various handcrafted features must be computed, and the choice of features directly affects the classification results [12,13]. Manual feature extraction, however, takes considerable time and effort and does not exploit the underlying information in the database. Over-fitting is also a common problem in ML-based classification systems [14]. Deep learning approaches do not require handcrafted feature representations because deep learning models can automatically construct sophisticated feature representations from the input data. A deep learning-based solution can therefore help address these shortcomings.
In [15], the authors present a neural network for expert knowledge that is built on variable projections. It uses a Variable Projection (VP) layer as a general-purpose, trainable feature extractor or filtering approach that can be tailored to different 1D signal-processing problems by selecting an application-specific function system. The results show that VPNet can match or exceed the classification accuracy of fully connected and CNN networks with fewer parameters, and it converges marginally faster than the CNN and FCNN. While the number of weights and biases of the FCNN and CNN grew linearly with the length of the input signals, the VP layer required only two learnable parameters in all cases.
Deep learning techniques have been widely adopted in image and signal processing research, and deep learning algorithms have been used to categorize electrocardiogram (ECG) data based on different input parameters. Many deep learning methods have had significant success in biological signal processing, and DL models are used to examine ECG readings for arrhythmia detection. Examples of such networks include the CNN [16], deep belief network [17], recurrent neural network (RNN) [18], auto-encoder [19] and deep neural network (DNN) [20]. Many ECG studies have explored neural networks, notably CNNs and RNNs, to classify arrhythmias. Acharya et al. [21] created an 11-layer CNN model that distinguishes four different heartbeat types, including normal, ventricular fibrillation, atrial flutter and atrial fibrillation, using three separate ECG signal datasets.
Similarly, [22] published a nine-layer deep CNN model for recognizing five different arrhythmias from the MIT-BIH database, categorizing the heartbeats according to the American National Standards Institute/Association for the Advancement of Medical Instrumentation (ANSI-AAMI) classes. They also generated a synthetic dataset to compensate for the class imbalance among heartbeat categories. Li et al. [23] developed a generic CNN model that can be trained and fine-tuned on a large number of heartbeats to increase the accuracy of ECG signal categorization [24].
A. Scope of Paper
Utilizing long-term ECG readings, our computer-aided diagnosis (CAD) model can assist cardiologists in identifying heartbeats and detecting arrhythmias. Such a CAD system has several uses in the polyclinic, including reducing the burden on cardiologists and the cost of ECG data processing.
B. Major Contributions
The significant contributions of this study are as follows:
- The dataset comes from PhysioNet.org: the MIT-BIH arrhythmia database (MIT-BIH AD), which contains 48 patient records.
- A novel hybrid deep learning model based on CNN and BLSTM approaches is proposed to identify arrhythmias from ECG data.
- The dataset is first normalized using Z-score normalization, and heartbeat segmentation is then conducted using the annotation files provided with the database by cardiologists (a minimal sketch of this preprocessing appears at the end of this section).
- The dataset is split into training and testing parts, and the training set is then augmented with the synthetic minority oversampling technique (SMOTE) to balance the under-represented classes.
- The balanced dataset is used to train the proposed hybrid deep learning model. After training, the models are assessed on an unseen test dataset to classify heartbeats into five groups. Statistical performance is evaluated using accuracy, precision, recall and F-score.
- We compared our findings with numerous existing approaches reported in related studies; our model outperforms these traditional techniques in analytical performance.
This study employs a unique hybrid deep learning model combining CNN and BLSTM to identify arrhythmias and improve classification performance on unprocessed raw ECG data. The remainder of the paper is organized as follows: Section 2 reviews the related literature; Section 3 covers the theoretical foundation employed in this research, along with information about the ECG database; Section 4 contains the experimental results obtained with the proposed hybrid model; Section 5 discusses the outcomes and compares them with previously published studies; and Section 6 presents the conclusions.
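As referenced in the contributions above, a minimal sketch of the Z-score normalization and annotation-based heartbeat segmentation steps is given below. This is an illustrative Python example, not the authors' exact implementation; the 300-sample window around each R-peak and the synthetic signal are assumptions.

```python
# Hypothetical sketch of Z-score normalization and beat segmentation around
# annotated R-peak positions. Window length and the synthetic signal are assumptions.
import numpy as np

def zscore_normalize(signal):
    """Normalize a 1D ECG signal to zero mean and unit variance."""
    return (signal - np.mean(signal)) / np.std(signal)

def segment_beats(signal, r_peaks, pre=99, post=201):
    """Cut one fixed-length segment (pre + post samples) per annotated R-peak."""
    beats = []
    for r in r_peaks:
        if r - pre >= 0 and r + post <= len(signal):
            beats.append(signal[r - pre:r + post])
    return np.asarray(beats)

# Example with synthetic data standing in for an MIT-BIH record:
sig = zscore_normalize(np.random.randn(5000))
beats = segment_beats(sig, r_peaks=[300, 800, 1300])
print(beats.shape)  # (3, 300) with the assumed 300-sample window
```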
2. Literature Review
In this section, we review existing work conducted by different authors and researchers using various approaches.
In their paper [24], Sellami and Hwang proposed a novel nine-layer deep CNN model for categorizing heartbeats into five major classes under both inter-patient and intra-patient evaluation. To address class imbalance, the authors used a batch-weighted loss function with three input variants: a target heartbeat with its class label, the previous heartbeat followed by the target heartbeat labeled with the target heartbeat's class, and neighboring heartbeats followed by the target heartbeat labeled with the target heartbeat's class. Furthermore, in the method of [25], Xia presents and employs CNNs combined with active learning to classify the MIT-BIH AD automatically; active learning was combined with state-of-the-art algorithms and updated model versions to improve the system's overall performance.
He et al. [26] released a CNN model named LightNet. LightNet was trained on low-power PCs, and the trained model was designed to detect anomalies in the ECG signal on mobile devices while consuming as few resources as possible. Multiple filter sizes were used in each convolutional layer to generate alternative feature combinations, resulting in improved classification accuracy. The authors of [17] present a DL-based model for categorizing ECG heartbeats that includes three classification stages. The first stage uses an unsupervised Gaussian–Bernoulli deep belief network to extract a feature representation from the dataset. The second stage is supervised training, in which linear SVM classifiers are trained for the problem at hand. In the third stage, the system consults an expert to examine potentially confusing heartbeats and adjusts its recommendations accordingly.
In the MIT-BIH arrhythmia database, Mathews et al. [27] used deep learning to identify ventricular and supraventricular heartbeats. Despite the study's small sample size, the authors achieved excellent results by employing a restricted Boltzmann machine and a deep belief network. Ozal Yildirim [28] created a wavelet-sequence-based deep LSTM model to classify ECG data into five distinct heartbeat groups.
Furthermore, Oh Shu Lih et al. [29] examined ECG data with combined CNN and LSTM models, using variable-length ECG heartbeats to train the model. Tan Jen Hong and colleagues [30] proposed a model that uses CNN and LSTM to differentiate between healthy and diseased coronary arteries.
Deep learning techniques have outperformed previous approaches to learning such properties from ECG data in biological signal processing [31]. We developed a novel hybrid deep learning CNN and BLSTM model to improve automated arrhythmia classification performance, employing an oversampling strategy to build balanced data and to assess the impact of class imbalance on training and testing.
Ojha, Manoj Kumar and colleagues [32] created a CNN-based autoencoder with an SVM classifier that divides the MIT-BIH AD into five categories; the model first encodes features from the ECG signal before passing them to the SVM classifier. Li, Yuanlu et al. [33] proposed a deep residual CNN model to classify arrhythmic heartbeats automatically. This model extracts heartbeat segments from 5-s ECG signals, denoises them using the discrete wavelet transform, and uses a focal loss function to handle the class imbalance. Cui, Jianfeng et al. [34] created a model based on feature extraction and selection that extracts and selects optimized features from ECG signals to improve arrhythmia classification accuracy.
4. Experimental Results
A Dell workstation with two 2.4 GHz Intel Xeon E5-2600 processors and 64 GB of RAM was used for both training and testing of the proposed CNN-BLSTM model. With a learning rate of 1 × 10⁻³, training and testing on the ECG dataset take an average of 358 s per epoch on the CPU. The deep CNN-BLSTM model was developed with the Python Keras package and evaluated using its overall accuracy (Acc), recall (Se), precision (P) and F-score, computed from the dataset's true positive (TP), false positive (FP), false negative (FN) and true negative (TN) counts as follows:
Acc = (TP + TN) / (TP + TN + FP + FN), Se = TP / (TP + FN), P = TP / (TP + FP), F-score = (2 × P × Se) / (P + Se).
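As a hedged illustration of how these per-class metrics can be computed in practice, the following Python snippet uses scikit-learn on placeholder label arrays; macro averaging over the five AAMI classes is an assumption.

```python
# Minimal sketch of the metrics defined above, computed with scikit-learn.
# y_true / y_pred are placeholders for test labels and model predictions.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0, 1, 2, 3, 4, 0, 1, 2]   # ground-truth class indices (N, S, V, F, Q)
y_pred = [0, 1, 2, 4, 4, 0, 1, 1]   # predicted class indices

acc = accuracy_score(y_true, y_pred)
p   = precision_score(y_true, y_pred, average='macro', zero_division=0)
se  = recall_score(y_true, y_pred, average='macro', zero_division=0)
f1  = f1_score(y_true, y_pred, average='macro', zero_division=0)
print(f"Acc={acc:.3f}  P={p:.3f}  Se={se:.3f}  F-score={f1:.3f}")
```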
To train and evaluate the CNN-BLSTM model, we employ two schemes. In the first, the models are trained on a database oversampled with the SMOTE technique and then tested on the unseen dataset. In the second, the models are trained and evaluated on an under-sampled database, in which the class distribution is adjusted by discarding part of the majority class.
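The following sketch illustrates the first scheme under stated assumptions: only the training split is balanced with SMOTE (from the imbalanced-learn package), while the test split remains untouched. The array shapes, class proportions and 80/20 split are placeholders, not the paper's exact settings.

```python
# Illustrative sketch: balance only the training split with SMOTE,
# leaving the test split untouched. Data shapes are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X = np.random.randn(1000, 300)                 # beat segments (placeholder)
y = np.random.choice([0, 1, 2, 3, 4], size=1000, p=[0.8, 0.05, 0.1, 0.02, 0.03])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
print(np.bincount(y_tr), "->", np.bincount(y_bal))  # classes equalized for training only
```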
4.1. Experiment One: Models Trained on the Oversampled Database Using SMOTE Method
The tenfold cross-validation findings on the MIT-BIH AD are shown in Table 4. The model is first trained on the oversampled dataset and then cross-validated on the test dataset. Model performance is summarized per class using recall, precision, F-score and overall classification accuracy. On average, the proposed CNN-BLSTM model achieved 98.36% accuracy, 94.24% recall and 91.67% F-score.
Table 4 shows that Q beats have the highest classification sensitivity and F beats the lowest, since the number of true positives for the F class is very small while the number of true negatives is large.
We compare the results of the proposed CNN-BLSTM model with two other deep learning models, CNN-GRU and CNN-LSTM, and justify our findings against state-of-the-art techniques. Table 5 and Table 6 illustrate the classification performance of the CNN-GRU and CNN-LSTM models on the oversampled training dataset.
Overall, the CNN-GRU model achieved 96.63% accuracy, with an average precision of 79.97%, recall of 95.59% and F-score of 85.75%. The CNN-LSTM model performed at a higher level overall, with 98.16% accuracy, 87.01% precision, 94.77% recall and 90.42% F-score on average.
Table 4, Table 5 and Table 6 show that the proposed CNN-BLSTM model outperforms these methods in average F-score, precision and accuracy. Even so, the CNN-GRU model had the highest average recall of all tested models, and the CNN-LSTM model also achieved a higher average recall than the CNN-BLSTM model.
4.2. Experiment Two: Model Trained with Under-Sampling Method of MIT-BIH Arrhythmia Database
In the under-sampling scheme, only about 12% of the normal (N) heartbeats are used for model training and testing. Since normal (N) beats account for roughly 80% of the MIT-BIH AD, the excess N beats are discarded; class N is the only class under-sampled, while all other classes are kept in full. As a consequence, Experiment 2 used 25,019 heartbeat segments: 10,382 normal (N), 2726 supraventricular ectopic (S), 7222 ventricular ectopic (V), 802 fusion (F) and 3888 unknown (Q). Table 7 summarizes the average results of tenfold cross-validation of the CNN-BLSTM model on the under-sampled database.
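A brief sketch of the under-sampling scheme described above is shown below; it keeps roughly 12% of the normal (N) beats and all beats of the other classes. Array contents and the random seed are placeholders.

```python
# Hedged sketch of under-sampling only the majority (N) class.
import numpy as np

def undersample_class_n(X, y, n_class=0, keep_fraction=0.12, seed=42):
    """Randomly retain `keep_fraction` of the majority class; keep the rest."""
    rng = np.random.default_rng(seed)
    n_idx = np.flatnonzero(y == n_class)
    other_idx = np.flatnonzero(y != n_class)
    kept_n = rng.choice(n_idx, size=int(len(n_idx) * keep_fraction), replace=False)
    idx = np.sort(np.concatenate([kept_n, other_idx]))
    return X[idx], y[idx]

X = np.random.randn(2000, 300)   # placeholder beat segments
y = np.random.choice([0, 1, 2, 3, 4], size=2000, p=[0.8, 0.05, 0.1, 0.02, 0.03])
X_us, y_us = undersample_class_n(X, y)
print(np.bincount(y), "->", np.bincount(y_us))
```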
Table 7 demonstrates that the proposed CNN-BLSTM model achieves high accuracy (97.15%), precision (96.50%), recall (93%) and F-score (94.55%). The same under-sampled dataset is also used to test two more models, CNN-GRU and CNN-LSTM.
Table 8 and Table 9 exhibit the classification results after tenfold cross-validation.
The overall accuracy obtained by the CNN-LSTM model is 97.17%, slightly better than the CNN-BLSTM model, but the CNN-BLSTM model achieves the highest average precision and F-score.
Figure 3 compares the three deep learning models under the two schemes, oversampling and under-sampling, on the MIT-BIH AD. As shown in Figure 3, the highest classification accuracy, 98.36%, is obtained by the CNN-BLSTM model using the SMOTE oversampling method on the database. Hence, the CNN-BLSTM model is superior to the other two deep learning models in this experiment.
5. Result Discussion
Over the past two decades, researchers have extensively studied the MIT-BIH AD to develop better methods for classifying arrhythmias using machine learning and artificial neural networks, and various deep learning-based techniques have been used in recent years to categorize arrhythmias. The rhythm classes that arrhythmias may induce, as considered in these studies, are categorized and listed in Table 10. To meet the ANSI-AAMI standard, Zubair et al. [44] trained a deep CNN model on a raw ECG database to categorize arrhythmia heartbeats into one of five categories. This CNN model combines two key modules used for recognizing ECG patterns: feature extraction and heartbeat detection. It demonstrated the possibility of automating feature representation and categorization on a raw ECG database without requiring human-crafted features. The overall accuracy on the test set was 92.7%.
Meanwhile, Shadmand and Mashoufi [45] classified the MIT-BIH AD into five groups using a block-based neural network, applying particle swarm optimization to tune the network's parameters. A classification accuracy of 97% was achieved by extracting the network's input feature vectors from the ECG signal using Hermite function coefficients [52] and temporal features, thereby extracting interpretable features automatically. In [22], Acharya and colleagues built a nine-layer deep CNN model to differentiate between five distinct heartbeat classes recorded in the MIT-BIH AD. Because the original database had unequal class sizes, the CNN model was trained and tested on an oversampled version of the database; the testing accuracy obtained with the augmented database was 94.03%, whereas it declined to 89.07% without oversampling.
In addition, Xu et al. [53] developed deep learning approaches to automatically categorize raw ECG data into five categories. The model takes raw electrocardiogram data as input and outputs a classification decision for each beat individually; an innovative preprocessing methodology, including heartbeat segmentation and alignment, supports a deep neural network that categorizes the different heartbeat types. Armani et al. [48] introduced a deep learning-based technique for automatic heart disease classification. Their deep learning architecture can be divided into two parts: a deep autoencoder (DAE), an unsupervised form of feature learning, and a deep neural network (DNN) acting as a classifier, which together can improve on older machine-learning pipelines that relied on several redundant feature extraction, selection and reduction steps. Deep learning performs embedded feature extraction and selection during the pre-training phase of the DAE and the fine-tuning phase of the DNN, which makes it possible to extract high-level features from both the training data and previously unseen data.
Table 10 compares the classification accuracy of the proposed CNN-BLSTM model with that of other methods used in the past. The proposed model successfully recognizes heartbeat signals, and its accuracy is exceptional compared to past work in this area.
This study classified the MIT-BIH AD using an end-to-end computer-aided diagnosis (CAD) technique rather than manual feature extraction. Overfitting has been observed in the majority of machine learning classifiers; the CNN-BLSTM model therefore uses regularization and dropout to prevent overfitting during training, with a dropout of 20% applied to the recurrent connections of the model's BLSTM layer. In this hybrid model, the CNN is responsible for extracting high-quality spatial features, while the BLSTM layers learn the temporal structure of the data more efficiently, allowing the CNN-BLSTM model to recognize ECG data more accurately than before. Because the deep learning model is designed as an end-to-end system on raw ECG data, separate noise filtering and feature extraction techniques are unnecessary.
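To make the described division of labour concrete, the following Keras sketch shows one possible CNN-BLSTM arrangement of the kind discussed: Conv1D blocks for spatial feature extraction, a bidirectional LSTM with 20% recurrent dropout for temporal learning, and a five-class softmax output. The filter counts, kernel sizes and 300-sample input length are assumptions rather than the authors' exact configuration.

```python
# Hypothetical CNN-BLSTM sketch; layer sizes are assumptions, not the paper's exact model.
from tensorflow.keras import layers, models

def build_cnn_blstm(input_len=300, n_classes=5):
    model = models.Sequential([
        layers.Input(shape=(input_len, 1)),
        layers.Conv1D(32, kernel_size=5, activation='relu', padding='same'),
        layers.MaxPooling1D(2),
        layers.Conv1D(64, kernel_size=5, activation='relu', padding='same'),
        layers.MaxPooling1D(2),
        layers.Bidirectional(layers.LSTM(64, recurrent_dropout=0.2)),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_cnn_blstm()
model.summary()
```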
The Following Conclusions May Be Drawn from the CNN-BLSTM Model’s Findings
The strength of the suggested CNN-BLSTM model, in terms of overall accuracy, recall, precision and F-score, is its ability to automatically identify ECG heartbeats in line with AAMI criteria. When tested with synthetic data that addresses the class imbalance across minority groups, the CNN-BLSTM model was found to be superior at categorizing heartbeat types. Because the model is self-contained, ancillary tools for feature extraction, feature selection and classification are redundant. Improved average and overall accuracy are the primary advantages of the model [54].
We compared the results of two separate experiments conducted with k-fold cross-validation, with and without dropout regularization. The training procedure in experiment A did not employ dropout regularization, so all units were retained during learning. In experiment B, we used dropout regularization with a rate of 0.5, meaning that half of the units are randomly dropped during each training update. The experimental outcomes of both methods were analyzed; the overall accuracies obtained from the tenfold evaluations were 98.36%, 98.16% and 97.17%.
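For reference, the tenfold evaluation protocol can be sketched as follows with scikit-learn's StratifiedKFold; the placeholder classifier and random data stand in for the CNN-BLSTM model and the ECG beat dataset.

```python
# Minimal sketch of a 10-fold evaluation loop with a placeholder classifier.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
from sklearn.dummy import DummyClassifier

X = np.random.randn(500, 300)                       # placeholder beat segments
y = np.random.choice([0, 1, 2, 3, 4], size=500)     # placeholder class labels

fold_acc = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True,
                                           random_state=42).split(X, y):
    clf = DummyClassifier(strategy='most_frequent').fit(X[train_idx], y[train_idx])
    fold_acc.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(f"mean accuracy = {np.mean(fold_acc):.3f} +/- {np.std(fold_acc):.3f}")
```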
The CNN-BLSTM is more stable since it uses a cross-validation scheme with ten iterations. The main drawbacks of the CNN-BLSTM model are that it requires specialized hardware (a GPU) and a massive dataset for training, so its computation time is longer than that of conventional approaches. Nevertheless, it helps to identify patterns in the ECG signal.
6. Conclusions
In this research, we created a fully automated classifier model for ECG data that categorizes the ECG signal into five groups according to the ANSI-AAMI criteria. The deep learning classification model was developed using the MIT-BIH ECG database for training and testing. Arrhythmia classification accuracy improved significantly once the class imbalance problem was addressed: the SMOTE method was used to generate synthetic data that corrected the class imbalance. These experiments show that the CNN-BLSTM model is a viable option for identifying and labeling ECG arrhythmias. The classification results were reported using four metrics (recall, precision, F1-score and overall accuracy).
Our results averaged 94.24% recall, 89.50% precision and 91.67% F1-score. When these findings were averaged across the five heartbeat categories, the CNN-BLSTM model had an overall accuracy of 98.36%. A CAD ECG system may use the CNN-BLSTM model to analyze and diagnose electrocardiograms, aiding cardiologists in interpreting ECG readings. Such devices find widespread application in polyclinics, reducing the cardiologist's burden and the costs associated with analyzing ECG signals.
In the future, we will experiment with different deep learning components, such as ReLU, batch normalization, softmax and sigmoid activation functions, and different layer configurations, to reduce the model's complexity and training time.