1. Introduction
The human heart has an electrical conduction system that involuntarily generates regular electrical signals and transmits them throughout the heart. Heart disease takes the lives of many people all over the world [1, 2, 3]. Arrhythmia is a condition of irregular heartbeats caused by changes in, or dysfunction of, this conduction system. It is generally diagnosed in the hospital from an electrocardiogram (ECG), a record of the electrical activity of the heart obtained through electrodes placed on the skin of the chest and limbs. In recent years, wearable devices such as the Apple Watch [4] and the Zio patch cardiac monitor [5, 6] have been introduced to the market; they measure ECG differently from the conventional devices used in hospitals, and they detect irregular heartbeats. The demand for ECG measurement through tele-medical devices is growing rapidly. We are also interested in developing a tele-medical device for ECG, which requires a deep learning-based algorithm for ECG classification; in this paper, we propose a deep learning model for this problem.
An ECG usually refers to a 12-lead ECG, which gathers 12 different types of information from the heart. To precisely classify a 12-lead ECG signal, doctors examine the ECG data and diagnose specific arrhythmias based on their medical knowledge and extensive experience. Unfortunately, judgment errors are likely to occur during this process. Even an experienced specialist requires considerable time to analyze the signals, and the accuracy may not be high [5, 6]. In addition, in the case of a Holter monitor, the cardiologist cannot see the entire signal, which is usually recorded over several days. Thus, many researchers have attempted to classify 12-lead ECG signals automatically and accurately.
Thus far, rule-based algorithms have been unsuitable for use in practice, owing to their poor performance. These algorithms classify an ECG by rules such as comparing the length of a specific section (for example, the QRS duration or the RR interval) or checking whether the beats of the ECG are regular. The classification has also been approached using various machine-learning methods, e.g., logistic regression [7], support vector machines (SVMs) [8], random forests [9], and K-nearest neighbors [10, 11]. In recent years, deep-learning-based approaches have shown better performance in ECG signal classification than rule-based and machine-learning algorithms.
The most obvious approach for deep-learning research using ECG data involves creating a deep-learning model using all 12-lead ECG data measured in a hospital. Smith et al. [12] found that the accuracy of a deep-learning network (with 13 convolutional layers and 3 fully connected layers) using all 12-lead ECG data was higher than that of a rule-based algorithm. However, in most cases, rather than using all the ECG information, researchers have approached ECG signal classification using the information from one specific lead, e.g., lead I or lead II. Lee et al. [13] used a residual network (ResNet) with six residual blocks and an AlexNet to classify atrial fibrillation (Normal/Atrial Fibrillation), which provided accuracies of 99.9% and 99.7%, respectively.
Rajpurkar et al. [5] and Hannun et al. [6] showed that a deep-learning model based on a 34-layer ResNet exceeded the performance of average cardiologists in discriminating 12 output rhythm classes (10 arrhythmias/Normal/Noise). It is important to note that the data they collected were large-scale and obtained from patients in actual hospitals. Their model used 91,232 modified lead-II ECG records from 53,549 patients, recorded using Zio cardiac monitors.
Recently, beyond simple convolutional neural network (CNN) structures, attempts have been made to find a better ECG signal-classification structure by using structures that produce good results for image classification. Kim et al. [14] used a 34-layer DenseNet architecture, originally designed for vision tasks, for two-class classification (Normal/Abnormal) of lead-II ECG data measured in a hospital. This structure achieved an overall accuracy of 98.89% and an F1 score of 99.09%. Their results showed that a single-lead ECG, rather than the 12-lead ECGs measured in a hospital, was sufficient to distinguish between the normal and abnormal classes.
In contrast to the methods mentioned thus far, some researchers have used short-time Fourier and wavelet transforms to convert ECG data into two-dimensional (frequency, time) data and used them as input for a deep neural network. Salem et al. [15] converted one-dimensional (1D) ECG signals from the MIT-BIH dataset and the European ST-T dataset into 2D spectrogram images. They used a 161-layer DenseNet, pre-trained on millions of images, to extract abstract information and then applied an SVM for four-class classification (Normal Sinus/Atrial Fibrillation and Flutter/Ventricular Fibrillation/ST Segment Change). Their model’s accuracy and F1 score were 97.23% and 97.35%, respectively. Amin et al. [16] used two-dimensional spectrograms obtained through the short-time Fourier transform, together with data augmentation, to classify data from the MIT-BIH arrhythmia dataset as: normal beat, premature ventricular contraction beat, paced beat, right bundle branch block beat, left bundle branch block beat, atrial premature contraction beat, ventricular flutter wave beat, and ventricular escape beat. Their model consisted of four convolutional layers and four pooling layers; the accuracy and F1 score were 99.11% and 98.0%, respectively. Rajput et al. [17] constructed an ECG-based heartbeat-classification model that consisted of preprocessing (filtering and segmentation), feature extraction (Morlet wavelet transform and short-time Fourier transform), and a densely connected network. Their model’s F1 score was 83.4%, using the following classification: Normal Sinus/Atrial Fibrillation/Sinus Tachycardia/Sinus Bradycardia/Ventricular Bigeminy/Ventricular Trigeminy/Ventricular Tachycardia/Paroxysmal Supraventricular Tachycardia (PSVT)/Noise/Ventricular Ectopic Beats (VEB).
Our Contribution
Deep learning approaches to ECG classification have followed the same trend as image classification: networks have become deeper, and decisions are made by increasingly complex structures. Following this trend, we sought an end-to-end deep learning model suitable for multi-class ECG classification through changes to the deep learning architecture alone, without introducing a data-transformation step such as computing a spectrogram.
The major contributions of our paper are the following:
- A squeeze-and-excitation block was applied to the ECG classification model.
- Our model was applied to a large-scale ECG dataset used in a hospital.
- Our model, based on a squeeze-and-excitation residual network (SE-ResNet), surpassed in terms of the F1 score the performance of a ResNet, which is known as one of the best models for ECG-signal multi-classification [5, 6].
- We compared the inference times of the ResNet and the SE-ResNet and confirmed that there is no significant difference in inference time.
2. Materials and Methods
2.1. ECG Dataset Description
We constructed a large ECG-signal dataset that includes 28,308 lead-II ECGs collected from the Korea University Anam Hospital in South Korea. The collected data are meaningful in that they are not refined, and they include various types of actual data measured in hospitals. The data consist of the following 7 categories:
- Normal sinus rhythm (Normal)
- Atrial fibrillation (AF)
- Atrial flutter (AFL)
- First-degree atrioventricular block (FAB)
- Sinus bradycardia (SB)
- Sinus tachycardia (ST)
- Premature ventricular contraction (PVC)
Cardiologists at the Korea University Anam Hospital annotated the labels for these seven classes. In this study, our model was designed to classify the seven rhythm classes (Normal/AF/AFL/FAB/SB/ST/PVC) from raw single-lead ECG data. The proportions of the classes were 34.48% (Normal), 33.86% (AF), 6.17% (AFL), 6.90% (FAB), 6.87% (SB), 6.20% (ST), and 5.53% (PVC).
Each ECG was recorded for 10 s at a sampling frequency of 200 Hz, giving 2000 samples per record. We used the lead-II signal taken from the 12-lead ECG data. In addition, the data values were rescaled with min–max normalization so that the deep-learning models could train smoothly.
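The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not say whether the minimum and maximum are taken per record or over the whole dataset, so per-record scaling is assumed here, and the function name, the `eps` guard, and the synthetic signal are all hypothetical.

```python
import numpy as np

SAMPLING_RATE_HZ = 200   # from the text
DURATION_S = 10          # from the text
N_SAMPLES = SAMPLING_RATE_HZ * DURATION_S  # 2000 samples per record


def min_max_normalize(signal, eps=1e-8):
    """Rescale one ECG record to approximately the range [0, 1].

    Per-record scaling and the `eps` guard against a flat signal are
    assumptions; the paper does not specify these details.
    """
    signal = np.asarray(signal, dtype=np.float64)
    lo, hi = signal.min(), signal.max()
    return (signal - lo) / (hi - lo + eps)


# Synthetic stand-in for a lead-II record (not real ECG data).
t = np.arange(N_SAMPLES) / SAMPLING_RATE_HZ
record = 0.8 * np.sin(2 * np.pi * 1.2 * t) - 0.3
normalized = min_max_normalize(record)
```

The normalized record keeps its length of 2000 samples, which matches the 1 × 2000 × 1 network input.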
2.2. Classifier Model Architecture and Experiment
In this study, we proposed an ECG-signal multi-classification model using the SE-ResNet. The SE-ResNet focuses on the interdependencies between the channels of its convolutional features, instead of investigating the spatial information. The SE block consists of a squeeze operation, which summarizes the overall information about each feature map, and an excitation operation, which scales the importance of each feature map. In this way, the squeeze operation extracts only the important information from each channel, using global average pooling, and the excitation operation computes the inter-channel dependencies, using a fully connected layer and a nonlinear function. The SE-ResNet is considered to be one of the most popular of the many CNN architectures because of its high performance on ImageNet for image classification. In addition, an SE network is easy to apply because it simply adds an SE block without changing the shape of the existing model.
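To make the squeeze and excitation operations concrete, the following NumPy sketch applies an SE block to a 1D feature map. It assumes the usual formulation from the original SE design (two fully connected layers with a channel-reduction ratio, a ReLU between them, and a sigmoid gate); the exact layer configuration of the model in this paper is not given in the text, and the sizes and random weights below are purely illustrative.

```python
import numpy as np


def se_block_1d(features, w1, b1, w2, b2):
    """Apply squeeze-and-excitation to a 1D feature map.

    features: array of shape (channels, length)
    w1, b1:   first fully connected layer, channels -> reduced
    w2, b2:   second fully connected layer, reduced -> channels
    """
    # Squeeze: global average pooling, one scalar per channel.
    squeezed = features.mean(axis=1)                    # (channels,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a gate in (0, 1).
    hidden = np.maximum(0.0, w1 @ squeezed + b1)        # (reduced,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # (channels,)
    # Scale: reweight each channel; the output shape is unchanged.
    return features * gates[:, None], gates


rng = np.random.default_rng(0)
channels, length, reduction = 64, 2000, 16   # illustrative sizes only
reduced = channels // reduction
x = rng.standard_normal((channels, length))
w1 = rng.standard_normal((reduced, channels)) * 0.1
b1 = np.zeros(reduced)
w2 = rng.standard_normal((channels, reduced)) * 0.1
b2 = np.zeros(channels)
y, gates = se_block_1d(x, w1, b1, w2, b2)
```

Because the output has the same shape as the input, the block can be inserted into an existing residual block without altering the surrounding layers.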
The main difference between the proposed network and the original SE-ResNet on ImageNet is that the proposed network uses 1D convolutions instead of 2D convolutions. We modified the original SE-ResNet by changing the input shape from 224 × 224 × 3 to 1 × 2000 × 1 and the number of output classes from 1000 to seven. The architectures of the SE-ResNet and ResNet for all the depths (18/34/50/101/152 layers) used in our experiment are shown in Figure 1.
To verify the performance of our model using the SE-ResNet, we chose the ResNet used in [5, 6] as the baseline model. The ResNet is known as one of the best models for ECG-signal multi-classification. Specifically, we used a modified ResNet [18], which uses pre-activated weight layers, instead of the original ResNet [19], which uses post-activated weight layers, because the modified ResNet has better performance than the original network.
We evaluated our model on the seven-class lead-II ECG signal dataset measured in the Korea University Anam Hospital, which contains 28,308 ECG signals of 10 s each. We first split the dataset into a training dataset and a test dataset in the ratio of 8:2. We then set aside the test dataset and used 80% of the training dataset as the actual training dataset and the remaining 20% as the validation dataset. The samples for each dataset were selected at random, with the random seed fixed so that the results could be compared. In this way, 64% of the total dataset was used to train the network, 16% to validate the model, and the remaining 20% to test the model. The number of training samples was 18,116; 4530 and 5662 samples were used for validation and testing, respectively.
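The reported sample counts can be checked with a short calculation. The sketch below assumes that the held-out part of each split is rounded up to a whole sample; that rounding rule is an assumption (the text does not state it), chosen because it reproduces the counts given above.

```python
import math

TOTAL = 28_308          # from the text
HOLDOUT_FRACTION = 0.2  # the 8:2 ratio from the text


def split_counts(n, holdout_fraction):
    """Return (kept, held_out), rounding the held-out part up (assumed)."""
    held_out = math.ceil(n * holdout_fraction)
    return n - held_out, held_out


# First split: training+validation versus test.
train_val, n_test = split_counts(TOTAL, HOLDOUT_FRACTION)
# Second split: training versus validation.
n_train, n_val = split_counts(train_val, HOLDOUT_FRACTION)
```

Under this assumption, `n_train`, `n_val`, and `n_test` come out as 18,116, 4530, and 5662, which sum to the full 28,308 records.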
To optimize the model, we selected the Adam optimizer presented by Kingma et al. [20], with an initial learning rate of 0.0001. Training continued until the validation loss had stopped decreasing for a set number of steps. As in other deep-learning classification models, we used categorical cross entropy as the loss function.
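The stopping rule can be illustrated with a small patience-based sketch. The patience value and the loss values below are made up for illustration; the text does not report the number of steps that was used, and the function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the index at which training would stop.

    Training stops once the validation loss has failed to improve on
    its best value for `patience` consecutive evaluations. If that
    never happens, the last index is returned.
    """
    best = float("inf")
    since_best = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return i
    return len(val_losses) - 1


# Made-up validation losses, for illustration only.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.59]
stop_at = early_stopping_epoch(losses, patience=3)
```

With a patience of 3, training in this toy run stops at index 5, before the later improvement at index 6 is seen; a larger patience would trade extra training time for the chance to reach it.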
This retrospective study was approved by the Institutional Review Board of Korea University Anam Hospital on 12 February 2018 (2018AN0037). Informed consent was waived by the IRB, given that data were de-identified.
3. Results
Under the above training setting, we investigated the SE-ResNet and ResNet with 18/34/50/101/152 layers for seven-class classification. The test set was evaluated using the model parameters that performed best on the validation set; accuracy and the F1 score were used for model evaluation. The F1 score is the harmonic mean of the precision and recall. It is a more informative criterion for model evaluation than accuracy when the classes are strongly imbalanced, as in the ECG signal dataset covered in this study.
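A toy example shows why the F1 score is more informative than accuracy under class imbalance. The confusion matrix below is invented for illustration and is not data from this study; averaging the per-class scores (macro averaging) is also an assumption, since the text does not state how the overall F1 score was aggregated.

```python
def per_class_f1(confusion):
    """Compute per-class F1 from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[i][k] for i in range(n)) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # Harmonic mean of precision and recall.
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total


# Toy imbalanced example: 95 samples of class 0, 5 samples of class 1.
toy = [[95, 0],
       [4, 1]]
f1 = per_class_f1(toy)
macro_f1 = sum(f1) / len(f1)
acc = accuracy(toy)
```

Here the accuracy is 0.96 even though four of the five minority-class samples are misclassified, whereas the minority-class F1 is only about 0.33 and pulls the averaged F1 well below the accuracy.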
Table 1 and Table 2 summarize the performance of the seven-class classification models on the testing data using the SE-ResNet and ResNet, respectively.
The SE-ResNet had better classification performance than the ResNet at every depth (18/34/50/101/152 layers). Our model also had a higher F1 score than the baseline ResNet model for every class (Normal/AF/AFL/PVC/SB/ST/FAB). The best seven-class SE-ResNet was the 152-layer model, with an F1 score of 97.05% (Table 1). The best seven-class ResNet was also the 152-layer model, with an F1 score of 95.65% (Table 2). The best SE-ResNet therefore exceeded the baseline by 1.40 percentage points in F1 score (97.05% for the 152-layer SE-ResNet versus 95.65% for the 152-layer ResNet).
A point to note in Table 1 is that the F1 scores of our model for AFL, PVC, SB and FAB were lower than those for Normal, AF and ST. To analyze the results in more detail, we selected the 152-layer SE-ResNet model, which had the highest F1 score for the seven-class classification, and calculated its confusion matrix. It is shown graphically in Figure 2, which confirms the advantages and disadvantages of our model.
As shown in Figure 2, our model shows good overall performance for most classes; however, it had difficulty distinguishing between FAB and SB, AF and AFL, and FAB and PVC, which explains the lower F1 scores for AFL, PVC, SB and FAB.
In addition, Table 3 compares the inference times of the ResNet and SE-ResNet on the test set. With 18 or 34 layers, there is no significant difference in inference time between the ResNet and SE-ResNet; with 50, 101, or 152 layers, however, the ResNet is faster than the SE-ResNet of the same depth. Nevertheless, even the 152-layer SE-ResNet does not take long to process the test set, which suggests that it is practical to apply. When both inference time and performance matter, an 18- or 34-layer SE-ResNet may also be considered, because it is several times faster than the 152-layer SE-ResNet while its F1 score (Table 1) is not much different. Note that an 18- or 34-layer SE-ResNet gives far better results than the ResNet of any depth (18/34/50/101/152).
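An inference-time comparison of this kind could be measured along the following lines. This is only a sketch of one possible procedure; the text does not describe how the timings were obtained, and `time_inference`, `dummy_predict`, the batch layout, and the number of repeats are all hypothetical stand-ins.

```python
import time


def time_inference(predict, batches, repeats=3):
    """Return the best wall-clock time in seconds to run `predict` on all batches."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for batch in batches:
            predict(batch)
        best = min(best, time.perf_counter() - start)
    return best


def dummy_predict(batch):
    # Hypothetical placeholder for a trained model's forward pass.
    return [sum(signal) > 0 for signal in batch]


# Four batches of eight flat 2000-sample signals (synthetic, not ECG data).
batches = [[[0.0] * 2000 for _ in range(8)] for _ in range(4)]
elapsed = time_inference(dummy_predict, batches)
```

Taking the best of several repeats reduces the influence of one-off delays; timing two models with the same batches and hardware keeps the comparison fair.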
4. Discussion
Although the SE-ResNet has better classification performance than the ResNet, our classifier still has several limitations. First, the F1 scores of AFL, PVC, SB, ST and FAB were much lower than those of the other classes (Normal/AF) in our model. This was because the model had difficulty distinguishing between FAB and SB, AF and AFL, FAB and PVC, and ST and PVC. The lower scores might be due to the insufficient data for AFL, PVC, SB, ST and FAB compared with the data available for Normal and AF. In the future, we plan to improve our model to increase the F1 scores of AFL, PVC, SB, ST and FAB.
Second, owing to the lack of ECG data, we created a classification model for only some often-observed arrhythmias. To expand this model to other arrhythmia conditions, we must accumulate additional arrhythmia data, e.g., junctional rhythm, Supraventricular Tachycardia (SVT), Ventricular Tachycardia (VT), Wenckebach, etc., as presented in Rajpurkar et al. [5] and Hannun et al. [6].
Finally, we built the model and conducted various experiments to determine the best F1 score and accuracy, without considering the model’s complexity. To achieve higher accuracy, we followed the deep-learning trend of making deeper and more complicated networks. However, many real-world applications must run on computationally limited platforms. In the future, we will consider the model’s file size and computation speed, as well as its accuracy and F1 score.