1. Introduction
The electrocardiogram (ECG) is a bioelectrical signal test that provides information about human heart activity [
1]. It is widely used by medical institutions because it is non-invasive and inexpensive. With the rapid development of wearable ECG detection systems and telemedicine applications in healthcare [
2], ECG signals generated by these devices need to be stored and transmitted. However, 12-lead ECG signals require large storage space [
3]. For example, an hour-long 12-lead ECG record with a sampling rate of 360 Hz and a resolution of 11 bits per sample occupies 20.39 megabytes. Long-term ECG monitoring, such as with a Holter monitor, therefore generates a very large amount of data.
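Concretely, taking 1 MB as $2^{20}$ bytes, the quoted record size follows from:

$$ 3600\,\mathrm{s} \times 360\,\tfrac{\mathrm{samples}}{\mathrm{s}} \times 11\,\tfrac{\mathrm{bits}}{\mathrm{sample}} \times 12\ \mathrm{leads} = 171{,}072{,}000\ \mathrm{bits} \approx 20.39\ \mathrm{MB}. $$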
Cardiovascular diseases are mainly monitored by ECG, but traditional ECG equipment is limited to professional medical institutions. To achieve more comprehensive monitoring of patients (such as community monitoring and home monitoring), there is an increasing demand for portable ECG monitoring systems. Portable ECG monitoring systems generally transmit ECG data wirelessly, which makes real-time transmission of large amounts of ECG data inconvenient. Fortunately, signal compression can solve this problem: compressing the ECG signal before transmission reduces the amount of transmitted data while preserving diagnostic value. It is therefore critical to choose an efficient compression coding technology. Traditional lossless compression methods have small compression ratios, which makes it difficult to meet real-time transmission requirements. Near-lossless compression can achieve a high compression ratio and low signal distortion at the same time, meeting the requirements of portable ECG monitoring systems [
4]. The experiments reported below verify the effectiveness of our near-lossless compression method and reconstruction scheme. Generally, near-lossless ECG compression is achieved by two families of methods: transform-based methods and deep learning methods [
5].
Transform-based methods convert the signal into a transform domain and discard information that is not helpful for signal reconstruction. Owing to their energy compaction property, the Fourier transform (FT), wavelet transform (WT), and discrete cosine transform (DCT) have proven effective in ECG compression [
6]. By encoding the critical information, the ECG signal can be compressed. In ref. [
7], P. Ziran et al. extracted frequency information with the lifting wavelet transform and discarded the insignificant components. Embedded Zerotree Wavelet (EZW) coding was used to select features and improve the compression ratio. In ref. [
8], Chunyu Tan presented an adaptive Fourier decomposition (AFD) applied to ECG compression, which sped up the decomposition and improved compression performance. In ref. [
9], JiaLi Ma et al. fused AFD with the symbol substitution (SS) technique. AFD guarantees high fidelity and SS improves the compression rate without information loss. In ref. [
10], Sibasankar Padhy et al. presented a compression method on multi-lead ECG records by using singular value decomposition in the multiresolution domain. In ref. [
11], M.L. Hilton introduced the wavelet transform to ECG compression; combined with EZW coding, the ECG signal can be compressed. However, transform-based methods have two disadvantages. Firstly, they reduce the signal size by directly discarding parameters, yet some of these parameters carry critical information, so the discarding degrades compression quality. Secondly, transform-based methods must be paired with an independent encoding algorithm, which increases computational complexity [
12]. Therefore, transform-domain-based methods are not suitable for application in portable systems.
Recently, deep learning compression methods have become popular for their high-quality compression, and they avoid the two problems above. According to Andrew Y. Ng, deep neural networks (DNNs) can recognize patterns and learn useful features from raw input data without extensive data preprocessing, feature engineering, or handcrafted rules, making them particularly suitable for interpreting ECG data [
13]. As an end-to-end method, deep learning based on an auto-encoder can directly compress the ECG signal without an additional encoding algorithm. The auto-encoder is a promising technique for obtaining a low-dimensional representation of the original signal and restoring the information [
14,
15,
16,
17], which is a classical end-to-end deep learning algorithm. In ref. [
18], Ozal Yildirim implemented a deep convolutional auto-encoder in the compression of ECG signals. A model of 27 stacked layers guarantees the quality of compression. In ref. [
19], Wang et al. presented a spindle-structured convolutional auto-encoder to improve information extraction and the compression ratio. All of these deep-learning ECG compression methods rely on reducing the number of hidden nodes to increase the compression ratio. However, reducing the number of hidden-layer nodes degrades the quality of reconstruction [
20]. In the articles above, achieving high-ratio compression inevitably sacrifices reconstruction quality. However, portable ECG detection systems must ensure good signal quality, so it is necessary to maintain reconstruction quality while achieving a high compression ratio.
In this paper, a novel deep learning compression method was presented, based on a binary convolutional auto-encoder (BCAE) equipped with residual error compensation (REC). In this method, the convolutional auto-encoder (CAE) was chosen as the base model [
21]. The CAE encoder encodes the input signal into a floating-point compressed code, and the CAE decoder then decodes this code to obtain the reconstructed signal. The novelty of BCAE is the binary output of its encoder section: by altering the activation function and gradient, the encoder can directly generate a binary code without extra coding. In this way, the floating-point nodes of CAE can be replaced by binary nodes. Without reducing the compression ratio, BCAE can greatly increase the number of hidden nodes and thereby improve the restoration capability of the network. Moreover, to further improve the compression quality, a new optimization model named residual error compensation (REC) was developed: a network that learns the residual error between the output of BCAE and the original signal. Compensated with this residual error, the reconstructed signal becomes more similar to the original signal. Thus, the novel BCAE + REC strategy is attractive in both reconstruction quality and compression ratio.
Compared with previous compression methods, the innovations of the method proposed in this paper are the following:
BCAE directly generates binary compressed code. With the compression ratio preserved, the number of hidden nodes was increased to improve reconstruction quality.
By using REC, the quality of the reconstructed signal from BCAE was improved, which guarantees the compression quality.
Five categories of signals (normal beat, left bundle branch block beat, right bundle branch block beat, atrial premature beat, and premature ventricular contraction) from the MIT-BIH database were classified using the original and reconstructed signals, respectively, further verifying the effectiveness of the compression.
A portable device based on Raspberry Pi was designed to implement the proposed compression algorithm, demonstrating that BCAE is practical and supports the application of portable ECG monitoring systems.
In summary, the ECG compression method proposed in this paper has a high compression ratio and little signal distortion, and so can be used for the transmission and storage of ECG data. The experiments verified that the proposed scheme can meet the requirements of portable ECG monitoring systems for data transmission while ensuring the effect of diagnosis and treatment.
The rest of this paper is organized as follows.
Section 2 introduces the datasets used in the model and the principles of the proposed BCAE and REC, followed by the model configuration.
Section 3 introduces the evaluation criteria and shows detailed results.
Section 4 presents the discussion and comparison. Finally,
Section 5 concludes this paper.
2. Materials and Methods
This section first introduces the MIT-BIH database and ECG signal preprocessing, then explains the principle of the proposed method, and finally illustrates the model configuration.
As shown in
Figure 1, the method proposed in this paper contains three parts: ECG raw signal preprocessing, the binary convolutional auto-encoder (BCAE), and the residual error compensation network (RECN).
The structures of BCAE and RECN are shown in
Figure 2 and
Figure 3. The first advantage is the BCAE, which generates the binary compressed output by encoding the hidden features. As shown in
Figure 2, the encoder is composed of six 1-D convolutional layers that extract information into feature vectors in the hidden layers. Through the binary convolutional layer, these features are encoded into binary codes; in this way, a conventional floating-point node can be replaced by a series of binary nodes. As the number of nodes increases, signal reconstruction improves while a high compression ratio is preserved. The binary codes can be restored to the original signal by the stacked deconvolutional layers of the decoder. The second advantage is the REC network, which compensates for the reconstruction loss. As depicted in
Figure 1 and
Figure 3, RECN is designed to reduce the residual between the input ECG signal of the BCAE encoder and the decoded signal of the BCAE decoder. Combined with the output of RECN, the signal reconstructed by BCAE can be of higher quality. Details of each part are introduced below.
2.1. Datasets
The MIT-BIH database is provided by the Massachusetts Institute of Technology. It is one of the three standard ECG databases in the world and was used to train the proposed network [
22]. It contains 48 records from 47 patients, each annotated by several cardiologists. All signals in the database are sampled at 360 Hz with a resolution of 11 bits.
Raw ECG signals carry redundant information such as noise and low-energy components, and eliminating these redundancies helps retain the important information for compression. In the preprocessing stage, noise was removed with a 0.5–150 Hz bandpass filter. Furthermore, all signals were normalized with the max–min normalization technique [
23]. Deep learning compression methods usually operate on beats; like many previous methods, this work uses single beats as basic samples, which requires heartbeat segmentation of the original records. In the segmentation stage, R-peak detection was first performed on each ECG recording using the Pan–Tompkins algorithm [
24]. Then, 127 sample points on the left side and 192 sample points on the right side of the R peak were taken to obtain a heartbeat containing 320 (127 + 192 + 1 = 320) points [
25]. At this point, the samples required for the training phase had been obtained, each containing 320 values at the original 11-bit resolution.
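For concreteness, the preprocessing chain above can be sketched as follows. This is a minimal illustration assuming SciPy and NumPy; the filter order and helper names are assumptions rather than the paper's exact implementation, and R-peak detection (the Pan–Tompkins algorithm) is omitted.

```python
# A minimal sketch of the preprocessing chain, assuming SciPy/NumPy;
# the filter order and helper names are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 360  # MIT-BIH sampling rate (Hz)

def bandpass(sig, low=0.5, high=150.0, fs=FS, order=4):
    # Zero-phase 0.5-150 Hz Butterworth bandpass filter.
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, sig)

def minmax_normalize(sig):
    # Max-min normalization to [0, 1].
    return (sig - sig.min()) / (sig.max() - sig.min())

def segment_beats(sig, r_peaks, left=127, right=192):
    # 127 points left of the R peak + the peak + 192 points right = 320.
    beats = [sig[r - left:r + right + 1]
             for r in r_peaks
             if r - left >= 0 and r + right < len(sig)]
    return np.stack(beats)
```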
2.2. Binary Convolutional Auto-Encoder (BCAE)
The proposed BCAE was developed from the traditional convolutional auto-encoder (CAE) [
26]. CAE is an established technique that has been used successfully in ECG compression, and considering its acceptable compression ability, it was used here as the base model. However, a conventional CAE achieves a high compression ratio by reducing the number of floating-point hidden nodes. Like CAE, BCAE can be separated into encoder and decoder parts. In the encoder, convolution layers and pooling layers extract feature vectors carrying critical information from the input signals. The improvement in BCAE is the binary encoding layer of the encoder, which replaces the conventional floating-point output with binary codes. For example, as shown in
Figure 4a, the traditional CAE compresses the original signal into floating-point numbers; two 11-bit floating-point numbers occupy 22 bits, while in
Figure 4b, BCAE compresses the ECG signal into binary numbers, and two binary numbers occupy only 2 bits. Therefore, compared with the traditional floating-point method, even if the number of hidden nodes in BCAE increases 11-fold, the compression ratio does not decrease. A sufficient number of hidden nodes guarantees the reconstruction quality of the network and improves the compression quality [
20]. Therefore, BCAE has great potential to achieve high compression performance. After training, the transposed convolutional layers and up-sampling layers of the decoder rebuild the original signals from the binary code.
The detailed operations of the binary encoding layer (BEL) and function layers used in the proposed model are illustrated in the following sections.
2.2.1. Binary Encoding Layer
The improvement is to modify the activation function and gradient of the convolutional layer so that it generates a binary output. The binarization technique has been employed successfully in convolutional neural networks (CNNs) [
27]. In our work, the most important modification is to use the step function as the activation function $f(\cdot)$:

$$ f(x) = \begin{cases} 1, & x \geq 0, \\ 0, & x < 0 \end{cases} \tag{1} $$
Because the output is binary (0 or 1), this layer directly achieves binary encoding. Thus, a 20-bit floating-point node in a conventional CAE can be replaced by 20 binary nodes in BCAE; under the same compression ratio, BCAE therefore has more nodes, and with more nodes the quality of the reconstructed signal improves, guaranteeing the compression quality. This work used the backpropagation algorithm to train the network, which requires the activation function to be differentiable at every point. However, the step function above is not differentiable at $x = 0$. Therefore, the gradient in a small range around zero was set to a constant 1. As shown in
Figure 5, the gradient $g(\cdot)$ calculated by Formula (2) is 1 from −0.5 to 0.5 and 0 elsewhere:

$$ g(x) = \begin{cases} 1, & -0.5 \leq x \leq 0.5, \\ 0, & \text{otherwise} \end{cases} \tag{2} $$
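As an illustration, the step activation and its modified gradient can be implemented as a custom-gradient operation. The following is a minimal sketch assuming TensorFlow 2.x, not the authors' released code:

```python
import tensorflow as tf

@tf.custom_gradient
def binary_step(x):
    # Forward pass: hard threshold to a 0/1 code, as in Formula (1).
    y = tf.cast(x >= 0.0, x.dtype)
    def grad(dy):
        # Backward pass: pass gradients through only where -0.5 <= x <= 0.5
        # (Formula (2)); block them elsewhere.
        return dy * tf.cast(tf.abs(x) <= 0.5, x.dtype)
    return y, grad
```

Applied after the last convolution of the encoder, this op emits the binary code in the forward pass while still letting error gradients reach the earlier layers during training.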
2.2.2. 1-D Convolutional Layers
These layers use convolution kernels to perform convolution operations on the input data and pass the result through an activation function. The kernels are a series of filters with trainable weights; through training, the kernels extract the significant information from the input and discard the redundancy. The convolution operation can be described by Formula (3):

$$ x_j^l = f\Big( \sum_{i \in M_j} x_i^{l-1} * w_{ij}^l + b_j^l \Big) \tag{3} $$

in which $x_j^l$ denotes the $j$th output of the $l$th convolution layer; $w_{ij}^l$ is the weight between $x_i^{l-1}$ and $x_j^l$; $M_j$ represents the connections between the inputs $x_i^{l-1}$ and $x_j^l$; $*$ is the 1-D convolution operation; and $b_j^l$ is the bias of the $j$th output of the $l$th convolution layer. Additionally, $f(\cdot)$ represents the activation function. In this paper, the hyperbolic tangent function (tanh) was used as the activation function in all convolutional layers except the binary encoding layer.
2.2.3. Transposed Convolution Layer
These layers achieve signal restoration by deconvolution. As an inverse process of convolution, deconvolution convolves the transposed 1-D kernels with the input signals, as Formula (4) defines:

$$ x_j^l = f\Big( \sum_{i \in M_j} x_i^{l-1} * \big(w_{ij}^l\big)^{T} + b_j^l \Big) \tag{4} $$

where $x_j^l$ is the $j$th output of the $l$th transposed convolution layer and $(\cdot)^{T}$ represents the transpose operation. The other parameters have the same meaning as in the 1-D convolutional layer. With stacked transposed convolution layers, the compressed code can be restored to the original signal.
2.2.4. Max Pooling Layer
The max pooling layer is adopted to reduce the dimension of the features by down-sampling. As Formula (5) defines, max pooling retains the maximum value within the pooling window and discards the other values as the window moves:

$$ p_j^l(n) = \max_{(n-1)S < k \,\leq\, (n-1)S + R} x_j^l(k) \tag{5} $$

in which $p_j^l(n)$ is the $n$th value of the $j$th output of the $l$th max-pooling layer; $R$ represents the size of the window; $S$ is the stride of the pooling layer, and $S$ was set equal to $R$ for non-overlapped pooling; and $n$ is the index of the sampling window. This layer centralizes the information for better compression; moreover, it alleviates the computational burden and helps avoid overfitting.
2.2.5. Up-Sampling Layer
Up-sampling is commonly used for feature extension. Here, zero-padding is used instead of interpolation to reduce computational complexity: in each up-sampling window, the first value is restored from the corresponding input and the rest are padded with zeros. This operation is formulated as Formula (6):

$$ u_j^l\big((n-1)S + k\big) = \begin{cases} x_j^l(n), & k = 1, \\ 0, & 1 < k \leq R, \end{cases} \tag{6} $$

where $R$ represents the size of the up-sampling window and $S$ is the stride of sampling; the other parameters in Formula (6) are the same as those in the encoder. In this way, the compressed data can be restored to the original size.
2.2.6. Linear Layer
The ECG signal is reconstructed from the features by a linear transform in this layer. Because of the hyperbolic tangent activation, the output of the transposed convolutional layers is limited to between −1 and 1. To address this, a linear layer is used to rebuild the original ECG signal. The linear layer operates as:

$$ x_j^l = \sum_i w_{ij}^l \, x_i^{l-1} + b_j^l $$

where $x_j^l$ is the $j$th output of the $l$th linear layer; $w_{ij}^l$ and $b_j^l$ have the same definitions as in Formula (4).
As depicted in
Figure 2, these function layers are stacked to form the BCAE network. The encoder of BCAE consists of stacked convolutional layers, max-pooling layers, and a binary encoding layer. The first few convolutional and max-pooling layers extract and condense the main features; the last layer of the encoder is the binary encoding layer, which directly compresses the signal into binary code. The decoder of BCAE consists of eight transposed convolution layers, up-sampling layers, and a linear layer. The binary compressed code can be restored to the original signal by the transposed convolution and up-sampling layers, and the linear layer further transforms the feature signal into the reconstructed signal with accurate amplitude.
To strengthen the performance of BCAE, Batch Normalization (BN) and dropout are introduced after the 1-D convolutions and transposed convolutions to reduce overfitting [
28]. BN eliminates gradient vanishing and speeds up convergence in the training phase [
29]. Furthermore, dropout, an effective technique for avoiding overfitting, is employed to improve the compression performance [
30]. In summary, all these strategies make BCAE a promising model in ECG signal compression.
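Putting the pieces together, the encoder stack described above might be assembled as follows. This sketch reuses the binary_step op from Section 2.2.1; the filter counts, kernel size, and dropout rate are illustrative assumptions rather than the configuration given in Table 1.

```python
import tensorflow as tf

def build_encoder(input_len=320):
    x = inputs = tf.keras.Input(shape=(input_len, 1))
    for filters in (16, 32, 64):                      # stacked conv blocks
        x = tf.keras.layers.Conv1D(filters, 5, padding="same",
                                   activation="tanh")(x)
        x = tf.keras.layers.BatchNormalization()(x)   # speeds up convergence
        x = tf.keras.layers.Dropout(0.2)(x)           # reduces overfitting
        x = tf.keras.layers.MaxPooling1D(2)(x)        # non-overlapped pooling
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(20)(x)                  # 20 output nodes
    code = tf.keras.layers.Lambda(binary_step)(x)     # 20-bit binary code
    return tf.keras.Model(inputs, code)
```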
2.3. Residual Error Compensation Network (RECN)
RECN can be treated as a complement to BCAE. It is designed to obtain the residual error between the signal reconstructed by BCAE and the original signal. The main structure of RECN is depicted in
Figure 3. The binary compressed code was used as the input of RECN, and a Multi-Layer Perceptron (MLP) was set to transform the binary compressed data into the desired residual error. After BCAE training, the reconstructed signal and the corresponding binary code could be obtained. The difference between the original signal and the reconstructed signal was then computed and served as the label of the RECN training process; RECN was trained on this label so that the compensated reconstruction is closer to the original signal. Because the residual error of the reconstructed signal is actually small, the weights would update very slowly; therefore, the residual error was magnified ten times to form the output target of RECN, and correspondingly the output of RECN is divided by ten before being added to the output of BCAE. In summary, as an optimization method, RECN compensates the output of BCAE with the predicted residual error, further improving the compression quality.
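In code, the scaling described above amounts to two steps; `bcae_decoder` and `recn` stand for the trained models and are naming assumptions only:

```python
def recn_training_label(original_beat, bcae_reconstruction):
    # The residual is small, so it is magnified ten times so that
    # RECN weight updates are not vanishingly slow.
    return 10.0 * (original_beat - bcae_reconstruction)

def compensated_reconstruction(binary_code, bcae_decoder, recn):
    # Undo the x10 magnification before adding the compensation term.
    return bcae_decoder(binary_code) + recn(binary_code) / 10.0
```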
2.4. Compression Package
As shown in
Figure 6a, the compressed package of a heartbeat consists of two parts: an interval code and a binary compression code. The interval code stores the interval between the current beat and the previous beat, represented by a 10-bit binary code [
19]. The first bit of the interval code is the flag bit. When it is set to 0, the beat is delayed by the corresponding time before being connected to the previous beat, and interpolation is used to fill the delay interval. When the flag bit is set to 1, the beat is connected to the previous beat earlier by the corresponding time, and the overlapping part is represented by the average of the overlapping signals. The remaining 9 bits represent the delay interval or the duration of the overlap, covering time intervals of up to 512 ($2^9 = 512$) sampling points (512 samples/360 Hz ≈ 1.4 s). Since a heartbeat usually lasts no more than 1.4 s, the reconstructed heartbeat segments can be reassembled into the original ECG signal using this 10-bit interval code. The binary compressed code is generated by BCAE; according to the structural configuration of BCAE, the encoder has 20 output nodes, so the size of the compressed code is fixed, and each compressed heartbeat is encoded into a 20-bit binary code. In this way, a heartbeat of 320 samples is compressed into a 30-bit binary compression package.
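A sketch of how such a 30-bit package could be assembled, with the bit layout taken from the description above (flag, 9-bit interval, 20-bit code); the function name is illustrative:

```python
def pack_beat(flag, interval, code_bits):
    """flag: 0 = delay, 1 = overlap; interval: 0-511 sampling points;
    code_bits: the 20 binary values emitted by the BCAE encoder."""
    assert flag in (0, 1) and 0 <= interval < 512 and len(code_bits) == 20
    package = flag                           # 1-bit flag field
    package = (package << 9) | interval      # 9-bit interval field
    for bit in code_bits:                    # 20-bit compressed code
        package = (package << 1) | int(bit)
    return package                           # 30 bits in total
```

The compression ratio then follows directly: a beat of 320 samples at 11 bits per sample occupies 3520 bits, so CR = 3520/30 ≈ 117.33, the value reported in Section 4.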
As shown in
Figure 6b, in the reconstruction stage, the BCAE decoder and RECN decode the binary compression code to obtain the reconstructed heartbeat (a floating-point waveform). Since the interval information of adjacent heartbeats is stored in the interval code, continuous ECG records can be reassembled from it.
2.5. Model Configuration
The proposed architecture and parameter configuration are summarized in
Table 1 and
Table 2. As for BCAE, the encoder consists of 9 convolutional layers, 4 max-pooling layers, and 1 binary encoding layer. The decoder consists of 9 transposed convolutional layers, 4 up-sampling layers, and 1 linear layer. As for RECN, it consists of 5 hidden layers and 1 linear layer. Here, all convolutions and transposed convolutions used "same" padding, which keeps the output the same size as the input.
TensorFlow [
31] (Python version) was used to build and train the proposed network. Since some noise occupies the same frequency band as the ECG signal and cannot be removed [
4], the Pseudo–Huber loss function was determined as the loss function, and defined as [
32]:
$$ L_\delta(y, \hat{y}) = \delta^{2}\left(\sqrt{1 + \left(\frac{y - \hat{y}}{\delta}\right)^{2}} - 1\right) $$

where $L_\delta$ represents the loss; $y$ represents the original signal; $\hat{y}$ denotes the reconstructed signal; and $\delta$ is the parameter that makes the gradient less steep at extreme values, set to 0.9 after tuning. As a smooth approximation of the Huber loss, it is differentiable to every order [
33]. To speed up the model convergence, the Adagrad optimizer [
34] was used for training. The initial learning rate and batch size were set to 0.1 and 256, respectively. The number of training iterations was set to 400, by which point the model converges.
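The stated training configuration translates to roughly the following setup; `bcae` stands for the assembled model, and this is a sketch rather than the authors' script:

```python
import tensorflow as tf

def pseudo_huber(delta=0.9):
    # delta^2 * (sqrt(1 + ((y - y_hat)/delta)^2) - 1), averaged per batch.
    def loss(y_true, y_pred):
        err = (y_true - y_pred) / delta
        return tf.reduce_mean(delta ** 2 * (tf.sqrt(1.0 + tf.square(err)) - 1.0))
    return loss

# `bcae` and `train_beats` are assumed to be defined as in the sections above.
bcae.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1),
             loss=pseudo_huber(0.9))
bcae.fit(train_beats, train_beats, batch_size=256, epochs=400)
```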
4. Discussion
More advantages of BCAE and RECN are illustrated in this section. Firstly, the improvements from BCAE and RECN are summarized in
Figure 10. Four models (CAE, BCAE, CAE + RECN, and BCAE + RECN) were tested on a common test set of 2400 beats. The number of hidden-layer nodes was chosen so that the four models had equal compression ratios, and the reconstruction quality results are shown as a histogram. Under the same compression ratio, BCAE performed much better than CAE in compression quality: the PRD decreased to 10.65% and the SNR improved to 12.71 dB. This result demonstrates the effectiveness of BCAE for reconstruction quality; with the innovative binary compressed code, quality improvement is achieved without sacrificing the compression ratio.
Secondly, optimization with RECN further improved the compression quality, reducing the PRD to 7.76% and raising the QS to 18.75. Compensated with the residual error, the reconstructed results are more accurate. Hence, RECN, the novel optimization method proposed in this paper, boosts the compression quality. In all, the proposed BCAE compression model with RECN achieves attractive ECG compression.
In
Table 5, the results of the method proposed in this paper are compared with several existing studies on ECG compression. The comparison focuses on the average results over the records. According to
Table 5, though ref. [
18] achieves lower PRD values, this is largely due to the high baseline offset and different amplitude scales. To evaluate the compression quality objectively, the PRDN, which removes the influence of the offset, was also computed. The proposed method offers good compression quality with a PRD of 7.76%, the largest compression ratio of 117.33, and the highest QS of 18.75. QS is the metric that best represents the overall compression effect, and the high QS demonstrates the advantages of the proposed method. Compared with refs. [
7,
36], and [
37], at approximately the same compression quality (PRD), the proposed BCAE strategy greatly improves the CR and QS. In summary, the proposed method maintains high-quality, high-ratio compression, achieves the best overall performance, and is attractive for ECG compression and portable ECG monitoring systems.
To further verify the quality of the reconstructed signal, as shown in
Figure 11, five types of beats: Normal beat (N), Left bundle branch block beat (L), Right bundle branch block beat (R), Atrial premature beat (A), and Premature ventricular contraction (V) were classified using the original signals and reconstructed signals, respectively. In order to be consistent with the number of heartbeats compressed and reconstructed in
Section 3.2, a total of 48,000 heartbeats of the above five types in the MIT-BIH database were obtained according to the heartbeat extraction method and preprocessing method mentioned in
Section 2.1. These beats were compressed and reconstructed using the method proposed in this paper. To reproduce the visual inspection performed by a cardiologist, this experiment analyzed the signals in the time domain. Here, a convolutional neural network (CNN) [
39] was used directly to classify the heartbeat waveforms of the reconstructed and original signals, respectively. For each class, 80% of the beats were used as the training set and 20% as the test set. The classification results are shown in
Table 6.
This experiment compares the classification results on the original and reconstructed signals rather than pursuing the best possible classification performance. To reduce the influence of other features on the classification, only waveform features were analyzed, i.e., only the time-domain signal was used for classification. The results show that the differences in accuracy and average F1 score between the five-class classifications of the original and reconstructed signals are small (less than 1%), which is acceptable and demonstrates the effectiveness of the proposed compression method.
In addition, to verify the practicability of the proposed method, a portable ECG signal compression device was built using a Raspberry Pi 3 Model B (as shown in
Figure 12). The neural network model (the BCAE encoder) trained in
Section 3.2 was transferred to the Raspberry Pi and used to compress the 2400 heartbeats from the training set. The compressed code is transmitted to a back-end processing system (such as a computer) over Wi-Fi, and the signal is reconstructed by the BCAE decoder and RECN; the reconstructed signal can then be used for further disease detection. This experiment mainly measures the time the Raspberry Pi needs to compress a single heartbeat of 320 points. The results are shown in
Table 7.
According to the above experimental results, for a single heartbeat (with a duration of 320 samples/360 Hz ≈ 0.89 s), the compression processing time on the Raspberry Pi is 0.0101 s, which is far below the beat duration. This proves that the portable ECG signal compression device designed in this paper can process ECG signals in real time, so the proposed compression method has practical significance and can be used in wearable ECG devices.
5. Conclusions
In this paper, a novel ECG compression method combining BCAE and RECN was proposed. The main objective of this study was to achieve efficient ECG signal compression through deep learning while ensuring the quality of the reconstructed signal and a high compression ratio. Based on the CAE model, the binary encoding strategy was introduced into BCAE, which guarantees the compression ratio and improves the quality of reconstruction. BCAE is an end-to-end model that needs no extra encoding algorithm. Additionally, an efficient optimization technique, residual error compensation, was applied to improve the compression quality. The efficiency of the proposed method was validated experimentally on the MIT-BIH database: it achieves state-of-the-art performance with a compression ratio of 117.33 and a PRD of 7.76%. In addition, an experiment was designed to compare the classification results of the original and reconstructed heartbeats for the Normal beat, Left bundle branch block beat, Right bundle branch block beat, Atrial premature beat, and Premature ventricular contraction classes. The differences in accuracy and average F1 score were small, demonstrating that the ECG signals reconstructed from these codes are of high quality. Moreover, a portable compression device based on the proposed compression algorithm was built with a Raspberry Pi, which proves the practicality of the proposed method. In summary, this method has attractive prospects for large-scale data storage and portable electrocardiogram detection systems, and it provides an effective compression method for remote data transmission, especially in portable ECG detection systems.