The tandem feature approach has been tested on the ECG arrhythmia identification system presented in
Figure 2, which is based on four different stages: (1) signal preprocessing, where the ECG signals are initially denoised, (2) raw feature extraction, in which a
raw set of discriminant features is extracted from the denoised ECG signals, (3) augmented feature extraction, in which the raw feature vector is enhanced with the MLP-based features to create the so-called tandem feature vectors, and (4) pattern classification, which itself comprises two stages: training, which trains a Gaussian mixture model for each AAMI heartbeat class from the training tandem feature vectors, and testing, which classifies each heartbeat into one of the predefined AAMI heartbeat classes from the testing tandem feature vectors (see
Table 1). These stages are explained in more detail next.
3.2. Raw Feature Extraction
From the denoised ECG signals, the raw feature extraction aims to obtain the most discriminant information from the various types of heartbeats present in the database. To do so, the Hermite functions were used in this work for raw signal representation. The orthogonal Hermite functions have a shape reminiscent of QRS morphology and include a width parameter that enables an efficient modelling of QRS complexes of different amplitudes. This makes it possible to obtain accurate heartbeat representations with few coefficients. The heartbeat is represented by a feature vector with the coefficients that permit its reconstruction from the combination of the Hermite functions. This representation has been shown to be compact and robust in the presence of noise [
28].
From the ECG signals, a 200 ms window was extracted for each heartbeat by considering the samples before and after the actual heartbeat position labelled in the database. Hermite functions tend to zero both in −∞ and +∞. To make the Hermite functions converge at the window edges, a 100 ms zero segment was added at both sides of the QRS complex so that the resulting window has a length of 400 ms. This window can be represented as Equation (1):

x(l) = ∑_{n=0}^{N−1} c_n(σ) φ_n(l, σ) + e(l)    (1)

where l refers to the window sample, N is the number of Hermite functions, c_n(σ) represents the coefficients of the linear combination, φ_n(l, σ) is the n-th discrete Hermite function, obtained by sampling the corresponding continuous Hermite function (i.e., φ_n(t, σ)), e(l) is the approximation error between the actual window x(l) and the Hermite representation, σ is a dilation parameter that relates the width of the Hermite function with the width of the QRS complex, and l varies according to Equation (2):

l = −⌊(W · f_s)/2⌋, …, ⌊(W · f_s)/2⌋    (2)

where W is the window size, f_s is the sampling frequency and ⌊·⌋ represents the floor function.
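As a concrete illustration of this windowing step, the following Python sketch (a minimal example, not the authors' code; the 360 Hz sampling frequency of the MIT-BIH Arrhythmia Database and the function name are assumptions) extracts the 200 ms segment around a labelled beat and zero-pads it to 400 ms:

```python
import numpy as np

def extract_padded_window(ecg_signal, beat_idx, fs=360):
    """Extract a 200 ms window centred on a labelled heartbeat and add a
    100 ms zero segment on each side, yielding a 400 ms analysis window."""
    half = int(round(0.100 * fs))    # 100 ms before and after the beat position
    pad = int(round(0.100 * fs))     # 100 ms of zeros per side
    segment = ecg_signal[beat_idx - half:beat_idx + half]   # 200 ms segment
    # Zero-padding makes the Hermite functions converge at the window edges
    return np.concatenate([np.zeros(pad), segment, np.zeros(pad)])
```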
The Hermite functions φ_n(l, σ) are defined as Equation (3):

φ_n(l, σ) = φ_n(l · T_s, σ)    (3)

where T_s is the sampling period (i.e., the inverse of the sampling frequency f_s) and the continuous Hermite function φ_n(t, σ) is defined as Equation (4):

φ_n(t, σ) = (1/√(σ · 2^n · n! · √π)) · e^(−t²/(2σ²)) · H_n(t/σ)    (4)

The Hermite polynomial H_n(x) in Equation (4) is defined recursively as Equation (5):

H_n(x) = 2x · H_{n−1}(x) − 2(n − 1) · H_{n−2}(x)    (5)

where H_0(x) = 1 and H_1(x) = 2x.
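A minimal Python sketch of Equations (3)–(5) might look as follows (the function names and the numpy-based implementation are illustrative assumptions, not the authors' code):

```python
import numpy as np
from math import factorial, pi, sqrt

def hermite_poly(n, x):
    """Physicists' Hermite polynomial H_n(x) via the recursion in Equation (5)."""
    h_prev, h = np.ones_like(x), 2.0 * x          # H_0(x) = 1, H_1(x) = 2x
    if n == 0:
        return h_prev
    for k in range(2, n + 1):
        h_prev, h = h, 2.0 * x * h - 2.0 * (k - 1) * h_prev
    return h

def hermite_function(n, t, sigma):
    """Continuous Hermite function of Equation (4), with dilation parameter sigma."""
    norm = 1.0 / sqrt(sigma * (2.0 ** n) * factorial(n) * sqrt(pi))
    return norm * np.exp(-t ** 2 / (2.0 * sigma ** 2)) * hermite_poly(n, t / sigma)

def discrete_hermite_function(n, l, sigma, fs):
    """Discrete Hermite function of Equation (3): sample the continuous one at t = l * Ts."""
    ts = 1.0 / fs                                  # sampling period
    return hermite_function(n, l * ts, sigma)
```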
This Hermite representation therefore describes the heartbeat contained in each signal window through the N coefficients of the linear combination (referred to as c_n(σ)), the Hermite functions φ_n(l, σ), and the dilation parameter σ.
For a given value of σ, the Hermite functions form an orthogonal basis, as shown in Equation (6):

∑_l φ_n(l, σ) · φ_m(l, σ) = δ_{nm}    (6)

where δ_{nm} denotes the Kronecker delta. It must be noted that Equation (6) holds if the discrete Hermite function φ_n(l, σ) is close enough to zero both at the edges of and outside the analysis window. At the edges of each analysis window, the value of φ_n(l, σ) is constrained to be at most a small fraction of its maximum value within the window, as defined in Equation (7):
where l_min and l_max refer to the first and last window samples, respectively. Moreover, we also impose that the value of φ_n(l, σ) is smaller outside the analysis window than at the edges of the analysis window, as shown in Equation (8):
For a certain value of σ, the linear combination coefficients c_n(σ) are computed by minimizing the summed squared error given by Equation (9):

E(σ) = ∑_l [x(l) − ∑_{n=0}^{N−1} c_n(σ) · φ_n(l, σ)]²    (9)

in which the squared error is accumulated over all the samples of the analysis window. For a predefined window size and for a fixed number of Hermite functions, it is possible to calculate theoretical limits for the value of σ. Through an incremental iterative process, different values of σ are tested, starting at 0 and going up to the theoretical maximum, until the one that minimizes the error is found. The average values of σ obtained range from 14 ms to 21 ms.
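A sketch of this fitting procedure, reusing the discrete_hermite_function helper from the earlier snippet, could look as follows (the grid of σ values and the least-squares solver are illustrative choices, not necessarily those of the original implementation):

```python
import numpy as np

def fit_hermite(window, n_functions, sigma_grid, fs):
    """Search a grid of dilation values and return the coefficients c_n(sigma)
    and the sigma that minimize the summed squared error of Equation (9)."""
    l = np.arange(len(window)) - len(window) // 2          # centred sample indices
    best = None
    for sigma in sigma_grid:
        # Discrete Hermite basis for this sigma (rows: n = 0 .. N-1)
        basis = np.stack([discrete_hermite_function(n, l, sigma, fs)
                          for n in range(n_functions)])
        # Least-squares fit of the window onto the basis
        coeffs, _, _, _ = np.linalg.lstsq(basis.T, window, rcond=None)
        error = np.sum((window - basis.T @ coeffs) ** 2)
        if best is None or error < best[0]:
            best = (error, sigma, coeffs)
    error, sigma, coeffs = best
    return coeffs, sigma

# Example call (hypothetical values): N = 6 functions, sigma between 1 ms and 50 ms
# coeffs, sigma = fit_hermite(window, 6, np.linspace(0.001, 0.05, 50), fs=360)
```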
Then, a raw feature vector is stored for each heartbeat, which consists of the N coefficients of the Hermite representation of the corresponding heartbeat plus the value of σ. This process is carried out for each ECG lead available; since our system employs two different ECG leads, 2(N + 1)-dimensional raw feature vectors comprise the output of this module and are given to the augmented feature extraction module.
3.3. Augmented Feature Extraction
This module takes the raw feature vectors as input and produces tandem feature vectors as output. An MLP is employed to add the feature-level augmented information to each heartbeat in the ECG arrhythmia identification system. The MLP consists of three layers, as shown in
Figure 3: an input layer with one unit per raw feature vector component (i.e., 2(N + 1) units), a hidden layer, whose number of units was selected based on preliminary experiments, and an output layer, which employs the softmax activation function, with a number of units equal to the number of heartbeat classes (five in our case).
The MLP models are trained by the MLP training module in
Figure 2. The standard back-propagation algorithm [
49] is employed to learn the MLP weights (i.e., connections between the units of the input and hidden layers and connections between the units of the hidden and output layers, as shown in
Figure 3) so that the classification error on the training data is minimized. The learned weights are then used to obtain the posterior probability vectors.
The augmented feature extraction consists of two different stages, which are applied to each of the 2(N + 1)-dimensional raw feature vectors, as presented next.
3.3.1. Posterior Probability Vector Computation
From the raw feature vectors
and employing the weights computed during MLP training, the MLP calculates a posterior probability for each class to be recognized. This process is similar to the use of the MLP for classification in which each raw feature vector is assigned the class with the highest posterior probability. However, instead of making a class decision for each raw feature vector, the MLP generates one posterior probability per class, as shown in
Figure 3. These posterior probability values are then used as new features, hence building a set of K-dimensional posterior probability vectors, K being the number of different AAMI heartbeat classes.
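Purely as an illustration of this stage (the paper relies on the ICSI QuickNet toolkit, introduced below), an equivalent posterior probability computation with scikit-learn's MLPClassifier might look as follows; the hidden layer size, the feature dimension of 14 (i.e., 2(N + 1) with N = 6) and the synthetic data are hypothetical:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
train_raw = rng.normal(size=(200, 14))        # hypothetical 2(N + 1)-dim raw feature vectors
train_labels = rng.integers(0, 5, size=200)   # five AAMI heartbeat classes
test_raw = rng.normal(size=(50, 14))

# Three-layer MLP: input, one hidden layer, softmax output (back-propagation training)
mlp = MLPClassifier(hidden_layer_sizes=(100,), activation='logistic',
                    solver='sgd', max_iter=500)
mlp.fit(train_raw, train_labels)

# One posterior probability per class for every raw feature vector
posteriors = mlp.predict_proba(test_raw)      # shape: (n_beats, K) with K = 5
```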
3.3.2. Tandem Feature Vector Construction
This stage concatenates the original 2(N + 1)-dimensional raw feature vectors (those generated by the raw feature extraction module) and the K-dimensional posterior probability vectors computed by the MLP. Therefore, (2(N + 1) + K)-dimensional tandem feature vectors are built, which are then used in the pattern classification system.
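Continuing the previous sketch, the concatenation itself reduces to a single operation:

```python
import numpy as np

# Tandem feature vectors: raw features concatenated with the MLP posteriors
tandem_features = np.hstack([test_raw, posteriors])   # shape: (n_beats, 2(N + 1) + K)
```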
The ICSI QuickNet toolkit [
50], which was originally developed for the tandem approach in speech recognition tasks, provides different tools for developing signal processing systems based on MLP strategies. Here, we have used the ICSI QuickNet toolkit with the default parameter values for MLP training, posterior probability vector computation and tandem feature vector construction.
3.4. Pattern Classification
Gaussian mixture modelling is widely used in pattern classification tasks (e.g., speech recognition [
51], image recognition [
52], video recognition [
53], etc.). For ECG arrhythmia identification, GMMs are a suitable tool because: (1) GMMs can be trained from a limited amount of data [
54], as is the case for some heartbeat types present in the MIT-BIH Arrhythmia Database; (2) GMMs provide a simple strategy for classification, which makes them suitable for embedding the ECG arrhythmia identification system in a wearable device that aims to continuously monitor heart activity; and (3) GMMs can represent a large class of sample distributions (e.g., those corresponding to the training and testing data).
Therefore, for the GMM λ_k, k being one of the five heartbeat classes, the probability that a certain feature vector x belongs to the class represented by that model can be obtained. We will denote this probability as p(x|λ_k).
3.4.1. Training
From a subset of the heartbeats for a certain class
k, which comprises the training subset, the training stage estimates the parameters (i.e., mean and covariance values) of each GMM λ_k from the tandem feature vectors of that subset. To do so, the Expectation-Maximization algorithm [55], which makes use of a maximum likelihood criterion, is employed. This training stage is carried out only once; the classification stage then employs the set of trained GMMs. For the sake of simplicity, a single Gaussian component has been used for each GMM.
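A minimal sketch of this training stage (scikit-learn's GaussianMixture is used here as a stand-in for whatever implementation the authors employed; the function name is hypothetical):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_models(tandem_features, labels, n_classes=5):
    """Fit one single-component GMM per AAMI heartbeat class via EM
    on the tandem feature vectors of the training subset."""
    models = {}
    for k in range(n_classes):
        class_vectors = tandem_features[labels == k]
        # With a single component, EM reduces to the sample mean and covariance
        models[k] = GaussianMixture(n_components=1,
                                    covariance_type='full').fit(class_vectors)
    return models
```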
3.4.2. Classification
Once the models have been trained, classification is conducted on a fully independent data subset, the so-called testing subset. The classification stage finds the class represented by the model λ_k with the maximum posterior probability. Hence, for a given input tandem feature vector x, Bayes' rule is applied as Equation (10):

k̂ = argmax_k p(λ_k|x) = argmax_k [p(x|λ_k) · p(λ_k) / p(x)] = argmax_k p(x|λ_k)    (10)

where we have considered a uniform prior probability for each class.
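A minimal sketch of this decision rule, continuing the training example above (under uniform priors, maximizing the class likelihood is equivalent to maximizing the posterior):

```python
import numpy as np

def classify_beat(x, models):
    """Return the class whose GMM assigns the highest (log-)likelihood to x,
    which equals the maximum posterior of Equation (10) under uniform priors."""
    log_likelihoods = {k: gmm.score_samples(x.reshape(1, -1))[0]
                       for k, gmm in models.items()}
    return max(log_likelihoods, key=log_likelihoods.get)
```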