3.1. Signal Selection
Signal selection is a necessary stage when different types of physiological signals can be used for heartbeat detection. On the one hand, the complementary information that some of them convey could improve performance through the fusion of the information each provides. On the other hand, selecting multiple signals that convey the same or similar information may not improve heartbeat detection performance and could even reduce it because of the limited amount of data (theoretically, with an infinite amount of training data, an algorithm should reject the bad data, but in a real scenario, only a finite amount of data is available).
We classify the many physiological signals that can be recorded for heartbeat detection into a direct signal group and an indirect signal group. The direct signal group contains the signals that are directly related to cardiac activity (e.g., ECG, ABP, BP, PPG, SV, PAP, SAP, CVP, and ballistocardiogram (BCG), among others). The indirect signal group is composed of the signals that are not generated by cardiac activity but are influenced by it and that could therefore provide useful information for heartbeat detection (e.g., EEG, EMG, EOG, and general pressure (PRESS), among others).
Figure 2 shows some examples of these signals of interest. It can be seen that some physiological signals (e.g., BP) have their peaks delayed with respect to the ECG peaks. This delay should be taken into account during heartbeat detection to correct the position of the beats (see Section 3.5.3).
Among the direct group, the ECG records the electrical activity of the heart. It is therefore the most suitable signal for heartbeat detection, since the signal is directly generated by the phenomenon of interest (the heartbeat). The PPG measures the volumetric changes of blood in the tissue that follow each heartbeat by measuring light transmission or reflection. When the heart contracts, a pulse of blood is sent into the arteries of the body. By detecting peaks in the amplitude of this signal, those pulses and, hence, the heartbeats can be detected. The SV is the volume of blood pumped from the left ventricle per beat. It is computed by subtracting the volume of blood in the ventricle at the end of a beat from the volume of blood just prior to the beat. The BCG signal, which emerges from a minuscule motion of the human body in response to the recoil forces of the cardiac ejection into the vascular system, also carries information useful for heartbeat detection.
The SAP is the primary determinant of cerebral blood flow. It is computed from the cardiac output and, hence, is related to heart function. The BP signal relates to the pressure of the blood within the circulatory system. When the heart beats, it pumps blood around the body to give it the energy and oxygen needed. As the blood moves, it pushes against the sides of the blood vessels. The BP is the strength of this pushing. After the pumping of blood produced by a beat, there is an increase in BP, followed by a decrease until the next blood pulse (beat) arrives. The specific moment at which the maximum pressure is reached after a beat depends on the distance between the heart and the point of the body at which the pressure is being measured. The greater the distance, the longer the delay in the arrival of the peak of the pressure wave. The different pressure-related signals (i.e., ABP, BP, CVP, PAP, and PRESS) measure pressure in different parts of the body. By counting the local maxima (or minima) that occur in the BP, it is possible to establish a patient’s heart rate (see Figure 2).
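The idea of estimating heart rate by counting local maxima in a pressure signal can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the amplitude threshold, the minimum peak spacing, and the synthetic sinusoidal "pressure" wave, which is far cleaner than a real BP recording.

```python
import math

def find_pulse_peaks(samples, min_height, min_gap):
    """Return indices of local maxima above min_height, at least min_gap samples apart."""
    peaks = []
    for i in range(1, len(samples) - 1):
        is_local_max = samples[i - 1] < samples[i] >= samples[i + 1]
        if is_local_max and samples[i] >= min_height:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks

def heart_rate_bpm(peaks, fs_hz):
    """Mean heart rate (beats per minute) from the mean peak-to-peak interval."""
    if len(peaks) < 2:
        return None
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fs_hz
    return 60.0 / mean_interval_s

# Synthetic, illustrative pressure-like wave: 1.2 pulses per second (72 bpm).
fs = 100  # sampling frequency in Hz (assumed)
bp = [100 + 20 * math.sin(2 * math.pi * 1.2 * n / fs) for n in range(10 * fs)]
peaks = find_pulse_peaks(bp, min_height=110, min_gap=int(0.3 * fs))
rate = heart_rate_bpm(peaks, fs)
```

On real recordings, the threshold and the minimum spacing would need to adapt to the signal, and the peak times would still lag the ECG R-peaks by the propagation delay discussed above.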
Among the indirect signal group, the EEG records the cortical electrical activity of the brain from the scalp. The relatively high electrical energy of the cardiac activity causes EEG artifacts, contaminating the EEG signal with QRS complexes (see Figure 2). The EOG signal measures the electrical activity of the eyes from the corneo-retinal standing potential that exists between the front and the back of the human eye. The EMG records the electrical activity of the skeletal muscles by measuring the electrical potentials generated in them. The EOG/EMG signals are also contaminated with the electrical potentials generated by the QRS complexes (see Figure 2). Therefore, by looking for those artifacts, the QRS complexes can be identified in the EEG/EOG/EMG signals, making them potential sources of information about the heartbeat.
Table 6 presents the signals that have been employed in the reviewed papers. Most of the proposals for heartbeat detection employ the ECG and ABP signals, since these are directly related to cardiac activity [42,43,44,45,46,47,48,49,50,51,52]. Other authors have added further signals to that pair: the PPG signal [53,54]; the SV and PPG signals [55]; and the EEG, EOG, and EMG signals [56,57].
The use of ECG and BP signals has also been extensively studied due to their direct relationship with cardiac activity [58,59,60,61,62,63,64,65]. Building on this pair, other approaches that integrate up to three additional signals have also been presented: SV [66], EEG [67,68], and EOG [69,70] signals; EOG and EMG signals [71]; SV and EOG signals [72]; EEG, EOG, and EMG signals [73]; and SV, EEG, and PPG signals [74].
The combination of ECG and SAP signals has also been studied [75,76], as has the combination of the most commonly used signals (ECG+BP and ECG+ABP) with other signals [77,78,79]. In particular, the pulmonary arterial pressure (PAP) signal is added in References [77,78], and the SV and central venous pressure (CVP) signals are added in Reference [79] to those used in References [77,78]. The most extensive signal set, composed of ECG, BP, ABP, SV, PAP, CVP, PRESS, PPG, EOG, EEG, and EMG signals, is employed in Reference [82], and ECG, BP, ABP, PRESS, PPG, and SV are used in Reference [81]. The BCG has been employed along with the ECG in Reference [80].
3.2. Signal Preprocessing
Signal preprocessing is often needed to improve the quality of a signal for the subsequent stages. A summary of the signal preprocessing techniques is presented in Table 7. Low-pass filters [44,45,57,71,73,80] and band-pass filters [44,55,56,59,65,79,81] are widely used for ECG preprocessing. Different cutoff frequencies have been proposed for low-pass filtering, including 40 Hz for the ECG and BP signals [57,73]; 35 Hz for the ECG, BP, EOG, and EMG signals [71]; 20 Hz for the BCG signal [80]; and 16 Hz for the ECG and ABP signals [44,45]. Regarding the band-pass filters, the reported passbands include 0.5–10 Hz for the ABP signal [44] and 0.5–80 Hz for the ECG signal [65]. In Reference [56], different cutoff frequencies are employed depending on the signal (5–40 Hz for ECG, 5–55 Hz for EEG, 10–25 Hz for EOG, and 5–15 Hz for EMG). In some works, the cutoff frequencies of the band-pass filter are computed from the percentiles of the RR intervals obtained from the QRS detection in the BP signals [59] and from the d2, d3, and d4 coefficients of the wavelet transform in the ECG signals [79]. High-pass filters have also been used in Reference [80], with cutoff frequencies of 1 Hz for the ECG signal and 0.5 Hz for the BCG signal.
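To make the role of the cutoff frequency concrete, the following minimal sketch implements a first-order (single-pole) low-pass filter. This is an illustrative assumption only: the reviewed papers do not necessarily use this filter type or order, and the function name and test signals are hypothetical. The 16 Hz cutoff is one of the values reported above.

```python
import math

def lowpass_rc(samples, fs_hz, cutoff_hz):
    """First-order low-pass filter (discrete RC filter) with the given cutoff."""
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # time constant for the cutoff
    alpha = dt / (rc + dt)                   # smoothing factor in (0, 1)
    out = []
    prev = samples[0] if samples else 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)     # move part of the way towards x
        out.append(prev)
    return out

# Illustrative check: a 1 Hz component passes, a 100 Hz component is attenuated.
fs = 250  # Hz (assumed)
slow = [math.sin(2 * math.pi * 1 * n / fs) for n in range(2 * fs)]
fast = [math.sin(2 * math.pi * 100 * n / fs) for n in range(2 * fs)]
slow_out = lowpass_rc(slow, fs, cutoff_hz=16)
fast_out = lowpass_rc(fast, fs, cutoff_hz=16)
```

A single pole gives a gentle roll-off and introduces phase delay, which shifts peak positions; steeper, zero-phase designs are the usual choice when beat timing matters.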
Baseline wander is a key issue when processing ECG signals. It is typically caused by electrode-related issues and by the movement and respiration of the subject [83], and it affects the low-frequency components (below 0.5–0.6 Hz) of the ECG signal [84] and, in general, of any signal that contains heartbeat detection information. From the signal recording perspective, baseline wander causes the signal to drift away from its normal baseline. To address this, baseline wander suppression has been widely used. In Reference [65], a second-order smoothing filter was applied to the ECG and BP signals. In Reference [44], 0.5–10 Hz band-pass filtering was used for the ABP signals. Approximation coefficients of the wavelet transform are used in References [67,85] for the ECG signals. Of the filters applied in cascade in Reference [63], the last high-pass filter of the quadratic spline approach is used for the ECG and BP signals. Band-pass filtering with cutoff frequencies of 5–40 Hz has been employed for ECG signals in Reference [56]. In Reference [57], a moving median filter is used. In Reference [78], convolution-based filtering serves as baseline wander suppression for the ECG, BP, ABP, and PAP signals. Cascaded median filters are used in Reference [52] for the ECG signal. In Reference [79], the d1 coefficients of the wavelet transform are removed for the ECG, BP, ABP, PAP, SV, and CVP signals.
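As an illustration of median-based baseline wander suppression, the sketch below estimates the baseline with a centred moving median and subtracts it. The window length (0.6 s), the function names, and the synthetic signal (narrow spikes on a slow 0.2 Hz drift) are assumptions chosen for the example and are not taken from the reviewed papers.

```python
import math
from statistics import median

def moving_median(samples, window):
    """Centred moving median; the window shrinks near the signal edges."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(median(samples[lo:hi]))
    return out

def remove_baseline(samples, fs_hz, window_s=0.6):
    """Subtract a moving-median baseline estimate from the signal."""
    window = max(3, round(window_s * fs_hz) | 1)  # force an odd window length
    baseline = moving_median(samples, window)
    return [x - b for x, b in zip(samples, baseline)]

# Synthetic example: one 3-sample spike per second riding on a slow drift.
fs = 100  # Hz (assumed)
drift = [0.5 * math.sin(2 * math.pi * 0.2 * n / fs) for n in range(10 * fs)]
spikes = [1.0 if n % fs in (49, 50, 51) else 0.0 for n in range(10 * fs)]
raw = [d + s for d, s in zip(drift, spikes)]
clean = remove_baseline(raw, fs)
```

The median ignores the short spikes as long as they occupy well under half of the window, so the estimate follows the drift rather than the beats. The window must therefore be longer than the widest wave that should be preserved.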
Power line interference can also introduce noise into the signal recordings, which causes variations in signal parameters (e.g., amplitude and duration, among others) and may lead to diagnostic errors. This noise occurs at 50 or 60 Hz, depending on the mains frequency. Notch filtering, which aims to remove this power line interference, has been employed in References [65,80].
Other types of filters such as moving average [61,66], median [73], mean [57], anti-aliasing [70], quadratic spline (from low and high-pass filters) [63], and convolution-based filtering (OWN) [78] have also been used. The wavelet transform has also been employed in Reference [47].
To unify the sampling frequency of the input signals, which enables a meaningful frequency-based signal analysis, downsampling [61,71,80,81] and resampling techniques [43,52,63,70,72] are also common preprocessing steps. Signal normalization has also been considered in References [52,61,63,72,78,79].
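The two steps mentioned above can be sketched as follows. Linear interpolation and z-score normalization are assumptions made for illustration; the reviewed papers may use other resampling kernels or normalization schemes, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def resample_linear(samples, fs_in, fs_out):
    """Resample onto a uniform grid at fs_out using linear interpolation.

    Note: when reducing the rate, an anti-aliasing low-pass filter should
    be applied first; it is omitted here for brevity.
    """
    if len(samples) < 2:
        return list(samples)
    duration = (len(samples) - 1) / fs_in
    n_out = int(duration * fs_out + 1e-9) + 1
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out               # fractional index in the input
        i = min(int(pos), len(samples) - 2)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out

def zscore(samples):
    """Scale a signal to zero mean and unit (population) standard deviation."""
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        return [0.0 for _ in samples]
    return [(x - mu) / sigma for x in samples]

upsampled = resample_linear([0, 1, 2, 3, 4], fs_in=1, fs_out=2)
normalized = zscore([1, 2, 3, 4, 5])
```

Bringing all channels to a common rate and scale means that a sample index refers to the same instant in every signal and that amplitude thresholds are comparable across them.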
3.6. Fusion
The fusion stage is necessary to produce the final annotation list that contains all the heartbeats detected over the multiple physiological signals. A good fusion strategy can improve heartbeat detection by exploiting complementary information present in the different physiological signals while, at the same time, avoiding those signal intervals that have poor quality or high levels of noise.
Multiple strategies exist for combining the annotations from the different signals. We distinguish four main approaches, although other classifications are possible, particularly since the boundaries between them are diffuse and some overlap exists. These approaches are RR-based methods (Section 3.6.1), signal switching (Section 3.6.2), voting (Section 3.6.3), and approaches that merge heartbeat detection and fusion into a single step (Section 3.6.4). RR-based methods rely on the beat-time information previously obtained, since the fusion is based on the time between two beats (often, between the R-peaks of two QRS complexes). This makes the precise time location of each beat crucial for fusion and may become a bottleneck for improving performance. Signal switching depends on the SQA algorithm, since the annotations kept in the final list are taken from the signal that yields the best SQA measurements. In this case, a robust SQA algorithm is essential. In voting approaches, each potential beat detection over a signal constitutes a vote for the presence of a beat in the final list, and the vote is often weighted by some metric of the signal’s reliability and/or quality. This resembles the signal-switching approaches, since these also employ a kind of voting (i.e., they rely on the detection provided by the best signal). Techniques that combine detection and fusion in a single step borrow ideas either from statistical methods (often Bayesian approaches or techniques based on Hidden Markov Models) or from machine learning.
Some authors [80] have also proposed a manual fusion based on the morphology and timing of the IJK complex in the BCG signal and of the R and T peaks in the ECG signal.
Table 14 presents a summary of the fusion approaches used in the reviewed papers.
3.6.1. RR-Based Methods
A common approach for combining the annotations obtained over different physiological signals is to join all the annotations into a single annotation list, sort them, and combine annotations that are very close to each other (usually employing a window of approximately 150 ms). Then, the RR intervals are used to detect missing annotations (which would yield a very large RR interval) or spurious annotations (which would yield a very short one). Usually, spurious annotations are directly removed, whereas missing annotations are predicted using interpolation or the mean RR interval. This approach or similar ones are used in References [42,46,57,58,59,61,64,72,73,78,79]. The process may be repeated several times until convergence [81]. It is worth discussing some of the variations of this idea.
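The pool, merge, and repair procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the thresholds for "too short" (0.5 times the median RR) and "too long" (1.5 times the median RR) are arbitrary choices, missing beats are filled by even spacing, and any delay between signals is assumed to have been corrected beforehand.

```python
from statistics import median

def merge_annotations(lists, window_s=0.150):
    """Pool and sort annotations (in seconds); average those within window_s."""
    pooled = sorted(t for lst in lists for t in lst)
    merged, cluster = [], []
    for t in pooled:
        if cluster and t - cluster[0] > window_s:
            merged.append(sum(cluster) / len(cluster))
            cluster = []
        cluster.append(t)
    if cluster:
        merged.append(sum(cluster) / len(cluster))
    return merged

def repair_with_rr(beats, short=0.5, long=1.5):
    """Drop beats that create a very short RR; fill very long RR gaps."""
    if len(beats) < 3:
        return list(beats)
    ref = median(b - a for a, b in zip(beats, beats[1:]))  # reference RR
    cleaned = [beats[0]]
    for t in beats[1:]:
        gap = t - cleaned[-1]
        if gap < short * ref:
            continue                        # spurious annotation: remove it
        if gap > long * ref:
            n_missing = round(gap / ref) - 1
            base, step = cleaned[-1], gap / (n_missing + 1)
            for k in range(1, n_missing + 1):
                cleaned.append(base + k * step)   # predicted missing beats
        cleaned.append(t)
    return cleaned

# Illustrative annotations: the ECG misses a beat near 3.2 s,
# and the BP channel contains a spurious detection at 1.95 s.
ecg = [0.80, 1.60, 2.40, 4.00, 4.80]
bp = [0.82, 1.62, 1.95, 2.42, 4.02, 4.82]
fused = repair_with_rr(merge_annotations([ecg, bp]))
```

In this example the spurious detection is removed and one beat is inserted in the long gap, so the fused list has six roughly evenly spaced beats.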
Some proposals use advanced filtering techniques to improve the detection of outliers within the RR series. For example, in Reference [42], a Hampel filter [120,121] is used to detect outliers, which are then interpolated using a nearest-neighbours approach. In contrast, Hjorth mobility is employed in Reference [74] to estimate the number of missing annotations.
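A Hampel filter flags a sample as an outlier when it deviates from the local median by more than a multiple of the scaled median absolute deviation (MAD). The sketch below applies it to an RR series. The window size, the three-sigma threshold, and the example values are assumptions, and outliers are simply replaced by the local median rather than by the nearest-neighbours interpolation of Reference [42].

```python
from statistics import median

def hampel(series, half_window=3, n_sigmas=3.0):
    """Return (cleaned series, outlier indices) using a Hampel filter."""
    k = 1.4826  # makes the MAD a consistent estimate of sigma for Gaussian data
    cleaned = list(series)
    outliers = []
    for i in range(len(series)):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]
        med = median(window)
        mad = median([abs(x - med) for x in window])
        if abs(series[i] - med) > n_sigmas * k * mad:
            cleaned[i] = med          # replace the outlier by the local median
            outliers.append(i)
    return cleaned, outliers

# Illustrative RR series (seconds): 1.62 suggests a missed beat,
# 0.40 suggests a spurious detection.
rr = [0.80, 0.82, 0.79, 0.81, 1.62, 0.80, 0.78, 0.81, 0.40, 0.80, 0.82]
rr_clean, rr_outliers = hampel(rr)
```

Because both the centre and the spread are medians, a single extreme RR interval barely affects the threshold used to detect it.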
In Reference [43], a nearest-neighbour selection scheme is employed. If an annotation is output by two or more heartbeat detection algorithms, it is assigned the end time and peak value corresponding to the mode of the RR interval times; if it is output by a single algorithm, it is assigned the end time and peak value that yield the RR interval closest to the average of the previous 12 RR intervals.
The “sandwich rule” proposed in Reference [60] states that an R-wave is valid if two conditions are met: it is the only R-wave between two consecutive ABP onsets, and each of these ABP onsets is the only onset between two consecutive R-waves. In the simplest cases, an invalid annotation (either from ECG or ABP) is corrected by using a mean QRS-BP delay. For example, if two consecutive BP onsets “sandwich” more than one QRS, the mean QRS-BP delay is used to predict proper positions of the missing QRS peaks. Since the “sandwich rule” may fail in pathological cases (such as premature ventricular heartbeats or a very noisy ECG segment), the authors perform a sanity check to ensure that the ECG peaks are indeed QRS complexes. The idea is that a QRS complex will intersect a set of regularly spaced horizontal lines placed over the ECG at most six times, whereas this number will probably be larger in noisy segments. The “sandwich rule” is also applied in Reference [42].
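The two validity conditions of the “sandwich rule” can be expressed as a small check over sorted event times. This sketch covers only the validity test, not the correction step or the sanity check of Reference [60]. The function names, the treatment of R-waves at the record boundaries (marked invalid when not enclosed by two onsets), and the example times are assumptions.

```python
from bisect import bisect_left, bisect_right

def count_between(sorted_times, lo, hi):
    """Number of events strictly inside the open interval (lo, hi)."""
    return bisect_left(sorted_times, hi) - bisect_right(sorted_times, lo)

def sandwich_valid(r_times, onsets):
    """For each R-wave, report whether it satisfies both sandwich conditions."""
    flags = []
    for i, r in enumerate(r_times):
        k = bisect_left(onsets, r)
        if k == 0 or k == len(onsets):
            flags.append(False)       # not enclosed by two ABP onsets
            continue
        before, after = onsets[k - 1], onsets[k]
        r_prev = r_times[i - 1] if i > 0 else float("-inf")
        r_next = r_times[i + 1] if i + 1 < len(r_times) else float("inf")
        only_r = count_between(r_times, before, after) == 1
        only_onset_before = count_between(onsets, r_prev, r) == 1
        only_onset_after = count_between(onsets, r, r_next) == 1
        flags.append(only_r and only_onset_before and only_onset_after)
    return flags

# Illustrative times in seconds; the extra R-wave at 3.0 s breaks the pattern.
r_waves = [1.0, 1.8, 2.6, 3.0, 3.4, 4.2]
abp_onsets = [0.4, 1.2, 2.0, 2.8, 3.6, 4.4]
valid = sandwich_valid(r_waves, abp_onsets)
```

Both R-waves that share the same pair of onsets are flagged, since the rule alone cannot tell which of the two is spurious; that decision is left to the subsequent correction step.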
It is worth noting that RR intervals are also used to refine the final annotation list that results from the fusion algorithm [47,59,61,63,74,78,82]. The techniques employed to that end are very similar to those introduced at the beginning of this subsection.
3.6.2. Signal Switching
A recurrent idea that usually yields good performance is to generate the annotations by switching from one signal to another depending on the quality index of the current segment. Usually, a hierarchy of signals is built. For example, if the ECG has sufficient quality, the annotations from the ECG are used; if not, the quality of the BP is assessed. If the BP is good enough, its annotations are used; if not, the signal with the highest quality among the remaining ones is used. Examples of these approaches can be found in References [47,48,49,55,57,62,65,66,68,77]. The method presented in Reference [79] simply rejects an annotation if it belongs to a signal segment whose values fall outside a predefined physiological range.
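The hierarchy described above can be sketched as a per-segment selection. The data layout (one dictionary per segment mapping a signal name to a quality index and its annotations), the quality threshold, and the function name are assumptions for illustration; the quality indices themselves would come from an SQA algorithm.

```python
def switch_signals(segments, priority, min_quality):
    """Pick, for each segment, the annotations of one signal.

    segments: list of dicts {signal name: (quality index, annotations)}.
    priority: signal names checked in order, e.g. ["ECG", "BP"].
    """
    fused = []
    for seg in segments:
        chosen = None
        for name in priority:
            if name in seg and seg[name][0] >= min_quality:
                chosen = name
                break
        if chosen is None:
            # Fall back to the best of the remaining signals (or, if there
            # are none, to the best signal overall).
            remaining = [n for n in seg if n not in priority] or list(seg)
            chosen = max(remaining, key=lambda n: seg[n][0])
        fused.append((chosen, seg[chosen][1]))
    return fused

# Illustrative segments with made-up quality indices in [0, 1].
segments = [
    {"ECG": (0.9, [0.8, 1.6]), "BP": (0.8, [0.85, 1.65])},
    {"ECG": (0.3, [2.1]), "BP": (0.85, [2.45, 3.25])},
    {"ECG": (0.2, []), "BP": (0.4, [4.1]), "PPG": (0.6, [4.05, 4.85]),
     "EEG": (0.5, [4.0])},
]
result = switch_signals(segments, priority=["ECG", "BP"], min_quality=0.7)
chosen_names = [name for name, _ in result]
```

The outcome depends entirely on the quality indices, which is why the robustness of the SQA algorithm matters so much for this family of methods.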
3.6.3. Voting
Among the voting approaches, we may distinguish between majority voting and weighted voting. In majority voting, each of the signals votes for the presence of a heartbeat in a small window. The final heartbeat location is then found by searching for a local maximum or by requiring the agreement of a minimum number of signals (usually, half plus one). Majority voting is used in References [67,74,79,82].
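A minimal sketch of unweighted majority voting follows. It groups detections from all signals into clusters that span at most a small window and accepts a cluster when more than half of the signals contribute to it. The 150 ms window, the use of the mean time as the beat location, and the names are assumptions for illustration.

```python
def cluster_votes(cluster, needed):
    """Return the mean time of a cluster if enough distinct signals voted."""
    voters = {name for _, name in cluster}
    if len(voters) >= needed:
        return sum(t for t, _ in cluster) / len(cluster)
    return None

def majority_vote(detections, window_s=0.150):
    """detections: dict {signal name: list of beat times in seconds}."""
    needed = len(detections) // 2 + 1          # half plus one
    pooled = sorted((t, name) for name, times in detections.items()
                    for t in times)
    beats, cluster = [], []
    for t, name in pooled:
        if cluster and t - cluster[0][0] > window_s:
            beat = cluster_votes(cluster, needed)
            if beat is not None:
                beats.append(beat)
            cluster = []
        cluster.append((t, name))
    if cluster:
        beat = cluster_votes(cluster, needed)
        if beat is not None:
            beats.append(beat)
    return beats

# Illustrative detections: BP misses the beat near 2.4 s,
# and PPG has an isolated false detection at 2.00 s.
detections = {
    "ECG": [0.80, 1.60, 2.40, 3.20],
    "BP": [0.82, 1.62, 3.22],
    "PPG": [0.84, 2.00, 2.43, 3.24],
}
voted = majority_vote(detections)
```

With three signals, two agreeing detections suffice, so the beat near 2.4 s survives the BP miss while the isolated PPG detection is discarded.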
Majority voting can be improved by assigning different weights to the signals according to diverse criteria. We shall summarize some of the algorithms using weighted voting to illustrate possible weighting schemes.
In References [69,70], a majority voting technique using a Tukey window is proposed. Different weights are assigned to the signals according to their type, so that the ECG, BP, or PPG can trigger a detection on their own, whereas the SV, EEG, and EOG require either two simultaneous detections or a single detection that overlaps with the location of a predicted heartbeat (obtained by linear interpolation) to trigger a final detection.
In Reference [56], a majority voting fusion method integrates the information from the multiple physiological signals as follows. First, the signals are segmented into small windows, and the average heartbeat signal quality index (SQI) is used to choose the best signal as the reference signal. In this reference signal, each RR interval is selected and considered as a potential annotation if it fits within some tolerance limits. All the annotations in the other signals within a window of 150 ms are considered for fusion. To that end, a weight is assigned to each signal for majority voting, so that signals below a predefined threshold are considered noise and excluded from the voting. Finally, the mean temporal location of the annotations that take part in the voting yields the final annotation.
In Reference [71], the annotations are created by combining all the channels (which were preprocessed using template matching; see Section 3.5) into two different new signals: the Total Correlation Response (TCR) signal, which weights all the channels according to the mean correlation between the channels and the template pattern, and the Best Correlation Response (BCR) signal, which just picks the temporal maximum across the channels. The annotations are created by ensuring that each new detection is far enough from the previous one and by switching from TCR to BCR when the quality of TCR is too low. In Reference [54], Bayesian inference that permits weighting the different signals is employed for fusion.
The work presented in Reference [76] uses the optimal fusion method proposed in References [122,123]. This decision rule combines the individual decisions of each detector, weighted according to its performance. In contrast, AND and OR rules are applied to the annotations in Reference [75] to confirm or reject them. In this work, a heartbeat detection is copied to the final annotation list if it is output by both signals of interest (AND rule) or by only a single signal (OR rule).
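The AND and OR rules for two signals can be sketched as below. The matching tolerance, the averaging of matched detections, and the function name are assumptions for illustration. How Reference [75] decides which rule to apply in a given situation is not described here and is not modelled.

```python
def fuse_and_or(first, second, rule, tol_s=0.150):
    """Fuse two annotation lists (seconds) with the 'and' or the 'or' rule."""
    matched, only_first, used = [], [], set()
    for t in first:
        # Find the nearest unused detection of the second signal within tol_s.
        best = None
        for j, u in enumerate(second):
            if j in used or abs(u - t) > tol_s:
                continue
            if best is None or abs(u - t) < abs(second[best] - t):
                best = j
        if best is None:
            only_first.append(t)
        else:
            used.add(best)
            matched.append((t + second[best]) / 2)
    only_second = [u for j, u in enumerate(second) if j not in used]
    if rule == "and":
        return sorted(matched)                      # both signals agree
    if rule == "or":
        return sorted(matched + only_first + only_second)
    raise ValueError("rule must be 'and' or 'or'")

# Illustrative detections from two signals.
ecg = [0.80, 1.60, 2.40]
sap = [0.85, 2.45, 3.25]
both = fuse_and_or(ecg, sap, "and")
either = fuse_and_or(ecg, sap, "or")
```

The AND rule favours precision (fewer false beats), whereas the OR rule favours sensitivity (fewer missed beats).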
3.6.4. Simultaneous Detection and Fusion
In Reference [53], all the signals are employed to build a matrix that represents a multiparameter signal in which rows represent time and columns represent the type of signal to fuse. Then, the Euclidean distance is used to compute the similarity between each new signal and predefined templates for all the signals of interest. This Euclidean distance matrix is given to a variant of DTW, named Weighted Time Warping, which looks for the annotation boundaries. This algorithm yields the final annotations as its result and, therefore, performs detection and fusion at the same time.
In References [44,45], hidden semi-Markov models (HSMM) are employed for robust heartbeat detection. The HSMM uses two hidden states (QRS complex and non-QRS complex) and a Gaussian emission function. To estimate the state of the ECG and BP signals, features obtained from derivative-based filters are fed to the HSMM. The most likely sequence of states for the observable signals is obtained using the Viterbi algorithm. As the Viterbi algorithm provides output probabilities for each feature vector, these are then merged with the signal-quality index computed previously to output the final annotations.
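To illustrate the decoding step, the sketch below runs standard Viterbi decoding on a plain two-state hidden Markov model with Gaussian emissions. It is a simplification: it does not model the explicit state durations of an HSMM, and all parameters (means, standard deviations, transition and start probabilities) and the one-dimensional feature values are invented for the example rather than taken from References [44,45].

```python
import math

def log_gauss(x, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def viterbi(obs, states, log_start, log_trans, emit):
    """Most likely state sequence; emit[s] = (mu, sigma) for state s."""
    score = {s: log_start[s] + log_gauss(obs[0], *emit[s]) for s in states}
    back = []
    for x in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = (score[prev] + log_trans[prev][s]
                            + log_gauss(x, *emit[s]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):          # follow the back-pointers
        path.append(pointers[path[-1]])
    path.reverse()
    return path

states = ["non-QRS", "QRS"]
log_start = {"non-QRS": math.log(0.9), "QRS": math.log(0.1)}
log_trans = {
    "non-QRS": {"non-QRS": math.log(0.9), "QRS": math.log(0.1)},
    "QRS": {"non-QRS": math.log(0.4), "QRS": math.log(0.6)},
}
emit = {"non-QRS": (0.1, 0.2), "QRS": (1.0, 0.3)}
# Hypothetical derivative-based feature: large values during a QRS complex.
feature = [0.1, 0.0, 0.2, 0.9, 1.1, 0.8, 0.1, 0.0, 0.15, 1.0, 0.9, 0.1]
path = viterbi(feature, states, log_start, log_trans, emit)
```

Each run of QRS states in the decoded path corresponds to one detected beat. A semi-Markov model would additionally constrain how long each state may last, which helps to reject implausibly short or long QRS segments.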
The algorithm from Reference [51] proposes fusing ECG and ABP signals using a more complex Bayesian approach, which is based on two layers. In the first layer, signals are decoded in terms of states related to well-known waveform segments: ISO (isoelectric), P, PQ, QRS, ST, and T for the ECG and SBP, DBP, Diastolic cusp, and Offset for the ABP. This is achieved by modelling the waveforms as a Hidden Markov Model (HMM). The second layer uses the HMM-decoded states of the ECG and ABP to detect the presence or absence of a QRS segment. The authors propose two different models to make this decision. The simplest model uses a Bayesian Network (BN) to model the relationships between the three relevant random variables of the problem: the state of the ECG, E; the state of the ABP, B; and the classification output, C (which is a binary random variable). Within this model, the authors test different BNs, which may be broadly divided into two categories: BNs assuming that E and B participate independently in the decision about C and BNs assuming that E and B are dependent on each other for deciding C. The second model the authors propose is justified by the observation that consecutive states of the signals are correlated in time. This information can be incorporated in a model using state transitions, which results in Dynamic Bayesian Networks (DBNs). As with BNs, the authors test different transition models that either do or do not include correlations between E and B.
Yet another example of a Bayesian approach is Reference [50], which fuses ECG and BP using a generative model that captures a simplified understanding of the heart rhythm. The graphical model is a dynamic Bayesian network that relates hidden state variables (such as heart rate or a binary variable indicating whether a BP peak is present) with both observations obtained from the GQRS and WABP algorithms and signal quality indices. The hidden states (including ECGPeak and ABPPeak) are learned by applying particle filtering to the signals, which are first split into 25-ms windows. The hidden states are then used to annotate the signal. Specifically, the annotations correspond to timestamps where enough particles are in the ABPPeak state. The position of the annotation is corrected using yet another hidden variable that captures the latency between the ECG and the BP signals.
In Reference [52], a Convolutional Neural Network (CNN)-based approach is proposed for the detection and fusion of the annotations obtained from ECG and BP signals. Note that the CNN is able to extract the features by itself, and therefore, in contrast to the method from Reference [63], the inputs are the raw signals. Again, the CNN is trained with small intervals of data labelled as 1 or 0 depending on whether there is a heartbeat in the middle of the interval or not. Unlike the method from Reference [63], this proposal blends together detection and fusion.