Machine Learning Applied to Reference Signal-Less Detection of Motion Artifacts in Photoplethysmographic Signals: A Review

Argüello-Prada, Erick Javier; Castillo García, Javier Ferney

doi:10.3390/s24227193


by Erick Javier Argüello-Prada ¹,* and Javier Ferney Castillo García ²

¹ Programa de Bioingeniería, Facultad de Ingeniería, Universidad Santiago de Cali, Calle 5 # 62-00 Barrio Pampalinda, Santiago de Cali 760032, Colombia

² Programa de Mecatrónica, Facultad de Ingeniería, Universidad Autónoma de Occidente, Calle 25 # 115-85 Vía Cali-Jamundí, Santiago de Cali 760030, Colombia

* Author to whom correspondence should be addressed.

Sensors 2024, 24(22), 7193; https://doi.org/10.3390/s24227193

Submission received: 21 August 2024 / Revised: 10 September 2024 / Accepted: 2 October 2024 / Published: 9 November 2024

(This article belongs to the Section Biomedical Sensors)


Abstract

Machine learning algorithms have brought remarkable advancements in detecting motion artifacts (MAs) from the photoplethysmogram (PPG) with no measured or synthetic reference data. However, no study has provided a synthesis of these methods, let alone an in-depth discussion to aid in deciding which one is more suitable for a specific purpose. This narrative review examines the application of machine learning techniques for the reference signal-less detection of MAs in PPG signals. We did not consider articles introducing signal filtering or decomposition algorithms without previous identification of corrupted segments. Studies on MA-detecting approaches utilizing multiple channels and additional sensors such as accelerometers were also excluded. Despite its promising results, the literature on this topic shows several limitations and inconsistencies, particularly those regarding the model development and testing process and the measures used by authors to support the method’s suitability for real-time applications. Moreover, there is a need for broader exploration and validation across different body parts and a standardized set of experiments specifically designed to test and validate MA detection approaches. It is essential to provide enough elements to enable researchers and developers to objectively assess the reliability and applicability of these methods and, therefore, obtain the most out of them.

Keywords: motion artifacts; photoplethysmogram; machine learning; reference signal-less methods; real-time applications; computational complexity

1. Introduction

Of all the sensing technologies for clinical and non-clinical applications, photoplethysmography (PPG) is arguably the one that has received the most attention in recent decades. While initially conceived for monitoring fluctuations in peripheral blood volume, PPG technology has opened new opportunities for the non-invasive estimation of oxygen saturation [1], blood glucose levels [2], and arterial pressure [3]. PPG uses only one optical sensor, which captures light absorption changes due to the wave-like motion of the blood through the vessels, giving rise to a pulsatile waveform known as the PPG signal. However, the path that light must travel through the tissues is altered during physical motion due to the relative displacement between the sensor and the skin surface, resulting in a severely distorted PPG waveform. As this type of disturbance can lead to erroneous estimation of several vital signs, PPG-based physiological monitoring during physical activity remains a critical challenge for the research community, which has devoted considerable effort to addressing this issue [4].

PPG signal distortions caused by body movements are commonly known as motion artifacts (MAs), and they are usually several times higher in amplitude than the signal [5]. MAs not only significantly alter the contour of the PPG signal but also add frequency content to its power spectral density that may interfere with proper heart rate (HR) estimation based on spectral peak identification [6]. Techniques for removing MA frequency components from the PPG spectrum include adaptive filtering and multi-resolution decomposition, which have proven valuable in estimating HR accurately [7,8]. However, most MA removal and signal correction strategies are applied indiscriminately to corrupted and clean segments, potentially distorting the latter and thus becoming inefficient. Consequently, there is an increasing interest in approaches that can differentiate an MA-corrupted PPG signal from a clean one.

According to Such [9], approaches for detecting MAs in biomedical signals fall into one of two categories: single- and multiparameter methods. Unlike single-parameter techniques, multiparameter approaches to detect MAs in PPG signals rely on additional sensors conveying information that can be associated with the motion or the signal itself. Additional sensing elements include accelerometers [10,11] and optical source-detector pairs with a peak response beyond the red–infrared wavelength range [12,13]. Extra sensor channels can also transmit information about the same or a similar physiological marker that responds differently to MAs. In some studies, the reference noise signal is generated internally from the MA-corrupted PPG segments, thus eliminating the dependency on additional hardware [14,15]. Still, using measured or synthetic reference signals to identify MA-corrupted PPG segments usually involves adaptive filtering, which, besides its high mathematical and computational complexity, may require large amounts of data and time to converge to an optimal solution [16].

Unlike the abovementioned techniques, MA detection approaches that do not require measured or synthetic reference signals (i.e., reference signal-less MA detection methods) could be more convenient for wearable, real-time applications as they overcome the need for extra data sensing and processing. Machine learning (ML) has brought remarkable advancements in this regard, as it can classify PPG signals or segments into “reliable” or “unreliable” by finding discriminatory information and identifying complex patterns, either autonomously or with minimal human intervention [17]. Several reviews on MA detection and removal from PPG signals have been published recently [5,7,8]. Nevertheless, no recent study has focused in depth on reference signal-less (RSL) methods to detect MAs in PPG signals via ML techniques. Therefore, this study aims to (i) synthesize the current state-of-the-art approaches using ML to detect MAs in PPG signals with no other information than that coming from the signal itself and (ii) provide researchers and developers with some insight to decide which method is more suitable to use for their work.

Scope: PPG signals can be distorted by factors other than motion, such as high-frequency noise, physiological processes, and environmental conditions [18,19]. Therefore, it is unfeasible to obtain an artifact-free PPG waveform in practice. In this regard, several authors have introduced the signal quality index (SQI), which aims to measure the degree of signal contamination and the reliability of the information we can extract from it [20,21,22,23]. However, given the increasing interest in developing PPG-based devices for physiological monitoring when individuals walk, exercise, or perform everyday activities, many methods for SQI estimation have been developed under the premise that motion is the factor that most distorts the PPG signal. Therefore, this review examines ML-based methods that use the SQI either to express how contaminated the signal is mainly, or solely, due to MAs, or as a feature for MA detection.

2. Source Identification and Selection

This narrative review was conducted by searching Scopus, IEEE Xplore, and Google Scholar using a Boolean combination of the terms “motion artifact” AND “machine learning” AND “photoplethysmogram” OR “photoplethysmography” OR “PPG.” The search was limited to conference and journal papers published in English from 2014 to 2024. Any study focusing on an ML-based approach to detect MAs in PPG signals with no measured or synthetic reference data or including it as part of a more elaborate strategy (e.g., heart rate estimation) was deemed eligible. Therefore, we excluded articles reporting techniques that filter or decompose the signal for reconstruction or quality improvement without previous identification of corrupted segments. We also discarded studies on MA-detecting approaches that require more information than that provided by the PPG signal itself. Articles outlining methodologies specifically conceived for non-contact PPG were also excluded. If several documents appeared as a sequence of incremental improvements, we selected only the one reporting the accumulation of such findings. Reviews and meta-analyses were not considered eligible, although their reference lists were used to retrieve any relevant study. Finally, we did not consider studies published as abstracts or posters.

3. Background on Machine Learning (ML)

Artificial intelligence (AI) encompasses all processes where machines solve problems or perform tasks mimicking human behavior [24], including a subset of techniques enabling performance improvement of computer programs from previous computations. These techniques are collectively termed machine learning (ML) and involve finding hidden insights and complex patterns without explicitly being programmed to produce the desired answer [25]. For instance, let us consider “teaching” a computer to simulate a two-input logic gate. We could use the built-in function provided by the programming environment (i.e., the explicit programming strategy) or train a classification learning algorithm (e.g., a single-layer perceptron) by presenting input data with their corresponding correct outputs according to the truth table. Thus, the computer “learns” in a supervised fashion and produces a predictive function that can deliver a binary response in the context of the given example [26]. ML techniques relying on pre-existing data labels or specifications fall under the supervised learning category. On the other hand, those algorithms detecting patterns and relationships without data specifications belong to the unsupervised learning category. Another kind of ML is reinforcement learning, in which the programmer lets the computer use the trial-and-error principle to achieve the goal by itself instead of providing input and output pairs [27]. The learner is not told which actions to take but discovers which one yields the maximum reward by trying each action iteratively.
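The logic-gate example above can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the gate is taken to be an AND gate, the function names (`step`, `train_perceptron`, `predict`) are hypothetical, and integer weights with a unit learning rate are chosen purely to keep the arithmetic exact.

```python
# Illustrative single-layer perceptron that "learns" a two-input AND gate
# from its truth table. All names and settings here are assumptions made
# for this sketch; they do not come from any reviewed study.

def step(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def train_perceptron(samples, epochs=20, lr=1):
    """Fit weights and bias with the classic perceptron update rule."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = step(w[0] * x1 + w[1] * x2 + b)
            error = target - pred          # -1, 0, or +1
            w[0] += lr * error * x1        # move the boundary only on mistakes
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def predict(w, b, x1, x2):
    """Binary response of the trained perceptron."""
    return step(w[0] * x1 + w[1] * x2 + b)


# Supervised learning: inputs paired with the correct outputs (the labels).
AND_TABLE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w, b = train_perceptron(AND_TABLE)
outputs = [predict(w, b, x1, x2) for (x1, x2), _ in AND_TABLE]
print(outputs)  # [0, 0, 0, 1]
```

The explicit-programming counterpart would simply be `x1 and x2`; the point of the sketch is that the same input-output mapping is recovered from labeled examples alone.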

Deep learning is a subset of ML that does not require handcrafted feature engineering (see Figure 1) but employs a cascade of multiple layers of nonlinear processing units for automated feature extraction and transformation [28]. While layers close to the data inputs identify simple features, intermediate layers use them to learn more elaborate ones. Deep learning outperforms traditional ML when data are numerous, noisy, and unstructured. Nonetheless, responses provided by traditional ML algorithms are more interpretable than those of deep learning algorithms [29], whose performance may decline when data dimensionality is low [30].

4. Machine Learning Techniques Applied to Reference Signal-Less Detection of Motion Artifacts in PPG Signals: From Traditional to Deep Learning

Detecting MAs in PPG signals is often addressed as a classification task in which the algorithm’s output can take only two (binary classification) or several values (multiclass classification). Traditional ML algorithms comprise classification techniques that involve manual feature extraction. On the other hand, deep learning (DL) stands for the computationally intensive processes able to automatically identify and learn discriminative feature representations with minimal or no human effort (see Figure 2). The following sections synthesize the application of traditional and deep learning techniques in the RSL detection of MA-corrupted PPG segments.

4.1. Traditional Machine Learning Techniques

4.1.1. Characterization of Studies

Table 1 summarizes the studies using traditional ML algorithms for the RSL detection of MAs in PPG signals, with the support vector machine (SVM) being the most frequently employed classification model [31,32,33,34,35,36,37]. Only a few articles report using other classifiers, such as random forest [38,39], single-layer perceptron [40], self-organizing map [41], and elliptical envelope algorithm [42].

Most traditional ML-based approaches for MA detection in PPG signals rely on supervised learning models, and only a few studies have reported using unsupervised models. For instance, Roy and colleagues [41] employed a self-organizing map (SOM) to discriminate between clean, partially clean, and corrupted PPG segments by extracting four entropy-based features from 5 s sequences. Also known as Kohonen maps, SOMs are unsupervised learning algorithms capable of clustering groups from a high-dimensional input space onto a low-dimensional discrete lattice of output neurons [43]. In a different study, Mahmoudzadeh and co-workers [42] employed the elliptical envelope algorithm to identify MA-corrupted PPG segments from data continuously acquired over six days via a Samsung Gear Sport watch. The elliptical envelope algorithm is an unsupervised learning method that draws an ellipsoid around the center of the data samples by computing the Mahalanobis distance between them [44]. Assuming features computed over MA-free PPG segments adhere to a Gaussian distribution, data points outside the ellipsoid are deemed MA-corrupted.
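The idea behind the elliptical envelope can be illustrated with a minimal sketch. It assumes two hypothetical features per segment, a plain (non-robust) sample covariance, and an arbitrary distance threshold; library implementations typically rely on a robust covariance estimate and a contamination parameter instead, so this should not be read as the procedure used in [42].

```python
import math

# Simplified elliptical-envelope idea: fit a Gaussian to features of clean
# segments, then flag points whose Mahalanobis distance from the center is
# large. Feature values and the threshold are assumptions for illustration.


def fit_gaussian_2d(points):
    """Return the mean and the inverse covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy            # must be non-zero
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv


def mahalanobis(point, mean, inv):
    """Mahalanobis distance of a 2-D point from the fitted center."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (dx * (inv[0][0] * dx + inv[0][1] * dy)
          + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(d2)


# Hypothetical feature pairs computed over MA-free segments.
clean_features = [(1.0, 2.0), (1.2, 2.1), (0.8, 1.9),
                  (1.1, 2.2), (0.9, 1.8), (1.0, 2.1)]
mean, inv = fit_gaussian_2d(clean_features)

THRESHOLD = 3.0  # assumed cut-off defining the ellipsoid boundary


def is_corrupted(features):
    """Points outside the ellipsoid are deemed MA-corrupted."""
    return mahalanobis(features, mean, inv) > THRESHOLD
```

With these made-up numbers, a point such as `(5.0, -1.0)` falls far outside the ellipsoid, whereas `(1.0, 2.0)` lies well inside it.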

4.1.2. Features

The authors of the included studies have used multiple features from various domains (i.e., time, frequency, and wavelet), so there is no clear preference for a specific set of features between approaches. Time-domain features employed to detect MAs in PPG signals include those relying on fiducial point detection, such as peak-to-peak and valley-to-valley intervals, peak-to-peak and valley-to-valley amplitude differences, onset-to-peak amplitude differences, and pulse width [33,35,37,38]. Approaches using non-fiducial-based features employ statistical indexes like the mean, standard deviation, variance, interquartile range, skewness, and kurtosis of the signal [36,39,42], energy operators (e.g., Kaiser–Teager energy) [39], and entropy-based features (e.g., Shannon entropy, approximate entropy, permutation entropy, and sample entropy) [36,39,41,42]. Frequency-domain features involve the computation of the signal’s power spectral density and range from amplitudes and phases of fundamental and harmonic frequencies of the PPG signal [31,34] to spectral entropy, spectral dominant peak amplitude, and power spectral density kurtosis [32,36,39,42]. Wavelet-based features are less frequent than those from other domains, and they include measures of asymmetry, tailedness, complexity, richness, regularity, and unpredictability of approximation coefficients of the wavelet transform [36].
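As a rough illustration of the non-fiducial time-domain features listed above, the sketch below computes a few statistical indexes and a histogram-based Shannon entropy for one segment. The function names, the number of histogram bins, and the synthetic sine-wave "PPG" segment are all assumptions; the reviewed studies may define and compute these features differently.

```python
import math

# Illustrative non-fiducial features for one PPG segment. Population
# (biased) moments are used; other definitions are equally common.


def statistical_features(segment):
    """Mean, standard deviation, skewness, and kurtosis of a segment.

    Assumes a non-constant segment (non-zero standard deviation).
    """
    n = len(segment)
    mean = sum(segment) / n
    var = sum((x - mean) ** 2 for x in segment) / n
    std = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in segment) / (n * std ** 3)
    kurt = sum((x - mean) ** 4 for x in segment) / (n * std ** 4)
    return {"mean": mean, "std": std, "skewness": skew, "kurtosis": kurt}


def shannon_entropy(segment, bins=8):
    """Shannon entropy (in bits) of the amplitude histogram."""
    lo, hi = min(segment), max(segment)
    width = (hi - lo) / bins or 1.0        # avoid zero width for flat input
    counts = [0] * bins
    for x in segment:
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


# Synthetic stand-in for a clean 5 s segment sampled at 50 Hz (1.2 Hz pulse).
FS = 50
segment = [math.sin(2 * math.pi * 1.2 * t / FS) for t in range(5 * FS)]

features = statistical_features(segment)
features["shannon_entropy"] = shannon_entropy(segment)
```

The resulting dictionary is the kind of feature vector that a traditional classifier would receive for each segment.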

Selecting informative features with little or no redundancy is as important to the performance of traditional ML models as a successful feature extraction process [17]. Too many features may limit the model’s performance and increase its complexity, so examining the dependency of the target variable on each computed feature and, in turn, removing the redundant ones is crucial to building more accurate and efficient models [45]. Interestingly, only three studies applying traditional ML algorithms for the RSL detection of MAs in PPG signals report feature selection methods. Two of them [39,42] used the one-way analysis of variance (ANOVA) F-test, which assesses the discriminative power of each feature by calculating the ratio of between-class variance to within-class variance [46]. The other study [31] utilized a recursive feature elimination (RFE) technique, in which an initial set of features is used to train the learning model. After computing their relevance for the classification problem, some features are eliminated, and another feature set is tested until the algorithm achieves the highest accuracy with an optimal number of features. Several other feature selection methods, such as ReliefF [47] and minimum redundancy–maximum relevance (mRMR) [48], select features based on their importance to the target variable and their redundancy with each other. However, no study applying traditional ML models to detect MA-corrupted PPG segments reports these methods.
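The ANOVA F-test mentioned above reduces to a short calculation per feature. The sketch below is a minimal version with made-up feature values for a "clean" and a "corrupted" class; the function name `anova_f` is hypothetical.

```python
# One-way ANOVA F statistic for a single feature: the ratio of the
# between-class mean square to the within-class mean square.
# Feature values below are invented for illustration only.


def anova_f(groups):
    """Return the F statistic for a list of per-class value lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# A feature that separates the classes well versus one that barely does.
strong_feature = anova_f([[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]])   # 54.0
weak_feature = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])     # 1.5
```

Ranking features by this statistic and keeping the top-scoring ones is one simple way to apply the test as a selection step.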

4.1.3. Benefits and Drawbacks

Traditional ML algorithms are helpful for low-dimensional data input, especially when training data availability is limited [30]. As can be seen from Table 1, the SVM is arguably the most widely used ML model for detecting MAs in PPG signals with no reference or extra data. SVMs can deal with nonlinear relationships between the input and output features and provide high accuracy in exchange for a relatively short training time [26]. On the other hand, just like random forests and neural networks, SVMs have low interpretability, which means that they do not allow an understanding of how or why they produce a specific result [49,50].

One of the main disadvantages of traditional ML algorithms is their need for manual feature design. Handcrafted feature engineering is a time-consuming task that often requires considerable domain expertise within an application-specific engineering process [17]. In the context of MA detection, it implies a concrete definition of how motion alters the signal morphology, which may vary depending on the movement performed by the subject. As stated before, deep learning models overcome this limitation by extracting discriminative feature representations with minimal or no human effort. They have also proven valuable in dealing with noisy, unstructured, and high-dimensional data input.

4.2. Deep Learning Techniques

4.2.1. Characterization of Studies

Research applying deep learning techniques for the RSL detection of MAs in PPG signals is summarized in Table 2, with convolutional neural networks (CNNs) being the most widely adopted model. CNNs are fundamental structures employed in numerous fields of ML, especially those related to computer vision and image processing [28]. Two-dimensional (2-D) image data result in a much more powerful information representation, so several authors have considered transforming the one-dimensional PPG time series into images to exploit the current advantages of CNNs. One study by Liu and colleagues [51] used the Gramian Angular Field technique [52] to encode a PPG time series into images by calculating the polar coordinates of each point of the normalized time series. The resulting matrix was used as the input of a 2-D CNN classifier with ReLU as the activation function, achieving accuracies of 96.6% and 94.6% for local and publicly available datasets, respectively. Gramian Angular Fields have also been used by Suzuki and Freitas [53], who combined them with recurrence plots and Markov transition fields to obtain a hyperspectral image of 48 PPG signals extracted from twelve subjects. Several neural networks, including AlexNet, ResNet, SqueezeNet, and Swin Transformer, were tested, with most, except for ResNet, achieving high accuracies. Zargari and co-workers [54] applied a 1-D-to-2-D transformation to remove the noise induced by MAs employing a trained Cycle Generative Adversarial Network (Cycle-GAN). Nevertheless, most studies on the RSL detection of MAs report the utilization of 1-D CNNs [55,56,57,58,59,60], as they reduce the computational complexity and enable real-time classification. Shahid and colleagues [61] found that 1-D and 2-D CNNs classify time series data with a similar accuracy. Therefore, it is reasonable to expect that CNNs utilizing 1-D signals directly can achieve a classification performance comparable to 2-D CNNs.
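To make the Gramian Angular Field encoding more concrete, the following sketch builds the summation variant of the field for a short series. Which variant (summation or difference) and which rescaling range the cited studies used is not stated here, so both choices are assumptions, as is the function name.

```python
import math

# Gramian Angular Summation Field (GASF) sketch: rescale the series to
# [-1, 1], map each value to an angle via arccos (the angular polar
# coordinate), and form the matrix of cos(phi_i + phi_j).


def gramian_angular_summation_field(series):
    """Return the GASF of a non-constant 1-D series as a list of rows."""
    lo, hi = min(series), max(series)
    scaled = [2 * (x - lo) / (hi - lo) - 1 for x in series]
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]   # guard rounding
    phi = [math.acos(s) for s in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Tiny example: a 3-sample ramp produces a 3 x 3 "image".
image = gramian_angular_summation_field([0.0, 1.0, 2.0])
```

An N-sample segment therefore becomes an N x N matrix, which is what allows a 2-D CNN to be applied, at the cost of a quadratic growth in input size compared with feeding the 1-D signal directly.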

Unlike research on applying traditional ML techniques for the RSL detection of MAs, studies introducing deep learning-based approaches seem to concentrate on a narrower and more recent range of years. Possible explanations include drastically increased processing capabilities, the relative affordability of computing hardware, and recent advances in machine learning and information processing research [28,62].

Table 2. Summary of the deep learning-based approaches for the reference signal-less detection of motion artifacts in PPG signals.

Author(s), Year; [Reference]	Dataset	Method	Performance
Liu et al., 2020; [51]	Fifteen records selected from Physionet database + self-collected records (n = 15)	Two-dimensional CNN (supervised) + 10-fold CV	Physionet data (Sensitivity: 94.9%; Specificity: 97.8%; Accuracy: 94.3%); Self-collected data (Sensitivity: 93.5%; Specificity: 96.4%; Accuracy: 96.6%)
Goh et al., 2020; [55]	MIMIC II (n = 69) + self-collected records (n = 38)	One-dimensional CNN (supervised) + hold-out	Sensitivity: 96.6%; Specificity: 91.2%; Accuracy: 94.5%
Azar et al., 2021; [56]	Self-collected (n = 2)	CNN and long short-term memory (unsupervised) + hold-out	Sensitivity: 95%; Precision: 90%
Guo et al., 2021; [57]	PPG-DaLiA (n = 15), WESAD (n = 15), and IEEE-SPC 2015 datasets (n = 12)	One-dimensional CNN with U-net architecture (supervised) + 10-fold CV	DaLiA (F1-Score: 87.34 ± 0.18%); WESAD (F1-Score: 91.14 ± 0.33%); IEEE-SPC 2015 (F1-Score: 80.50 ± 1.16%)
Shin, 2022; [58]	MIMIC III (n = 458)	One-dimensional CNN (supervised) + five-fold CV	Sensitivity: 94.8%; Specificity: 99.3%; Precision: 98.5%; Accuracy: 97.8%; F1-Score: 96.9%; AUC-ROC: 98.0%
Zargari et al., 2023; [54]	Physionet-BIDMC (n = 53) + self-collected records (n = 33)	Two-dimensional Cycle Generative Adversarial Network (unsupervised) + hold-out	Peak-to-peak error: 0.95 beats per minute; RMSE: 2.18 beats per minute
Freitas et al., 2023; [63]	Self-collected (n = 46)	Vision transformer	Sensitivity: 93.38%; Precision: 94.85%; Accuracy: 92.21%; F1-Score: 94.11%
Lucafó et al., 2023; [59]	Self-collected (n = 46)	One-dimensional CNN and single-decision rule (supervised) + LOOCV	Sensitivity: 87.5 ± 0.4%; Precision: 97.1 ± 0.1%; Accuracy: 89.9 ± 0.2%; F1-Score: 92.0 ± 0.2%; AUC-ROC: 91.1 ± 0.1%
Liu et al., 2023; [64]	MIMIC III, UCI, and Queensland datasets (n = not reported)	Two-dimensional CNN and Swin Transformer + hold-out	Sensitivity: 97.4%; Specificity: 96.1%; Precision: 95.3%; Accuracy: 97.3%; F1-Score: 95.7%; AUC-ROC: 99.2%
Suzuki and Freitas, 2024; [53]	BUTPPG dataset (n = 12)	SqueezeNet + hold-out	Sensitivity: 97.9%; Precision: 94.4%; Accuracy: 93.8%; F1-Score: 95.5%
Zheng et al., 2024; [60]	Self-collected (n = 15)	Depth-wise separable 1-D CNN (supervised) + 10-fold CV	F1-Score: 87.20 ± 0.16%

As with research on traditional ML techniques applied to the RSL detection of MA-corrupted PPG segments, unsupervised deep learning models are less frequent than their supervised counterparts. Azar and co-workers [56] proposed a CNN–long short-term memory (LSTM)-based autoencoder to detect MAs in PPG signals. The method uses the discrete wavelet transform to convert redundant samples in the signal’s temporal domain into decorrelated coefficients in the time–frequency domain, thus allowing the original samples to be compressed and depicted with fewer coefficients. Reducing the input sequences’ length enabled the model to learn the same patterns while processing fewer data points, thus achieving a sensitivity and precision of 95% and 90%, respectively. Another unsupervised model successfully adapted for MA detection was the Cycle Generative Adversarial Network (Cycle-GAN). In that study [54], 2-D representations of MA-corrupted PPG signals are detected and reconstructed by the model, which relies on several convolutional layers followed by two fully connected layers that end in a Softmax activation to compute the probability of each class (i.e., clean or noisy). The method achieved a 9.5-times improvement in MA removal compared to other non-accelerometer-based approaches.
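The compression effect of the discrete wavelet transform described for the autoencoder-based method can be illustrated with a single decomposition level. The sketch assumes the Haar wavelet and one level purely for simplicity; the wavelet family and depth used in [56] are not specified in this review, and the function names are hypothetical.

```python
import math

# One level of the Haar DWT: pairs of neighboring samples are replaced by
# a scaled sum (approximation) and a scaled difference (detail). For a
# smooth signal, details are near zero, so the approximation alone
# represents the segment with half as many coefficients.


def haar_dwt_level(signal):
    """Return (approximation, detail) for an even-length signal."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_idwt_level(approx, detail):
    """Invert one Haar level, recovering the original samples."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out


# Made-up, piecewise-constant segment: all redundancy lands in zero details.
samples = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0]
approx, detail = haar_dwt_level(samples)
restored = haar_idwt_level(approx, detail)
```

Here eight samples are summarized by four approximation coefficients, which is the kind of input-length reduction that lets a sequence model process fewer data points.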

4.2.2. Automated Feature Learning

As previously pointed out, deep learning algorithms require minimal or no manual feature engineering, as they can automatically learn patterns from inputs and store them as the parameters of network connections [65]. Deep learning-based approaches eliminate the need for user-specific parameter adjustments, and they do not involve fiducial point detection or empirically defined features that may not hold across subjects. On the other hand, data diversity plays a significant role in how accurate deep learning-based methods can be. Misclassification may occur if models are trained with too few arrhythmia-affected PPG samples [55] or with data lacking skin color diversity [57]. In this sense, more comprehensive datasets are necessary to produce more robust deep learning-based methods for MA detection.

5. Discussion

Commonly used metrics to evaluate the performance of ML-based classification algorithms include sensitivity, specificity, precision, accuracy, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). However, the metrics used to measure the effectiveness of the reviewed MA detection methods differ (see Table 1 and Table 2). Furthermore, authors used diverse datasets that include self-collected records [31,33,36,37,38,40,41,42,51,54,55,56,59,60,63], and dataset-splitting techniques are also different. This heterogeneity makes it quite difficult to systematically compare the reviewed ML-based approaches for the RSL detection of MAs, which underlines the need for well-established reference datasets and more consistency in reporting the approaches’ effectiveness. Performance differences will genuinely reflect the effectiveness of the algorithms, rather than the peculiarities of the abovementioned aspects, only when authors use the same datasets, data-splitting methods, and assessment metrics.
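For reference, the threshold-based metrics named above all derive from the four cells of a binary confusion matrix. The sketch below treats "MA-corrupted" as the positive class (label 1), which is an assumption, and uses invented labels; AUC-ROC is omitted because it requires continuous scores rather than hard decisions.

```python
# Confusion-matrix metrics for a binary MA detector.
# Label convention (assumed): 1 = MA-corrupted, 0 = clean.


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for two equally long label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Compute the usual metrics; assumes no denominator is zero."""
    sensitivity = tp / (tp + fn)           # recall of corrupted segments
    specificity = tn / (tn + fp)           # recall of clean segments
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


# Invented ground truth and predictions for ten segments.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(*confusion_counts(y_true, y_pred))
```

Reporting the full set, or the confusion matrix itself, lets readers recompute whichever metric another study happened to use.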

The following subsections critically discuss state-of-the-art RSL methods for detecting MAs in PPG signals regarding the implications of removing or de-noising MA-corrupted PPG segments, experimental design, testing protocols, and the methods’ applicability for real-time implementations.

5.1. Risk of Bias in Evaluating and Reporting the Method’s Effectiveness

One crucial aspect of ML-based approaches is their ability to perform well with data different from those used for training (i.e., generalization). In this sense, it is necessary to use new, unseen data to evaluate the model’s performance; otherwise, results would be biased, depending on the model’s complexity [66]. One alternative is to divide the whole dataset into two sets: one for training the model and another for evaluation or testing. Whereas the training set allows the ML model to “learn” by updating its parameters, the test set provides a means to measure how the model reacts to new observations. In the context of the RSL detection of MA-corrupted PPG data, several strategies involving dataset splitting into training and test sets (e.g., hold-out, leave-one-out cross-validation (LOOCV), and k-fold cross-validation) have been used (see Table 1 and Table 2). However, more than half of the reviewed studies [31,32,34,37,38,40,41,42,51,53,58,59,63] do not provide clear-cut evidence indicating the utilization of unseen data (i.e., a test set) to evaluate the method’s performance. Furthermore, some articles [32,33,38,40,41,42,63] do not even report the strategy adopted to validate the proposed method or whether the training and test set distribution is the same. When authors do not address these aspects, there is a risk of producing ML-based models that are extraordinarily good on paper but disappointingly average when facing unseen data [67]. Failing to outline such aspects adequately may also cast doubt on the quality and credibility of this body of work.
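The splitting strategies named above differ only in how the test portion is carved out. The sketch below generates k-fold splits over subject identifiers, with LOOCV as the special case where k equals the number of subjects. Splitting by subject rather than by segment is an assumption made here to keep the test data genuinely unseen; the reviewed studies do not necessarily split this way, and the subject IDs are made up.

```python
# k-fold cross-validation over subject IDs. Every subject appears in
# exactly one test fold, and never in the training set of that fold.


def k_fold_splits(items, k):
    """Yield (train, test) lists for each of the k folds."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


subjects = ["S01", "S02", "S03", "S04", "S05", "S06"]   # hypothetical IDs

three_fold = list(k_fold_splits(subjects, 3))            # k-fold CV
loocv = list(k_fold_splits(subjects, len(subjects)))     # leave-one-out

# A simple hold-out split is just one (train, test) pair.
holdout_train, holdout_test = subjects[:4], subjects[4:]
```

Whatever strategy is chosen, stating it explicitly, together with how the test set was kept apart from training, is what allows readers to judge the reported performance.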

Another common pitfall of research applying ML techniques is considering single rather than multiple performance assessment metrics. Although most reviewed literature relies on several metrics when reporting the method’s performance, some studies [31,37,57,60] only considered one or two metrics (e.g., accuracy, F1-score, or both), thus hindering systematic comparisons between approaches. Accuracy can be a misleading metric for imbalanced datasets, and the F1-score, while providing a balanced view of sensitivity and precision, can be produced by multiple combinations of these two metrics [68]. Even for highly accurate ML algorithms, trade-offs are inevitable, so to help readers intuitively explore them and the implications of the model making errors, it is necessary to report not one but several well-established performance metrics. Providing information like that displayed by confusion matrices and receiver operating characteristic (ROC) curves has also proven valuable in this context.
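Both pitfalls can be reproduced with toy numbers. In the sketch below, all values are invented: the first part shows two very different detectors sharing one F1-score, and the second shows a detector that never flags an artifact yet reaches 95% accuracy on an imbalanced set.

```python
# Part 1: the same F1-score from different precision/sensitivity pairs.


def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


f1_cautious = f1_score(precision=1.0, sensitivity=0.5)   # misses half the MAs
f1_eager = f1_score(precision=0.5, sensitivity=1.0)      # half its alarms are false

# Part 2: accuracy on an imbalanced dataset (1 = MA-corrupted, 0 = clean).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100                      # a detector that always says "clean"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
sensitivity = detected / sum(y_true)
```

Here `f1_cautious` and `f1_eager` are identical (about 0.667), while `accuracy` is 0.95 with a `sensitivity` of 0.0, which is why a single number is rarely enough to characterize an MA detector.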

5.2. Publicly Available versus Self-Collected Records and the Experimental Design Diversity

As shown in Table 1 and Table 2, authors often employ publicly available records for benchmarking their research on MA detection methods. Data sources include the well-known Physionet’s Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) dataset [69] and the subset containing physiological recordings from 53 patients admitted to medical and surgical intensive care units at the Beth Israel Deaconess Medical Center (BIDMC) [70], followed by the Capnobase [71], 2015 IEEE Signal Processing Cup (SPC) Challenge [72], WESAD [73], and PPG-DaLiA [74] datasets. While publicly available datasets allow a fairer comparison among methods, PPG records from most of the abovementioned ones may not reflect disturbances produced by physical activities like walking, as they come from hospitalized and ICU patients. For this reason, many researchers collect their own data, sometimes utilizing proprietary equipment and software from manufacturers like Masimo (Masimo Corporation, Irvine, CA, USA), Sotera Digital Health (Sotera Inc., Carlsbad, CA, USA), and BIOPAC (BIOPAC Systems Inc., Goleta, CA, USA), or PPG data acquisition prototypes developed by themselves.

Concerning the strategies employed to contaminate self-collected PPG signals with MAs, some studies report that participants were asked to wear PPG-based devices (e.g., smart watches) and perform specific actions ranging from random, intermittent finger or hand movements [31,33,41,54,56] to standing, walking, climbing stairs, and running on a treadmill [32,34,37,51]. However, information on the duration of stationary and non-stationary conditions is generally missing. In some articles, the participants could engage in daily activities while they wore the PPG acquisition device, but there were no details about the movements performed [38,42,59]. Only a few studies [34,40,51,54] report well-structured protocols (i.e., a timeline describing what actions the participants performed and for how long), which underlines the need for a standardized set of experiments specifically designed to test and validate MA detection approaches.

5.3. Implications of Ignoring or De-Noising MA-Corrupted Signal Segments for PPG-Based Physiological Monitoring

MAs may interfere with accurate PPG-based vital sign computation, so approaches able to differentiate clean from MA-corrupted signal segments may help to determine whether the value of the computed physiological index is reliable. Once the algorithm labels a PPG segment as MA-corrupted, it might be ignored or corrected using well-known or novel filtering techniques. Most of the reviewed RSL approaches have focused on assessing the usability of each signal segment [31,33,34,35,36,38,39,40,41,42,51,53,55,58,59,60,63,64], which may improve the reliability of vital sign estimation (see Figure 3a). Moreover, ignoring distorted signal segments could save computational resources and significantly reduce computing time. However, it also may lead to information loss if the subject’s movements extend for long periods or are very frequent. In this regard, RSL methods for identifying and de-noising MA-corrupted PPG segments could be more convenient for continuous PPG-based physiological monitoring. Interestingly, only a few studies fall into this subcategory [32,41,54,56].

5.4. Body Site Measurement

PPG-based wearable devices can operate more effectively when placed on specific body parts that do not restrict the subjects’ motion or manual activities, such as the wrist and earlobe [75]. Nevertheless, most reviewed RSL methods for MA detection rely on finger PPG data (see Figure 3b). Other body measurement sites have barely been considered, with just six [31,36,42,57,60,63], two [34,37], and one article [60] reporting the utilization of PPG data from the wrist, forehead, and earlobe, respectively. Significant differences between morphological features of finger PPG waveforms and those collected from the wrist, arm, earlobe, and forehead have been found [76]. Therefore, ML-based approaches for the RSL detection of MAs relying on finger PPG signals may not perform well in other body locations, thus limiting their applicability in the wearable industry.

5.5. The Promise of Real-Time Processing

Despite providing higher accuracies than rule-based methods, ML-based approaches for the RSL detection of MA-corrupted PPG signals are more computationally intensive, so they might not appear amenable to real-time implementation. Still, some attempts at online MA detection through ML techniques have been successful. Pflugradt and colleagues [40] introduced a system for Online Pulse Reliability Analysis (OPRA) using principal component analysis (PCA) and a single-layer perceptron, which is one of the most fundamental neural networks that can be implemented for solving binary classification tasks [77]. Athaya and Choi [39] used a previously trained random forest model to develop a standalone real-time application to detect MAs from the PPG signal captured by the rear camera of an Android smartphone (Google LLC, Mountain View, CA, USA). The unsupervised Cycle-GAN model proposed by Zargari and co-workers [54] takes roughly 0.4 s to detect and clean an MA-corrupted PPG signal. It also uses 45% less power than an accelerometer-based MA removal approach when implemented in a Raspberry Pi 4 (Raspberry Pi Foundation, Cambridge, UK) device for five minutes.
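To give an idea of how lightweight a single-layer perceptron is, the sketch below trains one with the classic perceptron learning rule on a handful of two-dimensional feature vectors. It is not the OPRA implementation: the PCA step is omitted, and the features, labels, and hyperparameters are hypothetical values chosen only to make the example linearly separable.

```python
def train_perceptron(samples, targets, epochs=20, lr=1.0):
    """Perceptron learning rule for binary targets in {0, 1}."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, targets):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            update = lr * (target - predicted)
            if update != 0:
                # Move the decision boundary towards the misclassified sample.
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (MA-corrupted) or 0 (clean) for one feature vector."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical, normalised features per segment; label 1 = MA-corrupted.
X_TRAIN = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
           (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
Y_TRAIN = [0, 0, 0, 1, 1, 1]

WEIGHTS, BIAS = train_perceptron(X_TRAIN, Y_TRAIN)
```

At inference time, the model needs only one multiply-accumulate per feature plus a comparison, which is why such classifiers are attractive for online use.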

One critical step for implementing ML-based approaches in embedded devices is the training phase, as the computation runtime and memory consumption may increase considerably during model training. In this regard, some researchers have exploited the benefits of cloud computing to save time, memory, and power consumption by training their learning algorithms in the cloud instead of the device itself [42,60]. In another study [36], a semi-supervised one-class SVM (OCSVM) was developed for online differentiation between reliable and unreliable PPG segments. Unlike conventional SVM, OCSVM solely employs data from one class for training [78], considerably reducing computational resource usage and power consumption. All these approaches enable the utilization of the same user’s data in the training phase and real-time identification of MA-corrupted PPG segments.

While there have been remarkable efforts to introduce real-time MA detection methods using ML techniques, more than half of the reviewed literature has focused on offline processing and provides no information concerning the method’s suitability for real-time implementations. Several authors of RSL approaches for MA detection in PPG signals report processing time and power consumption (see Table 3). However, these measures may not be enough to claim that the method is suitable for real-time applications. One study by Faust and colleagues [79] suggests that computational complexity and speed-up are more appropriate measures to support the real-time claim. The computational complexity relates to the number of operations executed by the algorithm as a function of the input data size [80,81], and the speed-up refers to how much faster a parallel implementation of an algorithm is than its sequential implementation [82]. None of the reviewed studies reports such measures.
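The speed-up defined above is the ratio between the sequential and the parallel execution times of the same algorithm. A minimal sketch of how it could be computed and reported follows; the parallel efficiency (speed-up divided by the number of workers) is added here as a common companion figure, and the timings in the example are hypothetical.

```python
def speedup(t_sequential, t_parallel):
    """Speed-up of a parallel implementation over its sequential version."""
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_sequential / t_parallel


def parallel_efficiency(t_sequential, t_parallel, n_workers):
    """Speed-up per worker; 1.0 would mean perfect scaling."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return speedup(t_sequential, t_parallel) / n_workers


# Hypothetical timings: 120 ms sequentially, 40 ms on four cores.
example_speedup = speedup(120.0, 40.0)
example_efficiency = parallel_efficiency(120.0, 40.0, 4)
```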

The computational complexity is expressed using big O notation and may help researchers and developers determine whether a specific method is computationally suitable for real-time implementations. For instance, MA detection and removal approaches involve several signal decomposition techniques, such as principal component analysis (PCA) and empirical mode decomposition (EMD), which have been deemed computationally expensive in the past. However, Wang and co-workers [83] showed that the EMD’s complexity is equivalent to that of the fast Fourier transform (i.e., O(N log N), where N is the size of the vector to be processed). Processing algorithms with a computational complexity of O(N log N) have been found suitable for real-time implementations [84], so it would be feasible to use EMD (and several other processing techniques) in real-time scenarios.
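To see why the order of growth matters, the sketch below compares how the nominal operation count changes when the input size doubles for an O(N log N) algorithm and, as a contrasting assumption, for an O(N²) one. Constant factors are ignored, so the figures describe scaling only and say nothing about absolute run time on a given platform.

```python
import math


def ops_n_log_n(n):
    """Nominal operation count of an O(N log N) algorithm (constants ignored)."""
    return n * math.log2(n)


def ops_quadratic(n):
    """Nominal operation count of an O(N^2) algorithm (constants ignored)."""
    return n * n


def growth_factor(cost, n, scale=2):
    """Factor by which the operation count grows when n is multiplied by scale."""
    return cost(scale * n) / cost(n)


# Doubling a hypothetical 1024-sample window to 2048 samples.
growth_fft_like = growth_factor(ops_n_log_n, 1024)     # about 2.2
growth_quadratic = growth_factor(ops_quadratic, 1024)  # 4.0
```

Under these assumptions, doubling the window length slightly more than doubles the work for the O(N log N) case but quadruples it for the quadratic case.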

5.6. Recommendations for Future Endeavors

After identifying several challenges in the current literature on RSL methods for MA detection in PPG signals employing ML algorithms, what we consider valuable in addressing those issues is outlined below:

Studies must provide clear-cut evidence of using new, unseen data to evaluate the proposed method and, thus, deliver an unbiased and realistic estimate of its performance. Furthermore, authors should use not one but several well-established performance metrics, as well as information such as that provided by confusion matrices and receiver operating characteristic (ROC) curves, when reporting the effectiveness of their proposed approaches;
When relying on self-collected data, authors should include a timeline describing the actions (e.g., walking on a treadmill) performed by the participants and for how long. PPG recordings must contain manual annotations identifying MA-corrupted pulses or segments. In addition, all data necessary to replicate findings should be made publicly available, as suggested by an increasing number of journals and conferences;
Given the increasing utilization of smartwatches and several other wearable devices for PPG-based physiological monitoring [75,85], researchers should design and develop their MA-detecting approaches around PPG data from body parts like the wrist, forehead, and earlobe instead of being limited to data from the fingertips;
Authors should report objective measures, such as computational complexity and speed-up, to support the method’s suitability for real-time applications. Studies that have quantified the computational complexity of several signal decomposition and processing techniques may provide some insights into assessing the method’s suitability for identifying MA-corrupted PPG segments in real time.
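Regarding the first recommendation, several of the metrics listed in Table 1 can be derived from a single confusion matrix. The following minimal sketch assumes that the positive class is "MA-corrupted" and that the counts come from a held-out test set; the function name and the example counts are hypothetical.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Common metrics from confusion-matrix counts (positive = MA-corrupted)."""
    sensitivity = _ratio(tp, tp + fn)   # corrupted segments correctly flagged
    specificity = _ratio(tn, tn + fp)   # clean segments correctly kept
    precision = _ratio(tp, tp + fp)     # flagged segments truly corrupted
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1_score = _ratio(2 * tp, 2 * tp + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1_score": f1_score,
    }


# Hypothetical counts for 300 test segments.
example_metrics = classification_metrics(tp=90, fp=20, tn=180, fn=10)
```

Reporting the four raw counts together with the derived metrics lets readers recompute any figure that a study leaves out.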

5.7. Limitations

A systematic comparison between studies could not be performed because their authors used different datasets, data-splitting methods, and performance assessment metrics. We may not have included enough recent studies, and some potentially relevant documents might have been excluded because of potential search bias (e.g., only papers in English published from 2014 to 2024 were included). We also did not conduct a quality assessment of the reviewed articles. Therefore, the implications and recommendations herein are limited by the lack of a standard methodological quality evaluation of the included papers.

6. Conclusions

Accurate identification of PPG sequences contaminated with MAs is crucial for efficient MA removal and preserving good-quality segments, and ML techniques have brought outstanding progress in the field. Nevertheless, no previous study has provided an in-depth discussion of these methods. This narrative review synthesized the current state-of-the-art approaches applying ML algorithms to detect MAs in PPG signals with no other information than that provided by the signal itself. Even though there are only a few datasets where MAs are labeled in each PPG signal, supervised learning models are more frequent than their unsupervised counterparts, with SVM and CNN being the most widely used for RSL MA detection. While traditional and deep learning-based methods show comparable performances, the latter could be more convenient for overcoming the limitations of handcrafted feature engineering. However, limitations and flaws in the current literature, particularly those regarding the model development and testing process and the measures used by authors to support the real-time claim, may prevent drawing firm conclusions about the reliability and applicability of these approaches. There is also a need for broader exploration and validation across different body parts to ensure the robustness and versatility of RSL methods for detecting MAs in PPG signals. A standardized set of experiments designed to test and validate these approaches is also necessary. Future efforts should consider providing enough elements to enable researchers and developers to get the most out of these methods.

Author Contributions

E.J.A.-P.: Conceptualization, Methodology, Investigation, Writing—original draft. J.F.C.G.: Conceptualization, Methodology, Investigation, Visualization, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No data were generated in this work.

Acknowledgments

This research has been funded by Dirección General de Investigaciones of Universidad Santiago de Cali under call No. DGI 01-2024. The authors would like to express their gratitude to Danny Aurora Valera Bermúdez for her valuable comments on the early and final drafts of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

1-D	One-dimensional
2-D	Two-dimensional
AI	Artificial intelligence
ANOVA	Analysis of variance
CNN	Convolutional neural network
DL	Deep learning
EMD	Empirical mode decomposition
GAN	Generative Adversarial Network
HR	Heart rate
ICU	Intensive care unit
K-fold CV	K-fold cross-validation
LOOCV	Leave-one-out cross-validation
LSTM	Long short-term memory
MA	Motion artifact
ML	Machine learning
mRMR	Minimum redundancy–maximum relevance
OCSVM	One-class support vector machine
PCA	Principal component analysis
PPG	Photoplethysmogram
RFE	Recursive feature elimination
RMSE	Root mean square error
ROC	Receiver operating characteristic
RSL	Reference signal-less
SOM	Self-organizing map
SQI	Signal quality index
SpO2	Peripheral oxygen saturation
SVM	Support vector machine

References

1. Koteska, B.; Bodanova, A.M.; Mitrova, H.; Sidorenko, M.; Lehocki, F. A deep learning approach to estimate SpO2 from PPG signals. In Proceedings of the 9th International Conference on Bioinformatics Research and Applications, Berlin, Germany, 18–20 September 2022; pp. 142–148.
2. Argüello-Prada, E.J.; Bolaños, S.M. On the role of perfusion index for estimating blood glucose levels with ultrasound-assisted and conventional finger photoplethysmography in the near-infrared wavelength range. Biomed. Signal Process. Control 2023, 86, 105338.
3. Gupta, S.; Singh, A.; Sharma, A. Exploiting moving slope features of PPG derivatives for estimation of mean arterial pressure. Biomed. Eng. Lett. 2023, 13, 1–9.
4. Heikenfeld, J.; Jajack, A.; Rogers, J.; Gutruf, P.; Tian, L.; Pan, T.; Li, R.; Khine, M.; Kim, J.; Wang, J.; et al. Wearable sensors: Modalities, challenges, and prospects. Lab Chip 2018, 18, 217–248.
5. Seok, D.; Lee, S.; Kim, M.; Cho, J.; Kim, C. Motion artifact removal techniques for wearable EEG and PPG sensor systems. Front. Electron. 2021, 2, 685513.
6. Stoica, P.; Moses, R.L. Spectral Analysis of Signals; Pearson/Prentice Hall: Upper Saddle River, NJ, USA, 2005.
7. Pollreisz, D.; TaheriNejad, N. Detection and removal of motion artifacts in PPG signals. Mobile Netw. Appl. 2022, 27, 728–738.
8. Ismail, S.; Akram, U.; Siddiqi, I. Heart rate tracking in photoplethysmography signals affected by motion artifacts: A review. EURASIP J. Adv. Signal Process. 2021, 2021, 5.
9. Such, O. Motion tolerance in wearable sensors-The challenge of motion artifact. In Proceedings of the 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Lyon, France, 22–26 August 2007; pp. 1542–1545.
10. Nabavi, S.; Bhadra, S. A robust fusion method for motion artifacts reduction in photoplethysmography signal. IEEE Trans. Instrum. Meas. 2020, 69, 9599–9608.
11. Tăuţan, A.M.; Young, A.; Wentink, E.; Wieringa, F. Characterization and Reduction of Motion Artifacts in Photoplethysmographic Signals from a Wrist-worn Device. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 6146–6149.
12. Zhang, Y.; Song, S.; Vullings, R.; Biswas, D.; Simões-Capela, N.; Van Helleputte, N.; Van Hoff, C.; Groenendaal, W. Motion Artifact Reduction for Wrist-Worn Photoplethysmograph Sensors Based on Different Wavelengths. Sensors 2019, 19, 673.
13. Hayes, M.J.; Smith, P.R. A New Method for Pulse Oximetry Possessing Inherent Insensitivity to Artifact. IEEE Trans. Biomed. Eng. 2001, 48, 452–461.
14. Ram, M.R.; Madhav, V.; Krishna, E.H.; Komalla, N.R.; Reddy, K.A. A Novel Approach for Motion Artifact Reduction in PPG Signals Based on AS-LMS Adaptive Filter. IEEE Trans. Instrum. Meas. 2012, 61, 1445–1457.
15. Raghuram, M.; Sivani, K.; Reddy, K.A. Use of complex EMD generated noise reference for adaptive reduction of motion artifacts from PPG signal. In Proceedings of the 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), Chennai, India, 3–5 March 2016.
16. Kumar, A.; Komaragiri, R.; Kumar, M. A review on computation methods used in photoplethysmography signal analysis for heart rate estimation. Arch. Comput. Methods Eng. 2022, 29, 921–940.
17. Janiesch, C.; Zschech, P.; Heinrich, K. Machine learning and deep learning. Electron. Markets 2021, 31, 685–695.
18. Allen, J. Photoplethysmography and its application in clinical physiological measurement. Physiol. Meas. 2007, 28, R1–R39.
19. Alian, A.A.; Shelley, K.H. Photoplethysmography. Best Pract. Res. Clin. Anaesthesiol. 2014, 28, 395–406.
20. Lim, P.K.; Ng, S.C.; Lovell, N.H.; Yu, Y.P.; Tan, M.P.; McCombie, D.; Lim, E.; Redmond, S.J. Adaptive template matching of photoplethysmogram pulses to detect motion artefact. Physiol. Meas. 2018, 39, 105005.
21. Vadrevu, S.; Manikandan, M.S. Real-time PPG signal quality assessment system for improving battery life and false alarms. IEEE Trans. Circuits Syst. II Express Briefs. 2019, 66, 1910–1914.
22. Reddy, G.N.K.; Manikandan, M.S.; Murty, N.N. On-device integrated PPG quality assessment and sensor disconnection/saturation detection system for IoT health monitoring. IEEE Trans. Instrum. Meas. 2020, 69, 6351–6361.
23. Elgendi, M. Optimal signal quality index for photoplethysmogram signals. Bioengineering 2016, 3, 21.
24. Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach, 4th ed.; Pearson: London, UK, 2021.
25. Bishop, C.M. Pattern Recognition and Machine Learning—Information Science and Statistics; Springer: New York, NY, USA, 2006.
26. Kotsiantis, S.B.; Zaharakis, I.; Pintelas, P. Supervised machine learning: A review of classification techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 2007, 160, 3–24.
27. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
28. Shinde, P.P.; Shah, S. A review of machine learning and deep learning applications. In Proceedings of the 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 16–18 August 2018; pp. 1–6.
29. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
30. Zhang, Y.; Ling, C. A strategy to apply machine learning to small datasets in materials science. Npj Comput. Mater. 2018, 4, 25.
31. Longjie, L.; Abeysekera, S.S. Motion Artefact Removal using Single Beat Classification of Photoplethysmographic Signals. In Proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 26–29 May 2019; pp. 1–4.
32. Karna, V.R.; Kumar, N. Determination of Absolute Heart Beat from Photoplethysmographic Signals in the Presence of Motion Artifacts. In Proceedings of the 2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC), Bangalore, India, 9–10 February 2018; pp. 1–5.
33. Subhagya, D.S.; Keshavamurth, C. Motion Artifact Detection Model using Machine Learning Technique for Classifying Abnormalities in Human Being. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 334–340.
34. Dao, D.; Salehizadeh, S.M.; Noh, Y.; Chong, J.W.; Cho, C.H.; McManus, D.; Darling, C.E.; Mendelson, Y.; Chon, K.H. A robust motion artifact detection algorithm for accurate detection of heart rates from photoplethysmographic signals using time–frequency spectral features. IEEE J. Biomed. Health Inform. 2016, 21, 1242–1253.
35. Sabeti, E.; Reamaroon, N.; Mathis, M.; Gryak, J.; Sjoding, M.; Najarian, K. Signal quality measure for pulsatile physiological signals using morphological features: Applications in reliability measure for pulse oximetry. Inform. Med. Unlocked 2019, 16, 100222.
36. Feli, M.; Azimi, I.; Anzanpour, A.; Rahmani, A.M.; Liljeberg, P. An energy-efficient semi-supervised approach for on-device photoplethysmogram signal quality assessment. Smart Health 2023, 28, 100390.
37. Chong, J.W.; Dao, D.K.; Salehizadeh, S.M.A.; McManus, D.D.; Darling, C.E.; Chon, K.H.; Mendelson, Y. Photoplethysmograph signal reconstruction based on a novel hybrid motion artifact detection–reduction approach. Part I: Motion and noise artifact detection. Ann. Biomed. Eng. 2014, 42, 2238–2250.
38. Oliveira, L.C.; Lai, Z.; Geng, W.; Siefkes, H.; Chuah, C.N. A machine learning driven pipeline for automated Photoplethysmogram signal artifact detection. In Proceedings of the 2021 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), Washington, DC, USA, 16–18 December 2021; pp. 149–154.
39. Athaya, T.; Choi, S. An efficient fingertip photoplethysmographic signal artifact detection method: A machine learning approach. J. Sens. 2021, 2021, 9925033.
40. Pflugradt, M.; Moeller, B.; Orglmeister, R. OPRA: A fast on-line signal quality estimator for pulsatile signals. IFAC Pap. 2015, 48, 459–464.
41. Roy, M.S.; Gupta, R.; Sharma, K.D. Photoplethysmogram signal quality evaluation by unsupervised learning approach. In Proceedings of the 2020 IEEE Applied Signal Processing Conference (ASPCON), Kolkata, India, 7–9 October 2020; pp. 6–10.
42. Mahmoudzadeh, A.; Azimi, I.; Rahmani, A.M.; Liljeberg, P. Lightweight photoplethysmography quality assessment for real-time IoT-based health monitoring using unsupervised anomaly detection. Procedia Comput. Sci. 2021, 184, 140–147.
43. Kohonen, T. Self-Organizing Maps; Springer: Berlin, Germany, 2001.
44. Shriram, S.; Sivasankar, E. Anomaly detection on shuttle data using unsupervised learning techniques. In Proceedings of the 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), Dubai, United Arab Emirates, 11–12 December 2019; pp. 221–225.
45. Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79.
46. Gu, Q.; Li, Z.; Han, J. Generalized Fisher Score for Feature Selection. 2012. Available online: http://arxiv.org/abs/1202.3725 (accessed on 5 August 2024).
47. Robnik-Šikonja, M.; Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69.
48. Yu, L.; Liu, H. Efficient feature selection via analysis of relevance and redundancy. J. Mach. Learn. Res. 2004, 5, 1205–1224. Available online: https://www.jmlr.org/papers/volume5/yu04a/yu04a.pdf (accessed on 20 August 2024).
49. Carvalho, D.V.; Pereira, E.M.; Cardoso, J.S. Machine learning interpretability: A survey on methods and metrics. Electronics 2019, 8, 832.
50. Zihni, E.; Madai, V.I.; Livne, M.; Galinovic, I.; Khalil, A.A.; Fiebach, J.B.; Frey, D. Opening the black box of artificial intelligence for clinical decision support: A study predicting stroke outcome. PLoS ONE 2020, 15, e0231166.
51. Liu, X.; Hu, Q.; Yuan, H.; Yang, C. Motion artifact detection in PPG signals based on Gramian angular field and 2-D-CNN. In Proceedings of the 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Chengdu, China, 17–19 October 2020; pp. 743–747.
52. Wang, Z.; Oates, T. Imaging time-series to improve classification and imputation. arXiv 2015, arXiv:1506.00327.
53. Suzuki, G.; Freitas, P.G. On the Performance of Composite 1D-to-2D Projections for Signal Quality Assessment. In Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS); SBC: Vancouver, WA, USA, 2024; pp. 319–330.
54. Zargari, A.H.A.; Aqajari, S.A.H.; Khodabandeh, H.; Rahmani, A.; Kurdahi, F. An accurate non-accelerometer-based ppg motion artifact removal technique using cyclegan. ACM Trans. Comput. Healthc. 2023, 4, 1–14.
55. Goh, C.H.; Tan, L.K.; Lovell, N.H.; Ng, S.C.; Tan, M.; Lim, E. Robust PPG motion artifact detection using a 1-D convolution neural network. Comput. Methods Programs Biomed. 2020, 196, 105596.
56. Azar, J.; Makhoul, A.; Couturier, R.; Demerjian, J. Deep recurrent neural network-based autoencoder for photoplethysmogram artifacts filtering. Comput. Electr. Eng. 2021, 92, 107065.
57. Guo, Z.; Ding, C.; Hu, X.; Rudin, C. A supervised machine learning semantic segmentation approach for detecting artifacts in plethysmography signals from wearables. Physiol. Meas. 2021, 42, 125003.
58. Shin, H. Deep convolutional neural network-based signal quality assessment for photoplethysmogram. Comput. Biol. Med. 2022, 145, 105430.
59. Lucafó, G.D.; Freitas, P.; Lima, R.; da Luz, G.; Bispo, R.; Rodrigues, P.; Cabello, F.; Penatti, O. Signal quality assessment of photoplethysmogram signals using hybrid rule-and learning-based models. J. Health Inform. 2023, 15.
60. Zheng, Y.; Wu, C.; Cai, P.; Zhong, Z.; Huang, H.; Jiang, Y. Tiny-PPG: A lightweight deep neural network for real-time detection of motion artifacts in photoplethysmogram signals on edge devices. Internet Things 2024, 25, 101007.
61. Shahid, S.M.; Ko, S.; Kwon, S. Performance comparison of 1d and 2d convolutional neural networks for real-time classification of time series sensor data. In Proceedings of the 2022 International Conference on Information Networking (ICOIN), Jeju-si, Republic of Korea, 12–15 January 2022; pp. 507–511.
62. Ahmed, S.F.; Alam, M.S.B.; Hassan, M.; Rozbu, M.R.; Ishtiak, T.; Rafa, N.; Mofjur, M.; Shawkat Ali, A.B.M.; Gandomi, A.H. Deep learning modelling techniques: Current progress, applications, advantages, and challenges. Artif. Intell. Rev. 2023, 56, 13521–13617.
63. Freitas, P.G.; De Lima, R.G.; Lucafo, G.D.; Penatti, O.A. Assessing the quality of photoplethysmograms via gramian angular fields and vision transformer. In Proceedings of the 2023 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, 4–8 September 2023; pp. 1035–1039.
64. Liu, J.; Hu, S.; Hu, Q.; Wang, D.; Yang, C. A Lightweight Hybrid Model Using Multiscale Markov Transition Field for Real-Time Quality Assessment of Photoplethysmography Signals. IEEE J. Biomed. Health Inform. 2023, 28, 1078–1088.
65. Zhang, A.; Lipton, Z.C.; Li, M.; Smola, A.J. Dive into Deep Learning; Cambridge University Press: Cambridge, UK, 2023.
66. Hastie, T.; Tibshirani, R.; Friedman, J. Overview of Supervised Learning. In The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009; pp. 9–42.
67. Vabalas, A.; Gowen, E.; Poliakoff, E.; Casson, A.J. Machine learning algorithm validation with a limited sample size. PLoS ONE 2019, 14, e0224365.
68. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
69. Saeed, M.; Villarroel, M.; Reisner, A.T.; Clifford, G.; Lehman, L.W.; Moody, G.; Heldt, T.; Kyaw, T.H.; Moody, B.; Mark, R.G. Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): A public-access intensive care unit database. Crit. Care Med. 2011, 39, 952–960.
70. Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, e215–e220.
71. Karlen, W.; Turner, M.; Cooke, E.; Dumont, G.; Ansermino, J.M. CapnoBase: Signal database and tools to collect, share and annotate respiratory signals. In Proceedings of the 2010 Annual Meeting of the Society for Technology in Anesthesia, San Diego, CA, USA, 16–20 October 2010; p. 27.
72. Zhang, Z.; Pi, Z.; Liu, B. TROIKA: A general framework for heart rate monitoring using wrist-type photoplethysmographic signals during intensive physical exercise. IEEE Trans. Biomed. Eng. 2015, 62, 522–531.
73. Schmidt, P.; Reiss, A.; Duerichen, R.; Marberger, C.; Van Laerhoven, K. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, Paris, France, 9–13 October 2018; pp. 400–408.
74. Reiss, A.; Indlekofer, I.; Schmidt, P.; Van Laerhoven, K. Large-scale heart rate estimation with convolutional neural networks. Sensors 2019, 19, 3079.
75. Castaneda, D.; Esparza, A.; Ghamari, M.; Soltanpur, C.; Nazeran, H. A review on wearable photoplethysmography sensors and their potential future applications in health care. Int. J. Biosens. Bioelectron. 2018, 4, 195.
76. Hartmann, V.; Liu, H.; Chen, F.; Qiu, Q.; Hughes, S.; Zheng, D. Quantitative comparison of photoplethysmographic waveform characteristics: Effect of measurement site. Front. Physiol. 2019, 10, 198.
77. Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall: Englewood Cliffs, NJ, USA, 1999.
78. Schölkopf, B.; Williamson, R.C.; Smola, A.; Shawe-Taylor, J.; Platt, J. Support vector method for novelty detection. Adv. Neural Inf. Process. Syst. 1999, 12, 582–588.
79. Faust, O.; Yu, W.; Acharya, U.R. The role of real-time in biomedical science: A meta-analysis on computational complexity, delay and speedup. Comput. Biol. Med. 2015, 58, 73–84.
80. Knuth, D.E. The Art of Computer Programming; Addison-Wesley: Boston, MA, USA, 2006.
81. Arora, S.; Barak, B. Computational Complexity: A Modern Approach; University Press: Cambridge, UK, 2009.
82. Hwang, K.; Briggs, F.A. Computer Architecture and Parallel Processing; McGraw-Hill: New York, NY, USA, 1984.
83. Wang, Y.H.; Yeh, C.H.; Young, H.W.V.; Hu, K.; Lo, M.T. On the computational complexity of the empirical mode decomposition algorithm. Phys. A Stat. Mech. Appl. 2014, 400, 159–167.
84. Faust, O.; Yu, W.; Kadri, N.A. Computer-based identification of normal and alcoholic EEG signals using wavelet packets and energy measures. J. Mech. Med. Biol. 2013, 13, 1350033.
85. Charlton, P.H.; Allen, J.; Bailón, R.; Baker, S.; Behar, J.A.; Chen, F.; Clifford, G.D.; Clifton, D.A.; Davies, H.J.; Ding, C.; et al. The 2023 wearable photoplethysmography roadmap. Physiol. Meas. 2023, 44, 111001.

Figure 1. A Venn diagram-like representation of the conceptual distinction between artificial intelligence (AI), machine learning (ML), and deep learning (DL).

Figure 2. Flowchart of the processes involved in motion artifact detection from PPG signals, under the light of traditional and deep learning.

Figure 3. Pie charts showing (a) the proportion of studies according to their scope (i.e., removing versus de-noising MA-corrupted PPG segments) and (b) the frequency of use of several body sites across studies.

Table 1. Summary of the traditional ML-based approaches for the reference signal-less detection of motion artifacts in PPG signals.

Author(s), Year; [Reference]	Dataset	Method	Performance
Chong et al., 2014; [37]	Laboratory-controlled finger (n = 13), forehead (n = 11), and daily-activity data (n = 9)	SVM (supervised) + 11-fold CV	Accuracy: 94.4, 93.4, and 93.7% for laboratory-controlled finger, forehead, and daily-activity movement data, respectively. HR and SpO₂ errors reduced to 2.3 bpm and 2.7%.
Pflugradt et al., 2015; [40]	Ten records selected from Physionet database + self-collected records (n = 20)	Single-layer perceptron (supervised)	Physionet data Sensitivity: 84 ± 13% Specificity: 95 ± 3% Accuracy: 89 ± 11% Self-collected data Sensitivity: 83 ± 6% Specificity: 87 ± 10% Accuracy: 82 ± 10%
Dao et al., 2016; [34]	Chon Lab (n = 11) + UMass Memorial Medical Center Dataset (n = 10)	SVM (supervised) + LOOCV	Finger Sensitivity: 92.5% Specificity: 97.5% Accuracy: 95.9% Forehead Sensitivity: 91.9% Specificity: 97.7% Accuracy: 95.5%
Karna and Kumar, 2018; [32]	IEEE-SPC 2015 (n = 12)	SVM (supervised)	The HR mean absolute error was 1.6 beats per minute
Sabeti et al., 2019; [35]	Capnobase (n = 42) + 46 records collected from acute respiratory distress syndrome databank	SVM and decision trees (supervised) + hold-out	Sensitivity: 98.27% Precision: 100.00%
Longjie and Abeysekera, 2019; [31]	Capnobase (n = 42) + self-collected records (n = 26)	SVM (supervised) + LOOCV	Accuracy: 96.6%
Subhagya and Keshavamurty, 2019; [33]	Simulated and self-collected records (n not reported)	Enhanced SVM (supervised)	Sensitivity: 94.60%; Specificity: 97.50%; Precision: 98.57%; Accuracy: 95.97%
Roy et al., 2020; [41]	Self-collected (n = 30)	SOM (unsupervised)	Sensitivity: 95.8%; Accuracy: 92.0%; F1-Score: 91.5%
Oliveira et al., 2021; [38]	Self-collected (newborns, n = 21)	Random forest–gradient boosting (supervised) + hierarchical rule-based approach	Sensitivity: 85.44%; Specificity: 82.18%; Accuracy: 84.27%
Athaya and Choi, 2021; [39]	MIMIC II (n = 121)	Random forest (supervised) + 10-fold CV	Sensitivity: 86.57%; Specificity: 85.09%; Accuracy: 85.68%
Mahmoudzadeh et al., 2021; [42]	Self-collected (women, n = 5)	Elliptical envelope algorithm + intra- and inter-participant CV	Sensitivity: 94.75%; Precision: 94.25%; F1-Score: 94.25%
Feli et al., 2023; [36]	Self-collected (n = 46)	SVM (supervised) + five-fold CV	Accuracy: 97.0%; False Positive Rate: 1.0%; AUC-ROC: 99.71%
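
The performance measures listed in Table 1 (sensitivity, specificity, precision, accuracy, F1-score, and false positive rate) can all be derived from the same four segment-level counts. The following is a minimal sketch of those standard definitions, not the evaluation code of any reviewed study. It assumes that the "positive" class denotes an MA-corrupted segment, a convention the table itself does not state; the `ConfusionCounts` type, the `detection_metrics` function, and the example counts are hypothetical and serve only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Segment-level counts; 'positive' is ASSUMED to mean MA-corrupted."""
    tp: int  # corrupted segments flagged as corrupted
    fn: int  # corrupted segments missed by the detector
    tn: int  # clean segments passed as clean
    fp: int  # clean segments wrongly flagged as corrupted


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def detection_metrics(c: ConfusionCounts) -> dict:
    """Compute the measures reported in Table 1 as fractions in [0, 1]."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fn + c.tn + c.fp),
        "f1_score": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any study in Table 1.
    counts = ConfusionCounts(tp=90, fn=10, tn=180, fp=20)
    for name, value in detection_metrics(counts).items():
        print(f"{name}: {100 * value:.2f}%")
```

Because accuracy pools both classes, it can remain high when clean segments dominate a dataset even if many corrupted segments are missed, which is one reason rows that report accuracy alone are harder to compare with rows that also report sensitivity and specificity.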

Table 3. Measures provided by authors of RSL-MA detection methods based on machine learning to support the real-time claim.

Reference	Measurements	Platform
[34]	Processing time (7 ms for a 7 s PPG window length)	Intel Xeon 3.6 GHz computer
[37]	Processing time (33.3 ms for a 4 s PPG window length)	Intel Xeon 3.6 GHz computer
[40]	Processing time (181.25 ms at a sampling rate of 500 Hz) and memory usage (512 bytes of RAM)	Not reported
[39]	Processing time (57.5 ms) and memory usage (15.93 KB)	Android smartphone with 4 GB RAM and 64-bit Kirin 710 processor
[42]	Processing time (12.75 ± 0.60 ms)	Intel Core i9 CPU at 2.90 GHz and 32 GB RAM
[36]	Processing time (24.71 and 26.35 ms) and power consumption (0.95 and 3.1 W)	Raspberry Pi 4 and Jetson Nano
[53]	Memory usage (2 MB)	Not reported
[54]	Processing time (398 ms), memory usage (26 MB), and power consumption (3.07 W)	Raspberry Pi 4
[59]	Size on disk (35.1 KB) and energy consumption (49.2 µJ per inference)	AMD EPYC 7742 64-Core Processor with 16 GB RAM
[60]	Processing time (206 ms for a 30 s PPG window length) and memory usage (134.82 bytes of RAM)	ARM 32-bit single-core Cortex-M7 processor at 216 MHz with 512 KB RAM
[64]	Processing time (515 ms for a 5 s PPG window length) and floating-point operations per second (FLOPS) (6.56)	NVIDIA RTX 3060 (12 GB VRAM) used with Python 3.9
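
Where Table 3 gives both a processing time and a PPG window length, the two can be combined into a single ratio of processing time to window duration. The sketch below illustrates that arithmetic for the four rows that report both values. The name `real_time_factor` and the idea of summarizing the rows this way are assumptions introduced here for illustration and are not measures used by the cited authors.

```python
def real_time_factor(processing_ms: float, window_s: float) -> float:
    """Ratio of processing time to the duration of the processed window.

    Values below 1 mean a window is processed faster than it is acquired.
    """
    if window_s <= 0:
        raise ValueError("window length must be positive")
    return processing_ms / (window_s * 1000.0)


# (processing time in ms, window length in s), copied from Table 3.
# Only rows that report both quantities are included.
REPORTED = {
    "[34]": (7.0, 7.0),
    "[37]": (33.3, 4.0),
    "[60]": (206.0, 30.0),
    "[64]": (515.0, 5.0),
}

RATIOS = {ref: real_time_factor(ms, s) for ref, (ms, s) in REPORTED.items()}

if __name__ == "__main__":
    for ref, ratio in RATIOS.items():
        print(f"{ref}: {ratio:.4f} of the window duration")
```

All four ratios fall well below 1, but they were obtained on very different platforms (a desktop Xeon, a Cortex-M7 microcontroller, and a desktop GPU), so they should not be read as a ranking of the methods.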

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

