1. Introduction
Heart disease remains a major global health concern, necessitating effective early detection and intervention strategies to mitigate its impact on the public. By identifying individuals with early signs or risk factors of heart disease, researchers can conduct comprehensive studies to understand the underlying mechanisms, genetic predispositions, and environmental determinants contributing to the disease [1].
Photoplethysmography (PPG) and remote photoplethysmography (rPPG) techniques enable non-invasive monitoring of physiological signals, capturing vital information about heart rate, blood volume changes, and arterial pulsations [2]. These techniques support the early detection of people who are at a higher risk of developing heart disease, which in turn enables targeted healthcare methods. However, PPG methods may not be suitable in the context of conditions like COVID-19, multi-parameter monitoring, and transmission risk [3,4,5]. Recent research has explored various models and factors to extract accurate heartbeat information from rPPG signals. One approach uses signal processing techniques such as filtering, noise reduction, and motion artifact removal to improve the quality of rPPG signals [4]. Work on emotion recognition from facial images under varying occlusion, pose, and illumination approaches facial image analysis from a different angle [6,7,8,9].
The association between skin color and heart wellness has gained attention due to the visible variations that are influenced by factors such as melanin content, blood flow, and tissue composition [1]. Our research builds on the foundation laid by Dasari et al. [4]. Their study assessed the accuracy of the Chrominance-based Remote PPG Algorithm (CHROM), Plane Orthogonal-to-Skin (POS), Bounded Kalman Filter (BKF), Spherical Mean (SPH), and DeepPhys approaches, thereby exploring the potential use of skin color analysis as a predictive tool. They reported statistically significant p-values for the relationship between skin color characteristics and heart wellness. By referencing these p-values and incorporating them into our research, we can establish a connection with their findings and further contribute to the existing knowledge in this domain. Our research also expands on these findings by applying two regression techniques, namely, Lasso Regression and Random Forest Regression. These models offer powerful tools for feature selection and prediction, enhancing the accuracy and robustness of heart rate prediction based on skin color analysis. Kuriakose et al. [10] predicted diabetes using advanced machine learning techniques applied to a substantial clinical patient database. Although distinct from heart wellness prediction, their methodology and findings point to the potential applicability of machine learning in various healthcare domains [11,12,13].
It is important to acknowledge the limitations and challenges associated with our research. Factors such as genetic variations, environmental determinants, and lifestyle choices can influence both skin color and heart wellness. Rigorous data collection, statistical analysis, and validation procedures are vital to ensure the reliability and generalizability of our models.
This research paper aims to establish a connection between our work and the aforementioned studies by incorporating the obtained p-values and introducing advanced regression techniques, specifically Lasso Regression and Random Forest Regression, to enhance the accuracy of heart disease prediction based on skin color analysis.
4. Results
4.1. Comparison and Analysis
This study compared the predicted heart rates from Lasso Regression and Random Forest Regression with the ground truth values. For Lasso Regression, the deviation between the predicted and ground truth heart rates was assessed using MSE and R2 metrics.
Similarly, for Random Forest Regression, MSE and R2 values were calculated to evaluate the accuracy and goodness of fit of the predictions. The graphical representation of the predicted heart rates against the ground truth values allowed for visual analysis and comparison between the two regression models.
By applying Lasso Regression and Random Forest Regression techniques, the study aimed to assess the accuracy of heart rate prediction in the context of skin color analysis. These regression methods, along with the evaluation metrics, provided a quantitative analysis of the predictions and facilitated a comprehensive understanding of their performance.
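The two evaluation metrics can be sketched in a few lines of plain Python. This is a minimal illustration, not the study's pipeline: the heart rate values and the names `truth`, `pred_a`, and `pred_b` are hypothetical and stand in for ground truth readings and the outputs of two arbitrary models.

```python
def mse(y_true, y_pred):
    """Mean squared error: average squared gap between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R-squared: share of the variance in y_true explained by y_pred."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical heart rates in beats per minute (illustration only).
truth = [60, 70, 80, 90]
pred_a = [62, 69, 81, 88]   # predictions that track the truth closely
pred_b = [70, 72, 78, 80]   # predictions compressed toward the mean

for name, pred in (("model A", pred_a), ("model B", pred_b)):
    print(name, "MSE:", round(mse(truth, pred), 3), "R2:", round(r2(truth, pred), 3))
```

A lower MSE together with a higher R2 identifies the predictions that follow the ground truth more closely, which is the comparison this section applies to the two regression models.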
4.2. Analysis of Results
The results of our analysis indicate that the Random Forest Regression (RFR) model outperformed the Lasso Regression model in predicting heart rates. This conclusion is supported by the graphical comparison of the predicted heart rates with the ground truth values from Figure 2a,b.
In the linear regression graph comparing the predicted heart rates from Lasso Regression with the ground truths in Figure 2b, the points deviated noticeably from the x = y line. This noticeable deviation suggests that the Lasso Regression model exhibits limitations in accurately predicting the heart rate. Points that closely adhere to the x = y line signify a greater degree of concordance between the predicted values and the actual ground truth values.
On the other hand, in the graph representing the Random Forest Regression model in Figure 2a, the points were much closer to the x = y line. This proximity suggests that the Random Forest Regression model yielded a more accurate prediction of the heart rate. The points aligning closely with the x = y line indicate a higher level of agreement between the predicted and ground truth values.
The consistent pattern observed in the Random Forest Regression graph indicates the model’s ability to accurately estimate heart rates across a range of values. The small deviations from the x = y line suggest that the Random Forest Regression model captured the underlying relationships in the data more effectively, resulting in an improved prediction accuracy.
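The visual notion of closeness to the x = y line can also be expressed numerically. The sketch below is one possible way to do so, assuming each point is plotted as (ground truth, prediction); the function name `identity_line_summary` and the sample values are hypothetical.

```python
def identity_line_summary(y_true, y_pred):
    """Summarize how far (truth, prediction) points sit from the x = y line."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    n = len(diffs)
    bias = sum(diffs) / n                 # mean signed vertical offset
    mad = sum(abs(d) for d in diffs) / n  # mean absolute vertical offset
    perp = mad / 2 ** 0.5                 # mean perpendicular distance to x = y
    return bias, mad, perp


# Hypothetical heart rates in beats per minute (illustration only).
truth = [60, 70, 80, 90]
pred = [62, 69, 81, 88]

bias, mad, perp = identity_line_summary(truth, pred)
print("bias:", bias, "mean |offset|:", mad, "mean perpendicular distance:", round(perp, 3))
```

A bias near zero with a small mean absolute offset corresponds to points scattered tightly and symmetrically around the x = y line.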
These findings support the superiority of the Random Forest Regression model over the Lasso Regression model in predicting heart rates. While visual analysis of the graphs provides valuable insights, it is essential to supplement these observations with quantitative metrics such as Mean Squared Error (MSE) and R-squared (R2) to obtain a more comprehensive evaluation of the models’ performance.
In summary, our results indicate that the Random Forest Regression model predicted heart rates more accurately than the Lasso Regression model. The data points on the Random Forest Regression graph lie close to the x = y line, indicating strong agreement between the predicted and actual values. This alignment underscores the model’s effectiveness and robustness in estimating heart rates. These findings have important implications for accurate heart disease prediction and emphasize the utility of the Random Forest Regression approach in healthcare research and monitoring.
5. Discussion
The present study aimed to investigate the influence of skin factors on remote photoplethysmography (rPPG) results and evaluate the performance of different regression methods in predicting rPPG heart rates. The study findings provide valuable insights into the relationship between skin factors and rPPG results, as well as the efficacy of various regression techniques for accurate signal prediction.
The first aspect examined in this study was the influence of skin factors on rPPG results. The KS test was utilized to assess the significance of the differences in rPPG heart rates between different gender groups within the same country and between the two countries.
The significance level (alpha) was set at 0.05, which means that p-values below this threshold provide strong evidence that supports the alternative hypothesis of significant differences, while p-values above 0.05 provide insufficient evidence for the rejection of the null hypothesis of no significant difference. The obtained p-values from the KS test provide insights into the comparisons between different groups. For instance, from Table 1, the p-value of 0.001 for the comparison between India Male and Sierra Leone Male indicates a significant difference in rPPG heart rates between these two groups. This finding suggests that there are substantial differences between the rPPG heart rates obtained from DeepPhys and the ground truth values measured using the Masimo oximeter for male individuals from India and Sierra Leone.
On the other hand, the p-value of 0.990 for the comparison between Sierra Leone Male and Sierra Leone Female is above the significance level of 0.05. This indicates that there is insufficient evidence to support the hypothesis of significant differences between the rPPG heart rates obtained from Masimo and POS for male and female individuals in Sierra Leone. To put it differently, the available data do not present compelling indications of a noteworthy variance between the Masimo oximeter and the POS method for measuring rPPG heart rates within these distinct gender groups in Sierra Leone. These results underscore the importance of interpreting p-values in relation to the chosen significance level. p-values below the significance level (e.g., 0.05) provide evidence to support the alternative hypothesis of significant differences, suggesting that there are substantial variations between the compared groups. Conversely, p-values above the significance level suggest that the observed differences may be due to chance, and there is no strong evidence to conclude the presence of significant distinctions between the groups. In summary, the KS test results reveal that the rPPG heart rates exhibit significant differences between certain groups, while in other cases, the differences are not statistically significant. These findings highlight the need for careful interpretation of p-values in light of the chosen significance level and provide insights into the extent of variation in rPPG heart rates between different gender groups and measurement methods.
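The decision rule described above can be sketched with a small two-sample KS test in plain Python. This is an illustrative approximation, not the procedure used in the study: the p-value comes from the asymptotic Kolmogorov distribution with a common small-sample correction, the function name `ks_two_sample` is hypothetical, and the heart rate samples are invented. In practice a library routine such as `scipy.stats.ks_2samp` would normally be preferred.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for the two-sample KS test."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    # D is the largest gap between the two empirical CDFs,
    # evaluated at every observed value.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
            for v in xs + ys)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d  # small-sample correction (approximation)
    if lam < 0.3:
        # The series below converges slowly here; the true value is ~1.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


ALPHA = 0.05  # significance level used in the text

# Hypothetical heart rates in beats per minute (illustration only).
group_a = [62, 65, 67, 70, 71, 73, 75, 78, 80, 82]
group_b = [80, 83, 85, 86, 88, 90, 91, 93, 95, 97]

d_stat, p_value = ks_two_sample(group_a, group_b)
verdict = "reject" if p_value < ALPHA else "fail to reject"
print(f"D = {d_stat:.2f}, p = {p_value:.5f}: {verdict} the null hypothesis")
```

With these invented samples the distributions barely overlap, so the test rejects the null hypothesis; comparing a sample with itself gives D = 0 and no evidence of a difference.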
In addition to the analysis of skin factors, the study evaluated the performance of different regression methods in predicting rPPG heart rates. Mean Squared Error (MSE) and R-squared (R2) values were employed as evaluation metrics for assessing the accuracy and goodness of fit of the regression models. The MSE measures the average squared difference between the predicted and actual rPPG heart rate values. Lower MSE values indicate a higher accuracy in predicting the rPPG heart rates, whereas higher MSE values indicate a larger discrepancy between the predicted and actual values and thus limitations in the accuracy of the regression models. The obtained MSE values for each gender group and regression method were compared to assess their relative performance. The R2 value denotes the fraction of the variability observed in the rPPG heart rates that is accounted for by the regression model. Higher R2 values indicate a closer fit of the regression model to the dataset, meaning that the model explains a larger share of the variation in the rPPG heart rates.
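For reference, the two metrics can be written in their standard form. The notation is an assumption introduced here for illustration: y_i is the i-th ground truth heart rate, \hat{y}_i the corresponding prediction, \bar{y} the mean of the ground truth values, and n the number of samples.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

Under this definition R2 equals 1 for perfect predictions, equals 0 for a model that always predicts the mean \bar{y}, and becomes negative when the model fits worse than that baseline.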
The analysis of the MSE and R2 values revealed a varying performance across the different regression methods and gender groups. For instance, in the case of Masimo_India_F, the BKF method displayed an MSE of 31.941 and an R2 value of 0.124. These values indicate that the BKF method had a moderate accuracy in predicting the rPPG heart rates for Indian females. Similarly, the CHROM method yielded an MSE of 32.854 and an R2 value of 0.099, suggesting a similar level of accuracy but with a slightly lower goodness of fit. Meanwhile, the SPH method showed an MSE of 31.635 and an R2 value of 0.132, indicating comparable performance to the BKF method. It is important to note that the Random Forest Regression models also demonstrated competitive performance in terms of MSE and R2 values. These models exhibited promising accuracy and a good fit to the data, suggesting their potential as reliable alternatives for rPPG signal prediction.
Random Forest combines many decision trees to predict outcomes. Each tree learns different patterns from the given data, making the ensemble effective at capturing complex relationships. Heart rates are influenced by many factors, some of which do not have straightforward relationships. Random Forest can handle these complexities by considering interactions between factors and by reducing overfitting through averaging over many trees. Lasso regression is a linear method that works best when relationships between predictors and outcomes are simple. It is like fitting a straight line through data points. However, heart rate prediction is more intricate and does not always follow a straight line pattern. Lasso’s simplicity may cause it to miss important factors and interactions that Random Forest can identify. This is why Random Forest, with its ability to capture complex relationships, outperforms Lasso for predicting heart rates.
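The contrast between a straight-line fit and an ensemble of trees can be illustrated on a toy problem. The sketch below rests on several assumptions and is not the study's implementation: an ordinary least-squares line stands in for the linear model (Lasso adds an L1 penalty, omitted here), bagged one-feature regression trees stand in for Random Forest (with a single feature there is no random feature selection), and the V-shaped data are synthetic rather than heart rate measurements. All function names are hypothetical.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def fit_line(xs, ys):
    """Ordinary least-squares line; returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def fit_tree(xs, ys, depth=3):
    """Tiny 1-D regression tree: split where the summed squared error is lowest."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(set(xs)) < 2:
        return lambda x: mean
    best_t, best_err = None, None
    for t in sorted(set(xs))[1:]:  # every candidate leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        err = sse(left) + sse(right)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    lo_pts = [(x, y) for x, y in zip(xs, ys) if x < best_t]
    hi_pts = [(x, y) for x, y in zip(xs, ys) if x >= best_t]
    lo = fit_tree([x for x, _ in lo_pts], [y for _, y in lo_pts], depth - 1)
    hi = fit_tree([x for x, _ in hi_pts], [y for _, y in hi_pts], depth - 1)
    return lambda x: lo(x) if x < best_t else hi(x)


def fit_bagged_trees(xs, ys, n_trees=25, seed=0):
    """Average of trees, each fitted to a bootstrap resample of the data."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Synthetic V-shaped relationship that no single straight line can follow.
xs = [i / 2 for i in range(-20, 21)]
ys = [abs(x) for x in xs]

line = fit_line(xs, ys)
forest = fit_bagged_trees(xs, ys)
mse_line = mse(ys, [line(x) for x in xs])
mse_trees = mse(ys, [forest(x) for x in xs])
print("line MSE:", round(mse_line, 3), "bagged trees MSE:", round(mse_trees, 3))
```

On this toy data the fitted line is nearly flat and misses the V shape, while the averaged trees approximate it piecewise. Both errors are measured on the training points, so the comparison illustrates flexibility rather than generalization.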
The interpretations of the MSE and R2 values in the context of rPPG signal prediction must consider the specific objectives of the research and the desired level of accuracy. In applications where high precision is crucial, lower MSE values and higher R2 values are desired. Conversely, in scenarios where a rough estimation is acceptable, higher MSE values and lower R2 values may still yield satisfactory results.
In conclusion, this study highlights the influence of skin factors on rPPG results and the importance of selecting appropriate regression methods for accurate signal prediction. The study findings indicate significant gender-related differences in rPPG heart rates within the same country, emphasizing the need to account for gender as a contributing factor. The MSE and R2 values provide valuable insights into the accuracy and goodness of fit of the regression models, enabling the evaluation of their performance. The results suggest that Random Forest Regression shows promise as a reliable approach for rPPG signal prediction. These findings contribute to the understanding of factors affecting rPPG heart rates and provide guidance for improving their prediction accuracy in diverse populations.
Future research could explore additional skin factors, such as melanin content and skin thickness, to further understand their impact on rPPG results. Additionally, investigating the performance of other regression techniques or ensemble methods could contribute to identifying the most accurate and robust approach for rPPG signal estimation. Moreover, expanding the sample size and including a more diverse range of populations would enhance the generalizability and applicability of the study findings.