Research on Heart Rate Detection from Facial Videos Based on an Attention Mechanism 3D Convolutional Neural Network
Abstract
1. Introduction
- Manually selecting the facial ROI to maximize the usable facial area while excluding regions prone to motion artifacts, thereby avoiding their interference.
- Normalizing videos with uneven illumination, adaptively compensating for lighting variations while preserving facial detail, to reduce the impact of illumination changes.
- Introducing the lightweight, parameter-free attention mechanism SimAM into the 3D-CNN, aiming to reduce computational complexity while accurately extracting the rPPG signal and minimizing the influence of noise on signal extraction.
- Incorporating a BiLSTM to extract heart rate information from the rPPG signal through bidirectional processing, modeling of long- and short-term dependencies, and temporal feature learning, improving the accuracy and generalization ability of heart rate estimation.
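The SimAM module named above is parameter-free: each neuron is scored by a closed-form energy derived from how much it deviates from its channel's spatial mean, and the feature map is rescaled by a sigmoid of the inverse energy. As a hedged illustration (not the authors' implementation), a minimal NumPy sketch of that refinement step for a single 2D feature map is shown below; `lam` is the paper's regularizer (default 1e-4), and the 3D network in this work would apply the same energy over spatiotemporal volumes:

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM refinement of a feature map.

    x: array of shape (C, H, W); each channel is treated independently.
    Returns an array of the same shape, gated by per-neuron weights.
    """
    c, h, w = x.shape
    n = h * w - 1
    # Per-channel spatial mean and squared deviation.
    mu = x.mean(axis=(1, 2), keepdims=True)
    d = (x - mu) ** 2
    # Inverse-energy term: neurons far from the channel mean score higher.
    v = d.sum(axis=(1, 2), keepdims=True) / n
    e_inv = d / (4.0 * (v + lam)) + 0.5
    # Sigmoid gating, as in the SimAM formulation.
    return x * (1.0 / (1.0 + np.exp(-e_inv)))

feat = np.random.default_rng(0).standard_normal((3, 8, 8))
out = simam(feat)
```

Because `e_inv` is always at least 0.5, the gate lies strictly between 0.62 and 1, so SimAM attenuates rather than amplifies activations, emphasizing neurons that stand out from their channel's background.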
2. Algorithm Description
2.1. General Block Diagram
2.2. SimAM Branches
2.3. 3D-SimAM Structure
2.4. Heart Rate Estimation Module
2.5. Evaluation Indicators
2.5.1. Mean Absolute Error (MAE)
2.5.2. Root Mean Squared Error (RMSE)
2.5.3. Pearson Correlation Coefficient (R)
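The three indicators above have their standard definitions: MAE is the mean absolute error between predicted and reference heart rates, RMSE the root of the mean squared error, and R the Pearson correlation between the two series. A compact reference implementation (a sketch assuming paired per-video estimates and ground truth, both in bpm):

```python
import numpy as np

def hr_metrics(pred, gt):
    """Return (MAE, RMSE, R) between predicted and ground-truth
    heart rates; MAE and RMSE are in bpm, R is dimensionless."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    err = pred - gt
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r = np.corrcoef(pred, gt)[0, 1]  # Pearson correlation coefficient
    return mae, rmse, r

# Hypothetical estimates vs. ground truth for four videos.
mae, rmse, r = hr_metrics([72.0, 80.0, 65.0, 90.0],
                          [70.0, 82.0, 66.0, 88.0])
# mae = 1.75 bpm, rmse ≈ 1.80 bpm
```

Lower MAE and RMSE and higher R indicate better agreement, which is how the tables in Section 3 are read.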
3. Experiment
3.1. Dataset
3.1.1. UBFC-rPPG Dataset
3.1.2. PURE Dataset
3.2. ROI Selection
3.3. Illumination Normalization
3.4. Data Augmentation
3.5. Implementation Details
3.6. Experimental Results and Comparison
3.6.1. Analysis of Experimental Results
3.6.2. Comparative Experiment
3.6.3. Testing Across Datasets
3.7. Ablation Experiment
- (1) Without any attention mechanism module;
- (2) With the SimAM attention mechanism added to the existing model;
- (3) With the CBAM attention mechanism added to the existing model [35];
- (4) With the SKAttention attention mechanism added to the existing model.
4. Conclusions and Prospects
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Cook, S.; Togni, M.; Schaub, M.C.; Wenaweser, P.; Hess, O.M. High heart rate: A cardiovascular risk factor? Eur. Heart J. 2006, 27, 2387–2393. [Google Scholar] [CrossRef] [PubMed]
- Cao, J.; Li, Y.; Zhang, K.; Gool, L.V. Video Super-Resolution Transformer. arXiv 2021, arXiv:2106.06847. [Google Scholar]
- Chen, X.; Cheng, J.; Song, R.; Liu, Y.; Ward, R.; Wang, Z.J. Video-based heart rate measurement: Recent advances and future prospects. IEEE Trans. Instrum. Meas. 2018, 68, 3600–3615. [Google Scholar]
- Liu, X.; Patel, S.; McDuff, D. RGB Camera-based physiological sensing: Challenges and future directions. arXiv 2021, arXiv:2110.13362. [Google Scholar]
- Yu, Z.; Li, X.; Zhao, G. Facial-Video-Based Physiological Signal Measurement: Recent advances and affective applications. IEEE Signal Process. Mag. 2021, 38, 50–58. [Google Scholar] [CrossRef]
- Xiao, H.; Liu, T.; Sun, Y.; Sun, Y.; Li, Y.; Zhao, S.; Avolio, A. Remote photoplethysmography for heart rate measurement: A review. Biomed. Signal Process. Control 2024, 88, 105608. [Google Scholar] [CrossRef]
- Verkruysse, W.; Svaasand, L.; Nelson, J. Remote plethysmographic imaging using ambient light. Opt. Express 2008, 16, 21434–21445. [Google Scholar] [CrossRef] [PubMed]
- Poh, M.-Z.; McDuff, D.; Picard, R. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Opt. Express 2010, 18, 10762–10774. [Google Scholar] [CrossRef]
- Comon, P. Independent component analysis, A new concept? Signal Process. 1994, 36, 287–314. [Google Scholar] [CrossRef]
- Hsu, G.; Ambikapathi, A.; Chen, M. Deep learning with time-frequency representation for pulse estimation from facial videos. In Proceedings of the IEEE International Joint Conference on Biometrics (IJCB), Denver, CO, USA, 1–4 October 2017; pp. 383–389. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Špetlík, R.; Franc, V.; Matas, J. Visual heart rate estimation with convolutional neural network. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 3–6 September 2018; pp. 3–6. [Google Scholar]
- Chen, W.; McDuff, D. DeepPhys: Video-Based Physiological Measurement Using Convolutional Attention Networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 349–365. [Google Scholar]
- Song, R.; Chen, H.; Cheng, J.; Li, C.; Liu, Y.; Chen, X. PulseGAN: Learning to Generate Realistic Pulse Waveforms in Remote Photoplethysmography. IEEE J. Biomed. Health Inform. 2021, 25, 1373–1384. [Google Scholar] [CrossRef] [PubMed]
- Gupta, A.; Ravelo-García, A.G.; Dias, F.M. Availability and performance of face based non-contact methods for heart rate and oxygen saturation estimations: A systematic review. Comput. Methods Programs Biomed. 2022, 219, 106771. [Google Scholar] [CrossRef] [PubMed]
- Niu, X.; Yu, Z.; Hu, H.; Li, X.; Shan, S.; Zhao, G. Video-Based Remote Physiological Measurement via Cross-Verified Feature Disentangling. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Volume 12347, pp. 295–310. [Google Scholar]
- Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503. [Google Scholar] [CrossRef]
- King, D.E. Dlib-ml: A machine learning toolkit. J. Mach. Learn. Res. 2009, 10, 1755–1758. [Google Scholar]
- Yang, L.; Zhang, R.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR: New York, NY, USA, 2021; pp. 11863–11874. [Google Scholar]
- Mazhar, N.; Malik, F.M.; Raza, A.; Khan, R. Predefined-time control of nonlinear systems: A sigmoid function based sliding manifold design approach. Alex. Eng. J. 2022, 61, 6831–6841. [Google Scholar] [CrossRef]
- Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef] [PubMed]
- Bobbia, S.; Macwan, R.; Benezeth, Y.; Mansouri, A.; Dubois, J. Unsupervised skin tissue segmentation for remote photoplethysmography. Pattern Recognit. Lett. 2019, 124, 82–90. [Google Scholar] [CrossRef]
- Stricker, R.; Miller, S.; Gross, H.M. Non-contact video-based pulse rate measurement on a mobile service robot. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication, Edinburgh, UK, 25–29 August 2014; pp. 1056–1062. [Google Scholar]
- De Haan, G.; Jeanne, V. Robust Pulse Rate From Chrominance-Based rPPG. IEEE Trans. Biomed. Eng. 2013, 60, 2878–2886. [Google Scholar] [CrossRef] [PubMed]
- Wang, W.; den Brinker, A.C.; Stuijk, S.; De Haan, G. Algorithmic Principles of Remote PPG. IEEE Trans. Biomed. Eng. 2016, 64, 1479–1491. [Google Scholar] [CrossRef]
- Niu, X.; Han, H.; Shan, S.; Chen, X. SynRhythm: Learning a Deep Heart Rate Estimator from General to Specific. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 3580–3585. [Google Scholar]
- Gideon, J.; Stent, S. The Way to My Heart Is Through Contrastive Learning: Remote Photoplethysmography From Unlabelled Video. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 3995–4004. [Google Scholar]
- Sun, Z.; Li, X. Contrast-Phys: Unsupervised Video-Based Remote Physiological Measurement via Spatiotemporal Contrast. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; Volume 13672, pp. 492–510. [Google Scholar]
- Speth, J.; Vance, N.; Flynn, P.; Czajka, A. Non-Contrastive Unsupervised Learning of Physiological Signals From Video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14464–14474. [Google Scholar]
- Lu, H.; Han, H.; Zhou, S.K. Dual-GAN: Joint BVP and Noise Modeling for Remote Physiological Measurement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 12404–12413. [Google Scholar]
- Sun, Z.; Li, X. Contrast-Phys+: Unsupervised and Weakly-Supervised Video-Based Remote Physiological Measurement via Spatiotemporal Contrast. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5835–5851. [Google Scholar]
- Wang, W.; Stuijk, S.; De Haan, G. A Novel Algorithm for Remote Photoplethysmography: Spatial Subspace Rotation. IEEE Trans. Biomed. Eng. 2016, 63, 1974–1984. [Google Scholar] [CrossRef]
- Yu, Z.; Li, X.; Zhao, G. Remote Photoplethysmograph Signal Measurement from Facial Videos Using Spatio-Temporal Networks. arXiv 2019, arXiv:1905.02419. [Google Scholar]
- Yu, Z.; Shen, Y.; Shi, J.; Zhao, H.; Torr, P.; Zhao, G. PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 4186–4196. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
| Method | MAE (bpm) ↓ | RMSE (bpm) ↓ | R ↑ |
|---|---|---|---|
| ICA [9] | 5.17 | 11.76 | 0.65 |
| CHROM [24] | 2.37 | 4.91 | 0.89 |
| POS [25] | 4.05 | 8.75 | 0.78 |
| SynRhythm [26] | 5.59 | 6.82 | 0.75 |
| PulseGAN [14] | 1.19 | 2.10 | 0.98 |
| Gideon2021 [27] | 1.85 | 4.28 | 0.93 |
| Contrast-Phys [28] | 0.64 | 1.00 | 0.99 |
| SiNC [29] | 0.59 | 1.83 | 0.99 |
| Dual-GAN [30] | 0.44 | 0.67 | 0.99 |
| Contrast-Phys+ (100%) [31] | 0.21 | 0.80 | 0.99 |
| 3D-SimAM (Ours) | 0.24 | 0.65 | 0.99 |
| Method | MAE (bpm) ↓ | RMSE (bpm) ↓ | R ↑ |
|---|---|---|---|
| CHROM [24] | 2.07 | 9.92 | 0.99 |
| 2SR [32] | 2.44 | 3.06 | 0.98 |
| HR-CNN [12] | 1.84 | 2.37 | 0.98 |
| PhysNet [33] | 2.10 | 2.60 | 0.99 |
| Dual-GAN [30] | 0.82 | 1.31 | 0.99 |
| SiNC [29] | 0.61 | 1.84 | 0.99 |
| Gideon2021 [27] | 2.30 | 2.90 | 0.99 |
| Contrast-Phys [28] | 1.00 | 1.40 | 0.99 |
| 3D-SimAM (Ours) | 0.63 | 1.30 | 0.99 |
| Method | MAE (bpm) ↓ | RMSE (bpm) ↓ | R ↑ |
|---|---|---|---|
| CHROM [24] | - | 13.97 | 0.55 |
| PhysNet [33] | 2.20 | 6.85 | 0.86 |
| PhysFormer [34] | 2.68 | 7.01 | 0.86 |
| Contrast-Phys [28] | 2.43 | 7.43 | 0.86 |
| 3D-SimAM (Ours) | 2.15 | 7.08 | 0.88 |
| SimAM Attention | CBAM Attention | SK Attention | MAE (bpm) ↓ | RMSE (bpm) ↓ | R ↑ |
|---|---|---|---|---|---|
|  |  |  | 0.64 | 1.00 | 0.99 |
| √ |  |  | 0.24 | 0.65 | 0.99 |
|  | √ |  | 0.49 | 1.73 | 0.986 |
|  |  | √ | 0.20 | 0.57 | 0.98 |
| SimAM Attention | CBAM Attention | SK Attention | MAE (bpm) ↓ | RMSE (bpm) ↓ | R ↑ |
|---|---|---|---|---|---|
|  |  |  | 1.00 | 1.40 | 0.99 |
| √ |  |  | 0.63 | 1.30 | 0.99 |
|  | √ |  | 1.08 | 0.97 | 0.99 |
|  |  | √ | 2.43 | 7.29 | 0.98 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sun, X.; Su, Y.; Hou, X.; Yuan, X.; Li, H.; Wang, C. Research on Heart Rate Detection from Facial Videos Based on an Attention Mechanism 3D Convolutional Neural Network. Electronics 2025, 14, 269. https://doi.org/10.3390/electronics14020269