Effective Attention-Based Mechanism for Masked Face Recognition
Abstract
1. Introduction
- A new MFR method is proposed that uses a deep learning architecture combining an attention module with the angular margin loss ArcFace, so that the network focuses on the informative facial regions not occluded by the mask (i.e., the regions around the eyes). A minimal sketch of the loss is given after this list.
- The CBAM attention module is integrated with a refined ResNet-50 network architecture for feature extraction without additional computational cost.
- New simulated masked face images are generated from regular face recognition datasets with a data augmentation tool for model training and evaluation. The datasets generated in this research are available at https://github.com/MaskedFaceDataSet/SimulatedMaskedFaceDataset (accessed on 6 May 2022).
- Experimental results on simulated and real masked face datasets demonstrate that the proposed method outperforms other state-of-the-art methods on all datasets.
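To make the training objective concrete, the following is a minimal PyTorch sketch of the ArcFace additive angular margin loss [5] on which the proposed method is built. The embedding size (512), scale s = 64, and margin m = 0.5 are common ArcFace defaults assumed here for illustration; the exact hyperparameters used in this paper are given in Section 4.2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin head: logits = s * cos(theta + m) for the target class."""
    def __init__(self, embed_dim=512, num_classes=10575, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class centers.
        cos_theta = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        cos_theta = cos_theta.clamp(-1.0 + 1e-7, 1.0 - 1e-7)
        theta = torch.acos(cos_theta)
        # Add the angular margin m only to the target-class angle.
        one_hot = F.one_hot(labels, num_classes=cos_theta.size(1)).float()
        logits = self.s * torch.cos(theta + self.m * one_hot)
        return F.cross_entropy(logits, labels)
```

At inference the classification head is discarded; two faces are compared by the cosine similarity of their backbone embeddings, as in the verification protocol of Section 4.1.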
2. Related Works
3. Proposed Method
3.1. Feature Extraction Network
3.2. Convolutional Block Attention Module (CBAM)
1. Channel Attention Module
2. Spatial Attention Module (a minimal sketch of both sub-modules follows this list)
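As a rough illustration of the two sub-modules, the sketch below follows the standard CBAM formulation of Woo et al. [24]; the reduction ratio of 16 and the 7 × 7 spatial convolution are assumptions taken from that paper, and the placement inside the refined ResNet-50 blocks used here is described in Section 3.3 and may differ from this standalone version.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # average-pooled channel descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # max-pooled channel descriptor
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # per-pixel channel average
        mx = x.amax(dim=1, keepdim=True)     # per-pixel channel max
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.spatial(self.channel(x))  # channel attention first, then spatial
```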
3.3. Network Architecture
3.4. Loss Function
4. Experiments and Results
4.1. Datasets
- LFW_m is generated from the LFW dataset, which is widely used for face verification. This dataset contains 5749 unique identities and a total of 13,233 face images. The experiment in this paper follows the standard LFW protocol with 6000 predefined comparison pairs, of which 3000 pairs have the same identity and the other 3000 pairs have different identities.
- AgeDB-30_m is generated from the public benchmark dataset AgeDB, an unconstrained face recognition dataset widely used for cross-age face verification. This dataset contains 568 unique identities and a total of 16,588 face images. The experiment follows the AgeDB-30 protocol with 6000 predefined comparison pairs, of which 3000 pairs have the same identity and the other 3000 pairs have different identities.
- CFP-FP_m is generated from the public benchmark dataset CFP, which contains 500 celebrities in frontal and profile views. The dataset has two verification protocols: CFP-FF and CFP-FP. The experiment follows the CFP-FP protocol with 7000 predefined comparison pairs, of which 3500 pairs have the same identity and the other 3500 pairs have different identities.
- MFR2 is a small set of real masked face images. It contains 53 identities of celebrities and politicians across 269 images, where each identity has an average of five images, and it covers a variety of mask patterns. For real masked face verification, 800 image pairs are collected, of which 400 pairs have the same identity and 400 pairs have different identities. All four test sets are evaluated with the same pair-verification procedure (a minimal sketch follows this list).
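In the pair-verification procedure, each pair is scored by the cosine similarity of its two face embeddings and declared "same identity" when the score exceeds a threshold. The sketch below is a simplified illustration that picks a single best threshold over all pairs, whereas the standard LFW-style protocol selects the threshold by 10-fold cross-validation; the function and variable names are illustrative.

```python
import numpy as np

def verification_accuracy(emb1, emb2, same_identity):
    """emb1, emb2: (N, D) arrays of embeddings for the two images in each pair.
    same_identity: (N,) boolean array, True when the pair shares an identity."""
    # Cosine similarity of each pair.
    emb1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    emb2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    scores = np.sum(emb1 * emb2, axis=1)

    # Sweep thresholds and keep the one with the highest accuracy.
    best_acc, best_thr = 0.0, 0.0
    for thr in np.linspace(-1.0, 1.0, 401):
        acc = np.mean((scores > thr) == same_identity)
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return best_acc, best_thr
```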
4.2. Experimental Setting
4.3. Evaluation Metrics
4.4. Experimental Results
4.5. Ablation Experiments
5. Discussion
6. Conclusions
- The attention module focuses on the non-occluded parts of the masked face and significantly improves recognition performance.
- The newly generated masked face datasets effectively support model training and evaluation.
- The results show that the proposed method delivers outstanding performance and a higher recognition rate than state-of-the-art methods on both the generated masked face datasets and the real masked face image dataset.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
- Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. SphereFace: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 212–220.
- Liu, B.; Deng, W.; Zhong, Y.; Wang, M.; Hu, J.; Tao, X.; Huang, Y. Fair loss: Margin-aware reinforcement learning for deep face recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 10052–10061.
- Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J.; Li, Z.; Liu, W. CosFace: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5265–5274.
- Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. ArcFace: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4690–4699.
- Huang, Y.; Wang, Y.; Tai, Y.; Liu, X.; Shen, P.; Li, S.; Li, J.; Huang, F. CurricularFace: Adaptive curriculum learning loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5901–5910.
- Khan, S.; Siddique, R.; Shereen, M.A.; Ali, A.; Liu, J.; Bai, Q.; Bashir, N.; Xue, M. Emergence of a novel coronavirus, severe acute respiratory syndrome coronavirus 2: Biology and therapeutic options. J. Clin. Microbiol. 2020, 58, e00187-20.
- Damer, N.; Grebe, J.H.; Chen, C.; Boutros, F.; Kirchbuchner, F.; Kuijper, A. The effect of wearing a mask on face recognition performance: An exploratory study. In Proceedings of the 2020 International Conference of the Biometrics Special Interest Group (BIOSIG), Darmstadt, Germany, 14–16 September 2020; pp. 1–6.
- Anwar, A.; Raychowdhury, A. Masked face recognition for secure authentication. arXiv 2020, arXiv:2008.11104.
- Montero, D.; Nieto, M.; Leskovsky, P.; Aginako, N. Boosting masked face recognition with multi-task ArcFace. arXiv 2021, arXiv:2104.09874.
- Deng, H.; Feng, Z.; Qian, G.; Lv, X.; Li, H.; Li, G. MFCosface: A masked-face recognition algorithm based on large margin cosine loss. Appl. Sci. 2021, 11, 7310.
- Jiang, M.; Fan, X.; Yan, H. RetinaMask: A face mask detector. arXiv 2020, arXiv:2005.03950.
- Yang, G.; Feng, W.; Jin, J.; Lei, Q.; Li, X.; Gui, G.; Wang, W. Face mask recognition system with YOLOV5 based on image recognition. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 9–12 December 2020; pp. 1398–1404.
- Loey, M.; Manogaran, G.; Taha, M.H.N.; Khalifa, N.E.M. A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 2021, 167, 108288.
- Mandal, B.; Okeukwu, A.; Theis, Y. Masked face recognition using ResNet-50. arXiv 2021, arXiv:2104.08997.
- Hariri, W. Efficient masked face recognition method during the COVID-19 pandemic. Signal Image Video Process. 2022, 16, 605–612.
- Song, L.; Gong, D.; Li, Z.; Liu, C.; Liu, W. Occlusion robust face recognition based on mask learning with pairwise differential siamese network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 773–782.
- Din, N.U.; Javed, K.; Bae, S.; Yi, J. A novel GAN-based network for unmasking of masked face. IEEE Access 2020, 8, 44276–44287.
- Li, Y.; Guo, K.; Lu, Y.; Liu, L. Cropping and attention based approach for masked face recognition. Appl. Intell. 2021, 51, 3012–3025.
- Boutros, F.; Damer, N.; Kirchbuchner, F.; Kuijper, A. Unmasking face embeddings by self-restrained triplet loss for accurate masked face recognition. arXiv 2021, arXiv:2103.01716.
- Alzu'bi, A.; Albalas, F.; Al-Hadhrami, T.; Younis, L.B.; Bashayreh, A. Masked face recognition using deep learning: A review. Electronics 2021, 10, 2666.
- Wang, Z.; Wang, G.; Huang, B.; Xiong, Z.; Hong, Q.; Wu, H.; Yi, P.; Jiang, K.; Wang, N.; Pei, Y. Masked face recognition dataset and application. arXiv 2020, arXiv:2003.09093.
- Cabani, A.; Hammoudi, K.; Benhabiles, H.; Melkemi, M. MaskedFace-Net: A dataset of correctly/incorrectly masked face images in the context of COVID-19. Smart Health 2021, 19, 100144.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Yan, C.; Meng, L.; Li, L.; Zhang, J.; Wang, Z.; Yin, J.; Zhang, J.; Sun, Y.; Zheng, B. Age-invariant face recognition by multi-feature fusion and decomposition with self-attention. ACM Trans. Multimed. Comput. Commun. Appl. 2022, 18, 1–18.
- Li, S.; Lee, H.J. Effective attention-based feature decomposition for cross-age face recognition. Appl. Sci. 2022, 12, 4816.
- Wu, C.Y.; Ding, J.J. Occluded face recognition using low-rank regression with generalized gradient direction. Pattern Recognit. 2018, 80, 256–268.
- Qiu, H.; Gong, D.; Li, Z.; Liu, W.; Tao, D. End2End occluded face recognition by masking corrupted features. IEEE Trans. Pattern Anal. Mach. Intell. 2021.
- Zeng, D.; Veldhuis, R.; Spreeuwers, L. A survey of face recognition techniques under occlusion. arXiv 2020, arXiv:2006.11366.
- Yuan, L.; Li, F. Face recognition with occlusion via support vector discrimination dictionary and occlusion dictionary based sparse representation classification. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 110–115.
- Deng, W.; Hu, J.; Guo, J. Extended SRC: Undersampled face recognition via intraclass variant dictionary. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1864–1870.
- Huang, J.; Nie, F.; Huang, H.; Ding, C. Supervised and projected sparse coding for image classification. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, Bellevue, WA, USA, 14–18 July 2013.
- Yang, J.; Luo, L.; Qian, J.; Tai, Y.; Zhang, F.; Xu, Y. Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 156–171.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014.
- Yeh, R.A.; Chen, C.; Yian Lim, T.; Schwing, A.G.; Hasegawa-Johnson, M.; Do, M.N. Semantic image inpainting with deep generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5485–5493.
- Li, Y.; Liu, S.; Yang, J.; Yang, M.-H. Generative face completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3911–3919.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, B.; Wang, Z.; Wang, G.; Jiang, K.; Zeng, K.; Han, Z.; Tian, X.; Yang, Y. When face recognition meets occlusion: A new benchmark. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 4240–4244.
- Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503.
- Kazemi, V.; Sullivan, J. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1867–1874.
- Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning face representation from scratch. arXiv 2014, arXiv:1411.7923.
- Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, Marseille, France, 12–18 October 2008.
- Moschoglou, S.; Papaioannou, A.; Sagonas, C.; Deng, J.; Kotsia, I.; Zafeiriou, S. AgeDB: The first manually collected, in-the-wild age database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 51–59.
- Sengupta, S.; Chen, J.-C.; Castillo, C.; Patel, V.M.; Chellappa, R.; Jacobs, D.W. Frontal to profile face verification in the wild. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–9.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32, pp. 8024–8035.
Dataset | Type | Identities | Images |
---|---|---|---|
CASIA-WebFace_m | Simulated mask | 10,575 | 789,296 |
LFW_m | Simulated mask | 5749 | 12,000 |
AgeDB-30_m | Simulated mask | 568 | 12,000 |
CFP-FP_m | Simulated mask | 500 | 14,000 |
MFR2 | Real masked faces | 53 | 269 |
Dataset | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
---|---|---|---|---|
LFW_m | 99.43 | 99.30 | 99.56 | 99.43 |
AgeDB-30_m | 95.86 | 93.83 | 97.82 | 95.78 |
CFP-FP_m | 97.74 | 96.77 | 98.69 | 97.72 |
MFR2 | 96.75 | 96.25 | 97.22 | 96.73 |
Method | Training Set | LFW_m | AgeDB-30_m | CFP-FP_m | MFR2 |
---|---|---|---|---|---|
CosFace [4] | CASIA-WebFace | 95.23 | 93.40 | 92.21 | 63.00
Softmax [5] | CASIA-WebFace | 96.68 | 93.50 | 94.78 | 69.75
ArcFace [5] | CASIA-WebFace | 96.85 | 94.10 | 95.10 | 71.87
Proposed method | CASIA-WebFace_m | 99.43 | 95.86 | 97.74 | 96.75
Method | Training Set | LFW_m | AgeDB-30_m | CFP-FP_m | MFR2 |
---|---|---|---|---|---|
Huang et al. [38] | WebFace-OCC | 97.08 | 87.18 | 86.07 | - |
Anwar et al. [9] | VGGFace2-mini-SM | 97.25 | - | - | 95.99 |
MFCosface [11] | VGG-Face2_m | 99.33 | - | - | 98.50 |
Proposed method | CASIA-WebFace_m | 99.43 | 95.86 | 97.74 | 96.75
Dataset | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
---|---|---|---|---|
LFW_m | 99.41 | 99.26 | 99.56 | 99.40 |
AgeDB-30_m | 95.38 | 93.10 | 98.11 | 95.53 |
CFP-FP_m | 96.98 | 96.17 | 98.40 | 97.27 |
MFR2 | 99.00 | 99.50 | 98.54 | 99.02 |
Method | LFW_m | AgeDB-30_m | CFP-FP_m | MFR2 |
---|---|---|---|---|
CBAM | 98.66 | 94.45 | 96.15 | 95.50 |
Backbone | 99.31 | 95.28 | 97.08 | 96.25 |
Backbone + M_channel | 99.35 | 95.53 | 97.47 | 96.50
Backbone + M_spatial | 99.38 | 95.58 | 97.38 | 96.75
Backbone + M_channel + M_spatial | 99.43 | 95.86 | 97.74 | 96.75
Citation: Pann, V.; Lee, H.J. Effective Attention-Based Mechanism for Masked Face Recognition. Appl. Sci. 2022, 12, 5590. https://doi.org/10.3390/app12115590