Augmented EMTCNN: A Fast and Accurate Facial Landmark Detection Network
Abstract
1. Introduction
2. Related Works
3. Materials and Methods
3.1. Network Architecture
3.1.1. EMTCNN Augmentation
3.1.2. Dilated Convolution
3.1.3. CoordConv Layer
3.2. Metric (Loss Function)
3.3. Dataset
4. Experiment
4.1. Training
4.2. Accuracy of Landmark Point Extraction
4.3. Landmark Point Extraction Speed
4.4. Effects of Weights on Accuracy
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Kim, H.; Park, J.; Kim, H.; Hwang, E. Facial landmark extraction scheme based on semantic segmentation. In Proceedings of the 2018 International Conference on Platform Technology and Service (PlatCon), Jeju, Korea, 29–31 January 2018; pp. 1–6.
2. Kim, H.; Kim, H.; Hwang, E. Real-Time Facial Feature Extraction Scheme Using Cascaded Networks. In Proceedings of the 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), Kyoto, Japan, 27 February–2 March 2019; pp. 1–7.
3. Kim, H.; Kim, H.; Hwang, E. Real-time shape tracking of facial landmarks. Multimedia Tools Appl. 2018, in press.
4. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
5. Jung, S.; Kim, Y.; Hwang, E. Real-time car tracking system based on surveillance videos. EURASIP J. Image Video Process. 2018, 2018, 133.
6. Fan, H.; Zhou, E. Approaching human level facial landmark localization by deep learning. Image Vis. Comput. 2016, 47, 27–35.
7. Ramanan, D.; Zhu, X. Face detection, pose estimation, and landmark localization in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2879–2886.
8. Hou, Q.; Wang, J.; Cheng, L.; Gong, Y. Facial landmark detection via cascade multi-channel convolutional neural network. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 1800–1804.
9. Feng, Z.H.; Kittler, J.; Awais, M.; Huber, P.; Wu, X.J. Face detection, bounding box aggregation and pose estimation for robust facial landmark localisation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2106–2115.
10. Kim, H.; Park, J.; Kim, H.; Hwang, E.; Rho, S. Robust facial landmark extraction scheme using multiple convolutional neural networks. Multimedia Tools Appl. 2018, 78, 3221–3238.
11. Zhang, Z.; Luo, P.; Loy, C.C.; Tang, X. Facial landmark detection by deep multi-task learning. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 94–108.
12. Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503.
13. Deng, Z.; Li, K.; Zhao, Q.; Chen, H. Face landmark localization using a single deep network. In Proceedings of the Chinese Conference on Biometric Recognition, Chengdu, China, 14–16 October 2016; pp. 68–76.
14. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv 2014, arXiv:1412.7062.
15. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
16. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
17. Liu, R.; Lehman, J.; Molino, P.; Such, F.P.; Frank, E.; Sergeev, A.; Yosinski, J. An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution. Adv. Neural Inf. Process. Syst. 2018, 31, 9605–9616.
18. Rew, J.; Choi, Y.H.; Kim, D.; Rho, S.; Hwang, E. Evaluating skin hereditary traits based on daily activities. Front. Innov. Future Comput. Commun. 2014, 301, 261–270.
19. Kim, H.; Kim, W.; Rew, J.; Rho, S.; Hwang, E. Evaluation of hair and scalp condition based on microscopy image analysis. In Proceedings of the 2017 International Conference on Platform Technology and Service (PlatCon), Busan, Korea, 13–15 February 2017; pp. 1–4.
20. Rew, J.; Choi, Y.H.; Rho, S.; Hwang, E. Monitoring skin condition using life activities on the SNS user documents. Multimed. Tools Appl. 2018, 77, 9827–9847.
21. Rew, J.; Choi, Y.H.; Kim, H.; Hwang, E. Skin Aging Estimation Scheme Based on Lifestyle and Dermoscopy Image Analysis. Appl. Sci. 2019, 9, 1228.
22. Kim, J.; Moon, J.; Hwang, E.; Kang, P. Recurrent inception convolution neural network for multi short-term load forecasting. Energy Build. 2019, 194, 328–341.
23. Le, T.; Vo, M.; Vo, B.; Hwang, E.; Rho, S.; Baik, S. Improving Electric Energy Consumption Prediction Using CNN and Bi-LSTM. Appl. Sci. 2019, 9, 4237.
24. Le, N.Q.K.; Ho, Q.T.; Ou, Y.Y. Incorporating deep learning with convolutional neural networks and position specific scoring matrices for identifying electron transport proteins. J. Comput. Chem. 2017, 38, 2000–2006.
25. Le, N.Q.K.; Yapp, E.K.Y.; Ou, Y.Y.; Yeh, H.Y. iMotor-CNN: Identifying molecular functions of cytoskeleton motor proteins using 2D convolutional neural network via Chou’s 5-step rule. Anal. Biochem. 2019, 575, 17–26.
26. Le, N.Q.K.; Nguyen, V.N. SNARE-CNN: A 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data. PeerJ Comput. Sci. 2019, 5, e177.
27. Le, N.Q.K.; Huynh, T.T.; Yapp, E.K.Y.; Yeh, H.Y. Identification of clathrin proteins by incorporating hyperparameter optimization in deep learning and PSSM profiles. Comput. Methods Programs Biomed. 2019, 177, 81–88.
28. Le, N.Q.K.; Ho, Q.T.; Yapp, E.K.Y.; Ou, Y.Y.; Yeh, H.Y. DeepETC: A deep convolutional neural network architecture for investigating and classifying electron transport chain’s complexes. Neurocomputing 2020, 375, 71–79.
29. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
30. Uijlings, J.R.R.; Van De Sande, K.E.A.; Gevers, T.; Smeulders, A.W.M. Selective Search for Object Recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
31. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
32. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 1440–1448.
33. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
34. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
35. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
36. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
37. King, D.E. Dlib-ml: A machine learning toolkit. J. Mach. Learn. Res. 2009, 10, 1755–1758.
38. Kazemi, V.; Sullivan, J. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1867–1874.
39. Sun, Y.; Wang, X.; Tang, X. Deep convolutional network cascade for facial point detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 3476–3483.
40. Ranjan, R.; Patel, V.M.; Chellappa, R. HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 41, 121–135.
41. Xiao, S.; Feng, J.; Liu, L.; Nie, X.; Wang, W.; Yan, S.; Kassim, A. Recurrent 3d-2d dual learning for large-pose facial landmark detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1633–1642.
42. Lai, H.; Xiao, S.; Pan, Y.; Cui, Z.; Feng, J.; Xu, C.; Yin, J.; Yan, S. Deep Recurrent Regression for Facial Landmark Detection. IEEE Trans. Circuits Syst. Video Technol. 2016, 28, 1144–1157.
43. Badrinarayanan, V.; Handa, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling. arXiv 2015, arXiv:1505.07293.
44. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
45. Rothe, R.; Guillaumin, M.; Van Gool, L. Non-maximum suppression for object detection by passing messages between windows. In Proceedings of the Asian Conference on Computer Vision, Singapore, 1–5 November 2014; pp. 290–306.
46. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826.
47. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
48. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
49. Yang, S.; Luo, P.; Loy, C.C.; Tang, X. WIDER FACE: A face detection benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 5525–5533.
50. Shen, J.; Zafeiriou, S.; Chrysos, G.G.; Kossaifi, J.; Tzimiropoulos, G.; Pantic, M. The first facial landmark tracking in-the-wild challenge: Benchmark and results. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Santiago, Chile, 7–13 December 2015; pp. 50–58.
51. Le, V.; Brandt, J.; Lin, Z.; Bourdev, L.; Huang, T.S. Interactive facial feature localization. In Proceedings of the European Conference on Computer Vision, Rome, Italy, 8–11 October 2012; pp. 679–692.
52. Asthana, A.; Zafeiriou, S.; Cheng, S.; Pantic, M. Robust discriminative response map fitting with constrained local models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 3444–3451.
53. Burgos-Artizzu, X.P.; Perona, P.; Dollár, P. Robust face landmark estimation under occlusion. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1513–1520.
54. Cao, X.; Wei, Y.; Wen, F.; Sun, J. Face Alignment by Explicit Shape Regression. Int. J. Comput. Vis. 2013, 107, 177–190.
55. Zhang, J.; Shan, S.; Kan, M.; Chen, X. Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 1–16.
56. Xiong, X.; De la Torre, F. Supervised descent method and its applications to face alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 532–539.
57. Zhu, S.; Li, C.; Loy, C.C.; Tang, X. Face alignment by coarse-to-fine shape searching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4998–5006.
58. Zhang, Z.; Luo, P.; Loy, C.C.; Tang, X. Learning deep representation for face alignment with auxiliary attributes. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 918–930.
Model | Input Size | Number of Parameters |
---|---|---|
VGG-19 [47] | 224 × 224 | 143,667,240 |
HyperFace [40] | 227 × 227 | 29,677,932 |
ResNet-18 [48] | 224 × 224 | 11,689,512 |
MTCNN [12] | - | 481,336 |
EMTCNN [2] | - | 6,029,938 |
Augmented EMTCNN | - | 6,083,186 |
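The parameter-count table above reports totals only. As a rough aid for reading it, the following sketch (our own illustration, not the authors' code) counts the learnable weights of a single k × k convolution and shows how the two components named in Section 3.1 differ in cost: a dilated convolution [14,15,16] spreads the same k × k weights over a wider area, so its parameter count is unchanged, whereas a CoordConv layer [17] concatenates coordinate channels to its input and therefore adds weights. The helper names and the 32-to-64-channel layer are hypothetical, and we assume two coordinate channels (i and j) unless the optional radius channel is enabled.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Learnable parameters of a k x k 2D convolution.

    Dilation is intentionally absent: a dilated kernel reuses the same
    k * k weights per (input, output) channel pair, so it adds no parameters.
    """
    return (k * k * c_in + (1 if bias else 0)) * c_out


def coordconv_params(c_in: int, c_out: int, k: int,
                     with_r: bool = False, bias: bool = True) -> int:
    """CoordConv: the convolution sees the input plus coordinate channels
    (i and j, and optionally a radius channel r)."""
    extra_channels = 3 if with_r else 2
    return conv2d_params(c_in + extra_channels, c_out, k, bias)


def dilated_extent(k: int, dilation: int) -> int:
    """Spatial extent (in pixels) covered by one k x k kernel with a given dilation."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer: 32 input channels, 64 output channels, 3 x 3 kernel.
print(conv2d_params(32, 64, 3))     # 18496 (plain or dilated)
print(coordconv_params(32, 64, 3))  # 19648 (two coordinate channels added)
print(dilated_extent(3, 2))         # 5
```

In this toy layer, CoordConv adds 2 × 3 × 3 × 64 = 1,152 weights, while a dilation of 2 enlarges the kernel's footprint from 3 × 3 to 5 × 5 at no parameter cost.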
Images | Flip (Left/Right) | Brightness Adjustment | Salt & Pepper Noise | Region of Interest Filling |
---|---|---|---|---|
42,000 | ×2 | ×2 | ×2 | ×5 |
Total | 42,000 × 40 = 1,680,000 | | | |
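The ×40 factor in the augmentation table above is the product 2 × 2 × 2 × 5 of the per-operation multipliers. A short sketch, assuming the four operations are applied independently and every combination of their variants is kept (our reading of the table, not a description of the authors' pipeline), enumerates those combinations; the variant labels are placeholders.

```python
from itertools import product

# Variants per operation, following the multipliers in the table
# (x2, x2, x2, x5). The labels are placeholders, not the authors' settings.
VARIANTS = {
    "flip": ["original", "mirrored"],
    "brightness": ["original", "adjusted"],
    "salt_pepper": ["clean", "noisy"],
    "roi_fill": [f"fill_{i}" for i in range(5)],
}


def augmentation_plan(variants: dict) -> list:
    """Every combination of one variant per operation (Cartesian product)."""
    names = list(variants)
    return [dict(zip(names, combo)) for combo in product(*variants.values())]


SOURCE_IMAGES = 42_000
plan = augmentation_plan(VARIANTS)
print(len(plan))                  # 40 variants per source image
print(SOURCE_IMAGES * len(plan))  # 1680000 training images
```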
Method | Mean Normalized Distance (Helen) | Mean Normalized Distance (300-W) |
---|---|---|
DRMF (Discriminative Response Map Fitting) [52] | 6.70 | 9.22 |
RCPR (Robust Cascaded Pose Regression) [53] | 5.93 | 8.35 |
ESR (Explicit Shape Regression) [54] | 5.70 | 7.58 |
CFAN (Coarse-to-Fine Auto-encoder Networks) [55] | 5.53 | 7.69 |
SDM (Supervised Descent Method) [56] | 5.50 | 7.50 |
CFSS (Coarse-to-Fine Shape Searching) [57] | 4.63 | 5.76 |
TCDCN (Tasks-Constrained Deep Convolutional Network) [58] | 4.60 | 5.54 |
Dlib [37] | 4.47 | - |
EMTCNN | 5.66 | 6.63 |
Augmented EMTCNN | 4.65 | 5.59 |
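The comparison table above reports the mean normalized distance on Helen and 300-W. The normalization behind these numbers is not stated in this section, so the sketch below assumes a common convention: each face's mean point-to-point error is divided by its inter-ocular distance (outer eye corners, indices 36 and 45 in the 0-based 68-point markup) and the result is reported as a percentage. The function name and its defaults are ours.

```python
import numpy as np


def mean_normalized_distance(pred, gt, left_idx: int = 36, right_idx: int = 45,
                             as_percent: bool = True) -> float:
    """Mean point-to-point error divided by the inter-ocular distance.

    pred, gt: arrays of shape (num_faces, num_points, 2) with (x, y) coordinates.
    left_idx, right_idx: ground-truth points used for normalization; 36 and 45
    are the outer eye corners in the 0-based 68-point markup (assumption).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean error of every landmark: shape (num_faces, num_points)
    errors = np.linalg.norm(pred - gt, axis=-1)
    # One normalization distance per face: shape (num_faces,)
    inter_ocular = np.linalg.norm(gt[:, left_idx] - gt[:, right_idx], axis=-1)
    per_face = errors.mean(axis=1) / inter_ocular
    score = float(per_face.mean())
    return 100.0 * score if as_percent else score


# Toy check: eye corners 100 px apart, every prediction off by (3, 4), i.e. 5 px.
gt = np.zeros((1, 68, 2))
gt[0, 45] = [100.0, 0.0]
pred = gt + np.array([3.0, 4.0])
print(mean_normalized_distance(pred, gt))  # about 5.0 (percent)
```

Other papers normalize by inter-pupil distance or face bounding-box size, which changes the absolute values, so scores are only comparable when the same convention is used.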
Metric | Augmented EMTCNN | EMTCNN | MTCNN | TCDCN | Dlib |
---|---|---|---|---|---|
No. of landmark points | 68 | 68 | 5 | 68 | 68 |
Speed (fps) | 68 | 70 | 99 | 23 | 15 |
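Frame rates such as those in the speed table above depend on hardware, input resolution, and whether face detection is timed together with landmark extraction, none of which this section specifies. A minimal timing harness, assuming `detect` is any callable that processes one frame (a hypothetical stand-in for a landmark detector), could measure fps as follows; `ms_per_frame` converts a rate into per-frame latency.

```python
import time
from typing import Callable, Sequence


def measure_fps(detect: Callable[[object], object], frames: Sequence[object],
                warmup: int = 5) -> float:
    """Average frames per second of `detect` over `frames`.

    The first `warmup` frames are processed but not timed, so one-off costs
    such as memory allocation or lazy initialization do not skew the rate.
    """
    for frame in frames[:warmup]:
        detect(frame)
    timed = frames[warmup:]
    if len(timed) == 0:
        raise ValueError("need more frames than warm-up iterations")
    start = time.perf_counter()
    for frame in timed:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


def ms_per_frame(fps: float) -> float:
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Dummy "detector" standing in for a real model.
frames = [list(range(10_000)) for _ in range(50)]
print(f"{measure_fps(lambda f: sum(f), frames):.1f} fps")
print(f"{ms_per_frame(68):.1f} ms/frame")  # 14.7
print(f"{ms_per_frame(15):.1f} ms/frame")  # 66.7
```

Read as latency, 68 fps is roughly 14.7 ms per frame and 15 fps roughly 66.7 ms per frame.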
Mean Normalized Distance (x-axis : y-axis learning weight) | 5:5 | 6:4 | 7:3 | 8:2 | 9:1 |
---|---|---|---|---|---|
EMTCNN | 6.94 | 6.72 | 6.36 | 5.66 | 6.83 |
Augmented EMTCNN | 5.42 | 4.65 | 5.22 | 5.43 | 6.07 |
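The weight table above varies the ratio between an x-axis and a y-axis learning weight. The exact loss from Section 3.2 is not reproduced here, so the following is only a sketch under an assumption: each ratio is normalized to weights that sum to one, and those weights scale the mean squared error of the x and y coordinates separately. `weighted_landmark_loss` and `ratio_to_weights` are hypothetical names.

```python
import numpy as np


def ratio_to_weights(ratio: str) -> tuple:
    """Turn an 'x:y' ratio such as '6:4' into weights that sum to 1."""
    a, b = (float(v) for v in ratio.split(":"))
    return a / (a + b), b / (a + b)


def weighted_landmark_loss(pred, gt, w_x: float, w_y: float) -> float:
    """Squared-error landmark loss with separate weights for x and y.

    pred, gt: arrays of shape (num_points, 2) holding (x, y) coordinates.
    """
    diff = np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)
    return float(w_x * np.mean(diff[:, 0] ** 2) + w_y * np.mean(diff[:, 1] ** 2))


gt = np.zeros((68, 2))
pred = gt + np.array([2.0, 1.0])  # x error 2, y error 1 at every point
for ratio in ["5:5", "6:4", "7:3", "8:2", "9:1"]:
    w_x, w_y = ratio_to_weights(ratio)
    print(ratio, round(weighted_landmark_loss(pred, gt, w_x, w_y), 3))
```

Under this reading, a larger x weight makes horizontal errors cost more during training. The table indicates that the best ratio differed between the two networks: 8:2 for EMTCNN and 6:4 for Augmented EMTCNN.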
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Kim, H.-W.; Kim, H.-J.; Rho, S.; Hwang, E. Augmented EMTCNN: A Fast and Accurate Facial Landmark Detection Network. Appl. Sci. 2020, 10, 2253. https://doi.org/10.3390/app10072253
Chicago/Turabian Style: Kim, Hyeon-Woo, Hyung-Joon Kim, Seungmin Rho, and Eenjun Hwang. 2020. "Augmented EMTCNN: A Fast and Accurate Facial Landmark Detection Network" Applied Sciences 10, no. 7: 2253. https://doi.org/10.3390/app10072253