SACN: A Novel Rotating Face Detector Based on Architecture Search
Abstract
1. Introduction
- We introduce architecture search to construct the network structure, which can reduce the angle error and the size of the model.
- We propose center cluster (CC) as a replacement for non-maximum suppression (NMS). CC is a clustering method based on mean shift that can improve the accuracy of angle classification.
- Experiments on MOFDDB show that the proposed approach improves on state-of-the-art techniques in terms of angle error.
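To make the contrast with NMS concrete, the following is a minimal sketch of mean-shift clustering applied to candidate detections. It is not the authors' CC implementation: the function names, the flat kernel, the `bandwidth` value, and the majority vote over angle classes inside each cluster are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def mean_shift(points, bandwidth, n_iter=50, tol=1e-3):
    """Flat-kernel mean shift: move every mode to the mean of nearby points."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        shifted = modes.copy()
        for i, m in enumerate(modes):
            near = points[np.linalg.norm(points - m, axis=1) <= bandwidth]
            if len(near) > 0:  # guard against an empty neighbourhood
                shifted[i] = near.mean(axis=0)
        converged = np.linalg.norm(shifted - modes, axis=1).max() < tol
        modes = shifted
        if converged:
            break
    return modes


def group_modes(modes, bandwidth):
    """Give the same label to modes that ended up close to each other."""
    labels = np.full(len(modes), -1, dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:  # no existing center is close enough: start a new cluster
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)


def cluster_candidates(centers_xy, angle_classes, bandwidth=20.0):
    """Cluster candidate box centers, then vote on the angle class per cluster.

    Assumption: each candidate carries a discrete angle class; the majority
    vote is a hypothetical way to combine them, not taken from the paper.
    """
    modes = mean_shift(centers_xy, bandwidth)
    labels, cluster_centers = group_modes(modes, bandwidth)
    results = []
    for k, c in enumerate(cluster_centers):
        votes = [a for a, lab in zip(angle_classes, labels) if lab == k]
        winner = Counter(votes).most_common(1)[0][0]
        results.append((tuple(np.round(c, 1)), winner))
    return results
```

Unlike NMS, which keeps the single highest-scoring box and discards its overlapping neighbours, a clustering step of this kind lets every candidate in a cluster contribute to the final decision, which is one plausible reason a cluster method could help angle classification.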
2. Related Work
2.1. Rotation-Invariance Face Detector
2.1.1. DFEN
2.1.2. ASCN
2.1.3. Rotational Regression
2.1.4. PCN
2.1.5. MTPCN
2.2. Architecture Search
- Select a hidden state from $h_i$ and $h_{i-1}$ or from the set of states created by previous blocks.
- Select a second hidden state from the same options as in Step 1.
- Select an operation from the operation set to process the state selected in Step 1.
- Select an operation from the operation set to process the state selected in Step 2.
- Select a method from element-wise summation, element-wise multiplication or element-wise concatenation to combine the outputs of Steps 3 and 4 to create a new state.
a) 1 × 3 then 3 × 1 conv | b) 1 × 5 then 5 × 1 conv |
c) 1 × 7 then 7 × 1 conv | d) 3 × 3 dilated conv |
e) 5 × 5 dilated conv | f) 7 × 7 dilated conv |
g) 3 × 3 average pooling | h) 5 × 5 average pooling |
i) 7 × 7 average pooling | j) 3 × 3 max pooling |
k) 5 × 5 max pooling | l) 7 × 7 max pooling |
m) 3 × 3 depth-sep conv | n) 5 × 5 depth-sep conv |
o) 7 × 7 depth-sep conv | p) 1 × 1 convolution |
q) 3 × 3 convolution | r) skip connect |
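The five selection steps and the operation set above can be sketched as a small sampler. This is an illustrative sketch only: uniform random sampling stands in for whatever search strategy actually drives the choices, and the names `sample_block`, `sample_cell`, `input_0`, and `input_1` are hypothetical placeholders rather than identifiers from the paper.

```python
import random

# Operation set a) to r) as listed in the table above.
OPERATIONS = [
    "1x3 then 3x1 conv", "1x5 then 5x1 conv", "1x7 then 7x1 conv",
    "3x3 dilated conv", "5x5 dilated conv", "7x7 dilated conv",
    "3x3 average pooling", "5x5 average pooling", "7x7 average pooling",
    "3x3 max pooling", "5x5 max pooling", "7x7 max pooling",
    "3x3 depth-sep conv", "5x5 depth-sep conv", "7x7 depth-sep conv",
    "1x1 convolution", "3x3 convolution", "skip connect",
]

# Combination methods named in Step 5.
COMBINERS = ["sum", "multiply", "concat"]


def sample_block(states, rng):
    """Draw one block by following Steps 1 to 5."""
    return {
        "input1": rng.choice(states),       # Step 1: first state
        "input2": rng.choice(states),       # Step 2: second state
        "op1": rng.choice(OPERATIONS),      # Step 3: operation for input1
        "op2": rng.choice(OPERATIONS),      # Step 4: operation for input2
        "combine": rng.choice(COMBINERS),   # Step 5: how to merge outputs
    }


def sample_cell(num_blocks, rng):
    """Draw several blocks; each new block's output becomes a selectable state."""
    # Hypothetical names for the two initial states available to the first block.
    states = ["input_0", "input_1"]
    blocks = []
    for b in range(num_blocks):
        blocks.append(sample_block(states, rng))
        states.append(f"block_{b}")  # later blocks may select this state
    return blocks


if __name__ == "__main__":
    for block in sample_cell(num_blocks=3, rng=random.Random(0)):
        print(block)
```

Each sampled block is only a description of an architecture choice. Turning it into a trainable layer, and scoring it so that better choices are preferred, is the job of the search procedure and is outside this sketch.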
3. Searching Architecture Calibration Network
3.1. Motivation of this Approach
3.2. Hypothesis of Center Cluster
3.3. Overall Processing
3.4. Center Cluster Calibration
3.5. SACN in First Stage
3.6. SACN in Second Stage
3.7. SACN in Third Stage
4. Experiments
4.1. Implementation Details
4.2. Benchmark Datasets
4.3. Evaluation Results
4.3.1. Results of Rotation Calibration
4.3.2. Accuracy Comparison
4.3.3. Problems and Limitations
4.3.4. Ablation Experiment
5. Conclusions and Future Works
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
RIPD | Rotation-Invariant Face Detection |
RIP | rotation-in-plane |
FCN | fully convolutional network |
CC | center cluster |
CNN | convolutional neural network |
MTCNN | Multitask Cascaded Convolutional Networks |
FPN | Feature Pyramid Network |
DCNN | Deep Convolutional Neural Network |
DFEN | Direction-Sensitivity Features Ensemble Network |
MTPCN | Multi-task Progressive Calibration Networks |
ASCN | Angle-Sensitivity Cascaded Networks |
PCN | Progressive Calibration Networks |
SSD | Single Shot Detector |
NIN | network in network |
RNN | Recurrent Neural Network |
NMS | non-maximum suppression |
SACN | searching architecture calibration network |
RL | reinforcement learning |
FDDB | Face Detection Data Set and Benchmark |
IoU | Intersection over Union |
SGD | stochastic gradient descent |
Method | Recall at 100 FP on FDDB, Up | Recall, Down | Recall, Left | Recall, Right | Recall, Average | Angle Error | Detecting Speed, CPU | Detecting Speed, GPU | Model Size |
---|---|---|---|---|---|---|---|---|---|
Rotation Router | 85.4 | 84.7 | 84.6 | 84.5 | 84.8 | 16.3° | 12 FPS | 15 FPS | 2.5 M |
Cascade CNN | 84.9 | 84.2 | 84.7 | 85.7 | 84.9 | 15.3° | 31 FPS | 67 FPS | 4.2 M |
Faster R-CNN | 84.2 | 82.5 | 81.9 | 82.1 | 82.7 | 18.2° | 1 FPS | 20 FPS | 350 M |
PCN | 87.9 | 87.3 | 86.8 | 87.4 | 87.5 | 12.6° | 29 FPS | 63 FPS | 4.2 M |
SACN (ours) | 88.2 | 87.2 | 87.2 | 87.1 | 87.8 | 10.5° | 27 FPS | 60 FPS | 4 M |
Method | Recall at 100 FP on FDDB, Up | Recall, Down | Recall, Left | Recall, Right | Recall, Average | Angle Error |
---|---|---|---|---|---|---|
SACN () | 87.8 | 87.1 | 86.8 | 86.5 | 87.1 | 11.6° |
SACN () | 88.2 | 87.2 | 87.2 | 87.1 | 87.8 | 10.5° |
SACN () | 87.5 | 86.8 | 87.1 | 86.5 | 87.0 | 12.2° |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Song, A.; Xu, X.; Zhai, X. SACN: A Novel Rotating Face Detector Based on Architecture Search. Electronics 2021, 10, 558. https://doi.org/10.3390/electronics10050558