Joint Pedestrian and Body Part Detection via Semantic Relationship Learning
Abstract
1. Introduction
- We propose a Body Part Indexed Feature (BPIF) representation to encode the semantic relationships among individual body parts (i.e., head, head-shoulder, upper body, and whole body), providing robustness against partial or even full occlusion of a body part. While BPIF was used in image space for face landmark detection in [24], our work differs from [24] in that we build BPIF in feature space, which allows feature sharing between different modules.
- We also propose Adaptive Joint Non-Maximum Suppression (AJ-NMS) to replace the original NMS algorithms widely used in object detection. Traditional NMS operates on each foreground category separately, i.e., it is applied to the head, head-shoulder, upper body, and whole body independently, without considering their correlations. By contrast, the proposed AJ-NMS treats one person's head, head-shoulder, upper body, and whole body as a single unit, leading to higher recall for overlapping pedestrians and for small parts such as the pedestrian head. In addition, AJ-NMS knows which body parts belong to the same pedestrian, which is useful for downstream pedestrian analysis applications such as person re-identification.
- The proposed approach advances the state-of-the-art in joint pedestrian and body part detection on the widely used CUHK-SYSU Person Search Dataset [25].
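The contrast between per-category NMS and a joint treatment of body parts can be illustrated with a small sketch. This is not the authors' AJ-NMS (its details are given in Section 3.3 and Algorithm 2); it is a simplified, hypothetical joint rule that suppresses a whole person candidate only when the mean part-wise IoU with a kept candidate exceeds a threshold. The function names (`iou`, `nms`, `joint_nms`), the two-part layout (head, body), the box coordinates, and the threshold are all assumptions made for illustration.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in [x1, y1, x2, y2] format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, thresh=0.5):
    """Standard greedy NMS for a single category; returns kept indices."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep

def joint_nms(groups, scores, thresh=0.5):
    """Hypothetical joint NMS over person candidates.

    groups has shape (N, P, 4): one box per body part for each candidate.
    A candidate is suppressed only if its MEAN part-wise IoU with a kept
    candidate exceeds thresh (an illustrative rule, not the paper's).
    """
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        part_iou = np.stack(
            [iou(groups[i, p], groups[rest, p]) for p in range(groups.shape[1])]
        )
        order = rest[part_iou.mean(axis=0) <= thresh]
    return keep

# Toy data: parts are (head, body). Candidates 0 and 1 are two overlapping
# people; candidate 2 is a near-duplicate detection of candidate 0.
groups = np.array([
    [[30, 0, 70, 40], [0, 0, 100, 300]],
    [[60, 0, 100, 40], [30, 0, 130, 300]],
    [[31, 0, 71, 40], [2, 0, 102, 300]],
], dtype=float)
scores = np.array([0.9, 0.8, 0.7])

body_keep = nms(groups[:, 1], scores)   # body boxes only
head_keep = nms(groups[:, 0], scores)   # head boxes only
joint_keep = joint_nms(groups, scores)  # whole person candidates
```

In this toy case the body boxes of the two people overlap with IoU above 0.5, so per-category NMS drops the second body while keeping its head, leaving an orphaned part. The joint rule keeps both people because their heads barely overlap, and still removes the near-duplicate.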
2. Related Work
2.1. Pedestrian Detection
2.2. Non-Maximum Suppression
2.3. Object Relation Learning
3. Our Approach
3.1. Overview
3.2. Body Part Indexed Feature (BPIF)
Algorithm 1. Compute BPIF
3.3. Adaptive Joint Non-Maximum Suppression
3.4. Network Training
Algorithm 2. AJ-NMS
4. Experiments
4.1. Datasets and Settings
4.2. Comparisons with the State-of-the-Art
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Liu, Y.; Zhao, Q.; Wu, Z. Pooling body parts on feature maps for misalignment robust person re-identification. In Proceedings of the 4th IEEE International Conference on Identity, Security, and Behavior Analysis (ISBA 2018), Singapore, 11–12 January 2018; pp. 1–8.
- Mousas, C.; Anagnostopoulos, C.N. Performance-Driven Hybrid Full-Body Character Control for Navigation and Interaction in Virtual Environments. 3D Res. 2017, 8, 18.
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, 20–26 June 2005; pp. 886–893.
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.A.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645.
- Dollár, P.; Appel, R.; Belongie, S.J.; Perona, P. Fast Feature Pyramids for Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1532–1545.
- Dollár, P.; Tu, Z.; Perona, P.; Belongie, S.J. Integral Channel Features. In Proceedings of the 20th British Machine Vision Conference (BMVC 2009), London, UK, 7–10 September 2009; pp. 1–11.
- Ren, S.; He, K.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Kim, K.; Cheon, Y.; Hong, S.; Roh, B.; Park, M. PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection. arXiv, 2016; arXiv:1608.08021.
- Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Wang, X.; Xiao, T.; Jiang, Y.; Shao, S.; Sun, J.; Shen, C. Repulsion Loss: Detecting Pedestrians in a Crowd. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7774–7783.
- Zhang, S.; Yang, J.; Schiele, B. Occluded Pedestrian Detection Through Guided Attention in CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6995–7003.
- Noh, J.; Lee, S.; Kim, B.; Kim, G. Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 966–974.
- Lin, T.; Dollár, P.; Girshick, R.B.; He, K.; Hariharan, B.; Belongie, S.J. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Tang, C.; Ling, Y.; Yang, X.; Jin, W.; Chao, Z. Multi-View Object Detection Based on Deep Learning. Appl. Sci. 2018, 8, 1423.
- Hu, H.; Gu, J.; Zhang, Z.; Dai, J.; Wei, Y. Relation Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 3588–3597.
- Liu, Y.; Wang, R.; Shan, S.; Chen, X. Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6985–6994.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-Shot Refinement Neural Network for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4203–4212.
- Ramalingam, B.; Lakshmanan, A.K.; Ilyas, M.; Le, A.V.; Elara, M.R. Cascaded Machine-Learning Technique for Debris Classification in Floor-Cleaning Robot Application. Appl. Sci. 2018, 8, 2649.
- Chen, G.; Cai, X.; Han, H.; Shan, S.; Chen, X. HeadNet: Pedestrian Head Detection Utilizing Body in Context. In Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 556–563.
- Han, H.; Jain, A.K.; Wang, F.; Shan, S.; Chen, X. Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2597–2609.
- Wang, F.; Han, H.; Shan, S.; Chen, X. Deep Multi-Task Learning for Joint Prediction of Heterogeneous Face Attributes. In Proceedings of the 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; pp. 173–179.
- Zhang, G.; Han, H.; Shan, S.; Song, X.; Chen, X. Face Alignment across Large Pose via MT-CNN Based 3D Shape Reconstruction. In Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 210–217.
- Xiao, T.; Li, S.; Wang, B.; Lin, L.; Wang, X. Joint Detection and Identification Feature Learning for Person Search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 3376–3385.
- Zhang, S.; Benenson, R.; Omran, M.; Hosang, J.H.; Schiele, B. Towards Reaching Human Performance in Pedestrian Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 973–986.
- Shrivastava, A.; Gupta, A.; Girshick, R.B. Training Region-Based Object Detectors with Online Hard Example Mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 761–769.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R.B. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS—Improving Object Detection with One Line of Code. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017; pp. 5562–5570.
- Divvala, S.K.; Hoiem, D.; Hays, J.; Efros, A.A.; Hebert, M. An empirical study of context in object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009; pp. 1271–1278.
- Galleguillos, C.; Rabinovich, A.; Belongie, S.J. Object categorization using co-occurrence, location and appearance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, AK, USA, 24–26 June 2008; pp. 1–8.
- Stewart, R.; Andriluka, M.; Ng, A.Y. End-to-End People Detection in Crowded Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 2325–2333.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 379–387.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv, 2018; arXiv:1704.04861.
- Han, H.; Li, J.; Jain, A.K.; Shan, S.; Chen, X. Tattoo Image Search at Scale: Joint Detection and Compact Representation Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1–15.
Method | Pedestrian AP | Upper-Body AP | Head-Shoulder AP | Head AP |
---|---|---|---|---|
Faster R-CNN [7] | 92.1 | 82.3 | 56.2 | 20.3 |
R-FCN-50 [35] | 91.4 | 79.7 | 54.7 | 17.2 |
R-FCN-101 [35] | 91.8 | 78.6 | 55.3 | 14.3 |
PVANet [8] | 90.8 | 89.3 | 85.1 | 64.6 |
HeadNet [21] | 94.1 | 92.8 | 90.8 | 86.0 |
Proposed | 94.9 | 93.1 | 91.4 | 87.0 |

Model | Pedestrian AP | Upper-Body AP | Head-Shoulder AP | Head AP |
---|---|---|---|---|
Proposed w/o BPIF&AJ-NMS | 94.1 | 92.8 | 90.8 | 86.0 |
Proposed w/o BPIF | 94.6(↑0.5) | 92.7(↓0.1) | 90.8(↑0.0) | 86.3(↑0.3) |
Proposed w/o AJ-NMS | 94.5(↑0.4) | 93.0(↑0.2) | 91.3(↑0.5) | 86.6(↑0.6) |
Proposed | 94.9(↑0.8) | 93.1(↑0.3) | 91.4(↑0.6) | 87.0(↑1.0) |
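The arrows in the ablation table are differences from the first row (the model without BPIF and AJ-NMS). As a quick consistency check, the following sketch recomputes them from the AP values in the table; the dictionary layout and the helper name `deltas` are illustrative choices, not part of the original work.

```python
# AP values copied from the ablation table above.
COLUMNS = ["Pedestrian", "Upper-Body", "Head-Shoulder", "Head"]
baseline = dict(zip(COLUMNS, [94.1, 92.8, 90.8, 86.0]))  # w/o BPIF & AJ-NMS
variants = {
    "Proposed w/o BPIF": dict(zip(COLUMNS, [94.6, 92.7, 90.8, 86.3])),
    "Proposed w/o AJ-NMS": dict(zip(COLUMNS, [94.5, 93.0, 91.3, 86.6])),
    "Proposed": dict(zip(COLUMNS, [94.9, 93.1, 91.4, 87.0])),
}

def deltas(row, base):
    """AP change of a variant relative to the baseline, rounded to one decimal."""
    return {col: round(row[col] - base[col], 1) for col in base}

ablation_deltas = {name: deltas(row, baseline) for name, row in variants.items()}

for name, d in ablation_deltas.items():
    print(name, d)
```

Read this way, the rows suggest that the variant with BPIF alone (w/o AJ-NMS) gains more on the part categories than the variant with AJ-NMS alone (w/o BPIF), and that the full model has the largest gain on every category.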
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Gu, J.; Lan, C.; Chen, W.; Han, H. Joint Pedestrian and Body Part Detection via Semantic Relationship Learning. Appl. Sci. 2019, 9, 752. https://doi.org/10.3390/app9040752