Region Resolution Learning and Region Segmentation Learning with Overall and Body Part Perception for Pedestrian Detection
Abstract
1. Introduction
2. Related Works
2.1. Anchor-Based
2.2. Anchor-Free
3. Proposed Method
3.1. Overall Architecture
3.2. Resolution Learning
3.3. Segmentation Learning
3.4. Training
3.4.1. Loss Function
3.4.2. Ground Truth
3.4.3. Center Loss
3.4.4. Scale Loss
3.4.5. Offset Loss
3.4.6. Resolution Loss
3.4.7. Segmentation Loss
3.4.8. Total Loss Function
3.5. Inference
4. Experiments
4.1. Experiment Setting
4.1.1. Datasets
4.1.2. Training Details
4.2. Ablation Study
4.3. Comparison with the State of the Art
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497.
2. Liu, W.; Liao, S.; Ren, W.; Hu, W.; Yu, Y. High-level semantic feature detection: A new perspective for pedestrian detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5187–5196.
3. Mao, J.; Xiao, T.; Jiang, Y.; Cao, Z. What can help pedestrian detection? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3127–3136.
4. Chi, C.; Zhang, S.; Xing, J.; Lei, Z.; Li, S.Z.; Zou, X. PedHunter: Occlusion robust pedestrian detector in crowded scenes. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 10639–10646.
5. Shang, M.; Xiang, D.; Wang, Z.; Zhou, E. V2F-Net: Explicit Decomposition of Occluded Pedestrian Detection. arXiv 2021, arXiv:2104.03106.
6. He, Y.; Zhu, C.; Yin, X.C. Mutual-Supervised Feature Modulation Network for Occluded Pedestrian Detection. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milano, Italy, 10–15 January 2021; pp. 8453–8460.
7. Zhou, C.; Yuan, J. Bi-box regression for pedestrian detection and occlusion estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 135–151.
8. Lu, R.; Ma, H.; Wang, Y. Semantic head enhanced pedestrian detection in a crowd. Neurocomputing 2020, 400, 343–351.
9. Zhou, C.; Yang, M.; Yuan, J. Discriminative feature transformation for occluded pedestrian detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9557–9566.
10. Wang, Y.; Zhu, C.; Yin, X.C. A Hybrid Self-Attention Model for Pedestrians Detection. In Proceedings of the International Conference on Neural Information Processing; Springer: Bangkok, Thailand, 2020; pp. 62–74.
11. Zhang, S.; Benenson, R.; Schiele, B. CityPersons: A diverse dataset for pedestrian detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3213–3221.
12. Hasan, I.; Liao, S.; Li, J.; Ullah Akram, S.; Shao, L. Pedestrian detection: The elephant in the room. arXiv 2020, arXiv:2003.08799.
13. Daliparthi, V.S.S.A. The Ikshana Hypothesis of Human Scene Understanding. arXiv 2021, arXiv:2101.10837.
14. Wang, L.; Li, D.; Zhu, Y.; Tian, L.; Shan, Y. Dual Super-Resolution Learning for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3774–3783.
15. Sha, M.; Boukerche, A. Semantic Fusion-based Pedestrian Detection for Supporting Autonomous Vehicles. In Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), Rennes, France, 7–10 July 2020; pp. 1–6.
16. Cai, J.; Lee, F.; Yang, S.; Lin, C.; Chen, H.; Kotani, K.; Chen, Q. Pedestrian as Points: An Improved Anchor-Free Method for Center-Based Pedestrian Detection. IEEE Access 2020, 8, 179666–179677.
17. Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Lin, H.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R.; et al. ResNeSt: Split-attention networks. arXiv 2020, arXiv:2004.08955.
18. Meletis, P.; Wen, X.; Lu, C.; de Geus, D.; Dubbelman, G. Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding. arXiv 2020, arXiv:2004.07944.
19. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
20. Law, H.; Deng, J. CornerNet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
21. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
22. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
23. Dollar, P.; Wojek, C.; Schiele, B.; Perona, P. Pedestrian detection: An evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 743–761.
24. Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
25. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034.
26. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
27. Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. arXiv 2017, arXiv:1703.01780.
28. Wang, W. Adapted Center and Scale Prediction: More Stable and More Accurate. arXiv 2020, arXiv:2002.09053.
29. Wang, H.; Zhang, Y.; Ke, H.; Wei, N.; Xu, Z. Semantic Structural and Occlusive Feature Fusion for Pedestrian Detection. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2021; pp. 293–307.
30. Song, T.; Sun, L.; Xie, D.; Sun, H.; Pu, S. Small-scale pedestrian detection based on topological line localization and temporal feature aggregation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 536–551.
31. Liu, W.; Liao, S.; Hu, W.; Liang, X.; Chen, X. Learning efficient single-stage pedestrian detectors by asymptotic localization fitting. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 618–634.
32. Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Occlusion-aware R-CNN: Detecting pedestrians in a crowd. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 637–653.

Region | Resolution | Segmentation | Visible | Reasonable | Bare | Partial | Heavy |
---|---|---|---|---|---|---|---|
total image | + | + | − | 10.99% | 7.83% | 9.89% | 43.63% |
pedestrian area | + | + | + | 9.71% | 5.83% | 8.89% | 42.53% |
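
The region comparison above can be read as per-subset differences between the two rows. The short Python sketch below recomputes them from the table values. It assumes that the entries are log-average miss rates, so lower is better. That is usual for CityPersons, but the table itself does not state it. The constant and function names are illustrative only.

```python
# Entries copied from the region table; assumed to be log-average miss rates in %.
TOTAL_IMAGE = {"Reasonable": 10.99, "Bare": 7.83, "Partial": 9.89, "Heavy": 43.63}
PEDESTRIAN_AREA = {"Reasonable": 9.71, "Bare": 5.83, "Partial": 8.89, "Heavy": 42.53}


def point_change(before: dict, after: dict) -> dict:
    """Absolute change in percentage points per subset (negative = fewer misses)."""
    return {k: round(after[k] - before[k], 2) for k in before}


def relative_change(before: dict, after: dict) -> dict:
    """Change relative to `before`, expressed as a percentage of the original value."""
    return {k: round(100.0 * (after[k] - before[k]) / before[k], 1) for k in before}


if __name__ == "__main__":
    print(point_change(TOTAL_IMAGE, PEDESTRIAN_AREA))
    print(relative_change(TOTAL_IMAGE, PEDESTRIAN_AREA))
```

On these numbers, restricting learning to the pedestrian area lowers every subset by 1.0 to 2.0 points. The largest relative gain is on Bare (about 25.5%) and the smallest is on Heavy (about 2.5%).
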

Resolution | Segmentation | Visible | Reasonable | Bare | Partial | Heavy |
---|---|---|---|---|---|---|
+ | − | − | 9.35% | 6.11% | 8.44% | 43.78% |
− | + | − | 9.84% | 5.68% | 9.50% | 43.39% |
+ | + | + | 9.71% | 5.83% | 8.89% | 42.53% |

Method | Backbone | Reasonable | Heavy | Partial | Bare |
---|---|---|---|---|---|
FRCNN [11] | VGG-16 | 15.4% | - | - | - |
FRCNN+Seg [11] | VGG-16 | 14.8% | - | - | - |
TLL [30] | ResNet-50 | 15.5% | 53.6% | 17.2% | 10.0% |
ALF [31] | ResNet-50 | 12.0% | 51.9% | 11.4% | 8.4% |
OR-CNN [32] | VGG-16 | 12.8% | 55.7% | 15.3% | 6.7% |
CSP [2] | ResNet-50 | 11.0% | 49.3% | 10.8% | 8.1% |
BCNet [15] | ResNet-50 | 9.8% | 53.3% | 9.2% | 5.8% |
ACSP [28] | ResNet-101 | 9.3% | 46.3% | 8.7% | 5.6% |
CSPRS (ours) | ResNeSt-101 | 9.71% | 42.53% | 8.89% | 5.83% |
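
We assume the percentages in the comparison table are log-average miss rates (MR^-2), the usual CityPersons metric following Dollar et al. [23]; the tables themselves do not name the metric. The Python function below sketches how such a number is obtained. It samples a detector's miss-rate-versus-FPPI curve at nine points evenly spaced in log space between 10^-2 and 10^0, then takes their geometric mean. Two details are our own assumptions, modelled on common evaluation scripts rather than taken from this paper. The first is the function name `log_average_miss_rate`. The second is the sampling rule: use the last curve point at or below each reference FPPI, and a miss rate of 1 when no such point exists.

```python
import math
from typing import Sequence


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float]) -> float:
    """Log-average miss rate over FPPI in [1e-2, 1e0] (MR^-2), as a fraction.

    `fppi` and `miss_rate` describe one detector's curve, sorted by increasing
    FPPI (i.e. obtained by lowering the score threshold step by step).
    """
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")
    # Nine reference points evenly spaced in log space: 10^-2, 10^-1.75, ..., 10^0.
    refs = [10.0 ** (-2.0 + 0.25 * k) for k in range(9)]
    sampled = []
    for ref in refs:
        # Assumed rule: keep the miss rate of the last (largest) FPPI that does not
        # exceed the reference; if the curve never gets that low, count it as 1.0.
        value = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                value = m
        sampled.append(value)
    # Geometric mean of the sampled miss rates; the floor avoids log(0).
    return math.exp(sum(math.log(max(1e-10, m)) for m in sampled) / len(sampled))


if __name__ == "__main__":
    # Toy curve, not data from the paper.
    print(log_average_miss_rate([0.005, 0.5], [0.4, 0.1]))
```

Because MR^-2 is a miss rate, lower is better in every column. For example, a detector whose curve stays at a miss rate of 0.5 over the whole FPPI range scores 0.5, i.e. 50%.
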

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Y.; Wang, H.; Liu, Y.; Lu, M. Region Resolution Learning and Region Segmentation Learning with Overall and Body Part Perception for Pedestrian Detection. Electronics 2022, 11, 966. https://doi.org/10.3390/electronics11060966