An Enhanced Feature Pyramid Object Detection Network for Autonomous Driving
Abstract
1. Introduction
2. Proposed Method
2.1. Enhanced Feature Extraction Subnet
2.1.1. Feature Channel Weight Module (FCWM)
2.1.2. Feature Spatial Weight Module (FSWM)
2.1.3. Feature Channel Spatial Weight Module (FCSWM)
2.2. Adaptive Parallel Detection Subnet
2.2.1. Adaptive Context Expansion (ACE)
2.2.2. Parallel Detection Branch (PDB)
2.2.3. Loss Function
3. Experiments on Open Datasets
3.1. Implementation Details
3.2. Object Detection Results on Pascal VOC
3.3. Object Detection Results on KITTI
3.4. Object Detection Results on Cityscapes
4. Application in Autonomous Driving System
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Uijlings, J.R.R.; Sande, K.E.A.V.D.; Gevers, T.; Smeulders, A.W.M. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
- Ren, S.; He, K.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Girshick, R.B. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 379–387.
- Li, Z.; Peng, C.; Yu, G.; Zhang, X.; Deng, Y.; Sun, J. Light-Head R-CNN: In defense of two-stage object detector. arXiv 2017, arXiv:1711.07264.
- Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 354–370.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Kong, T.; Yao, A.; Chen, Y.; Sun, F. HyperNet: Towards accurate region proposal generation and joint object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 845–853.
- Bell, S.; Zitnick, C.L.; Bala, K.; Girshick, R.B. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2874–2883.
- Lin, T.; Dollár, P.; Girshick, R.B.; He, K.; Hariharan, B.; Belongie, S.J. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Fu, C.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659.
- Kong, T.; Sun, F.; Yao, A.; Liu, H.; Lu, M.; Chen, Y. RON: Reverse connection with objectness prior networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5244–5252.
- Shrivastava, A.; Sukthankar, R.; Malik, J.; Gupta, A. Beyond skip connections: Top-down modulation for object detection. arXiv 2016, arXiv:1612.06851.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Roy, A.G.; Navab, N.; Wachinger, C. Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 421–429.
- Gidaris, S.; Komodakis, N. Object detection via a multi-region and semantic segmentation-aware CNN model. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1134–1142.
- Zhu, Y.; Urtasun, R.; Salakhutdinov, R.; Fidler, S. segDeepM: Exploiting segmentation and context in deep neural networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4703–4711.
- Wang, S. PCN: Part and context information for pedestrian detection with CNNs. In Proceedings of the British Machine Vision Conference, Oxford, UK, 4–7 September 2017.
- Chen, X.; Zhu, Y. 3D object proposals for accurate object class detection. In Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.
- Everingham, M.; Gool, L.V.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
Approach | Backbone | Car | Person | Bus | Bike | Motorbike | Train | mAP |
---|---|---|---|---|---|---|---|---|
(a) EFPN | Res101 | 88.7 | 85.4 | 88.4 | 86.8 | 88.2 | 88.0 | 81.6 |
(b) FPN with enhanced feature extraction subnet | Res101 | 88.6 | 85.4 | 86.7 | 86.6 | 89.0 | 86.4 | 81.3 |
(c) FPN with adaptive parallel detection subnet | Res101 | 88.5 | 84.4 | 88.0 | 88.5 | 86.4 | 86.9 | 81.4 |
(d) FPN | Res101 | 88.2 | 84.7 | 86.9 | 85.5 | 85.5 | 87.2 | 81.1 |
(e) SSD 513 | Res101 | 88.1 | 83.0 | 88.2 | 87.6 | 87.5 | 87.2 | 80.6 |
(f) DSSD 513 | Res101 | 88.7 | 83.7 | 89.0 | 86.2 | 87.5 | 85.7 | 81.5 |
(g) R-FCN | Res101 | 88.5 | 81.2 | 86.8 | 87.2 | 79.9 | 85.9 | 80.5 |
(h) MR-CNN | VGG | 85.9 | 76.4 | 88.0 | 84.1 | 85.0 | 85.0 | 78.2 |
(i) Faster R-CNN | Res101 | 85.3 | 75.4 | 85.1 | 80.7 | 80.9 | 85.3 | 76.4 |
(j) ION | VGG | 85.1 | 74.4 | 85.4 | 83.1 | 82.2 | 84.2 | 75.6 |
Approach | Context Expansion | Detection Branch | mAP |
---|---|---|---|
(a) FPN | 0 | share | 75.8 |
(b) FPN with | 0.1 | PDB | 76.3 |
(c) FPN with | 0.3 | PDB | 76.3 |
(d) FPN with | 0.5 | PDB | 75.8 |
(e) FPN with | 0.7 | PDB | 76.2 |
(f) FPN with | 0.5 | merge | 76.2 |
(g) FPN with | ACE | merge | 75.5 |
(h) FPN with | ACE | concat | 76.1 |
(i) FPN with | ACE | PDB | 76.6 |
Approach | Test on | Car | Pedestrian | mAP |
---|---|---|---|---|
(a) [email protected] | KITTI | 90.3 | 78.3 | 84.3 |
(b) [email protected] | KITTI | 90.4 | 81.0 | 85.7 |
(c) [email protected] | KITTI | 73.6 | 33.4 | 53.5 |
(d) [email protected] | KITTI | 74.0 | 35.8 | 54.9 |
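
For the two-class results, the mAP column appears consistent with the unweighted mean of the Car and Pedestrian AP columns. The sketch below recomputes it from the four KITTI rows as a sanity check. The helper name `mean_ap`, the dictionary layout, and the rounding tolerance are illustrative assumptions, not taken from the paper.

```python
# Per-class AP values copied from KITTI table rows (a)-(d).
# Assumption: mAP is the unweighted mean of the per-class APs.
kitti_rows = {
    "a": {"Car": 90.3, "Pedestrian": 78.3, "mAP": 84.3},
    "b": {"Car": 90.4, "Pedestrian": 81.0, "mAP": 85.7},
    "c": {"Car": 73.6, "Pedestrian": 33.4, "mAP": 53.5},
    "d": {"Car": 74.0, "Pedestrian": 35.8, "mAP": 54.9},
}


def mean_ap(per_class_ap):
    """Hypothetical helper: unweighted mean of per-class AP values."""
    values = list(per_class_ap)
    if not values:
        raise ValueError("need at least one per-class AP value")
    return sum(values) / len(values)


for label, row in kitti_rows.items():
    recomputed = mean_ap([row["Car"], row["Pedestrian"]])
    # The published values are rounded to one decimal place,
    # so allow a small tolerance when comparing.
    consistent = abs(recomputed - row["mAP"]) < 0.06
    print(f"({label}) recomputed={recomputed:.2f} "
          f"reported={row['mAP']} consistent={consistent}")
```

The same check does not carry over to the Pascal VOC table: the six class columns shown there do not average to the listed mAP, presumably because that figure is taken over all 20 VOC classes.
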
Approach | Test on | Car | Pedestrian | mAP |
---|---|---|---|---|
(a) [email protected] | Cityscapes | 38.5 | 19.9 | 29.2 |
(b) [email protected] | Cityscapes | 38.9 | 21.4 | 30.1 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Wu, Y.; Tang, S.; Zhang, S.; Ogai, H. An Enhanced Feature Pyramid Object Detection Network for Autonomous Driving. Appl. Sci. 2019, 9, 4363. https://doi.org/10.3390/app9204363