Vehicle Detection in Urban Traffic Surveillance Images Based on Convolutional Neural Networks with Feature Concatenation
Abstract
1. Introduction
- (1) We propose, for the first time, the use of different feature extractors (enhanced through deconvolution and enhanced through pooling) for the localization and classification tasks, respectively, in the vehicle detection process, which yields a significantly better trade-off between speed and detection accuracy.
- (2) To enhance the representational power of the feature pyramids, we concatenate lower feature maps of the basic feature pyramid through deconvolution to precisely produce vehicle bounding boxes, and concatenate upper feature maps of the basic feature pyramid through pooling to simultaneously and accurately classify vehicle categories.
- (3) For each concatenated feature pyramid, we increase the number of channels of each feature layer in the pyramid. In this way, redundant multi-scale boxes for a single vehicle can be effectively eliminated.
- (4) As for the size of the default boxes, we allow the predictions to produce smaller bounding boxes than conventional SSD by adjusting the minimum- and maximum-scale parameters, and can thus accurately detect small vehicles in real time.
2. Related Work
2.1. Deep CNNs for Object Detection
2.2. Deep CNNs for Vehicle Detection
3. The Proposed Vehicle Detection Method
3.1. Feature Concatenation
3.1.1. Feature Concatenation through Deconvolution for Localization
3.1.2. Feature Concatenation through Pooling for Categorization
3.2. Default Box with Feature Concatenation
3.3. Deep Nets Training
4. Experiments and Results
4.1. Dataset and Experimental Configuration
4.1.1. Training Dataset and Data Augmentation
4.1.2. Implementation Details
4.2. Experiments on UA-DETRAC Dataset
4.2.1. The Importance of Feature Concatenation
4.2.2. Comparisons with State-of-the-art Detection Methods
4.3. Experiments on KITTI Dataset
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Chu, W.; Liu, Y.; Shen, C.; Cai, D.; Hua, X.S. Multi-Task Vehicle Detection With Region-of-Interest Voting. IEEE Trans. Image Process. 2018, 27, 432–441. [Google Scholar] [CrossRef] [PubMed]
- Hu, X.; Xu, X.; Xiao, Y.; Chen, H.; He, S.; Qin, J.; Heng, P.A. SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection. arXiv, 2018; arXiv:1804.00433v1. [Google Scholar] [CrossRef]
- Loukas, G.; Vuong, T.; Heartfield, R.; Sakellari, G.; Yoon, Y.; Gan, D. Cloud-Based Cyber-Physical Intrusion Detection for Vehicles Using Deep Learning. IEEE Access 2018, 6, 3491–3508. [Google Scholar] [CrossRef]
- Sivaraman, S.; Trivedi, M.M. A General Active-Learning Framework for On-Road Vehicle Recognition and Tracking. IEEE Trans. Intell. Transp. Syst. 2010, 11, 267–276. [Google Scholar] [CrossRef] [Green Version]
- Niknejad, H.T.; Takeuchi, A.; Mita, S.; McAllester, D. On-Road Multivehicle Tracking Using Deformable Object Model and Particle Filter With Improved Likelihood Estimation. IEEE Trans. Intell. Transp. Syst. 2012, 13, 748–758. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. arXiv, 2015; arXiv:1406.4729v4. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Uijlings, J.R.R.; Sande, K.E.A.; Gevers, T.; Smeulders, A.W.M. Selective Search for Object Recognition. Int. J. Comput. Vis. 2013, 104, 154–171. [Google Scholar] [CrossRef] [Green Version]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
- Jeong, J.; Park, H.; Kwak, N. Enhancement of SSD by concatenating feature maps for object detection. arXiv, 2017; arXiv:1705.09587v1. [Google Scholar]
- Ghodrati, A.; Diba, A.; Pedersoli, M.; Tuytelaars, T.; Gool, L.V. DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2578–2586. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv, 2014; arXiv:1409.4842v1. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Erhan, D.; Szegedy, C.; Toshev, A.; Anguelov, D. Scalable Object Detection using Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
- Szegedy, C.; Reed, S.; Erhan, D.; Anguelov, D.; Ioffe, S. Scalable, High-Quality Object Detection. arXiv, 2015; arXiv:1412.1441v3. [Google Scholar]
- Bell, S.; Zitnick, L.; Bala, K.; Girshick, R. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Liu, W.; Rabinovich, A.; Berg, A.C. ParseNet: Looking Wider to See Better. arXiv, 2015; arXiv:1506.04579. [Google Scholar]
- Kong, T.; Yao, A.; Chen, Y.; Sun, F. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv, 2016; arXiv:1612.08242v1. [Google Scholar]
- Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional Single Shot Detector. arXiv, 2017; arXiv:1701.06659v1. [Google Scholar]
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-Shot Refinement Neural Network for Object Detection. arXiv, 2018; arXiv:1711.06897. [Google Scholar]
- Liu, Y.; Wang, R.; Shan, S.; Chen, X. Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Zhou, P. Scale-Transferrable Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv, 2018; arXiv:1804.02767v1. [Google Scholar]
- Huang, S.C.; Do, B.H. Radial basis function based neural network for motion detection in dynamic scenes. IEEE Trans. Cybern. 2014, 44, 114–125. [Google Scholar] [CrossRef] [PubMed]
- Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis. J. ACM 2011, 58, 1–37. [Google Scholar] [CrossRef]
- Xu, H.; Caramanis, C.; Sanghavi, S. Robust PCA via outlier pursuit. IEEE Trans. Inf. Theory 2012, 58, 3047–3064. [Google Scholar] [CrossRef]
- Chen, B.H.; Huang, S.C. Probabilistic neural networks based moving vehicles extraction algorithm for intelligent traffic surveillance systems. Inf. Sci. 2015, 299, 283–295. [Google Scholar] [CrossRef]
- Wang, L.; Lu, Y.; Wang, H.; Zheng, Y.; Ye, H.; Xue, X. Evolving Boxes for Fast Vehicle Detection. arXiv, 2017; arXiv:1702.00254v3. [Google Scholar]
- Cai, Y.; Wang, H.; Zheng, Z.; Sun, X. Scene-Adaptive Vehicle Detection Algorithm Based on a Composite Deep Structure. IEEE Access 2017, 5, 22804–22811. [Google Scholar] [CrossRef]
- Kuang, H.; Zhang, X.; Li, Y.J.; Yan, H. Nighttime Vehicle Detection Based on Bio-Inspired Image Enhancement and Weighted Score-Level Feature Fusion. IEEE Trans. Intell. Transp. Syst. 2017, 18, 927–936. [Google Scholar] [CrossRef]
- Ji, X.; Zhang, G.; Chen, X.; Guo, Q. Multi-Perspective Tracking for Intelligent Vehicle. IEEE Trans. Intell. Transp. Syst. 2018, 19, 518–529. [Google Scholar] [CrossRef]
- Zhou, Y.; Liu, L.; Shao, L.; Mellor, M. DAVE: A Unified Framework for Fast Vehicle Detection and Annotation. arXiv, 2016; arXiv:1607.04564v3. [Google Scholar]
- Yang, L.; Luo, P.; Loy, C.C.; Tang, X. A Large-Scale Car Dataset for Fine-Grained Categorization and Verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Everingham, M.; Gool, L.V.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Tang, T.; Zhou, S.; Deng, Z.; Lei, L.; Zou, H. Arbitrary-Oriented Vehicle Detection in Aerial Imagery with Single Convolutional Neural Networks. Remote Sens. 2017, 9, 1170. [Google Scholar] [CrossRef]
- Tang, T.; Zhou, S.; Deng, Z.; Zou, H.; Lei, L. Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors 2017, 17, 336. [Google Scholar] [CrossRef]
- Deng, Z.; Sun, H.; Zhou, S.; Zhao, J.; Zou, H. Toward Fast and Accurate Vehicle Detection in Aerial Images Using Coupled Region-Based Convolutional Neural Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3652–3664. [Google Scholar] [CrossRef]
- Qu, T.; Zhang, Q.; Sun, S. Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks. Multimedia Tools Appl. 2017, 76, 21651–21663. [Google Scholar] [CrossRef]
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv, 2015; arXiv:1502.03167v3. [Google Scholar]
- Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Hariharan, B.; Arbeláez, P.; Girshick, R.; Malik, J. Hypercolumns for Object Segmentation and Fine-grained Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Lecun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
- Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256. [Google Scholar]
- Lyu, S.; Chang, M.C.; Du, D.; Wen, L.; Qi, H.; Li, Y.; Wei, Y.; Ke, L.; Hu, T.; Coco, M.D.; et al. UA-DETRAC 2017: Report of AVSS2017 & IWT4S Challenge on Advanced Traffic Monitoring. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017. [Google Scholar]
- Wen, L.; Du, D.; Cai, Z.; Lei, Z.; Chang, M.C.; Qi, H.; Lim, J.; Yang, M.H.; Lyu, S. UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking. arXiv, 2015; arXiv:1511.04136v3. [Google Scholar]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012. [Google Scholar]
- Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv, 2014; arXiv:1408.5093v1. [Google Scholar]
- Fredrik, G.; Erik, L.N. Automotive 3D Object Detection Without Target Domain Annotations. Master’s Thesis, Linköping University, Linköping, Sweden, 2018. [Google Scholar]
- Yang, F.; Choi, W.; Lin, Y. Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-View 3D Object Detection Network for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Rajaram, R.N.; Ohn-Bar, E.; Trivedi, M.M. RefineNet: Refining Object Detectors for Autonomous Driving. IEEE Trans. Intell. Veh. 2016, 1, 358–368. [Google Scholar] [CrossRef]
- Rajaram, R.N.; Ohn-Bar, E.; Trivedi, M.M. RefineNet: Iterative Refinement for Accurate Object Localization. In Proceedings of the Intelligent Transportation Systems Conference, Rio de Janeiro, Brazil, 1–4 November 2016. [Google Scholar]
- Hu, Q.; Paisitkriangkrai, S.; Shen, C.; Hengel, A.; Porikli, F. Fast Detection of Multiple Objects in Traffic Scenes With a Common Detection Framework. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1002–1014. [Google Scholar] [CrossRef] [Green Version]
- Stewart, R.; Andriluka, M.; Ng, A. End-to-End People Detection in Crowded Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
Stage | Type | Layer_Name | Bottom | Top
---|---|---|---|---
Feature concatenation through deconvolution for localization | | | |
Stage 1 | Convolution | conv9_2 | conv9_1 | conv9_2
Stage 2 | Deconvolution | deconv9_2 | conv9_2 | deconv9_2
 | Concat | conv8_2_concat | deconv9_2 & conv8_2 | conv8_2_concat
Stage 3 | Deconvolution | deconv8_2 | conv8_2_concat | deconv8_2
 | Concat | conv7_2_concat | deconv8_2 & conv7_2 | conv7_2_concat
Stage 4 | Deconvolution | deconv7_2 | conv7_2_concat | deconv7_2
 | Concat | conv6_2_concat | deconv7_2 & conv6_2 | conv6_2_concat
Stage 5 | Deconvolution | deconv6_2 | conv6_2_concat | deconv6_2
 | Concat | fc7_concat | deconv6_2 & fc7 | fc7_concat
Stage 6 | Deconvolution | defc7 | fc7_concat | defc7
 | Concat | conv4_3_concat | defc7 & conv4_3 | conv4_3_concat
Feature concatenation through pooling for categorization | | | |
Stage 1 | Convolution | conv4_3 | conv4_2 | conv4_3
Stage 2 | Pooling | pooling4_3 | conv4_3 | pooling4_3
 | Concat | fc7_concat | pooling4_3 & fc7 | fc7_concat
Stage 3 | Pooling | poolingfc7 | fc7_concat | poolingfc7
 | Concat | conv6_2_concat | poolingfc7 & conv6_2 | conv6_2_concat
Stage 4 | Pooling | pooling6_2 | conv6_2_concat | pooling6_2
 | Concat | conv7_2_concat | pooling6_2 & conv7_2 | conv7_2_concat
Stage 5 | Pooling | pooling7_2 | conv7_2_concat | pooling7_2
 | Concat | conv8_2_concat | pooling7_2 & conv8_2 | conv8_2_concat
Stage 6 | Pooling | pooling8_2 | conv8_2_concat | pooling8_2
 | Concat | conv9_2_concat | pooling8_2 & conv9_2 | conv9_2_concat
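The two concatenation schedules in the table above can be sketched at shape level. The following is a minimal NumPy sketch, not the trained network: nearest-neighbor upsampling stands in for the learned deconvolutions, max pooling for the pooling layers, and the pyramid resolutions are simplified to powers of two so that every 2x step lines up exactly.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling; a stand-in for a learned deconvolution.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def pool2x(x):
    # 2x2 max pooling with stride 2.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def concat_through_deconv(pyramid):
    """Top-down path for localization: upsample the current (coarser)
    map and concatenate it channel-wise onto the next finer map."""
    maps = list(pyramid)                # ordered fine -> coarse
    current = maps[-1]
    out = [current]
    for finer in reversed(maps[:-1]):
        current = np.concatenate([finer, upsample2x(current)], axis=0)
        out.append(current)
    return out[::-1]                    # re-ordered fine -> coarse

def concat_through_pool(pyramid):
    """Bottom-up path for categorization: pool the current (finer)
    map and concatenate it channel-wise onto the next coarser map."""
    maps = list(pyramid)
    current = maps[0]
    out = [current]
    for coarser in maps[1:]:
        current = np.concatenate([coarser, pool2x(current)], axis=0)
        out.append(current)
    return out

# Toy three-level pyramid: (channels, height, width).
pyramid = [np.random.rand(8, 16, 16),
           np.random.rand(16, 8, 8),
           np.random.rand(32, 4, 4)]

loc_maps = concat_through_deconv(pyramid)  # channels grow toward the fine maps
cls_maps = concat_through_pool(pyramid)    # channels grow toward the coarse maps
```

As the contribution list notes, this is why each concatenated pyramid has more channels per layer than the basic pyramid: the localization pyramid accumulates coarse context into the fine maps, while the categorization pyramid accumulates fine detail into the coarse maps.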
Layer | Min_size | Max_size | 1:1 (Small) | 1:1 (Large) | 1:2 | 2:1 | 1:3 | 3:1 |
---|---|---|---|---|---|---|---|---|
conv4_3_pb | 21 | 45 | 21 × 21 | 31 × 31 | 15 × 30 | 30 × 15 | - | - |
fc7_pb | 45 | 99 | 45 × 45 | 67 × 67 | 32 × 64 | 64 × 32 | 26 × 78 | 78 × 26 |
conv6_2_pb | 99 | 153 | 99 × 99 | 123 × 123 | 70 × 140 | 140 × 70 | 57 × 171 | 171 × 57 |
conv7_2_pb | 153 | 207 | 153 × 153 | 178 × 178 | 108 × 216 | 216 × 108 | 88 × 265 | 265 × 88 |
conv8_2_pb | 207 | 261 | 207 × 207 | 232 × 232 | 146 × 293 | 293 × 146 | - | - |
conv9_2_pb | 261 | 315 | 261 × 261 | 315 × 315 | 185 × 369 | 369 × 185 | - | - |
Layer | Min_size | Max_size | 1:1 (Small) | 1:1 (Large) | 1:2 | 2:1 | 1:3 | 3:1 |
---|---|---|---|---|---|---|---|---|
conv4_3_pb | 36 | 77 | 36 × 36 | 53 × 53 | 25 × 51 | 51 × 25 | - | - |
fc7_pb | 77 | 169 | 77 × 77 | 114 × 114 | 54 × 109 | 109 × 54 | 44 × 133 | 133 × 44 |
conv6_2_pb | 169 | 261 | 169 × 169 | 210 × 210 | 120 × 239 | 239 × 120 | 98 × 293 | 293 × 98 |
conv7_2_pb | 261 | 353 | 261 × 261 | 304 × 304 | 185 × 369 | 369 × 185 | 151 × 452 | 452 × 151 |
conv8_2_pb | 353 | 445 | 353 × 353 | 396 × 396 | 250 × 499 | 499 × 250 | - | - |
conv9_2_pb | 445 | 537 | 445 × 445 | 489 × 489 | 315 × 629 | 629 × 315 | - | - |
Method | Input | FPS | mAP (%) | Car | Bus | Van | Others |
---|---|---|---|---|---|---|---|
Ours (DP-SSD) * | 300 | 50.47 | 75.43 | 85.38 | 82.12 | 70.35 | 63.87 |
Ours (DP-SSD) * | 512 | 25.12 | 77.94 | 86.27 | 83.39 | 78.26 | 63.84 |
Ablation experiments: | | | | | | |
SSD [12] | 300 | 58.78 | 74.18 | 84.46 | 81.56 | 71.85 | 58.86 |
Ours (SSD pooling) * | 300 | 52.36 | 73.49 | 84.03 | 79.26 | 71.76 | 58.91 |
Ours (SSD deconvolution) * | 300 | 51.78 | 73.27 | 85.10 | 78.86 | 70.87 | 58.26 |
Ours (DP-SSD) | 300 | 49.22 | 74.77 | 84.69 | 80.54 | 70.27 | 63.57 |
SSD [12] | 512 | 27.75 | 76.83 | 84.74 | 84.32 | 76.64 | 61.62 |
Ours (SSD pooling) * | 512 | 26.79 | 75.87 | 85.80 | 81.26 | 77.23 | 59.18 |
Ours (SSD deconvolution) * | 512 | 25.83 | 75.15 | 85.42 | 77.33 | 76.91 | 60.94 |
Ours (DP-SSD) | 512 | 25.26 | 77.04 | 84.87 | 83.73 | 76.35 | 63.21 |
Method | Input | Settings | Default Boxes in Multiple Feature Maps | Total Boxes | mAP (%) |
---|---|---|---|---|---|
Ours (DP-SSD) | 300 | default | 4 6 6 6 4 4 | 8732 | 75.43 |
Ours (DP-SSD) | 300 | all small | 4 4 4 4 4 4 | 7760 | 75.10 |
Ours (DP-SSD) | 300 | all large | 6 6 6 6 6 6 | 11,640 | 73.62 |
Ours (DP-SSD) | 512 | default | 4 6 6 6 4 4 | 24,656 | 77.94 |
Ours (DP-SSD) | 512 | all small | 4 4 4 4 4 4 | 21,968 | 75.60 |
Ours (DP-SSD) | 512 | all large | 6 6 6 6 6 6 | 32,952 | 76.54 |
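The "Total Boxes" column above is just the per-layer box count multiplied by the number of feature-map cells. A quick check: the 300-input resolutions below are the standard SSD300 prediction-map sizes (38 down to 1), while the 512-input resolutions are inferred from the table's totals rather than stated in the text.

```python
def total_boxes(map_sizes, boxes_per_cell):
    # Each cell of each square feature map predicts a fixed number of boxes.
    return sum(s * s * b for s, b in zip(map_sizes, boxes_per_cell))

ssd300_maps = [38, 19, 10, 5, 3, 1]
ssd512_maps = [64, 32, 16, 8, 6, 4]   # inferred from the reported totals

default = [4, 6, 6, 6, 4, 4]
print(total_boxes(ssd300_maps, default))   # 8732
print(total_boxes(ssd512_maps, default))   # 24656
print(total_boxes(ssd300_maps, [4] * 6))   # 7760
print(total_boxes(ssd512_maps, [6] * 6))   # 32952
```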
Method | Input | mAP (%) | FPS |
---|---|---|---|
Faster R-CNN [9] (VGG16) | - | 72.70 | 11.23 |
YOLO [11] | 448 × 448 | 62.52 | 42.34 |
YOLOv2 [24] | 416 × 416 | 73.82 | 64.65 |
YOLOv2 544 × 544 [24] | 544 × 544 | 75.96 | 39.14 |
SSD300 [12] | 300 × 300 | 74.18 | 58.78 |
SSD512 [12] | 512 × 512 | 76.83 | 27.75 |
DSSD (ResNet-101) [25] | 321 × 321 | 76.03 | 8.36 |
R-SSD300 [13] | 300 × 300 | 75.02 | 43.78 |
R-SSD512 [13] | 512 × 512 | 77.73 | 24.19 |
RefineDet320 [26] | 320 × 320 | 76.97 | 46.83 |
RefineDet512 [26] | 512 × 512 | 77.68 | 29.45 |
SIN [27] | - | 77.26 | 10.79 |
YOLOv3 [29] | 416 × 416 | 88.09 | 51.26 |
Ours (DP-SSD300) * | 300 × 300 | 75.43 | 50.47 |
Ours (DP-SSD512) * | 512 × 512 | 77.94 | 25.12 |
Method | Moderate mAP (%) | Easy mAP (%) | Hard mAP (%) | Runtime
---|---|---|---|---
THU CV-AI | 91.97 | 91.96 | 84.57 | 0.38 s |
DP-SSD512 (Ours) | 85.32 | 89.19 | 74.82 | 0.1 s |
R-SSD512 [13] | 84.71 | 88.96 | 72.68 | 0.1 s |
DP-SSD300 (Ours) | 83.86 | 87.37 | 70.78 | 0.07 s |
R-SSD300 [13] | 82.37 | 88.13 | 71.73 | 0.08 s |
A3DODWTDA (image) [54] | 81.54 | 76.21 | 66.85 | 0.8 s |
SDP+CRC (ft) [55] | 81.33 | 90.39 | 70.33 | 0.6 s |
MV3D (LIDAR) [56] | 79.76 | 89.80 | 78.61 | 0.24 s |
RefineNet [57,58] | 79.21 | 90.16 | 65.71 | 0.2 s |
Faster R-CNN [9] | 79.11 | 87.90 | 70.19 | 2 s |
spLBP [59] | 77.39 | 80.16 | 60.59 | 1.5 s |
Reinspect [60] | 76.65 | 88.36 | 66.56 | 2 s |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, F.; Li, C.; Yang, F. Vehicle Detection in Urban Traffic Surveillance Images Based on Convolutional Neural Networks with Feature Concatenation. Sensors 2019, 19, 594. https://doi.org/10.3390/s19030594