An Efficient Vehicle Localization Method by Using Monocular Vision
Abstract
1. Introduction
- Combining target detection with semantic segmentation was shown to improve the precision of bounding boxes.
- An exact relationship between bounding boxes and real-world location was derived from imaging geometry and camera parameters (see the sketch after this list).
- The use of deep learning and monocular vision for vehicle localization was studied.
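To make the imaging-geometry relationship concrete, the sketch below estimates a vehicle's depth and horizontal angle from its bounding box under the common pinhole model, using the per-category prior heights reported in the experiments. This is a minimal illustration under assumed intrinsics, not the paper's exact PRS derivation; the function name and the example focal length (a typical KITTI value) are ours.

```python
import math

# Per-category prior heights (m), taken from the annotation table below.
PRIOR_HEIGHT_M = {"Car": 1.53, "Van": 2.21, "Truck": 3.25,
                  "Cyclist": 1.74, "Tram": 3.53}

def locate_from_bbox(bbox, category, fx, fy, cx):
    """Estimate depth (m) and horizontal angle (deg) from a bounding box.

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    fx, fy: focal lengths in pixels; cx: principal point x-coordinate.
    """
    x_min, y_min, x_max, y_max = bbox
    pixel_height = y_max - y_min
    # Similar triangles: depth = focal_length * real_height / pixel_height.
    depth = fy * PRIOR_HEIGHT_M[category] / pixel_height
    # Horizontal angle of the box center relative to the optical axis.
    u_center = 0.5 * (x_min + x_max)
    angle = math.degrees(math.atan2(u_center - cx, fx))
    return depth, angle

# Example: a car whose box is 100 px tall under fy = 721 px (a typical
# KITTI focal length) is roughly 721 * 1.53 / 100 = 11.0 m away.
```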
2. Related Work
2.1. Laser SLAM
2.2. Visual Localization
3. Method
3.1. Vehicle Detection Improvement Method
3.1.1. Structure
3.1.2. Training
3.1.3. Inference
3.2. PRS Relationships
4. Experiments
4.1. Vehicle Detection
4.1.1. Dataset Implementation
4.1.2. Qualitative Analysis
4.1.3. Quantitative Analysis
4.2. Distance and Angle Measurement
4.3. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Zhang, D.; Xu, Z.; Huang, Z.; Gutierrez, A.R.; Norris, T.B. Neural network based 3D tracking with a graphene transparent focal stack imaging system. Nat. Commun. 2021, 12, 2413.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Liu, W.; Anguelov, D.; Erhan, D. SSD: Single Shot MultiBox Detector. Lect. Notes Comput. Sci. 2016, 9905, 21–37.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Sri Jamiya, S.; Esther Rani, P. LittleYOLO-SPP: A Delicate Real-Time Vehicle Detection Algorithm. Optik 2020, 225, 165–173.
- Besl, P.J.; McKay, N.D. A method for registration of 3-D shapes. Sens. Fusion IV Control. Paradig. Data Struct. 1992, 1611, 586–606.
- Segal, A.; Haehnel, D.; Thrun, S. Generalized-ICP. Robot. Sci. Syst. 2009, 2, 435.
- Koide, K.; Yokozuka, M.; Oishi, S.; Banno, A. Voxelized GICP for fast and accurate 3D point cloud registration. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11054–11059.
- Zhang, J.; Singh, S. LOAM: Lidar Odometry and Mapping in Real-time. Robot. Sci. Syst. 2014, 2, 592.
- Shan, T.; Englot, B. LeGO-LOAM: Lightweight and ground-optimized lidar odometry and mapping on variable terrain. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 4758–4765.
- Mendes, E.; Koch, P.; Lacroix, S. ICP-based pose-graph SLAM. In Proceedings of the 2016 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Lausanne, Switzerland, 23–27 October 2016; pp. 195–200.
- Frosi, M.; Matteucci, M. ART-SLAM: Accurate Real-Time 6DoF LiDAR SLAM. arXiv 2021, arXiv:2109.05483.
- Scharstein, D.; Szeliski, R.; Zabih, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. In Proceedings of the IEEE Workshop on Stereo and Multi-Baseline Vision 2001 (SMBV), Kauai, HI, USA, 9–10 December 2001; pp. 131–140.
- Navarro, J.; Duran, J.; Buades, A. Disparity adapted weighted aggregation for local stereo. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 2249–2253.
- Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P.; Kennedy, R.; Bachrach, A.; Bry, A. End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 66–75.
- Chang, J.; Chen, Y. Pyramid stereo matching network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 5410–5418.
- Yang, G.; Zhao, H.; Shi, J.; Deng, Z.; Jia, J. SegStereo: Exploiting Semantic Information for Disparity Estimation. Lect. Notes Comput. Sci. 2018, 11211, 660–676.
- Guo, X.; Yang, K.; Yang, W.; Wang, X.; Li, H. Group-wise correlation stereo network. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3268–3277.
- Young, T.; Bourke, M.; Zhou, X.; Toshitake, A. Ten-m2 is required for the generation of binocular visual circuits. J. Neurosci. 2013, 33, 12490–12509.
- Lin, Y.; Yang, J.; Lv, Z.; Wei, W.; Song, H. A Self-Assessment Stereo Capture Model Applicable to the Internet of Things. Sensors 2015, 15, 20925–20944.
- Perdices, E.; Cañas, J.M. SDVL: Efficient and Accurate Semi-Direct Visual Localization. Sensors 2019, 19, 302.
- Nagai, T.; Naruse, T.; Ikehara, M.; Kurematsu, A. HMM-based surface reconstruction from single images. In Proceedings of the International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002; p. 11.
- Saxena, A.; Chung, S.; Ng, A. 3-D depth reconstruction from a single still image. Int. J. Comput. Vis. 2008, 76, 53–69.
- Zhuo, S.; Sim, T. On the recovery of depth from a single defocused image. Lect. Notes Comput. Sci. 2009, 5702, 889–897.
- Eigen, D.; Puhrsch, C.; Fergus, R. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. arXiv 2014, arXiv:1406.2283.
- Eigen, D.; Fergus, R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2650–2658.
- Xu, D.; Ricci, E.; Ouyang, W.; Wang, X.; Sebe, N. Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 161–167.
- Wang, P.; Shen, X.; Lin, Z.; Cohen, S.; Price, B.; Yuille, A. Towards unified depth and semantic prediction from a single image. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 2800–2809.
- Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper depth prediction with fully convolutional residual networks. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 239–248.
- Meng, C.; Bao, H.; Ma, Y.; Xu, X.; Li, Y. Visual Meterstick: Preceding Vehicle Ranging Using Monocular Vision Based on the Fitting Method. Symmetry 2019, 11, 1081.
- Parmar, Y.; Natarajan, S.; Sobha, G. DeepRange: Deep-learning-based object detection and ranging in autonomous driving. IET Intell. Transp. Syst. 2019, 13, 1256–1264.
- Nister, D.; Naroditsky, O.; Bergen, J. Visual odometry. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, USA, 27 June–2 July 2004; p. 1.
- Lin, H.Y.; Lin, J.H. A visual positioning system for vehicle or mobile robot navigation. IEICE Trans. Inf. Syst. 2006, 89, 2109–2116.
- Aladem, M.; Rawashdeh, S. Lightweight visual odometry for autonomous mobile robots. Sensors 2018, 18, 2837.
- Howard, A. Real-time stereo visual odometry for autonomous ground vehicles. In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, 22–26 September 2008; pp. 3946–3952.
- Lu, Y.; Song, D. Robust RGB-D odometry using point and line features. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3934–3942.
- Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast semi-direct monocular visual odometry. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 15–22.
- Zhou, D.; Dai, Y.; Li, H. Ground-Plane-Based Absolute Scale Estimation for Monocular Visual Odometry. IEEE Trans. Intell. Transp. Syst. 2020, 21, 791–802.
- Lin, H.Y.; Hsu, J.L. A Sparse Visual Odometry Technique Based on Pose Adjustment with Keyframe Matching. IEEE Sens. J. 2021, 21, 11810–11821.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. arXiv 2018, arXiv:1805.04687.
Table: Network structures of SegSSD (input 300 × 300) and SegYOLO (input 512 × 512), listing the Encode_X, (Conv + Maxpool)_X, and Decode_X stages (X = 1, 2, 3, final) with their output sizes. Most per-stage convolution specifications were lost in extraction; what survives shows each decode stage concatenating encoder features (concat to 256, 128, or 64 channels) followed by upsampling with stride 4 or 2 (e.g., SegSSD stage 2 outputs of size 75 and 150), and a final stage of a 32-channel convolution plus a per-class convolution.
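The decode stages in the table follow a concat-then-upsample pattern: encoder features are concatenated with deeper decoder features, reduced by a convolution, and upsampled by stride 2 or 4. The following PyTorch module is a minimal sketch of one such stage; the kernel size and any channel counts beyond the table's concat widths are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DecodeStage(nn.Module):
    """One concat + conv + upsample decode stage, as in the table above.

    Channel counts mirror the table's concat widths (256/128/64) but are
    otherwise illustrative assumptions.
    """
    def __init__(self, in_channels, out_channels, scale):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)

    def forward(self, encoder_feat, decoder_feat):
        # Concatenate the encoder skip connection with the deeper decoder
        # feature, reduce channels, then upsample (stride s2/s4 in the table).
        x = torch.cat([encoder_feat, decoder_feat], dim=1)
        return self.up(self.reduce(x))

# e.g., a stage that concatenates to 64 channels and upsamples by 2:
stage3 = DecodeStage(in_channels=64 + 64, out_channels=64, scale=2)
```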
Model | mAP@0.5 | Car AP | Van AP | Truck AP | Cyclist AP | Tram AP | Inference Time (ms)
---|---|---|---|---|---|---|---|
SSD | 0.449 | 0.752 | 0.336 | 0.504 | 0.276 | 0.375 | 26.9 |
SegSSD | 0.452 | 0.762 | 0.344 | 0.498 | 0.273 | 0.391 | 30.4 |
YOLOv3-spp | 0.596 | 0.837 | 0.499 | 0.599 | 0.580 | 0.463 | 35.9 |
SegYOLO | 0.620 | 0.840 | 0.549 | 0.550 | 0.570 | 0.592 | 39.7 |
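For reference, mAP@0.5 counts a detection as a true positive when its intersection over union (IoU) with a ground-truth box is at least 0.5, then averages per-class AP over the five categories. A minimal IoU check (the function is ours, not from the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection matches a ground-truth box at the table's threshold when
# iou(det, gt) >= 0.5.
```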
Category | Prior Height (m) | Annotated Samples (Total) | ≤40 m | >40 m | ≤20° | >20°
---|---|---|---|---|---|---
Car | 1.53 | 2566 | 2087 | 479 | 1724 | 842 |
Van | 2.21 | 247 | 152 | 95 | 160 | 87 |
Truck | 3.25 | 93 | 40 | 53 | 71 | 22 |
Cyclist | 1.74 | 127 | 108 | 19 | 87 | 40 |
Tram | 3.53 | 56 | 30 | 26 | 46 | 10 |
Category | Model | Recall ≤40 m | Recall >40 m | Depth Error ≤40 m (%) | Depth Error >40 m (%) | Angle Error ≤20° (°) | Angle Error >20° (°)
---|---|---|---|---|---|---|---
Car | SSD | 0.57 | 0.47 | 9.15 | 12.23 | 2.34 | 2.57
Car | SegSSD | 0.61 | 0.51 | 6.11 | 8.79 | 2.11 | 2.47
Car | YOLOv3-spp | 0.87 | 0.72 | 5.02 | 9.12 | 2.21 | 2.42
Car | SegYOLO | 0.88 | 0.75 | 3.05 | 5.52 | 1.96 | 2.38
Van | SSD | 0.63 | 0.42 | 10.61 | 11.06 | 2.01 | 2.34
Van | SegSSD | 0.67 | 0.50 | 7.63 | 9.28 | 1.85 | 2.16
Van | YOLOv3-spp | 0.85 | 0.78 | 7.36 | 10.96 | 1.58 | 1.87
Van | SegYOLO | 0.91 | 0.79 | 4.82 | 7.31 | 1.47 | 1.83
Truck | SSD | 0.43 | 0.56 | 9.24 | 12.33 | 2.12 | 2.32
Truck | SegSSD | 0.47 | 0.60 | 7.92 | 10.69 | 1.94 | 2.12
Truck | YOLOv3-spp | 0.71 | 0.79 | 8.93 | 9.25 | 1.26 | 1.92
Truck | SegYOLO | 0.73 | 0.83 | 5.33 | 6.17 | 1.16 | 1.57
Cyclist | SSD | 0.56 | 0.11 | 12.33 | 14.73 | 2.57 | 2.85
Cyclist | SegSSD | 0.59 | 0.16 | 8.26 | 11.56 | 2.17 | 2.45
Cyclist | YOLOv3-spp | 0.82 | 0.19 | 6.76 | 10.04 | 2.38 | 2.46
Cyclist | SegYOLO | 0.83 | 0.21 | 3.91 | 8.49 | 1.79 | 1.82
Tram | SSD | 0.40 | 0.46 | 10.28 | 13.75 | 1.99 | 2.21
Tram | SegSSD | 0.47 | 0.54 | 5.60 | 7.61 | 1.55 | 1.43
Tram | YOLOv3-spp | 0.57 | 0.70 | 4.95 | 8.81 | 1.52 | 1.22
Tram | SegYOLO | 0.63 | 0.77 | 4.34 | 5.30 | 1.43 | 1.02
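The depth and angle columns above correspond to relative depth error |d_est − d_gt| / d_gt and absolute angle difference, averaged within each bin (≤40 m vs. >40 m for depth, ≤20° vs. >20° for angle). A minimal sketch of that bookkeeping, with the names and binning convention as our assumptions:

```python
def binned_errors(samples):
    """samples: iterable of (d_est, d_gt, a_est, a_gt) tuples.

    Returns mean relative depth error (%) for the <=40 m / >40 m bins and
    mean absolute angle error (deg) for the <=20 deg / >20 deg bins.
    """
    depth_bins = {"<=40m": [], ">40m": []}
    angle_bins = {"<=20deg": [], ">20deg": []}
    for d_est, d_gt, a_est, a_gt in samples:
        depth_key = "<=40m" if d_gt <= 40.0 else ">40m"
        depth_bins[depth_key].append(abs(d_est - d_gt) / d_gt * 100.0)
        angle_key = "<=20deg" if abs(a_gt) <= 20.0 else ">20deg"
        angle_bins[angle_key].append(abs(a_est - a_gt))
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return ({k: mean(v) for k, v in depth_bins.items()},
            {k: mean(v) for k, v in angle_bins.items()})
```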
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).