Stereo SLAM in Dynamic Environments Using Semantic Segmentation
Abstract
1. Introduction
- A new stereo SLAM system, built on the ORB-SLAM2 framework and combined with a deep learning method, is proposed to reduce the impact of dynamic objects on camera pose and trajectory estimation. A semantic segmentation network is applied in the data preprocessing stage to filter out features that belong to moving objects.
- A novel moving object detection method is presented to reduce the influence of moving targets on camera pose and trajectory estimation. It calculates the likelihood that each keyframe point belongs to dynamic content and distinguishes between dynamic and static objects in the scene.
- To the best of our knowledge, this is the first use of the semantic segmentation network ENet [12], which is well suited to urban scenes, to enhance the performance of a visual SLAM system. This makes our system more robust and practical in highly dynamic, complex city streets and gives it practical engineering value.
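The preprocessing idea in the first contribution can be illustrated with a minimal sketch: given a per-pixel label mask from a segmentation network, keypoints that fall on potentially moving classes are discarded before pose estimation. This is not the authors' implementation. The function name `filter_static_keypoints`, the class ids in `DYNAMIC_CLASS_IDS`, and the array layouts are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical class ids for potentially moving objects (e.g. person, car).
# The real ids depend on the label map of the segmentation model in use.
DYNAMIC_CLASS_IDS = (11, 12, 13)


def filter_static_keypoints(keypoints_xy, label_mask, dynamic_ids=DYNAMIC_CLASS_IDS):
    """Return only the keypoints whose pixel is not labelled as a dynamic class.

    keypoints_xy: (N, 2) array of (x, y) pixel coordinates.
    label_mask:   (H, W) integer array of per-pixel class ids.
    """
    keypoints_xy = np.asarray(keypoints_xy, dtype=float).reshape(-1, 2)
    height, width = label_mask.shape
    # Round to the nearest pixel and clamp to the image bounds.
    cols = np.clip(np.rint(keypoints_xy[:, 0]).astype(int), 0, width - 1)
    rows = np.clip(np.rint(keypoints_xy[:, 1]).astype(int), 0, height - 1)
    # A keypoint is treated as dynamic if its pixel carries a dynamic class id.
    is_dynamic = np.isin(label_mask[rows, cols], dynamic_ids)
    return keypoints_xy[~is_dynamic]


# Toy example: a 4x6 mask with a small region labelled as class 13.
mask = np.zeros((4, 6), dtype=int)
mask[1:3, 2:4] = 13
keypoints = np.array([[0.0, 0.0], [2.2, 1.1], [5.0, 3.0]])
static_keypoints = filter_static_keypoints(keypoints, mask)
print(static_keypoints)  # the keypoint at (2.2, 1.1) lies on the masked region and is dropped
```

A purely mask-based filter like this one also removes features on parked vehicles, which is presumably why the second contribution adds a separate motion check to distinguish objects that are actually moving from ones that are merely movable.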
2. Related Work
3. System Description
3.1. Overview
3.2. Semantic Segmentation
3.3. Moving Object Detection
3.4. Outliers Removal
4. Experiments
4.1. Evaluation Using KITTI Benchmark Dataset
4.2. Time Analysis
4.3. Evaluation Test in Real Environment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Davison, A.; Reid, I.; Molton, N.; Stasse, O. MonoSLAM: Realtime single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067.
- Neira, J.; Davison, A.; Leonard, J. Guest editorial, special issue in visual SLAM. IEEE Trans. Robot. 2008, 24, 929–931.
- Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007.
- Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast semi-direct monocular visual odometry. In Proceedings of the IEEE International Conference on Robotics and Automation, Hong Kong, China, 31 May–7 June 2014.
- Engel, J.; Schops, T.; Cremers, D. LSD-SLAM: Large-scale direct monocular SLAM. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 834–849.
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Campos, C.; Elvira, R.; Rodriguez, J.; Montiel, J.; Tardós, J. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
- Panchpor, A.A.; Shue, S.; Conrad, J.M. A survey of methods for mobile robot localization and mapping in dynamic indoor environments. In Proceedings of the 2018 Conference on Signal Processing and Communication Engineering Systems, Vijayawada, India, 4–5 January 2018.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets Robotics: The KITTI Dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Bescos, B.; Facil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083.
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. arXiv 2016, arXiv:1606.02147.
- Davison, A. Mobile Robot Navigation Using Active Vision. Ph.D. Dissertation, University of Oxford, Oxford, UK, 1998.
- Davison, A.J.; Murray, D.W. Simultaneous localization and map building using active vision. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 865–880.
- Davison, A.; Kita, N. 3-D simultaneous localisation and map building using active vision for a robot moving on undulating terrain. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. 384–391.
- Iocchi, L.; Konolige, K.; Bajracharya, M. Visually realistic mapping of a planar environment with stereo. In International Symposium on Experimental Robotics; Springer: Berlin/Heidelberg, Germany, 2001; pp. 521–532.
- Se, S.; Lowe, D.; Little, J. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. Int. J. Robot. Res. 2002, 21, 735–758.
- Jung, I.; Lacroix, S. High resolution terrain mapping using low altitude aerial stereo imagery. In Proceedings of the 9th International Conference on Computer Vision, Nice, France, 14–17 October 2003; Volume 2, pp. 946–951.
- Hygounenc, E.; Jung, I.; Soueres, P.; Lacroix, S. The autonomous blimp project of LAAS-CNRS: Achievements in flight control and terrain mapping. Int. J. Robot. Res. 2004, 23, 473–511.
- Saez, J.; Escolano, F.; Penalver, A. First Steps towards Stereo-based 6DOF SLAM for the visually impaired. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition-Workshops, San Diego, CA, USA, 21–23 September 2005; Volume 3, p. 23.
- Sim, R.; Elinas, P.; Griffin, M.; Little, J. Vision-based SLAM using the Rao–Blackwellised particle filter. In Proceedings of the International Joint Conference on Artificial Intelligence-Workshop Reason, Uncertainty Robot, Edinburgh, UK, 30 July–5 August 2005; pp. 9–16.
- Sim, R.; Elinas, P.; Little, J. A study of the Rao–Blackwellised particle filter for efficient and accurate vision-based SLAM. Int. J. Comput. Vis. 2007, 74, 303–318.
- Paz, L.M.; Piniés, P.; Tardós, J.D.; Neira, J. Large-Scale 6-DOF SLAM With Stereo-in-Hand. IEEE Trans. Robot. 2008, 24, 946–957.
- Lin, K.H.; Wang, C.C. Stereo-based simultaneous localization, mapping and moving object tracking. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010.
- Kawewong, A.; Tongprasit, N.; Tangruamsub, S.; Hasegawa, O. Online and Incremental Appearance-based SLAM in Highly Dynamic Environments. Int. J. Robot. Res. 2011, 30, 33–55.
- Alcantarilla, P.F.; Yebes, J.J.; Almazán, J.; Bergasa, L.M. On combining visual SLAM and dense scene flow to increase the robustness of localization and mapping in dynamic environments. In Proceedings of the IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; pp. 1290–1297.
- Kaess, M.; Ni, K.; Dellaert, F. Flow separation for fast and robust stereo odometry. In Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 3539–3544.
- Karlsson, N.; Di Bernardo, E.; Ostrowski, J.; Goncalves, L.; Pirjanian, P.; Munich, M.E. The vSLAM algorithm for robust localization and mapping. In Proceedings of the IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 24–29.
- Zou, D.; Tan, P. CoSLAM: Collaborative Visual SLAM in Dynamic Environments. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 354–366.
- Fan, Y.; Han, H.; Tang, Y.; Zhi, T. Dynamic objects elimination in SLAM based on image fusion. Pattern Recognit. Lett. 2019, 127, 191–201.
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Sun, T.; Sun, Y.; Liu, M.; Yeung, D.Y. Movable-Object-Aware Visual SLAM via Weakly Supervised Semantic Segmentation. arXiv 2019, arXiv:1906.03629.
- Balntas, V.; Riba, E.; Ponsa, D.; Mikolajczyk, K. Learning local feature descriptors with triplets and shallow convolutional neural networks. Br. Mach. Vis. Conf. 2016, 1, 3.
- Kang, R.; Shi, J.; Li, X.; Liu, Y. DF-SLAM: A Deep-Learning Enhanced Visual SLAM System based on Deep Local Features. arXiv 2019, arXiv:1901.07223.
- Li, P.; Qin, T.; Shen, S. Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
- Ai, Y.; Rui, T.; Lu, M.; Fu, L.; Liu, S.; Wang, S. DDL-SLAM: A robust RGB-D SLAM in dynamic environments combined with Deep Learning. IEEE Access 2020, 8, 162335–162342.
- Chiranjeevi, P.; Sengupta, S. Moving object detection in the presence of dynamic backgrounds using intensity and textural features. J. Electron. Imaging 2011, 20, 3009.
- Wang, Y.T.; Chi, C.T.; Hung, S.K. Robot Visual Simultaneous Localization and Mapping in Dynamic Environments. J. Comput. Theor. Nanosci. 2012, 8, 229–234.
- Zeng, Z.; Zhou, Y.; Odest, C.J.; Karthik, D. Semantic Mapping with Simultaneous Object Detection and Localization. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, 1–5 October 2018; pp. 911–918.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Rusu, R.B.; Blodow, N.; Beetz, M. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 3212–3217.
- Hu, J.; Fang, H.; Yang, Q.; Zha, W. MOD-SLAM: Visual SLAM with Moving Object Detection in Dynamic Environments. In Proceedings of the 40th Chinese Control Conference, Shanghai, China, 26–28 July 2021; pp. 4302–4307.
- Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada, 24–28 August 1981.
- Xie, W.; Liu, P.X.; Zheng, M. Moving Object Segmentation and Detection for Robust RGBD-SLAM in Dynamic Environments. IEEE Trans. Instrum. Meas. 2021, 70, 5001008.
- Yu, C.; Liu, Z.X.; Liu, X.J.; Xie, F.G.; Yang, Y.; Wei, Q.; Qiao, F. DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, 1–5 October 2018.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Zhao, Y.; Shi, H.; Chen, X.; Li, X.; Wang, C. An overview of object detection and tracking. In Proceedings of the 2015 IEEE International Conference on Information and Automation, Lijiang, China, 8–10 August 2015.
- Shi, J.; Tomasi, C. Good Features to Track. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 21–23 June 1994; pp. 593–600.
- Sumikura, S.; Shibuya, M.; Sakurada, K. OpenVSLAM: A Versatile Visual SLAM Framework. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2292–2295.
Item | ORB-SLAM2 | OpenVSLAM | OurSLAM |
---|---|---|---|
Mean [ms/frame] | 65.35 | 55.46 | 68.56 |
Median [ms/frame] | 66.43 | 56.25 | 69.21 |
Item | ORB-SLAM2 RMSE | ORB-SLAM2 STD | OurSLAM RMSE | OurSLAM STD | Improvement RMSE | Improvement STD |
---|---|---|---|---|---|---|
ATE [m/s] | 20.387 | 10.021 | 4.632 | 2.987 | 77.28% | 70.19% |
RPE [°/s] | 0.203 | 0.324 | 0.085 | 0.039 | 58.13% | 87.96% |
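The improvement columns in the table above appear to be the relative error reduction with respect to ORB-SLAM2, that is, (baseline − ours) / baseline. The formula is an assumption, since the text here does not state it, but the short check below reproduces all four percentages from the tabulated values. The helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of an error metric relative to the baseline."""
    return (baseline - ours) / baseline * 100.0


# Values copied from the table: (ORB-SLAM2, OurSLAM).
results = {
    "ATE RMSE": (20.387, 4.632),
    "ATE STD": (10.021, 2.987),
    "RPE RMSE": (0.203, 0.085),
    "RPE STD": (0.324, 0.039),
}

for name, (baseline, ours) in results.items():
    print(f"{name}: {relative_improvement(baseline, ours):.2f}%")
# ATE RMSE: 77.28%
# ATE STD: 70.19%
# RPE RMSE: 58.13%
# RPE STD: 87.96%
```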
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ai, Y.; Sun, Q.; Xi, Z.; Li, N.; Dong, J.; Wang, X. Stereo SLAM in Dynamic Environments Using Semantic Segmentation. Electronics 2023, 12, 3112. https://doi.org/10.3390/electronics12143112