A Comparative Review on Enhancing Visual Simultaneous Localization and Mapping with Deep Semantic Segmentation
Abstract
1. Introduction
2. Key Technologies and Influencing Factors of VSLAM
2.1. Workflow of VSLAM
2.2. Front-End Visual Odometry
2.3. Back-End Optimization
2.4. Loop Closure Detection
2.5. Mapping
2.6. Evaluation Metrics
2.6.1. Absolute Trajectory Error (ATE)
2.6.2. Relative Pose Error (RPE)
2.6.3. Map Completeness
2.7. Influencing Factors
3. Principles and Methods of Semantic Segmentation
3.1. Definition and Evaluation Metrics
3.2. Encoder–Decoder Architectures
3.3. Context Modeling Modules
3.4. Training Strategies
3.5. Factors Affecting Performance
3.6. Representative Models
4. Applications of Semantic Segmentation in VSLAM
4.1. Visual Odometry
4.2. Loop Closure Detection
4.3. Environment Mapping
4.4. Model Action Mechanism
4.5. Experimental Comparison
4.6. Discussion
5. Conclusions
6. Future Trends and Prospects
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Guo, Y.; Liu, Y.; Georgiou, T.; Lew, M.S. A review of semantic segmentation using deep neural networks. Int. J. Multimed. Inf. Retr. 2018, 7, 87–93. [Google Scholar] [CrossRef]
- Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding convolution for semantic segmentation. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; IEEE: New York, NY, USA, 2018; pp. 1451–1460. [Google Scholar]
- Chen, W.; Shang, G.; Ji, A.; Zhou, C.; Wang, X.; Xu, C.; Li, Z.; Hu, K. An overview on visual slam: From tradition to semantic. Remote. Sens. 2022, 14, 3010. [Google Scholar] [CrossRef]
- Wang, Y.; Zhang, Y.; Hu, L.; Wang, W.; Ge, G.; Tan, S. A Semantic Topology Graph to Detect Re-Localization and Loop Closure of the Visual Simultaneous Localization and Mapping System in a Dynamic Environment. Sensors 2023, 23, 8445. [Google Scholar] [CrossRef] [PubMed]
- Mo, J.; Islam, M.J.; Sattar, J. Fast direct stereo visual SLAM. IEEE Robot. Autom. Lett. 2021, 7, 778–785. [Google Scholar] [CrossRef]
- Moreno, F.-A.; Blanco, J.-L.; Gonzalez-Jimenez, J. A constant-time SLAM back-end in the continuum between global mapping and submapping: Application to visual stereo SLAM. Int. J. Robot. Res. 2016, 35, 1036–1056. [Google Scholar] [CrossRef]
- Chen, S.; Zhou, B.; Jiang, C.; Xue, W.; Li, Q. A lidar/visual slam backend with loop closure detection and graph optimization. Remote Sens. 2021, 13, 2720. [Google Scholar] [CrossRef]
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
- Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-Scale Direct Monocular SLAM. In European Conference on Computer Vision—ECCV 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 834–849. [Google Scholar]
- Ortiz, L.E.; Cabrera, E.V.; Gonçalves, L.M. Depth data error modeling of the ZED 3D vision sensor from stereolabs. ELCVIA Electron. Lett. Comput. Vis. Image Anal. 2018, 17, 0001–15. [Google Scholar] [CrossRef]
- Wang, K.; Ma, S.; Chen, J.; Ren, F.; Lu, J. Approaches, challenges, and applications for deep visual odometry: Toward complicated and emerging areas. IEEE Trans. Cogn. Dev. Syst. 2020, 14, 35–49. [Google Scholar] [CrossRef]
- Bailey, T.; Nieto, J.; Guivant, J.; Stevens, M.; Nebot, E. Consistency of the EKF-SLAM algorithm. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 9–15 October 2006; IEEE: New York, NY, USA, 2006; pp. 3562–3568. [Google Scholar]
- Zhao, Y.; Vela, P.A. Good feature selection for least squares pose optimization in VO/VSLAM. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; IEEE: New York, NY, USA, 2018; pp. 1183–1189. [Google Scholar]
- Zhao, Y.; Vela, P.A. Good line cutting: Towards accurate pose tracking of line-assisted VO/VSLAM. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 516–531. [Google Scholar]
- Nüchter, A.; Lingemann, K.; Hertzberg, J.; Surmann, H. 6D SLAM—3D mapping outdoor environments. J. Field Robot. 2007, 24, 699–722. [Google Scholar] [CrossRef]
- Kejriwal, N.; Kumar, S.; Shibata, T. High performance loop closure detection using bag of word pairs. Robot. Auton. Syst. 2016, 77, 55–65. [Google Scholar] [CrossRef]
- Shen, X.; Chen, L.; Hu, Z.; Fu, Y.; Qi, H.; Xiang, Y.; Wu, J. A Closed-loop Detection Algorithm for Online Updating of Bag-Of-Words Model. In Proceedings of the 2023 9th International Conference on Computing and Data Engineering, Association for Computing Machinery, Haikou, China, 6–8 January 2023; pp. 34–40. [Google Scholar]
- Xi, K.; He, J.; Hao, S.; Luo, L. SLAM Loop Detection Algorithm Based on Improved Bag-of-Words Model. In Proceedings of the 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), Chengdu, China, 19–21 August 2022; IEEE: New York, NY, USA, 2022; pp. 683–689. [Google Scholar]
- Xu, L.; Feng, C.; Kamat, V.R.; Menassa, C. An occupancy grid mapping enhanced visual SLAM for real-time locating applications in indoor GPS-denied environments. Autom. Constr. 2019, 104, 230–245. [Google Scholar] [CrossRef]
- Blochliger, F.; Fehr, M.; Dymczyk, M.; Schneider, T.; Siegwart, R. Topomap: Topological mapping and navigation based on visual slam maps. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: New York, NY, USA, 2018; pp. 3818–3825. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Sünderhauf, N.; Pham, T.T.; Latif, Y.; Milford, M.; Reid, I. Meaningful maps with object-oriented semantic mapping. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; IEEE: New York, NY, USA, 2017; pp. 5079–5085. [Google Scholar]
- Safarova, L.; Abbyasov, B.; Tsoy, T.; Li, H.; Magid, E. Comparison of Monocular ROS-Based Visual SLAM Methods. In Proceedings of the International Conference on Interactive Collaborative Robotics, Fuzhou, China, 16–18 December 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 81–92. [Google Scholar]
- Deshmukh-Taskar, P.R.; Nicklas, T.A.; O’Neil, C.E.; Keast, D.R.; Radcliffe, J.D.; Cho, S. The relationship of breakfast skipping and type of breakfast consumption with nutrient intake and weight status in children and adolescents: The National Health and Nutrition Examination Survey 1999–2006. J. Am. Diet. Assoc. 2010, 110, 869–878. [Google Scholar] [CrossRef] [PubMed]
- Taketomi, T.; Uchiyama, H.; Ikeda, S. Visual SLAM algorithms: A survey from 2010 to 2016. IPSJ Trans. Comput. Vis. Appl. 2017, 9, 1–11. [Google Scholar]
- Ben Ali, A.J.; Kouroshli, M.; Semenova, S.; Hashemifar, Z.S.; Ko, S.Y.; Dantu, K. Edge-SLAM: Edge-assisted visual simultaneous localization and mapping. ACM Trans. Embed. Comput. Syst. 2022, 22, 1–31. [Google Scholar] [CrossRef]
- Gao, F.; Moltu, S.B.; Vollan, E.R.; Shen, S.; Ludvigsen, M. Increased Autonomy and Situation Awareness for ROV Operations. In Proceedings of the Global Oceans 2020: Singapore–US Gulf Coast, Virtual, 5–14 October 2020; IEEE: New York, NY, USA, 2020; pp. 1–8. [Google Scholar]
- Vincent, J.; Labbé, M.; Lauzon, J.-S.; Grondin, F.; Comtois-Rivet, P.-M.; Michaud, F. Dynamic object tracking and masking for visual SLAM. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; IEEE: New York, NY, USA, 2020; pp. 4974–4979. [Google Scholar]
- Zhu, J.; Huang, H.; Li, B.; Wang, L. E-CRF: Embedded Conditional Random Field for Boundary-caused Class Weights Confusion in Semantic Segmentation. arXiv 2021, arXiv:2112.07106. [Google Scholar]
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum pointnets for 3d object detection from rgb-d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 918–927. [Google Scholar]
- Sun, C.-Z.; Zhang, B.; Wang, J.-K.; Zhang, C.-S. A review of visual SLAM based on unmanned systems. In Proceedings of the 2021 2nd International Conference on Artificial Intelligence and Education (ICAIE), Dali, China, 18–20 June 2021; IEEE: New York, NY, USA, 2021; pp. 226–234. [Google Scholar]
- Chang, J.; Dong, N.; Li, D.; Qin, M. Triplet loss based metric learning for closed loop detection in VSLAM system. Expert Syst. Appl. 2021, 185, 115646. [Google Scholar] [CrossRef]
- Wang, Z.; Peng, Z.; Guan, Y.; Wu, L. Manifold regularization graph structure auto-encoder to detect loop closure for visual SLAM. IEEE Access 2019, 7, 59524–59538. [Google Scholar] [CrossRef]
- Saputra, M.R.U.; Markham, A.; Trigoni, N. Visual SLAM and structure from motion in dynamic environments: A survey. ACM Comput. Surv. 2018, 51, 1–36. [Google Scholar] [CrossRef]
- Wen, S.; Li, P.; Zhao, Y.; Zhang, H.; Sun, F.; Wang, Z. Semantic visual SLAM in dynamic environment. Auton. Robot. 2021, 45, 493–504. [Google Scholar] [CrossRef]
- Mingachev, E.; Lavrenov, R.; Tsoy, T.; Matsuno, F.; Svinin, M.; Suthakorn, J.; Magid, E. Comparison of ros-based monocular visual slam methods: Dso, ldso, orb-slam2 and dynaslam. In Proceedings of the International Conference on Interactive Collaborative Robotics, St. Petersburg, Russia, 7–9 October 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 222–233. [Google Scholar]
- Yanik, Ö.F.; Ilgin, H.A. A comprehensive computational cost analysis for state-of-the-art visual slam methods for autonomous mapping. Commun. Fac. Sci. Univ. Ank. Ser. A2-A3 Phys. Sci. Eng. 2023, 65, 1–15. [Google Scholar]
- Chua, L.O.; Roska, T. The CNN paradigm. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 1993, 40, 147–156. [Google Scholar] [CrossRef]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 7262–7272. [Google Scholar]
- Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate LSTM-FCNs for time series classification. Neural Netw. 2019, 116, 237–245. [Google Scholar] [CrossRef] [PubMed]
- Li, X.; Chen, H.; Qi, X.; Dou, Q.; Fu, C.-W.; Heng, P.-A. H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Trans. Med. Imaging 2018, 37, 2663–2674. [Google Scholar] [CrossRef] [PubMed]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Baheti, B.; Innani, S.; Gajre, S.; Talbar, S. Eff-unet: A novel architecture for semantic segmentation in unstructured environment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 358–359. [Google Scholar]
- Kazerouni, I.A.; Dooly, G.; Toal, D. Ghost-UNet: An asymmetric encoder-decoder architecture for semantic segmentation from scratch. IEEE Access 2021, 9, 97457–97465. [Google Scholar] [CrossRef]
- Liu, R.; Tao, F.; Liu, X.; Na, J.; Leng, H.; Wu, J.; Zhou, T. RAANet: A residual ASPP with attention framework for semantic segmentation of high-resolution remote sensing images. Remote Sens. 2022, 14, 3109. [Google Scholar] [CrossRef]
- Lafferty, J.; McCallum, A.; Pereira, F.C. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williamstown, MA, USA, 28 June–1 July 2001. [Google Scholar]
- Combes, J.-M.; Grossmann, A.; Tchamitchian, P. Wavelets: Time-Frequency Methods and Phase Space. In Proceedings of the International Conference, Marseille, France, 14–18 December 1987; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
- Deng, Z.; Wang, B.; Xu, Y.; Xu, T.; Liu, C.; Zhu, Z. Multi-scale convolutional neural network with time-cognition for multi-step short-term load forecasting. IEEE Access 2019, 7, 88058–88071. [Google Scholar] [CrossRef]
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
- Zheng, S.; Lin, X.; Zhang, W.; He, B.; Jia, S.; Wang, P.; Jiang, H.; Shi, J.; Jia, F. MDCC-Net: Multiscale double-channel convolution U-Net framework for colorectal tumor segmentation. Comput. Biol. Med. 2021, 130, 104183. [Google Scholar] [CrossRef] [PubMed]
- Gangopadhyay, S.; Zhai, A. CGBNet: A Deep Learning Framework for Compost Classification. IEEE Access 2022, 10, 90068–90078. [Google Scholar] [CrossRef]
- Mo, Y.; Wu, Y.; Yang, X.; Liu, F.; Liao, Y. Review the state-of-the-art technologies of semantic segmentation based on deep learning. Neurocomputing 2022, 493, 626–646. [Google Scholar] [CrossRef]
- Lee, M.; Kim, D.; Shim, H. Threshold matters in wsss: Manipulating the activation for the robust and accurate segmentation model against thresholds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4330–4339. [Google Scholar]
- Liu, S.-A.; Zhang, Y.; Qiu, Z.; Xie, H.; Zhang, Y.; Yao, T. Learning orthogonal prototypes for generalized few-shot semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 11319–11328. [Google Scholar]
- Hoyer, L.; Dai, D.; Van Gool, L. Daformer: Improving network architectures and training strategies for domain-adaptive semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9924–9935. [Google Scholar]
- Hou, Y.; Zhu, X.; Ma, Y.; Loy, C.C.; Li, Y. Point-to-voxel knowledge distillation for lidar semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8479–8488. [Google Scholar]
- Zhu, Y.; Zhang, Z.; Wu, C.; Zhang, Z.; He, T.; Zhang, H.; Manmatha, R.; Li, M.; Smola, A. Improving semantic segmentation via efficient self-training. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 46, 1589–1602. [Google Scholar] [CrossRef] [PubMed]
- Targ, S.; Almeida, D.; Lyman, K. Resnet in resnet: Generalizing residual architectures. arXiv 2016, arXiv:1603.08029. [Google Scholar]
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 325–341. [Google Scholar]
- Huang, Y.; Tang, Z.; Chen, D.; Su, K.; Chen, C. Batching soft IoU for training semantic segmentation networks. IEEE Signal Process. Lett. 2019, 27, 66–70. [Google Scholar] [CrossRef]
- Yan, S.; Zhou, J.; Xie, J.; Zhang, S.; He, X. An em framework for online incremental learning of semantic segmentation. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 3052–3060. [Google Scholar]
- Luo, Y.; Wang, Z.; Huang, Z.; Yang, Y.; Zhao, C. Coarse-to-fine annotation enrichment for semantic segmentation learning. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 237–246. [Google Scholar]
- Kenjic, D.; Baba, F.; Samardzija, D.; Kaprocki, Z. Utilization of the open source datasets for semantic segmentation in automotive vision. In Proceedings of the 2019 IEEE 9th International Conference on Consumer Electronics (ICCE-Berlin), Berlin, Germany, 8–11 September 2019; IEEE: New York, NY, USA, 2019; pp. 420–423. [Google Scholar]
- Jadon, S. A survey of loss functions for semantic segmentation. In Proceedings of the 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Via del Mar, Chile, 27–29 October 2020; IEEE: New York, NY, USA, 2020; pp. 1–7. [Google Scholar]
- Ke, T.-W.; Hwang, J.-J.; Liu, Z.; Yu, S.X. Adaptive affinity fields for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 587–602. [Google Scholar]
- Jiang, W.; Xie, Z.; Li, Y.; Liu, C.; Lu, H. Lrnnet: A light-weighted network with efficient reduced non-local operation for real-time semantic segmentation. In Proceedings of the 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, UK, 6–10 July 2020; IEEE: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
- Li, Z.; Sun, Y.; Zhang, L.; Tang, J. CTNet: Context-based tandem network for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 9904–9917. [Google Scholar] [CrossRef] [PubMed]
- Zhao, Z.; Mao, Y.; Ding, Y.; Ren, P.; Zheng, N. Visual-based semantic SLAM with landmarks for large-scale outdoor environment. In Proceedings of the 2019 2nd China Symposium on Cognitive Computing and Hybrid Intelligence (CCHI), Xi’an, China, 21–22 September 2019; IEEE: New York, NY, USA, 2019; pp. 149–154. [Google Scholar]
- Qiao, S.; Wang, H.; Liu, C.; Shen, W.; Yuille, A. Micro-batch training with batch-channel normalization and weight standardization. arXiv 2019, arXiv:1903.10520. [Google Scholar]
- Yuan, J.; Liu, Y.; Shen, C.; Wang, Z.; Li, H. A simple baseline for semi-supervised semantic segmentation with strong data augmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 8229–8238. [Google Scholar]
- Holder, C.J.; Shafique, M. On efficient real-time semantic segmentation: A survey. arXiv 2022, arXiv:2206.08605. [Google Scholar]
- Mukherjee, A.; Chakraborty, S.; Saha, S. Detection of loop closure in SLAM: A DeconvNet based approach. Appl. Soft Comput. 2019, 80, 650–656. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Romera, E.; Alvarez, J.M.; Bergasa, L.M.; Arroyo, R. Erfnet: Efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. 2017, 19, 263–272. [Google Scholar] [CrossRef]
- Siddique, N.; Paheding, S.; Elkin, C.P.; Devabhaktuni, V. U-net and its variants for medical image segmentation: A review of theory and applications. IEEE Access 2021, 9, 82031–82057. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Mollica, G.; Legittimo, M.; Dionigi, A.; Costante, G.; Valigi, P. Integrating Sparse Learning-Based Feature Detectors into Simultaneous Localization and Mapping—A Benchmark Study. Sensors 2023, 23, 2286. [Google Scholar] [CrossRef] [PubMed]
- Esparza, D.; Flores, G. The STDyn-SLAM: A stereo vision and semantic segmentation approach for VSLAM in dynamic outdoor environments. IEEE Access 2022, 10, 18201–18209. [Google Scholar] [CrossRef]
- Zhao, Y.; Vela, P.A. Good feature matching: Toward accurate, robust vo/vslam with low latency. IEEE Trans. Robot. 2020, 36, 657–675. [Google Scholar] [CrossRef]
- Runz, M.; Buffier, M.; Agapito, L. Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects. In Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany, 16–20 October 2018; IEEE: New York, NY, USA, 2018; pp. 10–20. [Google Scholar]
- Zhang, J.; Henein, M.; Mahony, R.; Ila, V. VDO-SLAM: A visual dynamic object-aware SLAM system. arXiv 2020, arXiv:2005.11052. [Google Scholar]
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
- Karkus, P.; Cai, S.; Hsu, D. Differentiable slam-net: Learning particle slam for visual navigation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2815–2825. [Google Scholar]
- Cai, Y.; Ou, Y.; Qin, T. Improving SLAM techniques with integrated multi-sensor fusion for 3D reconstruction. Sensors 2024, 24, 2033. [Google Scholar] [CrossRef] [PubMed]
- Hou, J.; Yu, L.; Li, C.; Fei, S. Handheld 3D reconstruction based on closed-loop detection and nonlinear optimization. Meas. Sci. Technol. 2019, 31, 025401. [Google Scholar] [CrossRef]
- Lomas-Barrie, V.; Suarez-Espinoza, M.; Hernandez-Chavez, G.; Neme, A. A New Method for Classifying Scenes for Simultaneous Localization and Mapping Using the Boundary Object Function Descriptor on RGB-D Points. Sensors 2023, 23, 8836. [Google Scholar] [CrossRef] [PubMed]
- Yang, K.; Wang, K.; Bergasa, L.M.; Romera, E.; Hu, W.; Sun, D.; Sun, J.; Cheng, R.; Chen, T.; López, E. Unifying terrain awareness for the visually impaired through real-time semantic segmentation. Sensors 2018, 18, 1506. [Google Scholar] [CrossRef] [PubMed]
- Lin, H.-Y.; Liu, T.-A.; Lin, W.-Y. InertialNet: Inertial Measurement Learning for Simultaneous Localization and Mapping. Sensors 2023, 23, 9812. [Google Scholar] [CrossRef]
- Dubé, R.; Cramariuc, A.; Dugas, D.; Nieto, J.; Siegwart, R.; Cadena, C. SegMap: 3d segment mapping using data-driven descriptors. arXiv 2018, arXiv:1804.09557. [Google Scholar]
- Lv, K.; Zhang, Y.; Yu, Y.; Wang, Z.; Min, J. SIIS-SLAM: A vision SLAM based on sequential image instance segmentation. IEEE Access 2022, 11, 17430–17440. [Google Scholar] [CrossRef]
- Yu, S.; Fu, C.; Gostar, A.K.; Hu, M. A review on map-merging methods for typical map types in multiple-ground-robot SLAM solutions. Sensors 2020, 20, 6988. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Q.; Yu, W.; Liu, W.; Xu, H.; He, Y. A Lightweight Visual Simultaneous Localization and Mapping Method with a High Precision in Dynamic Scenes. Sensors 2023, 23, 9274. [Google Scholar] [CrossRef] [PubMed]
- Lee, Y.; Kim, M.; Ahn, J.; Park, J. Accurate Visual Simultaneous Localization and Mapping (SLAM) against Around View Monitor (AVM) Distortion Error Using Weighted Generalized Iterative Closest Point (GICP). Sensors 2023, 23, 7947. [Google Scholar] [CrossRef] [PubMed]
- McCormac, J.; Handa, A.; Davison, A.; Leutenegger, S. Semanticfusion: Dense 3d semantic mapping with convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and automation (ICRA), Singapore, 29 May–3 June 2017; IEEE: New York, NY, USA, 2017; pp. 4628–4635. [Google Scholar]
- Narita, G.; Seno, T.; Ishikawa, T.; Kaji, Y. Panopticfusion: Online volumetric semantic mapping at the level of stuff and things. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; IEEE: New York, NY, USA, 2019; pp. 4205–4212. [Google Scholar]
- Li, C.; Kang, Z.; Yang, J.; Li, F.; Wang, Y. Research on semantic-assisted SLAM in complex dynamic indoor environment. ISPRS Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 2020, 43, 353–359. [Google Scholar] [CrossRef]
- Lai, T. A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-Based Semantic Scene Understanding Using Multi-Modal Sensor Fusion. Sensors 2022, 22, 7265. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Huang, K.; Li, J.; Li, X.; Zeng, Z.; Chang, L.; Zhou, J. AdaSG: A Lightweight Feature Point Matching Method Using Adaptive Descriptor with GNN for VSLAM. Sensors 2022, 22, 5992. [Google Scholar] [CrossRef]
- Yan, Y.; Hang, Y.; Hu, T.; Yu, H.; Lai, F. Visual SLAM in Long-Range Autonomous Parking Application Based on Instance-Aware Semantic Segmentation via Multi-Task Network Cascades and Metric Learning Scheme. SAE Int. J. Adv. Curr. Pract. Mobil. 2021, 3, 1357–1368. [Google Scholar] [CrossRef]
- Zarringhalam, A.; Ghidary, S.S.; Khorasani, A.M. Semi-supervised Vector-Quantization in Visual SLAM using HGCN. Int. J. Intell. Syst. 2024, 2024, 9992159. [Google Scholar] [CrossRef]
- Shen, T.; Luo, Z.; Zhou, L.; Deng, H.; Zhang, R.; Fang, T.; Quan, L. Beyond photometric loss for self-supervised ego-motion estimation. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: New York, NY, USA, 2019; pp. 6359–6365. [Google Scholar]
- Liu, R.; Zhang, J.; Chen, S.; Yang, T.; Arth, C. Real-time visual SLAM combining building models and GPS for mobile robot. J. Real-Time Image Process. 2021, 18, 419–429. [Google Scholar] [CrossRef]
- Xu, S.; Xiong, H.; Wu, Q.; Yao, T.; Wang, Z.; Wang, Z. Online Visual SLAM Adaptation against Catastrophic Forgetting with Cycle-Consistent Contrastive Learning. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: New York, NY, USA, 2023; pp. 6196–6202. [Google Scholar]
- Loo, S.Y.; Shakeri, M.; Tang, S.H.; Mashohor, S.; Zhang, H. Online mutual adaptation of deep depth prediction and visual slam. arXiv 2021, arXiv:2111.0409. [Google Scholar]
- Vargas, E.; Scona, R.; Willners, J.S.; Luczynski, T.; Cao, Y.; Wang, S.; Petillot, Y.R. Robust underwater visual SLAM fusing acoustic sensing. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xian, China, 30 May–5 June 2021; IEEE: New York, NY, USA, 2021; pp. 2140–2146. [Google Scholar]
Method | Standard Discussion |
---|---|
Volumetric | Volume integrity: Consider the volume information in the map, including the volume of a building, room, or other object. This can be assessed by comparing the volume in the map to the volume in the actual scene. Volume consistency: Check that the volume of different areas of the map is consistent; if the volume in the map varies too much, the map may be incomplete. |
Surface | Surface integrity: Focus on surface information in the map, including walls, floors, and ceilings. A complete map should accurately capture the geometry of these surfaces. Surface consistency: Check that surfaces in different areas of the map are consistent; inconsistent surface shapes may indicate a problem with the map. |
Semantic classifications | Semantic integrity: Consider the semantic information in the map, including the categories of objects (e.g., chairs, tables, doors). A complete map should label these objects correctly. Semantic consistency: Check whether semantic labels in different areas of the map are consistent; inconsistent labels may indicate that the map is incomplete or contains errors. |
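The volumetric criteria above can be approximated in practice by comparing occupancy grids of the reconstructed map and a reference model of the scene. The following is a minimal sketch, assuming both maps have been voxelized into same-shape occupancy grids; the function name and the simple >0 occupancy threshold are illustrative, not taken from any cited system:

```python
import numpy as np

def volume_completeness(map_grid: np.ndarray, gt_grid: np.ndarray) -> float:
    """Fraction of the reference occupied volume that the reconstructed
    map also marks as occupied (a simple volumetric-completeness proxy)."""
    assert map_grid.shape == gt_grid.shape
    gt_occ = gt_grid > 0       # voxels occupied in the reference scene
    map_occ = map_grid > 0     # voxels occupied in the reconstructed map
    total = gt_occ.sum()
    overlap = np.logical_and(gt_occ, map_occ).sum()
    return float(overlap) / float(total) if total > 0 else 1.0
```

A completeness near 1.0 indicates the map covers the reference volume; large gaps between regions (volume inconsistency) show up as a low score for the affected sub-grid.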
Model | Features | Application Scenarios |
---|---|---|
U-Net [79] | Simple, efficient, easy to build. | U-Net can classify image pixels into different categories, including lanes, stop lines, speed bumps, and obstacles. |
Mask-RCNN [76] | Powerful image-based instance-level segmentation algorithm. | Mask-RCNN can segment instances of different semantic objects at the pixel level, which is suitable for dynamic environments. |
Pyramid Scene Parsing Network [45] | Models global context via pyramid pooling, yielding strong segmentation results. | PSPNet performs well in complex environments and can extract semantic information efficiently. |
Fully Convolutional Networks [80] | Replaces the fully connected layers of a traditional CNN with convolutional layers, enabling pixel-level semantic segmentation. | FCN is widely used in semantic segmentation tasks and can effectively extract semantic information from images. |
ERFNet [78] | Real-time segmentation with low computational costs while maintaining high accuracy. | ERFNet is particularly suitable for scenarios that require real-time performance, including autonomous driving and lane detection. |
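Models such as those in the table above are conventionally compared by mean intersection-over-union (mIoU), the segmentation evaluation metric discussed in Section 3.1. A minimal sketch of the computation over integer label maps follows; classes absent from both prediction and ground truth are skipped, a common convention:

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union over the classes present in either
    the predicted or the ground-truth label map."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                     # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))
```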
Model | Semantic Segmentation Model | Features | Application Scenario |
---|---|---|---|
MaskFusion [84] | Mask R-CNN | Object-level RGB-D SLAM system for dynamic environments. Run in real time, able to track multiple moving objects and perform dense reconstruction. | Autonomous driving, online positioning at the vehicle end. |
VDO-SLAM [85] | FCN | Emphasis on dynamic object perception, without the need for an a priori model of the object. The motion estimation of rigid objects is realized by using semantic information. | Deployment in real-world applications involving highly dynamic and unstructured environments. |
ORB-SLAM3 [86] | Mask R-CNN | Real-time calculation of camera position and generate sparse 3D reconstructed maps. | Mobile robots, mobile phones, drones. |
SegMap [93] | Mask R-CNN | A pure static semantic octree map is constructed by using semantic information. | Construction, navigation. |
Semantic Fusion SLAM [98] | ResNet | Fuses per-frame semantic predictions into a dense 3D map in real time; requires high frame rates and low latency. | High-precision map construction for autonomous vehicles. |
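A mechanism shared by the dynamic-environment systems in this table is to discard visual features that fall on pixels labelled as dynamic classes before pose estimation, so that moving objects do not corrupt tracking. The sketch below illustrates that filtering step under stated assumptions: the class ids in `DYNAMIC_CLASSES` are hypothetical placeholders, and the segmentation network producing `label_map` is assumed to exist upstream:

```python
import numpy as np

# Hypothetical label ids for dynamic classes (e.g. person, rider, vehicle);
# the actual ids depend on the segmentation model's training label set.
DYNAMIC_CLASSES = {11, 12, 13}

def filter_dynamic_keypoints(keypoints, label_map: np.ndarray):
    """Keep only keypoints lying on pixels NOT labelled as a dynamic class.

    keypoints: iterable of (u, v) pixel coordinates from the front end.
    label_map: (H, W) integer class-id image from semantic segmentation.
    The surviving (static) features are passed on to pose estimation."""
    static = []
    for u, v in keypoints:
        if label_map[int(v), int(u)] not in DYNAMIC_CLASSES:
            static.append((u, v))
    return static
```

In a full pipeline this filter sits between feature extraction and data association; per-object tracking systems such as VDO-SLAM instead retain the dynamic features and estimate object motion separately.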
Sequence | ORB-SLAM | DynaSLAM | SLAM-Net | Semantic- Assisted SLAM | ORB-SLAM3 | SIIS-SLAM |
---|---|---|---|---|---|---|
KITTI 01 | 5.19 | 7.44 | 5.06 | 5.12 | 4.76 | 7.23 |
KITTI 02 | 23.45 | 26.53 | 22.36 | 22.63 | 19.68 | 23.36 |
KITTI 03 | 1.49 | 1.79 | 1.43 | 1.53 | 1.55 | 1.78 |
KITTI 04 | 1.58 | 0.99 | 1.56 | 1.49 | 1.45 | 0.93 |
KITTI 05 | 4.79 | 4.53 | 4.66 | 4.23 | 3.99 | 4.55 |
KITTI 06 | 13.01 | 14.79 | 12.36 | 11.36 | 10.77 | 12.88 |
KITTI 07 | 2.30 | 2.26 | 2.12 | 2.19 | 2.08 | 2.26 |
KITTI 08 | 47.69 | 41.23 | 46.23 | 45.26 | 43.16 | 39.31 |
KITTI 09 | 6.53 | 3.22 | 5.99 | 5.36 | 4.23 | 2.96 |
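The per-sequence errors in the table above are instances of the absolute trajectory error introduced in Section 2.6.1. As a minimal sketch, ATE RMSE over timestamp-associated camera positions can be computed as follows; a complete evaluation would first align the two trajectories with a least-squares SE(3) or Sim(3) fit, which is omitted here:

```python
import numpy as np

def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square absolute trajectory error.

    est, gt: (N, 3) arrays of estimated and ground-truth camera positions,
    assumed already associated by timestamp and expressed in a common
    reference frame (i.e. after trajectory alignment)."""
    diff = est - gt                               # per-pose translation error
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```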
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, X.; He, Y.; Li, J.; Yan, R.; Li, X.; Huang, H. A Comparative Review on Enhancing Visual Simultaneous Localization and Mapping with Deep Semantic Segmentation. Sensors 2024, 24, 3388. https://doi.org/10.3390/s24113388