Loop Closure Detection in RGB-D SLAM by Utilizing Siamese ConvNet Features
Abstract
1. Introduction
2. Materials and Methods
2.1. Siamese ConvNet Architecture for Loop Closure Detection
- Pre-train
- Softmax supervision:
- 1{a true statement} = 1 and
- 1{a false statement} = 0.
- Contrastive supervision:
- Multi-task supervision:
Algorithm 1. Learning strategy for loop closure detection
Input: training set, initialized parameters, learning rate, number of iterations N, batch size k.
While fewer than N iterations have been run, do
  Sample k training samples from the training set.
  Update the parameters.
End while
Output: the learned parameters.
- Detecting loops:
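The three supervision signals listed above can be illustrated with a minimal sketch. It assumes the contrastive term takes the standard form of Hadsell, Chopra and LeCun (half the squared distance for matching pairs, a squared hinge on a margin for non-matching pairs) and that loops are detected by thresholding the Euclidean distance between feature vectors. The names `margin`, `weight`, and `threshold` are hypothetical and their values are not taken from the paper.

```python
import math


def softmax_loss(logits, label):
    """Cross-entropy of a softmax over `logits` against the true class `label`."""
    shift = max(logits)  # subtract the max for numerical stability
    log_total = math.log(sum(math.exp(z - shift) for z in logits))
    log_probs = [z - shift - log_total for z in logits]
    # The indicator 1{label == j} keeps only the true-class term of the sum.
    return -sum(
        (1.0 if j == label else 0.0) * lp for j, lp in enumerate(log_probs)
    )


def euclidean(f1, f2):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def contrastive_loss(f1, f2, same_place, margin=1.0):
    """Assumed form: pull matching pairs together, push others past `margin`."""
    d = euclidean(f1, f2)
    if same_place:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def multi_task_loss(logits1, label1, logits2, label2, f1, f2, same_place,
                    weight=1.0, margin=1.0):
    """Softmax loss of both branches plus a weighted contrastive term.

    `weight` is a hypothetical balancing factor, not a value from the paper.
    """
    classification = softmax_loss(logits1, label1) + softmax_loss(logits2, label2)
    return classification + weight * contrastive_loss(f1, f2, same_place, margin)


def is_loop(f1, f2, threshold):
    """Declare a loop when two frames' features are closer than `threshold`."""
    return euclidean(f1, f2) < threshold
```

In a Siamese setup both branches share weights, so `f1` and `f2` would come from the same network applied to the two images of a pair.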
2.2. RGB-D Fusion for Loop Closure Detection
- Early fusion:
- Mid-level fusion:
- Late fusion:
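As a rough illustration of how the three fusion levels differ, the sketch below uses the common definitions of these terms: early fusion stacks depth onto the RGB input, mid-level fusion concatenates per-modality feature vectors, and late fusion combines per-modality similarity scores. These definitions and the mixing weight `alpha` are assumptions for illustration; the paper's exact fusion scheme may differ.

```python
def early_fusion(rgb_pixels, depth_pixels):
    """Append depth as a fourth channel to every (r, g, b) pixel."""
    return [(r, g, b, d) for (r, g, b), d in zip(rgb_pixels, depth_pixels)]


def mid_level_fusion(rgb_features, depth_features):
    """Concatenate feature vectors extracted separately from RGB and depth."""
    return list(rgb_features) + list(depth_features)


def late_fusion(rgb_score, depth_score, alpha=0.5):
    """Blend two similarity scores; `alpha` is a hypothetical mixing weight."""
    return alpha * rgb_score + (1.0 - alpha) * depth_score
```

The practical trade-off is where the network sees both modalities: early fusion needs a 4-channel input layer, while mid-level and late fusion can reuse separately trained RGB and depth branches.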
3. Results
3.1. Experiment Setup
3.2. Experiment Results on New College Dataset
3.3. Experiment Results on NYU Dataset
3.4. Computational Time
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Layer | Layer Type | Parameters |
---|---|---|
input1_1, input1_2 | input | Image size: 227 × 227 × 3 |
conv1_1, conv1_2 | convolution | Filter size: 11 × 11, Filter num: 96, Stride: 4 |
pool1_1, pool1_2 | pooling | Pooling method: Max, Kernel size: 3 × 3, Stride: 2 |
conv2_1, conv2_2 | convolution | Filter size: 5 × 5, Filter num: 256, Stride: 1 |
pool2_1, pool2_2 | pooling | Pooling method: Max, Kernel size: 3 × 3, Stride: 2 |
conv3_1, conv3_2 | convolution | Filter size: 3 × 3, Filter num: 384, Stride: 1 |
conv4_1, conv4_2 | convolution | Filter size: 3 × 3, Filter num: 384, Stride: 1 |
conv5_1, conv5_2 | convolution | Filter size: 3 × 3, Filter num: 256, Stride: 1 |
pool5_1, pool5_2 | pooling | Pooling method: Max, Kernel size: 3 × 3, Stride: 2 |
full6_1, full_6_2 | fully connected | Neurons output: 4096 |
full7_1, full_7_2 | fully connected | Neurons output: 2048 |
softmax_1, softmax_2 | fully connected | Neurons output: 205 |
feat1_1, feat1_2 | fully connected | Neurons output: 1000 |
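
The spatial size of each feature map in one branch can be traced from the filter sizes and strides in the table above. The table does not list padding, so this sketch assumes the usual AlexNet-style values (no padding for conv1 and the pooling layers, 2 for conv2, 1 for conv3 to conv5). Those padding values are an assumption, not something stated in the source.

```python
# (name, kernel, stride, pad, filters). Kernel, stride, and filter counts come
# from the table; the pad column is ASSUMED (AlexNet-style) and is not listed.
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def output_size(size, kernel, stride, pad=0):
    """Standard output-size formula for a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


def trace_shapes(input_size=227, input_channels=3):
    """Return {layer name: (height, width, channels)} for one branch."""
    shapes = {}
    size, channels = input_size, input_channels
    for name, kernel, stride, pad, filters in LAYERS:
        size = output_size(size, kernel, stride, pad)
        if filters is not None:  # pooling layers keep the channel count
            channels = filters
        shapes[name] = (size, size, channels)
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes().items():
        print(name, shape)
```

Under these assumptions the pool5 output is 6 × 6 × 256, so the first fully connected layer would receive a 9216-dimensional vector.
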
Layers | Conv1 | Pool1 | Conv2 | Pool2 | Conv3 | Conv4 | Conv5 | Pool5 |
---|---|---|---|---|---|---|---|---|
AP | 0.7444 | 0.7644 | 0.8111 | 0.7778 | 0.7222 | 0.7444 | 0.6667 | 0.6333 |
Feature | BoVW | GIST | CNN (CPU) | CNN (GPU) | SCNN (CPU) | SCNN (GPU) |
---|---|---|---|---|---|---|
Time (s) | 1.517 | 0.524 | 0.142 | 0.021 | 0.208 | 0.029 |
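
The relative cost of the methods is easier to read as ratios. The short sketch below only divides the reported times in the table above; the helper name `speedup` and the dictionary keys are illustrative choices.

```python
# Reported times in seconds, copied from the computational-time table.
TIMES = {
    "BoVW": 1.517,
    "GIST": 0.524,
    "CNN (CPU)": 0.142,
    "CNN (GPU)": 0.021,
    "SCNN (CPU)": 0.208,
    "SCNN (GPU)": 0.029,
}


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return TIMES[baseline] / TIMES[candidate]


if __name__ == "__main__":
    for base, cand in [
        ("CNN (CPU)", "CNN (GPU)"),
        ("SCNN (CPU)", "SCNN (GPU)"),
        ("BoVW", "SCNN (GPU)"),
        ("GIST", "SCNN (GPU)"),
    ]:
        print(f"{cand} vs {base}: {speedup(base, cand):.2f}x")
```

On these figures the GPU gives roughly a sevenfold reduction for both CNN and SCNN, and SCNN costs about 1.4 to 1.5 times as much as the plain CNN on either device.
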
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xu, G.; Li, X.; Zhang, X.; Xing, G.; Pan, F. Loop Closure Detection in RGB-D SLAM by Utilizing Siamese ConvNet Features. Appl. Sci. 2022, 12, 62. https://doi.org/10.3390/app12010062