Robust RGB-D SLAM Using Point and Line Features for Low Textured Scene
Abstract
1. Introduction
- We combine the 3D and 2D line reprojection errors with the point reprojection error and build a novel cost function that exploits more line information than previous methods.
- We build a robust pose solver to estimate the camera pose and apply the Chi-Square test to detect wrong point and line correspondences.
- We maintain a sliding-window framework to refine the camera poses and the associated point and line features.
- We evaluate the proposed system on a public dataset and in real-world experiments. Compared with state-of-the-art RGB-D SLAM systems, the proposed system yields comparable accuracy in common scenes and higher accuracy and robustness in low textured scenes.
2. Related Works
2.1. RGB-D SLAM
2.2. Line-Based SLAM
3. System Overview and Notation
3.1. System Overview
- Frontend: The frontend has a single pose-tracking thread. First, point and line features are detected in the current frame and matched with the previous frame. Second, the 3D information of the matched features in the world coordinate frame is retrieved from the feature map. Third, a robust pose solver is built on the point and line reprojection errors, and wrong matches are removed by the Chi-Square test. Fourth, the camera pose is output and the 3D model is expanded. Finally, the keyframe decision is made from the relative motion and the number of features still matched against the previous keyframe. The feature map is updated whenever a new keyframe is created.
- Backend: The backend has two threads: local mapping and loop closing. In the local mapping thread, a sliding-window bundle adjustment updates the feature map whenever a new keyframe arrives. In the loop closing thread, the incoming keyframe is first converted into a bag-of-words vector, and a loop candidate is detected by comparing these vectors. The candidate is then verified by a geometric test using Random Sample Consensus (RANSAC) Perspective-n-Point (PnP). Finally, the loop closure is corrected by pose graph optimization.
3.2. Notations
3.2.1. Camera Pose Representation
3.2.2. Point Representation
3.2.3. Line Representation
4. Frontend
4.1. Feature Extraction and Matching
- Cross-check. FLANN [43] is applied twice to match line descriptors. In the first matching, the descriptors from the previous frame form the query set and those from the current frame form the train set; for a descriptor DesQ1 in the query set we find its matched descriptor DesT1 in the train set. In the second matching, the roles are swapped: the descriptors from the current frame form the query set and those from the previous frame form the train set, giving a match DesT2 for a query DesQ2. Thus DesQ1 and DesT2 come from the previous frame, while DesT1 and DesQ2 come from the current frame. If DesQ2 is the same descriptor as DesT1, then DesT2 should have the same descriptor index as DesQ1; otherwise, the pair is removed as a wrong feature match.
- Ratio test. Suppose DesT1 is the matched feature of DesQ1 after the cross-check, i.e., DesT1 has the smallest descriptor distance to DesQ1 within the train set. We require the ratio between this smallest distance and the second-smallest distance to be below 0.75; otherwise, the match between DesQ1 and DesT1 is rejected.
- Geometry test. We then associate DesQ1 and DesT1 with their two line segments through the descriptor indexes. LSD provides the orientation, length, and endpoints of each segment. If the two segments differ strongly in orientation, length, or endpoint positions, the line match is discarded and not used for pose estimation. A code sketch of the cross-check and ratio test is given after this list.
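The matching scheme above can be written compactly with OpenCV. The sketch below is a minimal illustration, assuming the LBD descriptors of the previous and current frames are available as uint8 NumPy arrays; the LSH index parameters are common OpenCV settings for binary descriptors and are not taken from the paper.

```python
import cv2

# FLANN with an LSH index, suitable for binary descriptors such as LBD.
FLANN_INDEX_LSH = 6
flann = cv2.FlannBasedMatcher(
    dict(algorithm=FLANN_INDEX_LSH, table_number=6, key_size=12, multi_probe_level=1),
    dict(checks=50))

def match_lines(desc_prev, desc_curr, ratio=0.75):
    """Cross-check + ratio-test matching of binary line descriptors.

    desc_prev / desc_curr: (N, 32) uint8 LBD descriptors of the previous and
    current frame. Returns a list of (index_prev, index_curr) pairs.
    """
    # First matching: previous frame as query set, current frame as train set.
    fwd = flann.knnMatch(desc_prev, desc_curr, k=2)
    # Second matching: current frame as query set, previous frame as train set.
    bwd = flann.knnMatch(desc_curr, desc_prev, k=2)

    # Best backward match of every current-frame descriptor, used for the cross-check.
    best_bwd = {m[0].queryIdx: m[0].trainIdx for m in bwd if len(m) > 0}

    matches = []
    for m in fwd:
        if len(m) < 2:
            continue
        best, second = m
        # Ratio test: smallest distance must be clearly below the second smallest.
        if best.distance >= ratio * second.distance:
            continue
        # Cross-check: the current-frame descriptor must match back to the
        # same previous-frame descriptor.
        if best_bwd.get(best.trainIdx, -1) != best.queryIdx:
            continue
        matches.append((best.queryIdx, best.trainIdx))
    return matches
```

The surviving pairs would then be passed to the geometry test, which compares the orientation, length, and endpoints reported by LSD for the two matched segments.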
4.2. Robust Pose Solver
4.2.1. Infinite Impulse Response Filter
4.2.2. Two-Dimensional (2D) Point Reprojection Error
4.2.3. Three-Dimensional (3D) Line Reprojection Error
4.2.4. Two-Dimensional (2D) Line Reprojection Error
4.2.5. Novel Cost Function
- The initial camera pose is solved by the RANSAC PnP method before minimizing the cost function, and wrong matches among the 2D point features are filtered out by RANSAC. We directly use the RANSAC PnP implementation from OpenCV [45]; after 100 iterations, a point feature is removed if its reprojection error is larger than three pixels. If the relative motion between the previous frame and the initial pose exceeds a threshold, the RANSAC PnP result is rejected and the initial guess is recomputed with a constant-velocity motion model: assuming the camera motion over a short period is approximately constant, the pose of the current frame is predicted from the velocity and pose of the previous frame (a minimal code sketch of this initialization follows this list).
- For all line matches, the 2D line reprojection error is computed from the initial camera pose and the 3D line landmarks in the feature map using Equation (13), and line matches with large initial reprojection errors are filtered out.
- The remaining feature matches are then passed to Equation (14) for optimization. After every four iterations, the matches that fail the Chi-Square test are removed, and the optimization continues with the remaining matches.
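As a concrete illustration of the initialization step, the sketch below uses OpenCV's RANSAC PnP with 100 iterations and a three-pixel reprojection threshold, and falls back to a constant-velocity prediction when the resulting relative motion is too large. Only those two numbers come from the text; the motion thresholds, pose bookkeeping, and function name are illustrative assumptions.

```python
import cv2
import numpy as np

def initial_pose(pts3d, pts2d, K, T_prev, T_prev_prev,
                 max_trans=0.3, max_rot_deg=30.0):
    """Initial world-to-camera pose (4x4) for the robust pose solver.

    pts3d: (N, 3) matched 3D point landmarks; pts2d: (N, 2) pixel observations;
    K: 3x3 intrinsics; T_prev / T_prev_prev: the two previous camera poses.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        iterationsCount=100, reprojectionError=3.0)

    if ok:
        R, _ = cv2.Rodrigues(rvec)
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, tvec.ravel()
        # Relative motion with respect to the previous frame.
        dT = T @ np.linalg.inv(T_prev)
        trans = np.linalg.norm(dT[:3, 3])
        rot = np.degrees(np.arccos(np.clip((np.trace(dT[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)))
        if trans < max_trans and rot < max_rot_deg:
            return T, inliers  # inliers: indices with reprojection error within 3 pixels

    # Otherwise, predict the pose with a constant-velocity motion model:
    # reapply the last inter-frame motion to the previous pose.
    velocity = T_prev @ np.linalg.inv(T_prev_prev)
    return velocity @ T_prev, None
```

The line matches are then screened with the 2D line reprojection error of Equation (13) under this initial pose, and the joint optimization of Equation (14) alternates between a few solver iterations and a Chi-Square rejection step (for a two-degree-of-freedom point residual at 95% confidence, the usual threshold is 5.99).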
5. Backend
5.1. Local Mapping Thread
5.2. Loop Closing Thread
5.2.1. Loop Closure Detection
- The difference between the indexes of the new keyframe and the candidate keyframe must exceed 100, so that keyframes temporally close to the new keyframe are not selected.
- The candidate keyframe has the highest score.
- The highest score must exceed 0.15; otherwise, the similarity between the two keyframes is considered insufficient.
- The scores of the three consecutive keyframes before the candidate must exceed 0.12, so that an isolated high-scoring keyframe, which may be caused by perceptual aliasing, is rejected (a sketch of this selection logic follows the list).
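A minimal sketch of this candidate selection is given below. It assumes `scores` maps every earlier keyframe index to its bag-of-words similarity with the new keyframe; the thresholds mirror the values listed above, and the function name is illustrative.

```python
def select_loop_candidate(scores, new_kf_idx,
                          min_gap=100, min_best=0.15, min_neighbor=0.12):
    """Return the index of a loop-closure candidate keyframe, or None.

    scores: dict mapping keyframe index -> bag-of-words similarity score
    with the newly inserted keyframe (index new_kf_idx).
    """
    # Rule 1: ignore keyframes that are temporally close to the new one.
    eligible = {k: s for k, s in scores.items() if new_kf_idx - k > min_gap}
    if not eligible:
        return None

    # Rules 2 and 3: take the highest-scoring keyframe, whose score must
    # exceed the similarity threshold.
    candidate = max(eligible, key=eligible.get)
    if eligible[candidate] <= min_best:
        return None

    # Rule 4: the three consecutive keyframes before the candidate must also
    # score well, so an isolated hit caused by perceptual aliasing is rejected.
    if any(scores.get(candidate - i, 0.0) <= min_neighbor for i in (1, 2, 3)):
        return None

    return candidate
```

A candidate passing these checks is forwarded to the geometric verification described in Section 5.2.2.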
5.2.2. Loop Closure Verification
5.2.3. Pose Graph Optimization
6. Experiments and Discussion
6.1. Evaluation Metric and Tool
6.2. TUM RGB-D Datasets
6.3. Room Experiment
6.4. Corridor Experiment
- The number of keyframes involved in the local optimization is lower for the proposed system than for ORB-SLAM2: the proposed system maintains a sliding window of eight keyframes, whereas ORB-SLAM2 builds a covisibility graph for every keyframe, in which more than 20 keyframes can be connected.
- Whereas ORB-SLAM2 builds a new g2o optimizer for every keyframe, the proposed system keeps the same optimizer across keyframes. The optimizer therefore converges faster, and the time needed to construct a new optimizer is saved (a schematic sketch of this window reuse follows the list).
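The following sketch illustrates this design choice: the most recent eight keyframes are kept in a fixed-size window, and the vertices of the oldest keyframe are removed from a single long-lived optimizer instead of rebuilding the problem. The `optimizer` interface (add_pose, remove_pose, optimize) is a hypothetical stand-in for the g2o-based implementation, not the system's actual API.

```python
from collections import deque

class SlidingWindow:
    """Fixed-size keyframe window that reuses one optimizer instance."""

    def __init__(self, optimizer, size=8):
        # `optimizer` is any object exposing add_pose / remove_pose / optimize
        # (a hypothetical stand-in for the g2o optimizer).
        self.optimizer = optimizer
        self.window = deque(maxlen=size)

    def insert_keyframe(self, kf_id, pose):
        # When the window is full, the oldest keyframe is dropped from the
        # existing optimizer rather than building a new optimization problem.
        if len(self.window) == self.window.maxlen:
            self.optimizer.remove_pose(self.window[0])
        self.window.append(kf_id)  # the deque evicts the oldest id itself
        self.optimizer.add_pose(kf_id, pose)
        # Re-optimizing the same problem lets the previous solution act as a
        # warm start, so convergence is faster than with a fresh optimizer.
        self.optimizer.optimize(iterations=5)
```

In contrast, rebuilding the optimizer for every keyframe, as ORB-SLAM2 does with its covisibility graph, pays both the construction cost and the cost of converging from scratch.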
7. Conclusions
- To comprehensively fuse 3D and 2D line features, we build a novel cost function that combines the point reprojection error with the 3D and 2D line reprojection errors.
- We build a robust pose solver to minimize this cost function and recover the camera pose from the point and line matches, and we maintain a sliding-window framework to update the camera poses and the corresponding landmarks.
- Experimental results on the TUM dataset show that the proposed system achieves accuracy comparable to state-of-the-art systems in common scenes and improved robustness in low textured scenes. The room experiment shows the improvement in localization accuracy over point-based SLAM and indicates the benefit of fusing both 3D and 2D lines. The corridor experiment further reflects the improved mapping quality owing to the 3D and 2D line fusion.
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Chen, K.; Lai, Y.-K.; Hu, S.-M. 3D Indoor scene modeling from RGB-D data: A survey. Comput. Vis. Media 2015, 1, 267–278.
2. Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; pp. 225–234.
3. Endres, F.; Hess, J.; Sturm, J.; Cremers, D.; Burgard, W. 3-D mapping with an RGB-D camera. IEEE Trans. Robot. 2013, 30, 177–187.
4. Labbé, M.; Michaud, F. RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. J. Field Robot. 2019, 36, 416–446.
5. Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
6. Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
7. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 1150–1157.
8. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
9. Shi, J. Good features to track. In Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 21–23 June 1994; pp. 593–600.
10. Gomez-Ojeda, R.; Moreno, F.-A.; Zuñiga-Noël, D.; Scaramuzza, D.; Gonzalez-Jimenez, J. PL-SLAM: A stereo SLAM system through the combination of points and line segments. IEEE Trans. Robot. 2019, 35, 734–746.
11. Jeong, W.Y.; Lee, K.M. Visual SLAM with line and corner features. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 9–15 October 2006; pp. 2570–2575.
12. Lemaire, T.; Lacroix, S. Monocular-vision based SLAM using line segments. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007; pp. 2791–2796.
13. Pumarola, A.; Vakhitov, A.; Agudo, A.; Sanfeliu, A.; Moreno-Noguer, F. PL-SLAM: Real-time monocular visual SLAM with points and lines. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4503–4508.
14. He, Y.; Zhao, J.; Guo, Y.; He, W.; Yuan, K. PL-VIO: Tightly-coupled monocular visual–inertial odometry using point and line features. Sensors 2018, 18, 1159.
15. Li, Y.; Brasch, N.; Wang, Y.; Navab, N.; Tombari, F. Structure-SLAM: Low-drift monocular SLAM in indoor environments. arXiv 2020, arXiv:2008.01963.
16. Zuo, X.; Xie, X.; Liu, Y.; Huang, G. Robust visual SLAM with point and line features. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 1775–1782.
17. Fu, Q.; Yu, H.; Lai, L.; Wang, J.; Peng, X.; Sun, W.; Sun, M. A robust RGB-D SLAM system with points and lines for low texture indoor environments. IEEE Sens. J. 2019, 19, 9908–9920.
18. Lu, Y.; Song, D. Robust RGB-D odometry using point and line features. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3934–3942.
19. Zhou, Y.; Li, H.; Kneip, L. Canny-VO: Visual odometry with RGB-D cameras based on geometric 3-D–2-D edge alignment. IEEE Trans. Robot. 2018, 35, 184–199.
20. Kerl, C.; Sturm, J.; Cremers, D. Dense visual SLAM for RGB-D cameras. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 2100–2106.
21. Newcombe, R.A.; Izadi, S.; Hilliges, O.; Molyneaux, D.; Kim, D.; Davison, A.J.; Kohi, P.; Shotton, J.; Hodges, S.; Fitzgibbon, A. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 26–29 October 2011; pp. 127–136.
22. Whelan, T.; Leutenegger, S.; Salas-Moreno, R.; Glocker, B.; Davison, A. ElasticFusion: Dense SLAM without a pose graph. In Proceedings of Robotics: Science and Systems, Rome, Italy, 13–17 July 2015.
23. Whelan, T.; Kaess, M.; Fallon, M.; Johannsson, H.; Leonard, J.; McDonald, J. Kintinuous: Spatially extended KinectFusion. In Proceedings of the RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, Sydney, NSW, Australia, 9–10 July 2012.
24. Nießner, M.; Zollhöfer, M.; Izadi, S.; Stamminger, M. Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. 2013, 32, 1–11.
25. Schops, T.; Sattler, T.; Pollefeys, M. BAD SLAM: Bundle adjusted direct RGB-D SLAM. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 134–144.
26. Schonberger, J.L.; Frahm, J.-M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
27. Henry, P.; Krainin, M.; Herbst, E.; Ren, X.; Fox, D. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. Int. J. Robot. Res. 2012, 31, 647–663.
28. Nister, D.; Stewenius, H. Scalable recognition with a vocabulary tree. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; pp. 2161–2168.
29. Engelhard, N.; Endres, F.; Hess, J.; Sturm, J.; Burgard, W. Real-time 3D visual SLAM with a hand-held RGB-D camera. In Proceedings of the RGB-D Workshop on 3D Perception in Robotics at the European Robotics Forum, Vasteras, Sweden, 6–8 April 2011; pp. 1–15.
30. Sturm, J.; Burgard, W.; Cremers, D. Evaluating egomotion and structure-from-motion approaches using the TUM RGB-D benchmark. In Proceedings of the Workshop on Color-Depth Camera Fusion in Robotics at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Algarve, Portugal, 7–12 October 2012.
31. Hornung, A.; Wurm, K.M.; Bennewitz, M.; Stachniss, C.; Burgard, W. OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Auton. Robot. 2013, 34, 189–206.
32. Tang, S.; Chen, W.; Wang, W.; Li, X.; Darwish, W.; Li, W.; Huang, Z.; Hu, H.; Guo, R. Geometric integration of hybrid correspondences for RGB-D unidirectional tracking. Sensors 2018, 18, 1385.
33. Dai, A.; Nießner, M.; Zollhöfer, M.; Izadi, S.; Theobalt, C. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface reintegration. ACM Trans. Graph. 2017, 36, 1.
34. Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-scale direct monocular SLAM. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 834–849.
35. Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625.
36. Bartoli, A.; Sturm, P. Structure-from-motion using lines: Representation, triangulation, and bundle adjustment. Comput. Vis. Image Underst. 2005, 100, 416–441.
37. Chen, S.; Wen, C.-Y.; Zou, Y.; Chen, W. Stereo visual inertial pose estimation based on feedforward-feedback loops. arXiv 2020, arXiv:2007.02250.
38. Park, F.C.; Bobrow, J.E.; Ploen, S.R. A Lie group formulation of robot dynamics. Int. J. Robot. Res. 1995, 14, 609–618.
39. Harris, C.G.; Stephens, M. A combined corner and edge detector. In Proceedings of the Alvey Vision Conference, Manchester, UK, 31 August–2 September 1988; pp. 147–151.
40. Lucas, B.D.; Kanade, T. An iterative image registration technique with an application to stereo vision. In Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada, 24–28 August 1981; pp. 674–679.
41. Von Gioi, R.G.; Jakubowicz, J.; Morel, J.-M.; Randall, G. LSD: A line segment detector. Image Process. On Line 2012, 2, 35–55.
42. Zhang, L.; Koch, R. An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency. J. Vis. Commun. Image Represent. 2013, 24, 794–805.
43. Muja, M.; Lowe, D.G. Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP 2009, 2, 2.
44. Grisetti, G.; Kümmerle, R.; Strasdat, H.; Konolige, K. g2o: A general framework for (hyper) graph optimization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011; pp. 9–13.
45. Bradski, G.; Kaehler, A. Learning OpenCV: Computer Vision with the OpenCV Library; O’Reilly Media, Inc.: Newton, MA, USA, 2008.
46. Zhang, G.; Lee, J.H.; Lim, J.; Suh, I.H. Building a 3-D line-based map using stereo SLAM. IEEE Trans. Robot. 2015, 31, 1364–1377.
47. Gálvez-López, D.; Tardos, J.D. Bags of binary words for fast place recognition in image sequences. IEEE Trans. Robot. 2012, 28, 1188–1197.
48. Qin, T.; Li, P.; Shen, S. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
49. Umeyama, S. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 376–380.
50. Qualisys AB. Qualisys Motion Capture Systems; Qualisys: Gothenburg, Sweden, 2008.
51. Girardeau-Montaut, D. CloudCompare; GPL Software: Boston, MA, USA, 2016.
Translational ATE (RMSE, cm) on the TUM RGB-D sequences.

Sequence | Length (m) | Proposed | ORB-SLAM2 | DVO-SLAM | LSD-SLAM | DSO | PL-SLAM | Canny-VO |
---|---|---|---|---|---|---|---|---|
fr1_desk | 9.3 | 4.6 | 2.1 | 2.4 | 10.7 | - | 3.0 | 4.4 |
fr1_floor | 12.6 | 3.2 | 6.1 | X | 38.1 | 5.5 | 3.0 | 2.1 |
fr2_desk | 18.9 | 4.5 | 1.7 | 1.7 | 4.5 | - | 1.4 | 3.7 |
fr3_long_office | 21.5 | 6.5 | 4.1 | 3.5 | 38.5 | 14.4 | - | 8.5 |
fr3_nstr_tex_far | 4.3 | 7.0 | 5.6 | 2.8 | 18.3 | 4.8 | - | 2.6 |
fr3_nstr_tex_near | 13.5 | 3.3 | 3.5 | 7.3 | 7.5 | 3.6 | 3.5 | 9.0 |
fr3_str_ntex_far | 4.4 | 9.0 | - | 3.9 | 14.6 | 18.4 | - | 3.1 |
fr3_str_ntex_near | 3.8 | 3.7 | - | 2.1 | - | - | - | - |
fr3_str_tex_far | 5.9 | 1.3 | 1.3 | 3.9 | 8.0 | 7.9 | 0.9 | 1.3 |
fr3_str_tex_near | 5.1 | 1.2 | 1.4 | 4.1 | - | 24.1 | 2.6 | 2.5 |
Localization results of the room experiment.

Sequence | Length (m) | ORB-SLAM2 | Point | Point + 3D Line | Point + 2D Line | Proposed |
---|---|---|---|---|---|---|
room | 28.4 | - | 14.9 | 9.3 | 9.7 | 7.2 |
Results of the corridor experiment.

Sequence | Length (m) | ORB-SLAM2 | Point + 3D Line | Point + 2D Line | Proposed |
---|---|---|---|---|---|---|
corridor | 60.8 | 27.2 | 30.1 | 23.8 | 25.1 | 21.4 |
Runtime of the main components (ms).

Thread | Part | Proposed | ORB-SLAM2 |
---|---|---|---|
Tracking | Optical Flow | 7.5 | |
Tracking | Line Extraction | 42.1 | |
Tracking | Robust Pose Solver | 2.3 | |
Tracking | Shi-Tomasi Detection | 7.6 | |
Tracking | IIR Filter | 1.0 | |
Tracking | Total | 60.4 | 32.1 |
Local Mapping | Local BA | 115.3 | 186.3 |