DiT-SLAM: Real-Time Dense Visual-Inertial SLAM with Implicit Depth Representation and Tightly-Coupled Graph Optimization
Abstract
1. Introduction
- We develop a real-time dense monocular VI SLAM system based on tightly-coupled graph optimization over visual, inertial, and depth residuals. The proposed VI SLAM system tracks the ego-motion of the sensor body and builds a local dense map with metric scale in real time;
- A light-weight monocular depth estimation and completion network combining a CVAE with attention mechanisms is proposed. The network encodes and decodes a full-resolution dense depth map from a compact representation (i.e., the depth code) of the dense depth image, and predicts uncertainty-aware dense depth maps accurately and in real time thanks to its light-weight architecture and extremely low-dimensional depth representation (8D depth codes in our system). At the same time, the attention modules improve deep feature refinement and fusion, giving the network robustness and generalization capability (an illustrative decoder sketch follows this list);
- In order to achieve more consistent local dense mapping and exploit the prior geometry provided by the depth predictions, particularly in featureless environments, we propose a robust point sampling strategy for depth residuals in textureless or featureless regions, based on the spatial distribution of feature points. The sampled points provide additional constraints from featureless image regions to guide the tightly-coupled optimization toward consistent and complete local dense maps, even in cases where a feature-based sparse SLAM system may lose tracking or drift (a minimal sampling sketch also follows this list).
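To make the second contribution more concrete, the following is a minimal PyTorch-style sketch of decoding a full-resolution, uncertainty-aware depth map from an 8D depth code conditioned on image features; the class name, layer sizes, and the trivial feature-fusion step are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DepthCodeDecoder(nn.Module):
    """Illustrative sketch: decode a dense depth map plus a per-pixel
    log-variance (uncertainty) from a compact 8-D depth code."""
    def __init__(self, code_dim=8, feat_ch=64, out_hw=(240, 320)):
        super().__init__()
        self.feat_ch = feat_ch
        self.h, self.w = out_hw[0] // 8, out_hw[1] // 8
        # Lift the low-dimensional code to a coarse feature map.
        self.fc = nn.Linear(code_dim, feat_ch * self.h * self.w)
        # Upsample back to full resolution; last layer outputs depth + log-variance.
        self.up = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, 2, 4, stride=2, padding=1),
        )

    def forward(self, code, img_feat=None):
        x = self.fc(code).view(-1, self.feat_ch, self.h, self.w)
        if img_feat is not None:          # condition on encoder features (fusion kept trivial here)
            x = x + img_feat
        out = self.up(x)
        depth, log_var = out[:, :1], out[:, 1:]
        return depth, log_var             # uncertainty-aware prediction
```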
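Likewise, for the third contribution, below is a minimal sketch of one possible grid-based sampler that adds depth-residual points in image cells containing few feature points; the grid size, per-cell sample count, and function name are assumptions for illustration, not the exact strategy described in the paper.

```python
import numpy as np

def sample_depth_residual_points(feature_pts, img_h, img_w,
                                 grid=(8, 8), samples_per_empty_cell=4, rng=None):
    """Hypothetical sampler: add extra depth-residual points in image cells
    that contain no sparse feature points, so featureless regions still
    constrain the tightly-coupled optimization."""
    rng = rng if rng is not None else np.random.default_rng()
    rows, cols = grid
    cell_h, cell_w = img_h / rows, img_w / cols

    # Count existing feature points per grid cell.
    counts = np.zeros((rows, cols), dtype=int)
    for u, v in feature_pts:                      # (u, v) = (col, row) pixel coordinates
        r = min(int(v // cell_h), rows - 1)
        c = min(int(u // cell_w), cols - 1)
        counts[r, c] += 1

    # Uniformly sample extra points inside cells with no features.
    extra = []
    for r in range(rows):
        for c in range(cols):
            if counts[r, c] == 0:                 # featureless cell
                us = rng.uniform(c * cell_w, (c + 1) * cell_w, samples_per_empty_cell)
                vs = rng.uniform(r * cell_h, (r + 1) * cell_h, samples_per_empty_cell)
                extra.extend(zip(us, vs))
    return np.array(extra)                        # pixels where depth residuals are added
```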
2. Related Work
2.1. Monocular Visual-Inertial SLAM
2.2. Light-Weight Depth Estimation and Completion Networks
2.3. Real-Time Dense SLAM with the Compact Implicit Optimizable Representation
3. System Overview
3.1. Monocular Visual-Inertial SLAM with Sparse Maps
3.2. Light-Weight Depth Completion Networks with Attention Mechanisms
3.3. Tightly-Coupled Graph Optimization with Depth Residuals
4. Light-Weight Depth Estimation and Completion Network with Attention Mechanisms
4.1. Deep Feature Extraction with CBAM Attention Modules
4.2. Deep Feature Fusion with SK-like Attention Modules
4.3. Loss Functions
5. Tightly-Coupled Graph Optimization with Depth Residuals
5.1. Visual and Inertial Residuals
5.2. Depth Residuals: Geometry and Consistency
5.3. Tightly-Coupled Graph Optimization with the Numerical Parallel Jacobian Computation
6. Experimental Results
6.1. Evaluation of Depth Estimation on NYU Depth V2 Dataset
- Scale: input RGB/gray images are scaled by a random factor s, and the corresponding input depth images are divided by s so that the metric geometry stays consistent;
- Rotation: input RGB/gray and depth images are rotated by a random angle (in degrees), using a slightly larger range than usual to simulate the dynamic motion of sensor bodies;
- Color Jitter: the brightness, contrast, and saturation of input RGB/gray images are each adjusted by a random factor;
- Horizontal Flips: input RGB/gray and depth images are synchronously flipped horizontally with a probability of 50%. A minimal sketch of this augmentation pipeline is shown below.
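The following sketch implements the augmentations listed above with torchvision; the exact ranges (scale up to 1.5, ±7.5° rotation, ±20% jitter) are assumptions for illustration, not the paper's values, and the RGB input is assumed to be a 3-channel tensor.

```python
import random
import torchvision.transforms.functional as TF

def augment(rgb, depth):
    """Illustrative augmentation pipeline for paired RGB/gray and depth tensors."""
    # Scale: enlarge both images by a random factor s and divide depth by s.
    s = random.uniform(1.0, 1.5)
    h, w = rgb.shape[-2:]
    rgb = TF.resize(rgb, [int(h * s), int(w * s)])
    depth = TF.resize(depth, [int(h * s), int(w * s)]) / s

    # Rotation: small random rotation (in degrees) applied to both images.
    angle = random.uniform(-7.5, 7.5)
    rgb, depth = TF.rotate(rgb, angle), TF.rotate(depth, angle)

    # Color jitter on the color image only.
    rgb = TF.adjust_brightness(rgb, random.uniform(0.8, 1.2))
    rgb = TF.adjust_contrast(rgb, random.uniform(0.8, 1.2))
    rgb = TF.adjust_saturation(rgb, random.uniform(0.8, 1.2))

    # Horizontal flip with 50% probability, applied synchronously.
    if random.random() < 0.5:
        rgb, depth = TF.hflip(rgb), TF.hflip(depth)
    return rgb, depth
```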
6.2. Evaluation of Depth Completion on EuRoC Dataset
6.3. Evaluation of Trajectory Accuracy on EuRoC Dataset
7. Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067.
- Bailey, T.; Nieto, J.; Guivant, J.; Stevens, M.; Nebot, E. Consistency of the EKF-SLAM algorithm. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 9–15 October 2006; pp. 3562–3568.
- Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; pp. 225–234.
- Newcombe, R.A.; Lovegrove, S.J.; Davison, A.J. DTAM: Dense tracking and mapping in real-time. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2320–2327.
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
- Chen, D.; Wang, N.; Xu, R.; Xie, W.; Bao, H.; Zhang, G. RNIN-VIO: Robust Neural Inertial Navigation Aided Visual-Inertial Odometry in Challenging Scenes. In Proceedings of the 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Bari, Italy, 4–8 October 2021; pp. 275–283.
- Gurturk, M.; Yusefi, A.; Aslan, M.F.; Soycan, M.; Durdu, A.; Masiero, A. The YTU dataset and recurrent neural network based visual-inertial odometry. Measurement 2021, 184, 109878.
- Aslan, M.F.; Durdu, A.; Sabanci, K. Visual-Inertial Image-Odometry (VIIONet): A Gaussian Process Regression-Based Deep Architecture Proposal for UAV Pose Estimation. Measurement 2022, 194, 111030.
- Yusefi, A.; Durdu, A.; Aslan, M.F.; Sungur, C. LSTM and Filter Based Comparison Analysis for Indoor Global Localization in UAVs. IEEE Access 2021, 9, 10054–10069.
- Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-scale direct monocular SLAM. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 834–849.
- Newcombe, R.A.; Fox, D.; Seitz, S.M. DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 343–352.
- Whelan, T.; Leutenegger, S.; Salas-Moreno, R.; Glocker, B.; Davison, A. ElasticFusion: Dense SLAM without a Pose Graph. 2015. Available online: https://spiral.imperial.ac.uk/bitstream/10044/1/23438/2/whelan2015rss.pdf (accessed on 18 March 2022).
- Bloesch, M.; Czarnowski, J.; Clark, R.; Leutenegger, S.; Davison, A.J. CodeSLAM—Learning a compact, optimisable representation for dense visual SLAM. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2560–2568.
- Sohn, K.; Lee, H.; Yan, X. Learning structured output representation using deep conditional generative models. In Proceedings of the Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 7–12 December 2015.
- Czarnowski, J.; Laidlow, T.; Clark, R.; Davison, A.J. DeepFactors: Real-time probabilistic dense monocular SLAM. IEEE Robot. Autom. Lett. 2020, 5, 721–728.
- Zuo, X.; Merrill, N.; Li, W.; Liu, Y.; Pollefeys, M.; Huang, G. CodeVIO: Visual-inertial odometry with learned optimizable dense depth. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 14382–14388.
- Matsuki, H.; Scona, R.; Czarnowski, J.; Davison, A.J. CodeMapping: Real-Time Dense Mapping for Sparse SLAM using Compact Scene Representations. IEEE Robot. Autom. Lett. 2021, 6, 7105–7112.
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGBD images. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 746–760.
- Qin, T.; Li, P.; Shen, S. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
- Ranftl, R.; Lasinger, K.; Hafner, D.; Schindler, K.; Koltun, V. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1623–1637.
- Wofk, D.; Ma, F.; Yang, T.J.; Karaman, S.; Sze, V. FastDepth: Fast monocular depth estimation on embedded systems. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 6101–6108.
- Ma, F.; Karaman, S. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 4796–4803.
- Tang, J.; Tian, F.P.; Feng, W.; Li, J.; Tan, P. Learning guided convolutional network for depth completion. IEEE Trans. Image Process. 2020, 30, 1116–1129.
- Lupton, T.; Sukkarieh, S. Visual-inertial-aided navigation for high-dynamic motion in built environments without initial conditions. IEEE Trans. Robot. 2011, 28, 61–76.
- Forster, C.; Carlone, L.; Dellaert, F.; Scaramuzza, D. On-manifold preintegration for real-time visual-inertial odometry. IEEE Trans. Robot. 2016, 33, 1–21.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 510–519.
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625.
- Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-based visual-inertial odometry using nonlinear optimization. Int. J. Robot. Res. 2015, 34, 314–334.
- Huber, P.J. Robust estimation of a location parameter. In Breakthroughs in Statistics; Springer: New York, NY, USA, 1992; pp. 492–518.
- Press, W.H.; Teukolsky, S.A.; Vetterling, W.T.; Flannery, B.P. Numerical Recipes 3rd Edition: The Art of Scientific Computing; Cambridge University Press: Cambridge, UK, 2007.
- Lourakis, M.I.; Argyros, A.A. SBA: A software package for generic sparse bundle adjustment. ACM Trans. Math. Softw. (TOMS) 2009, 36, 1–30.
- Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163.
- Rosten, E.; Drummond, T. Machine learning for high-speed corner detection. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 430–443.
- Eigen, D.; Puhrsch, C.; Fergus, R. Depth map prediction from a single image using a multi-scale deep network. In Proceedings of the Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014.
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Algarve, Portugal, 7–12 October 2012; pp. 573–580.
- Geneva, P.; Eckenhoff, K.; Lee, W.; Yang, Y.; Huang, G. OpenVINS: A research platform for visual-inertial estimation. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 4666–4672.
| Modalities | Methods | RMSE ↓ | iRMSE ↓ | Abs Rel ↓ | MAE ↓ | MAPE ↓ | δ < 1.25 ↑ | δ < 1.25² ↑ | δ < 1.25³ ↑ | Time ↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| RGB | FastDepth 1 | 0.598 | 0.098 | 0.161 | 0.443 | 0.139 | 76.6% | 93.9% | 98.2% | 2.498 |
| RGB | CodeVIO 2 | 0.545 | 0.090 | - 3 | - | 0.130 | 83.0% | 95.1% | 98.5% | 3.713 |
| RGB | CodeVIO (32D) 4 | 0.557 | 0.099 | 0.127 | 0.398 | 0.137 | 83.3% | 92.0% | 96.3% | 3.932 |
| RGB | CodeVIO (8D) 5 | 0.582 | 0.110 | 0.144 | 0.401 | 0.150 | 81.8% | 93.6% | 96.2% | 3.480 |
| RGB | Ours | 0.498 | 0.075 | 0.119 | 0.328 | 0.121 | 86.8% | 96.6% | 98.7% | 3.764 |
| G | CodeVIO | 0.535 | 0.089 | - | - | 0.133 | 83.3% | 95.3% | 98.6% | 3.796 |
| G | CodeVIO (32D) | 0.551 | 0.085 | 0.136 | 0.395 | 0.131 | 80.4% | 92.3% | 95.9% | 3.782 |
| G | CodeVIO (8D) | 0.587 | 0.116 | 0.147 | 0.397 | 0.141 | 83.2% | 90.4% | 94.5% | 3.237 |
| G | Ours | 0.515 | 0.079 | 0.133 | 0.394 | 0.130 | 85.2% | 96.1% | 98.8% | 3.689 |
| RGB-S | CodeVIO | 0.316 | 0.052 | - | - | 0.066 | 94.4% | 98.6% | 99.6% | 3.745 |
| RGB-S | CodeVIO (32D) | 0.331 | 0.063 | 0.083 | 0.191 | 0.078 | 93.3% | 98.2% | 97.8% | 3.940 |
| RGB-S | CodeVIO (8D) | 0.337 | 0.075 | 0.094 | 0.202 | 0.075 | 93.1% | 96.5% | 98.1% | 3.547 |
| RGB-S | Ours | 0.298 | 0.047 | 0.070 | 0.179 | 0.062 | 94.9% | 98.3% | 99.3% | 3.738 |
| G-S | CodeVIO | 0.315 | 0.055 | - | - | 0.071 | 93.9% | 98.5% | 99.5% | 3.746 |
| G-S | CodeVIO (32D) | 0.327 | 0.072 | 0.070 | 0.185 | 0.073 | 93.2% | 98.2% | 97.3% | 3.792 |
| G-S | CodeVIO (8D) | 0.343 | 0.075 | 0.081 | 0.191 | 0.090 | 91.8% | 95.1% | 98.4% | 3.403 |
| G-S | Ours | 0.290 | 0.042 | 0.061 | 0.170 | 0.059 | 95.1% | 98.9% | 99.5% | 3.760 |
| Sequence | Methods | RMSE ↓ | iRMSE ↓ | Abs Rel ↓ | MAE ↓ | MAPE ↓ | δ < 1.25 ↑ | δ < 1.25² ↑ | δ < 1.25³ ↑ | Time ↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| V101 | Sparse ORB 1 | 0.251 | 0.058 | 0.061 | 0.105 | 0.049 | 96.2% | 98.7% | 99.0% | - |
| V101 | CodeMapping 2 | 0.381 | - | - | 0.192 | - | - | - | - | 11.00 |
| V101 | CodeVIO 3 | 0.468 | 0.091 | - 4 | - | 0.107 | 87.0% | 95.2% | 97.9% | - |
| V101 | CodeVIO (32D) 5 | 0.488 | 0.103 | 0.100 | 0.251 | 0.137 | 83.8% | 93.1% | 97.5% | 3.794 |
| V101 | CodeVIO (8D) 6 | 0.503 | 0.134 | 0.132 | 0.283 | 0.154 | 81.4% | 91.7% | 95.3% | 3.412 |
| V101 | Ours | 0.405 | 0.069 | 0.098 | 0.208 | 0.091 | 91.8% | 95.7% | 98.2% | 3.767 |
| V102 | Sparse ORB | 0.380 | 0.074 | 0.089 | 0.167 | 0.088 | 92.4% | 96.1% | 98.0% | - |
| V102 | CodeMapping | 0.369 | - | - | 0.259 | - | - | - | - | 11.00 |
| V102 | CodeVIO | 0.602 | 0.118 | - | - | 0.170 | 78.7% | 91.9% | 96.4% | - |
| V102 | CodeVIO (32D) | 0.621 | 0.123 | 0.113 | 0.285 | 0.173 | 76.9% | 92.2% | 97.0% | 3.787 |
| V102 | CodeVIO (8D) | 0.640 | 0.135 | 0.112 | 0.291 | 0.193 | 72.4% | 90.0% | 94.8% | 3.337 |
| V102 | Ours | 0.511 | 0.103 | 0.109 | 0.266 | 0.117 | 90.1% | 93.2% | 97.2% | 3.758 |
| V103 | Sparse ORB | 0.419 | 0.078 | 0.097 | 0.229 | 0.098 | 91.0% | 94.7% | 97.9% | - |
| V103 | CodeMapping | 0.407 | - | - | 0.283 | - | - | - | - | 11.00 |
| V103 | CodeVIO | 0.687 | 0.103 | - | - | 0.198 | 73.9% | 90.2% | 96.5% | - |
| V103 | CodeVIO (32D) | 0.684 | 0.125 | 0.124 | 0.313 | 0.213 | 73.4% | 90.1% | 94.8% | 3.790 |
| V103 | CodeVIO (8D) | 0.693 | 0.149 | 0.130 | 0.337 | 0.223 | 75.9% | 89.7% | 94.0% | 3.414 |
| V103 | Ours | 0.588 | 0.097 | 0.123 | 0.304 | 0.158 | 88.2% | 92.1% | 96.8% | 3.753 |
| V201 | Sparse ORB | 0.388 | 0.071 | 0.110 | 0.193 | 0.101 | 92.3% | 95.8% | 97.3% | - |
| V201 | CodeMapping | 0.428 | - | - | 0.290 | - | - | - | - | 11.00 |
| V201 | CodeVIO | 0.656 | 0.117 | - | - | 0.163 | 77.3% | 90.8% | 96.0% | - |
| V201 | CodeVIO (32D) | 0.667 | 0.126 | 0.129 | 0.335 | 0.173 | 76.0% | 90.2% | 95.7% | 3.766 |
| V201 | CodeVIO (8D) | 0.683 | 0.154 | 0.125 | 0.349 | 0.191 | 74.5% | 90.1% | 93.2% | 3.409 |
| V201 | Ours | 0.577 | 0.096 | 0.118 | 0.301 | 0.150 | 88.3% | 93.4% | 96.5% | 3.785 |
| V202 | Sparse ORB | 0.513 | 0.099 | 0.120 | 0.257 | 0.112 | 90.3% | 92.9% | 96.3% | - |
| V202 | CodeMapping | 0.655 | - | - | 0.415 | - | - | - | - | 11.00 |
| V202 | CodeVIO | 0.777 | 0.125 | - | - | 0.206 | 72.0% | 88.3% | 94.9% | - |
| V202 | CodeVIO (32D) | 0.758 | 0.113 | 0.173 | 0.345 | 0.193 | 79.7% | 88.2% | 95.0% | 3.732 |
| V202 | CodeVIO (8D) | 0.783 | 0.136 | 0.181 | 0.347 | 0.207 | 73.8% | 85.2% | 90.5% | 3.389 |
| V202 | Ours | 0.598 | 0.105 | 0.159 | 0.316 | 0.159 | 84.1% | 91.3% | 96.5% | 3.729 |
| V203 | Sparse ORB | 0.473 | 0.070 | 0.109 | 0.248 | 0.104 | 90.9% | 93.3% | 97.2% | - |
| V203 | CodeMapping | 0.952 | - | - | 0.686 | - | - | - | - | 11.00 |
| V203 | CodeVIO | 0.652 | 0.097 | - | - | 0.177 | 75.6% | 92.5% | 97.3% | - |
| V203 | CodeVIO (32D) | 0.637 | 0.092 | 0.140 | 0.351 | 0.173 | 77.0% | 93.4% | 97.1% | 3.757 |
| V203 | CodeVIO (8D) | 0.653 | 0.108 | 0.161 | 0.372 | 0.188 | 74.8% | 91.5% | 96.2% | 3.458 |
| V203 | Ours | 0.585 | 0.092 | 0.122 | 0.306 | 0.156 | 86.0% | 93.1% | 97.6% | 3.776 |
Methods | V101 | V102 | V103 | V201 | V202 | V203 | Mean |
---|---|---|---|---|---|---|---|
ORB-SLAM3 1 | 0.035 | 0.013 | 0.030 | 0.043 | 0.016 | 0.019 | 0.026 |
OpenVINS 2 | 0.056 | 0.072 | 0.069 | 0.098 | 0.061 | 0.286 | 0.107 |
CodeVIO 3 | 0.054 | 0.071 | 0.068 | 0.097 | 0.061 | 0.275 | 0.104 |
Ours (Framerate) | 0.036 (16.4) | 0.011 (16.2) | 0.022 (15.7) | 0.041 (16.3) | 0.012 (15.5) | 0.017 (15.1) | 0.023 (15.9) |
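The trajectory numbers above are absolute trajectory errors (RMSE, presumably in meters) on the EuRoC Vicon-room sequences. A minimal sketch of this metric is given below; it assumes time-associated position sequences and a closed-form rigid (SE(3)) alignment before computing the error, which is the common convention rather than a detail stated here.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """Absolute trajectory error (RMSE) after Kabsch/Umeyama rigid alignment.
    est_xyz, gt_xyz: (N, 3) arrays of time-associated positions."""
    mu_e, mu_g = est_xyz.mean(axis=0), gt_xyz.mean(axis=0)
    E, G = est_xyz - mu_e, gt_xyz - mu_g

    # Rotation that best maps the centered estimate onto the centered ground truth.
    U, _, Vt = np.linalg.svd(E.T @ G)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e

    aligned = est_xyz @ R.T + t
    err = np.linalg.norm(aligned - gt_xyz, axis=1)
    return np.sqrt(np.mean(err ** 2))
```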
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, M.; Zhou, D.; Song, X.; Chen, X.; Zhang, L. DiT-SLAM: Real-Time Dense Visual-Inertial SLAM with Implicit Depth Representation and Tightly-Coupled Graph Optimization. Sensors 2022, 22, 3389. https://doi.org/10.3390/s22093389