E2-VINS: An Event-Enhanced Visual–Inertial SLAM Scheme for Dynamic Environments
Abstract
1. Introduction
1. A novel event-based method is used to assign dynamicity weights to visual residuals, supported by a momentum factor that mitigates interference from event noise. This enables robust, dynamicity-adaptive bundle adjustment that integrates seamlessly into feature-based SLAM frameworks to filter dynamic feature points (see the sketch after this list).
2. A joint alternating optimization framework is employed that alternately refines the system state and the dynamicity weights, reducing the influence of dynamic features on SLAM robustness in dynamic environments.
3. The first event-enhanced, dynamicity-adaptive VI-SLAM system, termed E2-VINS, is proposed. Extensive qualitative and quantitative experiments on multiple benchmark datasets demonstrate that E2-VINS outperforms competing methods in real dynamic scenarios.
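To make contributions 1 and 2 concrete, the following is a minimal Python sketch of the underlying idea under simplifying assumptions: each visual residual receives a weight derived from an event-based dynamicity score, the weights are smoothed by a momentum factor to damp event noise, and the system state and the weights are refined alternately. The function names, the weight mapping, and the momentum value `mu` are illustrative assumptions, not the exact formulation used in E2-VINS.

```python
import numpy as np
from scipy.optimize import least_squares

def update_dynamicity_weights(dynamicity, prev_weights, mu=0.7):
    """Map event-based dynamicity scores to per-feature weights.

    dynamicity  : (N,) scores in [0, 1]; higher means more likely dynamic
    prev_weights: (N,) weights from the previous iteration
    mu          : momentum factor (illustrative value) damping event-noise jitter
    """
    w_instant = 1.0 - dynamicity                      # static features keep weight ~1
    return mu * prev_weights + (1.0 - mu) * w_instant

def alternating_optimization(residual_fn, dynamicity_fn, x0, n_features, n_iters=5):
    """Alternately refine the state and the dynamicity weights (illustrative).

    residual_fn(x)   -> (N,) visual reprojection residuals for state x
    dynamicity_fn(x) -> (N,) event-based dynamicity scores for state x
    x0               : initial state vector (poses, velocities, biases, ...)
    """
    x = np.asarray(x0, dtype=float)
    w = np.ones(n_features)
    for _ in range(n_iters):
        # 1) Weights fixed: refine the state; sqrt(w) scales residuals so the
        #    cost becomes sum_i w_i * r_i(x)^2, down-weighting dynamic features.
        x = least_squares(lambda s: np.sqrt(w) * residual_fn(s), x).x
        # 2) State fixed: refresh the weights from the event-based dynamicity.
        w = update_dynamicity_weights(dynamicity_fn(x), w)
    return x, w
```

In the full system, step 1 corresponds to the dynamicity-adaptive bundle adjustment of Section 3.4 (solved with a nonlinear least-squares backend), while step 2 draws on the event-based dynamicity metrics of Section 3.3.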
2. Related Work
2.1. Event-Based SLAM
2.2. Dynamic VI-SLAM
- In Section 3, we describe how our pipeline design differs from those of existing methods, including the introduction of dynamicity weights and alternating optimization.
- In Section 4, we present quantitative and qualitative comparisons with existing methods, demonstrating the superior performance of our approach in dynamic scenes.
- In Section 6, we highlight the advantages and limitations of our method in handling dynamic environments.
3. Methodology
3.1. Preliminaries of the Event Camera
3.2. Preprocessing and Visual–Inertial Alignment-Based IMU Calibration
- Preintegration of IMU Data
- Preprocessing of Images and Visual-based Structure from Motion
- Visual–Inertial Alignment
3.3. Event-Based Dynamicity Metrics
- IMU-assisted Motion Compensation
- Event-based Dynamicity Metrics
3.4. Event-Based Dynamicity-Adaptive Bundle Adjustment
- Dynamicity-adaptive Objective Function
- Weight Momentum Factor
- Details on Weights Optimization and Update Strategies
4. Experiments
4.1. System Implementation and Methods for Comparison
- ORB-SLAM3 [1] is a system capable of performing visual, visual–inertial, and multimap SLAM with monocular, stereo, and RGB-D cameras using pinhole and fisheye lens models. It is not only a feature-based, tightly integrated visual–inertial SLAM system that relies on Maximum A Posteriori (MAP) estimation but also a multimap system that utilizes a novel place recognition method for improved recall.
- DSO [3] is a direct and sparse formulation for visual odometry. It combines a fully direct approach that minimizes photometric errors with a consistent, joint optimization of all model parameters, including camera motion and geometry represented as inverse depth in a reference frame.
- VINS-FUSION [28] is an optimization-based multisensor state estimator. It features a tightly coupled monocular visual–inertial state estimator that fuses preintegrated IMU measurements with feature observations.
- USLAM [21] is an indirect monocular method that fuses events, frames, and IMU measurements. Its frontend converts events into frames by motion compensation using the IMU’s gyroscope and the median scene depth (a minimal sketch of this event-frame construction step is given after this list). FAST corners [39] are then extracted and tracked separately on the event frames and the grayscale frames and passed to a geometric feature-based backend. Similar to the proposed E2-VINS, USLAM [21] constructs a tightly coupled three-modality SLAM system using events, image frames, and IMU measurements as inputs.
- DynaVINS [8] is a visual–inertial SLAM framework designed for dynamic environments. It uses robust bundle adjustment based on IMU preintegration and a multihypothesis-based constraint grouping method to reduce the impact of dynamic objects and temporarily static objects, respectively.
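As referenced in the USLAM description above, event streams are commonly converted into frames by IMU-assisted motion compensation before feature extraction; a related step appears in our pipeline (Section 3.3). The sketch below, assuming a constant angular rate and a pure-rotation (gyro-only) model, illustrates the idea; the depth/translation handling that USLAM additionally performs via the median scene depth is omitted, and all variable names are illustrative.

```python
import numpy as np

def motion_compensated_event_frame(events, omega, t_ref, K, shape=(180, 240)):
    """Accumulate events into a frame after gyro-only motion compensation.

    events : (N, 4) array of (t, x, y, polarity) tuples
    omega  : (3,) gyroscope angular velocity [rad/s], assumed constant over the window
    t_ref  : timestamp the events are warped to
    K      : (3, 3) camera intrinsic matrix
    shape  : (rows, cols) of the output event frame
    """
    frame = np.zeros(shape, dtype=np.float32)
    K_inv = np.linalg.inv(K)
    for t, x, y, _ in events:
        dt = t_ref - t
        wx, wy, wz = omega * dt
        # First-order (small-angle) rotation I + [omega*dt]_x; the sign
        # convention depends on how the camera and IMU frames are defined.
        R = np.array([[1.0, -wz,  wy],
                      [ wz, 1.0, -wx],
                      [-wy,  wx, 1.0]])
        # Pure-rotation warp: homography H = K * R * K^-1 applied to the pixel.
        u, v, w = K @ (R @ (K_inv @ np.array([x, y, 1.0])))
        xi, yi = int(round(u / w)), int(round(v / w))
        if 0 <= yi < shape[0] and 0 <= xi < shape[1]:
            frame[yi, xi] += 1.0     # simple event-count accumulation
    return frame
```

Pixels where many events remain misaligned after such compensation typically correspond to independently moving objects, which is the intuition behind the event-based dynamicity metrics used in Section 3.3.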
4.2. Datasets and Metrics
4.3. Qualitative Results
4.4. Quantitative Results
5. Discussions
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
- Qin, T.; Li, P.; Shen, S. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 2018, 34, 1004–1020. [Google Scholar] [CrossRef]
- Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 611–625. [Google Scholar] [CrossRef]
- Saputra, M.; Markham, A.; Trigoni, N. Visual SLAM and structure from motion in dynamic environments: A survey. ACM Comput. Surv. 2018, 51, 1–36. [Google Scholar] [CrossRef]
- Kim, D.; Han, S.; Kim, J. Visual odometry algorithm using an RGB-D sensor and IMU in a highly dynamic environment. In Proceedings of the Robot Intelligence Technology and Applications, Bucheon, Republic of Korea, 14–16 December 2015; pp. 11–26. [Google Scholar]
- Fan, Y.; Han, H.; Tang, Y.; Zhi, T. Dynamic objects elimination in SLAM based on image fusion. Pattern Recognit. Lett. 2019, 127, 191–201. [Google Scholar] [CrossRef]
- Canovas, B.; Rombaut, M.; Nègre, A.; Pellerin, D.; Olympieff, S. Speed and memory efficient dense RGB-D SLAM in dynamic scenes. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA, 25–29 October 2020; pp. 4996–5001. [Google Scholar]
- Song, S.; Lim, H.; Lee, A.J.; Myung, H. DynaVINS: A visual-inertial SLAM for dynamic environments. IEEE Robot. Autom. Lett. 2022, 7, 11523–11530. [Google Scholar] [CrossRef]
- Yu, C.; Liu, Z.; Liu, X.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A semantic visual SLAM towards dynamic environments. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, 1–5 October 2018; pp. 1168–1174. [Google Scholar]
- Bescos, B.; Fácil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083. [Google Scholar] [CrossRef]
- Schörghuber, M.; Steininger, D.; Cabon, Y.; Humenberger, M.; Gelautz, M. SLAMANTIC-leveraging semantics to improve VSLAM in dynamic environments. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop, Seoul, Republic of Korea, 27–28 October 2019; pp. 3759–3768. [Google Scholar]
- Liu, J.; Li, X.; Liu, Y.; Chen, H. RGB-D inertial odometry for a resource-restricted robot in dynamic environments. IEEE Robot. Autom. Lett. 2022, 7, 9573–9580. [Google Scholar] [CrossRef]
- Lichtsteiner, P.; Posch, C.; Delbruck, T. A 128 × 128 120 dB 15 μs latency asynchronous temporal contrast vision sensor. IEEE J. Solid-State Circuits 2008, 43, 566–576. [Google Scholar] [CrossRef]
- Posch, C.; Matolin, D.; Wohlgenannt, R. A QVGA 143 dB dynamic range frame-free PWM image sensor with lossless pixel-level video compression and time-domain CDS. IEEE J. Solid-State Circuits 2011, 46, 259–275. [Google Scholar] [CrossRef]
- Brandli, C.; Berner, R.; Yang, M.; Liu, S.; Delbruck, T. A 240 × 180 130 dB 3 µs latency global shutter spatiotemporal vision sensor. IEEE J. Solid-State Circuits 2014, 49, 2333–2341. [Google Scholar] [CrossRef]
- Kim, H.; Kim, H.J. Real-time rotational motion estimation with contrast maximization over globally aligned events. IEEE Robot. Autom. Lett. 2021, 6, 6016–6023. [Google Scholar] [CrossRef]
- Gallego, G.; Lund, J.E.A.; Mueggler, E.; Rebecq, H.; Delbruck, T.; Scaramuzza, D. Event-based, 6-DoF camera tracking from photometric depth maps. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2402–2412. [Google Scholar] [CrossRef]
- Kim, H.; Leutenegger, S.; Davison, A.J. Real-time 3D reconstruction and 6-DoF tracking with an event camera. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 349–364. [Google Scholar]
- Rebecq, H.; Horstschaefer, T.; Gallego, G.; Scaramuzza, D. EVO: A geometric approach to event-based 6-DoF parallel tracking and mapping in real time. IEEE Robot. Autom. Lett. 2017, 2, 593–600. [Google Scholar] [CrossRef]
- Zuo, Y.; Yang, J.; Chen, J.; Wang, X.; Wang, Y.; Kneip, L. DEVO: Depth-event camera visual odometry in challenging conditions. In Proceedings of the IEEE International Conference on Robotics and Automation, Philadelphia, PA, USA, 23–27 May 2022; pp. 2179–2185. [Google Scholar]
- Vidal, A.R.; Rebecq, H.; Horstschaefer, T.; Scaramuzza, D. Ultimate SLAM? Combining events, images, and IMU for robust visual SLAM in HDR and high-speed scenarios. IEEE Robot. Autom. Lett. 2018, 3, 994–1001. [Google Scholar] [CrossRef]
- Zuo, Y.; Xu, W.; Wang, X.; Wang, Y.; Kneip, L. Cross-modal semi-dense 6-DoF tracking of an event camera in challenging conditions. IEEE Trans. Robot. 2024, 40, 1600–1616. [Google Scholar] [CrossRef]
- Hidalgo-Carrió, J.; Gallego, G.; Scaramuzza, D. Event-aided direct sparse odometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5771–5780. [Google Scholar]
- Huang, J.; Zhao, S.; Zhang, T.; Zhang, L. MC-VEO: A visual-event odometry with accurate 6-DoF motion compensation. IEEE Trans. Intell. Veh. 2024, 9, 1756–1767. [Google Scholar] [CrossRef]
- Guan, W.; Chen, P.; Xie, Y.; Lu, P. PL-EVIO: Robust monocular event-based visual inertial odometry with point and line Features. IEEE Trans. Autom. Sci. Eng. 2024, 21, 6277–6293. [Google Scholar] [CrossRef]
- Guo, S.; Gallego, G. CMax-SLAM: Event-based rotational-motion bundle adjustment and SLAM system using contrast maximization. IEEE Trans. Robot. 2024, 40, 2442–2461. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Qin, T.; Pan, J.; Cao, S.; Shen, S. A general optimization-based framework for local odometry estimation with multiple sensors. arXiv 2019, arXiv:1901.03638. [Google Scholar]
- Lucas, B.D.; Kanade, T. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada, 24–28 August 1981; Volume 2, pp. 674–679. [Google Scholar]
- Nistér, D. An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 756–770. [Google Scholar] [CrossRef]
- Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vis. 2009, 81, 155–166. [Google Scholar] [CrossRef]
- Triggs, B.; McLauchlan, P.F.; Hartley, R.I.; Fitzgibbon, A.W. Bundle adjustment—A modern synthesis. In Proceedings of the International Workshop on Vision Algorithms, Corfu, Greece, 21–22 September 2000; pp. 298–372. [Google Scholar]
- Zhao, C.; Li, Y.; Lyu, Y. Event-based real-time moving object detection based on IMU ego-motion compensation. In Proceedings of the IEEE International Conference on Robotics and Automation, London, UK, 29 May–2 June 2023; pp. 690–696. [Google Scholar]
- Delbruck, T.; Villanueva, V.; Longinotti, L. Integration of dynamic vision sensor with inertial measurement unit for electronically stabilized event-based vision. In Proceedings of the IEEE International Symposium on Circuits and Systems, Melbourne, Australia, 1–5 June 2014; pp. 2636–2639. [Google Scholar]
- Falanga, D.; Kleber, K.; Scaramuzza, D. Dynamic obstacle avoidance for quadrotors with event cameras. Sci. Robot. 2020, 5, eaaz9712. [Google Scholar] [CrossRef]
- Babin, P.; Giguère, P.; Pomerleau, F. Analysis of robust functions for registration algorithms. In Proceedings of the IEEE International Conference on Robotics and Automation, Montreal, QC, Canada, 20–24 May 2019; pp. 1451–1457. [Google Scholar]
- Black, M.; Rangarajan, A. On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. Int. J. Comput. Vis. 1996, 19, 57–91. [Google Scholar] [CrossRef]
- Corke, P.I.; Khatib, O. Robotics, Vision and Control: Fundamental Algorithms in MATLAB; Springer: Berlin/Heidelberg, Germany, 2011; Volume 73. [Google Scholar]
- Rosten, E.; Drummond, T. Machine learning for high-speed corner detection. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 430–443. [Google Scholar]
- Agarwal, S.; Mierle, K.; The Ceres Solver Team. Ceres Solver. 2023. Available online: http://ceres-solver.org (accessed on 27 September 2024).
- OpenRobotics. ROS—Robot Operating System. 2021. Available online: https://www.ros.org/ (accessed on 27 September 2024).
- Minoda, K.; Schilling, F.; Wüest, V.; Floreano, D.; Yairi, T. VIODE: A simulated dataset to address the challenges of visual-inertial odometry in dynamic environments. IEEE Robot. Autom. Lett. 2021, 6, 1343–1350. [Google Scholar] [CrossRef]
- Chen, P.; Guan, W.; Huang, F.; Zhong, Y.; Wen, W.; Hsu, L.; Lu, P. ECMD: An event-centric multisensory driving dataset for SLAM. IEEE Trans. Intell. Veh. 2024, 9, 407–416. [Google Scholar] [CrossRef]
- Grupp, M. evo: Python Package for the Evaluation of Odometry and SLAM. 2017. Available online: https://github.com/MichaelGrupp/evo (accessed on 27 September 2024).
- Hu, Y.; Liu, S.; Delbruck, T. v2e: From video frames to realistic DVS events. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop, Nashville, TN, USA, 19–25 June 2021; pp. 1312–1321. [Google Scholar]
Method | Input | Main Limitations | Dynamic Environment
---|---|---|---
Kim et al. [18] | Events | GPU requirement | ×
Rebecq et al. [19] | Events | Insufficient robustness | ×
Guo and Gallego [26] | Events | Rotational motion only | ×
Zuo et al. [20] | Events and depth images | RGB-D camera brings an efficiency bottleneck | ×
Hidalgo-Carrió et al. [23] | Events and RGB images | Relatively low computational efficiency | ×
Huang et al. [24] | Events and RGB images | - | ×
Guan et al. [25] | Events, images, and IMU measurements | Not open source | ×
Vidal et al. [21] | Events, images, and IMU measurements | Insufficient utilization of the motion sensitivity of events | ×
E2-VINS (Ours) | Events, images, and IMU measurements | - | ✓
Method | Camera Configuration | Practical Consideration: Prior Knowledge | Practical Consideration: Temporary Stopping | Practical Consideration: Uncertainty
---|---|---|---|---
Constraint-based methods | | | |
Kim et al. [5] | RGB-D | ✓ | ✓ | ×
Fan et al. [6] | Monocular | × | × | ×
Canovas et al. [7] | RGB-D | × | × | ×
Song et al. [8] | Monocular | ✓ | × | ✓
Deep learning-based methods | | | |
DS-SLAM [9] | Monocular | × | ✓ | ×
DynaSLAM [10] | Monocular | × | ✓ | ×
SLAMANTIC [11] | Monocular | × | ✓ | ×
Dynamic-VINS [12] | RGB-D | ✓ | ✓ | ×
E2-VINS (Ours) | Event | × | ✓ | ✓
Method | City_day: None | City_day: Low | City_day: Mid | City_day: High | City_night: None | City_night: Low | City_night: Mid | City_night: High | Parking_lot: None | Parking_lot: Low | Parking_lot: Mid | Parking_lot: High
---|---|---|---|---|---|---|---|---|---|---|---|---
ORB-SLAM3 [1] | 1.230 | 0.543 | 2.844 | fail | fail | fail | fail | fail | 0.194 | 0.231 | 0.191 | 0.256 |
DSO [3] | 6.055 | 5.214 | 5.445 | fail | fail | fail | fail | fail | 3.106 | 1.762 | 8.440 | 5.999 |
VINS-FUSION [28] | 0.115 | 0.355 | 0.323 | 0.198 | 0.224 | 0.276 | 0.280 | 0.134 | 0.182 | 0.932 | 1.497 | |
DynaVINS [8] | 0.142 | 0.114 | 0.109 | 0.111 | 0.155 | |||||||
Ours+V2E [45] | 0.118 | 0.101 | 0.080 | 0.104 | 0.095 | 0.142 | 0.138 | 0.134 |
Method | Dense_street_day: Easy_a | Dense_street_day: Easy_b | Dense_street_day: Medium_a | Dense_street_day: Difficult | Dense_street_night: Easy_a | Dense_street_night: Easy_b | Urban_road_day: Easy_a | Urban_road_day: Easy_b | Urban_road_night: Easy_a
---|---|---|---|---|---|---|---|---|---
ORB-SLAM3 [1] | fail | 8.558 | fail | 4.332 | fail | fail | fail | fail | |
DSO [3] | fail | 8.509 | fail | 7.647 | fail | 5.560 | 11.190 | 13.663 | 5.534 |
VINS-FUSION [28] | fail | 3.428 | fail | 9.966 | fail | fail | 12.964 | ||
USLAM [21] | fail | 9.435 | fail | fail | 5.363 | 6.141 | |||
DynaVINS [8] | fail | fail | 3.472 | 3.500 | 5.014 | 9.239 | 10.318 | 6.496 | |
Ours | 1.356 | 1.562 | 2.404 | 1.393 | 1.427 | 2.736 | 7.339 | 9.933 | 3.159 |