Mapping with Monocular Camera Sensor under Adversarial Illumination for Intelligent Vehicles
Abstract
1. Introduction
- We introduce an unsupervised keypoint detection and description approach into the visual mapping process, with an improved loss to enhance the learning of discriminative descriptors. The approach requires only monocular images as input, thus saving annotation labor.
- We present a scheme integrating feature-point verification with self-supervised multi-grained image similarity measurement. It effectively reduces the cumulative error and keeps the overall scale drift at a low level.
- We further integrate ground point features and the camera height to recover the absolute scale (sketched below). Through validation on both self-collected data and public benchmarks, the mapping approach is demonstrated to be robust against illumination changes in scenarios such as underground parking lots and outdoor roads.
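The scale-recovery idea of the third point (detailed in Section 3.2.3) can be illustrated with a short sketch: fit a plane to the reconstructed ground points, measure the reconstructed camera center's height above that plane, and compare it with the known mounting height d. The function names and the plain least-squares plane fit below are our own assumptions, not the paper's exact implementation.

```python
import numpy as np

def fit_ground_plane(ground_pts):
    """Least-squares plane fit: returns a unit normal n and offset c
    such that n.x + c = 0 for points x on the plane."""
    centroid = ground_pts.mean(axis=0)
    # The singular vector of the smallest singular value of the centered
    # points is orthogonal to the plane.
    _, _, vt = np.linalg.svd(ground_pts - centroid)
    n = vt[-1]
    return n, -n.dot(centroid)

def scale_correction(ground_pts, cam_pos, cam_height_d):
    """Scale factor r between the true camera height d and the height of
    the reconstructed camera center above the fitted ground plane."""
    n, c = fit_ground_plane(ground_pts)
    est_height = abs(n.dot(cam_pos) + c)  # point-to-plane distance (|n| = 1)
    return cam_height_d / est_height

# Toy usage: flat ground at z = 0, reconstructed at half the true scale.
pts = np.random.rand(100, 3)
pts[:, 2] = 0.0
r = scale_correction(pts, cam_pos=np.array([0.0, 0.0, 0.75]), cam_height_d=1.5)
print(f"scale correction factor r = {r:.2f}")  # ~2.0
```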
2. Related Work
2.1. Keypoint Models
2.2. Monocular Visual Architecture
2.3. Mapping under Adversarial Illumination
3. Proposed Approach
3.1. Unsupervised Keypoint Extraction
3.2. 3D Reconstruction with Improved Scale Estimation
3.2.1. Base SfM Approach
3.2.2. Image Matching with Multi-Grained Similarity and Keypoint Verification
1. Images are processed in chronological order. If an unprocessed image remains, it is marked as the query. Otherwise, the process is terminated.
2. The query is matched against its N following images.
3. Based on the matching results, the images whose multi-grained similarity to the query passes a threshold are collected into a set. If this set is empty, the procedure returns to step 1.
4. For each image in the set, if the number of keypoint correspondences between it and the query is greater than a threshold, the pair is recorded in a database of query-positive candidates (Figure 4). Otherwise, the procedure returns to step 1.
5. For each candidate in the database, we also consider the correspondences between the query and the N images following the candidate. If the number of keypoint correspondences in any of these pairs is not greater than the threshold, the candidate database is cleared; the assumption is that keypoints across true-positive images remain trackable for a period. Otherwise, the candidate database is recorded into the final database. The procedure then returns to step 1 (a code sketch of this loop follows the list).
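The following sketch mirrors steps 1–5 under stated assumptions: `similarity` and `num_correspondences` stand in for the paper's multi-grained similarity measure and keypoint matcher, and the default values of N and the two thresholds are illustrative, not the paper's settings.

```python
def build_match_database(images, similarity, num_correspondences,
                         N=5, sim_thresh=0.8, kp_thresh=30):
    """Search query-positive image pairs by multi-grained similarity,
    then verify them with keypoint correspondences (steps 1-5 above)."""
    final_db = []
    for q, query in enumerate(images):                        # step 1
        followers = list(enumerate(images[q + 1:q + 1 + N], start=q + 1))
        # Steps 2-3: match the query against its N followers and keep those
        # whose multi-grained similarity passes the threshold.
        matched = [(i, img) for i, img in followers
                   if similarity(query, img) >= sim_thresh]
        if not matched:
            continue
        # Step 4: keep pairs with enough keypoint correspondences.
        candidates = [(q, i) for i, img in matched
                      if num_correspondences(query, img) > kp_thresh]
        if not candidates:
            continue
        # Step 5: keypoints of true positives should remain trackable across
        # the N images after each candidate; a single failure discards all
        # candidates of this query.
        tail_ok = all(num_correspondences(query, img) > kp_thresh
                      for _, c in candidates
                      for img in images[c + 1:c + 1 + N])
        if tail_ok:
            final_db.extend(candidates)
    return final_db
```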
3.2.3. Scale Recovery
4. Experiments
4.1. Platform Configuration and Dataset Selection
4.2. Evaluation of Keypoint Model
4.3. Study on Image Matching and Keypoint Verification
4.4. Exploration of Mapping Architecture
4.5. Transfer to SLAM Approaches
4.5.1. Evaluation on EuRoC
4.5.2. Evaluation on KITTI
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Mathematical Explanations
Appendix A.1. Equation of the Descriptor Correlation Term
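The equation referenced by this appendix did not survive extraction. Given the loss-term table later in this document ("correlation coefficients between keypoint descriptors on the same image") and the UnsuperPoint formulation this work builds on, one plausible reconstruction, offered only as an assumption, is:

```latex
% Plausible reconstruction of Equation (A.1), not the authors' verbatim form:
% f_i, f_j are descriptors of two keypoints on the same image, \bar{f} the
% mean of a descriptor's entries, and M the number of detected keypoints.
\[
  r_{ij} =
    \frac{(\mathbf{f}_i - \bar{f}_i\mathbf{1})^{\top}(\mathbf{f}_j - \bar{f}_j\mathbf{1})}
         {\lVert \mathbf{f}_i - \bar{f}_i\mathbf{1} \rVert\,
          \lVert \mathbf{f}_j - \bar{f}_j\mathbf{1} \rVert},
  \qquad
  L_{\mathrm{corr}} = \frac{1}{M(M-1)} \sum_{i=1}^{M} \sum_{j \neq i} r_{ij}^{2}
\]
```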
Appendix A.2. Explanation for Math Symbols
| Symbol | Description |
|---|---|
|  | Map of keypoint scores |
|  | Map of keypoint relative positions |
|  | Map of keypoint descriptors |
|  | Height/width of the input images |
|  | Source image/warped image |
|  | Homography transform matrix |
|  | Keypoint |
| s | Keypoint score |
|  | Keypoint relative position |
|  | Keypoint descriptor |
|  | Loss function |
|  | Weights of losses |
| t | Hyperparameter that controls |
|  | Trade-off factor between and |
|  | Feature vector |
|  | Similarity vector |
|  | Hyperparameter that controls and |
|  | Unprocessed image |
| N | Number of images following the query |
|  | Set of images that are not searched |
|  | Image that is not searched |
|  | Threshold of multi-grained image similarities |
|  | Threshold of the number of correspondence keypoints |
| d | Camera height |
| e | Side length of the cubic space in Figure 6 |
|  | Camera position |
|  | Plane normal vector |
| r | Scale correction factor |
References
- Abassi, R.; Ben Chehida Douss, A.; Sauveron, D. TSME: A trust-based security scheme for message exchange in vehicular Ad hoc networks. Hum.-Centric Comput. Inf. Sci. 2020, 10, 43.
- Aliedani, A.; Loke, S.W.; Glaser, S. Robust cooperative car-parking: Implications and solutions for selfish inter-vehicular social behaviour. Hum.-Centric Comput. Inf. Sci. 2020, 10, 37.
- Xu, Z.; Liang, W.; Li, K.C.; Xu, J.; Jin, H. A blockchain-based Roadside Unit-assisted authentication and key agreement protocol for Internet of Vehicles. J. Parallel Distrib. Comput. 2021, 149, 29–39.
- Chen, C.; Li, K.; Teo, S.G.; Zou, X.; Li, K.; Zeng, Z. Citywide Traffic Flow Prediction Based on Multiple Gated Spatio-Temporal Convolutional Neural Networks. ACM Trans. Knowl. Discov. Data 2020, 14, 1–23.
- Chen, Q.; Xie, Y.; Guo, S.; Bai, J.; Shu, Q. Sensing system of environmental perception technologies for driverless vehicle: A review of state of the art and challenges. Sens. Actuators A Phys. 2021, 319, 112566.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In Proceedings of the European Conference on Computer Vision (ECCV), Graz, Austria, 7–13 May 2006; pp. 404–417.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An Efficient Alternative to SIFT or SURF. In Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
- Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary Robust Invariant Scalable Keypoints. In Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 2548–2555.
- Guan, H.; Lei, X.; Yu, Y.; Zhao, H.; Peng, D.; Junior, J.M.; Li, J. Road marking extraction in UAV imagery using attentive capsule feature pyramid network. Int. J. Appl. Earth Obs. Geoinf. 2022, 107, 102677.
- Qin, T.; Chen, T.; Chen, Y.; Su, Q. AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020; pp. 5939–5945.
- Gao, F.; Ma, J. Indoor Location Technology with High Accuracy Using Simple Visual Tags. Sensors 2023, 23, 1597.
- Huang, Y.; Zhao, J.; He, X.; Zhang, S.; Feng, T. Vision-Based Semantic Mapping and Localization for Autonomous Indoor Parking. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 636–641.
- Tang, J.; Ericson, L.; Folkesson, J.; Jensfelt, P. GCNv2: Efficient Correspondence Prediction for Real-Time SLAM. IEEE Robot. Autom. Lett. 2019, 4, 3505–3512.
- Shi, J.; Tomasi, C. Good Features to Track. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 21–23 June 1994; pp. 593–600.
- Bibi, S.; Abbasi, A.; Haq, I.; Baik, S.; Ullah, A. Digital Image Forgery Detection Using Deep Autoencoder and CNN Features. Hum.-Centric Comput. Inf. Sci. 2021, 11, 1–17.
- Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163.
- Geiger, A.; Lenz, P.; Urtasun, R. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
- Rosten, E.; Drummond, T. Machine Learning for High Speed Corner Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Graz, Austria, 7–13 May 2006; Volume 1, pp. 430–443.
- Calonder, M.; Lepetit, V.; Ozuysal, M.; Trzcinski, T.; Strecha, C.; Fua, P. BRIEF: Computing a Local Binary Descriptor Very Fast. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1281–1298.
- Yu, G.; Morel, J.M. A fully affine invariant image comparison method. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 1597–1600.
- Gao, J.; Sun, Z. An Improved ASIFT Image Feature Matching Algorithm Based on POS Information. Sensors 2022, 22, 7749.
- Yum, J.; Kim, J.S.; Lee, H.J. Fast Execution of an ASIFT Hardware Accelerator by Prior Data Processing. Electronics 2019, 8, 1176.
- Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. LIFT: Learned Invariant Feature Transform. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin, Germany, 2016; pp. 467–483.
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 337–33712.
- Hviid Christiansen, P.; Fly Kragh, M.; Brodskiy, Y.; Karstoft, H. UnsuperPoint: End-to-end Unsupervised Interest Point Detector and Descriptor. arXiv 2019, arXiv:1907.04011.
- Klein, G.; Murray, D. Parallel Tracking and Mapping for Small AR Workspaces. In Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; pp. 225–234.
- Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
- Qin, T.; Shen, S. Robust Initialization of Monocular Visual-Inertial Estimation on Aerial Robots. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 4225–4232.
- Qin, T.; Li, P.; Shen, S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
- Chen, W.; Shang, G.; Hu, K.; Zhou, C.; Wang, X.; Fang, G.; Ji, A. A Monocular-Visual SLAM System with Semantic and Optical-Flow Fusion for Indoor Dynamic Environments. Micromachines 2022, 13, 2006.
- Zang, Q.; Zhang, K.; Wang, L.; Wu, L. An Adaptive ORB-SLAM3 System for Outdoor Dynamic Environments. Sensors 2023, 23, 1359.
- Qin, T.; Shen, S. Online Temporal Calibration for Monocular Visual-Inertial Systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3662–3669.
- Qin, T.; Pan, J.; Cao, S.; Shen, S. A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors. arXiv 2019, arXiv:1901.03638.
- Snavely, N.; Seitz, S.M.; Szeliski, R. Photo Tourism: Exploring Photo Collections in 3D. ACM Trans. Graph. 2006, 25, 835–846.
- Moulon, P.; Monasse, P.; Marlet, R. Adaptive Structure from Motion with a Contrario Model Estimation. In Proceedings of the Asian Conference on Computer Vision (ACCV), Daejeon, Republic of Korea, 5–9 November 2012; pp. 257–270.
- Schönberger, J.L.; Frahm, J.M. Structure-from-Motion Revisited. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
- Ge, Y.; Wang, H.; Zhu, F.; Zhao, R.; Li, H. Self-Supervising Fine-Grained Region Similarities for Large-Scale Image Localization. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 369–386.
- Zhang, L.; Huang, J.; Li, X.; Xiong, L. Vision-Based Parking-Slot Detection: A DCNN-Based Approach and a Large-Scale Benchmark Dataset. IEEE Trans. Image Process. 2018, 27, 5350–5364.
- Yu, Z.; Gao, Z.; Chen, H.; Huang, Y. SPFCN: Select and Prune the Fully Convolutional Networks for Real-time Parking Slot Detection. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 13–19 October 2020; pp. 445–450.
- Nguyen, K.; Nguyen, Y.; Le, B. Semi-Supervising Learning, Transfer Learning, and Knowledge Distillation with SimCLR. arXiv 2021, arXiv:2108.00587.
- Tian, W.; Ren, X.; Yu, X.; Wu, M.; Zhao, W.; Li, Q. Vision-based mapping of lane semantics and topology for intelligent vehicles. Int. J. Appl. Earth Obs. Geoinf. 2022, 111, 102851.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014.
- Balntas, V.; Lenc, K.; Vedaldi, A.; Mikolajczyk, K. HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3852–3861.
- Strecha, C.; von Hansen, W.; Van Gool, L.; Fua, P.; Thoennessen, U. On Benchmarking Camera Calibration and Multi-View Stereo for High Resolution Imagery. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Aanæs, H.; Dahl, A.L.; Steenstrup Pedersen, K. Interesting Interest Points. Int. J. Comput. Vis. 2012, 97, 18–35.
- Grupp, M. evo: Python Package for the Evaluation of Odometry and SLAM. 2017. Available online: https://github.com/MichaelGrupp/evo (accessed on 18 February 2023).
| Term | Description |
|---|---|
|  | Squared score difference of paired points |
|  | Euclidean distance of paired points |
|  | Differences between the distribution of predicted point coordinates and a uniform distribution |
|  | Correlation coefficients between keypoint descriptors on the same image, further explained in Equation (A.1) |
|  | Ensuring closely located point pairs have high scores; interpreted as $\frac{s_{1k}+s_{2k}}{2}(d_k-\bar{d})$, with $s_{1k}, s_{2k}$ the point scores of the k-th pair, $d_k$ the distance of the k-th pair, and $\bar{d}$ the mean distance of all pairs |
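As a worked illustration of the last row, under the reconstruction above (the paper's exact symbols were lost in extraction, so the variable names here are ours), the term can be evaluated as:

```python
import numpy as np

scores_a = np.array([0.9, 0.8, 0.3])    # hypothetical scores of pairs, view A
scores_b = np.array([0.8, 0.7, 0.2])    # hypothetical scores of pairs, view B
dists = np.array([0.5, 2.0, 4.0])       # distances of the paired points

s_hat = 0.5 * (scores_a + scores_b)     # mean score of each pair
terms = s_hat * (dists - dists.mean())  # negative for closer-than-average pairs
print(terms.sum())
# Minimizing the sum drives high scores onto pairs with below-average distance.
```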
| Metric | Description |
|---|---|
| HA | Ratio of estimated homographies under an error threshold (here set to 3 px) to all estimated homographies |
| RS | Ratio of corresponding points to all predicted points |
| LE | Average distance of corresponding points |
| MS | Ratio of good matches to predicted points in one image, where a good match denotes two corresponding points with the nearest descriptors in the feature space |
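For concreteness, a minimal sketch of how the RS and MS columns below can be computed, under common assumptions (the keypoints of the second image are already warped back through the ground-truth homography; the 3 px tolerance and all names are our own illustrative choices):

```python
import numpy as np

def rs_and_ms(kp_a, kp_b_warped, desc_a, desc_b, tol=3.0):
    """RS: fraction of predicted points with a geometric correspondence.
    MS: fraction whose correspondence is also the nearest-descriptor
    match, i.e., a 'good match' in the sense of the table above."""
    # Pairwise point distances between image A and warped image B (N x M).
    geo = np.linalg.norm(kp_a[:, None] - kp_b_warped[None, :], axis=-1)
    corr = geo.argmin(axis=1)            # geometric correspondence per point
    has_corr = geo.min(axis=1) <= tol
    rs = has_corr.mean()

    # Pairwise descriptor distances; nearest neighbor in feature space.
    feat = np.linalg.norm(desc_a[:, None] - desc_b[None, :], axis=-1)
    nn = feat.argmin(axis=1)
    ms = (has_corr & (nn == corr)).mean()
    return rs, ms
```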
| Methods | Illum. HA ↑ | Illum. RS ↑ | Illum. LE ↓ | Illum. MS ↑ | View. HA ↑ | View. RS ↑ | View. LE ↓ | View. MS ↑ |
|---|---|---|---|---|---|---|---|---|
| SURF | 0.77 | 0.57 | 1.16 | 0.27 | 0.58 | 0.53 | 1.41 | 0.23 |
| SIFT | 0.86 | 0.50 | 1.11 | 0.25 | 0.66 | 0.52 | 1.22 | 0.29 |
| SuperPoint | 0.93 | 0.64 | 0.94 | 0.63 | 0.63 | 0.51 | 1.17 | 0.47 |
| Ours | 0.91 | 0.65 | 0.81 | 0.64 | 0.62 | 0.55 | 1.09 | 0.47 |
| Trans. |  | HA ↑ | RS ↑ | LE ↓ | MS ↑ |
|---|---|---|---|---|---|
| ✓ |  | 0.86 | 0.65 | 0.82 | 0.38 |
|  | ✓ | 0.91 | 0.65 | 0.92 | 0.62 |
| ✓ | ✓ | 0.91 | 0.65 | 0.81 | 0.64 |
| Match by Pts. | Match by Img. | Pts. Verify | APE-as (Mean) ↓ |
|---|---|---|---|
| ✓ |  |  | 12.81 m |
|  | ✓ |  | 0.60 m |
|  | ✓ | ✓ | 0.45 m |
| Methods | APE-as (Mean) ↓ |
|---|---|
| Ours | 0.453 m |
| COLMAP (SuperPoint) | 0.502 m |
| COLMAP (SIFT) | 0.562 m |
| COLMAP (SURF) | 0.596 m |
| Seq. | VINS (Flow) | Ours | VINS (SuperPoint) | VINS (SIFT) | VINS (SURF) |
|---|---|---|---|---|---|
| MH01 | 0.24 | 0.22 | 0.20 | 0.76 | 0.50 |
| MH02 | 0.22 | 0.22 | 0.18 | 0.51 | 0.48 |
| MH03 | 0.28 | 0.24 | 0.17 | x | 0.27 |
| MH04 | 0.43 | 0.43 | 0.47 | 0.56 | 0.62 |
| MH05 | 0.31 | 0.32 | 0.22 | 0.53 | 0.75 |
| V101 | 0.109 | 0.108 | 0.12 | 0.23 | 0.20 |
| V102 | 0.10 | 0.11 | 0.09 | 0.13 | 0.13 |
| V103 | 0.111 | 0.088 | 0.09 | 0.15 | 0.19 |
| V201 | 0.121 | 0.116 | 0.14 | 0.18 | 0.21 |
| V202 | 0.11 | 0.09 | 0.13 | 0.27 | 0.25 |
| V203 | 0.30 | 0.20 | 0.79 | 0.39 | 0.49 |
| Seq. | VINS (Flow) | Ours | VINS (SuperPoint) |
|---|---|---|---|
| 00 | 13.74 | 7.05 | 77.70 |
| 01 | 7.54 | x | x |
| 02 | 20.69 | 11.34 | 17.71 |
| 03 | 1.75 | 2.67 | 3.17 |
| 04 | 1.33 | 2.27 | 2.68 |
| 05 | 6.64 | 5.54 | 5.78 |
| 06 | 3.87 | 4.89 | 15.64 |
| 07 | 2.20 | 4.27 | 11.28 |
| 08 | 9.37 | 4.97 | 57.24 |
| 09 | 7.73 | 7.42 | 8.01 |
| 10 | 3.66 | 1.96 | 4.25 |
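The APE-as metric in the tables above presumably denotes absolute pose error after trajectory alignment with scale correction, which the cited evo package computes. A minimal sketch of such an evaluation with evo's Python API (file paths are placeholders; whether the authors used exactly these options is an assumption):

```python
from evo.core import metrics
from evo.tools import file_interface

# KITTI-format pose files: one flattened 3x4 pose matrix per line.
traj_ref = file_interface.read_kitti_poses_file("poses_gt.txt")
traj_est = file_interface.read_kitti_poses_file("poses_est.txt")

# Align the estimate to the reference with scale correction
# (equivalent to the "-as" flags of the evo_ape command-line tool).
traj_est.align(traj_ref, correct_scale=True)

ape = metrics.APE(metrics.PoseRelation.translation_part)
ape.process_data((traj_ref, traj_est))
print("APE-as (mean):", ape.get_statistic(metrics.StatisticsType.mean))
```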