Reliable Feature Matching for Spherical Images via Local Geometric Rectification and Learned Descriptor
Abstract
1. Introduction
2. Methodology
2.1. Spherical Camera Imaging Model
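Under the unit sphere camera model, every pixel of an equirectangular image corresponds to a bearing vector on the unit sphere through its longitude and latitude. Since the original equations of this subsection are not reproduced above, the following is a minimal numpy sketch of the standard mapping; the function names and the axis convention (X right, Y down, Z forward) are illustrative assumptions rather than the paper's exact notation.

```python
import numpy as np

def pixel_to_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit bearing vector.

    Convention (one of several in common use): longitude grows with u,
    latitude shrinks with v; axes are X right, Y down, Z forward.
    """
    lon = (u / width) * 2.0 * np.pi - np.pi        # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / height) * np.pi       # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),    # X
                     -np.sin(lat),                 # Y
                     np.cos(lat) * np.cos(lon)])   # Z

def sphere_to_pixel(p, width, height):
    """Inverse mapping: unit bearing vector -> equirectangular pixel."""
    x, y, z = p / np.linalg.norm(p)
    lon = np.arctan2(x, z)
    lat = np.arcsin(-y)
    u = (lon + np.pi) / (2.0 * np.pi) * width
    v = (np.pi / 2.0 - lat) / np.pi * height
    return u, v
```

The mapping degenerates at the two poles (latitude ±π/2), where a single sphere point spreads across an entire image row; this severe distortion is exactly what motivates the local geometric rectification of Section 2.2.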
2.2. Image Patch Reprojection for Local Geometric Rectification
Considering that a unit sphere camera model is used to define the Cartesian coordinate system, the homogeneous coordinate $\mathbf{p}_h$ of a rectified patch pixel is then projected onto the sphere point $\mathbf{p}_s$ through the normalization operation presented in Equation (5):

$$\mathbf{p}_s = \frac{\mathbf{p}_h}{\left\lVert \mathbf{p}_h \right\rVert} \tag{5}$$

The sphere point $\mathbf{p}_s$ is further transformed from the local Cartesian coordinate system of the rectified image patch to the global Cartesian coordinate system by using a transformation matrix $\mathbf{R}$, as presented by Equation (6). The transformation matrices $\mathbf{R}_Z(\theta)$, $\mathbf{R}_X(\phi)$, and $\mathbf{R}_Y(\lambda)$ define the rotations around the Z, X, and Y axes with the orientation $\theta$, latitude $\phi$, and longitude $\lambda$, respectively:

$$\mathbf{R} = \mathbf{R}_Y(\lambda)\,\mathbf{R}_X(\phi)\,\mathbf{R}_Z(\theta) \tag{6}$$
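To make the reprojection concrete, the sketch below builds, under Equations (5) and (6), the bearing vector of every pixel of a rectified patch centered at a keypoint with longitude $\lambda$, latitude $\phi$, and orientation $\theta$. The patch size, pixel scale, and rotation signs are illustrative assumptions; each returned ray would then be passed to an equirectangular back-projection (such as `sphere_to_pixel` in the sketch of Section 2.1) to find the source pixel for resampling.

```python
import numpy as np

def rotation_zxy(theta, lat, lon):
    """R = R_Y(lon) @ R_X(lat) @ R_Z(theta) as in Equation (6).

    Maps the local patch frame to the global frame; the individual
    rotation signs are an illustrative choice.
    """
    cz, sz = np.cos(theta), np.sin(theta)
    cx, sx = np.cos(lat), np.sin(lat)
    cy, sy = np.cos(lon), np.sin(lon)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Ry @ Rx @ Rz

def rectified_patch_rays(patch_size, pixel_scale, theta, lat, lon):
    """Global bearing vectors for all pixels of a tangent-plane patch.

    pixel_scale is roughly the tangent of the angular step per pixel.
    Returns an array of shape (patch_size, patch_size, 3).
    """
    half = (patch_size - 1) / 2.0
    coords = (np.arange(patch_size) - half) * pixel_scale
    xs, ys = np.meshgrid(coords, coords)                  # local tangent plane
    ph = np.stack([xs, ys, np.ones_like(xs)], axis=-1)    # homogeneous coords
    ps = ph / np.linalg.norm(ph, axis=-1, keepdims=True)  # Equation (5)
    return ps @ rotation_zxy(theta, lat, lon).T           # Equation (6)
```

Bilinear sampling of the equirectangular image at the back-projected ray positions then yields a locally perspective, distortion-rectified patch around the keypoint.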
2.3. Learned Feature Descriptors from Rectified Image Patches
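Once patches are rectified, they can be described by a learned patch network such as the HardNet model of Mishchuk et al. (cited in the reference list), which outputs L2-normalized 128-D vectors. As a minimal sketch of the downstream matching step, assuming the descriptors have already been extracted (the mutual nearest-neighbor check and the ratio value of 0.8 are illustrative choices, not the paper's confirmed settings):

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Mutual nearest-neighbor matching with Lowe's ratio test.

    d1, d2: (N, 128) and (M, 128) L2-normalized descriptors.
    Returns a list of (index_in_d1, index_in_d2) pairs.
    """
    sim = d1 @ d2.T                    # cosine similarity matrix
    nn12 = np.argmax(sim, axis=1)      # best match in d2 for each d1
    nn21 = np.argmax(sim, axis=0)      # best match in d1 for each d2
    matches = []
    for i, j in enumerate(nn12):
        if nn21[j] != i:               # keep only mutual best matches
            continue
        # For unit vectors, squared Euclidean distance = 2 - 2 * similarity.
        dists = np.sqrt(np.maximum(2.0 - 2.0 * sim[i], 0.0))
        first, second = np.sort(dists)[:2]
        if first < ratio * second:     # nearest clearly beats second nearest
            matches.append((i, j))
    return matches
```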
2.4. Outlier Removal through Robust Essential Matrix Estimation
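The geometric verification stage fits an essential matrix to the matched bearing vectors inside a RANSAC loop (Fischler and Bolles, cited in the reference list). Below is a minimal sketch assuming unit bearing vectors, a linear 8-point solver (Hartley and Zisserman), and an algebraic epipolar residual; the threshold and iteration count are illustrative values, and the paper's exact estimator may differ.

```python
import numpy as np

def essential_8pt(x1, x2):
    """Linear 8-point estimate of E from >= 8 unit bearing vectors (Nx3)."""
    A = np.einsum('ni,nj->nij', x2, x1).reshape(-1, 9)  # rows of x2^T E x1 = 0
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Project onto the essential manifold: singular values (1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt

def ransac_essential(x1, x2, thresh=1e-3, iters=1000, seed=0):
    """RANSAC over the algebraic residual |x2^T E x1| of each match."""
    rng = np.random.default_rng(seed)
    best_E, best_mask = None, np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(x1), size=8, replace=False)
        E = essential_8pt(x1[idx], x2[idx])
        residuals = np.abs(np.einsum('ni,ij,nj->n', x2, E, x1))
        mask = residuals < thresh
        if mask.sum() > best_mask.sum():
            best_E, best_mask = E, mask
    return best_E, best_mask
```

Because spherical matches are bearing vectors rather than image-plane points, this formulation needs no intrinsic calibration matrix, and correspondences behind the camera are handled naturally.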
2.5. Implementation of the Proposed Algorithm
3. Experiments and Results
3.1. Test Sites and Datasets
- The first dataset is recorded on a campus and includes a parterre surrounded by high buildings, as shown in Figure 7a. For image acquisition, a Garmin VIRB 360 camera is used, which stores images in the equirectangular format. Data acquisition is conducted around the central parterre, and a total of 37 images are collected at a resolution of 5640 × 2820 pixels.
- The second dataset covers a complex building structure, from its rooftop to its inner aisles, as shown in Figure 7b. Parterres exist on the rooftop, and the inner aisles connect different levels. For image acquisition, the same Garmin VIRB 360 camera as in dataset 1 is mounted on a hand-held rod. A total of 279 spherical images are collected, covering all of the inner aisles.
- The third dataset is collected using a mobile mapping system (MMS). The test site runs along an urban street approximately 7.0 km in length. Along the street, low residential buildings stand near the two roadsides, as shown in Figure 7c. In this test site, a PointGrey Ladybug3 camera composed of six fisheye cameras is used. With a camera exposure interval of 3 m, a total of 1937 spherical images are collected from this site.
3.2. Evaluation Metrics
3.3. Performance Analysis of Local Geometric Rectification
3.4. Comparison of Local Feature-Based Matching
3.5. Application in SfM-Based Image Orientation
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Jiang, S.; Jiang, W.; Wang, L. Unmanned Aerial Vehicle-Based Photogrammetric 3D Mapping: A survey of techniques, applications, and challenges. IEEE Geosci. Remote Sens. Mag. 2022, 10, 135–171. [Google Scholar] [CrossRef]
- Wu, B.; Xie, L.; Hu, H.; Zhu, Q.; Yau, E. Integration of aerial oblique imagery and terrestrial imagery for optimized 3D modeling in urban areas. ISPRS J. Photogramm. Remote Sens. 2018, 139, 119–132. [Google Scholar] [CrossRef]
- Chiabrando, F.; D’Andria, F.; Sammartano, G.; Spanò, A. UAV photogrammetry for archaeological site survey. 3D models at the Hierapolis in Phrygia (Turkey). Virtual Archaeol. Rev. 2018, 9, 28–43. [Google Scholar] [CrossRef]
- Jiang, S.; Jiang, W.; Huang, W.; Yang, L. UAV-based oblique photogrammetry for outdoor data acquisition and offsite visual inspection of transmission line. Remote Sens. 2017, 9, 278. [Google Scholar] [CrossRef]
- Jiang, S.; You, K.; Li, Y.; Weng, D.; Chen, W. 3D Reconstruction of Spherical Images based on Incremental Structure from Motion. arXiv 2023, arXiv:2306.12770. [Google Scholar]
- Torii, A.; Havlena, M.; Pajdla, T. From google street view to 3d city models. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, Kyoto, Japan, 29 September–2 October 2009; pp. 2188–2195. [Google Scholar]
- Gao, S.; Yang, K.; Shi, H.; Wang, K.; Bai, J. Review on panoramic imaging and its applications in scene understanding. IEEE Trans. Instrum. Meas. 2022, 71, 1–34. [Google Scholar] [CrossRef]
- Jhan, J.P.; Kerle, N.; Rau, J.Y. Integrating UAV and ground panoramic images for point cloud analysis of damaged building. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
- Fangi, G.; Pierdicca, R.; Sturari, M.; Malinverni, E. Improving spherical photogrammetry using 360 omni-cameras: Use cases and new applications. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 331–337. [Google Scholar] [CrossRef]
- Janiszewski, M.; Torkan, M.; Uotinen, L.; Rinne, M. Rapid photogrammetry with a 360-degree camera for tunnel mapping. Remote Sens. 2022, 14, 5494. [Google Scholar] [CrossRef]
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Jiang, S.; Jiang, W. Reliable image matching via photometric and geometric constraints structured by Delaunay triangulation. ISPRS J. Photogramm. Remote Sens. 2019, 153, 1–20. [Google Scholar] [CrossRef]
- Pagani, A.; Stricker, D. Structure from motion using full spherical panoramic cameras. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 375–382. [Google Scholar]
- Lichti, D.D.; Jarron, D.; Tredoux, W.; Shahbazi, M.; Radovanovic, R. Geometric modelling and calibration of a spherical camera imaging system. Photogramm. Rec. 2020, 35, 123–142. [Google Scholar] [CrossRef]
- Chuang, T.Y.; Perng, N. Rectified feature matching for spherical panoramic images. Photogramm. Eng. Remote Sens. 2018, 84, 25–32. [Google Scholar] [CrossRef]
- Taira, H.; Inoue, Y.; Torii, A.; Okutomi, M. Robust feature matching for distorted projection by spherical cameras. IPSJ Trans. Comput. Vis. Appl. 2015, 7, 84–88. [Google Scholar] [CrossRef]
- Wang, Y.; Cai, S.; Li, S.J.; Liu, Y.; Guo, Y.; Li, T.; Cheng, M.M. CubemapSLAM: A piecewise-pinhole monocular fisheye SLAM system. In Proceedings of the Asian Conference on Computer Vision, Perth, WA, Australia, 2–6 December 2018; pp. 34–49. [Google Scholar]
- Zhao, Q.; Feng, W.; Wan, L.; Zhang, J. SPHORB: A fast and robust binary feature on the sphere. Int. J. Comput. Vis. 2015, 113, 143–159. [Google Scholar] [CrossRef]
- Guan, H.; Smith, W.A. BRISKS: Binary features for spherical images on a geodesic grid. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4516–4524. [Google Scholar]
- Chen, L.; Rottensteiner, F.; Heipke, C. Feature detection and description for image matching: From hand-crafted design to deep learning. Geo-Spat. Inf. Sci. 2021, 24, 58–74. [Google Scholar] [CrossRef]
- Jiang, S.; Jiang, W.; Guo, B.; Li, L.; Wang, L. Learned local features for structure from motion of uav images: A comparative evaluation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 10583–10597. [Google Scholar] [CrossRef]
- Han, X.; Leung, T.; Jia, Y.; Sukthankar, R.; Berg, A.C. Matchnet: Unifying feature and metric learning for patch-based matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3279–3286. [Google Scholar]
- Kumar BG, V.; Carneiro, G.; Reid, I. Learning local image descriptors with deep siamese and triplet convolutional networks by minimising global loss functions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5385–5394. [Google Scholar]
- Luo, Z.; Shen, T.; Zhou, L.; Zhu, S.; Zhang, R.; Yao, Y.; Fang, T.; Quan, L. Geodesc: Learning local descriptors by integrating geometry constraints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 168–183. [Google Scholar]
- Tian, Y.; Fan, B.; Wu, F. L2-net: Deep learning of discriminative patch descriptor in euclidean space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 661–669. [Google Scholar]
- Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-net: A trainable cnn for joint description and detection of local features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8092–8101. [Google Scholar]
- Luo, Z.; Zhou, L.; Bai, X.; Chen, H.; Zhang, J.; Yao, Y.; Li, S.; Fang, T.; Quan, L. Aslfeat: Learning local features of accurate shape and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6589–6598. [Google Scholar]
- Eder, M.; Shvets, M.; Lim, J.; Frahm, J.M. Tangent images for mitigating spherical distortion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12426–12434. [Google Scholar]
- Shan, Y.; Li, S. Descriptor matching for a discrete spherical image with a convolutional neural network. IEEE Access 2018, 6, 20748–20755. [Google Scholar] [CrossRef]
- Coors, B.; Condurache, A.P.; Geiger, A. Spherenet: Learning spherical representations for detection and classification in omnidirectional images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 518–533. [Google Scholar]
- Su, Y.C.; Grauman, K. Learning spherical convolution for fast features from 360 imagery. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Zhao, Q.; Zhu, C.; Dai, F.; Ma, Y.; Jin, G.; Zhang, Y. Distortion-aware CNNs for Spherical Images. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 1198–1204. [Google Scholar]
- Su, Y.C.; Grauman, K. Kernel transformer networks for compact spherical convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9442–9451. [Google Scholar]
- Mei, C.; Rives, P. Single view point omnidirectional camera calibration from planar grids. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Rome, Italy, 10–14 April 2007; pp. 3945–3950. [Google Scholar]
- Scaramuzza, D.; Martinelli, A.; Siegwart, R. A flexible technique for accurate omnidirectional camera calibration and structure from motion. In Proceedings of the Fourth IEEE International Conference on Computer Vision Systems (ICVS’06), New York, NY, USA, 4–7 January 2006; p. 45. [Google Scholar]
- Ji, S.; Shi, Y.; Shi, Z.; Bao, A.; Li, J.; Yuan, X.; Duan, Y.; Shibasaki, R. Comparison of two panoramic sensor models for precise 3d measurements. Photogramm. Eng. Remote Sens. 2014, 80, 229–238. [Google Scholar] [CrossRef]
- Mishchuk, A.; Mishkin, D.; Radenovic, F.; Matas, J. Working hard to know your neighbor’s margins: Local descriptor learning loss. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
- Wu, C. SiftGPU: A GPU Implementation of SIFT. 2007. Available online: http://cs.unc.edu/~ccwu/siftgpu (accessed on 10 October 2023).
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
- Umeyama, S. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 376–380. [Google Scholar] [CrossRef]
Item Name | Dataset 1 | Dataset 2 | Dataset 3 |
---|---|---|---|
Scene type | Outdoor | Hybrid | Street |
Sensor type | Sphere | Sphere | Sphere |
Camera model | Garmin VIRB 360 | Garmin VIRB 360 | Ladybug3 |
Storage format | Equirectangular | Equirectangular | Equirectangular |
Sensor platform | Ground tripod | Hand-held rod | Moving car |
Number of images | 37 | 279 | 1937 |
Image size (pixel) | 5640 × 2820 | 5640 × 2820 | 5400 × 2700 |
Category | Metric | Description |
---|---|---|
1 | No. matches | The number of initial matches before outlier removal (a larger value indicates better results). |
 | No. inliers | The total number of true matches after outlier removal (a larger value indicates better results). |
 | Match precision | The ratio between the number of true matches and the number of initial matches (a larger value indicates better results). |
2 | No. images | The number of successfully registered images in SfM-based image orientation (a larger value indicates better results). |
 | No. points | The number of reconstructed 3D points in SfM-based image orientation (a larger value indicates better results). |
 | RMSE | The RMSE of the bundle adjustment optimization (a smaller value indicates better results). |
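For concreteness, the Category-1 metrics reduce to simple counting per image pair; a minimal sketch with a hypothetical helper:

```python
def matching_metrics(num_matches, num_inliers):
    """Category-1 metrics from Table 2 for one image pair.

    num_matches: initial matches before outlier removal.
    num_inliers: true matches that survive outlier removal.
    """
    precision = num_inliers / num_matches if num_matches > 0 else 0.0
    return {"no_matches": num_matches,
            "no_inliers": num_inliers,
            "precision": precision}
```

Note that when results are averaged over many image pairs, the mean precision is the average of per-pair ratios, which in general differs from the ratio of the averaged counts.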
Metric | Method | Dataset 1 | Dataset 2 | Dataset 3 |
---|---|---|---|---|
No. matches | SIFT | 165 | 232 | 296 |
 | ASLFeat | 337 | 385 | 253 |
 | NGR-H | 248 | 234 | 286 |
 | Ours | 290 | 297 | 371 |
No. inliers | SIFT | 111 | 158 | 250 |
 | ASLFeat | 83 | 198 | 177 |
 | NGR-H | 168 | 160 | 244 |
 | Ours | 193 | 212 | 317 |
Match precision | SIFT | 0.57 | 0.64 | 0.79 |
 | ASLFeat | 0.33 | 0.51 | 0.68 |
 | NGR-H | 0.62 | 0.59 | 0.81 |
 | Ours | 0.60 | 0.67 | 0.82 |
Dataset | Method | Images | Points | RMSE |
---|---|---|---|---|
Dataset 1 | SIFT | 37 | 2569 | 0.74 |
 | NGR-H | 37 | 3832 | 0.80 |
 | Ours | 37 | 4645 | 0.80 |
Dataset 2 | SIFT | 279 | 40,118 | 0.80 |
 | NGR-H | 279 | 38,927 | 0.83 |
 | Ours | 279 | 49,252 | 0.82 |
Dataset 3 | SIFT | 1937 | 290,240 | 0.56 |
 | NGR-H | 1937 | 289,681 | 0.61 |
 | Ours | 1937 | 363,371 | 0.60 |