Appearance-Based Sequential Robot Localization Using a Patchwise Approximation of a Descriptor Manifold
Abstract
1. Introduction
2. Related Work
2.1. Global Image Descriptors
2.2. Appearance-Based Localization
3. System Description
3.1. Patches of Smooth Appearance Change
3.1.1. Definition
- The appearance distance from the query descriptor to the m-th PSAC is defined as the average of the descriptor distances to each of its constituent key-pairs:
- Similarly, but in the pose space, we define the translational distance from a given pose to the m-th PSAC as
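As a minimal sketch of these two distances, the snippet below averages point-to-point distances over the key-pairs of a single PSAC. The Euclidean metric, the averaged form used for the translational distance, and the function names `appearance_distance` and `translational_distance` are assumptions made for illustration; the text above states only that the appearance distance is an average of descriptor distances over the constituent key-pairs.

```python
import math
from statistics import fmean


def appearance_distance(query_descriptor, keypair_descriptors):
    """Average descriptor distance from a query to the key-pairs of one PSAC.

    Assumption: the descriptor distance is Euclidean.
    """
    return fmean(math.dist(query_descriptor, d) for d in keypair_descriptors)


def translational_distance(position, keypair_positions):
    """Average Euclidean distance between a position and the key-pair positions.

    Assumption: the pose-space distance mirrors the appearance distance and
    uses only the translational part of each pose.
    """
    return fmean(math.dist(position, p) for p in keypair_positions)


# Toy PSAC with two key-pairs (2-D descriptors and 2-D positions, hypothetical values).
psac_descriptors = [[3.0, 4.0], [0.0, 0.0]]
psac_positions = [[1.0, 2.0], [1.0, 4.0]]

print(appearance_distance([0.0, 0.0], psac_descriptors))   # mean of 5.0 and 0.0
print(translational_distance([1.0, 1.0], psac_positions))  # mean of 1.0 and 3.0
```

Real global descriptors have far more than two dimensions; `math.dist` accepts any pair of equal-length coordinate sequences, so the same functions apply unchanged.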
3.1.2. GP Regression
3.2. Robot Localization
3.2.1. System Initialization
3.2.2. Robot Tracking
4. Experimental Results
- 1M COLD Quadruplet and 1M RobotCar Volume: end-to-end learned condition invariant features with VGG16 NetVLAD [24] with quadruplet and volume loss functions in two different datasets.
4.1. Corridor: Sanity Check
4.2. COLD: Sequential Map Testing
4.3. SUNCG: Grid Map Testing
4.4. Comparative Study
- Gaussian Process Particle Filter (GPPF) [42], configured with P = 10^3 particles.
- The Pairwise Relative Pose estimator (PRP) presented in [31]: a CNN-based regressor that estimates the pose transform between the query and the 5 most similar map images obtained through PR.
- The Network flow solution proposed in [37]: a sequential sparse localization method that includes uniform and flow-based mapping, both considered in this study. In order to make the results comparable, we modified its outcome, which is sparse, to produce continuous estimations. For that, we used the following weighting after the bipartite matching:
- Our approach, configured with P = 10^3 particles.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
PSAC | Patch of Smooth Appearance Change |
PR | Place Recognition |
GP | Gaussian Process |
GPPF | Gaussian Process Particle Filter |
PF | Particle Filter |
SLAM | Simultaneous Localization and Mapping |
IM | Image Manifold |
DM | Descriptor Manifold |
DL | Deep Learning |
KP | Key-pair |
CS | Constant Sampling |
References
- Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Sattler, T.; Leibe, B.; Kobbelt, L. Efficient & effective prioritized matching for large-scale image-based localization. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1744–1756.
- Gomez-Ojeda, R.; Moreno, F.A.; Zuñiga-Noël, D.; Scaramuzza, D.; Gonzalez-Jimenez, J. PL-SLAM: A stereo SLAM system through the combination of points and line segments. IEEE Trans. Robot. 2019, 35, 734–746.
- Sattler, T.; Torii, A.; Sivic, J.; Pollefeys, M.; Taira, H.; Okutomi, M.; Pajdla, T. Are large-scale 3d models really necessary for accurate visual localization? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1637–1646.
- Oliva, A.; Torralba, A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001, 42, 145–175.
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5297–5307.
- Cummins, M.; Newman, P. FAB-MAP: Probabilistic localization and mapping in the space of appearance. Int. J. Robot. Res. 2008, 27, 647–665.
- Milford, M.J.; Wyeth, G.F. SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; pp. 1643–1649.
- Pepperell, E.; Corke, P.I.; Milford, M.J. All-environment visual place recognition with SMART. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1612–1618.
- Radenović, F.; Tolias, G.; Chum, O. Fine-tuning CNN image retrieval with no human annotation. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1655–1668.
- Crowley, J.L.; Pourraz, F. Continuity properties of the appearance manifold for mobile robot position estimation. Image Vis. Comput. 2001, 19, 741–752.
- Ham, J.; Lin, Y.; Lee, D.D. Learning nonlinear appearance manifolds for robot localization. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 2971–2976.
- He, X.; Zemel, R.S.; Mnih, V. Topological map learning from outdoor image sequences. J. Field Robot. 2006, 23, 1091–1104.
- Gomez-Ojeda, R.; Lopez-Antequera, M.; Petkov, N.; Gonzalez-Jimenez, J. Training a convolutional neural network for appearance-invariant place recognition. arXiv 2015, arXiv:1505.07428.
- Ko, J.; Fox, D. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Auton. Robot. 2009, 27, 75–90.
- Lopez-Antequera, M.; Petkov, N.; Gonzalez-Jimenez, J. Image-based localization using Gaussian processes. In Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Alcala de Henares, Spain, 4–7 October 2016; pp. 1–7.
- Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
- Tenenbaum, J.B.; De Silva, V.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323.
- Pless, R.; Souvenir, R. A survey of manifold learning for images. IPSJ Trans. Comput. Vis. Appl. 2009, 1, 83–94.
- Wakin, M.B.; Donoho, D.L.; Choi, H.; Baraniuk, R.G. The multiscale structure of non-differentiable image manifolds. In Proceedings of the Wavelets XI, Proceedings of the SPIE, San Diego, CA, USA, 17 September 2005; Volume 5914, p. 59141.
- Lopez-Antequera, M.; Gomez-Ojeda, R.; Petkov, N.; Gonzalez-Jimenez, J. Appearance-invariant place recognition by discriminatively training a convolutional neural network. Pattern Recognit. Lett. 2017, 92, 89–95.
- Thoma, J.; Paudel, D.P.; Chhatkuli, A.; Van Gool, L. Learning Condition Invariant Features for Retrieval-Based Localization from 1M Images. arXiv 2020, arXiv:2008.12165.
- Jaenal, A.; Zuñiga-Noël, D.; Gomez-Ojeda, R.; Gonzalez-Jimenez, J. Improving Visual SLAM in Car-Navigated Urban Environments with Appearance Maps. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2021; pp. 4679–4685.
- Lowry, S.; Sünderhauf, N.; Newman, P.; Leonard, J.J.; Cox, D.; Corke, P.; Milford, M.J. Visual place recognition: A survey. IEEE Trans. Robot. 2015, 32, 1–19.
- Lenc, K.; Vedaldi, A. Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 991–999.
- Jaenal, A.; Moreno, F.A.; Gonzalez-Jimenez, J. Experimental study of the suitability of CNN-based holistic descriptors for accurate visual localization. In Proceedings of the 2nd International Conference on Applications of Intelligent Systems, Las Palmas de Gran Canaria, Spain, 7–9 January 2019; pp. 1–6.
- Sattler, T.; Weyand, T.; Leibe, B.; Kobbelt, L. Image Retrieval for Image-Based Localization Revisited. In Proceedings of the British Machine Vision Conference, Surrey, UK, 3–7 September 2012; Volume 1, p. 4.
- Torii, A.; Arandjelovic, R.; Sivic, J.; Okutomi, M.; Pajdla, T. 24/7 place recognition by view synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1808–1817.
- Laskar, Z.; Melekhov, I.; Kalia, S.; Kannala, J. Camera relocalization by computing pairwise relative poses using convolutional neural network. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 929–938.
- Balntas, V.; Li, S.; Prisacariu, V. RelocNet: Continuous Metric Learning Relocalisation using Neural Nets. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 751–767.
- Ding, M.; Wang, Z.; Sun, J.; Shi, J.; Luo, P. CamNet: Coarse-to-fine retrieval for camera re-localization. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 2871–2880.
- Sünderhauf, N.; Neubert, P.; Protzel, P. Are we there yet? Challenging SeqSLAM on a 3000 km journey across all four seasons. In Proceedings of the Workshop on Long-Term Autonomy, IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 6–10 May 2013.
- Naseer, T.; Spinello, L.; Burgard, W.; Stachniss, C. Robust visual robot localization across seasons using network flows. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; pp. 2564–2570.
- Naseer, T.; Burgard, W.; Stachniss, C. Robust visual localization across seasons. IEEE Trans. Robot. 2018, 34, 289–302.
- Thoma, J.; Paudel, D.P.; Chhatkuli, A.; Probst, T.; Gool, L.V. Mapping, localization and path planning for image-based navigation using visual features and map. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7383–7391.
- Maddern, W.; Milford, M.; Wyeth, G. CAT-SLAM: Probabilistic localisation and mapping using a continuous appearance-based trajectory. Int. J. Robot. Res. 2012, 31, 429–451.
- Rasmussen, C.E. Gaussian processes in machine learning. In Advanced Lectures on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2004; pp. 63–71.
- Huhle, B.; Schairer, T.; Schilling, A.; Straßer, W. Learning to localize with Gaussian process regression on omnidirectional image data. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010; pp. 5208–5213.
- Schairer, T.; Huhle, B.; Vorst, P.; Schilling, A.; Straßer, W. Visual mapping with uncertainty for correspondence-free localization using Gaussian process regression. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 25–30 September 2011; pp. 4229–4235.
- Lopez-Antequera, M.; Petkov, N.; Gonzalez-Jimenez, J. City-scale continuous visual localization. In Proceedings of the 2017 European Conference on Mobile Robots (ECMR), Paris, France, 6–8 September 2017; pp. 1–6.
- Doucet, A.; de Freitas, N.; Gordon, N. Sequential Monte Carlo Methods in Practice; Statistics for Engineering and Information Science; Springer Science & Business Media: New York, NY, USA, 2001.
- Blanco, J.L. A Tutorial on SE(3) Transformation Parameterizations and On-Manifold Optimization; Technical Report; Universidad de Málaga: Málaga, Spain, 2010; Volume 3, p. 6.
- Pronobis, A.; Caputo, B. COLD: The CoSy localization database. Int. J. Robot. Res. 2009, 28, 588–594.
- Song, S.; Yu, F.; Zeng, A.; Chang, A.X.; Savva, M.; Funkhouser, T. Semantic scene completion from a single depth image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1746–1754.
- Wu, Y.; Wu, Y.; Gkioxari, G.; Tian, Y. Building generalizable agents with a realistic and rich 3D environment. arXiv 2018, arXiv:1801.02209.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- GPy. GPy: A Gaussian Process Framework in Python. Since 2012. Available online: http://github.com/SheffieldML/GPy (accessed on 30 March 2021).
Dataset | Area | Map | Key-Pairs (KPs) | PSACs | Size (Mb) | Construction Time (s) |
---|---|---|---|---|---|---|
COLD Database | ∼900 m | Samp. 10 | 559 | 321 | 2.18 | 56.68 |
COLD Database | ∼900 m | Samp. 20 | 280 | 159 | 1.09 | 32.27 |
COLD Database | ∼900 m | Samp. 30 | 187 | 103 | 0.73 | 25.92 |
SUNCG | ∼45 m | Dense | 1203 | 679 | 4.69 | 140.94 |
SUNCG | ∼45 m | Sparse pos | 451 | 227 | 1.76 | 129.36 |
SUNCG | ∼45 m | Sparse rot | 723 | 432 | 2.82 | 136.95 |
SUNCG | ∼45 m | Sparse pos-rot | 271 | 149 | 1.05 | 107.93 |
Dataset | Map | Sequence | GPPF [18,42] + Unif. Sampl. | PRP CNN [31] + NetVLAD PR | Network Flow [37] + Unif. Sampl. | Network Flow [37] + Flow Sampl. | Our Method (C.S.) |
---|---|---|---|---|---|---|---|
COLD Database | Samp. 20 | Night std | L | 1.17, 10.94 (8%) | 0.19, 4.33 (66%) | 0.26, 4.66 (60%) | 0.2, 3.08 (85%) |
COLD Database | Samp. 20 | Cloudy std | L | 1.93, 14.59 (4%) | 0.31, 4.76 (57%) | 0.36, 5.29 (46%) | 0.3, 5.82 (56%) |
COLD Database | Samp. 20 | Sunny std | L | 2.2, 14.75 (3%) | 0.36, 6.91 (40%) | 0.42, 7.7 (33%) | 0.23, 5.08 (66%) |
COLD Database | Samp. 20 | Night ext | L | 1.27, 11.07 (8%) | 0.22, 3.88 (69%) | 0.26, 4.36 (61%) | 0.17, 3.38 (82%) |
COLD Database | Samp. 20 | Cloudy ext | L | 1.46, 12.17 (6%) | 0.22, 4.13 (60%) | 0.31, 5.12 (52%) | 0.2, 3.48 (82%) |
COLD Database | Samp. 20 | Sunny ext | L | 2.11, 16.59 (2%) | 0.3, 6.95 (50%) | 0.35, 8.14 (39%) | 0.28, 6.67 (54%) |
SUNCG | Dense | Test sequence | 1.07, 6.07 (17%) | 0.75, 12.14 (13%) | N/A | N/A | 0.15, 5.69 (60%) |
SUNCG | Sparse pos | Test sequence | 1.21, 9.16 (7%) | 1.14, 19.76 (2%) | N/A | N/A | 0.46, 4.30 (51%) |
SUNCG | Sparse rot | Test sequence | 1.24, 12.36 (2%) | 0.89, 19.76 (6%) | N/A | N/A | 0.20, 5.15 (57%) |
SUNCG | Sparse pos-rot | Test sequence | 1.62, 20.16 (0%) | 1.51, 24.51 (1%) | N/A | N/A | 0.72, 18.58 (19%) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jaenal, A.; Moreno, F.-A.; Gonzalez-Jimenez, J. Appearance-Based Sequential Robot Localization Using a Patchwise Approximation of a Descriptor Manifold. Sensors 2021, 21, 2483. https://doi.org/10.3390/s21072483