AGCosPlace: A UAV Visual Positioning Algorithm Based on Transformer
Abstract
1. Introduction
- (1) Designing a module that integrates multi-head self-attention, an MLP, and a graph neural network into the back end of the visual positioning algorithm’s backbone. The module strengthens the aggregation of contextual information from the feature map and addresses the poor recall observed in Transformer-based variants of the CosPlace network.
- (2) Investigating the impact of the sinusoidal positional encoding module on the proposed algorithm, clarifying how its introduction influences localization performance.
- (3) Constructing a UAV dataset that closely resembles aerial perspectives and annotating it with UTM coordinates. The dataset is used to evaluate the trained visual positioning algorithm and addresses the fact that image-matching methods cannot localize a UAV when the camera’s relative poses and intrinsics are unknown.
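Contribution (1) combines multi-head self-attention and an MLP to aggregate context over the backbone’s feature map, treating spatial locations as nodes of a fully connected graph. The sketch below illustrates that general pattern only; the module name, widths, and layer order are assumptions for illustration, not the authors’ implementation:

```python
import torch
import torch.nn as nn

class AttentionalAggregation(nn.Module):
    """Illustrative block: multi-head self-attention followed by an MLP,
    applied to a flattened CNN feature map (message passing over all
    spatial locations), with residual connections and layer norms."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(),
                                 nn.Linear(2 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, C, H, W) feature map from the backbone
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        a, _ = self.attn(tokens, tokens, tokens)        # self-attention
        tokens = self.norm1(tokens + a)                 # residual + norm
        tokens = self.norm2(tokens + self.mlp(tokens))  # MLP + residual
        return tokens.transpose(1, 2).reshape(b, c, h, w)

feat = torch.randn(2, 512, 16, 16)
out = AttentionalAggregation()(feat)
print(out.shape)  # torch.Size([2, 512, 16, 16])
```

The block keeps the feature map’s shape, so it can sit between a ResNet/VGG backbone and the aggregation head without changing either.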
2. Related Works
3. AGCosPlace
3.1. Backbone
3.2. Attentional GNN
3.3. Aggregation Module
4. Results
4.1. Dataset
4.2. Training Environment
4.3. Performance Evaluation of AGCosPlace and CosPlace
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
References
| Groups Parameters | Setting |
|---|---|
| M | 10 |
|  | 30 |
| N | 5 |
| L | 2 |
| groups_num | 1 |
| min_images_per_class | 10 |
| Training Parameters | Setting |
|---|---|
| resized | 512 × 512 |
| batch size | 32 |
| epochs | 50 |
| iterations_per_epoch | 10,000 |
| lr | 0.00001 |
| classifiers_lr | 0.01 |
| optimizer | Adam |
| seed | 0 |
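The table lists two learning rates (0.00001 for the network, 0.01 for the classifiers), which is the kind of setting usually expressed as Adam parameter groups. A minimal sketch with stand-in modules (the real backbone and CosPlace-style classifiers are not reproduced here):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the backbone and the classifier heads.
backbone = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Flatten())
classifier = nn.Linear(64, 10)

# One Adam optimizer, two learning rates as in the training table:
# 1e-5 for the network parameters, 1e-2 for the classifiers.
optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": classifier.parameters(), "lr": 1e-2},
])
print([g["lr"] for g in optimizer.param_groups])  # [1e-05, 0.01]
```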
| Data Augmentation Parameters | Setting |
|---|---|
| brightness | 0.7 |
| contrast | 0.7 |
| hue | 0.5 |
| saturation | 0.7 |
| random_resized_crop | 0.5 |
| Model | Desc. dim | Channels in last conv | R@1 (%) | R@5 (%) | R@10 (%) | R@20 (%) | Train time (days) |
|---|---|---|---|---|---|---|---|
| ResNet18 | 512 | 512 | 53.7 | 66.0 | 71.5 | 75.3 | 2 |
| AGResNet18 | 512 | 512 | 54.3 | 66.1 | 71.8 | 76.3 | 2 |
| PE + AGResNet18 | 512 | 512 | 53.5 | 65.9 | 71.3 | 76.5 | 2 |
| ResNet50 | 512 | 2048 | 59.1 | 72.1 | 76.9 | 80.5 | 2 |
| AGResNet50 | 512 | 2048 | 60.1 | 74.0 | 78.7 | 81.6 | 3 |
| ResNet101 | 512 | 2048 | 62.4 | 73.8 | 77.8 | 82.4 | 4 |
| AGResNet101 | 512 | 2048 | 62.7 | 74.2 | 78.6 | 83.0 | 5 |
| VGG16 | 512 | 512 | 61.0 | 73.3 | 77.7 | 80.6 | 3 |
| AGVGG16 | 512 | 512 | 60.8 | 71.8 | 76.3 | 80.4 | 3 |
| Model | Test Set | Desc. dim | Channels in last conv | R@1 (%) | R@5 (%) | R@10 (%) | R@20 (%) |
|---|---|---|---|---|---|---|---|
| ResNet18 | Drone | 512 | 512 | 61.3 | 66.7 | 66.7 | 68.8 |
| AGResNet18 | Drone | 512 | 512 | 64.5 | 67.7 | 67.7 | 71.0 |
| PE + AGResNet18 | Drone | 512 | 512 | 64.5 | 67.7 | 67.7 | 68.8 |
| ResNet50 | Drone | 512 | 2048 | 66.7 | 68.8 | 69.9 | 71.0 |
| AGResNet50 | Drone | 512 | 2048 | 66.7 | 69.9 | 72.0 | 74.2 |
| ResNet101 | Drone | 512 | 2048 | 64.5 | 69.9 | 72.0 | 72.0 |
| AGResNet101 | Drone | 512 | 2048 | 62.4 | 67.7 | 71.0 | 72.0 |
| VGG16 | Drone | 512 | 512 | 67.7 | 74.2 | 75.3 | 76.3 |
| AGVGG16 | Drone | 512 | 512 | 69.9 | 72.0 | 74.2 | 76.3 |
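The R@N figures above count a query as correctly localized if any of its top-N retrieved database images is a true positive. A minimal sketch of how such recall values can be computed from a query-to-database similarity matrix (illustrative code with a hypothetical `recall_at_n` helper, not the paper’s evaluation script):

```python
import numpy as np

def recall_at_n(similarity, positives, ns=(1, 5, 10, 20)):
    """similarity: (num_queries, num_db) score matrix;
    positives: one set of correct database indices per query."""
    order = np.argsort(-similarity, axis=1)  # db indices ranked by score
    recalls = {}
    for n in ns:
        hits = sum(bool(set(order[q, :n]) & positives[q])
                   for q in range(len(positives)))
        recalls[n] = 100.0 * hits / len(positives)
    return recalls

# Toy example: 2 queries, 3 database images.
sim = np.array([[0.9, 0.1, 0.5],
                [0.2, 0.8, 0.3]])
pos = [{2}, {1}]  # correct db index for each query
print(recall_at_n(sim, pos, ns=(1, 2)))  # {1: 50.0, 2: 100.0}
```

Query 0 ranks image 0 first (a miss) but finds its positive at rank 2, while query 1 hits at rank 1, giving R@1 = 50% and R@2 = 100%.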
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Guo, Y.; Zhou, Y.; Yang, F. AGCosPlace: A UAV Visual Positioning Algorithm Based on Transformer. Drones 2023, 7, 498. https://doi.org/10.3390/drones7080498