AQSFormer: Adaptive Query Selection Transformer for Real-Time Ship Detection from Visual Images
Abstract
1. Introduction
- (1) This study introduces 2D rotary position embedding (2D-RoPE) [17] to enhance transformer models for ship detection. By coupling relative and absolute position information, 2D-RoPE lets the model interpret spatial relationships between ships precisely, including distances, angles, and directions. This enriched positional awareness improves the precision of ship localization and advances the spatial analysis capabilities of maritime surveillance (a sketch is given in Section 3.1).
- (2) This paper equips the detector with a deformable attention (DA) module that focuses on crucial sampling points near reference points in the feature maps. This selective attention acts as a preliminary filter: it prioritizes important contour points and gathers essential edge information about ships. By combining sparse sampling with the transformer's global relationship modeling, the module copes with vague ship contours against complex backgrounds such as waves or fog, improving the model's ability to distinguish ships (a sketch is given in Section 3.2).
- (3) We propose an adaptive query selection (AQS) module that autonomously chooses positive and negative training samples based on the statistical characteristics of ship targets. The module evaluates bounding-box quality and consistency to distinguish highly similar queries. It offers flexible object query selection, adapts to diverse ship sizes and detection scenarios, reduces incorrect filtering, improves recall, and keeps the pipeline end-to-end (see Algorithm 1 in Section 3.3), significantly enhancing detection efficiency and effectiveness.
2. Related Works
2.1. Ship Detection Methods Based on CNNs
2.2. Ship Detection Methods Based on Transformers
3. Proposed Method
3.1. The 2D-RoPE Module
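As introduced in Section 1, 2D-RoPE rotates pairs of feature channels through position-dependent angles, using one half of the channels for the x coordinate and the other half for the y coordinate, so the attention inner product depends on relative offsets while absolute positions remain encoded. The following PyTorch sketch illustrates the idea; the frequency base of 10000, the even/odd channel pairing, and the half-and-half axis split follow the 1D RoPE of RoFormer [17] and are assumptions, not the authors' exact implementation.

```python
# Minimal 2D-RoPE sketch (illustrative, not the authors' code).
import torch

def rope_1d(x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    """Rotary embedding along one axis.
    x:   (..., N, D) features, D even.
    pos: (N,) positions along that axis.
    """
    d = x.shape[-1]
    # Standard RoPE frequencies: theta_i = 10000^(-2i/d).
    freqs = 10000.0 ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)
    angles = pos[:, None] * freqs[None, :]            # (N, D/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]               # paired channels
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin              # 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(x: torch.Tensor, pos_x: torch.Tensor, pos_y: torch.Tensor) -> torch.Tensor:
    """2D-RoPE: first half of channels encodes x, second half encodes y
    (D must be divisible by 4)."""
    d = x.shape[-1] // 2
    return torch.cat([rope_1d(x[..., :d], pos_x), rope_1d(x[..., d:], pos_y)], dim=-1)

# Usage on a flattened H x W feature map.
H, W, D = 4, 4, 32
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
q = torch.randn(H * W, D)
q_rot = rope_2d(q, xs.flatten().float(), ys.flatten().float())
```

In attention, the rotation is applied to queries and keys (not values), so the inner product between two embedded tokens depends only on their relative offset (Δx, Δy) while still being derived from absolute coordinates.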
3.2. The Deformable Attention Module
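Below is a single-scale, single-head sketch of the DA mechanism described above, in the spirit of Deformable DETR [56]: each query predicts a few sampling offsets around its reference point, the feature map is bilinearly sampled at those locations, and a softmax-weighted sum aggregates them. Module and layer names are illustrative, and details such as offset scaling, multi-head/multi-scale handling, and initialization are omitted.

```python
# Single-scale, single-head deformable attention sketch (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttentionSketch(nn.Module):
    def __init__(self, dim: int = 256, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        self.offset_head = nn.Linear(dim, num_points * 2)  # (dx, dy) per point
        self.weight_head = nn.Linear(dim, num_points)      # attention logits
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query, ref_points, feat):
        """
        query:      (B, Nq, C)  content queries
        ref_points: (B, Nq, 2)  reference (x, y), normalized to [0, 1]
        feat:       (B, C, H, W) feature map to sample from
        """
        B, Nq, C = query.shape
        offsets = self.offset_head(query).view(B, Nq, self.num_points, 2)
        weights = self.weight_head(query).softmax(-1)       # (B, Nq, K)
        # Sampling locations in [0, 1], mapped to [-1, 1] for grid_sample.
        loc = (ref_points[:, :, None, :] + offsets).clamp(0, 1)
        grid = 2.0 * loc - 1.0                              # (B, Nq, K, 2)
        value = self.value_proj(feat.flatten(2).transpose(1, 2))   # (B, HW, C)
        value = value.transpose(1, 2).view(B, C, *feat.shape[-2:])
        sampled = F.grid_sample(value, grid, align_corners=False)  # (B, C, Nq, K)
        out = (sampled * weights[:, None, :, :]).sum(-1)    # weighted aggregation
        return self.out_proj(out.transpose(1, 2))           # (B, Nq, C)
```

The key property is that attention cost scales with the number of sampled points K rather than the full H x W map, which is what allows the sparse, contour-focused sampling described above.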
3.3. The Adaptive Query Selection Module
Algorithm 1: Adaptive query selection module.
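The algorithm listing itself is not reproduced here. Based on the description in Section 1 (statistics-driven selection of positive and negative samples and disambiguation of highly similar queries), one plausible reading, in the spirit of the adaptive thresholding of ATSS [53], is sketched below; the authors' Algorithm 1 may differ in its exact scoring and tie-breaking.

```python
# Hedged sketch of statistics-driven query selection (not the paper's
# exact Algorithm 1): queries whose boxes overlap a ground truth above an
# adaptive mean + std IoU threshold become positives, the rest negatives.
import torch

def adaptive_query_selection(pred_boxes, gt_boxes, iou_fn):
    """
    pred_boxes: (Nq, 4) boxes predicted by the object queries
    gt_boxes:   (Ng, 4) ground-truth ship boxes
    iou_fn:     callable returning an (Nq, Ng) IoU matrix
    Returns a boolean (Nq,) positive mask and (Nq,) assigned gt indices.
    """
    ious = iou_fn(pred_boxes, gt_boxes)          # (Nq, Ng)
    # Per-ground-truth adaptive threshold from the IoU statistics.
    thr = ious.mean(dim=0) + ious.std(dim=0)     # (Ng,)
    pos = ious >= thr[None, :]                   # candidate positives
    best_gt = ious.argmax(dim=1)                 # each query's best gt
    # Keep a query positive only for its best-matching ground truth, so
    # highly similar queries are disambiguated rather than duplicated.
    keep = pos[torch.arange(pos.size(0)), best_gt]
    return keep, best_gt
```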
4. Experiments
4.1. Experimental Setup
4.1.1. Dataset
4.1.2. Settings
4.1.3. Evaluation Metrics
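The experiments report mAP at an IoU threshold of 0.5 and averaged over thresholds 0.5:0.95. For clarity, a minimal sketch of the underlying IoU computation and single-class average precision follows; it is illustrative and not the exact COCO-style evaluation code used in the paper.

```python
# Illustrative IoU and average-precision computations (not the exact
# evaluation toolkit used by the authors).
import numpy as np

def iou_xyxy(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """Area under the precision-recall curve (all-point interpolation)."""
    r = np.concatenate([[0.0], recalls, [1.0]])
    p = np.concatenate([[0.0], precisions, [0.0]])
    p = np.maximum.accumulate(p[::-1])[::-1]   # monotone precision envelope
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

[email protected]:0.95 is then the mean of per-class AP values computed at IoU thresholds 0.5, 0.55, ..., 0.95.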
4.2. Comparison with State-of-the-Art Methods
4.3. Ablation Experiments and Sensitivity Analysis
- Flip: Horizontal flipping increases data diversity and helps the model adapt to different directions, improving robustness in real-world scenarios.
- Resize: Resizing helps the model learn scale-invariant features, ensuring good performance regardless of object size in object detection tasks.
- Crop: Random cropping enables the model to focus on different areas of the image, improving recognition in dense scenes or with partial occlusions.
- Local Similarity Jitter (LSJ): This technique introduces small perturbations to simulate noise, improving the model's robustness to subtle variations in real-world conditions (a combined pipeline is sketched below).
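For concreteness, the four augmentations above could be assembled into a training pipeline as in the following sketch. It uses torchvision-style transforms on the image only (the corresponding bounding-box coordinate updates are omitted), and all probabilities, sizes, and the LSJ noise strength are illustrative assumptions, not the paper's settings.

```python
# Illustrative augmentation pipeline; parameters are assumptions.
import torch
from torchvision import transforms

class LocalSimilarityJitter:
    """Small random perturbations simulating noise, per the LSJ
    description above; the paper's exact formulation may differ."""
    def __init__(self, sigma: float = 0.02):
        self.sigma = sigma

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        return (img + self.sigma * torch.randn_like(img)).clamp(0.0, 1.0)

train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),   # Flip
    transforms.Resize((672, 672)),            # Resize
    transforms.RandomCrop(640),               # Crop
    transforms.ToTensor(),                    # PIL -> float tensor in [0, 1]
    LocalSimilarityJitter(sigma=0.02),        # LSJ
])
```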
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Si, J.; Song, B.; Wu, J.; Lin, W.; Huang, W.; Chen, S. Maritime ship detection method for satellite images based on multiscale feature fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 6642–6655.
- Zhang, D.; Wang, C.; Fu, Q. OFCOS: An oriented anchor-free detector for ship detection in remote sensing images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6004005.
- Wang, P.; Liu, B.; Li, Y.; Chen, P.; Liu, P. IceRegionShip: Optical remote sensing dataset for ship detection in ice-infested waters. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 1007–1020.
- Zhang, Y.; Lu, D.; Qiu, X.; Li, F. Scattering point topology for few-shot ship classification in SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 10326–10343.
- Yin, Y.; Cheng, X.; Shi, F.; Liu, X.; Huo, H.; Chen, S. High-order spatial interactions enhanced lightweight model for optical remote sensing image-based small ship detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4201416.
- Yuan, Y.; Rao, Z.; Lin, C.; Huang, Y.; Ding, X. Adaptive ship detection from optical to SAR images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 3508205.
- Zhao, J.; Chen, Y.; Zhou, Z.; Zhao, J.; Wang, S.; Chen, X. Multiship speed measurement method based on machine vision and drone images. IEEE Trans. Instrum. Meas. 2023, 72, 2513112.
- Zhao, J.; Shi, B.; Huang, T. Reconstructing clear image for high-speed motion scene with a retina-inspired spike camera. IEEE Trans. Comput. Imaging 2021, 8, 12–27.
- Huang, Q.; Sun, H.; Wang, Y.; Yuan, Y.; Guo, X.; Gao, Q. Ship detection based on YOLO algorithm for visible images. IET Image Process. 2023, 18, 481–492.
- Yang, D.; Solihin, M.I.; Zhao, Y.; Yao, B.; Chen, C.; Cai, B.; Machmudah, A. A review of intelligent ship marine object detection based on RGB camera. IET Image Process. 2023, 18, 281–297.
- Assani, N.; Matić, P.; Kaštelan, N.; Čavka, I.R. A review of artificial neural networks applications in maritime industry. IEEE Access 2023, 11, 139823–139848.
- Er, M.J.; Zhang, Y.; Chen, J.; Gao, W. Ship detection with deep learning: A survey. Artif. Intell. Rev. 2023, 56, 11825–11865.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Maurício, J.; Domingues, I.; Bernardino, J. Comparing vision transformers and convolutional neural networks for image classification: A literature review. Appl. Sci. 2023, 13, 5521.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 213–229.
- Su, J.; Ahmed, M.; Lu, Y.; Pan, S.; Bo, W.; Liu, Y. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing 2024, 568, 127063.
- Ren, Z.; Tang, Y.; Yang, Y.; Zhang, W. SASOD: Saliency-aware ship object detection in high-resolution optical images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5611115.
- Zhang, J.; Xing, M.; Sun, G.C.; Li, N. Oriented Gaussian function-based box boundary-aware vectors for oriented ship detection in multiresolution SAR imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5211015.
- Hu, Q.; Hu, S.; Liu, S.; Xu, S.; Zhang, Y.D. FINet: A feature interaction network for SAR ship object-level and pixel-level detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5239215.
- Yu, H.; Yang, S.; Zhou, S.; Sun, Y. VS-LSDet: A multiscale ship detector for spaceborne SAR images based on visual saliency and lightweight CNN. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 1137–1154.
- Guo, H.; Yang, X.; Wang, N.; Gao, X. A CenterNet++ model for ship detection in SAR images. Pattern Recognit. 2021, 112, 107787.
- Leng, X.; Wang, J.; Ji, K.; Kuang, G. Ship detection in range-compressed SAR data. In Proceedings of the 2022 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 2135–2138.
- Zhu, M.; Hu, G.; Zhou, H.; Wang, S.; Feng, Z.; Yue, S. A ship detection method via redesigned FCOS in large-scale SAR images. Remote Sens. 2022, 14, 1153.
- Zhang, C.; Liu, P.; Wang, H.; Jin, Y. Saliency-based CenterNet for ship detection in SAR images. In Proceedings of the 2022 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 1552–1555.
- Leng, X.; Ji, K.; Kuang, G. Ship detection from raw SAR echo data. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5207811.
- Zhang, Y.; Lu, D.; Qiu, X.; Li, F. Scattering-point-guided RPN for oriented ship detection in SAR images. Remote Sens. 2023, 15, 1411.
- Ren, Z.; Tang, Y.; He, Z.; Tian, L.; Yang, Y.; Zhang, W. Ship detection in high-resolution optical remote sensing images aided by saliency information. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5623616.
- Li, X.; Li, Z.; Lv, S.; Cao, J.; Pan, M.; Ma, Q.; Yu, H. Ship detection of optical remote sensing image in multiple scenes. Int. J. Remote Sens. 2021, 43, 5709–5737.
- Wang, Z.; Zhou, Y.; Wang, F.; Wang, S.; Xu, Z. SDGH-Net: Ship detection in optical remote sensing images based on Gaussian heatmap regression. Remote Sens. 2021, 13, 499.
- Dong, Y.; Chen, F.; Han, S.; Liu, H. Ship object detection of remote sensing image based on visual attention. Remote Sens. 2021, 13, 3192.
- Hu, J.; Zhi, X.; Jiang, S.; Tang, H.; Zhang, W.; Bruzzone, L. Supervised multi-scale attention-guided ship detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5630514.
- Xiao, S.; Zhang, Y.; Chang, X. Ship detection based on compressive sensing measurements of optical remote sensing scenes. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8632–8649.
- Cui, Z.; Leng, J.; Liu, Y.; Zhang, T.; Quan, P.; Zhao, W. SKNet: Detecting rotated ships as keypoints in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 8826–8840.
- Zheng, J.; Liu, Y. A study on small-scale ship detection based on attention mechanism. IEEE Access 2022, 10, 77940–77949.
- Ngo, D.D.; Vo, V.L.; Nguyen, T.; Nguyen, M.H.; Le, M.H. Image-based ship detection using deep variational information bottleneck. Sensors 2023, 23, 8093.
- Liu, T.; Zhang, Z.; Lei, Z.; Huo, Y.; Wang, S.; Zhao, J.; Zhang, J.; Jin, X.; Zhang, X. An approach to ship target detection based on combined optimization model of dehazing and detection. Eng. Appl. Artif. Intell. 2024, 127, 107332.
- Zhou, W.; Peng, Y. Ship detection based on multi-scale weighted fusion. Displays 2023, 78, 102448.
- Yi, Y.; Ni, F.; Ma, Y.; Zhu, X.; Qi, Y.; Qiu, R.; Zhao, S.; Li, F.; Wang, Y. High performance gesture recognition via effective and efficient temporal modeling. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 1003–1009.
- Jiang, S.; Zhang, H.; Qi, Y.; Liu, Q. Spatial-temporal interleaved network for efficient action recognition. IEEE Trans. Ind. Inform. 2024, 1–10, early access.
- Zheng, Y.; Liu, P.; Qian, L.; Qin, S.; Liu, X.; Ma, Y.; Cheng, G. Recognition and depth estimation of ships based on binocular stereo vision. J. Mar. Sci. Eng. 2022, 10, 1153.
- Shi, H.; Chai, B.; Wang, Y.; Chen, L. A local-sparse-information-aggregation transformer with explicit contour guidance for SAR ship detection. Remote Sens. 2022, 14, 5247.
- Zhang, Y.; Er, M.J.; Gao, W.; Wu, J. High performance ship detection via transformer and feature distillation. In Proceedings of the 2022 5th International Conference on Intelligent Autonomous Systems (ICoIAS), Dalian, China, 23–25 September 2022; pp. 31–36.
- Chen, B.; Yu, C.; Zhao, S.; Song, H. An anchor-free method based on transformers and adaptive features for arbitrarily oriented ship detection in SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 2012–2028.
- Zhou, Y.; Jiang, X.; Xu, G.; Yang, X.; Liu, X.; Li, Z. PVT-SAR: An arbitrarily oriented SAR ship detector with pyramid vision transformer. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 16, 291–305.
- Chen, Y.; Xia, Z.; Liu, J.; Wu, C. TSDet: End-to-end method with transformer for SAR ship detection. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8.
- Chen, W.; Hong, D.; Qi, Y.; Han, Z.; Wang, S.; Qing, L.; Huang, Q.; Li, G. Multi-attention network for compressed video referring object segmentation. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 4416–4425.
- Ge, C.; Song, Y.; Ma, C.; Qi, Y.; Luo, P. Rethinking attentive object detection via neural attention learning. IEEE Trans. Image Process. 2023, 33, 1726–1739.
- Phan, V.M.H.; Xie, Y.; Zhang, B.; Qi, Y.; Liao, Z.; Perperidis, A.; Phung, S.L.; Verjans, J.W.; To, M.S. Structural attention: Rethinking transformer for unpaired medical image synthesis. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Marrakesh, Morocco, 6–10 October 2024; pp. 690–700.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 764–773.
- Ma, Y.; Zhu, Z.; Qi, Y.; Beheshti, A.; Li, Y.; Qing, L.; Li, G. Style-aware two-stage learning framework for video captioning. Knowl. Based Syst. 2024, 301, 112258.
- Shao, Z.; Wu, W.; Wang, Z.; Du, W.; Li, C. SeaShips: A large-scale precisely annotated dataset for ship detection. IEEE Trans. Multimedia 2018, 20, 2593–2604.
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You only look one-level feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13039–13048.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. In Proceedings of the Ninth International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021.
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned one-stage object detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3490–3499.
- Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDet: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784.
- Li, L.H.; Zhang, P.; Zhang, H.; Yang, J.; Li, C.; Zhong, Y.; Wang, L.; Yuan, L.; Zhang, L.; Hwang, J.N.; et al. Grounded language-image pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10965–10975.
- Zhang, S.; Wang, X.; Wang, J.; Pang, J.; Lyu, C.; Zhang, W.; Luo, P.; Chen, K. Dense distinct query for end-to-end object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 7329–7338.
- Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. In Proceedings of the Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
- Zong, Z.; Song, G.; Liu, Y. DETRs with collaborative hybrid assignments training. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 6748–6758.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
Methods | [email protected] | mAP | mAP | |||||
---|---|---|---|---|---|---|---|---|
Ore Ship | General Cargo Ship | Bulk Carrier | Fishing Boat | Passenger Ship | Container Ship | @0.5 | @0.5:0.95 | |
(CVPR 2020) ATSS [53] | 0.935 | 0.942 | 0.906 | 0.871 | 0.844 | 0.925 | 0.904 | 0.661 |
(ArXiv 2021) YOLOX [54] | 0.914 | 0.925 | 0.927 | 0.896 | 0.852 | 0.923 | 0.906 | 0.642 |
(CVPR 2021) YOLOF [55] | 0.921 | 0.933 | 0.917 | 0.912 | 0.847 | 0.915 | 0.908 | 0.693 |
(ICLR 2021) Deformable DETR [56] | 0.888 | 0.932 | 0.896 | 0.830 | 0.828 | 0.920 | 0.882 | 0.617 |
(ICCV 2021) TOOD [57] | 0.927 | 0.891 | 0.930 | 0.885 | 0.838 | 0.917 | 0.898 | 0.625 |
(ArXiv 2022) RTMDet [58] | 0.902 | 0.915 | 0.933 | 0.862 | 0.833 | 0.924 | 0.895 | 0.624 |
(CVPR 2022) GLIP [59] | 0.942 | 0.967 | 0.956 | 0.917 | 0.866 | 0.938 | 0.931 | 0.728 |
(CVPR 2023) DDQ DETR [60] | 0.937 | 0.944 | 0.951 | 0.923 | 0.849 | 0.933 | 0.923 | 0.705 |
(ICLR 2023) DINO [61] | 0.949 | 0.962 | 0.963 | 0.915 | 0.923 | 0.956 | 0.945 | 0.752 |
(ICCV 2023) CO-DETR [62] | 0.936 | 0.958 | 0.962 | 0.908 | 0.901 | 0.942 | 0.935 | 0.749 |
(ours) AQSFormer | 0.963 | 0.975 | 0.959 | 0.936 | 0.918 | 0.985 | 0.956 | 0.779 |
Methods | Memory (GB) ↓ | Params (M) ↓ | GFLOPs ↓ | FPS ↑
---|---|---|---|---|
(CVPR 2020) ATSS [53] | 6.60 | 32.3 | 211.968 | 29.5 |
(ArXiv 2021) YOLOX [54] | 6.71 | 9.0 | 33.777 | 29.2 |
(CVPR 2021) YOLOF [55] | 7.27 | 44.2 | 103.424 | 29.1 |
(ICLR 2021) Deformable DETR [56] | 6.72 | 40.1 | 197.632 | 29.4 |
(ICCV 2021) TOOD [57] | 6.74 | 32.2 | 207.872 | 29.4 |
(ArXiv 2022) RTMDet [58] | 6.75 | 52.3 | 80.121 | 46.0 |
(CVPR 2022) GLIP [59] | 7.02 | 153.6 | 122.88 | 29.2 |
(CVPR 2023) DDQ DETR [60] | 7.04 | 48.3 | 280.576 | 29.4 |
(ICLR 2023) DINO [61] | 7.06 | 47.6 | 263.483 | 29.4 |
(ICCV 2023) CO-DETR [62] | 7.28 | 65.8 | 374.735 | 29.5 |
(ours) AQSFormer | 6.67 | 37.4 | 67.855 | 31.3 |
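For context, throughput figures like the FPS column above are commonly obtained by timing repeated forward passes after a warm-up, as in the sketch below. This is one common protocol, not necessarily the authors' measurement setup, and `measure_fps` and its defaults are illustrative.

```python
# Rough FPS measurement sketch (illustrative protocol).
import time
import torch

@torch.no_grad()
def measure_fps(model, input_shape=(1, 3, 640, 640), n_warmup=10, n_iters=100):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    for _ in range(n_warmup):          # warm up kernels and caches
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()       # wait for queued GPU work
    start = time.perf_counter()
    for _ in range(n_iters):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return n_iters / (time.perf_counter() - start)
```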
Backbone | TTA | [email protected]:0.95 (%) | [email protected] (%) | mAP_S@0.5 (%) | mAP_M@0.5 (%) | mAP_L@0.5 (%)
---|---|---|---|---|---|---|
ResNet50 [63] | × | 77.9 | 95.6 | 65.7 | 85.4 | 98.6 |
ResNet101 [63] | × | 78.1 (+0.2) | 96.0 (+0.4) | 67.6 (+1.9) | 85.8 (+0.4) | 98.9 (+0.3) |
Swin-L [64] | × | 78.4 (+0.5) | 96.8 (+1.2) | 68.7 (+3.0) | 85.6 (+0.2) | 99.0 (+0.4) |
Swin-L [64] | ✔ | 78.7 (+0.8) | 98.2 (+2.6) | 70.5 (+4.8) | 86.5 (+1.1) | 99.6 (+1.0) |
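Here TTA denotes test-time augmentation. A minimal horizontal-flip variant is sketched below for illustration: the detector runs on the image and its mirror, the mirrored boxes are mapped back, and duplicates are merged with NMS. The `detector` interface is an assumption, and the paper's actual TTA recipe (e.g., multi-scale testing) may be richer.

```python
# Minimal horizontal-flip TTA sketch (illustrative).
import torch
from torchvision.ops import nms

def flip_tta(detector, image: torch.Tensor, iou_thr: float = 0.5):
    """image: (C, H, W); `detector` is assumed to return
    (boxes (N, 4) in xyxy pixels, scores (N,)) for one image."""
    W = image.shape[-1]
    boxes1, scores1 = detector(image)
    boxes2, scores2 = detector(torch.flip(image, dims=[-1]))  # mirrored pass
    boxes2 = boxes2.clone()
    boxes2[:, [0, 2]] = W - boxes2[:, [2, 0]]   # un-mirror x coordinates
    boxes = torch.cat([boxes1, boxes2])
    scores = torch.cat([scores1, scores2])
    keep = nms(boxes, scores, iou_thr)          # merge duplicate detections
    return boxes[keep], scores[keep]
```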
Pos. Embed | Attention | [email protected]:0.95 (%) | [email protected] (%) | mAP_S@0.5 (%) | mAP_M@0.5 (%) | mAP_L@0.5 (%)
---|---|---|---|---|---|---|
Sine | Self-Attn | 74.5 | 89.6 | 49.8 | 80.4 | 97.3 |
2D-RoPE | Self-Attn | 76.1 (+1.6) | 90.7 (+1.1) | 55.9 (+6.1) | 81.3 (+0.9) | 98.1 (+0.8) |
Sine | DA | 76.6 (+2.1) | 94.3 (+4.7) | 59.1 (+9.3) | 82.7 (+2.3) | 98.1 (+0.8) |
2D-RoPE | DA | 77.9 (+3.4) | 95.6 (+6.0) | 65.7 (+15.9) | 85.4 (+5.0) | 98.6 (+1.3) |
Data Aug. | [email protected]:0.95 (%) | [email protected] (%) | mAP_S@0.5 (%) | mAP_M@0.5 (%) | mAP_L@0.5 (%)
---|---|---|---|---|---|
× | 74.5 | 87.5 | 52.5 | 80.8 | 97.1 |
Flip & Resize & Crop | 76.5 (+2.0) | 92.6 (+5.1) | 59.8 (+7.3) | 84.8 (+4.0) | 98.3 (+1.2) |
LSJ | 76.0 (+1.5) | 91.1 (+3.6) | 59.4 (+6.9) | 82.0 (+1.2) | 97.8 (+0.7) |
Flip & Resize & Crop & LSJ | 77.9 (+3.4) | 95.6 (+8.1) | 65.7 (+13.2) | 85.4 (+4.6) | 98.6 (+1.5) |