FEA-Swin: Foreground Enhancement Attention Swin Transformer Network for Accurate UAV-Based Dense Object Detection
Abstract
1. Introduction
- To the best of our knowledge, we are the first to introduce a foreground enhancement attention block (FEAB) into the original Swin-tiny backbone to bring in more contextual information and learn more discriminative features, and we investigate its effectiveness on aerial-image object detection tasks. Moreover, the FEAB module can, in principle, be inserted into other existing hierarchical vision transformers.
- We further propose a simple and efficient weighted bi-directional feature pyramid network (BiFPN) as the neck, which fuses feature maps carrying context information from different stages of the encoder (a minimal sketch of the weighted fusion step is given after this list).
- We have built a self-collected dataset of 2000 images captured around our laboratory, annotated for pedestrians and vehicles, and made it publicly available; the download link is given at the end of this article.
- Finally, we provide an in-depth analysis of how each of the two key components of FEA-Swin affects detection accuracy. Our method achieves competitive performance on the VisDrone, NWPU VHR-10, and our self-collected datasets, exceeding the best currently available general-purpose detectors.
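For readers unfamiliar with weighted bi-directional fusion, the snippet below sketches the fast normalized fusion introduced by EfficientDet [37], on which weighted BiFPN necks are commonly built. It is a minimal, generic PyTorch illustration under our own naming (WeightedFusion), not the exact fusion module used in FEA-Swin.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion of same-shape feature maps (EfficientDet [37] style)."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        # One learnable, non-negative weight per incoming feature map.
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = torch.relu(self.weights)          # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)          # normalize so the weights sum to ~1
        return sum(wi * f for wi, f in zip(w, feats))

# Example: fuse a top-down pathway feature with the same-level backbone feature.
fuse = WeightedFusion(num_inputs=2)
fused = fuse([torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32)])
```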
2. Related Work
2.1. Object Detection in Aerial Images
- ClusDet [11] proposes an end-to-end aerial object detection framework that combines target clustering and detection. It consists of three main components: a cluster proposal network (CPNet), which clusters targets to generate cluster regions; a scale estimation network (ScaleNet), which estimates the scale of each cluster; and a dedicated detection network (DetecNet), which performs detection on the scale-normalized cluster regions.
- DMNet [12] leverages density maps for object detection in aerial images. Density maps originate from the related field of crowd counting and reflect the spatial distribution of targets in an image. In crowd counting, targets are highly dense, unevenly distributed, and individually small, which closely resembles the target distribution in aerial remote sensing datasets. DMNet consists of three main steps: (1) a density map generation network; (2) segmentation of the input image into foreground regions based on the density map (a minimal sketch of this density-guided cropping idea is given after this list); (3) object detection on the generated foreground regions.
- GLSAN [13] proposes an end-to-end global–local adaptive network. It consists of three main components: a global–local detection network (GLDN), a self-adaptive region selection algorithm (SARSA), and a local super-resolution network (LSRN). The method integrates a global–local fusion strategy into a progressively scale-varying network to perform more accurate detection.
- UCGNet [14] proposes a network based on unsupervised clustering guidance. First, a local location module (LLM) predicts a binary map using an attention mechanism; the binary map indicates where targets are present in the image. Second, an unsupervised clustering module (UCM) groups these points into clusters; to make the clusters more effective, the authors sample 1000 points from all target-covering pixels using a farthest point sampling strategy, and each cluster corresponds to a region. Third, these sub-regions are cropped out. Finally, the global fusion module (GMM) joins all candidate boxes to obtain the final detection results.
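These four pipelines share a crop-then-detect pattern: a coarse module localizes foreground regions, a detector runs on the cropped regions, and the per-crop results are merged back. As a purely illustrative sketch (not code from any of the cited works), the snippet below shows the density-map-guided cropping step in the spirit of DMNet [12]; the threshold, padding, and helper name are assumptions of ours.

```python
import numpy as np
from scipy import ndimage

def density_guided_crops(image: np.ndarray, density: np.ndarray,
                         thresh: float = 0.05, pad: int = 16):
    """Crop likely-foreground regions of an aerial image from a predicted
    density map (illustrative DMNet-style step; thresh/pad are placeholders)."""
    mask = density > thresh                         # binarize the density map
    labeled, _ = ndimage.label(mask)                # connected foreground regions
    crops = []
    for sl_y, sl_x in ndimage.find_objects(labeled):
        y0, y1 = max(sl_y.start - pad, 0), min(sl_y.stop + pad, image.shape[0])
        x0, x1 = max(sl_x.start - pad, 0), min(sl_x.stop + pad, image.shape[1])
        crops.append(image[y0:y1, x0:x1])           # the detector runs on each crop
    return crops
```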
2.2. Attentional Mechanisms
- SENet [17]: This work pioneered channel attention. At the heart of SENet is a squeeze-and-excitation (SE) block that models global information and captures the relationships between different channels, improving the learning capability of the model (see the sketch after this list). Its disadvantage is that the global average pooling in the squeeze module cannot capture complex spatial features, and the fully connected layers in the excitation module add redundant computation.
- Non-Local [18]: It uses a spatial attention mechanism to directly model the relationship between any two locations in an image, capturing long-range dependencies. The set of locations can be spatial, temporal, or spatio-temporal. Its advantage is that it can be combined with other operations and inserted into other networks, but it likewise requires a large amount of computation.
- RAN [19]: A residual attention network (RAN) is proposed, which stacks multiple attention modules. Its advantages are that it captures mixed attention and is a scalable convolutional neural network. However, its bottom-up structure fails to fully exploit global spatial information, and directly predicting a 3D attention map incurs a high computational cost.
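To make the channel-attention idea concrete, the following is a minimal PyTorch sketch of the squeeze-and-excitation block at the heart of SENet [17]; it follows the published design (global average pooling, a two-layer bottleneck, and sigmoid gating) but is our simplified rendering, not the reference implementation.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal squeeze-and-excitation channel attention (after SENet [17])."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze to a bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # restore to C channels
            nn.Sigmoid(),                                # per-channel gates in (0, 1)
        )

    def forward(self, x):                # x: (B, C, H, W)
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # squeeze: global average pooling -> (B, C)
        w = self.fc(s).view(b, c, 1, 1)  # excitation: channel reweighting factors
        return x * w                     # rescale the input feature map
```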
2.3. Vision Transformer
2.4. Swin Transformer
3. Methods
3.1. Overview of the Proposed Framework
3.2. FEAB in the Backbone of FEA-Swin
3.3. Improved BiFPN as the Neck of FEA-Swin
4. Experimental Results and Discussion
4.1. Datasets
4.2. Implementation Details
4.3. Results and Analysis
4.4. Ablation Studies
4.5. Hyperparameter Independence of the Model
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mehallegue, N.; Djellab, M.; Loukhaoukha, K. Efficient Use of UAVs for Public Safety in Disaster and Crisis Management. Wirel. Pers. Commun. 2021, 116, 369–380.
- Reedha, R.; Dericquebourg, E.; Canals, R.; Hafiane, A. Transformer Neural Network for Weed and Crop Classification of High Resolution UAV Images. Remote Sens. 2022, 14, 592.
- Masuduzzaman, M.; Islam, A.; Sadia, K.; Shin, S.Y. UAV-based MEC-assisted automated traffic management scheme using blockchain. Future Gener. Comput. Syst. 2022, 134, 256–270.
- Shao, Z.; Li, C.; Li, D.; Altan, O.; Zhang, L.; Ding, L. An accurate matching method for projecting vector data into surveillance video to monitor and protect cultivated land. ISPRS Int. J. Geo Inf. 2020, 9, 448.
- Girshick, R.B.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Ren, S.; He, K.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1483–1498.
- Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the ECCV 2016—14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9905, Part I, pp. 21–37.
- Xiang, T.; Xia, G.; Zhang, L. Mini-UAV-based Remote Sensing: Techniques, Applications and Prospectives. arXiv 2018, arXiv:1812.07770.
- Yang, F.; Fan, H.; Chu, P.; Blasch, E.; Ling, H. Clustered Object Detection in Aerial Images. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea, 27 October–2 November 2019; pp. 8310–8319.
- Li, C.; Yang, T.; Zhu, S.; Chen, C.; Guan, S. Density Map Guided Object Detection in Aerial Images. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR Workshops 2020, Seattle, WA, USA, 14–19 June 2020; pp. 737–746.
- Deng, S.; Li, S.; Xie, K.; Song, W.; Liao, X.; Hao, A.; Qin, H. A Global-Local Self-Adaptive Network for Drone-View Object Detection. IEEE Trans. Image Process. 2021, 30, 1556–1569.
- Liao, J.; Piao, Y.; Su, J.; Cai, G.; Huang, X.; Chen, L.; Huang, Z.; Wu, Y. Unsupervised Cluster Guided Object Detection in Aerial Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 11204–11216.
- Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Chen, J.; Wan, L.; Zhu, J.; Xu, G.; Deng, M. Multi-Scale Spatial and Channel-wise Attention for Improving Object Detection in Remote Sensing Imagery. IEEE Geosci. Remote Sens. Lett. 2020, 17, 681–685.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
- Wang, X.; Girshick, R.B.; Gupta, A.; He, K. Non-Local Neural Networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803.
- Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual Attention Network for Image Classification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 6450–6458.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, 3–7 May 2021.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
- Yang, H.; Zhang, D.; Hu, A.; Liu, C.; Cui, T.J.; Miao, J. Transformer-Based Anchor-Free Detection of Concealed Objects in Passive Millimeter Wave Images. IEEE Trans. Instrum. Meas. 2022, 71, 5012216.
- Xu, X.; Feng, Z.; Cao, C.; Li, M.; Wu, J.; Wu, Z.; Shang, Y.; Ye, S. An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation. Remote Sens. 2021, 13, 4779.
- Zheng, Y.; Sun, P.; Zhou, Z.; Xu, W.; Ren, Q. ADT-Det: Adaptive Dynamic Refined Single-Stage Transformer Detector for Arbitrary-Oriented Object Detection in Satellite Optical Imagery. Remote Sens. 2021, 13, 2623.
- Lin, T.; Maire, M.; Belongie, S.J.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014—13th European Conference, Zurich, Switzerland, 6–12 September 2014; Volume 8693, pp. 740–755.
- Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Zhang, Y.; Liu, X.; Wa, S.; Chen, S.; Ma, Q. GANsformer: A Detection Network for Aerial Images with High Performance Combining Convolutional Network and Transformer. Remote Sens. 2022, 14, 923.
- Xu, Z.; Liu, Y.; Gan, L.; Sun, Y.; Wu, X.; Liu, M.; Wang, L. RNGDet: Road Network Graph Detection by Transformer in Aerial Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4707612.
- Chen, G.; Shang, Y. Transformer for Tree Counting in Aerial Images. Remote Sens. 2022, 14, 476.
- Zheng, Z.; Zhong, Y.; Wang, J.; Ma, A. Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, 13–19 June 2020; pp. 4095–4104.
- Zhang, K.; Wu, Y.; Wang, J.; Wang, Y.; Wang, Q. Semantic Context-Aware Network for Multiscale Object Detection in Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8009705.
- Huang, W.; Li, G.; Jin, B.; Chen, Q.; Yin, J.; Huang, L. Scenario Context-Aware-Based Bidirectional Feature Pyramid Network for Remote Sensing Target Detection. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6505005.
- Gao, L.; Liu, H.; Yang, M.; Chen, L.; Wan, Y.; Xiao, Z.; Qian, Y. STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 10990–11003.
- Liu, Z.; Zhang, X.; Liu, C.; Wang, H.; Sun, C.; Li, B.; Huang, P.; Li, Q.; Liu, Y.; Kuang, H.; et al. RelationRS: Relationship Representation Network for Object Detection in Aerial Images. Remote Sens. 2022, 14, 1862.
- Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin Transformer V2: Scaling Up Capacity and Resolution. arXiv 2021, arXiv:2111.09883.
- Song, K.; Huang, P.; Lin, Z.; Lv, T. An oriented anchor-free object detector including feature fusion and foreground enhancement for remote sensing images. Remote Sens. Lett. 2021, 12, 397–407.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
- Chen, L.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
- Papandreou, G.; Kokkinos, I.; Savalle, P. Modeling local and global deformations in Deep Learning: Epitomic convolution, Multiple Instance Learning, and sliding window detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, 7–12 June 2015; pp. 390–399.
- Yu, W.; Yang, T.; Chen, C. Towards Resolving the Challenge of Long-tail Distribution in UAV Images for Object Detection. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, WACV 2021, Waikoloa, HI, USA, 3–8 January 2021; pp. 3257–3266.
- Lin, T.; Goyal, P.; Girshick, R.B.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You Only Look One-Level Feature. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, Virtual, 19–25 June 2021; pp. 13039–13048.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
- Cao, Y.; He, Z.; Wang, L.; Wang, W.; Yuan, Y.; Zhang, D.; Zhang, J.; Zhu, P.; Gool, L.V.; Han, J.; et al. VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2021), Montreal, BC, Canada, 11–17 October 2021; pp. 2847–2854.
- Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132.
- Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A Database and Web-Based Tool for Image Annotation. Int. J. Comput. Vis. 2008, 77, 157–173.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.S.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Lin, T.; Dollár, P.; Girshick, R.B.; He, K.; Hariharan, B.; Belongie, S.J. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359.
Comparison results on the VisDrone dataset (per-class AP and mAP, %):

| Method | Backbone | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning-Tricycle | Bus | Motor | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cascade R-CNN | ResNet-50 | 16.8 | 9.1 | 11.8 | 59.2 | 31.4 | 32.7 | 12.9 | 15.6 | 58.1 | 16.7 | 26.4 |
| Faster R-CNN | ResNet-50 | 16.7 | 9.1 | 10.4 | 59.3 | 34.1 | 33.0 | 17.2 | 15.3 | 57.6 | 15.1 | 26.8 |
| RetinaNet | ResNet-50 | 15.5 | 9.1 | 9.8 | 58.9 | 28.7 | 27.9 | 7.5 | 5.0 | 55.6 | 13.8 | 23.2 |
| YOLOF | ResNet-50 | 13.1 | 1.2 | 1.4 | 55.6 | 24.6 | 25.9 | 6.4 | 5.5 | 52.7 | 6.1 | 19.3 |
| Swin Transformer | Swin-tiny | 17.0 | 9.1 | 12.0 | 66.7 | 32.2 | 33.4 | 17.7 | 18.1 | 62.4 | 22.5 | 29.1 |
| FEA-Swin | Ours | 31.1 | 14.6 | 13.5 | 70.1 | 42.4 | 39.3 | 19.1 | 18.5 | 62.7 | 25.6 | 33.7 |
Comparison results on the NWPU VHR-10 dataset (per-class AP and mAP, %):

| Method | Airplane | Ship | Storage Tank | Baseball Diamond | Tennis Court | Basketball Court | Ground Track Field | Harbor | Bridge | Vehicle | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cascade R-CNN | 95.3 | 90.4 | 90.9 | 100 | 90.6 | 89.3 | 98.1 | 99.7 | 61.1 | 71.9 | 88.7 |
| Faster R-CNN | 90.9 | 90.7 | 90.9 | 99.9 | 86.5 | 78.0 | 89.9 | 99.7 | 57.1 | 68.6 | 85.2 |
| RetinaNet | 95.6 | 64.3 | 76.2 | 94.9 | 82.3 | 51.3 | 97.7 | 82.6 | 56.7 | 67.2 | 76.9 |
| YOLOF | 98.3 | 87.9 | 89.2 | 97.8 | 82.6 | 74.6 | 98.5 | 88.9 | 60.7 | 64.8 | 84.3 |
| Swin Transformer | 100 | 88.0 | 90.8 | 100 | 90.3 | 88.4 | 100 | 99.7 | 78.8 | 71.9 | 90.8 |
| FEA-Swin | 100 | 90.9 | 90.9 | 99.9 | 90.9 | 89.6 | 100 | 100 | 89.3 | 80.3 | 93.2 |
Comparison results on our self-collected dataset (per-class AP and mAP, %):

| Class | Cascade R-CNN | Faster R-CNN | RetinaNet | YOLOF | Swin Transformer | FEA-Swin |
|---|---|---|---|---|---|---|
| Car | 84.1 | 81.6 | 78.4 | 79.6 | 88.5 | 89.2 |
| Person | 70.5 | 66.3 | 64.4 | 65.2 | 75.7 | 79.5 |
| mAP | 77.3 | 73.9 | 71.4 | 72.4 | 82.1 | 84.4 |
Ablation study of the two components on the VisDrone dataset (per-class AP and mAP, %):

| Settings | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning-Tricycle | Bus | Motor | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 17.0 | 9.1 | 12.0 | 66.7 | 32.2 | 33.4 | 17.7 | 18.1 | 62.4 | 22.5 | 29.1 |
| +BiFPN | 28.2 | 9.3 | 12.2 | 68.2 | 41.7 | 40.7 | 18.5 | 17.2 | 48.5 | 18.3 | 30.3 |
| +FEAB | 30.9 | 12.6 | 12.9 | 70.2 | 42.2 | 38.7 | 18.7 | 18.7 | 52.7 | 27.5 | 32.5 |
| +BiFPN+FEAB | 31.1 | 14.6 | 13.5 | 70.1 | 42.4 | 39.3 | 19.1 | 18.5 | 62.7 | 25.6 | 33.7 |
Detection accuracy of FEA-Swin under different hyperparameter settings (mAP, %):

| Method | Batch Size | Optimizer Type | Weight Decay | mAP |
|---|---|---|---|---|
| FEA-Swin-v1 | 4 | SGD | 0.05 | 83.6 |
| FEA-Swin-v2 | 4 | AdamW | 0.05 | 84.4 |
| FEA-Swin-v3 | 4 | AdamW | 0.1 | 84.3 |
| FEA-Swin-v4 | 8 | SGD | 0.05 | 83.4 |
| FEA-Swin-v5 | 8 | AdamW | 0.05 | 84.1 |
| FEA-Swin-v6 | 8 | AdamW | 0.1 | 83.9 |
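For reference, the best-performing configuration above (FEA-Swin-v2: AdamW, weight decay 0.05, batch size 4) corresponds to a plain PyTorch setup like the sketch below; the learning rate and the stand-in model are placeholders of ours, since neither is specified in this excerpt.

```python
import torch
import torch.nn as nn

# Stand-in module; in practice this would be the full FEA-Swin detector.
model = nn.Conv2d(3, 16, kernel_size=3)

# FEA-Swin-v2 settings: AdamW optimizer with weight decay 0.05 (batch size 4 is
# set in the data loader). The learning rate 1e-4 below is only a placeholder.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
```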