AI-Based Object Detection and Tracking in UAVs: Challenges and Research Directions

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: 30 November 2024 | Viewed by 32183

Special Issue Editors

Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Interests: unmanned aerial vehicle; flight dynamics and control; aerial robotics; SLAM

Guest Editor
Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Interests: UAV/MAV technology research and development; aerothermodynamics; experimental fluid mechanics and CFD

Special Issue Information

Dear Colleagues,

Combining autonomous unmanned aerial vehicles (UAVs) and AI-based object detection and tracking could significantly improve efficiency, reduce cost, and lower risks for various applications. With fast developments in UAV platform design, cameras, micro-computers, and image-processing algorithms, autonomous UAVs have become a promising sensing platform for various applications such as environment monitoring and infrastructure inspection. These systems can reduce the necessity of traditional manual inspection in risky working environments and avoid the cost of using piloted fixed-wing aircraft or helicopters to conduct large-scale sensing tasks.

New aerial sensing platforms with machine learning, object detection, and tracking capabilities present both opportunities and challenges, inviting novel solutions from the research community. The key aim of this Special Issue is to bring together innovative research that uses off-the-shelf or custom-made platforms to extend autonomous aerial sensing capabilities. Contributions from all fields related to UAVs and aerial-image processing techniques are of interest, particularly including, but not limited to, the following topics:

  • Unmanned aerial vehicle (UAV) systems;
  • Machine learning;
  • AI-based data processing;
  • Object detection;
  • Object tracking;
  • Localization and mapping;
  • Path planning;
  • Obstacle avoidance;
  • Multi-agent collaboration.

Dr. Boyang Li
Prof. Dr. Chihyung Wen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (12 papers)


Research

16 pages, 5554 KiB  
Article
Unmanned Aerial Vehicle Photogrammetry for Monitoring the Geometric Changes of Reclaimed Landfills
by Grzegorz Pasternak, Klaudia Pasternak, Eugeniusz Koda and Paweł Ogrodnik
Sensors 2024, 24(22), 7247; https://doi.org/10.3390/s24227247 - 13 Nov 2024
Viewed by 352
Abstract
Monitoring reclaimed landfills is essential for ensuring their stability and monitoring the regularity of facility settlement. Insufficient recognition of the magnitude and directions of these changes can lead to serious damage to the body of the landfill (landslides, sinkholes) and, consequently, threaten the environment and the life and health of people near landfills. This study focuses on using UAV photogrammetry to monitor geometric changes in reclaimed landfills. This approach highlights the advantages of UAVs in expanding the monitoring and providing precise information critical for decision-making in the reclamation process. This study presents the result of annual photogrammetry measurements at the Słabomierz–Krzyżówka reclaimed landfill, located in the central part of Poland. The Multiscale Model to Model Cloud Comparison (M3C2) algorithm was used to determine deformation at the landfill. The results were simultaneously compared with the landfill’s reference (angular–linear) measurements. The mean vertical displacement error determined by the photogrammetric method was ±2.3 cm. The results showed that, with an appropriate measurement methodology, it is possible to decide on changes in geometry reliably. The collected 3D data also gives the possibility to improve the decision-making process related to repairing damage or determining the reclamation direction of the landfill, as well as preparing further development plans. Full article
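As a rough illustration of the settlement-monitoring idea above: the paper uses the M3C2 algorithm, which measures distances along local surface normals; the minimal sketch below instead takes plain per-cell vertical differences between two survey epochs of a DEM grid and flags cells whose change exceeds the reported ±2.3 cm measurement error. The function names and grid representation are illustrative, not the authors' code.

```python
# Simplified settlement check between two survey epochs of a landfill DEM.
# M3C2 (used in the paper) works along local surface normals; this sketch
# uses plain per-cell vertical differences as a first approximation.

def vertical_displacements(dem_before, dem_after):
    """Per-cell elevation change (metres) between two same-shaped DEM grids."""
    return [
        [after - before for before, after in zip(row_b, row_a)]
        for row_b, row_a in zip(dem_before, dem_after)
    ]

def flag_settlement(displacements, tolerance=0.023):
    """Return (row, col) cells whose change exceeds the survey error
    (tolerance defaults to the paper's reported ±2.3 cm)."""
    return [
        (r, c)
        for r, row in enumerate(displacements)
        for c, d in enumerate(row)
        if abs(d) > tolerance
    ]
```

Cells flagged this way would then be candidates for repair works or a revised reclamation plan, as the abstract suggests.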

18 pages, 9357 KiB  
Article
Drone-DETR: Efficient Small Object Detection for Remote Sensing Image Using Enhanced RT-DETR Model
by Yaning Kong, Xiangfeng Shang and Shijie Jia
Sensors 2024, 24(17), 5496; https://doi.org/10.3390/s24175496 - 24 Aug 2024
Viewed by 2331
Abstract
Performing low-latency, high-precision object detection on unmanned aerial vehicles (UAVs) equipped with vision sensors holds significant importance. However, the current limitations of embedded UAV devices present challenges in balancing accuracy and speed, particularly in the analysis of high-precision remote sensing images. This challenge is particularly pronounced in scenarios involving numerous small objects, intricate backgrounds, and occluded overlaps. To address these issues, we introduce the Drone-DETR model, which is based on RT-DETR. To overcome the difficulties associated with detecting small objects and reducing redundant computations arising from complex backgrounds in ultra-wide-angle images, we propose the Effective Small Object Detection Network (ESDNet). This network preserves detailed information about small objects, reduces redundant computations, and adopts a lightweight architecture. Furthermore, we introduce the Enhanced Dual-Path Feature Fusion Attention Module (EDF-FAM) within the neck network. This module is specifically designed to enhance the network’s ability to handle multi-scale objects. We employ a dynamic competitive learning strategy to enhance the model’s capability to efficiently fuse multi-scale features. Additionally, we incorporate the P2 shallow feature layer from the ESDNet into the neck network to enhance the model’s ability to fuse small-object features, thereby enhancing the accuracy of small object detection. Experimental results indicate that the Drone-DETR model achieves an mAP50 of 53.9% with only 28.7 million parameters on the VisDrone2019 dataset, representing an 8.1% enhancement over RT-DETR-R18. Full article

23 pages, 5952 KiB  
Article
HP-YOLOv8: High-Precision Small Object Detection Algorithm for Remote Sensing Images
by Guangzhen Yao, Sandong Zhu, Long Zhang and Miao Qi
Sensors 2024, 24(15), 4858; https://doi.org/10.3390/s24154858 - 26 Jul 2024
Viewed by 2286
Abstract
YOLOv8, as an efficient object detection method, can swiftly and precisely identify objects within images. However, traditional algorithms encounter difficulties when detecting small objects in remote sensing images, such as missing information, background noise, and interactions among multiple objects in complex scenes, which may affect performance. To tackle these challenges, we propose an enhanced algorithm optimized for detecting small objects in remote sensing images, named HP-YOLOv8. Firstly, we design the C2f-D-Mixer (C2f-DM) module as a replacement for the original C2f module. This module integrates both local and global information, significantly improving the ability to detect features of small objects. Secondly, we introduce a feature fusion technique based on attention mechanisms, named Bi-Level Routing Attention in Gated Feature Pyramid Network (BGFPN). This technique utilizes an efficient feature aggregation network and reparameterization technology to optimize information interaction between different scale feature maps, and through the Bi-Level Routing Attention (BRA) mechanism, it effectively captures critical feature information of small objects. Finally, we propose the Shape Mean Perpendicular Distance Intersection over Union (SMPDIoU) loss function. The method comprehensively considers the shape and size of detection boxes, enhances the model’s focus on the attributes of detection boxes, and provides a more accurate bounding box regression loss calculation method. To demonstrate our approach’s efficacy, we conducted comprehensive experiments across the RSOD, NWPU VHR-10, and VisDrone2019 datasets. The experimental results show that the HP-YOLOv8 achieves 95.11%, 93.05%, and 53.49% in the mAP@0.5 metric, and 72.03%, 65.37%, and 38.91% in the more stringent mAP@0.5:0.95 metric, respectively. Full article
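The SMPDIoU loss above builds on plain Intersection over Union, which also underlies the mAP@0.5 and mAP@0.5:0.95 metrics quoted throughout this issue. As a baseline reference, here is standard IoU for axis-aligned boxes; the shape and perpendicular-distance terms that SMPDIoU adds are not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero width/height if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union > 0 else 0.0
```

At an mAP@0.5 threshold, a prediction counts as correct when `iou(pred, truth) >= 0.5`; mAP@0.5:0.95 averages over thresholds from 0.5 to 0.95.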

19 pages, 17742 KiB  
Article
A Lightweight Remote Sensing Small Target Image Detection Algorithm Based on Improved YOLOv8
by Haijiao Nie, Huanli Pang, Mingyang Ma and Ruikai Zheng
Sensors 2024, 24(9), 2952; https://doi.org/10.3390/s24092952 - 6 May 2024
Cited by 6 | Viewed by 2701
Abstract
In response to the challenges posed by small objects in remote sensing images, such as low resolution, complex backgrounds, and severe occlusions, this paper proposes a lightweight improved model based on YOLOv8n. During the detection of small objects, the feature fusion part of the YOLOv8n algorithm retrieves relatively fewer features of small objects from the backbone network compared to large objects, resulting in low detection accuracy for small objects. To address this issue, firstly, this paper adds a dedicated small object detection layer in the feature fusion network to better integrate the features of small objects into the feature fusion part of the model. Secondly, the SSFF module is introduced to facilitate multi-scale feature fusion, enabling the model to capture more gradient paths and further improve accuracy while reducing model parameters. Finally, the HPANet structure is proposed, replacing the Path Aggregation Network with HPANet. Compared to the original YOLOv8n algorithm, the recognition accuracy of mAP@0.5 on the VisDrone dataset and the AI-TOD dataset has increased by 14.3% and 17.9%, respectively, while the recognition accuracy of mAP@0.5:0.95 has increased by 17.1% and 19.8%, respectively. The proposed method reduces the parameter count by 33% and the model size by 31.7% compared to the original model. Experimental results demonstrate that the proposed method can quickly and accurately identify small objects in complex backgrounds. Full article

29 pages, 6850 KiB  
Article
YOLOv8-MU: An Improved YOLOv8 Underwater Detector Based on a Large Kernel Block and a Multi-Branch Reparameterization Module
by Xing Jiang, Xiting Zhuang, Jisheng Chen, Jian Zhang and Yiwen Zhang
Sensors 2024, 24(9), 2905; https://doi.org/10.3390/s24092905 - 1 May 2024
Cited by 3 | Viewed by 2475
Abstract
Underwater visual detection technology is crucial for marine exploration and monitoring. Given the growing demand for accurate underwater target recognition, this study introduces an innovative architecture, YOLOv8-MU, which significantly enhances the detection accuracy. This model incorporates the large kernel block (LarK block) from UniRepLKNet to optimize the backbone network, achieving a broader receptive field without increasing the model’s depth. Additionally, the integration of C2fSTR, which combines the Swin transformer with the C2f module, and the SPPFCSPC_EMA module, which blends Cross-Stage Partial Fast Spatial Pyramid Pooling (SPPFCSPC) with attention mechanisms, notably improves the detection accuracy and robustness for various biological targets. A fusion block from DAMO-YOLO further enhances the multi-scale feature extraction capabilities in the model’s neck. Moreover, the adoption of the MPDIoU loss function, designed around the vertex distance, effectively addresses the challenges of localization accuracy and boundary clarity in underwater organism detection. The experimental results on the URPC2019 dataset indicate that YOLOv8-MU achieves an mAP@0.5 of 78.4%, showing an improvement of 4.0% over the original YOLOv8 model. Additionally, on the URPC2020 dataset, it achieves 80.9%, and, on the Aquarium dataset, it reaches 75.5%, surpassing other models, including YOLOv5 and YOLOv8n, thus confirming the wide applicability and generalization capabilities of our proposed improved model architecture. Furthermore, an evaluation on the improved URPC2019 dataset demonstrates leading performance (SOTA), with an mAP@0.5 of 88.1%, further verifying its superiority on this dataset. These results highlight the model’s broad applicability and generalization capabilities across various underwater datasets. Full article
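The MPDIoU loss mentioned above is "designed around the vertex distance": a commonly published formulation penalizes IoU by the squared distances between corresponding box corners, normalized by the image diagonal. The sketch below assumes that formulation; the exact variant used in the paper may differ.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU-style score for axis-aligned boxes (x1, y1, x2, y2):
    IoU minus normalized squared distances between the top-left and
    bottom-right corners. Hypothetical re-implementation, not the paper's."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared corner distances, normalized by the squared image diagonal.
    d_tl = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d_br = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    norm = img_w ** 2 + img_h ** 2
    return iou - d_tl / norm - d_br / norm
```

Perfectly aligned boxes score 1.0; misaligned corners pull the score below plain IoU, which is what sharpens boundary localization when `1 - mpdiou` is used as a loss.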

17 pages, 10284 KiB  
Article
Filling the Gaps: Using Synthetic Low-Altitude Aerial Images to Increase Operational Design Domain Coverage
by Joachim Rüter, Theresa Maienschein, Sebastian Schirmer, Simon Schopferer and Christoph Torens
Sensors 2024, 24(4), 1144; https://doi.org/10.3390/s24041144 - 9 Feb 2024
Cited by 1 | Viewed by 1198
Abstract
A key necessity for the safe and autonomous flight of Unmanned Aircraft Systems (UAS) is their reliable perception of the environment, for example, to assess the safety of a landing site. For visual perception, Machine Learning (ML) provides state-of-the-art results in terms of performance, but the path to aviation certification has yet to be determined as current regulation and standard documents are not applicable to ML-based components due to their data-defined properties. However, the European Union Aviation Safety Agency (EASA) published the first usable guidance documents that take ML-specific challenges, such as data management and learning assurance, into account. In this paper, an important concept in this context is addressed, namely the Operational Design Domain (ODD) that defines the limitations under which a given ML-based system is designed to operate and function correctly. We investigated whether synthetic data can be used to complement a real-world training dataset which does not cover the whole ODD of an ML-based system component for visual object detection. The use-case in focus is the detection of humans on the ground to assess the safety of landing sites. Synthetic data are generated using the methods proposed in the EASA documents, namely augmentations, stitching and simulation environments. These data are used to augment a real-world dataset to increase ODD coverage during the training of Faster R-CNN object detection models. Our results give insights into the generation techniques and usefulness of synthetic data in the context of increasing ODD coverage. They indicate that the different types of synthetic images vary in their suitability but that augmentations seem to be particularly promising when there is not enough real-world data to cover the whole ODD. By doing so, our results contribute towards the adoption of ML technology in aviation and the reduction of data requirements for ML perception systems. Full article
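Of the three generation routes named above (augmentations, stitching, simulation environments), augmentation is the simplest to illustrate. The toy sketch below flips a grayscale image horizontally and jitters its brightness; it stands in for the paper's pipeline only in spirit, and the function and parameter names are invented for illustration.

```python
import random

def augment(img, brightness=0.2, rng=None):
    """Toy augmentation of a grayscale image (nested list of 0-255 pixels):
    horizontal flip plus a random brightness scale in [1-b, 1+b].
    Stitching and simulation-based generation are not shown here."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    scale = 1.0 + rng.uniform(-brightness, brightness)
    return [
        [min(255, max(0, int(px * scale))) for px in reversed(row)]
        for row in img
    ]
```

Applied to real frames, such transforms can stretch a dataset toward underrepresented corners of the ODD, which is the effect the study measures for Faster R-CNN training.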

34 pages, 1871 KiB  
Article
Application of Deep Reinforcement Learning to UAV Swarming for Ground Surveillance
by Raúl Arranz, David Carramiñana, Gonzalo de Miguel, Juan A. Besada and Ana M. Bernardos
Sensors 2023, 23(21), 8766; https://doi.org/10.3390/s23218766 - 27 Oct 2023
Cited by 6 | Viewed by 4455
Abstract
This paper summarizes in depth the state of the art of aerial swarms, covering both classical and new reinforcement-learning-based approaches for their management. Then, it proposes a hybrid AI system, integrating deep reinforcement learning in a multi-agent centralized swarm architecture. The proposed system is tailored to perform surveillance of a specific area, searching and tracking ground targets, for security and law enforcement applications. The swarm is governed by a central swarm controller responsible for distributing different search and tracking tasks among the cooperating UAVs. Each UAV agent is then controlled by a collection of cooperative sub-agents, whose behaviors have been trained using different deep reinforcement learning models, tailored for the different task types proposed by the swarm controller. More specifically, proximal policy optimization (PPO) algorithms were used to train the agents’ behavior. In addition, several metrics to assess the performance of the swarm in this application were defined. The results obtained through simulation show that our system searches the operation area effectively, acquires the targets in a reasonable time, and is capable of tracking them continuously and consistently. Full article
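To make the central swarm controller's role concrete: it must distribute search and tracking tasks among cooperating UAVs. The sketch below is a deliberately naive rule-based allocator (greedy nearest assignment), a stand-in for the learned sub-agent behaviors the paper trains with PPO; all names here are hypothetical.

```python
import math

def assign_tasks(uav_positions, task_positions):
    """Greedily pair each task with the nearest still-unassigned UAV.
    Returns {task_index: uav_index}. A rule-based placeholder for the
    paper's learned allocation, shown only to illustrate the interface."""
    free = set(range(len(uav_positions)))
    assignment = {}
    for t, target in enumerate(task_positions):
        if not free:
            break  # more tasks than UAVs: leave the rest unassigned
        best = min(free, key=lambda u: math.dist(uav_positions[u], target))
        assignment[t] = best
        free.discard(best)
    return assignment
```

A learned policy can beat this rule by anticipating target motion and balancing search coverage against tracking continuity, which is the gap the deep reinforcement learning approach targets.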

16 pages, 15333 KiB  
Article
YOLOv5 Drone Detection Using Multimodal Data Registered by the Vicon System
by Wojciech Lindenheim-Locher, Adam Świtoński, Tomasz Krzeszowski, Grzegorz Paleta, Piotr Hasiec, Henryk Josiński, Marcin Paszkuta, Konrad Wojciechowski and Jakub Rosner
Sensors 2023, 23(14), 6396; https://doi.org/10.3390/s23146396 - 14 Jul 2023
Cited by 6 | Viewed by 2686
Abstract
This work is focused on the preliminary stage of the 3D drone tracking challenge, namely the precise detection of drones on images obtained from a synchronized multi-camera system. The YOLOv5 deep network with different input resolutions is trained and tested on the basis of real, multimodal data containing synchronized video sequences and precise motion capture data as a ground truth reference. The bounding boxes are determined based on the 3D position and orientation of an asymmetric cross attached to the top of the tracked object with known translation to the object’s center. The arms of the cross are identified by the markers registered by motion capture acquisition. Besides the classical mean average precision (mAP), a measure more adequate in the evaluation of detection performance in 3D tracking is proposed, namely the average distance between the centroids of matched references and detected drones, including false positive and false negative ratios. Moreover, the videos generated in the AirSim simulation platform were taken into account in both the training and testing stages. Full article
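The evaluation measure proposed above (average distance between centroids of matched references and detections, with false positive and false negative counts) can be sketched as follows. The abstract does not specify the matching strategy, so greedy distance-sorted matching and the distance gate are assumptions.

```python
import math

def centroid(box):
    """Center point of an axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def centroid_distance_metric(refs, dets, max_dist=50.0):
    """Mean centroid distance over greedily matched ref/detection pairs,
    plus unmatched counts (false negatives, false positives). A sketch of
    the evaluation idea in the abstract, not the authors' code."""
    pairs = sorted(
        (math.dist(centroid(r), centroid(d)), i, j)
        for i, r in enumerate(refs)
        for j, d in enumerate(dets)
    )
    used_r, used_d, dists = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if i in used_r or j in used_d:
            continue
        used_r.add(i); used_d.add(j); dists.append(dist)
    mean = sum(dists) / len(dists) if dists else float("nan")
    return mean, len(refs) - len(used_r), len(dets) - len(used_d)
```

Unlike mAP, this reports localization error in pixels directly, which maps more naturally onto downstream 3D triangulation accuracy.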

22 pages, 27783 KiB  
Article
Joint Fusion and Detection via Deep Learning in UAV-Borne Multispectral Sensing of Scatterable Landmine
by Zhongze Qiu, Hangfu Guo, Jun Hu, Hejun Jiang and Chaopeng Luo
Sensors 2023, 23(12), 5693; https://doi.org/10.3390/s23125693 - 18 Jun 2023
Cited by 4 | Viewed by 3301
Abstract
Compared with traditional mine detection methods, UAV-based measures are more suitable for the rapid detection of large areas of scatterable landmines, and a multispectral fusion strategy based on a deep learning model is proposed to facilitate mine detection. Using the UAV-borne multispectral cruise platform, we establish a multispectral dataset of scatterable mines, with mine-spreading areas of the ground vegetation considered. In order to achieve the robust detection of occluded landmines, first, we employ an active learning strategy to refine the labeling of the multispectral dataset. Then, we propose an image fusion architecture driven by detection, in which we use YOLOv5 for the detection part, to improve the detection performance instructively while enhancing the quality of the fused image. Specifically, a simple and lightweight fusion network is designed to sufficiently aggregate texture details and semantic information of the source images and obtain a higher fusion speed. Moreover, we leverage detection loss as well as a joint-training algorithm to allow the semantic information to dynamically flow back into the fusion network. Extensive qualitative and quantitative experiments demonstrate that the detection-driven fusion (DDF) that we propose can effectively increase the recall rate, especially for occluded landmines, and verify the feasibility of multispectral data through reasonable processing. Full article

15 pages, 968 KiB  
Article
Dynamic Weighting Network for Person Re-Identification
by Guang Li, Peng Liu, Xiaofan Cao and Chunguang Liu
Sensors 2023, 23(12), 5579; https://doi.org/10.3390/s23125579 - 14 Jun 2023
Cited by 1 | Viewed by 1471
Abstract
Recently, hybrid Convolution-Transformer architectures have become popular due to their ability to capture both local and global image features and the advantage of lower computational cost over pure Transformer models. However, directly embedding a Transformer can result in the loss of convolution-based features, particularly fine-grained features. Therefore, using these architectures as the backbone of a re-identification task is not an effective approach. To address this challenge, we propose a feature fusion gate unit that dynamically adjusts the ratio of local and global features. The feature fusion gate unit fuses the convolution and self-attentive branches of the network with dynamic parameters based on the input information. This unit can be integrated into different layers or multiple residual blocks, which will have varying effects on the accuracy of the model. Using feature fusion gate units, we propose a simple and portable model called the dynamic weighting network or DWNet, which supports two backbones, ResNet and OSNet, called DWNet-R and DWNet-O, respectively. DWNet significantly improves re-identification performance over the original baseline, while maintaining reasonable computational consumption and number of parameters. Finally, our DWNet-R achieves an mAP of 87.53%, 79.18%, 50.03%, on the Market1501, DukeMTMC-reID, and MSMT17 datasets. Our DWNet-O achieves an mAP of 86.83%, 78.68%, 55.66%, on the Market1501, DukeMTMC-reID, and MSMT17 datasets. Full article
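The feature fusion gate unit above blends convolutional (local) and self-attention (global) features with input-dependent weights. A toy scalar version of that idea, with per-channel sigmoid gates, might look like the following; the real unit operates on feature maps with learned parameters, so this is only a shape of the mechanism.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fused(local_feats, global_feats, gate_logits):
    """Per-channel gated blend: out = g*local + (1-g)*global with
    g = sigmoid(logit). In the paper the gate is computed from the input
    itself; here the logits are passed in to keep the sketch minimal."""
    return [
        sigmoid(z) * l + (1.0 - sigmoid(z)) * g
        for l, g, z in zip(local_feats, global_feats, gate_logits)
    ]
```

A logit near zero mixes the two branches evenly, while a large positive logit lets the convolutional branch dominate, which is how the network can favor fine-grained features where re-identification needs them.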

14 pages, 3355 KiB  
Article
Research of Maritime Object Detection Method in Foggy Environment Based on Improved Model SRC-YOLO
by Yihong Zhang, Hang Ge, Qin Lin, Ming Zhang and Qiantao Sun
Sensors 2022, 22(20), 7786; https://doi.org/10.3390/s22207786 - 13 Oct 2022
Cited by 5 | Viewed by 2208
Abstract
An improved maritime object detection algorithm, SRC-YOLO, based on the YOLOv4-tiny, is proposed in the foggy environment to address the issues of false detection, missed detection, and low detection accuracy in complicated situations. To confirm the model’s validity, an ocean dataset containing various concentrations of haze, target angles, and sizes was produced for the research. Firstly, the Single Scale Retinex (SSR) algorithm was applied to preprocess the dataset to reduce the interference of the complex scenes on the ocean. Secondly, in order to increase the model’s receptive field, we employed a modified Receptive Field Block (RFB) module in place of the standard convolution in the Neck part of the model. Finally, the Convolutional Block Attention Module (CBAM), which integrates channel and spatial information, was introduced to raise detection performance by expanding the network model’s attention to the context information in the feature map and the object location points. The experimental results demonstrate that the improved SRC-YOLO model effectively detects marine targets in foggy scenes by increasing the mean Average Precision (mAP) of detection results from 79.56% to 86.15%. Full article
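The Single Scale Retinex preprocessing above estimates illumination with a Gaussian surround and keeps the log-ratio of image to illumination, which suppresses haze. The 1-D sketch below substitutes a box filter for the Gaussian surround to stay dependency-free, so it is an approximation of the idea rather than the SSR used in the paper.

```python
import math

def box_smooth(signal, radius):
    """Moving-average illumination estimate (box filter, edge-clamped)."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius):i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def retinex_1d(signal, radius=2):
    """1-D Retinex-style enhancement: log(I) - log(smoothed I).
    SSR proper uses a 2-D Gaussian surround on each image channel."""
    smooth = box_smooth(signal, radius)
    return [math.log(s) - math.log(m) for s, m in zip(signal, smooth)]
```

Uniform regions (haze) map to values near zero while local contrast (object edges) survives, which is why the detector downstream sees cleaner inputs.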

12 pages, 2840 KiB  
Article
Towards Efficient Detection for Small Objects via Attention-Guided Detection Network and Data Augmentation
by Xiaobin Wang, Dekang Zhu and Ye Yan
Sensors 2022, 22(19), 7663; https://doi.org/10.3390/s22197663 - 9 Oct 2022
Cited by 17 | Viewed by 3460
Abstract
Small object detection has always been a difficult direction in the field of object detection, especially the detection of small objects in UAV aerial images. The images captured by UAVs have the characteristics of small objects and dense objects. In order to solve these two problems, this paper improves the performance of object detection from the aspects of data and network structure. In terms of data, the data augmentation strategy and image pyramid mechanism are mainly used. The data augmentation strategy adopts the method of image division, which can greatly increase the number of small objects, making it easier for the algorithm to be fully trained during the training process. Since the object is denser, the image pyramid mechanism is used. During the training process, the divided images are up-sampled into three different sizes, and then sent to three different detectors respectively. Finally, the detection results of the three detectors are fused to obtain the final detection results. The small object itself has few pixels and few features. In order to improve the detection performance, it is necessary to use context. This paper adds attention mechanism to the YOLOv5 network structure, while adding a detection head to the underlying feature map to make the network structure pay more attention to small objects. By using data augmentation and improved network structure, the detection performance of small objects can be significantly improved. The experiment in this paper is carried out on the VisDrone2019 dataset and DOTA dataset. Through experimental verification, our proposed method can significantly improve the performance of small object detection. Full article
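The image-division augmentation described above splits each high-resolution aerial frame into tiles, raising the relative size and per-sample count of small objects. A minimal sketch of the division step, on an image represented as a nested list of pixel rows (the representation and function name are illustrative):

```python
def split_image(img, rows, cols):
    """Divide an image (nested list of pixel rows) into rows*cols tiles,
    returned row-major. Any remainder pixels at the right/bottom edges
    are dropped in this simplified version."""
    h, w = len(img), len(img[0])
    th, tw = h // rows, w // cols
    return [
        [row[c * tw:(c + 1) * tw] for row in img[r * th:(r + 1) * th]]
        for r in range(rows)
        for c in range(cols)
    ]
```

Each tile then feeds the pyramid stage, where it is up-sampled to three sizes for the three detectors described in the abstract.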
