2.1. UAV Classification Based on ML Using RF Analysis
RF signals are captured and examined using RF-based devices in order to identify and recognize threats. The benefits of the RF-based detection method are that it operates day or night and in any kind of weather. Thus, compared with other current technologies, RF-based monitoring techniques have recently shown higher potential for the UAV communication system. In order to manage and operate the UAV utilizing RF signals, most UAVs are equipped with an onboard transmitter for data transfer.
UAVs can be detected and located from a considerable distance using RF information. To address the challenges of UAV detection and improve classification rates using RF signals, ML-based algorithms have shown excellent performance. The overview of the RF-based UAV detection and classification operation is illustrated in
Figure 3. In addition, a summary of recent related research on RF-based methods using ML for UAV detection and classification is shown in
Table 2. Furthermore, the dataset information of the current research on RF-based methods using ML for UAV detection and classification is shown in
Table 3.
In recent years, the use of RF-based UAV detection and classification has dramatically increased. In the state of the art, many works have been completed using RF technology for UAV detection and classification [
42,
57,
63,
64,
65]. A DL approach based on RF was proposed in [
63] to detect multiple UAVs. To achieve the detection and classification objectives, the authors suggested a supervised DL model. For RF signal preparation, they employed the short-term Fourier transform (STFT); this preprocessing step was largely responsible for the higher efficiency of their approach. The authors in [
64] introduced a model named RF-UAVNet, which was designed with a convolutional network for UAV tracking systems that used RF signals to recognize and classify the UAVs. In order to minimize the network dimensions and operational expense, the recommended setup uses clustered convolutional layer structures. This research took advantage of the publicly accessible dataset DroneRF [
57] for RF-based UAV detection techniques.
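As a concrete illustration of the STFT preprocessing step used in studies such as [63], the sketch below turns a raw 1-D capture into a log-magnitude spectrogram suitable as input to a CNN-style classifier. It uses `scipy.signal.stft` on a synthetic tone standing in for a real RF capture; the sampling rate, window length, and test signal are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy.signal import stft

def rf_spectrogram(signal, fs, nperseg=256):
    """Compute a log-magnitude STFT spectrogram of a capture.

    The 2-D output can be fed to a CNN-style classifier as an image.
    """
    f, t, Z = stft(signal, fs=fs, nperseg=nperseg)
    return f, t, 20 * np.log10(np.abs(Z) + 1e-12)

# Toy example: a 2.4 kHz tone sampled at 10 kHz stands in for an RF capture.
fs = 10_000
n = 4096
x = np.sin(2 * np.pi * 2400 * np.arange(n) / fs)
f, t, S = rf_spectrogram(x, fs)
peak_bin = S.mean(axis=1).argmax()
print(f"dominant frequency near {f[peak_bin]:.0f} Hz, spectrogram shape {S.shape}")
```

In a real pipeline the resulting time-frequency image, rather than the raw samples, is what the supervised model consumes.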
The authors in [
34] assessed the impact of real-world Bluetooth and Wi-Fi signal interference on UAV detection and classification by employing convolutional neural network (CNN) feature extraction and machine learning classifiers logistic regression and k-nearest neighbor (kNN). They used graphical representations in both the time and frequency domains to evaluate two-class, four-class, and ten-class flying mode classification.
In a separate study, the authors in [
35] proposed a drone detection system that detects drones and recognizes various drone types. They designed a network structure using multiple 1-dimensional layers of a sequential CNN to progressively learn the feature map of RF signals of different sizes obtained from drones. The suggested CNN model was trained using the DroneRF dataset, comprising three distinct drone RF signals along with background noise.
Another investigation by the authors in [
36] involved comparing three distinct classification methods to identify the presence of airborne users in a network. These algorithms utilized standard long-term evolution (LTE) metrics from the user equipment as input and were evaluated using data collected from a car and a drone in flight equipped with mobile phones. The results were analyzed, emphasizing the advantages and disadvantages of each approach concerning various use cases and the trade-off between sensitivity and specificity.
Furthermore, in [
37], the researchers explored the use of artificial neural networks (ANNs) for feature extraction and classification from RF signals for UAV identification. This study distinguished itself by employing the UAV communication signal as an identification marker. Moreover, the research creatively extracted the slope, kurtosis, and skewness of UAV signals in the frequency domain. Additionally, [
38] proposed the detection and classification of micro-UAVs using machine learning based on RF fingerprints of the signals transmitted from the controller to the micro-UAV. During the detection phase, raw signals were divided into frames and converted into the wavelet domain to reduce data processing and eliminate bias from the signals. The existence of a UAV in each frame was detected using a naïve Bayes approach based on independently constructed Markov models for UAV and non-UAV classes.
The authors in [
39] described their efforts to locate drone controllers using RF signals. A signal spectrum monitor was used as an RF sensor array. From the sensor’s output, a CNN was trained to anticipate the drone controller’s bearing on the sensor. By positioning two or more sensors at suitable distances apart, it became possible to determine the controllers’ positions using these bearings.
In [
40], the authors proposed a drone detection method aimed at creating a database for RF signals emitted by different drones operating in various flight modes. They considered multiple flight modes in simulations and utilized the RF database to develop algorithms that detect and identify drone intrusions. Three DNNs were employed to identify drone locations, types, and flight modes.
For recognizing and identifying UAVs based on their RF signature, [
41] suggested an end-to-end DL model. Unlike previous research, this study employed multiscale feature extraction methods without human intervention to extract enhanced features, aiding the model in achieving strong signal generalization capability and reducing computing time for decision-making.
The study in [
42] utilized a compressed sensing technique instead of the conventional sampling theorem for data sampling. The researchers employed a multichannel random demodulator to sample the signal and proposed a multistage DL-based method to detect and classify UAVs, capitalizing on variations in communication signals between drones and controllers under changing conditions. Additionally, the DroneRF dataset was utilized in [
42], where the UAV was first identified by a DNN and then further classified by a CNN model. Nevertheless, it was not feasible to take into account additional signals that appeared in the
GHz range when utilizing the DroneRF dataset [
65].
In [
43], the authors proposed a novel method based on RF signal analysis and multiple ML techniques for drone swarm characterization and identification. They provided an unsupervised strategy for drone swarm characterization using RF features extracted from the RF fingerprint through various frequency transforms. Unsupervised techniques like manifold approximation and projection (UMAP), t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), independent component analysis (ICA), and clustering algorithms such as X-means, K-means, and mean shift were suggested to minimize input data dimension.
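A minimal version of the unsupervised characterization in [43], reducing RF fingerprint dimensionality and then clustering, can be sketched with PCA and K-means from scikit-learn (the paper also considers UMAP, t-SNE, ICA, X-means, and mean shift). The synthetic "fingerprints" below are illustrative assumptions; real inputs would be frequency-transform features of captured bursts.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic "RF fingerprints": 3 emitter groups, each a distinct spectral
# template plus noise, standing in for frequency-transform features.
templates = rng.standard_normal((3, 128))
X = np.vstack([tpl + 0.2 * rng.standard_normal((50, 128)) for tpl in templates])

Xp = PCA(n_components=2).fit_transform(X)                       # 128-D -> 2-D
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xp)

# Each true group should map (almost) entirely to one cluster.
purity = sum(max(np.bincount(labels[i*50:(i+1)*50], minlength=3))
             for i in range(3)) / 150
print(f"cluster purity: {purity:.2f}")
```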
A study on RF-based UAV detection and classification was conducted in [
8], where the authors considered the interference of other wireless transmissions such as Bluetooth and Wi-Fi. They extracted and characterized the RF signals using wavelet scattering transform (WST) and continuous wavelet transform (CWT). The signal was classified and identified taking into account both transient and stable phases. In order to examine the effectiveness of coefficient-based approaches (CWT and WST), they also executed several image-based methods for extracting features. Using PCA in conjunction with several ML models, including support vector machine (SVM), KNNs, and ensemble, they completed classification and detection activities with varying degrees of noise.
In [
48], the authors demonstrated the use of Markov-based naïve Bayes ML approaches for the identification and classification of UAVs using numerous raw RF signal fingerprints from various UAV controllers under varying SNR levels. To mitigate noise sensitivity and remain robust to different modulation approaches, the classification specifically relied on the energy transient signal and processed it statistically. This approach avoids potential delays in identifying the transient signal, particularly in low-SNR conditions, due to its lower processing cost and independence from the time domain. Several ML techniques, such as discriminant analysis (DA), NN, KNN classification, and SVM, were trained on the feature sets for UAV classification and detection.
In addition, low, slow, and small UAVs (LSSUAVs) operating in the
GHz frequency range can be detected using Hash Fingerprint (HF) features with distance-based support vector data description (SVDD), according to a proposal provided in [
66]. The system first identified the primary signal's starting point, generated envelope signals, and then extracted the envelopes from the signals. The HF was then created as a feature to train the SVDD. To evaluate the system, the authors gathered a customized dataset. The outcomes showed that the system can identify and locate LSSUAV signals within an interior setting. Nevertheless, the system efficiency decreased when additive white Gaussian noise (AWGN) was added.
A framework for UAV detection based on auxiliary classifier Wasserstein generative adversarial networks (AC-WGANs) was presented in [
67]. The model leverages RF fingerprints from UAV radios as input features. The generative adversarial network (GAN), a popular model for image synthesis and analysis, was adapted for UAV detection and multiclassification by utilizing and enhancing the GAN discriminator model. After the intensity envelope had been used to shorten the initial signal, PCA was applied to further reduce the dimensionality of the RF signature for feature extraction. Four UAVs, one Wi-Fi device, and a randomly selected signal taken from the surroundings were used in their test setup. AC-WGAN was able to achieve a
accuracy rate of UAV detection.
A DNN model was trained using the frequency parts of the UAV RF signals that were extracted using discrete Fourier transform (DFT) in [
40]. In the proposed work, three UAVs were used for the simulation. The UAV detection and classification achieved a precision of
. The authors did not take into account additional ISM devices that operate in the identical
GHz frequency range, except for UAV-flight controller communication. Furthermore, the efficacy of the framework at different signal-to-noise ratios (SNRs) was not evaluated. Additionally, the time required for inference of the classification algorithm was not considered.
2.1.1. Challenges and Solutions of RF-Based UAV Detection and Classification Using ML
RF signal variability: Diverse RF signal characteristics due to variations in UAV models, communication protocols, and flight dynamics. Develop robust feature extraction methods capable of capturing and analyzing different RF signal patterns across various UAVs.
Background noise and interference: Environmental noise and interference affecting the accuracy of RF-based detection systems. Investigate advanced signal processing algorithms and adaptive filtering to mitigate the impact of background noise on RF signal clarity.
Signal strength and distance: RF signal attenuation over long distances limits the effective range of UAV detection systems. Explore novel antenna designs and signal amplification techniques to improve signal sensitivity and extend detection range.
UAV classification: Accurately distinguishing between different UAV types based on similar RF signal features. Implement advanced machine learning models, such as deep neural networks, for fine-grained classification of UAVs using RF signatures.
Real-time processing: Processing RF data in real time for prompt detection and response. Optimize machine learning algorithms and hardware configurations, possibly leveraging edge computing techniques, to enable rapid analysis of RF signals.
Security and adversarial attacks: Vulnerability of RF-based systems to adversarial attacks and signal spoofing. Implement robust encryption and authentication mechanisms to secure RF signals and prevent malicious intrusions.
2.1.2. Future Directions of RF-Based UAV Detection and Classification Using ML
Advanced signal processing techniques: Explore advanced signal processing methods, such as compressed sensing and adaptive filtering, to enhance the extraction of discriminative features from RF signals for more precise UAV classification [
68].
Multisensor fusion for improved accuracy: Investigate the fusion of RF data with other sensor modalities (e.g., optical or acoustic) to create more comprehensive and accurate UAV detection systems capable of handling diverse environmental conditions [
69].
Dynamic adaptation and self-learning algorithms: Develop machine learning models with adaptive learning capabilities, enabling continuous improvement and adaptation to evolving UAV signal variations, environmental changes, and new UAV models [
17].
Real-time edge computing for swift decision-making: Explore the integration of edge computing techniques with RF-based UAV detection systems to achieve faster processing speeds, enabling real-time decision-making in dynamic and resource-constrained environments [
70].
Robustness against adversarial attacks: Investigate novel approaches to fortify RF-based UAV detection systems against adversarial attacks, including intrusion detection mechanisms and cryptographic protocols [
71].
Standardization and interoperability: Collaborate across academia, industry, and regulatory bodies to establish standardized protocols and interoperable frameworks for RF-based UAV detection systems, facilitating compatibility and integration across different platforms [
72].
2.2. UAV Classification Based on ML Using Visual Data Analysis
Due to the intricacy of radar technology and the quick advancements in computer vision, several researchers are considering the employment of visual information (images or videos) for UAV detection and classification. Because visual images have a high resolution, they are frequently utilized for semantic segmentation and object recognition. However, using visible images also comes with its own set of issues, like shifting light, obscured areas, and a cluttered background. In addition, there are usually difficulties involved in carrying out this operation in visible photographs, such as the UAV's small dimensions, possible confusion with birds, the presence of concealed regions, and busy backgrounds. For these reasons, an effective and thorough detection technique must be used. Deep CNNs have recently made significant strides, and the introduction of better technology allows for faster and more accurate object detection using visual input, especially for visual-based UAV detection and classification. The basic detection and classification of UAVs based on image or video (visual data) using the ML algorithm is demonstrated in
Figure 4. The summary of related research on visual-based methods using ML for UAV detection and classification is shown in
Table 4. Furthermore, the dataset information of the current research on visual-based methods using ML for UAV detection and classification is shown in
Table 5.
The recent advancements in ML models have significantly enhanced the capability to detect and classify UAVs in both secure areas and public environments. With the emergence of more powerful deep CNNs and the availability of superior equipment, the process of identifying objects through visual input can now be accomplished with greater speed and accuracy [
73]. DL networks are specifically designed for instantaneous UAV recognition, distinguishing them from traditional UAV detection systems. These networks classify inputs into various UAV classes, identifying the category, position, as well as the presence or absence of different UAV types [
99]. CNNs are among the most significant NN models for image detection and categorization. The input information for this network passes through the convolutional layers. Next, the network’s kernel is used to execute the convolution function in order to detect commonalities. Finally, the generated feature map is used for feature extraction [
100]. CNNs come in several varieties, including region-based CNN (R-CNN) [
101], spatial pyramid pooling network (SPPNet) [
102], and Faster RCNN [
103]. Convolutional procedures are applied in these networks, enabling the extraction of additional information and improving both speed and precision in object detection compared to traditional techniques. Practically, the extracted features serve as object descriptors for recognition. Region proposal networks (RPNs) are employed in these networks to initially define suggested regions [
104]. Following the application of convolutional filters to such locations, the convolutional process yields the extracted features [
103]. Additional DL methods, including you only look once (YOLO) [
105] and SSD (single-shot multibox detector) [
106], often examine the image, leading to faster and more accurate item detection than simple techniques [
105]. In recent times, the detection of UAVs has emerged as a promising field within the research community. Numerous studies have been conducted for UAV detection and classification [
73,
74,
75,
76,
77,
78,
79,
82,
83].
Several obstacles complicate visual detection; in particular, UAVs' tiny dimensions in various images can be too small for the YOLOv4 DL network to detect [
74]. In this investigation [
74], the network was unable to identify the UAV in a few of the challenging images. These difficulties arise because, owing to their small size, some drones can be mistaken for birds and may appear in concealed or cluttered environments. To address the challenge of identifying flying UAVs more effectively, the YOLOv4 DL network underwent significant changes; the primary innovation in this study lies in the modification of the network design. Additionally, this research successfully identified four UAV types: multirotor, fixed-wing, helicopter, and VTOL (vertical take-off and landing) aircraft.
By employing the YOLOv3 DL system, an autonomous UAV detection system was implemented in [
75]. One of the benefits of this study is its affordability due to the system’s low requirement for GPU memory. The study successfully detected tiny-sized UAVs operating at close range. However, a limitation of this research is the inability to accurately identify different types of UAVs.
The study in [
76] utilized CNNs, SVMs, and nearest-neighbor algorithms for UAV identification using fisheye cameras. The experimental results demonstrated that the efficiency of CNN, SVM, and nearest-neighbor algorithms was
,
, and
, respectively. The CNN classifier exhibited higher precision than the other classifiers operating under the same test settings. It is worth noting that this study did not consider different UAV types or various detection challenges; it focused solely on UAV detection.
Utilizing the YOLOv3 DL network, the study in [
77] successfully identified and categorized UAVs in RGB images, achieving a mean average precision (mAP) of
after completing 150 epochs. It is worth noting that the paper did not delve into the topic of differentiating UAVs from birds; rather, it focused solely on the identification of UAVs at varying distances. Furthermore, the study in [
78] addressed low-altitude UAV detection using the YOLOv4 deep learning network. For performance comparison, the YOLOv4 detection results were contrasted with those of the YOLOv3 and SSD models. The investigation revealed that the YOLOv4 network outperformed both the YOLOv3 and SSD networks in terms of mAP and detection speed. In the simulation, YOLOv4 achieved an impressive
mAP in the detection, recognition, and identification of three different types of UAVs.
In [
79], the YOLOv3 DL network was employed to detect and track a UAV. The study utilized the NVIDIA Jetson TX2 for real-time UAV detection. Based on the findings, it can be concluded that the proposed YOLOv3 DL network achieved an
average confidence score and demonstrated an accuracy range of
to
for detecting UAVs of small, medium, and large sizes, respectively. In addition, employing four DL network architectures in conjunction with a dataset comprising both visual and thermal images, the study outlined in [
80] successfully detected and classified UAVs. This investigation harnessed the power of DL networks such as DETR (DEtection TRansformer), SSD, YOLOv3, and Faster RCNN models for superior detection performance. The results demonstrated that even diminutive UAVs could be reliably detected from a considerable distance by all the networks under scrutiny. Notably, YOLOv3 exhibited the highest overall accuracy, boasting an impressive mAP of up to
, while Faster RCNN consistently demonstrated the highest mAP for detecting tiny UAVs, peaking at
.
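The mAP figures quoted throughout these detection studies are computed from the intersection-over-union (IoU) between predicted and ground-truth boxes; a prediction typically counts as a true positive when IoU exceeds a threshold such as 0.5. A minimal IoU implementation:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# A predicted UAV box overlapping a ground-truth box: intersection 50,
# union 150, so IoU = 1/3 — below a 0.5 threshold, this would be a miss.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```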
In [
81], the authors utilized YOLOv4 to develop an automated UAV detection technology. They evaluated the algorithm on two distinct types of UAV recordings, employing a dataset containing images of both drones and birds for UAV identification. The findings from this study for the detection of two different kinds of multirotor UAVs are as follows: precision of
, recall of
, F1-score of
, and mAP of
. In [
82], the authors proposed an extension of the single-shot object detector CNN model, known as YOLO. They introduced a regression training approach for UAV identification in the latest version, YOLOv2, using fine-tuning. With the use of an artificial dataset, they were able to achieve similar accuracy and recall values, both at
, in their technique evaluation.
In order to detect the UAVs from video data, the authors in [
83] examined a variety of pre-trained CNN models, such as Zeiler and Fergus (ZF) and VGG16 combined with the Faster R-CNN model. To make up for the absence of a large enough dataset and to guarantee convergence throughout the model’s training process, they employed the VGG16 and the ZF model as transfer learning models. The Nvidia Quadro P6000 GPU was used for the training, and a batch size of 64 was used with a fixed learning rate of
. They used the Bird vs. UAV dataset, which consists of five MPEG4-coded videos with a total of 2727 frames and
pixel resolution, recorded during various sessions.
The study in [
87] proposed a novel DL-based technique for effectively detecting and identifying two different types of drones and distinguishing them from birds. When the suggested method was evaluated using a pre-existing image dataset, it outperformed the detection systems currently utilized in the existing literature. This distinction matters because, due to their similar appearance and behavior, drones and birds are often mistaken for each other. The proposed technique can discern and differentiate between two varieties of drones, distinguishing them from birds, and can determine the presence of drones in a given location.
To detect small UAVs, [
88] utilized various iterations of state-of-the-art object detection models (like YOLO models) using computer vision and DL techniques. They proposed different image-processing approaches to enhance the accuracy of tiny UAV detection, resulting in significant performance gains.
2.2.1. Challenges and Solutions of Visual Data-Based UAV Detection and Classification Using ML
Variability in visual data: Visual data captured by cameras vary due to factors like lighting conditions, weather, angles, and distances, making consistent detection and classification challenging. Employ robust preprocessing techniques (e.g., normalization and augmentation) to standardize and enhance visual data quality.
Limited annotated datasets: The lack of diverse and well-annotated datasets specific to UAVs hampers the training of accurate ML models. Develop and curate comprehensive datasets encompassing various UAV types and scenarios for effective model training.
Real-time processing: Processing visual data in real time for swift and accurate UAV detection and classification. Optimize algorithms and hardware configurations to ensure real-time processing capabilities, potentially leveraging GPU acceleration or edge computing.
Scale and complexity: Scaling detection and classification algorithms to handle complex scenes, multiple UAVs, or crowded environments. Explore advanced DL architectures capable of handling complex visual scenes for improved detection and classification accuracy.
Adaptability to environmental changes: Adapting to environmental changes (e.g., varying weather conditions) affecting visual data quality and system performance. Develop adaptive algorithms capable of adjusting to environmental variations for robust and reliable detection.
2.2.2. Future Directions of Visual Data-Based UAV Detection and Classification Using ML
Multimodal integration: Integrate visual data with other sensor modalities (e.g., RF or LiDAR) for more comprehensive and reliable UAV detection systems [
107].
Semantic understanding and contextual information: Incorporate semantic understanding and contextual information in visual analysis to improve classification accuracy [
108,
109].
Ethical and privacy concerns: Address privacy considerations by implementing privacy-preserving techniques without compromising detection accuracy [
110].
Interpretability and explainability: Develop methods for explaining and interpreting model decisions, enhancing trust and transparency in visual-based UAV detection systems [
111].
2.3. UAV Classification Based on ML Using Acoustic Signal
UAVs emit a distinctive buzzing sound during flight, which can be captured by acoustic sensors and subjected to various analyses to establish a unique audio signature for each UAV. The capability to identify a UAV based on its auditory fingerprint, or even determine its specific type, would be highly valuable.
Figure 5 illustrates an example of a machine learning-based approach for UAV detection and classification using the acoustic method. Furthermore,
Table 6 provides a summary of related research on acoustic-based methods employing machine learning for UAV detection and classification. In addition, the dataset information of the current research on acoustic-based methods using ML for UAV detection and classification is shown in
Table 7. In the realm of audio-based UAV identification, DL techniques are commonly employed to extract features and achieve optimal UAV detection performance. Recent studies [
112,
113,
114,
115,
116,
117,
118] also demonstrated the efficacy of DL models in extracting characteristics from UAV audio signals for UAV identification.
The authors in [
119] created spectrograms from audio samples and fed them into DL models. The system extracted various characteristics from these spectrograms and used them to train the DL models. Additionally, the authors in [
113] employed an STFT to convert the audio signal into the Mel spectrum, creating a visual representation. This image was then input into a specifically designed lightweight CNN (LWCNN) for identifying signal attributes and UAV detection.
To categorize the auditory signals as suggestive of UAV activity or not, the authors in [
112] employed log-Mel spectrograms and Mel frequency cepstral coefficients (MFCCs) as inputs to a CNN model. Additionally, for amateur UAV identification, the authors in [
115] suggested a method that combines ML techniques with acoustic inputs. Nevertheless, the distinction between things that can be mistaken for ambient noise and other UAVs was not taken into account in their investigation.
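The log-Mel/MFCC front end used in [112], and widely elsewhere in this section, can be sketched from scratch with NumPy and SciPy: STFT, triangular mel filterbank, logarithm, then DCT. The filterbank construction below is a simplified textbook version, and the parameters and test tone are illustrative assumptions rather than any author's configuration.

```python
import numpy as np
from scipy.signal import stft
from scipy.fft import dct

def mel_filterbank(n_mels, n_fft, fs):
    """Triangular filters spaced on the mel scale (simplified)."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv(np.linspace(mel(0), mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)   # rising edge
        fb[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)   # falling edge
    return fb

def mfcc(signal, fs, n_mels=26, n_coeffs=13, n_fft=512):
    """Log-mel spectrogram followed by a DCT: one MFCC vector per STFT frame."""
    _, _, Z = stft(signal, fs=fs, nperseg=n_fft)
    power = np.abs(Z) ** 2
    logmel = np.log(mel_filterbank(n_mels, n_fft, fs) @ power + 1e-10)
    return dct(logmel, axis=0, norm='ortho')[:n_coeffs]

fs = 16_000
buzz = np.sin(2 * np.pi * 180 * np.arange(fs) / fs)  # 1 s of a rotor-like 180 Hz hum
features = mfcc(buzz, fs)
print(features.shape)  # (n_coeffs, n_frames)
```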
A KNN-based method and Fast Fourier Transform (FFT) were presented in the study [
120] for UAV detection using auditory inputs. Using SVM and KNN on the auditory inputs, the signals were classified to determine whether an amateur UAV was present. An amateur UAV is detected based on the similarity between the acquired spectral images; nonetheless, the precision of this technique was only up to
. In order to discriminate between the sounds of items such as UAVs, birds, aircraft, and storms, the authors in [
121] suggested an ML-based UAV identification system. The MFCC and linear predictive cepstral coefficients (LPCC) feature extraction techniques were used to extract the required characteristics from UAV sound. Then, SVM with different kernels was used to precisely identify these sounds after feature extraction. The findings of the experiment confirmed that the SVM cubic kernel with MFCC performed better for UAV identification than the LPCC approach, with an accuracy of about
.
The authors in [
114] proposed a method for identifying the presence of a UAV within a 150-m radius. They suggested employing classification techniques such as the Gaussian mixture model (GMM), CNN, and RNN. To address the scarcity of acoustic data from UAV flights, the authors recommended creating datasets by blending UAV sounds with other ambient noises. One intriguing aspect of their research involves the use of diverse UAV models for training and evaluating the classifiers. Their findings revealed that the RNN classifier exhibited the highest performance at
, followed by the GMM model at
, and the CNN model at
. However, in scenarios involving unidentified information, the accuracy of all the predictors experienced a significant drop.
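Of the three classifiers compared in [114], the GMM is the simplest to sketch: fit one mixture per class on labeled acoustic feature frames and classify new frames by the higher log-likelihood. The sketch below uses scikit-learn's GaussianMixture on synthetic 4-D features; the data, mixture sizes, and feature dimensionality are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Toy acoustic feature vectors (e.g., per-frame MFCCs): "UAV" frames cluster
# around two modes, "ambient" frames around another.
uav = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(5, 1, (100, 4))])
ambient = rng.normal(-5, 1, (200, 4))

# One GMM per class; classify by the higher log-likelihood.
gmm_uav = GaussianMixture(n_components=2, random_state=0).fit(uav)
gmm_amb = GaussianMixture(n_components=1, random_state=0).fit(ambient)

test = np.vstack([rng.normal(5, 1, (50, 4)), rng.normal(-5, 1, (50, 4))])
pred = (gmm_uav.score_samples(test) > gmm_amb.score_samples(test)).astype(int)
truth = np.array([1] * 50 + [0] * 50)
acc = (pred == truth).mean()
print(f"frame accuracy: {acc:.2f}")
```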
To produce 2-dimensional (2D) pictures from UAV audio data, the authors in [
122] suggested a normalized STFT for UAV detection. First, the audio stream was split into
overlapping 20 ms segments. The normalized STFT was then fed as input to a purpose-built CNN. Outdoor evaluations with hovering DJI Phantom 3 and Phantom 4 UAVs were included in the dataset, and
non-UAV frames and
UAV audio frames were present in the dataset.
In [
123], the authors provided a hybrid drone acoustic dataset, combining artificially generated drone audio samples and recorded drone audio clips using GAN, a cutting-edge DL technique. They explored the efficacy of drone audio in conjunction with three distinct DL algorithms (CNN, RNN, and CRNN) for drone detection and identification and investigated the impact of their suggested hybrid dataset on drone detection.
The author proposed an effective drone detection technique based on the audio signature of drones in [
124]. To identify the optimal acoustic descriptor for drone identification, five distinct aspects were examined and contrasted. These included MFCC, Gammatone cepstral coefficients (GaCC), linear prediction coefficients, spectral roll-off, and zero-crossing rate as chosen features. Several SVM classifier models were trained and tested to assess the individual feature performance for effective drone identification. This was completed using 10-fold and
data holdout cross-validation procedures on a large heterogeneous database. The experimental outcome indicated that GaCCs were the most effective features for acoustic drone detection.
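The evaluation protocol described above, scoring a feature set with an SVM under k-fold cross-validation and a holdout split, can be sketched with scikit-learn. The features below are synthetic stand-ins, and the 30% holdout ratio is an assumption for illustration, not necessarily the ratio used in [124].

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(4)

# Stand-in acoustic feature vectors for two classes (e.g., drone vs. no drone).
X = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(1.5, 1, (100, 8))])
y = np.array([0] * 100 + [1] * 100)

# 10-fold cross-validation of one feature/classifier combination.
scores = cross_val_score(SVC(kernel='rbf'), X, y, cv=10)
print(f"10-fold mean accuracy: {scores.mean():.2f}")

# Holdout evaluation of the same combination (30% held out here).
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                      random_state=0, stratify=y)
holdout = SVC(kernel='rbf').fit(Xtr, ytr).score(Xte, yte)
print(f"holdout accuracy: {holdout:.2f}")
```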
In addition, AWGN was added to the dataset before conducting the testing. With a detection rate (DR) of and a false alarm rate (FAR) of , the best results were obtained when training the CNN network with 100 epochs and low SNR ranges.
In [
117], a method was proposed to optimize numerous acoustic nodes for extracting STFT characteristics and MFCC features. Subsequently, the extracted characteristics dataset was used to train two different types of supervised classifiers: CNN and SVM. In the case of the CNN model, the audio signal was encoded as 2D images, incorporating dropout and pooling layers alongside two fully connected and two convolution layers. In the initial instance, the UAV operated at a maximum range of 20 m, hovering between 0 and 10 m above the six-node acoustic setup. The Parrot AR Drone
was one of the UAVs that was put to the test. Numerous tests were carried out, and the outcomes show that the combination of SVM and STFT characteristics produced the best outcomes, as expressed in terms of color maps.
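The STFT step that these works rely on can be sketched in a few lines with `scipy.signal.stft`; the sampling rate, window length, and synthetic rotor tone below are assumptions for illustration only:

```python
import numpy as np
from scipy.signal import stft

# Synthetic 1 s microphone capture at 16 kHz: a 190 Hz rotor tone plus noise.
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 190 * t) + 0.2 * rng.standard_normal(fs)

# STFT -> time-frequency magnitude map (the "color map" fed to the classifier).
f, tt, Z = stft(x, fs=fs, nperseg=512, noverlap=256)
S = np.abs(Z)

# Each column of S is a spectral snapshot; flattening or pooling these
# columns yields feature vectors for an SVM, while the full 2D map can be
# treated as an image for a CNN.
peak_bin = f[np.argmax(S.mean(axis=1))]
print(f"spectrogram shape: {S.shape}, dominant frequency ~{peak_bin:.0f} Hz")
```

The dominant frequency recovered from the magnitude map lands on the rotor tone, which is what makes such color maps discriminative inputs for both SVM and CNN pipelines.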
In addition, the authors in [
16] explored the use of DL techniques for identifying UAVs using acoustic data. They employed Mel spectrograms as input features to train DNN models. Upon comparison with RNNs and convolutional recurrent neural networks (CRNNs), it was demonstrated that CNNs exhibited superior performance. Furthermore, an ensemble of DNNs was utilized to assess the final fusion techniques. This ensemble outperformed single models, with the weighted soft voting process yielding the highest average precision of
.
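The weighted soft-voting fusion described above can be sketched in a few lines of numpy; the per-model probability outputs and weights below are invented for illustration:

```python
import numpy as np

def weighted_soft_vote(probs, weights):
    """Average class-probability outputs of several models with weights.

    probs: (n_models, n_samples, n_classes) array of softmax outputs.
    weights: per-model weights (e.g., validation accuracy); normalized here.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    fused = np.tensordot(w, probs, axes=1)  # (n_samples, n_classes)
    return fused.argmax(axis=1)

# Toy outputs of three models (e.g., CNN, RNN, CRNN) on two samples, two classes.
probs = np.array([
    [[0.9, 0.1], [0.4, 0.6]],   # model 1
    [[0.6, 0.4], [0.3, 0.7]],   # model 2
    [[0.2, 0.8], [0.7, 0.3]],   # model 3 (weaker, gets a lower weight)
])
pred = weighted_soft_vote(probs, weights=[0.95, 0.90, 0.60])
print(pred)  # fused class decisions per sample
```

Because the two stronger models agree, the fused decision follows them even where the weaker model disagrees, which is precisely why weighted soft voting tends to outperform single models.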
In order to differentiate between the DJI Phantom 1 and 2 models, the authors in [
118] suggested KNN classifier techniques in conjunction with correlation analysis and spectrum images derived from the audio data. They collected ambient sound from a YouTube video, as well as various sound signals from both indoor settings (without propellers) and outdoor environments, including a drone-free outdoor setting. Each sound was recorded and subsequently divided into one-second frames. By utilizing image correlation methods, they achieved an accuracy of
, while the KNN classifier yielded an accuracy of
.
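A minimal sketch of template matching by image correlation, in the spirit of the method above, follows; the spectrum-image templates and class labels are synthetic assumptions, not the cited DJI Phantom data:

```python
import numpy as np

def correlation_score(img_a, img_b):
    # Normalized cross-correlation between two spectrum images.
    a = img_a.ravel() - img_a.mean()
    b = img_b.ravel() - img_b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_by_correlation(query, templates):
    # Pick the class whose reference spectrum image correlates best.
    scores = {label: correlation_score(query, ref)
              for label, ref in templates.items()}
    return max(scores, key=scores.get)

# Toy 32x32 "spectrum images": each class has energy in a different band.
rng = np.random.default_rng(2)
ref_p1 = np.zeros((32, 32)); ref_p1[5:8, :] = 1.0    # hypothetical model-1 template
ref_p2 = np.zeros((32, 32)); ref_p2[20:23, :] = 1.0  # hypothetical model-2 template
query = ref_p2 + 0.1 * rng.standard_normal((32, 32))  # noisy model-2 sample

label = classify_by_correlation(query, {"phantom1": ref_p1, "phantom2": ref_p2})
print(label)
```

A KNN classifier would instead store many labeled spectrum images and vote among the nearest neighbors of the query, which is the second approach compared in [118].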
2.3.1. Challenges and Solutions of Acoustic Signals-Based UAV Detection and Classification Using ML
Signal variability: Acoustic signals from UAVs can vary significantly based on factors like drone model, distance, environmental noise, and flight dynamics. Develop robust feature extraction methods to capture diverse acoustic signal patterns and account for variations in different UAV types.
Background noise and interference: Environmental noise and interference can obscure UAV acoustic signatures, affecting detection accuracy. Employ noise reduction algorithms and signal processing techniques to filter out background noise and enhance the signal-to-noise ratio for improved detection.
Distance and signal attenuation: Acoustic signals weaken with distance, limiting the effective detection range for UAVs. Explore advanced signal processing methods and sensor technologies to compensate for signal attenuation and improve detection range.
UAV classification from acoustic signatures: Accurately classifying different types of UAVs based on similar acoustic features. Implement machine learning models capable of discerning subtle acoustic signal variations for precise classification, possibly utilizing DL architectures.
Real-time processing: Achieving real-time processing of acoustic signals for timely detection and response. Optimize machine learning algorithms and hardware to enable faster processing speeds, potentially leveraging edge computing for quicker decision-making.
Environmental variations: Adaptability to changes in environmental conditions (e.g., wind and temperature) affecting acoustic signal characteristics. Develop adaptive algorithms capable of adjusting to environmental variations to ensure robust and reliable detection.
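As one concrete instance of the noise-reduction techniques suggested above, a basic spectral-subtraction sketch is shown below; the recording, the noise-only lead-in assumption, and all parameters are illustrative only:

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs, noise_seconds=0.5, floor=0.05):
    """Subtract an estimated noise magnitude spectrum from a noisy signal.

    Assumes the first `noise_seconds` of the recording contain noise only.
    """
    f, t, Z = stft(x, fs=fs, nperseg=512)
    n_noise = max(1, int(noise_seconds * fs / 256))  # hop = nperseg / 2
    noise_mag = np.abs(Z[:, :n_noise]).mean(axis=1, keepdims=True)
    mag = np.abs(Z)
    cleaned = np.maximum(mag - noise_mag, floor * mag)  # spectral floor
    Z_clean = cleaned * np.exp(1j * np.angle(Z))
    _, x_clean = istft(Z_clean, fs=fs, nperseg=512)
    return x_clean

fs = 16000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(3)
noise = 0.3 * rng.standard_normal(2 * fs)
tone = np.where(t > 0.5, np.sin(2 * np.pi * 200 * t), 0.0)  # "drone" appears at 0.5 s
x = tone + noise
y = spectral_subtraction(x, fs)

# Cleaned signal should track the tone more closely in the active region.
err_in = np.mean((x[fs:2 * fs] - tone[fs:2 * fs]) ** 2)
err_out = np.mean((y[fs:2 * fs] - tone[fs:2 * fs]) ** 2)
print(f"residual error after subtraction: {err_out:.3f} vs {err_in:.3f} before")
```

Removing the average noise spectrum before feature extraction raises the effective SNR of the drone tone, which directly addresses the background-noise challenge listed above.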
2.3.2. Future Directions of Acoustic Signals-Based UAV Detection and Classification Using ML
Sensor fusion and multimodal integration: Combine acoustic data with information from other sensors (e.g., visual or RF) to create more comprehensive and reliable UAV detection systems [
143].
Advanced machine learning techniques: Investigate advanced machine learning algorithms capable of handling complex acoustic data for improved classification accuracy [
123].
Privacy and ethical considerations: Address privacy concerns related to acoustic surveillance by implementing privacy-preserving methods without compromising detection accuracy [
110].
Robustness against adversarial attacks: Investigate methods to secure acoustic signals and machine learning models against potential adversarial attacks or spoofing attempts [
144].
2.4. UAV Classification Based on ML Using Radar
Radar-based techniques rely on measuring the radar cross-section (RCS) signature to recognize airborne objects through electromagnetic backscattering. Radar offers several advantages, such as wide coverage in both azimuth and elevation planes, extended detection ranges, and the ability to operate effectively in adverse conditions like fog, where visibility is poor. This sets it apart from other UAV detection methods such as acoustics and video camera (computer vision) strategies.
However, the identification of UAVs using RCS is more challenging compared to airplanes, primarily due to their smaller dimensions and the use of low-conductivity materials, resulting in a lower RCS. In [
145], it was found that the micro-Doppler signature (MDS) with time-domain analysis outperforms the Doppler-shift signature in enhancing the discrimination between clutter and targets. Recently, however, ML-based UAV detection and classification tasks using radar have received more attention because they overcome challenges in detection tasks and provide high-precision systems. An example detection scenario of the ML-based radar detection mechanism is shown in
Figure 6. Additionally, a summary of related research on radar-based methods using ML for UAV detection and classification is shown in
Table 8. Moreover, the dataset information of current research on radar-based methods using ML for UAV detection and classification is shown in
Table 9. To date, many works have been conducted based on ML techniques for the detection of UAVs using radar technology [
146,
147,
148,
149,
150,
151,
152,
153,
154,
155].
A non-cooperative UAV monitoring technique proposed in [
146] utilized decision tree (DT) and SVM classifiers, along with the inclusion of MDS for UAV detection and classification. In the case of a two-class instance, the one-stage DT achieves a true positive (TP) ratio of
with a corresponding false positive (FP) ratio of
. The two-stage DT achieves
and
for TP and FP, respectively, using identical training and test datasets. The authors in [
147] proposed a radar device-based detection strategy to protect structures from UAV attacks. The real Doppler RAD-DAR (radar with digital array receiver) dataset was developed by the microwave and radar departments. With a bandwidth of 500 MHz, the radar operates in the
GHz base frequency range using frequency-modulated continuous wave technology. The suggested CNN, named CNN-32DC, was varied in terms of the number of filters, combination layers, and feature extraction blocks; the configuration achieving the most accurate result was selected and then compared against various ML classification methods. CNN-32DC demonstrates higher accuracy than similar networks while requiring less computation time.
In [
148], the authors proposed a DL-based CNN model that incorporates MDSs, which are extensively employed in UAV detection applications. UAV radar returns and their associated micro-Doppler fingerprints are often complex-valued. However, CNNs typically neglect the phase component of these micro-Doppler signals, focusing solely on the magnitude, even though crucial information that could enhance the accuracy of UAV detection lies within this phase component. Therefore, this study introduced a unique complex-valued CNN that considers both the phase and magnitude components of radar returns. Furthermore, this research assessed the effectiveness of the proposed model using radar returns with varying sampling frequencies and durations. Additionally, a comparison was conducted regarding the model’s performance in the presence of noise. The complex-valued CNN model suggested in this study demonstrated the highest detection precision, achieving an impressive
accuracy, at a sampling rate of
Hz and a duration of
s. This indicates that the suggested model can effectively identify UAVs even when they appear on the radar for very brief periods.
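The key preprocessing idea, retaining the phase alongside the magnitude, can be sketched without the full complex-valued CNN; the array sizes and the toy phasor below are assumptions for illustration:

```python
import numpy as np

def to_two_channel(iq):
    """Stack magnitude and phase of a complex micro-Doppler map as channels.

    iq: 2D complex array (Doppler bins x time frames).
    Returns a (2, H, W) real array suitable as CNN input; a magnitude-only
    pipeline would keep just the first channel and discard the phase.
    """
    mag = np.abs(iq)
    phase = np.angle(iq)  # in [-pi, pi]; the information CNNs often drop
    return np.stack([mag, phase], axis=0)

# Toy complex radar return: a slowly rotating phasor plus Gaussian noise.
rng = np.random.default_rng(4)
h, w = 64, 128
phases = np.cumsum(rng.uniform(0, 0.1, size=(h, w)), axis=1)
iq = np.exp(1j * phases) + 0.05 * (rng.standard_normal((h, w))
                                   + 1j * rng.standard_normal((h, w)))
x = to_two_channel(iq)
print(x.shape, x.dtype)  # real-valued (2, 64, 128) input tensor
```

A complex-valued CNN goes further than this two-channel encoding by using complex weights and activations, but the sketch shows why the phase channel carries structure that a magnitude-only input throws away.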
According to the study in [
149], the authors proposed a novel lightweight DCNN model called “DIAT-RadSATNet” for the precise identification and classification of small unmanned aerial vehicles (SUAVs) based on micro-Doppler signatures. The design and testing of DIAT-RadSATNet utilized an open-field, continuous-wave (CW) radar-based dataset of MDSs recorded at 10 GHz. Equipped with 40 layers,
MB of memory,
G FLOPs,
million trainable parameters, and a calculation time complexity of
seconds, the DIAT-RadSATNet module remains lightweight. According to studies on unseen open-field datasets, “DIAT-RadSATNet” achieved a detection/classification precision ranging between
and
.
In [
150], the authors proposed a novel MDS-based approach, termed MDSUS, aimed at tackling the detection, classification, and localization (including angle of arrival calculation) of small UAVs. The synergistic utilization of a long short-term memory (LSTM) neural network and the empirical mode decomposition (EMD) methodology effectively addressed the blurring issue encountered in MDS within the low-frequency band. This approach enables the monitoring of small UAVs by leveraging attributes extracted from the MDS. In both short- and long-distance experiments, the LSTM neural network outperforms its two main rivals, namely CNN and SVM. Notably, precision is enhanced by
and
in the short- and long-distance experiments, respectively, when compared to the peak performance of the competing models, resulting in accuracies of
and
, respectively.
In [
151], the authors employed a frequency-modulated continuous wave (FMCW) radar to generate a collection of micro-Doppler images, measuring dimensions of [
]. These images corresponded to three different UAV models: DJI Inspire-1, DJI Inspire-2, and DJI Spark. Subsequently, the authors proposed a CNN architecture for the identification and categorization of these images. However, their research only encompassed one category class, and the maximum operational range of the targets was 412 m. As a result, they were constrained in the number of available train/test samples for each class. In [
152], the authors designed a three-layer CNN architecture for utilizing a generated micro-Doppler image collection of a DJI Phantom-3 UAV, which measured dimensions of [
]. The time–frequency (T–F) images were captured using a pulse-Doppler radar operating in the X-band with a 20 MHz bandwidth. To ensure an adequate number of train/test samples for their study, the authors combined simulated and experimental data.
The authors in [
153] utilized a multistatic antenna array comprising one Tx/Rx and two Rx arrays to independently acquire matching MDS signatures, measuring [
], while operating a DJI Phantom 2 Vision+ UAV in two modes: hovering and flying. For categorization, they employed a pre-trained AlexNet model. In [
154], the authors gathered a suitable MDS signature dataset of size [
] using three different UAV types: hexacopter, helicopter, and quadcopter. The categorization of SUAV targets often involves employing the nearest neighbor with a three-sample (NN3) classifier. In [
185], the authors investigated the feasibility of using a K-band CW radar to concurrently identify numerous UAVs. They used the cadence frequency spectrum as training data for a K-means classifier, which was derived from the cadence–velocity diagram (CVD) after transforming the time–frequency spectrogram. In their lab testing, they collected data for one, two, and all UAVs using a helicopter, a hexacopter, and a quadcopter. They found that the average precision outcomes for the categories of single UAVs, two UAVs, and three UAVs were
,
, and
, respectively.
In order to categorize two UAVs (Inspire 1 and F820), in [
155], the authors examined a pre-trained CNN (GoogLeNet) for UAV detection. The MDS was measured, and its CVD was ascertained, while the UAVs flew at two altitudes (50 and 100 m) over a Ku-band FMCW radar. The term ‘merged Doppler image’ (MDI) refers to the combination of the MDS and CVD images into a single image. Ten thousand images were created from outdoor measurements and fed into the CNN classifier using fourfold cross-validation. The findings indicate that
accuracy in classifying the UAVs was possible. Remarkably, trials conducted indoors in an anechoic environment showed worse categorization ability.
The authors in [
186] proposed a UAV detection and classification system utilizing sensor fusion, incorporating optical images, radar range-Doppler maps, and audio spectrograms. The fusion features were trained using three pre-trained CNN models: GoogLeNet, ResNet-101, and DenseNet-201. During training, the number of epochs was set to 40, and the learning rate was set to
. The classification F1-score accuracies of the three models were
,
, and
, respectively.
Using mmWave FMCW radar, the authors in [
187] described a unique approach to UAV location and activity classification. The suggested technique used vertically aligned radar antennae to measure the UAV elevation angle of arrival from the base station. The calculated elevation angle of arrival and the observed radial range were used to determine the height of the UAV and its horizontal distance from the ground-based radar station. ML techniques were applied to classify the UAV behavior based on MDS that was retrieved from outdoor radar readings. Numerous lightweight classification models were examined to evaluate efficiency, including logistic regression, SVM, light gradient boosting machine (GBM), and a proprietary lightweight CNN. The results showed that
accuracy was achieved with Light GBM, SVM, and logistic regression. A
accuracy rate in activity categorization was also possible with the customized lightweight CNN. The efficiency of the suggested lightweight CNN was also contrasted with that of pre-trained models (VGG16, VGG19, ResNet50, ResNet101, and InceptionResNet).
In [
188], the author introduced the inception-residual neural network (IRNN) for target classification using MDS radar image data. By adjusting the hyperparameters, the suggested IRNN technique was examined to find a balance between accuracy and computational overhead. Based on experimental findings using the real Doppler radar with digital array receiver (RAD-DAR) database, the proposed method can identify UAVs with up to
accuracy. Additionally, in [
189], the authors proposed employing a CNN to detect UAVs using data from radar images. The microwave and radar group developed the real Doppler RAD-DAR radar technology, a range-Doppler system. They built and evaluated the CNN by adjusting its hyperparameters using the RAD-DAR dataset. The best trade-off between accuracy and processing time was achieved when the number of filters was set to 32, as per the experimental findings. With an accuracy of
, the network outperformed similar image classifiers. The research team also conducted an ablation investigation to examine and confirm the significance of individual neural network components.
The authors addressed the issue of UAV detection using RCS fingerprinting in their study [
190]. They conducted analyses on the RCS of six commercial UAVs in a chamber with anechoic conditions. The RCS data were gathered for both vertical–vertical and horizontal–horizontal polarizations at frequencies of 15 GHz and 25 GHz. Fifteen distinct classification algorithms were employed, falling into three categories: statistical learning (STL), ML, and DL. These algorithms were trained using the RCS signatures. The analysis demonstrated that, while the precision of all the techniques for classification was improved with SNR, the ML algorithm outperformed the STL and DL methods in terms of efficiency. For instance, using the 15 GHz VV-polarized RCS data from the UAVs, the classification tree ML model achieved an accuracy of
at 3 dB SNR. Monte Carlo analysis was employed, along with boxplots, confusion matrices, and classification plots, to assess the efficiency of the classification. Overall, the discriminant analysis ML model and the statistical models proposed by Peter Swerling exhibited superior accuracy compared to the other algorithms. The study revealed that both the ML and STL algorithms outperformed the DL methods (such as SqueezeNet, GoogLeNet, NASNet, and ResNet-101) in terms of classification accuracy. Additionally, an analysis of processing times was conducted for each algorithm. Despite acceptable classification accuracy, the study found that the STL algorithms required comparatively longer processing times than the ML and DL techniques. The investigation also revealed that the classification tree yielded the fastest results, with an average classification time of approximately
milliseconds.
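As an illustration of RCS-fingerprint classification with a classification tree, the sketch below trains on synthetic aspect-angle RCS curves at an assumed 3 dB SNR; the curve shapes, class count, and sample counts are invented for the example, not the measured chamber data of [190]:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Toy RCS fingerprints: each UAV class gets a characteristic mean RCS curve
# over aspect angle; samples are noisy copies at a chosen SNR.
rng = np.random.default_rng(6)
angles = np.linspace(0, np.pi, 60)
class_means = [np.abs(np.sinc(k * np.cos(angles))) for k in (1, 2, 3)]

snr_db = 3
X, y = [], []
for label, mean_curve in enumerate(class_means):
    sig_power = np.mean(mean_curve ** 2)
    noise_std = np.sqrt(sig_power / 10 ** (snr_db / 10))
    for _ in range(200):
        X.append(mean_curve + noise_std * rng.standard_normal(angles.size))
        y.append(label)

Xtr, Xte, ytr, yte = train_test_split(np.array(X), np.array(y),
                                      test_size=0.25, random_state=0)
tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(Xtr, ytr)
acc = tree.score(Xte, yte)
print(f"classification-tree accuracy at {snr_db} dB SNR: {acc:.2f}")
```

Raising `snr_db` in this sketch improves the accuracy, mirroring the study's observation that all classifiers benefit from higher SNR; tree inference is also a handful of comparisons per sample, consistent with its fast classification times.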
A UAV classification technique for polarimetric radar, based on CNN and image processing techniques, was presented by the authors in [
191]. The suggested approach increases the accuracy of drone categorization when the aspect-angle MDS is extremely poor. To make use of the obtained polarimetric data, they proposed a unique image framework for a three-channel image classification CNN. An image processing approach and framework were presented to secure good classification accuracy while reducing the quantity of data from four distinct polarizations. The dataset was produced using a polarimetric Ku-band FMCW radar system for three different types of drones. For quick assessment, the suggested approach was tested and validated in an anechoic chamber setting. GoogLeNet, a well-known CNN structure, was employed to assess the impact of the suggested radar preprocessing. The outcome showed that, compared to a single polarized micro-Doppler image, the suggested strategy raised precision from
to
.
2.4.1. Challenges and Solutions of Radar-Based UAV Detection and Classification Using ML
Signal processing complexity: Radar signals can be complex due to noise, clutter, and interference, requiring sophisticated signal processing techniques. Develop advanced signal processing algorithms to filter noise, suppress clutter, and enhance signal-to-noise ratio for accurate detection.
Signal ambiguity and multipath effects: Signal ambiguity arising from multiple reflections (multipath effects) in radar signals, impacting accurate target localization and classification. Explore waveform design and beamforming strategies to mitigate multipath effects and improve spatial resolution.
Classification from radar signatures: Accurately classifying different UAV types based on radar signatures exhibiting similar characteristics. Utilize machine learning models capable of distinguishing subtle radar signal variations for precise classification, potentially leveraging ensemble learning techniques.
Real-time processing and computational complexity: Processing radar data in real time while managing computational complexity for timely detection and response. Optimize machine learning algorithms and hardware configurations for efficient real-time processing, potentially utilizing parallel computing or hardware acceleration.
Adverse weather conditions: Performance degradation in adverse weather conditions (e.g., rain or fog) affects radar signal quality and detection accuracy. Develop adaptive algorithms capable of compensating for weather-induced signal degradation and maintaining robust detection capabilities.
Security and interference mitigation: Vulnerability to interference and potential security threats in radar-based systems. Implement interference mitigation techniques and security measures (e.g., encryption and authentication) to safeguard radar signals and system integrity.
2.4.2. Future Directions of Radar-Based UAV Detection and Classification Using ML
Multisensor fusion and integration: Integration of radar data with other sensor modalities (e.g., visual or acoustic) for improved detection accuracy and robustness [
107].
Advanced machine learning techniques: Exploration of advanced machine learning methods (e.g., reinforcement learning or meta-learning) for adaptive radar-based UAV detection systems [
192].
Enhanced model interpretability: Development of interpretable machine learning models for radar-based UAV detection to enhance transparency and trust in decision-making [
193].
Standardization and collaboration: Collaboration among researchers, industries, and regulatory bodies for standardizing radar-based UAV detection systems, ensuring interoperability, and advancing research in this field [
194].
In [
195], the authors proposed a novel UAV classification technique that integrates DL into the classification process, specifically designed to handle data from surveillance radar. To differentiate between UAVs and undesirable samples like birds or noise, a DNN model was employed. The conducted studies demonstrated the effectiveness of this approach, achieving a maximum classification precision of
.
The authors in [
173] proposed a unique approach to data augmentation based on a deterministic model, which eliminates the need for measurement data and creates a simulated radar MDS dataset suitable for UAV target categorization. Improved prediction performance is achieved by training a DNN on appropriately generated model-based data. A 77-GHz FMCW automotive radar system was used to classify the number of UAV motors into two groups, and the results were summarized. This demonstrated the effectiveness of the suggested methodology: a CNN trained on the synthetic dataset achieved a classification precision of
, while a standard signal processing data augmentation method on a limited measured dataset resulted in a precision of
.
2.6. UAV Classification Using Hybrid Methods
In addition to the four general detection and classification methods discussed above, there is another possible detection method: hybrid sensor-based detection. The hybrid method is more dependable, durable, and adaptable for drone detection in various situations. Using visible light or optical detection in conjunction with acoustic, RF, and radar detection is a common trend in hybrid detection. The more popular method of hybrid detection uses the combined outputs of two sensing technologies to make a detection judgment. An alternate method is to generate an early detection alarm using a long-range non-line-of-sight detection scheme (such as acoustic, RF, or radar) and then use the alarm’s output to trigger the second sensor (usually a camera) to change its configuration (such as direction, zoom level, etc.) to perform a more precise and reliable identification. The hybrid detection scheme of UAVs based on ML and DL is shown in
Figure 7a,b. In addition, the summary of related work based on a hybrid detection scheme using ML is shown in
Table 10. Moreover, the dataset information of ML-based UAV classification and detection using hybrid sensor data is shown in
Table 11. Recently, many works have addressed hybrid sensor-based UAV detection and classification using ML algorithms [
69,
202,
203,
204,
205,
206].
The authors in [
69] presented a detection system based on ANNs. This system processed image data using a CNN and RF data using a DNN. A single prediction score for drone presence was produced by concatenating the characteristics of the CNNs and DNNs and then feeding them into another DNN. The feasibility of a hybrid sensing-based approach for UAV identification was demonstrated by the numerical results of the proposed model, which achieved a validation accuracy of
.
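The concatenation-based fusion head described above can be sketched with untrained placeholder weights; the feature dimensions and random weights below are assumptions for illustration, not the architecture of [69]:

```python
import numpy as np

rng = np.random.default_rng(7)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-ins for the two branch outputs: a CNN embedding of the camera frame
# and a DNN embedding of the RF capture (values here are random, untrained).
cnn_features = rng.standard_normal(128)   # image branch output
dnn_features = rng.standard_normal(64)    # RF branch output

# Fusion head: concatenate the branch features, then a small DNN produces a
# single drone-presence score.
fused = np.concatenate([cnn_features, dnn_features])        # (192,)
W1, b1 = 0.1 * rng.standard_normal((64, 192)), np.zeros(64)
W2, b2 = 0.1 * rng.standard_normal((1, 64)), np.zeros(1)
score = sigmoid(W2 @ relu(W1 @ fused + b1) + b2)[0]
print(f"drone-presence score: {score:.3f}")  # in (0, 1)
```

In practice, the whole stack (both branches plus the fusion head) is trained end to end, so the fusion layer learns how much to trust each modality.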
The study [
202] thoroughly described the process of developing and implementing an automated multisensor drone detection system (MSDDS) that utilizes thermal and auditory sensors. The authors augmented the standard video and audio sensors with a thermal infrared camera. They also discussed the constraints and potential of employing GMM and YOLOv2 ML approaches in developing and implementing the MSDDS method. Furthermore, the authors assembled a collection of 650 visible and infrared videos featuring helicopters, airplanes, and UAVs. The visible videos have a resolution of
pixels, while the infrared videos are scaled to
pixels. The authors focused their analysis on evaluating the system’s efficiency in terms of F1-score, recall, and accuracy.
The authors of [
203] presented a system that continuously monitors a certain region and produces audio and video feeds. The setup consisted of thirty cameras (visual sensors) and three microphones (acoustic sensors). Following this, features were extracted from the audio and video streams and sent to a classifier for detection. For the classification and training of the datasets, they employed the popular SVM-based ML algorithm. The efficiency of the visual detection approach was
, while the audio-assisted method outperformed it significantly at
, as indicated by the findings.
A method for detecting tiny UAVs, which utilizes radar and audio sensors, was presented by the authors in [
204]. The system employs a customized radar called the “Cantenna” to detect moving objects within a specified target region. An acoustic sensor array is utilized to discern whether the object identified by the radar is a UAV. Furthermore, the system incorporates a pre-trained DL model consisting of three MLP classifiers that collectively vote based on auditory data to determine the presence or absence of a UAV. When the system was evaluated using both field and collected data, it demonstrated accurate identification of every instance in which a UAV was present, with very few false positives and no false negatives.
The authors in [
205] introduced a multimodal DL technique for combining and filtering data from many unimodal UAV detection techniques. To conduct UAV identification predictions, they used a combined set of data from three modalities. Specifically, an MLP network was utilized to combine data from thermal imaging, vision, and 2D radar in the form of range profile matrix data. To enhance the accuracy of deductions by combining characteristics collected from unimodal modules, they provided a generic fusion NN architecture. Multimodal features from both positive UAV and negative UAV detections make up the training set. The system achieved precision, recall, and F1-scores of
,
, and
, respectively.
The authors in [
206] proposed a combined classification structure based on radar and camera fusion. The camera network extracts the deep and complex characteristics from the image, while the radar network collects the spatiotemporal data from the radar record. Synchronized radar and camera data were established through several field tests at various times of the year. The field dataset was used to evaluate the performance of the combined joint classification network, which incorporates camera detection and classification using YOLOv5, as well as radar classification using a combination of interacting multiple model (IMM) filters and RNN. The study’s results demonstrated a significant enhancement in classification accuracy, with birds and UAVs achieving
and
accuracy, respectively.
The authors in [
143] introduced a multisensory detection technique for locating and gathering information on UAVs operating in prohibited areas. This technique employed a variety of methods, including video processing, IR imaging, radar, light detection and ranging (LIDAR), audio pattern evaluation, radio signal analysis, and video synthesis. They proposed a set of low-volume neural networks capable of parallel classification, which they termed concurrent neural networks. This research focused on the detection and classification of UAVs using two such networks: a self-organizing map (SOM) for identifying objects in a video stream and a multilayer perceptron (MLP) network for auditory pattern detection.
2.6.1. Challenges and Solutions of Hybrid Sensor-Based UAV Detection and Classification Using ML
Sensor data fusion and integration: Integrating heterogeneous data from various sensors (e.g., radar, visual, and acoustic) with different characteristics, resolutions, and modalities. Develop fusion techniques that align and synchronize data from multiple sensors for holistic UAV detection and classification.
Data synchronization and alignment: Aligning data streams from diverse sensors in real time for accurate fusion and analysis. Implement synchronization methods to align temporal and spatial information from different sensors for cohesive fusion.
Complexity in feature fusion: Fusion of diverse features extracted from various sensor modalities for meaningful representation. Investigate advanced feature fusion techniques to combine and extract relevant information from heterogeneous sensor data for robust classification.
Model complexity and computational cost: Developing complex machine learning models for fusion sensor-based classification that can be computationally expensive. Explore model optimization techniques and efficient algorithms to handle the computational burden without compromising accuracy.
Scalability and real-time processing: Scaling fusion sensor-based systems to handle real-time processing of large volumes of data. Optimize hardware configurations and leverage parallel processing to enable real-time analysis of fused sensor data.
2.6.2. Future Directions of Hybrid Sensor-Based UAV Detection and Classification Using ML
Deep learning and fusion models: Advancing deep learning architectures tailored for sensor fusion to leverage the strengths of multiple sensor modalities for UAV detection and classification [
30,
69].
Dynamic fusion strategies: Developing adaptive fusion strategies capable of dynamically adjusting sensor weights or modalities based on environmental conditions for improved classification accuracy [
209,
210].
Privacy-preserving fusion techniques: Addressing privacy concerns by designing fusion techniques that preserve privacy while maintaining high accuracy in UAV detection [
211,
212].
Standardization and interoperability: Collaborating across industries to establish standardized protocols for sensor data fusion, ensuring interoperability and compatibility among different sensor systems [
72,
213].