1. Introduction
In recent years, there has been an increase in the commercial use of components that operate at high radio frequencies, including the millimeter wave (mmW) regime. One of the reasons for this commercial use and for the lowering of costs is the broad demand for communication with wider bandwidths than before, which higher carrier frequencies make possible.
Concerning the detection of targets the size of humans or animals, it is crucial to use high frequencies, especially when it is necessary to detect the limbs and micro-movements of those targets.
1.1. Millimeter Wave Radar Sensors
Radar sensors are common worldwide in fields such as the automotive industry [1], where radar can warn of a dangerous proximity to other vehicles; meteorology, where it provides information about weather [2]; and medicine, where it is used to detect heartbeats, breathing, and more [3]. Radar sensors are also common in military industries in order to detect and track aircraft [4], missiles [5], fired bullets [6], drones [7], and more [8,9]. The basic operating principle of radar is the transmission of an electromagnetic wave from the transmission antenna, scattering of the radiation by the target, and reception of the back-scattered radiation from the target by the radar [10]. A radar system can identify important characteristics of a target such as distance, velocity, acceleration [11], size [12], angle of arrival [13], and more [14] by analyzing the radiation that is scattered from the target and received by the radar’s receiving antenna. Much of the analysis is performed by comparing the known transmitted wave with the wave received after scattering by the target and the environment.
Radar sensors are manufactured according to characteristics that are defined according to the needs, type, and characteristics of the targets that will be detected. For this reason, radar for detecting aircraft will not necessarily be suitable for detecting pedestrians because there are large differences between the required ranges and the sizes of the targets. There are key radar characteristics that must be determined according to the type of mission: (1) central frequency of the transmission; (2) waveform; (3) transmission power; (4) number of transmission and reception antennas; (5) processing capabilities.
The operating frequency of a radar is usually chosen to fit the features of the expected targets, including dimension and velocity, enabling reliable detection [15]. Operation at high frequencies, such as in the mmW regime, allows better spatial and velocity resolution. Successful detection of a target also relies on the scattering characteristics of the target at the radar frequency. However, mmW suffer from higher atmospheric absorption, and it is preferable to choose a frequency within an ‘atmospheric window’, where minimum absorption occurs.
Several works have presented in-depth research on types of radar in the mmW regime [16,17,18,19]. The choice to use the mmW regime is derived from the following advantages:
Primarily, the components in mmW are limited by the maximum power they can work with, and these are typically relatively low powers [
22]. Constant envelope transmission techniques are very common in mmW for the following reasons:
The components in the mmW regime are very limited in their maximum power, and transmission in the maximum instantaneous power is possible due to the supply of constant power for the entire time.
The efficiency of the amplifiers in a constant envelope is higher.
The mmW better handles envelope noise [
23].
It can be concluded that working in the mmW regime is unavoidable for small targets: on the one hand, high directionality and detection resolution are achieved and, on the other, the transmittable powers are lower. When working with mmW, constant-envelope transmission therefore has a distinct advantage.
1.2. Millimeter Wave Radar Waveforms
The waveforms in radar systems have a significant influence on detection capabilities. Mainly, there are three radar operation modes [
24]: (I) pulse radar, (II) continuous wave (CW) radar, and (III) frequency modulation continuous wave (FMCW) radar. The differences are in the type of target detection (characteristic, range, or velocity), the relative required power of the transmission, the average power of the transmission, the computer resources for the data analysis, and the resolution in the detection (which determines the error rate of the obtained data).
1.2.1. CW Radar
Continuous wave (CW) transmission is a relatively easier option since the transmission is a carrier wave without any special modulation. With CW transmission, it is possible to detect the velocity of a target according to the Doppler effect [
25] without any additional information. The processing resources consumed are relatively low because only a fast Fourier transform (FFT) operation is required to detect the frequency difference between the transmission and the reception.
In CW radar, only a target’s velocity may be obtained; in return, the system cost is the lowest, the data analysis is relatively light, and the transmitted power is relatively low. The advantages of CW radar are its error rate, which depends only on the length of the time window used for analysis, its natural immunity in a highly cluttered environment, and its constant envelope characteristic.
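As a rough illustration of why the carrier frequency matters for CW velocity detection, the following sketch evaluates the standard two-way Doppler relation f_d = 2·v·f0/c. The walking speed of 1.5 m/s and the 10 GHz comparison carrier are illustrative assumptions, not values taken from this study; 94 GHz is the carrier recommended later in the paper.

```python
C = 299_792_458.0  # speed of light in m/s


def doppler_shift_hz(radial_velocity_mps: float, carrier_hz: float) -> float:
    """Two-way Doppler shift of a CW radar return: f_d = 2 * v * f0 / c."""
    return 2.0 * radial_velocity_mps * carrier_hz / C


# Illustrative (assumed) walking speed, compared at two carrier frequencies.
WALKING_SPEED_MPS = 1.5
for carrier_hz in (94e9, 10e9):
    shift = doppler_shift_hz(WALKING_SPEED_MPS, carrier_hz)
    print(f"{carrier_hz / 1e9:.0f} GHz carrier: {shift:.1f} Hz Doppler shift")
```

For the same target velocity, the shift scales linearly with the carrier, so small limb velocities map to larger and more easily separated frequency offsets at mmW.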
1.2.2. Pulse Radar
In pulse radar, a carrier wave is modulated with an amplitude modulation (AM) signal of pulses. The result is a carrier wave with transmissions and breaks at specified times. With this method, it is possible to discover the velocity of a target by comparing the transmission and reception frequencies and discover the range by measuring the delays between the pulses [
26]. This method is relatively more expensive than the CW transmission method and its processing requires more resources, although it provides the desired information about the range of a target. In pulse radar, the error in range measurement is the time between pulses multiplied by the speed of light and divided by two. Unlike the CW method, pulse radar is not immune to clutter.
Since pulse radar does not transmit continuously, relatively higher instantaneous powers are required in order to perform optimal detection [27]; hence, this type of radar is not well suited to the mmW regime.
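The range measurement described above reduces to converting a round-trip delay into a distance. The following is a minimal sketch of that conversion; the function name and the 1 µs example delay are assumptions made for illustration only.

```python
C = 299_792_458.0  # speed of light in m/s


def range_from_delay_m(round_trip_delay_s: float) -> float:
    """Target range from the echo delay: the pulse travels out and back, hence the factor 1/2."""
    return C * round_trip_delay_s / 2.0


# Assumed example: an echo that arrives 1 microsecond after transmission.
print(f"{range_from_delay_m(1e-6):.1f} m")
```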
1.2.3. FMCW Radar
A more complex transmission method is the frequency modulation continuous wave (FMCW) method. With FMCW, it is possible to transmit all the time without the need for breaks (as in pulse radar). In this method, more energy is received from a target without the need for high instantaneous power [
28]. FMCW is a ‘frequency-consuming’ technique because of the relatively broad bandwidth required by the frequency sweep, whereas the transmission of an unmodulated continuous wave (CW) has zero bandwidth. In addition, in order to detect both the range and the speed of a target, the processing required is relatively more complex [
29].
In FMCW radar, both a target’s velocity and range may be obtained; the system cost is the highest and the data analysis is relatively demanding, but the required transmission power is relatively low. The resolution of the velocity and range depends on the transmission bandwidth. Unlike the CW method, the FMCW method is not immune to clutter.
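To make the dependence on bandwidth concrete, the sketch below uses the textbook range-resolution relation ΔR = c/(2B) for a swept bandwidth B. This relation is a general radar result and is not a formula stated in this paper; the bandwidth values are illustrative assumptions.

```python
C = 299_792_458.0  # speed of light in m/s


def range_resolution_m(sweep_bandwidth_hz: float) -> float:
    """Textbook FMCW range resolution for a swept bandwidth B: delta_R = c / (2 * B)."""
    return C / (2.0 * sweep_bandwidth_hz)


# Assumed sweep bandwidths, for illustration only.
for bandwidth_hz in (100e6, 1e9, 4e9):
    print(f"B = {bandwidth_hz / 1e9:.1f} GHz -> {100 * range_resolution_m(bandwidth_hz):.1f} cm")
```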
1.3. Artificial-Intelligence-Based Detection and Classification
Artificial intelligence (AI) is based primarily on the passing of raw data through weights, and it involves mathematical functions of powers, convolutions, and non-linear activation functions. These weights are organized in layers while the data are inserted in the input layer, and the result is obtained in the output layer while all other layers are called hidden layers. The result in the output layer should be close to the expected value, and a cost function indicates the error between the result and the expected value. The weights are tuned for the purpose of minimizing the error of the cost function by the sophisticated process of back-propagation with iterations of the data with optimization algorithms [
30].
The field of AI has been studied for many years, but it has recently received significant attention due to the processing capabilities that have developed. In addition, the field of artificial intelligence has received significant weight as a practical solution to a variety of problems, and it has received a significant scope of research in the fields of computer vision [
31], voice processing [
32], radar [
33], and more [
34]. Many studies have reported substantial gains in accuracy and reductions in complexity [
35,
36] using AI-based processing compared to classic solutions that do not include AI.
In general, an AI solution is suitable for the needs of classification or regression where the output of the system is based on the passage of the information through weights and filters. The learning is carried out on a known database in learning epochs that include an error calculation in each iteration between the desired value and the output of the system, with the weights readjusted according to the errors by a back-propagation technique. AI methods have added value over classical tools where the data have low signal-to-noise ratios (SNRs), as well as when the model used for classification is too complicated to be analyzed with classical methods.
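The training loop described above (forward pass, cost calculation, weight update from the error gradient, repeated over epochs) can be sketched with a single sigmoid neuron. This is a deliberately tiny stand-in, not the network used in this study; the toy dataset, learning rate, and epoch count are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=500, learning_rate=0.5, seed=0):
    """Gradient-descent training of one sigmoid neuron with a cross-entropy cost."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    bias = 0.0
    loss_history = []
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        total_loss = 0.0
        for features, target in samples:
            # Forward pass: weighted sum followed by a non-linear activation.
            output = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            total_loss -= target * math.log(output) + (1 - target) * math.log(1.0 - output)
            # Backward pass: for sigmoid + cross-entropy the error term is (output - target).
            error = output - target
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        n = len(samples)
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
        loss_history.append(total_loss / n)
    return weights, bias, loss_history


# Assumed toy dataset (logical AND), used only to show the cost decreasing.
TOY_DATA = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 1)]
weights, bias, loss_history = train(TOY_DATA)
print(f"cost: {loss_history[0]:.3f} -> {loss_history[-1]:.3f}")
```

A deep network repeats the same idea layer by layer, with the chain rule propagating the error term back through every set of weights.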
1.4. Detection and Classification of Pedestrians and Animals Using Radar
As was said previously, the detection of a target is made possible with the help of the scattering it causes as a result of the radar’s transmission. In order to detect targets with sizes in the order of meters, and even to perform a classification according to the movements of limbs in dimensions of centimeters, it is necessary for the radar used to work in the millimeter wave range. Different targets can have similar characteristics regarding their speed and distance, and the difference between targets may lie in the movements of their limbs. For example, a person and a dog walking in front of radar at the same speed and distance, without distinct movements of their limbs, will both be analyzed and classified as the same target. The successful classification and understanding of a target can be achieved when the frequency that is transmitted is suitable at a wavelength for the detection of limbs and an appropriate analysis is performed on the information.
Classification of a target by radar is based neither on the appearance of the target nor on its general shape; rather, it is based on the repetitions of the waves that are transmitted to it. For this reason, radar devices are exposed to many deceptions. For example, a human crawling in front of radar may be classified as an animal when the role of the radar is to classify between a person and an animal. In addition, there may be an error in the classification in the case where a person walks with a large metallic object, such as a gun, that causes multiple reflections toward the radar such that the reflections from the person are weaker.
1.5. Detection and Classification of Pedestrians and Animals Using Other Methods
There are several systems that can classify targets that do not use radar, such as cameras in the visible range [
37], thermal cameras [
38], LiDAR (light detection and ranging) devices [
39], and more. These systems provide a detailed picture of both a target and the environment of that target, and they are more informative than radar devices that only provide vector information. With all the advantages of systems that do not use radar, there are several significant disadvantages: (I) all the mentioned systems require a line of sight (LOS) with a target, whereby they are not suitable for difficult thicket conditions such as a forest environment or small hills; (II) systems in the field of visible waves will not work well for the same reason that the human eye does not see well in conditions of rain, snow, or fog.
1.6. The Proposed Technique for Pedestrians and Animals Classification
In this study, we proposed the use of a mmW Doppler radar sensor, which is a type of radar that works in the mmW regime, allowing for the detection of limb movements by human and animal targets. Further, this radar allows for receiving returns and it analyzes whether a target is carrying something. In addition, working in the form of Doppler, i.e., CW only, allows for detecting only moving targets, while all the static environment does not influence the performance. It can be said that conducting detection using Doppler radar constitutes a spatial filter and causes moving targets to be highlighted and easier to detect. Another distinct advantage of using radar is the existence of phenomena in electromagnetic waves such as diffraction and refraction [
40]. These phenomena are less noticeable in systems in the visible range; thus, detection and classification can be performed with the help of radar even in the case of partial concealment without an LOS.
This radar’s output is analog, and it is sampled and saved on a computer, which performs the preprocessing and adjustment of the information so that it can enter a neural network (NN). The proper use of an NN demands many recordings of all the desired types of targets for classification, in addition to examining several networks and choosing the network with the best results.
1.7. Our Contribution
In recent studies, classification tasks have been performed on recordings at fixed distances from targets or without separating the targets into different categories of distances, typically in a neutral environment and without attempts to fool the radar, while using lower-frequency types of radar. In addition, in existing studies, there has been one task required from the systems; for example, in [
41], they managed to classify the type of the target, in [
42], they introduced targets detection, and, in [
43], they presented activity classification of a target, and these tasks were performed separately. We added a task, range detection, to the tasks of classifying a target and its activity in the mmW regime using CW radar, which is naturally immune to clutter.
As expected, the classification at close range had the highest classification accuracy, and we also performed training on recordings with various ranges for making the system more reliable. The presented scheme is fundamentally different in the detection and certainty capabilities due to the high frequency of the used radar. In addition, we present an integrated capability for detection, classification, activity detection, and range estimation, all in one single neural network.
We contribute to the field in several unique and significant ways. Firstly, we introduce a reliable classification system for walking pedestrians and animals using a radar operating in the mmW regime. This is a significant improvement in detecting and classifying targets. Our proposed technique is robust, demonstrated by our inclusion of additional targets designed to ‘fool’ the radar system.
Reliable classification is not feasible when operating at low frequencies since the data on small movements are not available with the required resolution. The exclusive advantage of using mmW is the ability to detect sub-movements within the body, including activities of legs and arms in addition to the main movement of the center of mass. This provides additional information for the classification procedure. Employing interpretability analysis, we demonstrate how important the detection of sub-movements of the legs and the hands is for obtaining a reliable classification.
These tests highlighted the resilience and effectiveness of our methodology. Importantly, our system goes beyond classification, also enabling the detection of the range of targets. The core of our approach is a deep neural network (DNN) architecture that processes radar recordings as input after a pre-processing procedure. This use of DNNs represents a novel application in the field. We also employ high frequencies in tandem with the neural network-based classification, which underscores the superiority of our scheme over current state-of-the-art methodologies. Lastly, we strive for transparency and understanding in our work. Therefore, we conducted an in-depth analysis of the neural network using interpretable tools, such as explainable AI (XAI). Understanding the decision-making process of the DNN and its ability to perform multiple tasks at once is possible through this. These unique aspects significantly differentiate our work from existing research and pave the way for further advancements in radar-based target classification and detection.
1.8. Article’s Sections
The remainder of this paper is organized as follows:
Section 2 presents the system model and detection technique;
Section 3 and
Section 4 describe the pre-processing stage for the data before the DNN and the dataset creation in the field;
Section 5 presents the DNN training process;
Section 6 presents the interpretability analysis for the pre-trained DNN and an in-depth performance analysis;
Section 7 concludes the study.
2. Micro-Doppler Radar System
The use of a micro-Doppler radar for small targets demands working with short wavelengths, as described in the introduction. A recommended frequency for this system is 94 GHz, as used in [
19]. As expected with the Doppler effect, the transmission of frequency toward a target which has a relative linear movement to the radar will be returned with a frequency shift.
To analyze the returned frequency directly in the case of a 94 GHz transmission, a minimum sampling rate of 188 GS/s (twice the carrier frequency) is required in order to comply with the Nyquist condition. Sampling at this rate is nearly impossible and demands expensive equipment, making it necessary to find a way to lower the frequency.
Using a mixer and a 94 GHz frequency generator in a local oscillator is a good solution, but the instability error between the generators of the transmission and the local oscillator input remains. When the instability is much greater than the expected frequency shifts, the detection of the radar will be uncertain or impossible.
One solution involves using the coherent radar method, where the transmission is subtracted from the reception in an analog manner. In this way, the generator’s instability error is subtracted, as described in
Figure 1.
The radar sensor proposed in this study and shown in
Figure 1 consists of a fixed frequency generator with the frequency $f_0$, an antenna that is used for both transmission and reception, a circulator that enables simultaneous transmission and reception, a single-channel mixer, and an analog-to-digital converter (ADC). The generator signal is connected through the circulator to the antenna so that a transmission of frequency $f_0$ is carried out at the radar output. The transmission signal at the radar output, with the frequency $f_0$ and the amplitude $A_T$, is given by the following expression:
$$s_T(t) = A_T \cos\left(2\pi f_0 t\right) \tag{1}$$
The transmission signal hits the target at the distance R(t). The target scatters the radar’s transmission signal, with some of the scatters being reflected backward to the radar. The scatters from all the N sub-movements of the target returned backward to the radar have the amplitudes $A_n$ and the time delays $\tau_n$, so that the received signal is as follows:
$$s_R(t) = \sum_{n=1}^{N} A_n \cos\left(2\pi f_0 \left[t - \tau_n(t)\right]\right) \tag{2}$$
where $\tau_n$ is equal to the round-trip distance $2R_n(t)$ divided by the speed of light, c, as follows:
$$\tau_n(t) = \frac{2R_n(t)}{c} \tag{3}$$
According to
Figure 1, the input signals to the mixer comprise the local oscillator CW signal $s_T(t)$ and the received signal $s_R(t)$. The output of the mixer after low-pass filtering (LPF) is the signal Z(t), calculated as follows:
$$Z(t) \propto \sum_{n=1}^{N} A_n \cos\left(\varphi_n(t)\right), \qquad \varphi_n(t) = 2\pi f_0 \tau_n(t) = \frac{4\pi f_0}{c} R_n(t) \tag{4}$$
where the LPF is carried out without a filter and instead by waveguides that do not pass all the high harmonics of the mixing process.
The result is a time-varying phase $\varphi_n(t)$ for each sub-movement. After digitization by the ADC and spectral analysis of Z(t) by the fast Fourier transform (FFT) operation, the following Doppler frequency is obtained:
$$f_n(t) = \frac{1}{2\pi}\frac{d\varphi_n(t)}{dt} = \frac{2 f_0}{c}\frac{dR_n(t)}{dt} \tag{5}$$
which is related to the derivative of the phase $\varphi_n(t)$. The derivative of the distance $R_n(t)$ comprises the radial velocities, $v_n(t)$, of the target and its micro-movements relative to the radar, as follows:
$$v_n(t) = \frac{dR_n(t)}{dt} \tag{6}$$
Equation (5) presents the linear relationship between the radial velocity of a target relative to the radar and the frequency obtained at the radar detector output.
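The chain described in this section (mixing down to baseband, digitizing, and locating the Doppler line with a Fourier transform) can be checked numerically. The sketch below assumes an idealized mixer output cos(4π·f0·R(t)/c) for a single scatterer moving at constant radial velocity, with unit amplitude and no noise; the sample rate, record length, velocity, and starting range are illustrative assumptions, and a plain DFT stands in for the FFT.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s
F0 = 94e9          # carrier frequency in Hz (the value recommended in this section)


def baseband_samples(velocity_mps, start_range_m, sample_rate_hz, n_samples):
    """Idealized mixer output cos(4*pi*f0*R(t)/c) for one scatterer with R(t) = R0 + v*t."""
    samples = []
    for k in range(n_samples):
        distance = start_range_m + velocity_mps * k / sample_rate_hz
        samples.append(math.cos(4.0 * math.pi * F0 * distance / C))
    return samples


def peak_frequency_hz(samples, sample_rate_hz):
    """Frequency of the strongest positive-frequency DFT bin (DC excluded)."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for b in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * b * m / n) for m, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = b, abs(acc)
    return best_bin * sample_rate_hz / n


# Assumed acquisition settings: 4 kS/s, 400 samples (0.1 s), target moving at 1 m/s.
FS, N, V = 4000.0, 400, 1.0
expected_hz = 2.0 * V * F0 / C
measured_hz = peak_frequency_hz(baseband_samples(V, 10.0, FS, N), FS)
print(f"expected {expected_hz:.1f} Hz, strongest DFT bin at {measured_hz:.1f} Hz")
```

The strongest bin lands within one bin width (FS/N) of 2·v·f0/c, and the baseband signal is sampled at kilohertz rates rather than at twice the 94 GHz carrier.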
In order to track the instantaneous velocities of the micro-movements, a short-time Fourier transform (STFT) is employed. Fourier transformation is carried out within a moving temporal window of a duration T. The resolution in the frequency domain is, thus,
$$\Delta f = \frac{1}{T} \tag{7}$$
From the relation appearing in Equation (5), the resulting velocity resolution of the radar is
$$\Delta v = \frac{c}{2 f_0}\,\Delta f = \frac{c}{2 f_0 T} \tag{8}$$
Consequently, the minimum integration time T for obtaining a required velocity resolution $\Delta v$ is
$$T = \frac{c}{2 f_0\,\Delta v} \tag{9}$$
Inspection of Equation (9) reveals that increasing the carrier frequency allows shortening the integration time T, while keeping the required resolution of velocity.
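A short numerical example may help here. The sketch below assumes the standard CW Doppler relations, in which a window of duration T gives a frequency resolution of 1/T and therefore a velocity resolution of c/(2·f0·T). The target resolution of 0.1 m/s and the 24 GHz comparison carrier are assumptions chosen for illustration.

```python
C = 299_792_458.0  # speed of light in m/s


def velocity_resolution_mps(carrier_hz: float, window_s: float) -> float:
    """Velocity resolution for an analysis window of duration T: c / (2 * f0 * T)."""
    return C / (2.0 * carrier_hz * window_s)


def min_integration_time_s(carrier_hz: float, velocity_resolution: float) -> float:
    """Shortest window that still resolves the requested velocity step."""
    return C / (2.0 * carrier_hz * velocity_resolution)


# Assumed requirement: resolve 0.1 m/s differences between limb velocities.
for carrier_hz in (94e9, 24e9):
    window_s = min_integration_time_s(carrier_hz, 0.1)
    print(f"{carrier_hz / 1e9:.0f} GHz: T >= {1e3 * window_s:.1f} ms")
```

A shorter window at the higher carrier means the spectrogram can follow faster changes in limb velocity without giving up velocity resolution.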
The measurements in this study were performed with the radar shown in
Figure 2. This radar was a purchased radar with the parameters specified in
Table 1.
The W-band radar employed in this work is used to demonstrate the feasibility of performing a variety of classification tasks by an NN. In order to simplify the implementation, a radar with a single antenna and a circulator was used. In such a configuration, the limited isolation of the circulator (18 dB) may lead to a degradation in the dynamic range. However, it was found that, for the range of distances considered in the experiments, the targets were detected with sufficient accuracy, enabling reliable classification.
3. Pre-Processing by Time–Frequency Analysis
Equation (5) shows how to measure the velocities of a target with the help of Doppler radar; it is also possible to measure the strength of the return from a target, a parameter characterized by the radar cross-section (RCS). Although the amplitude of the transmission is constant, the amplitudes in the reception, $A_n$, depend on the RCS of the targets and the distance between the radar and the target, while the RCS is mainly affected by the size of the target and the material from which it is made.
In this form of analysis, information is obtained about the velocity of a target and the strength of the return without any information about the changes in time. This analysis is still too simple when a more qualitative discovery of the parameters that characterize the radar targets is desired since much of the information is essentially available in the time plane.
A time analysis, in addition to frequency and power analyses, demands a short-time Fourier transform (STFT) operation. In an STFT operation, the total data are divided into separate time windows of the same duration T, and an FFT operation is performed on each of them separately. The result is a set of frequency–power graphs at different times, and, when all those graphs are analyzed together in time order, the time plane is also considered.
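The signal Z(t) from Equation (4) is entered into the FFT, in accordance with
Figure 1. By using an STFT analysis with the window of length T, the three-dimensional output $S(t, f)$ is obtained as follows:
$$S(t, f) = \int_{-\infty}^{\infty} Z(t')\, w(t' - t)\, e^{-j 2\pi f t'}\, dt'$$
where $S(t, f)$ is the Fourier transform of the windowed signal ($Z(t')\,w(t' - t)$) and $w(t)$ is a window function that is equal to zero when t < 0 or t > T. In this study, the Blackman window was chosen, and $w(t)$ was calculated as follows:
$$w(t) = 0.42 - 0.5\cos\left(\frac{2\pi t}{T}\right) + 0.08\cos\left(\frac{4\pi t}{T}\right), \qquad 0 \le t \le T$$
In the case where $v_n(t)$ varies very slowly over the period of time T, Equation (5) takes the form
$$f_n \approx \frac{2 f_0}{c}\, v_n$$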
A convenient way to view a 3D graph created by the STFT process is to display the power plane in a color display. Thus, the Y-axis represents the frequency, the X-axis represents the time, and the colors represent the Z-axis of the power according to a color map that should be attached to the graph. There are cases where time overlaps are made between time windows for getting a better time resolution. The result of the process is a graph with three axes commonly called a spectrogram.
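The windowing, per-window transform, and time overlap described above can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: the sample rate, window length, hop size, and two-tone test signal are assumptions, and a plain DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math


def blackman_window(length):
    """Blackman window sampled at `length` points."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * k / (length - 1))
            + 0.08 * math.cos(4 * math.pi * k / (length - 1))
            for k in range(length)]


def stft_power(signal, window_len, hop):
    """Return power[frame][bin]: one windowed DFT per (possibly overlapping) time window."""
    window = blackman_window(window_len)
    frames = []
    for start in range(0, len(signal) - window_len + 1, hop):
        segment = [signal[start + k] * window[k] for k in range(window_len)]
        row = []
        for b in range(window_len // 2 + 1):
            acc = sum(s * cmath.exp(-2j * math.pi * b * m / window_len)
                      for m, s in enumerate(segment))
            row.append(abs(acc) ** 2)
        frames.append(row)
    return frames


def peak_hz(row, sample_rate_hz, window_len):
    """Frequency of the strongest bin in one spectrogram column."""
    strongest = max(range(len(row)), key=row.__getitem__)
    return strongest * sample_rate_hz / window_len


# Assumed test signal: a tone that jumps from 100 Hz to 300 Hz halfway through.
FS, WINDOW_LEN, HOP = 1000.0, 64, 32  # hop of half a window gives 50% overlap
signal = ([math.sin(2 * math.pi * 100.0 * k / FS) for k in range(256)]
          + [math.sin(2 * math.pi * 300.0 * k / FS) for k in range(256)])
spectrogram = stft_power(signal, WINDOW_LEN, HOP)
print(f"{len(spectrogram)} time windows, "
      f"first peak near {peak_hz(spectrogram[0], FS, WINDOW_LEN):.0f} Hz, "
      f"last peak near {peak_hz(spectrogram[-1], FS, WINDOW_LEN):.0f} Hz")
```

A single Fourier transform of the whole record would show both tones at once; only the per-window analysis reveals when each one was present.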
Figure 3a illustrates the time windows division from the complete vector x(t).
Figure 3b presents the FFT for each separated time window in a comparison to the FFT of the complete vector x(t).
Figure 4 presents the spectrograms with a comparison between full-time Fourier Transform analysis versus short-time Fourier transform.
The color map can also be displayed in grayscale. With this method, it is possible to distinguish between the types of signals that change in time, even when their central frequencies are the same. Thus, for example, it is possible to distinguish between the types of signal modulation in wireless communications. Without a spectral analysis over time, all signals would receive a relatively similar spectrum image. Similarly, in a Doppler radar system that detects instantaneous velocities, it is possible to notice changes over time when we use a time-spectral display; thus, it is possible to perform an in-depth analysis of the signal.
In this study, we present a simplified way for frequency analysis over time. Other spectral analysis techniques, such as the wavelet transform [44], can also be employed. However, we found the STFT analysis efficient for the processing and target classification. Note that, in such alternative methods, the frequency resolution depends on the frequency. In the measured scenarios, the resulting Doppler frequencies are spread over a broad range. We preferred to employ the standard STFT, which keeps the resolution equal across frequencies, in order to avoid introducing bias into the classification procedure.
4. Dataset Creation
Classifying signals with the help of a DNN’s supervised learning is a data-driven method; therefore, to learn, it requires recordings. In general, the accuracy of the classification improves when the data are numerous and of high quality.
When it comes to a DNN whose job is to classify different signals from different targets, the task becomes very complex when the same type of target has great diversity in the range from the radar, the direction of arrival (DoA), the multi-path, and other factors. The task is even more complex when there is an attempt to fool the radar. Therefore, in this study, a dataset was built with multiple ranges, targets, and attempts to fool the radar.
Seven categories of targets were recorded by the radar:
Targets walking (human) at close range: four different people and 132 recordings;
Targets walking (human) at medium range: four different people;
Targets walking (human) at long range: four different people;
Targets walking (human) with a concealed weapon: four different people (the goal was to test whether a person walking toward the radar could hide their intentions);
Targets walking and running (dogs) at far to close range: five different dogs;
Targets crawling (human) at close range: four different people (the goal was to test whether a person could fool the radar and impersonate a dog);
Recordings with no target at all, i.e., open space (these recordings were created to prevent false alarms; when the system is operational, there will not always be targets, and the system also needs to recognize this case).
The system described in
Figure 1 is shown in
Figure 5 as it was assembled and placed in the field to perform the recordings.
The system was placed in a wooded area, and recordings were made of a target walking at three different distances, as shown in
Figure 6.
Recordings without a target have no meaning for distance. Creating a recording of walking dogs is a complex task; therefore, no divisions into different distances were made. The methods of obtaining recordings for dogs and without a target are shown in
Figure 7.
A summary of the composition of the dataset can be seen in
Table 2.
The total dataset consisted of 748 recordings. The created dataset was very diverse, and the recordings with no target were included to help prevent false alarms. The spectrograms of the variety of recorded targets are presented in the next section.
5. Neural Network Training Procedure
After making all the recordings of all the desired targets, the recordings were put into pre-processing, as discussed in
Section 3 of this study. The spectrograms were put into folders, with each folder having the name of that category. In this way, the training process was performed using the supervised learning method where the DNN had to be trained to associate the recordings with the name of the folder from which the recordings came.
The dataset was divided into three sub-datasets as follows:
Dataset for training: This set had 60% of the recordings from the main dataset. It was used for learning in iterations, where, in each iteration, there was a change in the weights according to the training.
Validation database: This set had 20% of the recordings from the main dataset. These were spectrogram images that did not exist in the training set, and they were used for analyzing the online training performance.
Database for testing: This set had 20% of the recordings from the main dataset. These spectrogram images were not entered into the training process, and they simulated a situation where the system was operated in the field; and the capabilities of the network were observed by performing the target classification in operational mode.
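The 60/20/20 division described above can be sketched as a shuffled split. The file names, the fixed random seed, and the rounding rule are assumptions made for illustration; the study does not state how recordings were assigned to each subset.

```python
import random


def split_dataset(items, train_fraction=0.6, validation_fraction=0.2, seed=0):
    """Shuffle once, then cut into training, validation, and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    n_validation = round(validation_fraction * len(shuffled))
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]  # whatever remains (about 20%)
    return training, validation, testing


# Hypothetical file names standing in for the 748 spectrogram images.
recordings = [f"spectrogram_{index:03d}.png" for index in range(748)]
training, validation, testing = split_dataset(recordings)
print(len(training), len(validation), len(testing))
```

Splitting per category (a stratified split) would keep the class proportions equal in all three subsets; the sketch above does not do this.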
In the training process, the spectrogram images were inserted into the DNN, as shown in
Figure 8.
Figure 8 presents the structure of a convolutional neural network (CNN), which is a subtype of a DNN. In this structure, the order of the layers is tailored to fit the classification procedure. Initially, the input to the network is a grayscale spectrogram, sized at 228 × 288 pixels. This image is fed into the first convolutional layer to identify potential features. Within this convolutional layer, the presence of corresponding features is detected. Following this layer, normalization of all indications must be performed before proceeding to the ReLU activation. This normalization step is crucial because ReLU is a non-linear function, as discussed in the introduction above. After the ReLU activations, the max pooling layer reduces the dimensions to decrease the complexity of the classification computation. Following max pooling, the layers are replicated. The output from the final ReLU layer is channeled into seven neurons in the fully connected layer. The softmax layer then generates the probabilities corresponding to these seven neurons. Finally, the prediction is executed by the classification layer. The general structure of the DNN and the order of the layers depicted in
Figure 8 are typical of CNN-type neural networks [
42,
43].
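The final stage described above, where seven fully connected outputs are turned into probabilities and then into a prediction, can be illustrated in isolation. The class label strings and the example activations below are hypothetical; only the number of classes (seven) comes from the text.

```python
import math

# Hypothetical label strings for the seven recorded categories.
CLASSES = [
    "walk_close", "walk_medium", "walk_long", "walk_concealed_weapon",
    "dog", "crawl_close", "no_target",
]


def softmax(activations):
    """Convert raw outputs into probabilities that sum to one."""
    largest = max(activations)  # subtracted for numerical stability
    exponentials = [math.exp(a - largest) for a in activations]
    total = sum(exponentials)
    return [e / total for e in exponentials]


def classify(activations):
    """Return the most probable class label and its probability."""
    probabilities = softmax(activations)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASSES[best], probabilities[best]


# Assumed example activations from the fully connected layer.
label, probability = classify([0.3, 2.9, 1.1, -0.4, 0.2, -1.0, 0.5])
print(label, f"{probability:.2f}")
```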
The network shown in
Figure 8 then underwent a training process according to the process described in
Figure 9.
The progress achieved through the training process is shown in
Figure 10.
Figure 10 shows a suboptimal result of the DNN’s training. Although the training accuracy approached 100%, the validation set showed a different situation, with an accuracy of approximately 80%. The accuracy to be expected on new data is that of the validation set; thus, the result did not appear to be ideal.
In the case of a very diverse database, there may be a need to understand the reasons for the low level of accuracy, and sometimes, there is no reason to be concerned. To better understand the reasons for the low accuracy, the confusion matrix, which separately shows the reasons for the overall accuracy for each type of category, can be consulted.
Figure 11 shows the confusion matrix of the trained network.
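For readers unfamiliar with the confusion matrix, the sketch below shows how one is built and how a per-category accuracy (recall) is read from it. The three class names and the six label pairs are made-up toy values and are not results from this study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """matrix[i][j] counts recordings of true class i that were predicted as class j."""
    index = {name: position for position, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was classified correctly (row-wise)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Made-up toy labels, for illustration only.
classes = ["walk_long", "dog", "no_target"]
truth = ["walk_long", "walk_long", "dog", "dog", "no_target", "no_target"]
predicted = ["walk_long", "no_target", "dog", "dog", "no_target", "walk_long"]

matrix = confusion_matrix(truth, predicted, classes)
for name, row, recall in zip(classes, matrix, per_class_recall(matrix)):
    print(f"{name:10s} {row} recall={recall:.2f}")
```

Off-diagonal entries show which pairs of categories are mistaken for each other, which is the information a single overall accuracy figure hides.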
Several important conclusions can be drawn from
Figure 11:
High accuracy was achieved at a close walking range and even at a medium range.
Poorer accuracy was achieved for long-range walking, where the main confusion was with the recordings without any target; this was nonetheless acceptable.
There was confusion between the recordings without any target and the recordings of far-range walking, which was acceptable.
There was slight confusion between the crawling and walking targets at both medium and long ranges.
There was no confusion between the crawling person and the walking dog (the radar could not be fooled).
High-quality detection existed for the walking ranges (even though we used Doppler radar).
There was a small, acceptable amount of confusion due to overlap between the ranges.
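The per-category reading of a confusion matrix used in the conclusions above can be sketched as follows. The matrix values and the three category names are hypothetical and serve only to illustrate the computation; they are not the values of Figure 11. Rows are assumed to hold the true categories and columns the predicted ones.

```python
def per_class_recall(confusion, labels):
    """Recall of each true category: diagonal count divided by the row total."""
    recall = {}
    for i, label in enumerate(labels):
        row_total = sum(confusion[i])
        recall[label] = confusion[i][i] / row_total if row_total else float("nan")
    return recall


def overall_accuracy(confusion):
    """Fraction of all recordings that lie on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical example (rows = true category, columns = predicted category)
labels = ["no target", "walking far", "walking near"]
confusion = [
    [8, 2, 0],
    [3, 7, 0],
    [0, 0, 10],
]

recall = per_class_recall(confusion, labels)
accuracy = overall_accuracy(confusion)
for label in labels:
    print(f"{label}: {recall[label]:.2f}")
print(f"overall: {accuracy:.2f}")
```

In this toy example, the overall accuracy hides the fact that all errors come from confusion between the far-range and no-target categories, which is the kind of pattern the confusion matrix makes visible.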
It is important to note that the data available for training neural networks on these tasks are very limited, which required the creation of a new dataset specifically for this research. To train the DNN and provide real proof of the feasibility of the presented scheme, initial recordings of the radar targets were made. Despite the limited number of recordings, the training process shown in
Figure 10 demonstrates convergence around a finite value. The confusion matrix, shown in
Figure 11, demonstrates a sufficient classification reliability.
6. Interpretability
Understanding how decisions are made in a DNN is challenging since DNNs are rarely driven by a clear model. There are a few ways to comprehend the DNN’s decision-making abilities and motivations in order to understand how they operate. Taking information samples from various categories and comparing the activation levels of various attributes is one of the techniques used in classification networks.
The features in the first convolution layer were trained to find different patterns according to the different categories across the images, while the features in deeper layers learned to find the different locations of the previous activations.
6.1. Activity Detection
To understand the patterns of the different categories that the features in the first convolution layer identified in the images, we performed a comparison between three different features across three different categories from the database that were saved for examination. The activations obtained from the three different features in the first convolution layer are shown in
Figure 12.
Figure 8 shows the hierarchy of the DNN that was employed in this study. It can be seen in the first convolution layer that there were 11 features. The most noticeable features in this layer—features 2, 4, and 8—were chosen to represent how the network operated for part of the process of classification.
Figure 12 demonstrates the activation images produced after passing three images of different categories through features 2, 4, and 8. The left column in
Figure 12 shows the activations for feature 2, the middle column shows the activations for feature 4, and the right column shows the activations for feature 8. For each of the features, three spectrogram images for the three different categories were inserted.
Three categories out of the seven were chosen to illustrate how the network worked when it was asked to classify the activities (walking with and without a gun), and a comparison was made with the situation where there was no target.
For features 2 and 4, there were almost no activations in the case where there was no movement (as shown in
Figure 12a,b). For the walking category, the activations were significant for limb movements (as shown in
Figure 12d,e). For the walking movement with a weapon, there were significantly stronger activations for the center of mass, as shown in
Figure 12g,h.
For feature 8, there were strong activations to noise across the entire frequency and time domains in the case where there was no movement, as shown in
Figure 12c, and there were different activations for walking and walking with a weapon, as shown in
Figure 12f,i.
For clarification, an analysis was conducted to better understand the indications of limbs and center of mass that appeared in the activation features of
Figure 12d,e,g,h. The corresponding input images, shown as grayscale spectrograms, are presented in
Figure 13. The movement of the center of mass is depicted as an average signal near line 150 on the Y-axis of the spectrogram. After the max pooling layer, this center-of-mass movement becomes evident as an activation feature around line 75. A closer look at
Figure 12g,h reveals a pronounced activation at this line.
Furthermore,
Figure 13 displays limb movements across the Y-axis spanning from line 50 to line 228. After the max pooling layer, the anticipated activations for this range are halved, resulting in activations from line 25 to line 114. This is evident in the activation features of
Figure 12d,e, where the activations span the Y-axis between these lines. This relationship highlights the connection between the micro-Doppler mapping of the spectrograms and the subsequent activations.
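The row arithmetic above can be checked with a small sketch. It assumes max pooling with a window of 2 and a stride of 2 along the Y-axis and a preceding convolution that does not shift the rows; the toy column with a single bright line at row 150 is illustrative only and is not taken from the recordings.

```python
def pooled_row(row, pool_size=2):
    """Row index in the pooled map that covers the given input row."""
    return row // pool_size


def max_pool_rows(column, pool_size=2):
    """1-D max pooling along the Y-axis with stride equal to pool_size."""
    return [
        max(column[i:i + pool_size])
        for i in range(0, len(column) - pool_size + 1, pool_size)
    ]


# Toy spectrogram column: 228 rows with one bright Doppler line at row 150
column = [0.0] * 228
column[150] = 1.0

pooled = max_pool_rows(column)
peak = pooled.index(max(pooled))

print(len(pooled))                      # number of rows after pooling
print(peak)                             # row of the bright line after pooling
print(pooled_row(50), pooled_row(228))  # limb-movement span after pooling
```

The bright line moves from row 150 to row 75, and the span from row 50 to row 228 maps to rows 25 to 114, in agreement with the values quoted in the text.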
6.2. Range Detection
In this study, a Doppler radar was used. According to Equation (5), Doppler radar can provide information about a target’s velocity without any information about its range. In practice, the measurements were made for three different walking ranges for four different people, and, in most cases, the network identified the walking range in addition to detecting the walking itself. Insight into how the network discovered the range can be gained by an analysis similar to that in the previous section; thus, images from the test database were passed through the three features from the first convolution layer. The resulting activation images are shown in
Figure 14.
Figure 14a,b show strong activations for features 2 and 4 in the case of walking in the near range. The activations were weaker for the medium range, as shown in
Figure 14d,e, and they were even weaker for the far range, as shown in
Figure 14g,h. Thus, it could be said that a closer range of walking led to a higher level of activation for these features. Additionally, for feature 8, a greater walking range led to greater activation.
Therefore, it could be concluded that the distance classification resulted from the different SNRs of the measurements, which in turn were due to the differences in the intensity of the radiation reflected from the targets at the different walking distances.
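The dependence of the SNR on distance can be illustrated with the standard radar range equation, in which the received power from a point target scales as 1/R^4. The sketch below assumes free-space propagation, a constant radar cross-section, and a constant noise floor; the three distances are hypothetical, since the walking distances are not quoted here.

```python
import math


def relative_snr_db(range_m, reference_range_m):
    """SNR change in dB relative to a reference range, assuming P_r ~ 1/R^4."""
    return -40.0 * math.log10(range_m / reference_range_m)


# Hypothetical near, medium, and far walking distances in meters
ranges_m = [5.0, 10.0, 20.0]
snr_changes = [relative_snr_db(r, ranges_m[0]) for r in ranges_m]

for r, change in zip(ranges_m, snr_changes):
    print(f"{r:5.1f} m: {change:6.1f} dB relative to the near range")
```

Under these assumptions, each doubling of the distance lowers the SNR by about 12 dB, which is consistent with the weaker activations observed for features 2 and 4 at longer ranges.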
6.3. Dog and Human Detection
A radar that will work in the field should classify targets correctly with high certainty, even when attempts to fool the system are made. One way of attempting to fool a radar that can differentiate between a person and an animal is to resemble an animal by crawling in front of the radar. The DNN shown in
Figure 8 received recordings of a human crawling in order to test the DNN’s ability to detect that the target was a crawling human, rather than an animal.
According to the confusion matrix in
Figure 11, the DNN rarely confused a crawling human with a walking dog, even in the case where there was a small number of examples in the dataset. Observing the various activations within the DNN should provide insight into its ability to differentiate a walking human, a crawling human, and a walking dog. Here again, the activations were selected after the first convolution layer with features 2, 4, and 8. The activation results are shown in
Figure 15.
As discussed in the previous paragraphs, for comparison, low activations can be seen for non-target recordings. It can be seen in
Figure 15e,h,k that, in the column for feature 4, there are differences between the activation results for the same feature for different targets. For the other features, it is difficult to identify significant differences; indeed, this is a comparison between different but relatively similar targets in relation to the previous cases examined.
6.4. Prediction Accuracy
The common radar parameters, true positive (TP) and false alarm (FA), are extremely important for understanding and evaluating whether a radar system is reliable.
Figure 16 presents the process of deriving the TP directly from the confusion matrix shown in
Figure 11. The TP considers cases in which there were true classifications of targets, cases in which a person was detected as a person, and all cases of walking and crawling. As can be seen in
Figure 16, for all 25 recordings, crawling was detected as crawling and walking, and near- and medium-range walking was detected as walking (and overlap between them was allowed), while only 2 out of 25 recordings of far-range walking were wrongly detected as noise. This is understandable because, with any system, performance degrades as the detection range increases.
Figure 17 presents the process of deriving the FAs directly from the confusion matrix shown in
Figure 11. The FAs indicate the cases in which the predictions indicated the existence of a target, while, in reality, there were no targets. In our case, targets that were dogs or instances with no target at all were predicted as being a walking person, as indicated by the red arrows in
Figure 17. All the targets of persons, shown in the green circles, were true predictions.
In order to present the classification capabilities of the DNN in a realistic way, it is necessary to divide the classification capabilities according to different ranges, as in any practical system. That is, it would be wrong to say outright that the DNN misclassifies when it was given a very difficult task of classifying targets at a very long distance. As shown in
Figure 16, accuracy dependent on distance can be established (as shown in
Figure 18) where, for the near-range and medium-range walking, the resultant accuracy was 100%, and for the far-range distance, there were two mistakes (out of 25), which resulted in an accuracy level of 92%. As discussed earlier, this study was unique in that there was a division of the radar targets into different ranges as a significant step toward an operational system, whereby closer targets led to a higher classification accuracy. A degree of uncertainty was allowed when the targets were far away.
False alarms are a significant concern, and the study indicated the number of false alarms expected for the system. According to
Figure 18, the FA measure could be assessed despite the small database: no false alarms occurred at the medium walking range (possibly because only the walking human targets had range diversity). Three false alarms, out of 107 recordings in total, occurred when a walking dog was classified as a human walking at close range, and four false alarms occurred when recordings with no target were classified as walking at a far distance. Neither a recording without a target nor a walking dog requires an alarm from the system. Relative to the categories in which a walking person should trigger an alarm, the degree of FA was therefore 3 out of 107 for close-range distances (2.8%) and 4 out of 107 for far-range distances (3.7%). An illustration of the FA factors based on the actual DNN classifications can be seen in
Figure 19.
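The percentages quoted for the far-range accuracy and for the false alarms follow directly from the counts given in the text, as the short check below shows. The variable names are illustrative.

```python
def rate(count, total):
    """Fraction of recordings, as a value between 0 and 1."""
    return count / total


# Counts quoted in the text
far_walk_total = 25      # far-range walking recordings
far_walk_missed = 2      # of these, wrongly detected as noise
total_recordings = 107   # all recordings
fa_close = 3             # walking dog classified as close-range walking
fa_far = 4               # no target classified as far-range walking

far_accuracy = rate(far_walk_total - far_walk_missed, far_walk_total)
fa_close_rate = rate(fa_close, total_recordings)
fa_far_rate = rate(fa_far, total_recordings)

print(f"far-range accuracy:   {far_accuracy:.1%}")
print(f"close-range FA rate:  {fa_close_rate:.1%}")
print(f"far-range FA rate:    {fa_far_rate:.1%}")
```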
The X-axis in
Figure 19 indicates the decision levels of the DNN after the softmax layer (from
Figure 8). This layer shows the classification percentages for each category. Immediately afterward, the classification layer made hard decisions between the categories; hence, the category with the highest probability was accepted as the final classification. It was possible to manually change the threshold levels between the different categories so that the system could have different FAs and TPs than what were obtained in
Figure 16,
Figure 17 and
Figure 18, and these threshold levels are illustrated in
Figure 19 as threshold 1 and threshold 2.
Figure 19 introduces a significant ability to calibrate decision thresholds in a realistic tactical system. Changing decision thresholds is extremely important because, in many cases, a higher or lower priority is needed for some of the targets or activities that may occur in front of the radar.
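A minimal sketch of such threshold calibration is given below. It replaces the hard decision (the category with the highest softmax probability) with a rule that raises an alarm category only when its probability reaches a chosen threshold. The category order, the probability values, the single-threshold rule, and the fallback to the no-target category are all assumptions made for illustration; they are not the thresholds 1 and 2 of Figure 19.

```python
import math


def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_decision(probs):
    """Index of the category with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def thresholded_decision(probs, alarm_classes, threshold, fallback):
    """Accept an alarm category only if its probability reaches the threshold."""
    best = hard_decision(probs)
    if best in alarm_classes and probs[best] < threshold:
        return fallback
    return best


# Assumed category order: index 0 = no target, 1 = walking dog, 2-6 = human categories
alarm_classes = {2, 3, 4, 5, 6}
probs = [0.40, 0.05, 0.05, 0.45, 0.02, 0.02, 0.01]  # hypothetical softmax output

default_choice = hard_decision(probs)
strict_choice = thresholded_decision(probs, alarm_classes, threshold=0.6, fallback=0)
lenient_choice = thresholded_decision(probs, alarm_classes, threshold=0.4, fallback=0)
print(default_choice, strict_choice, lenient_choice)
```

Raising the threshold suppresses low-confidence alarms and therefore lowers the FA at the cost of a lower TP, whereas lowering it has the opposite effect.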
7. Conclusions
A detection technique was examined that uses a single neural network to classify radar targets, detect their activity, and detect their range, while showing immunity against attempts to fool it. A dataset for walking human targets was divided into three different ranges between the target and the radar, and a higher classification accuracy was shown for near-range targets than for far-range targets. This is a significant milestone and constitutes progress toward the creation of an operational system. The TP and FA parameters were analyzed for the trained NN, and an XAI analysis was performed on the activations of the first convolution layer of the DNN to achieve interpretability and to understand how the DNN completed its various tasks. In the presented scheme, the advantage of combining a DNN with a micro-Doppler radar at millimeter wave frequencies was demonstrated: it was immune to clutter and detected the micro-movements of the targets. A very high level of accuracy and a low FA were achieved despite the relatively small database, using a relatively low-cost radar system and a complexity-efficient neural network.