1. Introduction
Radar sensors are relatively robust to weather conditions and can cover large areas. In particular, they are well suited to detecting humans outdoors for security purposes. For detection, the Doppler signal of the radar is essential for separating moving targets from complicated background noise. In air surveillance radar, targets move very fast, so distinguishing target signals from background noise is comparatively easy. However, human walking induces a complex frequency modulation that includes the Doppler shift of the torso and additional side-band Doppler shifts in the returned signal; this phenomenon is known as the micro-Doppler effect [1]. Because humans walk much more slowly than airplanes or vehicles move, separating the micro-Doppler signals of human walking from background noise can be a severe problem. The bandwidth of the micro-Doppler signals of human walking can overlap that of the background noise, which makes it difficult to filter the two signals apart. Figure 1 illustrates the difficulty of detecting human walking in an outdoor environment.
Processing techniques that analyze or recognize patterns in the received signals are useful approaches to overcoming these limitations. Much research has investigated detection in Doppler radar signals using micro-Doppler signatures or pattern recognition techniques. Many of these studies propose mathematical equations or procedures for extracting a meaningful feature vector, which affects the classifier’s accuracy [2,3,4,5,6,7]. Similarly, other studies have sought meaningful features through statistical analysis [5,6,7,8]. These approaches need a relatively long pre-processing time (window time) to extract a feature vector from the micro-Doppler spectrogram. For example, the authors of Reference [2] reported that six features are extracted by processing data over a three-second period. Another work [3] analyzed the classification accuracy as a function of dwell time and reported that three or four seconds were required to reach 90% accuracy. Because the window time directly determines the response and update speed of a radar sensor, increasing it slows the sensor’s response.
Also, References [9,10] showed that a parametric model or a set of mathematical equations can formalize the micro-Doppler signals of human walking, so that virtual features of a walking motion can be generated. However, building a model that covers all human activities and background noise may be challenging. Likewise, statistical analysis or modeling of highly variable micro-Doppler signals to extract an essential feature vector may be quite challenging. In contrast to these approaches, deep learning gradually reconstructs and generates features as the data pass through the hidden layers of a deep network. With a well-designed deep network, we can extract a feature vector without complex pre-processing calculations or simulation models.
Recent studies have applied convolutional neural networks, a type of deep learning method, to recognize micro-Doppler spectrogram signals and have reported improved classification accuracy [11,12,13]. However, because a long window time is still required to obtain micro-Doppler spectrogram images, the sensor’s response speed may suffer. This limits the use of these approaches in, for example, short-range sensors that need to recognize targets quickly. To reduce the window time, References [14,15] proposed extracting features directly from raw data without any pre-processing and feeding them to the classifier, and they showed that micro-Doppler signals can be classified without generating image data. However, shortening the window time also reduces the amount of information in the input data, so the classification accuracy may degrade. How to improve the accuracy of such a classifier therefore deserves serious consideration.
To tackle this problem, we propose a stacking method, an ensemble method of deep networks, to distinguish micro-Doppler signals caused by human walking from background noise using radar sensors. Stacking can be a practical way to improve a classifier’s accuracy without increasing the window time. Among recent studies, Reference [16] reported that a stacking deep neural network (DNN) method improves performance in acoustic signal processing. In the field of image recognition, Reference [17] reported that stacking could be a useful approach to improving performance. Stacked auto-encoders for detecting human falls using radar have also been studied [18].
Background noise signals are included in this research because background noise can distort the micro-Doppler effects generated by human walking, so its effect must be considered when designing a classification algorithm for outdoor applications. In addition, if the classification algorithm can classify the background status, it provides helpful information about environmental conditions.
This paper is organized as follows. Section 2 describes the micro-Doppler signals collected in an outdoor environment, and Section 3 describes the design of the classifiers. Section 4 presents the experimental results, and Section 5 concludes the paper.
2. Micro-Doppler Signals
2.1. Background
Moving targets generate a Doppler frequency shift in the radar return signal due to their velocity. Since most targets are not rigid bodies, different parts of a target may have additional vibrations and rotations. For example, when a person walks, the arms naturally swing. These movements generate additional Doppler shifts, called micro-Doppler effects. Much research regards them as useful information for identifying the target.
The radar signal returned by a moving rigid body is modeled as shown in Equations (1) and (2) in terms of the carrier frequency, the intensity A of the received signal, the moving speed of the target, the speed of light c, the Doppler shift and the phase.
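For a sense of scale, the standard monostatic Doppler relation (shift = 2 × speed × carrier frequency / speed of light) can be evaluated for the 24 GHz carrier used in this work. The sketch below is illustrative only: the function name and the example walking speed of 1.5 m/s are assumptions, not values taken from this paper.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_shift_hz(speed_mps, carrier_hz):
    """Doppler shift of a target moving radially at speed_mps (monostatic radar)."""
    return 2.0 * speed_mps * carrier_hz / SPEED_OF_LIGHT


# Assumed example: a torso moving at 1.5 m/s observed by a 24 GHz radar.
walking_shift = doppler_shift_hz(1.5, 24e9)
print(f"{walking_shift:.1f} Hz")
```

Under these assumptions the torso shift is a few hundred hertz, which is why slowly moving background clutter can occupy the same band as a walking person.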
The returned radar signal with the micro-Doppler effect can be modeled as a combination of M sine-wave signals, as shown in Equation (3). After applying a matched filter and a low-pass filter, we obtain the baseband signal shown in Equation (4).
Equation (5) shows the discrete signal obtained at the sampling rate, and Equation (6) shows its DFT (Discrete Fourier Transform), where N is the number of samples.
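The equations referenced in this subsection are not reproduced in this text. One conventional form that is consistent with the definitions given is sketched below. The symbols f_c (carrier frequency), v (target speed), f_d (Doppler shift), \phi (phase), f_s (sampling rate), the per-component subscript i, and the signal names s, x and X are assumptions and may differ from the original notation.

```latex
% Assumed forms corresponding to Equations (1)-(6); notation is illustrative.
\begin{align*}
s(t) &= A\cos\bigl(2\pi (f_c + f_d)\,t + \phi\bigr)
      && \text{rigid-body return, cf. Eq. (1)}\\
f_d  &= \frac{2v}{c}\,f_c
      && \text{Doppler shift, cf. Eq. (2)}\\
s(t) &= \sum_{i=1}^{M} A_i\cos\bigl(2\pi (f_c + f_{d,i})\,t + \phi_i\bigr)
      && \text{micro-Doppler return, cf. Eq. (3)}\\
x(t) &= \sum_{i=1}^{M} A_i\exp\bigl(j(2\pi f_{d,i}\,t + \phi_i)\bigr)
      && \text{baseband signal, cf. Eq. (4)}\\
x[n] &= x\!\left(\tfrac{n}{f_s}\right), \quad n = 0,\dots,N-1
      && \text{sampled signal, cf. Eq. (5)}\\
X[k] &= \sum_{n=0}^{N-1} x[n]\,e^{-j2\pi kn/N}, \quad k = 0,\dots,N-1
      && \text{DFT, cf. Eq. (6)}
\end{align*}
```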
The DFT signal is calculated from N samples of the sampled baseband signal, which contains information on the variation of micro-Doppler signals over time. Therefore, even without the spectrogram images used in previous studies, we can extract feature information related to micro-Doppler effects from the DFT signal.
To reflect the variability of this signal over time, we devise a method that reuses previously generated feature information. Equation (7) shows the feature generated by a base classifier. We reuse this feature as the input data for a blender that combines the base classifiers in the proposed stacking method. Equation (8) shows the feature data for the blender. The structure of the base classifier is shown in Figure 2, and its optimal weights are calculated after training.
2.2. Radar Sensor Board and Specifications
We designed the Doppler radar sensor board. Figure 3 shows the block diagram of the low-cost, short-range 24 GHz Doppler radar, and Figure 4 shows the radar sensor board implemented in this research. The radar sensor board has two main parts: RF-antenna circuits and processing circuits. The RF-antenna circuits include a micro-strip patch antenna, a CW signal generator, a Tx amplifier (drive amplifier), a low-noise amplifier, a homodyne I/Q mixer (direct conversion) and a two-stage cascaded baseband amplifier. The processing circuits comprise a micro-controller with an AD converter, an interface for external data transmission and power circuits. Table 1 shows the specifications of the 24 GHz Doppler radar.
2.3. Data Acquisition
Figure 5 shows the experimental setup used to gather signals. The test site is an open space with no obstacles. We considered micro-Doppler signals caused by four types of background noise: line of sight (LoS), fan, snow and rain. These are common background noises that are frequently received in real outdoor environments. We additionally considered human walking micro-Doppler signals combined with each of these four types of background noise; that is, we gathered eight types of signals in total, examples of which are shown in Figure 6. The Doppler radar was installed at the selected test site and raw data were gathered over several days. In particular, to observe the effects of rain and snow, we collected dedicated rain and snow background noise datasets.
To measure signals generated by human walking, the experimenter moved only within the radar detection area. We determined the detection area in a pre-test and marked its boundary on the ground. Three adults (heights: 173 cm, 177 cm and 182 cm) participated as experimenters. Each experimenter freely chose the walking direction within the detection area. We considered only regular walking and fast walking as general cases, since these are the movements used in daily life. Unusual human movements, such as crawling and irregular movements without direction or purpose, were excluded. To collect Doppler signals caused by background noise, we used a controlled experimental environment to avoid mixing different background noises. The fan signals were collected using two types of ordinary household electric fans, measured at a distance of 5 m from the radar.
The experiments under clear weather conditions were conducted over about three days, and the micro-Doppler signals for rain and snow were each collected over two days on which the respective weather conditions were met. The amount of rainfall and snowfall was taken from the weather forecast, and we measured the rain and snow signals under moderate rainfall and snowfall. We note that this experiment does not cover all weather conditions.
2.4. Pre-Processing for Raw Data
The processing methods of previous research require mathematical equations for extracting feature vectors from the I/Q signal shown in Figure 3. In these methods, a feature extraction algorithm based on mathematical equations first produces a feature vector, which a classification algorithm (e.g., SVM or DNN) then processes. However, optimizing this approach can be quite complicated because it is difficult to find an essential feature vector, and the feature extraction requires a comparatively long pre-processing time. In contrast, this paper does not use complex mathematical equations and aims to extract features and classify targets within one second of processing time. We do not use any pre-processing for feature extraction; the only pre-processing for classification is the Fourier transform. The frequency-domain signals produced by the Fourier transform are the input data of the DNN. The trained DNN automatically extracts and generates the features and predicts the result.
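The Fourier-transform step described above can be illustrated with a direct DFT in plain Python. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the use of the magnitude spectrum as classifier input is an assumption, and a practical system would use an FFT routine (for example `numpy.fft.fft`) rather than this quadratic-time loop.

```python
import cmath
import math


def dft(samples):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_samples = len(samples)
    return [
        sum(
            samples[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        for k in range(n_samples)
    ]


def magnitude_spectrum(samples):
    """Magnitude of each DFT bin (assumed here to be the classifier input)."""
    return [abs(value) for value in dft(samples)]


# Toy check with a complex I/Q tone that falls exactly on bin 5 of a 64-sample window.
N = 64
tone = [cmath.exp(2j * math.pi * 5 * n / N) for n in range(N)]
spectrum = magnitude_spectrum(tone)
peak_bin = max(range(N), key=lambda k: spectrum[k])
```

With the 512-sample window used in this paper, the same function would return 512 values, one per input node of the DNN.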
3. Classifier Design
We first design and test a binary classifier and a multiclass classifier using the widely used support vector machine (SVM). Since SVM has been widely applied as a pattern recognition algorithm for detecting human motion by radar [2,5,19,20], we use it as a reference for comparison with the algorithm proposed in this study.
Before designing the stacked classifiers, we design and verify a basic DNN (multiclass classifier) for classifying the eight types of signal. We then propose a stacking method to increase the average accuracy. The stacking method combines the multiclass classifier with a binary classifier, which is designed to separate background noise signals (a, b, c and d) from human walking signals (e, f, g and h) in Figure 6. Stacking allows the two classifiers to complement each other. Lastly, the modified stacking method is designed to reflect the variability of the micro-Doppler signal over time.
3.1. Support Vector Machine
SVM is one of the most popular machine learning techniques. It is a soft-margin classifier that separates two classes by finding the optimal hyperplane, that is, the hyperplane with the maximum margin between the two classes. An important consideration in using SVM is the choice of kernel. In this paper, we apply the Gaussian RBF (Radial Basis Function) kernel, one of the most widely used kernels. The SVM input is used directly, without any additional pre-processing.
Since SVM is a binary classifier, several SVMs are needed to build a multiclass classifier. The two typical approaches are one-versus-all and one-versus-one. If the number of classes is M, the one-versus-all approach trains M binary classifiers (eight in this study), each separating one specific class from all others, and the final result is the class whose classifier gives the highest decision score. The one-versus-one approach trains a classifier for every pair of classes, M(M - 1)/2 in total (28 for eight classes), and is therefore more complicated. Nevertheless, since it can give better performance, we apply the one-versus-one approach in this study.
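The bookkeeping behind the one-versus-one scheme can be sketched as follows. The class letters follow the labels (a) to (h) used in this paper; the function names are hypothetical, and aggregating the pairwise decisions by majority vote is an assumption (a common choice), since the text does not state how the pairwise results are combined.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["a", "b", "c", "d", "e", "f", "g", "h"]


def one_vs_one_pairs(classes):
    """Every unordered pair of classes needs its own binary classifier."""
    return list(combinations(classes, 2))


def majority_vote(pairwise_winners):
    """Pick the class that wins the most pairwise decisions (assumed rule)."""
    return Counter(pairwise_winners).most_common(1)[0][0]


pairs = one_vs_one_pairs(CLASSES)
print(len(pairs))  # number of binary SVMs to train
```

For eight classes this gives 28 pairwise classifiers, compared with eight for one-versus-all.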
3.2. Deep Neural Network
We designed the binary classifier and the multiclass classifier using a deep neural network (DNN). Figure 2a shows the architecture of the binary classifier and Figure 2b shows the architecture of the multiclass classifier. The DNN-based binary classifier has one input layer, six hidden layers and one output layer. The input layer has 512 nodes, corresponding to N; each hidden layer has 32 nodes; and the output layer has one node. The Exponential Linear Unit (ELU), which has been reported to perform very well [21], is used as the activation function, and the sigmoid function is used for the output layer. Likewise, the DNN-based multiclass classifier has one input layer, seven hidden layers and one output layer. The input layer has 512 nodes, corresponding to N; each hidden layer has 32 nodes; and the output layer has eight nodes with a SoftMax activation. We use Batch Normalization (BN), which helps learn the optimal scale (mean and variance) of each layer’s input data, to improve performance. BN is generally known to improve DNN results considerably [22].
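To give a rough idea of the size of the two networks, the sketch below counts the trainable weights implied by the stated layer sizes. It assumes plain fully connected layers with one bias per node and ignores the Batch Normalization parameters; these assumptions, and the helper name, are illustrative and not taken from the paper.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# 512 input nodes, hidden layers of 32 nodes, then the output layer.
BINARY_LAYERS = [512] + [32] * 6 + [1]      # six hidden layers, sigmoid output
MULTICLASS_LAYERS = [512] + [32] * 7 + [8]  # seven hidden layers, SoftMax output

binary_params = dense_parameter_count(BINARY_LAYERS)
multiclass_params = dense_parameter_count(MULTICLASS_LAYERS)
print(binary_params, multiclass_params)
```

Under these assumptions most of the parameters sit in the first layer (512 × 32 weights), which is why enlarging the input window dominates the cost of the network.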
3.3. Stacking Classifier
We can expect a better result by combining the predictions of several predictors. Such a group of predictors is called an ensemble, and this learning method is called ensemble learning [19]. The idea is that, even though each predictor is a weak classifier, a combination of many of them may become a strong predictor. Bagging, boosting and stacking are prevalent ensemble learning methods, and we adopt stacking in this paper.
Unlike simple voting, stacking trains a new predictor on top of the base predictors to aggregate their predictions. This new predictor is called the blender or meta-learner. Stacking is a useful tool for combining classifiers with different structures [23,24]. It enables us to design a new classifier that combines the already-trained binary classifier (Figure 2a) and multiclass classifier (Figure 2b). The binary classifier determines whether a human is walking in the detection area, and the multiclass classifier distinguishes the eight cases individually. The blender is optimized after combining these two classifiers to improve classification accuracy. Figure 7 shows the proposed structure of the stacking method.
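How the blender's input could be assembled from the two base classifiers is sketched below. This is an assumption for illustration: the text does not specify the exact layout of the blender input, so simple concatenation of the binary output (one value) and the multiclass output (eight values) is used here, and the function name is hypothetical. The blender itself would be a separately trained model and is not shown.

```python
def blender_input(binary_output, multiclass_output):
    """Concatenate base-classifier outputs into one feature vector (assumed layout)."""
    if len(multiclass_output) != 8:
        raise ValueError("expected eight multiclass scores")
    return [float(binary_output)] + [float(score) for score in multiclass_output]


# Example with made-up scores: the binary classifier reports a probable walker,
# and the multiclass classifier favours the fifth class (index 4).
features = blender_input(0.9, [0.02, 0.01, 0.01, 0.01, 0.80, 0.05, 0.05, 0.05])
```

Because the base classifiers are already trained, only the small blender on top of these nine values needs to be optimized.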
3.4. Modified Classifier
We propose stacking multiple classifiers to achieve fast processing within 1 s while improving classification accuracy. Figure 8 shows the structure of the modified stacking method, which stores and reuses the multiclass classifier’s outputs to reflect the continuous variability of the micro-Doppler effect over time. The multiclass classifier produces a feature at each time step and the blender combines three consecutive features, where p is the time shift between them (0.25 s, 50% of the window size) and the total processing time is 1 s. The sampling rate is 1024 samples/s, the window size is 512 samples and the sliding interval is 256 samples (50%). The combined feature is used as the input feature for the blender classifier.
Increasing the window size can improve accuracy, but it also increases the processing time. If only the sliding interval is reduced, classification performance can degrade because the amount of input information decreases. Thus, the window time, the sliding interval and the number of features should be considered together when designing a classifier. Increasing the number of features did not make a significant difference in performance, because micro-Doppler signals change little within a short time of 1 s. On the other hand, increasing the sample size of the Fourier transform requires more time-domain samples, which increases the window time and processing time. More importantly, since the number of nodes in the input and hidden layers of the DNN would have to increase, the training and inference computation of the DNN would grow considerably. Through repeated experiments, we selected a window of 512 samples, a 50% sliding interval and three features within 1 s.
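The timing in this subsection can be checked with a few lines of arithmetic. The sketch below uses the stated values (1024 samples/s, 512-sample window, 256-sample sliding interval, three features); the function and constant names are illustrative.

```python
SAMPLING_RATE = 1024  # samples per second
WINDOW_SIZE = 512     # samples per DFT window
HOP_SIZE = 256        # sliding interval (50% of the window)
NUM_FEATURES = 3      # consecutive features combined by the blender


def window_bounds(num_windows, window_size, hop_size):
    """(start, end) sample indices of consecutive overlapping windows."""
    return [
        (index * hop_size, index * hop_size + window_size)
        for index in range(num_windows)
    ]


bounds = window_bounds(NUM_FEATURES, WINDOW_SIZE, HOP_SIZE)
total_seconds = bounds[-1][1] / SAMPLING_RATE
time_shift_seconds = HOP_SIZE / SAMPLING_RATE
```

Three half-overlapping 512-sample windows span exactly 1024 samples, which matches the 1 s processing time and the 0.25 s time shift p.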
4. Experiment
We excluded severe outliers from the dataset and did not apply any additional pre-processing related to feature selection or noise filtering. We collected (a) 40,000 outdoor environment samples, (b) 40,000 fan samples, (c) 40,000 snow samples, (d) 40,000 rain samples, (e) 40,000 human walking samples, (f) 10,000 samples of human walking with fan, (g) 40,000 samples of human walking with snow and (h) 40,000 samples of human walking with rain. Half of the collected data were used to train the two base classifiers and the remaining half was used to optimize the blender.
We used cross-validation for verification: 75% of the samples were used for the training set, 12.5% for the validation set and the remaining 12.5% for the test set. After the model is optimized on the training set, the validation and test sets are used to analyze and verify the learning process and the generalization performance of the trained classifier. Training accuracy is the classification accuracy obtained when the model is applied to the training set, while validation and test accuracy are the classification accuracies on the validation set and the test set, respectively.
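A 75% / 12.5% / 12.5% split of this kind can be sketched as below. The shuffling step, the fixed seed and the function name are assumptions for illustration; the paper does not describe how samples were assigned to each set.

```python
import random


def split_indices(num_samples, seed=0):
    """Shuffle sample indices and split them 75% / 12.5% / 12.5%."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(num_samples * 0.75)
    n_val = int(num_samples * 0.125)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(800)
```

The three index lists are disjoint, so no sample used for training also appears in validation or testing.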
The labels of the data sets used to train and test the classifiers must be consistent with each other. For example, the output label of the binary classifier is “0” or “1”, where “0” means a background noise signal without human walking. The labels (a), (b), (c) and (d) of the multiclass classifier must therefore correspond to “0” of the binary classifier; if “0” is associated with (e), (f), (g) or (h) in the data set, the result is incorrect. Likewise, the labels of the data set used to train the blender must be the same as those of the multiclass classifier. Also, to train and test the modified stacking method, the components of the combined feature must be generated from the same class.
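The label consistency described above amounts to a fixed mapping from the eight multiclass labels to the two binary labels. A minimal sketch follows; the constant and function names are hypothetical.

```python
BACKGROUND_CLASSES = ("a", "b", "c", "d")  # background noise only
WALKING_CLASSES = ("e", "f", "g", "h")     # human walking, with or without noise


def binary_label(multiclass_label):
    """Map a multiclass label (a-h) to the binary label: 0 = background, 1 = walking."""
    if multiclass_label in BACKGROUND_CLASSES:
        return 0
    if multiclass_label in WALKING_CLASSES:
        return 1
    raise ValueError(f"unknown class label: {multiclass_label!r}")


binary_labels = [binary_label(label) for label in "abcdefgh"]
```

Deriving the binary labels from the multiclass labels in this way keeps the two base classifiers and the blender trained on mutually consistent targets.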
4.1. Support Vector Machine
We designed the binary classifier and the multiclass classifier using SVM. Figure 9 shows the ROC curve of the binary classifier and the confusion matrix of the multiclass classifier using SVM. Table 2 shows the accuracy of the SVM-based classifiers. Although the classifiers fit the training set very closely, the accuracy on the validation set and the test set did not improve. This means that the algorithm is over-fitted to the training set and its generalization performance is poor.
4.2. Deep Neural Network
We designed and tested the DNN-based binary classifier (Figure 2a) and multiclass classifier (Figure 2b). Figure 10 shows the accuracy-loss graphs of the two classifiers. For the binary classifier, the training accuracy was 93.42%, the validation accuracy was 89.71% and the test accuracy was 89.65%. For the multiclass classifier, the training accuracy was 96.88%, the validation accuracy was 88.71% and the test accuracy was 88.79%. Table 3 shows the confusion matrix of the DNN-based multiclass classifier. We reduced the fluctuation in the validation loss by lowering the learning rate of the Adam optimizer and adjusting the batch size, but the fluctuation could be reduced only to a limited extent. The DNN showed a clear performance improvement over SVM.
4.3. Stacking Method
We designed and tested the stacking method (Figure 7). Figure 11a and Table 4 show the accuracy of the stacking method. We reused the two already-trained base classifiers. As Figure 11a shows, the accuracy and loss curves over the epochs were very smooth, which makes it easy to determine the point of optimal performance. The training accuracy was 92.13%, the validation accuracy was 91.49% and the test accuracy was 91.43%.
4.4. Modified Stacking Method
We designed and tested the modified stacking method (Figure 8). Figure 11b and Table 5 show the final results of the modified stacking method. The training accuracy was 95.58%, the validation accuracy was 95.55% and the test accuracy was 95.62%.
5. Conclusions and Discussion
We have proposed a stacking method for ensemble learning to improve the accuracy of a classifier that recognizes micro-Doppler signals. The stacking method combines classification models with different structures. We additionally designed and tested a modified stacking method that reflects the variability of micro-Doppler effects over time. Table 6 shows the experimental results. The stacking method showed an improvement of about 2.6 percentage points in test accuracy over the DNN-based multiclass classifier (91.43% versus 88.79%). Figure 10b showed that the DNN-based multiclass classifier could no longer improve its performance due to over-fitting and fluctuation, whereas Figure 11a showed that the stacking method can be an effective alternative for optimization. In other words, stacking can give a better learning process when training a DNN takes too long or is too difficult because of its complex structure. Stacking properly divides and recombines the given structure, which yields a more efficient learning process.
Lastly, we designed and tested the modified stacking method to further improve the accuracy of the classifier. It showed an improvement of about 6.8 percentage points in test accuracy over the DNN-based multiclass classifier (95.62% versus 88.79%). This result means that performance can be improved by using information that reflects the variability of micro-Doppler effects over time.
If we wanted to feed more FFT data into the DNN-based multiclass classifier, the number of input nodes would grow to k × 512. This may lead to an optimization problem because of the more complex structure. In contrast, the proposed modified stacking method reuses already-trained classifiers (the DNN-based classifiers) to combine micro-Doppler information over time, without increasing the number of input nodes or complicating the structure. The proposed modified stacking method therefore offers a way to simplify the structure of the classifier model and provides a useful learning process.
The results of this experiment are based on datasets obtained in a limited environment. Nevertheless, this study showed that a well-trained DNN can be an effective way to classify signals without any additional signal processing to extract features. We also verified that stacking multiple classifiers is an effective way to improve classification accuracy, because it can reflect the micro-Doppler variability within a given window time. We expect that more challenging classification problems can be solved by stacking and reusing well-known, already-trained classifiers.