3.1. Experimental Setup
The target vibration signals in the dataset were divided into three groups (dataset A, dataset B, and dataset C) to validate the performance of the target recognition algorithm. The experiments were run on a computer with a 64-bit Windows 7 system, an Intel i5-7500 CPU (3.4 GHz), and 8 GB of RAM.
The dataset consists of vibration data collected from different regions using vibration sensors. Environmental noise samples are data collected without any moving ground targets. Pedestrian signals record the movement of pedestrians in the monitored area. Tracked vehicle and wheeled vehicle signals record two different types of vehicles driving in the monitored area. Because of limitations of the data acquisition environments and the available test vehicle types, not all target types were captured in every environment. Therefore, the ambient noise at the signal acquisition locations was compared, and vibration signals from locations with similar ambient noise characteristics were grouped into one dataset. The sample settings for the three datasets are shown in Table 1; the length of each sample is 1 s.
The input to the feature extractor is a time-domain waveform of length 1024. The sampling rate of the target vibration signals in the sample database is 2048 Hz. Although densely sampled waveforms better represent the variation of the time-domain waveform, they increase the model's computational cost. To reduce the computation of the feature extraction network, the original vibration signal is downsampled, shortening the input as much as possible while preserving the time-domain characteristics of the signal. After experimental testing, the sampling rate of the data was reduced to 1024 Hz.
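The downsampling step can be sketched as follows; `downsample_signal` is an illustrative helper (not from the paper) that applies an anti-aliasing filter before decimation, reducing a 1 s sample at 2048 Hz to the 1024-point input length:

```python
import numpy as np
from scipy.signal import decimate

def downsample_signal(x, factor=2):
    """Anti-alias filter, then decimate; 2048 Hz -> 1024 Hz when factor=2."""
    return decimate(x, factor, ftype="fir", zero_phase=True)

fs = 2048
t = np.arange(fs) / fs              # one 1 s sample at 2048 Hz
x = np.sin(2 * np.pi * 50 * t)      # synthetic 50 Hz vibration component
y = downsample_signal(x)
print(len(x), len(y))               # 2048 -> 1024
```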
The input to the short-time Fourier transform (STFT) is a vibration signal of length 1024, with a window of 0.125 s and an overlap of 0.0625 s; the resulting time–frequency representation has size 15 × 256. To reduce the amount of network computation, only the effective frequency range of the target vibration signal in the time–frequency representation is selected to construct the feature map. The feature map is then converted into a grayscale image and input to the CNN. According to the energy band distribution of the target vibration signal, the low-frequency part of the time–frequency representation is selected to characterize the signal, and the feature-map input is set to 15 × 60.
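A minimal sketch of this step is shown below. The exact STFT parameters that produce the 15 × 256 representation are not fully specified in the text; the sketch uses a 128-sample (0.125 s) window with 64-sample overlap at the downsampled 1024 Hz rate, which yields 15 time frames, and then keeps the 60 lowest-frequency bins to form the 15 × 60 feature map:

```python
import numpy as np
from scipy.signal import stft

fs = 1024                           # downsampled rate
x = np.random.randn(fs)             # placeholder 1 s vibration sample

# 0.125 s window (128 samples), 0.0625 s (64-sample) overlap, no edge padding
f, t, Z = stft(x, fs=fs, nperseg=128, noverlap=64,
               boundary=None, padded=False)
mag = np.abs(Z)                     # magnitude time-frequency representation

# keep the 60 lowest-frequency bins and arrange as a 15 x 60 feature map
feat = mag[:60, :].T
print(feat.shape)                   # (15, 60)
```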
In the recognition model, the ReLU function is selected as the activation function of the convolutional layer. The softmax function is selected as the output function of the fully connected layer. In addition, the dropout layer is used to improve the generalization ability of the model and its ratio is set to 0.5.
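The activation, regularization, and output functions named above can be written out as illustrative NumPy definitions (the actual network layers and training loop of the paper are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Activation of the convolutional layers: max(0, x)."""
    return np.maximum(0.0, x)

def dropout(x, rate=0.5, training=True):
    """Inverted dropout: zero activations with prob `rate`, rescale the rest."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def softmax(z):
    """Output function of the fully connected layer (numerically stable)."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

h = relu(rng.standard_normal(8))    # a conv-layer activation vector
h = dropout(h, rate=0.5)            # regularization during training
p = softmax(rng.standard_normal(3)) # 3-class output probabilities
print(p.sum())                      # probabilities sum to 1
```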
The hyperparameters of the loss function are set to fixed values.
The target recognition algorithm DATR proposed in this paper is compared experimentally with VibCNN [15] and PRNN [16]. The parameter setup is consistent with the corresponding papers. VibCNN takes the 1D time-domain waveform of the vibration signal as input and uses a CNN to construct the recognition model. The inputs of PRNN are the time-domain waveform and spectrum of the signal, and its recognition model is built on an LSTM network. The model training settings used in this paper are shown in Table 2, and an early stopping mechanism is adopted during network training to avoid over-fitting.
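An early stopping mechanism of the kind mentioned above can be sketched as follows; the `patience` and `min_delta` values are illustrative assumptions, not taken from the paper:

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""
    def __init__(self, patience=10, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0   # improvement: reset
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience        # True -> stop training

stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.79, 0.81, 0.80, 0.82]            # synthetic validation losses
stop_epoch = next(i for i, loss in enumerate(losses) if stopper.step(loss))
print(stop_epoch)                                      # stops at epoch 5
```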
3.2. Experiments and Comparative Analysis of Base Model Performance
3.2.1. Recognition Performance Comparisons
This subsection compares the recognition performance of the proposed algorithm with that of the other algorithms, evaluated using macro-accuracy and macro-F1-score, and presents experiments that demonstrate the accuracy of the proposed method for target recognition.
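The macro-F1 metric used here is the unweighted mean of the per-class F1 scores, as in this minimal sketch (the data values are illustrative; macro-accuracy is computed analogously as an unweighted mean over classes):

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

y_true = np.array([0, 0, 1, 1, 2, 2])   # toy labels: 3 target classes
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(macro_f1(y_true, y_pred, 3))
```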
Dataset A, shown in Table 1, is used to train the recognition model and to compare the recognition performance of the proposed DATR algorithm with VibCNN and PRNN. Dataset A includes three target categories: pedestrian vibration signals, wheeled vehicle vibration signals, and tracked vehicle vibration signals. The proportions of training, validation, and test samples are set to 60%, 20%, and 20%, respectively. Data were randomly drawn from the dataset in these proportions for 25 experiments, and the average accuracy and confidence interval were calculated. The performance of the three methods is shown in Table 3.
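Averaging accuracy over repeated runs and reporting a confidence interval can be sketched as below; the paper does not specify the interval method, so this sketch assumes a normal-approximation 95% interval (z = 1.96) over synthetic run accuracies:

```python
import numpy as np

def mean_ci(accs, z=1.96):
    """Mean and normal-approximation 95% confidence half-width over runs."""
    accs = np.asarray(accs, dtype=float)
    mean = accs.mean()
    half = z * accs.std(ddof=1) / np.sqrt(len(accs))
    return mean, half

rng = np.random.default_rng(1)
accs = rng.normal(0.955, 0.01, size=25)   # 25 synthetic run accuracies
m, h = mean_ci(accs)
print(f"{m:.3f} ± {h:.3f}")
```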
From Table 3, it can be seen that the average accuracy of the DATR algorithm on the test set is improved by about 1.2–2.8%, and the average F1-score by about 1.1–2.9%, over the comparison methods, giving the best recognition performance on both the validation and test sets. Comparing the average accuracies and confidence intervals over the repeated experiments shows that the accuracy of the DATR algorithm varies over a smaller range than that of VibCNN, indicating better training stability; the average accuracy of VibCNN is also slightly lower than that of DATR. Because the recognition network of VibCNN has a complex structure, its larger number of layers may cause gradient instability during training and reduce the accuracy of the recognition model. Compared with PRNN, DATR achieves higher recognition accuracy and F1-score and can more accurately distinguish pedestrian, wheeled vehicle, and tracked vehicle targets. From the model testing results, the CNN-based network has better fitting ability than the LSTM-based network.
In summary, the DATR algorithm has higher recognition accuracy and better model training stability than the VibCNN algorithm and the PRNN algorithm.
3.2.2. Recognition Performance for Different Target Types
This subsection describes the experiments that show the accuracy of the proposed method for identifying different types of targets.
The following experiments test the recognition performance of the DATR algorithm and the comparison methods for different types of targets. The test set is used to verify the recognition performance of the three methods for pedestrians, wheeled vehicles, and tracked vehicles; the results are shown in Figure 5. The DATR algorithm achieves recognition accuracies of 95.5%, 95.5%, and 96.2% for pedestrians, wheeled vehicles, and tracked vehicles, respectively. Its accuracy is consistently high across all target types, without the uneven multi-target recognition accuracy seen in some recognition methods.
The experimental results show that the feature extractor of the DATR algorithm extracts discriminative target features and recognizes different types of targets with high accuracy.
3.2.3. Computing Efficiency
This subsection describes the experiments that show the computational efficiency of the proposed method.
In order to evaluate the computing efficiency of the classification algorithms, the training and running times of the three recognition algorithms are compared; the results are shown in Table 4.
From the table, we can see that the training time of DATR is 11 min 34 s. With the same number of epochs, DATR requires about one-seventeenth of the training time of VibCNN (the time shown in the table is the time required for VibCNN to run 200 epochs). The single judgment time of DATR is 0.8 ms on average, which is 3.6 ms faster than that of the VibCNN algorithm. These results show that the DATR algorithm not only has higher recognition accuracy but also shorter model training and computation times. In practical applications, DATR therefore offers higher judgment efficiency, providing real-time recognition results for personnel while reducing system power consumption.
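Per-judgment latency of the kind reported in Table 4 can be measured with a simple wall-clock timing harness; the helper below and the stand-in model are illustrative, not the paper's benchmarking code:

```python
import time

def mean_latency_ms(fn, x, n_warmup=10, n_runs=100):
    """Average wall-clock time of one judgment call, in milliseconds."""
    for _ in range(n_warmup):           # warm-up runs excluded from timing
        fn(x)
    t0 = time.perf_counter()
    for _ in range(n_runs):
        fn(x)
    return (time.perf_counter() - t0) / n_runs * 1e3

dummy_model = lambda x: sum(x)          # stand-in for the recognition model
latency = mean_latency_ms(dummy_model, list(range(1024)))
print(f"{latency:.3f} ms per inference")
```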
3.3. Experiments and Comparative Analysis of Recognition Algorithm Performance
3.3.1. Recognition Performance Across Domains
This subsection presents experiments on the performance of the non-transferred DATR and the other algorithms in a new environment. We test the performance of the source-domain recognition model in the new environment: the model is trained on dataset A, and its recognition accuracy on the target-domain datasets B and C is tested. The experimental results are shown in Figure 6 and Figure 7.
As can be seen from the figures, the accuracies of DATR, VibCNN, and PRNN are 64%, 60%, and 57% on dataset B and 69%, 68%, and 59% on dataset C, respectively. The drop in accuracy of the recognition model in a new environment (new dataset) is caused by changes in target features: changes in the environment and vehicle type alter the characteristics of the target vibration signals, so the original recognition model cannot achieve high performance on the new dataset.
3.3.2. Cross-Domain Recognition Performance of the Transferred DATR
This subsection describes experiments on the performance of the transferred DATR in the new environment.
The following four sets of transfer learning experiments are evaluated: dataset A → dataset B, dataset B → dataset A, dataset A → dataset C, and dataset B → dataset C, where the dataset before the arrow is the source domain and the dataset after the arrow is the target domain. The accuracies and F1-scores of the four transfer experiments are averaged over 10 runs, as shown in Figure 8.
Comparing the experimental results, it can be seen that the DATR algorithm improves the recognition accuracy of the recognition model on target-domain vibration signals after transfer. Taking dataset A → dataset B as an example, the algorithm with transfer learning improves the recognition accuracy by 29.7% and the F1-score by 37% compared with applying the source-domain model directly, which indicates that the DATR algorithm is effective for domain transfer in moving ground-target recognition. In addition, the figure shows that all recognition models obtained by DATR after transfer learning between different datasets achieve higher recognition accuracy than the models without transfer learning. This indicates that the algorithm imposes effective constraints for extracting domain-invariant features, can extract common signal features across data domains in semi-supervised training with unlabeled target-domain data, and achieves good target recognition performance.
3.3.3. Visual Distribution of Different Algorithm Features
This subsection describes the experiments that show the feature distribution of different algorithms.
In order to observe the performance of the model after transfer learning more intuitively, the output features of the recognition algorithms are compared. The model output features are reduced from high-dimensional space to two dimensions using t-distributed stochastic neighbor embedding (t-SNE), and the visualized feature distributions obtained by the different recognition methods when processing vibration signals are compared.
Specifically, model transfer from dataset A → B is performed using dataset A as the source-domain dataset and dataset B as the target-domain dataset. We compare the distribution of the output features of the feature extractor of the transferred DATR algorithm with that of the non-transferred DATR algorithm (trained on dataset A only), as well as with the features extracted from dataset B by the VibCNN and PRNN algorithms (also trained on dataset A only). The feature visualization results for each method are shown in Figure 9.
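The t-SNE reduction step can be sketched with scikit-learn (assumed available); the feature matrix below is a synthetic stand-in for the extractor's high-dimensional outputs, with three well-separated classes of 50 samples each:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# stand-in for the feature extractor's outputs:
# three classes, 50 samples each, 64-dimensional
feats = np.vstack([rng.normal(c, 0.5, size=(50, 64)) for c in (0.0, 2.0, 4.0)])

# reduce to 2D for scatter-plot visualization, as in Figure 9
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(feats)
print(emb.shape)                        # (150, 2)
```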
The red dots in Figure 9 are the visualized features of pedestrian vibration signals, the yellow triangles those of wheeled vehicle vibration signals, and the blue squares those of tracked vehicle vibration signals. Figure 9a–c show the feature visualization results of the VibCNN, PRNN, and DATR algorithms without transfer learning. As the figures show, when a recognition algorithm trained on source-domain data without transfer learning discriminates target-domain data, the intra-class feature distributions are relatively scattered and the inter-class features overlap substantially. This indicates that some of the discriminative features of the target-domain signals have changed, and a recognition model trained on the source-domain dataset cannot accurately determine the class of target-domain signals.
Figure 9d shows the visualized clustering of the features extracted from the target-domain data by the DATR algorithm after transfer learning. Although there is a small region of feature overlap in the central part of the visualization, the overall distribution shows that the features extracted by the feature extractor have good intra-class aggregation and inter-class dispersion. This indicates that the model after transfer learning can extract separable features for dissimilar targets, and that the unlabeled target-domain data can be used to train the target-domain recognition model and achieve higher classification accuracy.
3.3.4. Visual Features of Different Domains
This subsection describes the experiments that show the ability of the proposed method to extract the common features of the source and target-domain data.
The feature visualization results are shown in Figure 10. The hollow markers in the figure indicate the visualized features of the target domain, and the solid markers those of the source domain. As the figure shows, the features extracted by the model's feature extractor from the same class of signals in the source and target domains have similar distributions; overall, features of the same target class from different domains cluster in the same region of the feature space. This indicates that, after transfer learning, the DATR algorithm extracts domain-invariant features from the target vibration signals of both the source and target domains, and that the proposed loss function effectively constrains the training of the recognition model.
In summary, the neural network and domain-adaptive moving ground-target recognition algorithm proposed in this paper can train target recognition models in a semi-supervised manner using source-domain data and unlabeled target-domain data, and the trained models achieve good target recognition accuracy in both the source and target domains.