4.3. Experimental Results and Analysis
The DRACNN model proposed in this paper is built in a deep learning development environment with the Windows 10 operating system, Python 3.6.5, Keras 2.2.4, TensorFlow 1.14.0, and CUDA 10.0.130, and trained on a workstation equipped with an Nvidia GTX 1660 Ti GPU, an Intel Core i5-10400F CPU, and 16 GB of RAM. We chose the following hyperparameters for the model: the Adam optimizer, a batch size of 256, and 100 training epochs. During the iterations, the model parameters are updated using the error back-propagation (BP) algorithm and optimized by the Adam optimizer with an adaptive learning rate. The initial learning rate of the Adam optimizer is 0.001, the exponential decay rate of the first-order moment estimate is set to 0.9, and that of the second-order moment estimate is set to 0.999. We use a joint loss function, combining a multi-class cross-entropy loss and a center loss, to measure the error between the model predictions and the actual values, and minimize this error through iteration.
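A minimal sketch of this training configuration in standalone Keras 2.2.4 (TensorFlow 1.14 backend) is given below; `model`, `center_loss_term`, the training arrays, and the center-loss weight `lambda_c` are placeholders or assumed values rather than the paper's exact implementation.

```python
# Sketch of the training configuration described above (Keras 2.2.4 / TF 1.14).
# `model` stands for the DRACNN network and `center_loss_term` for the center
# loss; both are placeholders, and `lambda_c` is an assumed weight.
import keras.backend as K
from keras.optimizers import Adam

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999)  # initial LR, 1st/2nd moment decay rates

lambda_c = 0.1  # assumed weight of the center-loss term in the joint loss

def joint_loss(y_true, y_pred):
    # Multi-class cross-entropy plus a weighted center-loss term.
    ce = K.categorical_crossentropy(y_true, y_pred)
    return ce + lambda_c * center_loss_term(y_true, y_pred)  # placeholder term

model.compile(optimizer=optimizer, loss=joint_loss, metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=256, epochs=100,
                    validation_data=(x_val, y_val))
```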
The DRACNN model is trained on the ShipsEar dataset for 100 epochs, and
Figure 7 shows the change curves of
accuracy and loss during the whole process. As shown in
Figure 7a, the training and validation losses decrease rapidly within about the first 20 epochs, after which they decline gradually and converge to a stable value, with no sign of overfitting or underfitting. It is also shown in
Figure 7b that the training and validation accuracies grow rapidly and quickly converge to stable values, with an optimal
accuracy of 0.999 on the training set and 0.970 on the validation set. To test the recognition
accuracy of the model, we repeated the experiment 10 times, each time randomly reselecting the training and test samples with fixed sample sizes. The recognition results are shown in
Table 4.
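A sketch of this repeated-experiment protocol is given below; the array names `X` and `y`, the `build_and_train` wrapper, and the 80/20 split ratio are assumptions, not details from the paper.

```python
# Repeat the train/test split and evaluation 10 times and report the mean and
# standard deviation of accuracy and loss. `X`, `y`, and `build_and_train`
# are placeholders for the preprocessed samples and the training routine.
import numpy as np
from sklearn.model_selection import train_test_split

accs, losses = [], []
for run in range(10):
    x_tr, x_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=run)  # fixed sizes, new random draw
    model = build_and_train(x_tr, y_tr)
    loss, acc = model.evaluate(x_te, y_te, verbose=0)
    accs.append(acc)
    losses.append(loss)

print('accuracy: %.3f +/- %.3f' % (np.mean(accs), np.std(accs)))
print('loss:     %.3f +/- %.3f' % (np.mean(losses), np.std(losses)))
```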
The mean recognition
accuracy of the 10 experiments is 97.1% with a standard deviation of 0.24, and the mean loss value is 0.11 with a standard deviation of 0.01. To show the prediction information in detail, we give the recognition accuracy of the DRACNN model for each class of targets in the ShipsEar dataset, as well as the proportion of samples of one class misidentified as other classes, in the form of confusion matrices, as shown in
Figure 8.
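The per-class accuracies and misidentification proportions shown in Figure 8 correspond to a row-normalized confusion matrix, which can be computed as in the following sketch (assuming `y_true` and `y_pred` hold the integer class labels of the test samples):

```python
# Row-normalized confusion matrix: entry (i, j) is the proportion of class-i
# samples recognized as class j; the diagonal gives the per-class accuracy.
import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred)
cm_norm = cm.astype(float) / cm.sum(axis=1, keepdims=True)
print(np.round(cm_norm, 3))
```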
Targets in class E have the best recognition results, with an
accuracy of 99.1%, despite having the smallest number of training samples. This is mainly because these targets are marine background noise, whose characteristics differ significantly from those of the ship targets in classes A, B, C, and D, making them easy to distinguish. Among the four classes of ship targets, class B has the worst recognition results, with a recognition
accuracy of 94.7%. One possible reason is that this class has the fewest training samples among the ship classes, so the model training is not sufficient. Another possible reason is that the targets in class B (motor boats, pilot boats, and sailboats) have characteristics similar to those of the targets in class C (passenger ferries) and are easily misidentified as class C.
In addition to recognition accuracy, we also use precision, recall, and F1-score to evaluate the recognition performance of the DRACNN model more comprehensively. The results are shown in
Table 5, and the formulas for these indices are as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where $TP$ is the number of true positives, $FP$ is the number of false positives, and $FN$ is the number of false negatives.
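These indices can also be computed per class directly from the predictions, for example with scikit-learn, assuming `y_true` and `y_pred` hold the integer class labels of the test samples:

```python
# Per-class precision, recall, and F1-score for the five ShipsEar classes.
from sklearn.metrics import precision_recall_fscore_support

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2, 3, 4])
for cls, p, r, f in zip('ABCDE', precision, recall, f1):
    print('class %s: precision=%.3f recall=%.3f F1=%.3f' % (cls, p, r, f))
```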
To better analyze the results, we compare our experimental results with those of other methods that have also been validated on the ShipsEar dataset. Here, in addition to the recognition
accuracy, we also compare the number of parameters and the amount of floating-point computation of these deep learning-based models as given in the references, as shown in
Table 6.
The DRACNN model proposed in this paper achieves a target recognition accuracy of 97.1% on the ShipsEar dataset, which is the highest among the compared methods except for the AResNet model, whose recognition accuracy is 98.0%. The UTAR-Transformer model, which introduces the self-attention mechanism into a CNN, has a recognition accuracy of 96.9%, roughly the same as that of our model. However, it is worth mentioning that, unlike AResNet and the other comparative methods, we did not perform any data filtering or data augmentation during model training and testing, which makes our results more credible and convincing. More importantly, our model has only 0.26 M parameters, about 1/36 of the AResNet model and 1/10 of the UTAR-Transformer model, and requires only 5 M floating-point operations, about 1/292 and 1/46 of those two models, respectively. Thus, our model needs less memory and fewer computational resources, which facilitates deploying it on a small computer system and performing target recognition quickly. On our computer, preprocessing takes 2.9 ms per sample and recognition takes 0.2 ms per sample.
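The parameter and FLOP counts reported in Table 6 can be checked with Keras' built-in parameter count and the TensorFlow 1.x profiler; the sketch below assumes the DRACNN Keras model is in scope as `model`.

```python
# Count trainable parameters and estimate floating-point operations of a Keras
# model under TensorFlow 1.x. `model` is the DRACNN network built beforehand.
import tensorflow as tf
import keras.backend as K

print('parameters: %.2f M' % (model.count_params() / 1e6))

run_meta = tf.RunMetadata()
opts = tf.profiler.ProfileOptionBuilder.float_operation()
flops = tf.profiler.profile(graph=K.get_session().graph,
                            run_meta=run_meta, cmd='op', options=opts)
print('FLOPs: %.1f M' % (flops.total_float_ops / 1e6))
```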
Because the targets are close to the hydrophones, the underwater acoustic target radiated noise signals in the ShipsEar dataset have high signal-to-noise ratios (SNRs). We add Gaussian noise with different signal-to-noise ratios to these signals and repeat the previous experimental steps for recognition.
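A minimal sketch of adding white Gaussian noise at a prescribed SNR is given below; the function name `add_noise_snr` and the example call are ours, not from the original implementation.

```python
# Add white Gaussian noise to a signal so that the resulting SNR (in dB)
# matches the requested value.
import numpy as np

def add_noise_snr(signal, snr_db):
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

noisy = add_noise_snr(clean_signal, snr_db=-10)  # `clean_signal` is a placeholder
```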
Figure 9 shows the recognition results of our model under different SNRs, showing that the target recognition
accuracy increases with the SNR. When the SNR is −20 dB, the recognition
accuracy of our method is 65.8%, and when the SNR is greater than 0 dB, the recognition
accuracy exceeds 90.0%.
4.4. Generalization Ability of DRACNN Model
It is worth examining whether our method remains applicable to other datasets or application scenarios. For the DRACNN model proposed in this paper, the FEM in front of the GAP layer in the CM is used to extract deep features from the underwater acoustic target radiated noise, and we use the DeepShip dataset to verify the generalization ability of the features extracted by the FEM. A detailed description of the DeepShip dataset is given in the paper [
7]. The DeepShip dataset records the radiated noise signals of four types of targets: Cargo (Car), Passenger ship (Pas), Tanker (Tan), and Tug. However, the DeepShip dataset is too large for training and testing our model, so we select only part of its data without losing representativeness, as follows:
(1) From the full set of each target type in the DeepShip dataset, we take the subset of audio files numbered no greater than 60.
(2) From the middle of each selected signal, we intercept a segment with a duration of 10 s to build the sample set, as sketched below.
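The following sketch illustrates the second selection step, reading a recording and cutting a 10 s segment from its middle (using the soundfile package; the example path is hypothetical):

```python
# Cut a 10-second segment from the middle of a DeepShip recording.
import soundfile as sf

def middle_segment(path, duration_s=10.0):
    audio, fs = sf.read(path)
    seg_len = int(duration_s * fs)
    start = max(0, (len(audio) - seg_len) // 2)
    return audio[start:start + seg_len], fs

segment, fs = middle_segment('DeepShip/Cargo/12.wav')  # hypothetical path
```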
After preprocessing, we obtained a total of 45,162 samples; 80% of the samples were randomly selected as the training set, and the remaining 20% were used as the test set. The detailed sample sizes are shown in
Table 7.
The DRACNN model is first trained using the ShipsEar dataset. On this basis, we input the test data from the ShipsEar and DeepShip datasets into the pre-trained DRACNN model and use the t-SNE algorithm to reduce the dimensionality of both the input data and the deep features extracted from the GAP layer to 2, which is convenient for visualization and analysis. The results are shown in
Figure 10.
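This kind of visualization can be reproduced, for example, with scikit-learn's t-SNE applied to the GAP-layer activations; in the sketch below, the layer name 'gap' and the variable names are assumptions.

```python
# Extract GAP-layer deep features, reduce them to two dimensions with t-SNE,
# and plot them colored by class label.
import matplotlib.pyplot as plt
from keras.models import Model
from sklearn.manifold import TSNE

feature_extractor = Model(inputs=model.input,
                          outputs=model.get_layer('gap').output)  # assumed layer name
gap_features = feature_extractor.predict(x_test)

embedded = TSNE(n_components=2, random_state=0).fit_transform(gap_features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap='tab10', s=5)  # integer labels
plt.title('t-SNE of GAP-layer features')
plt.show()
```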
The raw signals of underwater acoustic targets are cluttered in the feature space, and we can hardly find any information that distinguishes the targets in
Figure 10A,C, whereas
Figure 10B,D show that the distinguishability of the deep features extracted from the GAP layer is significantly improved. As can be seen from
Figure 10B, the features of the different target classes show some separation, especially the marine environmental noise represented by class E, which is clearly distinguished from the data of the other classes. The class separability of the deep features of the DeepShip samples extracted by DRACNN is also significantly improved, as shown in
Figure 10D. These results indicate that the DRACNN model effectively learns the intrinsic properties of underwater acoustic targets and has good generalization ability. Finally, we sequentially connect two dense layers with 32 nodes each, a ReLU activation layer, and a Softmax classifier with four nodes after the FEM to form a new model. The new model has only 0.005 M trainable parameters, which is equivalent to a simple deep neural network, and is shown in
Figure 11.
The parameters in the FEM are frozen so that they are no longer involved in the training process. The newly added layers are trained using the training set samples from the DeepShip dataset, and the new model is then used to recognize the test set samples, achieving a recognition
accuracy of 89.2%, which exceeds that of most traditional and deep learning methods. The confusion matrix of the recognition results is shown in
Figure 12.
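A minimal Keras sketch of this transfer setup is given below, assuming the pre-trained feature extraction module is available as a Keras model `fem` (a hypothetical name) and reading the described layer arrangement as two 32-node dense layers, a ReLU activation, and a four-node Softmax classifier; the head's training settings are assumptions.

```python
# Freeze the ShipsEar-trained FEM and train only a small classification head
# on the DeepShip data. `fem` is a placeholder for the pre-trained module.
from keras.models import Sequential
from keras.layers import Dense, Activation

fem.trainable = False                      # FEM weights are not updated

new_model = Sequential([
    fem,
    Dense(32),
    Dense(32),
    Activation('relu'),
    Dense(4, activation='softmax'),        # four DeepShip classes
])
new_model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
new_model.fit(x_train_ds, y_train_ds, batch_size=256, epochs=50,  # assumed settings
              validation_data=(x_test_ds, y_test_ds))
```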
It should be emphasized that only the classifier of the new model was trained on the DeepShip dataset, while feature extraction was done by the FEM trained on the ShipsEar dataset; the new model achieves recognition accuracies of 84.0%, 90.9%, 87.8%, and 90.9% for the four types of targets, respectively. Another experiment shows that the recognition accuracy of the new model without the FEM is only 51.8%, which further illustrates that the deep features extracted by the FEM reflect the target characteristics well.