1. Introduction
Source localization is a long-standing research topic in underwater acoustics, and approaches based on matched field processing (MFP) are widely applied [1]. MFP localizes a source by finding the position on a search grid whose replica field best matches the measured data, but it is limited by common problems such as environmental mismatch and convergence difficulties in the optimization algorithm [2]. Recently, machine learning (ML), including deep-learning methods such as feedforward neural networks (FNNs) and convolutional neural networks (CNNs), has become a research hotspot owing to its high performance in classification and feature extraction tasks. In image recognition, AlexNet [3] was proposed for the ImageNet large-scale visual recognition challenge, confirming the dominant position of CNNs in computer vision. Hinton et al. [4] introduced an FNN into speech recognition, driving the rapid growth of speech recognition technology. In source localization, ML is a data-driven approach that learns the mapping from training data to the corresponding source ranges and then localizes sources in test data [5]. Various studies have found that ML achieves a lower source-ranging error than MFP, especially in complex ocean environments with a low signal-to-noise ratio (SNR) [5,6,7]. In addition, increasingly complex network architectures, such as FNNs [5,6], TDNNs [7,8], CNNs [9,10], ResNets [11,12,13], and Inception networks [10], have been applied to underwater acoustic localization to mine deep features and decouple sound source information from the underwater acoustic environment. ML-based source ranging has been demonstrated to achieve good localization results on both synthetic data and experimental data collected at sea. However, in the published literature, the range estimates produced by CNNs are widely scattered, especially when the SNR is low, although they do follow stable trends. Such scattered estimates are difficult to interpret physically and to use in practice, especially when the data set is very small.
In our recent study [14], a Gaussian distribution was used for the output layer labels in CNN transfer learning to express the uncertainty of the range estimate, and good ranging results were achieved on sea trial data. However, that paper gave no systematic explanation or analysis of the underlying mechanism. Building on those results, this paper further investigates the benefit of replacing the conventional CNN output format with Gaussian-distributed output layer labels. Because the “true” source-receiver distances calculated from GPS positions themselves contain errors, the standard deviation of the Gaussian label accommodates this uncertainty and reduces the divergence of the estimated results. For at-sea data, the source-ranging performance of the modified CNN was significantly better than that of CNNs with regression or classification output layers. In addition, the performance of the modified CNN with different standard deviations was examined for real data with different signal-to-noise ratios (SNRs).
This paper is organized as follows: Section 2 describes the details of the at-sea experiment; Section 3 presents the modified CNN framework for source ranging; Section 4 describes the real-data processing; Section 5 gives the results and analyses; and Section 6 concludes the paper.
2. Description of the Experiment
To study sound source ranging, sea trials were carried out in the South China Sea in 2017. The experimental data used in this paper are the same as those used in [14]. In this experiment, the sound source was towed by a source ship along pre-designed routes, as indicated by the blue and black dashed lines in Figure 1a, while the recording system was an 8-element vertical line array (VLA) moored on the seafloor, which recorded the acoustic data at different source distances. The bathymetry of the experimental area, measured using a multibeam sonar, is also shown in Figure 1a.
The top element of the VLA was located at a depth of 1880 m, the bottom element was fixed 20 m above the seabed, and the element spacing was 20 m. According to the depth sensor recordings, the VLA remained straight throughout the experiment. As shown in Figure 1a, the trajectory of the source ship can be divided into two main lines, track 1 and track 2. The bathymetry along track 1 (blue dashed line) was relatively flat, and the farthest point of the track was about 10 km from the VLA. The bathymetry along track 2 (black dashed line) was moderately range dependent along latitude, with a slope of about 9.65°. On track 2, the source ship repeatedly sailed to the east and west of the VLA, reaching a maximum of approximately 8 km west and approximately 12 km east of the VLA. Figure 1b shows the sound speed profiles (SSPs) calculated from in situ conductivity, temperature, and depth (CTD) measurements taken before, during, and after the source ship runs. The source ship speed was approximately 4 knots, and the acoustic transducer was towed at depths varying between 50 and 100 m, which were highly correlated with the status of the source ship (e.g., maneuvering).
To test the ranging performance under low SNR, a multi-tone signal consisting of several narrow bands (narrowband frequencies of 63, 79, 105, 126, 160, 203, 260, and 315 Hz) was transmitted at different SNRs. Each group of signals contained four segments, denoted E1, E2, E3, and E4, in which the SNR was sequentially reduced by 0, 5, 10, and 15 dB; each segment lasted 60 s, and consecutive segments were separated by 10 s intervals. The experimental data were recorded using the 8-element VLA fixed 20 m above the seabed.
4. Data Simulation and Preprocessing
To expand the data set, replica data sets were generated from environmental parameters and source-receiver geometry to compensate for the lack of real data at certain distances. Using either prior information or in situ measurements of the environmental, geometrical, and source parameters, such as the SSPs, the seabed model and its parameters, the depths of the source and hydrophones, and the signal frequencies, the sound field replicas as recorded by the VLA can be calculated with a sound field calculation code, here Kraken-C [17]. Before being input into the CNN, the data were normalized following the procedure presented by Niu et al. [6], and the network was then used to locate the source from the data recorded by the VLA.
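The preprocessing chain can be illustrated with a minimal sketch. It assumes that the normalization scales the complex pressure across the array to unit L2 norm at each frequency, which is a common choice and not necessarily the exact formula used here. The 1 s window, 0.25 s step, and tone frequencies are taken from this paper; the sampling rate, the function names, and the synthetic test signal are hypothetical.

```python
import cmath
import math

# Narrowband tone frequencies (Hz) listed in the experiment description.
TONES_HZ = [63, 79, 105, 126, 160, 203, 260, 315]


def frame_starts(n_samples, fs, window_s=1.0, step_s=0.25):
    """Start indices of sliding windows that fit entirely inside the record."""
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    return list(range(0, n_samples - win + 1, step))


def dft_bin(frame, fs, freq_hz):
    """Single-frequency DFT of one frame.

    This equals the FFT bin at freq_hz whenever freq_hz * len(frame) / fs
    is an integer, which holds for integer frequencies and a 1 s window.
    """
    return sum(x * cmath.exp(-2j * math.pi * freq_hz * n / fs)
               for n, x in enumerate(frame))


def normalized_snapshot(frames_per_element, fs, freq_hz):
    """Complex pressure across the array at one frequency, unit L2 norm.

    ASSUMPTION: unit-norm scaling across elements; the paper's own
    normalization formula is not reproduced here.
    """
    p = [dft_bin(frame, fs, freq_hz) for frame in frames_per_element]
    norm = math.sqrt(sum(abs(v) ** 2 for v in p))
    if norm == 0.0:
        return p
    return [v / norm for v in p]


# Hypothetical example: 8 elements, fs = 1000 Hz, a 63 Hz tone whose
# amplitude grows linearly with the element index.
fs = 1000
starts = frame_starts(2 * fs, fs)
frames = [[(i + 1) * math.cos(2 * math.pi * 63 * n / fs) for n in range(fs)]
          for i in range(8)]
snapshot = normalized_snapshot(frames, fs, TONES_HZ[0])
```

After normalization only the relative amplitude and phase across the array remain, so the input no longer depends on the absolute source level.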
The simulation was carried out using the Kraken-C sound field calculation program. Four SSPs, three measured profiles and one averaged profile, were used in the simulation to obtain a smooth sound field structure. The water depth at the VLA was approximately 2170 m, and the seabed was modeled as a single layer. The seabed density was set to 1.5 g/cm³, and the attenuation coefficient was set to 0.2 dB/wavelength. Because the seabed sound speed has a considerable influence on the sound field, it was varied from 1500 to 1800 m/s in steps of 50 m/s. The source depths were set to 20, 40, 60, 80, and 100 m. To account for the placement error of the VLA, the depth of the first element was set to 1885, 1905, and 1925 m. The horizontal range interval was 0.02 km, and the maximum range was 12 km. In total, 252,420 replica sound field data sets (4 SSPs × 7 bottom speeds × 5 source depths × 601 ranges × 3 VLA depths) were generated with these parameters and used as the training data set. Ten percent of the training set was used as the validation set for diagnosis during DNN training. A diagram of the synthetic ocean environment is shown in Figure 4.
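The size of the replica set follows directly from the parameter grid, as the short sketch below checks. The SSP labels are hypothetical placeholders, and the range grid is assumed to start at 0 km, which is what 601 points at 0.02 km spacing up to 12 km implies.

```python
from itertools import product
from math import prod

# Parameter axes taken from the simulation description.
ssp_ids = ["measured_1", "measured_2", "measured_3", "average"]  # hypothetical labels
bottom_speeds_mps = list(range(1500, 1801, 50))   # 1500 to 1800 m/s, 50 m/s steps
source_depths_m = [20, 40, 60, 80, 100]
range_indices = range(601)                        # range_km = 0.02 * index (assumed start at 0 km)
vla_top_depths_m = [1885, 1905, 1925]

axes = [ssp_ids, bottom_speeds_mps, source_depths_m,
        range_indices, vla_top_depths_m]

# Total number of replica fields and the size of a 10% validation split.
n_replicas = prod(len(axis) for axis in axes)
n_validation = n_replicas // 10

# The grid can be walked lazily; each tuple is one Kraken-C run configuration.
first_case = next(product(*axes))
max_range_km = round(0.02 * range_indices[-1], 2)
```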
CNNs with different output modes were trained using the simulated data, and the networks were optimized with the Adam adaptive optimization algorithm [18]. Training ran for a total of 10,000 epochs with an initial learning rate of 1 × 10⁻⁴ and a regularization factor of 5 × 10⁻⁴ to ensure adequate network training. For the sound field data received by the VLA, a fast Fourier transform was applied to the signal of each element, with a time window of 1 s and a step size of 0.25 s. Each signal was then normalized by the following formula:
where f represents the frequency and z_n represents the depth. These feature vectors contain abundant information on the properties of the sound field and the source, which the CNN architecture extracts and maps to the target location. The preprocessed data were input into three types of CNNs. The classification output layer divided the range into categories with a spacing of 0.02 km for training. The regression output layer produced the source distance as a single continuous value. The Gauss regression output layer used labels given by a one-dimensional Gaussian function with a standard deviation of 0.1, sampled on a mesh with a spacing of 0.02 km. Tracks 1 and 2 contained 3371 and 14,400 data samples, respectively, for each SNR. The statistics reported in the following section are computed over all the data at each SNR.
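The difference between a classification label and a Gaussian label can be made concrete with a short sketch. Several choices below are assumptions rather than details confirmed by the text: the standard deviation of 0.1 is interpreted in kilometres, the Gaussian is scaled to a peak of 1 rather than to unit sum, the range grid starts at 0 km, and the range estimate is read off as the grid point with the largest output. All function names are hypothetical.

```python
import math

GRID_STEP_KM = 0.02   # mesh spacing stated in the text
N_NODES = 601         # 0 to 12 km inclusive (assumed grid start at 0 km)


def range_grid():
    """Range value (km) associated with each output node."""
    return [i * GRID_STEP_KM for i in range(N_NODES)]


def nearest_node(true_range_km, grid):
    """Index of the grid point closest to the true range."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - true_range_km))


def one_hot_label(true_range_km, grid):
    """Classification-style label: a single active node."""
    label = [0.0] * len(grid)
    label[nearest_node(true_range_km, grid)] = 1.0
    return label


def gaussian_label(true_range_km, grid, sigma_km=0.1):
    """Gaussian label centred on the true range (peak value 1, assumed)."""
    return [math.exp(-0.5 * ((r - true_range_km) / sigma_km) ** 2)
            for r in grid]


def decode_range(output, grid):
    """Range estimate as the grid point with the largest output (assumed)."""
    best = max(range(len(output)), key=lambda i: output[i])
    return grid[best]


grid = range_grid()
label = gaussian_label(5.0, grid)
hot = one_hot_label(5.0, grid)
decoded_on_grid = decode_range(label, grid)
decoded_off_grid = decode_range(gaussian_label(5.013, grid), grid)
```

With these settings, nodes one standard deviation (five grid steps) away from the true range still carry a target of about 0.61, so an estimate that is slightly off is penalized far less than one that is far off. A one-hot label treats both cases identically.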
5. Results and Analyses
To quantify the performance of the ranging method, the probability of credible localization (PCL) [19] and the mean absolute percentage error (MAPE) were used as performance criteria. The PCL measures the reliability of the ranging results as the percentage of test samples whose relative ranging error is less than a given value, as follows:
where K is the sample size of the test data and r0 is the GPS range, against which the estimated range is compared. Equation (5) shows that when the relative error of the distance estimate is less than λ%, the estimated distance is considered reliable, and PCL-λ% is the percentage of reliable range estimates among all samples. PCL-λ% is a criterion that is strict about the convergence of the predictions but tolerates errors within a certain range. Even if the predictions are accurate on average, PCL-λ% can decrease sharply when they are unstable and divergent.
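The verbal definition of PCL-λ% translates into a few lines of code. The sketch below follows the description given here (relative error strictly below λ%, counted over all K samples); the function name and the example values are hypothetical.

```python
def pcl(estimated_km, gps_km, lam_percent):
    """Percentage of samples whose relative range error is below lam_percent.

    GPS ranges are assumed to be strictly positive.
    """
    if len(estimated_km) != len(gps_km):
        raise ValueError("estimated and GPS ranges must have equal length")
    threshold = lam_percent / 100.0
    hits = sum(1 for r_hat, r0 in zip(estimated_km, gps_km)
               if abs(r_hat - r0) / r0 < threshold)
    return 100.0 * hits / len(gps_km)


# Hypothetical example with relative errors of 2%, 15%, 0%, and 8%.
estimated = [1.02, 2.30, 5.00, 9.20]
gps = [1.0, 2.0, 5.0, 10.0]
pcl_5 = pcl(estimated, gps, 5)
pcl_10 = pcl(estimated, gps, 10)
```

Because every sample inside the 5% band is also inside the 10% band, PCL-10% can never be smaller than PCL-5% for the same set of estimates.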
The MAPE measures the difference between the estimated and true values and represents the mean relative error of the estimated distance, expressed as a percentage, over all test sample points as follows:
where i = 1, 2, 3, …, n indexes the output nodes, n is the number of output nodes, and the two quantities compared at each node are the true label and the estimated value.
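For reference, a minimal sketch of the MAPE is given below. It assumes the standard definition, the mean of |true − estimate| / |true| expressed in percent, which matches the verbal description above but may differ in detail from the formula used in this paper. The function name and example values are hypothetical.

```python
def mape(true_values, estimates):
    """Mean absolute percentage error (standard definition, assumed).

    True values are assumed to be nonzero.
    """
    if len(true_values) != len(estimates):
        raise ValueError("inputs must have equal length")
    total = sum(abs((y - y_hat) / y)
                for y, y_hat in zip(true_values, estimates))
    return 100.0 * total / len(true_values)


# Hypothetical example with relative errors of 10%, 10%, and 0%.
mape_value = mape([1.0, 2.0, 4.0], [1.1, 1.8, 4.0])
```

Unlike the PCL, the MAPE averages the error magnitude, so a few large outliers raise it even when most estimates fall within the tolerance band.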
The results of the three types of CNNs for all 3371 samples on track 1 are presented in Figure 5, Figure 6 and Figure 7, where the ping number is the index of the sample in the test data set, the estimated distances are shown as blue plus signs, and the true distances are shown as red dots. The outputs of the classification CNN and the regression CNN were scattered and diverged from the true values, as shown in Figure 5 and Figure 6. Under the same conditions, the Gauss regression CNN showed stronger convergence and better ranging performance, particularly at close and middle ranges, as shown in Figure 7.
When comparing the different output layers, the network architecture, network parameters, and training mode were kept the same. The following conclusions can be drawn from the comparison presented in Figure 8. The Gauss regression CNN demonstrated the best performance in terms of average relative error and accuracy. At a relative SNR of 0 dB, the PCL-10% and PCL-5% of the Gauss regression CNN reached 99.56% and 90.14%, respectively, which were higher than those of the classification CNN (84.81% and 65.62%) and the regression CNN (83.33% and 64.49%). With decreasing SNR, the ranging accuracy of all three methods decreased, but the Gauss regression CNN degraded less than the other two methods. On track 1, for example, when the relative SNR was reduced from 0 dB to −15 dB, the PCL of the Gauss regression CNN dropped by 5.27% and 12.93% for error thresholds of 10% and 5%, respectively, compared with drops of 11.69% and 15.84% for the classification CNN and 12.31% and 25.51% for the regression CNN. Although the Gauss regression CNN achieved a much higher PCL, its MAPE was similar to that of the traditional regression CNN, which indicates its limitations in local searching ability and in the precision of its estimates.
The MAPE of the classification CNN was the highest; however, its PCL-5% and PCL-10% were higher than those of the regression CNN. This is because, in the classification problem, errors at different distances from the true range are not effectively distinguished, whereas the regression output is continuous: the closer the estimate is to the true distance, the smaller the error.
The MAPE of the regression CNN, which has a single distance value in the output layer, was lower than that of the classification CNN; however, its PCL-5% and PCL-10% were the lowest. This shows that the regression method fully exploits the regression fitting ability of the deep neural network, so that the mean relative error of the output was very small. However, the corresponding positioning accuracy was not high, which may have been caused by a mismatch between the actual environment and the simulation model.
For track 1, the MAPE was lower and the ranging accuracy was higher than for track 2. Because the simulated environment was horizontally uniform (range independent) and the seabed along track 1 was relatively flat, the actual acoustic field characteristics were more consistent with the simulated data. In contrast, the far ends of track 2 lie over an upslope seabed. The simulation could not accurately reproduce the acoustic field over a seabed that varies with distance, and the localization accuracy therefore decreased.
The modified CNN was compared for different standard deviations of the Gauss regression layer at different SNRs (Figure 9). For each SNR, the PCL-5% at first increased rapidly with increasing standard deviation and then tended to stabilize. Once the standard deviation exceeded a certain value, the performance of the CNN slowly declined. Moreover, the standard deviation at which the performance began to degrade was smaller at lower SNR.
6. Conclusions
Using simulated replicas as the training data set, a trained CNN was used to estimate the source distance from the multiple-narrowband data received by a deep-sea vertical array. The results showed that formulating the CNN output layer as a Gaussian-distribution regression significantly improved the ranging accuracy and convergence. In addition, the Gauss regression CNN was less affected by decreasing SNR than the traditional CNNs. However, the Gauss regression CNN showed limitations in local searching ability and in the precision of its estimates.
Comparing the two traditional CNNs, the results indicate that the classification CNN could not effectively distinguish ranging errors of different sizes, whereas the regression CNN produced a continuous output by minimizing the error between the true and estimated ranges. This led to a higher MAPE for the classification CNN and a lower MAPE for the regression CNN, although the regression CNN had the lowest PCL.
The modified CNN is more suitable for practical environments: in the classification problem, each position is treated as a separate category, whereas the smooth variation of the Gaussian function with distance resembles the horizontal correlation of the actual sound field, which allows the model to be trained fully and to fit better. However, the results also showed that CNN performance decreases significantly when the marine environment varies strongly with distance, which calls for further research on the environmental adaptation of source-ranging CNNs.