Article
Peer-Review Record

Comparing Performances of Five Distinct Automatic Classifiers for Fin Whale Vocalizations in Beamformed Spectrograms of Coherent Hydrophone Array

Remote Sens. 2020, 12(2), 326; https://doi.org/10.3390/rs12020326
by Heriberto A. Garcia 1, Trenton Couture 1, Amit Galor 2, Jessica M. Topple 3, Wei Huang 1, Devesh Tiwari 1 and Purnima Ratilal 1,*
Reviewer 1: Anonymous
Reviewer 3: Anonymous
Submission received: 23 December 2019 / Revised: 13 January 2020 / Accepted: 15 January 2020 / Published: 19 January 2020
(This article belongs to the Section Ocean Remote Sensing)

Round 1

Reviewer 1 Report

This paper tries to successfully classify acoustic signals acquired from a coherent hydrophone array into signals which correspond to fin whale vocalizations and those that do not.  The paper was generally well-written.  There were a few minor spelling and grammatical errors which will be spelt out below, after some more general comments.  

First of all, I found the title of the manuscript somewhat misleading, or at least not clear.  What methods correspond to neural networks, and what ones correspond to "conventional classifiers" must be made clear to the reader.  I feel the distinction is not made clearly enough as it stands.

I also found that there was not enough justification for choosing the decision tree approach as the best method for classification of acoustic signals.  It performed slightly better than other methods, such as SVM, but at the cost of potentially increased complexity.  Some more rigorous justification should be made before the paper is acceptable in my mind.  

The right side of Table 2 on page 9 exceeds the margin bounds, and so simplification/rescaling has to be performed on the table in this case.  In Figure 5, there are several instances where "principal" is misspelled as "principle".  All such instances should be changed to "principal". 

In the confusion matrices depicted on page 15, as well as throughout the paper, the marginal numbers for each row and column should be revised where necessary.  It would also be good to provide percentage values for the totals on the right-hand side and the top of any of the confusion matrices.  

 

Author Response

Response to Reviewer 1

 

This paper tries to successfully classify acoustic signals acquired from a coherent hydrophone array into signals which correspond to fin whale vocalizations and those that do not. The paper was generally well-written. There were a few minor spelling and grammatical errors which will be spelt out below, after some more general comments.

First of all, I found the title of the manuscript somewhat misleading, or at least not clear. What methods correspond to neural networks, and what ones correspond to "conventional classifiers" must be made clear to the reader. I feel the distinction is not made clearly enough as it stands.

Response: We have modified the manuscript title and no longer use the word “conventional”.

 

I also found that there was not enough justification for choosing the decision tree approach as the best method for classification of acoustic signals. It performed slightly better than other methods, such as SVM, but at the cost of potentially increased complexity. Some more rigorous justification should be made before the paper is acceptable in my mind.

Response: The justification for further work employing the decision tree approach in classifying fin whale vocalizations is provided in lines 418 to 423. For classification of general ocean acoustic signals according to sound sources, we now provide a discussion in lines 445 to 455.

 

The right side of Table 2 on page 9 exceeds the margin bounds and so simplification/rescaling have to be performed on the table in this case.

Response: Table 2 has been reformatted to fit the margin bounds.

 

In Figure 5, there are several instances where "principal" is misspelled as "principle". All such instances should be reverted and changed to "principal".

Response: We have made the corrections and now use “principal” throughout the manuscript.

 

In the confusion matrices depicted on page 15, as well as throughout the paper, the marginal numbers for each row and column should be revised where necessary. It would also be good to provide percentage values for the totals on the right hand side and the top of any of the confusion matrices.

Response: We find it more useful to show the number of signals in each of the categories in Tables 4-8, for both the actual and predicted, because the number of signals in each category is disproportionate, with dominantly high volume of non-fin-whale detections. Percentages are quantified in Tables 9-12, 14 and 17 for classifier accuracy, precision, recall and F1-score metrics where it is meaningful to do so.
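The accuracy, precision, recall and F1-score metrics referred to in this response can be illustrated with a small worked example. The counts below are hypothetical, chosen only to mimic an imbalanced data set dominated by non-fin-whale detections; they are not taken from Tables 4-8.

```python
# Hypothetical confusion-matrix counts (NOT the paper's data), with the
# non-fin-whale class deliberately dominant to mimic class imbalance.
tp, fn = 120, 30      # fin whale calls: correctly / incorrectly classified
fp, tn = 50, 9800     # non-fin-whale detections dominate the data set

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # fraction of all signals correct
precision = tp / (tp + fp)                    # of predicted fin whale, how many real
recall    = tp / (tp + fn)                    # of real fin whale, how many found
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

Note how accuracy (0.992 here) is dominated by the majority class and looks excellent even though precision is only about 0.71, which is why per-class metrics such as precision, recall and F1-score are the more meaningful percentages to report under imbalance.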

 

 

Reviewer 2 Report

20 Hz and 120 Hz are not keywords for a paper, they should be removed

 

Since 5 classifiers are compared together, it would be more appropriate to change the title to "Comparison of 5 classifiers to..." and eliminate the specific reference to neural networks.

Detail what type of beamforming algorithms are used to obtain the beamformed spectra. Do you use a delay and sum beamforming algorithm, an FFT based beamforming algorithm?

 

In Table 1 it is said that the beamwidth for a 90° steering angle is 10 degrees. For this angle the effective length of the array would be zero and the width of the beam would be very large. Electronic scanning arrays are usually restricted to a sector of 120° of steering angles. Can you explain why you use the array at 90° steering angles?

 

Why haven't the dynamic parameters of the Bearing-time trajectories of fin whale been included as a feature?

 

Explain the concepts "total left" and "total right" represented in Figure 4 with the color red and the color blue.

On the one hand, we talk about 12 parameters that are extracted from the detected signals, but on the other hand, we talk about images associated with the spectrogram as input data.

Before describing each of the classification algorithms, specify clearly that some of the algorithms are trained with the characteristics vector and others are trained with the spectrogram image.

Why has the spectrogram image not been used to train the SVM classifier?

I recommend that you include in the bibliography three research papers that work with acoustic images to train SVM classifiers. Justify why you have ruled out training with acoustic images directly for this application.

Gidudu, Anthony & Greg, Hulley & Marwala, Tshilidzi. (2007). Classification of Images Using Support Vector Machines.

Del Val, Lara & Izquierdo-Fuente, Alberto & Villacorta, Juan & Raboso, Mariano. (2015). Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines. Sensors. 2015. 14241-14260. 10.3390/s150614241.

Amiriparian, Shahin & Gerczuk, Maurice & Ottl, Sandra & Cummins, Nicholas & Freitag, Michael & Pugachevskiy, Sergey & Baird, Alice & Schuller, Björn. (2017). Snore Sound Classification Using Image-Based Deep Spectrum Features. 10.21437/Interspeech.2017-434.

In Tables 10 to 14 and 17, use percentage values, so that they can be compared in a more intuitive way and be equivalent to the data provided in the text.

It is not logical to repeat figures 6 and 8. Please use only one.

Author Response

Response to Reviewer 2


20 Hz and 120 Hz are not keywords for a paper, they should be removed.

Response: We have removed 120 Hz, but kept 20 Hz. The "20 Hz" vocalization from fin whales is very characteristic of this whale species and appears in the title and abstract of various publications. We have chosen to retain it here.


Since 5 classifiers are compared together, it would be more appropriate to change the title to "Comparison of 5 classifiers to..." and eliminate the specific reference to neural networks.

Response: We have modified the manuscript title following the suggestion of this review.


Detail what type of beamforming algorithms are used to obtain the beamformed spectra. Do you use a delay and sum beamforming algorithm, an FFT based beamforming algorithm?

Response: The beamformer is now specified in line 175.


In Table 1 it is said that the beamwidth for a 90° steering angle is 10 degrees. For this angle the effective length of the array would be zero and the width of the beam would be very large. Electronic scanning arrays are usually restricted to a sector of 120° of steering angles. Can you explain why you use the array at 90° steering angles?

Response: It is standard in ocean acoustic array processing to steer a towed linear hydrophone array from forward endfire direction (direction of tow ship, or +90 degrees), through broadside (perpendicular to array axis, or 0 degrees) and to back endfire (-90 degrees). This is done in both active and passive sensing application with linear hydrophone arrays. See active acoustic images from ocean acoustic linear hydrophone arrays in Refs. 40, 41, 42, 33, 35, 36 and 37.
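The endfire-to-endfire steering described in this response can be illustrated with a minimal narrowband phase-shift (delay-and-sum) beamformer sketch. The array size, element spacing, and arrival angle below are illustrative assumptions, not the paper's actual configuration; the angle convention matches the response (0° broadside, ±90° the two endfire directions).

```python
import numpy as np

c = 1500.0             # nominal sound speed in water (m/s)
f0 = 20.0              # narrowband frequency, e.g. the fin whale "20 Hz" call
N = 16                 # hypothetical number of hydrophones
d = c / (2 * f0)       # half-wavelength element spacing (m)
x = np.arange(N) * d   # sensor positions along the array axis

# Simulated single-frequency plane wave arriving from 30 deg off broadside.
theta_true = 30.0
k = 2 * np.pi * f0 / c
data = np.exp(1j * k * x * np.sin(np.radians(theta_true)))

# Steer from back endfire (-90 deg) through broadside (0 deg) to forward
# endfire (+90 deg) and compute normalized beam power at each steering angle.
angles = np.linspace(-90.0, 90.0, 181)
steer = np.exp(1j * k * np.outer(np.sin(np.radians(angles)), x))
power = np.abs(steer.conj() @ data) ** 2 / N**2

best = angles[np.argmax(power)]   # peaks at the true arrival angle
```

Note that for a linear array the effective aperture scales with the cosine of the steering angle, so the beamwidth widens as the steering angle approaches ±90°, which is the geometric effect behind the 10-degree endfire beamwidth queried by the reviewer.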


Why haven't the dynamic parameters of the Bearing-time trajectories of fin whale been included as a feature?

Response: The issue of including bearing-time features is now addressed in lines 456 to 460.


Explain the concepts "total left" and "total right" represented in Figure 4 with the color red and the color blue.

Response: The red and blue dots, left and right side bearings are now explained in the captions to Figures 4 and 7.

On the one hand, we talk about 12 parameters that are extracted from the detected signals, but on the other hand, we talk about images associated with the spectrogram as input data. Before describing each of the classification algorithms, specify clearly that some of the algorithms are trained with the characteristics vector and others are trained with the spectrogram image.

Response: The input data to the classification algorithms are now clearly distinguished and specified in lines 70-73 and lines 236 to 240.


Why has the spectrogram image not been used to train the SVM classifier?

I recommend that you include in the bibliography three research papers that work with acoustic images to train SVM classifiers. Justify why you have ruled out training with acoustic images directly for this application.

 

Gidudu, Anthony & Greg, Hulley & Marwala, Tshilidzi. (2007). Classification of Images Using Support Vector Machines.

 

Del Val, Lara & Izquierdo-Fuente, Alberto & Villacorta, Juan & Raboso, Mariano. (2015). Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines. Sensors. 2015. 14241-14260. 10.3390/s150614241.

 

Amiriparian, Shahin & Gerczuk, Maurice & Ottl, Sandra & Cummins, Nicholas & Freitag, Michael & Pugachevskiy, Sergey & Baird, Alice & Schuller, Björn. (2017). Snore Sound Classification Using Image-Based Deep Spectrum Features. 10.21437/Interspeech.2017-434.

Response to both: The recommended references are now cited in line 239 and included in the paper as References 53 to 55. The fact that SVM classifiers can also be trained using images is now discussed in lines 239 to 240.


In Tables 10 to 14 and 17, use percentage values, so that they can be compared in a more intuitive way and be equivalent to the data provided in the text.

Response: We find it more useful to show the number of signals in each of the categories in Tables 4-8, for both the actual and predicted, because the number of signals in each category is disproportionate, with dominantly high volume of non-fin-whale detections. Percentages are quantified in Tables 9-12, 14 and 17 for classifier accuracy, precision, recall and F1-score metrics where it is meaningful to do so.


It is not logical to repeat figures 6 and 8. Please use only one.

Response: Figures 6 and 8 are not repetitive. Specifically, Figure 8 displays modified CNN and modified LSTM beamformed spectrogram input images including zero padding, while Figure 6 does not include zero padding. The caption to Figure 8 specifically highlights the zero padding in the input images for Figure 8.
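The zero padding of the CNN and LSTM input images mentioned in this response can be sketched as follows. The function name, dimensions, and freq-by-time layout are illustrative assumptions, not the paper's actual preprocessing code; the sketch only shows the generic idea of padding variable-length spectrograms to a fixed network input size.

```python
import numpy as np

def pad_spectrogram(spec, target_shape):
    """Zero-pad a 2-D spectrogram (frequency x time) up to a fixed input size.

    Signals of different durations yield spectrograms with different numbers
    of time bins; padding with zeros gives every training example the same
    shape without rescaling or distorting the signal content.
    """
    out = np.zeros(target_shape, dtype=spec.dtype)
    n_freq, n_time = spec.shape
    out[:n_freq, :n_time] = spec   # original values in the top-left corner
    return out

# A short hypothetical detection: 64 frequency bins x 40 time bins,
# padded out to a fixed 64 x 100 network input.
short = np.ones((64, 40))
fixed = pad_spectrogram(short, (64, 100))
```

The padded region carries no signal energy, which is why it is visible as a blank band in input-image figures such as those highlighted in the caption to Figure 8.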

Reviewer 3 Report

The paper deals with an interesting problem in the field of underwater source recognition. A methodology is adopted ranging from the constitution of a large base of acoustic signals through the use of classification methods (5) to identify the different acoustic sources. An evaluation phase of the different methods was carried out to analyze the performance of the adopted methodology. In the context of automatic detection and recognition of sources in real time, the results obtained are encouraging. The structure and organization of the paper is good. However, it would be important to take into account the points listed below.
- It would be useful to improve the quality of the figures by starting the time axes from zero.
- The introduction should be reinforced by inserting other bibliographic references on the problem of passive acoustics (characterization of the underwater environment, and detection of underwater mines)
- It is known that the methods used are greedy in computing time; how can you explain real-time application to a very large volume of signals?
- It would be useful to specify the limits of the methodology adopted
- It would be important to reinforce the conclusion and to add the perspectives offered to the work carried out and presented in this paper.

Author Response

Response to Reviewer 3


The paper deals with an interesting problem in the field of underwater source recognition. A methodology is adopted ranging from the constitution of a large base of acoustic signals through the use of classification methods (5) to identify the different acoustic sources. An evaluation phase of the different methods was carried out to analyze the performance of the adopted methodology. In the context of automatic detection and recognition of sources in real time, the results obtained are encouraging. The structure and organization of the paper is good. However, it would be important to take into account the points listed below.

- It would be useful to improve the quality of the figures by starting the time axes from zero.

Response: Figures 2, 6 and 8 now show the zero time label.


- The introduction should be reinforced by inserting other bibliographic references on the problem of passive acoustics (characterization of the underwater environment, and detection of underwater mines)

Response: We have added five references (References 3 to 7) on passive acoustic characterization of underwater environments and cited them in line 26. Underwater mines are detected using active acoustics, which is not the approach used here.


- It is known that the methods used are greedy in computing time; how can you explain real-time application to a very large volume of signals?

Response: The run times for training and testing data sets are provided separately in lines 228 to 231, and the issue of real-time classification is now discussed in lines 231 to 234.


- It would be useful to specify the limits of the methodology adopted.

- It would be important to reinforce the conclusion and to add the perspectives offered to the work carried out and presented in this paper.

Response: The limits are now specified in lines 521-524 for fin whale vocalization classification in the conclusion, providing perspectives, and in lines 445-455 for broader category of general ocean sound source classification.


Round 2

Reviewer 3 Report

This version is better than the previous one. Therefore, the paper can be accepted.