Analysis of Collected Data and Establishment of an Abnormal Data Detection Algorithm Using Principal Component Analysis and K-Nearest Neighbors for Predictive Maintenance of Ship Propulsion Engine
Round 1
Reviewer 1 Report
I would have been very happy if the algorithm had been applied to a land vehicle engine.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The paper presents a case study of using ship propulsion data to predict future maintenance needs with the PCA and nearest-neighbors algorithms. The targeted application is well motivated in Section 1, and the description of the data used in the analysis is adequate for most of Section 2 (Materials and Methods). However, my biggest concern is the novelty of the methods used: PCA and k-NN are simply applied to a data set without much regard to the suitability and limitations of these methods for the case study presented. Indeed, if one is not aware of these schemes, Equations (1) and (2) have little, if anything, to offer in helping one understand them. Moreover, based on the symbols used, it is hard to connect the equations to the assumed data structures. The data pre-processing flow chart in Fig. 1 is informative but not directly relevant to the main goal of the paper.
Further down, in the Results and Discussion section, it appears that the authors use the first two principal components to make decisions about data anomalies. I say "it appears" because the PC1 and PC2 labels on the axes of the figures do not have explicit definitions in the text. Why was the given radius-distance chosen to separate normal data from anomalies? The text in the tables of Figures 5, 6 and 10 is too small to read.
There are also a few typos scattered in the text that can easily be picked up in a revision. I am also wondering where the information (labels) given in line 113 is used in the text.
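To illustrate the kind of explicit definition being requested, the following is a minimal sketch and not the authors' implementation. It assumes NumPy only, uses synthetic data in place of the ship sensor readings, and all names (`pca_two_components`, `knn_mean_distance`, the 99% quantile, k = 5) are hypothetical choices made for illustration. PC1 and PC2 are defined as projections onto the first two right singular vectors of the mean-centered data, and the anomaly radius is derived from the k-NN distance distribution of the normal data rather than fixed by hand.

```python
import numpy as np

def pca_two_components(X):
    """Return PC1/PC2 scores, their variance ratios, the two axes, and the mean."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = S**2 / np.sum(S**2)                    # explained variance ratio
    axes = Vt[:2]                                  # rows are the PC1 and PC2 directions
    return Xc @ axes.T, ratio[:2], axes, mean

def knn_mean_distance(train, query, k, exclude_self=False):
    """Mean Euclidean distance from each query point to its k nearest training points."""
    d = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    d.sort(axis=1)
    start = 1 if exclude_self else 0               # skip the zero self-distance
    return d[:, start:start + k].mean(axis=1)

# Synthetic stand-in for normal-operation data: two latent factors, six sensors.
rng = np.random.default_rng(0)
W = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)
normal = rng.normal(size=(500, 2)) @ W + 0.1 * rng.normal(size=(500, 6))

scores, ratio, axes, mean = pca_two_components(normal)

# Assumed rule: the radius is the 99th percentile of k-NN distances among normal data.
k = 5
train_dist = knn_mean_distance(scores, scores, k, exclude_self=True)
threshold = np.quantile(train_dist, 0.99)

# Score new rows; the last one is placed far from the normal operating region.
outlier = np.array([[8.0, -8.0]]) @ W
new_rows = np.vstack([normal[:3], outlier])
new_scores = (new_rows - mean) @ axes.T
is_anomaly = knn_mean_distance(scores, new_scores, k) > threshold
```

Reporting the variance ratio of PC1 and PC2 alongside the rule used to set the threshold would make both the axis labels and the chosen radius reproducible.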
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
I have minor observations regarding the clarification of some aspects of the study:
Figure 2: why does “check and delete dataless (null) rows and columns” appear again? Wasn't that operation already done in the preprocessing step (Figure 1)?
Figure 2: what is the significance of the two No branches from the first condition regarding the PCA analysis?
Figure 2: what is the significance of the first condition (test), "Analysis of PCA variance ratio"? What do Yes and No mean here? Is it a condition for completing PCA? If so, please reformulate the first condition.
Figure 2: apart from the detection of the minimum distance, what are the other conditions that would lead to the completion of the KNN from the second condition? Why did you put “Distance between index, etc.”?
Did you use NumPy to analyze the data and generate the figures? If so, mention this in Section 2.3. In the same section, specify which ML libraries you used in Python.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
I am satisfied with the changes and feedback provided to my comments.