Unsupervised Security Threats Identification for Heterogeneous Events
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article addresses an important and challenging problem in industrial control system (ICS) security: analyzing large amounts of data and providing timely responses. This is difficult mainly because of the heterogeneous nature of the environment, in which devices from a wide variety of vendors generate different types of alerts, making it hard to build a reliable detection system. The proposed approach applies a customized pre-processing technique tailored to the different data types that make up the alerts, followed by classification of unlabeled alerts using an autoencoder model, to effectively distinguish between different attack types and thereby help administrators respond to attacks. The dataset used to test the effectiveness of the proposed approach was created using a hardware-in-the-loop (HIL) based augmented ICS testbed to emulate real-world situations. Overall, the article is well presented and the research conducted is promising. The authors have also clearly identified the existing limitations of the proposed work, mainly pertaining to the dataset and the static threshold used for anomaly detection. One important suggestion to further improve the article would be to include the shortcomings of similar approaches in the related work section. As an example, a somewhat similar approach (https://doi.org/10.3390/asi7020018) was recently published, and it would be beneficial to clearly state the differences between this method (and other similar methods, if any) and the proposed approach.
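As a point of reference for the static-threshold limitation mentioned above: a common way to fix such a threshold is to compute it once from reconstruction errors on normal data, for example as the mean plus k standard deviations. The Python sketch below assumes that scheme; the function names, k = 3, and the error values are hypothetical and are not taken from the manuscript.

```python
import statistics

def static_threshold(normal_errors, k=3.0):
    """Compute a fixed threshold once: mean + k * sample standard deviation
    of reconstruction errors observed on normal (attack-free) data."""
    mu = statistics.mean(normal_errors)
    sigma = statistics.stdev(normal_errors)
    return mu + k * sigma

def flag_anomalies(errors, threshold):
    """Mark each error that exceeds the fixed threshold as anomalous."""
    return [e > threshold for e in errors]

# Hypothetical reconstruction errors from a model trained on normal alerts.
train_errors = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.12, 0.11]
thr = static_threshold(train_errors)
flags = flag_anomalies([0.11, 0.14, 0.45], thr)
print(f"threshold = {thr:.4f}, flags = {flags}")
```

Because the threshold is computed once and never updated, it cannot adapt if normal behavior drifts, which is why a static threshold is usually treated as a limitation.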
Comments on the Quality of English Language
The article can benefit from another round of proofreading.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
- The authors propose a framework that preprocesses data from various security devices, applies an autoencoder model to classify alerts, and performs integrated relevance analysis (IRA) to help security administrators respond to attacks more effectively. The work addresses the challenge of handling and analyzing alerts from various heterogeneous security devices in industrial environments, as well as the issue of false positives and the difficulty of building a reliable detection system.
The use of unsupervised learning allows for the detection of new and unknown threats without the need for labeled data. The framework integrates various data preprocessing steps, anomaly detection, and relevance analysis, providing a comprehensive solution for heterogeneous environments.
- An analysis of deployment complexity in real-world scenarios and of the complexity of the framework itself is lacking. The feasibility of deploying the framework has not been explained. Figures 5 and 6 are too densely arranged, which makes them difficult for readers to follow. It is recommended to adjust the layout for better readability. The discussion section lacks depth and does not serve its intended purpose. In the experimental section, the differences and performance comparisons between the proposed framework and other algorithms are not clearly presented.
- In Section 1, the authors summarize the shortcomings of existing methods and the innovative points of this work. However, there is no detailed description of the proposed framework as shown in Figure 1.
- Some sentences in the paper have problems, including colloquial expressions, grammatical errors, and tense errors. Therefore, it is suggested that the whole paper be polished with the help of a native English speaker.
- As mentioned earlier, the analysis in the discussion section is not deep enough and could be expanded, using data to support the conclusions. In addition, adding comparisons and reflection would enrich the discussion.
- The use of technical terms is mostly accurate, but in some areas further clarification and definition may be needed to ensure that all readers, especially non-experts, can understand the content.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
electronics-3318474
Unsupervised Security Threats Identification for Heterogeneous Events
The authors have proposed an unsupervised threat identification method for distinguishing between normal and abnormal alerts and classifying the types of attacks. The proposed method involves data generation, data preprocessing, anomaly detection, and integrated relevance analysis (using correlations to identify attacks and classify them). The proposed technique appears sufficiently robust for detecting false alerts. Overall, the paper is well written and it can be published without further changes.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The manuscript presents an unsupervised machine learning approach, leveraging autoencoders, to detect security threats from heterogeneous events, specifically in industrial control systems and operational technology environments. The authors attempt to address a significant challenge in cybersecurity: the complexity of dealing with unlabeled, heterogeneous alert data, which often produces false alarms in security operations centers. This study proposes a framework that includes preprocessing of heterogeneous data, anomaly detection, and integrated relevance analysis to assist administrators in filtering false alerts and identifying real attacks. Here are a few comments:
1. The first three paragraphs of the introduction seem to describe the background of the topic. Could they be combined into one, and would that flow better?
2. The images in Figures 5 and 6 are hard to read.
3. While the paper introduces a novel approach, it lacks comparison with state-of-the-art anomaly detection techniques in cybersecurity.
4. The Integrated Relevance Analysis (IRA) is an interesting addition, but the results of the chi-squared test, Cramér’s V, and Pearson correlation analysis seem underexplored. How does the IRA directly assist in identifying specific attack types?
5. The results, particularly Table 9, indicate high detection accuracy but do not give enough insight into the false positive and false negative rates across different scenarios. A general discussion of these rates would help me understand the results.
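To illustrate the quantities named in comment 4: Cramér's V is derived from the Pearson chi-squared statistic of a contingency table and rescales it to the range [0, 1], so it indicates how strongly two categorical alert fields (for example, alert type and attack scenario) are associated. The Python sketch below is a minimal, dependency-free version; the function names and table values are hypothetical, every row and column total is assumed to be nonzero, and it does not reproduce the manuscript's IRA procedure.

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic for an r x c table of counts
    (no continuity correction). Returns (statistic, total count)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two fields.
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))), between 0 and 1."""
    stat, n = chi_squared(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (k - 1)))

# Hypothetical counts: rows = alert type, columns = attack scenario.
alerts_vs_scenario = [[30, 2, 1], [3, 25, 4], [2, 3, 20]]
print(round(cramers_v(alerts_vs_scenario), 3))
```

For example, `cramers_v([[10, 0], [0, 10]])` returns 1.0 (perfect association), while `cramers_v([[5, 5], [5, 5]])` returns 0.0 (independence). A chi-squared test says whether an association is statistically significant; Cramér's V says how strong it is.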
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 5 Report
Comments and Suggestions for Authors
This paper focuses on anomaly detection using unsupervised learning; more specifically, the authors deploy autoencoders to detect threats in industrial control systems. The introduction gives good motivation and the literature review is extensive. The efficacy of the work is demonstrated on the datasets.
Below are my comments:
· The train and test split is clear. But how is overfitting avoided?
· Only one method is applied, so more comparative methods should be added, for example PCA with a 3-sigma confidence interval as a baseline. Since there is temporal dependency, could an LSTM autoencoder also be tried?
· In the IRA section, multiple hypothesis testing is being conducted. Was the level of significance adjusted for the multiple tests?
· The tables in the paper are placed far from the paragraphs in which they are discussed.
· Please proofread the paper.
· The literature review on autoencoders could be enriched by adding https://ieeexplore.ieee.org/abstract/document/10020482
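The multiple-testing question above can be made concrete. One standard adjustment is the Holm-Bonferroni step-down procedure, which controls the family-wise error rate and is uniformly more powerful than the plain Bonferroni correction. The Python sketch below is one possible choice, not necessarily what the authors used; the function name and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure. Returns one reject flag per hypothesis, in the
    original order, controlling the family-wise error rate at level alpha."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection; larger p-values are retained
    return reject

# Hypothetical p-values, e.g. from several chi-squared tests in an IRA-like step.
print(holm_bonferroni([0.01, 0.02, 0.04, 0.005]))
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

With p-values [0.01, 0.02, 0.04, 0.005] and alpha = 0.05, Holm rejects all four hypotheses, whereas plain Bonferroni (threshold 0.05 / 4 = 0.0125) would reject only 0.01 and 0.005.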
Comments on the Quality of English Language
Check above.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 5 Report
Comments and Suggestions for Authors
Thank you for addressing the comments.
Minor comments:
Table 7 lists LSTM and stacked LSTM with exactly the same values. One of them should be removed.
Table 9 lists the F1 score twice instead of accuracy.
Overfitting is still a concern, even if the model is a remapping model trained under the normal setting.
Please proofread the paper.
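The Table 9 comment, and Reviewer 4's earlier question about false positive and false negative rates, both concern metrics that follow directly from confusion-matrix counts. The Python sketch below assumes a binary normal/attack setting with nonzero denominators; the class and function names and the counts are hypothetical and do not come from the manuscript.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # attacks correctly flagged
    fp: int  # normal alerts wrongly flagged as attacks
    tn: int  # normal alerts correctly left unflagged
    fn: int  # attacks that were missed

def detection_metrics(c: Confusion) -> dict:
    """Standard binary detection metrics; assumes no denominator is zero."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # true positive rate
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": c.fp / (c.fp + c.tn),  # false positive rate
        "fnr": c.fn / (c.fn + c.tp),  # false negative rate = 1 - recall
    }

# Hypothetical counts for 1,000 alerts.
m = detection_metrics(Confusion(tp=90, fp=10, tn=880, fn=20))
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.97 while F1 is about 0.857, which is why the two belong in separate columns; on imbalanced data, accuracy alone can look high even when a noticeable share of attacks is missed (here the false negative rate is about 0.18).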
Author Response
Please see the attachment.
Author Response File: Author Response.pdf