Article
Peer-Review Record

Unsupervised Security Threats Identification for Heterogeneous Events

Electronics 2024, 13(20), 4061; https://doi.org/10.3390/electronics13204061
by Young In Jang 1, Seungoh Choi 2, Byung-Gil Min 2 and Young-June Choi 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Reviewer 5: Anonymous
Submission received: 15 August 2024 / Revised: 11 October 2024 / Accepted: 14 October 2024 / Published: 15 October 2024
(This article belongs to the Special Issue Network Security and Cryptography Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The article addresses an important and challenging problem in industrial control system security: analyzing large amounts of data and providing timely responses. The difficulty stems mainly from the heterogeneous nature of the environment, in which devices from a wide variety of vendors generate different types of alerts, making it hard to build a reliable detection system. The proposed approach applies a customized pre-processing technique tailored to the different data types that make up the alerts, followed by classification of unlabeled alerts using an autoencoder model, to distinguish effectively between different attack types and thereby help administrators respond to attacks. The dataset used to test the effectiveness of the proposed approach was created using a HIL-based augmented ICS testbed to emulate real-world situations. Overall, the article is well presented and the research is promising. The authors have also clearly identified the limitations of the proposed work, mainly pertaining to the dataset and the static threshold for anomaly detection. One important suggestion to further improve the article would be to include the shortcomings of similar approaches in the related work section. For example, a somewhat similar approach (https://doi.org/10.3390/asi7020018) was recently published, and it would be beneficial to state clearly the difference between it (and other similar methods, if any) and the proposed approach.
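The static-threshold limitation mentioned above can be illustrated with a minimal sketch. It assumes each alert has already been given a reconstruction error by some autoencoder; the error values, the function names, and the mean-plus-three-standard-deviations rule are illustrative assumptions, not the authors' implementation. The point is only that a cutoff derived once from errors on normal data stays fixed afterwards, so it does not adapt if normal behaviour drifts.

```python
import statistics


def three_sigma_threshold(normal_errors):
    """Derive a fixed cutoff as mean + 3 * (population) std of errors on normal data."""
    mu = statistics.fmean(normal_errors)
    sigma = statistics.pstdev(normal_errors)
    return mu + 3 * sigma


def flag_anomalies(errors, threshold):
    """Flag every alert whose reconstruction error exceeds the fixed cutoff."""
    return [e > threshold for e in errors]


# Hypothetical reconstruction errors, invented for illustration only.
normal_errors = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.08, 0.12]
new_errors = [0.11, 0.45, 0.09, 0.20]

cutoff = three_sigma_threshold(normal_errors)  # computed once, then static
flags = flag_anomalies(new_errors, cutoff)
print(round(cutoff, 4), flags)
```

With these made-up numbers the second and fourth alerts are flagged. A rolling or periodically re-estimated cutoff would be one way to relax the static assumption, at the cost of deciding how often to re-estimate it.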

Comments on the Quality of English Language

The article can benefit from another round of proof-reading.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

  1. The authors propose a framework that preprocesses data from various security devices, applies an autoencoder model to classify alerts, and performs integrated relevance analysis (IRA) to help security administrators respond to attacks more effectively. The work addresses the challenge of handling and analyzing alerts from heterogeneous security devices in industrial environments, as well as the issue of false positives and the difficulty of building a reliable detection system.

The use of unsupervised learning allows new and unknown threats to be detected without the need for labeled data. The framework integrates data preprocessing, anomaly detection, and relevance analysis, providing a comprehensive solution for heterogeneous environments.

  1. An analysis of the framework's complexity, and of the complexity of deploying it in real-world scenarios, is lacking. The feasibility of deploying the framework is not explained. Figures 5 and 6 are too densely arranged, which makes them difficult for readers to follow; adjusting the layout is recommended for better readability. The discussion section lacks depth and does not serve its intended purpose. In the experiment section, the differences and performance comparisons between the proposed framework and other algorithms are not clearly presented.
  2. In Section 1, the authors summarize the shortcomings of existing methods and the innovative points of this work. However, there is no detailed description of the proposed framework shown in Figure 1.
  3. Some sentences in the paper have problems, including colloquial expressions, grammatical errors, and tense errors. It is therefore suggested that the whole paper be polished by a native English speaker.

Comments on the Quality of English Language

  • As mentioned earlier, the analysis in the discussion section is not deep enough and could be expanded, using data to support the conclusions. Adding comparison and reflection would also enrich the discussion.
  • The use of technical terms is mostly accurate, but some areas need further clarification and definition so that all readers, especially non-experts, can understand the content.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

electronics-3318474

Unsupervised Security Threats Identification for Heterogeneous Events

The authors have proposed an unsupervised threat identification method for distinguishing between normal and abnormal alerts and classifying the types of attacks. The proposed method involves data generation, data preprocessing, anomaly detection, and integrated relevance analysis (using correlations to identify attacks and classify them). The proposed technique appears sufficiently robust for detecting false alerts. Overall, the paper is well written and can be published without further changes.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript presents an unsupervised machine learning approach, leveraging autoencoders, to detect security threats from heterogeneous events, specifically in industrial control systems and operational technology environments. The authors attempt to address a significant challenge in cybersecurity: the complexity of dealing with unlabeled, heterogeneous alert data, which often produces false alarms in security operations centers. The study proposes a framework that includes preprocessing of heterogeneous data, anomaly detection, and integrated relevance analysis to assist administrators in filtering false alerts and identifying real attacks. Here are a few comments:

1. The first three paragraphs of the introduction all seem to describe the background of the topic. Could they be combined into one, and would that flow better?

2. The images in Figures 5 and 6 are hard to read.

3. While the paper introduces a novel approach, it lacks a comparison with state-of-the-art anomaly detection techniques in cybersecurity.

4. The Integrated Relevance Analysis (IRA) is an interesting addition, but the results of the chi-squared test, Cramér’s V, and Pearson correlation analysis seem underexplored. How does the IRA directly assist in identifying specific attack types?

5. The results, particularly Table 9, indicate high detection accuracy but do not give enough insight into the false positive and false negative rates across different scenarios. A general discussion would help me understand them.
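For readers less familiar with the statistics named in comment 4, the following is a minimal sketch of how the chi-squared statistic and Cramér's V are computed from a contingency table. The table (a hypothetical alert source crossed with a hypothetical attack type), the counts, and the function names are assumptions made for illustration; they are not taken from the manuscript and do not reproduce the authors' IRA.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    Assumes every row total and column total is non-zero.
    Returns the statistic and the total number of observations.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramér's V: chi-squared rescaled to [0, 1] as a measure of association strength."""
    stat, n = chi_squared(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(stat / (n * k))


# Hypothetical counts: rows = alert source, columns = attack type.
table = [[30, 10],
         [10, 30]]
v = cramers_v(table)
print(round(v, 3))
```

A value near 0 suggests that the two categorical variables are close to independent, while a value near 1 suggests that one largely determines the other. This reading is what would let such a statistic link an alert field to an attack type, which is the connection the comment asks the authors to spell out.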

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

This paper focuses on anomaly detection using unsupervised learning; more specifically, the authors deploy autoencoders to detect threats in industrial control systems. The introduction gives good motivation and the literature review is extensive. The efficacy of the work is demonstrated on the datasets.

Below are my comments:

  • The train and test split is clear, but how is overfitting avoided?

  • Only one method is applied. More comparative methods need to be added, for example PCA with a 3-sigma confidence interval as a baseline. Since there is temporal dependency, could an LSTM autoencoder also be tried?

  • In the IRA section, multiple hypothesis tests are conducted. Was the significance level adjusted for multiple testing?

  • The tables in the paper are far from the paragraphs in which they are discussed.

  • Please proofread the paper.

  • The literature review on autoencoders could be made richer by adding https://ieeexplore.ieee.org/abstract/document/10020482
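The multiple-testing question above can be made concrete with a short sketch of two standard adjustments, Bonferroni and Holm-Bonferroni. The p-values, the significance level of 0.05, and the function names are assumptions chosen for illustration; nothing here is taken from the manuscript, and it is not known which correction, if any, the authors applied.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Sort p-values in ascending order and compare the one at 0-based rank r
    with alpha / (m - r); stop at the first comparison that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values from four tests (illustration only).
p_values = [0.001, 0.04, 0.03, 0.015]
print(bonferroni(p_values))
print(holm_bonferroni(p_values))
```

With these made-up values, three of the four tests would be declared significant at an unadjusted 0.05 level, Bonferroni keeps only the first, and Holm keeps the first and the fourth. Both procedures control the family-wise error rate; Holm is never less powerful than Bonferroni.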

Comments on the Quality of English Language

See the comments above.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 5 Report

Comments and Suggestions for Authors

Thank you for addressing the comments.

Minor comments:

Table 7 lists LSTM and stacked LSTM with exactly the same values. One should be removed.
Table 9 lists the F1 score twice instead of accuracy.

Overfitting is still a concern even if it is a remapping model under the normal setting.
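The Table 9 comment above turns on the fact that accuracy and the F1 score are different quantities. The sketch below computes both from confusion-matrix counts, together with false positive and false negative rates. The counts and the function name are hypothetical and do not come from the manuscript.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common detection metrics from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),  # false positive rate
        "fnr": ratio(fn, fn + tp),  # false negative rate
    }


# Hypothetical counts for an imbalanced alert set (illustration only).
metrics = binary_metrics(tp=80, fp=10, tn=900, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these invented counts accuracy is 0.98 while F1 is about 0.89, because accuracy is dominated by the large number of true negatives. On imbalanced alert data the two columns are therefore not interchangeable.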

Comments on the Quality of English Language

Please proofread the paper

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
