Article

Generating Payloads of Power Monitoring Systems Compliant with Power Network Protocols Using Generative Adversarial Networks

1 State Grid Jibei Electric Power Company Limited, Tangshan 063000, China
2 Beijing Kedong Electric Power Control System Co., Ltd., Beijing 100192, China
3 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
Energies 2024, 17(20), 5068; https://doi.org/10.3390/en17205068
Submission received: 15 September 2024 / Revised: 9 October 2024 / Accepted: 10 October 2024 / Published: 11 October 2024

Abstract

In the network environment of power systems, payload generation is used to construct the data packets that obtain data for the security management of network assets. Payloads generated by existing methods cannot satisfy the specifications of the protocols used in power systems, resulting in low efficiency and information errors. In this paper, a payload generation model, LoadGAN, is proposed using generative adversarial networks (GANs). Firstly, we find segmentation points to cut payloads into different segment sequences using a sliding window scheme based on Bayesian optimization. Then, we use the different payload segments to train several child generators, each generating the corresponding part of a whole payload. The segment sequences produced by these generators are assembled into a whole new payload that is compliant with the specifications of the original network protocol. Experiments on the Mozi botnet dataset show that LoadGAN achieves precise payload segmentation while maintaining a high payload effectiveness of 85.5%, a 40% improvement over existing methods.

1. Introduction

1.1. Literature Review

With the advancement of the energy network strategy, power systems are gradually moving away from their traditionally closed architectures, leading to the proposal of the interactive power management system [1]. Many researchers have proposed excellent maintenance policies for power systems (such as Ref. [2]), which greatly improve systematic operating efficiency while also placing higher demands on cybersecurity. In power systems, network asset detection can be used to depict complex structures of network topology, assisting in constructing open and interactive power systems. At the same time, network asset detection can help with security management. Within the network asset detection process, payload construction is of great importance. The constructed payloads are sent to target nodes to make requests. Then, relevant information is collected via analysis of the responses. Therefore, payload generation is a key step in network security management, and the efficiency and accuracy of payload generation directly affect the security management of power systems [3].
The development of machine learning technology has changed the paradigm of network security [4]. However, existing methods for payload generation mostly rely on manual analysis, which requires researchers to have prior knowledge of the target network [5], including its rules and protocols. At the same time, network protocols in power systems are diverse, and many remain unknown and undocumented; therefore, it is difficult to manually identify which protocol a given traffic flow belongs to.
In order to automatically generate payloads for network asset detection in power systems, one feasible method is to randomly mutate a certain number of bytes in the sample payloads [6]. However, this method may generate a large number of invalid payloads that cannot pass the basic format verification of the original protocol. This may result in low efficiency during the process of network asset detection. Another typical method involves generating payloads using neural networks. Generative adversarial networks (GANs) have various applications in data generation and are broadly applied in industrial control networks [7]. Compared with methods based on random mutations, GANs can learn the semantic information of network payloads automatically. Payloads generated by GANs are more compliant with the format specifications of the original protocol. However, a typical network payload often contains a feature sequence marking the version and identity of its control program. As a result, GANs may learn and overemphasize these features during the training process. This ultimately causes redundant features in the output of GANs, leading to severe decreases in the data quality of generated payloads.
This paper aims to eliminate the effect of feature sequences during the training process of GANs and to generate payloads compliant with network protocols in power systems. In order to accurately identify potential cutting points in payloads, we reference the application of neural networks in industrial control networks. Osipov et al. designed a windowed Fourier transform (WFT)–2D-CapsNet to identify the transition between two rock layers with different properties, reaching an accuracy of 99% [8]. Inspired by this method, we explored an innovative method that cuts sample payloads into segment sequences and trains separate models to generate the different parts of a whole payload. The generated segment sequences are finally concatenated to form a whole payload as output. Guided by this idea, a novel payload generation model, LoadGAN, is proposed in this paper. Moreover, we designed LoadCut, a payload segmentation algorithm, to achieve better payload segmentation while mitigating the impact of feature sequences in sample payloads. The key innovations of this study are as follows:
(1) We designed a model, LoadGAN, to generate network payloads based on segmented seqGANs. The model contains a series of seqGAN generators: it first cuts the sample payloads into segment sequences and then trains each seqGAN generator on its corresponding segment sequence set. By concatenating the outputs of each generator, LoadGAN can generate realistic, structured network payloads while mitigating the effect of feature sequences in sample payloads.
(2) A payload segmentation algorithm, LoadCut, is proposed to segment the payloads on the network’s application layer. We reference the sliding window method in WFT [9] and designed a payload segmentation method based on the information entropy of different payload segments within a sliding window. We implemented an adaptive adjustment mechanism for the parameters used to decide segmentation points in the payload sequences.
(3) Experiments were conducted on single-state payloads in the Mozi botnet and results show that payloads generated by LoadGAN are more compliant with protocols in the Mozi botnet. In our experiments, LoadGAN achieved an 85.5% payload effectiveness, which is an increase of about 40% compared to existing methods based on GANs and about 50% compared to traditional methods based on random mutations.

1.2. Related Work

1.2.1. Payload Generation

Currently, in power systems employing payloads for network detection, the methods for generating payloads can be broadly categorized into two types: methods based on semantic payload information [10] and methods based on mutations of existing payloads [11]. In the field of semantic generation, Wei et al. presented SemFuzz, which uses a novel technique leveraging vulnerability-related text (e.g., CVE reports and Linux git logs) to guide the automatic generation of test cases and PoC exploits [12]. This method produces valid test cases that conform to the target program’s input format, but it requires manual setup for the parameters of data structure modeling. In contrast, Zhao et al. proposed a device-free driver fuzzing system, DR. FUZZ, which can generate semi-malformed inputs based on semantic information [13], achieving a higher code coverage rate for the tested program. With mutation-based methods, Zhang et al. utilized the time information of seeds being added to the seed queue to set the seed selection probability [14]. The advantage of this method is that newly added seeds have a higher probability of being selected, but this advantage gradually decreases over time. Yet, this probabilistic method often yields a majority of invalid cases. Demir et al. proposed a seed selection strategy based on a clustering algorithm, where the test data in the corpus were clustered, and a cluster was randomly selected. Then, one or more test data in the selected cluster were randomly chosen for mutation [15]. AFL (American Fuzzy Lop) introduced a feedback mechanism for state coverage based on the dumb fuzzer [16]. This mechanism depends on instrumentations in the source code of the target program. AFL guides the mutations of test cases using genetic algorithms based on the edge coverage rate, considering internal transitions of the program’s logic. However, this approach still requires instrumentations into the target program and corresponding network samples. 
Moreover, some tools used for network testing can automatically select the method of payload generation based on real-time tasks (for example, ZMap). Huang et al. employed a combination of ZMap and ZipperZMap to construct a cyberspace surveying system and applied it to security management in power systems [17]. However, ZMap cannot ensure detection accuracy due to its lack of optimization for the special network protocols in power systems, which may result in inefficiency and misreports.

1.2.2. Generative Adversarial Networks

GANs are widely used in data generation. Existing types of generative adversarial networks include DCGAN, cGAN, and StackGAN (used to generate two-dimensional data, such as images), as well as seqGAN and madGAN (used to generate one-dimensional sequence data).
For example, the specific workflow of seqGAN [18] is as follows:
(1) Pre-train the generator model with real data, enabling it to generate simple samples similar to the real data.
(2) Pre-train the discriminator model with real data, enabling it to distinguish between real data and generated data.
(3) Use a generator to generate a batch of data (similar to real data).
(4) Recognize the input data via the discriminator to determine whether the input sample was generated by the generator.
(5) The discriminator provides a reward value for each token that makes up the complete sequence in the input sample, and the generator model adjusts the latent variables based on the policy gradient algorithm to affect the next turn of the training process.
(6) Repeat steps 3–5, and progressively refine the output of the generator.
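The adversarial loop in steps 3–5 can be sketched with a deliberately tiny stand-in model. This is an illustrative assumption, not the seqGAN implementation: the "generator" is a per-position categorical distribution stored as token weights, the "discriminator" stand-in rewards a token by its empirical frequency in the real data, and the policy-gradient-style update simply raises the weight of rewarded tokens.

```python
import random

# Toy sketch of steps 3-5, NOT the seqGAN implementation: table-based
# "generator", frequency-based "discriminator", reinforcement-style update.

TOKENS = ["a", "b", "c"]
REAL = [["a", "b"], ["a", "c"], ["a", "b"]]  # illustrative "real" sequences
SEQ_LEN = 2

# generator parameters: unnormalized token weights per position
weights = [{t: 1.0 for t in TOKENS} for _ in range(SEQ_LEN)]

def sample_sequence():
    # step 3: sample a sequence token by token from the generator
    seq = []
    for pos in range(SEQ_LEN):
        total = sum(weights[pos].values())
        r, acc = random.uniform(0, total), 0.0
        for tok, w in weights[pos].items():
            acc += w
            if r <= acc:
                seq.append(tok)
                break
    return seq

def reward(pos, tok):
    # steps 4-5: discriminator stand-in scores each token of the sample
    return sum(1 for s in REAL if s[pos] == tok) / len(REAL)

random.seed(0)
for _ in range(500):                               # step 6: repeat 3-5
    seq = sample_sequence()
    for pos, tok in enumerate(seq):
        weights[pos][tok] += reward(pos, tok)      # reinforce rewarded tokens

best = [max(weights[pos], key=weights[pos].get) for pos in range(SEQ_LEN)]
print(best)  # the generator drifts toward the dominant real tokens
```

In the real seqGAN, the tables become an LSTM generator and a CNN discriminator, and the update becomes a proper policy gradient, but the reward-per-token feedback loop has the same shape.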

1.2.3. Bayesian Optimization

Bayesian optimization [19,20] is a process used to find better hyperparameters for machine learning or deep learning models. Parameters that describe a model's structure or training configuration, rather than being learned from the data, are treated as hyperparameters. For example, in machine learning, the threshold parameter for hierarchical clustering is a hyperparameter of the clustering model [21]. In deep learning, the number of layers and the number of latent variables in each layer of an autoencoder are likewise hyperparameters of the autoencoder model.
Finding suitable hyperparameters is crucial for model training. If the parameters are adjusted manually, the efficiency and speed of the entire search process are extremely low; therefore, an automatic method for finding the best hyperparameters is needed [22]. Existing methods for automatic hyperparameter searching mainly include grid search [23], random search [24], and Bayesian optimization.
The Bayesian optimization process uses Gaussian process regression [25] to calculate, from several sample points, the posterior probability distribution [26] of the objective at observed points, obtaining an expected mean and variance at each candidate hyperparameter point. The larger the expected mean, the better the expected performance of the model [27], while the larger the variance, the higher the uncertainty of the result. Bayesian optimization can quickly and automatically find hyperparameter intervals that achieve better performance. The hyperparameter search of Bayesian optimization is shown in Figure 1.
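The process can be illustrated with a minimal 1-D sketch, which is not the implementation used in this paper: a Gaussian-process surrogate with an RBF kernel supplies the posterior mean and variance at each candidate point, and an upper-confidence-bound (UCB) acquisition picks the next point to evaluate. The objective function, kernel length scale, and iteration budget are all illustrative assumptions.

```python
import math

# Minimal 1-D Bayesian optimization sketch (illustrative assumptions):
# GP surrogate with RBF kernel + UCB acquisition.

def objective(x):                  # black-box function to maximize
    return -(x - 0.6) ** 2         # true optimum at x = 0.6

def rbf(a, b, ls=0.2):             # RBF (squared-exponential) kernel
    return math.exp(-((a - b) ** 2) / (2 * ls ** 2))

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting for the GP systems
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_posterior(xs, ys, x, noise=1e-6):
    # standard GP regression: mean = k* K^-1 y, var = k** - k* K^-1 k*
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    alpha = solve(K, ys)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve(K, k)
    var = max(rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var

xs = [0.0, 1.0]                    # initial observation points
ys = [objective(x) for x in xs]
for _ in range(8):                 # BO loop: fit surrogate, pick by UCB
    cands = [i / 100 for i in range(101)]
    scores = [m + math.sqrt(v)     # UCB: posterior mean + 1.0 * std
              for m, v in (gp_posterior(xs, ys, x) for x in cands)]
    nxt = cands[scores.index(max(scores))]
    xs.append(nxt)
    ys.append(objective(nxt))

best = xs[ys.index(max(ys))]
print(best)  # best observed point, near the true optimum 0.6
```

High-variance (unexplored) regions win the acquisition early on, and high-mean regions win once the surrogate is confident, which is exactly the exploration–exploitation trade-off described above.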
In this paper, a network payload generation model, LoadGAN, is proposed, which can generate network payloads that are compliant with corresponding protocol rules. In this model, we designed a novel payload segmentation algorithm, LoadCut, based on information entropy and Bayesian optimization, and applied it to cut single-state payload samples. The output of LoadCut was used to train child generators in LoadGAN, eliminating the influence of feature sequences in payload samples.
The rest of this paper is organized as follows. Section 2 describes our model, LoadGAN, and the payload generation framework using GANs. Section 3 describes our payload segmentation algorithm for network payloads, LoadCut, in detail. In Section 4, experiments are presented and the results are analyzed. Finally, the conclusion and future work are presented in Section 5.

2. Payload Generation Framework Using Generative Adversarial Networks

In this section, a novel framework for network payload generation, LoadGAN, is proposed, as depicted in Figure 2. This framework comprises a unidirectional classifier for classifying network payloads, a payload segmentation module, a subsequence training module, and a payload concatenation module. The main idea of this framework is to segment network payloads into segment sequences based on entropy thresholds, which serve as the training sets for the child SubseqGAN generators. Using this method, we can effectively mitigate the problem of redundant features in the output of GANs and make the generated payloads compliant with the rules of specific protocols, without exposing the framework to the real-time network environment.
In Figure 2, ① represents the input process of the original payload. After ①, the original payload is cut into pieces, denoted as ②. ③ represents the process of generating the different payload segments, and the segments are combined to form a complete payload in ④. Generated payloads are input into the invalid payload filter in process ⑤, which outputs the reward value used to adjust the parameters of the payload cutting algorithm in process ⑥.

2.1. Framework and Procedure

The architecture of our payload generation framework is illustrated in Figure 2. The framework utilizes payloads of a specific state as input data, which are captured from a real-time network environment. Its two main procedures are training and data generation. After segmenting payloads, concatenating sequences, and filtering invalid payloads, the framework outputs complete network payloads.
The payload generation procedure in this framework is as follows:
(1) Use payloads in a single state as a training set to train an invalid payload filter. This filter is designed to identify and exclude invalid payloads that are not compliant with the format specifications corresponding to the current network state.
(2) Use all payloads in a single state as input of the payload segmentation algorithm. Segment these payloads into n sets of training subsequences, ranging from subseq_train_0 to subseq_train_n.
(3) Use the training subsequences from (2) as inputs of the corresponding subseqGANs (subseqGAN_0 to subseqGAN_n), and train each subseqGAN separately.
(4) Generate different parts of a complete payload using SubseqGAN generators. Input these generated subsequences into the sequence concatenation module.
(5) Concatenate all generated subsequences to form complete network payloads, then input them into the invalid payload filter.
(6) The invalid payload filter assesses whether a payload is valid according to the correct format of the corresponding state. It outputs the proportion of valid payloads in the total generated payloads as the reward value to adjust the parameters of the LoadCut algorithm. Meanwhile, valid payloads, after filtering, are output as the final generated set of payloads. This procedure ensures that the generated payloads closely match the correct format in real network communication scenarios.
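Under toy stand-ins for each module, the data flow of steps (1)–(6) can be sketched as follows. This is an illustrative assumption, not the paper's implementation: a rule-based check replaces the trained invalid-payload filter, a fixed split replaces the segmentation algorithm, and sampling from the training segments replaces the trained SubseqGAN generators; all names and data are made up.

```python
import random

# Toy end-to-end sketch of steps (1)-(6); only the data flow mirrors
# the framework, every component is an illustrative stand-in.

TRAIN = [b"CMD:10.0.0.1", b"CMD:10.0.0.2", b"CMD:192.168.1.9"]

def is_valid(p):                        # (1) stand-in payload filter
    return p.startswith(b"CMD:")

def segment(payload, cut=4):            # (2) stand-in segmentation
    return payload[:cut], payload[cut:]

def make_generator(samples):            # (3) stand-in child generator
    return lambda: random.choice(samples)

random.seed(1)
subseq_train = list(zip(*(segment(p) for p in TRAIN)))
generators = [make_generator(s) for s in subseq_train]
# (4)-(5) generate each part and concatenate into complete payloads
generated = [b"".join(g() for g in generators) for _ in range(20)]
# (6) filter invalid payloads and compute the reward value
valid = [p for p in generated if is_valid(p)]
reward = len(valid) / len(generated)
print(reward)  # every toy payload keeps the valid prefix, so reward = 1.0
```

In the full framework, the reward value computed in the last step is additionally fed back to adjust the LoadCut hyperparameters.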

2.2. Invalid Payload Filter

Support vector machine (SVM) is a fundamental and robust small-sample learning method. One-class SVM (OCSVM) is an extension of SVM that can accurately differentiate between valid and invalid samples, even with limited data [28]. As depicted in Figure 3, an OCSVM was trained using payloads in a single state and then applied to filter invalid payloads. The purpose of this classifier is to exclude invalid cases from the generated payloads and to measure the proportion of valid payloads, ensuring that the generated payloads are compliant. This approach ensures precision while avoiding reliance on extensive data, providing reliable support for effective payload generation.
As shown in Figure 3, payloads in specific states are initially used as training data to train an OCSVM. This enables the OCSVM to discern whether payloads are compliant with the format specifications of that state type. The generated complete payloads are input into the OCSVM (Step ①), which filters out invalid payloads and retains the valid ones as output, while simultaneously calculating the reward value (Step ②). The reward value denotes the proportion of payloads that conform to the format specifications of the network protocols. It is calculated as shown in Equation (1), where Cnt_valid represents the number of payloads retained by the OCSVM, and Cnt_total represents the total number of input payloads:
Reward = Cnt_valid / Cnt_total

2.3. SubseqGANs: Subsequence Generative Adversarial Networks

The objective is to endow the generators of each subsequence set, subseq_train_i, with the ability to generate semantically similar subsequences. The training and data generation procedures of SubseqGANs are depicted in Figure 4.

2.4. Sequence Concatenation

During the sequence concatenation process, all sets of subsequences generated by SubseqGANs in step 3 serve as the input data. These input data are concatenated through Cartesian product operations, forming complete network payloads, as illustrated in Algorithm 1.
Algorithm 1. Subsequence set concatenation algorithm
Input: N sets of generated subsequences (Subseq_0 to Subseq_N−1)
Output: Complete payload set (Payload_result)
1.        Function combine(Subseq_i, Subseq_j):
2.                Ret = []
3.                // Iterate through the subsequences of Subseq_i
4.                For seq_i in Subseq_i:
5.                        For seq_j in Subseq_j:
6.                                // concatenate to form longer subsequences
7.                                Ret.append(seq_i + seq_j)
8.                Return Ret
9.        Payload_result = combine(Subseq_0, Subseq_1)
10.       // concatenate all remaining subsequence sets
11.       For i in [2..N−1]:
12.               Payload_result = combine(Payload_result, Subseq_i)
13.       Return Payload_result
The concatenation algorithm takes N sets of generated subsequences, denoted as Subseq_0 to Subseq_N−1, as input. It performs Cartesian product operations on pairs of subsequence sets to generate complete payload sequences.
Subsequence concatenation (Lines 1–8): Each subsequence in Subseq_i is concatenated with every subsequence in Subseq_j, forming a larger subsequence set in which each element is created by joining one element from Subseq_i with one from Subseq_j. The concatenation process is initialized (Line 9) by combining the first two sets of generated subsequences, Subseq_0 and Subseq_1, forming the initial concatenation output.
Concatenation of the remaining subsequence sets (Lines 10–13): Building upon the initial concatenation output, the algorithm continues concatenating the remaining subsequence sets in turn, ultimately forming the complete payload set, Payload_result.
The number of payloads in the generated set, Payload_result, depends on the number of subsequences within each subsequence set Subseq_i, denoted as Cnt_subseq_i. This relationship is expressed in Equation (2).
Cnt_result = ∏_{i=0}^{n} Cnt_subseq_i
The time complexity of the concatenation algorithm can be described as Equation (3).
O(n²)
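For reference, the same Cartesian-product concatenation can be written compactly in Python with itertools.product; this is a sketch, and the subsequence sets below are made up.

```python
import itertools

# Compact Python equivalent of Algorithm 1: the Cartesian product of the
# generated subsequence sets, each combination joined into one payload.

def concatenate(subseq_sets):
    return ["".join(parts) for parts in itertools.product(*subseq_sets)]

subseq_sets = [["GET ", "PUT "], ["/a", "/b", "/c"], [" HTTP/1.1"]]
payloads = concatenate(subseq_sets)
print(len(payloads))  # 2 * 3 * 1 = 6 payloads, matching Equation (2)
```

The output size is the product of the set sizes, which is why the number of generated payloads grows multiplicatively with each additional subsequence set.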

3. Automatic Segmentation Algorithm Designed for Network Payloads

Existing methods for payload segmentation typically rely on manually defined ASCII or hexadecimal strings as field delimiters and employ specific substrings to split payloads. However, these methods depend on prior knowledge of the network protocol of the target payloads and therefore have clear limitations. In order to establish a relationship between segmentation points and the semantics of network payloads, we propose a novel automatic segmentation algorithm called LoadCut. This method first identifies potential segmentation points in payloads and then exploits the differences in information entropy near these points to cut the whole payload into segments.

3.1. Experiments on the Information Entropy

The feature sequences in payloads differ from the other parts in their level of randomness. Identifying a threshold for this difference allows the original payload to be segmented into the feature sequences and the subsequences containing other fields. To further elucidate this phenomenon, we conducted experiments on the information entropy of sequences in payloads captured from the Mozi botnet.
These experiments utilized payloads exchanged between Mozi botnet nodes, extracting the application-layer content of a communication flow as the analysis target. A small segment was extracted from a command payload addressed to a specific target within the Mozi botnet, as illustrated in Figure 5. Information entropy was calculated and analyzed with the aim of extracting the payload segment between the ‘key’ and ‘value’ parts, as denoted by the red line in Figure 5.
Firstly, we define a sliding window of 10 bytes, allowing it to continuously slide right on the payload segment until the right boundary of the sliding window reaches the last byte of the payload segment. While the sliding window moves, the information entropy value of the byte sequence within the window is computed. The corresponding entropy values at each offset position, as the window shifts, are recorded. In this case, relevant data are presented in Table 1.
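The sliding-window entropy measurement can be sketched as follows; the byte string below is an illustrative key/value-like segment, not actual Mozi traffic.

```python
import math
from collections import Counter

# Sketch of the measurement above: a 10-byte window slides over a
# payload segment and the Shannon entropy of each window's byte
# content is recorded per offset.

def shannon_entropy(data: bytes) -> float:
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def window_entropies(payload: bytes, len_wnd: int = 10):
    return [shannon_entropy(payload[i:i + len_wnd])
            for i in range(len(payload) - len_wnd + 1)]

# low-randomness 'key'-like bytes followed by high-randomness 'value' bytes
segment = b"aaaaaaaaaa" + bytes([0x8F, 0x11, 0xD3, 0x7A, 0x05,
                                 0xC2, 0x9E, 0x41, 0x66, 0x03])
ents = window_entropies(segment)
print(ents[0], ents[-1])  # entropy rises as the window enters the 'value'
```

In this toy segment, the window over the repeated bytes has zero entropy, while the window over ten distinct bytes reaches log2(10) ≈ 3.32, illustrating the entropy jump exploited for segmentation.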

3.2. Experimental Results

Based on the offset positions of the sliding window and the entropy values within the window as presented in Table 1, the variations in information entropy values are depicted in Figure 6. The horizontal axis represents the offset position of the sliding window’s starting point, and the vertical axis represents the entropy values of the byte content within the corresponding sliding window. The sliding window’s offset positions 3 and 4 correspond to the points with the highest entropy values. After referencing the window contents in Table 1, these two offset positions signify the transition of the sliding window from the ‘key’ part to the ‘value’ part.

3.3. Collection of the Best Segmentation Results

Detailed analysis of the experimental results shows that when the entropy value exceeds 3.0, the sliding window is transitioning precisely from the ‘key’ part to the ‘value’ part of the key–value pair representing the target IP address. Utilizing this characteristic, we set the entropy threshold to 3.0, effectively segmenting the payload samples into ‘key’ and ‘value’ parts. From the perspective of byte information, this suggests a relationship between entropy values and potential segmentation points in byte sequences. By choosing an appropriate entropy threshold and sliding window length, segmentation points in the target byte sequences can be determined. At these points, the entropy value surpasses the threshold, showing significant differences compared with nearby offset positions.
In summary, experimental results indicate that selecting an appropriate sliding window length and threshold value can identify potential segmentation points within network payloads, forming semantically meaningful subsequences. However, determining the optimal entropy threshold and the length of the sliding window is challenging, as global search methods consume considerable time. Therefore, this study introduces Bayesian optimization as a black-box optimization algorithm for parameter optimization. Parameters such as entropy threshold and the length of the sliding window are treated as hyperparameters, and Bayesian optimization is employed to search for suitable hyperparameters, achieving optimal performance when segmenting payloads captured in unknown network environments.
Based on the analysis above, a segmentation algorithm for network payloads, called LoadCut, is proposed, utilizing a combination of information entropy and Bayesian optimization. The algorithm’s parameter selection and segmentation procedure are shown in Algorithm 2. LoadCut employs Bayesian optimization to rapidly search for suitable combinations of hyperparameters and determines the segmentation points in network payloads based on the information entropy within the sliding window. The algorithm uses the reward value, calculated by the invalid payload filter described in Section 2.2, as the optimization metric for Bayesian optimization. It takes a collection of captured payload samples as input, and outputs segmented subsequences of network payloads after optimal segmentation, denoted as Best_Subseqs.
Algorithm 2. LoadCut algorithm
Input: Network communication payload collection (Origin_Payloads)
Output: Subsequence collection of network communication payloads after best segmentation (Best_Subseqs)
1.            Function Cut(Origin_Payload, TH_ENT, LEN_WND, SEQ_CNT):
2.                  Ret = []
3.                  Cut_pos = 0
4.                  Len_payload = len(Origin_Payload)
5.                  For i in 0 .. Len_payload − LEN_WND:
6.                        // get window content
7.                        Wnd = Origin_Payload[i:i + LEN_WND]
8.                        // calculate the entropy of the window content
9.                        Entropy_wnd = Calculate_info_entropy(Wnd)
10.                        // if the entropy exceeds the threshold, perform segmentation
11.                        If Entropy_wnd > TH_ENT:
12.                            Ret.append(Origin_Payload[Cut_pos:i + LEN_WND])
13.                            i += LEN_WND
14.                            Cut_pos = i
15.                        If len(Ret) >= SEQ_CNT − 1:
16.                            Ret.append(Origin_Payload[Cut_pos:])
17.                            Break
18.                  // pad the segmented subsequences with blank bytes
19.                  Ret.padding(SEQ_CNT)
20.                  Return Ret
21.            // initialize hyperparameters randomly
22.             TH_ENT = random(), LEN_WND = random(), SEQ_CNT = random()
23.             Subseqs = []
24.             Best_Subseqs = []
25.             Best_Reward = 0
26.            // iterate 20 times
27.             N_iter = 20
28.             For _ in range(N_iter):
29.                  For Origin_Payload in Origin_Payloads:
30.                        Slices = Cut(Origin_Payload, TH_ENT, LEN_WND, SEQ_CNT)
31.                        Subseqs.append(Slices)
32.                  // concatenate the subsequence sets into complete payloads
33.                  Payloads = Combine(Subseqs)
34.                  // calculate the proportion of valid payloads
35.                  Reward = OCSVM(Payloads)
36.                  If Reward > Best_Reward:
37.                        Best_Reward = Reward
38.                        Best_Subseqs = Subseqs
39.                  // use Bayesian optimization to find the next hyperparameters
40.                  TH_ENT, LEN_WND, SEQ_CNT =
41.                            Bayesian_Opt(Reward, TH_ENT, LEN_WND, SEQ_CNT)
42.             Return Best_Subseqs
The LoadCut algorithm takes each byte sequence from the Origin_Payloads set as input, and outputs Best_Subseqs, a collection of subsequences generated from the optimal segmentation of all payloads in Origin_Payloads. Explanations of the algorithm are as follows:
Segmentation procedure of Origin_Payload (Lines 1–20): The algorithm slides a window over the input payload sample (Origin_Payload) and calculates the entropy value within the window at each sliding step. The obtained entropy value is compared with the predefined entropy threshold, TH_ENT. If the entropy value exceeds TH_ENT, the window’s end position is marked as a segmentation point (Cut_pos), and the window’s new starting point is set to the byte after this segmentation point. The window continues to slide until it reaches the end of Origin_Payload. The resulting subsequence collection is padded with blank bytes until it contains SEQ_CNT subsequences.
Random initialization of hyperparameters (Line 22): The hyperparameters are the entropy threshold (TH_ENT), the length of the sliding window (LEN_WND), and the number of subsequences (SEQ_CNT). They are initialized with random values; in the Bayesian optimization process, the initial values have little impact on the finally optimized hyperparameters.
Initialization of temporary parameters (Lines 23–27): The collections (Subseqs and Best_Subseqs) are initialized as empty, the best reward value (Best_Reward) is initialized to 0, and the iteration count for Bayesian optimization is set to 20.
Search for the segmented subsequence collection (Lines 28–41): For each original payload (Origin_Payload) in Origin_Payloads, the algorithm segments the payload and adds the generated subsequences to Subseqs. Using the sequence concatenation method described in Section 2.4, the subsequences in Subseqs are concatenated to form complete payloads. The invalid payload filter described in Section 2.2 then computes the proportion of valid payloads (the reward) among the concatenated payloads. The Subseqs with the maximum reward are kept as Best_Subseqs, and Bayesian optimization proposes the next set of hyperparameters based on the reward value.
Output of the best segmentation result (Line 42): After 20 iterations of searching for the best-segmented subsequence collection, the final output is Best_Subseqs.
The time complexity of the LoadCut algorithm can be analyzed through the Cut function and the main algorithm loop. The Cut function contains a loop that iterates over the length of the payload (Len_payload) minus the window size (LEN_WND), whose complexity is O(Len_payload). Inside this loop, the entropy within the sliding window is calculated, whose complexity is O(LEN_WND). Therefore, the overall complexity of the Cut function is O(Len_payload ∗ LEN_WND). In the main algorithm loop, the algorithm iterates over the Origin_Payloads collection, consisting of n payloads, and invokes the Cut function for each payload. Given that the algorithm runs for 20 iterations, the final time complexity of the LoadCut algorithm can be expressed as Equation (4).
O(n ∗ Len_payload ∗ LEN_WND)
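For illustration, the Cut step can be written as runnable Python under fixed hyperparameters; this is a sketch in which the Bayesian-optimization search and the SEQ_CNT padding are omitted, and the payload and threshold values are made-up assumptions.

```python
import math
from collections import Counter

# Runnable sketch of the Cut step of LoadCut with fixed, illustrative
# hyperparameters (no Bayesian optimization, no SEQ_CNT padding).

def shannon_entropy(data: bytes) -> float:
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def cut(payload: bytes, th_ent: float, len_wnd: int):
    ret, cut_pos, i = [], 0, 0
    while i <= len(payload) - len_wnd:
        if shannon_entropy(payload[i:i + len_wnd]) > th_ent:
            ret.append(payload[cut_pos:i + len_wnd])
            i += len_wnd               # jump past the detected boundary
            cut_pos = i
        else:
            i += 1
    ret.append(payload[cut_pos:])      # trailing segment
    return ret

# low-entropy 'key' region, high-entropy 'value' region, low-entropy tail
payload = b"keyAAAAAAA" + bytes(range(128, 144)) + b"Z" * 10
segments = cut(payload, th_ent=3.0, len_wnd=10)
print([len(s) for s in segments])
```

On this toy payload, the window entropy first exceeds the 3.0 threshold as it slides into the distinct-byte region, producing one cut there and another where the region ends, so the payload is split around its high-randomness middle.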

4. Experiments and Evaluation

4.1. Experiment Setups

4.1.1. Problem Statement

In order to validate the effectiveness of the proposed method for payload generation under different network protocols, we needed a stable and controllable network environment. After investigation, we chose the Mozi botnet as our experimental environment. The Mozi botnet has been active in public network environments in recent years, primarily spreading through public protocols such as UDP and DHT. We conducted experiments using a dataset collected from the Mozi botnet, focusing mainly on the following research questions.
Research Question 1 (RQ1). How effective is LoadGAN when tested on known network samples? (See Section 4.2, corresponding to A1.)
Research Question 2 (RQ2). How does the performance of our payload generation method compare to existing methods? (See Section 4.3, corresponding to A2.)

4.1.2. Experimental Environment

The experiments were conducted on a system running Ubuntu 20.04 with Python 3.9, Wireshark 3.4.6.0, and VirtualBox 7.0. The hardware resources included an Intel i7 10750H processor and 16 GB RAM. Due to the unique characteristics of zombie network nodes, particularly in relation to internal and external network penetration issues, experiments were conducted on a cloud server with a static public IP address.

4.1.3. Dataset

We constructed our experimental dataset by collecting network payloads in the Mozi botnet environment. We conducted simulated runtime experiments on a publicly accessible server using existing Mozi binary samples. Each binary sample ran for approximately 10 days, during which a total of 86,291 network payloads exchanged between Mozi nodes were collected. The captured payloads contained the following main features:
Source address and destination address: The IP addresses and ports of the source nodes and destination nodes of each communication.
Protocol type: The communication protocol used by the payload (e.g., TCP, UDP).
Payload content: The payload data were in binary format and were used for analysis and generation.
Among all the payloads in the Mozi botnet, command payloads are the most crucial, as they carry the instructions sent from the command and control (C2) servers to infected devices, dictating the botnet’s malicious activities and overall coordination. Taking the filtered dataset of find_nodes command payloads (Mozi_findnodes) as an example, the main structure of find_nodes is shown in Figure 7.
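For readers unfamiliar with the format, Mozi's find_nodes command resembles the bencoded KRPC find_node query used by the BitTorrent DHT. The sketch below builds such a query with a minimal bencoder; the node and target IDs are placeholders, not values taken from the Mozi dataset.

```python
def bencode(obj) -> bytes:
    """Minimal bencoding for the types that appear in KRPC messages."""
    if isinstance(obj, bytes):
        return str(len(obj)).encode() + b":" + obj
    if isinstance(obj, int):
        return b"i" + str(obj).encode() + b"e"
    if isinstance(obj, dict):
        # Bencoded dictionaries are serialized with keys in sorted order.
        return b"d" + b"".join(bencode(k) + bencode(obj[k]) for k in sorted(obj)) + b"e"
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    raise TypeError(f"unsupported type: {type(obj)}")

# Placeholder 20-byte IDs; real Mozi traffic carries live DHT node IDs.
query = {
    b"t": b"aa",                 # transaction ID
    b"y": b"q",                  # message type: query
    b"q": b"find_node",          # query name
    b"a": {b"id": b"A" * 20, b"target": b"B" * 20},
}
packet = bencode(query)
```

The resulting `packet` is a syntactically valid DHT find_node query; the structure (fixed keywords interleaved with variable ID fields) is what makes entropy-based segmentation effective on such payloads.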

4.2. Validity Evaluation

4.2.1. Validity Evaluation Procedures

In the experiment, payloads generated using LoadGAN were sent to known network nodes in the Mozi botnet. If the Mozi network nodes successfully parsed and responded to the sent payloads, the payloads generated by this model could be confirmed as valid. We compared the total number of payloads in the experiment (Cnt_all) with the number of valid payloads (Cnt_valid). The evaluation criterion for the experiment is given by Equation (5), as follows:
Rate_valid = Cnt_valid / Cnt_all  (5)
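The validity check behind Equation (5) can be sketched as below. This is a simplified sketch: the target address is a placeholder, and counting any response as valid is an assumption (the real procedure in step (9) of Section 4.2.1 also verifies the content of the response).

```python
import socket

def rate_valid(payloads, addr, timeout=15.0):
    """Send each generated payload over UDP and count those that draw a
    response within the timeout, then return Rate_valid = Cnt_valid / Cnt_all."""
    cnt_valid = 0
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        for p in payloads:
            sock.sendto(p, addr)
            try:
                sock.recvfrom(65535)  # any reply counts here; real checks parse it
                cnt_valid += 1
            except socket.timeout:
                pass
    finally:
        sock.close()
    return cnt_valid / len(payloads) if payloads else 0.0
```

In the paper's setup, `addr` would be a live Mozi node and the 15 s timeout matches the monitoring window used in the evaluation.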
  • A1: Validity evaluation experiment (answering RQ1).
In this experiment, we utilized find_nodes payloads as the input dataset. We demonstrated the effectiveness of LoadGAN in generating valid payloads by comparing results on the experimental dataset with different components of LoadGAN either enabled or disabled. Figure 7 shows a selection of filtered find_nodes command payloads.
Specific evaluation schemes are as follows.
  • Directly use the payload samples to train GANs without the payload segmentation module, then filter out invalid ones using an invalid payload filter (denoted as scheme 1).
  • Use the LoadCut algorithm to cut payloads into sequences, train a seqGAN on each sequence set, concatenate all generated payload segments to form complete payloads, and then filter out invalid payloads using the invalid payload filter (denoted as scheme 2).
  • Generate payloads as in scheme 2 but without the filtering step, using the concatenated payloads directly (denoted as scheme 3).
According to the evaluation schemes above, the evaluations are conducted as follows (steps that a particular scheme does not execute are noted at the beginning of each step):
(1) After inputting the dataset (Mozi_findnodes) into our model, trace the dataflow inside the model and observe how the payloads are handled.
(2) This step is not included in evaluation scheme 1. Following the procedure described in Section 3.2, the input dataset of the Mozi_findnodes is segmented into several subsequences using the LoadCut algorithm. These subsequences are illustrated in Figure 8. After segmentation, each payload is divided into eight subsequence parts.
(3) This step is not included in evaluation scheme 1. Figure 9 shows the segmentation results of some payloads from the Mozi zombie network. Each subsequence in these payloads is sent to train a SubseqGANs generator. Subsequently, the trained generators are utilized to generate the corresponding eight subsequence sets. The outputs of the subsequence sets are shown in Figure 9. The eight sets in the figure represent the subsequence sets of payloads in the Mozi network generated after this step.
(4) This step is not included in evaluation scheme 1. The subsequences generated in the third step are combined through Cartesian product concatenation to form a complete payload list. Examples of complete payloads formed after concatenation are shown in Figure 10. The portions highlighted in red in the figure represent segments identified through manual judgment that have significant differences from real payloads.
(5) This step is not performed in evaluation schemes 1 and 3. A one-sided classifier is employed to filter out the payloads formed in the fourth step that do not satisfy the format specifications of the Mozi botnet. The retained payloads represent those compliant with the Mozi botnet protocol through evaluation scheme 2.
(6) This step is not included in evaluation schemes 1 and 2. The complete payloads formed in the fourth step are directly used as network payloads and are generated in accordance with the Mozi protocol through evaluation scheme 3.
(7) This step is not included in evaluation schemes 2 and 3. All input payload samples from the first step are used as the training set, and a GAN is trained directly on it. Subsequently, a batch of complete botnet payloads is generated using the trained GAN, as shown in Figure 11. Portions with significant differences from real data are highlighted in red boxes in the figure.
(8) This step is not included in evaluation schemes 2 and 3. The complete payloads obtained in (7) are filtered using a one-sided classifier to exclude invalid payloads that do not conform to the format specifications of the Mozi botnet. The set of retained valid payloads constitutes the final output.
(9) The generated network payloads conforming to the Mozi botnet protocol are sent to the target Mozi node. Subsequently, within a timeout period of 15 s, the reception of response messages from the Mozi botnet node is monitored. The effectiveness of the generated payloads is verified by assessing the correctness of the received contents in the response message. The payloads in the application layer carrying the response messages in the Mozi botnet are illustrated in Figure 12.
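The Cartesian-product concatenation in step (4) can be sketched with itertools.product. The eight subsequence sets below are toy stand-ins for the SubseqGANs outputs (the segment values are illustrative, not real generator samples); only the assembly mechanism is what the sketch demonstrates.

```python
from itertools import product

# Toy stand-ins for the eight subsequence sets produced by the trained
# child generators in step (3); real sets come from the SubseqGANs outputs.
subseq_sets = [
    [b"d1:ad2:id20:"],          # fixed header segment
    [b"A" * 20, b"C" * 20],     # variable node-ID segment (two candidates)
    [b"6:target20:"],
    [b"B" * 20],
    [b"e1:q9:find_node"],
    [b"1:t2:"],
    [b"aa", b"ab"],             # variable transaction-ID segment (two candidates)
    [b"1:y1:qe"],
]

# Choosing one segment from each set, in order, yields one candidate payload;
# the Cartesian product enumerates every combination.
candidates = [b"".join(parts) for parts in product(*subseq_sets)]
```

With two candidates in two of the sets, the product yields 2 × 2 = 4 complete payloads; in the real pipeline, this list is what the one-sided classifier in step (5) filters.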

4.2.2. Experimental Results of the Validity Evaluation

Based on the aforementioned steps, we counted the number of effective payloads generated by the three evaluation schemes, each generating a total of 400 payloads, and calculated the proportion of valid payloads among all generated payloads. The experimental results are presented in Table 2.
From Table 2, it is evident that evaluation scheme 2 achieved the highest proportion of valid payloads, reaching 85.5%. This scheme processed all the payloads from the input dataset using the LoadCut segmentation algorithm. It generated payload subsequences of the Mozi botnet using GANs, combined these subsequences through Cartesian product operations to form complete payloads, and then filtered out invalid complete payloads using payload filters. This scheme fully utilized all components and modules in the LoadGAN framework. The experimental results demonstrate that the payloads generated using LoadGAN exhibit high effectiveness.

4.3. Comparative Experimental Evaluation

4.3.1. Comparative Evaluation Procedures

In this section, three distinct methods used to generate network payloads in the Mozi botnet are employed, including generating payloads through random mutations, directly training GANs to generate complete payloads, and utilizing LoadGAN. By comparing the effectiveness of payloads generated by different methods, this section aims to validate the superiority of LoadGAN over existing approaches.
  • A2: The following comparative experiment was designed to compare the performance of the three payload generation methods in the Mozi botnet environment, addressing RQ2.
Method one: Random mutation payload generation from reference [11]. In this method, 20% of the bytes in each payload were designated for random mutation. A total of 400 find_nodes payloads, 300 ping payloads, and 250 config_sync payloads were randomly selected. For each payload, 20% of its bytes were randomly chosen and replaced with random byte values ranging from 0 to 255, as shown in Figure 13. The figure illustrates an example of an original find_nodes payload and the new payload generated after mutating 20% of its bytes.
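Method one's mutation step can be sketched as below; the function name and the choice of mutating exactly ⌊0.2 · len⌋ distinct positions are assumptions consistent with the description above.

```python
import random

def mutate_payload(payload: bytes, ratio: float = 0.2, seed=None) -> bytes:
    """Replace `ratio` of the payload's bytes with random values in 0-255,
    mimicking the random-mutation baseline; positions are chosen uniformly
    without replacement."""
    rng = random.Random(seed)
    data = bytearray(payload)
    n_mut = max(1, int(len(data) * ratio))
    for i in rng.sample(range(len(data)), n_mut):
        data[i] = rng.randrange(256)
    return bytes(data)
```

Because the mutated positions ignore the protocol structure, most outputs break fixed fields such as keywords and delimiters, which is why this baseline's validity stays low in Figure 14.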
Method two: Directly train GANs to generate complete payloads [29]. We chose seqGAN as the target model and trained it on the find_nodes, ping, and config_sync payload sets, with the training objective of generating complete botnet payloads. Subsequently, 400 find_nodes payloads, 300 ping payloads, and 250 config_sync payloads were generated using this method. Figure 13 illustrates find_nodes payloads generated directly by the trained seqGAN.
Method three: Use LoadGAN to generate payloads. The LoadCut algorithm was used to segment complete payloads into subsequence sets for different payload sets of find_nodes, ping, and config_sync. In detail, 400 find_nodes payloads, 300 ping payloads, and 250 config_sync payloads were generated for experimentation.
The generated network payload datasets from each method are sent to real Mozi botnet nodes to conduct network detection tasks. The validity of the generated payloads is determined by verifying whether the Mozi nodes return normal responses. The proportion of valid payloads within all generated payloads is calculated and summarized for each method.

4.3.2. Experimental Results of Comparative Evaluation

The final results are shown in Figure 14. The x-axis represents different payload types in the Mozi botnet. The experiment compared the effectiveness of the three different methods for generating find_nodes, ping, and config_sync payloads. The y-axis represents the effectiveness percentage, indicating the proportion of effective payloads within all generated payloads. A higher effectiveness percentage means the generated payloads can better conform to the Mozi botnet protocol and complete the task of network asset detection.
Conclusion: Based on the experimental results, the payloads generated using LoadGAN performed best in the detection task. With the random mutation method, the effectiveness of payload generation for all three payload types remains below 40%; the second approach performs slightly better, but its overall effectiveness still falls short of 50%. In contrast, LoadGAN maintains an overall effectiveness of around 80% for all three payload types. This suggests that, compared with the other two popular methods, LoadGAN is substantially more effective.

5. Conclusions and Future Work

In order to manage network assets in power systems, this study proposes a novel payload generation model, LoadGAN, which addresses the limitations in existing methods for generating network payloads, making generated payloads more compliant with the format specifications of network protocols. We also propose a LoadCut algorithm based on information entropy and Bayesian optimization to achieve better payload segmentation. With this algorithm, child generators in LoadGAN can precisely generate different parts of a complete payload and combine generated parts to form structured network payloads. We also designed a one-sided classifier to filter generated payloads and enhance the effectiveness of generated test cases. Experimental results demonstrate that compared with existing methods, LoadGAN is more effective, achieving a higher validity of generated network payloads.
However, LoadGAN may have some potential limitations. Firstly, SeqGANs used in LoadGAN may still face issues related to mode collapse, leading to limited types of generated payloads. This could reduce the effectiveness of LoadGAN in real-world scenarios. Secondly, the effectiveness of LoadGAN may be compromised by invalid payload filters, which might fail to accurately filter out invalid payloads due to adversarial attacks. In future works, we will mainly focus on addressing these issues and continuously improving the performance of LoadGAN.

Author Contributions

Conceptualization, H.Z. (Hao Zhang 1) and Y.L.; Methodology, H.Z. (Hao Zhang 1), Y.L., J.Z. and J.W.; Validation, H.Z. (Hao Zhang 2), J.Z., J.W. and T.X.; Formal analysis, H.Z. (Hao Zhang 1) and H.Z. (Hao Zhang 2); Investigation, Y.L. and J.Z.; Writing—original draft, T.X.; Writing—review & editing, Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of State Grid Corporation of China “Network Security Alarm Fusion Discrimination and Collaborative Traceability Technology and Application in Electric Power System Supervision and Control” grant number 52010123003N.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Hao Zhang, Jun Zhang and Hao Zhang were employed by State Grid Jibei Electric Power Company Limited. Authors Ye Liang and Jing Wang were employed by Beijing Kedong Electric Power Control System Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from the State Grid Corporation of China. The funder was not involved in the study design; the collection, analysis, or interpretation of data; the writing of this article; or the decision to submit it for publication.

Abbreviations

The following abbreviations are used in this manuscript:
GANs: generative adversarial networks
SVM: support vector machine
OCSVM: one-class support vector machine
LoadGAN: the payload generation model using GANs
LoadCut: the payload segmentation algorithm
SeqGANs: existing types of GANs
SubseqGANs: payload segment sequences based on entropy thresholds

References

  1. Li, Z.; Chen, Z.; Wang, C.; Xu, Z.; Ye, L. Network security threat tracing technology of power monitoring system. Electr. Power Eng. Technol. 2020, 39, 166–172. [Google Scholar]
  2. Qiu, Q.; Cui, L.; Yang, L. Maintenance policies for energy systems subject to complex failure processes and power purchasing agreement. Comput. Ind. Eng. 2018, 119, 193–203. [Google Scholar] [CrossRef]
  3. Chen, T.Y.; Cheung, S.C.; Yiu, S.M. Metamorphic testing: A new approach for generating next test cases. arXiv 2020, arXiv:2002.12543. [Google Scholar]
  4. Pleshakova, E.; Osipov, A.; Gataullin, S.; Gataullin, T.; Vasilakos, A. Next gen cybersecurity paradigm towards artificial general intelligence: Russian market challenges and future global technological trends. J. Comput. Virol. Hacking Tech. 2024, 20, 429–440. [Google Scholar] [CrossRef]
  5. Fowler, D.S.; Bryans, J.; Cheah, M.; Wooderson, P.; Shaikh, S.A. A method for constructing automotive cybersecurity tests, a CAN fuzz testing example. In Proceedings of the IEEE International Conference on Software Quality, Reliability and Security Companion, Sofia, Bulgaria, 22–26 July 2019; pp. 1–8. [Google Scholar]
  6. Song, C.; Yu, B.; Zhou, X.; Yang, Q. SPFuzz: A hierarchical scheduling framework for stateful network protocol fuzzing. IEEE Access 2019, 7, 18490–18499. [Google Scholar] [CrossRef]
  7. Gambi, A.; Mueller, M.; Fraser, G. Automatically testing self-driving cars with search-based procedural content generation. In Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis, Beijing, China, 15–19 July 2019; pp. 318–328. [Google Scholar]
  8. Osipov, A.V.; Pleshakova, E.S.; Gataullin, S.T. Production processes optimization through machine learning methods based on geophysical monitoring data. Comput. Opt. 2024, 48, 633–642. [Google Scholar]
  9. Osipov, A.; Pleshakova, E.; Bykov, A.; Kuzichkin, O.; Surzhik, D.; Suvorov, S.; Gataullin, S. Machine Learning Methods Based on Geophysical Monitoring Data in Low Time Delay Mode for Drilling Optimization. IEEE Access 2023, 11, 60349–60364. [Google Scholar] [CrossRef]
  10. Dai, H.; Sun, C.; Jin, H.; Xiao, M. Research progress in fuzzy testing technology for deep learning systems. J. Softw. Sci. 2023, 34, 5008–5028. [Google Scholar]
  11. Niu, S.; Li, P.; Zhang, Y. Survey on fuzzy testing technologies. Comput. Eng. Sci. 2022, 44, 2173–2186. [Google Scholar]
  12. You, W.; Zong, P.; Chen, K.; Wang, X.; Liao, X.; Bian, P.; Liang, B. SemFuzz: Semantics-based automatic generation of proof-of-concept exploits. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 2139–2154. [Google Scholar]
  13. Zhao, W.; Lu, K.; Wu, Q.; Qi, Y. Semantic-informed driver fuzzing without both the hardware devices and the emulators. In Proceedings of the Network and Distributed Systems Security Symposium, San Diego, CA, USA, 24–28 April 2025. [Google Scholar]
  14. Zhang, P.; Ren, B.; Dong, H.; Dai, Q. Cagfuzz: Coverage-guided adversarial generative fuzzing testing for image-based deep learning systems. IEEE Trans. Softw. Eng. 2021, 48, 4630–4646. [Google Scholar] [CrossRef]
  15. Demir, S.; Eniser, H.F.; Sen, A. DeepSmartFuzzer: Reward Guided Test Generation for Deep Learning; CEUR-WS: Anissaras, Greece, 2020. [Google Scholar]
  16. Zalewski, M. American Fuzzy Lop. Available online: http://lcamtuf.coredump.cx/afl/ (accessed on 1 March 2020).
  17. Huang, W.; Gu, Z.; Guo, J. Research on Power Cyberspace Surveying and Penetration. Electr. Power Inf. Commun. Technol. 2021, 19, 49–54. [Google Scholar]
  18. Li, W.; Wang, X.B.; Xu, Y. Recognition of CRISPR Off-target cleavage sites with SeqGAN. Curr. Bioinform. 2022, 17, 101–107. [Google Scholar] [CrossRef]
  19. Garnett, R. Bayesian Optimization; Cambridge University Press: New York, NY, USA, 2023. [Google Scholar]
  20. Greenhill, S.; Rana, S.; Gupta, S.; Vellanki, P.; Venkatesh, S. Bayesian optimization for adaptive experimental design: A review. IEEE Access 2020, 8, 13937–13948. [Google Scholar] [CrossRef]
  21. Subramanian, M.; Lv, N.P.; VE, S. Hyperparameter optimization for transfer learning of VGG16 for disease identification in corn leaves using Bayesian optimization. Big Data 2022, 10, 215–229. [Google Scholar] [CrossRef] [PubMed]
  22. Wainer, J.; Fonseca, P. How to tune the RBF SVM hyperparameters? An empirical evaluation of 18 search algorithms. Artif. Intell. Rev. 2021, 54, 4771–4797. [Google Scholar] [CrossRef]
  23. Ngoc, T.T.; Le Van Dai CM, T.; Thuyen, C.M. Support vector regression based on grid search method of hyperparameters for load forecasting. Acta Polytech. Hung. 2021, 18, 143–158. [Google Scholar] [CrossRef]
  24. Turner, R.; Eriksson, D.; McCourt, M.; Kiili, J.; Laaksonen, E.; Xu, Z.; Guyon, I. Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020. In Proceedings of the NeurIPS 2020 Competition and Demonstration Track, Virtual, 6–12 December 2021; pp. 3–26. [Google Scholar]
  25. Deringer, V.L.; Bartók, A.P.; Bernstein, N.; Wilkins, D.M.; Ceriotti, M.; Csányi, G. Gaussian process regression for materials and molecules. Chem. Rev. 2021, 121, 10073–10141. [Google Scholar] [CrossRef] [PubMed]
  26. Du, Z.; Chai, H.; Yin, X.; Liu, C.; Shi, M. A method for PPP ambiguity resolution based on Bayesian posterior probability. In China Satellite Navigation Conference (CSNC) 2020 Proceedings; Springer: Singapore, 2020; pp. 324–337. [Google Scholar]
  27. Nguyen, V.; Osborne, M.A. Knowing the what but not the where in Bayesian optimization. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 7317–7326. [Google Scholar]
  28. Yang, X.; Huang, P.; An, L.; Feng, P.; Wei, B.; He, P.; Peng, K. A Growing Model-Based OCSVM for Abnormal Student Activity Detection from Daily Campus Consumption. New Gener. Comput. 2022, 40, 915–933. [Google Scholar] [CrossRef]
  29. Hu, Z.; Shi, J.; Huang, Y.; Xiong, J.; Bu, X. GANFuzz: A GAN-based industrial network protocol fuzzing framework. In Proceedings of the ACM International Conference on Computing Frontiers, Ischia, Italy, 8–10 May 2018; pp. 138–145. [Google Scholar]
Figure 1. Bayesian optimization method used to search for hyperparameters [19].
Figure 2. Payload generation framework that is compliant with network protocols.
Figure 3. Dataflow of invalid payload filter.
Figure 4. Dataflow of subsequences generated by GANs.
Figure 5. Fragments of the botnet command payload.
Figure 6. Variations in entropy values in the sliding window experiment.
Figure 7. Structure of the find_nodes payload.
Figure 8. Botnet payload segmentation results.
Figure 9. Botnet payload subsequence generation results.
Figure 10. Complete payload after concatenation.
Figure 11. The complete payload (generated directly using generative adversarial networks).
Figure 12. Response from the Mozi botnet node.
Figure 13. Complete payload generated after random variation.
Figure 14. Chart of experimental results for efficiency comparison.
Table 1. Entropy values within the sliding window.

Offset | Content    | Entropy   | Offset | Content    | Entropy
0      | "attackip" | 2.7219281 | 8      | p":"202.10 | 2.7219281
1      | attackip": | 2.9219281 | 9      | ":"202.102 | 2.4464393
2      | ttackip":" | 2.9219281 | 10     | :"202.102. | 2.4464393
3      | tackip":"2 | 3.1219281 | 11     | "202.102.1 | 2.2464393
4      | ackip":"20 | 3.1219281 | 12     | 202.102.19 | 2.2464393
5      | ckip":"202 | 2.9219281 | 13     | 02.102.19. | 2.2464393
6      | kip":"202. | 2.9219281 | 14     | 2.102.19.5 | 2.4464393
7      | ip":"202.1 | 2.9219281 | 15     | .102.19.5" | 2.4464393
Table 2. Results for the effectiveness evaluation.

Scheme | Total Payloads | Effective Payloads | Ratio (%)
1      | 400            | 204                | 51
2      | 400            | 342                | 85.5
3      | 400            | 198                | 49.5
