Article

Bayes-Optimized Adaptive Growing Neural Gas Method for Online Anomaly Detection of Industrial Streaming Data

Jian Zhang, Lili Guo, Song Gao, Mingwei Li, Chuanzhu Hao, Xuzhi Li and Lei Song
1 Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 School of Software, Tsinghua University, Beijing 100084, China
4 College of Electrical Engineering, Shandong Huayu University of Technology, Dezhou 253034, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2024, 14(10), 4139; https://doi.org/10.3390/app14104139
Submission received: 30 March 2024 / Revised: 6 May 2024 / Accepted: 8 May 2024 / Published: 13 May 2024

Abstract: Online anomaly detection is critical for industrial safety and security monitoring but faces challenges from data streams that evolve with changing working conditions and performance degradation. Existing approaches fall short of such challenges, and their models may fail as the data distribution evolves. This paper presents a framework for online anomaly detection of data streams whose baseline algorithm is the incremental learning method Growing Neural Gas (GNG). It handles complex and evolving data streams via the proposed model, Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG). Firstly, novel learning rate adjustment and neuron addition strategies are designed to enhance model convergence and data representation capability. Then, a Bayesian algorithm is adopted to realize a fine-grained search of the BOA-GNG hyperparameters. Finally, comprehensive studies with six data sets verify the superiority of BOA-GNG in terms of detection accuracy and computational efficiency.

1. Introduction

Anomalies can be identified as abnormal points whose characteristics differ from those of the majority of the data; in industrial facility monitoring, such novel observations may result from a specific failure, unexpected noise, etc. Online anomaly detection of industrial data streams can support operation and maintenance staff in identifying and locating potential equipment failures in a timely and accurate manner, avoiding serious faults and accidents.
Today, complex data streams from Distributed Control Systems (DCS) and IoT devices are common in many industrial fields, where multivariate data of infrastructures with complex correlations and time-varying characteristics are continuously arriving. This complexity poses a major challenge to anomaly detection, and evolving data streams may quickly render preprocessing steps or trained models outdated.
This paper concerns computational methods for dealing with complex evolving data streams. Incremental learning-based anomaly detection provides an effective way to address this challenge. In particular, competitive neural networks have been widely used, as they suit the unsupervised setting that is natural for anomaly detection with rare labels. However, existing state-of-the-art competitive neural network models, such as Growing Neural Gas (GNG) [1,2], the Self-Organizing Neural Network (SONN) [3,4], and Adaptive Resonance Theory (ART) [5], are designed for static offline data and cannot effectively cope with evolving data streams. A few competitive neural network-based methods have been proposed for time series anomaly detection [6,7,8]. However, they focus on learning time-varying characteristics by adding adaptive learning strategies, ignoring the computational overhead of the improved algorithms on evolving data streams.
The goal of this paper, thus, is to provide a novel framework for online anomaly detection that adapts the baseline GNG algorithm to an online setting, effectively dealing with the challenges of complex and evolving data streams.

2. Related Work

2.1. Incremental Learning

Undoubtedly, it is either too ineffective or too inefficient to handle a complex evolving data stream with a pre-trained fixed model or by repeatedly creating new models [9]. A common method is to build an initial model and incrementally update it over the arriving data. Thus, Incremental Learning (IL) adapts a model to the latest data, however the data streams evolve. According to the learning factor, IL methods can be categorized into three types: Sample Incremental Learning (SIL), Class Incremental Learning (CIL), and Feature Incremental Learning (FIL) [10]. SIL aims to continuously learn internal attributes to maintain the model's representation ability for dynamic data streams and to extract new knowledge that enhances model accuracy. CIL attempts to learn new classes from arriving data and add them to the historical class set so that classification performance improves. FIL algorithms add new features of evolving data to construct a new representation space, with the goal of improving classification accuracy. Due to the scarcity of anomalies in anomaly detection tasks, it is difficult to define the boundary between normal and abnormal samples, and data labels for building a supervised model are lacking. Therefore, the CIL approach has difficulty performing well in anomaly detection of data streams. Additionally, excellent and distinguishable features are challenging to extract dynamically, especially when modeling data streams [11]. Thus, SIL is better suited to constructing unsupervised models for the anomaly detection of evolving data streams.

2.2. Incremental Anomaly Detection

The time-varying characteristics of data streams require an anomaly detection model to have incremental learning ability. Existing state-of-the-art SIL-based anomaly detection approaches mainly include improved classical machine learning methods and competitive neural networks. Andrew [12] proposed a model integrating an autoencoder and incremental clustering, providing a reliable machine learning-based monitoring method for electrical applications with varying power cycle patterns. Bigdeli [13] designed a two-layer cluster-based anomaly detection method, which is fast, noise-resilient, and incremental, to lower the false alarm rate of real-time anomaly detection for dynamic data. Kaan [14] designed a two-stage filtering and hedging algorithm for sequential anomaly detection, where an incremental decision tree is used to construct a multimodal probability density function and an adaptive thresholding scheme is used to detect anomalies. Furthermore, deep learning is also commonly selected as the baseline algorithm for incremental anomaly detection, and its advantage is its spatiotemporal feature-catching ability. Nawaratne [15] proposed an incremental spatiotemporal learner for real-time video surveillance: an unsupervised deep-learning module that continuously updates and distinguishes between new anomalies and normality over time. Agarwal [16] developed an LSTM-autoencoder-based incremental anomaly detection model to detect machine chatter, not only capturing changes in system dynamics over time but also incrementally improving detection accuracy via transfer learning. The existing incremental anomaly detection approaches based on improved classical machine learning or deep learning algorithms typically design a continuous learning strategy for detected data in an offline setting, paying little attention to evolving data streams. Furthermore, many deep learning-based incremental anomaly detection models focus on learning temporal relationships in local sequences and incrementally updating the initial model, which is not suitable for handling arbitrarily evolving data streams. On the other hand, the computational overhead of deep anomaly detection models is another challenge for real-time streaming anomaly detection.
Competitive neural network-based incremental anomaly detection methods are typically shallow neural networks, which can dynamically learn data stream characteristics while keeping computational complexity low. Typical algorithms of this kind are the Self-Organizing Incremental Neural Network (SOINN), Growing Neural Gas (GNG), Adaptive Resonance Theory (ART), and their variants. Fahn [17] designed a SOINN-based abnormal trajectory detection method for efficient video condensation; the feature extraction module compresses the original video to 10% of its size while detection accuracy is maintained at 95%. Hu [18] developed a novel algorithm for fault diagnosis of redundant inertial measurement units; the method introduces SOINN and PCA to achieve high accuracy on tiny faults with low computational complexity. Based on the original GNG algorithm, Mahmoudabadi [19] integrated a fuzzy inference system with GNG for online anomaly detection of data streams; the selection of winning neurons in GNG is improved, and the algorithm shows better accuracy on public datasets than existing clustering models. Song [20] optimized the learning rate adjustment and the neuron addition and deletion strategies, improving both the accuracy and the computational efficiency of GNG in online anomaly detection. Therefore, in comparison with improved machine learning models, competitive neural network-based incremental anomaly detection has the advantages of good precision and timeliness, making it more suitable for streaming anomaly detection. It remains challenging, however, to improve the representation of complex data streams and to optimize the adjustment strategy of the network structure. The motivation of this paper is to further optimize our work in [20].

2.3. Highlights

The main idea in this paper is to use the GNG algorithm as a baseline model, which can process evolving data streams via an adaptive learning mechanism. On the basis of our previous work in [20], we propose a novel framework (BOA-GNG) for online anomaly detection of data streams, and the contributions of this work are briefly summarized as follows.
1. An improved GNG algorithm is proposed to better model the complex and evolving data streams for online anomaly detection. Therein, a novel learning rate adjustment of GNG is designed to obtain better model convergence and stability; the neuron addition strategy is improved for better data representation ability.
2. In terms of hyperparameter settings for GNG, a Bayesian algorithm is introduced for fine-grained search of network hyperparameters instead of brute force search, optimizing the computational efficiency of online detection.
3. Extensive experiments on six data sets verify the superiority of BOA-GNG in terms of detection accuracy and computational efficiency, and further ablation studies are conducted to show the effectiveness of the improvement strategies.

3. Methodology

3.1. Algorithm of Original GNG

The base model of the proposed method is the GNG algorithm, and the theory and process of the GNG are described briefly. Then, the vital improvements for the original GNG are detailed and demonstrated in this section.
The original GNG, which combines the neural gas [21] and competitive Hebbian learning [22], is a topological graph. It aims to represent the characteristics of multivariate data by allowing the number of neurons to increase and taking into account the neighborhood relations of neurons. The learning process of the original GNG includes the adjustment of neuron parameters, adjustment of relationships between neurons, insertion of neurons, and deletion of neurons, of which the process is as follows in Algorithm 1, and the relevant parameters are shown in Table 1.
Algorithm 1: The original Growing Neural Gas (GNG)
1. Input: $\lambda$, $\varepsilon_1$, $\varepsilon_2$, $a_{max}$;
2. $t \leftarrow 0$;
3. Initialize the network $A$ with at least 2 neurons and $C = \emptyset$;
4. For each new instance $x$ from the data stream:
5. $t \leftarrow t + 1$;
6. Find the two neurons closest to $x$ in $A$, namely $s$ and $z$, using the Euclidean distance: $s = \arg\min_{n \in A} \|x - \omega_n\|$, $z = \arg\min_{n \in A \setminus \{s\}} \|x - \omega_n\|$;
7. Update the cumulative error of the winning neuron $s$: $e_s = e_s + \|x - \omega_s\|^2$;
8. Increase the age of the edges associated with $s$;
9. Update the weights of the winning neuron and the neurons connected to it: $\omega_s = \omega_s + \varepsilon_1 \times (x - \omega_s)$; $\forall v \in n(s): \omega_v = \omega_v + \varepsilon_2 \times (x - \omega_v)$;
10. If $s$ and $z$ are connected by an edge, reset the edge's age to 0;
11. Else link $s$ and $z$ with an edge of age 0: $C = C \cup \{(s, z)\}$;
12. Remove old edges with age > $a_{max}$ and neurons that become isolated;
13. If $t$ is a multiple of $\lambda$, then let $q = \arg\max_{n \in A} e_n$ and $f = \arg\max_{l \in n(q)} e_l$;
14. Create a new neuron $r$ between $q$ and $f$, where $\omega_r = 0.5 \times (\omega_q + \omega_f)$ and $e_r = 0.5 \times e_q$;
15. Exponentially decrease the representation error of all neurons: $\forall n \in A: e_n = 0.9 \times e_n$.
As shown in the original GNG algorithm, a minimum number of neurons is initially created (lines 1~3), and new neurons and neighborhood connections (edges) are then added between them during learning, according to the input instances. For each new instance $x$ from the data stream (lines 4~5), the two closest neurons $s$ and $z$ are found via the Euclidean distance (line 6). The local representation error $e_s$ of the winning neuron is increased (line 7), and the age of the edges connected to this neuron is updated (line 8). The winning neuron $s$ and its neighboring neurons (linked to $s$ by an edge) are adapted according to the learning rates $\varepsilon_1$ and $\varepsilon_2$ (line 9). Moreover, the two neurons $s$ and $z$ are linked by a new edge of age 0 (lines 10~11). Edges that reach the maximum age $a_{max}$ without being reset are deleted, and any neuron that becomes isolated is also deleted (line 12). As the data stream arrives, the graph periodically creates a new neuron between the two neighboring neurons that have accumulated the largest representation errors (lines 13~14). Finally, the representation error of all neurons is subject to exponential decay (line 15) to emphasize the importance of recently measured errors.
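For concreteness, the following is a minimal, illustrative Python sketch of this learning loop. It is our reading of Algorithm 1, not the authors' implementation; the class and attribute names (`GNG`, `step`, `edges`) are our own, and the removal of isolated neurons is omitted to keep the index bookkeeping short.

```python
import numpy as np

class GNG:
    """Illustrative sketch of Algorithm 1 (isolated-neuron removal omitted)."""

    def __init__(self, dim, lam=100, eps1=0.05, eps2=0.006, a_max=50):
        self.lam, self.eps1, self.eps2, self.a_max = lam, eps1, eps2, a_max
        self.w = [np.random.rand(dim), np.random.rand(dim)]  # neuron weights
        self.err = [0.0, 0.0]              # cumulative representation errors
        self.edges = {}                    # (i, j) with i < j  ->  age
        self.t = 0

    def neighbors(self, i):
        return [b if a == i else a for (a, b) in self.edges if i in (a, b)]

    def step(self, x):
        self.t += 1
        # Line 6: find the two neurons closest to x.
        d = [np.linalg.norm(x - w) for w in self.w]
        order = np.argsort(d)
        s, z = int(order[0]), int(order[1])
        # Line 7: accumulate the squared error of the winner.
        self.err[s] += d[s] ** 2
        # Line 8: age the edges incident to s.
        for e in self.edges:
            if s in e:
                self.edges[e] += 1
        # Line 9: move the winner and its topological neighbors toward x.
        self.w[s] = self.w[s] + self.eps1 * (x - self.w[s])
        for v in self.neighbors(s):
            self.w[v] = self.w[v] + self.eps2 * (x - self.w[v])
        # Lines 10-11: refresh or create the edge (s, z) with age 0.
        self.edges[(min(s, z), max(s, z))] = 0
        # Line 12: drop edges older than a_max.
        self.edges = {e: a for e, a in self.edges.items() if a <= self.a_max}
        # Lines 13-14: periodically insert a neuron between the neighboring
        # neurons carrying the largest accumulated errors.
        if self.t % self.lam == 0:
            q = int(np.argmax(self.err))
            nb = self.neighbors(q)
            if nb:
                f = max(nb, key=lambda l: self.err[l])
                r = len(self.w)
                self.w.append(0.5 * (self.w[q] + self.w[f]))
                self.err.append(0.5 * self.err[q])
                self.edges[(min(q, r), max(q, r))] = 0
        # Line 15: exponentially decay all representation errors.
        self.err = [0.9 * e for e in self.err]
```

Feeding a stream is then a matter of calling `g.step(x)` for each arriving float vector `x`.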
As stated above, the GNG model can dynamically represent the distribution characteristics of streaming data, as shown in Figure 1. However, it also suffers from several drawbacks. First, the original GNG organizes neurons to represent the streaming data using fixed learning rates, affecting model convergence and stability. Second, inserting a new neuron into the model every $\lambda$ steps cannot guarantee that the new neuron is necessary, and it may also result in model redundancy. Third, hyperparameter optimization of GNG via grid search is time-consuming and imprecise, which makes it difficult to obtain the optimal solution.
Several variants of GNG have been proposed to solve these problems, such as GNG-U [23], GWR [24], and Online GNG [25]. GNG-U defines a utility measure that removes neurons located in low-density regions and inserts them in regions of high density. However, we may need to create new neurons without necessarily removing others. GWR judges whether new neurons need to be added according to the activity and firing thresholds of the winning neurons, but it does not consider optimizing the learning rate. Online GNG can estimate the learning efficiency and adjust the network size to automatically fit the changing data space. However, its neighbor-related strategies treat all neurons equally, so shared parameters may be dominated by the sparsest or densest regions. In this paper, a novel GNG-based anomaly detection method for streaming data is proposed on the basis of our previous work in [20], and the detailed improvements are presented as outlined in Section 2.3.

3.2. Insertion Strategy of New Neurons

Representing a large amount of data with a few neurons is an essential idea of GNG models. As described in Algorithm 1, the original GNG creates a new neuron periodically, a strategy that cannot adapt to a sudden change in the distribution of streaming data with a new topology; conversely, if no new pattern appears for a long time, the algorithm creates many redundant neurons. In our previous work (GNG-I), a threshold $\mu$ is defined to determine whether an input is distinctive enough to insert new neurons, but the fixed threshold $\mu$ may limit the ability to adapt to dynamically changing data streams. In this paper, we further optimize the threshold-setting strategy.
When a new instance $x$ arrives from the data stream, we find the winner $s$ following line 6 of Algorithm 1. If $s$ has directly connected neighbors, the threshold $T_s$ is calculated as the maximum distance between $s$ and its neighbors, following Formula (1):
$T_s = \max_{n \in n(s)} \|\omega_s - \omega_n\|$    (1)
where $\omega_s$ is the weight of $s$, $\omega_n$ is the weight of $n$, $n(s)$ is the set of neurons connected to neuron $s$, and $\|\omega_s - \omega_n\|$ is the Euclidean distance between neuron $s$ and neuron $n$. If $s$ has no neighbor, the threshold $T_s$ is calculated as the minimum distance between $s$ and the other neurons:
$T_s = \min_{n \in A \setminus \{s\}} \|\omega_s - \omega_n\|$    (2)
where $A$ is the set of all neurons. During the learning process, if the distance between a new instance $x$ and its winner neuron $s$ is larger than this threshold, the algorithm adaptively inserts a new neuron to support the new topology. The original GNG inserts the new neuron between the two neurons with the largest and second-largest cumulative errors (line 14 of Algorithm 1), overlooking the role of the new neuron, which describes a new input pattern for the model. Therefore, in this paper, the insertion location is between $x$ and $s$, taking into account both the winner neuron and the new pattern, and the weight of the new neuron is calculated as follows:
$\omega_r = 0.5 \times (\omega_s + x)$    (3)
The purpose of optimizing the neuron insertion is to improve the model's adaptability in online anomaly detection scenarios for streaming data: when the data distribution is stationary, few neurons are created, while neurons are created in a timely manner when the distribution changes.
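A minimal sketch of this insertion test, under the assumption that neuron weights live in an (N, d) array `weights` and the topology in a per-neuron adjacency list of sets `adj` (both names are ours):

```python
import numpy as np

def insertion_threshold(s, weights, adj):
    """T_s: max distance to neighbors if any (Equation (1)),
    else min distance to all other neurons (Equation (2))."""
    if adj[s]:
        return max(np.linalg.norm(weights[s] - weights[n]) for n in adj[s])
    others = [n for n in range(len(weights)) if n != s]
    return min(np.linalg.norm(weights[s] - weights[n]) for n in others)

def maybe_insert(x, s, weights, adj):
    """Insert a new neuron halfway between x and its winner s (Equation (3))
    only when x falls outside the winner's adaptive threshold."""
    if np.linalg.norm(x - weights[s]) > insertion_threshold(s, weights, adj):
        weights = np.vstack([weights, 0.5 * (weights[s] + x)])
        adj.append({s})                 # connect the new neuron to the winner
        adj[s].add(len(weights) - 1)
    return weights, adj
```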

3.3. Adjustment Strategy of Learning Rate

As shown in Table 1, to learn an input pattern, the winner and its neighbors change their weights by fixed rates $\varepsilon_1$ and $\varepsilon_2$. Previous works [26,27] indicate that a dynamic learning rate is helpful for modeling streaming data. In this paper, an adaptive learning rate is designed and calculated as follows:
$\varepsilon_1 = e^{-\frac{c_s \times T_s}{\|x - \omega_s\|}}, \quad \varepsilon_2 = e^{-\frac{10 \times (c_n + 1) \times T_n}{\|x - \omega_n\|}}, \; n \in n(s)$    (4)
where $s$ is the winner neuron and $n$ is a neighbor of $s$, $n(s)$ is the set of neurons connected to neuron $s$, $\|x - \omega_i\|$ is the Euclidean distance between instance $x$ and neuron $i$, $c_i$ is the winning count of neuron $i$, and $T_i$ is defined in Equations (1) and (2). There are two main advantages of such an adaptive learning rate.
First, neurons with more winning times are considered to be more important, and these relatively vital neurons only need small adjustments after continuous incremental learning. The relatively larger learning rate enables quick updates to the weights at the beginning, while the gradual attenuation of the learning rate enables the neurons with higher winning frequency to converge gradually at the end of model training.
Second, the adaptive learning rate enables the model to adjust its neurons at appropriate step sizes. For example, if the distance between the input and its winner is large, the winner should adjust the weights fast. Otherwise, if the winner is mature by winning enough patterns, this acceleration can be decreased.
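A small sketch of this rate computation follows. The helper names (`learning_rates`, `weights`, `c`, `T`) are ours, and the exponent layout follows our reconstruction of Equation (4) above, under which rates shrink as a neuron wins more often and grow when the input lies far from the winner:

```python
import numpy as np

def learning_rates(x, s, neighbors, weights, c, T):
    """Adaptive rates per our reading of Equation (4). `c` holds winning
    counts (incremented before this call, as in Algorithm 2, so c[s] >= 1)
    and `T` the per-neuron thresholds from Equations (1)-(2)."""
    tiny = 1e-12  # guard against a zero distance
    d_s = np.linalg.norm(x - weights[s]) + tiny
    # Winner: the rate shrinks as wins accumulate, grows with distance to x.
    eps1 = np.exp(-(c[s] * T[s]) / d_s)
    # Neighbors: an order of magnitude more conservative than the winner.
    eps2 = {n: np.exp(-(10.0 * (c[n] + 1) * T[n]) /
                      (np.linalg.norm(x - weights[n]) + tiny))
            for n in neighbors}
    return eps1, eps2
```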

3.4. Optimization of Hyperparameters

The choice of GNG-based hyperparameters is very important for its performance. In this paper, the Bayesian optimization method is adopted to select optimal hyperparameters for the proposed BOA-GNG model. Bayesian optimization, which mainly consists of the probabilistic surrogate model and acquisition function, is an efficient global optimization algorithm [28,29]. Its objective function is designed as follows.
$f_p = \frac{1}{N_v} \sum_{x \in v} e^{-\|x - \omega_x\|}$    (5)
where $p$ is the set of parameters to be optimized, $v$ is the validation set, $x$ is an instance from the validation set, $N_v$ is the size of the validation set, and $\omega_x$ is the neuron nearest to $x$. We seek the optimal parameter set $p$ that minimizes the average distance from $x$ to the existing neurons and thereby maximizes the objective function. Taking the objective function as the optimization goal, the process of Bayesian optimization is described below.
1. Initialization: Initialize the search scope of parameter set p . Start by randomly selecting a small set of initial sampling points from the search space. Evaluate the objective function to obtain initial observations.
2. Build a Probabilistic Surrogate Model: Based on the initial observations, construct a probabilistic surrogate model (defined as a Gaussian process in this paper) that approximates the objective function. This model captures both the mean prediction and the uncertainty of the objective function.
3. Optimize the Acquisition Function: Define an acquisition function (probability of improvement is selected as the acquisition function in this paper) that quantifies the utility of sampling at a given point based on the current surrogate model. Optimize this acquisition function to find the next point to sample.
4. Sample the Next Point: Evaluate the objective function at the point identified by optimizing the acquisition function. This new observation is used to update the surrogate model.
5. Iterate: Repeat steps 3 and 4, continuously updating the surrogate model and optimizing the acquisition function until a stopping criterion is met (e.g., a maximum number of iterations is reached or the objective function value improves below a certain threshold).
6. Return the Best Solution: At the end of the optimization process, return the point in the search space that corresponds to the best objective function value observed.
By balancing exploration and exploitation, it can efficiently navigate the search space to find high-quality solutions.
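For illustration, the loop above can be sketched with a Gaussian-process surrogate from scikit-learn and a probability-of-improvement acquisition maximized over random candidates. This is an assumption-laden sketch, not the authors' code: the toy `objective` merely stands in for Equation (5) evaluated on a trained BOA-GNG, and the bounds mirror the search scope later found in Section 5.3.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
bounds = np.array([[20, 120], [20, 260], [20, 150]])  # a_max, N_m, N_w

def objective(p):
    """Placeholder for f_p: train BOA-GNG with p, score on the validation
    set via Equation (5). This toy surface peaks at the paper's optimum."""
    a_max, n_m, n_w = p
    return -((a_max - 32) ** 2 + (n_m - 160) ** 2 + (n_w - 88) ** 2)

def sample(n):
    return rng.uniform(bounds[:, 0], bounds[:, 1], size=(n, 3))

X = sample(8)                                   # Step 1: initial observations
y = np.array([objective(p) for p in X])
gp = GaussianProcessRegressor(normalize_y=True)

for _ in range(40):
    gp.fit(X, y)                                # Step 2: refit the surrogate
    cand = sample(256)                          # Step 3: maximize PI
    mu, sigma = gp.predict(cand, return_std=True)
    pi = norm.cdf((mu - y.max()) / np.maximum(sigma, 1e-9))
    p_next = cand[np.argmax(pi)]
    X = np.vstack([X, p_next])                  # Step 4: evaluate and record
    y = np.append(y, objective(p_next))

best = X[np.argmax(y)]                          # Step 6: best point observed
print("best (a_max, N_m, N_w):", np.round(best))
```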

4. BOA-GNG-Based Anomaly Detection of Streaming Data

Anomaly detection methods learn a model from a reference set of regular (or normal) data and classify unexpected data as irregular (or abnormal) [30]. However, if the reference data come as a stream whose distribution changes over time, a model trained on historical data may lose efficacy. The proposed BOA-GNG can adapt to evolving data streams, and its representational topology changes adaptively during the online anomaly detection of streaming data. In the detection process, a distance-based method is used to estimate the anomaly state of the input data, with the threshold $T_d$ calculated as follows:
$T_d = \frac{1}{N_C} \sum_{(\omega_i, \omega_j) \in C} \|\omega_i - \omega_j\|$    (6)
where $N_C$ is the current number of edges in the BOA-GNG model, $C$ is the set of edges, $(\omega_i, \omega_j)$ is an edge connecting two neurons, and $\|\omega_i - \omega_j\|$ is its Euclidean length. Manually choosing a convenient value for the decision parameter $T_d$ is hard, because it depends not only on the dataset but also on the number of neurons in the model, which varies over time. Therefore, we heuristically set $T_d$ equal to the expected distance between neighboring neurons in the model; in other words, $T_d$ at any time is the average length of the edges at that time.
Obviously, each neuron in the BOA-GNG model can be considered the center of a hypersphere, and the model covers the space representing regular data at any time. If a new instance $x$ arrives from the data stream and the distance between it and the winner neuron is larger than $T_d$, the data point is not part of the existing topology of the model, and $x$ is considered an anomaly. The anomaly decision rule is defined as follows:
$\|x - \omega_s\| > T_d, \quad s \in A$    (7)
where $A$ is the set of all neurons, $x$ is the current input, $\omega_s$ is the weight of the winning neuron $s$, and $\|x - \omega_s\|$ is the Euclidean distance between $x$ and $s$. $T_d$ is calculated using Equation (6). As in Equation (7), the Euclidean distance between each $x$ and its winner neuron is calculated for online anomaly detection. The whole process of BOA-GNG is described by the pseudocode in Algorithms 2 and 3, and the notations of BOA-GNG are shown in Table 2.
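A compact sketch of Equations (6) and (7), with `weights` and `edges` as illustrative stand-ins for the live model state:

```python
import numpy as np

def detection_threshold(weights, edges):
    """T_d: average Euclidean length of the current edges (Equation (6))."""
    return np.mean([np.linalg.norm(weights[i] - weights[j]) for i, j in edges])

def is_anomaly(x, weights, edges):
    """Equation (7): distance from x to its winning neuron exceeds T_d."""
    d_win = min(np.linalg.norm(x - w) for w in weights)
    return d_win > detection_threshold(weights, edges)
```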
Algorithm 2: The learning process of BOA-GNG
1. Input: $N_m$, $N_w$, $a_{max}$;
2. $t \leftarrow 0$;
3. Initialize the network $A$ with at least 2 neurons, $c_i = 0\ (i = 1, 2)$, and $C = \emptyset$;
4. For each new instance $x$ from the data stream:
5. $t \leftarrow t + 1$;
6. Find the two neurons closest to $x$ in $A$, namely $s$ and $z$, using the Euclidean distance: $s = \arg\min_{n \in A} \|x - \omega_n\|$, $z = \arg\min_{n \in A \setminus \{s\}} \|x - \omega_n\|$;
7. Update the winning times of neuron $s$: $c_s = c_s + 1$;
8. Increase the age of the edges associated with $s$;
9. Calculate the learning rates $\varepsilon_1$ and $\varepsilon_2$ by Equation (4);
10. Update the weights of the winning neuron and the neurons connected to it: $\omega_s = \omega_s + \varepsilon_1 \times (x - \omega_s)$; $\forall v \in n(s): \omega_v = \omega_v + \varepsilon_2 \times (x - \omega_v)$;
11. If $s$ and $z$ are connected by an edge, reset the edge's age to 0;
12. Else link $s$ and $z$ with an edge of age 0: $C = C \cup \{(s, z)\}$;
13. Remove old edges with age > $a_{max}$;
14. If the total number of neurons is larger than $N_m$: delete the neurons that are isolated and have $c < N_w$;
15. If $\|x - \omega_s\| > T_s$ (Equations (1) and (2)), create a new neuron $r$ between $s$ and $x$, where $\omega_r = 0.5 \times (\omega_s + x)$.
Algorithm 3: The hyperparameter optimization of BOA-GNG via Bayesian optimization
1. Initialize $p$, set $T\_best$ to a random value, $f_p$ to negative infinity, and $D = \emptyset$;
2. Select the next set of hyperparameters based on the $GP$: $T\_next = GP(T\_best, f_p)$;
3. Configure the proposed GNG with the new hyperparameters and train the model: BOA-GNG.train(train set);
4. Evaluate the performance $f_{p\_next}$, as described in Equation (5), using the validation set;
5. Update $D = D \cup \{(T\_next, f_{p\_next})\}$;
6. $GP$.update($D$);
7. If $f_{p\_next} > f_p$, update the current best: $T\_best = T\_next$, $f_p = f_{p\_next}$;
8. Repeat steps 2~7 until the maximum number of iterations is reached or $f_p$ stabilizes;
9. Output $f_p$ and the BOA-GNG model.

5. Validation of the Proposed Method

In this section, we validate the proposed method. Firstly, we describe the experimental datasets. Secondly, the evaluation indicators of the experiments are introduced. Thirdly, the effect of improvements is discussed. Finally, comparison and ablation studies are presented, and the results are discussed.

5.1. Datasets

The experimental datasets include five publicly available datasets and one real engineering dataset from the Payload of China’s aerospace satellite. Table 3 gives a brief summary of the six datasets.
Shuttle dataset: This dataset was used to delineate the position of radiators on NASA space shuttles, primarily for classification purposes. The original data comprise 58,000 samples, 80% of which belong to the first category. The version used in this study is post-processed: samples of the first category are considered normal, while the other categories are considered anomalous. In the experiments, the test set consisted of 1778 samples evenly split between normal and anomalous data.
KDD-CUP99 HTTP dataset: This dataset was collected over a period of 9 weeks from a simulated Air Force network and includes network connection and system audit data, which are commonly used to validate the performance of intrusion detection algorithms. To meet unsupervised or semi-supervised requirements, this study ultimately adopted a simplified version of KDD99, known as KDD-CUP99 HTTP. This subset used only HTTP traffic data from the original dataset, which consists of 620,000 samples, including 1053 anomalous samples, accounting for 0.17%. In the experiment, 80,000 normal samples were used for training, and 2100 samples were used for testing.
Satellite Dataset: This dataset was generated by the Australian Centre for Remote Sensing from NASA data. This dataset contains 36 parameters and serves as a multi-class classification dataset. Throughout the experiment, all normal data were considered the positive class, while all anomalous data were considered the negative class. A selection of 4100 normal samples was used for training, and 1000 samples were reserved for testing, of which 925 were normal and 75 were anomalous.
SMAP dataset: This dataset was obtained from a NASA spacecraft and contains real-world data. It consists of 55 telemetry channels, 429,735 telemetry values, and 69 anomalous sequences. Anomalies within the dataset are divided into two categories: point anomalies and contextual anomalies. For the experiment, 70,000 normal samples from channels A1 to A9 were selected for training, and 7000 samples were selected for testing, with 10% of the testing samples being anomalous.
MSL dataset: This dataset is a real-world dataset derived from NASA’s spacecraft via the Mars Science Laboratory (MSL). It comprises anomalous data stemming from Incident, Surprise, and Anomaly (ISA) reports from a spacecraft monitoring system. The data set consists of 27 telemetry channels, 66,709 telemetry values, and 36 anomalous sequences. To perform the experiments, 15,000 normal samples were selected for training, and 1500 samples were selected for testing, with 10% of the testing samples being anomalous.
Payload Dataset: This dataset consists of operation data collected from the Payload of China's aerospace satellite. It includes variables such as current, voltage, temperature, and command parameters, totaling 66 dimensions. To represent a typical anomaly of a complex mode type during the operational phase, a subset of 96,662 samples was extracted following preprocessing steps, including noise reduction. Upon analyzing the data samples, a change in current was observed during abnormal operational stages, although it remained within the valid threshold values. Figure 2 shows that under normal conditions, the payload operates with a current of approximately 1.44 A within the South Atlantic Anomaly (SAA) and around 1.33 A outside the SAA. However, an anomaly occurred with a current surge to approximately 1.49 A. To reduce the influence of the large variations in the current dimension during experimentation, we removed it and retained the remaining 65 dimensions, then selected 80,000 normal samples as the training set and the remaining 16,662 samples as the test set. The test set comprised 3083 anomalous samples and 13,580 normal samples.

5.2. Performance Metrics

The following metrics were chosen to evaluate the performance of the proposed method in the experiments of this section: precision $p$, recall $r$, F1 score $f$, and processing speed $v$:
$p = \frac{TP}{TP + FP}$    (8)
$r = \frac{TP}{TP + FN}$    (9)
$f = \frac{2 \times p \times r}{p + r}$    (10)
$v = \frac{N_d}{t_d}$    (11)
where $TP$ (True Positive) is the number of correctly detected anomalies, $FP$ (False Positive) is the number of incorrectly detected anomalies, $FN$ (False Negative) is the number of missed anomalies, $N_d$ is the total number of samples, and $t_d$ is the time used to process them.
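These metrics reduce to a few lines of code; the function below is a convenience sketch with illustrative names:

```python
def metrics(tp, fp, fn, n_samples, elapsed_s):
    p = tp / (tp + fp)            # precision, Equation (8)
    r = tp / (tp + fn)            # recall, Equation (9)
    f1 = 2 * p * r / (p + r)      # F1 score, Equation (10)
    v = n_samples / elapsed_s     # processing speed (dot/s), Equation (11)
    return p, r, f1, v
```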

5.3. The Improvement Effect of BOA-GNG

In this section, we demonstrate the detailed effect of the improvements in BOA-GNG. As described in Section 3.3, BOA-GNG adopts an adaptive learning rate that depends on the Euclidean distance between the input data and the winner neuron, the maximum Euclidean distance between the winner neuron and its neighbors, and the winning count of each neuron. Figure 3a shows the changing curves of the learning rate and winning count for one neuron on the Payload dataset. Although the adaptive learning rate fluctuates with the calculated Euclidean distance between the input data and the winning neuron during learning, the improved learning rate adjustment strategy achieves a faster convergence speed.
Here, we discuss the detailed hyperparameter optimization of BOA-GNG on the Payload dataset. The hyperparameters of the proposed method, listed in line 1 of Algorithm 2, are the maximum edge age $a_{max}$, the maximum number of neurons $N_m$, and the threshold for isolated neuron deletion $N_w$. The parameter set is $p = (a_{max}, N_m, N_w)$, and the objective function is defined in Equation (5). We selected samples from the training set at a ratio of 4:1 as the validation set, randomly selected 320 sets of $p$, and calculated the value of the objective function. To determine an appropriate search scope for the parameter set $p$, the control variable method was used to analyze the relation between the optimization loss and the hyperparameters. As shown in Figure 3b, when $a_{max}$ exceeds 120, the value of the objective function decreases; when $N_m$ exceeds 260, the value of the objective function remains almost unchanged; and when $N_w$ exceeds 150, the value of the objective function decreases. Therefore, the search scope of the parameter set $p$ is initialized as follows:
$p = \{a_{max} \in [20, 120];\ N_m \in [20, 260];\ N_w \in [20, 150]\}$    (12)
On the basis of this range setting, the Bayesian optimization steps in Algorithm 3 are conducted to search for the optimal parameter setting of BOA-GNG, and the best parameter combination $p = (32, 160, 88)$ is obtained. Figure 4 shows the iteration loss of BOA-GNG under different hyperparameter settings via grid search and Bayesian optimization, demonstrating that the latter obtains a better convergence value with a faster convergence speed.

5.4. Comparison Experiments

In the comparison experiments, the proposed model was compared with the original GNG, GWR, GNG-I, SOINN, and K-Means. The experiments were conducted on a computer equipped with an Intel(R) Core(TM) i7-10700 CPU at 2.90 GHz and 16 GB of RAM. Based on the results on the test sets, the values of the corresponding evaluation criteria are shown in Table 4. For each dataset, the result of the best-performing method is highlighted in bold, and we conclude that BOA-GNG achieves good results on all datasets.
Given the sequential nature of streaming data, the test set is divided into three equal parts in order (part 1, part 2, part 3). To simulate the continuous arrival of streaming data, Stream 1 includes part 1; Stream 2 includes parts 1 and 2; and Stream 3 includes parts 1, 2, and 3. For each dataset, the rolling-test results of BOA-GNG are shown in Table 5. As the data stream continues to expand, the values of the corresponding evaluation criteria of BOA-GNG remain at a relatively high level.
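A minimal sketch of this rolling-test construction (the placeholder array stands in for an ordered test set):

```python
import numpy as np

test_set = np.arange(9000, dtype=float).reshape(-1, 3)  # placeholder data
parts = np.array_split(test_set, 3)                     # part 1, 2, 3
streams = {k: np.concatenate(parts[:k]) for k in (1, 2, 3)}  # cumulative
```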
In online anomaly detection scenarios, computational efficiency is as important as high detection accuracy; for incremental learning techniques, it covers both learning time and detecting time. The experiments therefore compare the computational efficiency of BOA-GNG and other GNG-based models. The results are shown in Table 6, where bold fonts denote the highest computational efficiency on each dataset, and $v_1$ and $v_2$ represent the average velocities of the learning and detecting processes, respectively. BOA-GNG outperforms the other models on most datasets in terms of both learning and detection velocity.

5.5. Ablation Experiments

Ablation experiments are conducted to verify the effectiveness of the optimization strategies in Section 3.2 and Section 3.3. We define the model with a fixed learning rate as BOA-GNG-FLR, with StepLR as BOA-GNG-SLR, and with a fixed insertion step size as BOA-GNG-FIS. The anomaly detection results are shown in Table 7, with the best-performing model highlighted in bold. The adaptive learning rate and the new insertion strategy both have a positive effect on GNG-based anomaly detection performance.

6. Conclusions

In this paper, we propose a novel incremental learning model named BOA-GNG for online anomaly detection of streaming data. The proposed approach adopts our previous model [20] as the baseline, whose learning rate, neuron insertion strategy, and hyperparameter optimization strategy are improved for better dynamic data learning ability and online detection performance.
We demonstrate that the adaptive learning rate achieves faster convergence than the previous linear learning rate. The new neuron insertion method improves the model's adaptability to evolving data streams, making it more flexible in adapting to changes in data distribution. Finally, Bayesian optimization is introduced for fast, fine-grained hyperparameter setting instead of grid search.
Five open datasets and a real engineering dataset from an aerospace satellite are taken as experimental cases to verify the effectiveness and superiority of the proposed model. The results indicate that BOA-GNG improves the precision, recall, F1 score, and computational efficiency of online anomaly detection compared with classical GNG-based models, with both precision and recall exceeding 95% on the six datasets. Additionally, even the lowest learning and detecting velocities reach nearly 500 dot/s and 4000 dot/s, respectively. Further studies demonstrate that the adaptive learning rate and the new neuron insertion strategy are effective in improving the baseline model.

Author Contributions

Methodology, J.Z., L.S. and S.G.; writing—original draft, J.Z. and L.S.; software, S.G. and M.L.; supervision, X.L. and L.G.; investigation, C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Prospective Foundation of the Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, grant number T303271.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study can be obtained by contacting [email protected].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fritzke, B. A growing neural gas network learns topologies. Adv. Neural Inf. Process. Syst. 1995, 7, 625–632. [Google Scholar]
  2. Frezza-Buet, H. Following non-stationary distributions by controlling the vector quantization accuracy of a growing neural gas network. Neurocomputing 2008, 71, 1191–1202. [Google Scholar] [CrossRef]
  3. Xiang, Z.; Zhu, J. Network anomaly detection with improved self-organizing incremental neural network. Comput. Eng. Appl. 2014, 50, 88–91. [Google Scholar]
  4. Ren, H.; Guo, C.; Yang, R.; Wang, S. Fault diagnosis of electric rudder based on self-organizing differential hybrid biogeography algorithm optimized neural network. Measurement 2023, 208, 112355. [Google Scholar] [CrossRef]
  5. Masuyama, N.; Amako, N.; Yamada, Y.; Nojima, Y.; Ishibuchi, H. Adaptive resonance theory-based topological clustering with a divisive hierarchical structure capable of continual learning. IEEE Access 2022, 10, 68042–68056. [Google Scholar] [CrossRef]
  6. Zheng, S.; Lan, F.; Castellani, M. A competitive learning scheme for deep neural network pattern classifier training. Appl. Soft Comput. 2023, 146, 110662. [Google Scholar] [CrossRef]
  7. Wang, X.; Wang, J.; Zhang, Y.; Du, Y. Analysis of local macroeconomic early-warning model based on competitive neural network. J. Math. 2022, 2022, 7880652. [Google Scholar] [CrossRef]
  8. Vanguri, N.; Pazhanirajan, S.; Kumar, T. Competitive feedback particle swarm optimization enabled deep recurrent neural network with technical indicators for forecasting stock trends. Int. J. Intell. Robot. Appl. 2023, 7, 385–405. [Google Scholar] [CrossRef]
  9. Yoon, S.; Lee, J.; Lee, B. Ultrafast local outlier detection from a data stream with stationary region skipping. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 1181–1191. [Google Scholar]
  10. van de Ven, G.M.; Tuytelaars, T.; Tolias, A.S. Three types of incremental learning. Nat. Mach. Intell. 2022, 4, 1185–1197. [Google Scholar]
  11. Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep learning for anomaly detection: A review. ACM Comput. Surv. (CSUR) 2021, 54, 38. [Google Scholar] [CrossRef]
  12. Andrew, C.; Syed, A.; Des, M. Autoencoder and incremental clustering-enabled anomaly detection. Electronics 2023, 12, 1970. [Google Scholar] [CrossRef]
  13. Bigdeli, E.; Mohammadi, M.; Raahemi, B.; Matwin, S. Incremental anomaly detection using two-layer cluster-based structure. Inf. Sci. Int. J. 2018, 429, 315–331. [Google Scholar] [CrossRef]
  14. Gokcesu, K.; Neyshabouri, M.; Gokcesu, H.; Kozat, S.S. Sequential outlier detection based on incremental decision trees. IEEE Trans. Signal Process. 2018, 67, 993–1005. [Google Scholar] [CrossRef]
  15. Nawaratne, R.; Alahakoon, D.; Silva, D.; Yu, X. Spatiotemporal anomaly detection using deep learning for real-time video surveillance. IEEE Trans. Ind. Inform. 2020, 16, 393–402. [Google Scholar] [CrossRef]
  16. Agarwal, R.; Nagpal, T.; Roy, D. A Novel Anomaly Detection for Streaming Data using LSTM Autoencoders. Int. J. Recent Technol. Eng. 2021, 10, 233–241. [Google Scholar]
  17. Fahn, C.; Kao, C.; Wu, M.; Chueh, H.E. SOINN-based abnormal trajectory detection for efficient video condensation. Comput. Syst. Sci. Eng. 2022, 42, 451–463. [Google Scholar] [CrossRef]
  18. Hu, X.; Zhang, X.; Peng, X.; Yang, D. A novel algorithm for the fault diagnosis of a redundant inertial measurement unit. IEEE Access 2020, 8, 46080–46091. [Google Scholar] [CrossRef]
  19. Mahmoudabadi, A.; Rafsanjani, M.; Javidi, M. Online one pass clustering of data streams based on growing neural gas and fuzzy inference systems. Expert Syst. 2021, 38, e12736. [Google Scholar] [CrossRef]
  20. Song, L.; Zheng, T.; Wang, J.; Guo, L. An improvement growing neural gas method for online anomaly detection of aerospace payloads. Soft Comput. 2020, 24, 11393–11405. [Google Scholar] [CrossRef]
  21. Martinetz, T.; Berkovich, S.; Schulten, K. Neural-gas network for vector quantization and its application to timeseries prediction. IEEE Trans. Neural Netw. 1993, 4, 558–569. [Google Scholar] [CrossRef]
  22. Hebb, D.O. The Organization of Behavior; Wiley: New York, NY, USA, 1988; pp. 43–54. [Google Scholar]
  23. Fritzke, B. A self-organizing network that can follow nonstationary distributions. In International Conference on Artificial Neural Networks; Springer: Berlin/Heidelberg, Germany, 1997; pp. 613–618. [Google Scholar]
  24. Marsland, S.; Shapiro, J.; Nehmzow, U. A self-organizing network that grows when required. Neural Netw. 2002, 15, 1041–1058. [Google Scholar] [CrossRef] [PubMed]
  25. Sun, Q.; Liu, H.; Harada, T. Online growing neural gas for anomaly detection in changing surveillance scenes. Pattern Recognit. 2017, 64, 187–201. [Google Scholar] [CrossRef]
  26. Mohamed-Rafik, B.; Slawomir, N.; Payberah, A. An adaptive algorithm for anomaly and novelty detection in evolving data streams. Data Min. Knowl. Discov. 2018, 32, 1597–1633. [Google Scholar]
  27. Zhang, Q.; Wu, H.; Tao, J.; Ding, W.; Zhang, J.; Li, J. Fault Diagnosis of Rolling Bearing Based on CNN with Attention Mechanism and Dynamic Learning Rate. In Proceedings of the 2021 International Conference on Sensing, Measurement & Data Analytics in the Era of Artificial Intelligence (ICSMD), Nanjing, China, 21–23 October 2021; pp. 1–7. [Google Scholar]
  28. Liu, X.; Ma, T.; Gao, W.; Zhu, X.; Wen, Y.; Pan, W. Outlier Detection Using Machine Learning Algorithms Integrated with Bayesian Optimization. In Proceedings of the 2022 International Conference on Algorithms, Data Mining, and Information Technology (ADMIT), Xi’an, China, 23–25 September 2022; pp. 160–165. [Google Scholar]
  29. Zhou, A.; Zhu, Q.; Zhang, J.; Meng, K. Ship Intrusion Detection Technology Based on Bayesian Optimization Algorithm and XGBoost. In Proceedings of the 2023 3rd International Conference on Electrical Engineering and Control Science (IC2ECS), Hangzhou, China, 29–31 December 2023; pp. 1647–1652. [Google Scholar]
  30. Sarhan, M.; Kulatilleke, G.; Lo, W.W.; Layeghy, S.; Portmann, M. DOC-NAD: A Hybrid Deep One-class Classifier for Network Anomaly Detection. In Proceedings of the 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW), Bangalore, India, 1–4 May 2023; pp. 1–7. [Google Scholar]
Figure 1. The dynamic topology learning of the Growing Neural Gas (GNG) algorithm. The blue bullets are neurons of GNG; the red bullet represents a newly inserted neuron as the topology grows.
Figure 2. The current curve before and after fault.
Figure 3. (a) The changing curve of learning rate and neuron winning times. (b) Value range setting of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG) hyperparameters.
Figure 4. The iteration loss of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG) under different hyperparameter settings.
Table 1. Parameters of the Growing Neural Gas (GNG) algorithm.

| Parameter | Meaning |
|---|---|
| $A$ | The set of all neurons |
| $C$ | The set of edges connecting neurons |
| $\omega$ | The weight of a neuron |
| $\varepsilon_1$ | The learning rate of the winning neuron |
| $\varepsilon_2$ | The learning rate of neurons in the neighborhood of the winning neuron |
| $e$ | The cumulative error of a neuron |
| $n(s)$ | The set of neurons connected to neuron $s$ |
| $a_{max}$ | The maximum edge age |
| $t$ | The current time |
| $\lambda$ | The step size for inserting neurons |
Table 2. The notations of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG).

| Parameter | Meaning |
|---|---|
| $N_m$ | The limit on the number of neurons |
| $N_w$ | The threshold for the deletion of isolated neurons |
| $c_i$ | The winning times of neuron $i$ |
| $GP$ | Gaussian process |
| $p$ | The search scope of the hyperparameters |
| $T\_best$ | The optimal hyperparameter set |
| $T\_next$ | The candidate hyperparameter set |
| $f_p$ | The best objective function value |
| $f_{p\_next}$ | The candidate objective function value |
| $D$ | The set of $(T\_next, f_{p\_next})$ pairs |
Table 3. Summary of the six experimental datasets.

| Dataset | Train Set Samples | Test Set Samples | Dimension |
|---|---|---|---|
| Shuttle | 44,686 | 1778 | 9 |
| KDD-CUP99 HTTP | 80,000 | 2100 | 27 |
| Satellite | 4100 | 1000 | 36 |
| SMAP | 70,000 | 7000 | 26 |
| MSL | 15,000 | 1500 | 56 |
| Payload | 80,000 | 16,662 | 65 |
Table 4. The comparison results of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG) and other methods on six datasets.

| Dataset | Criteria | GNG [1] | GWR [24] | GNG-I [20] | SOINN [17] | K-Means | Proposed Method |
|---|---|---|---|---|---|---|---|
| Shuttle | p | 0.9762 | 0.9976 | 0.9988 | 0.9836 | 0.9976 | 0.9988 |
| Shuttle | r | 0.9818 | 0.9613 | 0.9601 | 0.9692 | 0.9612 | 0.9901 |
| Shuttle | f | 0.9790 | 0.9791 | 0.9791 | 0.9700 | 0.9791 | 0.9944 |
| HTTP | p | 0.9918 | 0.8072 | 0.9944 | 0.7100 | 0.8400 | 0.9959 |
| HTTP | r | 0.9990 | 0.7608 | 0.9979 | 0.9189 | 0.7197 | 0.9990 |
| HTTP | f | 0.9954 | 0.7833 | 0.9962 | 0.8011 | 0.7752 | 0.9972 |
| Satellite | p | 0.9789 | 0.9597 | 0.9687 | 0.9800 | 0.8734 | 0.9802 |
| Satellite | r | 0.9514 | 0.9005 | 0.9708 | 0.7935 | 0.5070 | 0.9632 |
| Satellite | f | 0.9649 | 0.9292 | 0.9698 | 0.8769 | 0.6416 | 0.9716 |
| SMAP | p | 0.9852 | 0.7924 | 1.0000 | 0.8483 | 0.8160 | 1.0000 |
| SMAP | r | 0.9912 | 0.9683 | 0.9939 | 1.0000 | 0.8394 | 0.9942 |
| SMAP | f | 0.9877 | 0.8716 | 0.9969 | 0.9179 | 0.8275 | 0.9971 |
| MSL | p | 0.8662 | 1.0000 | 0.8662 | 0.6033 | 0.9200 | 1.0000 |
| MSL | r | 0.9706 | 0.9529 | 0.9706 | 0.9592 | 0.9918 | 0.9807 |
| MSL | f | 0.9769 | 0.9751 | 0.9769 | 0.7407 | 0.9545 | 0.9902 |
| Payload | p | 0.8907 | 0.8149 | 0.9571 | 0.8150 | 0.9996 | 0.9997 |
| Payload | r | 0.7594 | 0.9996 | 0.9987 | 1.0000 | 0.7744 | 0.9949 |
| Payload | f | 0.8198 | 0.8979 | 0.9775 | 0.8981 | 0.8727 | 0.9973 |
Table 5. The results of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG) on the rolling tests.

| Dataset | Data Stream | p | r | f |
|---|---|---|---|---|
| Shuttle | Stream 1 | 1.0000 | 1.0000 | 1.0000 |
| Shuttle | Stream 2 | 1.0000 | 1.0000 | 1.0000 |
| Shuttle | Stream 3 | 1.0000 | 0.9978 | 0.9989 |
| HTTP | Stream 1 | 1.0000 | 1.0000 | 1.0000 |
| HTTP | Stream 2 | 1.0000 | 0.9104 | 0.9531 |
| HTTP | Stream 3 | 1.0000 | 0.9538 | 0.9764 |
| Satellite | Stream 1 | 1.0000 | 0.9970 | 0.9985 |
| Satellite | Stream 2 | 1.0000 | 0.9955 | 0.9977 |
| Satellite | Stream 3 | 0.9613 | 0.9946 | 0.9777 |
| SMAP | Stream 1 | 1.0000 | 0.9858 | 0.9928 |
| SMAP | Stream 2 | 1.0000 | 0.9858 | 0.9928 |
| SMAP | Stream 3 | 1.0000 | 0.9846 | 0.9922 |
| MSL | Stream 1 | 1.0000 | 0.9911 | 0.9955 |
| MSL | Stream 2 | 1.0000 | 0.9673 | 0.9834 |
| MSL | Stream 3 | 1.0000 | 0.9673 | 0.9834 |
| Payload | Stream 1 | 1.0000 | 0.9984 | 0.9992 |
| Payload | Stream 2 | 1.0000 | 0.9940 | 0.9970 |
| Payload | Stream 3 | 0.9973 | 0.9946 | 0.9959 |
Table 6. Comparison of online learning and detecting velocities on each dataset (all values in dot/s).

| Dataset | GNG [1] $v_1$ | GNG [1] $v_2$ | GWR [24] $v_1$ | GWR [24] $v_2$ | GNG-I [20] $v_1$ | GNG-I [20] $v_2$ | Proposed $v_1$ | Proposed $v_2$ |
|---|---|---|---|---|---|---|---|---|
| Shuttle | 497.40 | 19,772.57 | 357.49 | 15,303.42 | 3796.60 | 186,191.67 | 3822.58 | 203,118.18 |
| KDD-CUP99 | 453.26 | 21,276.60 | 1079.77 | 6467.26 | 479.47 | 22,727.27 | 459.61 | 22,222.22 |
| Satellite | 2204.30 | 9761.90 | 334.97 | 3504.27 | 1822.22 | 9318.18 | 2469.88 | 12,058.82 |
| SMAP | 2987.62 | 4888.26 | 315.71 | 3564.15 | 2687.14 | 5728.31 | 3111.11 | 5988.02 |
| MSL | 2504.17 | 4213.48 | 304.63 | 3170.53 | 2008.03 | 4065.04 | 2369.67 | 4043.13 |
| Payload | 276.72 | 2002.50 | 143.63 | 1714.53 | 482.83 | 4212.74 | 535.98 | 4364.43 |
Table 7. Results of ablation experiments on the Payload dataset.

| Algorithm | p | r | f |
|---|---|---|---|
| BOA-GNG-FLR | 0.8159 | 0.9951 | 0.8966 |
| BOA-GNG-SLR | 0.8667 | 0.9951 | 0.9264 |
| BOA-GNG-FIS | 0.9997 | 0.9834 | 0.9915 |
| BOA-GNG | 0.9997 | 0.9949 | 0.9973 |