Outlier Detection in Streaming Data for Telecommunications and Industrial Applications: A Survey
Abstract
1. Introduction
1.1. Literature Review Strategy
1.2. Structure of This Paper
1.3. Contributions
- A summary of the key challenges related to OD in streaming data with a focus on data processing approaches, computational complexity, and the optimal types of detectable outliers. This will facilitate the choice of the best method for specific detection problems based on a particular set of assumptions.
- An overview of the advantages and disadvantages of OD methods for streaming data. The methods are organized into broader categories (statistics, ML, and DL) to facilitate the selection of the most appropriate method depending on the detection challenges, i.e., the available data, sample size requirements, the method’s capability to process the temporal changes in the data, and its computational complexity.
- A summary of the applications of OD methods in the telecommunications industry and other sectors is given, as well as datasets commonly used for training, testing, and benchmarking in the literature. This will aid researchers in identifying the right data sources for developing and testing new OD methods in different fields.
2. Basics of Streaming Data, Outlier Detection, and Related Surveys
2.1. Streaming Data and Processing
- Time-based windowing—timeslots delimit the windows, and data that arrive within a particular timeframe are considered a complete subset of the data stream. This subset is processed through analytics and aggregations performed within its boundaries. Given a data stream generating data over time, the stream can be divided into time windows of a given duration, starting from a given start time. The window size is generally fixed and user-defined in most of the literature based on empirical data, but it can also be set dynamically via a systematic approach. This mechanism allows precise control over time intervals and is suitable for time-sensitive data, but it is not adaptable to variable data rates and requires time synchronization [12,13,14].
- Count-based windowing—this mechanism is based on the number of observations received rather than the duration of the windows. The number of observations is set as the count threshold based on empirical data, and a window is considered complete every time the number of observations since the last window reaches the user-defined threshold [15,16]. This approach is suitable for a data stream with a known pattern and for analyses that are sensitive to the sample size. This mechanism is simple to implement and is effective for streams with constant event rates but is inflexible in handling time variability. It is also unsuitable for time-sensitive analysis. It can be used for detecting peaks or for frequency estimation in a data stream.
- Landmark-based windowing—In this case, data drive the way the window is selected. This method is used, for instance, in the field of network data analysis for security [17], where streams are segmented by flow and analyzed for intrusion detection. Traffic flows are identified within communication sessions and used as complete sliding windows. The advantages of this mechanism are that it allows for flexible start points and is suitable for event-triggered analysis. However, it has increased complexity in managing dynamic window boundaries and its window size is potentially unbounded.
- A sliding window [18,19,20] discards the oldest data and refreshes the window with the most recent data; the window size is fixed, and all slides are assumed to have equal importance. It can be time-based or count-based.
- A special type of count-based sliding window is the eviction window for which a fixed number of data points overlap between the slides and observations are refreshed based on an eviction policy.
- A landmark window [23], also known as the time fading window approach, considers data between a fixed point in time, called the landmark, and the current time.
- Session windows are special types of landmarks for which the boundaries of the window are defined by the session start and end points.
- A tumbling window is one for which the slide distance $d$ is equal to its size $s$, i.e., $d = s$, which yields non-overlapping consecutive windows. This presents the advantage of performing a small cross-sectional analysis on each window.
- A hopping window uses a fixed window size and a fixed-length step between windows. When the jump length is shorter than the window size, an overlap is maintained between windows; otherwise, there will be gaps between windows and potentially missed data points.
- Boundary: indicating if the windows’ starting and ending points are fixed or variable.
- Overlap: indicating if a data overlap is observed while moving from one window to another.
- Number of passes: indicating if an observation is likely to be maintained across multiple windows.
- Sequence tracking: indicating if the window mechanism requires keeping track of the window sequence.
- Historical data tracking: indicative of whether the window type needs to keep track of information on previous windows such as session identity number.
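The tumbling, hopping, and sliding mechanisms described above can be sketched for the count-based case as follows. This is a minimal illustration under stated assumptions rather than an implementation taken from the surveyed literature: the function name `count_windows` and its parameters `size` and `step` are hypothetical, and only complete windows are emitted.

```python
from collections import deque
from typing import Iterable, Iterator, List


def count_windows(stream: Iterable[float], size: int, step: int) -> Iterator[List[float]]:
    """Yield count-based windows of `size` observations, advancing by `step`.

    step == size -> tumbling windows (no overlap)
    step <  size -> overlapping windows (step == 1 slides one observation at a time)
    step >  size -> hopping windows with gaps (some observations are skipped)
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # The deque evicts the oldest observation once `size` items are stored.
    buffer: deque = deque(maxlen=size)
    next_emit = size  # 1-based position at which the next window is complete
    for seen, x in enumerate(stream, start=1):
        buffer.append(x)
        if seen == next_emit:
            yield list(buffer)
            next_emit += step
```

Because only the last `size` observations are buffered, memory use stays bounded regardless of the stream length, which is the property that makes windowing attractive for streaming OD.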
2.2. Data Dimensionality
2.3. Outlier Characteristics and Detection
2.4. Related Surveys
3. Methods for OD
3.1. Statistical-Based Methods
- Gaussian-based OD (GBOD) is one of the most frequently used parametric methods; it assumes that the data are normally distributed around a mean $\mu$ with a variance $\sigma^2$, denoted as $N(\mu, \sigma^2)$. Among the GBOD methods applied to univariate distributions, the Z-score method measures the distance of each observation to the mean in units of standard deviation [53]. The standard deviation (SD) method is similar to Z-score, but it is the raw untransformed data that are used. Further, the extreme studentized deviate (ESD) method, also called Grubbs’ test [54], assumes near normality and removes a user-defined number of outliers from a data set by conducting a series of tests that identify each point with the maximum deviation from the mean while removing them iteratively from the data, i.e., $G = \max_i |x_i - \bar{x}| / s$, where $\bar{x}$ and $s$ are the sample mean and standard deviation.
- These methods apply to univariate distributions centered around their mean, for which normal data lie within $k$ standard deviations of the mean, i.e., $|x - \mu| \le k\sigma$ (commonly $k = 3$). Gaussian methods are more suitable for global OD and are sensitive to extreme values as they use the mean and the standard deviation [55] in their detection process. They are not designed for temporal/sequential dependency and perform best on univariate numerical data with large sample sizes [56]. Popular variants of ESD are generalized ESD (GESD), online sequential ESD [54], and seasonal hybrid ESD (SHESD) [57,58], which are robust for data with a high percentage of outliers. Two major drawbacks of ESD are that the outlier threshold is user-defined, and that its iterative detection approach makes it too slow for time-sensitive detection in streaming data.
- The Boxplot-based (BPOD) method, also known as Tukey’s method, was proposed in 1977 [59] and leverages the Boxplot to detect extreme values at both whiskers [60,61,62]. It uses the median as the central tendency measure and the inter-quartile range (IQR) to measure the dispersion of the data and is more suitable for univariate symmetric distributions [62].
- The Median Absolute Deviation (MAD) is another method based on the median that uses, as a scale estimator, the median of the absolute deviations of all observations from the median of the distribution, as follows: $\mathrm{MAD} = \mathrm{median}(|x_i - \mathrm{median}(x)|)$, i = 1, 2, …, n, where $x_i$ is observation number i. A point is classified as an outlier if it lies more than $k \cdot \mathrm{MAD}$ away from the distribution’s median, denoted as $|x_i - \mathrm{median}(x)| > k \cdot \mathrm{MAD}$, with $k$ typically set to 2, 2.5, or 3. MAD assumes that the dataset has a symmetric distribution [63]. Both BPOD and MAD methods are based on robust statistics, assume a symmetric distribution of the data, and are less sensitive to outliers when compared to the Z-score and the SD-based methods. Their efficiencies on a Gaussian distribution, also known as Gaussian efficiency [63], are 37% for MAD and up to 82% for BPOD based on the IQR. This indicates that unless the distribution is already known to be skewed, its skewness should be checked before these methods are applied.
- Regression-based OD (RBOD) is another commonly used parametric method that fits a regression model to the data, estimates the residuals, and flags observations with larger residuals as outliers [64]. This method is suitable for OD on multivariate data streams within the sub-space of a sliding window, and it detects outliers in the context of a dependency between univariates while completely ignoring uncorrelated dimensions. Additionally, in real-world data, regression assumptions, i.e., the linearity of the relationship between univariates, the independence and non-multicollinearity of univariates, and the normality and homoscedasticity of the residuals, generally do not hold, making it challenging to use data with no prior information on the distribution. RBOD is suitable for contextual OD and can be used for global OD if all dimensions are considered in the regression model.
- Copula-based OD (COPOD) is another method based on the concept of copulas first defined by Sklar (1959) [65] that focuses on capturing the correlation between univariates of multivariate distributions with continuous marginals. The copula theorem stipulates in simple terms that for any multivariate random distribution, a joint distribution function can be derived as a combination of the hidden link between the marginal distributions, called copula, and the actual marginal distributions [66]. This separation allows analysis of the correlation between the distributions without a priori information about the actual distribution of the marginals. Another important component of this theorem is that if the random variables are continuous, then the copula is unique, which means this can be used to uniquely represent the cumulative distribution of multivariate random variables in the interval $[0, 1]$. The major families of copulas are independence copulas, applicable when the marginals are independent of each other, elliptical copulas based on standard distributions such as the Gaussian or Student-T, and Archimedean copulas, popular in representing multivariate distributions in the real world for their ability to model lower tail dependency (Clayton copula), asymmetric tail dependency (Gumbel copula), or symmetric dependency with both negative and positive dependencies (Frank copula). These copulas use one or more parameters to express the correlation between marginals. A nonparametric family of copulas, known as empirical copulas, is used in studies for OD [67,68], where no assumption on the dependency function is made but instead the data are used to construct such a function. An advantage of using this method is that it may be used without making any prior assumption of the marginal distributions [69] and is applicable to the analysis of heavy-tailed non-linear dependencies [70]. Applications of copulas related to OD have been explored in the fields of dimensionality reduction, synthetic data generation, and signal denoising by baselining and decoupling [71,72]. A drawback of the copula is that it represents only the dependence between the marginals and infers no information about them. Additionally, selecting the appropriate copula distribution to use for data representation is non-trivial; hence, [69] proposes an unsupervised copula selection algorithm for OD. COPOD is suitable for contextual outliers as the outlierness is determined in the context of the inherent link between the univariates.
- The Gaussian Mixture Model (GMM) is a weighted multivariate Gaussian model, which assumes that any multivariate distribution is a weighted combination of its univariate marginal distributions that all follow a Gaussian distribution [73]. This model relies on Expectation Maximization (EM), which is an iterative algorithm that first estimates the log likelihood of any data point belonging to a prior distribution [74] and then uses the maximum likelihood estimator to find the optimum log likelihood [75] under the assumption that the posterior distribution is known. Major challenges of the GMM models in detecting outliers in streaming data are reflected in their time greediness and requirement for multiple passes on the data to estimate the model. Additionally, the model requires user-defined parameters for its initialization as well as for the training phase. It is therefore suitable for multidimensional numeric data streams comprising historical data, from which the GMM parameters can be initialized and a prior distribution constructed. In this manner, a distribution can be learned from a live data stream before applying the OD on new data once the model is initialized. Since GMM-OD detects outliers by constructing a joint distribution and estimating the likelihood of belonging of each point to this distribution, it is very suitable for OD in multimodal distributions. These outliers are then considered as contextual because their outlierness is subject to the degree to which they are associated with the mixture model. It can be used for global OD as well.
- Histogram-based OD (HBOD) relies on the frequency of a continuous data distribution to compute a histogram of each continuous feature or a relative histogram of the frequency of all categories for each categorical feature in any multivariate distribution [77]. Building a histogram of a feature of a dataset with $n$ observations starts with ordering the values and defining the number of bins that will be used to build the histogram. Then, the data are grouped into bins holding the same number of ordered observations, which are then represented. The shape of the distribution is impacted by the histogram bin width selection, which determines the density of the histogram, i.e., the bins’ width is inversely proportional to the density. The bin width can either be fixed or dynamic, and the latter is recommended [75] for real-time data if the distribution is unknown and there are peaks or periods without data, which would result in gaps in the histogram if a fixed bin width is used. The author of [75] points out that dynamic bin selection is less sensitive to such extremes or to outliers.
- The kernel-based OD (KBOD) method was introduced in 2007 [78] and is based on the identification of the density of points around each point of a dataset in a Euclidean space using the kernel density estimator (KDE) as the nonparametric density function and as the basis to mark an observation as an outlier. KBOD assumes non-negative, symmetrical, and normally distributed data. The kernel density is used in combination with the reachability distance estimator and the local density estimates (LDEs) of all neighbors of an observation, from which the local density factor (LDF), a continuous measurement of the risk of outlierness, is calculated. The authors of [78] report greater performance of KBOD at detecting local outliers over other methods such as the local outlier factor (LOF) and local correlation integral (LOCI). One of the challenges of KBOD is the impossibility of detecting outliers in multimodal distributions, as a point considered normal in one of the dimensions would be marked as an outlier in other sub-distributions of the dataset. KDE relies on the nearest neighbors [79] to determine a point density as opposed to the LOF, which uses the full space. This makes KBOD a good candidate for contextual outliers, even though it can detect global outliers as well.
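The univariate rules discussed in this section (Z-score, MAD, and the Boxplot/Tukey fences) can be compared side by side with a short sketch. It is illustrative only: the function names, the default threshold of 3, and the conventional whisker factor of 1.5 are assumptions, not values prescribed by the cited works, and the population standard deviation is used for the Z-score.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(x: Sequence[float], k: float = 3.0) -> List[int]:
    """Indices whose distance to the mean exceeds k standard deviations."""
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)
    if sigma == 0:  # constant data: nothing can be flagged
        return []
    return [i for i, v in enumerate(x) if abs(v - mu) / sigma > k]


def mad_outliers(x: Sequence[float], k: float = 3.0) -> List[int]:
    """Indices whose distance to the median exceeds k times the MAD."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    if mad == 0:  # MAD is zero when at least half of the values equal the median
        return []
    return [i for i, v in enumerate(x) if abs(v - med) > k * mad]


def tukey_outliers(x: Sequence[float], whisker: float = 1.5) -> List[int]:
    """Indices outside the fences Q1 - whisker*IQR and Q3 + whisker*IQR."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(x) if v < low or v > high]
```

For a hypothetical sample such as `[10, 11, 9, 10, 12, 10, 11, 9, 10, 100]`, the Z-score rule with a threshold of 3 flags nothing, because the extreme value inflates both the mean and the standard deviation, whereas the two median-based rules flag the last observation. This is the sensitivity to extreme values noted above for Gaussian-based methods.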
3.2. ML-Based Methods
- The distance-based OD (DBOD) method is an unsupervised ML method that leverages the spatial distribution of points and assumes that, in a $k$-dimensional dataset, a distance can be calculated between the observations. It uses distance metrics such as the Euclidean, Mahalanobis, or Manhattan distance, or the cosine similarity. The basic assumption of proximity-based methods is that similarity is defined by spatial closeness and the higher the dissimilarity measure or distance, the more likely the observation is an outlier. Any point in the space is therefore considered normal if it has at least a user-defined minimum number of neighbours within a user-defined maximum distance. Any point not meeting these criteria is a candidate outlier to the rest of the dataset. Detection performance depends on user-defined parameters, distance metric selection, and the dataset. This method makes no prior distribution assumption but assumes time invariance. Additionally, DBOD performance suffers in high-dimensional data due to its reliance on the distance metric, the discriminative power of which tends to zero in medium-to-higher dimensions. This is also known as the curse of dimensionality. Its complexity is quadratic in the dataset size $n$ [80] and also grows with the number of dimensions $k$ [81]. Several innovative approaches were developed to adapt to these challenges by tackling computational cost and execution time [82,83], using simple random subsampling (SRS) prior to OD, reducing the dimensionality and estimating a probability density function over the data [84]. The triangle OD (TOD) [83] uses geometric reasoning based on a dissimilarity matrix on the distance to a collection of neighbors to select the decision thresholds. Context-aware distance (CaD) [85] is used for trajectory description in video analysis.
- Density-based OD (DSBOD) is an unsupervised ML method that considers the local neighborhood density of observations and calculates the degree to which a tuple p in the d-dimensional dataset is an outlier relative to its neighbors, also known as the LOF [86,87]. The LOF was first introduced during the International Conference on Management of Data and it detects local outliers based on the LDE and the LDF mentioned above. The LOF is a continuous value representing the degree to which a point is an outlier relative to its neighborhood of data points, so that the larger the LOF, the greater the risk of being an outlier. Rather than being a binary classifier of outlier or not, the LOF provides a degree of “outlierness” for each observation, allowing it to adapt to various scenarios, especially for non-linear systems [88]. It is, however, computationally intensive given that the LOF must be computed iteratively for all points. Furthermore, because this method assumes access to the full data space and requires multiple passes over the data, it is difficult to use in streaming data. Several LOF adaptations are proposed to address its shortcomings. The incremental local outlier factors (iLOFs) [89], for instance, are calculated for each new data point and incremental multi-class outlier detection (iMCOD) [90] proposes a multi-class OD, while the cube-based incremental local outlier factor (CB-ILOF) [91] utilizes a 3D slice of the data before performing the detection. The distributed local OD in big data (DLOF) [92] leverages distributed computing and storage for improving memory and time efficiency. The method in [93] constructs a weighted index using information entropy to improve accuracy and memory management in real-time high-dimensional data. DSBOD methods are more accurate in the detection of outliers in high-dimensional and contiguously distributed data compared to DBOD, the performance of which degrades as dimensionality increases. The computational complexity for a $d$-dimensional dataset of size $n$ depends on the dimensionality $d$ [94].
- Clustering-based OD (CBOD) finds its roots in the LOF defining the possibility of local outliers as opposed to global ones. This unsupervised ML method was developed under the assumption that in a dataset, observations that relate to each other are spatially clustered [95]. Its execution follows a two-step process consisting of identifying clusters in data and flagging observations or groups of observations according to whether they are outliers or not. The method also assumes that an observation might be an outlier relative to a local cluster even though, globally, it might seem to be normal. One of the most popular clustering algorithms is density-based spatial clustering of applications with noise (DBSCAN) by Ester et al. [96], which uses the dense property of clusters. Its predecessor, CLARANS (Clustering Large Applications based on RANdomized Search), uses K-medoid clustering that is memory-intensive and impractical for large datasets as all data and objects are manipulated in memory during its entire execution. DBSCAN instead employs the concepts of density reachability and density connectivity, which define the connectedness of points through their neighbors, while a single user parameter defines the minimum number of points required in each cluster. CBOD is advantageous for OD in dense datasets compared to DBOD. It is similar to DSBOD but has the advantage of identifying global, collective, and contextual outliers (a point outlier in the context of a cluster). In higher dimensions, the data become sparse, which degrades clustering performance. Its computational complexity is a quadratic function of the dataset size $n$ and of the number of dimensions or features involved [97].
- Angle-based OD (ABOD) was first introduced during the International Conference on Knowledge Discovery and Data Mining [98] and addresses the performance disadvantage of OD in high-dimensional data. Instead of assuming an association of observations based on spatial distance, this type of method assesses the variance of the angles between the vectors from a point to the other points to calculate their proximity. ABOD is a non-parametric method, which is more efficient at OD in higher dimensions than DBOD and DSBOD, which rely on distance metrics. However, it is slower than its alternatives on larger datasets given that its complexity is cubically related to the dataset size $n$, i.e., $O(n^3)$ [98,99]. It does not account for temporal/sequential correlation in streaming data and is time-consuming to execute. Applying a sample size reduction method might help improve its performance. There are improvements to ABOD in the literature, such as FastABOD, which is faster in low dimensions with large datasets, data stream angle-based OD (DSABOD) [100], and angle-based intrinsic dimensionality (ABID) [101], which improves the speed of detection by applying dimensionality reduction.
- Support Vector Machines (SVMs), introduced by [102], are a supervised ML technique that aims to find hyperplanes that divide any dataset represented in an $n$-dimensional hyperspace into spatially separated classes with the maximum margin between them. It relies on linear functions for linearly separable data, and on kernel functions otherwise [103]. The method is originally supervised, designed for pairwise comparison, and is nonprobabilistic. It is suitable for classification tasks and is therefore popular for OD problems; however, its reliance on labels makes it impractical for unlabeled streaming data. One-class SVM (OC-SVM) is an adaptation of SVM allowing unsupervised OD by training the model to learn normality in data [104] and detect deviation or novelties in new data. This method is used for OD in WSN [105], 5G IoT [106], or in combination with deep learning methods for unsupervised feature extraction and learning [107]. Online SVMs have also been developed to incrementally update the SVM models on unseen data [108] but may be computationally expensive, especially when using kernel functions. SVM does not account for temporality or sequencing [109] in data, and therefore needs to be combined with time-aware models for anomaly detection in temporal or sequential data. This method operates optimally on high-dimensional datasets with a small number of samples.
- Method: indicating the OD being described.
- Concept: containing the general concept the method is based on.
- Parametric: indicative of whether the method is parametric or not.
- Type of outliers: the types of outliers the method is best suited to detect.
- Size: the optimal data size to implement the method.
- Dimensionality: the optimal dimensionality supported by the method.
- Computational Complexity: A function indicating the training complexity.
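As an illustration of the distance-based criterion described in this section (a point is considered normal if it has enough neighbours within a maximum distance), the following brute-force sketch may help. The function name and the parameters `radius` and `min_neighbours` are hypothetical, the Euclidean metric is an arbitrary choice among those listed, and the nested loop deliberately mirrors the quadratic cost in the dataset size mentioned for DBOD.

```python
import math
from typing import List, Sequence


def distance_based_outliers(
    points: Sequence[Sequence[float]],
    radius: float,
    min_neighbours: int,
) -> List[int]:
    """Return indices of points with fewer than `min_neighbours` other
    points within Euclidean distance `radius` (brute force, O(n^2) pairs)."""
    outliers = []
    for i, p in enumerate(points):
        neighbours = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        if neighbours < min_neighbours:
            outliers.append(i)
    return outliers
```

In a streaming setting, a sketch like this would typically be applied to the contents of one window at a time, so that the quadratic cost is bounded by the window size rather than by the length of the stream.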
3.3. DL-Based Methods
- Multi-layer Feed Forward (MLF) is one of the most prominent NNs. It comprises sequentially cascaded layers, each neuron of which receives the output of previous layers as input [113]. These networks may be fully connected, where predecessors’ outputs are fed as input to all successors, or partially connected. This method uses back-propagation as an error correction algorithm based on the partial derivative of the output error function to update the weights and the thresholds of each of the previous layers in the cascade. MLF is used for OD in the field of network intrusion detection [114], anomaly detection in IoT networks [115], and in multi-sensor systems [116]. A drawback of MLF in performing OD on streaming data is the fact that it requires labeled data for the model’s initial training before OD. This implies that empirical labeled data must be available for model pre-training, which is not always the case in real-world applications, unless online training is used, which would come at a computational cost that grows with the depth of the NN. A compromise is to use the pre-trained model, relying only on online transfer learning (OTL) [117] to retrain the model in case of changes in the data stream structure (concept evolution), but this comes at a computational cost that depends on the size of the subset of new points considered for training. This method is supervised as it requires labels in the training data.
- Recurrent Neural Networks (RNNs) were introduced by Elman, J.L. in 1990 [118] to address the limitations of MLF NN in dealing with temporally dependent data, such as time series and sequential data [119], by introducing a feedback mechanism that allows a neuron output to be fed as input back to itself in hidden layers [120] or to preceding neurons [121]. This allows them to process sequences of data with variable length, making them well suited to streaming data. This method is typically used on temporal data such as time series and streaming data but is limited by the number of lags it can feed back, making it a short-term memory network. Long short-term memory (LSTM) and gated recurrent unit (GRU) are the two most used variations of RNNs [122], which are extensively used in the literature for solving many problems involving real-world data. RNNs are better adapted for contextual outliers given that detection is achieved within the context of short-term lags, at the cost of a cubic computational complexity, similar to the MLF NN. OTL is also very frequently used with RNNs for OD in streaming data. RNNs are used in environmental science [123], information security (biometric authentication) [124], and sensor networks [125].
- Long Short-term Memory (LSTM), developed by Hochreiter, S. and Schmidhuber, J. in 1997 [126], is a modification of the RNN that retains the long-term memory of past lags in the data, thus addressing the RNN’s weakness in processing long-term sequential and temporal data. LSTM introduces a three-gate system with an input gate, an output gate, and a forget gate, allowing newly acquired information into the memory cell to be memorized or forgotten. One important pattern in the literature is that LSTM appears in most research related to anomaly detection in time series using NNs. Even when not employed as a direct method for detecting outliers in the spatial domain, it is combined with other OD models that are more efficient in that domain, while LSTM covers the temporal aspect of detection by taking advantage of its long-term memory ability. One of the advantages of LSTM is that it maintains information longer using branching, which helps to reduce the vanishing gradient problem. A notable drawback of this method is that, similarly to the RNN, it processes the information sequentially [127]; therefore, it cannot utilize the computationally efficient parallel processing offered by graphical processing units (GPUs). Some authors combine LSTM with RNN for anomaly detection [128,129] or leverage online transfer learning [130,131,132] for faster retraining to improve detection performance. Model pretraining requires labeled data.
- Autoencoders (AEs) are special types of symmetrical [133] and unsupervised [134] NNs that use the input data as the target output: the encoder layers reduce the dimension of the input data while the decoder layers reconstruct it. They are an alternative to dimensionality reduction algorithms and use both linear and non-linear transformations to reduce the data dimension, as opposed to principal component analysis (PCA), which relies solely on linear models for feature extraction. This comes in very handy when dealing with real-world data streams with a large number of features, as it helps to escape the curse of dimensionality faced by other ML algorithms in high-dimensional data. Additionally, this method does not require labeled data because the input data are used as output. This makes AEs an interesting candidate for dealing with the unpredictable nature of real-world data streams, whose structure and content often change over time.
- Convolutional Neural Networks (CNNs) are prominently applied in image classification tasks [135] and are designed to accept tensor data at the input layer and use their hidden layers to extract features from the tensor. At the output layer, they return a result that corresponds to the specific goal. CNNs are, in general, composed of three common building blocks [136], the first of which is the convolutional layer that employs the convolution of filters to compute a feature map of the input by using either the sliding sum of 2D filters/3D filters or by matrix multiplication [137]. The second is the pooling layer [138], which applies summarization functions, such as maximum or average, on the output of previous layers to produce a lower-dimensional matrix as the output. Finally, there is the fully connected layer that performs the classification, defined as a function of the sum of the product of an $m \times n$ weight matrix $W$ (of $m$ rows by $n$ columns) with the input features $x$ and the $m$-dimensional bias matrix $b$, represented as $y = f(Wx + b)$. An advantage of the CNN is that it can be trained on a small sample of high-dimensional data. However, the training complexity is very high and may affect its performance in the streaming data context. Transfer learning is generally used in this case as the capabilities of a model trained on a very large dataset are transferred to a smaller dataset for OD.
- The Deep Belief Network (DBN) is a probabilistic generative model [139], formed by stacking Restricted Boltzmann Machines (RBMs) and designed to face the challenge of NNs overfitting at the learning stage due to poor parameter selection, which leads to increased data greediness. RBM was introduced in 2006 by Geoffrey Hinton and is composed of two layers: a layer of visible units $v$ and a layer of binary hidden units $h$, where the total energy of the machine is calculated as follows [140]: $E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$, where $a_i$ and $b_j$ are the biases of the visible and hidden units and $w_{ij}$ are the weights between them. The DBN uses a layer-by-layer training approach also known as RBM unsupervised training, as well as error back-propagation for fine-tuning [141]. DBN’s advantage for OD in streaming data is its ability to handle high-dimensional data and perform feature extraction [142], which significantly improves prediction performance. It also allows assigning a probability of outlierness to each outlier, which is very useful for setting the appropriate decision threshold.
- Generative Adversarial Networks (GANs) represent a dual network composed of a generative unit and a discriminative unit, which was initially introduced by Ian Goodfellow et al. [143] in 2014. These architectures are based on the concept of learning normality from an input dataset: the generative network uses the probability distribution of the normal input data to produce similar data, and the discriminative network is then used to distinguish the original data from the generated output [144,145].
- Transformer NNs, introduced by Vaswani et al. in 2017 [146] in the article “Attention is all you need”, are among the most prominent topics in the DL field at present, as they open up possibilities in various fields, especially natural language processing and computer vision. Transformers rely on a state-of-the-art method called the attention mechanism. One of their key advantages over other NN models such as RNN and LSTM is their ability to parallelize processing by taking multiple inputs at once rather than following the sequential approach, which improves model training and computational performance. Additionally, the multi-headed attention layer allows the NN to focus only on important features of the hidden layer output by applying scoring, according to which features of high importance are emphasized while the influence of others is diminished. This addresses the vanishing gradient problem that impacts the performance of RNN and LSTM in, for instance, neural machine translation systems.
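The formulas referenced in the bullets above can be made concrete with a minimal NumPy sketch. It assumes the standard textbook forms: a fully connected layer computing an activation of `W @ x + b`, the RBM energy with visible and hidden biases, and scaled dot-product attention as the scoring step of a single attention head. All function names, argument names, and array shapes are illustrative assumptions, not notation taken from the surveyed works.

```python
import numpy as np


def dense_layer(x, W, b, activation=np.tanh):
    """Fully connected layer: activation applied to W @ x + b.

    W has shape (m, n), x has shape (n,), b has shape (m,).
    The tanh default is an arbitrary choice for illustration.
    """
    return activation(W @ x + b)


def rbm_energy(v, h, W, a, b):
    """Standard RBM energy E(v, h) = -a.v - b.h - v^T W h.

    v: visible units (n_visible,), h: binary hidden units (n_hidden,),
    W: weights (n_visible, n_hidden), a / b: visible / hidden biases.
    """
    return float(-(a @ v) - (b @ h) - (v @ W @ h))


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the attended values and the attention weights; each row of
    the weights sums to 1, so high-scoring keys dominate the output.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights
```

Because every query row in `scaled_dot_product_attention` is scored against all keys in one matrix product, the whole sequence is processed at once, which is the parallelism advantage over the step-by-step recurrence of RNN and LSTM described above.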
4. Applications of OD Methods
5. Review of Datasets for OD in Streaming Data
Dataset Name | Year | Number of Features | Size | Application | Method | Reference | Source |
---|---|---|---|---|---|---|---|
KDD99 | 1999 | 41 | 58.3 k | Intrusion detection dataset | DSBOD, GAN | [87,162,179] | University of California Irvine [188] |
ISCX NSL-KDD | 2009 | 41 | 125 k | Intrusion detection | GAN | [179] | University of New Brunswick [189] |
IoTID20 | 2020 | 85 | 625.7 k | IoT Intrusion Dataset | Ensemble | [156] | IEEE Data Port [190] |
Corel Histogram | 2024 | 31 | 68 k | IoT in smart cities | DBOD | [82] | ML Pack [191] |
DS2OS Traffic Traces | 2018 | 14 | 322 k | IoT in smart cities | MLF | [169] | Aubet, FX [192] available on Kaggle |
Wireless Sensor Network | 2004 | 8 | 2.5 M | WSN | CBOD | [69] | Intel Berkeley Research Lab [193] |
A Poisson–Gaussian Denoising Dataset | 2018 | N/A | N/A | Image denoising | CBOD | [71] | Zhan Y. et al. [194] |
HAR | 2013 | 562 | 10.3 k | Human activity recognition | DBN | [164] | University of California Irvine [195] |
DSA | 2012 | 315 | 9.1 k | Daily sport activities | DBN | [164] | University of California Irvine [196] |
GAS, GT, IR, WM | 2004 | N/A | N/A | Comparative study | RBOD | [44] | Kaggle |
SCADA data | 2011 | 8 | N/A | Wind turbine fault detection | RBOD | [137] | Collected by user |
Ultrasonic sensor | 2019 | N/A | N/A | Monitoring water level and discharge | MAD | [140] | Collected by user |
Tropospheric Data Acquisition Network | N/A | 9 | N/A | Sensor data on atmospheric temperature | MAD | [139] | Center for Atmospheric Research [197] |
Synthetic and testbed | 2018 | N/A | N/A | Synthetic and testbed | GMM | [142] | Collected by user |
Network traffic | 2017 | N/A | N/A | Real-time data | GMM | [143] | Collected by user |
CICIDS2017 | 2017 | 83 | 25 users in 5 days | Intrusion detection evaluation | KBOD | [145] | Canadian Institute for Cybersecurity [198] |
Portable radiation spectrometer | 2021 | N/A | N/A | Real-time data | KBOD | [147] | Collected by user |
Cognitive Radio Network | 2021 | N/A | N/A | Real-time data | CBOD | [149] | Collected by user |
Wireless Sensor Network | 2023 | 200 | 400 s | Real-time data | CBOD | [151] | Collected by user |
Network Forensic Analysis | 2019 | 35 | 72 M | BOT-IOT | GAN | [166] | Cyber Range Lab—UNSW [199] |
UC-Merced 256 × 256 images | 2022 | 21 | 100 | Remote sensing scene classification | CNN and Transformer | [172] | University of California, Merced [200] |
AID (600 × 600) | 2019 | 17 | 3 k | Remote sensing scene classification | CNN and Transformer | [172] | AID scene from Wuhan University [201] |
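NWPU-RESISC45 (256 × 256) | 2021 | 12 | 31.5 k | Remote sensing scene classification | CNN and Transformer | [172] | Northwestern Polytechnical University dataset [202] |
OPTIMAL-31 (256 × 256) | 2018 | 31 | 186 k | Remote sensing scene classification | CNN and Transformer | [172] | Hyperspectral Image Dataset [203] |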
6. Summary of Assumptions and Outlier Types
7. Classification of OD Methods by Predefined Criteria
7.1. Criteria Definition
- Data greediness refers to the sample size that the method requires to perform well in OD, in relation to the selected window mechanism; a model considered data greedy will be marked with “Yes”, otherwise with “No”.
- The optimal data dimensionality to perform OD: this takes the value Low, Medium, or High, depending on whether the model performs better on low-, medium-, or high-dimensional data.
- The computational complexity of the method, which is defined as a function of the sample size $n$, the dimensionality $d$, and the depth $l$ of the model used in the method; given that $f(n, d, l)$ is the complexity function, a model will be classified as:
  - Low complexity if $f$ is a linear function of $n$ and the data dimensionality $d$.
  - Medium complexity if $f$ is quadratic for $n$ and linear for $d$.
  - High complexity if $f$ is a quadratic function of both $n$ and $d$ and uses the model depth $l$ or the number of hidden layers as parameters.
- The ability to detect outliers in temporal data is called temporal ability; a model will be marked as “Yes” for its temporal ability, and “No” otherwise.
- The flexibility and adaptability of the method are defined based on the number of parameters used by the method or on whether it is supervised, semi-supervised, or unsupervised, which correspond, respectively, to low, medium, and high flexibility or adaptability; a model considered flexible will be denoted with “Yes”, otherwise “No”.
- The model’s robustness is indicative of whether the model is sensitive to outliers or not. A model considered robust is denoted in Table 7 as “Yes”, and as “No” if it is not.
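The complexity criterion in the list above can be read as a small decision rule. The sketch below encodes it literally, assuming the growth of the complexity function is summarized by its polynomial order in the sample size and in the dimensionality (1 for linear, 2 for quadratic) plus a flag for dependence on model depth. The function name `complexity_class` and its parameters are hypothetical helpers introduced only for illustration; combinations that the three stated rules do not cover are returned as `None` rather than guessed.

```python
from typing import Optional


def complexity_class(order_n: int, order_d: int, uses_depth: bool) -> Optional[str]:
    """Map growth orders to the Low / Medium / High labels of the criteria.

    order_n: polynomial order of the complexity in the sample size
             (1 = linear, 2 = quadratic).
    order_d: polynomial order of the complexity in the dimensionality.
    uses_depth: True if the model depth (number of hidden layers)
                appears as a parameter of the complexity function.
    """
    # Low: linear in both the sample size and the dimensionality.
    if order_n == 1 and order_d == 1:
        return "Low"
    # Medium: quadratic in the sample size, linear in the dimensionality.
    if order_n == 2 and order_d == 1:
        return "Medium"
    # High: quadratic in both, with the model depth as a parameter.
    if order_n == 2 and order_d == 2 and uses_depth:
        return "High"
    # Anything else is not covered by the three stated rules.
    return None
```

For example, a method whose cost grows quadratically with the window size but linearly with the number of features would be labeled with `complexity_class(2, 1, False)`, which falls under the Medium rule.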
7.2. Classification of OD Methods by Predefined Criteria
8. Discussion
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wang, Y.; Li, J.; Yang, B.; Li, H.G. Stream-data-clustering based adaptive alarm threshold setting approaches for industrial processes with multiple operating conditions. ISA Trans. 2022, 129, 594–608. [Google Scholar] [CrossRef]
- Zhu, R.; Ji, X.; Yu, D.; Tan, Z.; Zhao, L.; Li, J.; Xia, X. KNN-based approximate outlier detection algorithm over IoT streaming data. IEEE Access 2020, 8, 42749–42759. [Google Scholar] [CrossRef]
- Paul, K.; Chatterjee, S.S.; Pai, P.; Varshney, A.; Juikar, S.; Prasad, V.; Bhadra, B.; Dasgupta, S. Viable smart sensors and their application in data driven agriculture. Comput. Electron. Agric. 2022, 198, 107096. [Google Scholar] [CrossRef]
- Yang, Y.; Ding, S.; Liu, Y.; Meng, S.; Chi, X.; Ma, R.; Yan, C. Fast wireless sensor for anomaly detection based on data stream in an edge-computing-enabled smart greenhouse. Digit. Commun. Netw. 2021, 8, 498–507. [Google Scholar] [CrossRef]
- Juszczuk, P.; Kozak, J.; Kania, K. Using similarity measures in prediction of changes in financial market stream data—Experimental approach. Data Knowl. Eng. 2020, 125, 101782. [Google Scholar] [CrossRef]
- Edge, M.E.; Sampaio, P.R.F. The design of FFML: A rule-based policy modelling language for proactive fraud management in financial data streams. Expert Syst. Appl. 2012, 39, 9966–9985. [Google Scholar] [CrossRef]
- Ma, B.; Guo, W.; Zhang, J. A survey of online data-driven proactive 5G network optimisation using machine learning. IEEE Access 2020, 8, 35606–35637. [Google Scholar] [CrossRef]
- Parwez, M.S.; Rawat, D.B.; Garuba, M. Big data analytics for user-activity analysis and user-anomaly detection in mobile wireless network. IEEE Trans. Ind. Inform. 2017, 13, 2058–2065. [Google Scholar] [CrossRef]
- Ullah, I.; Mahmoud, Q.H. A framework for anomaly detection in IoT networks using conditional generative adversarial networks. IEEE Access 2021, 9, 165907–165931. [Google Scholar] [CrossRef]
- Márquez, D.G.; Otero, A.; Félix, P.; García, C.A. A novel and simple strategy for evolving prototype based clustering. Pattern Recognit. 2018, 82, 16–30. [Google Scholar] [CrossRef]
- ZareMoodi, P.; Kamali Siahroudi, S.; Beigy, H. Concept-evolution detection in non-stationary data streams: A fuzzy clustering approach. Knowl. Inf. Syst. 2019, 60, 1329–1352. [Google Scholar] [CrossRef]
- Chan, H.L.; Lam, T.W.; Lee, L.K.; Ting, H.F. Continuous monitoring of distributed data streams over a time-based sliding window. Algorithmica 2012, 62, 1088–1111. [Google Scholar] [CrossRef]
- Pugliese, L.D.P.; Ferone, D.; Festa, P.; Guerriero, F. Shortest path tour problem with time windows. Eur. J. Oper. Res. 2020, 282, 334–344. [Google Scholar] [CrossRef]
- Blevins, D.H.; Moriano, P.; Bridges, R.A.; Verma, M.E.; Iannacone, M.D.; Hollifield, S.C. Time-based can intrusion detection benchmark. arXiv 2021, arXiv:2101.05781. [Google Scholar]
- Yue, W.; Moczalla, R.; Luthra, M.; Rabl, T. Deco: Fast and Accurate Decentralized Aggregation of Count-Based Windows in Large-Scale IoT Applications. In Proceedings of the 27th International Conference on Extending Database Technology (EDBT), Paestum, Italy, 25–28 March 2024; pp. 412–425. [Google Scholar]
- Zeng, Z.; Cui, L.; Qian, M.; Zhang, Z.; Wei, K. A survey on sliding window sketch for network measurement. Comput. Netw. 2023, 226, 109696. [Google Scholar] [CrossRef]
- Baldini, G.; Amerini, I. Online Distributed Denial of Service (DDoS) intrusion detection based on adaptive sliding window and morphological fractal dimension. Comput. Netw. 2022, 210, 108923. [Google Scholar] [CrossRef]
- Iqbal, W.; Berral, J.L.; Carrera, D. Adaptive sliding windows for improved estimation of data center resource utilization. Future Gener. Comput. Syst. 2020, 104, 212–224. [Google Scholar]
- Youn, J.; Shim, J.; Lee, S.G. Efficient data stream clustering with sliding windows based on locality-sensitive hashing. IEEE Access 2018, 6, 63757–63776. [Google Scholar] [CrossRef]
- Bahri, M.; Bifet, A.; Gama, J.; Gomes, H.M.; Maniu, S. Data stream analysis: Foundations, major tasks and tools. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2021, 11, 1405. [Google Scholar] [CrossRef]
- Baek, Y.; Yun, U.; Kim, H.; Nam, H.; Lee, G.; Yoon, E.; Vo, B.; Lin, J.C.W. Erasable pattern mining based on tree structures with damped window over data streams. Eng. Appl. Artif. Intell. 2020, 94, 103735. [Google Scholar] [CrossRef]
- Kim, J.; Yun, U.; Kim, H.; Ryu, T.; Lin, J.C.W.; Fournier-Viger, P.; Pedrycz, W. Average utility driven data analytics on damped windows for intelligent systems with data streams. Int. J. Intell. Syst. 2021, 36, 5741–5769. [Google Scholar] [CrossRef]
- Zubaroğlu, A.; Atalay, V. Data stream clustering: A review. Artif. Intell. Rev. 2021, 54, 1201–1236. [Google Scholar] [CrossRef]
- Tanbeer, S.K.; Ahmed, C.F.; Jeong, B.S.; Lee, Y.K. Sliding window-based frequent pattern mining over data streams. Inf. Sci. 2009, 179, 3843–3865. [Google Scholar] [CrossRef]
- Giraud, C. Introduction to High-Dimensional Statistics; Chapman and Hall/CRC: Boca Raton, FL, USA, 2021. [Google Scholar]
- Assent, I. Clustering high dimensional data. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2012, 2, 340–350. [Google Scholar] [CrossRef]
- Peng, W.; Zhou, T.; Chen, Y. Enhancing mass spectrometry data analysis: A novel framework for calibration, outlier detection, and classification. Pattern Recognit. Lett. 2024, 182, 1–8. [Google Scholar] [CrossRef]
- Harrou, F.; Bouyeddou, B.; Zerrouki, N.; Dairi, A.; Sun, Y.; Zerrouki, Y. Detecting the signs of desertification with Landsat imagery: A semi-supervised anomaly detection approach. Results Eng. 2024, 22, 102037. [Google Scholar] [CrossRef]
- Tahvili, S.; Hatvani, L. Chapter three-transformation, vectorization, and optimization. In Artificial Intelligence Methods for Optimization of the Software Testing Process, Ser. Uncertainty, Computational Techniques, and Decision Intelligence; Academic Press: Cambridge, MA, USA, 2022; pp. 35–84. [Google Scholar]
- Rozza, A.; Lombardi, G.; Ceruti, C.; Casiraghi, E.; Campadelli, P. Novel high intrinsic dimensionality estimators. Mach. Learn. 2012, 89, 37–65. [Google Scholar] [CrossRef]
- Hawkins, D.M. Identification of Outliers; Chapman and Hall: London, UK, 1980; Volume 11. [Google Scholar]
- Aggarwal, C.C. Data Mining: The Textbook; Springer: New York, NY, USA, 2015; Volume 1. [Google Scholar]
- Smiti, A. A critical overview of outlier detection methods. Comput. Sci. Rev. 2020, 38, 100306. [Google Scholar] [CrossRef]
- Škoda, P.; Adam, F. Knowledge Discovery in Big Data from Astronomy and Earth Observation; Elsevier: Amsterdam, The Netherlands, 2020. [Google Scholar]
- Han, J.; Kamber, M.; Pei, J. (Eds.) Outlier Detection, The Morgan Kaufmann Series in Data Management Systems. In Data Mining, 3rd ed.; Morgan Kaufmann: Burlington, MA, USA, 2012; pp. 543–584. [Google Scholar]
- Gupta, M.; Gao, J.; Aggarwal, C.C.; Han, J. Outlier detection for temporal data: A survey. IEEE Trans. Knowl. Data Eng. 2013, 26, 2250–2267. [Google Scholar] [CrossRef]
- Shi, Y.; Gong, J.; Deng, M.; Yang, X.; Xu, F. A graph-based approach for detecting spatial cross-outliers from two types of spatial point events. Comput. Environ. Urban Syst. 2018, 72, 88–103. [Google Scholar] [CrossRef]
- Zheng, Y.; Zhang, H.; Yu, Y. Detecting collective anomalies from multiple spatio-temporal datasets across different domains. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 3–6 November 2015; ACM: New York, NY, USA, 2015; pp. 1–10. [Google Scholar]
- Qin, S.J. Neural networks for intelligent sensors and control—Practical issues and some solutions. In Neural Systems for Control; Academic Press: Cambridge, MA, USA, 1997; pp. 213–234. [Google Scholar]
- Han, J.; Pei, J.; Yin, Y.; Mao, R. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min. Knowl. Discov. 2004, 8, 53–87. [Google Scholar] [CrossRef]
- Keogh, E.; Lonardi, S.; Chiu, B.Y.C. Finding surprising patterns in a time series database in linear time and space. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 550–556. [Google Scholar]
- Kern, R.; Al-Ubaidi, T.; Sabol, V.; Krebs, S.; Khodachenko, M.; Scherf, M. Astro-and Geoinformatics–Visually Guided Classification of Time Series Data. In Knowledge Discovery in Big Data From Astronomy and Earth Observation; Elsevier: Amsterdam, The Netherlands, 2020; pp. 267–282. [Google Scholar]
- Knapp, E.D.; Langill, J. Industrial Network Security: Securing Critical Infrastructure Networks for Smart Grid, SCADA, and Other Industrial Control Systems; Syngress: Oxford, UK, 2014. [Google Scholar]
- Kotu, V.; Deshpande, B. Data Science: Concepts and Practice; Morgan Kaufmann: Burlington, MA, USA, 2018. [Google Scholar]
- Duraj, A.; Szczepaniak, P.S. Outlier detection in data streams—A comparative study of selected methods. Procedia Comput. Sci. 2021, 192, 2769–2778. [Google Scholar] [CrossRef]
- Fernandes, G.; Rodrigues, J.J.; Carvalho, L.F.; Al-Muhtadi, J.F.; Proença, M.L. A comprehensive survey on network anomaly detection. Telecommun. Syst. 2019, 70, 447–489. [Google Scholar] [CrossRef]
- Dwivedi, R.K.; Rai, A.K.; Kumar, R. Outlier detection in wireless sensor networks using machine learning techniques: A survey. In Proceedings of the 2020 International Conference on Electrical and Electronics Engineering (ICE3), Gorakhpur, India, 14–15 February 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 316–321. [Google Scholar]
- Wang, H.; Bah, M.J.; Hammad, M. Progress in outlier detection techniques: A survey. IEEE Access 2019, 7, 107964–108000. [Google Scholar] [CrossRef]
- Habeeb, R.A.A.; Nasaruddin, F.; Gani, A.; Hashem, I.A.T.; Ahmed, E.; Imran, M. Real-time big data processing for anomaly detection: A survey. Int. J. Inf. Manag. 2019, 45, 289–307. [Google Scholar] [CrossRef]
- Samara, M.A.; Bennis, I.; Abouaissa, A.; Lorenz, P. A survey of outlier detection techniques in IoT: Review and classification. J. Sens. Actuator Netw. 2022, 11, 4. [Google Scholar] [CrossRef]
- Gaddam, A.; Wilkin, T.; Angelova, M.; Gaddam, J. Detecting sensor faults, anomalies and outliers in the internet of things: A survey on the challenges and solutions. Electronics 2020, 9, 511. [Google Scholar] [CrossRef]
- Souiden, I.; Omri, M.N.; Brahmi, Z. A survey of outlier detection in high dimensional data streams. Comput. Sci. Rev. 2022, 44, 100463. [Google Scholar] [CrossRef]
- Molugaram, K.; Rao, G.S.; Shah, A.; Davergave, N. Statistical Techniques for Transportation Engineering; Butterworth-Heinemann: Portsmouth, NH, USA, 2017. [Google Scholar]
- Ryu, M.; Lee, G.; Lee, K. Online sequential extreme studentized deviate tests for anomaly detection in streaming data with varying patterns. Clust. Comput. 2021, 24, 1975–1987. [Google Scholar] [CrossRef]
- Leys, C.; Ley, C.; Klein, O.; Bernard, P.; Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 2013, 49, 764–766. [Google Scholar] [CrossRef]
- Bhargavi, M.V.; Sireesha, V. A comparative study for statistical outlier detection using colon cancer data. Adv. Appl. Stat. 2022, 72, 41–54. [Google Scholar] [CrossRef]
- Vieira, R.G.; Leone Filho, M.A.; Semolini, R. An Enhanced Seasonal-Hybrid ESD technique for robust anomaly detection on time series. Symp. Bras. Redes Comput. Sist. Distrib. 2018, 281–294. [Google Scholar] [CrossRef]
- Ray, S.; McEvoy, D.S.; Aaron, S.; Hickman, T.T.; Wright, A. Using statistical anomaly detection models to find clinical decision support malfunctions. J. Am. Med. Inform. Assoc. 2018, 25, 862–871. [Google Scholar] [CrossRef] [PubMed]
- Saleem, S.; Aslam, M.; Shaukat, M.R. A Review and Empirical Comparison of univariate outlier Detection Methods. Pak. J. Stat. 2021, 37, 447–462. [Google Scholar]
- Bhattacharya, S.; Beirlant, J. Outlier detection and a tail-adjusted boxplot based on extreme value theory. arXiv 2019, arXiv:1912.02595. [Google Scholar]
- Dai, W.; Mrkvička, T.; Sun, Y.; Genton, M.G. Functional outlier detection and taxonomy by sequential transformations. Comput. Stat. Data Anal. 2020, 149, 106960. [Google Scholar] [CrossRef]
- Walker, M.L.; Dovoedo, Y.H.; Chakraborti, S.; Hilton, C.W. An improved boxplot for univariate data. Am. Stat. 2018, 72, 348–353. [Google Scholar] [CrossRef]
- Rousseeuw, P.J.; Croux, C. Alternatives to the median absolute deviation. J. Am. Stat. Assoc. 1993, 88, 1273–1283. [Google Scholar] [CrossRef]
- Devarakonda, N.; Subhani, S.; Basha, S.A.H. Outliers detection in regression analysis using partial least square approach. In Proceedings of the ICT and Critical Infrastructure: 48th Annual Convention of Computer Society of India, Visakhapatnam, India, 13–15 December 2013; Springer: Berlin/Heidelberg, Germany, 2014; Volume 2, pp. 125–135. [Google Scholar]
- Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 1959, 8, 229–231. [Google Scholar]
- Klein, N.; Kneib, T.; Marra, G.; Radice, R. Bayesian mixed binary-continuous copula regression with an application to childhood undernutrition. In Flexible Bayesian Regression Modelling; Academic Press: Cambridge, MA, USA, 2020; pp. 121–152. [Google Scholar]
- Li, Z.; Zhao, Y.; Botta, N.; Ionescu, C.; Hu, X. COPOD: Copula-based outlier detection. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1118–1123. [Google Scholar]
- Wang, Y.; Infield, D.G.; Stephen, B.; Galloway, S.J. Copula-based model for wind turbine power curve outlier rejection. Wind Energy 2014, 17, 1677–1688. [Google Scholar] [CrossRef]
- Ghalem, S.K.; Kechar, B.; Bounceur, A.; Euler, R. A probabilistic multivariate copula-based technique for faulty node diagnosis in wireless sensor networks. J. Netw. Comput. Appl. 2019, 127, 9–25. [Google Scholar] [CrossRef]
- Fang, G.; Pan, R. On multivariate copula modelling of dependent degradation processes. Comput. Ind. Eng. 2021, 159, 107450. [Google Scholar] [CrossRef]
- Škorić, T.; Pantelić, D.; Jelenković, B.; Bajić, D. Noise reduction in two-photon laser scanned microscopic images by singular value decomposition with copula threshold. Signal Process. 2022, 195, 108486. [Google Scholar] [CrossRef]
- Sheikhi, A.; Amirzadeh, V.; Mesiar, R. A comprehensive family of copulas to model bivariate random noise and perturbation. Fuzzy Sets Syst. 2021, 415, 27–36. [Google Scholar] [CrossRef]
- Wang, M.L.; Lynch, J.P.; Sohn, H. (Eds.) Sensing hardware and data collection methods. In Sensor Technologies for Civil Infrastructures; Woodhead Publishing: Sawston, UK, 2014; Volume 1. [Google Scholar]
- Carson, E.; Cobelli, C. Modelling Methodology for Physiology and Medicine, 2nd ed.; Newnes: London, UK, 2013. [Google Scholar]
- Theodoridis, S. Bayesian learning: Inference and the EM algorithm. In Machine Learning; Academic Press: Cambridge, MA, USA, 2020; pp. 595–646. [Google Scholar]
- Haldar, S.K. Statistical and geostatistical applications in geology. In Mineral Exploration; Elsevier: Amsterdam, The Netherlands, 2018; pp. 167–194. [Google Scholar]
- Goldstein, M.; Dengel, A. Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm. In Proceedings of the 35th German Conference on Artificial Intelligence KI-2012, Saarbrücken, Germany, 24–27 September 2012; Volume 1, pp. 59–63. [Google Scholar]
- Latecki, L.J.; Lazarevic, A.; Pokrajac, D. Outlier detection with kernel density functions. In Proceedings of the International Workshop on Machine Learning and Data Mining in Pattern Recognition, New York, NY, USA, 15–20 July 2017; Springer: Berlin/Heidelberg, Germany, 2007; pp. 61–75. [Google Scholar]
- Schubert, E.; Zimek, A.; Kriegel, H.P. Generalized outlier detection with flexible kernel density estimates. In Proceedings of the 2014 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics 2014, Philadelphia, PA, USA, 24–26 April 2014; pp. 542–550. [Google Scholar]
- Abdulghafoor, S.A.; Mohamed, L.A. A local density-based outlier detection method for high dimension data. Int. J. Nonlinear Anal. Appl. 2022, 13, 1683–1699. [Google Scholar]
- Ghoting, A.; Parthasarathy, S.; Otey, M.E. Fast mining of distance-based outliers in high-dimensional datasets. Data Min. Knowl. Discov. 2008, 16, 349–364. [Google Scholar] [CrossRef]
- Vu, N.H.; Gopalkrishnan, V. Efficient pruning schemes for distance-based outlier detection. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Bled, Slovenia, 6–10 September 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 160–175. [Google Scholar]
- Navarro, J.; de Diego, I.M.; Fernández, R.R.; Moguerza, J.M. Triangle-based outlier detection. Pattern Recognit. Lett. 2022, 156, 152–159. [Google Scholar] [CrossRef]
- Angiulli, F.; Fassetti, F. Uncertain distance-based outlier detection with arbitrarily shaped data objects. J. Intell. Inf. Syst. 2021, 57, 1–24. [Google Scholar] [CrossRef]
- Román, I.S.; de Diego, I.M.; Conde, C.; Cabello, E. Outlier trajectory detection through a context-aware distance. Pattern Anal. Appl. 2019, 22, 831–839. [Google Scholar] [CrossRef]
- Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000; ACM: New York, NY, USA, 2000; pp. 93–104. [Google Scholar]
- Bai, M.; Wang, X.; Xin, J.; Wang, G. An efficient algorithm for distributed density-based outlier detection on big data. Neurocomputing 2016, 181, 19–28. [Google Scholar] [CrossRef]
- Zhang, L.; Lin, J.; Karim, R. Adaptive kernel density-based anomaly detection for nonlinear systems. Knowl. Based Syst. 2018, 139, 50–63. [Google Scholar] [CrossRef]
- Pokrajac, D.; Lazarevic, A.; Latecki, L.J. Incremental local outlier detection for data streams. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, HI, USA, 1–5 April 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 504–515. [Google Scholar]
- Degirmenci, A.; Karal, O. iMCOD: Incremental multi-class outlier detection model in data streams. Knowl. Based Syst. 2022, 258, 109950. [Google Scholar] [CrossRef]
- Gao, J.; Ji, W.; Zhang, L.; Li, A.; Wang, Y.; Zhang, Z. Cube-based incremental outlier detection for streaming computing. Inf. Sci. 2020, 517, 361–376. [Google Scholar] [CrossRef]
- Yan, Y.; Cao, L.; Kuhlman, C.; Rundensteiner, E. Distributed local outlier detection in big data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2017, Halifax, NS, Canada, 13–17 August 2017; pp. 1225–1234. [Google Scholar]
- Chen, L.; Wang, W.; Yang, Y. CELOF: Effective and fast memory efficient local outlier detection in high-dimensional data streams. Appl. Soft Comput. 2021, 102, 107079. [Google Scholar] [CrossRef]
- Cassisi, C.; Ferro, A.; Giugno, R.; Pigola, G.; Pulvirenti, A. Enhancing density-based clustering: Parameter reduction and outlier detection. Inf. Syst. 2013, 38, 317–330. [Google Scholar] [CrossRef]
- Nozad, S.A.N.; Haeri, M.A.; Folino, G. SDCOR: Scalable density-based clustering for local outlier detection in massive-scale datasets. Knowl. Based Syst. 2021, 228, 107256. [Google Scholar] [CrossRef]
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. Density-based spatial clustering of applications with noise. In Proceedings of the International Conferences Knowledge Discovery and Data Mining 1996, Portland, OR, USA, 2–4 August 1996; p. 240. [Google Scholar]
- Degirmenci, A.; Karal, O. Efficient density and cluster based incremental outlier detection in data streams. Inf. Sci. 2022, 607, 901–920. [Google Scholar] [CrossRef]
- Kriegel, H.P.; Schubert, M.; Zimek, A. Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2008, Las Vegas, NV, USA, 24–27 August 2008; pp. 444–452. [Google Scholar]
- Al-taei, R.; Haeri, M.A. An ensemble angle-based outlier detection for big data. In Proceedings of the International Congress on High-Performance Computing and Big Data Analysis, Tehran, Iran, 23–25 April 2019; Springer: Cham, Switzerland, 2019; pp. 98–108. [Google Scholar]
- Ye, H.; Kitagawa, H.; Xiao, J. Continuous angle-based outlier detection on high-dimensional data streams. In Proceedings of the 19th International Database Engineering & Applications Symposium, Yokohama, Japan, 13–15 July 2015; pp. 162–167. [Google Scholar]
- Thordsen, E.; Schubert, E. ABID: Angle based intrinsic dimensionality. In Proceedings of the International Conference on Similarity Search and Applications, Copenhagen, Denmark, 30 September–2 October 2020; Springer: Cham, Switzerland, 2020; pp. 218–232. [Google Scholar]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Urso, A.; Fiannaca, A.; La Rosa, M.; Ravì, V.; Rizzo, R. Data mining: Classification and prediction. Encycl. Bioinform. Comput. Biol. ABC Bioinform. 2018, 1, 384. [Google Scholar]
- Singh, P.K.; Gupta, S.; Vashistha, R.; Nandi, S.K.; Nandi, S. Machine learning based approach to detect position falsification attack in VANETs. In Proceedings of the Security and Privacy: 2nd ISEA International Conference, ISEA-ISAP 2018, Jaipur, India, 9–11 January 2019; Springer: Singapore, 2019; pp. 166–178. [Google Scholar]
- Parras, J.; Zazo, S. Using one class SVM to counter intelligent attacks against an SPRT defense mechanism. Ad Hoc Netw. 2019, 94, 101946. [Google Scholar] [CrossRef]
- Sumathy, S.; Revathy, M.; Manikandan, R. Improving the state of materials in cybersecurity attack detection in 5G wireless systems using machine learning. Mater. Today Proc. 2023, 81, 700–707. [Google Scholar] [CrossRef]
- Erfani, S.M.; Rajasegarar, S.; Karunasekera, S.; Leckie, C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognit. 2016, 58, 121–134. [Google Scholar] [CrossRef]
- Zhou, X.; Zhang, X.; Wang, B. Online support vector machine: A survey. In Proceedings of the Harmony Search Algorithm 2nd International Conference on Harmony Search Algorithm (ICHSA2015), Seoul, Republic of Korea, 19–21 August 2015; Springer: Berlin/Heidelberg, Germany, 2016; pp. 269–278. [Google Scholar]
- Martín, L.; Sánchez, L.; Lanza, J.; Sotres, P. Development and evaluation of Artificial Intelligence techniques for IoT data quality assessment and curation. Internet Things 2023, 22, 100779. [Google Scholar] [CrossRef]
- Rosenblatt, F. The Perceptron, a Perceiving and Recognizing Automaton (Project PARA); Cornell Aeronautical Laboratory: Buffalo, NY, USA, 1957. [Google Scholar]
- Krishnan, S. Machine learning for biomedical signal analysis. In Biomedical Signal Analysis for Connected Healthcare; Elsevier: Amsterdam, The Netherlands, 2021; pp. 223–264. [Google Scholar]
- Al-Jabery, K.; Obafemi-Ajayi, T.; Olbricht, G.; Wunsch, D. Computational Learning Approaches to Data Analytics in Biomedical Applications; Academic Press: Cambridge, MA, USA, 2019. [Google Scholar]
- Svozil, D.; Kvasnicka, V.; Pospichal, J. Introduction to multi-layer feed-forward neural networks. Chemom. Intell. Lab. Syst. 1997, 39, 43–62. [Google Scholar] [CrossRef]
- Iqbal, A.; Aftab, S. A feed-forward and pattern recognition ANN model for network intrusion detection. Int. J. Comput. Netw. Inf. Secur. 2019, 11, 19. [Google Scholar] [CrossRef]
- Ullah, I.; Mahmoud, Q.H. An anomaly detection model for IoT networks based on flow and flag features using a feed-forward neural network. In Proceedings of the 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 363–368. [Google Scholar]
- Li, H.; Wang, X.; Yang, Z.; Ali, S.; Tong, N.; Baseer, S. Correlation-Based Anomaly Detection Method for Multi-sensor System. Comput. Intell. Neurosci. 2022, 2022, 4756480. [Google Scholar] [CrossRef]
- Kang, Z.; Yang, B.; Nielsen, M.; Deng, L.; Yang, S. A buffered online transfer learning algorithm with multi-layer network. Neurocomputing 2022, 488, 581–597. [Google Scholar] [CrossRef]
- Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
- DiPietro, R.; Hager, G.D. Deep learning: RNNs and LSTM. In Handbook of Medical Image Computing and Computer Assisted Intervention; Academic Press: Cambridge, MA, USA, 2020; pp. 503–519. [Google Scholar]
- Singh, E.; Kuzhagaliyeva, N.; Sarathy, S.M. Using deep learning to diagnose preignition in turbocharged spark-ignited engines. In Artificial Intelligence and Data Driven Optimization of Internal Combustion Engines; Elsevier: Amsterdam, The Netherlands, 2022; pp. 213–237. [Google Scholar]
- Gupta, T.K.; Raza, K. Optimization of ANN architecture: A review on nature-inspired techniques. In Machine Learning in Bio-Signal Analysis and Diagnostic Imaging; Academic Press: Cambridge, MA, USA, 2019; pp. 159–182. [Google Scholar]
- Zhu, R.; Tu, X.; Huang, J.X. Deep learning on information retrieval and its applications. In Deep Learning for Data Analytics; Academic Press: Cambridge, MA, USA, 2020; pp. 125–153. [Google Scholar]
- Muharemi, F.; Logofătu, D.; Leon, F. Machine learning approaches for anomaly detection of water quality on a real-world data set. J. Inf. Telecommun. 2019, 3, 294–307. [Google Scholar] [CrossRef]
- Ackerson, J.M.; Dave, R.; Seliya, N. Applications of recurrent neural network for biometric authentication & anomaly detection. Information 2021, 12, 272. [Google Scholar] [CrossRef]
- Jeong, S.; Ferguson, M.; Law, K.H. Sensor data reconstruction and anomaly detection using bidirectional recurrent neural network. In Proceedings of the Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2019, Denver, CO, USA, 3–7 March 2019; Volume 10970, pp. 157–167. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Ankit, U. Transformer Neural Network: Step-By-Step Breakdown of the Beast. 2020. Available online: https://towardsdatascience.com/transformer-neural-network-step-by-step-breakdown-of-the-beast-b3e096dc857f (accessed on 20 September 2023).
- Al Mamun, S.A.; Beyaz, M. LSTM Recurrent Neural Network (RNN) for Anomaly Detection in Cellular Mobile Networks. In Proceedings of the Machine Learning for Networking: First International Conference MLN 2018, Paris, France, 27–29 November 2018; pp. 222–237. [Google Scholar]
- Muhuri, P.S.; Chatterjee, P.; Yuan, X.; Roy, K.; Esterline, A. Using a long short-term memory recurrent neural network (LSTM-RNN) to classify network attacks. Information 2020, 11, 243. [Google Scholar] [CrossRef]
- Sagheer, A.; Hamdoun, H.; Youness, H. Deep LSTM-based transfer learning approach for coherent forecasts in hierarchical time series. Sensors 2021, 21, 4379. [Google Scholar] [CrossRef] [PubMed]
- Bleiweiss, A. LSTM Neural Networks for Transfer Learning in Online Moderation of Abuse Context. In Proceedings of the 11th International Conference on Agents and Artificial Intelligence (ICAART 2019), Prague, Czech Republic, 19–21 February 2019; pp. 112–122. [Google Scholar]
- Negi, N.; Jelassi, O.; Chaouchi, H.; Clemençon, S. Distributed online Data Anomaly Detection for connected vehicles. In Proceedings of the International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 19–21 February 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 616–621. [Google Scholar]
- Raj, P.; Evangeline, P. The Digital Twin Paradigm for Smarter Systems and Environments: The Industry Use Cases; Academic Press: Cambridge, MA, USA, 2020. [Google Scholar]
- Pavithra, V.; Jayalakshmi, V. Smart energy and electric power system: Current trends and new intelligent perspectives and introduction to AI and power system. In Smart Energy and Electric Power Systems; Elsevier: Amsterdam, The Netherlands, 2023; pp. 19–36. [Google Scholar]
- Hung, C.L. Deep learning in biomedical informatics. In Intelligent Nanotechnology; Elsevier: Amsterdam, The Netherlands, 2023; pp. 307–329. [Google Scholar]
- Teuwen, J.; Moriakov, N. Convolutional neural networks. In Handbook of Medical Image Computing and Computer Assisted Intervention; Academic Press: Cambridge, MA, USA, 2020; pp. 481–501. [Google Scholar]
- Jeon, W.; Ko, G.; Lee, J.; Lee, H.; Ha, D.; Ro, W.W. Deep learning with GPUs. Adv. Comput. 2021, 122, 167–215. [Google Scholar]
- Mishra, S.; Tripathy, H.K.; Mallick, P.K.; Sangaiah, A.K.; Chae, G.S. (Eds.) Cognitive Big Data Intelligence with a Metaheuristic Approach; Academic Press: Cambridge, MA, USA, 2021. [Google Scholar]
- Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
- Mocanu, E.; Nguyen, P.H.; Gibescu, M. Deep learning for power system data analysis. In Big Data Application in Power Systems; Elsevier: Amsterdam, The Netherlands, 2018; pp. 125–158. [Google Scholar]
- Liu, H. Wind Forecasting in Railway Engineering; Elsevier: Amsterdam, The Netherlands, 2021. [Google Scholar]
- Talapula, D.K.; Kumar, A.; Ravulakollu, K.K.; Kumar, M. Anomaly Detection in Online Data Streams Using Deep Belief Neural Networks. In Proceedings of the Doctoral Symposium on Computational Intelligence, Lucknow, India, 3 March 2023; Springer: Singapore, 2023; pp. 729–749. [Google Scholar]
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems NIPS’14, Montreal, QC, Canada, 8–13 December 2014; Volume 2, pp. 2672–2680. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Aggarwal, A.; Mittal, M.; Battineni, G. Generative adversarial network: An overview of theory and applications. Int. J. Inf. Manag. Data Insights 2021, 1, 100004. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. In Proceedings of the Advances in Neural Information Processing Systems NeurIPS 2017, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Li, G.; Duan, Z.; Liang, L.; Zhu, H.; Hu, A.; Cui, Q.; Chen, B.; Hu, W. Outlier data mining method considering the output distribution characteristics for photovoltaic arrays and its application. Energy Rep. 2020, 6, 2345–2357. [Google Scholar] [CrossRef]
- Srinu, S.; Mishra, A.K. Efficient elimination of erroneous nodes in cooperative sensing for cognitive radio networks. Comput. Electr. Eng. 2016, 52, 284–292. [Google Scholar] [CrossRef]
- Zhao, Y.; Lehman, B.; Ball, R.; Mosesian, J.; de Palma, J.F. Outlier detection rules for fault detection in solar photovoltaic arrays. In Proceedings of the 2013 28th Annual IEEE Applied Power Electronics Conference and Exposition (APEC), Long Beach, CA, USA, 17–21 March 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 2913–2920. [Google Scholar]
- Schlechtingen, M.; Santos, I.F. Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection. Mech. Syst. Signal Process. 2011, 25, 1849–1875. [Google Scholar] [CrossRef]
- Leigh, C.; Alsibai, O.; Hyndman, R.J.; Kandanaarachchi, S.; King, O.C.; McGree, J.M.; Neelamraju, C.; Strauss, J.; Talagala, P.D.; Turner, R.D.; et al. A framework for automated anomaly detection in high frequency water-quality data from in situ sensors. Sci. Total Environ. 2019, 664, 885–898. [Google Scholar] [CrossRef]
- Owolabi, O.; Okoh, D.; Rabiu, B.; Obafaye, A.; Dauda, K. A median absolute deviation-neural network (MAD-NN) method for atmospheric temperature data cleaning. MethodsX 2021, 8, 101533. [Google Scholar] [CrossRef] [PubMed]
- Bae, I.; Ji, U. Application of Outlier Detection and Smoothing Algorithm for Monitoring Water Level and Discharge by Ultrasonic Sensor. In Proceedings of the AGU Fall Meeting Abstracts, San Francisco, CA, USA, 9–13 December 2019; Volume 2019, p. H53K-1913. [Google Scholar]
- Belkhouche, F. Robust calibration of MEMS accelerometers in the presence of outliers. IEEE Sens. J. 2022, 22, 9500–9508. [Google Scholar] [CrossRef]
- Diaz-Rozo, J.; Bielza, C.; Larrañaga, P. Clustering of data streams with dynamic Gaussian mixture models: An IoT application in industrial processes. IEEE Internet Things J. 2018, 5, 3533–3547. [Google Scholar] [CrossRef]
- Reddy, A.; Ordway-West, M.; Lee, M.; Dugan, M.; Whitney, J.; Kahana, R.; Ford, B.; Muedsam, J.; Henslee, A.; Rao, M. Using gaussian mixture models to detect outliers in seasonal univariate network traffic. In Proceedings of the 2017 IEEE Security and Privacy Workshops (SPW), San Jose, CA, USA, 25 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 229–234. [Google Scholar]
- Kalaycı, İ.; Ercan, T. Anomaly detection in wireless sensor networks data by using histogram based outlier score method. In Proceedings of the 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey, 19–21 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar]
- Çakmakçı, S.D.; Kemmerich, T.; Ahmed, T.; Baykal, N. Online DDoS attack detection using Mahalanobis distance and Kernel-based learning algorithm. J. Netw. Comput. Appl. 2020, 168, 102756. [Google Scholar] [CrossRef]
- Saeed, M.M. A real-time adaptive network intrusion detection for streaming data: A hybrid approach. Neural Comput. Appl. 2022, 34, 6227–6240. [Google Scholar] [CrossRef]
- Alamaniotis, M. Fuzzy Integration of kernel-based Gaussian Processes applied to Anomaly Detection in Nuclear Security. In Proceedings of the 2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA), Chania, Crete, Greece, 12–14 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–4. [Google Scholar]
- Bhattacharjee, S.; Marchang, N. Malicious user detection with local outlier factor during spectrum sensing in cognitive radio network. Int. J. Ad Hoc Ubiquitous Comput. 2019, 30, 215–223. [Google Scholar] [CrossRef]
- Chhetry, B.; Marchang, N. Detection of primary user emulation attack (PUEA) in cognitive radio networks using one-class classification. arXiv 2021, arXiv:2106.10964. [Google Scholar]
- Baek, S.; Kwon, D.; Suh, S.C.; Kim, H.; Kim, I.; Kim, J. Clustering-based label estimation for network anomaly detection. Digit. Commun. Netw. 2021, 7, 37–44. [Google Scholar] [CrossRef]
- Premkumar, M.; Ashokkumar, S.R.; Jeevanantham, V.; Mohanbabu, G.; AnuPallavi, S. Scalable and energy efficient cluster based anomaly detection against denial of service attacks in wireless sensor networks. Wirel. Pers. Commun. 2023, 129, 2669–2691. [Google Scholar] [CrossRef]
- Yang, L.; Lu, Y.; Yang, S.X.; Guo, T.; Liang, Z. A secure clustering protocol with fuzzy trust evaluation and outlier detection for industrial wireless sensor networks. IEEE Trans. Ind. Inform. 2020, 17, 4837–4847. [Google Scholar] [CrossRef]
- Jha, H.S.; Khanal, A.; Seikh, H.M.D.; Lee, W.J. A comparative study on outlier detection techniques for noisy production data from unconventional shale reservoirs. J. Nat. Gas Sci. Eng. 2022, 105, 104720. [Google Scholar] [CrossRef]
- Soumya, T.R.; Revathy, S. A Novel Approach for Cyber Threat Detection Based on Angle-Based Subspace Anomaly Detection. Cybern. Syst. 2022, 1–10. [Google Scholar] [CrossRef]
- Vanitha, N.; Ganapathi, P. Traffic analysis of UAV networks using enhanced deep feed forward neural networks (EDFFNN). In Handbook of Research on Machine and Deep Learning Applications for Cyber Security; IGI Global: Hershey, PA, USA, 2020; pp. 219–244. [Google Scholar]
- Reddy, D.K.; Behera, H.S.; Nayak, J.; Vijayakumar, P.; Naik, B.; Singh, P.K. Deep neural network based anomaly detection in Internet of Things network traffic tracking for the applications of future smart cities. Trans. Emerg. Telecommun. Technol. 2021, 32, 4121. [Google Scholar] [CrossRef]
- Yu, Y.; Wu, X.; Yuan, S. Anomaly detection for internet of things based on compressed sensing and online extreme learning machine autoencoder. J. Phys. Conf. Ser. 2020, 1544, 012027. [Google Scholar] [CrossRef]
- Adkisson, M.; Kimmell, J.C.; Gupta, M.; Abdelsalam, M. Autoencoder-based anomaly detection in smart farming ecosystem. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 3390–3399. [Google Scholar]
- Han, P.; Ellefsen, A.L.; Li, G.; Holmeset, F.T.; Zhang, H. Fault detection with LSTM-based variational autoencoder for maritime components. IEEE Sens. J. 2021, 21, 21903–21912. [Google Scholar] [CrossRef]
- Alabadi, M.; Celik, Y. Detection for cyber-security based on convolution neural network: A survey. In Proceedings of the 2020 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey, 26–28 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–14. [Google Scholar]
- Sun, H.; Chen, M.; Weng, J.; Liu, Z.; Geng, G. Anomaly detection for in-vehicle network using CNN-LSTM with attention mechanism. IEEE Trans. Veh. Technol. 2021, 70, 10880–10893. [Google Scholar] [CrossRef]
- Tschuchnig, M.E.; Gadermayr, M. Anomaly detection in medical imaging-a mini review. In Data Science–Analytics and Applications: Proceedings of the 4th International Data Science Conference—iDSC2021, Online, 16–18 October 2021; Springer: Wiesbaden, Germany, 2022; pp. 33–38. [Google Scholar]
- Arabahmadi, M.; Farahbakhsh, R.; Rezazadeh, J. Deep learning for smart Healthcare—A survey on brain tumor detection from medical imaging. Sensors 2022, 22, 1960. [Google Scholar] [CrossRef]
- Qiao, Y.; Cui, X.; Jin, P.; Zhang, W. Fast outlier detection for high-dimensional data of wireless sensor networks. Int. J. Distrib. Sens. Netw. 2020, 16, 1550147720963835. [Google Scholar] [CrossRef]
- Sarkar, N.; Keserwani, P.K.; Govil, M.C. A better and fast cloud intrusion detection system using improved squirrel search algorithm and modified deep belief network. Clust. Comput. 2023, 27, 1699–1718. [Google Scholar] [CrossRef]
- Deecke, L.; Vandermeulen, R.; Ruff, L.; Mandt, S.; Kloft, M. Image anomaly detection with generative adversarial networks. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference ECML PKDD 2018, Dublin, Ireland, 10–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–17. [Google Scholar]
- Jiang, T.; Li, Y.; Xie, W.; Du, Q. Discriminative reconstruction constrained generative adversarial network for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4666–4679. [Google Scholar] [CrossRef]
- Jin, P.; Mou, L.; Xia, G.S.; Zhu, X.X. Anomaly detection in aerial videos with transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5628213. [Google Scholar] [CrossRef]
- Chen, Z.; Chen, D.; Zhang, X.; Yuan, Z.; Cheng, X. Learning graph structures with transformer for multivariate time-series anomaly detection in IoT. IEEE Internet Things J. 2021, 9, 9179–9189. [Google Scholar] [CrossRef]
- Zhang, S.; Liu, Y.; Zhang, X.; Cheng, W.; Chen, H.; Xiong, H. Cat: Beyond efficient transformer for content-aware anomaly detection in event sequences. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 4541–4550. [Google Scholar]
- Zhang, J.; Zhao, H.; Li, J. TRS: Transformers for remote sensing scene classification. Remote Sens. 2021, 13, 4143. [Google Scholar] [CrossRef]
- ODDS. Outliers Detection Datasets. Available online: https://odds.cs.stonybrook.edu/ (accessed on 23 July 2024).
- IEEE Dataport. IEEE Dataport Datasets. Available online: https://ieee-dataport.org/datasets (accessed on 23 July 2024).
- University of California Irvine. University of California Irvine Database. Available online: https://kdd.ics.uci.edu/databases/ (accessed on 23 July 2024).
- UCI Machine Learning Repository. KDD Cup 1999 Data. University of California, Irvine. 1999. Available online: https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 29 July 2024).
- University of New Brunswick. NSL-KDD Dataset. Information Security Centre of Excellence, University of New Brunswick. 2009. Available online: http://www.unb.ca/cic/datasets/nsl.html (accessed on 29 July 2024).
- Koppula, M.; Joseph, L. A Real Time Dataset IDSIoT 2024. IEEE Dataport. 2024. [CrossRef]
- Pack, M.L. Corel Histogram Dataset. Available online: https://www.mlpack.org/datasets/ (accessed on 23 July 2024).
- Pahl, M.O.; Aubet, F.X. All Eyes on You: Distributed Multi-Dimensional IoT Microservice Anomaly Detection. In Proceedings of the 2018 14th International Conference on Network and Service Management (CNSM), Rome, Italy, 5–9 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 72–80. [Google Scholar]
- Intel Berkeley Research Lab. Intel Berkeley Research Lab Sensor Data. Intel Corporation. 2004. Available online: http://db.csail.mit.edu/labdata/labdata.html (accessed on 29 July 2024).
- Zhang, Y.; Zhu, Y.; Nichols, E.; Wang, Q.; Zhang, S.; Smith, C.; Howard, S. A Poisson-Gaussian Denoising Dataset with Real Fluorescence Microscopy Images. arXiv 2018, arXiv:1812.10366. [Google Scholar]
- Reyes-Ortiz, J.; Anguita, D.; Ghio, A.; Oneto, L.; Parra, X. Human Activity Recognition Using Smartphones; UCI Machine Learning Repository; University of California: Irvine, CA, USA, 2012. [Google Scholar] [CrossRef]
- Barshan, B.; Altun, K. Daily and Sports Activities; UCI Machine Learning Repository; University of California: Irvine, CA, USA, 2013. [Google Scholar] [CrossRef]
- Center for Atmospheric Research. Tropospheric Data Acquisition Network (TRODAN) Data. 2013. Available online: https://carnasrda.com/trodan_data (accessed on 29 July 2024).
- Canadian Institute for Cybersecurity. CICIDS2017 Dataset. University of New Brunswick. 2017. Available online: https://www.unb.ca/cic/datasets/ids-2017.html (accessed on 29 July 2024).
- University of New South Wales. BoT-IoT Dataset. UNSW Canberra Cyber. 2018. Available online: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/ (accessed on 29 July 2024).
- University of California, Merced. UC Merced Land Use Dataset. 2010. Available online: http://weegee.vision.ucmerced.edu/datasets/landuse.html (accessed on 29 July 2024).
- Xia, G.-S. AID: Aerial Image Dataset. Wuhan University. 2017. Available online: https://captain-whu.github.io/DiRS/ (accessed on 29 July 2024).
- Haikel, H. NWPU-RESISC45 Dataset with 12 Classes; Figshare: London, UK, 2021. [Google Scholar] [CrossRef]
- Wang, Q.; Liu, S.; Chanussot, J.; Li, X. Scene classification with recurrent attention of VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1155–1167. [Google Scholar] [CrossRef]
Type | Mechanism | Boundaries | Overlap | Number of Passes | Sequence Tracking | Historical Data Tracking |
---|---|---|---|---|---|---|
Tumbling | Time-based | Fixed | No | One | No | No |
Damped | Time-based | Fixed | Yes | More than one | Yes | Yes |
Hopping | Time-based | Fixed | Yes | More than one | No | Yes |
Sliding | Time-based | Fixed | Yes | More than one | Yes | No |
Sliding (Eviction) | Count-based | Fixed | Yes | More than one | No | Yes |
Landmark | Landmark-based | Dynamic | Undetermined | Undetermined | Yes | Undetermined |
Landmark (Session) | Activity-based | Dynamic | Yes | More than one | No | Yes |
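
As a minimal sketch of the first rows of the table above, the following Python snippet contrasts a tumbling window (no overlap, each observation processed once) with a hopping window (overlapping, each observation processed more than once). The function names, the `(timestamp, value)` event format, and the assumption that timestamps are non-negative and that windows start at time 0 are illustrative choices, not taken from the survey.

```python
from collections import defaultdict


def tumbling_windows(events, size):
    """Group (timestamp, value) pairs into non-overlapping windows of length `size`."""
    windows = defaultdict(list)
    for t, v in events:
        start = (t // size) * size  # each event falls into exactly one window
        windows[start].append(v)
    return dict(sorted(windows.items()))


def hopping_windows(events, size, hop):
    """Group events into windows [start, start + size) that begin every `hop` units.

    With hop < size, consecutive windows overlap, so one event lands in
    several windows. Assumes non-negative timestamps and a first window at 0.
    """
    windows = defaultdict(list)
    for t, v in events:
        # Smallest multiple of `hop` that is strictly greater than t - size.
        start = max(0, ((t - size) // hop + 1) * hop)
        while start <= t:
            windows[start].append(v)
            start += hop
    return dict(sorted(windows.items()))


# Hypothetical toy stream: one observation per time unit.
events = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e"), (5, "f")]
tumbling = tumbling_windows(events, size=2)
hopping = hopping_windows(events, size=4, hop=2)
```

Here `tumbling` assigns each observation to a single window, whereas `hopping` places observations such as `"c"` and `"d"` in two windows, which is why the table lists more than one pass for the overlapping window types.
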
Survey | Scope of Reference Survey | Advantages | Limitations | Our Contributions |
---|---|---|---|---|
Duraj, A., et al. [45] | Experimental study comparing the performance of a few statistical algorithms for OD. | Conducted a detailed performance analysis allowing the comparison of methods by outlier count and algorithm run time. | Analysis is limited to univariate data. | Our survey covers OD in multivariate data. |
Fernandes, G., et al. [46] | A comprehensive survey on network anomaly detection. | Covers 5 domains of network anomaly detection. | Applications are limited to the security perspective and do not account for events or fault detection. | Our survey provides broader coverage of outlier types and of applications by industry. |
Dwivedi, R.K., et al. [47] | Outlier detection strategies in WSNs. | Covers most OD methods in detail and proposes a structured classification of outlier sources into noise/error, malicious attacks, and specific events. | Does not provide guidance on OD method selection by problem type. | Our survey provides OD method selection guidelines based on streaming data characteristics and detection tasks. |
Dwivedi, R.K., et al. [47] | Survey of machine learning-based OD methods in WSNs. | Detailed review of ML methods and their application for OD in WSNs. | Limited to ML methods. | Our survey covers a broader range of methods, including statistical-, ML-, and DL-based methods. |
Wang, H., et al. [48] | Survey of OD methods proposed between 2000 and 2016. | Provides an extensive analysis of most methods, tools, datasets, and performance metrics used in the literature. | Covers DL methods only briefly and does not cover applications. | Our survey provides a summary of the applications and covers recent DL methods, including state-of-the-art transformers. |
Habeeb, R. A. A., et al. [49] | Survey of real-time anomaly detection in Big Data. | Provides a detailed taxonomy for classifying studies performing anomaly detection in Big Data by methods used, applications, technology, and type of outliers. | Does not cover the computational complexity of the methods in detail. Does not cover the advantages and disadvantages of methods by streaming data challenge. | Our survey categorizes the computational complexity by method. |
Samara, M.A., et al. [50] | Survey of OD in IoT networks. | Provides a summary of the challenges of OD in IoT. | Applications are limited to IoT networks. | Our survey covers a wider range of telecommunications and industrial applications. |
Gaddam, A., et al. [51] | Fault detection in IoT networks. | Provides a methodology for determining sensor faults in IoT using various methods. | Applications are limited to IoT fault detection. Does not cover concept evolution cases. | Our survey outlines OD methods for handling concept drift or evolution in data streams. |
Souiden, I., et al. [52] | OD in high-dimensional data. | Developed 2 taxonomies of OD methods in high-dimensional data. | Does not provide applications per OD method or recommendations on method selection. | Our survey categorizes OD methods by application and provides several OD selection recommendations based on the data structure. |
Method | Concept | Parametric | Type of Outliers | Data Size | Dimensionality | Computational Complexity |
---|---|---|---|---|---|---|
DBOD | Distance | Yes | Global | 5000 or 10,000 | Low | or [84] |
DSBOD | Density | Yes | Global, Contextual | | Low | or [97] |
CBOD | Cluster | Yes | Global, Contextual, Collective | 10,000 | Low | [100] |
ABOD | Angle | No | Global | Small to medium | High | [101,102] |
SVM | | No | Global | Small | High | Linear or Kernel function |
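
To make the distance concept in the DBOD row concrete, here is a minimal Python sketch of one common distance-based score: the distance from each point to its k-th nearest neighbor. This is an illustrative choice and not necessarily the formulation used in the works cited in the table. The function name, the value of `k`, and the toy data are assumptions.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Brute force: every pair of points is compared, so the cost grows
    quadratically with the number of points.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy data: a tight square of four points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)

# The point with the largest score is the strongest global outlier candidate.
outlier_index = max(range(len(points)), key=scores.__getitem__)
```

In a streaming setting, a score of this kind would typically be recomputed over the contents of the current window rather than over the full history.
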
Broad Category | Method | Applications |
---|---|---|
Statistical-based | GBOD | Transportation [53], medicine [56,58], energy monitoring [147], resource management and orchestration in wireless networks [148], and Cloud Systems [54,57] for log analysis. |
Statistical-based | BPOD | Energy monitoring and power consumption [149] |
Statistical-based | RBOD | Information technology, fault detection in the energy industry [150], and environmental science [151]. |
Statistical-based | MAD | Environmental science for atmospheric data analysis [152], water level monitoring [153], and sensor calibration [154]. |
Statistical-based | CBOD | Child nutrition monitoring in healthcare [66], wind turbine power monitoring [68], and process monitoring in computer science [70]. |
Statistical-based | GMM | Sensors in civil engineering [73], streams clustering for customization of IoT [155], and more general networks [156]. |
Statistical-based | HBOD | WSN data analysis [157]. |
Statistical-based | KBOD | Denial of service detection and analysis in computer networks [158], intrusion detection [159], and nuclear security [160]. |
ML-based | DBOD | Knowledge discovery and pattern recognition [82,83] and anomaly detection in video streams [85]. |
ML-based | DSBOD | Intrusion detection in computer networks [87] and physical layer security in spectrum sensing for customization in CR networks [161,162]. |
ML-based | CBOD | Computer networking [163] and orchestration of WSNs [164,165]. |
ML-based | ABOD | Oil and gas [166] and cyber security [167]. |
ML-based | SVM | Threat detection and prevention in 5G IoT [106] and WSNs [109] and Vehicle Ad Hoc Networks (VANET) [104]. |
DL-based | MLF | Network intrusion detection [114], intelligent transport systems [168], medicine, and customization of IoT and WSNs [115,116,169]. |
DL-based | RNN | Environmental science (water quality monitoring) [123], biometric authentication [124], video surveillance, sensor data reconstruction [125], malicious insider threat, network traffic, and electricity theft detection. |
DL-based | LSTM | Self-organization and customization of cellular [128] and computer [129] networks, renewable energy, and intelligent transportation systems [132]. |
DL-based | AE | Customization of IoT networks [170], smart farming in agriculture [171], aerospace industry, medical field, environmental science, and WSNs in maritime [172]. |
DL-based | CNN | Cyber security [173], in-vehicle networks [174], and anomaly detection in medical imagery [175,176]. |
DL-based | DBN | Data analysis from WSNs [177], industrial systems, and intrusion detection systems in cyber security [178]. |
DL-based | GAN | Anomaly detection in medical imaging [179], data mining and knowledge discovery, customization of IoT networks [9], telemetric data, and radio spectrum reconstruction [180]. |
DL-based | Transformer NNs | Aerial video streaming [181], multivariate OD in IoT [182], power grids and water distribution industries, processing time series data, events sequence content-aware anomaly detection [183], medical imagery on electrocardiogram (ECG) images, and vibrating signals and remote sensing [2,184]. |
Broad Category | Method | Distribution Assumptions | Data Type | Online Learning | Supervision | Outlier Type | Outlierness | Method-Specific Assumptions |
---|---|---|---|---|---|---|---|---|
Statistical-based | GBOD | Normal | N(C) | Off * | U | G | Distribution-based | Univariate and mean centered |
Statistical-based | BPOD | N/A | N(C) | Off * | U | G | Distribution-based | Univariate symmetrical |
Statistical-based | RBOD | Normal | N/T | Off | S | | | Normal, Stationary |
Statistical-based | MAD | None | N | Off | U | G | Distribution-based | Univariate Symmetrical |
Statistical-based | Copula-based OD | None | N(C) | Off * | U | C | Distribution-based (learned) | A joint distribution can be built from marginals. |
Statistical-based | GMM | Normal | N(C) | Off * | U | C/G | Distribution-based | Same family of univariates |
Statistical-based | HBOD | None | N(C/D) | Off | U | G | Distribution-based (learned) | Data distribution can be learned |
Statistical-based | KBOD | None | N(C) | Off * | U | C | Density | - |
ML-based | DBOD | None | N(C) | Off * | U | G | Spatial closeness | The spatial distance can be calculated. |
ML-based | DSBOD | None | N(C) | Off * | U | G/C | Spatial closeness and density | Observations are spatially contiguous and densely distributed. |
ML-based | CBOD | None | N(C)/C | Off * | U | C/CL | Spatial closeness | Outlierness relates to local clusters. |
ML-based | ABOD | None | N(C) | Off | U | G | Angular closeness (Trigonometric) | Does not assume spatial projection. |
ML-based | SVM | None | N/C/Tx/Ts | Off * | S | G | Class boundary or Distribution-based | Data can be separated by class |
DL-based | MLF | None | N/C/Tx | Off | S | G/C | N/A | - |
DL-based | RNN | None | T | On | S | C | N/A | Short-term autocorrelation. |
DL-based | LSTM | None | T | On | S | C | N/A | Long-term autocorrelation. |
DL-based | AE | None | N/C/Tx | On | U | G/C | Encoding/decoding ability | No linear assumption |
DL-based | CNN | None | Ts | On | S | C | N/A | - |
DL-based | DBN | None | N/C/Tx | On | S | G/C | N/A | Data are hierarchical. |
DL-based | GAN | None | N/C/Tx | On | U | G/C | Distribution-based (learned) | Data distribution can be learned |
DL-based | Transformer | None | Ts/C/T | On | S | All | N/A | - |
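
The MAD row of the table above (unsupervised, global outliers, univariate and symmetrical data) can be illustrated with a short Python sketch. It uses the modified z-score, a common convention in which the constant 0.6745 rescales the median absolute deviation to be comparable to a standard deviation under normality and 3.5 is a frequently used cut-off. Both constants, the function name, and the sample readings are assumptions for illustration and are not taken from the survey.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`."""
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    if mad == 0:
        # At least half of the values equal the median; the score is undefined.
        return []
    return [
        i
        for i, x in enumerate(values)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical sensor readings with one obvious spike.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0, 10.2]
flagged = mad_outliers(readings)
```

Because both the center and the spread are medians, the spike at index 5 barely shifts either statistic, so the remaining readings are not flagged alongside it.
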
Criteria | Definition |
---|---|
Data Greediness | The amount of data the method requires to perform the OD task. |
Data Dimensionality | The data dimensionality at which the method performs OD best. |
Computational Complexity | This relates to training complexity as a function of the dimensionality , and :
|
Temporal ability | Defined as:
|
Flexibility and Adaptability | Defined as:
|
Robustness | Accounts for whether the model is sensitive to outliers:
|
Broad Category | Type of Methods | Method | Data Greediness | Data Dimensionality | Computational Complexity | Complexity Class | Temporal Ability | Flexibility and Adaptability | Robustness |
---|---|---|---|---|---|---|---|---|---|
Statistical-based | Parametric | GBOD | No | Low | | Low | No | Yes | No |
Statistical-based | Parametric | BPOD | No | Low | | Low | No | Yes | Yes |
Statistical-based | Parametric | RBOD | No | Low | | Medium | No | No | No |
Statistical-based | Parametric | MAD | No | Low | | Low | No | Yes | Yes |
Statistical-based | Parametric | COPOD | Yes | High | | Medium | Yes | Yes | Yes |
Statistical-based | Parametric | GMM | Yes | High | | High | No | Yes | No |
Statistical-based | Non-parametric | HBOD | No | Low | O(n) | Low | No | Yes | Yes |
Statistical-based | Non-parametric | KBOD | Yes | Low/Medium | | Low | No | Yes | Yes |
ML-based | Proximity | DBOD | No | Low | | Medium | No | Yes | No |
ML-based | Proximity | DSBOD | Yes | Low | | Low | No | Yes | Yes |
ML-based | Proximity | CBOD | Yes | Low | | Low | No | Yes | Yes |
ML-based | Deviation | ABOD | No | High | | Medium | No | Yes | Yes |
DL-based | Discriminative | MLF | Yes | Low/Medium | | High | No | Yes | Yes |
DL-based | Discriminative | RNN | Yes | Low/Medium | | High | Yes | Yes | Yes |
DL-based | Discriminative | LSTM | Yes | Low/Medium | | High | Yes | Yes | Yes |
DL-based | Discriminative | CNN | No | High | | High | No | Yes | Yes |
DL-based | Generative | AE | No | High | | High | No | Yes | Yes |
DL-based | Generative | DBN | No | Low/Medium | | High | No | Yes | Yes |
DL-based | Generative | GAN | No | Low/Medium | | High | No | Medium | Yes |
DL-based | Generative and Discriminative | Transformer | Yes | High | | High | Yes | Yes | Yes |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mfondoum, R.N.; Ivanov, A.; Koleva, P.; Poulkov, V.; Manolova, A. Outlier Detection in Streaming Data for Telecommunications and Industrial Applications: A Survey. Electronics 2024, 13, 3339. https://doi.org/10.3390/electronics13163339