Adaptive Gaussian Kernel-Based Incremental Scheme for Outlier Detection
Abstract
1. Introduction
- A Gaussian kernel function with an adaptive kernel width is employed to keep the local measures smooth and to improve the discriminability between objects.
- A dynamical Gaussian kernel density is introduced to describe how the density changes gradually.
- When new data arrive, the method updates only the measures of the affected objects that are needed to compute the outlier factor of the newly arrived object, which significantly reduces the computational burden.
- The experimental results show that the proposed method is more effective and robust than the compared incremental methods for automatic incremental outlier mining.
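
The idea behind the first contribution can be illustrated with a minimal sketch of a local Gaussian kernel density whose width adapts to each object. The function name `adaptive_gaussian_density` is hypothetical, and the width rule (each object's distance to its k-th nearest neighbor) is an assumption made for illustration; the width rule and density formulas of the proposed method are not reproduced here.

```python
import numpy as np

def adaptive_gaussian_density(X, k):
    """Local Gaussian kernel density with a per-object (adaptive) width.

    Assumption: the width of object i is its k-distance, i.e. the distance
    to its k-th nearest neighbour. The paper's own width rule may differ.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Pairwise Euclidean distances; the diagonal is set to inf so that an
    # object is never counted as its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    knn_idx = np.argsort(dist, axis=1)[:, :k]            # k nearest neighbours
    knn_dist = np.take_along_axis(dist, knn_idx, axis=1)
    width = np.maximum(knn_dist[:, -1], 1e-12)           # adaptive width h_i
    # Gaussian kernel evaluated on the k neighbours only, then averaged.
    kern = np.exp(-(knn_dist ** 2) / (2.0 * width[:, None] ** 2))
    norm = (np.sqrt(2.0 * np.pi) * width) ** d           # kernel normaliser
    return kern.mean(axis=1) / norm

# Four clustered objects and one isolated object.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
density = adaptive_gaussian_density(X, k=2)
print(int(np.argmin(density)))  # index of the lowest-density object
```

In this sketch an object in a sparse region receives a large width and therefore a small density, so a low density relative to the neighbors hints at an outlier.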
2. Preliminaries
Computation of Outlier Factor
3. Incremental Learning Scheme
- (a) If , then do not update.
- (b) If , then update .
- (c) If , then update and .
4. The Proposed Method
4.1. IncDGOF Algorithm
Algorithm 1: IncDGOF algorithm
Input: Dataset D, new collected object o, nearest neighbor number k
Output: value of o
1: Scale o to zero mean and unit variance
2: for all do
3: Compute ,
4: end for
5: Compute
6: Compute using Formula (3)
7: Compute , using Formula (6)
/* Find 1st order affected objects and update their measures */
8: for all do
9: if then
10:
11: end if
12: for all do
13: Update using Formula (10)
14: Update using Formula (6)
15: Update using Formula (7)
16: end for
17: Compute using Formula (7)
/* Find 2nd order affected objects and update their measures */
18: for all do
19: if then
20:
21:
22: end if
23: end for
24: for all do
25: Update using Formula (8)
26: end for
27: Compute using Formula (8)
28: Compute using Formula (9)
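
The neighbor-search and first-order update steps of Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: the helper names `build_knn` and `insert_object` are hypothetical, an object is treated as "affected" when the new object o falls inside its current k-neighborhood (the usual rule in incremental local outlier detection), and the measure updates of Formulas (3) and (6) to (10), as well as the second-order affected objects, are not reproduced.

```python
import numpy as np

def build_knn(X, k):
    """Batch k-nearest-neighbour lists (indices and distances, ascending)."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # exclude self-matches
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)

def insert_object(X, knn_idx, knn_dist, o, k):
    """Insert o; return updated arrays and the first-order affected objects.

    Assumption: object p is affected when d(p, o) is smaller than p's
    current k-distance, so that o enters p's neighbour list.
    """
    X = np.asarray(X, dtype=float)
    o = np.asarray(o, dtype=float)
    n = X.shape[0]                            # index that o will receive
    d_new = np.linalg.norm(X - o, axis=1)     # distances from o to all objects
    order = np.argsort(d_new)[:k]             # neighbours of o itself
    affected = np.flatnonzero(d_new < knn_dist[:, -1])
    knn_idx = knn_idx.copy()
    knn_dist = knn_dist.copy()
    for p in affected:
        # Merge o into p's sorted neighbour list and drop the farthest entry.
        pos = np.searchsorted(knn_dist[p], d_new[p])
        knn_dist[p] = np.insert(knn_dist[p], pos, d_new[p])[:k]
        knn_idx[p] = np.insert(knn_idx[p], pos, n)[:k]
    X = np.vstack([X, o])
    knn_idx = np.vstack([knn_idx, order])
    knn_dist = np.vstack([knn_dist, d_new[order]])
    return X, knn_idx, knn_dist, affected

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
idx, dist = build_knn(X, k=4)
X, idx, dist, affected = insert_object(X, idx, dist, rng.normal(size=2), k=4)
print(len(affected), "objects need their measures updated")
```

Only the objects returned in `affected` have their neighbor lists changed, which is why an incremental scheme avoids recomputing the measures of the whole dataset on every arrival.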
4.2. Time Complexity Analysis
5. Experimental Setup and Analysis
5.1. Experimental Datasets and Implementation Details
5.2. Experimental Results and Discussions of the Methods
5.3. Experimental Results and Discussions for the Involved Scaling Factors
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Method Categories | Advantages | Drawbacks |
---|---|---|
Model-based incremental method | global perspective, excellent performance, easy to implement | depends on sufficient data and prior knowledge of the dataset distribution; only applicable to low-dimensional datasets |
Clustering-based incremental method | global perspective, low cost, high portability | gives no degree of deviation; depends on the adopted clustering algorithm |
Distance-based incremental method | global perspective, gives an outlier degree, easy to comprehend | sensitive to the nearest-neighbor parameter; not suitable for datasets with unbalanced density distributions |
Density-based incremental method | considers local measures of objects, gives an outlier degree, suitable for datasets with unbalanced density distributions, adapts to practical applications | sensitive to the nearest-neighbor parameter; ignores the density changes in objects |
Datasets | Objects | Training Dataset | New Arriving Dataset | Attributes | Outliers |
---|---|---|---|---|---|
Wine | 81 | 65 | 26 | 12 | 10 |
Ionosphere | 245 | 196 | 34 | 5 | 20 |
Phoneme | 500 | 400 | 100 | 5 | 50 |
Vowel | 1456 | 1167 | 289 | 12 | 50 |
Smtp | 5000 | 4000 | 1000 | 3 | 30 |
Dataset | Setting | IncLOF Pr | IncLOF Re | IncLOF RP | IncLOF AUC | IncCOF Pr | IncCOF Re | IncCOF RP | IncCOF AUC | IncDGOF Pr | IncDGOF Re | IncDGOF RP | IncDGOF AUC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Wine | Top 10 | 0.30 | 0.30 | 0.67 | 0.68 | 0.30 | 0.30 | 0.40 | 0.83 | 0.40 | 0.40 | 0.50 | 0.84 |
Wine | Top 20 | 0.20 | 0.40 | 0.34 | 0.68 | 0.30 | 0.60 | 0.37 | 0.83 | 0.40 | 0.80 | 0.44 | 0.84 |
Wine | Top 30 | 0.20 | 0.60 | 0.26 | 0.68 | 0.27 | 0.80 | 0.34 | 0.83 | 0.30 | 0.90 | 0.44 | 0.84 |
Wine | Top 40 | 0.18 | 0.70 | 0.23 | 0.68 | 0.25 | 1.00 | 0.31 | 0.83 | 0.23 | 0.90 | 0.44 | 0.84 |
Wine | parameter k | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Ionosphere | Top 20 | 0.30 | 0.30 | 0.35 | 0.88 | 0.55 | 0.55 | 0.72 | 0.94 | 0.90 | 0.90 | 0.96 | 0.98 |
Ionosphere | Top 40 | 0.33 | 0.65 | 0.34 | 0.88 | 0.40 | 0.80 | 0.60 | 0.94 | 0.48 | 0.95 | 0.95 | 0.98 |
Ionosphere | Top 60 | 0.25 | 0.75 | 0.33 | 0.88 | 0.30 | 0.90 | 0.52 | 0.94 | 0.32 | 0.95 | 0.95 | 0.98 |
Ionosphere | Top 80 | 0.24 | 0.95 | 0.29 | 0.88 | 0.24 | 0.95 | 0.50 | 0.94 | 0.24 | 0.95 | 0.95 | 0.98 |
Ionosphere | parameter k | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
Phoneme | Top 50 | 0.08 | 0.08 | 0.08 | 0.59 | 0.12 | 0.12 | 0.10 | 0.59 | 0.46 | 0.46 | 0.54 | 0.87 |
Phoneme | Top 100 | 0.17 | 0.34 | 0.14 | 0.59 | 0.13 | 0.26 | 0.13 | 0.59 | 0.36 | 0.72 | 0.47 | 0.87 |
Phoneme | Top 150 | 0.15 | 0.44 | 0.14 | 0.59 | 0.13 | 0.40 | 0.13 | 0.59 | 0.26 | 0.76 | 0.44 | 0.87 |
Phoneme | Top 200 | 0.14 | 0.54 | 0.14 | 0.59 | 0.14 | 0.54 | 0.14 | 0.59 | 0.21 | 0.82 | 0.39 | 0.87 |
Phoneme | parameter k | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |
Vowel | Top 50 | 0.38 | 0.38 | 0.66 | 0.74 | 0.40 | 0.40 | 0.48 | 0.86 | 0.58 | 0.58 | 0.75 | 0.98 |
Vowel | Top 100 | 0.26 | 0.52 | 0.44 | 0.74 | 0.26 | 0.52 | 0.38 | 0.86 | 0.41 | 0.82 | 0.57 | 0.98 |
Vowel | Top 150 | 0.20 | 0.60 | 0.36 | 0.74 | 0.21 | 0.62 | 0.32 | 0.86 | 0.31 | 0.92 | 0.50 | 0.98 |
Vowel | Top 200 | 0.17 | 0.68 | 0.29 | 0.74 | 0.16 | 0.62 | 0.32 | 0.86 | 0.24 | 0.94 | 0.49 | 0.98 |
Vowel | parameter k | 73 | 73 | 73 | 73 | 73 | 73 | 73 | 73 | 73 | 73 | 73 | 73 |
Smtp | Top 30 | 0.67 | 0.67 | 1.00 | 0.87 | 0.67 | 0.67 | 0.98 | 0.78 | 0.67 | 0.67 | 0.98 | 0.92 |
Smtp | Top 60 | 0.33 | 0.67 | 1.00 | 0.87 | 0.33 | 0.67 | 0.98 | 0.78 | 0.33 | 0.67 | 0.98 | 0.92 |
Smtp | Top 90 | 0.22 | 0.67 | 1.00 | 0.87 | 0.22 | 0.67 | 0.98 | 0.78 | 0.22 | 0.67 | 0.72 | 0.92 |
Smtp | Top 120 | 0.17 | 0.67 | 1.00 | 0.87 | 0.17 | 0.67 | 0.98 | 0.78 | 0.18 | 0.70 | 0.72 | 0.92 |
Smtp | parameter k | 250 | 250 | 250 | 250 | 250 | 250 | 250 | 250 | 250 | 250 | 250 | 250 |
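
The top-m columns of the table above can be read with a small worked sketch. It assumes that Pr and Re denote precision and recall among the m highest-scored objects, and that RP denotes the rank power, n(n+1) / (2 ΣR_i), where R_i are the ranks of the n true outliers found in the top m; this RP definition is an assumption, as the abbreviations are not defined in this excerpt. The function name `top_m_metrics` is hypothetical.

```python
def top_m_metrics(scores, is_outlier, m):
    """Precision, recall and rank power over the m highest-scored objects.

    Assumption: a higher score means "more outlying", and RP is the rank
    power n(n+1) / (2 * sum of the ranks of the true outliers in the top m).
    """
    # Object indices sorted from the highest to the lowest outlier score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # 1-based ranks of the true outliers that appear in the top m.
    hit_ranks = [rank for rank, i in enumerate(order[:m], start=1)
                 if is_outlier[i]]
    n_hits = len(hit_ranks)
    n_outliers = sum(1 for flag in is_outlier if flag)
    precision = n_hits / m
    recall = n_hits / n_outliers if n_outliers else 0.0
    rank_power = n_hits * (n_hits + 1) / (2 * sum(hit_ranks)) if n_hits else 0.0
    return precision, recall, rank_power

# Toy ranking: the true outliers sit at ranks 1 and 3.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
print(top_m_metrics(scores, labels, 3))
```

Under this reading, Pr = Re = 0.30 for a dataset with 10 outliers at Top 10 would correspond to 3 true outliers among the 10 highest-ranked objects.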
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, P.; Wang, T.; Cao, H.; Lu, S. Adaptive Gaussian Kernel-Based Incremental Scheme for Outlier Detection. Electronics 2023, 12, 4571. https://doi.org/10.3390/electronics12224571