Non-Pattern-Based Anomaly Detection in Time-Series
Abstract
1. Introduction
- We provide Non-Pattern-based Anomaly Detection (NP-AD) formalisms from the perspective of a Finite State Machine (FSM), mapping each FSM state to a generic security data collection system, i.e., a Security Information and Event Management (SIEM) system.
- We propose an NP-AD approach in time series. Experiments are conducted in varying environments, using non-seasonal time series with and without anomalies from the Numenta Anomaly Benchmark (NAB) dataset and from the SIEM Splunk Machine Learning Toolkit datasets. The results indicate that NP-AD is achievable in a complex setting across a variety of environments.
- We conduct a comparative analysis of related studies, map the outcome to the proposed study, and identify the limitations of each study. In addition, we give a contextual critical evaluation of NP-AD in time series. The outcome shows that our approach can be easily integrated and generalized in a complex environment, even in the absence of statistical methods.
2. Methodology
3. Background and Related Literature
3.1. Anomalies and Secure Data Gathering
3.2. Time-Series
4. Non-Pattern Anomaly Detection (NP-AD) in Time Series
4.1. High-Level Description of NP-AD in Time Series
4.2. NP-AD Problem Formulation
4.2.1. Finite State Machine
4.3. General NP-AD Formalisms
4.4. NP-AD Requirements and Functions
4.4.1. General Prediction Function
4.4.2. Logistic Function
- Its midpoint (the point where the function equals 0.5) depends on the average of the last N values of the series;
- Its parameter b is inversely proportional to the average of the last N values of the series.
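The two properties listed above can be illustrated with a minimal Python sketch. The exact functional form is not visible in this section, so everything concrete here is an assumption: the function name `adaptive_logistic`, the choice of the window mean itself as the midpoint, the constant `scale` in `b = scale / mean`, and the requirement that the window mean be positive.

```python
import math


def adaptive_logistic(x, recent, scale=1.0):
    """Logistic score whose midpoint and steepness follow a window mean.

    `recent` holds the last N values; `scale` is a hypothetical constant.
    """
    if not recent:
        raise ValueError("recent must contain at least one value")
    mean = sum(recent) / len(recent)
    if mean <= 0:
        raise ValueError("this sketch assumes a positive window mean")
    midpoint = mean       # assumption: the midpoint equals the window mean
    b = scale / mean      # assumption: b is inversely proportional to the mean
    return 1.0 / (1.0 + math.exp(-b * (x - midpoint)))


recent = [2.0, 4.0, 6.0]                    # window mean is 4.0
at_mid = adaptive_logistic(4.0, recent)     # input equal to the midpoint
above = adaptive_logistic(8.0, recent)      # input above the midpoint
print(at_mid, above)
```

With this parameterization the score is 0.5 when the input equals the window mean, and a larger mean flattens the curve because `b` shrinks.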
Algorithm 1: NP-AD in Time-Series
5. Experiment and Results
- Approach 1: General Anomaly Detection. Conduct general anomaly detection in time series against historical data as a step toward NP-AD.
- Approach 2: Non-Pattern Anomaly Detection. Conduct an experiment on more complex, noisy data, focusing on a time series of voltage measurements from a CPU. We use non-seasonal time series with outliers from the Numenta Anomaly Benchmark (NAB) dataset and the SIEM Splunk Machine Learning Toolkit to show anomalies in time series, and we apply NAB to evaluate the detected anomalies.
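As a rough illustration of what checking a time series "against the historical data" can look like, the sketch below flags a reading when it departs from the mean of the preceding window by more than a relative tolerance. This is a generic baseline and not the detection rule used in the experiments. The function name `flag_against_history`, the parameters `window` and `tolerance`, and the sample readings are all hypothetical.

```python
from collections import deque


def flag_against_history(values, window=5, tolerance=0.5):
    """Return one boolean per value: True if it deviates from recent history.

    A value is flagged when it differs from the mean of the previous
    `window` unflagged values by more than `tolerance` times that mean.
    """
    history = deque(maxlen=window)
    flags = []
    for value in values:
        if len(history) == window:
            baseline = sum(history) / window
            is_anomaly = abs(value - baseline) > tolerance * abs(baseline)
        else:
            # Not enough history yet, so nothing is flagged.
            is_anomaly = False
        flags.append(is_anomaly)
        if not is_anomaly:
            # Flagged values are kept out of the baseline so that a spike
            # does not distort the history used for later readings.
            history.append(value)
    return flags


# Made-up voltage-like readings with one spike at index 6.
readings = [1.0, 1.1, 0.9, 1.0, 1.0, 1.05, 3.0, 1.0]
flags = flag_against_history(readings)
print(flags)
```

Excluding flagged values from the history is a design choice of this sketch. It keeps the baseline stable after a spike, at the cost of adapting slowly to a genuine level shift.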
5.1. Approach 1: General Anomaly Detection
5.2. Approach 2: Non-Pattern Anomaly Detection
6. Comparative Analysis
7. Critical Evaluation of NP-AD in Time Series
8. Conclusions and Future Work
9. Raw Data and Sources
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Ahmad, S.; Lavin, A.; Purdy, S.; Agha, Z. Unsupervised real-time anomaly detection for streaming data. Neurocomputing 2017, 262, 134–147.
- Tan, S.C.; Ting, K.M.; Liu, T.F. Fast anomaly detection for streaming data. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Catalonia, Spain, 16–22 July 2011.
- Waite, A. InfoSec Triads: Security/Functionality/Ease-of-Use. Available online: https://blog.infosanity.co.uk/?p=676 (accessed on 13 December 2022).
- Rainie, L.; Anderson, J.; Connolly, J. Cyber Attacks Likely to Increase; Pew Research Center: Washington, DC, USA, 2014.
- Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 2009, 41, 1–58.
- Munir, M.; Siddiqui, S.A.; Dengel, A.; Ahmed, S. DeepAnT: A deep learning approach for unsupervised anomaly detection in time series. IEEE Access 2018, 7, 1991–2005.
- Wei, L.; Kumar, N.; Lolla, V.N.; Keogh, E.J.; Lonardi, S.; Ratanamahatana, C.A. Assumption-Free Anomaly Detection in Time Series. In Proceedings of the SSDBM, Santa Barbara, CA, USA, 27–29 June 2005; Volume 5, pp. 237–242.
- Hindy, H.; Brosset, D.; Bayne, E.; Seeam, A.; Bellekens, X. Improving SIEM for critical SCADA water infrastructures using machine learning. In Computer Security; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–19.
- Di Mauro, M.; Di Sarno, C. Improving SIEM capabilities through an enhanced probe for encrypted Skype traffic detection. J. Inf. Secur. Appl. 2018, 38, 85–95.
- Ren, H.; Xu, B.; Wang, Y.; Yi, C.; Huang, C.; Kou, X.; Xing, T.; Yang, M.; Tong, J.; Zhang, Q. Time-series anomaly detection service at Microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 3009–3017.
- Alkharabsheh, K.; Alawadi, S.; Kebande, V.R.; Crespo, Y.; Fernández-Delgado, M.; Taboada, J.A. A comparison of machine learning algorithms on design smell detection using balanced and imbalanced dataset: A study of God class. Inf. Softw. Technol. 2022, 143, 106736.
- Blázquez-García, A.; Conde, A.; Mori, U.; Lozano, J.A. A review on outlier/anomaly detection in time series data. ACM Comput. Surv. 2021, 54, 1–33.
- Begum, N.; Keogh, E. Rare Pattern Discovery from Time Series. In Proceedings of the Int’l Conference on Very Large Databases (VLDB), Kohala Coast, HI, USA, 31 August–4 September 2015.
- Donald, S.D.; McMillen, R.V.; Ford, D.K.; McEachen, J.C. Therminator 2: A thermodynamics-based method for real-time patternless intrusion detection. In Proceedings of the MILCOM 2002, Anaheim, CA, USA, 7–10 October 2002; IEEE: Piscataway, NJ, USA, 2002; Volume 2, pp. 1498–1502.
- Donald, S.D.; McMillen, R.V.; Ford, D.K.; McEachen, J.C. Modeling Network Conversation Flux for Patternless Intrusion Detection. Available online: https://scholar.google.com.hk/scholar?hl=zh-CN&as_sdt=0%2C5&q=Modeling+network+conversation+flux+for+patternless+intrusion++detection&btnG= (accessed on 13 December 2022).
- Dobashi, K.; Ho, C.P.; Fulford, C.P.; Lin, M.F.G.; Higa, C. Learning pattern classification using Moodle logs and the visualization of browsing processes by time-series cross-section. Comput. Educ. Artif. Intell. 2022, 3, 100105.
- Bollmann, C.A.; Tummala, M.; McEachen, J.C. Resilient real-time network anomaly detection using novel non-parametric statistical tests. Comput. Secur. 2021, 102, 102146.
- Olsavsky, V.L. Implementing a Patternless Intrusion Detection System; A Methodology for Zippo; Technical Report; Naval Postgraduate School: Monterey, CA, USA, 2005.
- Teng, M. Anomaly detection on time series. In Proceedings of the 2010 IEEE International Conference on Progress in Informatics and Computing, Shanghai, China, 10–12 December 2010; IEEE: Piscataway, NJ, USA, 2010; Volume 1, pp. 603–608.
- Malhotra, P.; Vig, L.; Shroff, G.; Agarwal, P. Long short term memory networks for anomaly detection in time series. In Proceedings of the ESANN, Bruges, Belgium, 22–23 April 2015; Volume 89, pp. 89–94.
- Basu, S.; Meckesheimer, M. Automatic outlier detection for time series: An application to sensor data. Knowl. Inf. Syst. 2007, 11, 137–154.
- Chuah, M.C.; Fu, F. ECG anomaly detection via time series analysis. In Proceedings of the International Symposium on Parallel and Distributed Processing and Applications, Niagara Falls, Canada, 29–31 August 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 123–135.
- Williams, C. Research methods. J. Bus. Econ. Res. 2007, 5, 65–72.
- Patten, M.L. Understanding Research Methods: An Overview of the Essentials; Routledge: Abingdon, UK, 2017.
- McNeill, P. Research Methods; Routledge: Abingdon, UK, 2006.
- Hawkins, D.M. Identification of Outliers; Springer: Berlin/Heidelberg, Germany, 1980; Volume 11.
- Barnett, V.; Lewis, T. Outliers in statistical data. Applied Probability and Statistics; Wiley Series in Probability and Mathematical Statistics; Wiley: New York, NY, USA, 1984.
- Ahmed, M.; Mahmood, A.N.; Hu, J. A survey of network anomaly detection techniques. J. Netw. Comput. Appl. 2016, 60, 19–31.
- Ahmed, M.; Mahmood, A.N. Novel approach for network traffic pattern analysis using clustering-based collective anomaly detection. Ann. Data Sci. 2015, 2, 111–130.
- Zimek, A.; Schubert, E.; Kriegel, H.P. A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Mining ASA Data Sci. J. 2012, 5, 363–387.
- Pimentel, M.A.; Clifton, D.A.; Clifton, L.; Tarassenko, L. A review of novelty detection. Signal Process. 2014, 99, 215–249.
- Markou, M.; Singh, S. Novelty detection: A review—Part 2: Neural network based approaches. Signal Process. 2003, 83, 2499–2521.
- González-Granadillo, G.; González-Zarzosa, S.; Diaz, R. Security information and event management (SIEM): Analysis, trends, and usage in critical infrastructures. Sensors 2021, 21, 4759.
- Carasso, D. Exploring Splunk; CITO Research: New York, NY, USA, 2012.
- Fedorov, M.; Adams, P.; Brunton, G.; Fishler, B.; Flegel, M.; Wilhelmsen, K.; Wilson, R. Leveraging Splunk for Control System Monitoring and Management; Technical Report; Lawrence Livermore National Lab. (LLNL): Livermore, CA, USA, 2017.
- Sigman, B.P.; Delgado, E. Splunk Essentials; Packt Publishing Ltd.: Birmingham, UK, 2016.
- Parzen, E. An approach to time series analysis. Ann. Math. Stat. 1961, 32, 951–989.
- Cryer, J.D. Time Series Analysis; Springer: Berlin/Heidelberg, Germany, 1986; Volume 286.
- Gladyshev, P.; Patel, A. Finite state machine approach to digital event reconstruction. Digit. Investig. 2004, 1, 130–149.
- Kebande, V.R.; Choo, K.K.R. Finite state machine for cloud forensic readiness as a service (CFRaaS) events. Secur. Priv. 2022, 5, e182.
- Pan, J.X.; Fang, K.T. Maximum likelihood estimation. In Growth Curve Models and Statistical Diagnostics; Springer: Berlin/Heidelberg, Germany, 2002; pp. 77–158.
- Aue, A.; Norinho, D.D.; Hörmann, S. On the prediction of functional time series. arXiv 2012, arXiv:1208.2892.
- Bercu, S.; Proïa, F. A SARIMAX coupled modelling applied to individual load curves intraday forecasting. J. Appl. Stat. 2013, 40, 1333–1348.
- Vagropoulos, S.I.; Chouliaras, G.; Kardakos, E.G.; Simoglou, C.K.; Bakirtzis, A.G. Comparison of SARIMAX, SARIMA, modified SARIMA and ANN-based models for short-term PV generation forecasting. In Proceedings of the 2016 IEEE International Energy Conference (ENERGYCON), Leuven, Belgium, 4–8 April 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6.
- Tarsitano, A.; Amerise, I.L. Short-term load forecasting using a two-stage SARIMAX model. Energy 2017, 133, 108–114.
- Choi, T.M.; Yu, Y.; Au, K.F. A hybrid SARIMA wavelet transform method for sales forecasting. Decis. Support Syst. 2011, 51, 130–140.
- Molan, M.; Borghesi, A.; Cesarini, D.; Benini, L.; Bartolini, A. RUAD: Unsupervised anomaly detection in HPC systems. Future Gener. Comput. Syst. 2023, 141, 542–554.
- Venkataramanan, S.; Peng, K.C.; Singh, R.V.; Mahalanobis, A. Attention guided anomaly localization in images. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 485–503.
- Kebande, V.R.; Alawadi, S.; Awaysheh, F.M.; Persson, J.A. Active machine learning adversarial attack detection in the user feedback process. IEEE Access 2021, 9, 36908–36923.
- Shin, Y.; Kim, K. Comparison of anomaly detection accuracy of host-based intrusion detection systems based on different machine learning algorithms. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 252–259.
- Park, S.; Choi, J.Y. Hierarchical anomaly detection model for in-vehicle networks using machine learning algorithms. Sensors 2020, 20, 3934.
- Escalante, H.J. A comparison of outlier detection algorithms for machine learning. In Proceedings of the International Conference on Communications in Computing, Las Vegas, NV, USA, 27–30 June 2005; pp. 228–237.
- Nawir, M.; Amir, A.; Lynn, O.B.; Yaakob, N.; Ahmad, R.B. Performances of machine learning algorithms for binary classification of network anomaly detection system. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2018; Volume 1018, p. 012015.
- Lipton, Z.C.; Elkan, C.; Narayanaswamy, B. Thresholding classifiers to maximize F1 score. arXiv 2014, arXiv:1402.1892.
- Narkhede, S. Understanding AUC-ROC curve. Towards Data Sci. 2018, 26, 220–227.
No | Task | Description |
---|---|---|
1 | Gathering | Collecting, processing, and analyzing security events that come into the system from many diverse sources |
2 | Detection | Real-time detection of attacks and violations of security criteria and policies |
3 | Assessment | Prompt assessment of the security of information, telecommunications, and other critical resources |
4 | Risk | Security risk analysis and management |
5 | Investigation | Conducting investigations into incidents |
6 | Security | Making effective decisions to protect information |
7 | Reporting | Reporting documents |
No | Tuple | Representation |
---|---|---|
1 | Q | Finite nonempty set of states |
2 | Σ | Set of input symbols |
3 | δ | State transition function |
4 | q₀ | The initial state |
5 | F | Set of end states |
Tuple | SIEM Representation |
---|---|
Q | Number of events in the SIEM system |
Σ | Number of features in SIEM logs |
δ | Number of functions in the SIEM system |
q₀ | Subset of the flow of data from SIEM systems |
F | Subset of the flow of data from SIEM systems |
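To make the five-tuple concrete, the following Python sketch encodes a finite state machine with a set of states, a set of input symbols, a transition function, an initial state, and a set of end states. The class name, the state and symbol names (`idle`, `collecting`, `alerting`, `log_event`, `anomaly`, `reset`), and the rule that an undefined transition leaves the state unchanged are illustrative assumptions. They do not reproduce the SIEM mapping tabulated above.

```python
from dataclasses import dataclass


@dataclass
class FiniteStateMachine:
    states: frozenset     # finite nonempty set of states
    alphabet: frozenset   # set of input symbols
    transitions: dict     # transition function: {(state, symbol): next_state}
    initial: str          # the initial state
    finals: frozenset     # set of end states

    def __post_init__(self):
        if not self.states:
            raise ValueError("the set of states must be nonempty")
        if self.initial not in self.states:
            raise ValueError("the initial state must be one of the states")
        if not self.finals <= self.states:
            raise ValueError("end states must be a subset of the states")

    def run(self, symbols):
        """Consume the symbols in order and return the resulting state."""
        state = self.initial
        for symbol in symbols:
            if symbol not in self.alphabet:
                raise ValueError(f"unknown input symbol: {symbol!r}")
            # Assumption: an undefined (state, symbol) pair keeps the state.
            state = self.transitions.get((state, symbol), state)
        return state

    def accepts(self, symbols):
        """Return True if the run finishes in one of the end states."""
        return self.run(symbols) in self.finals


# Hypothetical SIEM-flavoured machine: events are collected until an
# anomaly moves the machine into its single end state.
siem_fsm = FiniteStateMachine(
    states=frozenset({"idle", "collecting", "alerting"}),
    alphabet=frozenset({"log_event", "anomaly", "reset"}),
    transitions={
        ("idle", "log_event"): "collecting",
        ("collecting", "anomaly"): "alerting",
        ("alerting", "reset"): "idle",
    },
    initial="idle",
    finals=frozenset({"alerting"}),
)

print(siem_fsm.accepts(["log_event", "anomaly"]))
print(siem_fsm.accepts(["log_event", "log_event"]))
```

Storing the transition function as a dictionary keyed by (state, symbol) pairs keeps the sketch close to the tuple notation. A real mapping would replace the placeholder states with the SIEM-specific sets.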
Ref. | Focus | Approach | Limitations |
---|---|---|---|
[12] | Anomaly detection in time-series data | Focused on the unsupervised environment and taxonomy for outlier detection | Not oriented toward patternless detection |
[13] | Rare pattern discovery in time-series | Using repeated sub-sequences in time-series | Does not demonstrate utility on a wide range of real-life datasets |
[14,15] | Real-time patternless IDS | Non-signature-based/patternless IDS based on thermodynamics | Hardly focuses on a time-series approach |
[16] | Learning patterns over time-series | Pattern classifications using logs and processes over time-series | Generic and limited to ordinary learning patterns |
[17] | Resilient real-time anomaly | Uses non-parametric statistical test | Works only on single features and is not oriented toward patternless detection over time series |
[18] | Patternless IDS | Leverages ZIPPO | Time-series approach is hardly observed |
[19] | Anomaly detection in time series | Detecting abnormalities in time series based on distance computation | The accuracy and efficiency of the proposed algorithm still need to be verified |
[20] | Anomaly detection with LSTM networks | LSTM network trained on non-anomalous data and modeled with a Gaussian distribution | It is not known whether normal behavior included long-term dependencies |
[6] | DeepAnT unsupervised anomaly | Leverages deep learning in unsupervised anomaly detection in time series | Uncertain with respect to adversarial approaches; the impact on time-series forecasting still needs to be evaluated |
[21] | Automatic outlier detection in time-series | Detecting unusual values from time-series where data are hard to model | Requires knowledge about the signal |
[22] | ECG anomaly detection in time series | Leverages ZIPPO | Time-series approach is hardly observed |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tkach, V.; Kudin, A.; Kebande, V.R.; Baranovskyi, O.; Kudin, I. Non-Pattern-Based Anomaly Detection in Time-Series. Electronics 2023, 12, 721. https://doi.org/10.3390/electronics12030721