Deep-Shallow Metaclassifier with Synthetic Minority Oversampling for Anomaly Detection in a Time Series
Abstract
1. Introduction
- Pipeline leak detection identifies losses of product due to a physical leak in a pipeline; the product may be one of many types of liquids or gases, and is sometimes dangerous or toxic.
  - Water-main pipeline leak-detection equipment revenues were $1.5 billion USD in 2020 and are expected to grow to $2.5 billion by 2028 (https://www.reportlinker.com/p06063427/Water-Pipeline-Leak-Detection-System-Market-Forecast-to-COVID-19-Impact-and-Global-Analysis-By-Offering-Equipment-Type-Pipe-Type-and-End-User-and-Geography.html (accessed on 29 January 2024)).
  - Oil and gas pipeline leak-detection equipment revenues were $2.1 billion USD in 2020 and are likely to grow to over $2.8 billion by 2027 (https://www.industryresearch.co/enquiry/request-sample/18445354 (accessed on 29 January 2024)).
- Patient-Ventilator Asynchrony (PVA) detection involves recognizing a discordance between a mechanical ventilator’s operation and the breathing reflex of a patient. If not addressed correctly, this discordance can exacerbate distress or even cause major injury; the worst case is a life-threatening pneumothorax (collapsed lung) [3,4].
2. Background
2.1. Deep Anomaly Detection
2.2. Learning from Imbalanced Data
2.3. Pipeline Leak Detection
2.4. Patient-Ventilator Asynchrony Detection
Mechanical Ventilation
3. Related Work
3.1. Anomaly Detection
3.2. Sample Selection Bias Mitigation in Time Series
3.3. Patient-Ventilator Asynchrony Detection
4. Proposed Anomaly Detector
4.1. Normal Model
4.2. Anomaly Detector
4.3. Oversampling for Reducing False Positives
5. Experimental Methodology
5.1. Pipeline Leak Detection
5.1.1. Datasets
5.1.2. Methodology
5.2. PVA Detection
5.2.1. Dataset
5.2.2. Methodology
6. Results and Discussion
6.1. Pipeline Leak Detection
6.2. PVA Detection
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Pang, G.; Shen, C.; Cao, L.; Hengel, A.v.d. Deep learning for anomaly detection: A review. arXiv 2020, arXiv:2007.02500.
2. Hawkins, D.M. Identification of Outliers; Springer: Berlin, Germany, 1980.
3. Blanch, L.; Villagra, A.; Sales, B.; Montanya, J.; Lucangelo, U.; Luján, M.; García-Esquirol, O.; Chacón, E.; Estruga, A.; Oliva, J.C.; et al. Asynchronies during mechanical ventilation are associated with mortality. Intensive Care Med. 2015, 41, 633–641.
4. Slutsky, A.S.; Ranieri, V.M. Ventilator Induced Lung Injury. N. Engl. J. Med. 2013, 369, 2126–2136.
5. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 1–58.
6. Erfani, S.M.; Rajasegarar, S.; Karunasekera, S.; Leckie, C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognit. 2016, 58, 121–134.
7. Chalapathy, R.; Chawla, S. Deep learning for anomaly detection: A survey. arXiv 2019, arXiv:1901.03407.
8. Gamboa, J.C.B. Deep Learning for Time-Series Analysis. arXiv 2017, arXiv:1701.01887. Available online: https://arxiv.org/abs/1701.01887 (accessed on 27 October 2021).
9. Fawaz, H.I.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.-A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963.
10. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436.
11. Nakkiran, P.; Kaplun, G.; Bansal, Y.; Yang, T.; Barak, B.; Sutskever, I. Deep double descent: Where bigger models and more data hurt. J. Stat. Mech. Theory Exp. 2021, 124003.
12. D’Amour, A.; Heller, K.; Moldovan, D.; Adlam, B.; Alipanahi, B.; Beutel, A.; Chen, C.; Deaton, J.; Eisenstein, J.; Hoffman, M.D.; et al. Underspecification Presents Challenges for Credibility in Modern Machine Learning. arXiv 2021, arXiv:2011.03395. Available online: https://arxiv.org/abs/2011.03395 (accessed on 1 October 2022).
13. Ling, C.X.; Sheng, V.S. Cost-sensitive learning and the class imbalance problem. In Encyclopedia of Machine Learning; Springer: New York, NY, USA, 2008.
14. Monard, M.C.; Batista, G. Learning with skewed class distributions. In Advances in Logic, Artificial Intelligence and Robotics; IOS Press: Amsterdam, The Netherlands, 2002; pp. 173–180.
15. Fan, W.; Davidson, I.; Zadrozny, B.; Yu, P.S. An improved categorization of classifier’s sensitivity on sample selection bias. In Proceedings of the IEEE International Conference Data Mining, Houston, TX, USA, 27–30 November 2005; pp. 605–608.
16. Provost, F.; Fawcett, T. Robust classification for imprecise environments. Mach. Learn. 2001, 42, 203–231.
17. Raskutti, B. Extreme Re-balancing for SVM’s: A case study. In Proceedings of the ICML-KDD’2003 Workshop: Learning from Imbalanced Data Sets, Washington, DC, USA, 21 August 2003.
18. Greene, W.H.; Zhang, C. Econometric Analysis; Prentice Hall: Upper Saddle River, NJ, USA, 2003; Volume 5.
19. Ahumada, H.; Grinblat, G.L.; Uzal, L.C.; Granitto, P.M.; Ceccatto, A. REPMAC: A new hybrid approach to highly imbalanced classification problems. In Proceedings of the 2008 Eighth International Conference on Hybrid Intelligent Systems, Barcelona, Spain, 10–12 September 2008; pp. 386–391.
20. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
21. Batista, G.; Prati, R.; Monard, M.C. A study of the Behavior of Several Methods for Balancing Machine Learning Training Data. SIGKDD Explor. 2004, 6, 20–29.
22. Han, H.; Wang, W.-Y.; Mao, B.-H. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. Lect. Notes Comput. Sci. 2005, 3644, 878–887.
23. Chawla, N.V.; Cieslak, D.A.; Hall, L.O.; Joshi, A. Automatically countering imbalance and its empirical relationship to cost. Data Min. Knowl. Discov. 2008, 17, 225–252.
24. García, V.; Sánchez, J.S.; Mollineda, R.A. On the use of surrounding neighbors for synthetic over-sampling of the minority class. In Proceedings of the 8th Conference Simulation, Modelling and Optimization, Santander, Cantabria, Spain, 23–25 September 2008; World Scientific and Engineering Academy and Society (WSEAS): Stevens Point, WI, USA, 2008.
25. Domingos, P. MetaCost: A General Method for Making Classifiers Cost-Sensitive. In Proceedings of the Knowledge Discovery and Data Mining, San Diego, CA, USA, 15–18 August 1999.
26. Thai-Nghe, N.; Gantner, Z.; Schmidt-Thieme, L. Cost-sensitive learning methods for imbalanced data. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; p. 8.
27. Karangwa, E. Estimating the Cost of Pipeline Transportation in Canada. Available online: http://ctrf.ca/wp-content/uploads/2014/07/Karangwa2008.pdf (accessed on 23 March 2020).
28. INGAA. SAFETY Every Step of the Way. Available online: http://www.ingaa.org/File.aspx?id=12282 (accessed on 23 March 2020).
29. Belvederesi, C.; Thompson, M.S.; Komers, P.E. Statistical analysis of environmental consequences of hazardous liquid pipeline accidents. Heliyon 2018, 4, 19.
30. Computational Pipeline Monitoring for Liquids; American Petroleum Institute: Washington, DC, USA, 2017.
31. Mannan, S. Lees’ Loss Prevention in the Process Industries: Hazard Identification, Assessment and Control; Butterworth-Heinemann: Oxford, UK, 2012; Volume 2.
32. Angelov, P.; Kordon, A. Adaptive inferential sensors based on evolving fuzzy models. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2009, 40, 529–539.
33. Rashid, S.; Akram, U.; Qaisar, S.; Khan, S.A.; Felemban, E. Wireless sensor network for distributed event detection based on machine learning. In Proceedings of the IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing, Taipei, Taiwan, 1–3 September 2014; pp. 540–545.
34. Milner, M.; Dick, S. Pipeline Leak Detection via Machine Learning. Pipeline Technol. J. 2019, 2019, 14–21.
35. Staff. Cerebral Hypoxia. Available online: https://medlineplus.gov/ency/article/001435.htm (accessed on 28 October 2021).
36. Burri, P.H.; Siebens, A.A.; Weibel, E.R.; Heath, D.A.; Elliott, D.H.; Klocke, R.A.; Cherniack, N.S.; Beers, M.F. Human respiratory system. In Encyclopedia Britannica; Encyclopædia Britannica, Inc.: Chicago, IL, USA, 2020.
37. Walker, C. Just Breathe: Breathing Techniques for Your Exercise. 2013. Available online: https://www.fitness19.com/just-breathe-breathing-techniques-for-your-exercise/ (accessed on 31 March 2021).
38. Emrath, E. The basics of ventilator waveforms. Curr. Pediatr. Rep. 2021, 9, 11–19.
39. Rehm, G.; Han, J.; Kuhn, B.; Delplanque, J.; Anderson, N.; Adams, J.; Chuah, C. Creation of a robust and generalizable machine learning classifier for patient ventilator asynchrony. Methods Inf. Med. 2018, 57, 208–219.
40. Imhoff, M.; Kuhls, S. Alarm Algorithms in Critical Care Monitoring. Anesth. Analg. 2006, 102, 1525–1537.
41. Koski, E.M.J.; Mäkivirta, A.; Sukuvaara, T.; Kari, A. Clinicians’ opinions on alarm limits and urgency of therapeutic responses. J. Clin. Monit. Comput. 1995, 12, 85–88.
42. Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep one-class classification. Proc. Mach. Learn. Res. 2018, 80, 4393–4402.
43. Chalapathy, R.; Menon, A.K.; Chawla, S. Anomaly detection using one-class neural networks. arXiv 2018, arXiv:1802.06360.
44. Zheng, P.; Yuan, S.; Wu, X.; Li, J.; Lu, A. One-class adversarial nets for fraud detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019.
45. Dai, Z.; Yang, Z.; Yang, F.; Cohen, W.W.; Salakhutdinov, R.R. Good semi-supervised learning that requires a bad GAN. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
46. Goldstein, M.; Uchida, S. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE 2016, 11, e0152173.
47. Sugiyama, M.; Nakajima, S.; Kashima, H.; von Bunau, P.; Kawanabe, M. Direct importance estimation with model selection and its application to covariate shift adaptation. In Proceedings of the Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–11 December 2008.
48. Pelayo, L.; Dick, S. Synthetic minority oversampling for function approximation problems. Int. J. Intell. Syst. 2019, 34, 2741–2768.
49. Wen, Q.; Sun, L.; Yang, F.; Song, X.; Gao, J.; Wang, X.; Xu, H. Time Series Data Augmentation for Deep Learning: A Survey. In Proceedings of the IJCAI 2021, Online, 19–26 August 2021.
50. de la Cal, E.; Villar, J.R.; Vergara, P.; Sedano, J.; Herrero, A. A SMOTE Extension for Balancing Multivariate Epilepsy-Related Time Series Datasets. Adv. Intell. Syst. Comput. 2018, 649, 439–448.
51. Moniz, N.; Branco, P.; Torgo, L. Resampling strategies for imbalanced time series forecasting. Int. J. Data Sci. Anal. 2017, 3, 161–181.
52. Wu, Y.; Ding, Y.; Feng, J. SMOTE-Boost-based sparse Bayesian model for flood prediction. EURASIP J. Wirel. Comm. Net. 2020, 2020, 78.
53. Chollet, F. Deep Learning with Python; Manning Pub. Co.: Shelter Island, NY, USA, 2018.
54. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980; Springer: Berlin/Heidelberg, Germany, 1981; pp. 366–381.
55. Haykin, S. Neural Networks and Learning Machines, 3rd ed.; Pearson Education, Inc.: Upper Saddle River, NJ, USA, 2009.
56. Gholami, B.; Phan, T.S.; Haddad, W.M.; Cason, A.; Mullis, J.; Price, L.; Bailey, J.M. Replicating human expertise of mechanical ventilation waveform analysis in detecting patient-ventilator cycling asynchrony using machine learning. Comput. Biol. Med. 2018, 97, 137–144.
57. Pan, Q.; Zhang, L.; Jia, M.; Pan, J.; Gong, Q.; Lu, Y.; Zhang, Z.; Ge, H.; Fang, L. An interpretable 1D convolutional neural network for detecting patient-ventilator asynchrony in mechanical ventilation. Comput. Methods Programs Biomed. 2021, 204, 106057.
58. Zhang, L.; Mao, K.; Duan, K.; Fang, S.; Lu, Y.; Gong, Q.; Lu, F.; Jiang, Y.; Jiang, L.; Fang, W.; et al. Detection of patient-ventilator asynchrony from mechanical ventilation waveforms using a two-layer long short-term memory neural network. Comput. Biol. Med. 2020, 120, 103721.
59. Mills, T.C. Time Series Techniques for Economists; Cambridge University Press: Cambridge, UK, 1990.
60. Kantz, H.; Schreiber, T. Nonlinear Time Series Analysis; Cambridge University Press: Cambridge, UK, 2004; Volume 7.
61. Scholkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 2001, 13, 1443–1471.
62. Barrios, J. Pipeline Leak Detection Techniques and Systems: Comparative Assessment of Pipeline Leak Detection Methods. In Mechanical Engineering; University of Alberta: Edmonton, AB, Canada, 2019.
63. Liu, F.T.; Ting, K.M.; Zhou, Z.-H. Isolation forest. In Proceedings of the ICDM, Pisa, Italy, 15–19 December 2008; pp. 770–778.
64. Adams, J.Y.; Lieng, M.K.; Kuhn, B.T.; Rehm, G.B.; Guo, E.C.; Taylor, S.L.; Delplanque, J.-P.; Anderson, N.R. Development and validation of a multi-algorithm analytic platform to detect off-target mechanical ventilation. Sci. Rep. 2017, 7, 14980.
65. Chung, J.; Gulcehre, C.; Cho, K.-H.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In Proceedings of the NIPS Workshop on Deep Learning and Representation Learning, Montreal, QC, Canada, 12 December 2014; p. 9.
66. Gal, Y.; Ghahramani, Z. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In Proceedings of the NIPS, Barcelona, Spain, 5–10 December 2016; p. 9.
67. Demsar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
68. Cliff, N. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychol. Bull. 1993, 114, 494–509.
69. Chong, T.C.; Loo, N.L.; Chiew, Y.S.; Mat-Nor, M.B.; Ralib, A.M. Classification Patient-Ventilator Asynchrony with Dual-Input Convolutional Neural Network. IFAC-Pap. 2021, 54, 322–327.

Model | RMSE | Training Time |
---|---|---|
1-D CNN | 0.01 | 30 s |
LSTM | 0.01 | 600 s |

Model | RMSE | Training Time |
---|---|---|
1-D CNN | 0.09 | 70 s |
LSTM | 0.09 | 1200 s |

Model | Accuracy | TPR | FPR |
---|---|---|---|
Isolation Forest | 0.9956 | 0.9945 | 0.0003 |
OC-SVM | 0.9955 | 0.9946 | 0.0014 |

Model | Accuracy | TPR | FPR |
---|---|---|---|
Isolation Forest | 0.9997 | 0.9999 | 0.0006 |
OC-SVM | 0.9999 | 1.0 | 0.0006 |

Model | Accuracy | TPR | FPR |
---|---|---|---|
Isolation Forest | 0.9968 | 0.9962 | 0.0011 |
OC-SVM | 0.9966 | 0.9966 | 0.0036 |

Model | Accuracy | TPR | FPR |
---|---|---|---|
Isolation Forest | 0.9994 | 1.0 | 0.0014 |
OC-SVM | 0.9997 | 1.0 | 0.0007 |

Model | Accuracy | TPR | FPR |
---|---|---|---|
Isolation Forest | 0.9995 | 1.0 | 0.0014 |
OC-SVM | 0.9984 | 0.999 | 0.0042 |

Model | Accuracy | TPR | FPR |
---|---|---|---|
Isolation Forest (proposed) | 0.9959 | 0.9949 | 0.0006 |
OC-SVM (proposed) | 0.9927 | 0.9911 | 0.0012 |
Soft-bound Deep SVDD | 0.9859 | 0.9885 | 0.0239 |
One-class Deep SVDD | 0.9826 | 0.9843 | 0.0239 |
OC-NN | 0.9923 | 0.9914 | 0.0047 |
OCAN | 0.7826 | 0.6859 | 0.1208 |

Model | Accuracy | TPR | FPR |
---|---|---|---|
Isolation Forest (proposed) | 0.9995 | 1.0 | 0.0014 |
OC-SVM (proposed) | 0.9990 | 0.9999 | 0.0040 |
Soft-bound Deep SVDD | 0.9981 | 1.0 | 0.0080 |
One-class Deep SVDD | 0.9989 | 1.0 | 0.0043 |
OC-NN | 0.9780 | 0.9780 | 0.0222 |
OCAN | 0.9990 | 1.0 | 0.0041 |

Metric | GRU | Dropout GRU | Bidirectional | 1D CNN |
---|---|---|---|---|
Test error (RMSE) | 0.1691 | 0.2406 | 0.1697 | 0.0970 |

Metric | OC-SVM | OC-SVM with Resampling |
---|---|---|
Sensitivity | 0.8103 | 0.9729 |
Specificity | 0.9433 | 0.9519 |
Accuracy | 0.8763 | 0.9624 |
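
The result tables report Accuracy, TPR, and FPR for the pipeline experiments and Sensitivity, Specificity, and Accuracy for PVA detection. The two vocabularies describe the same quantities: sensitivity is the TPR, and specificity is 1 − FPR. The following is a minimal sketch of those standard definitions, assuming the anomaly class is treated as the positive class; the names `ConfusionCounts` and `detection_metrics` are hypothetical and the counts in the usage lines are made up for illustration, not taken from the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts (hypothetical helper; anomaly = positive class)."""
    tp: int  # anomalies correctly flagged
    fn: int  # anomalies missed
    fp: int  # normal samples wrongly flagged
    tn: int  # normal samples correctly passed


def detection_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, TPR (sensitivity), FPR, and specificity."""
    positives = c.tp + c.fn
    negatives = c.fp + c.tn
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    tpr = c.tp / positives
    fpr = c.fp / negatives
    return {
        "accuracy": (c.tp + c.tn) / (positives + negatives),
        "tpr": tpr,               # also called sensitivity or recall
        "fpr": fpr,
        "specificity": 1.0 - fpr,
    }


# Illustrative counts only: 100 anomalous and 900 normal samples.
metrics = detection_metrics(ConfusionCounts(tp=90, fn=10, fp=5, tn=895))
print(metrics)
```

On heavily imbalanced data, accuracy is dominated by the majority class, so TPR and FPR (or sensitivity and specificity) are usually the more informative columns to compare.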
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Reshadi, M.; Li, W.; Xu, W.; Omashor, P.; Dinh, A.; Xiao, J.; Dick, S.; She, Y.; Lipsett, M. Deep-Shallow Metaclassifier with Synthetic Minority Oversampling for Anomaly Detection in a Time Series. Algorithms 2024, 17, 114. https://doi.org/10.3390/a17030114