A Novel Time-Series Transformation and Machine-Learning-Based Method for NTL Fraud Detection in Utility Companies
Abstract
1. Introduction
2. Method
2.1. Feature Engineering
2.1.1. Sudden Jump Detection in the Smart Meter Usage Using the Sum of Finite Differences
2.1.2. ARIMA, Holt-Winters, and Other Smart Meter Readings Feature Extraction
2.2. Machine Learning
2.2.1. Hyper-Parameters Tuning
2.2.2. Distributed Random Forest
3. Experimental Evaluation
3.1. Materials
3.2. Performance Measures
4. Results and Discussion
4.1. Impact of the Number of Trees
4.2. Computational Burden
4.3. Comparison with Other Methods
4.4. When to Retrain the Model
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Zheng, Z.; Yang, Y.; Niu, X.; Dai, H.N.; Zhou, Y. Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids. IEEE Trans. Ind. Inform. 2018, 14, 1606–1615.
2. Hasan, M.; Toma, R.N.; Nahid, A.A.; Islam, M.; Kim, J.M. Electricity theft detection in smart grid systems: A CNN-LSTM based approach. Energies 2019, 12, 3310.
3. Nagi, J.; Yap, K.S.; Tiong, S.K.; Ahmed, S.K.; Mohamad, M. Nontechnical loss detection for metered customers in power utility using support vector machines. IEEE Trans. Power Deliv. 2009, 25, 1162–1171.
4. Glauner, P.; Meira, J.A.; Valtchev, P.; State, R.; Bettinger, F. The challenge of non-technical loss detection using artificial intelligence: A survey. arXiv 2016, arXiv:1606.00626.
5. Xia, X.; Xiao, Y.; Liang, W. SAI: A suspicion assessment-based inspection algorithm to detect malicious users in smart grid. IEEE Trans. Inf. Forensics Secur. 2019, 15, 361–374.
6. Viegas, J.L.; Esteves, P.R.; Melício, R.; Mendes, V.; Vieira, S.M. Solutions for detection of non-technical losses in the electricity grid: A review. Renew. Sustain. Energy Rev. 2017, 80, 1256–1268.
7. Saeed, M.S.; Mustafa, M.W.; Hamadneh, N.N.; Alshammari, N.A.; Sheikh, U.U.; Jumani, T.A.; Khalid, S.B.A.; Khan, I. Detection of non-technical losses in power utilities—A comprehensive systematic review. Energies 2020, 13, 4727.
8. Messinis, G.M.; Hatziargyriou, N.D. Review of non-technical loss detection methods. Electr. Power Syst. Res. 2018, 158, 250–266.
9. Tariq, M.; Poor, H.V. Electricity theft detection and localization in grid-tied microgrids. IEEE Trans. Smart Grid 2016, 9, 1920–1929.
10. Ferreira, T.S.D.; Trindade, F.C.; Vieira, J.C. Load flow-based method for nontechnical electrical loss detection and location in distribution systems using smart meters. IEEE Trans. Power Syst. 2020, 35, 3671–3681.
11. Chen, L.; Xu, X.; Wang, C. Research on anti-electricity stealing method based on state estimation. In Proceedings of the 2011 IEEE Power Engineering and Automation Conference, Wuhan, China, 8–9 September 2011; Volume 2, pp. 413–416.
12. McLaughlin, S.; Holbert, B.; Fawaz, A.; Berthier, R.; Zonouz, S. A multi-sensor energy theft detection framework for advanced metering infrastructures. IEEE J. Sel. Areas Commun. 2013, 31, 1319–1330.
13. Xiao, Z.; Xiao, Y.; Du, D.H.C. Exploring malicious meter inspection in neighborhood area smart grids. IEEE Trans. Smart Grid 2012, 4, 214–226.
14. Jokar, P.; Arianpoo, N.; Leung, V.C. Electricity theft detection in AMI using customers’ consumption patterns. IEEE Trans. Smart Grid 2015, 7, 216–226.
15. Guo, Y.; Ten, C.W.; Jirutitijaroen, P. Online data validation for distribution operations against cyber tampering. IEEE Trans. Power Syst. 2013, 29, 550–560.
16. Angelos, E.W.S.; Saavedra, O.R.; Cortés, O.A.C.; De Souza, A.N. Detection and identification of abnormalities in customer consumptions in power distribution systems. IEEE Trans. Power Deliv. 2011, 26, 2436–2442.
17. Zheng, K.; Chen, Q.; Wang, Y.; Kang, C.; Xia, Q. A novel combined data-driven approach for electricity theft detection. IEEE Trans. Ind. Inform. 2018, 15, 1809–1819.
18. Badrinath Krishna, V.; Iyer, R.K.; Sanders, W.H. ARIMA-based modeling and validation of consumption readings in power grids. In Proceedings of the International Conference on Critical Information Infrastructures Security, Berlin, Germany, 5–7 October 2015; Springer: Cham, Switzerland, 2015; pp. 199–210.
19. Nagi, J.; Mohammad, A.; Yap, K.S.; Tiong, S.K.; Ahmed, S.K. Non-technical loss analysis for detection of electricity theft using support vector machines. In Proceedings of the 2008 IEEE 2nd International Power and Energy Conference, Johor Bahru, Malaysia, 1–3 December 2008; pp. 907–912.
20. Ramos, C.C.O.; De Souza, A.N.; Gastaldello, D.S.; Papa, J.P. Identification and feature selection of non-technical losses for industrial consumers using the software weka. In Proceedings of the 2012 10th IEEE/IAS International Conference on Industry Applications, Fortaleza, Brazil, 5–7 November 2012; pp. 1–6.
21. Khan, Z.A.; Adil, M.; Javaid, N.; Saqib, M.N.; Shafiq, M.; Choi, J.G. Electricity theft detection using supervised learning techniques on smart meter data. Sustainability 2020, 12, 8023.
22. Khan, I.U.; Javaid, N.; Taylor, C.J.; Gamage, K.A.; Ma, X. Big Data Analytics for Electricity Theft Detection in Smart Grids. In Proceedings of the 2021 IEEE Madrid PowerTech, Madrid, Spain, 28 June–2 July 2021; pp. 1–6.
23. Nizar, A.; Dong, Z.; Wang, Y. Power utility nontechnical loss analysis with extreme learning machine method. IEEE Trans. Power Syst. 2008, 23, 946–955.
24. Costa, B.C.; Alberto, B.L.; Portela, A.M.; Maduro, W.; Eler, E.O. Fraud detection in electric power distribution networks using an ann-based knowledge-discovery process. Int. J. Artif. Intell. Appl. 2013, 4, 17.
25. Hussain, S.; Mustafa, M.W.; Jumani, T.A.; Baloch, S.K.; Alotaibi, H.; Khan, I.; Khan, A. A novel feature engineered-CatBoost-based supervised machine learning framework for electricity theft detection. Energy Rep. 2021, 7, 4425–4436.
26. Hussain, S.; Mustafa, M.W.; Ateyeh Al-Shqeerat, K.H.; Saeed, F.; Al-Rimy, B.A.S. A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data. Sensors 2021, 21, 8423.
27. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
28. Bohani, F.A.; Suliman, A.; Saripuddin, M.; Sameon, S.S.; Md Salleh, N.S.; Nazeri, S. A comprehensive analysis of supervised learning techniques for electricity theft detection. J. Electr. Comput. Eng. 2021, 2021, 9136206.
29. Badawi, S.A.; Fraz, M.M. Optimizing the trainable B-COSFIRE filter for retinal blood vessel segmentation. PeerJ 2018, 6, e5855.
30. Mujeeb, S.; Javaid, N.; Khalid, R.; Imran, M.; Naseer, N. DE-RUSBoost: An efficient electricity theft detection scheme with additive communication layer. In Proceedings of the ICC 2020—2020 IEEE International Conference on Communications, Dublin, Ireland, 7–11 June 2020; pp. 1–6.
31. Ullah, A.; Javaid, N.; Samuel, O.; Imran, M.; Shoaib, M. CNN and GRU based deep neural network for electricity theft detection to secure smart grid. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 1598–1602.
32. Yao, D.; Wen, M.; Liang, X.; Fu, Z.; Zhang, K.; Yang, B. Energy theft detection with energy privacy preservation in the smart grid. IEEE Internet Things J. 2019, 6, 7659–7669.
33. Zhukov, A.V.; Sidorov, D.N.; Foley, A.M. Random Forest Based Approach for Concept Drift Handling. In Analysis of Images, Social Networks and Texts; Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2016; Volume 661.
34. Dal Pozzolo, A.; Boracchi, G.; Caelen, O.; Alippi, C.; Bontempi, G. Credit card fraud detection and concept-drift adaptation with delayed supervised information. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; pp. 1–8.
35. Wang, X.; Fan, Y.; Huang, Y.; Ling, J.; Klimowicz, A.; Pagano, G.; Li, B. Solving Sensor Reading Drifting Using Denoising Data Processing Algorithm (DDPA) for Long-Term Continuous and Accurate Monitoring of Ammonium in Wastewater. ACS EST Water 2020, 1, 530–541.
36. Bohani, R.S.; Cabral, D.R.; Gonçalves, P.M., Jr.; Santos, S.G. RDDM: Reactive drift detection method. Expert Syst. Appl. 2017, 90, 344–355.

Feature | Description |
---|---|
Time-series readings (ts) | Smart meter (SM) readings |
Sliding window | The 32 lead and 32 lag readings around each current reading |
Mean, Median, Sum, Variance, Min, Max | Statistical summaries |
Delta 1 to Delta 32 | Delta i = Lead i − Lag i |
DifferencesSum | Sum of Delta i (i = 1 to 32) around the current reading |
Holt-Winters | Holt-Winters exponential smoothing |
ARIMA | Autoregressive integrated moving average |
Stationarity | Stationarity transformation |
Trend, Seasonality, Random | Components extracted from the seasonal-trend decomposition |
Label | 1 for fraudulent, 0 for normal |
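
The window features listed in the table can be illustrated with a short Python sketch that computes Delta 1 to Delta 32, DifferencesSum, and the statistical summaries for one reading. This is a minimal sketch under assumptions, not the authors' implementation: the function name `window_features` is hypothetical, lag i and lead i are taken to be the readings i steps before and after the current one, the summaries are computed over the whole window including the current reading, and the population variance is used. None of these choices is stated in the source.

```python
from statistics import mean, median, pvariance


def window_features(ts, t, half_width=32):
    """Compute window features for the reading at index t.

    Assumptions (not stated in the source): lag i is ts[t - i], lead i is
    ts[t + i], and the summaries cover the whole window including ts[t].
    """
    if t - half_width < 0 or t + half_width >= len(ts):
        raise ValueError("need half_width readings on each side of index t")

    lags = [ts[t - i] for i in range(1, half_width + 1)]
    leads = [ts[t + i] for i in range(1, half_width + 1)]

    # Delta i = Lead i - Lag i, for i = 1..half_width
    deltas = [lead - lag for lead, lag in zip(leads, lags)]

    window = ts[t - half_width : t + half_width + 1]
    return {
        "deltas": deltas,
        "differences_sum": sum(deltas),  # DifferencesSum
        "mean": mean(window),
        "median": median(window),
        "sum": sum(window),
        "variance": pvariance(window),  # population variance (assumption)
        "min": min(window),
        "max": max(window),
    }


# Toy series with a step from 1.0 to 10.0 at index 5, using a short window.
readings = [1.0] * 5 + [10.0] * 5
features = window_features(readings, t=5, half_width=2)
print(features["deltas"], features["differences_sum"])
```

With a sustained step in consumption at the current reading, every Delta i has the same sign, so DifferencesSum grows in magnitude, whereas isolated noise on either side tends to cancel in the sum.
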

Hyperparameter | Range Start | Range End |
---|---|---|
Batch size | 1 | 40 |
Number of trees | 1 | 200 |
Score tree interval | 1 | 20 |
Stopping rounds | 1 | 10 |
Stopping tolerance | 0.000001 | 0.001 |
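
One way to draw candidate configurations from the ranges in the table is random sampling, sketched below in Python. This is an illustration under assumptions: this section does not state how candidates were actually enumerated, and the names `SEARCH_SPACE`, `STOPPING_TOLERANCE`, and `sample_candidate`, as well as the log-uniform sampling of the stopping tolerance, are hypothetical choices.

```python
import math
import random

# Integer-valued ranges taken from the table (inclusive bounds).
SEARCH_SPACE = {
    "batch_size": (1, 40),
    "ntrees": (1, 200),
    "score_tree_interval": (1, 20),
    "stopping_rounds": (1, 10),
}

# The tolerance spans three orders of magnitude, so it is sampled on a
# log scale here (an assumption, not stated in the source).
STOPPING_TOLERANCE = (1e-6, 1e-3)


def sample_candidate(rng):
    """Draw one hyperparameter configuration from the search ranges."""
    candidate = {
        name: rng.randint(low, high) for name, (low, high) in SEARCH_SPACE.items()
    }
    low, high = STOPPING_TOLERANCE
    exponent = rng.uniform(math.log10(low), math.log10(high))
    candidate["stopping_tolerance"] = 10 ** exponent
    return candidate


rng = random.Random(42)  # fixed seed so the draw is reproducible
candidates = [sample_candidate(rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```

Each candidate would then be trained and scored on validation data, and the configuration with the best validation metric kept.
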

Hyperparameter | Description | Value |
---|---|---|
Batch size | Number of training samples used to compute the loss function | 25 |
Number of trees | Number of trees in the random forest | 100 |
Score tree interval | Number of trees after which the model is scored | 5 |
Stopping rounds | Number of scoring rounds without AUC improvement after which training stops | 3 |
Stopping metric | Performance metric used to decide when to stop training | AUC |
Stopping tolerance | Relative tolerance used to stop a grid search | 0.0005 |
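
The interaction between the score tree interval, stopping rounds, stopping metric, and stopping tolerance can be sketched as a simple early-stopping check. The Python function below is a simplified illustration, not the rule of any particular library and not necessarily the one used by the authors: the name `should_stop` is hypothetical, and the rule (stop when the best AUC of the last few scoring events does not beat the earlier best by the relative tolerance) is an assumption.

```python
def should_stop(auc_history, stopping_rounds=3, stopping_tolerance=0.0005):
    """Decide whether training should stop.

    auc_history holds the validation AUC at each scoring event, for
    example one entry every 5 trees when the score tree interval is 5.
    """
    if len(auc_history) <= stopping_rounds:
        return False  # not enough scoring events to compare yet

    best_before = max(auc_history[:-stopping_rounds])
    best_recent = max(auc_history[-stopping_rounds:])

    # Stop when the recent best fails to exceed the earlier best by the
    # relative tolerance.
    return best_recent <= best_before * (1 + stopping_tolerance)


still_improving = [0.90, 0.92, 0.94, 0.96, 0.97, 0.98]
plateaued = [0.90, 0.95, 0.97, 0.9701, 0.9702, 0.9703]
print(should_stop(still_improving), should_stop(plateaued))
```

With the defaults above, which mirror the values in the table, the plateaued history triggers a stop because three consecutive scoring events improve the AUC by less than 0.05% relative to the earlier best.
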

Metric | Description | Formula |
---|---|---|
Accuracy (Acc.) | Proportion of all cases, fraud and non-fraud, that are correctly classified | $\frac{TP + TN}{TP + TN + FP + FN}$ |
Sensitivity (Recall) | Proportion of actual fraud cases that the method correctly detects | $\frac{TP}{TP + FN}$ |
Specificity | Proportion of actual non-fraud cases that the method correctly identifies | $\frac{TN}{TN + FP}$ |
AUC | Measures the method's ability to distinguish fraudulent from normal SM readings | Area under the ROC curve |
Precision (Pr) | Proportion of predicted fraud cases that are actually fraud | $\frac{TP}{TP + FP}$ |
F1-score | Harmonic mean of recall and precision | $\frac{2 \cdot Pr \cdot Recall}{Pr + Recall}$ |

TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, with fraud as the positive class.
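
The threshold-based metrics in the table can be computed from confusion-matrix counts using their standard definitions, as in the Python sketch below. The function name `classification_metrics` and the example counts are hypothetical, fraud is assumed to be the positive class, and AUC is omitted because it requires ranked scores rather than counts.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is 0."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from true/false positive and negative counts."""
    recall = _ratio(tp, tp + fn)        # sensitivity: share of actual fraud found
    specificity = _ratio(tn, tn + fp)   # share of actual non-fraud found
    precision = _ratio(tp, tp + fp)     # share of predicted fraud that is fraud
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        # F1 is the harmonic mean of precision and recall.
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because fraud cases are usually rare, accuracy alone can look high even when recall is poor, which is why the table reports recall, precision, and F1 alongside it.
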

Metric | 50 Trees | 100 Trees |
---|---|---|
Mean squared error (MSE) | 0.00135 | 0.00164 |
Root mean squared error (RMSE) | 0.03675 | 0.04060 |
Cross-entropy (log loss) | 0.03127 | 0.03571 |
R | 0.99460 | 0.99340 |
AUC | 0.98325 | 0.98326 |
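
For reference, the error measures in the table follow standard definitions that can be computed from true labels and predicted fraud probabilities. The Python sketch below is illustrative only: the function name `loss_summary`, the clipping constant `eps`, and the toy data are assumptions, and it does not reproduce the values reported in the table.

```python
import math


def loss_summary(y_true, p_pred, eps=1e-15):
    """MSE, RMSE and log loss for binary labels and predicted probabilities."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)

    mse = sum((y - p) ** 2 for y, p in zip(y_true, p_pred)) / n
    rmse = math.sqrt(mse)

    # Clip probabilities away from 0 and 1 so the logarithm stays finite.
    clipped = [min(max(p, eps), 1 - eps) for p in p_pred]
    logloss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, clipped)
    ) / n

    return {"mse": mse, "rmse": rmse, "logloss": logloss}


# Toy labels (1 = fraudulent, 0 = normal) and predicted probabilities.
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.1, 0.8, 0.2]
losses = loss_summary(labels, probabilities)
print(losses)
```

RMSE is the square root of MSE, so the two rows of the table carry the same information on different scales, while log loss additionally penalizes confident wrong predictions more heavily.
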

Model | Year | Accuracy | Recall | Specificity | Precision | F1-Score | AUC |
---|---|---|---|---|---|---|---|
Proposed method | 2022 | 0.98 | 0.98 | 0.99 | 0.99 | 0.98 | 0.98 |
CNN-GRU-PSO [31] | 2020 | 0.87 | 0.89 | | | | |
CNN [32] | 2019 | 0.93 | | | | | |
FA-XGBoost [21] | 2020 | 0.95 | 0.97 | 0.93 | 0.94 | 0.95 | |
WADCNN [1] | 2018 | 0.86 | 0.74 | 0.87 | 0.70 | 0.76 | |
DE-RUSBoost [30] | 2020 | 0.96 | 0.90 | 0.99 | 0.90 | 0.89 | |
CatBoost [25] | 2021 | 0.93 | 0.92 | 0.95 | 0.94 | | |
NGBoost [26] | 2021 | 0.93 | 0.91 | 0.95 | 0.92 | 0.94 | |
BSVM [22] | 2021 | 0.94 | 0.91 | 0.96 | 0.94 | 0.93 | |
Decision tree [28] | 2021 | 0.91 | 0.02 | 0.50 | 0.05 | 0.51 | |
ANN [28] | 2021 | 0.92 | 0.35 | 0.64 | 0.42 | 0.66 | |
Deep ANN [28] | 2021 | 0.92 | 0.40 | 0.59 | 0.45 | 0.69 | |
AdaBoost [28] | 2021 | 0.91 | 0.07 | 0.57 | 0.13 | 0.53 | |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Badawi, S.A.; Guessoum, D.; Elbadawi, I.; Albadawi, A. A Novel Time-Series Transformation and Machine-Learning-Based Method for NTL Fraud Detection in Utility Companies. Mathematics 2022, 10, 1878. https://doi.org/10.3390/math10111878