An Enhancement Method in Few-Shot Scenarios for Intrusion Detection in Smart Home Environments
Abstract
1. Introduction
- This paper proposes a feature enhancement module that improves data quality by analyzing historical intrusion detection records of smart homes, adaptively extending the feature columns of the smart home device dataset, and cleaning the data;
- This paper proposes a data enhancement module that uses a conditional Wasserstein GAN to generate valid samples and populate the dataset, thereby augmenting few-shot data;
- The effectiveness of the EM-FEDE method is evaluated on N-BaIoT, a typical smart home device dataset. Each intrusion detection model is compared on the original dataset and on the dataset expanded with the EM-FEDE method, and the classifiers perform better on the expanded dataset than on the original one;
- The experiments demonstrate that expanding the dataset with the EM-FEDE method is effective in improving attack detection performance. This work addresses the problem of few-shot data degrading the performance of intrusion detection models.
2. Related Works
2.1. Intrusion Detection Methods for Smart Homes
2.2. GAN-Based Data Enhancement Methods
3. EM-FEDE Method
3.1. Problem Analysis
3.2. Feature Enhancement
3.3. Data Enhancement
Algorithm 1: EM-FEDE

Input: α = 0.0005, the learning rate; n = 50, the batch size; c = 0.01, the clipping parameter; the initial discriminator parameters; the initial generator parameters; Ne = 1000, the number of training cycles.

Output: Expanded R

Process:

1. Calculate LF by Equation (1)
2. If LF = 0
3. Add feature columns that are helpful for classification to R through Equations (2)–(4)
4. Numericalization, de-duplication, and normalization by Equations (5)–(7)
5. Divide the processed R into training and test sets
6. End if
7. While θ has not converged or epoch < Ne do
8. epoch++
9. Sample a batch of n noise samples {z1, …, zn} ~ Pz from the prior
10. Sample a batch of n examples {(x1, y1), …, (xn, yn)} ~ Preal from the real data
11. Update the discriminator D by ascending its stochastic gradient
12–14. [update equations missing in the source]
15. Sample a batch of n noise samples {z1, …, zn} ~ Pz from the prior
16. Update the generator G by ascending its stochastic gradient
17–18. [update equations missing in the source]
19. End while
20. Generate sample data for each class through the generator to populate R
21. Train different classifiers on the expanded R to obtain the evaluation indicators
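
The update equations of Algorithm 1 did not survive in this copy, so the following is only a minimal sketch of the standard Wasserstein GAN quantities that the clipping parameter c = 0.01 suggests, not the authors' exact formulation. The function names, the one-hot label encoding, and the use of plain Python lists in place of network outputs and parameters are all assumptions made for illustration.

```python
from statistics import fmean

C = 0.01  # clipping parameter c, taken from Algorithm 1


def critic_objective(real_scores, fake_scores):
    """Quantity the discriminator (critic) ascends: mean D(x|y) - mean D(G(z|y)|y)."""
    return fmean(real_scores) - fmean(fake_scores)


def generator_objective(fake_scores):
    """Quantity the generator ascends: mean D(G(z|y)|y)."""
    return fmean(fake_scores)


def clip_weights(weights, c=C):
    """Clamp every critic weight to [-c, c] (standard WGAN weight clipping)."""
    return [max(-c, min(c, w)) for w in weights]


def condition(vector, label, num_classes=11):
    """Append a one-hot class label (assumed encoding) to a noise or feature vector."""
    one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]
    return list(vector) + one_hot
```

In a real training loop these scores would come from the discriminator network evaluated on a batch of n = 50 samples, and the clipping would be applied to its parameters after every discriminator update.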
4. Results
4.1. N-BaIoT Dataset Description
4.2. Data Preprocessing
4.3. Experimental Environment
4.4. Network Structure
4.5. Results and Analysis
5. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Devices | Working Hours (h) | Data Throughput | Frequency of Attack |
---|---|---|---|
Router | 24 | Larger | Higher |
Gateway | 20 | Larger | Higher |
Light | 14 | Smaller | Lower |
TV | 8 | Larger | Lower |
Intelligent door lock | 3 | Smaller | Lower |
Floor sweeper | 2 | Smaller | Lower |
Washing machine | 2 | Smaller | Lower |
Smart camera | 24 | Larger | Higher |
Symbols | Description |
---|---|
R | Historical intrusion detection records. |
| | Used to determine the existence of the device class and the data class in x. It returns 1 if the features are present and 0 otherwise. |
Insert() | Insert operation. |
| | Obtain the corresponding class from the information in x. |
ai | The device class feature column. |
bi | The data class feature column. |
| | Function used for mapping during the numericalization of x. |
| | Function used for removing duplicate data in x. |
| | Function used for normalizing the data in x. |
L | 1-Lipschitz function. |
Preal | Real data distribution. |
Pz | Data distribution of input noise. |
| | Fake sample data generated by the generator. |
D(x) | The probability that the discriminator determines that x belongs to the real data. |
Z | Noise vector of the a priori noise distribution Pz. |
| | Joint probability distribution of real data and generated data. |
Fake_data | Generated data with label y_fake. |
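
The symbol table lists functions for numericalization, de-duplication, and normalization, but their definitions (Equations (5)–(7)) are not available here. The sketch below shows one common way to realize the three steps, assuming integer codes assigned in order of first appearance and min-max scaling to [0, 1]. The function names and the sample records are hypothetical.

```python
def numericalize(rows, column):
    """Map each distinct value in `column` to an integer code, in order of first appearance."""
    codes = {}
    out = []
    for row in rows:
        code = codes.setdefault(row[column], len(codes))
        out.append({**row, column: code})
    return out, codes


def deduplicate(rows):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def min_max_normalize(rows, column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    values = [row[column] for row in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [{**row, column: (row[column] - lo) / span if span else 0.0} for row in rows]


# Hypothetical records, for illustration only.
records = [
    {"device": "Router", "rate": 10.0},
    {"device": "Light", "rate": 2.0},
    {"device": "Router", "rate": 10.0},  # exact duplicate of the first record
    {"device": "TV", "rate": 6.0},
]
encoded, codes = numericalize(records, "device")
cleaned = min_max_normalize(deduplicate(encoded), "rate")
```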
Traffic Type | Training Samples | Test Samples |
---|---|---|
benign_traffic | 1054 | 325 |
gafgyt_attacks combo | 136 | 60 |
gafgyt_attacks junk | 122 | 75 |
gafgyt_attacks scan | 124 | 78 |
gafgyt_attacks tcp | 97 | 56 |
gafgyt_attacks udp | 91 | 51 |
mirai_attacks ack | 76 | 42 |
mirai_attacks scan | 83 | 40 |
mirai_attacks syn | 76 | 42 |
mirai_attacks udp | 73 | 49 |
mirai_attacks udpplain | 70 | 40 |
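
The training counts in the table above can be summarized with a short calculation. The snippet below is a sketch that only restates the table's numbers; the variable names are illustrative.

```python
# Training-set counts copied from the table above.
train_counts = {
    "benign_traffic": 1054,
    "gafgyt_attacks combo": 136,
    "gafgyt_attacks junk": 122,
    "gafgyt_attacks scan": 124,
    "gafgyt_attacks tcp": 97,
    "gafgyt_attacks udp": 91,
    "mirai_attacks ack": 76,
    "mirai_attacks scan": 83,
    "mirai_attacks syn": 76,
    "mirai_attacks udp": 73,
    "mirai_attacks udpplain": 70,
}

total = sum(train_counts.values())
benign_share = train_counts["benign_traffic"] / total
smallest_class = min(train_counts, key=train_counts.get)
# Ratio between the largest and the smallest class.
imbalance_ratio = max(train_counts.values()) / min(train_counts.values())
```

The total of 2002 matches the original sample size used in the expansion table, benign traffic makes up a little over half of the training set, and the largest class is roughly 15 times the size of the smallest.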
Category | Parameters |
---|---|
CPU | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz |
RAM | 64 GB |
Programming Tools | Jupyter Notebook |
Programming Language | Python 3.8 |
Deep Learning Framework | PyTorch 1.8 |
Machine Learning Platform | Weka 3.9 |
Data Processing Libraries | NumPy, pandas, etc. |
G/D | Structure | Size |
---|---|---|
Generator | Input layer | 50 |
Generator | Hidden layer 1 (Tanh) | 128 |
Generator | Hidden layer 2 (Tanh) | 256 |
Generator | Hidden layer 3 (Tanh) | 128 |
Generator | Output layer (Tanh) | 116 |
Discriminator | Input layer | 116 |
Discriminator | Hidden layer 1 (Tanh) | 128 |
Discriminator | Hidden layer 2 (Tanh) | 128 |
Discriminator | Output layer | 1 |
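
To give a sense of model size, the layer widths in the table above can be turned into parameter counts. This is a sketch that assumes every layer is fully connected with a bias term, which the table does not state explicitly; `dense_params` is a hypothetical helper.

```python
def dense_params(sizes):
    """Weights plus biases for a fully connected stack with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Layer widths copied from the table above.
generator_sizes = [50, 128, 256, 128, 116]
discriminator_sizes = [116, 128, 128, 1]

generator_params = dense_params(generator_sizes)
discriminator_params = dense_params(discriminator_sizes)
```

Under that assumption the generator has 87,412 trainable parameters and the discriminator 31,617.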
Classifier | Structure | Size |
---|---|---|
MLP | Input layer | 116 |
MLP | Hidden layer 1 (Tanh) | 128 |
MLP | Hidden layer 2 (Tanh) | 128 |
MLP | Output layer | 11 |
CNN | Input layer | 116 |
CNN | Conv1D (ReLU) | 32 |
CNN | Pooling layer | 32 |
CNN | Conv1D (ReLU) | 32 |
CNN | Pooling layer | 32 |
CNN | Flatten | 224 |
CNN | Dense | 50 |
CNN | Dense | 11 |
Dataset (Generated Sample Ratios) | Number of Fake Samples | Number of Samples after Expansion |
---|---|---|
x (Original sample size) | 0 | 2002 |
2x | 2004 | 4006 (2002 + 2004) |
3x | 4006 | 6008 (2002 + 4006) |
4x | 6118 | 8120 (2002 + 6118) |
5x | 8010 | 10,012 (2002 + 8010) |
6x | 10,012 | 12,014 (2002 + 10,012) |
7x | 12,014 | 14,016 (2002 + 12,014) |
8x | 14,016 | 16,018 (2002 + 14,016) |
9x | 16,018 | 18,020 (2002 + 16,018) |
10x | 18,009 | 20,011 (2002 + 18,009) |
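
The expansion table can be checked arithmetically. The sketch below verifies that each reported total equals the original 2002 samples plus the generated samples, and, assuming that the label nx means the expanded set is about n times the original size, reports how far each generated count is from (n − 1) × 2002. The variable names are illustrative.

```python
ORIGINAL = 2002  # original sample size (row "x")

# (n, number of fake samples, number of samples after expansion), from the table above.
rows = [
    (2, 2004, 4006),
    (3, 4006, 6008),
    (4, 6118, 8120),
    (5, 8010, 10012),
    (6, 10012, 12014),
    (7, 12014, 14016),
    (8, 14016, 16018),
    (9, 16018, 18020),
    (10, 18009, 20011),
]

# Every reported total should be the original data plus the generated data.
totals_consistent = all(ORIGINAL + fake == total for _, fake, total in rows)

# Difference between the reported fake count and (n - 1) * ORIGINAL.
deviation = {n: fake - (n - 1) * ORIGINAL for n, fake, _ in rows}
```

All totals are internally consistent. Most generated counts exceed (n − 1) × 2002 by two samples, while the 4x and 10x rows differ by 112 and −9 respectively.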
Dataset | Algorithm | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|---|
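N-BaIoT | J48 | 0.624 | ? | 0.624 | ? |
N-BaIoT | Random Forest | 0.755 | ? | 0.756 | ? |
N-BaIoT | Bagging | 0.655 | ? | 0.655 | ? |
N-BaIoT | PART | 0.673 | ? | 0.673 | ? |
N-BaIoT | KStar | 0.789 | 0.699 | 0.701 | 0.699 |
N-BaIoT | KNN | 0.768 | 0.773 | 0.768 | 0.771 |
N-BaIoT | MLP | 0.811 | 0.711 | 0.706 | 0.708 |
N-BaIoT | CNN | 0.712 | 0.726 | 0.673 | 0.698 |
N-BaIoT after EM-FEDE method processing | J48 | 0.788 | 0.788 | 0.788 | 0.788 |
N-BaIoT after EM-FEDE method processing | Random Forest | 0.804 | ? | 0.804 | ? |
N-BaIoT after EM-FEDE method processing | Bagging | 0.762 | ? | 0.762 | ? |
N-BaIoT after EM-FEDE method processing | PART | 0.765 | 0.795 | 0.765 | 0.779 |
N-BaIoT after EM-FEDE method processing | KStar | 0.838 | 0.831 | 0.838 | 0.834 |
N-BaIoT after EM-FEDE method processing | KNN | 0.812 | 0.796 | 0.812 | 0.803 |
N-BaIoT after EM-FEDE method processing | MLP | 0.842 | 0.731 | 0.678 | 0.703 |
N-BaIoT after EM-FEDE method processing | CNN | 0.769 | 0.828 | 0.736 | 0.779 |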
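
One plausible reading of the `?` entries is that the metric was undefined because at least one class was never predicted, so its precision has a zero denominator; Weka prints `?` for such values. This is an assumption about the cause, not something the table states. The sketch below uses a hypothetical two-class confusion matrix to show how precision can be undefined while recall and accuracy are still available.

```python
def per_class_metrics(confusion):
    """Return (precision, recall) per class; None marks an undefined ratio.

    confusion[i][j] counts samples of actual class i predicted as class j.
    """
    k = len(confusion)
    metrics = []
    for c in range(k):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(k))
        actual = sum(confusion[c])
        precision = tp / predicted if predicted else None
        recall = tp / actual if actual else None
        metrics.append((precision, recall))
    return metrics


def weighted_average(values, weights):
    """Support-weighted mean; None if any per-class value is undefined."""
    if any(v is None for v in values):
        return None
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical confusion matrix: class 1 is never predicted.
confusion = [[5, 0], [3, 0]]
support = [sum(row) for row in confusion]
metrics = per_class_metrics(confusion)
accuracy = sum(confusion[i][i] for i in range(len(confusion))) / sum(support)
weighted_precision = weighted_average([p for p, _ in metrics], support)
weighted_recall = weighted_average([r for _, r in metrics], support)
```

Here the weighted precision is undefined, and the support-weighted recall equals the accuracy, which is the pattern seen in the rows where accuracy and recall coincide.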
Algorithm | Optimal Generated Sample Ratio nx (1 ≤ n ≤ 10) | Accuracy on the Original Dataset | Accuracy on the Mixed Dataset at the Optimal Ratio | Accuracy Gain (percentage points) |
---|---|---|---|---|
J48 | 10x | 0.624 | 0.843 | 21.9% |
Random Forest | 6x | 0.755 | 0.817 | 6.2% |
Bagging | 2x | 0.655 | 0.849 | 19.4% |
PART | 5x | 0.673 | 0.765 | 9.2% |
KStar | 4x | 0.789 | 0.852 | 6.3% |
KNN | 4x | 0.768 | 0.833 | 7% |
MLP | 10x | 0.811 | 0.845 | 3.4% |
CNN | 10x | 0.712 | 0.771 | 5.9% |
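
The last column of the table above can be recomputed from the two accuracy columns. The sketch below computes the gain both as an absolute difference in percentage points and as a relative increase; the variable names are illustrative.

```python
# (accuracy on the original dataset, accuracy at the optimal ratio), from the table above.
results = {
    "J48": (0.624, 0.843),
    "Random Forest": (0.755, 0.817),
    "Bagging": (0.655, 0.849),
    "PART": (0.673, 0.765),
    "KStar": (0.789, 0.852),
    "KNN": (0.768, 0.833),
    "MLP": (0.811, 0.845),
    "CNN": (0.712, 0.771),
}

# Absolute gain in percentage points.
gain_points = {name: round((best - base) * 100, 1) for name, (base, best) in results.items()}

# Relative gain with respect to the original accuracy, in percent.
relative_gain = {name: round((best - base) / base * 100, 1) for name, (base, best) in results.items()}
```

The reported figures agree with the absolute difference, so they are percentage points rather than relative growth. The one exception is KNN, whose difference works out to 6.5 points against the 7% shown in the table.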
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, Y.; Wang, J.; Yang, T.; Li, Q.; Nijhum, N.A. An Enhancement Method in Few-Shot Scenarios for Intrusion Detection in Smart Home Environments. Electronics 2023, 12, 3304. https://doi.org/10.3390/electronics12153304