Incremental Online Machine Learning for Detecting Malicious Nodes in Vehicular Communications Using Real-Time Monitoring
Abstract
1. Introduction
- We introduce a more dynamic approach for detecting attacks based on incremental online machine learning algorithms trained on data generated in real-time;
- We collect data from VANET scenarios using a robust simulation methodology based on two well-known simulators, namely SUMO and NS-3;
- We select essential features that are relevant in capturing the behavior of black hole nodes in the AODV routing protocol;
- We assess the overall performance of classifiers in terms of multiple performance metrics, namely Accuracy, Precision, Recall and F1-score. Further, each performance metric is tracked over time to continuously evaluate the classifiers;
- We compute and compare the computational cost of both classifiers in terms of training and testing time.
2. Related Works
3. Materials and Methods
3.1. Incremental Online Learning
- Adaptive, because it can adjust to changing data patterns over time: it keeps learning and refines its predictions or decisions as new data arrive [23].
3.2. Proposed Method
- Initial model training
- Incremental model training
- Attack detection
3.3. Data Collection
3.3.1. Simulation Environment and Scenarios
3.3.2. Definition of Features
- Routing Behavior Analysis: this involves tracking the overall routing control packets in AODV and monitoring particular packets such as RREQ (Route Request), RREP (Route Reply), and RERR (Route Error). These parameters can contribute to the detection of the black hole attack, so they are summarized in the CTRLpackets, CountRREQ, CountRREP, and CountRERR features;
- Traffic Analysis: this involves monitoring the traffic characteristics and keeping track of the number of bytes that are sent or received by each node;
- Dropping ratio monitoring: observing the dropping rate of packets can help in the identification of nodes that selectively drop or discard incoming packets, indicating malicious behavior;
- Throughput monitoring: a black hole attack typically drops a large amount of data, which can significantly decrease the throughput, making it an important metric to monitor for detection purposes.
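The traffic-related features in the list above can be illustrated with a small sketch. It assumes that the dropping ratio is the percentage of received packets that a node does not forward and that throughput is the received data rate over a monitoring window; these formulas, the `NodeCounters` layout and the example numbers are assumptions made for illustration, not definitions taken from the simulations.

```python
from dataclasses import dataclass


@dataclass
class NodeCounters:
    """Raw per-node counters for one monitoring window (hypothetical layout)."""
    received_packets: int   # packets that arrived at the node
    forwarded_packets: int  # packets the node passed on
    received_bytes: int     # bytes received during the window
    window_seconds: float   # length of the monitoring window


def dropping_ratio(c: NodeCounters) -> float:
    """Percentage of received packets that were not forwarded (assumed definition)."""
    if c.received_packets == 0:
        return 0.0
    dropped = c.received_packets - c.forwarded_packets
    return 100.0 * dropped / c.received_packets


def throughput_kbps(c: NodeCounters) -> float:
    """Received data rate in kilobits per second (assumed definition)."""
    if c.window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return c.received_bytes * 8 / 1000.0 / c.window_seconds


# Illustrative values only: 200 packets of 64 bytes in a 10 s window.
honest = NodeCounters(received_packets=200, forwarded_packets=190,
                      received_bytes=12_800, window_seconds=10.0)
suspect = NodeCounters(received_packets=200, forwarded_packets=0,
                       received_bytes=12_800, window_seconds=10.0)

print(dropping_ratio(honest), dropping_ratio(suspect))
print(throughput_kbps(honest))
```

A node that forwards nothing reaches a dropping ratio of 100% in this sketch, which is the kind of extreme value a classifier can learn to associate with black hole behavior.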
3.4. Data Preprocessing
3.5. Incremental Online Algorithms
Algorithm 1: Pseudo Code of Adaptive Random Forest
Inputs: n_trees: the number of trees in the ensemble; STREAM: the stream of data instances; f_s_size: the size of the random subset of features to select for each split.
Outputs: arf_model: the trained ARF model.
Algorithm 2: Pseudo Code of K-Nearest Neighbors
Inputs: K: the number of nearest neighbors to consider; M: the memory size or maximum number of instances to retain; STREAM: the stream of data instances.
Outputs: the predicted label for each instance in the data stream.
Set the memory size (M). Initialize an empty memory buffer.
If the memory buffer is full, replace the oldest instance in the buffer with the new instance.
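The memory mechanism of Algorithm 2 can be sketched in a few lines of Python. The sketch below assumes Euclidean distance and simple majority voting, neither of which is specified above; the class name `SlidingWindowKNN`, its methods and the toy samples are hypothetical and do not reproduce the API of any library.

```python
import math
from collections import Counter, deque


class SlidingWindowKNN:
    """Illustrative KNN over a bounded memory of the most recent instances."""

    def __init__(self, k=3, memory_size=100):
        self.k = k
        # deque(maxlen=M) discards the oldest instance automatically when full.
        self.memory = deque(maxlen=memory_size)

    def partial_fit(self, x, y):
        """Store one labeled instance, evicting the oldest one if needed."""
        self.memory.append((tuple(x), y))

    def predict(self, x):
        """Return the majority label among the K nearest stored instances."""
        if not self.memory:
            return None
        query = tuple(x)
        nearest = sorted(self.memory,
                         key=lambda item: math.dist(item[0], query))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy stream: label 1 stands for a black hole node, 0 for a normal node (assumed).
knn = SlidingWindowKNN(k=3, memory_size=4)
samples = [((0.1, 0.1), 0), ((0.2, 0.0), 0), ((0.9, 1.0), 1),
           ((1.0, 0.8), 1), ((0.8, 0.9), 1)]
for features, label in samples:
    knn.partial_fit(features, label)

print(len(knn.memory), knn.predict((0.95, 0.9)))
```

Because the buffer holds only four instances, the first sample has already been evicted when the prediction is made, so the model reflects only the most recent part of the stream.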
3.6. Prequential Evaluation
- Pretrain the model with initial data;
- Test then train: each incoming data instance is first used to test the model and then to train it, using the partial_fit() method;
- Evaluate: the model's performance is tracked over time by updating the metrics as new instances are processed.
Algorithm 3: Prequential Accuracy for Model Evaluation
Inputs: D: a stream of data (X, y); Arf: the classifier to be evaluated; pretrain_size: the number of samples for pretraining the model; Ts: the test step, i.e., the number of samples to process between each model test; max_samples: the maximum number of samples to evaluate.
Outputs: Acc_list: a list containing accuracy measures after each test step.
    (X, y) = D.next_sample()
    Arf.partial_fit(X, y)
end for
4. Repeat for max_samples iterations:
for i in range(max_samples):
    (X, y) = D.next_sample()
    prediction = Arf.predict(X)
    Arf.partial_fit(X, y)
    TotalPred += 1
    if y == prediction:
        CorrPred += 1
    if TotalPred % Ts == 0:
        accuracy = CorrPred/TotalPred
        Acc_list.append(accuracy)
        CorrPred = 0
    end if
end for
5. Return Acc_list
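The test-then-train loop of Algorithm 3 can be tried out with the following self-contained sketch. It makes two assumptions: since the counter of correct predictions is reset after each test step, every reported value is taken to be the accuracy over the most recent Ts samples, and the classifier is replaced by a trivial stand-in (`MajorityClassifier`, a hypothetical class) so that the example runs without any streaming library. The toy stream is invented for illustration.

```python
class MajorityClassifier:
    """Stand-in incremental model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def partial_fit(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)


def prequential_accuracy(stream, model, pretrain_size, test_step, max_samples):
    """Pretrain, then test on each instance before training on it."""
    stream = iter(stream)
    for _ in range(pretrain_size):
        x, y = next(stream)
        model.partial_fit(x, y)

    acc_list = []
    correct = 0
    for i in range(1, max_samples + 1):
        x, y = next(stream)
        if model.predict(x) == y:   # test first ...
            correct += 1
        model.partial_fit(x, y)     # ... then train on the same instance
        if i % test_step == 0:
            # Accuracy over the last test_step samples (assumed reading).
            acc_list.append(correct / test_step)
            correct = 0
    return acc_list


# Toy stream: every fourth instance is labeled 1, all others 0.
toy_stream = [((float(i),), 1 if i % 4 == 0 else 0) for i in range(60)]
acc_list = prequential_accuracy(toy_stream, MajorityClassifier(),
                                pretrain_size=20, test_step=10, max_samples=40)
print(acc_list)
```

Any object exposing `predict` and `partial_fit` for single instances can be passed in place of the stand-in model, which keeps the evaluation loop independent of the classifier being assessed.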
4. Results and Discussion
4.1. Performance Assessment of Classifiers
4.2. Training and Testing Time
4.3. Comparison with State-of-the-Art Methods
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ajjaj, S.; El Houssaini, S.; Hain, M.; El Houssaini, M.-A. A New Multivariate Approach for Real Time Detection of Routing Security Attacks in VANETs. Information 2022, 13, 282.
- Banafshehvaragh, S.T.; Rahmani, A.M. Intrusion, Anomaly, and Attack Detection in Smart Vehicles. Microprocess. Microsyst. 2023, 96, 104726.
- Mchergui, A.; Moulahi, T.; Zeadally, S. Survey on Artificial Intelligence (AI) Techniques for Vehicular Ad-Hoc Networks (VANETs). Veh. Commun. 2022, 34, 100403.
- Nallaperuma, D.; Nawaratne, R.; Bandaragoda, T.; Adikari, A.; Nguyen, S.; Kempitiya, T.; De Silva, D.; Alahakoon, D.; Pothuhera, D. Online Incremental Machine Learning Platform for Big Data-Driven Smart Traffic Management. IEEE Trans. Intell. Transp. Syst. 2019, 20, 4679–4690.
- Losing, V.; Hammer, B.; Wersing, H. Incremental On-Line Learning: A Review and Comparison of State of the Art Algorithms. Neurocomputing 2018, 275, 1261–1274.
- López, J.M. Fast and Slow Machine Learning. Ph.D. Thesis, Université Paris-Saclay–Télécom Paristech, Paris, France, 2019.
- Malik, A.; Khan, M.Z.; Faisal, M.; Khan, F.; Seo, J.-T. An Efficient Dynamic Solution for the Detection and Prevention of Black Hole Attack in VANETs. Sensors 2022, 22, 1897.
- Ajjaj, S.; El Houssaini, S.; Hain, M.; El Houssaini, M.-A. Performance Assessment and Modeling of Routing Protocol in Vehicular Ad Hoc Networks Using Statistical Design of Experiments Methodology: A Comprehensive Study. Appl. Syst. Innov. 2022, 5, 19.
- Documentation-SUMO Documentation. Available online: https://sumo.dlr.de/docs/index.html (accessed on 21 September 2021).
- Ns-3|a Discrete-Event Network Simulator for Internet Systems. Available online: https://www.nsnam.org/ (accessed on 21 September 2021).
- Gomes, H.M.; Bifet, A.; Read, J.; Barddal, J.P.; Enembreck, F.; Pfahringer, B.; Holmes, G.; Abdessalem, T. Adaptive Random Forests for Evolving Data Stream Classification. Mach. Learn. 2017, 106, 1469–1495.
- Montiel, J.; Read, J.; Bifet, A.; Abdessalem, T. Scikit-Multiflow: A Multi-Output Streaming Framework. J. Mach. Learn. Res. 2018, 19, 2914–2915.
- Karagiannis, D.; Argyriou, A. Jamming Attack Detection in a Pair of RF Communicating Vehicles Using Unsupervised Machine Learning. Veh. Commun. 2018, 13, 56–63.
- Singh, P.K.; Gupta, S.; Vashistha, R.; Nandi, S.K.; Nandi, S. Machine Learning Based Approach to Detect Position Falsification Attack in VANETs. In Security and Privacy; Nandi, S., Jinwala, D., Singh, V., Laxmi, V., Gaur, M.S., Faruki, P., Eds.; Springer: Singapore, 2019; Volume 939, pp. 166–178. ISBN 9789811375606.
- Singh, P.K.; Gupta, R.R.; Nandi, S.K.; Nandi, S. Machine Learning Based Approach to Detect Wormhole Attack in VANETs. In Web, Artificial Intelligence and Network Applications; Barolli, L., Takizawa, M., Xhafa, F., Enokido, T., Eds.; Springer International Publishing: Cham, Switzerland, 2019; Volume 927, pp. 651–661. ISBN 978-3-030-15034-1.
- Sonker, A.; Gupta, R.K. A New Procedure for Misbehavior Detection in Vehicular Ad-Hoc Networks Using Machine Learning. Int. J. Electr. Comput. Eng. IJECE 2021, 11, 2535.
- Bangui, H.; Ge, M.; Buhnova, B. A Hybrid Machine Learning Model for Intrusion Detection in VANET. Computing 2022, 104, 503–531.
- Kaur, G.; Kakkar, D. Hybrid Optimization Enabled Trust-Based Secure Routing with Deep Learning-Based Attack Detection in VANET. Ad Hoc Netw. 2022, 136, 102961.
- Karthiga, B.; Durairaj, D.; Nawaz, N.; Venkatasamy, T.K.; Ramasamy, G.; Hariharasudan, A. Intelligent Intrusion Detection System for VANET Using Machine Learning and Deep Learning Approaches. Wirel. Commun. Mob. Comput. 2022, 2022, 5069104.
- Sharma, A. Position Falsification Detection in VANET with Consecutive BSM Approach Using Machine Learning Algorithm. Ph.D. Thesis, Faculty of Graduate Studies through the School of Computer Science, Windsor, ON, Canada, 2021.
- Zhang, C.; Chen, K.; Zeng, X.; Xue, X. Misbehavior Detection Based on Support Vector Machine and Dempster-Shafer Theory of Evidence in VANETs. IEEE Access 2018, 6, 59860–59870.
- Ercan, S.; Ayaida, M.; Messai, N. Misbehavior Detection for Position Falsification Attacks in VANETs Using Machine Learning. IEEE Access 2022, 10, 1893–1904.
- Rojas, J.S.; Rendon, A.; Corrales, J.C. Consumption Behavior Analysis of over the Top Services: Incremental Learning or Traditional Methods? IEEE Access 2019, 7, 136581–136591.
- Jin, B.; Jing, Z.; Zhao, H. Incremental and Decremental Extreme Learning Machine Based on Generalized Inverse. IEEE Access 2017, 5, 20852–20865.
- Almeida, A.; Brás, S.; Sargento, S.; Pinto, F.C. Time Series Big Data: A Survey on Data Stream Frameworks, Analysis and Algorithms. J. Big Data 2023, 10, 83.
- OpenStreetMap. Available online: https://www.openstreetmap.org/ (accessed on 4 September 2023).
- Das, S.R.; Belding-Royer, E.M.; Perkins, C.E. Ad Hoc On-Demand Distance Vector (AODV) Routing. Available online: https://tools.ietf.org/html/rfc3561 (accessed on 20 December 2020).
- Singh, D.; Singh, B. Investigating the Impact of Data Normalization on Classification Performance. Appl. Soft Comput. 2020, 97, 105524.
- Hidalgo, J.I.G.; Maciel, B.I.F.; Barros, R.S.M. Experimenting with Prequential Variations for Data Stream Learning Evaluation. Comput. Intell. 2019, 35, 670–692.
- AlQabbany, A.O.; Azmi, A.M. Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams. Entropy 2021, 23, 859.
- Rashid, K.; Saeed, Y.; Ali, A.; Jamil, F.; Alkanhel, R.; Muthanna, A. An Adaptive Real-Time Malicious Node Detection Framework Using Machine Learning in Vehicular Ad-Hoc Networks (VANETs). Sensors 2023, 23, 2594.
Parameter | Value |
---|---|
Platform | Linux (Ubuntu) |
Network simulator | NS-3.29 |
Mobility simulator | SUMO 0.32.0 |
Routing protocol | AODV |
MAC/PHY layer | IEEE 802.11p |
Wi-Fi channel | YansWifi |
Propagation model | Friis propagation loss model |
Transmission power | 33 dBm |
Transport protocol | UDP |
Traffic type | CBR (constant bit rate) |
Packet size | 64 bytes |
Number of vehicles | 50 |
Runtime | 360 s |
Feature Name | Feature Description |
---|---|
CTRLpackets | AODV routing control packets: a black hole node may advertise a fake and optimized route to the destination. Thus, it is essential to keep track of the routing control packets and detect any changes in their number advertised by a node. |
CountRREQ | Number of Route Request messages that are used by the nodes to discover new routes to other nodes in the network. In the case of a black hole attack, the malicious node may not generate any RREQ messages because it is not interested in receiving any packets, and instead, it drops all the packets that it receives. |
CountRREP | Number of Route Reply messages that are sent by nodes in response to RREQ messages to establish a route to the destination node. In the case of a black hole attack, the black hole sends fake RREP; if a large number of RREP messages are received from a single node, it could be an indication that the node is a potential black hole. |
CountRERR | Number of Route Error (RERR) Messages: a black hole node may generate an abnormal amount of RERR messages, indicating that the node is not following the protocol’s correct path discovery mechanism. |
Throughput | Throughput measures the amount of data that can be transferred in a given time. A sudden decrease in throughput can be an indication of a black hole attack. |
SentPckts | The number of sent packets. The attacker node captures and drops all routing packets, leading to a significant decrease in the number of successfully sent packets. |
ReceivedPckts | The number of received packets. The rogue node drops the data packets that it receives from other nodes instead of forwarding them, which may cause a decrease in the number of packets that are received. |
DroppingRatio | The dropping ratio measures the percentage of packets dropped by a node, and this feature can be used to detect if a node is dropping a higher number of packets than normal. |
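As an illustration of how the routing-control features in the table above could be derived, the sketch below counts AODV control packets per node from a list of logged events. The event format (`(node_id, packet_type)` tuples), the function name `count_control_packets` and the choice to define CTRLpackets as the sum of the three message types are assumptions made for this example, not a description of the simulation traces.

```python
from collections import Counter, defaultdict

# AODV control message types tracked by the features (assumed to be exhaustive here).
AODV_CONTROL_TYPES = ("RREQ", "RREP", "RERR")


def count_control_packets(events):
    """Aggregate control-packet counts per node.

    events: iterable of (node_id, packet_type) tuples (hypothetical log format).
    """
    per_node = defaultdict(Counter)
    for node_id, packet_type in events:
        if packet_type in AODV_CONTROL_TYPES:
            per_node[node_id][packet_type] += 1

    features = {}
    for node_id, counts in per_node.items():
        features[node_id] = {
            "CountRREQ": counts["RREQ"],
            "CountRREP": counts["RREP"],
            "CountRERR": counts["RERR"],
            "CTRLpackets": sum(counts.values()),
        }
    return features


# Invented log: node 7 answers with many RREPs but never issues an RREQ.
log = [(1, "RREQ"), (1, "RREP"), (7, "RREP"), (7, "RREP"),
       (7, "RREP"), (1, "DATA"), (7, "RERR")]
features = count_control_packets(log)
print(features[7])
```

In this invented log, node 7 shows the pattern described in the table (several RREP messages and no RREQ), which is the kind of signal the CountRREQ and CountRREP features are meant to expose.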
Classifier | Pretrain_Size = 400 | Pretrain_Size = 600 | Pretrain_Size = 1000 | Pretrain_Size = 1200 |
---|---|---|---|---|
Adaptive Random Forest (ARF) | 94.73% | 94.77% | 94.83% | 94.90% |
K-Nearest Neighbors classifier (KNN) | 86.86% | 87.13% | 87.30% | 87.83% |
Incremental Classifier | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Adaptive Random Forest (ARF) | 94.90% | 93.10% | 96.81% | 94.92% |
KNN Classifier | 87.38% | 87.58% | 86.62% | 87.09% |
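Since the F1-score is the harmonic mean of Precision and Recall, the values in the table above can be cross-checked with a few lines of Python. The numbers below are copied from the table; the function name `f1_score` is simply a local helper for this check.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, Recall and F1-score as reported in the table, expressed as fractions.
reported = {
    "ARF": {"precision": 0.9310, "recall": 0.9681, "f1": 0.9492},
    "KNN": {"precision": 0.8758, "recall": 0.8662, "f1": 0.8709},
}

computed = {}
for name, m in reported.items():
    computed[name] = f1_score(m["precision"], m["recall"])
    print(f"{name}: computed F1 = {computed[name]:.4f}, reported F1 = {m['f1']:.4f}")
```

For both classifiers, the recomputed value agrees with the reported F1-score to within about 0.01 percentage points, a gap that is consistent with rounding of the published Precision and Recall.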
Study | Approach | Real-Time | Dataset | Tools | Performance Metrics | Continuous Learning |
---|---|---|---|---|---|---|
[14] | SVM and Logistic Regression | No | VeReMi | Python tools | F1-score | No |
[16] | Binary classification with Naïve Bayes, Decision Tree and Random Forest | No | VeReMi | Python tools | Accuracy | No |
[17] | Random Forest and a posterior detection based on coresets | No | CICIDS2017 | MATLAB | Accuracy | No |
[18] | Hybrid optimization-based Deep Maxout Network (DMN) | No | BoT-IoT and NSL-KDD | Python tools | Precision, Recall | No |
[19] | Adaptive Neuro Fuzzy Inference System (ANFIS) and Convolutional Neural Networks (CNN) | No | CICIDS2017 | Python tools | Precision, Sensitivity, Recall, Specificity | No |
[22] | ML methods for classification, KNN and RF | No | VeReMi | Python tools | Accuracy, F1-score | No |
[21] | SVM | No | Generated data | SUMO, OMNeT++, Python tools | TPR, FPR, ACC | No |
[31] | Distributed multi-layer classifier | Yes | Generated data | OMNeT++, SUMO | Accuracy | No |
Our work | Incremental online classification using Adaptive Random Forest (ARF) and K-Nearest Neighbors (KNN) classifiers | Yes | Generated data | NS-3, SUMO, Python tools | Accuracy, Recall, Precision, F1-score, training time, testing time | Yes |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ajjaj, S.; El Houssaini, S.; Hain, M.; El Houssaini, M.-A. Incremental Online Machine Learning for Detecting Malicious Nodes in Vehicular Communications Using Real-Time Monitoring. Telecom 2023, 4, 629-648. https://doi.org/10.3390/telecom4030028