An Entropy-Based Clustering Algorithm for Real-Time High-Dimensional IoT Data Streams
Abstract
1. Introduction
2. Review of Literature
3. Methodology
3.1. E-Stream Algorithm
3.2. Overview of the E-Stream Algorithm
3.2.1. Dimensionality Reduction
3.2.2. Reducing the Entropy Window
3.2.3. Entropy-Based Clustering for Dynamic Data Streams (DenStream)
- Potential micro-clusters are groups still under evaluation to determine whether they meet the criteria for becoming core micro-clusters;
- Core micro-clusters have met the necessary criteria, are deemed significant, and are held in buffer memory. They reflect important data patterns and are prioritized for more detailed analysis;
- Weak micro-clusters are clusters that, because of the windowing mechanism, have lost their significance and no longer qualify as core micro-clusters. They are gradually removed so that memory is dedicated solely to the most relevant clusters.
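The three categories above can be read as states that a micro-cluster moves through. The following is a minimal sketch of that life cycle, not the algorithm's actual implementation: the `weight` score, the `core_threshold` parameter, and the `update_status` and `prune` helpers are all hypothetical names introduced for illustration, and the rule that a cluster becomes core once its weight reaches a single threshold is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    POTENTIAL = "potential"  # still under evaluation
    CORE = "core"            # met the criteria, considered significant
    WEAK = "weak"            # previously core, has lost significance


@dataclass
class MicroCluster:
    # Hypothetical significance score, e.g. a point count decayed by the window.
    weight: float
    status: Status = Status.POTENTIAL


def update_status(mc: MicroCluster, core_threshold: float) -> Status:
    """Re-evaluate one micro-cluster against an assumed weight threshold."""
    if mc.weight >= core_threshold:
        mc.status = Status.CORE
    elif mc.status is Status.CORE:
        # A core cluster that falls below the threshold is demoted to weak.
        mc.status = Status.WEAK
    # Otherwise the cluster keeps its current status (potential or weak).
    return mc.status


def prune(clusters: list[MicroCluster]) -> list[MicroCluster]:
    """Drop weak clusters in one pass (a simplification of gradual removal)."""
    return [mc for mc in clusters if mc.status is not Status.WEAK]
```

In this sketch a cluster that was never core stays potential while its weight is low, which keeps the "under evaluation" and "lost significance" cases distinct.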
3.2.4. Micro-Cluster Discovery
3.2.5. Micro-Cluster Modification
3.2.6. Storing Weak Micro-Clusters in a Buffer
3.2.7. Elimination of Micro-Clusters
4. Results
4.1. Quality Evaluation
F-Measures
4.2. Jaccard Index (JI)
- TP represents true positives;
- FP stands for false positives;
- FN refers to false negatives.
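With these three counts, the standard definition of the Jaccard index is JI = TP / (TP + FP + FN). A minimal sketch of that computation follows; the function name `jaccard_index` is hypothetical, and returning 0.0 when all three counts are zero is an assumed convention rather than something stated in the text.

```python
def jaccard_index(tp: int, fp: int, fn: int) -> float:
    """Jaccard index from true positives, false positives and false negatives."""
    denominator = tp + fp + fn
    if denominator == 0:
        # Assumed convention: no positives of any kind yields a score of 0.0.
        return 0.0
    return tp / denominator


# Example: 8 true positives, 1 false positive, 1 false negative.
score = jaccard_index(8, 1, 1)
```

True negatives do not appear in the formula, so the index is unaffected by how many pairs are correctly placed in different clusters.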
4.3. Fowlkes–Mallows Index (FM)
4.3.1. Purity
4.3.2. Rand Index
4.4. Complexity Assessment
5. Conclusions and Discussion
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mutambik, I. An Entropy-Based Clustering Algorithm for Real-Time High-Dimensional IoT Data Streams. Sensors 2024, 24, 7412. https://doi.org/10.3390/s24227412