Big Data Clustering Using Chemical Reaction Optimization Technique: A Computational Symmetry Paradigm for Location-Aware Decision Support in Geospatial Query Processing
Abstract
1. Introduction
1.1. Problem Statement and Aim of the Work
1.2. Contribution
2. Related Work
3. The Proposed Geospatial Big Data Query Model
3.1. Problem Formulation
3.2. Proposed Method
4. Experimental Results
4.1. Experiment 1: KNN Search Analysis
4.2. Experiment 2: Comparative Study for KNN Search
4.3. Experiment 3: Clustering Error
4.4. Experiment 4: Convergence Time
4.5. Experiment 5: Clustering Accuracy
4.6. Limitation
- Many real-world decision-making processes involve optimization problems that are NP-hard. The large scale, dynamism, and vagueness of these problems limit the usefulness of stand-alone optimization techniques.
- Metaheuristic approaches (such as Chemical Reaction Optimization) typically assume that problem inputs, underlying objective functions, and optimization constraints are either deterministic or follow simple probabilistic rules. Because of these strong assumptions, many deterministic models are oversimplified representations of real-world systems.
- When uncertainty is not modeled in the optimization formulation, the optimal solutions found for such systems may be unstable and sensitive to small changes in the input parameters.
- Many metaheuristics rely on some form of stochastic optimization, so the solution found depends on the random variables drawn during the run.
- The approach requires more computing resources than conventional approaches.
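The dependence on random variables noted in the list above can be illustrated with a small sketch. The code below is not the CRO-based method of this article; it is a plain alternating k-medoids heuristic with random initialization, used only to show that a stochastic search may return different medoids and costs for different seeds. The function names, the toy points, and the seed values are all assumptions made for illustration.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of Euclidean distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def k_medoids(points, k, seed, max_iter=50):
    """Alternating k-medoids with a seeded random start (illustrative only)."""
    rng = random.Random(seed)          # the only source of randomness
    medoids = rng.sample(points, k)    # random initial medoids
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for p in points:
            nearest = min(medoids, key=lambda m: math.dist(p, m))
            clusters[nearest].append(p)
        # Update step: the new medoid of a cluster is the member with the
        # smallest summed distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(math.dist(c, q) for q in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break                      # no medoid changed: local optimum
        medoids = new_medoids
    return sorted(medoids), total_cost(points, medoids)


# Hypothetical toy data: distinct 2-D points, not taken from the article.
TOY_POINTS = [
    (0.0, 0.0), (0.5, 0.2), (1.0, 0.1), (4.0, 4.0),
    (4.2, 3.8), (8.0, 0.5), (8.3, 0.4), (6.0, 2.0),
]

for seed in range(4):
    medoids, cost = k_medoids(TOY_POINTS, k=2, seed=seed)
    print(f"seed={seed} medoids={medoids} cost={cost:.3f}")
```

Running the loop with the same seed always reproduces the same result, whereas different seeds may converge to different local optima. Reporting results averaged over several seeds is one common way to account for this.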
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Deng, X.; Liu, P.; Liu, X.; Wang, R.; Zhang, Y.; He, J.; Yao, Y. Geospatial big data: New paradigm of remote sensing applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3841–3851.
- Li, S.; Dragicevic, S.; Castro, F.A.; Sester, M.; Winter, S.; Coltekin, A.; Pettit, C.; Jiang, B.; Haworth, J.; Stein, A.; et al. Geospatial big data handling theory and methods: A review and research challenges. ISPRS J. Photogramm. Remote Sens. 2016, 115, 119–133.
- Li, Z. Geospatial Big Data Handling with High Performance Computing: Current Approaches and Future Directions. In High Performance Computing for Geospatial Applications; Springer: Cham, Switzerland, 2020; pp. 53–76.
- Wang, H.; Zhu, S. Multisource Aggregation Search and Scheduling for Remote Sensing Data Cluster. IEEE Geosci. Remote Sens. Lett. 2019, 7, 352–356.
- Limkar, S.V.; Jha, R.K. A novel method for parallel indexing of real time geospatial big data generated by IoT devices. Future Gener. Comput. Syst. 2019, 97, 433–452.
- Eldawy, A.; Mokbel, M.F. Spatialhadoop: A mapreduce framework for spatial data. In Proceedings of the IEEE 31st International Conference on Data Engineering, Seoul, Republic of Korea, 13–17 April 2015; pp. 1352–1363.
- Lenka, R.K.; Barik, R.K.; Gupta, N.; Ali, S.M.; Rath, A.; Dubey, H. Comparative analysis of SpatialHadoop and GeoSpark for geospatial big data analytics. In Proceedings of the 2nd International Conference on Contemporary Computing and Informatics, Greater Noida, India, 14–17 December 2016; pp. 484–488.
- Lee, K.; Ganti, R.K.; Srivatsa, M.; Liu, L. Efficient spatial query processing for big data. In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA, 4 November 2014; pp. 469–472.
- Aljawarneh, I.M.; Bellavista, P.; Corradi, A.; Montanari, R.; Foschini, L.; Zanotti, A. Efficient spark-based framework for big geospatial data query processing and analysis. In Proceedings of the IEEE Symposium on Computers and Communications, Heraklion, Greece, 3–6 July 2017; pp. 851–856.
- Shirkhorshidi, A.S.; Aghabozorgi, S.; Wah, T.Y.; Herawan, T. Big data clustering: A review. In Proceedings of the International Conference on Computational Science and Its Applications, Guimarães, Portugal, 30 June 2014; pp. 707–720.
- Fahad, A.; Alshatri, N.; Tari, Z.; Alamri, A.; Khalil, I.; Zomaya, A.Y.; Foufou, S.; Bouras, A. A survey of clustering algorithms for big data: Taxonomy and empirical analysis. IEEE Trans. Emerg. Top. Comput. 2014, 2, 267–279.
- Ayed, A.B.; Halima, M.B.; Alimi, A.M. Survey on clustering methods: Towards fuzzy clustering for big data. In Proceedings of the 2014 6th International Conference of Soft Computing and Pattern Recognition (SoCPaR), Tunis, Tunisia, 11–14 August 2014; pp. 331–336.
- Arora, S.; Chana, I. A survey of clustering techniques for big data analysis. In Proceedings of the 5th International Conference-Confluence: The Next Generation Information Technology Summit, Noida, India, 25–26 September 2014; pp. 59–65.
- Shi, Z.; Pun-Cheng, L.S. Spatiotemporal data clustering: A survey of methods. ISPRS Int. J. Geo-Inf. 2019, 8, 112.
- Xinxiang, H.; Henan, X. A new data mining algorithm based on Mapreduce and Hadoop. Int. J. Signal Proc. Image Process. Pattern Recognit. 2014, 7, 131–142.
- Mirzasoleiman, B.; Karbasi, A.; Sarkar, R.; Krause, A. Distributed sub-modular maximization: Identifying representative elements in massive data. In Advances in Neural Information Processing Systems; ACM Digital Library: New York, NY, USA, 2013; pp. 2049–2057.
- Ene, A.; Im, S.; Moseley, B. Fast clustering using MapReduce. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21 August 2011; pp. 681–689.
- Yue, X.; Man, W.; Yue, J.; Liu, G. Parallel k-medoids++ spatial clustering algorithm based on mapreduce. arXiv 2016, arXiv:1608.06861.
- Martino, A.; Rizzi, A.; Mascioli, F.M. Distance matrix pre-caching and distributed computation of internal validation indices in k-medoids clustering. In Proceedings of the International Joint Conference on Neural Networks, Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
- Bendechache, M.; Kechadi, M.T.; Le-Khac, N.A. Efficient large scale clustering based on data partitioning. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics, Montreal, QC, Canada, 17–19 October 2016; pp. 612–621.
- Bendechache, M.; Le-Khac, N.A.; Kechadi, M.T. Performance evaluation of a distributed clustering approach for spatial datasets. In Australasian Conference on Data Mining; Springer: Singapore, 2017; pp. 38–56.
- Shaikh, S.; Memon, M.; Kim, K. A multi-criteria decision-making approach for ideal business location identification. Appl. Sci. 2021, 11, 4983.
- Massai, L.; Nesi, P.; Pantaleo, G. PAVAL: A location-aware virtual personal assistant for retrieving geolocated points of interest and location-based services. Eng. Appl. Artif. Intell. 2018, 77, 70–85.
- Yu, J.; Sarwat, M. GeoSparkViz: A cluster computing system for visualizing massive-scale geospatial data. VLDB J. 2021, 30, 237–258.
- Peng, Q.; You, L.; Dong, N. A location-aware GIServices quality prediction model via collaborative filtering. Int. J. Digit. Earth 2017, 11, 897–912.
- García-García, F.; Corral, A.; Iribarne, L.; Vassilakopoulos, M.; Manolopoulos, Y. Efficient distance join query processing in distributed spatial data management systems. Inf. Sci. 2019, 512, 985–1008.
- Dritsas, E.; Kanavos, A.; Trigka, M.; Vonitsanos, G.; Sioutas, S.; Tsakalidis, A. Trajectory clustering and k-NN for robust privacy preserving k-NN query processing in GeoSpark. Algorithms 2020, 13, 182.
- García-García, F.; Corral, A.; Iribarne, L.; Vassilakopoulos, M. Improving distance-join query processing with Voronoi-diagram based partitioning in SpatialHadoop. Future Gener. Comput. Syst. 2019, 111, 723–740.
- Qiao, B.; Ma, L.; Chen, L.; Hu, B. A PID-Based kNN Query Processing Algorithm for Spatial Data. Sensors 2022, 22, 7651.
- Schmidtke, H. Location-aware systems or location-based services: A survey with applications to CoViD-19 contact tracking. J. Reliab. Intell. Environ. 2020, 6, 191–214.
- Ghosh, S.; Das, J.; Ghosh, S. Locator: A cloud-fog-enabled framework for facilitating efficient location based services. In Proceedings of the International Conference on Communication Systems & Networks, Bengaluru, India, 7–11 January 2020; pp. 87–92.
- Manna, P.; Bonfante, A.; Colandrea, M.; Di Vaio, C.; Langella, G. A geospatial decision support system to assist olive growing at the landscape scale. Comput. Electron. Agric. 2019, 168, 105143.
- Sadeghi-Niaraki, A.; Jelokhani-Niaraki, M.; Choi, S.M. A volunteered geographic information-based environmental decision support system for waste management and decision making. Sustainability 2020, 12, 6012.
- Keenan, P.; Jankowski, P. Spatial decision support systems: Three decades on. Decis. Support Syst. 2018, 116, 64–76.
- Shin, H.; Lee, K.; Kwon, H. A comparative experimental study of distributed storage engines for big spatial data processing using GeoSpark. J. Supercomput. 2021, 78, 2556–2579.
- Sajana, T.; Rani, C.S.; Narayana, K.V. A survey on clustering techniques for big data mining. Indian J. Sci. Technol. 2016, 9, 1–12.
- Narayana, G.S.; Vasumathi, D. An attributes similarity-based K-medoids clustering technique in data mining. Arab. J. Sci. Eng. 2018, 43, 3979–3992.
- Alasadi, S.A.; Bhaya, W.S. Review of data preprocessing techniques in data mining. J. Eng. Appl. Sci. 2017, 12, 4102–4107.
- Uma, K.; Hanumanthappa, M. Data Collection Methods and Data Pre-processing Techniques for Healthcare Data Using Data Mining. Int. J. Sci. Eng. Res. 2017, 8, 1131–1136.
- Hudaib, A.; Khanafseh, M.; Surakhi, O. An improved version of K-medoid algorithm using CRO. Mod. Appl. Sci. 2018, 12, 116.
- Majumder, S.; Sayed, A.; Jerin, J.; Inzamam-Ul-Hossain, M. Prediction of diabetics using chemical reaction optimization. In Proceedings of the International Conference on Computing Communication and Networking Technologies, Kharagpur, India, 6–8 July 2021; pp. 1–5.
- Martino, A.; Rizzi, A.; Mascioli, F.M. Efficient Approaches for Solving the Large-Scale k-medoids Problem. In Proceedings of the 9th International Joint Conference on Computational Intelligence, Funchal-Madeira, Portugal, 1–3 November 2017; pp. 338–347.
- Whelan, M.; Le Khac, N.A.; Kechadi, M.T. Data reduction in very large spatio-temporal datasets. In Proceedings of the 19th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises, Larissa, Greece, 28–30 June 2010; pp. 104–109.
- Laloux, J.F.; Le-Khac, N.A.; Kechadi, M.T. Efficient distributed approach for density-based clustering. In Proceedings of the IEEE 20th International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, Paris, France, 27–29 June 2011; pp. 145–150.
- Wang, B.; Yin, J.; Hua, Q.; Wu, Z.; Cao, J. Parallelizing k-means-based clustering on spark. In Proceedings of the International Conference on Advanced Cloud and Big Data, Chengdu, China, 13–16 August 2016; pp. 31–36.
- Bendechache, M.; Kechadi, M.T. Distributed clustering algorithm for spatial data mining. In Proceedings of the 2nd IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services, Fuzhou, China, 8–10 July 2015; pp. 60–65.
- Naacke, H.; Curé, O.; Amann, B. SPARQL query processing with Apache Spark. arXiv 2016, arXiv:1604.08903.
- Aly, A.M.; Aref, W.G.; Ouzzani, M. Spatial queries with k-nearest-neighbor and relational predicates. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 3 November 2015; pp. 1–10.
- Papadias, D.; Zhang, J.; Mamoulis, N.; Tao, Y. Query processing in spatial network databases. In Proceedings of the VLDB Conference, Berlin, Germany, 9–12 September 2003; pp. 802–813.
- Piorkowski, M.; Sarafijanovic-Djukic, N.; Grossglauser, M. CRAWDAD Dataset Epfl/Mobility (v2009-02-24), Trace Set: Cab. 2019. Available online: http://crawdad.org/epfl/mobility/20090224/cab (accessed on 1 January 2022).
- Shah, P.; Chaudhary, S. Big data analytics framework for spatial data. In Proceedings of the International Conference on Big Data Analytics, Langkawi, Malaysia, 22 November 2018; pp. 250–265.
- Song, H.; Lee, J.; Han, W. PAMAE: Parallel k-medoids clustering with high accuracy and efficiency. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 4 August 2017; pp. 1087–1096.
K | 3 | 3 | 3 | 5 | 5 | 5 | 15 | 15 | 15 |
Number of rows (millions) | 5 | 15 | 30 | 5 | 15 | 30 | 5 | 15 | 30 |
Average latency (s) | 10 | 23 | 50 | 20 | 48 | 90 | 52 | 140 | 200 |
Database size (millions of rows) | 5 | 10 | 15 | 20 | 25 | 30 |
Proposed Model | 16 | 33 | 45 | 76 | 48 | 90 |
Related Technique [51] | 22 | 42 | 57 | 88 | 95 | 113 |
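The gap between the two rows above can be expressed as a relative reduction, (related − proposed) / related. The sketch below only recomputes that ratio from the tabulated values. The variable and function names are illustrative, the unit of the values is not stated in this table, and the reading "lower is better" is an assumption carried over from the latency table that precedes it.

```python
# Values copied from the comparison table; names are illustrative only.
sizes_million = [5, 10, 15, 20, 25, 30]
proposed = [16, 33, 45, 76, 48, 90]
related = [22, 42, 57, 88, 95, 113]


def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


# Relative reduction of the proposed model versus the related technique,
# keyed by database size in millions of rows.
reductions = {
    size: relative_reduction(r, p)
    for size, p, r in zip(sizes_million, proposed, related)
}

for size, reduction in reductions.items():
    print(f"{size} million rows: {reduction:.1%} lower than the related technique")
```

Because the ratio is computed per database size, it shows whether the advantage is stable as the data grows, which the raw values alone do not make obvious.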
Clustering Algorithm | Clustering Error |
C1 (k-means) | 0.41 |
C2 (k-prototype) | 0.39 |
C3 (Object Clustering Iterative Learning) | 0.25 |
C4 (Similarity-based k-medoids clustering) | 0.23 |
Proposed Model | 0.19 |
Clustering Algorithm | Average Time |
C2 (k-prototype) | 6 |
C3 (Object Clustering Iterative Learning) | 12 |
C4 (Similarity-based k-medoids clustering) | 8 |
Proposed Model | 16 |
Techniques | Parallel K-medoids clustering [52] | Proposed Model (Parallel CRO-Based K-medoids Clustering) |
Clustering Accuracy | 94.6 | 97.4 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Neamah, A.F.; Ibrahim, H.K.; Darwish, S.M.; Hassen, O.A. Big Data Clustering Using Chemical Reaction Optimization Technique: A Computational Symmetry Paradigm for Location-Aware Decision Support in Geospatial Query Processing. Symmetry 2022, 14, 2637. https://doi.org/10.3390/sym14122637