Robust Parameter Optimisation of Noise-Tolerant Clustering for DENCLUE Using Differential Evolution
Abstract
1. Introduction
- A novel clustering technique based on optimising DENCLUE parameters is introduced to address the issue of parameter tuning.
- The proposed technique is compared with recent state-of-the-art methods, namely VDENCLUE and DPCSA, as well as the classical DBSCAN. The comparison tests both clustering quality and stability under noisy conditions, and the evaluation uses several key clustering metrics, both internal and external, to give a comprehensive view of performance.
- The methods are tested on several datasets with diverse characteristics, both real (IRIS, Heart Disease, Seeds, and Wine) and synthetic (Aggregation, Two Moons, Path-based, Shapes, Spiral, and Zahn’s Compound), to demonstrate the efficacy of the proposed method.
2. Methodology
2.1. Data Preprocessing
2.2. Noise Injection
2.3. Evaluation of Clustering Methods
2.3.1. Proposed Method—DE-Based DENCLUE Parameter Optimisation
Why Differential Evolution (DE) for Parameter Optimisation?
- DE explores the search space better than PSO and GA [42]. While PSO tends to emphasise exploitation (local search) and GA focuses on exploration (introducing new candidates into the search space), DE strikes a strong balance between the two. This not only prevents DE from converging prematurely but also ensures convergence in a reasonable time.
- PSO lacks a mutation mechanism like DE’s, which can lead to premature convergence to local optima. Moreover, DE’s difference-vector-based mutation is more directed than GA’s purely random mutation.
- PSO, unlike DE and GA, has no explicit crossover mechanism. GA does provide explicit crossover, but DE’s crossover is simpler and more effective.
- In DE, population size affects convergence time far less than in GA, where the sorting and selection process makes runtime grow steeply with population size. PSO converges faster than DE, but its tendency towards premature convergence limits its use in complex, high-dimensional problems. DE is also more computationally efficient than GA owing to its simpler mutation and crossover operators.
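The mutation and crossover mechanisms referred to above are the core of the classic DE/rand/1/bin scheme. The following is a minimal sketch, not the paper's implementation: the sphere objective is a placeholder for the actual clustering objective, while F = 0.5 and CR = 0.8 follow the proposed method's settings in Table 3.

```python
import numpy as np

def de_optimise(objective, bounds, pop_size=20, F=0.5, CR=0.8, generations=100, seed=0):
    """Minimal DE/rand/1/bin loop: difference-vector mutation,
    binomial crossover, and greedy selection."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fitness = np.array([objective(ind) for ind in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: perturb a random base vector with a scaled difference vector
            a, b, c = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), lo, hi)
            # Binomial crossover: mix mutant and target, guaranteeing one mutant gene
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Greedy selection keeps whichever candidate is better
            f_trial = objective(trial)
            if f_trial < fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = np.argmin(fitness)
    return pop[best], fitness[best]

# Toy objective: 2-D sphere function, whose minimum is 0 at the origin
x, fx = de_optimise(lambda v: float(np.sum(v**2)), np.array([[-5.0, 5.0]] * 2))
```

The greedy one-to-one selection is what keeps population size from inflating convergence time: each trial vector competes only against its own parent, with no population-wide sorting step.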
Computational Complexity of the Proposed Method
2.3.2. Other Density-Based Methods
2.4. Performance Evaluation Through Clustering Metrics
3. Results
3.1. Experimental Setup
3.2. Clustering Quality and Noise Tolerance
- The clustering performance of the benchmark methods depends on user-defined parameters. We defined appropriate grids for the parameter search of these methods; however, an exhaustive search over the entire parameter space is rarely feasible. Since different parameter values produce significantly different clustering outcomes, the criterion for selecting the best result depends on the desired outcome; typical criteria include clustering validation metric(s) or visual analysis by the user. Our criterion was to select the results with the best ARI (reported in Table 4 and Table 5). This, however, has consequences: the result with the best ARI need not be the best under other validation metrics, because the parameter values behind the best-ARI solution may still be sub-optimal. Figure 5 illustrates this phenomenon, where the best-ARI solution suffers from low DBCV and SI and deviates from the true cluster structure. In contrast, the proposed method detects the true density-based structure because it includes density-based clustering validation (DBCV) [34] in its objective function. It not only learns the parameters of the optimal clustering solution automatically but is also optimised for the density-based structure, and it does not require the user to select the best solution from multiple options.
- The proposed method is based on DENCLUE, which has the inherent capability to detect clusters of varying densities even in the presence of noise. When supplied with the optimal parameters estimated by the proposed method, DENCLUE can better detect the underlying structure even when noise is present. DBSCAN, on the other hand, struggles with clusters of varying densities, and this becomes even more challenging in the presence of noise. VDENCLUE, although capable of handling datasets with overlapping clusters thanks to its varying-KDE approach, has inherent difficulties with noisy datasets, as the results in Table 4 and Table 5 show. Its broader kernel in low-density regions may capture noise points and incorrectly assign them to clusters, and its kernel bandwidth, based on the kth nearest neighbour, causes over-smoothing in noisy regions. Lastly, DPCSA, although performing comparably to the proposed method, faces challenges with noisy datasets for two main reasons. First, it is sensitive to the cutoff distance: a sub-optimal cutoff value leads to incorrect local density computation and, subsequently, misjudgement of noisy points. Second, its assignment strategy can propagate errors: if a point is incorrectly assigned to a cluster, the error can spread to other data points.
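The grid search and noise-tolerance evaluation described in the two points above can be sketched end to end. This is an illustrative reconstruction, not the paper's code: it uses scikit-learn's DBSCAN on a Two Moons sample, the parameter grids from Table 3, and assumes noise is injected as uniformly distributed points within the data's bounding box (the paper's exact injection scheme is specified in Section 2.2).

```python
import numpy as np
from itertools import product
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

def add_uniform_noise(X, level, rng):
    """Append level * n noise points drawn uniformly from X's bounding box
    (an assumed injection scheme, for illustration only)."""
    n_noise = int(level * len(X))
    noise = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_noise, X.shape[1]))
    return np.vstack([X, noise])

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
rng = np.random.default_rng(0)

# Parameter grids as in Table 3
eps_grid = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3]
minpts_grid = [3, 5, 7, 10, 12]

best_ari = {}
for level in (0.0, 0.05, 0.10, 0.20):
    Xn = add_uniform_noise(X, level, rng)
    scores = []
    for eps, minpts in product(eps_grid, minpts_grid):
        labels = DBSCAN(eps=eps, min_samples=minpts).fit_predict(Xn)
        # Ground truth exists only for the original points; injected noise is ignored
        scores.append(adjusted_rand_score(y_true, labels[: len(X)]))
    best_ari[level] = max(scores)
```

Note that the configuration with the best ARI need not score well on internal metrics such as SI or DBCV, which is precisely the pitfall that motivates including DBCV in the proposed method's objective.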
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Distribution of Metric Values
Appendix B
Statistical Analysis of Method Comparisons
| Noise Level | Dataset Group | Proposed vs. VDENCLUE | Proposed vs. DPCSA | Proposed vs. DBSCAN | Significance (p < 0.0167) |
|---|---|---|---|---|---|
0 | Group 1 (Excluding datasets with significant overlap) | 0.0651 | 0.0651 | 0.0156 | Yes (DBSCAN) |
0 | Group 2 (Only datasets with significant overlap) | 0.5 | 1.0 | 0.25 | No |
0 | Group 3 (All datasets) | 0.0371 | 0.1601 | 0.0019 | Yes (DBSCAN) |
5 | Group 1 (Excluding datasets with significant overlap) | 0.0651 | 0.2968 | 0.2968 | No |
5 | Group 2 (Only datasets with significant overlap) | 0.5 | 0.75 | 1.0 | No |
5 | Group 3 (All datasets) | 0.0163 | 0.6953 | 0.1933 | Yes (VDENCLUE) |
10 | Group 1 (Excluding datasets with significant overlap) | 0.0468 | 0.2187 | 0.1562 | No |
10 | Group 2 (Only datasets with significant overlap) | 0.25 | 0.25 | 0.25 | No |
10 | Group 3 (All datasets) | 0.0058 | 0.0390 | 0.0273 | Yes (VDENCLUE) |
20 | Group 1 (Excluding datasets with significant overlap) | 0.0156 | 0.4687 | 0.0468 | Yes (VDENCLUE) |
20 | Group 2 (Only datasets with significant overlap) | 0.25 | 0.5 | 0.25 | No |
20 | Group 3 (All datasets) | 0.0019 | 0.1933 | 0.0058 | Yes (Except DPCSA) |
| Noise Level | Dataset Group | Proposed vs. VDENCLUE | Proposed vs. DPCSA | Proposed vs. DBSCAN | Significance (p < 0.0167) |
|---|---|---|---|---|---|
0 | Group 1 (Excluding datasets with significant overlap) | 1.0 | 0.0312 | 0.8125 | No |
0 | Group 2 (Only datasets with significant overlap) | 0.75 | 0.25 | 0.5 | No |
0 | Group 3 (All datasets) | 1.0 | 0.0039 | 0.7695 | Yes (DPCSA) |
5 | Group 1 (Excluding datasets with significant overlap) | 1.0 | 0.1093 | 0.9375 | No |
5 | Group 2 (Only datasets with significant overlap) | 1.0 | 0.5 | 0.75 | No |
5 | Group 3 (All datasets) | 0.7695 | 0.0273 | 0.5566 | No |
10 | Group 1 (Excluding datasets with significant overlap) | 0.8125 | 0.1562 | 0.375 | No |
10 | Group 2 (Only datasets with significant overlap) | 0.25 | 0.5 | 0.5 | No |
10 | Group 3 (All datasets) | 0.3222 | 0.1289 | 0.2324 | No |
20 | Group 1 (Excluding datasets with significant overlap) | 0.2187 | 0.0651 | 0.0651 | No |
20 | Group 2 (Only datasets with significant overlap) | 0.25 | 0.25 | 0.5 | No |
20 | Group 3 (All datasets) | 0.04882 | 0.0136 | 0.0273 | Yes (DPCSA) |
| Noise Level | Dataset Group | Proposed vs. VDENCLUE | Proposed vs. DPCSA | Proposed vs. DBSCAN | Significance (p < 0.0167) |
|---|---|---|---|---|---|
0 | Group 1 (Excluding datasets with significant overlap) | 0.0156 | 0.0463 | 0.0468 | Yes (VDENCLUE) |
0 | Group 2 (Only datasets with significant overlap) | 0.5 | 0.75 | 0.5 | No |
0 | Group 3 (All datasets) | 0.0039 | 0.0663 | 0.0136 | Yes (Except DPCSA) |
5 | Group 1 (Excluding datasets with significant overlap) | 0.0156 | 0.01562 | 0.0156 | Yes |
5 | Group 2 (Only datasets with significant overlap) | 1.0 | 1.0 | 0.75 | No |
5 | Group 3 (All datasets) | 0.0097 | 0.0371 | 0.0097 | Yes (Except DPCSA) |
10 | Group 1 (Excluding datasets with significant overlap) | 0.0156 | 0.0156 | 0.0156 | Yes |
10 | Group 2 (Only datasets with significant overlap) | 0.75 | 0.5 | 1.0 | No |
10 | Group 3 (All datasets) | 0.0136 | 0.1054 | 0.0097 | Yes (Except DPCSA) |
20 | Group 1 (Excluding datasets with significant overlap) | 0.2187 | 0.296 | 0.2968 | No |
20 | Group 2 (Only datasets with significant overlap) | 1.0 | 0.75 | 1.0 | No |
20 | Group 3 (All datasets) | 0.2324 | 0.4921 | 0.2753 | No |
| Noise Level | Dataset Group | Proposed vs. VDENCLUE | Proposed vs. DPCSA | Proposed vs. DBSCAN | Significance (p < 0.0167) |
|---|---|---|---|---|---|
0 | Group 1 (Excluding datasets with significant overlap) | 0.0156 | 0.0463 | 0.0312 | Yes (VDENCLUE) |
0 | Group 2 (Only datasets with significant overlap) | 0.5 | 0.75 | 0.5 | No |
0 | Group 3 (All datasets) | 0.0039 | 0.0663 | 0.0097 | Yes (Except DPCSA) |
5 | Group 1 (Excluding datasets with significant overlap) | 0.0156 | 0.0156 | 0.0156 | Yes |
5 | Group 2 (Only datasets with significant overlap) | 1.0 | 1.0 | 1.0 | No |
5 | Group 3 (All datasets) | 0.0097 | 0.0371 | 0.0136 | Yes (Except DPCSA) |
10 | Group 1 (Excluding datasets with significant overlap) | 0.0156 | 0.0156 | 0.0156 | Yes |
10 | Group 2 (Only datasets with significant overlap) | 0.5 | 0.5 | 0.75 | No |
10 | Group 3 (All datasets) | 0.0165 | 0.1308 | 0.0165 | Yes (Except DPCSA) |
20 | Group 1 (Excluding datasets with significant overlap) | 0.2187 | 0.2968 | 0.2187 | No |
20 | Group 2 (Only datasets with significant overlap) | 0.5 | 0.75 | 0.75 | No |
20 | Group 3 (All datasets) | 0.4921 | 0.5566 | 0.375 | No |
| Noise Level | Dataset Group | Proposed vs. VDENCLUE | Proposed vs. DPCSA | Proposed vs. DBSCAN | Significance (p < 0.0167) |
|---|---|---|---|---|---|
0 | Group 1 (Excluding datasets with significant overlap) | 0.0312 | 0.6875 | 0.0312 | No |
0 | Group 2 (Only datasets with significant overlap) | 0.25 | 0.25 | 0.5 | No |
0 | Group 3 (All datasets) | 0.0039 | 0.1933 | 0.1308 | Yes (VDENCLUE) |
5 | Group 1 (Excluding datasets with significant overlap) | 0.0156 | 0.6875 | 0.0468 | Yes (VDENCLUE) |
5 | Group 2 (Only datasets with significant overlap) | 0.5 | 0.5 | 1.0 | No |
5 | Group 3 (All datasets) | 0.0039 | 0.25 | 0.1054 | Yes (VDENCLUE) |
10 | Group 1 (Excluding datasets with significant overlap) | 0.0312 | 0.625 | 0.0312 | No |
10 | Group 2 (Only datasets with significant overlap) | 0.25 | 0.5 | 0.5 | No |
10 | Group 3 (All datasets) | 0.0039 | 1.0 | 0.1308 | Yes (VDENCLUE) |
20 | Group 1 (Excluding datasets with significant overlap) | 0.01562 | 0.4375 | 0.1093 | Yes (VDENCLUE) |
20 | Group 2 (Only datasets with significant overlap) | 0.25 | 0.75 | 1.0 | No |
20 | Group 3 (All datasets) | 0.0019 | 0.8203 | 0.1601 | Yes (VDENCLUE) |
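Each p-value above compares the proposed method with one baseline, judged against a Bonferroni-corrected threshold of 0.05/3 ≈ 0.0167 for the three pairwise comparisons. A minimal sketch of one such comparison, assuming a Wilcoxon signed-rank test as in the cited comparisons [177,178]; the scores below are illustrative and are not taken from the paper's tables.

```python
from scipy.stats import wilcoxon

# Illustrative per-dataset scores for two methods (hypothetical values)
proposed = [0.76, 1.00, 0.57, 1.00, 1.00, 0.75]
baseline = [0.03, 0.07, 0.07, 0.11, 0.03, 0.11]

# Two-sided exact test on the paired differences
stat, p = wilcoxon(proposed, baseline)

alpha = 0.05 / 3  # Bonferroni correction for three pairwise comparisons
significant = bool(p < alpha)
```

Sample size matters here: with six paired samples the smallest attainable two-sided p-value is 2/2^6 = 0.03125, which cannot pass the corrected threshold, whereas values such as 0.0156 and 0.0019 in the tables are consistent with the exact minima for seven (2/2^7) and ten (2/2^10) paired samples.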
References
- Nayyar, A.; Puri, V. Comprehensive analysis & performance comparison of clustering algorithms for big data. Rev. Comput. Eng. Res. 2017, 4, 54–80. [Google Scholar] [CrossRef]
- Premkumar, M.; Sinha, G.; Ramasamy, M.D.; Sahu, S.; Subramanyam, C.B.; Sowmya, R.; Abualigah, L.; Derebew, B. Augmented weighted K-means grey wolf optimizer: An enhanced metaheuristic algorithm for data clustering problems. Sci. Rep. 2024, 14, 5434. [Google Scholar] [CrossRef]
- Mahdi, M.A.; Hosny, K.M.; Elhenawy, I. Scalable Clustering Algorithms for Big Data: A Review. IEEE Access 2021, 9, 80015–80027. [Google Scholar] [CrossRef]
- Murtagh, F.; Contreras, P. Algorithms for hierarchical clustering: An overview. Wiley Interdiscip Rev. Data Min. Knowl. Discov. 2012, 2, 86–97. [Google Scholar] [CrossRef]
- Kriegel, H.P.; Kröger, P.; Sander, J.; Zimek, A. Density-based clustering. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 231–240. [Google Scholar] [CrossRef]
- Cheng, W.; Wang, W.; Batista, S. Grid-Based Clustering. In Data Clustering; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018; pp. 128–148. [Google Scholar] [CrossRef]
- Mcnicholas, P.D. Model-Based Clustering. J. Classif. 2016, 33, 331–373. [Google Scholar] [CrossRef]
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd 1996, 96, 226–231. [Google Scholar]
- Gholizadeh, N.; Saadatfar, H.; Hanafi, N. K-DBSCAN: An improved DBSCAN algorithm for big data. J. Supercomput. 2021, 77, 6214–6235. [Google Scholar] [CrossRef]
- Wang, Y.; Gu, Y.; Shun, J. Theoretically-efficient and practical parallel DBSCAN. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020; p. 17. [Google Scholar] [CrossRef]
- Jang, J.; Lee, Y.; Lee, S.; Shin, D.; Kim, D.; Rim, H. A novel density-based clustering method using word embedding features for dialogue intention recognition. Clust. Comput. 2016, 19, 2315–2326. [Google Scholar] [CrossRef]
- Idrissi, A.; Rehioui, H.; Laghrissi, A.; Retal, S. An improvement of DENCLUE algorithm for the data clustering. In Proceedings of the 2015 5th International Conference on Information & Communication, Paris, France, 12–13 October 2015. [Google Scholar]
- Liu, P.; Zhou, D.; Wu, N. VDBSCAN: Varied density based spatial clustering of applications with noise. In Proceedings of the 2007 International Conference on Service Systems and Service Management, Vienna, Austria, 17–20 September 2007. [Google Scholar]
- Fahim, A.M.; Saake, G.; Salem AB, M.; Torkey, F.A.; Ramadan, M.A. An enhanced density based spatial clustering of applications with noise. In Proceedings of the 2009 IEEE International Advance Computing Conference, Patiala, India, 6–7 March 2009. [Google Scholar]
- Bryant, A.; Cios, K. Rnn-dbscan: A density-based clustering algorithm using reverse nearest neighbor density estimates. IEEE Trans. Knowl. Data Eng. 2018, 30, 1109–1121. [Google Scholar] [CrossRef]
- Saeed, M.M.; Al Aghbari, Z.; Alsharidah, M. Big data clustering techniques based on Spark: A literature review. PeerJ Comput. Sci. 2020, 6, e321. [Google Scholar] [CrossRef] [PubMed]
- Chen, Y.; Zhou, L.; Pei, S.; Yu, Z.; Chen, Y.; Liu, X.; Du, J.; Xiong, N. KNN-BLOCK DBSCAN: Fast Clustering for Large-Scale Data. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 3939–3953. [Google Scholar] [CrossRef]
- Patwary, M.A.; Satish, N.; Sundaram, N.; Manne, F.; Habib, S.; Dubey, P. Pardicle: Parallel Approximate Density-Based Clustering. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC, New Orleans, LA, USA, 16–21 November 2014; Volume 2015, pp. 560–571. [Google Scholar] [CrossRef]
- Bessrour, M.; Elouedi, Z.; Lefevre, E. E-DBSCAN: An evidential version of the DBSCAN method. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020, Canberra, Australia, 1–4 December 2020; pp. 3073–3080. [Google Scholar] [CrossRef]
- Ienco, D.; Bordogna, G. Fuzzy extensions of the DBScan clustering algorithm. Soft Comput. 2018, 22, 1719–1730. [Google Scholar] [CrossRef]
- Yu, X.G.; Jian, Y. A new clustering algorithm based on KNN and denclue. In Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005, Guangzhou, China, 18–21 August 2005; pp. 2033–2038. [Google Scholar] [CrossRef]
- Ankerst, M.; Breunig, M.M.; Kriegel, H.P.; Sander, J. OPTICS: Ordering Points to Identify the Clustering Structure. SIGMOD Rec. (ACM Spec. Interest Group Manag. Data) 1999, 28, 49–60. [Google Scholar] [CrossRef]
- Khader, M.; Al-Naymat, G. Discovery of arbitrary-shapes clusters using DENCLUE algorithm. Int. Arab J. Inf. Technol. 2020, 17, 629–634. [Google Scholar] [CrossRef]
- Kazemi, U.; Boostani, R. FEM-DBSCAN: An Efficient Density-Based Clustering Approach. Iran. J. Sci. Technol. Trans. Electr. Eng. 2021, 45, 979–992. [Google Scholar] [CrossRef]
- Khader, M.; Al-Naymat, G. An overview of various enhancements of DENCLUE algorithm. In Proceedings of the Second International Conference on Data Science, E-Learning and Information Systems, New York, NY, USA, 2 December 2019. [Google Scholar] [CrossRef]
- Khader, M.S.; Al-Naymat, G. VDENCLUE: An Enhanced Variant of DENCLUE Algorithm. Adv. Intell. Syst. Comput. 2021, 1251, 425–436. [Google Scholar] [CrossRef]
- Yu, D.; Liu, G.; Guo, M.; Liu, X.; Yao, S. Density Peaks Clustering Based on Weighted Local Density Sequence and Nearest Neighbor Assignment. IEEE Access 2019, 7, 34301–34317. [Google Scholar] [CrossRef]
- Sander, J.; Ester, M.; Kriegel, H.P.; Xu, X. Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications. Data Min. Knowl. Discov. 1998, 2, 169–194. [Google Scholar] [CrossRef]
- Xibilia, M.G.; Latino, M.; Marinkovic, Z.; Atanaskovic, A.; Donato, N. Soft Sensors Based on Deep Neural Networks for Applications in Security and Safety. IEEE Trans. Instrum. Meas. 2020, 69, 7869–7876. [Google Scholar] [CrossRef]
- Burnham, P.; Gomez-Lopez, N.; Heyang, M.; Cheng, A.P.; Lenz, J.S.; Dadhania, D.M.; Lee, J.R.; Suthanthiran, M.; Romero, R.; De Vlaminck, I. Separating the signal from the noise in metagenomic cell-free DNA sequencing. Microbiome 2020, 8, 1–9. [Google Scholar] [CrossRef] [PubMed]
- Askari, S. Fuzzy C-Means clustering algorithm for data with unequal cluster sizes and contaminated with noise and outliers: Review and development. Expert Syst. Appl. 2021, 165, 113856. [Google Scholar] [CrossRef]
- Du, M.; Ding, S.; Xue, Y. A robust density peaks clustering algorithm using fuzzy neighborhood. Int. J. Mach. Learn. Cybern. 2018, 9, 1131–1140. [Google Scholar] [CrossRef]
- Iam-On, N. Clustering data with the presence of attribute noise: A study of noise completely at random and ensemble of multiple k-means clusterings. Int. J. Mach. Learn. Cybern. 2020, 11, 491–509. [Google Scholar] [CrossRef]
- Ajmal, O.; Mumtaz, S.; Arshad, H.; Soomro, A.; Hussain, T.; Attar, R.W.; Alhomoud, A. Enhanced Parameter Estimation of DENsity CLUstEring (DENCLUE) Using Differential Evolution. Mathematics 2024, 12, 2790. [Google Scholar] [CrossRef]
- Zhang, T.T.; Yuan, B. Density-Based Multiscale Analysis for Clustering in Strong Noise Settings with Varying Densities. IEEE Access 2018, 6, 25861–25873. [Google Scholar] [CrossRef]
- Li, Z.; Liu, J.; Chen, S.; Tang, X. Noise robust spectral clustering. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio De Janeiro, Brazil, 14–21 October 2007. [Google Scholar]
- Tatjer, A.; Nagarajan, B.; Marques, R.; Radeva, P. CCLM: Class-Conditional Label Noise Modelling. Lect. Notes Comput. Sci. 2023, 14062, 3–14. [Google Scholar] [CrossRef]
- Rączkowska, A.; Osowska-Kurczab, A.; Szczerbiński, J.; Jasinska-Kobus, K.; Nazarko, K. AlleNoise—large-scale text classification benchmark dataset with real-world label noise. arXiv 2024, arXiv:2407.10992. [Google Scholar]
- Moulavi, D.; Jaskowiak, P.A.; Campello, R.J.G.B.; Zimek, A.; Sander, J. Density-based clustering validation. In Proceedings of the SIAM International Conference on Data Mining 2014, SDM 2014, Philadelphia, PA, USA, 24–26 April 2014; Volume 2, pp. 839–847. [Google Scholar] [CrossRef]
- Clerc, M. Particle Swarm Optimization. In Part. Swarm Optim; Wiley Online Library: Minneapolis, MN, USA, 2010. [Google Scholar] [CrossRef]
- Bakirtzis, A.; Kazarlis, S. Genetic Algorithms. In Advanced Solutions in Power Systems: HVDC, FACTS, and Artificial Intelligence; Wiley Online Library: Minneapolis, MN, USA, 2016; pp. 845–902. [Google Scholar] [CrossRef]
- Kachitvichyanukul, V. Comparison of three evolutionary algorithms: GA, PSO, and DE. Ind. Eng. Manag. Syst. 2012, 11, 215–223. [Google Scholar] [CrossRef]
- Srinivas, C.; Reddy, B.R.; Ramji, K.; Naveen, R. Sensitivity analysis to determine the parameters of genetic algorithm for machine layout. Procedia Mater. Sci. 2014, 6, 866–876. [Google Scholar] [CrossRef]
- Isiet, M.; Gadala, M. Sensitivity Analysis of Control Parameters in Particle Swarm Optimization. J. Comput. Sci. 2020, 41, 101086. [Google Scholar] [CrossRef]
- Karami, A.; Johansson, R. Choosing DBSCAN parameters automatically using differential evolution. Int. J. Comput. Appl. 2014, 91, 1–11. [Google Scholar] [CrossRef]
- Santosh, T.; Ramesh, D. DENCLUE-DE: Differential Evolution Based DENCLUE for Scalable Clustering in Big Data Analysis. Lect. Notes Data Eng. Commun. Technol. 2020, 44, 436–445. [Google Scholar] [CrossRef]
- Juwono, F.H.; Wong, W.K.; Pek, H.T.; Sivakumar, S.; Acula, D.D. Ovarian cancer detection using optimized machine learning models with adaptive differential evolution. Biomed. Signal Process. Control. 2022, 77, 103785. [Google Scholar] [CrossRef]
- Golilarz, N.A.; Mirmozaffari, M.; Gashteroodkhani, T.A.; Ali, L.; Dolatsara, H.A.; Boskabadi, A.; Yazdi, M. Optimized wavelet-based satellite image de-noising with multi-population differential evolution-assisted harris hawks optimization algorithm. IEEE Access 2020, 8, 133076–133085. [Google Scholar] [CrossRef]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
- Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software. ACM SIGKDD Explor. Newsl. 2009, 11, 10–18. [Google Scholar] [CrossRef]
- Hassan, B.A.; Tayfor, N.B.; Hassan, A.A.; Ahmed, A.M.; Rashid, T.A.; Abdalla, N.N. From A-to-Z review of clustering validation indices. Neurocomputing 2024, 601, 128198. [Google Scholar] [CrossRef]
- Hubert, L.; Arabie, P. Comparing partitions. J. Classif. 1985, 2, 193–218. [Google Scholar] [CrossRef]
- Vinh, N.X.; Epps, J.; Bailey, J. Information theoretic measures for clusterings comparison: Is a correction for chance necessary? In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009. [Google Scholar] [CrossRef]
- Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
- Davies, D.L.; Bouldin, D.W. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, 2, 224–227. [Google Scholar] [CrossRef]
- Amrulloh, K. Comparison Between Davies-Bouldin Index and Silhouette Coefficient Evaluation Methods in Retail Store Sales Transaction Data Clusterization Using K-Medoids Algorithm. In Proceedings of the 3rd South American International Industrial Engineering and Operations Management Conference, Asuncion, Paraguay, 19–21 July 2022; pp. 1952–1961. [Google Scholar]
- Panwong, P.; Boongoen, T.; Iam-On, N. Improving consensus clustering with noise-induced ensemble generation. Expert Syst. Appl. 2020, 146, 113138. [Google Scholar] [CrossRef]
- Kelly, M.; Longjohn, R.; Nottingham, K. The UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ (accessed on 29 August 2024).
- Fränti, P.; Sieranoja, S. K-means properties on six clustering benchmark datasets. Appl. Intell. 2018, 48, 4743–4759. [Google Scholar] [CrossRef]
- Zhou, Y.; Xiang, Y.; He, X. Constrained multiobjective optimization: Test problem construction and performance evaluations. IEEE Trans. Evol. Comput. 2020, 25, 172–186. [Google Scholar] [CrossRef]
- Hou, Y.; Wu, Y.; Liu, Z.; Han, H.; Wang, P. Dynamic multi-objective differential evolution algorithm based on the information of evolution progress. Sci. China Technol. Sci. 2021, 64, 1676–1689. [Google Scholar] [CrossRef]
- Ahmad, M.F.; Isa, N.A.M.; Lim, W.H.; Ang, K.M. Differential evolution: A recent review based on state-of-the-art works. Alex. Eng. J. 2022, 61, 3831–3872. [Google Scholar] [CrossRef]
- Ronkkonen, J.; Kukkonen, S.; Price, K.V. Real-parameter optimization with differential evolution. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, Edinburgh, Scotland, UK, 2–5 September 2005; pp. 506–513. [Google Scholar]
- Pant, M.; Zaheer, H.; Garcia-Hernandez, L.; Abraham, A. Differential Evolution: A review of more than two decades of research. Eng. Appl. Artif. Intell. 2020, 90, 103479. [Google Scholar]
- Xie, Y.; Jia, X.; Shekhar, S.; Bao, H.; Zhou, X. Significant DBSCAN+: Statistically Robust Density-based Clustering. ACM Trans. Intell. Syst. Technol. 2021, 12, 1–26. [Google Scholar] [CrossRef]
- Marques, J.C.; Orger, M.B. Clusterdv: A simple density-based clustering method that is robust, general and automatic. Bioinformatics 2019, 35, 2125–2132. [Google Scholar] [CrossRef]
- Biju, V.G.; Prashanth, C.M. Friedman and Wilcoxon evaluations comparing SVM, bagging, boosting, K-NN and decision tree classifiers. J. Appl. Comput. Sci. Methods 2017, 9, 23–47. [Google Scholar] [CrossRef]
- Zimmerman, D.W.; Zumbo, B.D. Relative power of the wilcoxon test, the friedman test, and repeated-measures ANOVA on ranks. J. Exp. Educ. 1993, 62, 75–86. [Google Scholar] [CrossRef]
Algorithm | Pros | Cons |
---|---|---|
DBSCAN [28] | - No need to predefine the number of clusters - Robust to noise and outliers - Versatile in finding clusters of various shapes and sizes | - Sensitive to parameter choices (epsilon and minPts) - Struggles with datasets that have varying densities |
DENCLUE [21] | - Handles clusters of varying shapes and sizes - Less sensitive to specific distance parameters - Robust to noise | - Computationally intensive - May still struggle with datasets of highly varying densities |
VDENCLUE [26] | - Effectively handles varying densities - Flexible, accommodates different cluster shapes and sizes - Improved performance over traditional DENCLUE | - Computationally intensive - Requires careful parameter tuning |
DPCSA [27] | - Enhanced accuracy with local density and nearest neighbour consideration - Flexible, handles varying densities and complex shapes - Robust in complex datasets | - Increased computational complexity - Requires careful parameter tuning |
Dataset | Samples | Attributes | Clusters | Domain |
---|---|---|---|---|
IRIS | 150 | 4 | 3 | Biological data |
Heart Disease | 303 | 13 | 5 | Medical diagnosis |
Seeds | 210 | 7 | 3 | Agricultural data |
Wine | 178 | 13 | 3 | Chemical analysis |
Aggregation | 788 | 2 | 7 | Complex shapes |
Two Moons | 100 | 2 | 2 | Non-linear separability |
Path-based | 300 | 2 | 3 | Path-shaped clusters |
Shapes | 400 | 2 | 4 | Geometric diversity |
Spiral | 312 | 2 | 3 | Non-convex clusters |
Zahn’s Compound | 399 | 2 | 6 | Varying densities |
| Method | Parameter | Range |
|---|---|---|
| DBSCAN | eps | [0.05, 0.1, 0.15, 0.2, 0.25, 0.3] |
| DBSCAN | minpts | [3, 5, 7, 10, 12] |
| DPCSA | ρ | [0, 1] |
| DPCSA | δ | [0, 1] |
| VDENCLUE | K | |
| VDENCLUE | density threshold | |
| Proposed method | F | 0.5 |
| Proposed method | CR | 0.8 |
| Dataset | Metric | VDENCLUE 0% | 5% | 10% | 20% | DPCSA 0% | 5% | 10% | 20% | DBSCAN 0% | 5% | 10% | 20% | Proposed 0% | 5% | 10% | 20% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aggregation | SI | 0.12 | 0.09 | 0.07 | 0.08 | 0.46 | 0.40 | 0.48 | 0.35 | 0.41 | 0.44 | 0.36 | 0.34 | 0.43 | 0.43 | 0.4 | 0.31 |
| | DBI | 1.03 | 1.03 | 1.11 | 1.08 | 0.74 | 0.57 | 0.72 | 0.90 | 0.74 | 0.88 | 1.25 | 1.32 | 0.91 | 0.89 | 0.87 | 0.91 |
| | ARI | 0.03 | 0.03 | 0.03 | 0.02 | 0.56 | 0.13 | 0.46 | 0.17 | 0.80 | 0.22 | 0.23 | 0.11 | 0.76 | 0.75 | 0.73 | 0.69 |
| | AMI | 0.21 | 0.19 | 0.17 | 0.15 | 0.63 | 0.14 | 0.47 | 0.18 | 0.87 | 0.24 | 0.24 | 0.12 | 0.86 | 0.81 | 0.78 | 0.71 |
| | DBCV | −0.48 | −0.59 | −0.46 | −0.69 | −0.18 | −0.39 | −0.84 | −0.77 | −0.26 | −0.26 | −0.26 | −0.34 | −0.1 | −0.16 | −0.21 | −0.22 |
| Two Moons | SI | 0.35 | 0.32 | 0.29 | 0.25 | 0.37 | 0.34 | 0.34 | 0.32 | 0.34 | 0.20 | 0.35 | 0.33 | 0.4 | 0.4 | 0.4 | 0.37 |
| | DBI | 0.63 | 0.68 | 0.79 | 0.79 | 1.01 | 1.02 | 0.97 | 1.14 | 1.06 | 1.19 | 1.16 | 1.08 | 1.33 | 1.33 | 1.31 | 1.21 |
| | ARI | 0.07 | 0.07 | 0.07 | 0.05 | 1.00 | 0.99 | 0.89 | 0.68 | 0.82 | 0.27 | 0.26 | 0.16 | 1.00 | 1.00 | 0.97 | 0.85 |
| | AMI | 0.19 | 0.18 | 0.18 | 0.12 | 1.00 | 0.99 | 0.82 | 0.57 | 0.85 | 0.29 | 0.25 | 0.16 | 1.00 | 1.00 | 0.94 | 0.76 |
| | DBCV | −0.18 | −0.24 | −0.25 | −0.53 | 0.86 | 0.57 | 0.16 | −0.61 | 0.16 | 0.1 | −0.08 | −0.06 | 0.16 | 0.16 | −0.05 | −0.02 |
| Path-based | SI | 0.21 | 0.18 | 0.13 | 0.13 | 0.34 | 0.33 | 0.31 | 0.48 | 0.09 | 0.41 | 0.40 | 0.34 | 0.42 | 0.27 | 0.55 | 0.42 |
| | DBI | 0.86 | 1.00 | 1.11 | 1.02 | 1.06 | 1.14 | 1.14 | 0.72 | 1.90 | 1.25 | 1.56 | 1.60 | 1.24 | 1.02 | 0.63 | 0.78 |
| | ARI | 0.07 | 0.07 | 0.07 | 0.05 | 0.22 | 0.10 | 0.19 | 0.06 | 0.37 | 0.10 | 0.08 | 0.06 | 0.57 | 0.22 | −0.01 | 0.01 |
| | AMI | 0.20 | 0.19 | 0.17 | 0.14 | 0.27 | 0.12 | 0.21 | 0.06 | 0.42 | 0.11 | 0.09 | 0.08 | 0.67 | 0.33 | 0.01 | 0.01 |
| | DBCV | −0.38 | −0.43 | −0.47 | −0.56 | 0.04 | −0.96 | −0.86 | −0.91 | −0.48 | −0.51 | −0.59 | −0.59 | −0.41 | −0.18 | −0.41 | −0.56 |
| Shapes | SI | 0.27 | 0.17 | 0.15 | 0.14 | 0.55 | 0.47 | 0.43 | 0.40 | 0.54 | 0.46 | 0.43 | 0.40 | 0.71 | 0.69 | 0.49 | 0.53 |
| | DBI | 0.85 | 0.97 | 1.03 | 1.02 | 0.64 | 0.96 | 0.84 | 0.87 | 0.71 | 1.49 | 1.52 | 1.53 | 0.58 | 0.58 | 0.58 | 0.67 |
| | ARI | 0.11 | 0.11 | 0.10 | 0.08 | 0.71 | 0.53 | 0.38 | 0.33 | 0.73 | 0.45 | 0.38 | 0.36 | 1.00 | 0.96 | 0.95 | 0.83 |
| | AMI | 0.27 | 0.24 | 0.22 | 0.19 | 0.83 | 0.60 | 0.48 | 0.42 | 0.83 | 0.48 | 0.46 | 0.38 | 1.00 | 0.94 | 0.93 | 0.77 |
| | DBCV | −0.43 | −0.47 | −0.62 | −0.59 | 0.87 | −0.15 | −0.25 | −0.54 | 0.23 | 0.09 | −0.07 | −0.17 | 0.23 | 0.09 | 0.08 | −0.26 |
| Spiral | SI | 0.29 | 0.25 | 0.23 | 0.21 | 0.00 | 0.42 | - | 0.38 | −0.06 | 0.38 | 0.30 | 0.31 | 0.02 | 0.01 | 0.1 | 0.47 |
| | DBI | 0.97 | 1.09 | 1.02 | 1.13 | 5.87 | 0.43 | - | 0.53 | 4.28 | 2.24 | 2.61 | 2.04 | 7.73 | 8.02 | 4.07 | 0.82 |
| | ARI | 0.03 | 0.03 | 0.02 | 0.02 | 0.2 | −0.000006 | 0 | −0.00002 | 0.46 | 0.05 | 0.05 | 0.03 | 1.00 | 0.97 | 0.96 | 0.01 |
| | AMI | 0.15 | 0.14 | 0.13 | 0.11 | 0.2 | 0.0008 | 0 | 0.001 | 0.57 | 0.08 | 0.08 | 0.05 | 1.00 | 0.94 | 0.94 | 0.01 |
| | DBCV | −0.14 | −0.33 | −0.33 | −0.59 | −0.82 | −0.82 | −1 | −0.89 | −0.37 | −0.34 | −0.46 | −0.48 | −0.23 | −0.24 | −0.41 | −0.43 |
| Zahn’s Compound | SI | 0.19 | 0.16 | 0.13 | 0.17 | 0.53 | 0.54 | 0.53 | 0.51 | 0.37 | 0.40 | 0.39 | 0.36 | 0.61 | 0.60 | 0.62 | 0.41 |
| | DBI | 1.01 | 0.99 | 1.00 | 1.03 | 0.70 | 0.47 | 0.53 | 0.50 | 2.02 | 1.94 | 2.17 | 1.46 | 0.77 | 0.79 | 0.76 | 0.75 |
| | ARI | 0.11 | 0.10 | 0.09 | 0.06 | 0.29 | 0.13 | 0.12 | 0.09 | 0.69 | 0.27 | 0.21 | 0.15 | 0.75 | 0.72 | 0.7 | 0.01 |
| | AMI | 0.24 | 0.22 | 0.20 | 0.17 | 0.31 | 0.14 | 0.12 | 0.09 | 0.75 | 0.28 | 0.21 | 0.16 | 0.81 | 0.76 | 0.71 | 0.01 |
| | DBCV | −0.55 | −0.59 | −0.69 | −0.56 | −0.18 | −0.68 | −0.69 | −0.73 | −0.06 | −0.17 | −0.24 | −0.2 | −0.1 | −0.12 | −0.14 | −0.63 |
| Dataset | Metric | VDENCLUE 0% | 5% | 10% | 20% | DPCSA 0% | 5% | 10% | 20% | DBSCAN 0% | 5% | 10% | 20% | Proposed 0% | 5% | 10% | 20% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IRIS | SI | 0.2 | 0.18 | 0.14 | 0.06 | 0.58 | 0.53 | 0.43 | 0.42 | 0.43 | 0.42 | 0.4 | 0.37 | 0.64 | 0.6 | 0.62 | 0.57 |
| | DBI | 1.06 | 1.14 | 1.25 | 1.36 | 0.53 | 0.4 | 0.4 | 0.4 | 3.14 | 2.8 | 2.43 | 1.75 | 0.69 | 0.81 | 0.71 | 0.86 |
| | ARI | 0.18 | 0.17 | 0.16 | 0.14 | 0.63 | 0.43 | 0.26 | 0.2 | 0.48 | 0.45 | 0.41 | 0.36 | 0.57 | 0.55 | 0.55 | 0.50 |
| | AMI | 0.27 | 0.24 | 0.22 | 0.19 | 0.76 | 0.52 | 0.26 | 0.2 | 0.61 | 0.55 | 0.5 | 0.46 | 0.74 | 0.71 | 0.69 | 0.62 |
| | DBCV | −0.61 | −0.79 | −0.81 | −0.85 | −0.53 | −0.83 | −0.82 | −0.88 | −0.26 | −0.31 | 0.22 | −0.02 | 0.15 | −0.2 | 0.25 | −0.29 |
| Heart Disease | SI | 0.09 | 0.03 | 0.02 | 0.03 | 0.15 | 0.21 | 0.22 | 0.19 | −0.16 | −0.17 | −0.17 | −0.17 | 0.33 | 0.30 | 0.31 | 0.39 |
| | DBI | 2.29 | 2.54 | 2.25 | 2.2 | 0.43 | 0.54 | 0.49 | 0.45 | 1.34 | 1.33 | 1.32 | 1.3 | 1.67 | 1.79 | 1.72 | 1.45 |
| | ARI | 0.1 | 0.05 | 0.06 | 0.02 | 0.03 | 0.05 | 0.05 | 0.04 | −0.05 | −0.05 | −0.05 | −0.05 | 0.13 | 0.13 | 0.14 | 0.17 |
| | AMI | 0.08 | 0.06 | 0.07 | 0.05 | 0.09 | 0.1 | 0.1 | 0.09 | 0.02 | 0.02 | 0.02 | 0.02 | 0.14 | 0.13 | 0.14 | 0.16 |
| | DBCV | −0.55 | −0.66 | −0.70 | −0.67 | −0.87 | −0.93 | −0.92 | −0.89 | −0.88 | −0.92 | −0.91 | −0.92 | 0.09 | 0.09 | 0.03 | 0.15 |
| Seeds | SI | 0.11 | 0.11 | 0.09 | 0.02 | 0.38 | 0.4 | 0.36 | 0.3 | 0.06 | 0.14 | 0.13 | 0.1 | 0.07 | 0.08 | 0.42 | 0.42 |
| | DBI | 1.38 | 1.51 | 1.56 | 1.76 | 0.8 | 0.42 | 0.44 | 0.45 | 2.1 | 2.26 | 2.27 | 2.3 | 1.37 | 1.59 | 1.00 | 1.35 |
| | ARI | 0.17 | 0.16 | 0.15 | 0.12 | 0.49 | 0.49 | 0.46 | 0.38 | 0.16 | 0.15 | 0.14 | 0.11 | 0.16 | 0.01 | −0.01 | −0.01 |
| | AMI | 0.22 | 0.2 | 0.18 | 0.15 | 0.54 | 0.53 | 0.45 | 0.34 | 0.23 | 0.22 | 0.2 | 0.19 | 0.20 | 0.01 | 0.01 | 0.01 |
| | DBCV | −0.47 | −0.68 | −0.64 | −0.76 | −0.58 | −0.88 | −0.94 | −0.94 | −0.34 | −0.36 | −0.41 | −0.43 | −0.04 | −0.24 | −0.33 | −0.4 |
| Wine | SI | 0.15 | 0.19 | 0.15 | 0.11 | 0.1 | 0.16 | 0.17 | 0.16 | −0.27 | −0.29 | −0.31 | −0.32 | 0.35 | 0.30 | 0.35 | 0.15 |
| | DBI | 1.59 | 1.67 | 1.82 | 1.83 | 0.87 | 0.55 | 0.44 | 0.39 | 1.48 | 1.56 | 1.68 | 1.8 | 1.64 | 1.68 | 1.65 | 1.81 |
| | ARI | 0.2 | 0.17 | 0.16 | 0.12 | 0.36 | 0.47 | 0.37 | 0.31 | 0.04 | 0.04 | 0.04 | 0.03 | 0.50 | 0.49 | 0.50 | −0.01 |
| | AMI | 0.24 | 0.19 | 0.19 | 0.15 | 0.44 | 0.49 | 0.41 | 0.33 | 0.17 | 0.17 | 0.16 | 0.15 | 0.61 | 0.60 | 0.61 | 0.02 |
| | DBCV | −0.50 | −0.69 | −0.57 | −0.75 | −0.17 | −0.81 | −0.76 | −0.85 | −0.55 | −0.68 | −0.71 | −0.71 | −0.19 | −0.19 | −0.19 | −0.2 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ajmal, O.; Arshad, H.; Arshed, M.A.; Ahmed, S.; Mumtaz, S. Robust Parameter Optimisation of Noise-Tolerant Clustering for DENCLUE Using Differential Evolution. Mathematics 2024, 12, 3367. https://doi.org/10.3390/math12213367