In this paper, we tested the clustering performance of the improved method using synthetic and real-world datasets and applied it to a clustering task on the Berkeley image segmentation dataset (BSDS500) [
34] to further validate the utility of the CSA-DBSCAN algorithm. We demonstrated the effectiveness of the new adaptive method for outlier assignment on datasets with more complex density structures. We compared the clustering results of the proposed algorithm with the classical k-means [
14], AP [
15], DBSCAN [
16], DPC [
10], DPCSA [
35], AmDPC [
36], and DeDPC [
37] algorithms. Among them, DPCSA, AmDPC, and DeDPC are among the most advanced clustering algorithms proposed in recent years. In this experiment, we compared and analyzed the normalized mutual information (NMI), adjusted Rand index (ARI), accuracy (ACC), and F-measure (FM) evaluation metric values of the above-mentioned algorithms to further evaluate their clustering ability. As shown in
Table 1, we used twelve datasets to measure the performance of the method: the first six are synthetic datasets, and the last six are real-world datasets. All datasets are described in detail below.
4.1. Evaluation of Clustering Effectiveness
The experimental environment was a computer with an Intel Pentium 2.9 GHz CPU, 8.00 GB of memory, and a 500 GB hard drive, running the Windows 10 operating system; all methods were implemented in MATLAB R2019a.
Before presenting the experimental results, we give a detailed description of the clustering performance evaluation metrics. We employed four widely used clustering performance metrics: NMI, ARI, ACC, and FM. The ARI takes values between −1 and 1, while the other three metrics take values between 0 and 1; in all cases, values nearer to 1 indicate better clustering performance.
NMI is a very valuable external evaluation metric for clustering. Suppose the true label of a dataset is $A$, and the clustering result label is $B$. Then, the unique values in $A$ are extracted to form the vector $E$, and the unique values in $B$ are extracted to form the vector $F$. The NMI of $A$ and $B$ is as follows:

$$\mathrm{NMI}(A,B)=\frac{\sum_{e\in E}\sum_{f\in F}p(e,f)\log\dfrac{p(e,f)}{p(e)\,p(f)}}{\sqrt{H(A)\,H(B)}}$$

where $p(e)$ indicates the probability of $e$ in $A$, $p(f)$ denotes the proportion of $f$ in $B$, and $p(e,f)$ indicates the joint probability of $e$ and $f$; $H(A)$ and $H(B)$ denote the entropies of $A$ and $B$.
The Rand index (RI) uses a paired approach to count true negatives (TN), true positives (TP), false negatives (FN), and false positives (FP). In clustering tasks, the RI tends to take high values, making it difficult to distinguish the effectiveness of different methods. Therefore, we used the ARI instead of the RI to evaluate algorithm performance.
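For reference, the ARI can be written in its standard contingency-table form, where $n_{ij}$ is the number of samples shared by true class $i$ and predicted cluster $j$, and $a_i$ and $b_j$ are the corresponding row and column sums:

$$\mathrm{ARI}=\frac{\sum_{ij}\binom{n_{ij}}{2}-\left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}{\frac{1}{2}\left[\sum_i\binom{a_i}{2}+\sum_j\binom{b_j}{2}\right]-\left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}$$

This is the RI corrected for chance, which is why it discriminates between methods better than the raw RI.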
ACC is a classical evaluation index often used to measure clustering accuracy. Assuming that $c_i$ and $y_i$ represent the cluster label and the real label, respectively, of the $i$-th sample in a dataset of $n$ samples, ACC can be defined as:

$$\mathrm{ACC}=\frac{1}{n}\sum_{i=1}^{n}\delta\big(y_i,\,map(c_i)\big)$$

where $map(\cdot)$ is a permutation mapping function that matches the cluster labels to the real labels, and $\delta(x,y)=1$ if $x=y$ and $0$ otherwise.
Assume that the real label of the dataset is $A$ and the label of the clustering is $B$. Let $a$ denote the quantity of sample pairs that belong to the same cluster in both $A$ and $B$, let $b$ denote the quantity of sample pairs that belong to the same cluster in $A$ but not in $B$, and let $c$ denote the quantity of sample pairs that belong to different clusters in $A$ but to the same cluster in $B$. FM is then calculated as follows:

$$\mathrm{FM}=\sqrt{\frac{a}{a+b}\cdot\frac{a}{a+c}}$$
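As a concrete illustration (not part of the original experiments, which were run in MATLAB), the sketch below computes NMI, ARI, and FM with scikit-learn and implements ACC with the permutation mapping defined above via the Hungarian algorithm; the label vectors are toy placeholders.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import (adjusted_rand_score,
                             fowlkes_mallows_score,
                             normalized_mutual_info_score)

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one mapping of cluster labels to true labels,
    found with the Hungarian algorithm (the map(.) function above)."""
    true_ids, pred_ids = np.unique(y_true), np.unique(y_pred)
    # contingency matrix: rows = predicted clusters, columns = true classes
    w = np.zeros((pred_ids.size, true_ids.size), dtype=np.int64)
    for i, p in enumerate(pred_ids):
        for j, t in enumerate(true_ids):
            w[i, j] = np.sum((y_pred == p) & (y_true == t))
    rows, cols = linear_sum_assignment(-w)  # maximize matched samples
    return w[rows, cols].sum() / y_true.size

# toy labels standing in for a real dataset
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2])
y_pred = np.array([1, 1, 1, 0, 0, 2, 2, 2])

print("NMI:", normalized_mutual_info_score(y_true, y_pred))
print("ARI:", adjusted_rand_score(y_true, y_pred))
print("FM :", fowlkes_mallows_score(y_true, y_pred))
print("ACC:", clustering_accuracy(y_true, y_pred))
```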
4.2. Performance Evaluation on Synthetic Datasets
We validated the clustering performance of the CSA-DBSCAN algorithm on synthetic datasets. The external evaluation index ARI was chosen as the fitness function for this experiment. We selected the Donutcurves, Target, Pearl, Complex8, and Complex9 datasets, all of which have complex density structures, to observe the ability of the CSA-DBSCAN method to handle complex data.
In the experiments, we kept the other parameters of the adaptive search constant and set the number of iterations to 100 to verify the adaptive effect of the improved method. The Twocirclesnoise and Target datasets used in this experiment treat anomalies as a separate class, which allows further testing of the CSA-DBSCAN algorithm's anomaly assignment performance. In
Table 2, we give the evaluation metric values for the clustering of the eight state-of-the-art algorithms for the six synthetic datasets. The evaluation metric values for the different clustering methods correspond to the clustering results in
Figure 2,
Figure 3,
Figure 4,
Figure 5,
Figure 6 and
Figure 7. For this experiment, we report the best clustering results obtained for each algorithm; we therefore analyze the clustering accuracy and effectiveness of the algorithms by combining the tabular and visual clustering results.
In
Table 2, Par indicates the parameter settings of the method on different datasets. The visual clustering results of the method in
Table 2 correspond to
Figure 2,
Figure 3,
Figure 4,
Figure 5,
Figure 6 and
Figure 7. In
Table 2, the parameter of the k-means method is the quantity of clusters (
k), the parameter of the AP method is the value of the matrix diagonal (
preference), the parameters of the DBSCAN method are the quantities of neighborhoods (
MinPts) and the domain radius (
Eps), and the parameter of the DPC method is the average number of neighbors, represented as a percentage (
p). The DPCSA method is a parameterless clustering algorithm; the AmDPC model's parameters are the number of neighbors, two peak split lines, and the average number of neighbors, represented as a percentage (
p). The parameters of the DeDPC algorithm are the number of data point neighbors (
N) and the average number of neighbors, represented as a percentage (
p). The CSA-DBSCAN method can find the best
Eps parameter adaptively, so the parameter of this method is
MinPts. For the synthetic datasets, we uniformly set the search range of the CSA-DBSCAN method's Eps parameter to [0, 20]. Since
MinPts is a positive integer no greater than 15, the CSA-DBSCAN algorithm nicely overcomes the drawback inherent to parameter setting in the DBSCAN approach.
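The paper's actual optimizer is the chameleon swarm algorithm; as a minimal stand-in for that search loop, the sketch below uses a simple random search over the Eps range with ARI as the fitness function, assuming scikit-learn's DBSCAN and a toy dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

def fitness(eps, min_pts, X, y_true):
    """ARI of DBSCAN at a candidate Eps, used as the fitness value."""
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
    return adjusted_rand_score(y_true, labels)

def search_eps(X, y_true, min_pts=4, bounds=(1e-3, 20.0),
               n_iter=100, seed=0):
    """Simplified stand-in for the chameleon swarm search: sample
    candidate Eps values in the search range and keep the fittest."""
    rng = np.random.default_rng(seed)
    best_eps, best_fit = None, -np.inf
    for _ in range(n_iter):
        eps = rng.uniform(*bounds)
        f = fitness(eps, min_pts, X, y_true)
        if f > best_fit:
            best_eps, best_fit = eps, f
    return best_eps, best_fit

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)
eps, ari = search_eps(X, y)
print(f"best Eps = {eps:.3f}, ARI = {ari:.3f}")
```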
In
Table 2, we have marked the best clustering evaluation metric values for the different methods in bold. As can be seen in
Table 2, the CSA-DBSCAN method obtained the best cluster evaluation metric values on the six synthetic datasets.
All the datasets in Table 2 have complex structures and uneven density distributions. It can be seen from the table that the k-means, AP, DPC, and DPCSA algorithms do not work well on this type of dataset, and the AmDPC and DeDPC algorithms do not achieve the best metric values for some of the datasets, though the values they achieve are still high. Most of the DBSCAN algorithm's clustering metric values are very close to 1 but do not reach it, mainly because the algorithm incorrectly identifies boundary points as anomalies during clustering. The outlier assignment mechanism of the CSA-DBSCAN algorithm solves this problem of incorrectly identified outliers very well. The algorithm can adaptively find the best parameters and cluster data of complex structure with good clustering accuracy and performance.
To better analyze the clustering performance, in
Figure 2,
Figure 3,
Figure 4,
Figure 5,
Figure 6 and
Figure 7 and
Table 2, all the algorithms correspond to the best clustering results. As shown in
Figure 2, the Donutcurves dataset is a complex dataset of ring and semi-ring structures. We can see that the k-means, AP, DPC, DPCSA, AmDPC, and DeDPC methods do not accurately cluster the Donutcurves dataset. The DBSCAN method clustered the data accurately but identified the points of one category as noise, resulting in poor clustering accuracy. The CSA-DBSCAN approach obtained better clustering results on this complex structured dataset. In
Figure 3, the Target dataset is a closed-loop dataset with an inhomogeneous density structure. We find that both the DBSCAN and CSA-DBSCAN algorithms can handle this type of dataset well. However, neither the AP nor the k-means algorithm can handle this type of dataset well. In
Figure 4, we can see that neither the k-means nor the AP algorithm handles the torus data and the central category points well. The DBSCAN approach misidentifies both the central category points and the points outside the torus as noise. The CSA-DBSCAN method not only identifies the central category points but also assigns the points next to the torus to the corresponding clusters. Compared with the other algorithms, the CSA-DBSCAN method thus has good clustering accuracy and efficiency for closed-loop datasets with irregular density.
In
Figure 5, Pearl is a semi-annular multi-density dataset. We can see that the k-means, AP, DPC, and DeDPC algorithms are not able to handle the half-loop data, and although the DBSCAN algorithm handles this dataset well, it incorrectly identifies the boundary points as noisy points. The DPCSA, AmDPC, and CSA-DBSCAN algorithms obtain the best clustering results. In
Figure 6, Complex8 is a dataset with a complex structure and an uneven density distribution, so handling it is a particularly revealing test of a method's clustering ability. In
Figure 6, we can see that the k-means, AP, DPC, DPCSA, AmDPC, and DeDPC algorithms produce poor clustering results and cannot identify complete classes in this dataset. Encouragingly, the CSA-DBSCAN method obtains the optimal clustering results, outperforming the DBSCAN algorithm in accuracy. In
Figure 7, the Complex9 dataset is a complex dataset with annular, semi-annular, and long-arc structures. We can see that the k-means, AP, DPC, DPCSA, AmDPC, and DeDPC methods cannot handle this class of data, while the DBSCAN and CSA-DBSCAN methods handle it well, with the CSA-DBSCAN algorithm achieving better clustering accuracy and performance.
To further verify the utility of the proposed method, we also tested the effectiveness of the algorithm on real-world datasets.
4.3. Performance Evaluation on Real-World Datasets
In this subsection, we describe the clustering properties of different algorithms on real-world datasets. In
Table 3, we give the values of the clustering evaluation metrics for the different methods on six high-dimensional datasets. The parameter types of the algorithms tested on the real-world datasets are the same as those in
Table 2. To facilitate comparison and analysis of algorithm performance, the table gives the best clustering evaluation metric values obtained after repeated training and testing. Since real-world datasets have characteristics such as high dimensionality and nonlinearity, we uniformly set the search range of the CSA-DBSCAN method to [0, 20]. The proposed method searches over a single parameter, so the search dimension is set to 1.
From
Table 3, we can see that, on the Seeds dataset, the CSA-DBSCAN algorithm has higher values for all metrics than the other state-of-the-art clustering algorithms. On the Thyroid dataset, the NMI, ARI, and FM metric values of the CSA-DBSCAN and DBSCAN methods are higher than those of the other algorithms, while their ACC values are slightly lower than that of the k-means method. On the Ionosphere dataset, the NMI, ARI, and FM metrics of the CSA-DBSCAN algorithm are higher than those of the other algorithms, and its ACC metric is slightly lower than that of the AmDPC method. On the Glass dataset, the DeDPC and AmDPC algorithms each have only one metric value higher than those of the other algorithms. On the Vehicle dataset, the ACC and FM metrics of the DeDPC algorithm are the highest, while the NMI and ARI metrics of the CSA-DBSCAN algorithm are the highest; overall, the two algorithms perform similarly on this dataset. On the Iris dataset, the DPCSA algorithm achieves the best clustering evaluation metric values; the CSA-DBSCAN and AmDPC algorithms have identical values that are slightly lower than those of the DPCSA algorithm but higher than those of the other clustering algorithms. Additionally, the CSA-DBSCAN algorithm only needs one input parameter to obtain the best clustering result adaptively, so its parameter setting is simple and efficient.
From
Table 3, we can find that the clustering metric values of the DBSCAN algorithm differ from those of the CSA-DBSCAN algorithm. The main reason is that the data distribution in high-dimensional datasets is discrete, and the DBSCAN method sometimes incorrectly identifies some boundary points as anomalies. The CSA-DBSCAN method can redistribute these identified noisy points, which gives it better clustering accuracy and effectiveness on real-world datasets. In summary, the CSA-DBSCAN algorithm can handle datasets with complex structures and inhomogeneous densities.
The CSA-DBSCAN algorithm also has unique advantages in practical applications. For complex image data, such as in image segmentation and face recognition, the CSA-DBSCAN method can further explore the potential relationship between boundary points and anomalies, providing more useful reference information for image processing.
4.4. Performance Test of Outlier Processing after Fusion Deviation Theory
In this subsection, we further explore the impact on clustering performance of introducing deviation theory to measure the compactness between data points. We select three representative synthetic datasets and two real-world datasets from
Table 1 to fully measure the impact of deviation theory on the performance of outlier assignments. The nearest neighbor assignment mechanism of the CSA-DBSCAN algorithm is an extension of the KNN algorithm [
33]. Before the introduction of deviation theory, the CSA-DBSCAN algorithm relied mainly on the KNN algorithm to assign anomalies, so this assignment mechanism is influenced by the parameter k. We therefore further discuss the effects of introducing deviation theory and of varying the parameter k on the clustering performance. We keep the other parameters of the CSA-DBSCAN algorithm in Table 2 and Table 3 constant and observe the effect of introducing deviation theory and of different k values on the clustering results.
In
Figure 8,
Figure 9 and
Figure 10, we present the results of the anomaly assignment for the three synthetic datasets Twocirclesnoise, Complex8, and Complex9.
Figure 8a shows the results of the CSA-DBSCAN algorithm after iteration to obtain the optimal parameters, where the white points are the identified anomalies.
Figure 8b,c shows the results of anomaly assignment for different parameter values
k using the KNN method. After introducing deviation theory to measure the compactness between data points, we directly assign each anomaly to the cluster of its nearest category point according to the compactness relationship between the anomalies and the other data points (see
Figure 8d).
Figure 9a and
Figure 10a represent the anomaly identification results after iterations of the Complex8 and Complex9 datasets, respectively. In
Figure 9b,c, we can see that some anomalies are assigned to other categories. In
Figure 9d, we can see that the anomalies are well assigned to the corresponding categories. In
Figure 10b, we can see that one outlier, indicated by the red arrow, is incorrectly assigned to another class. Both
Figure 10c and
Figure 10d display the accurate assignment of the outliers.
In
Table 4, we give the values of the clustering evaluation metrics for the Twocirclesnoise, Complex8, Complex9, Seeds, and Vehicle datasets under different allocation strategies, respectively.
Figure 8,
Figure 9 and
Figure 10 show the visual clustering results in
Table 4. Seeds and Vehicle are real-world datasets. For easier observation, we bold the clustering results after introducing the deviation theory. From the table, we can see that different
k values affect the clustering performance, and anomaly assignment using deviation theory significantly improves it. For the Seeds dataset, the clustering index values after introducing deviation theory are significantly higher than those of the KNN assignment method. For the Vehicle dataset, the deviation-theory assignment method always outperforms the other assignment settings on the NMI and ARI metrics, while the variation in the ACC and FM values is not significant. Therefore, overall, the deviation-theory assignment method also performs better on the Vehicle dataset.
When the KNN algorithm is used to assign outliers directly, different values of the parameter k have a large impact on the clustering results. Deviation theory measures the compactness between data points by processing the distance properties of the data points, and we assign each anomaly directly to its nearest boundary point based on the compactness relationship between anomalies and category points. The assignment method based on deviation theory effectively removes the influence of the parameter k on the clustering performance and enables the CSA-DBSCAN algorithm to obtain more accurate clustering results. Moreover, it requires no input parameters to complete the effective assignment of outliers, which greatly improves the effectiveness and stability of the CSA-DBSCAN algorithm's outlier allocation.
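A minimal sketch of the two assignment strategies discussed above, using scikit-learn's NearestNeighbors: the KNN variant depends on k, while the parameter-free variant assigns each noise point to the label of its single nearest category point, standing in for the deviation-theory compactness rule (whose exact form is defined earlier in the paper).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def assign_outliers_knn(X, labels, k=3):
    """KNN assignment: each noise point (label -1) takes the majority
    label among its k nearest non-noise neighbors; depends on k."""
    out = labels.copy()
    core = labels != -1
    nn = NearestNeighbors(n_neighbors=k).fit(X[core])
    core_labels = labels[core]
    noise_idx = np.where(~core)[0]
    if noise_idx.size:
        _, neigh = nn.kneighbors(X[noise_idx])
        for i, nb in zip(noise_idx, neigh):
            vals, counts = np.unique(core_labels[nb], return_counts=True)
            out[i] = vals[np.argmax(counts)]
    return out

def assign_outliers_nearest(X, labels):
    """Parameter-free assignment: each noise point takes the label of
    its single nearest category point (the k = 1 limit), a rough
    stand-in for the deviation-theory compactness rule."""
    return assign_outliers_knn(X, labels, k=1)
```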
4.5. Comparative Analysis of Algorithm Running Time
In this subsection, we compare and examine the algorithms’ running times in further detail. Before analyzing the algorithms’ running time, we provide the time complexity of the various methods (see
Table 5). In
Table 5,
n represents the number of data points in the dataset,
I represents the number of iterations, and
K represents the number of clusters determined by the k-means method. According to
Table 5, the DBSCAN algorithm has the lowest time complexity, followed by the k-means algorithm. The time complexities of the DPC, DPCSA, AmDPC, and DeDPC algorithms are similar and reasonable. Both the AP and CSA-DBSCAN algorithms are affected by the number of iterations during operation, so they have higher time complexities. Theoretically, the time complexity of the AP algorithm is higher than that of the CSA-DBSCAN algorithm. However, an analysis of the runtime flow shows that the AP algorithm iterates only locally, while the CSA-DBSCAN algorithm iterates globally. In this case, it is difficult to determine in advance which of the AP and CSA-DBSCAN algorithms consumes more time.
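Such questions can be settled empirically; a minimal timing harness (with a synthetic stand-in dataset, not the paper's benchmark setup) might look as follows.

```python
import time
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

def median_fit_time(model, X, repeats=5):
    """Median wall-clock fit time over several repeats."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        model.fit(X)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

X, _ = make_blobs(n_samples=2000, centers=8, random_state=0)
print("DBSCAN :", median_fit_time(DBSCAN(eps=0.5, min_samples=5), X))
print("k-means:", median_fit_time(KMeans(n_clusters=8, n_init=10), X))
```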
To further appreciate the time required to execute the various methods, we provide detailed running times for the 12 datasets in
Table 2 and
Table 3 (see
Table 6). Each dataset in
Table 6 corresponds to a different ordinal number.
Table 6’s runtime parameter settings are identical to those in
Table 2 and
Table 3. As seen in
Table 6, the DBSCAN algorithm has the shortest total runtime, while the k-means technique has the second shortest. The CSA-DBSCAN algorithm has the longest runtime of all the algorithms on the 12 datasets. In particular, its runtimes on the Complex8 and Complex9 datasets are 55.3762 and 86.5821 s, respectively, far exceeding those of the other algorithms; these two datasets have particularly large data volumes, so the runtime increases significantly. The CSA-DBSCAN algorithm is a fusion of the CSA optimizer and the DBSCAN algorithm, and its overall running process is affected by the number of iterations, so it spends more time than the other algorithms.
Table 6 includes a running time comparison (see
Figure 11) to help visualize the time cost of the various algorithms. The CSA-DBSCAN, AP, and AmDPC algorithms have the longest overall runtimes, all greater than one second, as can be seen in
Figure 11. We provide line comparison graphs of the CSA-DBSCAN, AP, and AmDPC algorithms to further observe the running time trends on different datasets (see
Figure 11). In
Figure 11, the horizontal coordinates represent the ordinal numbers corresponding to the different datasets in
Table 6. Because the running times of the different algorithms differ widely, we rescaled the vertical axis so that the time costs of all the algorithms can be compared and analyzed more intuitively. In
Figure 11, we represent the algorithms with lower running times as bar charts. As can be seen in
Figure 11, the runtime of the line graph is overall greater than one second, and the runtime of the bar graph is overall less than one second.
Figure 11 shows that the CSA-DBSCAN algorithm has the longest runtime, followed by the AP algorithm. The AmDPC algorithm has a shorter runtime than the CSA-DBSCAN and AP algorithms but a longer runtime than the others. The DBSCAN method has the shortest running time, followed by the k-means algorithm, while the DPC, DPCSA, and DeDPC algorithms have times that are comparable. The first six datasets in
Figure 11 are synthetic, while the latter six are real-world datasets. From the description of the data attributes in
Table 1, we can see that the amount of data in the synthetic datasets is significantly higher than that in the real-world datasets, and the running times of all eight algorithms are correspondingly higher on the synthetic datasets. Among the real-world datasets, Ionosphere and Glass are particularly high-dimensional, but for all eight state-of-the-art algorithms, their runtime consumption is still lower than that of the synthetic datasets.
Therefore, we may conclude that changing the dimensionality has little effect on the running time consumption of the algorithms. The amount of data in different datasets has a significant impact on an algorithm’s running time. Because the CSA-DBSCAN algorithm is a systematic fusion of CSA optimization and DBSCAN algorithm in an iterative process, its running time consumption is significant when compared to other algorithms.
4.6. Robustness Analysis of the CSA-DBSCAN Algorithm
In this subsection, we tested and analyzed the CSA-DBSCAN algorithm’s robustness. The CSA-DBSCAN algorithm is an adaptive method that fuses swarm intelligence optimization and a density clustering method. The method is divided into two stages: chameleon swarm optimization and density clustering. The process of optimizing the parameters of the DBSCAN algorithm using the chameleon swarm method is primarily influenced by two parameters: the number of iterations I and the number of populations n. We uniformly set these two parameters to fixed values of 60 and 10, respectively, before running the optimization algorithm. The DBSCAN algorithm is mainly influenced by the MinPts parameter. This parameter is usually set to a positive integer between 1 and 15. Therefore, we further verified the robustness of the CSA-DBSCAN method by analyzing the effect of different parameters on the clustering performance.
We separately tested the effects of the three CSA-DBSCAN algorithm parameters
I,
n, and
MinPts on the clustering results and investigated their robustness. We began with the number of iterations
I, which influences the CSA-DBSCAN algorithm’s parameter search quality. We set different values of
I for testing:
I ∈ [50, 500] (
I is a positive integer), and then performed tests on three synthetic datasets and three real-world datasets, whose clustering results are shown in
Table 7. The values in
Table 7 are the means and standard deviations of the CSA-DBSCAN algorithm’s 10 experimental results for parameter
I ranging from 50 to 500. We highlighted in bold the better means and standard deviations of the evaluation metric results. To ensure experimental reliability, we kept the population size
n and
MinPts constant to test the effect of the CSA-DBSCAN algorithm parameter
I on clustering performance.
As expressed in
Table 7, the mean values of the evaluation metrics for the Twocirclesnoise, Pearl, Complex8, and Glass datasets are high, with standard deviations of 0, which suggests that the number of iterations does not affect the clustering outcomes on these datasets. The mean values of the evaluation metrics for the Vehicle and Iris datasets are comparable to those in
Table 3, with low standard deviations. Therefore, the clustering performance of the CSA-DBSCAN algorithm on the Vehicle and Iris datasets is sound. Generally, the clustering performance of the CSA-DBSCAN algorithm is robust to changes in the number of iterations I. The main reason for the variation in the clustering results on the Vehicle and Iris datasets is that the optimization process has not yet converged at 50 iterations; in practice, we only need to set the number of iterations to be greater than 50 to achieve optimal optimization.
In
Table 8, we give a range of values for the number of chameleon populations,
n. For the experiments, we set
n to be between 5 and 55 and divided this range into ten equal parts.
Table 8 shows the mean and standard deviation of the clustering index values for ten experiments with the number of iterations
I and
MinPts held constant. The better values are highlighted in bold. From
Table 8, we can see that the mean values of the indicators for the six datasets are the same as those in
Table 2 and
Table 3 and have zero standard deviation. Thus, it can be seen that the changes in the number of chameleon populations
n and the number of iterations
I hardly affect the clustering results; the clustering process is therefore robust to both of these parameters.
As shown in
Table 9, we maintained the number of iterations
I and the number of populations
n of the optimization process as constant values to see how the DBSCAN algorithm parameter
MinPts affects the clustering results. We set
MinPts to the ten integer values from 1 to 10. By computing the mean and standard deviation of the clustering metrics for the six datasets at these 10 different
MinPts values, we were able to determine the sensitivity of the DBSCAN and CSA-DBSCAN algorithms to parameter changes. For easy observation, we bolded the better means and standard deviations in
Table 9. From
Table 9, we can see that the mean values of the CSA-DBSCAN algorithm are always higher than the DBSCAN algorithm for the six datasets. For the Twocirclesnoise, Pearl, and Complex8 datasets, the standard deviations of the CSA-DBSCAN algorithm are always lower than those of the DBSCAN method. This indicates that the CSA-DBSCAN algorithm is more robust with the variation of
MinPts in these three datasets. On the Glass, Vehicle, and Iris datasets, the standard deviations of a few measures for the DBSCAN algorithm are marginally lower than those for the CSA-DBSCAN algorithm but are otherwise extremely close, while the CSA-DBSCAN algorithm's clustering results remain better overall. This shows that when the
MinPts values are changed, the CSA-DBSCAN algorithm is more robust than the DBSCAN method. In summary, the CSA-DBSCAN algorithm considerably enhances the DBSCAN algorithm’s soundness.
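A minimal sketch of the Table 9 protocol, assuming a toy dataset: sweep MinPts from 1 to 10 and compare plain DBSCAN at a fixed Eps against a DBSCAN whose Eps is re-optimized on a grid for each setting, a rough stand-in for CSA-DBSCAN's adaptive Eps search.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

def ari(eps, minpts):
    return adjusted_rand_score(
        y, DBSCAN(eps=eps, min_samples=minpts).fit_predict(X))

minpts_range = range(1, 11)           # the ten MinPts settings of Table 9
eps_grid = np.linspace(0.05, 0.5, 10)

# plain DBSCAN: one fixed Eps across all MinPts settings
fixed = [ari(0.2, m) for m in minpts_range]
# adaptive stand-in: re-optimize Eps on a grid for each MinPts setting
adaptive = [max(ari(e, m) for e in eps_grid) for m in minpts_range]

print(f"DBSCAN   mean={np.mean(fixed):.3f}  std={np.std(fixed):.3f}")
print(f"adaptive mean={np.mean(adaptive):.3f}  std={np.std(adaptive):.3f}")
```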
To examine the significance of the differences in performance between the CSA-DBSCAN and DBSCAN algorithms with different
MinPts settings in
Table 9, we conduct statistical tests, as shown in
Table 10. In
Table 10, we use
p-values to determine whether there is a statistical difference in algorithm performance based on the NMI, ARI, ACC, and FM evaluation criteria. As shown in
Table 10, the
p-values for the Pearl, Glass, and Iris datasets are less than 0.05 for most measures, and in some cases less than 0.01, demonstrating that the performance difference between the CSA-DBSCAN and DBSCAN algorithms is statistically significant and, in some cases, highly significant. Only the
p-value for the NMI metric in the Twocirclesnoise and Complex8 datasets was less than 0.05, indicating a statistically significant difference between the NMI values in these two datasets. In the Vehicle dataset, the
p-values for the ARI and FM measures are less than 0.01, showing a highly significant difference between these two metrics. In
Table 10, we highlight the values of the indicators with significant differences. Combining the mean and standard deviation of the evaluated metric values in
Table 9, we can see that the CSA-DBSCAN algorithm outperforms the DBSCAN method in general. Therefore, for the
MinPts parameter, the CSA-DBSCAN algorithm is overall more robust than DBSCAN and has a statistically significant difference.
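The paper does not name the specific statistical test used for Table 10; as an illustration, a paired test such as the paired t-test or the Wilcoxon signed-rank test over the matched MinPts settings could produce p-values of this kind. The scores below are hypothetical placeholders.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

# hypothetical per-MinPts ARI scores for the ten matched settings
ari_csa    = np.array([0.98, 0.97, 0.99, 0.98, 0.97,
                       0.99, 0.98, 0.98, 0.97, 0.99])
ari_dbscan = np.array([0.91, 0.89, 0.93, 0.92, 0.88,
                       0.94, 0.90, 0.91, 0.89, 0.92])

t_stat, t_p = ttest_rel(ari_csa, ari_dbscan)   # paired t-test
w_stat, w_p = wilcoxon(ari_csa, ari_dbscan)    # Wilcoxon signed-rank test
print(f"paired t-test p = {t_p:.4f}, Wilcoxon p = {w_p:.4f}")
# p < 0.05 marks a significant difference, p < 0.01 a highly
# significant one, matching the thresholds used in Table 10
```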
In conclusion, as the number of iterations I, the number of populations n, and the value of MinPts are modified, the CSA-DBSCAN algorithm can still produce superior clustering results. The clustering performance of the CSA-DBSCAN algorithm has a certain robustness and effectiveness.
4.7. Color Image Segmentation Clustering Applications
In this subsection, we selected four real images from the Berkeley segmentation dataset (BSDS500) to test the image segmentation performance of the CSA-DBSCAN algorithm. In this experiment, the image segmentation evaluation metric
was chosen as the fitness function to adaptively search for the optimal parameter values. In
Figure 12, the original image is shown on the leftmost side, and the image segmentation results of eight state-of-the-art clustering methods, k-means, AP, DBSCAN, DPC, DPCSA, AmDPC, DeDPC, and CSA-DBSCAN, are shown from left to right. To reduce the computational effort, the efficient SLIC superpixel algorithm was used in this paper to uniformly pre-segment each image into multiple superpixels before clustering. The resulting superpixel information is then fed into the different algorithms mentioned above for testing.
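A minimal sketch of this pre-segmentation pipeline, assuming scikit-image's SLIC and a placeholder image file; the final clustering step uses plain DBSCAN on mean-color superpixel features as a stand-in for the full CSA-DBSCAN procedure.

```python
import numpy as np
from skimage import io
from skimage.segmentation import slic
from sklearn.cluster import DBSCAN

# "bsds_example.jpg" is a hypothetical stand-in for a BSDS500 image
image = io.imread("bsds_example.jpg")

# SLIC pre-segmentation into ~400 superpixels (start_label=0 for indexing)
segments = slic(image, n_segments=400, compactness=10, start_label=0)

# one feature vector per superpixel: its mean RGB colour
n_sp = segments.max() + 1
features = np.array([image[segments == s].mean(axis=0) for s in range(n_sp)])

# cluster the superpixel features; every pixel inherits its superpixel's label
sp_labels = DBSCAN(eps=10.0, min_samples=3).fit_predict(features)
pixel_labels = sp_labels[segments]
```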
To evaluate the effectiveness of the CSA-DBSCAN algorithm for practical applications in image segmentation, we give the boundary displacement error (BDE) [
32] and probabilistic Rand index (PRI) metrics [
38] for the different image segmentation methods. PRI is calculated as the ratio of the number of pixels whose algorithmic segmentation labels agree with the multiple manual segmentation labels to the total number of pixels; BDE measures the average displacement error of the boundary pixels between the two segmentation results. In
Table 11, we give the segmentation metric values for the four real images in
Figure 12, in order from top to bottom. The “↓” indicates that a smaller BDE value means better image segmentation; the “↑” indicates that a larger PRI value means a better segmentation effect, with PRI values in the range (0, 1). The evaluation index values in
Table 11 and
Table 12 correspond to the clustering results in
Figure 12. In the table, we bolded the best indicator values for a clearer view of the data.
From
Table 11 and
Table 12 and
Figure 12, we can see that, for the first three images (counting from top to bottom), the k-means, AP, DPC, DPCSA, AmDPC, and DeDPC algorithms do not segment the images as well as the DBSCAN and CSA-DBSCAN algorithms. Both the DBSCAN and CSA-DBSCAN methods segment the images well and accurately describe the overall image contours. For the last image, the AmDPC algorithm reaches the highest PRI value, with the CSA-DBSCAN algorithm's PRI value very close behind; however, the BDE value of the CSA-DBSCAN algorithm is much better than that of the AmDPC algorithm. Overall, the CSA-DBSCAN algorithm outperforms the AmDPC algorithm in segmenting the fourth image. As can be seen in
Table 11 and
Table 12, the BDE and PRI metric values of the DBSCAN method are slightly worse than those of the CSA-DBSCAN method. This indicates that the CSA-DBSCAN method has better image segmentation accuracy than the DBSCAN algorithm.
Therefore, compared with other mainstream clustering algorithms, the CSA-DBSCAN algorithm has better clustering accuracy and effectiveness in the application of image segmentation. In summary, the CSA-DBSCAN algorithm has a certain effectiveness and practicality in adaptive search parameters and image segmentation.