1. Introduction
Fuzzy C-Means (FCM) is a soft clustering technique [1] that extends the traditional K-means (a hard clustering method) to accommodate data that may belong to multiple clusters simultaneously [2]. Unlike hard clustering methods, which assign each datum exclusively to a single cluster [3], FCM assigns membership values to each datum for every cluster, indicating the degree to which the datum belongs to each cluster [4]. This membership value is represented by a fuzzy membership function, which allows for more nuanced and flexible clustering [5].
In FCM, the algorithm iteratively updates prototypes (cluster centroids) and membership values until convergence is reached [6]. The objective function of the FCM algorithm minimizes the total fuzzy membership-weighted squared distances between the data and the prototypes [7]. This iterative process ensures that each datum is assigned to the prototype for which it has the highest membership value. FCM finds applications in various fields, particularly in pattern recognition and classification problems [8]. In classification tasks, FCM can be utilized as a soft clustering technique to assign membership probabilities to data across multiple classes [9]. By considering the fuzzy memberships, FCM provides a more nuanced understanding of the data distribution, allowing for better classification accuracy, especially in scenarios where data may belong to multiple classes simultaneously or exhibit overlapping boundaries between classes [10].
The FCM algorithm has many advantages. FCM is less sensitive to noise and outliers than hard clustering methods like K-means, since it assigns membership values rather than definitive cluster assignments [11]. FCM allows for flexible cluster boundaries by accommodating data that belong to multiple clusters to varying degrees, making it suitable for datasets with complex structures and overlapping clusters [12]. The fuzzy membership values provided by FCM offer a more interpretable representation of the data's relationships with clusters, allowing for a more nuanced analysis of the data distribution [13]. FCM can handle datasets of varying shapes, densities, and sizes, making it applicable to a wide range of classification tasks across different domains. Overall, FCM offers a powerful approach to classification problems by incorporating fuzzy logic to provide a more flexible and robust clustering solution [14].
Over the last few decades, theoretical research and application efforts surrounding FCM have made great progress. In [15], the authors present a significant advancement in the field of image segmentation by introducing an innovative adaptive FCM method. Using a dynamic cluster-center adjustment mechanism based on data density, the algorithm effectively addresses the limitations of traditional FCM in handling diverse and complex image datasets. The experimental results validate the superiority of the algorithm in terms of segmentation accuracy, computational efficiency, and robustness. By incorporating an adaptive-weighted Euclidean distance measure, the algorithm in [16] addresses the limitations of traditional FCM and improves its accuracy and robustness in clustering tasks. The experimental results validate the effectiveness of the proposed enhancement and highlight its potential for various applications requiring accurate and robust clustering techniques, including data analysis, pattern recognition, and image segmentation.
By leveraging the local data relationships captured through the KNN graph, the FCM-KNN algorithm [17] addresses the limitations of traditional FCM and improves its accuracy and robustness in clustering tasks. The experimental results validate the performance of the proposed hybrid approach and highlight its potential for various applications requiring accurate and robust clustering techniques. In [18], a novel adaptation of the FCM clustering algorithm that incorporates density peaks is introduced to improve clustering performance. The authors provide a detailed description of the adaptive FCM algorithm based on density peaks. They explain how the algorithm identifies density peaks and incorporates them into the fuzzy membership and cluster center update steps. They outline the iterative process of the algorithm and discuss its computational complexity. The study significantly advances the field of clustering algorithms and has implications for various applications requiring accurate and robust clustering techniques.
By adjusting prototypes to better represent the underlying data distribution, an enhanced fuzzy clustering method is developed in [19]. The study explains how refinements to prototypes are incorporated into the iterative process of FCM, highlighting the adjustments made to better align prototypes with the data. The authors compare it with traditional FCM and other state-of-the-art classification and reconstruction methods using benchmark datasets. Evaluation metrics such as classification accuracy, reconstruction errors, and computational efficiency are considered. However, the limitation of this scheme is that it cannot adaptively obtain the upper and lower bounds of the adjusted prototypes, which increases the number of parameters that must be designed for the algorithm. In [20], the authors provide a detailed description of a data-weighting algorithm and integrate it into the fuzzy clustering process. They explain how the algorithm adjusts prototype optimization based on the weighted contributions of the data, emphasizing its adaptability to different datasets. By assigning appropriate weights to the data, the algorithm improves the accuracy of prototype optimization and enhances the classification performance of fuzzy clustering algorithms. The experimental results validate the effectiveness of the proposed approach and highlight its potential for various applications requiring accurate and robust clustering techniques. In summary, adjusting the prototypes is one of the most direct and effective ways to improve the classification performance of FCM.
The aforementioned studies all rely on the theory of fuzzy sets to attain algorithmic flexibility. Fuzzy-set-based granular computing is a method that combines fuzzy set theory and granular computing theory to address classification problems. In classification algorithms, it offers a more flexible and robust approach, particularly in handling data with high uncertainty and fuzziness. Furthermore, the existing research on fuzzy clustering typically generates prototypes as discrete points, which effectively capture the global characteristics of the data but struggle to represent local characteristics. This limitation poses challenges for practical applications. For instance, when designing a classifier, it is difficult to achieve good classification performance through parameter adjustment alone. Therefore, effectively extending the prototypes can enhance the flexibility of the algorithm and greatly benefit its application.
In this study, we design a granular-prototype-based Fuzzy C-Means method to enhance classification performance. In the developed scheme, we utilize the membership functions to expand (define) the prototypes into a collection of granular prototypes to achieve a deeper revelation of the structure in the data. At this point, the data are softly divided into core and extended parts. By introducing these granular prototypes into the objective function of Fuzzy C-Means, we obtain a spectrum of potentially available partition matrices. Subsequently, quantum-behaved particle swarm optimization (QPSO) [21,22,23] is employed to determine an optimal partition (matrix) for the data. This optimized matrix enables us to achieve superior classification results, effectively enhancing the classification rate on the dataset. To the best of our knowledge, this marks the first exploration of such an approach. Our experimental results substantiate the efficacy of the proposed scheme.
The paper is structured to systematically address the key phases of the design. Section 2 provides a concise introduction to Fuzzy C-Means clustering. In Section 3, we elaborate on an enhanced Fuzzy C-Means through refinements of prototypes and apply quantum-behaved particle swarm optimization to identify the optimal partition matrix for the data. Section 4 presents the findings of our experimental studies. Finally, Section 5 encapsulates our conclusions.
2. A Brief Introduction to Fuzzy C-Means Clustering
Fuzzy C-Means (FCM) clustering is a popular method used for partitioning a dataset into a set of clusters, where each datum may belong to multiple clusters with varying degrees of membership. Unlike traditional crisp clustering methods, which assign each datum to only a single cluster, FCM allows for soft clustering by assigning membership values to each datum for each cluster [2,24,25]. For a dataset $X$ containing $N$ elements in $n$-dimensional space, the FCM method aims to minimize an objective function that measures the fuzzy membership-weighted dispersion of the data around the cluster prototypes. The objective function is defined as the following [26]:

$$ J = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m}\, \| \mathbf{x}_k - \mathbf{v}_i \|^{2} $$
where $c$ is the number of clusters, $u_{ik}$ is the membership value of datum $\mathbf{x}_k$ in cluster $i$, and $m$ denotes the weighting exponent ($m > 1$), also known as the fuzzifier parameter. $\mathbf{v}_i$ is the prototype (centroid) of cluster $i$, and $\|\mathbf{x}_k - \mathbf{v}_i\|$ is the Euclidean distance between datum $\mathbf{x}_k$ and the centroid $\mathbf{v}_i$.
The FCM algorithm iteratively updates the membership values $u_{ik}$ and the cluster prototypes $\mathbf{v}_i$ until convergence is achieved. By using Lagrange multipliers, the membership values and the cluster prototypes can be derived in the following equations:

$$ u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\|\mathbf{x}_k - \mathbf{v}_i\|}{\|\mathbf{x}_k - \mathbf{v}_j\|} \right)^{2/(m-1)}}, \qquad \mathbf{v}_i = \frac{\sum_{k=1}^{N} u_{ik}^{m}\, \mathbf{x}_k}{\sum_{k=1}^{N} u_{ik}^{m}} $$
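To make the update loop concrete, the following is a minimal NumPy sketch of the FCM iteration under the above equations; the function name, the random initialization of the membership matrix, and the convergence test on successive memberships are our own choices rather than the paper's.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal FCM sketch. X: (N, n) data, c: number of clusters,
    m: fuzzifier (> 1). Returns memberships U (c, N) and prototypes V (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)                 # each column sums to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # prototype update
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                         # guard against zero distance
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)) * (d ** (-2.0 / (m - 1.0))).sum(axis=0))
        if np.abs(U_new - U).max() < tol:
            return U_new, V
        U = U_new
    return U, V
```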
FCM allows for greater flexibility in clustering compared to crisp clustering algorithms, making it suitable for applications where data may belong to multiple clusters simultaneously or exhibit ambiguity in their cluster assignments. In practical applications, researchers have consistently aimed to improve the performance of this fuzzy partitioning method.
3. Augmentation of Fuzzy C-Means with Granular Prototypes
In granular computing, the process of clustering data through FCM is also referred to as information granulation, representing a granulation mechanism [27]. Information granules are expressed through prototypes and membership matrices. The prototypes are regarded as representative points in the data, and they serve as a condensed representation of the data, offering an overview of its characteristics. Typically, the expectation is that the data encapsulated by prototypes occupy a defined region within the feature space, which enhances the description and characterization of the data's features. However, relying solely on prototypes often fails to effectively capture the structural nuances of the data or unveil its essence within the feature space. To better reflect and describe the data, as well as to characterize its essence more comprehensively, it becomes imperative to extend the prototypes within the feature space in order to form granular prototypes.
In classification issues, the space (region) formed by the core data constitutes the granular prototypes (the core parts of the data are covered by the granular prototypes). In this study, we use the prototypes and the membership matrix ($U$) to establish granular prototypes. To facilitate analysis, we write the membership matrix $U$ in the following form:

$$ U = [u_{ik}]_{c \times N} = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1N} \\ u_{21} & u_{22} & \cdots & u_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ u_{c1} & u_{c2} & \cdots & u_{cN} \end{bmatrix} $$
It is easy to see that the core of the data can be identified by the subset of data exhibiting the highest membership degrees for each class. We define a membership threshold, which we use to divide the membership matrix into two parts. Correspondingly, the data are also divided into two parts: the core part and the contour part.
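As a small illustration, and continuing the sketch above, this split can be computed directly from the membership matrix; the threshold value is a user-chosen parameter here, and the exact split rule is our reading of the text rather than a formula stated by the paper.

```python
import numpy as np

def split_core_contour(U, threshold):
    """Per-cluster index sets: 'core' data have membership >= threshold,
    the remaining data form the 'contour' part (a sketch)."""
    core = [np.where(U[i] >= threshold)[0] for i in range(U.shape[0])]
    contour = [np.where(U[i] < threshold)[0] for i in range(U.shape[0])]
    return core, contour
```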
Let $V_j$ denote the $j$th granular prototype, which can characterize the structure of the data better than the original point prototype. Thus, we can modify the objective function of the FCM to discover the information contained in the data in greater depth. By introducing the granular prototypes, the objective function can be modified in the following manner:

$$ J = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m}\, d^{2}(\mathbf{x}_k, V_i) $$

where $d(\mathbf{x}_k, V_i)$ denotes the distance from $\mathbf{x}_k$ to the region $V_i$. In other words, the traditional prototypes (represented as points) are substituted with a group of representative regions. From a numerical calculation perspective, the above modification can also be expressed as the following:

$$ J = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m}\, \|\mathbf{x}_k - \tilde{\mathbf{v}}_i\|^{2}, \qquad \tilde{\mathbf{v}}_i \in V_i, $$

where the adjusted prototype $\tilde{\mathbf{v}}_i$ is constrained to lie within the granular prototype $V_i$.
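The paper does not commit to a particular geometric shape for these regions. As one concrete, assumed realization, each granular prototype can be taken as the axis-aligned bounding box of its cluster's core data, reusing split_core_contour from the sketch above:

```python
def granular_prototypes(X, U, threshold):
    """Hypothetical realization: the j-th granular prototype is the bounding
    box of cluster j's core data (assumes every cluster retains at least one
    core datum at the chosen threshold)."""
    core, _ = split_core_contour(U, threshold)
    return [(X[idx].min(axis=0), X[idx].max(axis=0)) for idx in core]
```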
Figure 1 depicts the membership functions alongside the granular prototypes for a given membership threshold. In Figure 1b, the region within the yellow wireframe signifies the core part of the data, which corresponds to the granular prototypes. The incorporation of granular prototypes not only confines the prototype to a particular region during the optimization of the objective function but also enables adjustments to the prototype within that region based on user requirements, thereby enhancing the flexibility of the method.
In classification tasks during model training, we employ classification errors (determined from the original labels of the data) to supervise the objective function. This helps us derive the best prototypes and consequently generate the optimal partition matrix to achieve the required data partitioning. The process of designing a classifier using the developed scheme is summarized as follows.
First, FCM is employed to generate the initial prototypes and membership matrix (partition matrix). Following this, a membership threshold is set and a number of granular prototypes are constructed. The granular prototypes serve as the primary elements of the information granules, offering an optimization space for the traditional prototypes. To accommodate users' classification requirements, the prototypes can be adjusted based on the granular prototypes, and the partition matrix is modified accordingly. To attain the optimal prototypes, the classification error determined by the labels of the data is utilized for model supervision, and quantum-behaved particle swarm optimization (QPSO) [28] is employed for prototype optimization. QPSO stands out for its prowess in global optimization, swift convergence, adaptability to dynamic environments, robustness against noise, and ability to maintain solution diversity. Additionally, its inherent parallelism enables efficient scalability and computation, making it a compelling choice for various optimization tasks.
In the optimization of QPSO, let $V^{(0)}$ be the initial prototypes generated through the classical FCM, and we set the initial positions of the $K$ particles to $X_i(0) = V^{(0)}$, $i = 1, 2, \ldots, K$. The fitness value is obtained by calculating and minimizing the classification error. The personal best position of the $i$th particle is represented as $P_i(t)$, and the global best position is given by $G(t)$. The mean of the personal best positions of all particles can be computed in the following form:

$$ m(t) = \frac{1}{K} \sum_{i=1}^{K} P_i(t), $$

where $t$ indicates the current iteration. The convergence behavior of particles, as outlined in [29], indicates that each particle is expected to converge towards its respective local attractor $p_i(t)$. This local attractor is determined through the following:

$$ p_i(t) = \varphi\, P_i(t) + (1 - \varphi)\, G(t), \qquad \varphi \sim U(0, 1). $$

By employing the Monte Carlo method, the determination of the particle position is achieved through the following:

$$ X_i(t+1) = p_i(t) \pm \alpha \left| m(t) - X_i(t) \right| \ln\!\left( \frac{1}{u} \right), \qquad u \sim U(0, 1), $$

where $\alpha$ is referred to as the contraction–expansion coefficient. Typically, it is computed as the following [30]:

$$ \alpha = 0.5 + 0.5 \times \frac{T_{\max} - t}{T_{\max}}, $$

where $T_{\max}$ is the maximum number of iterations. In general, $\alpha$ thus decreases linearly from 1.0 to 0.5 over the course of the run.
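Under the standard QPSO formulation sketched by the equations above, a single position update can be written as follows; the vectorized form and the random sign convention are implementation choices on our part.

```python
import numpy as np

def qpso_step(Xp, P, G, alpha, rng):
    """One QPSO position update. Xp: (K, D) particle positions,
    P: (K, D) personal bests, G: (D,) global best."""
    K, D = Xp.shape
    mbest = P.mean(axis=0)                           # m(t): mean of personal bests
    phi = rng.random((K, D))
    p = phi * P + (1.0 - phi) * G                    # local attractors p_i(t)
    u = rng.random((K, D))
    sign = np.where(rng.random((K, D)) < 0.5, 1.0, -1.0)
    return p + sign * alpha * np.abs(mbest - Xp) * np.log(1.0 / u)
```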
Ultimately, the optimized prototypes are produced and lead to a refined partition matrix, thereby enhancing the classification performance.
Figure 2 illustrates the methodology of implementing the proposed scheme in detail. To summarize, the fundamental steps include the following (a compact end-to-end sketch follows the list):
- (1)
The employment of classical FCM to conduct fuzzy clustering on dataset X, resulting in an initial partition matrix and a set of prototypes.
- (2)
The transformation of prototypes into granular prototypes using the membership functions.
- (3)
The replacement of prototypes with granular prototypes to establish an optimized boundary for the objective function.
- (4)
The utilization of QPSO for prototype optimization, guided by classification errors for supervision.
- (5)
The completion of the optimization process, which yields an optimal prototype, refining the partition matrix to maximize classification accuracy, thus completing model training.
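Tying the sketches above together, a hedged end-to-end reading of steps (1)–(5) might look as follows. The fitness function assumes one prototype per class, with prototypes aligned to label order; this is our simplification rather than the paper's stated protocol, and the box constraint stands in for the granular prototypes.

```python
import numpy as np

def classification_error(V, X, y):
    """Assumed fitness: nearest-prototype assignment compared with labels y
    (requires as many prototypes as classes, in label order)."""
    d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
    return float(np.mean(np.argmin(d, axis=0) != y))

def train(X, y, c, m=2.0, threshold=0.6, K=30, T=500, seed=0):
    rng = np.random.default_rng(seed)
    U, V = fcm(X, c, m)                                        # step (1)
    lo, hi = map(np.array, zip(*granular_prototypes(X, U, threshold)))  # (2)-(3)
    Xp = rng.uniform(lo, hi, size=(K,) + V.shape)              # particles in the boxes
    fit = np.array([classification_error(v, X, y) for v in Xp])
    P, pf = Xp.copy(), fit.copy()                              # personal bests
    G = P[pf.argmin()].copy()                                  # global best
    for t in range(T):                                         # steps (4)-(5)
        alpha = 0.5 + 0.5 * (T - t) / T                        # contraction-expansion
        flat = qpso_step(Xp.reshape(K, -1), P.reshape(K, -1), G.ravel(), alpha, rng)
        Xp = np.clip(flat.reshape((K,) + V.shape), lo, hi)     # stay inside granules
        fit = np.array([classification_error(v, X, y) for v in Xp])
        better = fit < pf
        P[better], pf[better] = Xp[better], fit[better]
        G = P[pf.argmin()].copy()
    return G                                                   # optimized prototypes
```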
4. Experimental Studies
The ensuing analysis seeks to evaluate the efficacy of the devised scheme through a comparative study with the FCM and Gaussian kernel-based fuzzy clustering (GKFCM) [31] methods. The principal objective of this comprehensive array of experiments is to examine the classification performance of these clustering methodologies. A diverse range of experiments is conducted utilizing publicly available datasets (http://archive.ics.uci.edu/ml, https://www.kaggle.com/datasets, and http://lib.stat.cmu.edu/datasets/ (accessed on 1 May 2024)). The classification rate [32] serves as the primary index in these experiments, chosen for its widespread adoption as a standard measure for performance evaluation.
Throughout the experiments, we investigate different values within the interval [1.1, 3.1] for the fuzzification factor, adjusting it in increments of 0.2. For the GKFCM, we vary the Gaussian kernel parameter σ² from 10 to 100 in increments of 10. The number of classes is set to range from the original number of classes, denoted as C0, to C0 + 5. In the implementation of the QPSO, the method is run for 500 iterations with 30 particles, chosen by considering the number of prototypes and the number of dataset samples. However, we also allow the algorithm to terminate when the objective function remains unchanged for 50 consecutive iterations. To evaluate the effectiveness of the proposed approach, we utilize five-fold cross-validation, a commonly employed technique for assessing and validating the classification performance and stability of fuzzy classification models.
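For concreteness, the cross-validation protocol described above can be sketched as follows, assuming scikit-learn's StratifiedKFold for the fold construction; train and classification_error refer to the illustrative sketches given earlier, not the authors' code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def evaluate(X, y, c, m_values=np.arange(1.1, 3.2, 0.2), seed=0):
    """Five-fold CV classification rates over a grid of fuzzifier values m."""
    rates = {}
    for m in m_values:
        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        fold_rates = []
        for tr, te in skf.split(X, y):
            V = train(X[tr], y[tr], c, m=m)           # fit on training folds
            fold_rates.append(1.0 - classification_error(V, X[te], y[te]))
        rates[round(float(m), 1)] = float(np.mean(fold_rates))
    return rates
```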
We utilize 10 publicly available datasets, detailed descriptions of which can be found at http://archive.ics.uci.edu/ml, https://www.kaggle.com/datasets, and http://lib.stat.cmu.edu/datasets/ (accessed on 1 May 2024). Brief descriptions of these datasets are as follows: the Iris dataset is a classic machine learning dataset containing 150 samples of three different species of iris flowers (setosa, versicolor, and virginica), with four features for each sample (sepal length, sepal width, petal length, and petal width). The Connectionist Bench dataset comprises data commonly utilized in neural network research, featuring 208 samples. Each pattern consists of a set of 60 numbers ranging from 0.0 to 1.0, representing the energy within specific frequency bands, integrated over defined time periods. The Glass Identification dataset contains 214 samples of glass, each with nine attributes, including the refractive index and various chemical properties, aiming to classify the types of glass. The Pima Indian dataset contains 768 medical records of Pima Indian women, with eight physiological attributes (pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age), aiming to predict the onset of diabetes. The Statlog heart dataset contains medical data of patients, including 13 attributes (such as age, sex, and physiological measurements), aiming to predict the presence or absence of heart disease. The User Knowledge dataset contains 403 samples with five features; it is used to assess the performance of learning systems and contains data on the knowledge levels of students in different domains and their learning behavior. The FIFA dataset contains data on football matches, players, teams, and match results, and it is used for football data analysis and prediction. The Haberman's dataset contains medical data from breast cancer patients, including their age, year of operation, and the number of positive axillary nodes detected, aiming to predict the survival status of patients after five years. The Irish dataset is used for the classification of Irish individuals and contains physiological and demographic attributes. The Sensory dataset contains sensory evaluation data for food and beverages, including taste, aroma, and other sensory attributes.
Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12 plot the experimental results, reported as the means over the five folds of cross-validation, indicating the classification rates for each dataset. Notably, the application of the proposed method consistently enhances the classification quality across all datasets.
The proposed approach noticeably improves the classification performance across all datasets and demonstrates significant advantages over the FCM algorithm. The kernel-based clustering method also provides enhancements for most datasets. Interestingly, in most cases, the proposed method achieves higher classification rates than the kernel-based clustering algorithm. On average, the classification performance improvement is about 5%, ranging from a minimal improvement of 1% to a maximum of 10%.
In summary, this study presents a novel approach to classifier design utilizing fuzzy clustering with granular prototypes. Through the use of granular prototypes, we obtain a search space, determined by the membership function, for prototype optimization, enabling adjustments to the partition matrix and facilitating optimal data partitioning tailored to user requirements.
5. Conclusions
In this study, we develop a granular prototype-based fuzzy clustering method to design an enhanced fuzzy classifier. During the design process, the membership function of the data is used to extend the prototypes into a group of granular prototypes that better characterize the structure of the data. Consequently, the data are softly partitioned into core and extended segments. We employ quantum-behaved particle swarm optimization to optimize the prototypes within the granular prototypes and use the classification errors derived from the data labels to supervise the optimization process. The developed scheme facilitates the creation of an optimal partition matrix tailored to the dataset. The theoretical framework presented in this study is supported by a series of experimental investigations. The experimental results show that, compared with similar methods, our algorithm exhibits excellent performance in designing fuzzy classifiers.
The proposed scheme is not only beneficial to the theoretical research of developing clustering algorithms based on fuzzy sets, but it also facilitates the translation of these technologies into practical applications.
Unfortunately, the proposed algorithm involves QPSO optimization, which incurs additional computational overhead. Thus, the proposed models provide a specific means of enhancing the performance of fuzzy clustering in classification while also highlighting the broader challenge of reducing computational complexity. One future direction is to explore the deep relationship between the size of the granular prototypes and the classification rate. Future research endeavors could also explore extending the proposed granular prototype scheme to various fuzzy clustering methods in order to enhance performance according to user requirements.