1. Introduction
Clustering is one of the fundamental methods for uncovering the underlying structure of data. It divides a dataset into clusters such that data points within the same cluster are highly similar, while points belonging to different clusters are clearly dissimilar [
1,
2]. In the era of the internet of things (IoT) and big data, a substantial volume of data is generated across various areas of interest. The collected data originate from multiple sources or views with diverse representations, including documents in different languages, mobile internet usage, social networks, multimedia data analysis, information and image retrieval, healthcare applications, and scientific research. In multivariate analysis, clustering methods can generally be categorized into two main approaches: probability model-based clustering and nonparametric clustering. This study focuses on the latter. Several nonparametric clustering methods are widely used in the literature, including k-means [
3,
4], fuzzy c-means [
5,
6,
7], and possibilistic c-means [
8,
9]. However, while these clustering techniques may work well for low-dimensional data, they are not well suited for high-dimensional datasets. Multiview learning is gaining ground as a promising direction in pattern recognition and machine learning. Unlike conventional single-view clustering, multiview clustering leverages valuable feature knowledge from diverse perspectives to enhance clustering performance, since multiview data contain more shared, complementary, and redundant information. Numerous multiview clustering methods have been proposed to improve clustering efficacy.
In the multiview clustering literature, Bickel and Scheffer [
10] were the first to introduce multiview clustering as a method for handling multiview data. Since then, numerous researchers in the multiview clustering literature have contributed to advancements across various domains, such as in the paradigm of unsupervised multiview clustering (Yang and Hussain [
11] and Hussain et al. [
12]). In the field of multiview graph learning, various researchers have made significant contributions [
13,
14,
15,
16,
17]. For incomplete multiview clustering, algorithms were proposed by [
18,
19,
20,
21,
22]. In the area of tensor-based multiview clustering, notable algorithms have been proposed by Liu and Song [
23], Li et al. [
24], Chen et al. [
25] and Benjamin and Yang [
26]. Regarding multiview feature learning, several schemes have been developed. For example, Xu et al. [
27] proposed feature learning for contrastive clustering, Zhang et al. [
28] introduced sparse feature selection for images, Tang et al. [
29] introduced consensus learning for unsupervised feature selection, and Xu et al. [
30] presented a weighted multiview scheme for feature selection. In weighted multiview clustering, several researchers have made valuable contributions. Jiang et al. [
31] introduced collaborative fuzzy clustering, Wang and Chen [
32] proposed minimax optimization for fuzzy clustering, Yang and Sinaga [
33] developed a feature reduction algorithm for k-means, and Yang and Sinaga [
34] also proposed a collaborative scheme for FCM clustering. Some recent weighted multiview clustering techniques have been proposed. Benjamin and Yang [
35] presented an algorithm for PCM using L2 regularization, Khan et al. [
36] introduced a weighted concept-based scheme for incomplete multiview clustering, Zhou et al. [
37] proposed evidential c-means clustering in a multiview scenario, Liu et al. [
38] introduced an adaptive scheme with feature preference, Ouadfel and Abd Elaziz [
39] proposed a multi-objective gradient optimizer approach-based scheme, Houfar et al. [
40] introduced a binary weighted scheme in a multiview scenario, and Liu et al. [
41] proposed a weighted scheme in evidential clustering in a multiview paradigm.
The rest of this study is structured as follows:
Section 2 introduces a literature review. In
Section 3, we introduce the weighted multiview k-means algorithms designed for clustering multiview datasets. Experimental results and comparisons with existing methods are discussed in
Section 4. Discussions are outlined in
Section 5. Lastly,
Section 6 concludes this study and offers some future recommendations, particularly suggesting the use of a point symmetry-based distance instead of Euclidean distance to better capture cluster symmetry behaviors [
42,
43].
2. Literature Review
In this section, we review the multiview clustering algorithms that are compared with our proposed algorithms: two hard clustering algorithms built on multiview k-means and two soft clustering algorithms based on multiview fuzzy c-means.
Xu et al. [
30] introduced a multiview clustering algorithm based on k-means, termed the multiview weighted algorithm with feature selection (WMCFS). This algorithm incorporates two approaches for data points and feature selection. The objective function of WMCFS, as described by Xu et al. [
30], is as follows:
One parameter is utilized to regulate the sparsity of the view weights, while another regulates the sparsity of the feature weights. In WMCFS, a balancing parameter is also utilized to control the feature weights within each view.
Yang and Sinaga [
33] introduced the feature reduction multiview k-means (FRMVKM) clustering algorithm. The objective function of FRMVKM [
33] is defined as:
In this formulation, one parameter controls the feature weights within the k-th cluster of each view, while another term denotes the view weight.
Jiang et al. [
31] presented another extension to the multiview clustering algorithm, termed WV-Co-FCM. In this approach, they accounted for varying weights for each view, incorporating parameters to control the distribution of these view weights. The objective function of WV-Co-FCM is expressed as:
where the h-th view weight is regulated by a parameter that controls the distribution of the view weights.
WV-Co-FCM is designed to address multiview data, where the weights play a crucial role in the final step. Specifically, without the influence of the weights, the memberships and cluster centers for each object are updated independently.
Wang and Chen [
32] presented the Minmax-FCM clustering algorithm, which operates without a collaboration step and considers different weights for each view. Minmax-FCM is constructed based on the single-view FCM: in each view, it evaluates the weighted distance between the data points and the cluster centers through the membership matrix, and the largest of these values across the views governs the separation. Consequently, consistent clustering results in Minmax-FCM are achieved through minimax optimization, which minimizes the maximum dissimilarity across the diverse views. The formulation of Minmax-FCM [
35] is as follows:
In the Minmax-FCM scenario, two user-defined parameters serve to regulate the distribution of the views.
While previous works such as WMCFS, FRMVKM, WV-Co-FCM, and Minmax-FCM have made significant contributions to multiview clustering by focusing on feature selection, relevance filtering, and different weights for views, there remains a gap in the effective integration of feature and view weights within the k-means framework. Specifically, existing algorithms focus either exclusively on feature selection (e.g., WMCFS) or on weight adjustment (e.g., FRMVKM and WV-Co-FCM) but fail to comprehensively combine feature and view weights within a unified clustering strategy. Although Minmax-FCM takes view weights into account, it does not include a collaboration step, which further limits its ability to capture the interdependencies between features and views. In response to this gap, we propose two novel algorithms, W-MV-KM and W-MV-KM-L2, both of which are based on the k-means algorithm and integrate feature weights and view weights in a synergistic manner. These algorithms extend the k-means framework with weighted strategies to better capture the complex relationships in multiview datasets, improve clustering performance, and ensure more accurate feature selection. This contribution addresses the limitations of previous methods and presents a more comprehensive solution for multiview clustering.
3. Weighted Multiview K-Means Clustering
In this section, we introduce two weighted multiview k-means clustering algorithms tailored for multiview data. We additionally explore the clustering performance of weighted multiview k-means in comparison to existing multiview clustering algorithms. Consider a multiview dataset $X=\{x_1,\ldots,x_n\}$ with $x_i=\{x_i^h\}_{h=1}^{s}$, comprising $s$ views with $d_h$ features in the $h$-th view, where $x_i^h \in \mathbb{R}^{d_h}$. Let $V=[v_h]_{1\times s}$ denote the view weight vector with $\sum_{h=1}^{s} v_h=1$. Let $Z=[z_{ik}]_{n\times c}$ denote the hard membership matrix, where $z_{ik}=1$ if the data point $x_i$ belongs to the $k$-th cluster and $z_{ik}=0$ otherwise, i.e., $\sum_{k=1}^{c} z_{ik}=1$. Let $W=[w_{hj}]$ denote the feature weight matrix with $\sum_{j=1}^{d_h} w_{hj}=1$ for each view, and let $A=\{a_1,\ldots,a_c\}$ denote the $c$ cluster centers. We now introduce our first weighted multiview k-means (W-MV-KM) algorithm, which assigns distinct weights to both views and features. Our objective is to devise a weighted scheme that discerns the significance of the views as well as the features within each view. The objective function for the proposed W-MV-KM is as follows:
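The displayed objective function did not survive the extraction of this section. For orientation only, the following LaTeX sketch gives a generic objective of the weighted multiview k-means family in the notation defined above, with exponent parameters $\alpha$ and $\beta$ on the view and feature weights; it is an assumed form, not necessarily the exact W-MV-KM formulation.

% Generic weighted multiview k-means objective (assumed sketch, not the exact W-MV-KM form)
J(Z, A, V, W) = \sum_{h=1}^{s} v_h^{\alpha} \sum_{k=1}^{c} \sum_{i=1}^{n} z_{ik} \sum_{j=1}^{d_h} w_{hj}^{\beta} \left( x_{ij}^{h} - a_{kj}^{h} \right)^{2},
\quad \text{s.t.} \quad \sum_{h=1}^{s} v_h = 1, \quad \sum_{j=1}^{d_h} w_{hj} = 1, \quad \sum_{k=1}^{c} z_{ik} = 1, \quad z_{ik} \in \{0, 1\}.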
Theorem 1. Assume the multiview dataset $X$ and the memberships $Z$, cluster centers $A$, view weights $V$, and feature weights $W$ defined above, where $v_h$ is the view weight of the h-th view. The necessary conditions for minimizing the objective function of W-MV-KM are as follows:
Proof. For the optimization of the proposed W-MV-KM, the Lagrange multiplier technique is used to derive the necessary conditions with respect to the cluster centers, memberships, view weights, and feature weights. Initially, we hold V, W, and Z fixed and solve the updated equation for the cluster centers: by taking the partial derivative of the Lagrangian with respect to the cluster center and setting it equal to zero, we obtain the updated Equation (6). Subsequently, with V, A, and W held constant, we solve for the membership: taking the partial derivative of the Lagrangian with respect to the membership and equating it to zero yields Equation (7). Assuming Z, A, and W are fixed, we take the partial derivative of the Lagrangian with respect to the view weight and set it equal to zero, obtaining the updated Equation (8). Finally, given fixed Z, A, and V, we compute the partial derivative of the Lagrangian with respect to the feature weight and set it equal to zero; solving this equation yields the updated Equation (9). Therefore, the proposed W-MV-KM algorithm can be summarized as Algorithm 1. □
Algorithm 1. The W-MV-KM Algorithm.
Input: Multiview dataset X, number of clusters c, and the exponent parameters.
Output: Memberships Z, cluster centers A, view weights V, and feature weights W.
Initialization: Initialize the feature weights and the view weights, and set t = 1.
Step 1: Update the cluster centers by Equation (6).
Step 2: Update the memberships by Equation (7).
Step 3: Update the view weights by Equation (8).
Step 4: Update the feature weights by Equation (9).
Step 5: If the stopping criterion is satisfied, then stop;
Else go back to Step 1.
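To make the alternating structure of Algorithm 1 concrete, a minimal Python/NumPy sketch is given below (the paper's experiments were run in MATLAB; Python is used here purely for illustration). The center and membership steps are the standard hard k-means updates under the view- and feature-weighted squared Euclidean distance, while the view- and feature-weight steps use assumed inverse-variance closed forms of the kind discussed after the algorithm; they are stand-ins for, not reproductions of, Equations (6)–(9).

import numpy as np

def w_mv_km(views, c, alpha=2.0, beta=2.0, max_iter=100, tol=1e-6, seed=0):
    """Sketch of a weighted multiview k-means loop with assumed weight updates.
    views: list of (n, d_h) arrays, one per view; alpha, beta > 1."""
    rng = np.random.default_rng(seed)
    n, s = views[0].shape[0], len(views)
    idx = rng.choice(n, size=c, replace=False)
    A = [X[idx].copy() for X in views]                            # initial cluster centers
    V = np.full(s, 1.0 / s)                                       # view weights
    W = [np.full(X.shape[1], 1.0 / X.shape[1]) for X in views]    # feature weights
    prev_obj = np.inf
    for _ in range(max_iter):
        # Steps 1-2: hard assignment under the weighted distance, then mean update.
        dist = np.zeros((n, c))
        for h, X in enumerate(views):
            diff = X[:, None, :] - A[h][None, :, :]               # (n, c, d_h)
            dist += (V[h] ** alpha) * np.einsum('ncd,d->nc', diff ** 2, W[h] ** beta)
        labels = dist.argmin(axis=1)
        for h, X in enumerate(views):
            for k in range(c):
                members = X[labels == k]
                if len(members) > 0:
                    A[h][k] = members.mean(axis=0)
        # Steps 3-4: assumed closed forms driven by intra-cluster variances.
        D_view = np.array([(((X - A[h][labels]) ** 2) * (W[h] ** beta)).sum()
                           for h, X in enumerate(views)])
        V = (1.0 / np.maximum(D_view, 1e-12)) ** (1.0 / (alpha - 1.0))
        V /= V.sum()
        for h, X in enumerate(views):
            D_feat = ((X - A[h][labels]) ** 2).sum(axis=0)        # per-feature dispersion
            w = (1.0 / np.maximum(D_feat, 1e-12)) ** (1.0 / (beta - 1.0))
            W[h] = w / w.sum()
        obj = dist[np.arange(n), labels].sum()
        if abs(prev_obj - obj) < tol:                             # Step 5: stopping criterion
            break
        prev_obj = obj
    return labels, A, V, W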
In W-MV-KM, two exponent parameters are utilized for regulating the distributions of the feature weights and the view weights. When both are set to zero, the W-MV-KM objective function simplifies to the k-means objective function.
We aim to discover an optimal combination of parameter values that minimizes the intra-cluster variance of the h-th view and the j-th feature. One exponent parameter regulates the distribution of the view weights: writing the intra-cluster variance of the h-th view explicitly, Equation (8) can be expressed in terms of this variance. When this parameter is small, Equation (8) suggests that only one view is chosen for clustering, which is not suitable for handling multiview datasets. As its value increases, the distribution of the view weights becomes more uniform, indicating that more views contribute to the clustering process. Conversely, the parameter that governs the distribution of the feature weights can be examined in the same way: letting the intra-cluster variance of the j-th feature be defined analogously, Equation (9) can be expressed in terms of it, and a small parameter value indicates the selection of only one feature. We therefore need to select parameter values that stabilize the algorithm and ultimately yield favorable results for W-MV-KM.
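The toy snippet below illustrates this behavior using the same assumed closed form as in the sketch above, v_h ∝ D_h^(-1/(alpha-1)) with D_h a hypothetical intra-cluster variance of view h: a small exponent concentrates the weight on one view, while a large exponent spreads the weights toward uniform.

import numpy as np

D = np.array([4.0, 9.0, 25.0])   # hypothetical intra-cluster variances of three views

def view_weights(D, alpha):
    # Assumed closed form: v_h proportional to D_h^(-1/(alpha - 1)), normalized to sum to 1.
    v = D ** (-1.0 / (alpha - 1.0))
    return v / v.sum()

for alpha in (1.1, 2.0, 5.0, 50.0):
    print(alpha, np.round(view_weights(D, alpha), 3))
# As alpha approaches 1, nearly all weight falls on the view with the smallest variance;
# as alpha grows, the weights approach the uniform distribution (1/3, 1/3, 1/3).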
The clustering outcomes are influenced by these two exponent parameters in the W-MV-KM algorithm, posing challenges in both the control and the estimation of the results. Hence, to manage and stabilize these effects, we set the values of both parameters to 2. This is how we apply regularization to the objective function for both the feature and view weights. We subsequently introduce another multiview learning approach for W-MV-KM, integrating L2 regularization, which we refer to as W-MV-KM-L2. In W-MV-KM-L2, L2 regularization is applied to both the feature and view weights. The objective function of W-MV-KM-L2 is formulated as follows:
In Algorithm 2, we incorporate the exponential parameters within the regularization framework. This integration serves to control the distributions of both the feature and view weights. In W-MV-KM, these parameters are determined by the user, while in W-MV-KM-L2, we calculate the parameters within the regularization terms from the data. Likewise, we derive the updating equations for the W-MV-KM-L2 objective function through the use of Lagrange multipliers. Consequently, we obtain Theorem 2, as presented below.
Theorem 2. Suppose the multiview dataset $X$, memberships $Z$, cluster centers $A$, view weights $V$, and feature weights $W$ are defined as above. The necessary conditions for minimizing the W-MV-KM-L2 objective function are:
Proof. To solve the optimization problem described in Equation (10), we employ Lagrange multipliers and construct the corresponding Lagrangian function. We first fix W, V, and Z to solve the updated equation for the cluster centers: computing the partial derivative of the Lagrangian with respect to the cluster center and setting it equal to zero, we obtain the updated Equation (11). Subsequently, by fixing A, W, and V, we solve for the membership degree: after computing the partial derivative with respect to the membership and setting it equal to zero, we obtain the updated Equation (12). Next, with Z, A, and W fixed, we compute the partial derivative with respect to the view weight and equate it to zero; solving this equation leads to the updated Equation (13). Finally, we compute the partial derivative with respect to the feature weight and set it to zero, which yields the updated Equation (14). □
Algorithm 2. The W-MV-KM-L2 Algorithm.
Input: Multiview dataset X and the number of clusters c.
Output: Memberships Z, cluster centers A, view weights V, and feature weights W.
Initialization: Initialize the feature weights and the view weights, and set t = 1.
Step 1: Compute the regularization parameters, where τ is the covariance of the data points in each view.
Step 2: Update the cluster centers by Equation (11).
Step 3: Update the memberships by Equation (12).
Step 4: Update the view weights by Equation (13).
Step 5: Update the feature weights by Equation (14).
Step 6: If the stopping criterion is satisfied, then stop.
Otherwise, go back to Step 2.
W-MV-KM-L2 incorporates regularization parameters that play a vital role in determining the view weights and feature weights. To regulate their behavior, we select the parameter values in a manner that ensures the stability of the algorithm, thus enabling the computation of the final estimates. These parameters in the W-MV-KM-L2 objective function are computed from the data, where c signifies the total number of clusters and dh signifies the total number of features in each view. If the value of one parameter becomes much larger than that of the other, the corresponding term becomes very small and consequently does not contribute significantly to the determination of the view weights; the role of c therefore becomes crucial in regulating these values. Let the intra-cluster variances in the h-th view be given; smaller values of this quantity result in smaller weights, whereas larger values lead to larger view weights. This demonstrates that this parameter supervises the weights in each view, and the view weights are then utilized to update the feature weights in each view. Meanwhile, the other parameter is estimated using the sample variance of each view and is utilized to regulate the weight distribution within each view. From Equation (14), we regard the corresponding term as the intra-cluster variance of the features in each view: if this variance is larger, then the feature weights also become larger, whereas if it is smaller, then the feature weights become smaller. Hence, this parameter is employed to diminish feature weights that are irrelevant; consequently, features with larger weights are prioritized over those with smaller weights.
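The exact formulas for these data-driven parameters were not recoverable from this extraction. The sketch below only illustrates the kind of estimate described in the text, namely a per-view sample-variance statistic scaled by the cluster number c and the view dimensionality dh; the specific scalings are hypothetical placeholders.

import numpy as np

def estimate_l2_parameters(views, c):
    """Hypothetical sketch of data-driven regularization parameters for W-MV-KM-L2.
    views: list of (n, d_h) arrays; c: number of clusters."""
    gammas, deltas = [], []
    for X in views:
        d_h = X.shape[1]
        tau = X.var(axis=0).sum()     # total sample variance of the view (trace of its covariance)
        gammas.append(tau / c)        # placeholder scaling by the cluster number
        deltas.append(tau / d_h)      # placeholder scaling by the view dimensionality
    return np.array(gammas), np.array(deltas)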
Computational Complexity: We analyze the computational complexity of both the proposed W-MV-KM algorithm and the W-MV-KM-L2 algorithm. Each iteration can be segmented into four parts: (1) computing the cluster centers with O(nsc); (2) updating the membership degrees with O(nsc2d); (3) updating the view weights; and (4) updating the feature weights. The total computational complexity of the W-MV-KM and W-MV-KM-L2 algorithms follows from these four updates, where n represents the number of data points, s denotes the number of views, c signifies the number of clusters, and d stands for the dimensionality of the data points.
4. Experimental Comparisons and Results
In this section, two synthetic and seven real-life datasets are used to demonstrate the performance of the proposed W-MV-KM-L2 algorithm. We compare the W-MV-KM-L2 algorithm with five existing algorithms, W-MV-KM, WMCFS [
30], WV-Co-FCM [
31], Minmax-FCM [
32], and FRMVKM [
33], in this section. For the experiments, we use the same initializations for the cluster center, feature weight, and view weights. For measuring the clustering performance, we use the following evaluation measures: accuracy rate (AR), Jaccard index (JI) [
44], Fowlkes–Mallows index (FMI) [
45], Rand Index (RI) [
46], and normalized mutual information (NMI) [
47]. The accuracy rate (AR) describes the proportion of correct cluster assignments, expressed as a percentage of the total number of data points. The Jaccard index (JI) [
44] shows how similar two sets are by comparing the number of items they have in common to the overall count of unique items in both sets. The FMI [
45] is a metric utilized to assess the similarity between two clustering results; it is the geometric mean of the pairwise precision and recall computed over pairs of points. The RI [46] measures how similar two sets of clusters are by counting pairs of data points that are either in the same cluster in both sets or in different clusters in both sets. The NMI [47] measures how well two different clustering results match up, taking into account the size and arrangement of the clusters. The greater the values of AR, JI, FMI, RI, and NMI, the better the clustering performance.
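For reference, these external measures can be computed from ground-truth and predicted label vectors with standard tools. The snippet below is a sketch using scikit-learn for RI, NMI, and FMI, the Hungarian assignment for the accuracy rate, and pair counting for the clustering Jaccard index; it is illustrative and not tied to the paper's MATLAB implementation.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import (rand_score, normalized_mutual_info_score,
                             fowlkes_mallows_score, confusion_matrix)
from sklearn.metrics.cluster import pair_confusion_matrix

def accuracy_rate(y_true, y_pred):
    # Best one-to-one matching of cluster labels to classes (Hungarian algorithm).
    cm = confusion_matrix(y_true, y_pred)
    row, col = linear_sum_assignment(-cm)
    return cm[row, col].sum() / len(y_true)

def jaccard_index(y_true, y_pred):
    # Pair-counting Jaccard: same-in-both pairs divided by pairs that are same in at least one result.
    (tn, fp), (fn, tp) = pair_confusion_matrix(y_true, y_pred)
    return tp / (tp + fp + fn)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])
print(accuracy_rate(y_true, y_pred))                    # 1.0
print(jaccard_index(y_true, y_pred))                    # 1.0
print(rand_score(y_true, y_pred))                       # 1.0
print(normalized_mutual_info_score(y_true, y_pred))     # 1.0
print(fowlkes_mallows_score(y_true, y_pred))            # 1.0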
The experiments are executed in MATLAB 2020a. Each experiment is repeated 100 times using distinct random initializations, with the same initialization shared across the WMCFS, FRMVKM, WV-Co-FCM, Minmax-FCM, W-MV-KM, and W-MV-KM-L2 algorithms in each run. For Minmax-FCM and WV-Co-FCM, the fuzziness index m is set to 2.
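A minimal sketch of this repetition protocol is shown below: each of the 100 runs draws its own random initialization, and the same initial centers are then shared by every algorithm in that run. The algorithm callables named here are hypothetical placeholders for the six compared methods.

import numpy as np

def run_protocol(views, c, algorithms, n_runs=100):
    """algorithms: dict name -> callable(views, c, init_centers, seed) returning labels.
    The callables are placeholders for WMCFS, FRMVKM, WV-Co-FCM, Minmax-FCM,
    W-MV-KM, and W-MV-KM-L2; only the shared-initialization logic is shown."""
    n = views[0].shape[0]
    results = {name: [] for name in algorithms}
    for run in range(n_runs):
        rng = np.random.default_rng(run)                  # distinct seed per run
        idx = rng.choice(n, size=c, replace=False)
        init_centers = [X[idx].copy() for X in views]     # same initialization for all methods
        for name, algo in algorithms.items():
            results[name].append(algo(views, c, init_centers, seed=run))
    return results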
Example 1. We generate a dataset with two distinct views using a Gaussian mixture model, where h represents the view index. The two views are then merged to form the samples. The dataset consists of two clusters across both views, with 1500 data points sampled for each view. Two cluster means are specified for the first view, and two different cluster means are specified for the second view; the covariance matrices, i = 1,…,4, are applied to both views. Each view consists of two main features, and we additionally introduce one noise feature per view: for view 1, the noise feature is sampled from a uniform distribution in the range [0, 5], and for view 2, the noise feature is sampled from a uniform distribution in the range [0, 10]. The resulting data are named the Artificial 1 dataset. 2D and 3D graphs illustrating the dataset are shown in Figure 1a–d for view 1 and view 2, respectively. Figure 2 shows the clustering visualization of the two algorithms, W-MV-KM and W-MV-KM-L2, for both view 1 and view 2 in a 3D representation.
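Because the numeric means and covariances in this description did not survive extraction, the generator below uses placeholder values, clearly hypothetical, while following the stated structure: two views, two clusters, 1500 points per view, two informative Gaussian features per view, plus one uniform noise feature in [0, 5] for view 1 and in [0, 10] for view 2.

import numpy as np

def make_artificial1(n=1500, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)                 # two clusters
    # Placeholder cluster means and covariance (the paper's exact values are not reproduced here).
    means = {0: [np.array([0.0, 0.0]), np.array([4.0, 4.0])],   # view 1: cluster 1, cluster 2
             1: [np.array([0.0, 4.0]), np.array([4.0, 0.0])]}   # view 2: cluster 1, cluster 2
    cov = np.eye(2)
    views = []
    for h, noise_hi in enumerate([5.0, 10.0]):
        informative = np.vstack([rng.multivariate_normal(means[h][k], cov) for k in labels])
        noise = rng.uniform(0.0, noise_hi, size=(n, 1)) # one uniform noise feature per view
        views.append(np.hstack([informative, noise]))
    return views, labels

views, labels = make_artificial1()
print(views[0].shape, views[1].shape)                   # (1500, 3) (1500, 3)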
Table 1 shows the clustering performance of the proposed algorithms compared with the existing algorithms, where the W-MV-KM-L2 algorithm performs well compared to the other algorithms in all evaluation measures. In this example, we try different values for the two exponent parameters and report the results of the algorithms in
Table 2.
In W-MV-KM, we try different values for the two parameters, as listed in Table 2. For some choices of the two parameters, the algorithm is unable to give results; for other choices, the algorithm runs, but the results are very poor compared with the other existing algorithms. Thus, we propose W-MV-KM-L2, where the parameter values are determined by the formulas discussed in Section 3. In the W-MV-KM algorithm, certain parameter settings prevent it from running at all, so the parameters must be chosen away from these values; even when we increase the parameter values further, the clustering performance is not improved. In both algorithms, W-MV-KM and W-MV-KM-L2, the view weights should be spread over the views, because if all the weight falls on a single view, then only one view is selected, which is against the basic principle of multiview clustering.
Example 2. In this example, we use the synthetic dataset from [35], called Syn1. To implement this dataset in the proposed and existing algorithms, we keep different initializations for the existing algorithms: for WMCFS, we choose a parameter value of 0.001, and for FRMVKM and WV-Co-FCM, we use their respective parameter settings. With the given initializations, we apply these algorithms to the dataset, and their clustering performance is summarized in Table 3. The results indicate that W-MV-KM-L2 obtained the highest values across all evaluation metrics.

Example 3. In this example, we use the synthetic dataset from [48], called Syn500, to implement W-MV-KM-L2, W-MV-KM, Minmax-FCM, WV-Co-FCM, FRMVKM, and WMCFS. For our initializations, we choose a parameter value of 0.01 for WMCFS and the respective parameter settings for FRMVKM and WV-Co-FCM. For Minmax-FCM and WV-Co-FCM, the fuzziness index is m = 2. With the given initializations, we apply these algorithms to the dataset, and their clustering performance is summarized in Table 4. The results indicate that W-MV-KM-L2 obtained the highest values across all evaluation metrics.

We first normalized all the real datasets. Seven real-world multiview datasets serve as benchmarks to assess the efficacy of the proposed algorithms. These are the Minist4 dataset [
49], Handwritten4 (HW) [
50], Caltech2 dataset [
51], UCI Derm dataset [
52], HumanEva 3D Motion [
53,
54,
55] dataset, UCI 3views dataset [
50], and Microsoft Research Cambridge Volume 1 (MSRC-V1) dataset [
56], respectively. The characteristics of these seven real datasets are displayed in Table 5 in terms of the cluster number c, the data type, the data number n, the feature dimension dh, and the view number s.
Example 4. In this example, we use the seven real datasets to compare W-MV-KM-L2 with all the other algorithms, namely WMCFS, FRMVKM, WV-Co-FCM, Minmax-FCM, and W-MV-KM. Different combinations of the exponent parameters result in different distributions for the view and feature weight vectors; this is the case for WMCFS and WV-Co-FCM, where these parameters are user defined, and for FRMVKM, we use the same parameter setting for all datasets. In W-MV-KM-L2, the regularization parameters are used to control the sparsity of the view weights and feature weights, respectively, and are calculated directly from the data. We compare the results of the proposed algorithms with the existing algorithms by applying them to the real datasets using the parameter values assumed in Table 6. For each given parameter value, we ran 100 simulations for each algorithm using different seed generations across all clustering methods and recorded their clustering performance metrics. This process was repeated for all real datasets listed in
Table 6. We present only the overall average mean and standard deviation of the performance measures for each algorithm and compare them with the results of W-MV-KM-L2. The outcomes, summarized in
Table 7, show that W-MV-KM-L2 consistently achieves higher results for AR, RI, NMI, JI, and FMI across all real datasets.
Example 5. In this example, we compare the number of iterations required for convergence, along with the running time, for the WMCFS, FRMVKM, WV-Co-FCM, Minmax-FCM, W-MV-KM, and W-MV-KM-L2 clustering algorithms. The number of iterations needed for convergence is reported in Table 8. The initial parameters are set as follows: m = 2 is used as the fuzzifier value for Minmax-FCM and WV-Co-FCM, and the same convergence (error) tolerance is used for all the algorithms. The running times in seconds achieved by all six algorithms are reported in Table 9, and the lowest running time for each dataset is highlighted in boldface (a minimal timing sketch is given after Example 6).

Example 6. In this example, we use the multiview cluster validity indices proposed by Yang and Hussain [11]: the multiview Dunn (MV-Dunn) index and the multiview generalized Dunn (MV-G-Dunn) index. We implement these multiview validity indices only with the k-means-based algorithms, namely WMCFS, FRMVKM, and W-MV-KM-L2. We consider the WMCFS+MV-Dunn, WMCFS+MV-G-Dunn, FRMVKM+MV-Dunn, FRMVKM+MV-G-Dunn, W-MV-KM-L2+MV-Dunn, and W-MV-KM-L2+MV-G-Dunn combinations with 40 different initializations. The estimated numbers of clusters, along with their percentages, determined using the MV-Dunn and MV-G-Dunn indices, are presented in Table 10 for WMCFS, Table 11 for FRMVKM, and Table 12 for W-MV-KM-L2.
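The following is a minimal timing harness of the kind used for the comparison in Example 5, recording the iterations to convergence and the wall-clock time over repeated runs. Since the multiview implementations are not included here, scikit-learn's single-view KMeans is used as a stand-in.

import time
import numpy as np
from sklearn.cluster import KMeans

def time_algorithm(X, c, n_runs=10, tol=1e-6):
    """Returns the mean number of iterations to convergence and the mean running time in seconds."""
    iters, times = [], []
    for run in range(n_runs):
        km = KMeans(n_clusters=c, n_init=1, tol=tol, random_state=run)
        t0 = time.perf_counter()
        km.fit(X)
        times.append(time.perf_counter() - t0)
        iters.append(km.n_iter_)
    return float(np.mean(iters)), float(np.mean(times))

X = np.random.default_rng(0).normal(size=(1500, 6))     # e.g., concatenated views of a synthetic dataset
print(time_algorithm(X, c=2))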