A New Clustering Method Based on the Inversion Formula
Abstract
1. Introduction
2. Estimation of the Density of the Modified Inversion Formula
2.1. Gaussian Mixture and Inversion Density Estimation
2.2. Modified Inversion Density Estimation
2.3. Modified Inversion Density Clustering Algorithm
Algorithm 1: Clustering Algorithm Based on the Modified Inversion Formula Density Estimation (MIDE)
Input: data set X = [X1, X2, …, Xn]; number of clusters K
Output: clusters C1, C2, …, Ct and the estimated component parameters
Initialization: initialize the mean vectors by (1) random uniform initialization, (2) k-means, or (3) random point initialization; generate the matrix T, whose set of design directions is computed so that they are evenly spaced on the sphere.
1 | for i = 1 : t do
2 |   estimate the density for each point and cluster based on (9)
3 |   update the parameter values based on (22)–(24)
4 | end
5 | return C1, C2, …, Ct and the estimated parameters
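The density estimate (9) and parameter updates (22)–(24) that Algorithm 1 relies on are not reproduced in this excerpt. As a hedged illustration of the loop structure only, the sketch below substitutes an ordinary Gaussian-mixture E/M step (1-D case, K ≥ 2) for those equations; the function name, the quantile-based initialization, and all parameters are illustrative, not the paper's method.

```python
import math

def mide_like_loop(x, K, iters=50):
    """EM-style loop mirroring the structure of Algorithm 1 (1-D sketch).

    A plain Gaussian mixture E/M step stands in for the paper's
    density estimate (9) and parameter updates (22)-(24)."""
    n = len(x)
    xs = sorted(x)
    # Deterministic stand-in for the initialization options: spread quantiles.
    mu = [xs[i * (n - 1) // (K - 1)] for i in range(K)]
    sigma = [1.0] * K
    pi = [1.0 / K] * K
    for _ in range(iters):
        # Density of every point under every component (role of Eq. (9)).
        resp = []
        for xi in x:
            d = [pi[k] * math.exp(-0.5 * ((xi - mu[k]) / sigma[k]) ** 2)
                 / (sigma[k] * math.sqrt(2 * math.pi)) for k in range(K)]
            s = sum(d) or 1e-300  # guard against total underflow
            resp.append([dk / s for dk in d])
        # Parameter updates (role of Eqs. (22)-(24)).
        for k in range(K):
            nk = sum(r[k] for r in resp) or 1e-12
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var = sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
            pi[k] = nk / n
    # Hard assignment: each point goes to its highest-responsibility component.
    labels = [max(range(K), key=lambda k: r[k]) for r in resp]
    return labels, mu, sigma, pi
```

On two well-separated 1-D clusters this loop recovers the partition after a few iterations; it is meant only to show where (9) and (22)–(24) plug into the iteration.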
3. Experimental Analyses
3.1. Evaluation Metrics
3.2. Experimental Datasets
3.3. Performances of Clustering Methods
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix B
Dataset | K-Means Mean | K-Means Std | GMM Mean | GMM Std | BGMM Mean | BGMM Std | MIDEv1 Mean | MIDEv1 Std | MIDEv2 Mean | MIDEv2 Std
---|---|---|---|---|---|---|---|---|---|---
Synthetic | ||||||||||
Aggregation | 0.836 | 0.004 | 0.886 | 0.035 | 0.909 | 0.041 | 0.779 | 0.006 | 0.845 | 0.005 |
Atom | 0.289 | 0.003 | 0.170 | 0.036 | 0.194 | 0.028 | 0.310 | 0.004 | 0.319 | 0.003 |
D31 | 0.969 | 0.005 | 0.951 | 0.008 | 0.871 | 0.004 | 0.791 | 0.007 | 0.822 | 0.006 |
R15 | 0.994 | 0.000 | 0.989 | 0.012 | 0.868 | 0.014 | 0.881 | 0.001 | 0.909 | 0.001 |
Gaussians1 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 |
Threenorm | 0.024 | 0.001 | 0.047 | 0.039 | 0.007 | 0.002 | 0.069 | 0.001 | 0.076 | 0.001 |
Twenty | 1.000 | 0.000 | 0.996 | 0.008 | 0.956 | 0.026 | 1.000 | 0.000 | 0.988 | 0.005 |
Wingnut | 0.562 | 0.000 | 0.778 | 0.002 | 0.779 | 0.000 | 0.459 | 0.000 | 0.420 | 0.001 |
Real | ||||||||||
Breast | 0.547 | 0.011 | 0.659 | 0.003 | 0.630 | 0.003 | - | - | - | - |
CPU | 0.487 | 0.013 | 0.398 | 0.025 | 0.389 | 0.033 | 0.467 | 0.013 | 0.529 | 0.011 |
Dermatology | 0.862 | 0.009 | 0.809 | 0.044 | 0.862 | 0.049 | - | - | - | - |
Diabetes | 0.090 | 0.004 | 0.084 | 0.041 | 0.105 | 0.017 | 0.089 | 0.004 | 0.106 | 0.003 |
Ecoli | 0.636 | 0.004 | 0.636 | 0.016 | 0.639 | 0.010 | 0.592 | 0.004 | 0.534 | 0.004 |
Glass | 0.303 | 0.019 | 0.327 | 0.052 | 0.364 | 0.042 | 0.304 | 0.020 | 0.369 | 0.024 |
Heart-statlog | 0.363 | 0.005 | 0.270 | 0.055 | 0.263 | 0.058 | 0.339 | 0.008 | 0.308 | 0.007 |
Iono | 0.125 | 0.000 | 0.305 | 0.052 | 0.299 | 0.024 | - | - | - | - |
Iris | 0.657 | 0.006 | 0.890 | 0.04 | 0.751 | 0.011 | 0.841 | 0.007 | 0.763 | 0.008 |
Wine | 0.876 | 0.000 | 0.856 | 0.055 | 0.926 | 0.054 | 0.822 | 0.001 | 0.799 | 0.003 |
Thyroid | 0.559 | 0.000 | 0.783 | 0.059 | 0.661 | 0.051 | 0.382 | 0.009 | 0.390 | 0.008 |
Generated clusters with outliers | ||||||||||
2 clusters (0.5% outliers) | 0.976 | 0.000 | 0.976 | 0.000 | 0.976 | 0.000 | 0.977 | 0.000 | 1.000 | 0.000 |
2 clusters (1% outliers) | 0.947 | 0.000 | 0.957 | 0.000 | 0.957 | 0.000 | 0.958 | 0.000 | 0.974 | 0.000 |
2 clusters (2% outliers) | 0.916 | 0.000 | 0.925 | 0.000 | 0.925 | 0.000 | 0.928 | 0.000 | 0.976 | 0.000 |
2 clusters (4% outliers) | 0.867 | 0.000 | 0.876 | 0.000 | 0.876 | 0.000 | 0.886 | 0.000 | 0.972 | 0.000 |
3 clusters (0.5% outliers) | 0.978 | 0.000 | 0.978 | 0.000 | 0.978 | 0.000 | 0.978 | 0.000 | 0.993 | 0.000 |
3 clusters (1% outliers) | 0.964 | 0.000 | 0.964 | 0.000 | 0.964 | 0.000 | 0.964 | 0.000 | 0.986 | 0.000 |
3 clusters (2% outliers) | 0.943 | 0.000 | 0.943 | 0.000 | 0.943 | 0.000 | 0.945 | 0.000 | 0.985 | 0.000 |
3 clusters (4% outliers) | 0.907 | 0.000 | 0.901 | 0.000 | 0.898 | 0.000 | 0.911 | 0.000 | 0.982 | 0.000 |
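The metrics behind the appendix tables are defined in Section 3.1, which is not part of this excerpt. As one standard external measure used in comparisons of this kind, the adjusted Rand index can be computed from a ground-truth and a predicted labeling as follows (pure-Python sketch; the function name is illustrative):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same points.

    ARI = (Index - ExpectedIndex) / (MaxIndex - ExpectedIndex),
    computed from the contingency table of the two labelings."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))  # contingency cells
    rows = Counter(labels_true)                     # row marginals
    cols = Counter(labels_pred)                     # column marginals
    sum_cells = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate partitions (e.g. all singletons)
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

ARI is 1.0 for identical partitions (up to label permutation), near 0 for independent ones, and can go negative for worse-than-chance agreement, which makes the small values on Threenorm in the tables above interpretable as near-chance clusterings.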
Dataset | K-Means Mean | K-Means Std | GMM Mean | GMM Std | BGMM Mean | BGMM Std | MIDEv1 Mean | MIDEv1 Std | MIDEv2 Mean | MIDEv2 Std
---|---|---|---|---|---|---|---|---|---|---
Synthetic | ||||||||||
Aggregation | 0.725 | 0.008 | 0.795 | 0.069 | 0.860 | 0.089 | 0.687 | 0.035 | 0.862 | 0.023 |
Atom | 0.176 | 0.003 | 0.058 | 0.028 | 0.076 | 0.024 | 0.204 | 0.006 | 0.221 | 0.004 |
D31 | 0.949 | 0.016 | 0.903 | 0.027 | 0.634 | 0.017 | 0.494 | 0.037 | 0.529 | 0.026 |
R15 | 0.993 | 0.000 | 0.975 | 0.036 | 0.608 | 0.020 | 0.747 | 0.021 | 0.786 | 0.018 |
Gaussians1 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 1.000 |
Threenorm | 0.032 | 0.001 | 0.058 | 0.045 | 0.009 | 0.002 | 0.088 | 0.003 | 0.089 | 0.002 |
Twenty | 1.000 | 0.000 | 0.986 | 0.028 | 0.836 | 0.096 | 1.000 | 0.000 | 1.000 | 0.000 |
Wingnut | 0.670 | 0.000 | 0.862 | 0.001 | 0.863 | 0.000 | 0.565 | 0.007 | 0.533 | 0.005 |
Real | ||||||||||
Breast | 0.664 | 0.008 | 0.772 | 0.003 | 0.747 | 0.003 | - | - | - | - |
CPU | 0.529 | 0.014 | 0.315 | 0.070 | 0.336 | 0.081 | 0.461 | 0.043 | 0.708 | 0.026 |
Dermatology | 0.712 | 0.038 | 0.697 | 0.096 | 0.728 | 0.112 | - | - | - | - |
Diabetes | 0.058 | 0.003 | 0.059 | 0.046 | 0.079 | 0.028 | 0.059 | 0.005 | 0.086 | 0.002 |
Ecoli | 0.505 | 0.008 | 0.649 | 0.011 | 0.665 | 0.014 | 0.551 | 0.013 | 0.423 | 0.015 |
Glass | 0.162 | 0.014 | 0.178 | 0.055 | 0.211 | 0.040 | 0.151 | 0.024 | 0.229 | 0.011 |
Heart-statlog | 0.451 | 0.005 | 0.352 | 0.072 | 0.344 | 0.075 | 0.422 | 0.013 | 0.452 | 0.011 |
Iono | 0.168 | 0.000 | 0.383 | 0.066 | 0.368 | 0.049 | - | - | - | - |
Iris | 0.617 | 0.009 | 0.888 | 0.077 | 0.654 | 0.030 | 0.819 | 0.029 | 0.888 | 0.008 |
Wine | 0.897 | 0.000 | 0.869 | 0.072 | 0.932 | 0.063 | 0.835 | 0.031 | 0.865 | 0.012 |
Thyroid | 0.583 | 0.000 | 0.850 | 0.075 | 0.735 | 0.074 | 0.297 | 0.045 | 0.356 | 0.015 |
Generated blobs with outliers | ||||||||||
2 clusters (0.5% outliers) | 0.991 | 0.000 | 0.990 | 0.000 | 0.990 | 0.000 | 0.993 | 0.000 | 1.000 | 0.000 |
2 clusters (1% outliers) | 0.976 | 0.000 | 0.980 | 0.000 | 0.980 | 0.000 | 0.980 | 0.000 | 0.992 | 0.000 |
2 clusters (2% outliers) | 0.957 | 0.000 | 0.961 | 0.000 | 0.961 | 0.000 | 0.961 | 0.000 | 0.989 | 0.000 |
2 clusters (4% outliers) | 0.920 | 0.000 | 0.924 | 0.000 | 0.924 | 0.000 | 0.928 | 0.000 | 0.990 | 0.000 |
3 clusters (0.5% outliers) | 0.990 | 0.000 | 0.990 | 0.000 | 0.990 | 0.000 | 0.991 | 0.000 | 0.997 | 0.000 |
3 clusters (1% outliers) | 0.982 | 0.000 | 0.982 | 0.000 | 0.982 | 0.000 | 0.984 | 0.000 | 0.993 | 0.000 |
3 clusters (2% outliers) | 0.967 | 0.000 | 0.967 | 0.000 | 0.967 | 0.000 | 0.967 | 0.000 | 0.993 | 0.000 |
3 clusters (4% outliers) | 0.938 | 0.000 | 0.925 | 0.000 | 0.918 | 0.000 | 0.941 | 0.000 | 0.992 | 0.000 |
Dataset | K-Means Mean | K-Means Std | GMM Mean | GMM Std | BGMM Mean | BGMM Std | MIDEv1 Mean | MIDEv1 Std | MIDEv2 Mean | MIDEv2 Std
---|---|---|---|---|---|---|---|---|---|---
Synthetic | ||||||||||
Aggregation | 0.780 | 0.007 | 0.800 | 0.071 | 0.870 | 0.062 | 0.831 | 0.009 | 0.871 | 0.012 |
Atom | 0.556 | 0.002 | 0.501 | 0.004 | 0.503 | 0.005 | 0.575 | 0.004 | 0.582 | 0.004 |
D31 | 0.951 | 0.017 | 0.901 | 0.029 | 0.581 | 0.019 | 0.556 | 0.031 | 0.609 | 0.042 |
R15 | 0.993 | 0.000 | 0.975 | 0.038 | 0.664 | 0.011 | 0.756 | 0.041 | 0.834 | 0.027 |
Gaussians1 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 |
Threenorm | 0.420 | 0.001 | 0.443 | 0.050 | 0.381 | 0.005 | 0.481 | 0.003 | 0.496 | 0.004 |
Twenty | 1.000 | 0.000 | 0.984 | 0.030 | 0.838 | 0.075 | 1.000 | 0.002 | 0.986 | 0.005 |
Wingnut | 0.834 | 0.000 | 0.931 | 0.001 | 0.932 | 0.000 | 0.779 | 0.000 | 0.808 | 0.000 |
Real | ||||||||||
Breast | 0.833 | 0.004 | 0.887 | 0.001 | 0.874 | 0.002 | - | - | - | - |
CPU | 0.656 | 0.013 | 0.489 | 0.058 | 0.500 | 0.077 | 0.733 | 0.011 | 0.751 | 0.010 |
Dermatology | 0.719 | 0.038 | 0.699 | 0.079 | 0.730 | 0.106 | - | - | - | - |
Diabetes | 0.252 | 0.004 | 0.283 | 0.033 | 0.299 | 0.028 | 0.275 | 0.008 | 0.307 | 0.004 |
Ecoli | 0.557 | 0.009 | 0.655 | 0.018 | 0.663 | 0.006 | 0.606 | 0.008 | 0.663 | 0.007 |
Glass | 0.340 | 0.010 | 0.362 | 0.036 | 0.365 | 0.032 | 0.397 | 0.012 | 0.412 | 0.009 |
Heart-statlog | 0.720 | 0.003 | 0.663 | 0.043 | 0.659 | 0.045 | 0.714 | 0.005 | 0.727 | 0.004 |
Iono | 0.549 | 0.000 | 0.686 | 0.018 | 0.673 | 0.031 | - | - | - | - |
Iris | 0.730 | 0.008 | 0.923 | 0.064 | 0.752 | 0.029 | 0.889 | 0.012 | 0.905 | 0.009 |
Wine | 0.935 | 0.000 | 0.917 | 0.052 | 0.958 | 0.046 | 0.904 | 0.012 | 0.917 | 0.011 |
Thyroid | 0.787 | 0.000 | 0.914 | 0.035 | 0.856 | 0.038 | 0.639 | 0.007 | 0.675 | 0.008 |
Generated blobs with outliers | ||||||||||
2 clusters (0.5% outliers) | 0.993 | 0.000 | 0.993 | 0.000 | 0.993 | 0.000 | 0.993 | 0.000 | 1.000 | 0.000 |
2 clusters (1% outliers) | 0.983 | 0.000 | 0.985 | 0.000 | 0.985 | 0.000 | 0.985 | 0.000 | 0.991 | 0.000 |
2 clusters (2% outliers) | 0.969 | 0.000 | 0.971 | 0.000 | 0.971 | 0.000 | 0.972 | 0.000 | 0.994 | 0.000 |
2 clusters (4% outliers) | 0.942 | 0.000 | 0.944 | 0.000 | 0.944 | 0.000 | 0.946 | 0.000 | 0.996 | 0.000 |
3 clusters (0.5% outliers) | 0.991 | 0.000 | 0.991 | 0.000 | 0.991 | 0.000 | 0.993 | 0.000 | 0.998 | 0.000 |
3 clusters (1% outliers) | 0.983 | 0.000 | 0.983 | 0.000 | 0.983 | 0.000 | 0.985 | 0.000 | 0.995 | 0.000 |
3 clusters (2% outliers) | 0.969 | 0.000 | 0.969 | 0.000 | 0.969 | 0.000 | 0.972 | 0.000 | 0.994 | 0.000 |
3 clusters (4% outliers) | 0.941 | 0.000 | 0.932 | 0.000 | 0.927 | 0.000 | 0.945 | 0.000 | 0.993 | 0.000 |
Dataset | K-Means Mean | K-Means Std | GMM Mean | GMM Std | BGMM Mean | BGMM Std | MIDEv1 Mean | MIDEv1 Std | MIDEv2 Mean | MIDEv2 Std
---|---|---|---|---|---|---|---|---|---|---
Synthetic | ||||||||||
Aggregation | 0.785 | 0.006 | 0.840 | 0.055 | 0.891 | 0.070 | 0.875 | 0.011 | 0.867 | 0.015 |
Atom | 0.654 | 0.001 | 0.653 | 0.006 | 0.649 | 0.003 | 0.659 | 0.002 | 0.669 | 0.003 |
D31 | 0.951 | 0.015 | 0.906 | 0.025 | 0.681 | 0.012 | 0.645 | 0.011 | 0.689 | 0.016 |
R15 | 0.993 | 0.000 | 0.977 | 0.033 | 0.682 | 0.016 | 0.779 | 0.011 | 0.817 | 0.009 |
Gaussians1 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 |
Threenorm | 0.518 | 0.000 | 0.535 | 0.030 | 0.514 | 0.002 | 0.552 | 0.002 | 0.559 | 0.003 |
Twenty | 1.000 | 0.000 | 0.987 | 0.026 | 0.857 | 0.075 | 1.000 | 0.000 | 0.984 | 0.004 |
Wingnut | 0.835 | 0.000 | 0.931 | 0.001 | 0.932 | 0.000 | 0.792 | 0.001 | 0.764 | 0.001 |
Real | ||||||||||
Breast | 0.847 | 0.004 | 0.893 | 0.001 | 0.881 | 0.001 | - | - | - | - |
CPU | 0.771 | 0.006 | 0.619 | 0.052 | 0.633 | 0.065 | 0.802 | 0.012 | 0.871 | 0.009 |
Dermatology | 0.769 | 0.030 | 0.760 | 0.074 | 0.784 | 0.087 | - | - | - | - |
Diabetes | 0.326 | 0.002 | 0.382 | 0.017 | 0.378 | 0.028 | 0.375 | 0.008 | 0.389 | 0.007 |
Ecoli | 0.625 | 0.006 | 0.740 | 0.008 | 0.762 | 0.009 | 0.678 | 0.006 | 0.698 | 0.006 |
Glass | 0.393 | 0.012 | 0.435 | 0.058 | 0.437 | 0.048 | 0.540 | 0.021 | 0.519 | 0.015 |
Heart-statlog | 0.734 | 0.002 | 0.683 | 0.026 | 0.679 | 0.028 | 0.724 | 0.011 | 0.737 | 0.009 |
Iono | 0.601 | 0.000 | 0.711 | 0.004 | 0.698 | 0.023 | - | - | - | - |
Iris | 0.743 | 0.006 | 0.927 | 0.041 | 0.781 | 0.011 | 0.899 | 0.005 | 0.877 | 0.005 |
Wine | 0.932 | 0.000 | 0.914 | 0.042 | 0.955 | 0.038 | 0.895 | 0.011 | 0.886 | 0.008 |
Thyroid | 0.841 | 0.000 | 0.931 | 0.023 | 0.888 | 0.022 | 0.705 | 0.013 | 0.736 | 0.009 |
Generated blobs with outliers | ||||||||||
2 clusters (0.5% outliers) | 0.995 | 0.000 | 0.995 | 0.000 | 0.995 | 0.000 | 0.996 | 0.000 | 1.000 | 0.000 |
2 clusters (1% outliers) | 0.988 | 0.000 | 0.990 | 0.000 | 0.990 | 0.000 | 0.990 | 0.000 | 0.994 | 0.000 |
2 clusters (2% outliers) | 0.978 | 0.000 | 0.980 | 0.000 | 0.980 | 0.000 | 0.981 | 0.000 | 0.996 | 0.000 |
2 clusters (4% outliers) | 0.960 | 0.000 | 0.961 | 0.000 | 0.951 | 0.000 | 0.963 | 0.000 | 0.995 | 0.000 |
3 clusters (0.5% outliers) | 0.993 | 0.000 | 0.993 | 0.000 | 0.993 | 0.000 | 0.993 | 0.000 | 0.998 | 0.000 |
3 clusters (1% outliers) | 0.988 | 0.000 | 0.988 | 0.000 | 0.988 | 0.000 | 0.991 | 0.000 | 0.996 | 0.000 |
3 clusters (2% outliers) | 0.978 | 0.000 | 0.978 | 0.000 | 0.978 | 0.000 | 0.981 | 0.000 | 0.996 | 0.000 |
3 clusters (4% outliers) | 0.959 | 0.000 | 0.951 | 0.000 | 0.948 | 0.000 | 0.964 | 0.000 | 0.995 | 0.000 |
ID | Data Sets | Sample Size (N) | Dimensions (D) | Classes |
---|---|---|---|---|
Synthetic | ||||
1 | Aggregation | 788 | 2 | 7 |
2 | Atom | 800 | 3 | 2 |
3 | D31 | 3100 | 2 | 31 |
4 | R15 | 600 | 2 | 15 |
5 | Gaussians1 | 100 | 2 | 2 |
6 | Threenorm | 1000 | 2 | 2 |
7 | Twenty | 1000 | 2 | 20 |
8 | Wingnut | 1016 | 2 | 2 |
Real | ||||
9 | Breast | 570 | 30 | 2 |
10 | CPU | 209 | 6 | 4 |
11 | Dermatology | 366 | 17 | 6 |
12 | Diabetes | 442 | 10 | 4 |
13 | Ecoli | 336 | 7 | 8 |
14 | Glass | 214 | 9 | 6 |
15 | Heart-statlog | 270 | 13 | 2 |
16 | Iono | 351 | 34 | 2 |
17 | Iris | 150 | 4 | 3 |
18 | Wine | 178 | 13 | 3 |
19 | Thyroid | 215 | 5 | 3 |
Generated clusters with outliers | ||||
20 | 2 clusters (0.5% outliers) | 1005 | 2 | 2 |
21 | 2 clusters (1% outliers) | 1010 | 2 | 2 |
22 | 2 clusters (2% outliers) | 1020 | 2 | 2 |
23 | 2 clusters (4% outliers) | 1040 | 2 | 2 |
25 | 3 clusters (0.5% outliers) | 1005 | 2 | 3 |
26 | 3 clusters (1% outliers) | 1010 | 2 | 3 |
27 | 3 clusters (2% outliers) | 1020 | 2 | 3 |
28 | 3 clusters (4% outliers) | 1040 | 2 | 3 |
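The generated-outlier datasets (IDs 20–28) each consist of 1000 base points plus the stated outlier percentage (e.g., 1005 = 1000 + 0.5%). The excerpt does not detail how they were drawn; one plausible construction consistent with the table — Gaussian blobs plus uniform background noise — might look like this (all parameter values and the label convention are assumptions):

```python
import random

def blobs_with_outliers(n_per_cluster=500, centers=((0.0, 0.0), (5.0, 5.0)),
                        outlier_frac=0.005, box=(-10.0, 15.0), seed=0):
    """Draw unit-variance Gaussian blobs around `centers`, then add
    uniformly distributed outliers amounting to `outlier_frac` of the
    base sample. Outliers get the sentinel label -1."""
    rng = random.Random(seed)
    X, y = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(k)
    n_out = round(outlier_frac * len(X))  # e.g. 0.5% of 1000 -> 5 points
    for _ in range(n_out):
        X.append((rng.uniform(*box), rng.uniform(*box)))
        y.append(-1)
    return X, y
```

With the defaults this yields 1005 points in 2 clusters, matching row 20 of the table; raising `outlier_frac` to 0.01, 0.02, or 0.04 reproduces the remaining two-cluster sample sizes.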
Dataset | K-Means Mean | K-Means Std | GMM Mean | GMM Std | BGMM Mean | BGMM Std | MIDEv1 Mean | MIDEv1 Std | MIDEv2 Mean | MIDEv2 Std
---|---|---|---|---|---|---|---|---|---|---
Synthetic | ||||||||||
Aggregation | 0.857 | 0.005 | 0.835 | 0.075 | 0.907 | 0.042 | 0.889 | 0.008 | 0.895 | 0.009 |
Atom | 0.710 | 0.002 | 0.618 | 0.028 | 0.637 | 0.022 | 0.723 | 0.002 | 0.746 | 0.004 |
D31 | 0.972 | 0.015 | 0.928 | 0.028 | 0.601 | 0.022 | 0.721 | 0.017 | 0.723 | 0.013 |
R15 | 0.997 | 0.000 | 0.979 | 0.036 | 0.669 | 0.011 | 0.768 | 0.008 | 0.855 | 0.007 |
Gaussians1 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 |
Threenorm | 0.591 | 0.001 | 0.612 | 0.047 | 0.549 | 0.006 | 0.649 | 0.003 | 0.679 | 0.003 |
Twenty | 1.000 | 0.000 | 0.985 | 0.029 | 0.838 | 0.075 | - | - | - | - |
Wingnut | 0.909 | 0.000 | 0.964 | 0.000 | 0.965 | 0.000 | 0.876 | 0.000 | 0.880 | 0.000 |
Real | ||||||||||
Breast | 0.908 | 0.003 | 0.940 | 0.001 | 0.933 | 0.001 | - | - | - | - |
CPU | 0.738 | 0.008 | 0.574 | 0.073 | 0.590 | 0.093 | 0.808 | 0.007 | 0.828 | 0.006 |
Dermatology | 0.739 | 0.044 | 0.737 | 0.080 | 0.756 | 0.109 | - | - | - | - |
Diabetes | 0.356 | 0.010 | 0.419 | 0.043 | 0.439 | 0.033 | 0.420 | 0.008 | 0.448 | 0.007 |
Ecoli | 0.649 | 0.013 | 0.753 | 0.018 | 0.739 | 0.006 | 0.714 | 0.011 | 0.754 | 0.009 |
Glass | 0.447 | 0.016 | 0.468 | 0.025 | 0.483 | 0.025 | 0.465 | 0.013 | 0.487 | 0.017 |
Heart-statlog | 0.837 | 0.002 | 0.794 | 0.045 | 0.791 | 0.045 | - | - | - | - |
Iono | 0.707 | 0.000 | 0.810 | 0.029 | 0.803 | 0.023 | - | - | - | - |
Iris | 0.831 | 0.007 | 0.953 | 0.065 | 0.838 | 0.049 | 0.933 | 0.006 | 0.955 | 0.005 |
Wine | 0.966 | 0.000 | 0.953 | 0.048 | 0.977 | 0.038 | 0.943 | 0.003 | 0.953 | 0.004 |
Thyroid | 0.874 | 0.000 | 0.953 | 0.029 | 0.917 | 0.035 | 0.754 | 0.007 | 0.778 | 0.009 |
Generated blobs with outliers | ||||||||||
2 clusters (0.5% outliers) | 0.995 | 0.000 | 0.995 | 0.000 | 0.995 | 0.000 | 0.995 | 0.000 | 1.000 | 0.000 |
2 clusters (1% outliers) | 0.989 | 0.000 | 0.990 | 0.000 | 0.990 | 0.000 | 0.990 | 0.000 | 0.996 | 0.000 |
2 clusters (2% outliers) | 0.979 | 0.000 | 0.980 | 0.000 | 0.980 | 0.000 | 0.981 | 0.000 | 0.997 | 0.000 |
2 clusters (4% outliers) | 0.961 | 0.000 | 0.962 | 0.000 | 0.962 | 0.000 | 0.964 | 0.000 | 0.996 | 0.000 |
3 clusters (0.5% outliers) | 0.994 | 0.000 | 0.994 | 0.000 | 0.994 | 0.000 | 0.994 | 0.000 | 0.999 | 0.000 |
3 clusters (1% outliers) | 0.989 | 0.000 | 0.989 | 0.000 | 0.989 | 0.000 | 0.989 | 0.000 | 0.997 | 0.000 |
3 clusters (2% outliers) | 0.979 | 0.000 | 0.979 | 0.000 | 0.979 | 0.000 | 0.981 | 0.000 | 0.997 | 0.000 |
3 clusters (4% outliers) | 0.961 | 0.000 | 0.951 | 0.000 | 0.945 | 0.000 | 0.965 | 0.000 | 0.996 | 0.000 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lukauskas, M.; Ruzgas, T. A New Clustering Method Based on the Inversion Formula. Mathematics 2022, 10, 2559. https://doi.org/10.3390/math10152559