Unsupervised and Supervised Feature Selection for Incomplete Data via L2,1-Norm and Reconstruction Error Minimization
Abstract
1. Introduction
2. Related Work
2.1. Imputation Methods
2.2. Unsupervised Feature Selection Methods on Incomplete Data Sets
2.3. Supervised Feature Selection Methods on Incomplete Data Sets
3. Approach
3.1. Notations and Definitions
3.2. L2,1-Norm Minimization UFS for Incomplete Data
3.2.1. The Basic Unsupervised Feature Selection Method
3.2.2. The Objective Function of UFS on Incomplete Data
3.2.3. Optimization of Objective Function
- (1) Update W by Fixing R
- (2) Update R by Fixing W
Algorithm 1 Proposed Algorithm for Optimizing (6)
Input: Incomplete dataset, regularization parameters, and the number k of features to select.
Output: The feature weight matrix W and the reconstruction weight matrix R.
(1) Initialize W, R, and the auxiliary variables.
(2) After setting different missing ratios, calculate the indicator matrix according to (1).
(3) Calculate the quantities required by the updates.
(4) With R fixed, update W according to (11).
(5) With W fixed, update R according to (15).
(6) Repeat steps (4) and (5) until convergence.
(7) Use the resulting W to select the features.
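Since the closed-form updates (11) and (15) are not reproduced in this extract, the following is a minimal NumPy sketch of the same alternating, iteratively reweighted pattern, instantiated for the related regularized self-representation model of [8] (min_W ‖X − XW‖2,1 + λ‖W‖2,1) rather than the authors' exact objective (6); the indicator-matrix helper mirrors step (2), and the final ranking by row norms of W mirrors step (7).

```python
import numpy as np

def indicator_matrix(X):
    """Step (2): 1 where an entry of X is observed, 0 where it is missing."""
    return (~np.isnan(X)).astype(float)

def l21_norm(A):
    """L2,1 norm: the sum of the Euclidean norms of the rows of A."""
    return np.linalg.norm(A, axis=1).sum()

def rsr_select(X, lam=1.0, k=10, n_iter=100, eps=1e-8):
    """IRLS solver for min_W ||X - XW||_{2,1} + lam*||W||_{2,1} (cf. [8])."""
    n, d = X.shape
    W = np.eye(d)
    prev = np.inf
    for _ in range(n_iter):
        # Row weights from the current residual and from W (reweighting step).
        gl = 1.0 / (2.0 * np.maximum(np.linalg.norm(X - X @ W, axis=1), eps))
        gr = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))
        XtGl = X.T * gl                    # X^T @ diag(gl) via broadcasting
        # Closed-form update: W = (X^T G_L X + lam*G_R)^{-1} X^T G_L X.
        W = np.linalg.solve(XtGl @ X + lam * np.diag(gr), XtGl @ X)
        obj = l21_norm(X - X @ W) + lam * l21_norm(W)
        if abs(prev - obj) < 1e-6:         # step (6): iterate until convergence
            break
        prev = obj
    scores = np.linalg.norm(W, axis=1)     # step (7): rank features by ||w_i||_2
    return np.argsort(scores)[::-1][:k]
```

On incomplete data, the reconstruction residual would additionally be masked by the indicator matrix (the Hadamard product in the notation table); that masking is omitted here for brevity.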
3.2.4. Convergence Analysis
3.3. Supervised Feature Selection for Incomplete Data
3.3.1. The Basic Supervised Feature Selection Method
3.3.2. The Objective Function of Supervised Feature Selection on Incomplete Data
3.3.3. Optimization of Objective Function
Algorithm 2 Proposed Algorithm for Optimizing (24)
Input: Incomplete dataset with its labels and regularization parameters.
Output: The feature weight matrix W.
(1) Initialize W and the auxiliary variables.
(2) After setting different missing ratios, calculate the indicator matrix.
(3) Calculate the quantities required by the updates.
(4) With the diagonal matrix fixed, the optimal W is formed by (27).
(5) With W fixed, calculate the diagonal matrix according to (31).
(6) Repeat steps (3)–(5) until convergence.
(7) Use the resulting W to select the features.
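Step (5) of Algorithm 2 recomputes a diagonal matrix before re-solving for the weights, which is the reweighting pattern of RFS [29], the basic supervised method of Section 3.3.1. Below is a minimal sketch of that pattern on complete data with one-hot labels; it follows RFS, not the paper's exact updates (27) and (31).

```python
import numpy as np

def rfs_select(X, y, gamma=1.0, k=10, n_iter=100, eps=1e-8):
    """Reweighted solver for min_W ||XW - Y||_{2,1} + gamma*||W||_{2,1} (cf. RFS [29])."""
    n, d = X.shape
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)         # one-hot labels, n x c
    W = np.linalg.solve(X.T @ X + gamma * np.eye(d), X.T @ Y)  # ridge initialization
    for _ in range(n_iter):
        # Diagonal weights from the residual rows and the rows of W (cf. step (5)).
        ge = 1.0 / (2.0 * np.maximum(np.linalg.norm(X @ W - Y, axis=1), eps))
        gw = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))
        XtGe = X.T * ge                    # X^T @ diag(ge) via broadcasting
        # With the diagonal matrices fixed, W has a closed form (cf. step (4)).
        W = np.linalg.solve(XtGe @ X + gamma * np.diag(gw), XtGe @ Y)
    return np.argsort(np.linalg.norm(W, axis=1))[::-1][:k]     # top-k features
```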
3.3.4. Convergence Analysis
4. Experiment and Result Analysis
4.1. Unsupervised Feature Selection
4.1.1. Evaluation Metrics
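The result tables in this section report clustering accuracy (ACC) and normalized mutual information (NMI). For reference, a standard way to compute both is sketched below, with ACC matching clusters to classes via the Hungarian algorithm; this is a generic sketch, not necessarily the paper's exact evaluation code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: accuracy under the best one-to-one matching of clusters to classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    clusters, classes = np.unique(y_pred), np.unique(y_true)
    # Contingency table: counts[i, j] = #instances in cluster i with class j.
    counts = np.array([[np.sum((y_pred == ci) & (y_true == cj)) for cj in classes]
                       for ci in clusters])
    row, col = linear_sum_assignment(-counts)   # Hungarian matching, maximizing counts
    return counts[row, col].sum() / y_true.size

# Usage with k-means labels:
#   acc = clustering_accuracy(y, labels)
#   nmi = normalized_mutual_info_score(y, labels)
```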
4.1.2. Dataset
4.1.3. Comparison Schemes
- For the baseline, we used the k-means clustering algorithm with all features of the OS.
- LapScore [25] is a filter method that evaluates the importance of each feature based on its Laplacian score. It selects features of the OS.
- GSR [28] is a general framework that unifies a sparse embedding model and feature selection. It selects features of the OS.
- RFS [29] is another typical embedded model of feature selection which has proven its effectiveness in reducing the influence of outliers. It selects features of the OS.
- GSR_mean is an imputation feature selection framework, which uses the mean-value imputation method [15] on the IS, and selects features from the union of the IS and OS using GSR.
- GSR_KNN is an imputation feature selection framework, which uses the KNN imputation method [12] on the IS, and selects features from the union of the IS and OS using GSR (a sketch of this impute-then-select pipeline follows this list).
- GSR_missForest [18] is an iterative imputation framework based on random forest on the IS, which selects features from the union of the IS and OS using GSR.
- GSR_DGM [20] is a probabilistic framework based on deep generative models for missing value imputation on the IS, which selects features from the union of the IS and OS using GSR.
- GSR_MIWAE [21] is an importance-weighted autoencoder framework, which maximizes a potentially tight lower bound of the log-likelihood on the IS, and selects features from the union of the IS and OS using GSR.
- HQ-UFS [23] is a framework for incomplete data sets, in which the half-quadratic minimization technique is used to make the weights of outliers negligible or even zero, thereby reducing their influence. It selects features directly from the incomplete data set.
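The GSR_mean and GSR_knn baselines above share a single impute-then-select pipeline, sketched below with scikit-learn imputers; the `select` argument is any feature selection routine (for instance, the rsr_select sketch after Algorithm 1) standing in for GSR.

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

def impute_then_select(X_os, X_is, k, select, imputer="mean"):
    """Impute the incomplete subset IS, then select k features on OS ∪ IS.

    select: callable taking (X, k=...) and returning indices of selected features.
    """
    imp = SimpleImputer(strategy="mean") if imputer == "mean" else KNNImputer(n_neighbors=5)
    X_union = imp.fit_transform(np.vstack([X_os, X_is]))  # OS rows contain no NaNs
    return select(X_union, k=k)

# GSR_mean-style baseline:  impute_then_select(X_os, X_is, 50, rsr_select, "mean")
# GSR_knn-style baseline:   impute_then_select(X_os, X_is, 50, rsr_select, "knn")
```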
4.1.4. Experimental Results
- (1) As the incomplete instance ratio increased, the performance of all schemes degraded. For instance, on the cifar data set, the ACC of all schemes at a ratio of 0.9 was 1.31% lower on average than at a ratio of 0.1, and on the vehicle data set the corresponding average drop was 5.41%. It can also be verified that most of the schemes achieved their best performance at small ratios, showing that the number of complete instances played an important role in these feature selection schemes.
- (2) The performance of our UFS-ID approach was similar to that of the imputation and HQ-UFS approaches, and it achieved better performance than traditional feature selection approaches applied to the OSs of incomplete data sets. For example, on the USPSt data set, the UFS-ID approach achieved around 3.0% and 5.5% improvements at the ratios of 0.1 and 0.9, compared with RFS. This indicates that the more information an imputation method exploits, the closer its performance comes to that of HQ-UFS and UFS-ID.
- (3) The performance of GSR_knn, GSR_mean, GSR_missForest, GSR_DGM, GSR_MIWAE, and HQ-UFS was worse than that of our UFS-ID method. The reason is that UFS-ID utilizes neighbor data reconstruction information to improve the incomplete data structure for the selection of discriminative features, whereas the other approaches do not exploit information derived from neighboring data.
4.2. Supervised Feature Selection
4.2.1. Dataset
4.2.2. Comparison Methods and Experimental Settings
- The SID method [13], a framework in which the objective function takes into account the uncertainty of instances due to the missing values and which solves the revised optimization problem using an EM algorithm;
- Mean, an imputation feature selection framework, which uses the mean-value imputation method [15] on the IS and selects features from the union of the IS and OS using RFS;
- EM, an imputation feature selection framework, which uses the EM imputation method [16] on the IS and selects features from the union of the IS and OS using RFS;
- missForest [18], an iterative imputation framework based on a random forest on the IS, which selects features from the union of the IS and OS using RFS;
- DGM [20], a probabilistic framework based on the use of deep generative models for missing value imputation on the IS, which selects features from the union of the IS and OS using RFS; and
- MIWAE [21], an importance-weighted autoencoder framework, which maximizes a potentially tight lower bound of the log-likelihood on the IS and selects features from the union of the IS and OS using RFS.
4.2.3. Experimental Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Cover, T.M. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1999.
2. Hart, P.E.; Stork, D.G.; Duda, R.O. Pattern Classification; John Wiley & Sons: Hoboken, NJ, USA, 2000.
3. Lee Rodgers, J.; Nicewander, W.A. Thirteen ways to look at the correlation coefficient. Am. Stat. 1988, 42, 59–66.
4. Solorio-Fernández, S.; Carrasco-Ochoa, J.A.; Martínez-Trinidad, J.F. A review of unsupervised feature selection methods. Artif. Intell. Rev. 2020, 53, 907–948.
5. Feng, Y.; Xiao, J.; Zhuang, Y.; Liu, X. Adaptive unsupervised multi-view feature selection for visual concept recognition. In Proceedings of the Asian Conference on Computer Vision, Daejeon, Korea, 5–9 November 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 343–357.
6. Krzanowski, W.J. Selection of variables to preserve multivariate data structure, using principal components. J. R. Stat. Soc. Ser. C 1987, 36, 22–33.
7. Sa, W.; Ke-yong, W.; Lian, Z. Feature selection via analysis of relevance and redundancy. J. Beijing Inst. Technol. 2008, 17, 300–304.
8. Zhu, P.; Zuo, W.; Zhang, L.; Hu, Q.; Shiu, S.C. Unsupervised feature selection by regularized self-representation. Pattern Recognit. 2015, 48, 438–446.
9. Zhu, X.; Yang, J.; Zhang, C.; Zhang, S. Efficient utilization of missing data in cost-sensitive learning. IEEE Trans. Knowl. Data Eng. 2019, 33, 2425–2436.
10. Zhou, Y.; Tian, L.; Zhu, C.; Jin, X.; Sun, Y. Video coding optimization for virtual reality 360-degree source. IEEE J. Sel. Top. Signal Process. 2019, 14, 118–129.
11. Van Hulse, J.; Khoshgoftaar, T.M. Incomplete-case nearest neighbor imputation in software measurement data. Inf. Sci. 2014, 259, 596–610.
12. Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Wang, R. Efficient kNN classification with different numbers of nearest neighbors. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 1774–1785.
13. Lou, Q.; Obradovic, Z. Margin-based feature selection in incomplete data. In Proceedings of the AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; Volume 26.
14. Cismondi, F.; Fialho, A.S.; Vieira, S.M.; Reti, S.R.; Sousa, J.M.; Finkelstein, S.N. Missing data in medical databases: Impute, delete or classify? Artif. Intell. Med. 2013, 58, 63–72.
15. Garcia, C.; Leite, D.; Škrjanc, I. Incremental missing-data imputation for evolving fuzzy granular prediction. IEEE Trans. Fuzzy Syst. 2019, 28, 2348–2362.
16. Simone, R. An accelerated EM algorithm for mixture models with uncertainty for rating data. Comput. Stat. 2021, 36, 691–714.
17. Pan, R.; Yang, T.; Cao, J.; Lu, K.; Zhang, Z. Missing data imputation by K nearest neighbours based on grey relational structure and mutual information. Appl. Intell. 2015, 43, 614–632.
18. Stekhoven, D.J.; Bühlmann, P. MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 2012, 28, 112–118.
19. Gondara, L.; Wang, K. Multiple imputation using deep denoising autoencoders. arXiv 2017, arXiv:1705.02737.
20. Zhang, H.; Xie, P.; Xing, E. Missing value imputation based on deep generative models. arXiv 2018, arXiv:1808.01684.
21. Mattei, P.A.; Frellsen, J. MIWAE: Deep generative modelling and imputation of incomplete data sets. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 4413–4423.
22. Le Morvan, M.; Josse, J.; Scornet, E.; Varoquaux, G. What is a good imputation to predict with missing values? Adv. Neural Inf. Process. Syst. 2021, 34, 11530–11540.
23. Shen, H.T.; Zhu, Y.; Zheng, W.; Zhu, X. Half-quadratic minimization for unsupervised feature selection on incomplete data. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 3122–3135.
24. Lazar, C.; Taminau, J.; Meganck, S.; Steenhoff, D.; Coletta, A.; Molter, C.; de Schaetzen, V.; Duque, R.; Bersini, H.; Nowe, A. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans. Comput. Biol. Bioinform. 2012, 9, 1106–1119.
25. He, X.; Cai, D.; Niyogi, P. Laplacian score for feature selection. Adv. Neural Inf. Process. Syst. 2005, 18, 507–514.
26. Kabir, M.M.; Islam, M.M.; Murase, K. A new wrapper feature selection approach using neural network. Neurocomputing 2010, 73, 3273–3283.
27. Peng, C.; Kang, Z.; Yang, M.; Cheng, Q. Feature selection embedded subspace clustering. IEEE Signal Process. Lett. 2016, 23, 1018–1022.
28. Peng, H.; Fan, Y. A general framework for sparsity regularized feature selection via iteratively reweighted least square minimization. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
29. Nie, F.; Huang, H.; Cai, X.; Ding, C. Efficient and robust feature selection via joint ℓ2,1-norms minimization. Adv. Neural Inf. Process. Syst. 2010, 23, 1813–1821.
30. Shu, W.; Shen, H. Multi-criteria feature selection on cost-sensitive data with missing values. Pattern Recognit. 2016, 51, 268–280.
31. Fan, L.; Wu, X.; Tong, W.; Zeng, W. L2,1-norm minimization for unsupervised feature selection from incomplete data. In Proceedings of the 2021 7th International Conference on Computer and Communications (ICCC), Chengdu, China, 10–13 December 2021; pp. 1491–1495.
32. Gilad-Bachrach, R.; Navot, A.; Tishby, N. Margin based feature selection: Theory and algorithms. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 43.
33. Kira, K.; Rendell, L.A. A practical approach to feature selection. In Machine Learning Proceedings 1992; Elsevier: Amsterdam, The Netherlands, 1992; pp. 249–256.
34. Lou, Q.; Obradovic, Z. Modeling multivariate spatio-temporal remote sensing data with large gaps. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011.
35. Grangier, D.; Melvin, I. Feature set embedding for incomplete data. Adv. Neural Inf. Process. Syst. 2010, 23, 793–801.
36. Śmieja, M.; Struski, Ł.; Tabor, J.; Marzec, M. Generalized RBF kernel for incomplete data. Knowl. Based Syst. 2019, 173, 150–162.
37. Przewiezlikowski, M.; Smieja, M.; Struski, L.; Tabor, J. MisConv: Convolutional neural networks for missing data. In Proceedings of the WACV, Waikoloa, HI, USA, 4–8 January 2022; pp. 2917–2926.
38. Zhang, R.; Li, X. Unsupervised feature selection via data reconstruction and side information. IEEE Trans. Image Process. 2020, 29, 8097–8106.
Notation | Definition |
---|---|
X | Incomplete data set of n instances and d features |
 | The label set; c is the total number of classes |
 | The indicator matrix indicating whether each entry of X is observed or missing |
W | The feature weight coefficient matrix |
R | The reconstruction weight matrix |
‖·‖F | Frobenius norm of a matrix |
‖·‖2,1 | L2,1 norm of a matrix |
 | Norm of a function |
∘ | Hadamard product |
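To make the indicator matrix and the Hadamard product in the table above concrete, here is a small NumPy illustration of how missing entries are masked out of the norms:

```python
import numpy as np

X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])
M = (~np.isnan(X)).astype(float)            # indicator matrix: 1 = observed
X_obs = M * np.nan_to_num(X)                # Hadamard product M ∘ X zeroes missing entries
l21 = np.linalg.norm(X_obs, axis=1).sum()   # L2,1 norm of the masked matrix
fro = np.linalg.norm(X_obs)                 # Frobenius norm of the masked matrix
```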
Dataset | Instances | Features | Classes |
---|---|---|---|
CNAE | 1080 | 856 | 9 |
cifar | 60,000 | 3072 | 10 |
connect-4 | 67,557 | 126 | 3 |
vehicle | 78,823 | 126 | 3 |
USPSt | 2007 | 256 | 10 |
Dataset | Ratio | Baseline ACC | Baseline NMI | LapScore ACC | LapScore NMI | GSR ACC | GSR NMI | RFS ACC | RFS NMI | GSR_mean ACC | GSR_mean NMI | GSR_knn ACC | GSR_knn NMI | GSR_missForest ACC | GSR_missForest NMI | GSR_DGM ACC | GSR_DGM NMI | GSR_MIWAE ACC | GSR_MIWAE NMI | HQ-UFS ACC | HQ-UFS NMI | UFS-ID ACC | UFS-ID NMI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CNAE | 0 | 48.1 ** ±2.5 | 41.5 ** ±2.5 | 49.9 ** ±2.5 | 41.9 ** ±2.2 | 48.7 ** ±3.0 | 42.1 ** ±2.4 | 54.6 ** ±2.1 | 46.6 ** ±1.7 | 46.4 ** ±2.6 | 40.5 ** ±2.2 | 49.3 ** ±2.5 | 41.7 ** ±2.2 | 51.2 ** ±2.0 | 43.5 ** ±1.9 | 52.6 ** ±2.2 | 44.1 ** ±1.7 | 54.7 ** ±3.0 | 48.7 ** ±3.2 | 55.7 ** ±3.1 | 48.9 ** ±2.9 | 58.8 ±2.9 | 54.9 ±3.0 |
CNAE | 0.1 | 46.4 ** ±3.0 | 37.7 ** ±2.3 | 49.1 ** ±3.0 | 40.6 ** ±2.9 | 50.3 ** ±2.2 | 43.3 ** ±2.1 | 55.7 ** ±2.0 | 47.4 ** ±2.8 | 49.4 ** ±1.5 | 41.4 ** ±2.2 | 50.7 ** ±2.1 | 40.9 ** ±2.1 | 57.2 ** ±2.5 | 47.2 ** ±1.8 | 57.5 ** ±1.6 | 47.1 ** ±1.9 | 58.7 ** ±2.4 | 48.2 ** ±2.0 | 58.1 ** ±1.9 | 48.3 ** ±2.6 | 63.1 ±2.3 | 55.8 ±2.3 |
CNAE | 0.3 | 45.4 ** ±3.0 | 37.2 ** ±2.2 | 45.9 ** ±1.9 | 39.9 ** ±1.9 | 52.2 ** ±2.0 | 45.4 ** ±2.1 | 60.1 ** ±3.2 | 52.9 ** ±2.8 | 48.8 ** ±3.6 | 43.0 ** ±2.8 | 53.1 ** ±2.8 | 44.3 ** ±1.5 | 58.2 ** ±3.1 | 51.4 ** ±2.8 | 59.4 ** ±2.3 | 52.0 ** ±2.9 | 61.0 ** ±2.2 | 53.2 ** ±2.5 | 61.2 ** ±3.5 | 53.6 ** ±2.6 | 63.9 ±2.6 | 58.8 ±3.3 |
CNAE | 0.5 | 37.4 ** ±1.6 | 30.8 ** ±1.7 | 52.1 ** ±3.6 | 44.2 ** ±3.7 | 49.1 ** ±3.8 | 42.1 ** ±3.5 | 55.4 ** ±3.3 | 47.5 ** ±2.5 | 42.6 ** ±3.8 | 35.6 ** ±0.3 | 45.3 ** ±3.7 | 37.6 ** ±3.1 | 55.2 ** ±3.2 | 46.2 ** ±2.8 | 55.4 ** ±3.3 | 47.0 ** ±2.3 | 56.0 ** ±3.1 | 47.2 ** ±3.5 | 56.8 ** ±3.5 | 48.0 ** ±3.3 | 57.6 ±3.1 | 51.4 ±2.9 |
CNAE | 0.7 | 43.2 ** ±3.6 | 36.3 ** ±3.3 | 49.8 ** ±2.8 | 42.5 ** ±2.4 | 44.1 ** ±3.3 | 39.5 ** ±3.5 | 54.4 ** ±3.1 | 47.8 ** ±3.0 | 43.4 ** ±0.8 | 33.6 ** ±0.2 | 46.1 ** ±1.8 | 38.1 ** ±0.8 | 53.2 ** ±3.1 | 45.2 ** ±1.9 | 53.4 ** ±2.4 | 45.5 ** ±3.1 | 54.0 ** ±3.2 | 46.2 ** ±2.3 | 54.2 ** ±4.2 | 46.8 ** ±3.9 | 58.9 ±3.4 | 53.5 ±3.7 |
CNAE | 0.9 | 42.2 ** ±2.1 | 39.5 ** ±2.0 | 45.1 ** ±2.0 | 44.4 ** ±2.1 | 45.2 ** ±2.9 | 42.3 ** ±2.6 | 53.6 ** ±3.2 | 51.9 ** ±3.0 | 45.4 ** ±1.1 | 37.4 ** ±2.8 | 45.9 ** ±3.0 | 37.2 ** ±2.9 | 52.1 ** ±2.1 | 47.2 ** ±2.5 | 53.1 ** ±2.1 | 47.4 ** ±2.1 | 4.1 ** ±2.2 | 48.2 ** ±1.9 | 54.6 ** ±2.2 | 48.8 ** ±2.1 | 55.6 ±2.3 | 55.9 ±1.7 |
cifar | 0 | 21.2 ** ±0.2 | 7.4 ** ±0.1 | 18.6 ** ±0.2 | 4.8 ** ±0.1 | 19.2 ** ±0.3 | 7.1 ** ±0.1 | 18.7 ** ±0.2 | 5.1 ** ±0.1 | 19.3 ** ±0.1 | 6.8 ** ±0.3 | 19.8 ** ±0.2 | 7.5 ** ±0.1 | 18.6 ** ±0.1 | 7.4 ** ±0.2 | 18.4 ** ±0.1 | 7.4 ** ±0.1 | 19.0 ** ±0.2 | 7.2 ** ±0.1 | 21.3 ** ±0.4 | 8.1 ** ±0.1 | 24.2 ±0.3 | 11.1 ±0.2 |
cifar | 0.1 | 21.3 ** ±0.2 | 7.5 ** ±0.1 | 18.5 ** ±0.3 | 4.7 ** ±0.1 | 19.8 ** ±0.5 | 6.7 ** ±0.2 | 18.5 ** ±0.2 | 4.9 ** ±0.1 | 20.9 ** ±0.4 | 7.2 ** ±0.2 | 21.3 ** ±0.2 | 8.1 ** ±0.1 | 20.3 ** ±0.1 | 8.2 ** ±0.1 | 21.5 ** ±0.2 | 8.4 ** ±0.1 | 21.6 ** ±0.2 | 8.5 ** ±0.2 | 21.7 ** ±0.2 | 8.4 ** ±0.2 | 23.4 ±0.2 | 10.1 ±0.2 |
cifar | 0.3 | 20.7 ** ±0.1 | 7.7 ** ±0.1 | 17.2 ** ±0.2 | 7.8 ** ±0.1 | 21.4 ** ±0.1 | 7.7 ** ±0.1 | 18.4 ** ±0.2 | 5.1 ** ±0.1 | 21.2 ** ±0.1 | 7.5 ** ±0.1 | 21.7 ** ±0.4 | 7.3 ** ±0.1 | 20.8 ** ±0.1 | 7.2 ** ±0.2 | 21.1 ** ±0.1 | 7.8 ** ±0.1 | 21.5 ** ±0.2 | 7.7 ** ±0.1 | 21.6 ** ±0.2 | 7.8 ** ±0.2 | 22.7 ±0.1 | 11.5 ±0.2 |
cifar | 0.5 | 20.4 ** ±0.2 | 7.1 ** ±0.2 | 17.7 ** ±0.1 | 4.9 ** ±0.1 | 21.2 ** ±0.3 | 7.4 ** ±0.3 | 17.9 ** ±0.1 | 6.3 ** ±0.3 | 20.3 ** ±0.3 | 7.1 ** ±0.3 | 20.8 ** ±0.2 | 7.8 ** ±0.1 | 20.5 ** ±0.1 | 7.4 ** ±0.2 | 20.7 ** ±0.1 | 7.6 ** ±0.1 | 21.1 ** ±0.1 | 7.7 ** ±0.2 | 21.3 ** ±0.1 | 7.6 ** ±0.1 | 22.4 ±0.2 | 12.9 ±0.2 |
cifar | 0.7 | 19.9 ** ±0.3 | 6.1 ** ±0.1 | 18.3 ** ±0.3 | 4.2 ** ±0.4 | 20.1 ** ±0.1 | 6.5 ** ±0.1 | 17.4 ** ±0.2 | 6.2 ** ±0.3 | 20.3 ** ±0.3 | 6.2 ** ±0.5 | 20.5 ** ±0.2 | 6.7 * ±0.4 | 20.7 ** ±0.1 | 6.8 ** ±0.2 | 20.7 ** ±0.2 | 6.6 ** ±0.2 | 20.9 ** ±0.1 | 6.8 ** ±0.1 | 20.8 ** ±0.4 | 6.8 ** ±0.3 | 21.8 ±0.1 | 12.1 ±0.3 |
cifar | 0.9 | 18.8 ±0.7 | 5.9 ** ±0.4 | 17.1 ** ±0.1 | 3.6 ** ±0.2 | 19.5 ** ±0.7 | 6.0 ** ±0.4 | 17.1 ** ±0.6 | 4.3 ** ±0.5 | 20.1 ** ±0.6 | 6.2 ** ±0.2 | 20.4 ** ±0.4 | 6.3 ** ±0.2 | 19.8 ** ±0.1 | 6.0 ±0.4 | 20.1 ** ±0.2 | 6.1 ±0.3 | 19.7 ** ±0.2 | 6.0 ±0.3 | 20.3 ** ±0.1 | 6.3 ** ±0.3 | 21.3 ±0.3 | 11.5 ±0.2 |
connect-4 | 0 | 37.6 ** ±0.6 | 10.1 ** ±0.1 | 38.5 ** ±1.3 | 10.4 ** ±0.1 | 42.2 ** ±1.0 | 11.6 ** ±0.3 | 40.6 ** ±1.3 | 10.8 ** ±0.1 | 42.4 ** ±2.1 | 11.5 ** ±0.2 | 42.4 ** ±1.4 | 11.2 ** ±0.5 | 42.6 ** ±1.7 | 11.3 ** ±0.2 | 42.3 ** ±1.8 | 11.6 ** ±0.2 | 41.9 ** ±1.4 | 11.6 ** ±0.2 | 42.8 ** ±1.6 | 11.5 ** ±0.2 | 43.0 ±1.8 | 11.7 ±0.2 |
connect-4 | 0.1 | 37.9 ** ±0.7 | 10.1 ** ±0.1 | 38.4 ±0.9 | 10.3 ** ±0.1 | 41.3 ** ±2.9 | 11.4 ** ±0.2 | 40.7 ** ±1.3 | 10.3 ** ±0.1 | 42.5 ** ±3.1 | 11.7 ±0.1 | 41.6 ** ±1.7 | 11.4 ±0.5 | 40.1 ** ±1.3 | 10.9 ** ±0.1 | 40.5 ** ±1.2 | 11.1 ** ±0.2 | 41.4 ** ±1.2 | 11.5 ** ±0.3 | 41.7 ** ±0.8 | 11.5 ** ±0.2 | 42.9 ±1.5 | 11.7 ±0.1 |
connect-4 | 0.3 | 37.6 ** ±0.2 | 10.3 ** ±0.1 | 38.6 ±0.9 | 10.8 ** ±0.1 | 42.6 ** ±0.8 | 11.2 ±0.2 | 35.9 ** ±0.5 | 10.2 ** ±0.1 | 42.4 ** ±2.8 | 11.4 ** ±0.2 | 41.5 ±2.9 | 11.7 ** ±0.3 | 41.6 ** ±1.3 | 11.0 ** ±0.1 | 41.6 ** ±1.3 | 11.2 ** ±0.2 | 41.7 ** ±1.2 | 11.5 ** ±0.2 | 41.4 ** ±1.4 | 11.8 ** ±0.3 | 42.8 ±1.3 | 11.6 ±0.2 |
connect-4 | 0.5 | 38.9 ** ±1.5 | 10.4 ** ±0.1 | 37.8 ** ±0.6 | 10.4 ** ±0.1 | 41.1 ** ±2.6 | 12.1 ** ±0.4 | 35.6 ** ±0.4 | 10.3 ** ±0.1 | 41.5 ** ±2.6 | 11.6 * ±0.1 | 41.5 ** ±2.6 | 11.7 ±0.1 | 41.2 ** ±1.1 | 11.4 ** ±0.1 | 41.0 ** ±1.1 | 11.7 ** ±0.2 | 41.2 ** ±1.3 | 11.6 ** ±0.2 | 41.7 ** ±1.1 | 11.3 ** ±0.2 | 42.0 ±1.2 | 11.3 ±0.1 |
connect-4 | 0.7 | 37.5 ** ±0.6 | 10.2 ** ±0.1 | 37.1 ** ±0.1 | 10.3 ** ±0.1 | 40.7 ** ±2.0 | 11.4 ** ±0.2 | 36.1 * ±0.3 | 10.4 ** ±0.1 | 41.5 ** ±2.1 | 11.8 ** ±0.4 | 41.6 ** ±2.3 | 11.6 ** ±0.1 | 41.5 ** ±1.1 | 11.2 ** ±0.1 | 41.6 ** ±1.1 | 11.0 ** ±0.2 | 41.7 ** ±1.0 | 11.2 ** ±0.1 | 41.8 ** ±1.1 | 11.3 ** ±0.2 | 42.3 ±1.2 | 12.2 ±0.1 |
connect-4 | 0.9 | 37.6 ** ±0.4 | 10.2 ** ±0.1 | 37.3 ** ±0.7 | 10.3 ** ±0.1 | 40.8 * ±2.0 | 11.3 ±0.2 | 35.6 ** ±0.6 | 10.4 ** ±0.1 | 40.6 ** ±3.0 | 11.4 * ±0.3 | 40.8 ** ±2.8 | 11.5 ** ±0.1 | 40.5 ** ±1.1 | 11.3 ** ±0.2 | 40.6 ** ±1.2 | 11.2 ** ±0.2 | 40.7 ** ±1.2 | 11.5 ** ±0.1 | 40.9 ** ±0.8 | 11.2 ** ±0.1 | 42.1 ±1.2 | 11.5 ±0.1 |
vehicle | 0 | 54.1 ** ±1.2 | 17.3 ** ±0.5 | 57.3 ** ±0.3 | 15.7 ** ±0.1 | 55.8 ** ±2.7 | 15.1 ** ±0.9 | 55.3 ** ±1.4 | 15.2 ** ±1.1 | 55.9 ** ±1.7 | 14.9 ** ±1.0 | 57.5 ** ±0.4 | 16.1 * ±0.2 | 57.5 ** ±1.0 | 16.3 ** ±0.2 | 57.4 ** ±1.1 | 16.0 ** ±0.2 | 57.5 ** ±1.1 | 17.1 ** ±0.2 | 57.6 ** ±1.4 | 16.2 ** ±0.2 | 59.3 ±1.2 | 17.1 ±0.2 |
vehicle | 0.1 | 54.4 ** ±1.2 | 16.6 ** ±0.8 | 57.2 ±0.8 | 15.2 * ±0.5 | 57.1 * ±1.0 | 15.2 ±0.4 | 56.1 ** ±1.4 | 15.4 * ±0.8 | 56.3 ** ±0.8 | 15.6 ±0.5 | 54.3 ** ±4.1 | 19.4 ** ±2.2 | 57.1 ** ±1.5 | 18.3 ** ±0.4 | 57.4 ** ±1.2 | 19.1 ** ±0.1 | 57.9 ** ±1.4 | 19.7 ** ±0.4 | 57.1 ** ±1.2 | 15.7 ** ±0.6 | 58.3 ±1.3 | 18.7 ±0.4 |
vehicle | 0.3 | 56.2 ** ±1.3 | 15.4 ** ±0.3 | 56.5 ±0.1 | 15.3 * ±0.2 | 56.0 ** ±0.5 | 15.3 ±0.2 | 53.6 ** ±2.5 | 15.3 * ±0.8 | 56.1 ** ±0.8 | 15.3 * ±0.5 | 56.1 ±1.5 | 15.3 ±1.3 | 56.5 ** ±1.5 | 15.3 ** ±0.3 | 56.7 ** ±1.4 | 15.4 ** ±0.3 | 56.9 ** ±1.3 | 15.4 ** ±0.5 | 56.4 ** ±1.9 | 15.5 ** ±0.9 | 57.6 ±1.8 | 16.3 ±0.7 |
vehicle | 0.5 | 54.5 ** ±1.6 | 15.1 ** ±1.0 | 54.6 ** ±2.3 | 13.4 ** ±1.3 | 54.4 ** ±0.7 | 14.4 ±0.4 | 54.3 ** ±1.5 | 14.7 ** ±0.5 | 53.2 ** ±2.3 | 14.4 ±1.1 | 54.3 ** ±3.2 | 16.3 ** ±1.6 | 55.7 ** ±1.4 | 16.3 ** ±0.4 | 56.1 ** ±1.5 | 16.4 ** ±0.4 | 56.2 ** ±2.3 | 16.8 ** ±0.6 | 56.3 ** ±0.8 | 14.3 ** ±0.5 | 57.8 ±1.7 | 15.7 ±0.3 |
vehicle | 0.7 | 55.1 ** ±1.3 | 14.5 ** ±0.8 | 50.1 ** ±2.9 | 11.3 ** ±2.3 | 53.5 ±1.2 | 13.1 ** ±0.6 | 52.2 * ±1.4 | 13.9 ±0.9 | 49.3 ** ±2.8 | 12.2 ** ±1.3 | 51.4 ** ±2.4 | 14.3 ** ±2.3 | 53.0 ** ±1.1 | 14.2 ** ±0.4 | 52.6 ** ±1.5 | 13.4 ** ±0.4 | 53.1 ** ±1.5 | 14.3 ** ±0.6 | 53.5 ** ±1.3 | 13.8 ** ±1.0 | 54.4 ±1.7 | 15.4 ±0.3 |
vehicle | 0.9 | 51.4 ** ±1.5 | 12.7 ** ±0.6 | 50.8 ±0.5 | 11.5 * ±0.2 | 50.4 * ±1.5 | 11.5 ±0.7 | 50.4 ** ±0.7 | 13.5 ** ±0.2 | 49.6 ** ±1.2 | 10.8 ** ±0.8 | 48.5 ** ±2.3 | 11.9 * ±3.0 | 52.5 ** ±0.9 | 11.2 ** ±0.4 | 51.4 ** ±2.0 | 11.6 ** ±1.1 | 52.9 ** ±1.9 | 11.7 ** ±1.6 | 51.5 ** ±1.8 | 11.8 ** ±1.5 | 53.1 ±1.7 | 12.4 ±1.3 |
USPSt | 0 | 64.4 ** ±1.2 | 59.5 ** ±0.7 | 59.7 ** ±1.7 | 57.2 ** ±1.0 | 64.3 ** ±1.9 | 58.6 ** ±0.9 | 64.1 ** ±1.1 | 58.2 ** ±0.7 | 62.4 ** ±2.3 | 58.2 ** ±0.9 | 64.3 ** ±1.9 | 58.6 ** ±0.9 | 65.7 ** ±1.9 | 58.5 ** ±1.0 | 66.0 ** ±2.0 | 58.9 ** ±0.9 | 66.2 ** ±1.9 | 59.1 ** ±0.7 | 67.7 ** ±2.2 | 60.9 ** ±0.8 | 68.1 ±2.1 | 61.4 ±0.9 |
USPSt | 0.1 | 60.9 ** ±1.9 | 58.1 ** ±1.0 | 59.2 ** ±1.9 | 56.9 ** ±1.0 | 64.7 ** ±1.8 | 59.1 ** ±1.3 | 64.6 ** ±1.6 | 58.7 ±0.7 | 65.1 ** ±2.2 | 58.3 ** ±0.5 | 67.5 ** ±3.4 | 59.3 ±2.1 | 66.1 ** ±0.9 | 58.6 ** ±1.1 | 66.2 ** ±1.0 | 58.8 ** ±0.8 | 66.4 ** ±1.5 | 59.1 ** ±0.8 | 67.0 ** ±0.8 | 59.4 ** ±0.7 | 67.6 ±0.9 | 60.4 ±0.7 |
USPSt | 0.3 | 62.0 ** ±1.9 | 57.9 ** ±0.9 | 58.7 ** ±2.1 | 56.7 ** ±0.9 | 61.9 ** ±1.9 | 55.5 ** ±1.2 | 65.5 ** ±1.1 | 57.6 ** ±0.8 | 63.9 ** ±3.1 | 61.3 ** ±0.7 | 63.0 ** ±2.2 | 58.2 ** ±1.6 | 66.0 ** ±0.8 | 63.1 ** ±0.4 | 66.1 ** ±0.8 | 63.0 ** ±0.4 | 66.5 ** ±1.2 | 63.3 ** ±0.4 | 67.9 ** ±0.7 | 56.7 ** ±0.2 | 67.9 ±0.8 | 58.8 ±0.4 |
USPSt | 0.5 | 61.0 ** ±1.5 | 59.8 ** ±0.7 | 59.0 ** ±2.1 | 58.7 ** ±1.1 | 56.5 ** ±1.3 | 53.6 ** ±0.8 | 65.5 ** ±1.8 | 61.2 ** ±0.5 | 61.4 ** ±2.1 | 57.8 ** ±2.0 | 57.6 ** ±1.3 | 54.7 ** ±1.0 | 65.0 ** ±1.7 | 59.1 ** ±1.4 | 65.2 ** ±1.8 | 59.6 ** ±1.1 | 66.0 ** ±2.0 | 60.2 ** ±0.9 | 65.9 ** ±1.8 | 61.1 ** ±0.6 | 68.3 ±1.9 | 61.1 ±0.7 |
USPSt | 0.7 | 61.4 ** ±1.8 | 58.8 ** ±0.7 | 59.4 ** ±1.0 | 58.7 ** ±0.7 | 59.4 ** ±2.1 | 56.6 ** ±1.5 | 67.8 ** ±1.1 | 60.0 ** ±0.6 | 65.4 ** ±2.5 | 58.8 ** ±1.2 | 63.8 ** ±3.2 | 57.6 ** ±1.7 | 65.1 ** ±1.8 | 58.7 ** ±1.2 | 65.7 ** ±1.4 | 59.0 ** ±1.2 | 65.8 ** ±1.7 | 59.6 ** ±1.3 | 68.1 ** ±2.3 | 61.0 ** ±1.0 | 69.1 ±2.0 | 61.2 ±0.8 |
USPSt | 0.9 | 61.1 ** ±2.0 | 59.4 ** ±0.9 | 58.1 ** ±2.3 | 59.4 ** ±1.2 | 58.6 ** ±2.3 | 59.3 ** ±1.4 | 62.0 ** ±1.3 | 61.9 ±0.5 | 61.6 ** ±1.9 | 57.6 ** ±1.4 | 64.3 ** ±2.8 | 59.4 ** ±2.6 | 64.4 ** ±1.6 | 58.5 ** ±1.5 | 64.9 ** ±1.3 | 59.6 ** ±1.5 | 65.1 ** ±1.9 | 59.8 ** ±1.5 | 66.7 ** ±1.5 | 61.9 ** ±0.7 | 67.5 ±1.7 | 62.5 ±0.9 |
Dataset | Instances | Features | Classes |
---|---|---|---|
DLBCL | 141 | 661 | 3 |
MNIST | 5000 | 780 | 10 |
Splice | 1000 | 60 + 2000 | 2 |
wpbc | 198 | 33 + 2000 | 2 |
USPS | 9298 | 256 | 10 |
Arcene | 200 | 10,000 | 2 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).