Cross-Validation for Lower Rank Matrices Containing Outliers
Abstract
1. Introduction
2. EK Method
3. Proposed Methodology
- PRESS: According to Equation (5), the optimal rank is the one that minimizes the PRESS statistic. Bro et al. [9] found that, in some cases, this criterion may be more effective than Wm.
- PRESS75: A resistant PRESS statistic is constructed by averaging the smallest 75% of the squared errors. The optimal rank is the one that minimizes PRESS75.
- PRESS50: A resistant PRESS statistic is constructed by averaging the smallest 50% of the squared errors. The optimal rank is the one that minimizes PRESS50.
- Wm: According to Equation (6), the optimal rank is the total number of important components.
- Wm(Max): The optimal rank is the index of the largest important component according to Wm. Krzanowski [8] found that, in some data sets, the Wm statistic does not decrease monotonically. For example, if Wm flags component 4 as important, the optimal rank is 4, even though component 3 is not important.
- Wm75: In Equation (6), PRESS is replaced by PRESS75 and the optimal rank is the total number of important components.
- Wm75(Max): The optimal rank is the index of the largest important component according to Wm75.
- Wm50: In Equation (6), PRESS is replaced by PRESS50 and the optimal rank is the total number of important components.
- Wm50(Max): The optimal rank is the index of the largest important component according to Wm50.
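The selection rules in the list above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the cutoff `threshold` used to call a component "important" is an assumed parameter (this excerpt does not state its value), and the Wm values are taken as given inputs rather than computed from Equation (6).

```python
import math


def resistant_press(squared_errors, keep=1.0):
    """Average the smallest `keep` fraction of the squared errors.

    keep=1.0 is the ordinary mean; keep=0.75 and keep=0.50 mimic the
    PRESS75 and PRESS50 ideas described above.
    """
    errors = sorted(float(e) for e in squared_errors)
    if not errors:
        raise ValueError("squared_errors must not be empty")
    k = max(1, math.floor(keep * len(errors)))  # number of errors retained
    return sum(errors[:k]) / k


def rank_by_min_press(press_values):
    """Rank (1-based) whose PRESS-type statistic is smallest."""
    return min(range(len(press_values)), key=lambda i: press_values[i]) + 1


def rank_by_count(w_values, threshold):
    """Total number of components flagged as important (W above threshold)."""
    return sum(1 for w in w_values if w > threshold)


def rank_by_max(w_values, threshold):
    """Index (1-based) of the largest important component, or 0 if none."""
    important = [i + 1 for i, w in enumerate(w_values) if w > threshold]
    return important[-1] if important else 0


# Hypothetical numbers, chosen only to show the behaviour of each rule.
errors = [1, 4, 9, 100]                # one squared error is an outlier
print(resistant_press(errors, 0.50))   # 2.5
print(rank_by_min_press([5.0, 3.0, 4.0]))  # 2

w = [2.0, 1.5, 0.3, 1.2]               # component 3 falls below the cutoff
print(rank_by_count(w, threshold=1.0))  # 3
print(rank_by_max(w, threshold=1.0))    # 4
```

With these made-up values the two Wm-type rules disagree (3 versus 4), which is the non-monotone situation the (Max) variants are meant to handle. Trimming the largest squared errors keeps the single outlying error from dominating the PRESS-type average.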
4. Robust Singular Value Decompositions (rSVDs)
5. Simulation Study
6. Real Data
7. Results
8. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Good, I.J. Some applications of the singular value decomposition of a matrix. Technometrics 1969, 11, 823–831.
- Geladi, P.; Linderholm, J. Principal component analysis. In Comprehensive Chemometrics 2nd Edition: Chemical and Biochemical Data Analysis; Brown, S., Tauler, R., Walczak, B., Eds.; Elsevier: Amsterdam, The Netherlands, 2020; pp. 17–37.
- Arciniegas-Alarcón, S.; García-Peña, M.; Krzanowski, W.J. Imputation using the singular-value decomposition: Variants of existing methods, proposed and assessed. Int. J. Innov. Comput. Inf. Control 2020, 16, 1681–1696.
- Gabriel, K.R. Le biplot–outil d’exploration de données multidimensionnelles. J. Soc. Française Stat. 2002, 143, 5–55.
- Gauch, H. A simple protocol for AMMI analysis of yield trials. Crop Sci. 2013, 53, 1860–1869.
- Yan, W. Crop Variety Trials: Data Management and Analysis; Wiley Blackwell: Hoboken, NJ, USA, 2014.
- Rodrigues, P.C.; Lourenço, V.; Mahmoudvand, R. A robust approach to singular spectrum analysis. Qual. Reliab. Eng. Int. 2018, 34, 1437–1447.
- Krzanowski, W.J. Cross-validation in principal component analysis. Biometrics 1987, 43, 575–584.
- Bro, R.; Kjeldahl, K.; Smilde, A.K.; Kiers, H.A.L. Cross-validation of component models: A critical look at current methods. Anal. Bioanal. Chem. 2008, 390, 1241–1251.
- Owen, A.B.; Perry, P. Bi-cross-validation of the SVD and the nonnegative matrix factorization. Ann. Appl. Stat. 2009, 3, 564–594.
- Josse, J.; Husson, F. Selecting the number of components in principal component analysis using cross-validation approximations. Comput. Stat. Data Anal. 2012, 56, 1869–1879.
- Camacho, J.; Ferrer, A. Cross-validation in PCA models with the element-wise k-fold (ekf) algorithm: Theoretical aspects. J. Chemom. 2012, 26, 361–373.
- Saccenti, E.; Camacho, J. On the use of the observation-wise k-fold operation in PCA cross-validation. J. Chemom. 2015, 29, 467–478.
- Eastment, H.T.; Krzanowski, W.J. Cross-validatory choice of the number of components from a principal component analysis. Technometrics 1982, 24, 73–77.
- Dias, C.T.S.; Krzanowski, W.J. Model selection and cross-validation in additive main effect and multiplicative (AMMI) models. Crop Sci. 2003, 43, 865–873.
- Liu, Y.J.; Tran, T.; Postma, G.; Buydens, L.M.C.; Jansen, J. Estimating the number of components and detecting outliers using Angle Distribution of Loading Subspaces (ADLS) in PCA analysis. Anal. Chim. Acta 2018, 1020, 17–29.
- Krzanowski, W.J. Cross-validatory choice in principal component analysis: Some sampling results. J. Stat. Comput. Simul. 1983, 18, 299–314.
- Krzanowski, W.J.; Kline, P. Cross-validation for choosing the number of important components in principal component analysis. Multivar. Behav. Res. 1995, 30, 149–165.
- Forkman, J.; Piepho, H.P. Parametric bootstrap methods for testing multiplicative terms in GGE and AMMI models. Biometrics 2014, 70, 639–647.
- García-Peña, M.; Arciniegas-Alarcón, S.; Krzanowski, W.J.; Duarte, D. Missing value imputation using the robust singular-value decomposition: Proposals and numerical evaluation. Crop Sci. 2021, 61, 3288–3300.
- Arciniegas-Alarcón, S.; García-Peña, M.; Rengifo, C.; Krzanowski, W.J. Techniques for robust imputation in incomplete two-way tables. Appl. Syst. Innov. 2021, 4, 62.
- Hubert, M.; Engelen, S. Fast cross-validation of high-breakdown resampling methods for PCA. Comput. Stat. Data Anal. 2007, 51, 5013–5024.
- Gabriel, K.R.; Odoroff, C.L. Resistant lower rank approximation of matrices. In Data Analysis and Statistics III; Diday, E., Jambu, M., Lebart, L., Thomassone, Eds.; North-Holland: Amsterdam, The Netherlands, 1984; pp. 23–30.
- Hawkins, D.M.; Liu, L.; Young, S.S. Robust Singular Value Decomposition; Technical Report 122; National Institute of Statistical Sciences: Washington, DC, USA, 2001.
- Zhang, L.; Shen, H.; Huang, J.Z. Robust regularized singular-value decomposition with application to mortality data. Ann. Appl. Stat. 2013, 7, 1540–1561.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020; ISBN 3-900051-070. Available online: https://www.r-project.org/ (accessed on 1 April 2022).
- Krzanowski, W.J. Principles of Multivariate Analysis: A User’s Perspective; Oxford University Press: Oxford, UK, 2000.
- Krzanowski, W.J. Between-group comparison of principal components–some sampling results. J. Stat. Comput. Simul. 1982, 15, 141–154.
- Maronna, R.A.; Yohai, V.J. Robust low-rank approximation of data matrices with elementwise contamination. Technometrics 2008, 50, 295–304.
- Skov, T.; Ballabio, D.; Bro, R. Multiblock variance partitioning: A new approach for comparing variation in multiple data blocks. Anal. Chim. Acta 2008, 615, 18–29.
- Bro, R.; Smilde, A.K. Principal component analysis. Anal. Methods 2014, 6, 2812–2831.
- Rodrigues, P.C.; Monteiro, A.; Lourenço, V.M. A robust AMMI model for the analysis of genotype × environment data. Bioinformatics 2016, 32, 58–66.
- Krzanowski, W.J. Missing value imputation in multivariate data using the singular value decomposition of a matrix. Biom. Lett. 1988, 25, 31–39.
- Eshghi, P. Dimensionality choice in principal component analysis via cross-validatory methods. Chemom. Intell. Lab. Syst. 2014, 130, 6–13.
- González-Cebrián, A.; Arteaga, F.; Folch-Fortuny, A.; Ferrer, A. How to simulate outliers with desired properties. Chemom. Intell. Lab. Syst. 2021, 212, 104301.
- Grentzelos, C.; Caroni, C.; Barranco-Chamorro, I. A comparative study of methods to handle outliers in multivariate data analysis. Comput. Math. Methods 2020, 3, e1129.
- Alkan, B.B.; Atakan, C.; Alkan, N. A comparison of different procedures for principal component analysis in the presence of outliers. J. Appl. Stat. 2015, 42, 1716–1722.
0.9-Trimmed Means of Procrustes Values | |||||||||
---|---|---|---|---|---|---|---|---|---|
Outliers = 0% | Methods | Outliers = 10% | Methods | ||||||
Criterion | EK01 | EK13 | EK82 | EK84 | Criterion | EK01 | EK13 | EK82 | EK84 |
PRESS | 928 | 411 | 405 | 2326 | PRESS | 5,929,179 | 9,858,840 | 10,533,642 | 2726 |
PRESS75 | 845 | 367 | 357 | 2284 | PRESS75 | 6,503,877 | 10,051,928 | 10,592,223 | 2764 |
PRESS50 | 802 | 357 | 315 | 2273 | PRESS50 | 9,915,026 | 13,550,429 | 13,794,540 | 2771 |
Wm | 1773 | 1547 | 1563 | 2364 | Wm | 16,969,901 | 23,472,838 | 23,671,632 | 2703 |
Wm(Max) | 1431 | 1017 | 1000 | 2359 | Wm(Max) | 16,969,901 | 23,472,838 | 23,671,632 | 2706 |
Wm75 | 1222 | 841 | 826 | 2295 | Wm75 | 6,890,112 | 10,644,089 | 11,219,853 | 2741 |
Wm75(Max) | 1102 | 647 | 614 | 2292 | Wm75(Max) | 8,091,972 | 11,470,833 | 11,918,769 | 2748 |
Wm50 | 1177 | 822 | 806 | 2280 | Wm50 | 10,431,960 | 15,358,453 | 15,708,124 | 2753 |
Wm50(Max) | 1041 | 640 | 619 | 2279 | Wm50(Max) | 13,724,787 | 18,830,770 | 19,093,611 | 2760 |
Outliers = 5% | Methods | Outliers = 20% | Methods | ||||||
Criterion | EK01 | EK13 | EK82 | EK84 | Criterion | EK01 | EK13 | EK82 | EK84 |
PRESS | 1,349,656 | 5,347,643 | 5,517,974 | 2503 | PRESS | 16,219,919 | 20,858,768 | 23,719,760 | 3794 |
PRESS75 | 1,593,606 | 5,978,138 | 6,160,294 | 2500 | PRESS75 | 16,585,561 | 20,389,484 | 23,240,805 | 3875 |
PRESS50 | 2,085,285 | 7,208,180 | 7,361,273 | 2491 | PRESS50 | 24,876,820 | 26,596,384 | 26,604,794 | 3896 |
Wm | 6,121,399 | 12,296,349 | 12,339,272 | 2513 | Wm | 32,095,002 | 43,746,109 | 47,429,575 | 3491 |
Wm(Max) | 6,145,795 | 12,296,349 | 12,339,272 | 2511 | Wm(Max) | 32,300,042 | 43,746,109 | 47,429,575 | 3529 |
Wm75 | 1,802,253 | 6,876,657 | 7,043,231 | 2498 | Wm75 | 17,200,893 | 20,911,456 | 23,301,683 | 3710 |
Wm75(Max) | 2,427,846 | 8,446,853 | 8,567,883 | 2500 | Wm75(Max) | 18,713,152 | 21,288,046 | 23,488,022 | 3736 |
Wm50 | 2,659,677 | 8,260,863 | 8,433,306 | 2494 | Wm50 | 23,995,906 | 27,336,626 | 27,389,422 | 3769 |
Wm50(Max) | 4,369,880 | 10,172,078 | 10,266,390 | 2493 | Wm50(Max) | 28,428,412 | 31,084,469 | 29,862,533 | 3799 |
0.9-Trimmed Means of Critical Angles | |||||||||
---|---|---|---|---|---|---|---|---|---|
Outliers = 0% | Methods | Outliers = 10% | Methods | ||||||
Criterion | EK01 | EK13 | EK82 | EK84 | Criterion | EK01 | EK13 | EK82 | EK84 |
PRESS | 0.8547 | 0.1242 | 0.0000 | 0.9791 | PRESS | 1.5224 | 1.5708 | 1.5612 | 1.1699 |
PRESS75 | 0.9473 | 0.1574 | 0.0000 | 1.1286 | PRESS75 | 1.5100 | 1.5568 | 1.5652 | 1.2546 |
PRESS50 | 1.0053 | 0.1935 | 0.0000 | 1.1629 | PRESS50 | 1.3777 | 1.3662 | 1.4003 | 1.2339 |
Wm | 0.8177 | 0.7221 | 0.7330 | 1.0202 | Wm | 1.2399 | 1.2184 | 1.2110 | 1.1507 |
Wm(Max) | 0.6050 | 0.2747 | 0.2094 | 1.0174 | Wm(Max) | 1.2399 | 1.2184 | 1.2110 | 1.1608 |
Wm75 | 0.6846 | 0.0950 | 0.0000 | 1.1048 | Wm75 | 1.4913 | 1.5103 | 1.5134 | 1.2072 |
Wm75(Max) | 0.7120 | 0.0695 | 0.0000 | 1.0948 | Wm75(Max) | 1.4939 | 1.5216 | 1.5465 | 1.2083 |
Wm50 | 0.6963 | 0.0956 | 0.0000 | 1.1163 | Wm50 | 1.3135 | 1.1962 | 1.2189 | 1.2315 |
Wm50(Max) | 0.7274 | 0.0824 | 0.0000 | 1.1184 | Wm50(Max) | 1.3118 | 1.2863 | 1.2883 | 1.2278 |
Outliers = 5% | Methods | Outliers = 20% | Methods | ||||||
Criterion | EK01 | EK13 | EK82 | EK84 | Criterion | EK01 | EK13 | EK82 | EK84 |
PRESS | 1.5524 | 1.5680 | 1.5674 | 1.0965 | PRESS | 1.5331 | 1.5397 | 1.5462 | 1.2667 |
PRESS75 | 1.5361 | 1.5136 | 1.5256 | 1.1683 | PRESS75 | 1.5506 | 1.5619 | 1.5650 | 1.2965 |
PRESS50 | 1.4639 | 1.4059 | 1.4189 | 1.2177 | PRESS50 | 1.3658 | 1.3845 | 1.4704 | 1.3262 |
Wm | 1.2733 | 1.2049 | 1.1807 | 1.1067 | Wm | 1.3682 | 1.1988 | 1.1768 | 1.1356 |
Wm(Max) | 1.2754 | 1.2049 | 1.1807 | 1.1059 | Wm(Max) | 1.3618 | 1.1988 | 1.1768 | 1.1540 |
Wm75 | 1.4223 | 1.2850 | 1.2965 | 1.1224 | Wm75 | 1.5051 | 1.5372 | 1.5628 | 1.2210 |
Wm75(Max) | 1.4773 | 1.3853 | 1.3826 | 1.1292 | Wm75(Max) | 1.5164 | 1.5613 | 1.5680 | 1.2497 |
Wm50 | 1.1589 | 1.2008 | 1.2478 | 1.1582 | Wm50 | 1.3567 | 1.3086 | 1.3888 | 1.2847 |
Wm50(Max) | 1.3298 | 1.2757 | 1.2939 | 1.1666 | Wm50(Max) | 1.3588 | 1.3698 | 1.4449 | 1.2950 |
Procrustes Statistics for the Real Dataset | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Outliers = 0% | Methods | Outliers = 10% | Methods | ||||||||||||||
Criterion | EK01 | R | EK13 | R | EK82 | R | EK84 | R | Criterion | EK01 | R | EK13 | R | EK82 | R | EK84 | R |
PRESS | 185 | 9 | 66 | 7 | 162 | 4 | 493 | 4 | PRESS | 598 | 1 | 79,846 | 1 | 124,788 | 1 | 500 | 5 |
PRESS75 | 185 | 9 | 66 | 7 | 53 | 7 | 493 | 4 | PRESS75 | 598 | 1 | 79,846 | 1 | 124,788 | 1 | 514 | 6 |
PRESS50 | 280 | 5 | 66 | 7 | 24 | 9 | 492 | 7 | PRESS50 | 598 | 1 | 79,846 | 1 | 124,788 | 1 | 500 | 5 |
Wm | 300 | 4 | 469 | 1 | 455 | 1 | 491 | 3 | Wm | 188,330 | 5 | 250,402 | 4 | 343,767 | 4 | 500 | 5 |
Wm(Max) | 280 | 5 | 231 | 3 | 222 | 3 | 491 | 3 | Wm(Max) | 188,330 | 5 | 250,402 | 4 | 343,767 | 4 | 500 | 5 |
Wm75 | 280 | 5 | 129 | 5 | 162 | 4 | 493 | 4 | Wm75 | 611 | 2 | 79,846 | 1 | 124,788 | 1 | 499 | 5 |
Wm75(Max) | 280 | 5 | 66 | 7 | 53 | 7 | 493 | 4 | Wm75(Max) | 317,830 | 8 | 79,846 | 1 | 124,788 | 1 | 499 | 5 |
Wm50 | 300 | 4 | 94 | 6 | 82 | 6 | 493 | 4 | Wm50 | 147,451 | 4 | 79,846 | 1 | 124,788 | 1 | 499 | 5 |
Wm50(Max) | 300 | 4 | 38 | 9 | 24 | 9 | 493 | 4 | Wm50(Max) | 317,830 | 8 | 79,846 | 1 | 124,788 | 1 | 499 | 5 |
Outliers = 5% | Methods | Outliers = 20% | Methods | ||||||||||||||
Criterion | EK01 | R | EK13 | R | EK82 | R | EK84 | R | Criterion | EK01 | R | EK13 | R | EK82 | R | EK84 | R |
PRESS | 564 | 1 | 50,690 | 1 | 63,605 | 1 | 475 | 4 | PRESS | 190,811 | 1 | 218,985 | 1 | 354,334 | 1 | 1236 | 4 |
PRESS75 | 564 | 1 | 50,690 | 1 | 63,605 | 1 | 475 | 4 | PRESS75 | 190,811 | 1 | 218,985 | 1 | 354,334 | 1 | 1662 | 5 |
PRESS50 | 564 | 1 | 50,690 | 1 | 63,605 | 1 | 475 | 4 | PRESS50 | 190,811 | 1 | 218,985 | 1 | 354,334 | 1 | 1662 | 5 |
Wm | 130,404 | 7 | 291,105 | 11 | 247,466 | 6 | 475 | 4 | Wm | 190,811 | 1 | 849,997 | 7 | 354,334 | 1 | 915 | 3 |
Wm(Max) | 130,404 | 7 | 291,105 | 11 | 247,466 | 6 | 475 | 4 | Wm(Max) | 498,098 | 3 | 849,997 | 7 | 626,980 | 3 | 915 | 3 |
Wm75 | 564 | 1 | 99,950 | 2 | 63,605 | 1 | 475 | 4 | Wm75 | 378,913 | 2 | 218,985 | 1 | 503,121 | 2 | 1662 | 5 |
Wm75(Max) | 564 | 1 | 130,657 | 3 | 63,605 | 1 | 475 | 4 | Wm75(Max) | 498,098 | 3 | 218,985 | 1 | 626,980 | 3 | 1662 | 5 |
Wm50 | 30,753 | 2 | 130,657 | 3 | 123,006 | 2 | 475 | 4 | Wm50 | 587,876 | 4 | 218,985 | 1 | 503,121 | 2 | 1662 | 5 |
Wm50(Max) | 101,359 | 6 | 252,100 | 8 | 173,123 | 3 | 475 | 4 | Wm50(Max) | 778,336 | 7 | 218,985 | 1 | 626,980 | 3 | 1662 | 5 |
Critical Angles for the Real Dataset | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Outliers = 0% | Methods | Outliers = 10% | Methods | ||||||||||||||
Criterion | EK01 | R | EK13 | R | EK82 | R | EK84 | R | Criterion | EK01 | R | EK13 | R | EK82 | R | EK84 | R |
PRESS | 0.9468 | 9 | 0.2145 | 7 | 0.0000 | 4 | 1.5706 | 4 | PRESS | 1.5708 | 1 | 1.5708 | 1 | 1.5708 | 1 | 1.5419 | 5 |
PRESS75 | 0.9468 | 9 | 0.2145 | 7 | 0.0000 | 7 | 1.5706 | 4 | PRESS75 | 1.5708 | 1 | 1.5708 | 1 | 1.5708 | 1 | 1.4672 | 6 |
PRESS50 | 1.1845 | 5 | 0.2145 | 7 | 0.0000 | 9 | 1.3454 | 7 | PRESS50 | 1.5708 | 1 | 1.5708 | 1 | 1.5708 | 1 | 1.5419 | 5 |
Wm | 0.9548 | 4 | 1.5708 | 1 | 1.5708 | 1 | 1.5160 | 3 | Wm | 1.5340 | 5 | 1.5212 | 4 | 1.4336 | 4 | 1.5419 | 5 |
Wm(Max) | 1.1845 | 5 | 0.1481 | 3 | 0.0000 | 3 | 1.5160 | 3 | Wm(Max) | 1.5340 | 5 | 1.5212 | 4 | 1.4336 | 4 | 1.5419 | 5 |
Wm75 | 1.1845 | 5 | 0.2507 | 5 | 0.0000 | 4 | 1.5706 | 4 | Wm75 | 0.8734 | 2 | 1.5708 | 1 | 1.5708 | 1 | 1.5419 | 5 |
Wm75(Max) | 1.1845 | 5 | 0.2145 | 7 | 0.0000 | 7 | 1.5706 | 4 | Wm75(Max) | 1.2729 | 8 | 1.5708 | 1 | 1.5708 | 1 | 1.5419 | 5 |
Wm50 | 0.9548 | 4 | 0.2297 | 6 | 0.0000 | 6 | 1.5706 | 4 | Wm50 | 1.4129 | 4 | 1.5708 | 1 | 1.5708 | 1 | 1.5419 | 5 |
Wm50(Max) | 0.9548 | 4 | 0.2196 | 9 | 0.0000 | 9 | 1.5706 | 4 | Wm50(Max) | 1.2729 | 8 | 1.5708 | 1 | 1.5708 | 1 | 1.5419 | 5 |
Outliers = 5% | Methods | Outliers = 20% | Methods | ||||||||||||||
Criterion | EK01 | R | EK13 | R | EK82 | R | EK84 | R | Criterion | EK01 | R | EK13 | R | EK82 | R | EK84 | R |
PRESS | 1.5708 | 1 | 1.5708 | 1 | 1.5708 | 1 | 1.5674 | 4 | PRESS | 1.5708 | 1 | 1.5708 | 1 | 1.5708 | 1 | 1.4772 | 4 |
PRESS75 | 1.5708 | 1 | 1.5708 | 1 | 1.5708 | 1 | 1.5674 | 4 | PRESS75 | 1.5708 | 1 | 1.5708 | 1 | 1.5708 | 1 | 1.4279 | 5 |
PRESS50 | 1.5708 | 1 | 1.5708 | 1 | 1.5708 | 1 | 1.5674 | 4 | PRESS50 | 1.5708 | 1 | 1.5708 | 1 | 1.5708 | 1 | 1.4279 | 5 |
Wm | 1.4977 | 7 | 1.4511 | 11 | 1.5510 | 6 | 1.5674 | 4 | Wm | 1.5708 | 1 | 1.3250 | 7 | 1.5708 | 1 | 1.2653 | 3 |
Wm(Max) | 1.4977 | 7 | 1.4511 | 11 | 1.5510 | 6 | 1.5674 | 4 | Wm(Max) | 1.4304 | 3 | 1.3250 | 7 | 1.4549 | 3 | 1.2653 | 3 |
Wm75 | 1.5708 | 1 | 1.5394 | 2 | 1.5708 | 1 | 1.5674 | 4 | Wm75 | 1.4678 | 2 | 1.5708 | 1 | 1.2746 | 2 | 1.4279 | 5 |
Wm75(Max) | 1.5708 | 1 | 1.3051 | 3 | 1.5708 | 1 | 1.5674 | 4 | Wm75(Max) | 1.4304 | 3 | 1.5708 | 1 | 1.4549 | 3 | 1.4279 | 5 |
Wm50 | 1.2620 | 2 | 1.3051 | 3 | 1.3082 | 2 | 1.5674 | 4 | Wm50 | 1.4821 | 4 | 1.5708 | 1 | 1.2746 | 2 | 1.4279 | 5 |
Wm50(Max) | 1.5166 | 6 | 1.5696 | 8 | 1.2432 | 3 | 1.5674 | 4 | Wm50(Max) | 1.3989 | 7 | 1.5708 | 1 | 1.4549 | 3 | 1.4279 | 5 |
Simulated Matrix with n = 100, p = 8, Rank = 4 | ||||
---|---|---|---|---|
Outliers | EK01 | EK13 | EK82 | EK84 |
0% | 15.05 | 10.36 | 9.94 | 17.43 |
5% | 10.52 | 11.40 | 8.45 | 12.94 |
10% | 11.36 | 10.47 | 8.69 | 13.98 |
20% | 14.89 | 12.75 | 11.58 | 20.47 |
Mean | 12.96 | 11.25 | 9.67 | 16.21 |
SD | 2.35 | 1.11 | 1.43 | 3.43 |
Real matrix with n = 44 and p = 14 | ||||
0% | 23.93 | 13.84 | 16.37 | 17.55 |
5% | 15.69 | 23.40 | 14.34 | 18.17 |
10% | 16.98 | 21.34 | 19.11 | 19.48 |
20% | 16.67 | 16.86 | 16.65 | 19.17 |
Mean | 18.32 | 18.86 | 16.62 | 18.59 |
SD | 3.78 | 4.32 | 1.95 | 0.89 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Arciniegas-Alarcón, S.; García-Peña, M.; Krzanowski, W.J. Cross-Validation for Lower Rank Matrices Containing Outliers. Appl. Syst. Innov. 2022, 5, 69. https://doi.org/10.3390/asi5040069