Efficient Estimation and Validation of Shrinkage Estimators in Big Data Analytics
Abstract
1. Introduction
2. Statistical Methodology
- The ridge estimator, with $k > 0$, proposed by [6] as a possible solution to multicollinearity is given by $\hat{\boldsymbol{\beta}}_{r} = (\boldsymbol{X}^{\top}\boldsymbol{X} + k\boldsymbol{I}_{p})^{-1}\boldsymbol{X}^{\top}\boldsymbol{y}$, where $\boldsymbol{I}_{p}$ is the $p \times p$ identity matrix. The shrinkage parameter, $k$, should be estimated from the data, but it is unclear which value of $k$ produces the best estimator. Many efforts have been aimed at accurately estimating the shrinkage parameter; techniques for estimating $k$ were proposed by refs. [11,12,13,14] and, recently, [15]. We consider the estimator of the shrinkage parameter proposed by [6], namely $\hat{k} = \hat{\sigma}^{2}/\hat{\alpha}_{\max}^{2}$, where $\hat{\alpha}_{\max}$ is the maximum element of $\hat{\boldsymbol{\alpha}} = \boldsymbol{Q}^{\top}\hat{\boldsymbol{\beta}}$ and $\boldsymbol{Q}$ is the matrix of eigenvectors of $\boldsymbol{X}^{\top}\boldsymbol{X}$.
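To make the ridge computation concrete, the following is a minimal NumPy sketch of the ridge estimator and the Hoerl–Kennard shrinkage parameter described above; the function names are illustrative, not from the paper.

```python
import numpy as np

def ridge_estimator(X, y, k):
    """Ridge estimator (X'X + k I)^{-1} X'y for a given shrinkage parameter k."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def hoerl_kennard_k(X, y):
    """Hoerl-Kennard estimate k = sigma^2 / alpha_max^2, where alpha = Q' beta_OLS
    and Q holds the eigenvectors of X'X."""
    n, p = X.shape
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_ols
    sigma2 = resid @ resid / (n - p)   # unbiased error-variance estimate
    _, Q = np.linalg.eigh(X.T @ X)     # eigenvectors of X'X
    alpha = Q.T @ beta_ols             # OLS coefficients in canonical form
    return sigma2 / np.max(alpha**2)
```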
- The modified ridge-type estimator, with $k > 0$ and $0 < d < 1$, is $\hat{\boldsymbol{\beta}}_{mrt} = (\boldsymbol{X}^{\top}\boldsymbol{X} + k(1+d)\boldsymbol{I}_{p})^{-1}\boldsymbol{X}^{\top}\boldsymbol{y}$. This estimator includes the ridge and OLS estimators as special cases. The authors of ref. [7] suggest an iterative approach to estimating the shrinkage parameters (a code sketch follows the list). Their approach is as follows:
- (a) Let the initial estimate of $d$ be calculated as $\hat{d} = \min\left(\frac{\hat{\alpha}_{i}^{2}}{\hat{\sigma}^{2}/\lambda_{i} + \hat{\alpha}_{i}^{2}}\right)$, where $\hat{\alpha}_{i}$ is the $i$th element of $\hat{\boldsymbol{\alpha}}$ and $\lambda_{i}$ is the $i$th eigenvalue of $\boldsymbol{X}^{\top}\boldsymbol{X}$.
- (b) Using $\hat{d}$ from (a), estimate $k$ as $\hat{k} = \frac{p\hat{\sigma}^{2}}{(1+\hat{d})\sum_{i=1}^{p}\hat{\alpha}_{i}^{2}}$.
- (c) Let $\hat{k}$ be the value from (b). Estimate $\hat{d}$ as $\hat{d} = \min\left(\frac{\hat{\sigma}^{2}}{\hat{k}\hat{\alpha}_{i}^{2}}\right) - 1$.
- (d) Use $\hat{d}$ from (a) if the estimate from (c) is not between 0 and 1.
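The sketch below implements the iterative scheme as reconstructed above. The exact expressions in steps (a)–(c) are reconstructions consistent with the surrounding definitions and should be checked against ref. [7]; all names are illustrative.

```python
import numpy as np

def mrt_parameters(X, y):
    """Estimate (k, d) for the modified ridge-type estimator via steps (a)-(d)."""
    n, p = X.shape
    XtX = X.T @ X
    beta_ols = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ beta_ols
    sigma2 = resid @ resid / (n - p)       # unbiased error-variance estimate
    lam, Q = np.linalg.eigh(XtX)           # eigenvalues / eigenvectors of X'X
    alpha2 = (Q.T @ beta_ols) ** 2         # squared canonical coefficients

    d0 = np.min(alpha2 / (sigma2 / lam + alpha2))  # (a) initial d
    k = p * sigma2 / ((1 + d0) * alpha2.sum())     # (b) k given the initial d
    d = np.min(sigma2 / (k * alpha2)) - 1.0        # (c) d given k
    if not (0.0 < d < 1.0):                        # (d) fall back to the initial d
        d = d0
    return k, d

def mrt_estimator(X, y, k, d):
    """Modified ridge-type estimator (X'X + k(1+d) I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * (1 + d) * np.eye(p), X.T @ y)
```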
- The Liu estimator, which combines the benefits of the ridge estimator and the Stein-type estimator [16], was proposed by [8]. Given that rank$(\boldsymbol{X}) = p$, the Liu estimator is given by $\hat{\boldsymbol{\beta}}_{d} = (\boldsymbol{X}^{\top}\boldsymbol{X} + \boldsymbol{I}_{p})^{-1}(\boldsymbol{X}^{\top}\boldsymbol{y} + d\hat{\boldsymbol{\beta}})$, with $0 < d < 1$.
- A limitation of the estimator of the shrinkage parameter of the Liu estimator is that, in some instances, it takes a negative value, which degrades the estimator's performance [17]. The modified one-parameter Liu estimator proposed by [9] yields a positive value of $\hat{d}$ and provides a significant improvement in the estimator's performance. Given that rank$(\boldsymbol{X}) = p$, the modified one-parameter Liu estimator is given by $\hat{\boldsymbol{\beta}}_{md} = (\boldsymbol{X}^{\top}\boldsymbol{X} + \boldsymbol{I}_{p})^{-1}(\boldsymbol{X}^{\top}\boldsymbol{X} - d\boldsymbol{I}_{p})\hat{\boldsymbol{\beta}}$.
- Lastly, the Kibria–Lukman estimator, proposed by [10], is a one-parameter estimator that combines the characteristics of both the ridge and Liu estimators. Given that rank$(\boldsymbol{X}) = p$, the estimator is given by $\hat{\boldsymbol{\beta}}_{kl} = (\boldsymbol{X}^{\top}\boldsymbol{X} + k\boldsymbol{I}_{p})^{-1}(\boldsymbol{X}^{\top}\boldsymbol{X} - k\boldsymbol{I}_{p})\hat{\boldsymbol{\beta}}$, with $k > 0$.
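The Liu-type and Kibria–Lukman estimators above share the same ingredients as the ridge sketch earlier. A minimal sketch under the closed forms reconstructed above; the helper `_ols` and all function names are illustrative.

```python
import numpy as np

def _ols(X, y):
    """OLS fit: returns X'X and beta_OLS, shared by the estimators below."""
    XtX = X.T @ X
    return XtX, np.linalg.solve(XtX, X.T @ y)

def liu_estimator(X, y, d):
    """Liu estimator (X'X + I)^{-1} (X'y + d * beta_OLS), 0 < d < 1."""
    XtX, beta_ols = _ols(X, y)
    p = X.shape[1]
    return np.linalg.solve(XtX + np.eye(p), X.T @ y + d * beta_ols)

def modified_liu_estimator(X, y, d):
    """Modified one-parameter Liu estimator (X'X + I)^{-1} (X'X - d I) beta_OLS."""
    XtX, beta_ols = _ols(X, y)
    p = X.shape[1]
    return np.linalg.solve(XtX + np.eye(p), (XtX - d * np.eye(p)) @ beta_ols)

def kibria_lukman_estimator(X, y, k):
    """Kibria-Lukman estimator (X'X + k I)^{-1} (X'X - k I) beta_OLS, k > 0."""
    XtX, beta_ols = _ols(X, y)
    p = X.shape[1]
    return np.linalg.solve(XtX + k * np.eye(p), (XtX - k * np.eye(p)) @ beta_ols)
```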
3. Efficient Model Estimation and Validation in Big Data Analytics
Algorithm 1 Model estimation and validation based on the sufficient statistics array

Input: Subsets of the data.
Output: The shrinkage parameter estimates, the estimated coefficient vectors $\hat{\boldsymbol{\beta}}_{(j)}$, $j = 1, \ldots, K$, and measures of prediction performance based on the testing datasets.

1. Divide the data into $K$ blocks.

MAP tasks
2. For each of the $K$ blocks:
   2.1 Calculate the sufficient statistics array $(\boldsymbol{X}_{k}^{\top}\boldsymbol{X}_{k}, \boldsymbol{X}_{k}^{\top}\boldsymbol{y}_{k}, \boldsymbol{y}_{k}^{\top}\boldsymbol{y}_{k}, n_{k})$.

Reduce tasks
3. Let $j$ be the index of the block to be excluded from each of the $K$ training datasets, such that the blocks $k \neq j$ represent the remaining blocks. For the $K$ training datasets:
   3.1 Calculate $\boldsymbol{X}_{(j)}^{\top}\boldsymbol{X}_{(j)} = \sum_{k \neq j}\boldsymbol{X}_{k}^{\top}\boldsymbol{X}_{k}$.
   3.2 Calculate $\boldsymbol{Q}_{(j)}$, the eigenvectors of $\boldsymbol{X}_{(j)}^{\top}\boldsymbol{X}_{(j)}$.
   3.3 Calculate $\boldsymbol{\Lambda}_{(j)}$, the eigenvalues of $\boldsymbol{X}_{(j)}^{\top}\boldsymbol{X}_{(j)}$.

MAP tasks
4. For each of the $K$ blocks:
   4.1 Calculate $\boldsymbol{Q}_{(j)}^{\top}\boldsymbol{X}_{k}^{\top}\boldsymbol{X}_{k}\boldsymbol{Q}_{(j)}$, $\boldsymbol{Q}_{(j)}^{\top}\boldsymbol{X}_{k}^{\top}\boldsymbol{y}_{k}$, and $\boldsymbol{y}_{k}^{\top}\boldsymbol{y}_{k}$ for each training dataset $j$ with $k \neq j$.

Reduce tasks
5. For each of the $K$ training datasets with index $j$:
   5.1 Calculate $\hat{\boldsymbol{\alpha}}_{(j)} = \boldsymbol{\Lambda}_{(j)}^{-1}\boldsymbol{Q}_{(j)}^{\top}\boldsymbol{X}_{(j)}^{\top}\boldsymbol{y}_{(j)}$.
   5.2 Calculate $\hat{\sigma}_{(j)}^{2}$ and $n_{(j)} = \sum_{k \neq j} n_{k}$.
   5.3 Calculate the shrinkage parameter of the estimator under consideration.
   5.4 Calculate the shrinkage estimate in canonical form and $\hat{\boldsymbol{\beta}}_{(j)}$.

6. For each of the $K$ testing datasets:
   6.1 Calculate the predicted response $\hat{\boldsymbol{y}}_{j} = \boldsymbol{X}_{j}\hat{\boldsymbol{\beta}}_{(j)}$.
   6.2 Calculate prediction performance measures based on the observed and predicted responses.
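As a single-machine illustration of Algorithm 1, the sketch below assumes the sufficient-statistics array $(\boldsymbol{X}_{k}^{\top}\boldsymbol{X}_{k}, \boldsymbol{X}_{k}^{\top}\boldsymbol{y}_{k}, \boldsymbol{y}_{k}^{\top}\boldsymbol{y}_{k}, n_{k})$ and uses the ridge form in step 5.4; in practice the MAP and Reduce steps would run on a distributed framework, and `shrinkage_fn` is an illustrative stand-in for whichever estimator's parameter rule is used.

```python
import numpy as np

def map_block_stats(X_k, y_k):
    """MAP task: the sufficient statistics array for one block."""
    return X_k.T @ X_k, X_k.T @ y_k, y_k @ y_k, len(y_k)

def kfold_shrinkage_validation(blocks, shrinkage_fn):
    """K-fold estimation and validation from block-level statistics only.
    `blocks` is a list of (X_k, y_k); `shrinkage_fn(sigma2, lam, alpha)`
    returns the shrinkage parameter of the estimator under consideration."""
    stats = [map_block_stats(Xk, yk) for Xk, yk in blocks]       # MAP
    mses = []
    for j, (Xj, yj) in enumerate(blocks):                        # hold out block j
        XtX = sum(s[0] for i, s in enumerate(stats) if i != j)   # Reduce
        Xty = sum(s[1] for i, s in enumerate(stats) if i != j)
        yty = sum(s[2] for i, s in enumerate(stats) if i != j)
        n = sum(s[3] for i, s in enumerate(stats) if i != j)
        p = XtX.shape[0]
        lam, Q = np.linalg.eigh(XtX)               # eigen-structure of training X'X
        beta_ols = np.linalg.solve(XtX, Xty)
        alpha = Q.T @ beta_ols                     # canonical-form coefficients
        sigma2 = (yty - beta_ols @ Xty) / (n - p)  # RSS from sufficient statistics
        k = shrinkage_fn(sigma2, lam, alpha)
        beta = np.linalg.solve(XtX + k * np.eye(p), Xty)  # ridge form (step 5.4)
        pred = Xj @ beta                           # predict on the held-out block
        mses.append(np.mean((yj - pred) ** 2))
    return mses
```

For example, `kfold_shrinkage_validation(blocks, lambda s2, lam, a: s2 / np.max(a**2))` applies the Hoerl–Kennard rule from Section 2. The key point the sketch demonstrates is that estimation and validation require only the block-level statistics plus the raw held-out block, never the full dataset in memory.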
4. Numerical Analysis
4.1. Simulation Study
4.2. Application
5. Conclusions and Future Work
- Extending the methodology to generalised linear models.
- Extending the methodology to streaming data.
- Extending the methodology to robust estimators.
- Extending the methodology to estimators for feature selection, for which one may consider measures of validation and evaluation similar to those in [21].
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Wang, C.; Chen, M.H.; Schifano, E.; Wu, J.; Yan, J. Statistical methods and computing for big data. Stat. Interface 2016, 9, 399.
2. Emerson, J.W.; Kane, M.J. Don't drown in the data. Significance 2012, 9, 38–39.
3. Chan, J.Y.L.; Leow, S.M.H.; Bea, K.T.; Cheng, W.K.; Phoong, S.W.; Hong, Z.W.; Chen, Y.L. Mitigating the multicollinearity problem and its machine learning approach: A review. Mathematics 2022, 10, 1283.
4. Shaheen, N.; Shah, I.; Almohaimeed, A.; Ali, S.; Alqifari, H.N. Some modified ridge estimators for handling the multicollinearity problem. Mathematics 2023, 11, 2522.
5. Zhang, T.; Yang, B. An exact approach to ridge regression for big data. Comput. Stat. 2017, 32, 909–928.
6. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
7. Lukman, A.F.; Ayinde, K.; Binuomote, S.; Clement, O.A. Modified ridge-type estimator to combat multicollinearity: Application to chemical data. J. Chemom. 2019, 33, e3125.
8. Kejian, L. A new class of biased estimate in linear regression. Commun. Stat.-Theory Methods 1993, 22, 393–402.
9. Lukman, A.F.; Kibria, B.; Ayinde, K.; Jegede, S.L. Modified one-parameter Liu estimator for the linear regression model. Model. Simul. Eng. 2020, 2020, 9574304.
10. Kibria, B.; Lukman, A.F. A new ridge-type estimator for the linear regression model: Simulations and applications. Scientifica 2020, 2020, 9758378.
11. Kibria, B.G. Performance of some new ridge regression estimators. Commun. Stat.-Simul. Comput. 2003, 32, 419–435.
12. Alkhamisi, M.; Khalaf, G.; Shukur, G. Some modifications for choosing ridge parameters. Commun. Stat.-Theory Methods 2006, 35, 2005–2020.
13. Lukman, A.F.; Ayinde, K. Review and classifications of the ridge parameter estimation techniques. Hacet. J. Math. Stat. 2017, 46, 953–967.
14. Muniz, G.; Kibria, B.G. On some ridge regression estimators: An empirical comparisons. Commun. Stat.-Simul. Comput. 2009, 38, 621–630.
15. Arashi, M.; Saleh, A.M.E.; Kibria, B.G. Theory of Ridge Regression Estimation with Applications; John Wiley & Sons: New York, NY, USA, 2019.
16. Stein, C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc. 3rd Berkeley Symp. Math. Stat. Probab. 1956, 1, 197–206.
17. Özkale, M.R.; Kaçiranlar, S. The restricted and unrestricted two-parameter estimators. Commun. Stat.-Theory Methods 2007, 36, 2707–2725.
18. Hoerl, A.E.; Kannard, R.W.; Baldwin, K.F. Ridge regression: Some simulations. Commun. Stat.-Theory Methods 1975, 4, 105–123.
19. Zou, H.; Li, R. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 2008, 36, 1509.
20. Saleh, A.M.E.; Arashi, M.; Saleh, R.A.; Norouzirad, M. Rank-Based Methods for Shrinkage and Selection: With Application to Machine Learning; John Wiley & Sons: New York, NY, USA, 2022.
21. Sechidis, K.; Azzimonti, L.; Pocock, A.; Corani, G.; Weatherall, J.; Brown, G. Efficient feature selection using shrinkage estimators. Mach. Learn. 2019, 108, 1261–1286.
| $\rho$ | Estimator |  |  |  |
|---|---|---|---|---|
| 0.9 | Ridge | −2.776 | 6.501 |  |
| 0.9 | Modified ridge | −1.784 | 3.679 | −4.567 |
| 0.9 | Liu | 2.568 | 1.825 | 1.760 |
| 0.9 | Modified Liu | 3.077 | 5.115 | 3.470 |
| 0.9 | Kibria–Lukman | −1.865 | −1.388 | 1.421 |
| 0.999 | Ridge | 4.184 | 2.637 | 2.540 |
| 0.999 | Modified ridge | −6.462 | −6.894 | −7.709 |
| 0.999 | Liu | 8.086 | 9.276 | −4.793 |
| 0.999 | Modified Liu | 3.448 | 2.908 | 1.503 |
| 0.999 | Kibria–Lukman | 2.920 | 2.776 | 1.386 |
| Estimator |  |  |
|---|---|---|
| Ridge | 6.719 |  |
| Modified ridge | 1.803 | 2.871 |
| Liu | 5.253 | 1.496 |
| Modified Liu | 3.131 | 9.764 |
| Kibria–Lukman | 7.488 | 5.760 |