Article

A High-Dimensional Counterpart for the Ridge Estimator in Multicollinear Situations

1 Department of Statistics, Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad P.O. Box 9177948974, Iran
2 Department of Statistics, University of Pretoria, Pretoria 0002, South Africa
3 Department of Statistics, Faculty of Mathematical Sciences, Shahrood University of Technology, Shahrood P.O. Box 3619995181, Iran
4 Department of Statistics, Faculty of Mathematics, Statistics and Computer Sciences, Semnan University, Semnan P.O. Box 3514799422, Iran
5 Department of Economics and Statistics, University of Mauritius, Réduit 80837, Mauritius
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(23), 3057; https://doi.org/10.3390/math9233057
Submission received: 10 November 2021 / Revised: 23 November 2021 / Accepted: 24 November 2021 / Published: 28 November 2021
(This article belongs to the Special Issue Advances of Functional and High-Dimensional Data Analysis)

Abstract: The ridge regression estimator is a commonly used procedure to deal with multicollinear data. This paper proposes an alternative estimation procedure for high-dimensional multicollinear data. The proposal gives a continuous estimate that includes the ridge estimator as a particular case. We study its asymptotic performance for the growing dimension, i.e., $p \to \infty$ while $n$ is fixed. Under some mild regularity conditions, we prove the proposed estimator's consistency and derive its asymptotic properties. Some Monte Carlo simulation experiments are executed to assess its performance, and an implementation on a high-dimensional genetic dataset is considered.

1. Introduction

Consider the multiple regression model given by
$$Y = X\beta + \epsilon, \qquad (1)$$
where $Y = (y_1, \ldots, y_n)^\top$ is a vector of $n$ responses, $X = (x_1, \ldots, x_n)^\top$ is an $n \times p$ design matrix with $i$th predictor $x_i \in \mathbb{R}^p$, $\beta = (\beta_1, \ldots, \beta_p)^\top$ is the coefficient vector, and $\epsilon$ is an $n$-vector of unobserved errors. Further, we shall assume $\mathrm{E}(\epsilon) = 0$ and $\mathrm{E}(\epsilon\epsilon^\top) = \sigma^2 I_n$, $\sigma^2 > 0$.
When $p < n$, the ordinary least squares (LS) estimator of $\beta$ is given by
$$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} S(\beta) = (X^\top X)^{-1} X^\top Y, \qquad S(\beta) = (Y - X\beta)^\top (Y - X\beta). \qquad (2)$$
However, for the high-dimensional (HD) case $p > n$, the LS estimator cannot be obtained, because $X^\top X$ is rank deficient. The ridge regression (RR) estimator of [1], which follows the regularization of [2], however, still exists. The rationale is to add a positive value $k > 0$ to the eigenvalues of $X^\top X$ to efficiently estimate the parameters via $\hat{\beta}^{\mathrm{Ridge}} = (X^\top X + k I_p)^{-1} X^\top Y$. Refer to Saleh et al. [3] for the theory and application of the RR approach. Using the projection of $\beta$ onto the row space of $X$ is a well-described remedy. Wang et al. [4] used this technique and proposed a high-dimensional LS estimator as a limiting case of the RR, while Bühlmann [5] also used the projection method and developed a bias correction for the RR estimator, yielding a bias-corrected RR estimator for the high-dimensional setting. Shao and Deng [6] used this method and proposed thresholding the RR estimator when the projection vector is sparse, in the sense that many of its components are small, and demonstrated its consistency. Dicker [7] studied the minimax property of the RR estimator and derived its asymptotic risk for the growing dimension, i.e., $p \to \infty$. Although the RR estimator handles high-dimensional problems, there exists a counterpart that has not yet been considered in high dimensions.

An Existing Two-Parameter Biased Estimator

It is well known that the RR estimator is an efficient approach in multicollinear situations. Since its introduction, many authors have developed ridge-type estimators to overcome the issue of multicollinearity. One drawback of the RR estimator is that it is a non-linear function of the tuning parameter. Hence, Liu [8] developed a similar estimator that is, however, linear in the tuning parameter, via the following optimization problem for the case $p < n$:
$$\min_{\beta \in \mathbb{R}^p} \; S(\beta) + (d\hat{\beta} - \beta)^\top (d\hat{\beta} - \beta). \qquad (3)$$
The solution to the optimization problem (3) has the form
$$\hat{\beta}^{\mathrm{Liu}} = (X^\top X + I_p)^{-1} (X^\top Y + d\hat{\beta}), \qquad (4)$$
where $d \in (0, 1)$ is termed the biasing parameter.
Combining the advantages of the RR and Liu estimators, Ozkale and Kaciranlar [9] proposed a two-parameter estimator by solving the following optimization problem:
$$\min_{\beta \in \mathbb{R}^p} \; S(\beta) + k\left[(d\hat{\beta} - \beta)^\top (d\hat{\beta} - \beta) - c\right], \qquad (5)$$
where $c$ is a constant and $k$ is the Lagrangian multiplier. The resulting two-parameter ridge estimator has the form
$$\hat{\beta}(k, d) = (X^\top X + k I_p)^{-1} (X^\top Y + k d \hat{\beta}). \qquad (6)$$
The above estimator has several advantages and simplifies to the LS, RR, and Liu estimators as limiting cases (see Figure 1). It can also be interpreted as a restricted estimator under stochastic prior information about $\beta$.
With growing dimension $p$, $p > n$, the LS estimator (2) cannot be obtained, so it is not possible to use the two-parameter ridge estimator in Equation (6). Hence, developing a high-dimensional two-parameter version of this estimator and studying its asymptotic performance is interesting and worthwhile. Therefore, in this paper, we propose a high-dimensional version of Ozkale and Kaciranlar's estimator and derive its asymptotic properties. The paper's organization is as follows: In Section 2, a high-dimensional two-parameter estimator is proposed, and its asymptotic characteristics are discussed. Section 3 describes the generalized cross-validation criterion for choosing the tuning parameters. In Section 4, some simulation experiments are presented to assess the novel estimator's statistical and computational performance, and an application to the AML data is also illustrated. The conclusion is presented in the last section.

2. The Proposed Estimator

In this section, we develop an HD estimator and establish its asymptotic properties. To show that a component depends on $p$, we shall use the subscript $p$, and we particularly consider scenarios in which $p \to \infty$ while $n$ is fixed. This is termed the "large $p$, fixed $n$" regime, which is more general than scenarios with $p/n \to \rho \in (0, \infty)$, a common assumption in high-dimensional settings.
Consider the diverging number of variables case, in which $p$ is allowed to tend to infinity. This setting covers the high-dimensional case $p > n$. Under this setting, the inverse of $X^\top X$ does not exist; the RR estimator, however, is still valid and applicable, whereas the Liu estimator cannot be obtained. As a remedy, one can use the Moore–Penrose inverse of $X^\top X$, a particular case of the generalized inverse. Wang and Leng [10] showed that $(X^\top X)^{-1} X^\top$ is the Moore–Penrose inverse of $X$ for $p < n$, and that $X^\top (X X^\top)^{-1}$ is the Moore–Penrose inverse of $X$ when $p > n$. This gives, for any $p, n > 0$,
$$(X^\top X + s I_p)^{-1} X^\top = X^\top (X X^\top + s I_n)^{-1}, \qquad (7)$$
where $s$ is an arbitrary nonnegative constant.
Multiplying both sides of (7) by $Y$ and letting $s \to 0$ reveals that the LS estimator can be represented as
$$\hat{\beta} = \lim_{s \to 0} (X^\top X + s I_p)^{-1} X^\top Y = \lim_{s \to 0} X^\top (X X^\top + s I_n)^{-1} Y = X^\top (X X^\top)^{-1} Y. \qquad (8)$$
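As a quick numerical illustration of identity (7) and representation (8), the following R sketch checks both relations on a random design; the dimensions, seed, and value of $s$ are illustrative choices:

```r
# Numerical check of identity (7) and the HD representation (8);
# n, p, s, and the seed are illustrative.
set.seed(1)
n <- 5; p <- 20; s <- 0.3
X <- matrix(rnorm(n * p), n, p)
Y <- rnorm(n)

lhs <- solve(crossprod(X) + s * diag(p)) %*% t(X)   # (X'X + s I_p)^{-1} X'
rhs <- t(X) %*% solve(tcrossprod(X) + s * diag(n))  # X'(XX' + s I_n)^{-1}
max(abs(lhs - rhs))                                 # agrees up to rounding error

beta_ls <- t(X) %*% solve(tcrossprod(X)) %*% Y      # X'(XX')^{-1} Y, Equation (8)
```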
Now, for the HD case, substituting (8) in (6) yields
$$\hat{\beta}^{\mathrm{HD}} = (X^\top X + k_p I_p)^{-1}\left(X^\top Y + k_p d_p X^\top (X X^\top)^{-1} Y\right) = (X^\top X + k_p I_p)^{-1}\left(X^\top + k_p d_p X^\top (X X^\top)^{-1}\right) Y = (X^\top X + k_p I_p)^{-1}\left(X^\top + k_p d_p X^{+}\right) Y, \qquad (9)$$
where $X^{+} = X^\top (X X^\top)^{-1}$ is the Moore–Penrose inverse of $X$.
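In practice, (9) can be computed directly. A minimal R sketch follows, assuming $p > n$ and that $k_p$ and $d_p$ are supplied (their data-driven choice is discussed in Section 3); the function name is illustrative:

```r
# Sketch of the proposed HD estimator in Equation (9), assuming p > n.
beta_hd <- function(X, Y, k, d) {
  p <- ncol(X)
  Xplus <- t(X) %*% solve(tcrossprod(X))  # Moore-Penrose inverse X'(XX')^{-1}
  solve(crossprod(X) + k * diag(p), (t(X) + k * d * Xplus) %*% Y)
}
```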
We impose the following regularity conditions for studying the asymptotic performance of the estimator $\hat{\beta}^{\mathrm{HD}}$ given by (9).
(A1) $1/k_p = o(1)$. There exists a constant $0 \le \delta < 0.5$ such that each component of $X$ is $O(k_p^{\delta})$.
(A2) $d_p = o(1)$. There exists a constant $0 \le \eta < 0.5$ such that each component of $X^{+}$ is $O(d_p^{-\eta})$.
(A3) For sufficiently large $p$, there is a vector $b_{p \times 1}$ such that $\beta = X^\top X b$, and there exists a constant $\varepsilon > 0$ such that each component of the vector $b_{p \times 1}$ is $O(1/p^{\varepsilon + 1.5})$ and $k_p = o(p^{\varepsilon} a_p)$, with $a_p = o(1)$. (An example of such a choice is $k_p = p$ and $\varepsilon = 0.5 + \delta$.)
(A4) For sufficiently large $p$, there exists a constant $\delta > 0$ such that each component of $\beta$ is $O(p^{-2\delta})$ and $1/d_p = o(p^{\delta})$. Further, $k_p^{\delta - 1} = o(d_p)$.
Assumption (A3) is adopted from Luo [11]. Let $\hat{\beta}^{\mathrm{HD}} = (\hat{\beta}_1^{\mathrm{HD}}, \ldots, \hat{\beta}_p^{\mathrm{HD}})^\top$.
Theorem 1. 
Assume (A1) and (A2). Then, $\mathrm{var}(\hat{\beta}_i^{\mathrm{HD}}) = o(1)$ for all $i = 1, \ldots, p$.
Proof. 
For the proof, refer to Appendix A. □
Theorem 2. 
Assume (A1)–(A3). Further, suppose $\lambda_{ip} = O(k_p)$, where $\lambda_{ip} > 0$ is the $i$th eigenvalue of $X^\top X$. Then, $\mathrm{bias}(\hat{\beta}_i^{\mathrm{HD}}) = o(1)$ for all $i = 1, 2, \ldots, p$.
Proof. 
For the proof, refer to Appendix A. □
Using Theorems 1 and 2, it can be verified that the HD estimator $\hat{\beta}^{\mathrm{HD}}$ is a consistent estimator of $\beta$ as $p \to \infty$.
The following result gives the asymptotic distribution of this estimator as $p \to \infty$.
Theorem 3. 
Assume $1/k_p = o(1)$, and, for sufficiently large $p$, suppose there exists a constant $\delta > 0$ such that each component of $\beta$ is $O(1/p^{2 + \delta})$. Let $k_p = o(p^{\delta})$ and $\lambda_{ip} = o(k_p)$. Furthermore, suppose that $\epsilon \sim N_n(0, \sigma^2 I_n)$, $\sigma^2 > 0$. Then,
$$\frac{1}{d_p}\left(\hat{\beta}^{\mathrm{HD}} - \beta\right) \xrightarrow{D} N\left(0, \sigma^2 X^{+} X^{+\top}\right) \quad \text{as } p \to \infty.$$
Proof. 
For the proof, refer to Appendix A. □

3. Generalized Cross-Validation

As noted, the estimator $\hat{\beta}^{\mathrm{HD}}$ depends on both the ridge parameter $k_p$ and the Liu parameter $d_p$, which must be optimized in practice. To do this, we use the generalized cross-validation (GCV) criterion. The GCV is used to choose the ridge and Liu parameters by minimizing an estimate of the unobservable risk function
$$R\left(\beta; \hat{\beta}^{\mathrm{HD}}\right) = \frac{1}{n}\left(\mathrm{E}(Y) - \hat{Y}^{\mathrm{HD}}(k_p, d_p)\right)^\top\left(\mathrm{E}(Y) - \hat{Y}^{\mathrm{HD}}(k_p, d_p)\right) = \frac{1}{n}\left\|\mathrm{E}(Y) - X\hat{\beta}^{\mathrm{HD}}(k_p, d_p)\right\|^2, \qquad (10)$$
where
$$\hat{Y}^{\mathrm{HD}}(k_p, d_p) = X\hat{\beta}^{\mathrm{HD}} = X(X^\top X + k_p I_p)^{-1}\left(X^\top + k_p d_p X^{+}\right) Y = H(k_p, d_p) Y, \qquad (11)$$
with $H(k_p, d_p) = X(X^\top X + k_p I_p)^{-1}(X^\top + k_p d_p X^{+})$, termed the hat matrix of $Y$.
As in [12], it is straightforward to show that
$$\mathrm{E}\left[R\left(\beta; \hat{\beta}^{\mathrm{HD}}\right)\right] = \frac{1}{n}\left\|\left(I_n - H(k_p, d_p)\right) X\beta\right\|^2 + \frac{\sigma^2}{n}\operatorname{tr}\left[H(k_p, d_p) H^\top(k_p, d_p)\right] = \nu_1^2(k_p, d_p) + \sigma^2 \nu_2(k_p, d_p),$$
where $\nu_1^2(k_p, d_p) = \frac{1}{n}\left\|\left(I_n - H(k_p, d_p)\right) X\beta\right\|^2$ and $\nu_2(k_p, d_p) = \frac{1}{n}\operatorname{tr}\left[H(k_p, d_p) H^\top(k_p, d_p)\right]$.
The GCV function is then defined as
$$\mathrm{GCV}\left(\hat{\beta}^{\mathrm{HD}}\right) = \frac{\frac{1}{n}\left\|\left(I_n - H(k_p, d_p)\right) y\right\|^2}{\left(1 - \frac{1}{n}\operatorname{tr} H(k_p, d_p)\right)^2} = \frac{\frac{1}{n}\left\|\left(I_n - H(k_p, d_p)\right) y\right\|^2}{\left(1 - \mu_1(k_p, d_p)\right)^2}, \qquad (12)$$
where $\mu_1(k_p, d_p) = \frac{1}{n}\operatorname{tr} H(k_p, d_p)$.
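Minimizing (12) over a grid of candidate values is a simple way to select $(k_p, d_p)$ in practice. A hedged R sketch follows; the grid bounds and resolution are illustrative choices, not prescriptions from the paper:

```r
# GCV criterion of Equation (12) and a naive grid search over (k, d).
gcv <- function(X, Y, k, d) {
  n <- nrow(X); p <- ncol(X)
  Xplus <- t(X) %*% solve(tcrossprod(X))
  H <- X %*% solve(crossprod(X) + k * diag(p), t(X) + k * d * Xplus)  # hat matrix
  mu1 <- sum(diag(H)) / n
  mean(((diag(n) - H) %*% Y)^2) / (1 - mu1)^2
}

grid <- expand.grid(k = 10^seq(0, 3, length.out = 10),
                    d = seq(0.01, 0.99, length.out = 10))
scores <- mapply(function(k, d) gcv(X, Y, k, d), grid$k, grid$d)
opt <- grid[which.min(scores), ]  # (k_opt, d_opt)
```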
The following theorem extends the GCV theorem proposed by Akdeniz and Roozbeh [13].
Theorem 4. 
According to the definition of the GCV, we have
$$\frac{\mathrm{E}\left[R\left(\beta; \hat{\beta}^{\mathrm{HD}}\right)\right] - \mathrm{E}\left[\mathrm{GCV}\left(\hat{\beta}^{\mathrm{HD}}\right)\right] + \sigma^2}{\mathrm{E}\left[R\left(\beta; \hat{\beta}^{\mathrm{HD}}\right)\right]} = \left[1 - \frac{\sigma^2}{\left(1 - \mu_1(k_p, d_p)\right)^2} + \frac{1}{D(k_p, d_p)}\right] \times \frac{\sigma^2 \mu_1^2(k_p, d_p)}{\left(1 - \mu_1(k_p, d_p)\right)^2},$$
where $D(k_p, d_p) = \nu_1^2(k_p, d_p) + \sigma^2 \nu_2(k_p, d_p)$, and consequently,
$$\frac{\mathrm{E}\left[R\left(\beta; \hat{\beta}^{\mathrm{HD}}\right)\right] - \mathrm{E}\left[\mathrm{GCV}\left(\hat{\beta}^{\mathrm{HD}}\right)\right] + \sigma^2}{\mathrm{E}\left[R\left(\beta; \hat{\beta}^{\mathrm{HD}}\right)\right]} < \frac{\sigma^2}{\left(1 - \mu_1(k_p, d_p)\right)^2}\left[2\mu_1(k_p, d_p) + \frac{\mu_1^2(k_p, d_p)}{\nu_2(k_p, d_p)}\right],$$
whenever $0 < \mu_1(k_p, d_p) < 1$.
Proof. 
For the proof, refer to Appendix A. □

4. Numerical Investigations

In this section, to assess the performance of the proposed HD estimator $\hat{\beta}^{\mathrm{HD}}$, we conduct a simulation study along with the analysis of real data.

4.1. Simulation

Here, we consider the multiple regression model, with a varying squared multiple correlation coefficient $R^2$ and error distribution, given by
$$Y = c\, X\beta + \sigma\epsilon,$$
where $\beta = (\beta_1^\top, 0^\top)^\top$, $\beta_1$ is the active set, and its dimension is $p_1 = 0.4p$. The components of $\beta_1$ are the absolute values of draws from a normal distribution with mean 0 and standard deviation 5. The remaining $p - p_1$ components are zero.
In this example, motivated by McDonald and Galarneau [14], the explanatory variables are computed by
$$x_j = \left(1 - \rho^2\right)^{1/2} z_j + \rho z_{p+1}, \qquad j = 1, \ldots, p,$$
where the $z_j$s are independent standard normal pseudo-random vectors, and $\rho$ is specified such that the correlation between any two explanatory variables is $\rho^2$. Similarly to Zhu et al. [15], the variance is set to $\sigma^2 = 6.83$, and two kinds of error distribution are taken for $\epsilon$: (1) standard normal, $N_n(0, I_n)$, and (2) standard $t$ with 5 degrees of freedom, $t_n(0, I_n, 5)$. The constant $c$ is varied to control the signal-to-noise ratio; it is set to 0.5, 1, and 2, with corresponding $R^2 = 20\%$, $50\%$, and $80\%$. $R^2$ represents the proportion of the variance of the dependent variable that is explained by the independent variables in a regression model.
We consider $\rho \in \{0.8, 0.95\}$; the sample size and the number of covariates are set to $n \in \{30, 50, 100\}$ and $p \in \{256, 512, 1024\}$, respectively. Following regularity conditions (A1)–(A4), we set $k_p = p$. For $\delta = 0.25$, we take $d_p = p^{-1/5}$, which guarantees (A4). We then simulate $\hat{\beta}^{\mathrm{HD}}$ and $\hat{\beta}^{\mathrm{Ridge}}$ 100 times using Equation (9) and $\hat{\beta}^{\mathrm{Ridge}} = (X^\top X + k_p I_p)^{-1} X^\top Y$; one replication of this design is sketched in R below.
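The sketch uses one of the studied settings; the seed is illustrative, and beta_hd() is the sketch from Section 2:

```r
# One replication of the simulation design (McDonald-Galarneau predictors).
set.seed(2021)                                       # illustrative seed
n <- 30; p <- 256; rho <- 0.8; c0 <- 0.5; sigma <- sqrt(6.83)

Z <- matrix(rnorm(n * (p + 1)), n, p + 1)
X <- sqrt(1 - rho^2) * Z[, 1:p] + rho * Z[, p + 1]   # pairwise correlation rho^2

p1 <- round(0.4 * p)                                 # size of the active set
beta <- c(abs(rnorm(p1, 0, 5)), rep(0, p - p1))      # sparse coefficient vector
Y <- c0 * X %*% beta + sigma * rnorm(n)

kp <- p; dp <- p^(-1/5)                              # tuning per (A1)-(A4)
fit_hd    <- beta_hd(X, Y, kp, dp)                   # Equation (9)
fit_ridge <- solve(crossprod(X) + kp * diag(p), t(X) %*% Y)
```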
For comparison purposes, the quadratic bias (QB) and mean squared error (MSE) are computed according to
$$\mathrm{QB}(\hat{\beta}^{*}) = \left(\bar{\beta}^{*} - \beta\right)^\top\left(\bar{\beta}^{*} - \beta\right), \quad \text{with } \bar{\beta}^{*} = \frac{1}{100}\sum_{j=1}^{100}\hat{\beta}_j^{*}, \qquad \text{and} \qquad \mathrm{MSE}(\hat{\beta}^{*}) = \frac{1}{100}\sum_{j=1}^{100}\left(\hat{\beta}_j^{*} - \beta\right)^\top\left(\hat{\beta}_j^{*} - \beta\right),$$
respectively, where $\hat{\beta}^{*}$ is one of $\hat{\beta}^{\mathrm{HD}}$ or $\hat{\beta}^{\mathrm{Ridge}}$. Both measures can be computed as shown below.
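Given the replicated estimates, both criteria reduce to one-liners in R; here fits is an assumed $p \times 100$ matrix holding one estimate per column:

```r
# QB and MSE over replications; `fits` holds one estimate per column.
qb  <- sum((rowMeans(fits) - beta)^2)        # quadratic bias of the average
mse <- mean(colSums((fits - beta)^2))        # average squared estimation error
```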

4.2. Review of Results

In Theorem 2, the condition under which the proposed $\hat{\beta}^{\mathrm{HD}}$ is asymptotically unbiased was given in terms of the eigenvalues of $X^\top X$. Here, we numerically analyze the biasedness of this estimator by comparing it with the ridge estimator across the parameters of the model. For this purpose, the difference in QB,
$$\mathrm{diff} = \mathrm{QB}(\hat{\beta}^{\mathrm{HD}}) - \mathrm{QB}(\hat{\beta}^{\mathrm{Ridge}}),$$
is reported in Table 1. If diff is positive, then the quadratic bias of the proposed estimator is larger than that of the ridge estimator.
To compare the MSEs, we use the relative mean squared error (RMSE) given by
$$\mathrm{RMSE} = \frac{\mathrm{MSE}(\hat{\beta}^{\mathrm{Ridge}})}{\mathrm{MSE}(\hat{\beta}^{\mathrm{HD}})}.$$
The results are reported in Table 2. If $\mathrm{RMSE} > 1$, then the proposed estimator has a smaller MSE than the ridge.
Based on the results of Table 1 and Table 2, the following conclusions are made:
(1) The performance of the estimators is affected by the number of observations ($n$), the number of variables ($p$), the signal-to-noise ratio ($c$), and the degree of multicollinearity ($\rho$).
(2) As the degree of multicollinearity $\rho$ increases, the QB of the proposed estimator increases for $c = 0.5$ and 1 under both error distributions, but its MSE decreases dramatically, as shown by the increasing RMSE.
(3) The signal-to-noise ratio reflects the effect of $\beta$ in the model, and lower values (less than 1) correspond to sparser models. When $c$ is small, the proposed estimator performs better than the ridge, which is evidence that our estimator is a better alternative for sparse models in the MSE sense. However, the QB increases for large $c$ values, which forces the model to overestimate the parameters.
(4) As $p$ increases, although the proposed estimator remains superior to the ridge in sparse models (small $c$ values), its efficiency decreases. This is more evident when the ratio $p/n$ becomes larger. This may look like poor performance, but our estimator is still preferred in high dimensions for sparse models.
(5) As $n$ increases, so does the RMSE; however, the QB becomes very large, owing to the complicated form of the proposed estimator. It must be noted that this does not contradict the results of Theorem 2, since the simulation scheme does not obey the regularity conditions.
(6) There is evidence of robustness to the tail of the error distribution for sparse models, i.e., the QB and RMSE are essentially the same for the normal and $t$ distributions. However, as $c$ increases, the QB of the proposed estimator explodes under the heavier-tailed distribution. This may be seen as a disadvantage, but even for large values of $c$, the RMSE stays the same, which is evidence of a relatively small variance under the heavier-tailed distribution.

4.3. AML Data Analysis

This section assesses the performance of the proposed estimator using the mean prediction error (MPE) and MSE criteria on a data set adopted from Metzeler et al. [16], in which information for 79 patients was collected. The data can be accessed from the Gene Expression Omnibus (GEO) data repository (http://www.ncbi.nlm.nih.gov/geo/ (accessed on 1 January 2021)) of the National Center for Biotechnology Information (NCBI), under GEO accession number GSE12417. We only use the data set that served as a test set. It contains gene expression data for 79 adult patients with cytogenetically normal acute myeloid leukemia (CN-AML), showing heterogeneous treatment outcomes. Following Sill et al. [17], we reduce the total of 54,675 gene expression features, measured with the Affymetrix HG-U133 Plus 2.0 microarray technology, to the top $p \in \{1000, 2000\}$ features with the largest variance across all 79 samples. We consider overall survival time in months as the response variable. The condition number of the design matrix for the AML data set is approximately 1095.80, evidence of severe multicollinearity among the columns of the design matrix (see [18], p. 298). To find the optimal values of $k$ and $d$, denoted by $k_{\mathrm{opt}}$ and $d_{\mathrm{opt}}$, we use the GCV given by Equation (12). Hence, we use the following formulas:
$$\hat{\beta}^{\mathrm{HD}*} = (X^\top X + k_{\mathrm{opt}} I_p)^{-1}\left(X^\top Y + k_{\mathrm{opt}} d_{\mathrm{opt}} X^\top (X X^\top)^{-1} Y\right), \qquad \hat{\beta}^{\mathrm{Ridge}*} = (X^\top X + k_{\mathrm{opt}} I_p)^{-1} X^\top Y.$$
To compute the MPE and MSE, we divide the whole data set into a training set ($T = (X_{\mathrm{train}}, Y_{\mathrm{train}})$) and a validation set ($V = (X_{\mathrm{valid}}, Y_{\mathrm{valid}})$), comprising 70% and 30% of the observations, respectively. Then, the measures are evaluated using
$$\mathrm{MPE}_{\mathrm{boot}}(\hat{\beta}^{*}) = \frac{1}{N_{\mathrm{boot}}}\sum_{j=1}^{N_{\mathrm{boot}}}\left(X_{\mathrm{valid}}\hat{\beta}_j^{\mathrm{train}*} - Y_{\mathrm{valid}}\right)^\top\left(X_{\mathrm{valid}}\hat{\beta}_j^{\mathrm{train}*} - Y_{\mathrm{valid}}\right), \qquad \mathrm{MSE}_{\mathrm{boot}}(\hat{\beta}^{*}) = \frac{1}{N_{\mathrm{boot}}}\sum_{j=1}^{N_{\mathrm{boot}}}\left(\hat{\beta}_j^{\mathrm{train}*} - \hat{\beta}^{\mathrm{HD}*}\right)^\top\left(\hat{\beta}_j^{\mathrm{train}*} - \hat{\beta}^{\mathrm{HD}*}\right),$$
where $N_{\mathrm{boot}}$ stands for the number of bootstrapped samples, $\hat{\beta}^{*}$ is one of the proposed and ridge estimators, and $\hat{\beta}^{\mathrm{HD}*}$ is the assumed true parameter obtained by Equation (9) from the whole data set. The relative measures are
$$\mathrm{RMPE}_{\mathrm{boot}} = \frac{\mathrm{MPE}_{\mathrm{boot}}(\hat{\beta}^{\mathrm{Ridge}*})}{\mathrm{MPE}_{\mathrm{boot}}(\hat{\beta}^{\mathrm{HD}*})}, \qquad \mathrm{RMSE}_{\mathrm{boot}} = \frac{\mathrm{MSE}_{\mathrm{boot}}(\hat{\beta}^{\mathrm{Ridge}*})}{\mathrm{MSE}_{\mathrm{boot}}(\hat{\beta}^{\mathrm{HD}*})}.$$
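A hedged R sketch of this bootstrap comparison follows; beta_hd() is the earlier sketch, k_opt and d_opt are assumed to have been selected by the GCV grid search of Section 3, and the split indices and object names are illustrative:

```r
# Bootstrap evaluation of MPE and MSE on a 70/30 train/validation split.
n_boot <- 200
idx <- sample(nrow(X), size = floor(0.7 * nrow(X)))  # 70% training rows
Xtr <- X[idx, ]; Ytr <- Y[idx]
Xva <- X[-idx, ]; Yva <- Y[-idx]

beta_true <- beta_hd(X, Y, k_opt, d_opt)             # assumed true parameter, Eq. (9)
mpe <- mse <- numeric(n_boot)
for (j in seq_len(n_boot)) {
  b <- sample(nrow(Xtr), replace = TRUE)             # bootstrap the training rows
  fit <- beta_hd(Xtr[b, ], Ytr[b], k_opt, d_opt)
  mpe[j] <- sum((Xva %*% fit - Yva)^2)
  mse[j] <- sum((fit - beta_true)^2)
}
c(MPE = mean(mpe), MSE = mean(mse))
```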
The results are tabulated in Table 3 for $N_{\mathrm{boot}} = 200$ bootstrap samples. The following conclusions are obtained from Table 3:
(1) Using the GCV, the proposed estimator is consistently superior to the ridge estimator with respect to the RMSE and RMPE criteria.
(2) Similarly to the simulation results, as $p$ grows, the MSE of the proposed estimator increases relative to that of the ridge estimator. However, as $p$ gets larger, the mean prediction error becomes smaller, which shows its superiority for prediction purposes.
Further, Figure 2 depicts the MSE and MPE values for both the HD and ridge estimators for the case $p = 1000$. The high-dimensional estimator clearly performs better than the ridge. For the case $p = 2000$, we obtained similar results.

5. Conclusions

In this note, we proposed a high-dimensional two-parameter ridge estimator as an alternative to the conventional ridge and Liu estimators and discussed its asymptotic properties. Through simulation and real-data experiments, this estimator proved efficient in high-dimensional problems and can potentially overcome multicollinearity. Additionally, the proposed high-dimensional ridge estimator yields superior performance in the smaller mean squared error sense.

Author Contributions

Conceptualization, M.A. and N.M.K.; methodology, M.A. and N.M.K.; validation, M.A., M.N., M.R. and N.M.K.; formal analysis, M.A., M.N., M.R. and N.M.K.; investigation, M.A., M.N., M.R. and N.M.K.; resources, M.N.; writing—original draft preparation, M.A.; writing—review and editing, M.A., M.N., M.R. and N.M.K.; visualization, M.A., M.N. and M.R.; supervision, M.A. and N.M.K.; project administration, M.A., M.N., M.R. and N.M.K.; funding acquisition, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was based upon research supported, in part, by the visiting professor program, University of Pretoria, and the National Research Foundation (NRF) of South Africa, SARChI Research Chair UID: 71199; Reference: IFR170227223754 grant No. 109214. The work of M. Norouzirad and M. Roozbeh is based on the research supported in part by the Iran National Science Foundation (INSF) (grant number 97018318). The opinions expressed and conclusions arrived at are those of the authors and are not necessarily attributed to the NRF.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this article may be simulated in R, using the stated seed value and parameter values. The real data set is available at http://www.ncbi.nlm.nih.gov/geo/ (accessed on 1 January 2021).

Acknowledgments

We would like to sincerely thank the two anonymous reviewers for their constructive comments, which led us to add many details to the paper and improve its presentation.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of the Main Results

Proof of Theorem 1. 
By definition, we have
$$\mathrm{var}\left(\hat{\beta}^{\mathrm{HD}}\right) = \sigma^2\left(X^\top X + k_p I_p\right)^{-1}\left(X^\top + k_p d_p X^{+}\right)\left(X^\top + k_p d_p X^{+}\right)^\top\left(X^\top X + k_p I_p\right)^{-1} = \sigma^2\left(\frac{X^\top X}{k_p} + I_p\right)^{-1}\left(\frac{X^\top}{k_p} + d_p X^{+}\right)\left(\frac{X^\top}{k_p} + d_p X^{+}\right)^\top\left(\frac{X^\top X}{k_p} + I_p\right)^{-1}.$$
By (A1), $X/k_p = O(1)\, k_p^{\delta - 1} = o(1)$ and $X^\top X/k_p + I_p \to I_p$. By (A2), $d_p X^{+} = O(1)\, d_p^{1 - \eta} = o(1)$. Hence, $\mathrm{var}(\hat{\beta}_i^{\mathrm{HD}}) \to 0$ as $p \to \infty$, and the proof is complete. □
Proof of Theorem 2. 
By definition,
$$\mathrm{E}\left(\hat{\beta}^{\mathrm{HD}}\right) = (X^\top X + k_p I_p)^{-1}\left(X^\top + k_p d_p X^{+}\right) X\beta = \left(\frac{X^\top X}{k_p} + I_p\right)^{-1}\left(\frac{X^\top X}{k_p} + d_p X^{+} X\right)\beta = \left(\frac{X^\top X}{k_p} + I_p\right)^{-1}\left(\frac{X^\top X}{k_p}\beta + d_p X^{+} X\beta\right).$$
Under (A2), $d_p X^{+} X = o(1)$. The proof is complete using Theorem 2 of Luo [11]. □
Proof of Theorem 3. 
We have
$$\frac{1}{d_p}\left(\hat{\beta}^{\mathrm{HD}} - \beta\right) = \frac{1}{d_p}\left[(X^\top X + k_p I_p)^{-1}\left(X^\top + k_p d_p X^{+}\right)(X\beta + \epsilon) - \beta\right] = \left(\frac{X^\top X}{k_p} + I_p\right)^{-1}\left(\frac{X^\top}{k_p d_p} + X^{+}\right)\epsilon + \frac{1}{d_p}\left[\left(\frac{X^\top X}{k_p} + I_p\right)^{-1}\left(\frac{X^\top X}{k_p} + d_p X^{+} X\right) - I_p\right]\beta.$$
By (A1), $X^\top X/k_p + I_p \to I_p$; by (A2), $d_p X^{+} X = o(1)$; and by (A4), $X/(k_p d_p) = o(1)$. Hence,
$$\frac{1}{d_p}\left(\hat{\beta}^{\mathrm{HD}} - \beta\right) \to X^{+}\epsilon.$$
Since $\epsilon \sim N_n(0, \sigma^2 I_n)$, we have $X^{+}\epsilon \sim N\left(0, \sigma^2 X^{+} X^{+\top}\right)$, and the proof is complete. □
Proof of Theorem 4. 
It is straightforward to verify that
$$\mathrm{E}\left[\mathrm{GCV}\left(\hat{\beta}^{\mathrm{HD}}\right)\right] = \frac{\nu_1^2(k_p, d_p) + \sigma^2\left(1 - 2\mu_1(k_p, d_p) + \nu_2(k_p, d_p)\right)}{\left(1 - \mu_1(k_p, d_p)\right)^2}.$$
Hence,
$$\mathrm{E}\left[R\left(\beta; \hat{\beta}^{\mathrm{HD}}\right)\right] - \mathrm{E}\left[\mathrm{GCV}\left(\hat{\beta}^{\mathrm{HD}}\right)\right] = \mathrm{E}\left[R\left(\beta; \hat{\beta}^{\mathrm{HD}}\right)\right]\left[1 - \frac{1}{\left(1 - \mu_1(k_p, d_p)\right)^2}\right] - \frac{\sigma^2\left(1 - 2\mu_1(k_p, d_p)\right)}{\left(1 - \mu_1(k_p, d_p)\right)^2},$$
which leads to the required result. □

References

  1. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for non-orthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  2. Tikhonov, A.N. Solution of incorrectly formulated problems and the regularization method. Sov. Math. Dokl. 1963, 4, 1035–1038. [Google Scholar]
  3. Saleh, A.K.M.E.; Arashi, M.; Kibria, B.M.G. Theory of Ridge Regression Estimation with Applications; John Wiley: Hoboken, NJ, USA, 2019. [Google Scholar]
  4. Wang, X.; Dunson, D.; Leng, C. No penalty no tears: Least squares in high-dimensional models. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 1814–1822. [Google Scholar]
  5. Bühlmann, P. Statistical significance in high-dimensional linear models. Bernoulli 2013, 19, 1212–1242. [Google Scholar] [CrossRef] [Green Version]
  6. Shao, J.; Deng, X. Estimation in high-dimensional linear models with deterministic design matrices. Ann. Stat. 2012, 40, 812–831. [Google Scholar] [CrossRef] [Green Version]
  7. Dicker, L.H. Ridge regression and asymptotic minimax estimation over spheres of growing dimension. Bernoulli 2016, 22, 1–37. [Google Scholar] [CrossRef]
  8. Liu, K. A new class of biased estimate in linear regression. Commun. Stat. Theory Methods 1993, 22, 393–402. [Google Scholar]
  9. Ozkale, M.R.; Kaciranlar, S. The restricted and unrestricted two-parameter estimators. Commun. Stat. Theory Methods 2007, 36, 2707–2725. [Google Scholar] [CrossRef]
  10. Wang, X.; Leng, C. High dimensional ordinary least squares projection for screening variables. J. R. Stat. Soc. Ser. B 2015. [Google Scholar] [CrossRef] [Green Version]
  11. Luo, J. The discovery of mean square error consistency of ridge estimator. Stat. Probab. Lett. 2010, 80, 343–347. [Google Scholar] [CrossRef]
  12. Amini, M.; Roozbeh, M. Optimal partial ridge estimation in restricted semiparametric regression models. J. Multivar. Anal. 2015, 136, 26–40. [Google Scholar] [CrossRef]
  13. Akdeniz, F.; Roozbeh, M. Generalized difference-based weighted mixed almost unbiased ridge estimator in partially linear models. Stat. Pap. 2019, 60, 1717–1739. [Google Scholar] [CrossRef]
  14. McDonald, G.C.; Galarneau, D.I. A Monte Carlo Evaluation of Some Ridge-Type Estimators. J. Am. Stat. Assoc. 1975, 70, 407–416. [Google Scholar] [CrossRef]
  15. Zhu, L.P.; Li, L.; Li, R.; Zhu, L.X. Model-free feature screening for ultrahigh dimensional data. J. Am. Stat. Assoc. 2011, 106, 1464–1475. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Metzeler, K.H.; Hummel, M.; Bloomfield, C.D.; Spiekermann, K.; Braess, J.; Sauerl, M.C.; Heinecke, A.; Radmacher, M.; Marcucci, G.; Whitman, S.P.; et al. An 86 Probe Set Gene Expression Signature Predicts Survival in Cytogenetically Normal Acute Myeloid Leukemia. Blood 2008, 112, 4193–4201. [Google Scholar] [CrossRef] [PubMed]
  17. Sill, M.; Hielscher, T.; Becker, N.; Zucknick, M. c060: Extended Inference for Lasso and Elastic-Net Regularized Cox and Generalized Linear Models; R Package Version 0.2-4; 2014. Available online: http://CRAN.R-project.org/package=c060 (accessed on 1 January 2021).
  18. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis, 5th ed.; Wiley: Hoboken, NJ, USA, 2012. [Google Scholar]
Figure 1. Special limiting cases.
Figure 2. Box-plot of the MSE and MPE values for p = 1000 in the AML data.
Table 1. The difference (diff) between the quadratic biases of the high-dimensional and ridge estimators.

                          ρ = 0.8                       ρ = 0.95
  p     c     n     N(0, I_n)  t_n(0, I_n, 5)    N(0, I_n)  t_n(0, I_n, 5)
  256   0.5   30      5.7657        5.7643        10.0535        10.1134
              50      6.4911        6.4941        11.4722        11.5088
              100    17.8314       17.8493        30.1008        30.4137
        1     30     22.9671      487.4169        39.6556       459.3473
              50     25.8621      522.6298        45.1138       480.5501
              100    70.8693      798.6551       118.1326       676.1919
        2     30     91.7526     2413.9746       158.0026      2256.1664
              50    103.3996     2587.7382       179.8509      2357.4111
              100   283.0549     3922.0057       470.4114      3259.2211
  512   0.5   30      3.1943        3.2012         6.5528         6.6001
              50      4.4800        4.4781         9.5861         9.6151
              100    10.2121       10.2489        20.1828        20.3366
        1     30     12.7657      926.7540        26.0911       916.2663
              50     17.8861     1009.3595        38.0549       969.4353
              100    40.7254     1192.4455        79.9094      1095.9628
        2     30     51.0605     4621.0862       104.2892      4555.0569
              50     71.5157     5029.3107       151.9461      4809.2595
              100   162.7616     5920.6337       318.7343      5397.7878
  1024  0.5   30      1.7594        1.7584         3.7384         3.7410
              50      3.9188        3.9345         9.2523         9.3437
              100     5.1236        5.1189        12.6469        12.6455
        1     30      7.0318     1637.6798        14.8960      1636.5664
              50     15.6758     1804.8548        36.9649      1763.7468
              100    20.4564     1940.6091        50.2993      1856.0197
        2     30     28.1221     8181.4255        59.5312      8167.9835
              50     62.7157     9008.4246       147.8715      8781.1968
              100    81.7756     9682.7404       147.8715      9229.7803
Table 2. The relative MSE (RMSE) of the high-dimensional and ridge estimators.

                         ρ = 0.8                      ρ = 0.95
  p     c     n     N(0, I_n)  t_n(0, I_n, 5)   N(0, I_n)  t_n(0, I_n, 5)
  256   0.5   30     1.0050       1.0050         1.0140       1.0139
              50     1.0058       1.0058         1.0161       1.0160
              100    1.0222       1.0222         1.0543       1.0539
        1     30     1.0032       1.0032         1.0179       1.0178
              50     1.0039       1.0039         1.0209       1.0209
              100    1.0221       1.0220         1.0883       1.0876
        2     30     0.9816       0.9816         0.9852       0.9851
              50     0.9793       0.9793         0.9829       0.9829
              100    0.9434       0.9435         0.9587       0.9584
  512   0.5   30     1.0011       1.0011         1.0031       1.0031
              50     1.0016       1.0016         1.0048       1.0048
              100    1.0041       1.0041         1.0119       1.0119
        1     30     1.0004       1.0004         1.0029       1.0029
              50     1.0007       1.0007         1.0048       1.0048
              100    1.0023       1.0023         1.0139       1.0139
        2     30     0.9948       0.9948         0.9924       0.9924
              50     0.9929       0.9929         0.9895       0.9895
              100    0.9843       0.9843         0.9810       0.9809
  1024  0.5   30     1.0003       1.0003         1.0009       1.0009
              50     1.0007       1.0007         1.0022       1.0022
              100    1.0009       1.0009         1.0031       1.0031
        1     30     1.0001       1.0001         1.0006       1.0006
              50     1.0002       1.0002         1.0017       1.0017
              100    1.0003       1.0003         1.0025       1.0025
        2     30     0.9984       0.9984         0.9973       0.9973
              50     0.9964       0.9964         0.9933       0.9933
              100    0.9954       0.9954         0.9910       0.9911
Table 3. RMPE and RMSE values for 200 bootstrapped samples in the analysis of the AML data.

  Criterion      p = 1000    p = 2000
  RMPE_boot      1.001981    1.002278
  RMSE_boot      1.046073    1.039997
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

