1. Introduction
In the last fifty years, different methods have been developed to avoid the instability of estimates derived from collinearity (see, for example, Kiers and Smilde [1]). Some of these methods can be grouped under the general denomination of penalized regression.
In general terms, penalized regression starts from the linear model (with p variables and n observations), $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$, and obtains the regularization of the estimated parameters by minimizing the following objective function:
$$\min_{\boldsymbol{\beta}} \; (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{t}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) + P(\boldsymbol{\beta}),$$
where $P(\boldsymbol{\beta})$ is a penalty term that can take different forms. One of the most common penalty terms is the bridge penalty ([2,3]), given by
$$P(\boldsymbol{\beta}) = \lambda \sum_{j=1}^{p} |\beta_j|^{\gamma},$$
where $\lambda \geq 0$ is the penalization parameter and $\gamma > 0$ is a tuning parameter. Note that the ridge ([4]) and the Lasso ([5]) regressions are obtained when $\gamma = 2$ and $\gamma = 1$, respectively. Penalties have also been called soft thresholding ([6,7]).
These methods are applied not only for the treatment of multicollinearity but also for the selection of variables (see, for example, Dupuis and Victoria-Feser [8], Li and Yang [9], Liu et al. [10], or Uematsu and Tanaka [11]), which is a crucial issue in many areas of science when the number of variables exceeds the sample size. Zou and Hastie [12] proposed elastic net regularization by using the penalty terms $\lambda_1 \sum_{j=1}^{p} |\beta_j|$ and $\lambda_2 \sum_{j=1}^{p} \beta_j^{2}$, which combine the Lasso and ridge regressions:
$$P(\boldsymbol{\beta}) = \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^{2}.$$
Thus, the Lasso regression usually selects only one of the regressors from among all those that are highly correlated, while the elastic net regression selects several of them. In the words of Tutz and Ulbricht [13], “the elastic net catches all the big fish”, meaning that it selects the whole group.
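The different behavior of the Lasso and of the elastic net on a group of highly correlated regressors can be illustrated numerically. The following sketch is not part of the original paper: it relies on scikit-learn with simulated data, and the penalty values (alpha and l1_ratio), the variable names, and the data-generating process are arbitrary illustrative choices.

```python
# Illustrative sketch: with two almost collinear regressors, the Lasso tends to
# keep only one of them, while the elastic net tends to keep the whole group.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
x1 = z + 0.01 * rng.normal(size=n)      # x1 and x2 are almost collinear
x2 = z + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)                  # an unrelated regressor
X = np.column_stack([x1, x2, x3])
y = 3 * z + x3 + rng.normal(size=n)

lasso = Lasso(alpha=0.1, max_iter=10_000).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000).fit(X, y)  # L1 + L2 mix

print("Lasso coefficients:      ", lasso.coef_)   # typically one of x1/x2 near zero
print("Elastic net coefficients:", enet.coef_)    # typically both x1 and x2 retained
```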
From a different point of view, other authors have also presented different techniques and methods well suited for dealing with the collinearity problem: continuum regression ([14]), least angle regression ([15]), generalized maximum entropy ([16,17,18]), principal component analysis (PCA) regression ([19,20]), the principal correlation components estimator ([21]), penalized splines ([22]), partial least squares (PLS) regression ([23,24]), or the surrogate estimator focused on the solution of the normal equations presented by Jensen and Ramirez [25].
Focusing on collinearity, ridge regression is one of the most commonly applied methodologies, and its estimator is given by the following expression:
$$\hat{\boldsymbol{\beta}}(K) = (\mathbf{X}^{t}\mathbf{X} + K\mathbf{I})^{-1}\mathbf{X}^{t}\mathbf{y},$$
where $\mathbf{I}$ is the identity matrix with adequate dimensions and $K$ is the ridge factor (ordinary least squares (OLS) estimators are obtained when $K = 0$). Although ridge regression has been widely applied, it presents some problems in current practice in the presence of multicollinearity, and the estimators derived from the penalty terms presented above run into these same problems:
In relation to the calculation of the variance inflation factors (VIFs), measures that quantify the degree of multicollinearity existing in a model from the coefficient of determination of the regression of each independent variable on the rest (for more details, see Section 2), García et al. [26] showed that applying the ridge estimate to the original data leads to VIF values that are not monotone as a function of the penalty term. Logically, the Lasso and the elastic net regressions inherit this property.
Following Marquardt [27]: “The least squares objective function is mathematically independent of the scaling of the predictor variables (while the objective function in ridge regression is mathematically dependent on the scaling of the predictor variables)”. That is to say, the penalized objective function brings problems derived from the standardization of the variables. This fact has to be taken into account both for obtaining the estimators of the regressors and for applying measures that detect whether the collinearity has been mitigated. Other penalized regressions (such as the Lasso and elastic net regressions) are also not scale invariant and hence yield different results depending on the predictor scaling used.
Some of the properties of the OLS estimator that are deduced from the normal equations are not verified by the ridge estimator; among others, the estimated values of the endogenous variable are not orthogonal to the residuals. As a result, the following decomposition is verified:
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} e_i^2 + 2\sum_{i=1}^{n} e_i (\hat{y}_i - \bar{y}).$$
When the OLS estimators are obtained ($K = 0$), the third term is null. However, this term is not null when $K$ is not zero. Consequently, the relationship $\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$ is not satisfied in ridge regression, and the definition of the coefficient of determination may not be suitable. This fact not only limits the analysis of the goodness of fit but also affects the global significance, since the critical coefficient of determination is also questioned. Rodríguez et al. [28] showed that the estimators obtained from the penalties mentioned above inherit this problem of the ridge regression in relation to the goodness of fit.
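As a quick numerical check of this point, the following sketch (not from the paper) computes the ridge estimator $(\mathbf{X}^{t}\mathbf{X} + K\mathbf{I})^{-1}\mathbf{X}^{t}\mathbf{y}$ on simulated data and evaluates the cross term of the decomposition above; the data and the value of $K$ are illustrative assumptions.

```python
# Sketch: for the ridge estimator with K > 0, residuals are no longer orthogonal
# to the centered fitted values, so SST != (explained SS + residual SS).
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept included
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

for K in (0.0, 5.0):
    beta_K = np.linalg.solve(X.T @ X + K * np.eye(X.shape[1]), X.T @ y)
    y_hat = X @ beta_K
    e = y - y_hat
    sst = np.sum((y - y.mean()) ** 2)
    ss_exp = np.sum((y_hat - y.mean()) ** 2)
    ss_res = np.sum(e ** 2)
    print(f"K={K}: cross term = {2 * e @ (y_hat - y.mean()):.4f}, "
          f"SST - (explained + residual) = {sst - (ss_exp + ss_res):.4f}")
# K = 0 reproduces OLS and the cross term vanishes; with K > 0 it does not.
```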
In order to overcome these problems, this paper focuses on the raise regression (García et al. [29] and Salmerón et al. [30]), which is based on the treatment of collinearity from a geometrical point of view. It consists of separating the independent variables by using the residuals (weighted by the raising factor) of the auxiliary regression traditionally used to obtain the VIF. Salmerón et al. [30] showed that the raise regression presents better conditions than ridge regression and, more recently, García et al. [31] showed, among other questions, that ridge regression is a particular case of the raise regression.
This paper presents the extension of the VIF to the raise regression, showing that, although García et al. [31] proved that the application of the raise regression guarantees a decrease of the VIF, it is not guaranteed that its value falls below the threshold traditionally established as troubling. Thus, it will be concluded that a single application of the raise regression does not guarantee the mitigation of the multicollinearity. Consequently, this extension complements the results presented by García et al. [31] and determines, on the one hand, whether it is necessary to apply a successive raise regression (see García et al. [31] for more details) and, on the other hand, the most adequate variable to raise and the optimal value of the raising factor in order to guarantee the mitigation of the multicollinearity.
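To make the idea concrete, the following minimal sketch (an illustration, not the authors' code) raises one regressor by adding its auxiliary-regression residuals weighted by a raising factor and then re-estimates the model by OLS, reporting the VIFs before and after raising. The simulated data, the function names, and the chosen raising factor are assumptions.

```python
# Sketch of the raise idea: replace x_j by x_j + lam * e_j, where e_j are the
# residuals of the auxiliary regression of x_j on the remaining regressors.
import numpy as np

def vif(X):
    """Classical VIF_j = 1 / (1 - R_j^2) from the auxiliary regression of x_j on the rest."""
    n, p = X.shape
    out = []
    for j in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        r = X[:, j] - A @ b
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(tss / (r @ r))          # equals 1 / (1 - R_j^2)
    return np.array(out)

def raise_column(X, j, lam):
    """Return a copy of X whose column j is raised with factor lam."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    out = X.copy()
    out[:, j] = X[:, j] + lam * (X[:, j] - A @ b)
    return out

rng = np.random.default_rng(4)
n = 120
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),
                     z + 0.05 * rng.normal(size=n)])   # two nearly collinear regressors
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

X_raised = raise_column(X, j=0, lam=3.0)
for label, M in (("original", X), ("raised  ", X_raised)):
    beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), M]), y, rcond=None)
    print(label, "coefficients:", beta.round(3), "VIFs:", vif(M).round(1))
# The VIFs drop sharply after raising, although not necessarily below the usual
# threshold of 10, which is precisely the point analyzed in Section 3.
```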
On the other hand, the transformation of variables is common when strong collinearity exists in a linear model. The transformation to unit length (see Belsley et al. [32]) or standardization (see Marquardt [27]) is typical. Although the VIF is invariant to these transformations when it is calculated after estimation by OLS (see García et al. [26]), this invariance is not guaranteed in the case of the raise regression or in ridge regression, as shown by García et al. [26]. The analysis of this fact is one of the goals of this paper.
Finally, since the raise estimator is biased, it is interesting to calculate its mean square error (MSE). It is studied whether the MSE of the raise regression is less than the one obtained by OLS. In that case, this study could be used to select an adequate raising factor, similarly to what is proposed by Hoerl et al. [33] for ridge regression. Note that estimators with an MSE less than that of the OLS estimators are traditionally preferred (see, for example, Stein [34], James and Stein [35], Hoerl and Kennard [4], Ohtani [36], or Hubert et al. [37]). In addition, this measure allows us to conclude whether the raise regression is preferable, in terms of MSE, to other alternative techniques.
The structure of the paper is as follows: Section 2 briefly describes the VIF and the raise regression, and Section 3 extends the VIF to this methodology. Some desirable properties of the VIF are analyzed, and its asymptotic behavior is studied. It is also concluded that the VIF is invariant to data transformation. Section 4 calculates the MSE of the raise estimator, showing that there is a minimum value that is less than the MSE of the OLS estimator. Section 5 illustrates the contribution of this paper with two numerical examples. Finally, Section 6 summarizes the main conclusions of this paper.
4. MSE for Raise Regression
Since the raise estimator obtained from Equation (5) is biased, it is interesting to study its Mean Square Error (MSE).
Taking into account that, for
,
it is obtained that matrix
of the expression in Equation (
5) can be rewritten as
, where
Thus, the estimator of $\boldsymbol{\beta}$ obtained from Equation (5) is biased unless the raising factor is zero, that is to say, unless the raise regression coincides with OLS. Moreover,
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix.
In that case, the MSE for raise regression is
where
.
We can obtain the MSE from the estimated values of $\boldsymbol{\beta}$ and $\sigma^{2}$ obtained from the model in Equation (2).
On the other hand, once the estimations are obtained and taking into account Appendix C, there is a value of the raising factor that minimizes the MSE. Indeed, it is verified that the MSE for this value is less than the MSE of the OLS estimator; that is to say, if the goal is exclusively to minimize the MSE (as in the work presented by Hoerl et al. [33]), this value should be selected as the raising factor.
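A rough Monte Carlo sketch of this comparison is given below. It is only an illustration under an assumed simulated design (not the analytical expressions of the paper): it evaluates the empirical MSE of the raise estimator over a grid of raising factors, with a raising factor of zero corresponding to OLS; the helper functions are restated so that the sketch is self-contained.

```python
# Monte Carlo sketch: empirical MSE of the raise estimator across raising factors.
import numpy as np

def ols(X, y):
    b, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)
    return b

def raise_column(X, j, lam):
    A = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    out = X.copy()
    out[:, j] = X[:, j] + lam * (X[:, j] - A @ b)
    return out

rng = np.random.default_rng(5)
n, reps = 60, 2000
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),
                     z + 0.05 * rng.normal(size=n)])    # fixed, highly collinear design
beta_true = np.array([0.0, 1.0, 1.0])                    # intercept, beta_1, beta_2

lambdas = [0.0, 0.5, 1.0, 2.0, 5.0]                      # lambda = 0 reproduces OLS
mse = dict.fromkeys(lambdas, 0.0)
for _ in range(reps):
    y = np.column_stack([np.ones(n), X]) @ beta_true + rng.normal(size=n)
    for lam in lambdas:
        b = ols(raise_column(X, 0, lam), y)
        mse[lam] += np.sum((b - beta_true) ** 2) / reps

for lam, value in mse.items():
    print(f"raising factor {lam}: empirical MSE = {value:.3f}")
```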
Finally, note that, if , then for all .
6. Conclusions
The Variance Inflation Factor (VIF) is, together with the Condition Number (CN), one of the most applied measures to diagnose collinearity. Once collinearity is detected, different methodologies can be applied, for example, the raise regression, but it is then necessary to check whether the methodology has effectively mitigated the collinearity. This paper extends the concept of the VIF so that it can be applied after the raise regression and presents an expression of the VIF that verifies the following desirable properties (see García et al. [26]):
continuous at zero, that is to say, when the raising factor ($\lambda$) is zero, the VIF obtained in the raise regression coincides with the one obtained by OLS;
decreasing as a function of the raising factor ($\lambda$), that is to say, the degree of collinearity diminishes as $\lambda$ increases, and
always equal to or higher than 1.
The paper also shows that the VIF in the raise regression is invariant to scale transformations of the data, which are very common when working with models with collinearity. Thus, it yields identical results regardless of whether the model is estimated with unstandardized or standardized predictors. On the contrary, the VIFs obtained from other penalized regressions (ridge regression, Lasso, and elastic net) are not scale invariant and hence yield different results depending on the predictor scaling used.
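This difference can be checked directly. The sketch below (illustrative only, using scikit-learn with simulated data and an arbitrary penalty, with ridge standing in for the penalized family) rescales one predictor and compares the fitted values of OLS and of a ridge fit: only the penalized fit changes.

```python
# Sketch: OLS fitted values are unaffected by rescaling a predictor, whereas a
# penalized fit (ridge here) changes with the scaling.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=100)

X_scaled = X.copy()
X_scaled[:, 0] *= 100.0          # rescale the first predictor only

for model in (LinearRegression(), Ridge(alpha=1.0)):
    fit_raw = model.fit(X, y).predict(X)
    fit_scaled = model.fit(X_scaled, y).predict(X_scaled)
    print(type(model).__name__,
          "max change in fitted values:",
          np.max(np.abs(fit_raw - fit_scaled)))
# LinearRegression: essentially zero; Ridge: clearly non-zero.
```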
Another contribution of this paper is the analysis of the asymptotic behavior of the VIF associated with the raised variable (verifying that its limit is equal to 1) and with the rest of the variables (which present a horizontal asymptote). This analysis allows us to conclude the following:
It is possible to know a priori how far each of the VIFs can decrease simply by calculating its horizontal asymptote. This could be used as a criterion to select the variable to be raised, choosing the one with the lowest horizontal asymptote.
If the asymptote is under the threshold established as worrying, the extension of the VIF can be applied to select the raising factor, choosing a value for which the VIF falls below that threshold (see the sketch after this list).
It is possible that the collinearity is not mitigated for any value of the raising factor. This can happen when at least one horizontal asymptote is greater than the threshold. In that case, a second variable has to be raised. García and Ramírez [42] and García et al. [31] show the successive raising procedure.
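The two selection rules above can be sketched as follows. This is only an illustration on simulated data, restating the VIF and raising helpers for self-containment; the horizontal asymptotes are approximated numerically by evaluating the VIFs at a very large raising factor, and the usual threshold of 10 is assumed.

```python
# Sketch: pick the variable whose raising leaves the lowest worst-case VIF,
# then pick the smallest raising factor on a grid that brings every VIF below 10.
import numpy as np

def vif(X):
    n, p = X.shape
    out = []
    for j in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        r = X[:, j] - A @ b
        out.append(np.sum((X[:, j] - X[:, j].mean()) ** 2) / (r @ r))
    return np.array(out)

def raise_column(X, j, lam):
    A = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    out = X.copy()
    out[:, j] = X[:, j] + lam * (X[:, j] - A @ b)
    return out

rng = np.random.default_rng(6)
z = rng.normal(size=150)
X = np.column_stack([z + 0.1 * rng.normal(size=150),
                     z + 0.1 * rng.normal(size=150),
                     z + 0.5 * rng.normal(size=150)])

# 1) approximate the largest remaining VIF asymptote for each candidate variable
candidates = {j: vif(raise_column(X, j, 1e6)).max() for j in range(X.shape[1])}
print("approx. asymptotic max VIF per candidate:",
      {j: round(v, 1) for j, v in candidates.items()})
best_j = min(candidates, key=candidates.get)
print("variable to raise:", best_j)

# 2) smallest raising factor on a grid whose VIFs are all below the threshold
for lam in np.arange(0.0, 50.0, 0.5):
    if vif(raise_column(X, best_j, lam)).max() < 10:
        print("selected raising factor:", lam)
        break
else:
    print("no raising factor on the grid brings every VIF below 10")
```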
On the other hand, since the raise estimator is biased, the paper analyzes its Mean Square Error (MSE), showing that there is a value of the raising factor that minimizes it and for which the MSE is lower than the one obtained by OLS. However, it is not guaranteed that the VIF for this value of the raising factor falls below the established thresholds. The results are illustrated with two numerical examples; in the second one, the results obtained by OLS are compared to those obtained with the raise, ridge, and Lasso regressions, which are widely applied to estimate models with worrying multicollinearity. It is shown that the raise regression can compete with and even outperform these methodologies.
Finally, we propose the following questions as future lines of research:
The examples showed that the coefficients of variation increase after raising the variables. This fact is associated with an increase in the variability of the variable and, consequently, with a decrease in the near nonessential multicollinearity. Although a deeper analysis is required, it seems that the raise regression mitigates this kind of near multicollinearity.
The value of the ridge factor traditionally applied (the one proposed by Hoerl et al. [33]) leads to estimators with a smaller MSE than the OLS estimators with probability greater than 0.5. In contrast, the value of the raising factor obtained in Section 4 always leads to estimators with a smaller MSE than the OLS estimators. Thus, it is deduced that ridge regression provides estimators with an MSE higher than that of the OLS estimators with probability lower than 0.5. These questions seem to indicate that, in terms of MSE, the raise regression can present better behaviour than ridge regression. However, the confirmation of this judgment will require a more complete analysis, including other aspects such as interpretability and inference.