1. Introduction
Regression analysis is one of the most common methods of analysis and modeling, recognized for its practicality in forecasting and estimation; Lipovetsky and Conklin [1] have pointed out that it is useful in several ways. The ordinary least-squares (OLS) method is often favored for parameter estimation in regression analysis due to its appealing mathematical properties and computational ease. However, the OLS method works optimally only under specific circumstances, particularly when predictors are orthogonal. In real-world applications, predictors are often highly correlated, leading to multicollinearity problems in regression models. This multicollinearity makes OLS estimates less efficient and can cause important variables to have statistically insignificant regression coefficients. Moreover, it decreases statistical power and produces wider confidence intervals for the regression coefficients. To assess the extent of collinearity among predictors, the condition number (CN) of the $X'X$ matrix, expressed as $\mathrm{CN} = \lambda_{\max}/\lambda_{\min}$ (the ratio of the largest to the smallest eigenvalue of $X'X$), is frequently used. As Belsley et al. [2] described, collinearity is considered negligible when CN ≤ 10, moderate to strong when 10 < CN ≤ 30, and severe when CN > 30. The ridge regression technique was proposed to address multicollinearity in regression models [3,4]. Consider the following linear regression model:
$$y = X\beta + \varepsilon, \quad (1)$$

where $y$ is an $n \times 1$ vector of responses, $X$ is an $n \times p$ design matrix of predictors, $\beta$ is a $p \times 1$ vector of unknown regression coefficients, and $\varepsilon$ is an $n \times 1$ vector of error terms assumed to follow a multivariate normal distribution with a mean vector of $0$ and a variance–covariance matrix of $\sigma^2 I_n$. In this context, $I_n$ denotes the identity matrix of order $n$. The OLS estimates are calculated as follows:

$$\hat{\beta}_{\mathrm{OLS}} = (X'X)^{-1}X'y. \quad (2)$$
The ridge regression estimates are given by the following:
$$\hat{\beta}_{\mathrm{ridge}} = (X'X + kI)^{-1}X'y, \quad (3)$$

where $I$ is a $p \times p$ identity matrix and $k$ is a positive scalar. From Equation (3), it is evident that the CN of the $(X'X + kI)$ matrix decreases as $k$ increases; however, the introduction of $k$ introduces bias in the ridge estimators in exchange for reduced variances of the regression coefficients. Thus, the key challenge in ridge regression is selecting an optimal value of $k$ that provides the best bias–variance trade-off. The effectiveness of ridge estimators is influenced by various factors, including the degree of correlation among predictors, the variance of errors, the number of predictors, and the sample size. Because data characteristics can vary widely, no single ridge estimator consistently excels in every situation.
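To make the bias–variance mechanism concrete, the following Python sketch (illustrative only; the sample size, number of predictors, correlation level, and the value of $k$ are arbitrary assumptions rather than settings used in this paper) generates correlated predictors, computes the OLS and ridge estimates of Equations (2) and (3), and shows how adding $k$ to the diagonal of $X'X$ lowers its condition number.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative settings only (not the paper's simulation design).
n, p, rho = 50, 4, 0.95

# Correlated predictors: each column mixes an idiosyncratic and a shared component.
z = rng.standard_normal((n, p + 1))
X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
y = X @ np.ones(p) + rng.standard_normal(n)

XtX = X.T @ X
eig = np.linalg.eigvalsh(XtX)
print("CN of X'X      :", eig.max() / eig.min())

# OLS estimate, Equation (2)
beta_ols = np.linalg.solve(XtX, X.T @ y)

# Ridge estimate, Equation (3); k is an arbitrary illustrative value
k = 0.5
beta_ridge = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)
print("CN of X'X + kI :", (eig.max() + k) / (eig.min() + k))
```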
Researchers have explored various methods to identify the optimal $k$ value for ridge estimators. Ref. [5] developed a generalized ridge estimator and conducted comparisons with simple ridge and rank procedure methods, concluding that, while the generalized ridge estimator may have potential advantages, simple ridge and rank procedures are sufficiently adaptable for practical use. Hoerl et al. [6] introduced a novel ridge estimator based on the harmonic mean of the individual ridge parameters and showed through simulation studies that it performed better than OLS. Ref. [7] proposed an improvement to the ridge method by developing quantile-based ridge estimators for $k$, which demonstrated superior performance over existing ridge estimators and OLS in their simulations. Addressing the challenge of selecting the $k$ value in one-parameter ridge estimators, Lipovetsky and Conklin [1] introduced a two-parameter ridge estimator (TPRE), providing a more refined approach to overcome the limitations of traditional methods. The ridge estimator for the coefficients can be represented in a generalized form as follows:
$$\hat{\beta}_{k,q} = q\,(X'X + kI)^{-1}X'y, \quad (4)$$

where $q$ is a scaling factor, $X$ is the matrix of predictors, $y$ is the vector of responses, $k$ is the ridge parameter, and $I$ is the identity matrix.
It is important to note that Equation (4) can be seen as a general form of Equations (2) and (3): it reduces to Equation (2) when $q = 1$ and $k = 0$, and to Equation (3) when $q = 1$. Subsequently, different researchers have suggested modifications to the two-parameter ridge regression model; for details, see refs. [8,9,10,11,12].
In the Lipovetsky–Conklin approach, the value of $q$ was determined by maximizing the R-squared value, while the parameter $k$ was computed by minimizing the mean squared error (MSE). They found that their two-parameter ridge regression model had lower MSE and provided better orthogonality between predicted values and residuals compared to the one-parameter model. This model has since been refined by other researchers. For instance, Lipovetsky [13] further investigated the properties of the two-parameter ridge model, and ref. [8] optimized the tuning parameters $k$ and $q$, comparing their performance against OLS, one-parameter ridge estimators, and contraction estimators using the matrix MSE criterion. Refs. [9,10] introduced and developed three new variations of the two-parameter ridge estimators. More recently, Khan et al. [14] introduced six novel two-parameter ridge estimators and benchmarked them against the existing two-parameter ridge, one-parameter ridge, and OLS estimators. Although these estimators show superior performance under specific conditions, no single estimator consistently outperforms the others across all scenarios. Lukman and Ayinde [15] conducted a comprehensive review and classification of various techniques used for estimating ridge parameters. Additionally, Lukman, Ayinde, and Ajiboye [16] performed a Monte Carlo analysis of different estimators based on classification methods for ridge parameters. Lipovetsky and Conklin [1] introduced their two-parameter ridge (LCTPR) estimator to improve the fit of ridge regression models by using two parameters instead of one. The effectiveness of ridge estimators is influenced by several factors, including the degree of multicollinearity, error variance, the number of predictors, and sample size, and their performance can decline under stringent conditions.
The previous literature shows that, while most of the estimators are efficient under certain conditions, none of the ridge estimators dominates the others in all situations. Their efficiency is reduced by strong multicollinearity, high error variability, a large number of predictors, and a small sample size. To address these issues, we propose six enhanced two-parameter ridge estimators aimed at effectively tackling severe multicollinearity. These new ridge parameters are formulated based on the optimal selection of the ridge parameters $k$ and $q$.
The effectiveness of the proposed ridge parameters is assessed through a Monte Carlo simulation study and the analysis of a real dataset where the independent variables are correlated with each other. This article is structured as follows: Section 2 details the methodology of ridge regression and presents our newly proposed ridge parameters alongside a brief overview of several existing ridge estimators; Section 3 describes the simulation study and interprets the findings; Section 4 demonstrates the application of the proposed new ridge parameters using a real dataset; finally, Section 5 offers concluding remarks and insights.
2. Methodology
To simplify the mathematical representation, model (1) can be reformulated into a canonical or orthogonal form as follows:

$$y = Z\alpha + \varepsilon. \quad (6)$$

Equation (6) represents the model reformulated in canonical form, where $Z = XD$, $\alpha = D'\beta$, and $D'D = DD' = I_p$. Here, $D$ represents an orthogonal matrix containing the eigenvectors of the $X'X$ matrix, while $I_p$ denotes the identity matrix. Moreover, $Z'Z = D'X'XD = \Lambda$, with $\Lambda$ defined as $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, where $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_p$ are the positive eigenvalues of the $X'X$ matrix, ordered from smallest to largest.
By using this transformation, Equations (2)–(4) can be rewritten in their canonical forms as follows:

$$\hat{\alpha}_{\mathrm{OLS}} = \Lambda^{-1}Z'y, \quad (7)$$

$$\hat{\alpha}_{k} = (\Lambda + kI_p)^{-1}Z'y, \quad (8)$$

$$\hat{\alpha}_{k,q} = q\,(\Lambda + kI_p)^{-1}Z'y. \quad (9)$$

Equation (7) provides the canonical form of the ordinary least squares estimator in the transformed model. Equation (8) is the ridge estimator in its canonical form, which incorporates the regularization parameter $k$. Equation (9) is the two-parameter ridge estimator, which generalizes ridge regression by introducing the additional scaling parameter $q$.
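To illustrate the canonical transformation, a minimal Python sketch follows; the data and the tuning values $k$ and $q$ are placeholders, not quantities from this study.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.standard_normal((n, p))              # placeholder predictors
y = X @ np.ones(p) + rng.standard_normal(n)  # placeholder response

# Canonical form: Z = XD, alpha = D'beta, Z'Z = Lambda (eigenvalues of X'X).
lam, D = np.linalg.eigh(X.T @ X)             # ascending order, as in the text
Z = X @ D
Zty = Z.T @ y

k, q = 0.5, 0.9                              # hypothetical tuning values

alpha_ols = Zty / lam                        # Equation (7)
alpha_ridge = Zty / (lam + k)                # Equation (8)
alpha_tpr = q * Zty / (lam + k)              # Equation (9)

beta_tpr = D @ alpha_tpr                     # back to the original coefficients
```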
2.1. Existing Ridge Parameters
The following are established ridge parameters.
- i. Hoerl and Kennard parameter
- ii. Hoerl, Kennard, and Baldwin (HKB) ridge parameter
- iii. Kibria parameters
Ref. [17] proposed three ridge parameters by extending the research of [6].
- iv. Khalaf, Mansson, and Shukur (KMS) parameter
- v. Toker and Kaciranlar two-parameter ridge parameters
In the above equation, $\hat{k}$ is defined as in Equation (10).
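The displayed formulas for these classical parameters are not reproduced above; the sketch below therefore assumes the standard definitions from the cited literature (the Hoerl–Kennard parameter $\hat{\sigma}^2/\hat{\alpha}_{\max}^2$, the HKB parameter $p\hat{\sigma}^2/\sum_i \hat{\alpha}_i^2$, and Kibria's arithmetic-mean, geometric-mean, and median rules based on $\hat{\sigma}^2/\hat{\alpha}_i^2$), which may differ in detail from the exact expressions used in this paper.

```python
import numpy as np

def classical_ridge_parameters(X, y):
    """One-parameter ridge k's computed in the canonical form.

    Standard textbook definitions (HK, HKB, and Kibria's AM/GM/median rules);
    these are assumptions, not necessarily the exact expressions of this paper.
    """
    n, p = X.shape
    lam, D = np.linalg.eigh(X.T @ X)
    Z = X @ D
    alpha = (Z.T @ y) / lam                   # canonical OLS estimates
    resid = y - Z @ alpha
    sigma2 = resid @ resid / (n - p)          # error-variance estimate
    ratios = sigma2 / alpha**2
    return {
        "HK": sigma2 / np.max(alpha**2),
        "HKB": p * sigma2 / np.sum(alpha**2),
        "Kibria_AM": np.mean(ratios),
        "Kibria_GM": sigma2 / np.prod(alpha**2) ** (1 / p),
        "Kibria_MED": np.median(ratios),
    }
```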
2.2. Proposed Ridge Parameters
In this study, we introduce six modified Lipovetsky–Conklin ridge (MLCR) estimators, NATPR1, NATPR2, NATPR3, NATPR4, NATPR5, and NATPR6, which combine the approaches from refs. [1,14] to optimize the values of $k$ and $q$.
The values of k for our new proposed ridge parameters are determined as follows:
The first proposed value of $k$ is computed using the cubic mean across all predictors, scaled by the variance ratio; this approach emphasizes the robustness of larger coefficients while controlling for the overall variance. The second is derived using the fourth-root mean, providing an alternative weighting scheme that further emphasizes larger coefficients and potentially improves model stability in high-dimensional settings. The third selects the minimum of the predictor-wise values, focusing on the most conservative regularization parameter that balances bias and variance effectively. The fourth takes the corresponding maximum value, maximizing regularization to constrain the influence of variables with large coefficients. The fifth, defined as the square root of the mean scaled by the variance ratio, offers a balanced approach between extreme regularization and no regularization. The sixth uses the cube root of the maximum ratio, targeting a moderate regularization that is neither too conservative nor too aggressive.
These estimators are modifications of the HK ridge estimator of Equation (10), in which regularization is improved through different weightings of the coefficients. They also seek to enhance model stability and flexibility, especially when working with high-dimensional data structures, by handling large coefficients and variances through a range of mathematical transformations and scalings.
The $q$ values are computed using Equation (5), with the corresponding $k$ values shown in Equations (18)–(23). Based on these corresponding $k$ and $q$ values, six optimized two-parameter ridge estimators are derived, referred to as NATPR1, NATPR2, NATPR3, NATPR4, NATPR5, and NATPR6 in this research.
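Equation (5), the closed-form choice of $q$, is not reproduced above. As a stand-in, the following sketch follows the verbal description of the Lipovetsky–Conklin approach and, for a given $k$, picks the $q$ that maximizes the R-squared of the fitted values (equivalently, minimizes the residual sum of squares over $q$); this is an illustrative assumption, not necessarily the exact formula used in this paper.

```python
import numpy as np

def two_parameter_ridge(X, y, k):
    """Two-parameter ridge fit: beta_hat = q * (X'X + kI)^{-1} X'y.

    For a fixed k, q is chosen to maximize R-squared, i.e. to minimize
    ||y - q * X beta_ridge||^2, which has the closed-form rescaling below.
    This follows the verbal description of the Lipovetsky-Conklin approach
    and is an illustrative assumption, not Equation (5) verbatim.
    """
    p = X.shape[1]
    beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    fitted = X @ beta_ridge
    q = (y @ fitted) / (fitted @ fitted)      # least-squares rescaling factor
    return q, q * beta_ridge
```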
In the following section, we present a simulation study to evaluate the performance of the proposed ridge parameters in comparison to existing ones.
3. Monte Carlo Simulation
This section presents the results of the Monte Carlo simulation study used to compare the performance of the new ridge parameters with existing ones. The subsequent subsections describe the simulation approach and the employed algorithm. Monte Carlo simulation is a computational approach that evaluates statistical models by repeated sampling with random numbers; replicated simulations are used to assess properties such as bias, variance, and mean squared error in order to analyze the efficiency of the different methods, such as the ridge regression estimators proposed in this context.
3.1. Simulation Technique
In this study, predictors are generated using Equation (24), which takes into account varying degrees of collinearity among them, as described by [17,18], as follows:

$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho\, z_{i(p+1)}, \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, p, \quad (24)$$

where $\rho$ represents the pairwise correlation among predictors, $p$ is the number of predictors, $n$ is the sample size, and $z_{ij}$ are pseudo-random numbers drawn from a standard normal distribution. This study examines $\rho$ values of 0.80, 0.90, 0.95, and 0.99; sample sizes $n$ of 20, 50, and 100; and predictor counts $p$ of 4 and 10.
The response variable is generated using the following equation:

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

In this formula, $\varepsilon_i$ is a random error term drawn from a normal distribution with mean 0 and variance $\sigma^2$. This study considers four values of $\sigma^2$: 0.40, 0.90, 4, and 10. The regression coefficients $\beta_1, \beta_2, \ldots, \beta_p$ are set based on the most favorable direction, as outlined by Halawa et al. [19], with the intercept $\beta_0$ set to zero.
Hence, the estimated mean squared error (EMSE) is computed from 5000 replications as follows:

$$\mathrm{EMSE}(\hat{\alpha}) = \frac{1}{5000}\sum_{r=1}^{5000}\left(\hat{\alpha}_{(r)} - \alpha\right)'\left(\hat{\alpha}_{(r)} - \alpha\right),$$

where $\hat{\alpha}_{(r)}$ denotes the estimate obtained in the $r$th replication. All computations were carried out using RStudio version 2022.12.0.
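The paper's computations were performed in R; the following Python sketch re-implements the simulation loop for a single scenario to show how the EMSE is accumulated over replications. The true coefficient vector is simplified to a fixed unit vector rather than the "most favorable direction" of Halawa et al. [19], so this is an illustration of the procedure, not a reproduction of the reported results.

```python
import numpy as np

def simulate_emse(estimator, n=50, p=4, rho=0.95, sigma2=0.9, reps=5000, seed=0):
    """Monte Carlo EMSE of an estimator under the design of Equation (24).

    Python sketch only (the paper's computations were run in R); the true
    coefficient vector is a fixed unit vector, a simplification of the
    'most favorable direction' used in the paper.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(p) / np.sqrt(p)
    total = 0.0
    for _ in range(reps):
        z = rng.standard_normal((n, p + 1))
        X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]       # Equation (24)
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)   # response with N(0, sigma2) errors
        beta_hat = estimator(X, y)
        total += np.sum((beta_hat - beta) ** 2)
    return total / reps

# Example: EMSE of OLS in a single scenario
ols = lambda X, y: np.linalg.solve(X.T @ X, X.T @ y)
print(simulate_emse(ols))
```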
3.2. Performance Evaluation Criteria
The estimators' performance is evaluated using the mean squared error (MSE) criterion, based on approaches from previous research, including those by [4,18], and using Equations (22)–(27). The estimated MSE for any estimator $\hat{\alpha}$ of the parameter $\alpha$ is defined as follows:

$$\mathrm{MSE}(\hat{\alpha}) = E\left[(\hat{\alpha} - \alpha)'(\hat{\alpha} - \alpha)\right].$$

This expression calculates the expected value of the squared deviation between the estimator $\hat{\alpha}$ and the true parameter $\alpha$. It serves as a metric for evaluating the estimator's accuracy and precision, reflecting its ability to provide reliable estimates close to the actual parameter values.
3.3. Analysis and Findings
The estimated MSE values are outlined in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10, Table A11 and Table A12, available in Appendix A. Our simulation study revealed the following findings:
Across various sample sizes, error variances, and numbers of predictor variables, our new ridge parameters consistently exhibited the lowest estimated MSE under most simulation scenarios. Notably, in cases of severe multicollinearity (ρ = 0.99), the ridge estimators MTPR3, NATPR1, and NATPR2 outperformed the existing estimators.
These findings illustrate the effectiveness of the new ridge parameters in addressing strong multicollinearity. In contrast, the OLS estimator demonstrates the poorest performance under multicollinear conditions when compared to ridge estimators.
As multicollinearity intensifies, the estimated MSE for OLS and many ridge estimators tends to increase. However, an interesting observation is that the MSE of new ridge parameters decreases with higher multicollinearity, suggesting their robustness against such issues.
When the error variance increases, the estimated MSE of all estimators generally rises, regardless of sample size, multicollinearity level, or the number of predictors. Despite this, new ridge parameters maintain a stable MSE, exhibiting only a modest increase compared to the OLS and many existing ridge estimators, showcasing their resilience against higher error variances amidst multicollinearity.
Increasing the sample size results in a decrease in estimated MSE for all estimators, which aligns with general statistical principles. However, ridge regression estimators show markedly better performance than OLS across all sample sizes.
As the number of predictors increases, the estimated MSE for all estimators rises, with the OLS estimator showing a more rapid increase compared to ridge estimators.
Our results, displayed in various tables, reveal that NATPR2 frequently achieves the lowest EMSE. Moreover, MTPR3, NATPR1, and NATPR2 consistently perform well across different levels of $\rho$.
The simulation results confirm that ridge parameters consistently outperform OLS in the presence of multicollinearity. Furthermore, among the ridge estimators examined, the two-parameter variants are superior to the one-parameter versions. Notably, our new ridge estimators, especially NATPR2, generally outperform existing methods in most scenarios considered.
Table 1 offers an in-depth analysis of the simulation study, encapsulating 96 distinct scenarios to evaluate the performance of various ridge parameters. Among the ridge parameters tested, NATPR2 consistently demonstrated superior performance by recording the lowest MSE in most scenarios. This consistent outperformance highlights NATPR2’s robustness, particularly under stringent or challenging conditions, making it a standout choice compared to other estimators.
Table 1 serves as a practical guide for selecting the most appropriate estimator based on specific conditions, which include varying sample sizes, error variances, and numbers of predictors $p$ (4 and 10). The analysis is organized around three sample sizes—20, 50, and 100—and considers four distinct error variances: 0.4, 0.9, 4, and 10. For each combination of sample size, error variance, and level of $\rho$, the table recommends the most suitable estimator.
NATPR2 is frequently the recommended ridge estimator across all sample sizes and across various error variances and pairwise correlations. As a result, NATPR2 stands out as the most dependable parameter for minimizing MSE, making it the favored option under the diverse multicollinearity conditions considered. NATPR2 frequently achieves the lowest EMSE, and MTPR3, NATPR1, and NATPR2 consistently perform well across different levels of $\rho$, sample sizes, and error variances.
4. Real-Life Data Analysis
In this section, we demonstrate the application of the new ridge parameters by analyzing a dataset representing Pakistan's GDP growth. This dataset, summarized in Table 2, includes observations spanning 14 years, from the financial year 2007–2008 to 2020–2021. The response variable $y$ denotes GDP growth. The predictor variables are as follows: $X_1$: Consumer Price Index, $X_2$: tax-to-GDP ratio, $X_3$: savings-to-GDP ratio, $X_4$: investment-to-GDP ratio, $X_5$: milk production, $X_6$: meat production, $X_7$: fish production, and $X_8$: poultry production. These data points are sourced from the Economic Survey of Pakistan, Statistical Supplement [20]. These data are modeled using a linear regression approach represented by the following equation:
$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_8 X_8 + \varepsilon.$$

The condition number of the $X'X$ matrix, calculated to be 3,937,104, indicates severe multicollinearity within the dataset. This analysis is based on the data and methodology outlined by [21], providing insights into the factors influencing Pakistan's GDP growth.
In Figure 1, high positive relationships are observed between several pairs of variables, with correlation coefficients of 0.75, 0.73, and 0.71. These very high correlation coefficients imply direct proportional relationships among the corresponding predictors. Other correlations are moderate or weak, indicating varying degrees of linear relationships between the variables.
In the real dataset, the calculated eigenvalues of the $X'X$ matrix are 4.134289315, 1.900981733, 1.022049275, 0.58361122, 0.30723455, 0.051004124, 0.000828734, and one further, much smaller eigenvalue; the magnitude of the largest eigenvalue is 4.134289315. The condition number is calculated as follows:

$$\mathrm{CN} = \frac{\lambda_{\max}}{\lambda_{\min}} = 3{,}937{,}104.$$
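For completeness, the condition number can be obtained directly from the eigenvalues of $X'X$; a minimal helper, assuming the predictors of Table 2 are supplied as the columns of a numeric matrix, is sketched below.

```python
import numpy as np

def condition_number(X):
    """Condition number of X'X: ratio of its largest to smallest eigenvalue."""
    X = np.asarray(X, dtype=float)
    eig = np.linalg.eigvalsh(X.T @ X)
    return eig.max() / eig.min()
```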
The Variance Inflation Factor (VIF) is a statistic that shows how much the variance of a regression coefficient is inflated because of the collinearity of the independent variables. The VIF is calculated as follows:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2},$$

where $R_j^2$ is the coefficient of determination obtained by regressing the $j$th predictor on the remaining predictors. Tolerance is the reciprocal of the VIF and also indicates multicollinearity. Commonly used guidelines are listed below, followed by a short computational sketch.
- VIF = 1: the predictor is uncorrelated with the other independent variables.
- 1 < VIF < 5: moderate correlation; generally considered acceptable.
- 5 ≤ VIF < 10: high correlation; potentially problematic.
- VIF ≥ 10: severe multicollinearity.
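A minimal sketch of the VIF computation referred to above (assuming the predictors are supplied as the columns of a numeric matrix; the tolerance is simply the reciprocal of each VIF):

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
    from regressing the j-th predictor on the remaining predictors."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2_j = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out   # tolerance is 1 / vif(X)
```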
The VIF analysis reveals that several predictor variables, particularly $X_5$, $X_6$, and $X_8$, exhibit severe multicollinearity, as indicated by their very high VIF values. In this case, the use of ridge estimators may be required. Higher VIF values imply that multicollinearity may affect the regression results and obscure the effects of the individual predictors [20]. Ridge regression is a common technique that applies a penalty to the size of the coefficients; this method is efficient in managing multicollinearity by shrinking the coefficients of correlated predictors. This approach helps stabilize the model estimates and enhances the clarity of the conclusions drawn from the regression analysis. The proposed new ridge estimators and the existing estimators were therefore applied to reduce the effects of multicollinearity, leading to more accurate and efficient models.
Table 3 compares the MSE for the various estimators, highlighting their effectiveness in handling multicollinearity. The OLS estimator has the highest MSE (4262.71), indicating poor performance in the presence of multicollinearity, whereas ridge estimators such as MTPR3, NATPR1, and NATPR2 have the lowest MSE, demonstrating superior capability. The regression coefficients reveal how different estimators adjust predictor influence: OLS estimates are less stable, while ridge estimators show more consistent and reliable coefficients. Overall, NATPR2 effectively reduces MSE and provides stable coefficient estimates, underscoring its suitability for addressing multicollinearity.
The results show that the new ridge parameters consistently have lower MSE compared to all existing ones. Furthermore, while most of the ridge parameters demonstrate similar performance levels, they significantly surpass OLS in reducing MSE. The analysis of real-world data further supports these findings, with the MSE values for the proposed estimators (highlighted in bold) being notably lower compared to other ridge estimators.
The performance of ridge estimators varies significantly based on the MSE and the coefficients of the predictors. Among the new ridge parameters, NATPR2 and NATPR6 stand out as some of the best performing in terms of MSE. The interpretation of the coefficients suggests that different estimators highlight the importance of various predictors, with some exhibiting extreme values that indicate significant impacts. These variations emphasize the importance of selecting an appropriate estimator tailored to the specific context and characteristics of the dataset to effectively address multicollinearity.