Article

Mitigating Multicollinearity in Regression: A Study on Improved Ridge Estimators

by Nadeem Akhtar 1,*, Muteb Faraj Alharthi 2 and Muhammad Shakir Khan 3
1 Higher Education Department, Peshawar 26281, Khyber Pakhtunkhwa, Pakistan
2 Department of Mathematics and Statistics, College of Science, Taif University, Taif 21944, Saudi Arabia
3 Directorate General Livestock & Dairy Development Department (Research Wing) Peshawar, Peshawar 24551, Khyber Pakhtunkhwa, Pakistan
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(19), 3027; https://doi.org/10.3390/math12193027
Submission received: 8 August 2024 / Revised: 5 September 2024 / Accepted: 25 September 2024 / Published: 27 September 2024
(This article belongs to the Special Issue Application of Regression Models, Analysis and Bayesian Statistics)

Abstract

Multicollinearity, a critical issue in regression analysis that can severely compromise the stability and accuracy of parameter estimates, arises when two or more predictor variables are strongly correlated with each other. This paper addresses the problem by introducing six new, improved two-parameter ridge estimators (ITPRE): NATPR1, NATPR2, NATPR3, NATPR4, NATPR5, and NATPR6. These ITPRE are designed to mitigate multicollinearity and improve the accuracy of estimates. A comprehensive Monte Carlo simulation study using the mean squared error (MSE) criterion demonstrates that all proposed estimators effectively mitigate the effects of multicollinearity. Among them, the NATPR2 estimator consistently achieves the lowest estimated MSE, outperforming existing ridge estimators in the literature. Application of these estimators to a real-world dataset further validates their effectiveness in addressing multicollinearity, underscoring their robustness and practical relevance in improving the reliability of regression models.

1. Introduction

Regression analysis is one of the most common methods of analysis and modeling, recognized for its practicality in forecasting and estimation. Lipovetsky and Conklin [1] have pointed out that it is useful in several ways. The ordinary least-squares (OLS) method is often favored for parameter estimation in regression analysis due to its appealing mathematical properties and computational ease. However, the OLS method works optimally only under specific circumstances, particularly when predictors are orthogonal. In real-world applications, predictors are often highly correlated, leading to multicollinearity problems in regression models. Multicollinearity makes OLS estimates less efficient and can cause important variables to have statistically insignificant regression coefficients. Moreover, it decreases statistical power and produces wider confidence intervals for the regression coefficients. To assess the extent of collinearity among predictors, the condition number (CN) of the X′X matrix, defined as CN = λ_max/λ_min, is frequently used. As Belsley et al. [2] described, collinearity is considered negligible when CN ≤ 10, moderate to strong when 10 < CN ≤ 30, and severe when CN > 30. The ridge regression technique was proposed to address multicollinearity in regression models [3,4]. Consider the following linear regression model:
  y = X β + ϵ  
where y is an ( n × 1 ) vector of responses, X is an ( n × p ) design matrix of predictors, β is a ( p × 1 ) vector of unknown regression coefficients, and ϵ is an ( n × 1 ) vector of error terms assumed to follow a multivariate normal distribution with a mean vector of 0 and a variance–covariance matrix of σ 2 I n . In this context, I n denotes the identity matrix of order n. The OLS estimates are calculated as follows:
β̂_OLS = (X′X)⁻¹X′y
The ridge regression estimates are given by the following:
β̂_Ridge = (X′X + kI)⁻¹X′y
where I is a p × p identity matrix and k is a positive scalar. From Equation (3), it is evident that the CN of the X′X + kI matrix decreases as k increases; however, the introduction of k introduces bias in the ridge estimators in exchange for reduced variances of the regression coefficients. Thus, the key challenge in ridge regression is selecting an optimal value of k that provides the best bias–variance trade-off. The effectiveness of ridge estimators is influenced by various factors, including the degree of correlation among predictors, the variance of errors, the number of predictors, and the sample size. Because data characteristics can vary widely, no single ridge estimator consistently excels in every situation.
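To make the trade-off concrete, the following Python sketch (illustrative only; the paper's own computations were done in R) generates a collinear design and shows how adding k to X′X lowers the condition number while shrinking the coefficient estimates:

```python
# Illustrative sketch: effect of the ridge constant k on the condition
# number of X'X and on the size of the estimated coefficients.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
rho = 0.95
z = rng.standard_normal((n, p + 1))
# Collinear predictors: each column shares the common component z[:, p]
X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
y = X @ np.ones(p) + rng.standard_normal(n)

XtX = X.T @ X
for k in [0.0, 0.1, 1.0]:
    A = XtX + k * np.eye(p)
    cn = np.linalg.cond(A)                 # lambda_max / lambda_min
    b = np.linalg.solve(A, X.T @ y)        # ridge estimate (OLS when k = 0)
    print(f"k={k:4.1f}  CN={cn:8.1f}  ||beta_hat||={np.linalg.norm(b):.3f}")
```

The condition number drops monotonically in k, which is the sense in which ridge regularization stabilizes the estimates at the cost of bias.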
Researchers have explored various methods to identify the optimal k value for ridge estimators. Ref. [5] developed a generalized ridge estimator and compared it with simple ridge and rank procedure methods, concluding that while the generalized ridge estimator may have potential advantages, simple ridge and rank procedures are sufficiently adaptable for practical use. Hoerl et al. [6] introduced a novel ridge estimator based on the harmonic mean of σ̂²/α̂ᵢ² and showed through simulation studies that it performed better than OLS. Ref. [7] improved the ridge method by developing quantile-based ridge estimators for kᵢ, which demonstrated superior performance over existing ridge estimators and OLS in their simulations. Addressing the challenge of selecting the k value in one-parameter ridge estimators, Lipovetsky and Conklin introduced a two-parameter ridge estimator (TPRE), providing a more refined approach to overcome the limitations of traditional methods [1]. The ridge estimator for the coefficients can be represented in a generalized form as follows:
β̂(q, k) = q(X′X + kI)⁻¹X′y
where q is a scaling factor, X′X is the cross-product matrix of the predictors, y is the vector of responses, k is the ridge parameter, and I is the identity matrix. The scaling factor is estimated as follows:
q̂ = [(X′y)′(X′X + kI)⁻¹X′y] / [(X′y)′(X′X + kI)⁻¹X′X(X′X + kI)⁻¹X′y]
It is important to note that Equation (4) is a general form of Equations (2) and (3): it reduces to Equation (2) when q = 1 and k = 0, and to Equation (3) when q = 1. Subsequently, different researchers have suggested modifications to the two-parameter ridge regression model; for details, see refs. [8,9,10,11,12].
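A small Python sketch of the two-parameter estimator of Equations (4) and (5); the function name and toy data are this sketch's own, not the paper's:

```python
# Sketch of the two-parameter ridge estimator:
# beta(q, k) = q (X'X + kI)^-1 X'y, with q chosen by Equation (5).
import numpy as np

def two_param_ridge(X, y, k):
    p = X.shape[1]
    A = X.T @ X + k * np.eye(p)
    c = X.T @ y                       # X'y
    Ainv_c = np.linalg.solve(A, c)    # (X'X + kI)^-1 X'y
    q = (c @ Ainv_c) / (Ainv_c @ (X.T @ X) @ Ainv_c)   # Equation (5)
    return q * Ainv_c

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.standard_normal(30)
print(two_param_ridge(X, y, k=0.0))   # with k = 0, q = 1 and OLS is recovered
```

Setting k = 0 makes q = 1 algebraically, which is the special case noted above.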
Lipovetsky and Conklin determined the value of q by maximizing the R-squared value, while the parameter k was computed by minimizing the mean squared error (MSE). They found that their two-parameter ridge regression model had lower MSE and provided better orthogonality between predicted values and residuals compared to the one-parameter model. This model has since been refined by other researchers. For instance, Lipovetsky [13] further investigated the properties of the two-parameter ridge model, and ref. [8] optimized the tuning parameters k and q, comparing their performance against OLS, one-parameter ridge estimators, and contraction estimators using the matrix MSE criterion. Refs. [9,10] introduced and developed three new variations of the two-parameter ridge estimators. More recently, Khan et al. [14] introduced six novel two-parameter ridge estimators and benchmarked them against the existing two-parameter ridge, one-parameter ridge, and OLS estimators. Although these estimators show superior performance under specific conditions, no single estimator consistently outperforms the others across all scenarios. Lukman and Ayinde [15] conducted a comprehensive review and classification of various techniques used for estimating ridge parameters. Additionally, Lukman, Ayinde, and Ajiboye [16] performed a Monte Carlo analysis on different estimators based on classification methods for ridge parameters. Lipovetsky and Conklin [1] introduced the two-parameter ridge (LCTPR) estimator to improve the fit of ridge regression models by using two parameters instead of one.
The previous literature shows that, while most of the estimators are efficient under certain conditions, none of the ridge estimators dominates the others in all situations. Efficiency is reduced by strong multicollinearity, high error variance, a large number of predictors, and a small sample size. To address these issues, we propose six enhanced two-parameter ridge estimators aimed at effectively tackling severe multicollinearity. These new ridge parameters are formulated based on the optimal selection of the parameters q̂ and k̂.
The effectiveness of the proposed ridge parameters is assessed through a Monte Carlo simulation study and the analysis of a real dataset where the independent variables are correlated with each other. This article is structured as follows: Section 2 details the methodology of ridge regression and presents our newly proposed ridge parameters alongside a brief overview of several existing ridge estimators. Section 3 describes the simulation study and interprets the findings. Section 4 demonstrates the application of the proposed new ridge parameters using a real dataset. Finally, Section 5 offers concluding remarks and insights.

2. Methodology

To simplify the mathematical representation, model (1) can be reformulated into a canonical or orthogonal form as follows:
y = Z α + ϵ
Equation (6) represents the model reformulated in canonical form, where Z = XD, α = D′β, and D′D = I_p. Here, D is an orthogonal matrix whose columns are the eigenvectors of the X′X matrix, and I_p denotes the identity matrix of order p. Moreover, Λ = D′X′XD, with Λ defined as diag(λ₁, λ₂, …, λ_p), where λ₁, λ₂, …, λ_p are the positive eigenvalues of the X′X matrix, ordered from smallest to largest.
By using this transformation, Equations (2)–(4) can be rewritten in their canonical forms as follows:
α̂ = Λ⁻¹Z′y
Equation (7) provides the canonical form of the ordinary least squares estimator in the transformed model.
α̂(k) = (Λ + kI_p)⁻¹Z′y
Equation (8) refers to the ridge estimator in its canonical form, which incorporates the regularization parameter k.
α̂(q, k) = q(Λ + kI_p)⁻¹Z′y
Equation (9) illustrates the two-parameter ridge estimator, which generalizes the ridge regression by introducing an additional parameter q.
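The canonical form is convenient computationally because Λ is diagonal. The following Python sketch (toy data, not from the paper) verifies that the canonical ridge estimate of Equation (8) maps back to the original-coordinate ridge estimate via β̂ = Dα̂:

```python
# Sketch: canonical (eigenvector) form of ridge regression, Equations (6)-(8).
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 3))
y = rng.standard_normal(40)
k = 0.5

lam, D = np.linalg.eigh(X.T @ X)     # Lambda = D'X'XD, eigenvalues ascending
Z = X @ D                            # canonical predictors
alpha_hat = (Z.T @ y) / (lam + k)    # Equation (8): (Lambda + k I_p)^-1 Z'y
beta_hat = D @ alpha_hat             # back to the original coordinates
beta_direct = np.linalg.solve(X.T @ X + k * np.eye(3), X.T @ y)
print(np.allclose(beta_hat, beta_direct))   # True
```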

2.1. Existing Ridge Parameters

The following are established ridge parameters.
i.
Hoerl and Kennard parameter
k̂_HK = σ̂²/α̂²_max
ii.
Hoerl, Kennard, and Baldwin (HKB) ridge parameter
k̂_HKB = pσ̂² / ∑ᵢ₌₁ᵖ α̂ᵢ²
iii.
Kibria parameters
Ref. [17] proposed three ridge parameters by extending the research of [6].
k̂_AM = (1/p) ∑ᵢ₌₁ᵖ σ̂²/α̂ᵢ²
k̂_GM = σ̂² / (∏ᵢ₌₁ᵖ α̂ᵢ²)^(1/p)
k̂_Med = Median(σ̂²/α̂ᵢ²)
iv.
Khalaf, Mansson, and Shukur (KMS) parameter
k̂_KMS = λ_max ∑ᵢ₌₁ᵖ α̂ᵢ σ̂²/α̂²_max
v.
Toker and Kaciranlar two-parameter ridge parameters
q̂_opt = [∑ᵢ₌₁ᵖ α̂ᵢ²λᵢ/(λᵢ + k)] / [∑ᵢ₌₁ᵖ (σ̂²λᵢ + α̂ᵢ²λᵢ²)/(λᵢ + k)²]
In the above equation, k is defined as in Equation (10).
k̂_opt = [q̂_opt ∑ᵢ₌₁ᵖ σ̂²λᵢ + (q̂_opt − 1) ∑ᵢ₌₁ᵖ α̂ᵢ²λᵢ²] / [∑ᵢ₌₁ᵖ α̂ᵢ²λᵢ]
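For concreteness, the one-parameter rules above can be computed from the canonical OLS fit. The following Python sketch (toy data; the function name is this sketch's own) implements Equations (10)–(14):

```python
# Sketch: classical ridge parameters of Section 2.1 from the canonical fit.
import numpy as np

def classic_ridge_params(X, y):
    n, p = X.shape
    lam, D = np.linalg.eigh(X.T @ X)
    Z = X @ D
    alpha = (Z.T @ y) / lam                        # canonical OLS, Equation (7)
    resid = y - Z @ alpha
    s2 = resid @ resid / (n - p)                   # estimate of sigma^2
    r = s2 / alpha**2
    return {
        "HK":   s2 / (alpha**2).max(),             # Equation (10)
        "HKB":  p * s2 / (alpha @ alpha),          # Equation (11)
        "KAM":  r.mean(),                          # Equation (12)
        "KGM":  s2 / np.prod(alpha**2) ** (1 / p), # Equation (13)
        "KMed": np.median(r),                      # Equation (14)
    }

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))
y = X @ np.ones(4) + rng.standard_normal(50)
print(classic_ridge_params(X, y))
```

Note that k̂_HK is always the smallest of the ratios σ̂²/α̂ᵢ², so k̂_HK ≤ k̂_AM by construction.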

2.2. Proposed Ridge Parameters

In this study, we introduce six modified Lipovetsky–Conklin ridge (MLCR) estimators, NATPR1, NATPR2, NATPR3, NATPR4, NATPR5, and NATPR6, which combine the approaches from refs. [1,14] to optimize the values of k and q.
The values of k for our new proposed ridge parameters are determined as follows:
k̂₁ = [(1/p) ∑ᵢ₌₁ᵖ λᵢ/α̂ᵢ⁴]^(1/3) · (σ̂²/α̂²_max)
k̂₁ is the cube root of the average of λᵢ/α̂ᵢ⁴ across all predictors, scaled by the variance ratio σ̂²/α̂²_max. This approach emphasizes robustness against large coefficients while controlling for the overall variance.
k̂₂ = [(1/p) ∑ᵢ₌₁ᵖ λᵢ/α̂ᵢ⁵]^(1/4) · (σ̂²/α̂²_max)
k̂₂ is derived using the fourth root of the average of λᵢ/α̂ᵢ⁵, providing an alternative weighting scheme that further emphasizes larger coefficients, potentially improving model stability in high-dimensional settings.
k̂₃ = min(k̂₁*, k̂₂*, …, k̂ₚ*), where k̂ᵢ* = λᵢ/α̂ᵢ
This parameter selects the minimum value among the k̂ᵢ*, i = 1, 2, …, p, focusing on the most conservative regularization parameter that balances bias and variance effectively.
k̂₄ = max(k̂₁*, k̂₂*, …, k̂ₚ*), where k̂ᵢ* = λᵢ/α̂ᵢ, i = 1, 2, …, p
k̂₄ takes the maximum value of λᵢ/α̂ᵢ, maximizing regularization to constrain the influence of variables with large coefficients.
k̂₅ = [Mean(λᵢ/α̂ᵢ)]^(1/2) · (σ̂²/α̂²_max)
k̂₅, the square root of the mean of λᵢ/α̂ᵢ scaled by the variance ratio, offers a balanced approach between extreme regularization and no regularization.
k̂₆ = [max(λᵢ/α̂ᵢ)]^(1/3)
k̂₆ uses the cube root of the maximum ratio λᵢ/α̂ᵢ, targeting a moderate regularization that is neither too conservative nor too aggressive.
The estimators k̂₁, k̂₂, and k̂₅ are modifications of the HK ridge estimator of Equation (10), in which regularization is refined through different weightings of the coefficients. They aim to enhance model stability and flexibility, especially for high-dimensional data structures, by controlling large coefficients and variances through a range of mathematical transformations and scalings.
The q̂ values are computed using Equation (5), with the corresponding k̂ values shown in Equations (18)–(23). Based on these corresponding k̂ and q̂ values, six optimized two-parameter ridge estimators are derived, referred to as NATPR1, NATPR2, NATPR3, NATPR4, NATPR5, and NATPR6 in this research.
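Reading Equations (18)–(23) with the quantities λᵢ/α̂ᵢ interpreted as ratios, the six proposed k values can be sketched as follows. This is a tentative Python sketch under that reading (the paper's own code was in R); the exact exponents and scalings should be taken from the published formulas, and the absolute values are this sketch's own guard against negative canonical coefficients:

```python
# Tentative sketch of the proposed k values (Equations (18)-(23)),
# assuming the ratio reading lambda_i / alpha_i.
import numpy as np

def proposed_ks(lam, alpha, s2):
    ratio = lam / np.abs(alpha)             # lambda_i / alpha_i
    scale = s2 / (alpha**2).max()           # sigma^2 / alpha_max^2
    return {
        "k1": np.mean(lam / alpha**4) ** (1 / 3) * scale,
        "k2": np.mean(lam / np.abs(alpha) ** 5) ** (1 / 4) * scale,
        "k3": ratio.min(),
        "k4": ratio.max(),
        "k5": np.sqrt(ratio.mean()) * scale,
        "k6": ratio.max() ** (1 / 3),
    }

lam = np.array([3.5, 1.2, 0.6, 0.1])        # eigenvalues of X'X (toy values)
alpha = np.array([0.9, -0.4, 0.3, 0.05])    # canonical coefficients (toy)
print(proposed_ks(lam, alpha, s2=0.25))
```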
In the following section, we present a simulation study to evaluate the performance of the proposed ridge parameters in comparison to existing ones.

3. Monte Carlo Simulation

This section presents the results of the Monte Carlo simulation study used to compare the performance of the new ridge parameters with existing ones. The subsequent subsections describe the simulation approach and the employed algorithm. Monte Carlo simulation is a computational approach that evaluates statistical procedures by repeated random sampling. Across many replications, properties such as bias, variance, and mean squared error are assessed in order to compare the efficiency of the different methods, including the ridge estimators proposed here.

3.1. Simulation Technique

In this study, predictors are generated using Equation (24), which takes into account varying degrees of collinearity among them, as described by [17,18], as follows:
x_ji = √(1 − ρ²) z_ji + ρ z_j,p+1,  i = 1, 2, …, p; j = 1, 2, …, n
where ρ represents the pairwise correlation among predictors, p is the number of predictors, n is the sample size, and z_ji are pseudo-random numbers drawn from a standard normal distribution. This study examines ρ values of 0.80, 0.90, 0.95, and 0.99; sample sizes n of 20, 50, and 100; and predictor counts p of 4 and 10.
The response variable is generated using the following equation:
y_j = α₀ + α₁x_1j + α₂x_2j + ⋯ + α_p x_pj + ϵ_j,  j = 1, 2, …, n
In this formula, ϵ j is a random error term drawn from a normal distribution with mean 0 and variance σ 2 . This study considers four values of σ : 0.40, 0.90, 4, and 10. The regression coefficients α j are set based on the most favorable direction, as outlined by Halawa et al. [19], with α 0 set to zero.
Hence, the estimated mean squared error (EMSE) is computed from 5000 replications as follows:
EMSE(α̂) = (1/5000) ∑_{j=1}^{5000} (α̂_j − α)′(α̂_j − α)
All computations were carried out using R-Studio version 2022.12.0.
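A condensed Python sketch of this simulation loop (the paper's computations were in R, with 5000 replications and the full set of estimators; here only OLS and the HK rule are compared over 200 replications, and the unit-norm coefficient vector is an assumption rather than the paper's "most favorable direction"):

```python
# Sketch of the Monte Carlo loop of Section 3.1, Equations (24)-(26):
# generate collinear data, fit OLS and HK ridge, accumulate the EMSE.
import numpy as np

def simulate(n=20, p=4, rho=0.9, sigma=0.9, reps=200, seed=0):
    rng = np.random.default_rng(seed)
    alpha = np.ones(p) / np.sqrt(p)          # assumed unit-norm coefficients
    emse = {"OLS": 0.0, "HK": 0.0}
    for _ in range(reps):
        z = rng.standard_normal((n, p + 1))
        X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]   # Eq. (24)
        y = X @ alpha + sigma * rng.standard_normal(n)         # Eq. (25)
        XtX = X.T @ X
        b_ols = np.linalg.solve(XtX, X.T @ y)
        s2 = np.sum((y - X @ b_ols) ** 2) / (n - p)
        k = s2 / (b_ols**2).max()            # HK rule, applied directly
        b_hk = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)
        emse["OLS"] += np.sum((b_ols - alpha) ** 2) / reps     # Eq. (26)
        emse["HK"] += np.sum((b_hk - alpha) ** 2) / reps
    return emse

print(simulate(rho=0.99, sigma=4.0))  # ridge should beat OLS under collinearity
```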

3.2. Performance Evaluation Criteria

The estimators’ performance is evaluated using the mean squared error (MSE) criterion, based on approaches from previous research, including those by [4,18], and using Equations (26) and (27). The estimated MSE for any estimator α̂ of the parameter α is defined as follows:
MSE(α̂) = E[(α̂ − α)′(α̂ − α)]
This expression calculates the expected value of the squared deviation between the estimator α ^ and the true parameter α . It serves as a metric for evaluating the estimator’s accuracy and precision, reflecting its ability to provide reliable estimates close to the actual parameter values.

3.3. Analysis and Findings

The estimated MSE values are outlined in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10, Table A11 and Table A12, available in Appendix A. Our simulation study revealed the following findings:
Across various sample sizes, error variances, and numbers of predictor variables, our new ridge parameters consistently exhibited the lowest estimated MSE under most simulation scenarios. Notably, in cases of severe multicollinearity (ρ = 0.99), the ridge estimators MTPR3, NATPR1, and NATPR2 outperformed the existing estimators.
These findings illustrate the effectiveness of the new ridge parameters in addressing strong multicollinearity. In contrast, the OLS estimator demonstrates the poorest performance under multicollinear conditions when compared to ridge estimators.
As multicollinearity intensifies, the estimated MSE for OLS and many ridge estimators tends to increase. However, an interesting observation is that the MSE of new ridge parameters decreases with higher multicollinearity, suggesting their robustness against such issues.
When the error variance increases, the estimated MSE of all estimators generally rises, regardless of sample size, multicollinearity level, or the number of predictors. Despite this, new ridge parameters maintain a stable MSE, exhibiting only a modest increase compared to the OLS and many existing ridge estimators, showcasing their resilience against higher error variances amidst multicollinearity.
Increasing the sample size results in a decrease in estimated MSE for all estimators, which aligns with general statistical principles. However, ridge regression estimators show markedly better performance than OLS across all sample sizes.
As the number of predictors increases, the estimated MSE for all estimators rises, with the OLS estimator showing a more rapid increase compared to ridge estimators.
Our results, displayed in various tables, reveal that NATPR2 frequently achieves the lowest EMSE. Moreover, MTPR3, NATPR1, and NATPR2 consistently perform well across different levels of ρ .
The simulation results confirm that ridge parameters consistently outperform OLS in the presence of multicollinearity. Furthermore, among the ridge estimators examined, the two-parameter variants are superior to the one-parameter versions. Notably, our new ridge estimators, especially NATPR2, generally outperform existing methods in most scenarios considered.
Table 1 offers an in-depth analysis of the simulation study, encapsulating 96 distinct scenarios to evaluate the performance of various ridge parameters. Among the ridge parameters tested, NATPR2 consistently demonstrated superior performance by recording the lowest MSE in most scenarios. This consistent outperformance highlights NATPR2’s robustness, particularly under stringent or challenging conditions, making it a standout choice compared to other estimators.
Table 1 serves as a practical guide for selecting the most appropriate estimator based on specific conditions, which include varying sample sizes, error variances, and values of p (4 and 10). The analysis is organized around three sample sizes—20, 50, and 100—and considers four distinct error variances: 0.4, 0.9, 4, and 10. For each combination of sample size, error variance and different levels of ρ the table recommends the most suitable estimator.
NATPR2 is frequently the recommended ridge estimator across all sample sizes, error variances, and pairwise correlations, and it most often achieves the lowest EMSE. As a result, NATPR2 stands out as the most dependable parameter for minimizing MSE, making it the favored option under the diverse conditions examined for multicollinear data. MTPR3 and NATPR1 also perform consistently well across different levels of ρ, sample sizes, and error variances.

4. Real-Life Data Analysis

In this section, we demonstrate the application of new ridge parameters by analyzing a dataset representing Pakistan’s GDP growth. This dataset, summarized in Table 2, includes observations spanning 14 years, from the financial year of 2007–2008 to 2020–2021. The response variable “y” denotes the GDP growth. The predictor variables are as follows: X 1 : Consumer Price Index, X 2 : tax-to-GDP ratio, X 3 : savings-to-GDP ratio, X 4 : investment-to-GDP ratio, X 5 : milk production, X 6 : meat production, X 7 : fish production, X 8 : poultry production.
These data points are sourced from the Economic Survey of Pakistan, Statistical Supplement [20]. These data are modeled using a linear regression approach represented by the following equation:
y = α 0 + α 1 X 1 + α 2 X 2 + α 3 X 3 + α 4 X 4 + α 5 X 5 + α 6 X 6 + α 7 X 7 + α 8 X 8 + ϵ
The condition number of the X′X matrix, calculated to be approximately 3.94 × 10⁶, indicates severe multicollinearity within the dataset. This analysis is based on the data and methodology outlined by [21], providing insights into the factors influencing Pakistan’s GDP growth.
In Figure 1, high positive relationships are observed for X₂ with X₅ (coefficient 0.75), X₆ (0.73), and X₈ (0.71). These high correlation coefficients imply that X₂ moves in direct proportion to X₅, X₆, and X₈. Other correlations are moderate or weak, indicating varying degrees of linear relationship between the variables.
In the real dataset, the calculated eigenvalues are 4.134289315, 1.900981733, 1.022049275, 0.58361122, 0.30723455, 0.051004124, 0.000828734, and 1.05008 × 10⁻⁶; the largest eigenvalue is 4.134289315 and the smallest non-zero eigenvalue is 1.05008 × 10⁻⁶. The condition number is calculated as follows:
κ(A) = λ_max/λ_min = 4.134289315 / (1.05008 × 10⁻⁶) ≈ 3.94 × 10⁶
The Variance Inflation Factor (VIF) quantifies how much the variance of a regression coefficient is inflated by collinearity among the independent variables. For the j-th predictor, the VIF is calculated as follows:
VIF_j = 1/(1 − R_j²)
where R_j² is the coefficient of determination obtained by regressing the j-th predictor on the remaining predictors. Tolerance, the reciprocal of VIF, is an equivalent indicator of multicollinearity. Common rules of thumb are as follows:
VIF = 1: no correlation between the predictor and the remaining independent variables.
1 < VIF < 5: moderate correlation; generally acceptable.
5 ≤ VIF < 10: high correlation; potentially problematic.
VIF ≥ 10: severe multicollinearity.
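The VIF computation described above can be sketched in Python (illustrative data; `vif` is this sketch's own helper, not from the paper):

```python
# Sketch: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on the remaining predictors (with an intercept).
import numpy as np

def vif(X):
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        sst = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - resid @ resid / sst
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(4)
z = rng.standard_normal((100, 3))
# The third column nearly duplicates the first, so both get very large VIFs
X = np.column_stack([z[:, 0], z[:, 1], 0.95 * z[:, 0] + 0.05 * z[:, 2]])
print(vif(X))
```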
The VIF analysis reveals that several predictor variables, particularly X5, X6, and X8, exhibit severe multicollinearity.
The high VIF values for X₅, X₆, and X₈ indicate extreme multicollinearity, so the use of ridge estimators may be required. High VIF values imply that multicollinearity may distort the regression results and obscure the effect of each predictor [20]. Ridge regression addresses this by applying a penalty to the size of the coefficients, shrinking those of correlated predictors; this stabilizes the model estimates and sharpens the conclusions drawn from the regression analysis. Accordingly, the proposed new ridge estimators and the existing estimators were applied to reduce the effects of multicollinearity, leading to more accurate and efficient models.
Table 3 compares the MSE for the various estimators, highlighting their effectiveness in handling multicollinearity. The OLS estimator has the highest MSE (4262.71), indicating poor performance in the presence of multicollinearity, whereas ridge estimators such as MTPR3, NATPR1, and NATPR2 attain the lowest MSE, demonstrating superior capability. The regression coefficients reveal how different estimators adjust predictor influence: OLS estimates are less stable, while ridge estimators show more consistent and reliable coefficients. Overall, NATPR2 effectively reduces MSE and provides stable coefficient estimates, underscoring its suitability for addressing multicollinearity.
The results show that the new ridge parameters consistently have lower MSE compared to all existing ones. Furthermore, while most of the ridge parameters demonstrate similar performance levels, they significantly surpass OLS in reducing MSE. The analysis of real-world data further supports these findings, with the MSE values for the proposed estimators (highlighted in bold) being notably lower compared to other ridge estimators.
The performance of ridge estimators varies significantly based on the MSE and the coefficients of the predictors. Among the new ridge parameters, NATPR2 and NATPR6 stand out as some of the best performing in terms of MSE. The interpretation of the coefficients suggests that different estimators highlight the importance of various predictors, with some exhibiting extreme values that indicate significant impacts. These variations emphasize the importance of selecting an appropriate estimator tailored to the specific context and characteristics of the dataset to effectively address multicollinearity.

5. Conclusions

In this research, six new ridge parameters have been proposed, and their performance has been evaluated through a comprehensive simulation study focusing on the minimum mean squared error criterion. The results indicate that the new parameter NATPR2 consistently achieves the minimum estimated MSE among the ridge estimators considered, demonstrating superior capability in dealing with multicollinearity compared to existing ridge estimators and OLS. Additionally, the application of these estimators to a real-world dataset showcases their practical effectiveness in mitigating multicollinearity. This dual approach illustrates the robustness and applicability of the new ridge parameters in improving the reliability of regression models affected by multicollinearity. We encourage the use of the ITPRE in statistical and data analysis practice to address multicollinearity issues effectively. In terms of future work, it will be of interest to compare the ITPRE with other advanced techniques, such as the Lasso and Elastic Net, to identify their relative strengths and potential complementarities.

Author Contributions

All the authors have contributed equally to the article. All authors have read and agreed to the published version of the manuscript.

Funding

The research received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available within the article.

Acknowledgments

The authors thank the editor and reviewers for their valuable suggestions that greatly improved this article. The authors would like to acknowledge the Deanship of Graduate Studies and Scientific Research, Taif University for funding this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Detailed Simulation Results

This appendix presents the detailed simulation results that correspond to the summary statistics provided in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10, Table A11 and Table A12 of the main document. This study explores the impact of varying ρ values (0.80, 0.90, 0.95, and 0.99), sample sizes (n = 20, 50, and 100), values of σ as 0.40, 0.90, 4, and 10, and the number of predictors p = (4 and 10).
Table A1. Estimated MSE for n = 20, p = 4, with σ = 0.40 and 0.90.

              --------- σ = 0.40 ---------      --------- σ = 0.90 ---------
Estimators    ρ=0.80   ρ=0.90   ρ=0.95   ρ=0.99    ρ=0.80   ρ=0.90   ρ=0.95   ρ=0.99
OLS           0.38325  0.56410  1.10678  5.08776   2.02120  3.08827  6.21301  25.4360
HK            0.27352  0.36538  0.57340  1.67473   0.92472  1.33875  2.14300  8.11300
HKB           0.17558  0.21244  0.35762  1.21020   0.60053  0.95851  1.79455  5.35672
KAM           0.36068  0.52537  0.99588  4.33711   1.79705  2.72268  5.37244  21.0102
KGM           0.08936  0.12693  0.19238  0.54064   0.35178  0.47335  0.65605  1.72347
KMed          0.10809  0.14431  0.23213  0.58433   0.40712  0.53242  0.65377  1.29987
KMS           0.22742  0.31859  0.56050  2.91765   1.12991  1.81596  3.89148  17.4913
LCTPR         0.01658  0.02649  0.03981  0.13418   0.08975  0.12827  0.16704  0.24686
TKTPR         0.35418  0.27736  0.16511  1.35473   1.00313  1.07124  0.69499  0.18079
MTPR1         0.00264  0.00309  0.00368  0.00898   0.03226  0.02298  0.01334  0.09093
MTPR2         0.00362  0.00481  0.00657  0.04307   0.03232  0.05201  0.05249  0.14886
MTPR3         0.09841  0.15945  0.48436  3.14634   0.63172  1.58254  3.18781  17.5655
NATPR1        0.03601  0.04298  0.05750  0.17385   0.02779  0.03266  0.04844  0.08461
NATPR2        0.00281  0.00262  0.00218  0.00222   0.01308  0.01270  0.01157  0.01051
NATPR3        0.24969  0.34247  0.56288  1.91231   0.91421  1.22717  2.15407  5.88694
NATPR4        0.00335  0.00309  0.00238  0.00223   0.01645  0.01523  0.01279  0.01074
NATPR5        0.18828  0.25487  0.39105  1.59626   0.21541  0.34326  0.60748  1.78942
NATPR6        0.02647  0.02566  0.01859  0.00760   0.14125  0.12812  0.09407  0.03670
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A2. Estimated MSE values for n = 20, p = 4, σ = 4 and 10.

              ---------- σ = 4 ----------       ---------- σ = 10 ----------
Estimators    ρ=0.80   ρ=0.90   ρ=0.95   ρ=0.99    ρ=0.80   ρ=0.90   ρ=0.95   ρ=0.99
OLS           37.7789  57.6475  104.817  566.477   225.587  337.234  678.806  3319.24
HK            11.7096  17.3431  30.0443  178.555   72.6859  102.151  179.141  1007.68
HKB           9.66101  14.5322  24.8174  141.527   53.5046  76.9805  149.327  706.840
KAM           31.9563  48.4306  86.7160  472.236   186.956  280.100  561.313  2723.56
KGM           2.83752  3.90981  5.80974  15.4068   11.5785  12.9029  21.7525  62.0216
KMed          2.61183  3.10228  5.40036  18.2810   13.6536  14.4878  29.3715  143.154
KMS           28.5101  44.8173  83.7848  502.088   192.088  292.716  607.221  3112.57
LCTPR         0.93711  0.75989  0.57678  0.33893   6.60311  4.58271  5.03358  1.57920
TKTPR         5.41471  4.52518  4.32591  5.08503   27.6281  29.1989  24.3389  25.9264
MTPR1         0.71862  0.51164  0.46462  0.55903   6.67132  6.51933  5.62609  3.11176
MTPR2         2.31943  2.26714  4.72434  29.1980   23.8595  29.1815  46.9396  278.649
MTPR3         11.5200  16.5018  37.2093  308.865   78.8923  133.535  257.805  1768.56
NATPR1        0.54013  0.37295  0.35594  0.51030   5.14761  3.78066  4.79072  3.55465
NATPR2        0.43421  0.27182  0.23576  0.26524   4.99772  3.41782  4.26760  1.42585
NATPR3        9.08894  11.0002  13.7066  38.0659   26.1653  37.2251  49.4474  100.545
NATPR4        0.53725  0.33602  0.26702  0.23260   6.69660  4.53944  4.87783  1.50382
NATPR5        0.83120  0.65655  0.79205  2.72999   5.62601  4.77334  6.08720  11.3683
NATPR6        4.74990  3.57255  2.70413  0.91532   46.6870  48.1130  44.0609  24.1001
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A3. Estimated MSE for n = 50, p = 4, σ = 0.4 and 0.9.

              --------- σ = 0.40 ---------      --------- σ = 0.90 ---------
Estimators    ρ=0.80   ρ=0.90   ρ=0.95   ρ=0.99    ρ=0.80   ρ=0.90   ρ=0.95   ρ=0.99
OLS           0.09560  0.12932  0.24656  1.47709   0.44120  0.61533  1.27835  7.06826
HK            0.08822  0.12726  0.19987  0.75187   0.31644  0.26668  0.63102  2.40309
HKB           0.06556  0.08169  0.11527  0.47550   0.21118  0.26615  0.38654  1.75395
KAM           0.09402  0.12640  0.23691  1.34434   0.41660  0.57409  1.15592  6.11915
KGM           0.04013  0.04597  0.07088  0.28975   0.12606  0.15533  0.24764  0.91630
KMed          0.05751  0.05486  0.07722  0.34678   0.15466  0.17570  0.30426  0.85916
KMS           0.08000  0.10284  0.16675  0.77977   0.27044  0.35131  0.64298  4.25617
LCTPR         0.00582  0.00935  0.01190  0.06659   0.03055  0.04078  0.06310  0.23857
TKTPR         1.11710  0.64934  0.02914  0.07704   0.30975  0.62575  0.25885  0.02343
MTPR1         0.00105  0.02148  0.00083  0.00086   0.00533  0.00739  0.00501  0.00393
MTPR2         0.00114  0.00199  0.00088  0.00162   0.00594  0.00713  0.00673  0.00930
MTPR3         0.00101  0.02676  0.00105  0.13452   0.00817  0.02552  0.05803  0.56574
NATPR1        0.00923  0.00813  0.00518  0.00549   0.00619  0.00551  0.00538  0.00487
NATPR2        0.00108  0.00094  0.00084  0.00086   0.00543  0.00498  0.00501  0.00392
NATPR3        0.07844  0.10206  0.18017  0.82045   0.30740  0.40529  0.73433  2.68080
NATPR4        0.00131  0.00112  0.00093  0.00088   0.00677  0.00607  0.00549  0.00402
NATPR5        0.06011  0.07045  0.08744  0.21314   0.02724  0.02336  0.02447  0.09847
NATPR6        0.01934  0.02012  0.01728  0.00734   0.10165  0.10944  0.08842  0.03579
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A4. Estimated MSE for n = 50, p = 4, σ = 4 and 10.
σ = 4 | σ = 10
Estimators | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
OLS | 9.66568 | 13.1143 | 27.0237 | 129.944 | 54.2773 | 77.5192 | 154.840 | 769.351
HK | 3.45855 | 4.16569 | 8.74296 | 42.3718 | 15.5678 | 26.1527 | 46.4637 | 241.900
HKB | 2.49479 | 3.16439 | 6.55168 | 29.9469 | 12.0725 | 19.3421 | 33.1454 | 164.928
KAM | 8.36355 | 11.2563 | 23.0661 | 109.657 | 45.6901 | 66.1299 | 130.409 | 645.810
KGM | 1.25234 | 1.56147 | 2.49765 | 7.72815 | 4.07929 | 5.75169 | 8.94809 | 24.2595
KMed | 1.14813 | 1.39380 | 2.13626 | 9.35231 | 4.34931 | 6.01744 | 10.6362 | 51.1085
KMS | 6.36262 | 8.73895 | 19.3800 | 105.800 | 41.5918 | 62.2790 | 128.058 | 691.980
LCTPR | 0.45701 | 0.47877 | 0.40148 | 0.21646 | 2.2749 | 2.02178 | 1.26625 | 0.64292
TKTPR | 3.01712 | 2.59049 | 2.20651 | 2.12551 | 10.8346 | 12.8267 | 9.33270 | 12.0617
MTPR1 | 0.14603 | 0.13766 | 0.08999 | 0.16251 | 1.85013 | 1.40617 | 0.97573 | 0.55485
MTPR2 | 0.41489 | 0.32031 | 0.42134 | 2.57251 | 3.26083 | 4.01291 | 4.83125 | 27.5295
MTPR3 | 1.14314 | 1.38185 | 3.53653 | 39.9506 | 8.53750 | 14.7589 | 32.1481 | 254.594
NATPR1 | 0.10175 | 0.09169 | 0.08330 | 0.08218 | 1.56019 | 1.29169 | 0.90789 | 0.66819
NATPR2 | 0.10143 | 0.09143 | 0.08304 | 0.08196 | 1.54720 | 1.27473 | 0.88969 | 0.53831
NATPR3 | 3.39570 | 4.46672 | 7.45535 | 18.2845 | 12.0053 | 15.1865 | 23.4625 | 76.0472
NATPR4 | 0.13518 | 0.11629 | 0.09369 | 0.08394 | 1.97126 | 1.55626 | 1.00531 | 0.55562
NATPR5 | 0.10545 | 0.09529 | 0.08971 | 0.09326 | 1.60726 | 1.35288 | 0.98529 | 1.40109
NATPR6 | 2.68211 | 2.70766 | 42.1599 | 330.74320 | 324.4514 | 27.23933 | 27.27038 | 12.80432
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A5. MSE estimates for n = 100, p = 4, σ = 0.4 and 0.9.
σ = 0.4 | σ = 0.90
Estimators | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
OLS | 0.03372 | 0.05191 | 0.10649 | 0.53829 | 0.17359 | 0.27880 | 0.54897 | 2.28734
HK | 0.03277 | 0.10858 | 0.09717 | 0.36753 | 0.15085 | 0.17847 | 0.37377 | 0.91008
HKB | 0.02972 | 0.04201 | 0.06732 | 0.21269 | 0.10335 | 0.13507 | 0.23604 | 0.57165
KAM | 0.0335 | 0.05136 | 0.10444 | 0.50566 | 0.16871 | 0.26823 | 0.51522 | 2.01569
KGM | 0.02353 | 0.02484 | 0.03698 | 0.13131 | 0.06735 | 0.09038 | 0.13254 | 0.34373
KMed | 0.05924 | 0.05037 | 0.04322 | 0.13810 | 0.08571 | 0.10483 | 0.14664 | 0.42521
KMS | 0.03136 | 0.04634 | 0.08674 | 0.30612 | 0.13361 | 0.19386 | 0.31542 | 1.16164
LCTPR | 0.00235 | 0.00290 | 0.00548 | 0.02877 | 0.01228 | 0.01950 | 0.02824 | 0.09874
TKTPR | 1.27013 | 0.14490 | 0.05432 | 0.00716 | 0.95260 | 0.84373 | 0.18538 | 0.07456
MTPR1 | 0.00049 | 0.00240 | 0.00048 | 0.00045 | 0.00289 | 0.00784 | 0.00248 | 0.00218
MTPR2 | 0.00053 | 0.00078 | 0.00049 | 0.00149 | 0.00300 | 0.00518 | 0.00278 | 0.00421
MTPR3 | 0.00047 | 0.00436 | 0.00047 | 0.00733 | 0.00281 | 0.07713 | 0.01470 | 0.03107
NATPR1 | 0.00394 | 0.00323 | 0.00228 | 0.00094 | 0.00327 | 0.00288 | 0.00258 | 0.00222
NATPR2 | 0.00050 | 0.00049 | 0.00048 | 0.00038 | 0.00273 | 0.00266 | 0.00247 | 0.00213
NATPR3 | 0.02994 | 0.04478 | 0.08398 | 0.32676 | 0.13541 | 0.19684 | 0.35173 | 1.20957
NATPR4 | 0.00062 | 0.00059 | 0.00054 | 0.00040 | 0.00373 | 0.00329 | 0.00280 | 0.00220
NATPR5 | 0.02430 | 0.03094 | 0.04277 | 0.05343 | 0.01383 | 0.01146 | 0.00883 | 0.01298
NATPR6 | 0.01178 | 0.01366 | 0.01586 | 0.00954 | 0.06819 | 0.08005 | 0.08253 | 0.04482
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A6. MSE estimates for n = 100, p = 4, σ = 4 and 10.
σ = 4 | σ = 10
Estimators | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
OLS | 3.41671 | 5.44906 | 10.0840 | 49.2304 | 20.6850 | 32.5107 | 61.8685 | 296.501
HK | 1.42292 | 2.08132 | 3.06763 | 14.7420 | 7.40148 | 10.0053 | 19.1454 | 84.8054
HKB | 0.96314 | 1.48912 | 2.46986 | 10.5845 | 5.24787 | 8.53798 | 14.2342 | 58.3697
KAM | 3.03196 | 4.78069 | 8.62290 | 41.6646 | 17.7665 | 27.7457 | 52.1899 | 246.519
KGM | 0.62838 | 0.83284 | 1.05099 | 3.91369 | 2.30684 | 2.89242 | 4.14741 | 11.5657
KMed | 0.70710 | 0.84530 | 0.97211 | 3.67698 | 2.27520 | 2.70039 | 4.20093 | 18.8968
KMS | 2.03633 | 3.36576 | 6.38540 | 37.0754 | 14.8772 | 24.2202 | 47.7874 | 252.597
LCTPR | 0.26119 | 0.32606 | 0.28362 | 0.23564 | 0.99514 | 1.06884 | 0.63539 | 0.45309
TKTPR | 1.97415 | 2.01673 | 0.88277 | 0.36200 | 6.47696 | 5.88733 | 7.11104 | 4.10309
MTPR1 | 0.06432 | 0.04916 | 0.04890 | 0.04300 | 0.46677 | 0.60136 | 0.30686 | 0.34005
MTPR2 | 0.09785 | 0.09834 | 0.07145 | 0.12377 | 0.70755 | 0.94256 | 1.06638 | 2.93319
MTPR3 | 0.17257 | 0.46320 | 0.24133 | 2.77433 | 1.51864 | 2.37112 | 5.81781 | 37.4781
NATPR1 | 0.05890 | 0.04636 | 0.04456 | 0.04298 | 0.42796 | 0.49206 | 0.29876 | 0.39425
NATPR2 | 0.05885 | 0.04630 | 0.04452 | 0.04293 | 0.42702 | 0.49078 | 0.29694 | 0.33900
NATPR3 | 1.96999 | 2.23069 | 3.82652 | 10.7023 | 5.97492 | 8.23171 | 13.5469 | 32.8223
NATPR4 | 0.06792 | 0.06037 | 0.05128 | 0.04437 | 0.63454 | 0.62988 | 0.35202 | 0.35009
NATPR5 | 0.05311 | 0.04724 | 0.04533 | 0.04436 | 0.43246 | 0.49850 | 0.30716 | 0.68939
NATPR6 | 1.54673 | 1.75892 | 1.70891 | 1.02186 | 12.4949 | 14.4197 | 15.9081 | 11.8794
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A7. MSE estimates for n = 20, p = 10, σ = 0.4 and 0.9.
σ = 0.4 | σ = 0.90
Estimators | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
OLS | 0.83107 | 2.60898 | 5.41201 | 25.7680 | 4.42697 | 12.7023 | 25.0088 | 135.511
HK | 0.59995 | 1.35747 | 2.51169 | 10.1866 | 2.25646 | 4.94137 | 9.63424 | 52.2559
HKB | 0.25822 | 0.56756 | 1.09388 | 4.54666 | 1.01769 | 2.49751 | 4.48970 | 24.6443
KAM | 0.81389 | 2.51300 | 5.18019 | 24.4276 | 4.27839 | 12.0651 | 23.7112 | 128.755
KGM | 0.08848 | 0.14808 | 0.27499 | 1.19526 | 0.36384 | 0.65347 | 1.26263 | 5.16765
KMed | 0.10124 | 0.17153 | 0.36192 | 1.49885 | 0.46692 | 0.84105 | 1.57994 | 5.07678
KMS | 0.45982 | 1.21016 | 2.77413 | 16.2384 | 2.50214 | 7.34610 | 15.8761 | 105.176
LCTPR | 0.00421 | 0.00505 | 0.00764 | 0.04440 | 0.02092 | 0.03063 | 0.05063 | 0.18785
TKTPR | 0.09808 | 0.15664 | 0.26824 | 0.02456 | 0.49557 | 0.49839 | 1.11501 | 0.03957
MTPR1 | 0.00129 | 0.00110 | 0.00105 | 0.00132 | 0.00818 | 0.00588 | 0.03639 | 0.00406
MTPR2 | 0.00152 | 0.00211 | 0.00291 | 0.02679 | 0.01846 | 0.02278 | 0.06931 | 0.19057
MTPR3 | 0.01471 | 0.26679 | 1.26496 | 14.9825 | 0.19907 | 2.13654 | 4.04481 | 67.0210
NATPR1 | 0.06954 | 0.09096 | 0.16149 | 0.69063 | 0.03413 | 0.09244 | 0.15493 | 0.77417
NATPR2 | 0.00136 | 0.00093 | 0.00106 | 0.00084 | 0.00595 | 0.00474 | 0.00427 | 0.00407
NATPR3 | 0.67726 | 1.77365 | 3.32387 | 11.3829 | 3.00055 | 6.35503 | 10.9464 | 44.0648
NATPR4 | 0.00175 | 0.00117 | 0.00117 | 0.00083 | 0.00845 | 0.00617 | 0.00498 | 0.00415
NATPR5 | 0.46996 | 1.06290 | 2.11394 | 9.69199 | 0.75946 | 1.92571 | 3.51794 | 20.7961
NATPR6 | 0.05154 | 0.04257 | 0.03169 | 0.00974 | 0.28803 | 0.22800 | 0.16991 | 0.04782
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A8. MSE estimates for n = 20, p = 10, σ = 4 and 10.
σ = 4 | σ = 10
Estimators | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
OLS | 81.1749 | 254.896 | 508.520 | 2543.38 | 501.168 | 1554.58 | 3229.25 | 16189.7
HK | 36.3600 | 99.4658 | 188.804 | 941.444 | 206.140 | 613.994 | 1250.71 | 6020.83
HKB | 16.9698 | 47.9965 | 82.9070 | 387.143 | 94.8011 | 266.112 | 556.740 | 3105.78
KAM | 77.7797 | 241.660 | 482.359 | 2395.18 | 478.693 | 1475.00 | 3063.37 | 15316.3
KGM | 4.93754 | 8.45690 | 17.4561 | 64.1206 | 24.7149 | 48.5526 | 87.7588 | 330.942
KMed | 5.66952 | 9.50056 | 20.2434 | 89.6196 | 33.2890 | 65.1450 | 130.632 | 581.689
KMS | 63.5509 | 209.710 | 435.764 | 2324.71 | 440.307 | 1415.53 | 3004.01 | 15560.1
LCTPR | 0.51798 | 0.39292 | 0.47234 | 0.28384 | 5.98181 | 4.48636 | 4.26051 | 1.90416
TKTPR | 11.7175 | 10.6349 | 4.88778 | 3.79228 | 102.189 | 97.4429 | 103.158 | 105.829
MTPR1 | 0.95478 | 0.26812 | 0.13593 | 0.10622 | 10.4110 | 7.11033 | 6.6429 | 4.26234
MTPR2 | 5.52362 | 8.49067 | 16.5498 | 114.627 | 83.6971 | 162.684 | 401.632 | 2238.43
MTPR3 | 20.0522 | 86.5001 | 191.658 | 1743.14 | 224.861 | 754.627 | 1719.82 | 11696.3
NATPR1 | 0.19176 | 0.14706 | 0.24177 | 0.35938 | 5.54509 | 5.16434 | 7.56588 | 71.3746
NATPR2 | 0.14816 | 0.09556 | 0.08707 | 0.08008 | 4.66976 | 3.42793 | 3.87346 | 6.48998
NATPR3 | 36.7736 | 68.3937 | 111.536 | 396.630 | 142.060 | 264.580 | 442.532 | 1347.68
NATPR4 | 0.21520 | 0.12700 | 0.10197 | 0.08289 | 5.89517 | 3.99598 | 3.89408 | 1.80819
NATPR5 | 0.82190 | 2.51567 | 5.02491 | 15.7964 | 8.10474 | 10.9468 | 22.8083 | 144.692
NATPR6 | 8.87995 | 6.12479 | 4.14233 | 1.04484 | 110.137 | 93.8545 | 90.6088 | 57.5535
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A9. MSE estimates for n = 50, p = 10, σ = 0.4 and 0.9.
σ = 0.4 | σ = 0.90
Estimators | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
OLS | 0.24655 | 0.40945 | 0.85319 | 4.41072 | 1.29135 | 2.10129 | 4.17951 | 22.1215
HK | 0.21727 | 0.32579 | 0.57188 | 2.14813 | 0.83445 | 1.16256 | 1.91748 | 8.53736
HKB | 0.11598 | 0.17023 | 0.23294 | 0.93112 | 0.35696 | 0.46094 | 0.84443 | 3.75902
KAM | 0.24413 | 0.40294 | 0.83166 | 4.21949 | 1.25912 | 2.03022 | 4.00570 | 20.9982
KGM | 0.02965 | 0.03966 | 0.06731 | 0.32210 | 0.11110 | 0.14677 | 0.28496 | 1.21435
KMed | 0.03149 | 0.04008 | 0.07986 | 0.41058 | 0.13390 | 0.18634 | 0.39113 | 1.65792
KMS | 0.17116 | 0.23657 | 0.41942 | 2.13626 | 0.67957 | 1.05639 | 2.08482 | 13.7997
LCTPR | 0.00126 | 0.00152 | 0.00207 | 0.01143 | 0.00694 | 0.00695 | 0.01308 | 0.04408
TKTPR | 0.04235 | 0.02064 | 0.05770 | 0.03880 | 0.16308 | 0.12482 | 0.07882 | 0.46178
MTPR1 | 0.00052 | 0.00045 | 0.00042 | 0.01068 | 0.00283 | 0.00229 | 0.00181 | 0.00181
MTPR2 | 0.00054 | 0.00047 | 0.00045 | 0.01580 | 0.00308 | 0.00266 | 0.00267 | 0.01821
MTPR3 | 0.00048 | 0.00064 | 0.00280 | 0.37904 | 0.00828 | 0.06612 | 0.07335 | 1.92968
NATPR1 | 0.01103 | 0.00861 | 0.00568 | 0.01246 | 0.00394 | 0.00319 | 0.00290 | 0.00493
NATPR2 | 0.00053 | 0.00045 | 0.00040 | 0.00032 | 0.00285 | 0.00229 | 0.00180 | 0.00177
NATPR3 | 0.21681 | 0.33418 | 0.65390 | 2.71747 | 1.00639 | 1.51375 | 2.80384 | 10.5563
NATPR4 | 0.00065 | 0.00057 | 0.00046 | 0.00034 | 0.00387 | 0.00299 | 0.00215 | 0.00183
NATPR5 | 0.14243 | 0.17983 | 0.26513 | 0.83842 | 0.09445 | 0.10936 | 0.17351 | 0.81872
NATPR6 | 0.03078 | 0.03756 | 0.03554 | 0.01480 | 0.18508 | 0.20223 | 0.18573 | 0.07258
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A10. MSE estimates for n = 50, p = 10, σ = 4 and 10.
σ = 4 | σ = 10
Estimators | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
OLS | 26.2784 | 41.3571 | 87.2430 | 444.742 | 160.3181 | 337.234 | 678.806 | 3319.24
HK | 11.3091 | 15.9104 | 31.9102 | 177.699 | 66.51654 | 102.151 | 179.141 | 1007.68
HKB | 5.11381 | 7.56262 | 13.2009 | 72.5361 | 29.90876 | 276.9805 | 149.327 | 706.840
KAM | 25.2197 | 39.3850 | 82.5592 | 421.483 | 153.7216 | 280.100 | 561.313 | 2723.56
KGM | 1.54732 | 2.21816 | 3.64172 | 16.0661 | 7.149232 | 12.9029 | 21.7525 | 62.0216
KMed | 1.97754 | 2.69152 | 4.23310 | 22.7442 | 9.429115 | 14.4878 | 29.3715 | 143.154
KMS | 18.4312 | 29.3836 | 64.7798 | 375.958 | 133.5800 | 292.716 | 607.221 | 3112.57
LCTPR | 0.12715 | 0.14947 | 0.16376 | 0.20436 | 1.25548 | 4.58271 | 5.03358 | 1.57920
TKTPR | 2.14771 | 1.59326 | 0.73290 | 1.63445 | 26.7189 | 29.1989 | 24.3389 | 25.9264
MTPR1 | 0.08196 | 0.12449 | 0.04636 | 0.03718 | 1.15109 | 6.51933 | 5.62609 | 3.11176
MTPR2 | 0.18039 | 0.24309 | 0.22574 | 3.25197 | 6.46149 | 29.1815 | 46.9396 | 278.649
MTPR3 | 0.90035 | 2.21962 | 2.65107 | 62.3240 | 23.6076 | 133.535 | 257.805 | 1768.56
NATPR1 | 0.05770 | 0.04335 | 0.03850 | 0.03314 | 0.84851 | 3.78066 | 4.79072 | 3.55465
NATPR2 | 0.05717 | 0.04290 | 0.03809 | 0.03258 | 0.83319 | 3.41782 | 4.26760 | 1.42585
NATPR3 | 14.6153 | 18.6054 | 33.6821 | 123.566 | 59.9422 | 37.2251 | 49.4474 | 100.545
NATPR4 | 0.08987 | 0.06017 | 0.04546 | 0.03393 | 11.8457 | 4.53944 | 4.87783 | 1.50382
NATPR5 | 0.07981 | 0.07185 | 0.08103 | 0.25321 | 0.92458 | 4.77334 | 6.08720 | 11.3683
NATPR6 | 5.01505 | 5.14999 | 4.20820 | 1.53582 | 46.5414 | 48.1130 | 44.0609 | 24.1001
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A11. MSE estimates for n = 100, p = 10, σ = 0.4 and 0.9.
σ = 0.40 | σ = 0.90
Estimators | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
OLS | 0.13394 | 0.14359 | 0.25101 | 1.29691 | 0.64083 | 0.66486 | 1.42647 | 7.08059
HK | 0.12467 | 0.13369 | 0.22108 | 0.85026 | 0.47619 | 0.51459 | 0.92373 | 3.28431
HKB | 0.07558 | 0.08214 | 0.12368 | 0.33642 | 0.23312 | 0.23660 | 0.38007 | 1.48996
KAM | 0.13312 | 0.14273 | 0.24855 | 1.26479 | 0.62844 | 0.65376 | 1.39225 | 6.82443
KGM | 0.01811 | 0.01899 | 0.02717 | 0.11774 | 0.06072 | 0.06875 | 0.12743 | 0.47928
KMed | 0.01881 | 0.01863 | 0.02969 | 0.14151 | 0.07294 | 0.08349 | 0.16103 | 0.71306
KMS | 0.10286 | 0.11034 | 0.16800 | 0.63706 | 0.35350 | 0.38730 | 0.72777 | 3.87792
LCTPR | 0.00067 | 0.00059 | 0.00077 | 0.00408 | 0.00239 | 0.00320 | 0.00499 | 0.01590
TKTPR | 0.01284 | 0.00989 | 0.00103 | 0.17340 | 0.06077 | 0.01177 | 0.12404 | 0.28563
MTPR1 | 0.00024 | 0.00020 | 0.00025 | 0.00015 | 0.00120 | 0.00125 | 0.00093 | 0.00083
MTPR2 | 0.00025 | 0.00021 | 0.00019 | 0.00022 | 0.00123 | 0.00127 | 0.00104 | 0.00253
MTPR3 | 0.00023 | 0.00019 | 0.00018 | 0.03750 | 0.00116 | 0.00120 | 0.00274 | 0.22131
NATPR1 | 0.00332 | 0.00284 | 0.00165 | 0.00075 | 0.00140 | 0.00143 | 0.00104 | 0.00094
NATPR2 | 0.00025 | 0.00020 | 0.00018 | 0.00015 | 0.00121 | 0.00125 | 0.00093 | 0.00081
NATPR3 | 0.12148 | 0.13001 | 0.22565 | 1.01187 | 0.53915 | 0.56460 | 1.12213 | 4.70696
NATPR4 | 0.00029 | 0.00025 | 0.00022 | 0.00016 | 0.01581 | 0.00163 | 0.00114 | 0.00086
NATPR5 | 0.07366 | 0.07666 | 0.10021 | 0.18028 | 0.03085 | 0.03115 | 0.03023 | 0.08188
NATPR6 | 0.02411 | 0.02507 | 0.02948 | 0.01948 | 0.13686 | 0.14720 | 0.16977 | 0.10039
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
Table A12. MSE estimates for n = 100, p = 10, σ = 4 and 10.
σ = 4 | σ = 10
Estimators | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
OLS | 12.8540 | 13.2367 | 27.4983 | 128.865 | 83.2064 | 85.5356 | 166.720 | 836.922
HK | 5.07849 | 5.98744 | 11.4412 | 52.8225 | 33.9587 | 38.7765 | 67.9607 | 355.620
HKB | 2.43568 | 2.73602 | 5.10240 | 23.8650 | 16.6004 | 16.1935 | 31.6437 | 162.891
KAM | 12.3399 | 12.7558 | 26.3889 | 123.336 | 79.7521 | 82.2335 | 159.845 | 801.126
KGM | 0.87175 | 1.00895 | 1.67176 | 6.75501 | 4.45336 | 5.13533 | 8.65265 | 34.2907
KMed | 1.27724 | 1.40090 | 2.11749 | 8.37770 | 5.48317 | 6.52208 | 10.6702 | 56.2758
KMS | 8.02503 | 8.35162 | 18.6268 | 100.660 | 65.0768 | 67.3909 | 135.566 | 744.790
LCTPR | 0.05704 | 0.09349 | 0.09935 | 0.17982 | 0.38132 | 0.65187 | 0.40283 | 0.24024
TKTPR | 0.81687 | 1.10237 | 0.68324 | 0.20319 | 11.9902 | 10.7315 | 12.0569 | 5.98078
MTPR1 | 0.02407 | 0.02810 | 0.01897 | 0.01652 | 0.52387 | 0.35385 | 0.20611 | 0.10800
MTPR2 | 0.02827 | 0.04120 | 0.02676 | 0.08000 | 1.50595 | 1.17397 | 2.12948 | 9.57489
MTPR3 | 0.05975 | 0.33043 | 0.30563 | 1.80735 | 4.83654 | 3.89695 | 11.3403 | 69.3587
NATPR1 | 0.02432 | 0.02342 | 0.01910 | 0.01656 | 0.17380 | 0.34883 | 0.11695 | 0.10747
NATPR2 | 0.02426 | 0.02336 | 0.01905 | 0.01652 | 0.16676 | 0.34512 | 0.11637 | 0.10745
NATPR3 | 7.78073 | 8.17582 | 14.8524 | 54.4417 | 36.5701 | 40.1081 | 63.1600 | 213.398
NATPR4 | 0.03528 | 0.03364 | 0.02374 | 0.01744 | 0.25665 | 0.42745 | 0.15047 | 0.11339
NATPR5 | 0.02739 | 0.02624 | 0.02265 | 0.02392 | 0.18782 | 0.36516 | 0.12139 | 0.11051
NATPR6 | 3.36533 | 3.53704 | 3.65592 | 2.08955 | 28.7752 | 30.3016 | 33.3207 | 17.9080
Note: the bold values represent the lowest estimated MSE of the ridge parameters.

References

  1. Lipovetsky, S.; Conklin, W.M. Ridge regression in two-parameter solution. Appl. Stoch. Models Bus. Ind. 2005, 21, 525–540.
  2. Belsley, D.A.; Kuh, E.; Welsch, R.E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity; John Wiley & Sons: New York, NY, USA, 1980.
  3. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
  4. Hoerl, A.E.; Kennard, R.W. Ridge regression: Applications to nonorthogonal problems. Technometrics 1970, 12, 69–82.
  5. Hocking, R.R.; Speed, F.M.; Lynn, M.J. A class of biased estimators in linear regression. Technometrics 1976, 18, 425–437.
  6. Hoerl, A.; Kennard, R.; Baldwin, K. Ridge regression: Some simulations. Commun. Stat. Simul. Comput. 1975, 4, 105–123.
  7. Suhail, S.; Khan, R.; Khan, S. Ridge estimation methods in the presence of multicollinearity: A simulation study. J. Appl. Stat. 2020, 47, 1068–1086.
  8. Toker, S.; Kaçiranlar, S. Two-parameter ridge regression estimators. Commun. Stat. Theory Methods 2013, 42, 4110–4115.
  9. Khalaf, G.; Mansson, K.; Shukur, G. Improved ridge parameters for biased estimation in linear regression. Commun. Stat. Theory Methods 2013, 42, 4116–4129.
  10. Khan, M.S.; Ali, A.; Suhail, M.; Kibria, B.G. On some two parameter estimators for the linear regression models with correlated predictors: Simulation and application. Commun. Stat. Simul. Comput. 2024, 1–15.
  11. Khan, M.S.; Ali, A.; Suhail, M.; Alotaibi, E.S.; Alsubaie, N.E. On the estimation of ridge penalty in linear regression: Simulation and application. Kuwait J. Sci. 2024, 51, 100273.
  12. Yasin, S.; Khan, R.; Hussain, I. Modified two-parameter ridge estimators for linear regression model with multicollinearity. Commun. Stat. Simul. Comput. 2021, 50, 2230–2247.
  13. Lipovetsky, S. Ridge regression: A review. J. Appl. Stat. 2006, 33, 697–708.
  14. Khan, R.; Mukherjee, R.; Ullah, S. Improved ridge regression estimators in the presence of multicollinearity. J. Stat. Comput. Simul. 2023, 93, 1047–1064.
  15. Lukman, Q.F.; Ayinde, K. Review and classification of the ridge parameter estimation techniques. Hacet. J. Math. Stat. 2017, 46, 953–967.
  16. Lukman, Q.F.; Ayinde, K.; Ajiboye, Q.S. Monte Carlo study of some classification-based ridge parameter estimators. J. Mod. Appl. Stat. Methods 2017, 16, 24.
  17. Kibria, B.M.G. Performance of some new ridge regression estimators. Commun. Stat. Simul. Comput. 2003, 32, 419–435.
  18. McDonald, G.C.; Galarneau, D.I. A Monte Carlo evaluation of some ridge-type estimators. J. Am. Stat. Assoc. 1975, 70, 407–416.
  19. Halawa, S.A.; King, J.E.; Khalaf, G. Mean squared error properties of some recent ridge regression estimators. J. Appl. Stat. 2000, 27, 315–330.
  20. Economic Survey of Pakistan, Statistical Supplement. 2022. Available online: https://www.finance.gov.pk/survey_2022.html (accessed on 1 June 2024).
  21. Shrestha, N. Detecting multicollinearity in regression analysis. Am. J. Appl. Math. Stat. 2020, 8, 39–42.
Figure 1. The pairwise correlation matrix.
Table 1. Summary of the recommended estimator under each simulation condition, based on estimated MSE.
p = 4 | p = 10
Sample Size | Error Variance | 0.80 | 0.90 | 0.95 | 0.99 | 0.80 | 0.90 | 0.95 | 0.99
20 | 0.4 | MTPR1 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | MTPR1 | NATPR4
20 | 0.9 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2
20 | 4 | NATPR2 | NATPR2 | NATPR2 | NATPR4 | NATPR2 | NATPR2 | NATPR2 | NATPR2
20 | 10 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2
50 | 0.4 | MTPR3 | NATPR2 | NATPR2 | MTPR1 | MTPR3 | MTPR1 | NATPR2 | NATPR2
50 | 0.9 | MTPR1 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2
50 | 4 | NATPR2 | NATPR2 | NATPR1 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2
50 | 10 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2
100 | 0.4 | MTPR3 | NATPR2 | MTPR3 | NATPR2 | MTPR3 | MTPR3 | NATPR3 | NATPR2
100 | 0.9 | NATPR2 | NATPR2 | NATPR2 | NATPR1 | MTPR3 | MTPR3 | NATPR2 | NATPR2
100 | 4 | NATPR2 | NATPR2 | NATPR2 | NATPR1 | NATPR2 | NATPR2 | NATPR2 | NATPR2
100 | 10 | NATPR1 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2 | NATPR2
Note: each cell shows the estimator with the lowest estimated MSE for that setting.
Table 2. Variance Inflation Factor (VIF) analysis of predictor variables.
Variables | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8
VIF | 13.49 | 30.56 | 3.95 | 7.74 | 154,653.96 | 33,024.8 | 1.64 | 165,801.7
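The diagnostic behind Table 2 is the standard variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² is the R-squared from regressing X_j on the remaining predictors. A minimal sketch follows; the data used in the test are random placeholders, not the paper's dataset.

```python
import numpy as np

# VIF_j = 1 / (1 - R_j^2): regress column j on the other columns (plus an
# intercept), then compare residual to total sum of squares.

def vif(X):
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        ss_res = np.sum((y - Z @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        out[j] = ss_tot / ss_res   # equals 1 / (1 - R_j^2)
    return out
```

Values above the usual rule-of-thumb cutoff of 10 (as for X5, X6, and X8 in Table 2) flag severe multicollinearity.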
Table 3. MSE and regression coefficient estimates.
Estimators | MSE | α̂1 | α̂2 | α̂3 | α̂4 | α̂5 | α̂6 | α̂7 | α̂8
OLS | 4262.71 | −0.0540 | −0.5194 | −0.08515 | 6.142853 | −0.14297 | −0.52688 | −0.14228 | 0.003376
HK | 0.65275 | −0.2607 | 0.42329 | 0.00541 | −177.354 | −0.31855 | 0.429389 | 0.026622 | −0.00013
HKB | 21.1698 | −0.5194 | −0.1404 | 0.025403 | −0.14111 | −0.34176 | −0.14248 | 3.395291 | −0.05460
KAM | 2793.04 | 0.42330 | 0.02657 | −0.00097 | −0.32021 | 0.159179 | 0.02696 | −0.28325 | −0.26341
KGM | 0.15643 | −0.1404 | 6.02792 | −0.04636 | −0.34608 | −0.02782 | 6.216128 | −0.05454 | −0.52474
KMed | 0.13533 | 0.02657 | −11.471 | −0.19166 | 0.161788 | 0.000874 | −63.4684 | −0.26314 | 0.427647
KMS | 3936.71 | 6.14316 | −0.0540 | −0.31095 | −0.02834 | 0.003284 | −0.05432 | −0.52415 | −0.14190
LCTPR | 0.12337 | −184.56 | −0.2607 | 0.194707 | 0.000893 | −0.00013 | −0.26206 | 0.427103 | 0.026849
TKTPR | 0.14270 | −0.0540 | −0.5194 | −0.04348 | 0.003354 | −0.10945 | −0.52205 | −0.14168 | 6.173631
MTPR1 | 0.12236 | −0.2604 | 0.42330 | 0.001841 | −0.00013 | −0.33300 | 0.425448 | 0.026725 | −36.0849
MTPR2 | 0.14065 | −0.5185 | −0.1404 | 0.007421 | −0.10582 | −0.41771 | −0.14117 | 5.025176 | −0.07592
MTPR3 | 508.540 | 0.42201 | 0.02657 | −0.00028 | −0.33255 | 0.212539 | 0.026707 | −1.00387 | −0.30938
NATPR1 | 1.63181 | 0.02567 | −149.34 | −0.26074 | 0.219751 | 0.001316 | −12.9337 | −0.32069 | 0.304517
NATPR2 | 0.12167 | −0.1082 | −0.2359 | 0.423305 | 0.001389 | −0.00019 | −0.26462 | 0.162571 | 0.002758
NATPR3 | 3.46265 | 1.94433 | −0.0515 | −0.51941 | −0.04121 | 0.005001 | −0.05486 | −0.34737 | −0.06687
NATPR4 | 25.6981 | −0.1396 | 6.14133 | −0.05404 | −0.42602 | −0.03945 | 6.071767 | −0.14055 | −0.49408
NATPR5 | 167.709 | −0.0540 | −0.4345 | −0.14046 | 0.005291 | −0.05482 | −0.52698 | −0.02850 | 0.011033
NATPR6 | 0.16138 | −0.2607 | 0.31543 | 0.026579 | −0.00020 | −0.26448 | 0.429256 | 0.000898 | −0.00042
Note: the bold values represent the lowest estimated MSE of the ridge parameters.
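The gap between the OLS row and the ridge rows in Table 3 reflects the classical bias–variance trade-off. As background (this is the scalar MSE of ordinary one-parameter ridge regression in canonical form, following Hoerl and Kennard, not the two-parameter NATPR estimators themselves), the trade-off can be written as a function of the penalty k:

```python
import numpy as np

# Scalar MSE of ordinary ridge regression in canonical form:
#   MSE(k) = sigma^2 * sum(lam_i / (lam_i + k)^2)        (variance)
#          + k^2 * sum(alpha_i^2 / (lam_i + k)^2)        (squared bias)
# where lam_i are the eigenvalues of X'X and alpha_i the canonical coefficients.

def ridge_mse(k, lam, alpha, sigma2):
    var = sigma2 * np.sum(lam / (lam + k) ** 2)
    bias2 = k**2 * np.sum(alpha**2 / (lam + k) ** 2)
    return var + bias2
```

With an ill-conditioned eigenvalue spectrum (some lam_i near zero), even a small positive k drives the variance term far below its OLS value at k = 0, which is why the ridge rows in Table 3 can sit orders of magnitude below the OLS row.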
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
