1. Introduction
Regression analysis is one of the most common methods of analysis and modeling, recognized for its practicality in forecasting and estimation; Lipovetsky and Conklin [1] have pointed out that it is useful in several ways. The ordinary least-squares (OLS) method is often favored for parameter estimation in regression analysis due to its appealing mathematical properties and computational ease. However, the OLS method works optimally only under specific circumstances, particularly when predictors are orthogonal. In real-world applications, predictors are often highly correlated, leading to multicollinearity problems in regression models. This multicollinearity makes OLS estimates less efficient and can cause important variables to have statistically insignificant regression coefficients. Moreover, it decreases statistical power and produces wider confidence intervals for the regression coefficients. To assess the extent of collinearity among predictors, the condition number (CN) of the $X'X$ matrix, expressed as $\mathrm{CN} = \lambda_{\max}/\lambda_{\min}$ (the ratio of the largest to the smallest eigenvalue of $X'X$), is frequently used. As Belsley et al. [2] described, collinearity is considered negligible when CN ≤ 10, moderate to strong when 10 < CN ≤ 30, and severe when CN > 30. The ridge regression technique was proposed to address multicollinearity in regression models [3,4]. Consider the following linear regression model:
$$y = X\beta + \varepsilon, \quad (1)$$

where $y$ is an $n \times 1$ vector of responses, $X$ is an $n \times p$ design matrix of predictors, $\beta$ is a $p \times 1$ vector of unknown regression coefficients, and $\varepsilon$ is an $n \times 1$ vector of error terms assumed to follow a multivariate normal distribution with a mean vector of $0$ and a variance–covariance matrix of $\sigma^2 I_n$. In this context, $I_n$ denotes the identity matrix of order $n$. The OLS estimates are calculated as follows:

$$\hat{\beta}_{\mathrm{OLS}} = (X'X)^{-1}X'y. \quad (2)$$
The ridge regression estimates are given by the following:
$$\hat{\beta}_{\mathrm{ridge}} = (X'X + kI)^{-1}X'y, \quad (3)$$

where $I$ is a $p \times p$ identity matrix and $k$ is a positive scalar. From Equation (3), it is evident that the CN of the $(X'X + kI)$ matrix decreases as $k$ increases; however, the introduction of $k$ introduces bias in the ridge estimators in exchange for reduced variances of the regression coefficients. Thus, the key challenge in ridge regression is selecting an optimal value of $k$ that provides the best bias–variance trade-off. The effectiveness of ridge estimators is influenced by various factors, including the degree of correlation among predictors, the variance of errors, the number of predictors, and the sample size. Because data characteristics can vary widely, no single ridge estimator consistently excels in every situation.
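To make the bias–variance mechanism concrete, the following Python sketch (illustrative only; the sample size, number of predictors, correlation level, and the value of $k$ are arbitrary assumptions rather than settings used in this paper) generates correlated predictors, computes the OLS and ridge estimates of Equations (2) and (3), and shows how adding $k$ to the diagonal of $X'X$ lowers its condition number.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative settings only (not the paper's simulation design).
n, p, rho = 50, 4, 0.95

# Correlated predictors: each column mixes an idiosyncratic and a shared component.
z = rng.standard_normal((n, p + 1))
X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
y = X @ np.ones(p) + rng.standard_normal(n)

XtX = X.T @ X
eig = np.linalg.eigvalsh(XtX)
print("CN of X'X      :", eig.max() / eig.min())

# OLS estimate, Equation (2)
beta_ols = np.linalg.solve(XtX, X.T @ y)

# Ridge estimate, Equation (3); k is an arbitrary illustrative value
k = 0.5
beta_ridge = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)
print("CN of X'X + kI :", (eig.max() + k) / (eig.min() + k))
```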
Researchers have explored various methods to identify the optimal $k$ value for ridge estimators. Ref. [5] developed a generalized ridge estimator and conducted comparisons with simple ridge and rank procedure methods, concluding that, while the generalized ridge estimator may have potential advantages, simple ridge and rank procedures are sufficiently adaptable for practical use. Hoerl et al. [6] introduced a novel ridge estimator based on the harmonic mean of the individual ridge parameters and showed through simulation studies that it performed better than OLS. Ref. [7] proposed an improvement to the ridge method by developing quantile-based ridge estimators for $k$, which demonstrated superior performance over existing ridge estimators and OLS in their simulations. Addressing the challenge of selecting the $k$ value in one-parameter ridge estimators, Lipovetsky and Conklin [1] introduced a two-parameter ridge estimator (TPRE), providing a more refined approach to overcome the limitations of traditional methods. The ridge estimator for the coefficients can be represented in a generalized form as follows:
$$\hat{\beta}_{k,q} = q\,(X'X + kI)^{-1}X'y, \quad (4)$$

where $q$ is a scaling factor, $X$ is the matrix of predictors, $y$ is the vector of responses, $k$ is the ridge parameter, and $I$ is the identity matrix.
It is important to note that Equation (4) can be seen as a general form of Equations (2) and (3): it reduces to Equation (2) when $q = 1$ and $k = 0$, and to Equation (3) when $q = 1$. Subsequently, different researchers have suggested modifications to the two-parameter ridge regression model; for details, see refs. [8,9,10,11,12].
In the Lipovetsky–Conklin approach, the value of $q$ was determined by maximizing the R-squared value, while the parameter $k$ was computed by minimizing the mean squared error (MSE). They found that their two-parameter ridge regression model had lower MSE and provided better orthogonality between predicted values and residuals compared to the one-parameter model. This model has since been refined by other researchers. For instance, Lipovetsky [13] further investigated the properties of the two-parameter ridge model, and ref. [8] optimized the tuning parameters $k$ and $q$, comparing their performance against OLS, one-parameter ridge estimators, and contraction estimators using the matrix MSE criterion. Refs. [9,10] introduced and developed three new variations of the two-parameter ridge estimators. More recently, Khan et al. [14] introduced six novel two-parameter ridge estimators and benchmarked them against the existing two-parameter ridge, one-parameter ridge, and OLS estimators. Although these estimators show superior performance under specific conditions, no single estimator consistently outperforms the others across all scenarios. Lukman and Ayinde [15] conducted a comprehensive review and classification of various techniques used for estimating ridge parameters. Additionally, Lukman, Ayinde, and Ajiboye [16] performed a Monte Carlo analysis of different estimators based on classification methods for ridge parameters. Lipovetsky and Conklin [1] introduced their two-parameter ridge (LCTPR) estimator to improve the fit of ridge regression models by using two parameters instead of one. The effectiveness of ridge estimators is influenced by several factors, including the degree of multicollinearity, error variance, the number of predictors, and sample size, and their performance can decline under stringent conditions.
The previous literature shows that, while most of the estimators are efficient under certain conditions, none of the ridge estimators dominates the others in all situations. Their efficiency is reduced by strong multicollinearity, high error variability, a large number of predictors, and a small sample size. To address these issues, we propose six enhanced two-parameter ridge estimators aimed at effectively tackling severe multicollinearity. These new ridge parameters are formulated based on the optimal selection of the ridge parameters $k$ and $q$.
The effectiveness of the proposed ridge parameters is assessed through a Monte Carlo simulation study and the analysis of a real dataset where the independent variables are correlated with each other. This article is structured as follows: Section 2 details the methodology of ridge regression and presents our newly proposed ridge parameters alongside a brief overview of several existing ridge estimators; Section 3 describes the simulation study and interprets the findings; Section 4 demonstrates the application of the proposed new ridge parameters using a real dataset; finally, Section 5 offers concluding remarks and insights.
2. Methodology
To simplify the mathematical representation, model (1) can be reformulated into a canonical or orthogonal form as follows:

$$y = Z\alpha + \varepsilon. \quad (6)$$

Equation (6) represents the model reformulated in canonical form, where $Z = XD$, $\alpha = D'\beta$, and $D'D = DD' = I_p$. Here, $D$ represents an orthogonal matrix containing the eigenvectors of the $X'X$ matrix, while $I_p$ denotes the identity matrix. Moreover, $Z'Z = D'X'XD = \Lambda$, with $\Lambda$ defined as $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, where $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_p$ are the positive eigenvalues of the $X'X$ matrix, ordered from smallest to largest.
By using this transformation, Equations (2)–(4) can be rewritten in their canonical forms as follows:

$$\hat{\alpha}_{\mathrm{OLS}} = \Lambda^{-1}Z'y, \quad (7)$$

$$\hat{\alpha}_{k} = (\Lambda + kI_p)^{-1}Z'y, \quad (8)$$

$$\hat{\alpha}_{k,q} = q\,(\Lambda + kI_p)^{-1}Z'y. \quad (9)$$

Equation (7) provides the canonical form of the ordinary least squares estimator in the transformed model. Equation (8) is the ridge estimator in its canonical form, which incorporates the regularization parameter $k$. Equation (9) is the two-parameter ridge estimator, which generalizes ridge regression by introducing the additional scaling parameter $q$.
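To illustrate the canonical transformation, a minimal Python sketch follows; the data and the tuning values $k$ and $q$ are placeholders, not quantities from this study.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.standard_normal((n, p))              # placeholder predictors
y = X @ np.ones(p) + rng.standard_normal(n)  # placeholder response

# Canonical form: Z = XD, alpha = D'beta, Z'Z = Lambda (eigenvalues of X'X).
lam, D = np.linalg.eigh(X.T @ X)             # ascending order, as in the text
Z = X @ D
Zty = Z.T @ y

k, q = 0.5, 0.9                              # hypothetical tuning values

alpha_ols = Zty / lam                        # Equation (7)
alpha_ridge = Zty / (lam + k)                # Equation (8)
alpha_tpr = q * Zty / (lam + k)              # Equation (9)

beta_tpr = D @ alpha_tpr                     # back to the original coefficients
```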
2.1. Existing Ridge Parameters
The following are established ridge parameters.
- i. Hoerl and Kennard parameter
- ii. Hoerl, Kennard, and Baldwin (HKB) ridge parameter
- iii. Kibria parameters
Ref. [17] proposed three ridge parameters by extending the research of [6].
- iv. Khalaf, Mansson, and Shukur (KMS) parameter
- v. Toker and Kaciranlar two-parameter ridge parameters
In the above equation, $\hat{k}$ is defined as in Equation (10).
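The displayed formulas for these classical parameters are not reproduced above; the sketch below therefore assumes the standard definitions from the cited literature (the Hoerl–Kennard parameter $\hat{\sigma}^2/\hat{\alpha}_{\max}^2$, the HKB parameter $p\hat{\sigma}^2/\sum_i \hat{\alpha}_i^2$, and Kibria's arithmetic-mean, geometric-mean, and median rules based on $\hat{\sigma}^2/\hat{\alpha}_i^2$), which may differ in detail from the exact expressions used in this paper.

```python
import numpy as np

def classical_ridge_parameters(X, y):
    """One-parameter ridge k's computed in the canonical form.

    Standard textbook definitions (HK, HKB, and Kibria's AM/GM/median rules);
    these are assumptions, not necessarily the exact expressions of this paper.
    """
    n, p = X.shape
    lam, D = np.linalg.eigh(X.T @ X)
    Z = X @ D
    alpha = (Z.T @ y) / lam                   # canonical OLS estimates
    resid = y - Z @ alpha
    sigma2 = resid @ resid / (n - p)          # error-variance estimate
    ratios = sigma2 / alpha**2
    return {
        "HK": sigma2 / np.max(alpha**2),
        "HKB": p * sigma2 / np.sum(alpha**2),
        "Kibria_AM": np.mean(ratios),
        "Kibria_GM": sigma2 / np.prod(alpha**2) ** (1 / p),
        "Kibria_MED": np.median(ratios),
    }
```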
2.2. Proposed Ridge Parameters
In this study, we introduce six modified Lipovetsky–Conklin ridge (MLCR) estimators, NATPR1, NATPR2, NATPR3, NATPR4, NATPR5, and NATPR6, which combine the approaches from refs. [1,14] to optimize the values of $k$ and $q$.
The values of k for our new proposed ridge parameters are determined as follows:
The first proposed value of $k$ is computed using the cubic mean across all predictors, scaled by the variance ratio; this approach emphasizes the robustness of larger coefficients while controlling for the overall variance. The second is derived using the fourth-root mean, providing an alternative weighting scheme that further emphasizes larger coefficients and potentially improves model stability in high-dimensional settings. The third selects the minimum of the predictor-wise values, focusing on the most conservative regularization parameter that balances bias and variance effectively. The fourth takes the corresponding maximum value, maximizing regularization to constrain the influence of variables with large coefficients. The fifth, defined as the square root of the mean scaled by the variance ratio, offers a balanced approach between extreme regularization and no regularization. The sixth uses the cube root of the maximum ratio, targeting a moderate regularization that is neither too conservative nor too aggressive.
These estimators are modifications of the HK ridge estimator of Equation (10), in which regularization is improved through different weightings of the coefficients. They also seek to enhance model stability and flexibility, especially when working with high-dimensional data structures, by handling large coefficients and variances through a range of mathematical transformations and scalings.
The $q$ values are computed using Equation (5), with the corresponding $k$ values shown in Equations (18)–(23). Based on these corresponding $k$ and $q$ values, six optimized two-parameter ridge estimators are derived, referred to as NATPR1, NATPR2, NATPR3, NATPR4, NATPR5, and NATPR6 in this research.
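Equation (5), the closed-form choice of $q$, is not reproduced above. As a stand-in, the following sketch follows the verbal description of the Lipovetsky–Conklin approach and, for a given $k$, picks the $q$ that maximizes the R-squared of the fitted values (equivalently, minimizes the residual sum of squares over $q$); this is an illustrative assumption, not necessarily the exact formula used in this paper.

```python
import numpy as np

def two_parameter_ridge(X, y, k):
    """Two-parameter ridge fit: beta_hat = q * (X'X + kI)^{-1} X'y.

    For a fixed k, q is chosen to maximize R-squared, i.e. to minimize
    ||y - q * X beta_ridge||^2, which has the closed-form rescaling below.
    This follows the verbal description of the Lipovetsky-Conklin approach
    and is an illustrative assumption, not Equation (5) verbatim.
    """
    p = X.shape[1]
    beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    fitted = X @ beta_ridge
    q = (y @ fitted) / (fitted @ fitted)      # least-squares rescaling factor
    return q, q * beta_ridge
```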
In the following section, we present a simulation study to evaluate the performance of the proposed ridge parameters in comparison to existing ones.
3. Monte Carlo Simulation
This section presents the results of the Monte Carlo simulation study used to compare the performance of the new ridge parameters with existing ones. The subsequent subsections describe the simulation approach and the employed algorithm. Monte Carlo simulation is a computational approach that evaluates statistical models by repeated sampling with random numbers; replicated simulations are used to assess properties such as bias, variance, and mean squared error in order to analyze the efficiency of the different methods, such as the ridge regression estimators proposed in this context.
3.1. Simulation Technique
In this study, predictors are generated using Equation (24), which takes into account varying degrees of collinearity among them, as described by [17,18], as follows:

$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho\, z_{i(p+1)}, \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, p, \quad (24)$$

where $\rho$ represents the pairwise correlation among predictors, $p$ is the number of predictors, $n$ is the sample size, and $z_{ij}$ are pseudo-random numbers drawn from a standard normal distribution. This study examines $\rho$ values of 0.80, 0.90, 0.95, and 0.99; sample sizes $n$ of 20, 50, and 100; and predictor counts $p$ of 4 and 10.
The response variable is generated using the following equation:

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

In this formula, $\varepsilon_i$ is a random error term drawn from a normal distribution with mean 0 and variance $\sigma^2$. This study considers four values of $\sigma^2$: 0.40, 0.90, 4, and 10. The regression coefficients $\beta_1, \beta_2, \ldots, \beta_p$ are set based on the most favorable direction, as outlined by Halawa et al. [19], with the intercept $\beta_0$ set to zero.
Hence, the estimated mean squared error (EMSE) is computed from 5000 replications as follows:

$$\mathrm{EMSE}(\hat{\alpha}) = \frac{1}{5000}\sum_{r=1}^{5000}\left(\hat{\alpha}_{(r)} - \alpha\right)'\left(\hat{\alpha}_{(r)} - \alpha\right),$$

where $\hat{\alpha}_{(r)}$ denotes the estimate obtained in the $r$th replication. All computations were carried out using RStudio version 2022.12.0.
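The paper's computations were performed in R; the following Python sketch re-implements the simulation loop for a single scenario to show how the EMSE is accumulated over replications. The true coefficient vector is simplified to a fixed unit vector rather than the "most favorable direction" of Halawa et al. [19], so this is an illustration of the procedure, not a reproduction of the reported results.

```python
import numpy as np

def simulate_emse(estimator, n=50, p=4, rho=0.95, sigma2=0.9, reps=5000, seed=0):
    """Monte Carlo EMSE of an estimator under the design of Equation (24).

    Python sketch only (the paper's computations were run in R); the true
    coefficient vector is a fixed unit vector, a simplification of the
    'most favorable direction' used in the paper.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(p) / np.sqrt(p)
    total = 0.0
    for _ in range(reps):
        z = rng.standard_normal((n, p + 1))
        X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]       # Equation (24)
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)   # response with N(0, sigma2) errors
        beta_hat = estimator(X, y)
        total += np.sum((beta_hat - beta) ** 2)
    return total / reps

# Example: EMSE of OLS in a single scenario
ols = lambda X, y: np.linalg.solve(X.T @ X, X.T @ y)
print(simulate_emse(ols))
```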
3.2. Performance Evaluation Criteria
The estimators' performance is evaluated using the mean squared error (MSE) criterion, based on approaches from previous research, including those by [4,18], and using Equations (22)–(27). The estimated MSE for any estimator $\hat{\alpha}$ of the parameter $\alpha$ is defined as follows:

$$\mathrm{MSE}(\hat{\alpha}) = E\left[(\hat{\alpha} - \alpha)'(\hat{\alpha} - \alpha)\right].$$

This expression calculates the expected value of the squared deviation between the estimator $\hat{\alpha}$ and the true parameter $\alpha$. It serves as a metric for evaluating the estimator's accuracy and precision, reflecting its ability to provide reliable estimates close to the actual parameter values.
3.3. Analysis and Findings
The estimated MSE values are outlined in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10, Table A11 and Table A12, available in Appendix A. Our simulation study revealed the following findings:
Across various sample sizes, error variances, and numbers of predictor variables, our new ridge parameters consistently exhibited the lowest estimated MSE under most simulation scenarios. Notably, in cases of severe multicollinearity (ρ = 0.99), the ridge estimators MTPR3, NATPR1, and NATPR2 outperformed the existing estimators.
These findings illustrate the effectiveness of the new ridge parameters in addressing strong multicollinearity. In contrast, the OLS estimator demonstrates the poorest performance under multicollinear conditions when compared to ridge estimators.
As multicollinearity intensifies, the estimated MSE for OLS and many ridge estimators tends to increase. However, an interesting observation is that the MSE of new ridge parameters decreases with higher multicollinearity, suggesting their robustness against such issues.
When the error variance increases, the estimated MSE of all estimators generally rises, regardless of sample size, multicollinearity level, or the number of predictors. Despite this, new ridge parameters maintain a stable MSE, exhibiting only a modest increase compared to the OLS and many existing ridge estimators, showcasing their resilience against higher error variances amidst multicollinearity.
Increasing the sample size results in a decrease in estimated MSE for all estimators, which aligns with general statistical principles. However, ridge regression estimators show markedly better performance than OLS across all sample sizes.
As the number of predictors increases, the estimated MSE for all estimators rises, with the OLS estimator showing a more rapid increase compared to ridge estimators.
Our results, displayed in various tables, reveal that NATPR2 frequently achieves the lowest EMSE. Moreover, MTPR3, NATPR1, and NATPR2 consistently perform well across different levels of $\rho$.
The simulation results confirm that ridge parameters consistently outperform OLS in the presence of multicollinearity. Furthermore, among the ridge estimators examined, the two-parameter variants are superior to the one-parameter versions. Notably, our new ridge estimators, especially NATPR2, generally outperform existing methods in most scenarios considered.
Table 1 offers an in-depth analysis of the simulation study, encapsulating 96 distinct scenarios to evaluate the performance of various ridge parameters. Among the ridge parameters tested, NATPR2 consistently demonstrated superior performance by recording the lowest MSE in most scenarios. This consistent outperformance highlights NATPR2’s robustness, particularly under stringent or challenging conditions, making it a standout choice compared to other estimators.
Table 1 serves as a practical guide for selecting the most appropriate estimator based on specific conditions, which include varying sample sizes, error variances, and numbers of predictors $p$ (4 and 10). The analysis is organized around three sample sizes—20, 50, and 100—and considers four distinct error variances: 0.4, 0.9, 4, and 10. For each combination of sample size, error variance, and level of $\rho$, the table recommends the most suitable estimator.
NATPR2 is frequently the recommended ridge estimator across all sample sizes and across various error variances and pairwise correlations. As a result, NATPR2 stands out as the most dependable parameter for minimizing MSE, making it the favored option under the diverse multicollinearity conditions considered. NATPR2 frequently achieves the lowest EMSE, and MTPR3, NATPR1, and NATPR2 consistently perform well across different levels of $\rho$, sample sizes, and error variances.
4. Real-Life Data Analysis
In this section, we demonstrate the application of the new ridge parameters by analyzing a dataset representing Pakistan's GDP growth. This dataset, summarized in Table 2, includes observations spanning 14 years, from the financial year 2007–2008 to 2020–2021. The response variable $y$ denotes GDP growth. The predictor variables are as follows: $X_1$: Consumer Price Index, $X_2$: tax-to-GDP ratio, $X_3$: savings-to-GDP ratio, $X_4$: investment-to-GDP ratio, $X_5$: milk production, $X_6$: meat production, $X_7$: fish production, and $X_8$: poultry production. These data points are sourced from the Economic Survey of Pakistan, Statistical Supplement [20]. These data are modeled using a linear regression approach represented by the following equation:
$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_8 X_8 + \varepsilon.$$

The condition number of the $X'X$ matrix, calculated to be 3,937,104, indicates severe multicollinearity within the dataset. This analysis is based on the data and methodology outlined by [21], providing insights into the factors influencing Pakistan's GDP growth.
In Figure 1, high positive relationships are observed between several pairs of variables, with correlation coefficients of 0.75, 0.73, and 0.71. These very high correlation coefficients imply direct proportional relationships among the corresponding predictors. Other correlations are moderate or weak, indicating varying degrees of linear relationships between the variables.
In the real dataset, the calculated eigenvalues of the $X'X$ matrix are 4.134289315, 1.900981733, 1.022049275, 0.58361122, 0.30723455, 0.051004124, 0.000828734, and one further, much smaller eigenvalue; the magnitude of the largest eigenvalue is 4.134289315. The condition number is calculated as follows:

$$\mathrm{CN} = \frac{\lambda_{\max}}{\lambda_{\min}} = 3{,}937{,}104.$$
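For completeness, the condition number can be obtained directly from the eigenvalues of $X'X$; a minimal helper, assuming the predictors of Table 2 are supplied as the columns of a numeric matrix, is sketched below.

```python
import numpy as np

def condition_number(X):
    """Condition number of X'X: ratio of its largest to smallest eigenvalue."""
    X = np.asarray(X, dtype=float)
    eig = np.linalg.eigvalsh(X.T @ X)
    return eig.max() / eig.min()
```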
The Variance Inflation Factor (VIF) is a statistic that shows how much the variance of a regression coefficient is inflated because of the collinearity of the independent variables. The VIF is calculated as follows:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2},$$

where $R_j^2$ is the coefficient of determination obtained by regressing the $j$th predictor on the remaining predictors. Tolerance is the reciprocal of the VIF and also indicates multicollinearity. Commonly used guidelines are listed below, followed by a short computational sketch.
- VIF = 1: the predictor is uncorrelated with the other independent variables.
- 1 < VIF < 5: moderate correlation; generally considered acceptable.
- 5 ≤ VIF < 10: high correlation; potentially problematic.
- VIF ≥ 10: severe multicollinearity.
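A minimal sketch of the VIF computation referred to above (assuming the predictors are supplied as the columns of a numeric matrix; the tolerance is simply the reciprocal of each VIF):

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
    from regressing the j-th predictor on the remaining predictors."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2_j = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out   # tolerance is 1 / vif(X)
```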
The VIF analysis reveals that several predictor variables, particularly $X_5$, $X_6$, and $X_8$, exhibit severe multicollinearity, as indicated by their very high VIF values. In this case, the use of ridge estimators may be required. Higher VIF values imply that multicollinearity may affect the regression results and obscure the effects of the individual predictors [20]. Ridge regression is a common technique that applies a penalty to the size of the coefficients; this method is efficient in managing multicollinearity by shrinking the coefficients of correlated predictors. This approach helps stabilize the model estimates and enhances the clarity of the conclusions drawn from the regression analysis. The proposed new ridge estimators and the existing estimators were therefore applied to reduce the effects of multicollinearity, leading to more accurate and efficient models.
Table 3 compares the MSE for the various estimators, highlighting their effectiveness in handling multicollinearity. The OLS estimator has the highest MSE (4262.71), indicating poor performance in the presence of multicollinearity, whereas ridge estimators such as MTPR3, NATPR1, and NATPR2 have the lowest MSE, demonstrating superior capability. The regression coefficients reveal how different estimators adjust predictor influence: OLS estimates are less stable, while ridge estimators show more consistent and reliable coefficients. Overall, NATPR2 effectively reduces MSE and provides stable coefficient estimates, underscoring its suitability for addressing multicollinearity.
The results show that the new ridge parameters consistently have lower MSE compared to all existing ones. Furthermore, while most of the ridge parameters demonstrate similar performance levels, they significantly surpass OLS in reducing MSE. The analysis of real-world data further supports these findings, with the MSE values for the proposed estimators (highlighted in bold) being notably lower compared to other ridge estimators.
The performance of ridge estimators varies significantly based on the MSE and the coefficients of the predictors. Among the new ridge parameters, NATPR2 and NATPR6 stand out as some of the best performing in terms of MSE. The interpretation of the coefficients suggests that different estimators highlight the importance of various predictors, with some exhibiting extreme values that indicate significant impacts. These variations emphasize the importance of selecting an appropriate estimator tailored to the specific context and characteristics of the dataset to effectively address multicollinearity.