Article

Instrumental Variable Method for Regularized Estimation in Generalized Linear Measurement Error Models

Department of Statistics, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
* Author to whom correspondence should be addressed.
Econometrics 2024, 12(3), 21; https://doi.org/10.3390/econometrics12030021
Submission received: 26 April 2024 / Revised: 16 June 2024 / Accepted: 5 July 2024 / Published: 12 July 2024

Abstract

Regularized regression methods have attracted much attention in the literature, mainly due to their application in high-dimensional variable selection problems. Most existing regularization methods assume that the predictors are directly observed and precisely measured. It is well known that, in a low-dimensional regression model, if some covariates are measured with error, then the naive estimators that ignore the measurement error are biased and inconsistent. However, the impact of measurement error on regularized estimation procedures is not clear. For example, it is known that the ordinary least squares estimate of the regression coefficient in a linear model is attenuated towards zero, while the variance of the observed surrogate predictor is inflated. It is therefore unclear how the interaction of these two factors affects the selection outcome. To correct for measurement error effects, some researchers assume that the measurement error covariance matrix is known or can be estimated using external data. In this paper, we propose the regularized instrumental variable method for generalized linear measurement error models. We show that the proposed approach yields a consistent variable selection procedure and root-n consistent parameter estimators. Extensive finite sample simulation studies show that the proposed method performs satisfactorily in both linear and generalized linear models. A real data example is provided to further demonstrate the usage of the method.

1. Introduction

Regularization is an important approach to estimation in regression models with a relatively large number of parameters, because it provides a stable numerical procedure and better prediction while avoiding overfitting. This approach has attracted much attention in the recent literature, mainly due to its applications in variable selection for high-dimensional models, where conventional statistical methods are theoretically and computationally infeasible. To address the variable selection problem in sparse regression, various regularization methods have been proposed, e.g., bridge regression (Frank and Friedman 1993), Lasso (Tibshirani 1996), SCAD (Fan and Li 2001), adaptive Lasso (Zou 2006), MCP (Zhang et al. 2010), elastic net (Zou and Hastie 2005) and the Dantzig selector (Candes and Tao 2007). More detailed reviews of regularization methods can be found in Fan and Lv (2010) and Negahban et al. (2012).
In real data analysis, it is common that some predictors cannot be observed directly or measured precisely. For example, long-term average systolic blood pressure and cholesterol level are important risk factors for cardiovascular disease, and both are usually measured with error. In a lung cancer risk study, the inhaled dose of air pollutants cannot be measured precisely and is approximated by the average pollutant level within a certain area. In regression models, it is well known that if some predictors are measured with error, ordinary estimation procedures that ignore the measurement error (ME) are biased and inconsistent. However, the impact of ME on regularized estimation procedures is not clear. For example, in a linear model, the naive least squares estimate of the regression coefficient of the mismeasured predictor is attenuated towards zero, while the variance of the corresponding observed surrogate predictor is inflated. The combined effect of these two factors may therefore cause false-positive or false-negative results in the selection outcome, as illustrated in Example 1.
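The attenuation and variance-inflation effects described above can be checked numerically. The following minimal Python sketch uses only the standard library. It assumes a scalar predictor $X\sim N(0,\sigma_x^2)$, a classical error $\delta\sim N(0,\sigma_\delta^2)$ and no other covariates. The function name `naive_slope_demo` and all parameter values are illustrative choices, not part of the paper. Under these assumptions, the naive slope should approach $\beta\sigma_x^2/(\sigma_x^2+\sigma_\delta^2)$, while the variance of $X^*$ approaches $\sigma_x^2+\sigma_\delta^2$.

```python
# Illustrative sketch: names and parameter values are hypothetical.
import random


def naive_slope_demo(beta=3.0, var_x=1.0, var_delta=1.0, n=100_000, seed=1):
    """Simulate Y = beta*X + eps where X is observed only as X* = X + delta.

    Returns (naive OLS slope of Y on X*, attenuated limit, sample variance of X*).
    All variables have mean zero, so a no-intercept fit is used.
    """
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, var_x ** 0.5)                   # true predictor
        x_star = x + rng.gauss(0.0, var_delta ** 0.5)      # error-prone surrogate
        y = beta * x + rng.gauss(0.0, 1.0)                 # response with N(0, 1) noise
        sxy += x_star * y
        sxx += x_star * x_star
    naive = sxy / sxx                                      # least squares slope ignoring ME
    limit = beta * var_x / (var_x + var_delta)             # beta times the reliability ratio
    return naive, limit, sxx / n                           # sxx / n estimates Var(X*)


naive, limit, var_x_star = naive_slope_demo()
print(f"naive slope {naive:.3f}, theoretical limit {limit:.3f}, Var(X*) {var_x_star:.3f}")
```

With the default values ($\beta=3$, $\sigma_x^2=\sigma_\delta^2=1$), the naive slope should be close to 1.5, half the true value, and the sample variance of $X^*$ should be close to 2.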
Research on regularized estimation in ME models is sparse. Some authors have considered penalized versions of the usual error correction methods, assuming that the ME covariance matrix is known or can be estimated using replicate data. For example, Liang and Li (2009) applied the penalized least squares method with attenuation correction and quantile estimation with orthogonal regression adjustment in a partially linear model. Ma and Li (2010) studied general parametric and semiparametric models using the method of penalized estimating equations. Further, Huang and Zhang (2013) used penalized score functions, while Zhang et al. (2017) used a prediction criterion for variable selection in linear ME models.
Another major approach to estimation in ME models is the instrumental variable (IV) method. Fan and Liao (2014) used this method to treat the endogeneity problem in high-dimensional regression models and proposed the focused generalized method of moments estimator. Lin et al. (2015) studied a two-stage regularization method for selecting relevant instruments and predictors in linear models under the assumption that the random errors are jointly normally distributed. Zhong et al. (2020) proposed a two-stage estimation procedure with instrumental variables for a dummy endogenous variable.
All these works mainly focus on the general endogeneity problem in linear models or models with binary response variables. So far, there are very few, if any, published studies focusing specifically on the IV approach to measurement error problems. In this paper, we aim to fill this gap. Specifically, we extend the IV method to study the variable selection and estimation problem in generalized linear models with ME. The method does not require the distribution or covariance matrix of the ME to be known. It is an extension of the method of conditional moments of Wang and Hsiao (2011). The proposed selection procedure and estimator are consistent and enjoy the oracle property under general conditions.
The rest of the paper is organized as follows. In Section 2, we introduce the regularized instrumental variable method and study its asymptotic properties. Section 3 treats the special case of the linear model. Numerical examples are given in Section 4, followed by a real data example in Section 5. Technical details are relegated to Appendix A.

2. The Model and Estimation Method

Suppose the response variable $Y$ has the conditional mean function
$$E(Y\mid X,Z)=g(\alpha+\beta_x^T X+\beta_z^T Z), \tag{1}$$
where $X\in\mathbb{R}^p$ is a low-dimensional vector of error-prone predictors, $Z\in\mathbb{R}^q$ is a vector of error-free predictors and $g(\cdot)$ is a link function. Model (1) includes generalized linear models as well as the so-called single-index models as special cases. We assume that the observed surrogate predictors are
$$X^*=X+\delta, \tag{2}$$
where $\delta$ is a random ME. Further, we assume that instrumental variables (IVs) $W\in\mathbb{R}^l$ are available besides the main sample $(Y,X^*,Z)$. The usual requirement for an IV is that it is correlated with the unobserved predictor $X$, independent of the ME $\delta$ and conditionally independent of $Y$ given $(X,Z)$. Following the literature (Wang 2021; Wang and Hsiao 2011), we assume that the IV $W$ is related to $X$ through
$$X=\Gamma W+U, \tag{3}$$
where $\Gamma$ is a $p\times l$ matrix of unknown parameters that is assumed to have full rank $p$. The error $U$ is independent of $(W,Z)$, has mean $E(U)=0$ and has density $f_U(u;\phi)$ with unknown parameters $\phi\in\mathbb{R}^k$. It is further assumed that the ME $\delta$ in (2) satisfies $E(\delta\mid X,Z,W)=0$. Throughout this paper, we assume that $Z$ is exogenous and that all expectations are taken conditionally on it; however, $Z$ is suppressed to simplify notation. We also adopt the common assumption in the ME literature that the ME $\delta$ is nondifferential, which implies that $E(Y\mid X,X^*,W)=E(Y\mid X)$.
We now consider the estimation of the unknown parameters in (1)–(3) given an i.i.d. random sample $(Y_i,X_i^*,W_i)$, $i=1,2,\ldots,n$. First, substituting (3) into (2) yields the usual linear regression equation
$$X^*=\Gamma W+U+\delta,$$
and, therefore, $\Gamma$ can be consistently estimated by the least squares estimator
$$\hat{\Gamma}=\left(\sum_{i=1}^n X_i^*W_i^T\right)\left(\sum_{i=1}^n W_iW_i^T\right)^{-1}.$$
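The least squares estimator $\hat\Gamma$ can be computed directly from its definition. The sketch below is a plain-Python illustration that does not assume any linear algebra library. It solves $(\sum_i W_iW_i^T)\hat\Gamma^T=(\sum_i X_i^*W_i^T)^T$, which is equivalent to the displayed formula because $\sum_i W_iW_i^T$ is symmetric. The helper names `solve` and `gamma_hat` are hypothetical.

```python
# Illustrative sketch: helper names are hypothetical; matrices are lists of lists.
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an l x l matrix and b an l x m matrix.
    """
    l = len(a)
    aug = [list(map(float, a[r])) + list(map(float, b[r])) for r in range(l)]
    for col in range(l):
        piv = max(range(col, l), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("sum of W_i W_i^T is (numerically) singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(l):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[l:] for row in aug]


def gamma_hat(x_star, w):
    """Least squares Gamma_hat = (sum_i X*_i W_i^T)(sum_i W_i W_i^T)^{-1}.

    x_star: n observed p-vectors; w: n instrument l-vectors. Returns a p x l matrix.
    """
    p, l = len(x_star[0]), len(w[0])
    sxw = [[sum(xi[a] * wi[c] for xi, wi in zip(x_star, w)) for c in range(l)] for a in range(p)]
    sww = [[sum(wi[a] * wi[c] for wi in w) for c in range(l)] for a in range(l)]
    # sww is symmetric, so Gamma_hat^T solves sww @ Gamma_hat^T = sxw^T.
    sxw_t = [[sxw[a][c] for a in range(p)] for c in range(l)]
    g_t = solve(sww, sxw_t)                                 # l x p
    return [[g_t[c][a] for c in range(l)] for a in range(p)]
```

If the data contain no noise ($X_i^*=\Gamma W_i$ exactly), the function should return $\Gamma$ up to rounding error.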
In the following, we focus on the estimation of the other parameters of main interest, $\psi=(\alpha,\beta_x^T,\beta_z^T,\phi^T)^T$, in model (1)–(3). Specifically, we propose an estimator based on the first two conditional moments $E(Y\mid W)$ and $E(YX^*\mid W)$. To simplify notation, we denote $\tilde{X}^*=(1,X^{*T})^T$ and $\tilde{X}=(1,X^T)^T$. Then, by (2), (3), $E(\delta\mid X,Z,W)=0$ and the nondifferential ME assumption, the two conditional moments can be written together as
$$E(\tilde{X}^*Y\mid W)=\int\begin{pmatrix}1\\ \Gamma W+u\end{pmatrix}g(\alpha+\beta_x^T\Gamma W+\beta_z^TZ+\beta_x^Tu)\,f_U(u;\phi)\,du=\int\tilde{x}\,g(\alpha+\beta_x^Tx+\beta_z^TZ)\,f_U(x-\Gamma W;\phi)\,dx:=m(\Gamma W;\psi), \tag{4}$$
where $\tilde{x}=(1,x^T)^T$ and
$$m(v;\psi)=\int\tilde{x}\,g(\alpha+\beta_x^Tx+\beta_z^TZ)\,f_U(x-v;\phi)\,dx.$$
Then, the loss function for estimating $\psi$ is defined as
$$L_n(\psi)=\frac{1}{2}\sum_{i=1}^n\hat{\rho}_i(\psi)^TA_i\,\hat{\rho}_i(\psi),$$
where $\hat{\rho}_i(\psi)=Y_i\tilde{X}_i^*-m(\hat{\Gamma}W_i;\psi)$ and $A_i=A(W_i)$ is a positive semidefinite matrix that may depend on $W_i$.
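For concreteness, the loss $L_n(\psi)$ can be evaluated as follows once the values $m(\hat\Gamma W_i;\psi)$ are available, for example from a closed form or a Monte Carlo approximation of (4). This is a minimal sketch, and the function name is hypothetical. It takes the $m$ values as input rather than computing them, and it uses the identity weight when no $A_i$ are supplied.

```python
# Illustrative sketch: the function name and input layout are hypothetical.
def loss_ln(y, x_star, m_values, weights=None):
    """L_n(psi) = 1/2 * sum_i rho_i^T A_i rho_i, with rho_i = Y_i (1, X*_i^T)^T - m_i.

    y: n responses; x_star: n observed p-vectors; m_values: n (p+1)-vectors holding
    m(Gamma_hat W_i; psi), computed elsewhere; weights: n (p+1) x (p+1) matrices A_i,
    or None for the identity weight.
    """
    total = 0.0
    for i, (yi, xi, mi) in enumerate(zip(y, x_star, m_values)):
        x_tilde = [1.0] + list(xi)                              # (1, X*_i^T)^T
        rho = [yi * xt - mv for xt, mv in zip(x_tilde, mi)]     # moment residual rho_i
        if weights is None:
            total += sum(r * r for r in rho)                    # A_i = I
        else:
            a = weights[i]
            total += sum(rho[j] * a[j][k] * rho[k]
                         for j in range(len(rho)) for k in range(len(rho)))
    return 0.5 * total
```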
One of the main features of the high-dimensional variable selection framework is the sparsity of the model, where many regression parameters in $\beta=(\beta_x^T,\beta_z^T)^T$ have a true value of zero. In the following, we denote the true value of $\beta$ by $\beta_0$, the index set of non-zero coefficients by $J=\{j:\beta_{0j}\neq 0\}$ and its complement by $J^c=\{j:\beta_{0j}=0\}$. We further denote $\beta_J=\{\beta_j, j\in J\}$, $\beta_{J^c}=\{\beta_j, j\in J^c\}$ and $\psi_J=(\alpha,\beta_J^T,\phi^T)^T$. Similarly, let $\Gamma_J$ be the matrix consisting of the rows of $\Gamma$ corresponding to the index set $J$, and let $\gamma=\operatorname{vec}(\Gamma^T)$ be the vector obtained by stacking the columns of $\Gamma^T$. Finally, the proposed regularized IV estimator is defined as the minimizer of the objective function
$$Q_n(\psi)=L_n(\psi)+n\sum_{j=1}^d p_{\lambda_n}(|\beta_j|), \tag{6}$$
where $p_{\lambda_n}(|b|)$ is a penalty function. Let $\psi_0=(\alpha_0,\beta_0^T,\phi_0^T)^T$ be the true value of the model parameters.
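The paper implements the method with the SCAD penalty (Section 4). As a sketch, the SCAD penalty of Fan and Li (2001) and its first derivative can be coded as follows. The value $a=3.7$ is the default suggested by Fan and Li (2001) and is an assumption here, because the paper does not state it. The function names are hypothetical. The derivative vanishes for $|\beta|>a\lambda$, which is why $b$ and $\Sigma$ in Theorem 2 become zero for SCAD when $\lambda_n$ is small relative to the non-zero coefficients.

```python
# Illustrative sketch: function names are hypothetical; a = 3.7 is an assumed default.
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|) of Fan and Li (2001)."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                                  # Lasso-like part near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))   # quadratic transition
    return (a + 1) * lam * lam / 2                                      # constant: large effects unshrunk


def scad_derivative(theta, lam, a=3.7):
    """p'_lam(|theta|) for theta != 0: lam up to lam, linear decay to 0 at a*lam, 0 beyond."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def objective_qn(loss_value, beta, lam, n, a=3.7):
    """Q_n(psi) = L_n(psi) + n * sum_j p_lam(|beta_j|) with the SCAD penalty."""
    return loss_value + n * sum(scad_penalty(b, lam, a) for b in beta)
```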
Theorem 1.
Under Assumptions A1–A5 in Appendix A, suppose the penalty function satisfies
$$a_n=\max\{p'_{\lambda_n}(|\beta_{0j}|):\beta_{0j}\neq 0\}=O(n^{-1/2})$$
and
$$b_n=\max\{p''_{\lambda_n}(|\beta_{0j}|):\beta_{0j}\neq 0\}=o(1).$$
Then there exists a local minimizer $\hat{\psi}$ of the objective function (6) such that $\|\hat{\psi}-\psi_0\|=O_p(n^{-1/2})$.
Further, let $b=(0,p'_{\lambda_n}(|\beta_{0J}^T|),0)^T\operatorname{sign}(\psi_{0J})$ and $\Sigma=\operatorname{diag}(0,p''_{\lambda_n}(|\beta_{0J}^T|),0)$. We have the following results.
Theorem 2.
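If $\lambda_n\to 0$, $\sqrt{n}\lambda_n\to\infty$ and $\liminf_{n\to\infty}\liminf_{\xi\to 0^+}p'_{\lambda_n}(\xi)/\lambda_n>0$, then, with probability approaching 1, the root-n consistent estimator $\hat{\psi}$ in Theorem 1 satisfies
(a) $\hat{\beta}_{J^c}=0$;
(b) $\hat{\psi}_J$ has the asymptotic distribution
$$\sqrt{n}(H+\Sigma)(\hat{\psi}_J-\psi_{0J})+\sqrt{n}\,b\xrightarrow{d}N(0,DCD^T),$$
where
$$H=E\left[\frac{\partial\rho^T(\psi_{0J})}{\partial\psi_J}A(W)\frac{\partial\rho(\psi_{0J})}{\partial\psi_J^T}\right],$$
$$D=\left(I_{s+2},\;E\left[\frac{\partial\rho^T(\psi_{0J})}{\partial\psi_J}A(W)\frac{\partial\rho(\psi_{0J})}{\partial\gamma^T}\right]\left(I_p\otimes E(WW^T)^{-1}\right)\right),$$
$$C=E(KK^T)$$
and
$$K=\begin{pmatrix}\dfrac{\partial\rho^T(\psi_{0J})}{\partial\psi_J}A(W)\rho(\psi_{0J})\\[4pt](X_J^*-\Gamma_{0J}W)\otimes W\end{pmatrix}.$$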
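From the proof of the above theorem in Appendix A, it can be seen that the covariance matrix $DCD^T$ can be estimated by
$$\frac{1}{n}\frac{\partial L_n(\hat{\psi}_J)}{\partial\psi_J}\frac{\partial L_n(\hat{\psi}_J)}{\partial\psi_J^T}\xrightarrow{p}DCD^T,$$
where
$$\frac{\partial L_n(\psi_J)}{\partial\psi_J}=\sum_{i=1}^n\frac{\partial\rho_i^T(\psi_J)}{\partial\psi_J}A_i\rho_i(\psi_J).$$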
Although the estimator is consistent regardless of the choice of $A(W)$, there exists, in theory, an optimal weight matrix $A(W)$ that yields the most efficient estimator. Following Wang and Hsiao (2011), the optimal weight matrix is given by
$$A(W)=\left\{E\left[\rho(\psi_{0J})\rho^T(\psi_{0J})\mid W\right]\right\}^{-1}.$$
Since the optimal weight matrix involves unknown parameters, it can be computed via a two-stage estimation procedure. In the first stage, the objective function is minimized using the identity matrix as the weight matrix. In the second stage, the estimators are obtained with the optimal weight matrix evaluated at the first-stage estimates.
As noted in Abarin and Wang (2012), for some models, such as the gamma log-linear and Poisson log-linear models, the expectation (4) has an analytical form for some error distributions $f_U(u)$. For example, in these log-linear models, when the random error $U$ is univariate normal, $U\sim N(0,\phi)$, the integral in (4) has the closed-form expression
$$E(\tilde{X}^*Y\mid W)=\tilde{a}\,\xi,$$
where $\tilde{a}=(1,\Gamma W+\beta_x\phi)^T$ and $\xi=\exp(\alpha+\beta_x\Gamma W+\beta_z^TZ+\beta_x^2\phi/2)$. The closed-form expression greatly reduces the computational burden. When the integral in (4) has no analytical form, Monte Carlo methods (e.g., importance sampling) can be used to approximate it instead. Specifically, we follow the suggestions of Wang and Hsiao (2011) and calculate (4) as follows.
(1) Choose a candidate distribution whose density function $h(x)$ is known;
(2) Generate an i.i.d. random sample $\{x_{is}: s=1,2,\ldots,S,S+1,\ldots,2S;\ i=1,2,\ldots,n\}$ from the density $h(x)$;
(3) Calculate the Monte Carlo approximations of $m(\Gamma w_i;\psi)$ as
$$m_{S1}(\Gamma w_i;\psi)=\frac{1}{S}\sum_{s=1}^{S}\frac{\tilde{x}_{is}\,g(\alpha+\beta_x^Tx_{is}+\beta_z^Tz_i)\,f_U(x_{is}-\Gamma w_i;\phi)}{h(x_{is})}$$
and
$$m_{S2}(\Gamma w_i;\psi)=\frac{1}{S}\sum_{s=S+1}^{2S}\frac{\tilde{x}_{is}\,g(\alpha+\beta_x^Tx_{is}+\beta_z^Tz_i)\,f_U(x_{is}-\Gamma w_i;\phi)}{h(x_{is})};$$
(4) Apply the gradient descent method to the approximated objective function
$$\tilde{Q}_n(\psi)=\frac{1}{2}\sum_{i=1}^n\hat{\rho}_{i,S1}^T(\psi)A_i\,\hat{\rho}_{i,S2}(\psi)+n\sum_{j=1}^d p_{\lambda_n}(|\beta_j|),$$
where $\hat{\rho}_{i,S1}(\psi)=y_i\tilde{x}_i^*-m_{S1}(\hat{\Gamma}w_i;\psi)$ and $\hat{\rho}_{i,S2}(\psi)=y_i\tilde{x}_i^*-m_{S2}(\hat{\Gamma}w_i;\psi)$.
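To illustrate steps (1)–(3), the sketch below uses the Poisson log-linear case with scalar $X$ and normal $U$, where the closed form $\tilde a\,\xi$ given above is available as a check. The choices $g=\exp$, the candidate density $h=N(\Gamma w_i,\tau^2)$ with $\tau^2=1$, and all function names are assumptions made for illustration. The two independent batches matter for the following reason. Each batch gives an unbiased estimate of $m$, so for fixed data the expected cross product $\hat\rho_{i,S1}^TA_i\hat\rho_{i,S2}$ equals the exact $\rho_i^TA_i\rho_i$. Using a single batch twice would add a Monte Carlo variance term to the objective.

```python
# Illustrative sketch: Poisson mean g = exp, scalar X, U ~ N(0, phi) and the candidate
# density h = N(Gamma w_i, h_var) are assumptions; function names are hypothetical.
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def m_closed_form(gw, eta_z, alpha, beta_x, phi):
    """Closed form a~ * xi; gw is Gamma W and eta_z is beta_z^T Z, both scalars."""
    xi = math.exp(alpha + beta_x * gw + eta_z + beta_x ** 2 * phi / 2)
    return [xi, (gw + beta_x * phi) * xi]


def m_importance(gw, eta_z, alpha, beta_x, phi, S, rng, h_var=1.0):
    """Steps (1)-(3): importance-sampling estimates (m_S1, m_S2) of m(Gamma w_i; psi).

    h_var > phi / 2 keeps the importance weights square-integrable. Batch 1 uses
    draws s = 1..S and batch 2 uses s = S+1..2S, so the two estimates are independent.
    """
    batches = []
    for _ in range(2):
        acc = [0.0, 0.0]
        for _ in range(S):
            x = rng.gauss(gw, math.sqrt(h_var))                                # x_is ~ h
            weight = normal_pdf(x - gw, 0.0, phi) / normal_pdf(x, gw, h_var)   # f_U(x - Gamma w) / h(x)
            g = math.exp(alpha + beta_x * x + eta_z)                           # Poisson mean function
            acc[0] += g * weight                                               # entry for the 1 in x~
            acc[1] += x * g * weight                                           # entry for x in x~
        batches.append([v / S for v in acc])
    return batches[0], batches[1]


def cross_term(y, x_star, m1, m2):
    """rho_S1^T A rho_S2 with A = I, where rho_Sk = y (1, x*)^T - m_Sk."""
    x_tilde = [1.0, x_star]
    r1 = [y * t - m for t, m in zip(x_tilde, m1)]
    r2 = [y * t - m for t, m in zip(x_tilde, m2)]
    return sum(a * b for a, b in zip(r1, r2))
```

With a few tens of thousands of draws per batch, both `m_S1` and `m_S2` should fall within a few percent of `m_closed_form` for moderate parameter values.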
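For some penalty functions, such as SCAD and MCP, $b$ and $\Sigma$ are both zero when the tuning parameter $\lambda_n$ is sufficiently small. This is because the first two derivatives of these penalties vanish at arguments larger than a fixed multiple of $\lambda_n$ (e.g., $a\lambda_n$ for SCAD), while the non-zero $|\beta_{0j}|$ stay fixed. Hence the resulting estimator has the oracle property: $\hat{\beta}_{J^c}=0$, and the asymptotic distribution of $\hat{\psi}_J$ is given by
$$\sqrt{n}(\hat{\psi}_J-\psi_{0J})\xrightarrow{d}N(0,H^{-1}DCD^TH^{-1}).$$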

3. Linear ME Model

For the linear regression model, the proposed regularized IV method reduces to a regularized two-stage least squares method when the weight matrix is $A=I$. Specifically, consider the linear model
$$Y=\alpha+\beta_x^TX+\beta_z^TZ+\epsilon,$$
where $\epsilon\sim N(0,\sigma^2)$, $E(U\mid W,Z)=0$ and $E(UU^T\mid W,Z)=\Sigma_u$. Without loss of generality, assume that the intercept $\alpha$ is zero. The regularized instrumental variable estimator is defined as the minimizer of the objective function
$$\frac{1}{2}\sum_{i=1}^n(y_i-\beta_x^T\hat{x}_i-\beta_z^Tz_i)^2+n\sum_{j=1}^d p_{\lambda_n}(|\beta_j|), \tag{8}$$
where $\hat{x}_i=\hat{\Gamma}w_i$. Since the naive estimator is, in general, inconsistent in both estimation and selection, the observed covariates are replaced by their IV-based predictions $\hat{x}_i$. Furthermore, every $\hat{x}_i$ depends on $\hat{\Gamma}$, so the observations $(y_i,\hat{x}_i,z_i)$ in (8) are not independent and the standard results for regularized linear regression cannot be applied directly. For the linear regression model, we have the following results.
Corollary 1.
If $a_n=O(n^{-1/2})$, $b_n=o(1)$ and $E(\tilde{W}\tilde{W}^T)$ is positive definite, where $\tilde{W}=(W^T,Z^T)^T$, then there exists a local minimizer $\hat{\beta}$ of the objective function $Q(\beta)$ in (8) such that $\|\hat{\beta}-\beta_0\|=O_p(n^{-1/2})$.
Corollary 2.
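If $\lambda_n\to 0$, $\sqrt{n}\lambda_n\to\infty$ and $\liminf_{n\to\infty}\liminf_{\xi\to 0^+}p'_{\lambda_n}(\xi)/\lambda_n>0$, then, with probability approaching 1, the root-n consistent minimizer $\hat{\beta}$ of (8) satisfies
(a) $\hat{\beta}_{J^c}=0$;
(b) $\hat{\beta}_J$ has the asymptotic normal distribution
$$\sqrt{n}(H+\Sigma)\left(\hat{\beta}_J-\beta_{0J}+(H+\Sigma)^{-1}b\right)\xrightarrow{d}N\left(0,\,E\left[(Y-\beta_{0J}^T\tilde{X}^*)^2\,\tilde{\Gamma}_{0J}\tilde{W}\tilde{W}^T\tilde{\Gamma}_{0J}^T\right]\right),$$
where $H=\tilde{\Gamma}_{0J}E(\tilde{W}\tilde{W}^T)\tilde{\Gamma}_{0J}^T$, $\tilde{\Gamma}=\operatorname{diag}(\Gamma,I_q)$, $\tilde{\Gamma}_0$ is its value at the true $\Gamma_0$, and $\tilde{\Gamma}_{0J}$ is the matrix consisting of the rows of $\tilde{\Gamma}_0$ corresponding to the index set $J$.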
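A minimal sketch of the regularized two-stage least squares estimator in (8) is shown below. It substitutes the Lasso penalty $p_\lambda(|b|)=\lambda|b|$ for SCAD so that each coordinate update has a closed form. It also assumes scalar $X$ and $W$, centred data (no intercept) and a fixed $\lambda$. It therefore illustrates the two-stage structure and is not the paper's implementation; the function names are hypothetical. Stage 1 computes $\hat\Gamma$ and $\hat x_i=\hat\Gamma w_i$. Stage 2 runs cyclic coordinate descent on the penalized least squares objective with the $n\lambda$ scaling used in (8).

```python
# Illustrative sketch: Lasso used in place of SCAD; scalar X and W; names are hypothetical.
import math


def soft_threshold(v, t):
    """Lasso proximal step: sign(v) * max(|v| - t, 0)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def regularized_2sls(y, x_star, w, z, lam, max_sweeps=500, tol=1e-10):
    """Minimize 1/2 * sum_i (y_i - b_x xhat_i - b_z^T z_i)^2 + n * lam * sum_j |b_j|.

    y, x_star, w: length-n lists; z: list of n q-vectors. Returns (Gamma_hat, beta),
    where beta = (b_x, b_z1, ..., b_zq).
    """
    n = len(y)
    # Stage 1: Gamma_hat = sum(x* w) / sum(w^2), then xhat_i = Gamma_hat * w_i.
    g_hat = sum(a * b for a, b in zip(x_star, w)) / sum(b * b for b in w)
    cols = [[g_hat * wi for wi in w]] + [list(c) for c in zip(*z)]
    sq_norms = [sum(ci * ci for ci in c) for c in cols]
    beta = [0.0] * len(cols)
    resid = list(y)                                   # y - sum_j beta_j * col_j
    # Stage 2: cyclic coordinate descent on the penalized least squares objective.
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, c in enumerate(cols):
            partial = sum(ci * ri for ci, ri in zip(c, resid)) + beta[j] * sq_norms[j]
            new = soft_threshold(partial, n * lam) / sq_norms[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [ri - delta * ci for ri, ci in zip(resid, c)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return g_hat, beta
```

Replacing the first column by the raw surrogates $x_i^*$ would turn this sketch into a naive estimator, whose coefficient on $x$ is attenuated.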

4. Numerical Examples

In this section, we conduct simulations to assess the finite sample performance of the proposed instrumental variable estimator (IVE) in variable selection as well as parameter estimation. For comparison, we also calculate the regularized estimator using the true data $(y_i,x_i^T,z_i^T)$ (TRE) and the naive estimator using the observed sample $(y_i,x_i^{*T},z_i^T)$ (NAE); below, these three methods are also referred to as the IV, TR and NA methods, respectively. The proposed method is implemented with the SCAD penalty function. The tuning parameter is selected by BIC, which recovers the true model consistently for the SCAD penalty (Wang et al. 2007). To assess selection performance, we calculate the false-positive (FP) rate, defined as the average number of zero coefficients incorrectly estimated as non-zero. We also calculate the false-negative (FN) rate, defined as the average number of non-zero coefficients incorrectly estimated as zero. In addition, we calculate the Matthews correlation coefficient (MCC), a summary measure of the confusion matrix of true/false positives/negatives, defined as
$$\mathrm{MCC}=\frac{TP\times TN-FP\times FN}{\left[(TP+FP)(TP+FN)(TN+FP)(TN+FN)\right]^{1/2}}.$$
The MCC ranges from −1 to 1, where larger values indicate better selection. Finally, we calculate the mean squared error (MSE) $\|\hat{\beta}-\beta_0\|^2$ to assess estimation accuracy.
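The selection and estimation criteria above can be computed for each simulation replicate as in the following sketch and then averaged over replicates. The function name and the threshold `tol` for declaring a coefficient selected are assumptions, not taken from the paper. The same applies to the convention of returning MCC = 0 when its denominator is zero.

```python
# Illustrative sketch: the function name, tol and the MCC = 0 convention are assumptions.
import math


def selection_metrics(beta_hat, beta_true, tol=0.0):
    """Return FP, FN, MCC and squared error ||beta_hat - beta_true||^2 for one replicate.

    A coefficient is treated as selected when |beta_hat_j| > tol.
    """
    tp = fp = tn = fn = 0
    for bh, bt in zip(beta_hat, beta_true):
        selected = abs(bh) > tol
        relevant = bt != 0
        if selected and relevant:
            tp += 1
        elif selected:
            fp += 1            # zero coefficient estimated as non-zero
        elif relevant:
            fn += 1            # non-zero coefficient estimated as zero
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0   # assumed convention when undefined
    sq_err = sum((bh - bt) ** 2 for bh, bt in zip(beta_hat, beta_true))
    return {"FP": fp, "FN": fn, "MCC": mcc, "MSE": sq_err}
```

As an example with the Example 1 coefficients $(3,1.5,0,0,2,0,0,0)$, consider an estimate that keeps a spurious third coefficient and drops the second. That estimate gives FP = 1, FN = 1 and MCC = 7/15.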
Example 1.
First, we consider the linear model $Y=\beta_xX+\beta_z^TZ+\epsilon$, where $(\beta_x,\beta_z^T)=(3,1.5,0,0,2,0,0,0)$ and $(Z_1,W,Z_2,\ldots,Z_7)^T$ are jointly generated from $N(0,\Sigma)$ with $\Sigma_{ij}=0.7^{|i-j|}$. In addition, the true covariate $X$ is generated as $X=1.5W+U$, where $\epsilon$ and $U$ are standard normal. The observed surrogate $X^*$ is generated as $X^*=X+\delta$, where $\delta$ follows a normal distribution with mean zero and variance $\sigma_\delta^2$.
Figure 1 shows the estimated coefficients, FP and FN, for various values of $\sigma_\delta^2$ with sample size $n=200$. The results of the naive method (NAE) are on the left-hand side, and the results of the IVE method are on the right-hand side. As seen in the bottom-left graph, both FP and FN increase with $\sigma_\delta^2$ for the naive method. In contrast, the IV estimator is robust to the magnitude of $\sigma_\delta^2$. The simulation results with $\sigma_\delta^2=2$ are reported in Table 1. The naive method has both high FP and high FN. The increase in FN is due to covariate $z_1$ being incorrectly dropped from the model, as shown in Table 1. On the other hand, the TR and IV methods perform well in recovering the true model.
The selection results of the three methods with $\sigma_\delta^2=1$ and sample sizes $n=50,100,200$ are reported in Table 2. As the sample size increases, both FP and FN decrease for the TR and IV methods, whereas FP increases for the naive method. In addition, the TR and IV methods achieve better MCC and MSE than the naive method. The selection of the naive method remains biased regardless of the sample size.
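The Example 1 design can be simulated with the sketch below. It relies on a standard fact: a stationary AR(1) recursion with coefficient 0.7 and unit marginal variance produces exactly the covariance $\Sigma_{ij}=0.7^{|i-j|}$, so no matrix factorization is needed. The function names, the fixed seed and the layout of the returned tuples are illustrative assumptions.

```python
# Illustrative sketch: function names, seed and output layout are hypothetical.
import math
import random

BETA_X = 3.0
BETA_Z = [1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]          # coefficients of Z1, ..., Z7


def ar1_vector(dim, r, rng):
    """One draw from N(0, Sigma) with Sigma_ij = r**|i - j|, via a stationary AR(1) recursion."""
    v = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - r * r)                      # keeps every marginal variance at 1
    for _ in range(dim - 1):
        v.append(r * v[-1] + scale * rng.gauss(0.0, 1.0))
    return v


def simulate_example1(n, var_delta, seed=0):
    """Return n tuples (y, x_star, w, z, x) following the Example 1 description."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = ar1_vector(8, 0.7, rng)                     # ordered as (Z1, W, Z2, ..., Z7)
        w = v[1]
        z = [v[0]] + v[2:]                              # Z1, Z2, ..., Z7
        x = 1.5 * w + rng.gauss(0.0, 1.0)               # X = 1.5 W + U, U ~ N(0, 1)
        y = BETA_X * x + sum(b * zj for b, zj in zip(BETA_Z, z)) + rng.gauss(0.0, 1.0)
        x_star = x + rng.gauss(0.0, math.sqrt(var_delta))   # X* = X + delta
        rows.append((y, x_star, w, z, x))
    return rows
```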
Example 2.
In this example, we consider a logistic model where $Y\mid(X,Z)$ follows the Bernoulli distribution with mean function $g(\alpha+\beta_xX+\beta_z^TZ)$, where $g(\eta)=\exp(\eta)/(1+\exp(\eta))$ and $(\alpha,\beta_x,\beta_z^T)=(1,3,1.5,0,0,2,0,0,0)$. The covariates $(Z_1,W,Z_2,\ldots,Z_7)^T$ are jointly generated from $N(0,\Sigma)$ with $\Sigma_{ij}=0.7^{|i-j|}$. Further, $X=1.5W+U$, and the rest of the model setting is the same as in Example 1. The simulation results are shown on the left-hand side of Table 3. The results show patterns similar to those in Example 1: FP and FN are both low for the TR and IV methods compared with the NA method.
Example 3.
In this example, we consider the Poisson model for $Y\mid(X,Z)$ with mean function $\exp(\alpha+\beta_xX+\beta_z^TZ)$; the rest of the model setting is the same as in Example 2. The simulation results are shown on the right-hand side of Table 3. In the Poisson log-linear model, the naive method performs the worst of the three methods, with FP and FN remaining at a high level. In contrast, the results of the IV method are similar to those of the TR method, with FP, FN and MSE close to zero and MCC close to one.
Example 4.
In this example, we consider the linear model of Example 1 with a relatively high dimension $p$. In particular, we simulate data with sample size $n=50$ and $p=100$. The true parameter values are $(\beta_x,\beta_z^T)=(3,1.5,0,0,2,0,0,\ldots,0)$. The simulation results in Table 4 show that the proposed IVE method performs similarly to the small-$p$ scenarios and, in particular, clearly outperforms the naive method.
Example 5.
In this example, we consider the linear model of Example 1 with a different parameter setting, $(\beta_x,\beta_z^T)=(0,1.5,2,0,1,0,0,0)$. In this case, the coefficient of the error-prone covariate $x$ is zero. The simulation results are presented in Table 5. The ME has virtually no effect on FN. Also, as the sample size increases, the IV estimator performs nearly the same as the TR estimator.
Example 6.
In this example, we consider the linear model $Y=\beta_xX+\beta_z^TZ+\epsilon$, where $(\beta_x,\beta_z^T)=(3,1.5,0,0,2,0,0,0)$ and $(Z_1,X,W,Z_2,\ldots,Z_7)^T$ are jointly generated from $N(0,\Sigma)$ with $\Sigma_{ij}=0.7^{|i-j|}$. Note that, in this case, the covariates $X$ and $W$ are generated jointly with all other covariates. The rest of the model setting remains the same as in Example 1. The results, shown in Table 6, are similar to those in Example 1 despite the different data-generating mechanism. The IV estimator performs better than the NA estimator as the sample size increases.

5. Real Data Example

In this section, we apply the proposed method to a real dataset. The Mobility Program Clinical Research Unit of St. Michael’s Hospital conducted a study of the prognostic factors of work productivity after a limb injury. The data were collected through the Work Limitations Questionnaire (WLQ) from a group of injured workers attending the Shoulder & Elbow Specialty Clinic, which is managed by the Workplace Safety & Insurance Board of Ontario, Canada. The WLQ, developed by Lerner et al. (2001) and Lerner et al. (2012), measures how health problems affect job performance and the productivity loss at work. The WLQ has shown good criterion validity and has been adopted in several studies, such as Ida et al. (2012) and Tang et al. (2011). There were 168 recruited participants, who were worker compensation claimants and may or may not have been working at the time of initial clinic attendance. Typically, injured workers were referred to these clinics if they had a chronic work-related upper limb injury lasting more than 6 months without sufficient recovery.
In this paper, we are interested in exploring the prognostic factors of the WLQ index. The response variable, the WLQ index, evaluates the proportion of time in which difficulty is experienced in four domains: time management, physical demands, mental–interpersonal demands and output demands. This index quantifies the productivity loss at work resulting from health disorders. The predictors (prognostic factors) are supervisor support ($x^*$); lower quick disabilities of the arm, shoulder and hand (DASH) score ($z_1$); better mental health factor score ($z_2$); better physical health factor score ($z_3$); age ($z_4$); lower von Korff pain intensity score ($z_5$); lower von Korff pain intensity score ($z_6$); and lower shoulder pain and disability index ($z_7$). The instrumental variables are organization support and decision authority. Work disability is an important public health issue, because the productivity loss at work can exceed the direct medical cost. In the literature, supervisor support is associated with the productivity and health outcomes of workers, and physical and mental disorders are significantly related to work loss. For example, positive support from a supervisor is associated with a low degree of stress and low sickness absence among employees (Nielsen et al. 2006; Stansfeld et al. 1997). Physical–mental comorbidity has also been found to have an additive effect on work loss (Buist-Bouwman et al. 2005). The estimation results are presented in Table 7. The naive method retains the lower quick DASH score and the better mental health factor score. Besides these two covariates, the IV method also retains supervisor support and the lower shoulder pain and disability index.

6. Conclusions and Discussion

Although regularized regression methods have been widely investigated in the literature, most published works assume that the data are precisely measured. Some researchers have studied high-dimensional measurement error models, assuming that the ME covariance matrix is known or can be estimated using replicate data. However, replicate data are not always available in real applications. Instrumental variables, by contrast, are more flexible and relatively easy to obtain. Technically, the IV assumption is weaker than the requirement of replicate measurements. Although some authors have used the IV approach to study the general endogeneity problem in linear models, very few studies focus specifically on ME problems. Developing methodologies in this particular context allows us to gain more insight into ME issues, such as the impact of ME on variable selection and parameter estimation in high-dimensional models.
In this paper, we extended the instrumental variable method to the regularized estimation setup to correct for ME effects in both linear and generalized linear ME models. Besides the attenuation effect, the ME also affects the selection results in various settings. The proposed estimator is shown to have the oracle property; that is, it is consistent in both variable selection and parameter estimation. The asymptotic distribution of the proposed estimator is derived for both linear and generalized linear ME models. Extensive simulation studies for linear, logistic and Poisson log-linear models were conducted to examine the performance of the proposed estimator as well as the naive estimator. The simulation results show that the proposed estimator performs well in various model settings with finite sample sizes. The extension of the proposed method to nonlinear models is of interest for future research.
In this paper, we assume that the possibly mismeasured covariates are of low dimension. In future work, it will be important to study the case where a large number of covariates are measured with error.

Author Contributions

Methodology, L.X. and L.W.; software programming, L.X.; validation, L.X. and L.W.; investigation, L.X. and L.W.; data curation, L.X.; writing—original draft preparation, L.X.; writing—review and editing, L.X. and L.W.; visualization, L.X.; supervision, L.W.; project administration, L.W.; funding acquisition, L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the Natural Sciences and Engineering Research Council of Canada (RGPIN 2023-04924).

Data Availability Statement

Data is unavailable due to privacy restrictions.

Acknowledgments

The authors thank the editor and three anonymous referees for their helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

We assume the following regularity conditions.
Assumption A1.
$E\{(\rho(\psi)-\rho(\psi_0))^TA(W)(\rho(\psi)-\rho(\psi_0))\}=0$ if and only if $\psi=\psi_0$, where $\rho(\psi)=Y\tilde{X}^*-m(\Gamma W;\psi)$.
Assumption A2.
The parameter spaces $\Psi\subset\mathbb{R}^{d+k+1}$ and $\Gamma\subset\mathbb{R}^{p\times l}$ are compact.
Assumption A3.
$f_U(u;\phi)$ is continuously differentiable with respect to $u$ and $\phi$. Furthermore, $E\|A(W)\|(|Y|^2+\|Y\tilde{X}^*\|^2)<\infty$. The functions $G(X,Z;\theta)f_U(X-\Gamma W;\phi)$ and $G(X,Z;\theta)\,\partial f_U(X-\Gamma W;\phi)/\partial u^T$, together with their first-order partial derivatives with respect to $\theta$ and $\phi$, are dominated by a function $\eta(X,Z,W)$ that satisfies
$$E\|A(W)\|\left(\int\eta(x,z,W)(\|x\|+\|z\|+1)\,dx\,dz\right)^2<\infty.$$
Assumption A4.
$f_U(u;\phi)$ is twice continuously differentiable with respect to $\phi$ in an open neighbourhood of $\phi_0$, and the first two partial derivatives of $G(X,Z;\theta)f_U(X-\Gamma W;\phi)$ with respect to $\psi$ satisfy a dominance condition similar to that in Assumption A3.
Assumption A5.
The matrix $E\left[\dfrac{\partial\rho^T(\psi_{0J})}{\partial\psi_J}A(W)\dfrac{\partial\rho(\psi_{0J})}{\partial\psi_J^T}\right]$ is non-singular.
Proof of Theorem 1.
The gradient and Hessian of $L_n(\psi)$ are denoted by $G_n(\psi)$ and $H_n(\psi)$, respectively. By Assumption A4 and a Taylor expansion, the score of the objective function $Q_n(\psi)$ can be written as
$$S_n(\psi)=G_n(\psi_0)+n\tilde{p}'_{\lambda_n}(|\beta_0|)\operatorname{sign}(\psi_0)+H_n(\psi^*)(\psi-\psi_0)+n\tilde{p}''_{\lambda_n}(|\beta_0|)(\psi-\psi_0)(1+o_p(1)),$$
where $\psi^*$ lies between $\psi$ and $\psi_0$. It is sufficient to show that $S_n(\psi)=0$ has a solution $\hat{\psi}$ satisfying $\|\hat{\psi}-\psi_0\|=O_p(n^{-1/2})$. To this end, we show that, for any $\psi$ with $\|\psi-\psi_0\|=n^{-1/2}C$, the inequality $(\psi-\psi_0)^TS_n(\psi)>0$ holds with probability approaching 1. By Assumption A5 and $b_n=o(1)$, it follows that
$$(\psi-\psi_0)^TS_n(\psi)=(\psi-\psi_0)^T\left(G_n(\psi_0)+n\tilde{p}'_{\lambda_n}(|\beta_0|)\operatorname{sign}(\psi_0)\right)+n\|\psi-\psi_0\|^2(1+o_p(1)).$$
By Assumptions A1–A3 and $a_n=O(n^{-1/2})$, the first term is of order $O_p(C)$, while the second term is of order $C^2$. Hence, for a sufficiently large $C$, the second term dominates the first, and $(\psi-\psi_0)^TS_n(\psi)$ is positive with probability tending to 1, which completes the proof. □
Proof of Theorem 2a.
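By Assumption A4, the Taylor expansion of $S_n(\psi)$ around $\psi_0$ is given by
$$S_n(\psi)=G_n(\psi_0)+H_n(\psi^*)(\psi-\psi_0)+n\tilde{p}'_{\lambda_n}(|\beta|)\operatorname{sign}(\psi)=n\lambda_n\left\{\frac{1}{n\lambda_n}G_n(\psi_0)+\frac{1}{n\lambda_n}H_n(\psi^*)(\psi-\psi_0)+\frac{1}{\lambda_n}\tilde{p}'_{\lambda_n}(|\beta|)\operatorname{sign}(\psi)\right\}.$$
For $j\in J^c$ and $\epsilon_n=Cn^{-1/2}$, it can be shown by Assumptions A1–A5 that, for $|\beta_j|<\epsilon_n$,
$$S_n(\beta_j)=n\lambda_n\left\{O_p\left(\frac{1}{\sqrt{n}\lambda_n}\right)+\frac{p'_{\lambda_n}(|\beta_j|)}{\lambda_n}\operatorname{sign}(\beta_j)\right\}.$$
Since $\sqrt{n}\lambda_n\to\infty$, together with the condition $\liminf_{n\to\infty}\liminf_{\xi\to 0^+}p'_{\lambda_n}(\xi)/\lambda_n>0$, we have $S_n(\beta_j)>0$ if $0<\beta_j<\epsilon_n$ and $S_n(\beta_j)<0$ if $-\epsilon_n<\beta_j<0$. Hence, $P(\hat{\beta}_j=0)\to 1$ for $j\in J^c$. □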
Proof of Theorem 2b.
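By Assumption A4, the Taylor expansion of $S_n(\psi_J)$ around $\psi_{0J}$ is given by
$$S_n(\psi_J)=G_n(\psi_{0J})+H_n(\psi_J^*)(\psi_J-\psi_{0J})+nb+n\tilde{p}''_{\lambda_n}(|\beta_J^*|)(\psi_J-\psi_{0J}),$$
where
$$\tilde{p}''_{\lambda_n}(|\beta_J^*|)=\operatorname{diag}\left(0,p''_{\lambda_n}(|\beta_J^{*T}|),0\right),$$
$$G_n(\psi_{0J})=\sum_{i=1}^n\frac{\partial\hat{\rho}_i^T(\psi_{0J})}{\partial\psi_J}A_i\hat{\rho}_i(\psi_{0J})$$
and
$$H_n(\psi_J^*)=\sum_{i=1}^n\left[\frac{\partial\hat{\rho}_i^T(\psi_J^*)}{\partial\psi_J}A_i\frac{\partial\hat{\rho}_i(\psi_J^*)}{\partial\psi_J^T}+\left(\hat{\rho}_i^T(\psi_J^*)A_i\otimes I_{s+2}\right)\frac{\partial\operatorname{vec}\left(\partial\hat{\rho}_i^T(\psi_J^*)/\partial\psi_J\right)}{\partial\psi_J^T}\right].$$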
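Setting $S_n(\hat{\psi}_J)=0$ and rearranging the terms, we obtain
$$-\frac{1}{\sqrt{n}}G_n(\psi_{0J})=\sqrt{n}\left\{\frac{1}{n}H_n(\psi_J^*)+\tilde{p}''_{\lambda_n}(|\beta_J^*|)\right\}(\hat{\psi}_J-\psi_{0J})+\sqrt{n}\,\tilde{p}'_{\lambda_n}(|\beta_{0J}|)\operatorname{sign}(\psi_{0J}).$$
Note that
$$\frac{1}{n}H_n(\psi_J^*)\xrightarrow{p}E\left[\frac{\partial\rho^T(\psi_{0J})}{\partial\psi_J}A(W)\frac{\partial\rho(\psi_{0J})}{\partial\psi_J^T}+\left(\rho^T(\psi_{0J})A\otimes I_{s+2}\right)\frac{\partial\operatorname{vec}\left(\partial\rho^T(\psi_{0J})/\partial\psi_J\right)}{\partial\psi_J^T}\right]=H, \tag{A1}$$
since the expectation of the second term is
$$E\left[\left(\rho^T(\psi_{0J})A\otimes I_{s+2}\right)\frac{\partial\operatorname{vec}\left(\partial\rho^T(\psi_{0J})/\partial\psi_J\right)}{\partial\psi_J^T}\right]=E\left[\left(E(\rho^T(\psi_{0J})\mid\tilde{W})A\otimes I_{s+2}\right)\frac{\partial\operatorname{vec}\left(\partial\rho^T(\psi_{0J})/\partial\psi_J\right)}{\partial\psi_J^T}\right]=0.$$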
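Now, consider the first-order Taylor expansion of $G_n(\psi_{0J})$ around $\gamma_{0J}$. Again by Assumption A4,
$$G_n(\psi_{0J})=\sum_{i=1}^n\frac{\partial\rho_i^T(\psi_{0J})}{\partial\psi_J}A_i\rho_i(\psi_{0J})+\frac{\partial^2\tilde{L}_n(\psi_{0J})}{\partial\psi_J\partial\gamma_J^T}(\hat{\gamma}_J-\gamma_{0J}), \tag{A2}$$
where
$$\frac{\partial^2\tilde{L}_n(\psi_{0J})}{\partial\psi_J\partial\gamma_J^T}=\sum_{i=1}^n\left[\frac{\partial\rho_i^T(\psi_{0J},\gamma_J^*)}{\partial\psi_J}A_i\frac{\partial\rho_i(\psi_{0J},\gamma_J^*)}{\partial\gamma_J^T}+\left(\rho_i^T(\psi_{0J},\gamma_J^*)A_i\otimes I_{s+2}\right)\frac{\partial\operatorname{vec}\left(\partial\rho_i^T(\psi_{0J},\gamma_J^*)/\partial\psi_J\right)}{\partial\gamma_J^T}\right].$$
Using an argument similar to that for Equation (A1), it can be shown that
$$\frac{1}{n}\frac{\partial^2\tilde{L}_n(\psi_{0J})}{\partial\psi_J\partial\gamma_J^T}\xrightarrow{p}E\left[\frac{\partial\rho^T(\psi_{0J})}{\partial\psi_J}A\frac{\partial\rho(\psi_{0J})}{\partial\gamma_J^T}\right].$$
In addition, the term $\hat{\gamma}_J-\gamma_{0J}$ in Equation (A2) can be written as
$$\hat{\gamma}_J-\gamma_{0J}=\left(I_p\otimes\sum_{i=1}^nW_iW_i^T\right)^{-1}\sum_{i=1}^n(X_{Ji}^*-\Gamma_{0J}W_i)\otimes W_i.$$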
Hence, Equation (A2) can be written as
$$G_n(\psi_{0J}) = D_n \sum_{i=1}^n K_i,$$
where
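$$D_n = \left(I_{s+2},\ \frac{\partial^2\tilde L_n(\psi_{0J})}{\partial\psi_J\,\partial\gamma_J^T}\left(I_p\otimes\Big(\sum_{i=1}^n W_iW_i^T\Big)^{-1}\right)\right) \xrightarrow{p} \left(I_{s+2},\ E\left[\frac{\partial\rho^T(\psi_{0J})}{\partial\psi_J} A\,\frac{\partial\rho(\psi_{0J})}{\partial\gamma_J^T}\right]\left(I_p\otimes E(WW^T)^{-1}\right)\right) = D,$$
$$K_i = \begin{pmatrix}\dfrac{\partial\rho_i^T(\psi_{0J})}{\partial\psi_J}\, A_i\, \rho_i(\psi_{0J})\\[6pt] \left(X_{Ji}^* - \Gamma_{0J}W_i\right)\otimes W_i\end{pmatrix}.$$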
Then, together with Assumption A5,
$$\sqrt{n}\,(H + \Sigma)(\hat\psi_J - \psi_{0J}) + \sqrt{n}\,b \xrightarrow{d} N\left(0,\ DCD^T\right).$$
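Provided $H + \Sigma$ is invertible, the last display implies $\sqrt{n}\{\hat\psi_J - \psi_{0J} + (H+\Sigma)^{-1}b\} \xrightarrow{d} N\{0, (H+\Sigma)^{-1}DCD^T(H+\Sigma)^{-T}\}$, a sandwich form. The plug-in sketch below assumes $C = E(KK^T)$, estimated by $n^{-1}\sum_i \hat K_i \hat K_i^T$, and that estimates of $H$, $\Sigma$ and $D$ are supplied by the caller; the function names and the toy inputs are hypothetical.

```python
import numpy as np

def sandwich_cov(H, Sigma, D, K):
    """Plug-in estimate of the asymptotic covariance of sqrt(n)(psi_hat_J - psi_0J).

    H, Sigma : (d, d) estimates of H and of the penalty-curvature matrix Sigma
    D        : (d, m) estimate of D
    K        : (n, m) array whose i-th row is an estimate of K_i
    Assumes C = E(K K^T), estimated by the sample average of K_i K_i^T.
    """
    n = K.shape[0]
    C_hat = K.T @ K / n
    M = np.linalg.inv(H + Sigma)          # requires H + Sigma to be invertible
    return M @ D @ C_hat @ D.T @ M.T

def standard_errors(H, Sigma, D, K):
    """Approximate standard errors of psi_hat_J: sqrt(diag(V) / n)."""
    return np.sqrt(np.diag(sandwich_cov(H, Sigma, D, K)) / K.shape[0])

# Toy usage with made-up inputs (d = 2 parameters, m = 3 components of K_i).
rng = np.random.default_rng(3)
K = rng.normal(size=(400, 3))
D = np.hstack([np.eye(2), 0.5 * np.ones((2, 1))])
H = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma = np.zeros((2, 2))                  # e.g. SCAD once all |beta_0j| exceed a * lam_n
print(standard_errors(H, Sigma, D, K))
```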

References

  1. Abarin, Taraneh, and Liqun Wang. 2012. Instrumental variable approach to covariate measurement error in generalized linear models. Annals of the Institute of Statistical Mathematics 64: 475–93.
  2. Buist-Bouwman, M. A., Ron de Graaf, W. A. M. Vollebergh, and Johan Ormel. 2005. Comorbidity of physical and mental disorders and the effect on work-loss days. Acta Psychiatrica Scandinavica 111: 436–43.
  3. Candes, Emmanuel, and Terence Tao. 2007. The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics 35: 2313–51.
  4. Fan, Jianqing, and Jinchi Lv. 2010. A selective overview of variable selection in high dimensional feature space. Statistica Sinica 20: 101.
  5. Fan, Jianqing, and Runze Li. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96: 1348–60.
  6. Fan, Jianqing, and Yuan Liao. 2014. Endogeneity in high dimensions. The Annals of Statistics 42: 872.
  7. Frank, Ildiko E., and Jerome H. Friedman. 1993. A statistical view of some chemometrics regression tools. Technometrics 35: 109–35.
  8. Huang, Xianzheng, and Hongmei Zhang. 2013. Variable selection in linear measurement error models via penalized score functions. Journal of Statistical Planning and Inference 143: 2101–11.
  9. Ida, Hiromasa, Kazumi Nakagawa, Masako Miura, Kyoko Ishikawa, and Naonori Yakura. 2012. Development of the work limitations questionnaire Japanese version (WLQ-J): Fundamental examination of the reliability and validity of the WLQ-J. Sangyo Eiseigaku Zasshi = Journal of Occupational Health 54: 101–7.
  10. Lerner, Debra, Benjamin C. Amick, III, William H. Rogers, Susan Malspeis, Kathleen Bungay, and Diane Cynn. 2001. The work limitations questionnaire. Medical Care 39: 72–85.
  11. Lerner, Debra, David Adler, Richard C. Hermann, Hong Chang, Evette J. Ludman, Annabel Greenhill, Katherine Perch, William C. McPeck, and William H. Rogers. 2012. Impact of a work-focused intervention on the productivity and symptoms of employees with depression. Journal of Occupational and Environmental Medicine 54: 128.
  12. Liang, Hua, and Runze Li. 2009. Variable selection for partially linear models with measurement errors. Journal of the American Statistical Association 104: 234–48.
  13. Lin, Wei, Rui Feng, and Hongzhe Li. 2015. Regularization methods for high-dimensional instrumental variables regression with an application to genetical genomics. Journal of the American Statistical Association 110: 270–88.
  14. Ma, Yanyuan, and Runze Li. 2010. Variable selection in measurement error models. Bernoulli: Official Journal of the Bernoulli Society for Mathematical Statistics and Probability 16: 274.
  15. Negahban, Sahand N., Pradeep K. Ravikumar, Martin J. Wainwright, and Bin Yu. 2012. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science 27: 538–57.
  16. Nielsen, Martin L., Reiner Rugulies, Karl B. Christensen, Lars Smith-Hansen, and Tage S. Kristensen. 2006. Psychosocial work environment predictors of short and long spells of registered sickness absence during a 2-year follow up. Journal of Occupational and Environmental Medicine 48: 591–98.
  17. Stansfeld, S. A., G. S. Rael, J. Head, M. Shipley, and M. Marmot. 1997. Social support and psychiatric sickness absence: A prospective study of British civil servants. Psychological Medicine 27: 35–48.
  18. Tang, Kenneth, Dorcas E. Beaton, Annelies Boonen, Monique A. Gignac, and Claire Bombardier. 2011. Measures of work disability and productivity: Rheumatoid arthritis specific work productivity survey (WPS-RA), workplace activity limitations scale (WALS), work instability scale for rheumatoid arthritis (RA-WIS), work limitations questionnaire (WLQ), and work productivity and activity impairment questionnaire (WPAI). Arthritis Care & Research 63: S337.
  19. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 58: 267–88.
  20. Wang, Hansheng, Runze Li, and Chih-Ling Tsai. 2007. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94: 553–68.
  21. Wang, Liqun. 2021. Identifiability in measurement error models. In Handbook of Measurement Error Models. Boca Raton: Chapman and Hall/CRC, pp. 55–70.
  22. Wang, Liqun, and Cheng Hsiao. 2011. Method of moments estimation and identifiability of semiparametric nonlinear errors-in-variables models. Journal of Econometrics 165: 30–44.
  23. Zhang, Cun-Hui. 2010. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38: 894–942.
  24. Zhang, Xinyu, Haiying Wang, Yanyuan Ma, and Raymond J. Carroll. 2017. Linear model selection when covariates contain errors. Journal of the American Statistical Association 112: 1553–61.
  25. Zhong, Wei, Wei Zhou, Qingliang Fan, and Yang Gao. 2020. Dummy endogenous treatment effect estimation using high-dimensional instrumental variables. Canadian Journal of Statistics 50: 795–819.
  26. Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101: 1418–29.
  27. Zou, Hui, and Trevor Hastie. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 67: 301–20.
Figure 1. Estimation and selection results for Example 2 with $n = 200$. In the upper panel, the estimated non-zero parameters are marked red, while the estimated zero parameters are marked blue. In the lower panel, FN is marked red and FP is marked blue.
Table 1. Simulation results of Example 1 with $n = 200$, $\sigma_\delta^2 = 2$.
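| Method | FP | FN | MCC | MSE | $z_1$ | $x$ | $z_2$ | $z_3$ | $z_4$ | $z_5$ | $z_6$ | $z_7$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TR | 0.1 | 0.0 | 0.98 | 0.02 | 100 | 100 | 99 | 98 | 100 | 98 | 98 | 98 |
| IV | 0.2 | 0.0 | 0.96 | 0.34 | 100 | 100 | 98 | 98 | 100 | 97 | 97 | 95 |
| NA | 1.0 | 0.8 | 0.54 | 5.69 | 23 | 100 | 17 | 96 | 100 | 96 | 95 | 96 |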
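The tables report FP, FN, MCC and MSE; the fractional FP and FN values suggest these are averages over simulation replications. Assuming the usual definitions (FP: zero coefficients estimated as nonzero; FN: nonzero coefficients estimated as zero; MCC: Matthews correlation coefficient of the selection; MSE taken here as $\|\hat\beta - \beta\|^2$, whose normalization may differ from the paper's), one replication could be scored as in this sketch. The function name and the selection threshold `tol` are hypothetical.

```python
import numpy as np

def selection_metrics(beta_hat, beta_true, tol=0.0):
    """FP, FN, MCC and squared error for one simulated replication.

    A coefficient counts as selected when |beta_hat_j| > tol (tol is an assumption).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    selected = np.abs(beta_hat) > tol
    active = beta_true != 0
    tp = int(np.sum(selected & active))
    tn = int(np.sum(~selected & ~active))
    fp = int(np.sum(selected & ~active))
    fn = int(np.sum(~selected & active))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0   # 0 by convention if undefined
    sq_err = float(np.sum((beta_hat - beta_true) ** 2))        # ||beta_hat - beta||^2
    return {"FP": fp, "FN": fn, "MCC": mcc, "MSE": sq_err}

beta_true = np.array([1.0, 0.0, 0.0, 2.0, 0.0])
print(selection_metrics(np.array([0.9, 0.0, 0.0, 1.8, 0.0]), beta_true))
```

Averaging the returned values over replications gives summaries on the same footing as the FP, FN, MCC and MSE columns, under the stated assumptions.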
Table 2. Simulation results of Example 1 with different sample sizes and $\sigma_\delta^2 = 1$.
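| $n$ | Method | FP | FN | MCC | MSE |
|---|---|---|---|---|---|
| 50 | TR | 0.3 | 0 | 0.92 | 0.08 |
| 50 | IV | 0.3 | 0 | 0.92 | 0.52 |
| 50 | NA | 0.5 | 0 | 0.87 | 0.87 |
| 100 | TR | 0.1 | 0 | 0.96 | 0.03 |
| 100 | IV | 0.2 | 0 | 0.96 | 0.19 |
| 100 | NA | 0.6 | 0 | 0.85 | 0.71 |
| 200 | TR | 0.1 | 0 | 0.98 | 0.01 |
| 200 | IV | 0.1 | 0 | 0.98 | 0.07 |
| 200 | NA | 0.8 | 0 | 0.82 | 0.64 |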
Table 3. Simulation results of Examples 2 and 3 with $n = 200$, $\sigma_\delta^2 = 5$.
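| Example | Method | FP | FN | MCC | MSE |
|---|---|---|---|---|---|
| 2 | TR | 0.4 | 0.0 | 0.90 | 1.51 |
| 2 | IV | 0.6 | 0.0 | 0.87 | 1.62 |
| 2 | NA | 1.1 | 0.1 | 0.75 | 3.20 |
| 3 | TR | 0 | 0 | 0.99 | 0.03 |
| 3 | IV | 0.3 | 0 | 0.94 | 0.07 |
| 3 | NA | 2.6 | 0.7 | 0.32 | 25.68 |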
Table 4. Simulation results of Example 4 with $n = 50$, $p = 100$.
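| $\sigma_\delta^2$ | Method | FP | FN | MCC | MSE |
|---|---|---|---|---|---|
| 1 | TR | 0.70 | 0.00 | 0.90 | 0.19 |
| 1 | IV | 5.80 | 0.10 | 0.56 | 4.70 |
| 1 | NA | 8.10 | 0.20 | 0.46 | 11.40 |
| 2 | TR | 0.60 | 0.00 | 0.91 | 0.12 |
| 2 | IV | 4.60 | 0.00 | 0.60 | 4.19 |
| 2 | NA | 6.30 | 0.20 | 0.51 | 10.90 |
| 5 | TR | 0.60 | 0.00 | 0.91 | 0.17 |
| 5 | IV | 5.40 | 0.00 | 0.57 | 4.32 |
| 5 | NA | 7.70 | 0.20 | 0.47 | 11.45 |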
Table 5. Simulation results of Example 5.
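| $n$ | Method | FP | FN | MCC | MSE |
|---|---|---|---|---|---|
| 100 | TR | 0.18 | 0.00 | 14.46 | 0.02 |
| 100 | IV | 0.32 | 0.00 | 14.04 | 0.22 |
| 100 | NA | 0.32 | 0.00 | 14.04 | 0.68 |
| 200 | TR | 0.06 | 0.00 | 14.82 | 0.12 |
| 200 | IV | 0.20 | 0.00 | 14.40 | 4.19 |
| 200 | NA | 0.36 | 0.00 | 13.92 | 10.90 |
| 500 | TR | 0.04 | 0.00 | 14.88 | 0.00 |
| 500 | IV | 0.02 | 0.00 | 14.94 | 0.03 |
| 500 | NA | 0.82 | 0.00 | 12.54 | 0.63 |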
Table 6. Simulation results of Example 6.
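| $n$ | Method | FP | FN | MCC | MSE |
|---|---|---|---|---|---|
| 100 | TR | 0.26 | 0.00 | 14.22 | 0.04 |
| 100 | IV | 0.36 | 0.00 | 13.92 | 0.37 |
| 100 | NA | 0.70 | 0.00 | 12.90 | 0.89 |
| 200 | TR | 0.08 | 0.00 | 14.76 | 0.01 |
| 200 | IV | 0.36 | 0.00 | 13.92 | 0.12 |
| 200 | NA | 0.84 | 0.00 | 12.48 | 0.86 |
| 500 | TR | 0.00 | 0.00 | 15.00 | 0.00 |
| 500 | IV | 0.20 | 0.00 | 14.40 | 0.05 |
| 500 | NA | 1.06 | 0.00 | 11.82 | 0.92 |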
Table 7. Estimation results of WLQ data.
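| Variable | IV coef | IV se | Naive coef | Naive se | Full coef | Full se |
|---|---|---|---|---|---|---|
| int | 8.19 | 0.090 | 8.19 | 0.404 | 8.19 | 0.408 |
| $x^*$ | −0.09 | 0.090 | - | - | 0.13 | 0.445 |
| $z_1$ | −1.83 | 0.164 | −1.69 | 0.457 | −1.49 | 0.706 |
| $z_2$ | −1.56 | 0.119 | −1.60 | 0.457 | −1.74 | 0.501 |
| $z_3$ | - | - | - | - | −0.52 | 0.555 |
| $z_4$ | - | - | - | - | −0.33 | 0.415 |
| $z_5$ | - | - | - | - | −0.25 | 0.583 |
| $z_6$ | - | - | - | - | 0.26 | 0.451 |
| $z_7$ | 0.23 | 0.128 | - | - | 0.05 | 0.629 |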
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xue, L.; Wang, L. Instrumental Variable Method for Regularized Estimation in Generalized Linear Measurement Error Models. Econometrics 2024, 12, 21. https://doi.org/10.3390/econometrics12030021

Chicago/Turabian Style

Xue, Lin, and Liqun Wang. 2024. "Instrumental Variable Method for Regularized Estimation in Generalized Linear Measurement Error Models" Econometrics 12, no. 3: 21. https://doi.org/10.3390/econometrics12030021

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
