1. Introduction
In many fields, such as bioinformatics, physical biology, and epidemiology, the response of interest is represented by repeated measures of some variables of interest that are collected over a specified time period for different independent subjects or individuals. These types of data are commonly encountered in medical research, where the responses are subject to various time-dependent and time-constant effects, such as pre- and post-treatment types, gender effects, and baseline measures, among others. A widely used statistical tool in the analysis and modeling of longitudinal and repeated measures data is the linear mixed effects model (LMM) [1,2]. This model provides an effective and flexible way to describe the means and covariance structures of a response variable after accounting for within-subject correlation.
The rapid growth in the size and scope of longitudinal data has created a need for innovative statistical strategies in longitudinal data analysis. Classical methods are based on the assumption that the number of predictors is less than the number of observations. However, there is an increasing demand for efficient prediction strategies for the analysis of high-dimensional data, where the number of observations (the sample size) is smaller than the number of predictors in a linear model context. Existing techniques that deal with high-dimensional data mostly rely on various penalized estimators. Due to the trade-off between model complexity and model prediction, statistical inference after model selection becomes an extremely important and challenging problem in high-dimensional data analysis.
Over the years, many penalized regularization approaches have been developed to perform variable selection and estimation simultaneously. Among them, the least absolute shrinkage and selection operator (LASSO) is commonly used [3]. It is a useful estimation technique, in part due to its convexity and computational efficiency. The LASSO approach is based on an $\ell_1$ penalty for regularization of the regression parameters. Ref. [4] provides a comprehensive summary of the consistency properties of the LASSO approach. Related penalized likelihood methods have been extensively studied in the literature; see, for example, [5,6,7,8,9,10]. The penalized likelihood methods have a close connection to Bayesian procedures: the LASSO estimate corresponds to a Bayes method that puts a Laplacian (double-exponential) prior on the regression coefficients [11,12].
In this paper, our interest lies in estimating the fixed-effects parameters of the LMM using a ridge estimation technique when it is assumed that some prior information is available in the form of potential linear restrictions on the parameters. One possible source of such prior information is a Bayesian analysis. Alternatively, prior information may be obtained from previous studies or from expert knowledge that searches for or assumes sparsity patterns.
We consider the problem of fixed-effects parameter estimation for LMMs when there exist many predictors relative to the sample size. These predictors may be classified into two groups: sparse and non-sparse. Thus, there are two choices to be considered: a full model with all predictors, and a sub-model that contains only the non-sparse predictors. When the sub-model based on the available subspace information is true (i.e., the assumed restriction holds), it provides more efficient statistical inferences than those based on a full model. In contrast, if the sub-model is not true, the estimates could become biased and inefficient. The consequences of incorporating subspace information therefore depend on the quality or reliability of the information being incorporated into the estimation procedure. One way to deal with uncertain subspace information is to use a pretest estimation strategy: the validity of the information is tested before it is incorporated into the final estimator. Another approach is shrinkage estimation, which shrinks the full model estimator toward the sub-model estimator by utilizing the subspace information. Besides these estimation strategies, there is a growing literature on simultaneous model selection and estimation. These approaches are known as penalty strategies. By shrinking some regression coefficients toward zero, the penalty methods simultaneously select a sub-model and estimate its regression parameters. Several authors have investigated pretest, shrinkage, and penalty estimation strategies in the partially linear model, the Poisson regression model, and the Weibull censored regression model [13,14,15].
To formulate the problem, we suppose that the vector of fixed-effects parameters $\beta$ in the LMM can be partitioned into two sub-vectors, $\beta = (\beta_1^\top, \beta_2^\top)^\top$, where $\beta_1$ is the coefficient vector of the non-sparse predictors and $\beta_2$ is the coefficient vector of the sparse predictors. Our interest lies in the estimation of $\beta_1$ when $\beta_2$ is close to zero. To deal with this problem in the context of low-dimensional data, ref. [16] proposes an improved estimation strategy using sub-model selection and post-estimation for the LMM. Within this framework, linear shrinkage and shrinkage pretest estimation strategies are developed, which combine the full model and sub-model estimators in an effective way as a trade-off between bias and variance. Ref. [17] extends this study by using a likelihood ratio test to develop James–Stein shrinkage and pretest estimation methods based on the LMM for longitudinal data. In addition, the non-penalty estimators are compared with several penalty estimators (LASSO, adaptive LASSO, and Elastic Net) for best performance.
In most real data situations, there is also the problem of multicollinearity among the predictor variables in high-dimensional data. Various biased estimation techniques, such as shrinkage estimation, partial least squares estimation [18], and Liu estimators [19], have been implemented to deal with this problem, but the most widely used technique is ridge estimation [20]. The ridge estimator overcomes the weakness of the least squares estimator with a smaller mean squared error. To combat multicollinearity, ref. [21] proposes pretest and Stein-type ridge regression estimators for linear and partially linear models. Furthermore, ref. [22] develops shrinkage estimation based on Liu regression to overcome multicollinearity in linear models.
Our primary focus is on the estimation and prediction problem for linear mixed effects models when there are many potential predictors that have a weak or no influence on the response of interest. Our approach simultaneously controls overfitting by using generalized least squares estimation with a roughness penalty. We propose pretest and shrinkage estimation strategies using the ridge estimation technique as the base estimator and numerically compare their performance with the LASSO and adaptive LASSO estimators. The proposed estimation strategies are applied to both high-dimensional and low-dimensional data.
The rest of this article is organized as follows. In Section 2, we present the linear mixed effects model and the proposed estimation techniques: we introduce the full model and sub-model estimators based on ridge estimation, and thereafter construct the pretest and shrinkage ridge estimators. Section 3 provides the asymptotic bias and risk of these estimators. A Monte Carlo simulation is used to evaluate the performance of the estimators, including a comparison with the LASSO-type estimators, and the results are reported in Section 4. Section 5 demonstrates the proposed methodology on high-dimensional resting-state effective brain connectivity and genetic data, and also illustrates the proposed estimation methods in an application to the low-dimensional Amsterdam Growth and Health Study. Section 6 presents a discussion with recommendations.
2. Model and Estimation Strategies
In this section, we present the linear mixed effects model and the proposed estimation strategies.
2.1. Linear Mixed Model
Suppose that we have a sample of $N$ subjects. For the $i$th subject, we collect the response variable $y_{ij}$ at the $j$th measurement occasion, where $i = 1, \ldots, N$ and $j = 1, \ldots, n_i$. Let $\mathbf{y}_i = (y_{i1}, \ldots, y_{in_i})^\top$ denote the $n_i \times 1$ vector of responses from the $i$th subject. Let $X_i$ and $Z_i$ be the $n_i \times p$ and $n_i \times q$ known fixed-effects and random-effects design matrices for the $i$th subject, of full rank $p$ and $q$, respectively. The linear mixed effects model [1] for the vector of repeated responses $\mathbf{y}_i$ on the $i$th subject is assumed to have the form
$$\mathbf{y}_i = X_i \beta + Z_i \mathbf{b}_i + \boldsymbol{\epsilon}_i, \quad i = 1, \ldots, N, \qquad (1)$$
where $\beta$ is the $p \times 1$ vector of unknown fixed-effects parameters or regression coefficients, $\mathbf{b}_i$ is the $q \times 1$ vector of unobservable random effects for the $i$th subject, assumed to come from a multivariate normal distribution with zero mean and covariance matrix $G$, where $G$ is an unknown $q \times q$ covariance matrix, and $\boldsymbol{\epsilon}_i$ denotes the $n_i \times 1$ vector of error terms, assumed to be normally distributed with zero mean and covariance matrix $\sigma^2 I_{n_i}$. Further, the $\boldsymbol{\epsilon}_i$ are assumed to be independent of the random effects $\mathbf{b}_i$.
The marginal distribution of the response $\mathbf{y}_i$ is normal with mean $X_i \beta$ and covariance matrix $V_i = Z_i G Z_i^\top + \sigma^2 I_{n_i}$. By stacking the vectors, the mixed model can be expressed as $\mathbf{y} = X\beta + Z\mathbf{b} + \boldsymbol{\epsilon}$, where $\mathbf{y} = (\mathbf{y}_1^\top, \ldots, \mathbf{y}_N^\top)^\top$, $X = (X_1^\top, \ldots, X_N^\top)^\top$, and $Z = \mathrm{diag}(Z_1, \ldots, Z_N)$. From Equation (1), the distribution of the stacked model follows $\mathbf{y} \sim N(X\beta, V)$, with covariance matrix $V = \mathrm{diag}(V_1, \ldots, V_N)$.
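To make the stacked quantities concrete, the following sketch (hypothetical dimensions and parameter values, using NumPy) assembles $X$, the block-diagonal marginal covariance $V = \mathrm{diag}(V_1, \ldots, V_N)$, and a simulated response from Equation (1). It is a minimal illustration, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
N, n_i, p, q = 20, 4, 6, 2           # hypothetical sizes
beta = rng.normal(size=p)            # true fixed effects
G = 0.5 * np.eye(q)                  # random-effects covariance
sigma2 = 1.0                         # error variance

X_blocks = [rng.normal(size=(n_i, p)) for _ in range(N)]
Z_blocks = [rng.normal(size=(n_i, q)) for _ in range(N)]

# Marginal covariance of y_i: V_i = Z_i G Z_i' + sigma^2 I
V_blocks = [Z @ G @ Z.T + sigma2 * np.eye(n_i) for Z in Z_blocks]

# Stacked model: y = X beta + Z b + eps, with V = diag(V_1, ..., V_N)
X = np.vstack(X_blocks)
V = block_diag(*V_blocks)

b = [rng.multivariate_normal(np.zeros(q), G) for _ in range(N)]
y = np.concatenate([Xi @ beta + Zi @ bi + rng.normal(0, np.sqrt(sigma2), n_i)
                    for Xi, Zi, bi in zip(X_blocks, Z_blocks, b)])
```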
2.2. Ridge Full Model and Sub-Model Estimator
The generalized least squares (GLS) estimator of $\beta$ is defined as
$$\hat{\beta}^{\mathrm{GLS}} = \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} \mathbf{y},$$
and the ridge full model estimator can be obtained by introducing a penalized regression, minimizing $(\mathbf{y} - X\beta)^\top V^{-1}(\mathbf{y} - X\beta) + k\,\beta^\top\beta$, so that
$$\hat{\beta}^{\mathrm{RFM}} = \left(X^\top V^{-1} X + k I_p\right)^{-1} X^\top V^{-1} \mathbf{y},$$
where $\hat{\beta}^{\mathrm{RFM}}$ is the ridge full model estimator and $k \ge 0$ is the tuning parameter. If $k = 0$, $\hat{\beta}^{\mathrm{RFM}}$ is the GLS estimator, and $\hat{\beta}^{\mathrm{RFM}} \to \mathbf{0}$ for $k$ sufficiently large. We select the value of $k$ using cross-validation.
We let $X = (X_1, X_2)$, where $X_1$ is a sub-matrix containing the $p_1$ non-sparse predictors and $X_2$ is a sub-matrix that contains the $p_2$ sparse predictors. Accordingly, $\beta = (\beta_1^\top, \beta_2^\top)^\top$, where $\beta_1$ and $\beta_2$ have dimensions $p_1$ and $p_2$, respectively, with $p_1 + p_2 = p$ and $p_i \ge 0$ for $i = 1, 2$.
A sub-model is defined as
$$\mathbf{y} = X_1 \beta_1 + Z\mathbf{b} + \boldsymbol{\epsilon},$$
which corresponds to the restriction $\beta_2 = \mathbf{0}$. The sub-model ridge estimator of $\beta_1$ has the form
$$\hat{\beta}_1^{\mathrm{RSM}} = \left(X_1^\top V^{-1} X_1 + k I_{p_1}\right)^{-1} X_1^\top V^{-1} \mathbf{y}.$$
We denote by $\hat{\beta}_1^{\mathrm{RFM}}$ the full model ridge estimator of $\beta_1$, given as the sub-vector of $\hat{\beta}^{\mathrm{RFM}}$ corresponding to $X_1$.
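The following is a minimal sketch of the full model and sub-model ridge estimators, assuming the stacked `X`, `V`, and `y` from the previous sketch and treating `V` as known; in practice $G$ and $\sigma^2$ would be estimated (e.g., by REML) and `k` chosen by cross-validation, and `p1` is a hypothetical choice.

```python
import numpy as np

def ridge_gls(X, V, y, k):
    """Ridge GLS estimator: (X' V^-1 X + k I)^-1 X' V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)                 # V^-1 X without explicit inversion
    A = X.T @ Vinv_X + k * np.eye(X.shape[1])
    return np.linalg.solve(A, Vinv_X.T @ y)

p1 = 3                                      # hypothetical number of non-sparse predictors
k = 0.1                                     # tuning parameter (cross-validated in practice)
beta_rfm = ridge_gls(X, V, y, k)            # full model ridge estimator
beta1_rfm = beta_rfm[:p1]                   # its non-sparse sub-vector
beta1_rsm = ridge_gls(X[:, :p1], V, y, k)   # sub-model ridge estimator (beta_2 = 0)
```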
2.3. Pretest Ridge Estimation Strategy
Generally, the sub-model estimator will be more efficient than the full model estimator if the information embodied in the imposed linear restrictions is valid, that is, if $\beta_2$ is close to zero. However, if the information is not valid, the sub-model estimator is likely to be biased and may have a higher risk than the full model estimator. There is, therefore, some doubt as to whether or not to impose the restrictions on the model parameters. In response to this uncertainty, a statistical test may be used to determine the validity of the proposed restrictions. Accordingly, the procedure to follow in practice is to pretest the validity of the restrictions: if the outcome of the pretest suggests that they are correct, then the model parameters are estimated incorporating the restrictions; if the pretest rejects the restrictions, then the parameters are estimated from the sample information alone. This motivates the consideration of the pretest estimation strategy for the LMM.
The pretest estimator is a combination of the full model estimator $\hat{\beta}_1^{\mathrm{RFM}}$ and the sub-model estimator $\hat{\beta}_1^{\mathrm{RSM}}$ through an indicator function $I(T_n \le c_{n,\alpha})$, where $T_n$ is an appropriate test statistic for testing $H_0: \beta_2 = \mathbf{0}$ versus $H_a: \beta_2 \ne \mathbf{0}$. Moreover, $c_{n,\alpha}$ is an $\alpha$-level critical value based on the distribution of $T_n$ under $H_0$. We define the test statistic based on the log-likelihood ratio test as
$$T_n = 2\left\{\ell_n\!\left(\hat{\beta}^{\mathrm{FM}}\right) - \ell_n\!\left(\hat{\beta}^{\mathrm{SM}}\right)\right\},$$
where $\ell_n(\cdot)$ denotes the log-likelihood. Under $H_0$, the test statistic $T_n$ follows an asymptotic chi-squared distribution with $p_2$ degrees of freedom. The pretest ridge estimator $\hat{\beta}_1^{\mathrm{RPT}}$ of $\beta_1$ is then defined by
$$\hat{\beta}_1^{\mathrm{RPT}} = \hat{\beta}_1^{\mathrm{RFM}} - \left(\hat{\beta}_1^{\mathrm{RFM}} - \hat{\beta}_1^{\mathrm{RSM}}\right) I\!\left(T_n \le \chi^2_{p_2,\alpha}\right),$$
where $\chi^2_{p_2,\alpha}$ is the upper $\alpha$-level critical value of the chi-squared distribution with $p_2$ degrees of freedom.
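The pretest rule can be sketched as follows, assuming the maximized log-likelihoods of the full and sub-models are available from an LMM fitting routine; the test statistic and critical value follow the likelihood ratio construction above.

```python
from scipy.stats import chi2

def pretest_ridge(beta1_rfm, beta1_rsm, loglik_full, loglik_sub, p2, alpha=0.05):
    """Pretest ridge estimator: keep the sub-model unless the LRT rejects H0: beta_2 = 0."""
    T_n = 2.0 * (loglik_full - loglik_sub)     # log-likelihood ratio statistic
    c_alpha = chi2.ppf(1 - alpha, df=p2)       # asymptotic chi^2_{p2} critical value
    return beta1_rsm if T_n <= c_alpha else beta1_rfm
```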
2.4. Shrinkage Ridge Estimation Strategy
The pretest estimator is a discontinuous function of the sub-model estimator $\hat{\beta}_1^{\mathrm{RSM}}$ and the full model estimator $\hat{\beta}_1^{\mathrm{RFM}}$, and it depends on the hard threshold $\chi^2_{p_2,\alpha}$. We address this limitation by defining the shrinkage ridge estimator based on soft thresholding. The shrinkage ridge estimator (RSE) of $\beta_1$, denoted $\hat{\beta}_1^{\mathrm{RSE}}$, is defined as
$$\hat{\beta}_1^{\mathrm{RSE}} = \hat{\beta}_1^{\mathrm{RSM}} + \left(\hat{\beta}_1^{\mathrm{RFM}} - \hat{\beta}_1^{\mathrm{RSM}}\right)\left\{1 - (p_2 - 2)\,T_n^{-1}\right\}, \qquad p_2 \ge 3.$$
Here, $\hat{\beta}_1^{\mathrm{RSE}}$ is a linear combination of the full model and sub-model estimates. If $T_n$ is large, a relatively large weight is placed on $\hat{\beta}_1^{\mathrm{RFM}}$; otherwise, more weight is placed on $\hat{\beta}_1^{\mathrm{RSM}}$. A drawback of $\hat{\beta}_1^{\mathrm{RSE}}$ is that it is not a convex combination of $\hat{\beta}_1^{\mathrm{RFM}}$ and $\hat{\beta}_1^{\mathrm{RSM}}$. This can cause over-shrinkage, which gives the estimator the opposite sign of $\hat{\beta}_1^{\mathrm{RFM}}$. This could happen if $(p_2 - 2)\,T_n^{-1}$ is larger than one. To counter this, we use the positive-part shrinkage ridge estimator (RPS), defined as
$$\hat{\beta}_1^{\mathrm{RPS}} = \hat{\beta}_1^{\mathrm{RSM}} + \left(\hat{\beta}_1^{\mathrm{RFM}} - \hat{\beta}_1^{\mathrm{RSM}}\right)\left\{1 - (p_2 - 2)\,T_n^{-1}\right\}^{+},$$
where $z^{+} = \max(0, z)$. The RPS estimator controls the possible over-shrinking in the RSE estimator.
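A sketch of the RSE and RPS estimators exactly as defined above; `T_n` is the likelihood ratio statistic from the pretest construction, and $p_2 \ge 3$ is assumed.

```python
def shrinkage_ridge(beta1_rfm, beta1_rsm, T_n, p2, positive_part=False):
    """Shrinkage ridge estimator (RSE); optional positive-part truncation (RPS)."""
    shrink = 1.0 - (p2 - 2) / T_n          # soft-threshold weight on the full model
    if positive_part:
        shrink = max(0.0, shrink)          # guards against over-shrinkage (sign reversal)
    return beta1_rsm + shrink * (beta1_rfm - beta1_rsm)
```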
3. Asymptotic Results
In this section, we derive the asymptotic distributional bias and risk of the estimators considered in Section 2. We examine the properties of the estimators for increasing $n = \sum_{i=1}^{N} n_i$ and as $\beta_2$ approaches the null vector under the sequence of local alternatives defined as
$$K_{(n)}: \beta_2 = \beta_{2(n)} = \frac{\omega}{\sqrt{n}},$$
where $\omega \in \mathbb{R}^{p_2}$ is a fixed vector. The vector $\omega$ is a measure of how far the local alternatives $K_{(n)}$ differ from the subspace information $\beta_2 = \mathbf{0}$. In order to evaluate the performance of the estimators, we define the asymptotic distributional bias (ADB) of an estimator $\hat{\beta}_1^{\ast}$ as
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\ast}\right) = \lim_{n \to \infty} E\left\{\sqrt{n}\left(\hat{\beta}_1^{\ast} - \beta_1\right)\right\}.$$
In order to compute the risk functions, we first compute the asymptotic covariance of the estimators. The asymptotic covariance of an estimator $\hat{\beta}_1^{\ast}$ is expressed as
$$\Gamma\!\left(\hat{\beta}_1^{\ast}\right) = \lim_{n \to \infty} E\left\{ n\left(\hat{\beta}_1^{\ast} - \beta_1\right)\left(\hat{\beta}_1^{\ast} - \beta_1\right)^\top \right\}.$$
Following the asymptotic covariance matrix, we define the asymptotic risk of an estimator $\hat{\beta}_1^{\ast}$ as
$$R\!\left(\hat{\beta}_1^{\ast}\right) = \mathrm{tr}\left\{ Q\,\Gamma\!\left(\hat{\beta}_1^{\ast}\right) \right\},$$
where $Q$ is a positive definite matrix of weights with dimensions $p_1 \times p_1$. We set $Q = I$ in this study.
Assumption 1. We make the following two regularity conditions to establish the asymptotic properties of the estimators.
1. $\frac{1}{n}\max_{1 \le i \le n} x_i^\top (X^\top X)^{-1} x_i \to 0$ as $n \to \infty$, where $x_i^\top$ is the $i$th row of $X$.
2. $\lim_{n \to \infty} n^{-1}\left(X^\top V^{-1} X\right) = B$ for some finite positive definite matrix $B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$.
Theorem 1. For $k = o(\sqrt{n})$, if Assumption 1 holds and $B$ is non-singular, the distribution of the full model ridge estimator $\hat{\beta}^{\mathrm{RFM}}$ satisfies
$$\sqrt{n}\left(\hat{\beta}^{\mathrm{RFM}} - \beta\right) \xrightarrow{d} N\!\left(\mathbf{0},\, B^{-1}\right),$$
where $\xrightarrow{d}$ denotes convergence in distribution.
Proposition 1. Assuming Assumption 1 and Theorem 1 hold, under the local alternatives $K_{(n)}$ we have
$$\begin{pmatrix} \vartheta_1 \\ \vartheta_3 \end{pmatrix} \xrightarrow{d} N\!\left( \begin{pmatrix} \mathbf{0} \\ \delta \end{pmatrix}, \begin{pmatrix} B_{11.2}^{-1} & \Phi \\ \Phi & \Phi \end{pmatrix} \right), \qquad \begin{pmatrix} \vartheta_2 \\ \vartheta_3 \end{pmatrix} \xrightarrow{d} N\!\left( \begin{pmatrix} -\delta \\ \delta \end{pmatrix}, \begin{pmatrix} B_{11}^{-1} & 0 \\ 0 & \Phi \end{pmatrix} \right),$$
where $\vartheta_1 = \sqrt{n}\left(\hat{\beta}_1^{\mathrm{RFM}} - \beta_1\right)$, $\vartheta_2 = \sqrt{n}\left(\hat{\beta}_1^{\mathrm{RSM}} - \beta_1\right)$, $\vartheta_3 = \sqrt{n}\left(\hat{\beta}_1^{\mathrm{RFM}} - \hat{\beta}_1^{\mathrm{RSM}}\right)$, $\delta = B_{11}^{-1} B_{12}\,\omega$, $\Phi = B_{11}^{-1} B_{12} B_{22.1}^{-1} B_{21} B_{11}^{-1}$, $B_{11.2} = B_{11} - B_{12} B_{22}^{-1} B_{21}$, $B_{22.1} = B_{22} - B_{21} B_{11}^{-1} B_{12}$, and $\Delta = \omega^\top B_{22.1}\,\omega$.
Theorem 2. Under the conditions of Theorem 1 and the local alternatives $K_{(n)}$, the ADBs of the proposed estimators are
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right) = \mathbf{0}, \qquad \mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RSM}}\right) = -\delta,$$
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RPT}}\right) = -\delta\, H_{p_2+2}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right),$$
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right) = -(p_2 - 2)\,\delta\, E\!\left\{\chi^{-2}_{p_2+2}(\Delta)\right\},$$
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RPS}}\right) = \mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right) - \delta\, E\!\left[\left\{1 - (p_2-2)\,\chi^{-2}_{p_2+2}(\Delta)\right\} I\!\left(\chi^{2}_{p_2+2}(\Delta) \le p_2 - 2\right)\right],$$
where $H_v(x; \Delta)$ is the cumulative distribution function of the non-central chi-squared distribution with $v$ degrees of freedom and non-centrality parameter $\Delta$, and $E\{\chi^{-2}_v(\Delta)\}$ is the expected value of the inverse of a non-central chi-squared variable with $v$ degrees of freedom and non-centrality parameter $\Delta$.
Since the ADBs of the estimators are in non-scalar form, we define the asymptotic quadratic distributional bias (AQDB) of an estimator $\hat{\beta}_1^{\ast}$ by
$$\mathrm{AQDB}\!\left(\hat{\beta}_1^{\ast}\right) = \mathrm{ADB}\!\left(\hat{\beta}_1^{\ast}\right)^\top B_{11.2}\, \mathrm{ADB}\!\left(\hat{\beta}_1^{\ast}\right).$$
Corollary 1. Suppose Theorem 2 holds. Then, under $K_{(n)}$, the AQDBs of the estimators are
$$\mathrm{AQDB}\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right) = 0, \qquad \mathrm{AQDB}\!\left(\hat{\beta}_1^{\mathrm{RSM}}\right) = \delta^\top B_{11.2}\,\delta,$$
$$\mathrm{AQDB}\!\left(\hat{\beta}_1^{\mathrm{RPT}}\right) = \delta^\top B_{11.2}\,\delta \left\{H_{p_2+2}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right)\right\}^2,$$
$$\mathrm{AQDB}\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right) = (p_2-2)^2\, \delta^\top B_{11.2}\,\delta \left[E\!\left\{\chi^{-2}_{p_2+2}(\Delta)\right\}\right]^2,$$
with the analogous truncated expression for $\hat{\beta}_1^{\mathrm{RPS}}$. When $\omega = \mathbf{0}$, the AQDBs of all the estimators are equivalent, and the estimators are therefore asymptotically unbiased. If we assume that $\omega \ne \mathbf{0}$, the results for the bias of the estimators can be summarized as follows:
The AQDB of $\hat{\beta}_1^{\mathrm{RSM}}$ is an unbounded function of $\Delta$.
The AQDB of $\hat{\beta}_1^{\mathrm{RPT}}$ starts from zero at $\Delta = 0$, and as $\Delta$ increases, it increases to a maximum and then decreases to zero.
The characteristics of $\hat{\beta}_1^{\mathrm{RSE}}$ and $\hat{\beta}_1^{\mathrm{RPS}}$ are similar to those of $\hat{\beta}_1^{\mathrm{RPT}}$: their AQDBs similarly start from zero at $\Delta = 0$, increase to a point, and then decrease towards zero, since $E\{\chi^{-2}_{p_2+2}(\Delta)\}$ is a non-increasing function of $\Delta$.
Theorem 3. Suppose Theorem 1 holds. Under the local alternatives $K_{(n)}$, the asymptotic covariance matrices of the estimators are
$$\Gamma\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right) = B_{11.2}^{-1}, \qquad \Gamma\!\left(\hat{\beta}_1^{\mathrm{RSM}}\right) = B_{11}^{-1} + \delta\delta^\top,$$
$$\Gamma\!\left(\hat{\beta}_1^{\mathrm{RPT}}\right) = B_{11.2}^{-1} - \Phi\, H_{p_2+2}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right) + \delta\delta^\top \left\{ 2 H_{p_2+2}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right) - H_{p_2+4}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right) \right\},$$
$$\Gamma\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right) = B_{11.2}^{-1} - (p_2-2)\,\Phi \left[ 2 E\!\left\{\chi^{-2}_{p_2+2}(\Delta)\right\} - (p_2-2)\, E\!\left\{\chi^{-4}_{p_2+2}(\Delta)\right\} \right] + (p_2-2)\,\delta\delta^\top \left[ 2 E\!\left\{\chi^{-2}_{p_2+2}(\Delta)\right\} - 2 E\!\left\{\chi^{-2}_{p_2+4}(\Delta)\right\} + (p_2-2)\, E\!\left\{\chi^{-4}_{p_2+4}(\Delta)\right\} \right],$$
and $\Gamma\!\left(\hat{\beta}_1^{\mathrm{RPS}}\right)$ equals $\Gamma\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right)$ minus correction terms arising from the positive-part truncation, which ensure that the RPS estimator has risk no larger than that of the RSE estimator.
Corollary 2. Under the local alternatives $K_{(n)}$ and from Theorem 3, the risks of the estimators are obtained as $R(\hat{\beta}_1^{\ast}) = \mathrm{tr}\{Q\,\Gamma(\hat{\beta}_1^{\ast})\}$ for each estimator above. From Theorem 2, when $\Delta = 0$, the risk of each proposed estimator is smaller than that of the full model estimator, with the sub-model estimator attaining the smallest risk. If $\Delta > 0$, the results can be summarized as follows:
The risk of $\hat{\beta}_1^{\mathrm{RFM}}$ remains constant, while the risk of $\hat{\beta}_1^{\mathrm{RSM}}$ is an unbounded function of $\Delta$, since it grows with the quadratic term $\delta\delta^\top$.
The risk of $\hat{\beta}_1^{\mathrm{RPT}}$ increases as $\Delta$ moves away from zero, achieves its maximum, and then decreases towards the risk of the full model estimator.
The risk of $\hat{\beta}_1^{\mathrm{RPT}}$ is smaller than the risk of $\hat{\beta}_1^{\mathrm{RFM}}$ for small values of $\Delta$ in the neighborhood of zero; for the rest of the parameter space, $\hat{\beta}_1^{\mathrm{RFM}}$ outperforms $\hat{\beta}_1^{\mathrm{RPT}}$, that is, $R(\hat{\beta}_1^{\mathrm{RPT}}) \ge R(\hat{\beta}_1^{\mathrm{RFM}})$.
Comparing the risks of $\hat{\beta}_1^{\mathrm{RSE}}$ and $\hat{\beta}_1^{\mathrm{RPS}}$, the estimator $\hat{\beta}_1^{\mathrm{RPS}}$ outperforms $\hat{\beta}_1^{\mathrm{RSE}}$; that is, $R(\hat{\beta}_1^{\mathrm{RPS}}) \le R(\hat{\beta}_1^{\mathrm{RSE}})$ for all $\Delta$.
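The ADB and risk expressions above involve the non-central chi-squared CDF $H_v(x; \Delta)$ and inverse moments $E\{\chi^{-2j}_v(\Delta)\}$. A sketch of how these ingredients can be evaluated numerically (the inverse moment by quadrature, using SciPy) is given below; it is an illustrative aid for tracing the theoretical curves, not part of the estimation procedure itself.

```python
import numpy as np
from scipy.stats import ncx2
from scipy.integrate import quad

def H(v, x, delta):
    """CDF of the non-central chi-squared distribution, H_v(x; Delta)."""
    return ncx2.cdf(x, df=v, nc=delta)

def inv_moment(v, delta, j=1):
    """E[chi_v^{-2j}(Delta)], computed by numerical quadrature of t^{-j} * pdf(t)."""
    val, _ = quad(lambda t: t**(-j) * ncx2.pdf(t, df=v, nc=delta), 0.0, np.inf)
    return val
```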
4. Simulation Studies
In this section, we conduct a simulation study to assess the finite-sample performance of the suggested estimators. The criterion for comparing the performance of the estimators is the mean squared error (MSE). We simulate the response from the following LMM:
$$y_{ij} = x_{ij}^\top \beta + z_{ij}^\top \mathbf{b}_i + \epsilon_{ij}, \quad i = 1, \ldots, N,\; j = 1, \ldots, n_i, \qquad (3)$$
where the $\epsilon_{ij}$ are independent $N(0, \sigma^2)$ errors. We generate the random effects $\mathbf{b}_i$ from a multivariate normal distribution with zero mean and covariance matrix $\sigma_b^2 I_q$, where $I_q$ is the $q \times q$ identity matrix. The design matrix $X$ is generated from a $p$-variate normal distribution with zero mean vector and covariance matrix $\Sigma_x$. Furthermore, we assume that the off-diagonal elements of $\Sigma_x$ are equal to $\rho$, the coefficient of correlation between any two predictors, and several values of $\rho$ are considered. The ratio of the largest eigenvalue to the smallest eigenvalue of the matrix $X^\top X$ is calculated as the condition number index (CNI) [24], which assesses the existence of multicollinearity in the design matrix; if the CNI is larger than 30, then the model has significant multicollinearity. Our simulations are based on the linear mixed effects model in Equation (3) with repeated measurements on 100 subjects.
We consider a situation where the model is assumed to be sparse. In this study, our interest lies in testing the hypothesis $H_0: \beta_2 = \mathbf{0}$, and our goal is to estimate the fixed-effects coefficient vector $\beta_1$. We partition the fixed-effects coefficients as $\beta = (\beta_1^\top, \beta_2^\top)^\top$. The coefficients $\beta_1$ and $\beta_2$ are $p_1$- and $p_2$-dimensional vectors, respectively, with $p = p_1 + p_2$.
In order to investigate the behavior of the estimators, we define $\Delta^{\ast} = \lVert \beta - \beta^{(0)} \rVert^2$, where $\beta^{(0)} = (\beta_1^\top, \mathbf{0}^\top)^\top$ and $\lVert \cdot \rVert$ is the Euclidean norm. We consider $\Delta^{\ast}$ values between 0 and 4. If $\Delta^{\ast} = 0$, the response is generated under the null hypothesis; when $\Delta^{\ast} > 0$, the response is generated under the local alternative hypothesis. In our simulation study, we consider several values for the number of fixed-effects predictors. Each realization is repeated 5000 times to obtain consistent results, and the MSEs of the suggested estimators are computed.
Based on the simulated data, we calculate the MSE of each estimator as
$$\mathrm{MSE}\!\left(\hat{\beta}_1^{\ast}\right) = \frac{1}{5000} \sum_{j=1}^{5000} \left\lVert \hat{\beta}_{1,(j)}^{\ast} - \beta_1 \right\rVert^2,$$
where $\hat{\beta}_{1,(j)}^{\ast}$ denotes any one of $\hat{\beta}_1^{\mathrm{RFM}}$, $\hat{\beta}_1^{\mathrm{RSM}}$, $\hat{\beta}_1^{\mathrm{RPT}}$, $\hat{\beta}_1^{\mathrm{RSE}}$, and $\hat{\beta}_1^{\mathrm{RPS}}$ in the $j$th repetition. We use the relative mean squared efficiency (RMSE), or the ratio of MSEs, for risk performance comparison. The RMSE of an estimator $\hat{\beta}_1^{\ast}$ with respect to the baseline full model ridge estimator is defined as
$$\mathrm{RMSE}\!\left(\hat{\beta}_1^{\ast}\right) = \frac{\mathrm{MSE}\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right)}{\mathrm{MSE}\!\left(\hat{\beta}_1^{\ast}\right)},$$
where $\hat{\beta}_1^{\ast}$ is one of the suggested estimators under consideration; an RMSE larger than one indicates that $\hat{\beta}_1^{\ast}$ is superior to the full model estimator.
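The Monte Carlo loop can be sketched as follows; it reuses `ridge_gls`, `X`, `V`, `y`, `k`, `p1`, and `beta` from the earlier sketches (hypothetical configuration), shows only the RFM and RSM branches, and computes the RMSE defined above.

```python
import numpy as np

M = 5000                                   # Monte Carlo repetitions
names = ["RFM", "RSM"]                     # RPT/RSE/RPS are accumulated the same way
sq_err = {name: 0.0 for name in names}
for _ in range(M):
    # a fresh (y, X, V) would be simulated from Equation (3) here, as in the first sketch
    beta1_hat = {
        "RFM": ridge_gls(X, V, y, k)[:p1],        # full model, non-sparse sub-vector
        "RSM": ridge_gls(X[:, :p1], V, y, k),     # sub-model (beta_2 = 0)
    }
    for name in names:
        sq_err[name] += np.sum((beta1_hat[name] - beta[:p1]) ** 2)

mse = {name: s / M for name, s in sq_err.items()}
rmse = {name: mse["RFM"] / mse[name] for name in names}   # > 1 favors the estimator
```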
4.1. Simulation Results
In this subsection, we present the results of our simulation study. The results for the considered configurations of $(n, p_1, p_2)$ with different values of the correlation coefficient $\rho$ are shown in Table 1. Furthermore, we plot the RMSEs against $\Delta^{\ast}$ in Figure 1 and Figure 2. The findings can be summarized as follows:
When $\Delta^{\ast} = 0$, the sub-model estimator RSM outperforms all other estimators. As $\Delta^{\ast}$ moves away from zero, the RMSE of the sub-model estimator decreases and goes to zero.
The pretest ridge estimator RPT outperforms the shrinkage ridge and positive-part shrinkage ridge estimators in the case of $\Delta^{\ast} = 0$. However, for a large number of sparse predictors $p_2$, with $p_1$ and $n$ kept fixed, RPT is less efficient than RPS and RSE. When $\Delta^{\ast}$ is larger than zero, the RMSE of RPT decreases and remains below one for intermediate values of $\Delta^{\ast}$; thereafter, the RMSE of RPT increases and approaches one for larger values of $\Delta^{\ast}$.
RPS performs better than RSE over the entire parameter space induced by $\Delta^{\ast}$, as presented in Table 1 and Table 2. Similarly, both shrinkage estimators RPS and RSE outperform the full model ridge estimator irrespective of whether the selected sub-model is correct. This is consistent with the asymptotic theory presented in Section 3.
Since $\Delta^{\ast}$ measures the degree of deviation from the sparsity assumption on the parameter space, it is clear that one cannot go wrong with the use of shrinkage estimators even if the selected sub-model is wrongly specified. As is evident from Table 1 and Table 2 and Figure 1 and Figure 2, if the selected sub-model is correct, that is, $\Delta^{\ast} = 0$, then the shrinkage estimators are relatively efficient compared with the ridge full model estimator. On the other hand, if the sub-model is misspecified, the gain slowly diminishes. However, in terms of risk, the shrinkage estimators are at least as good as the full model ridge estimator. Therefore, the use of shrinkage estimators makes sense in applications where a sub-model cannot be correctly specified.
The RMSEs of the ridge-type estimators are increasing functions of the amount of multicollinearity. This indicates that the ridge-type estimators perform better than the classical estimator in the presence of multicollinearity among the predictor variables.
4.2. Comparison with LASSO-Type Estimators
We compare the listed estimators with the LASSO and adaptive LASSO (aLASSO) estimators. Ten-fold cross-validation is used to select the optimal value of the penalty parameter that minimizes the mean squared error for the LASSO-type estimators; a sketch of this comparison follows the list below. The results for the considered configurations of $(n, p_1, p_2)$ and $\rho$ are presented in Table 3. We observe the following from Table 3.
The performance of the sub-model estimator is the best among all estimators.
Apart from the sub-model estimator, the pretest ridge estimator performs better than the other estimators. However, for a larger number of sparse predictors, the shrinkage estimators outperform the pretest estimator.
The performance of the LASSO and aLASSO estimators is comparable when the correlation among the predictors is small. The pretest and shrinkage estimators remain stable for a given value of $\rho$.
For a large number of sparse predictors $p_2$, the shrinkage and pretest ridge estimators outperform the LASSO-type estimators. This indicates the superiority of the shrinkage estimators over the LASSO-type estimators; shrinkage estimators are therefore preferable when there is multicollinearity among the predictor variables.
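For the LASSO-type benchmark, a sketch using scikit-learn is shown below. Note that it fits a marginal linear model and therefore ignores the random-effect structure, a simplification relative to the LMM fits; the adaptive weights and the reuse of the cross-validated penalty are likewise illustrative choices, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LassoCV, Lasso

# LASSO with 10-fold cross-validation over the penalty path
lasso = LassoCV(cv=10).fit(X, y)

# Adaptive LASSO via the reweighting trick: penalize each coefficient by a
# data-driven weight derived from a preliminary (here, LASSO) fit.
w = 1.0 / (np.abs(lasso.coef_) + 1e-6)      # adaptive weights (small offset for zeros)
alasso = Lasso(alpha=lasso.alpha_).fit(X / w, y)   # reusing alpha_ is a simplification
alasso_coef = alasso.coef_ / w              # transform back to the original scale
```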
5. Real Data Application
We consider two real data analyses, using the Amsterdam Growth and Health Study data and a genetic and brain network connectivity edge-weight dataset, to illustrate the performance of the proposed estimators.
5.1. Amsterdam Growth and Health Data (AGHD)
The AGHD data are obtained from the Amsterdam Growth and Health Study [25]. The goal of this study is to investigate the relationship between lifestyle and health from adolescence into young adulthood. The response variable $Y$ is the total serum cholesterol measured over six time points. There are five covariates: $X_1$ is the baseline fitness level, measured as the maximum oxygen uptake on a treadmill; $X_2$ is the amount of body fat, estimated by the sum of the thickness of four skinfolds; $X_3$ is a smoking indicator (0 = no, 1 = yes); $X_4$ is gender (1 = female, 2 = male); and $X_5$ is the time of measurement. Subject-specific random effects are also included.
A total of 147 subjects participated in the study, and all variables were measured at six time occasions. In order to apply the proposed methods, we first apply an AIC-based variable selection procedure to select the sub-model. For the AGHD data, we fit a linear mixed model with all five covariates for both the fixed and subject-specific random effects, using a two-stage selection procedure to choose both the random and fixed effects. The analysis found two covariates to be significant for prediction of the response variable serum cholesterol; the other variables are ignored since they are not significantly important. Based on this information, the sub-model contains the two selected covariates, while the full model includes all the covariates. We construct the shrinkage estimators from the full model and sub-model. In terms of the null hypothesis, the restriction can be written as $H_0: \beta_2 = \mathbf{0}$, where $\beta = (\beta_1^\top, \beta_2^\top)^\top$, $\beta_1$ contains the coefficients of the two selected covariates ($p_1 = 2$), and $\beta_2$ contains the coefficients of the remaining three covariates ($p_2 = 3$).
To evaluate the performance of the estimators, we obtain the mean squared prediction error (MSPE) using bootstrap samples. We draw 1000 bootstrap samples of the 147 subjects from the data matrix $(\mathbf{y}_i, X_i)$. We then calculate the relative prediction error (RPE) of an estimator $\hat{\beta}_1^{\ast}$ with respect to the full model ridge estimator $\hat{\beta}_1^{\mathrm{RFM}}$. The RPE is defined as
$$\mathrm{RPE}\!\left(\hat{\beta}_1^{\ast}\right) = \frac{\mathrm{MSPE}\!\left(\hat{\beta}_1^{\ast}\right)}{\mathrm{MSPE}\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right)},$$
where $\hat{\beta}_1^{\ast}$ is one of the listed estimators. If $\mathrm{RPE}(\hat{\beta}_1^{\ast}) < 1$, then $\hat{\beta}_1^{\ast}$ outperforms $\hat{\beta}_1^{\mathrm{RFM}}$.
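A sketch of the bootstrap MSPE/RPE computation, resampling subjects with replacement. The helper `fit_and_predict` is hypothetical; it stands for refitting a named estimator on a bootstrap sample and returning observed and predicted responses.

```python
import numpy as np

def bootstrap_rpe(subject_data, fit_and_predict, estimators, B=1000, seed=0):
    """Relative prediction error of each estimator vs. the full model ridge estimator."""
    rng = np.random.default_rng(seed)
    n_sub = len(subject_data)
    mspe = {name: 0.0 for name in estimators}
    for _ in range(B):
        idx = rng.integers(0, n_sub, size=n_sub)   # resample subjects with replacement
        sample = [subject_data[i] for i in idx]
        for name in estimators:
            y_obs, y_hat = fit_and_predict(sample, name)   # hypothetical refit/predict
            mspe[name] += np.mean((y_obs - y_hat) ** 2)
    return {name: mspe[name] / mspe["RFM"] for name in estimators}
```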
Table 4 reports the estimates and standard errors of the non-sparse predictors and the RPEs of the estimators with respect to the full model. As expected, the sub-model ridge estimator has the minimum RPE, because it is computed under the assumption that the sub-model is correct, that is, $\beta_2 = \mathbf{0}$. It is evident from the RPE values in Table 4 that the shrinkage estimators are superior to the LASSO-type estimators. Furthermore, the positive-part shrinkage ridge estimator is more efficient than the shrinkage ridge estimator.
5.2. Resting-State Effective Brain Connectivity and Genetic Data
These data comprise a longitudinal resting-state functional magnetic resonance imaging (rs-fMRI) effective brain connectivity network and genetic study [26] obtained from a sample of 111 subjects with a total of 319 rs-fMRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The 111 subjects comprise 36 cognitively normal (CN), 63 mild cognitive impairment (MCI), and 12 Alzheimer's Disease (AD) subjects. The response is a network connection between regions of interest estimated from an rs-fMRI scan within the Default Mode Network (DMN), and we observe a longitudinal sequence of such connections for each subject. The DMN consists of a set of brain regions that tend to be active in the resting state, when a subject is mind-wandering with no intended task. For this data analysis, we consider the network edge weight from the left intraparietal cortex to the posterior cingulate cortex (LIPC → PCC) as our response. The genetic data are single nucleotide polymorphisms (SNPs) from the non-sex chromosomes, i.e., chromosome 1 to chromosome 22. SNPs with a minor allele frequency less than 5% are removed, as are SNPs with a Hardy–Weinberg equilibrium p-value below a pre-specified threshold or a missing rate greater than 5%. After preprocessing, we are left with 1,220,955 SNPs and the longitudinal rs-fMRI effective connectivity network for the 111 subjects with rs-fMRI data. The response is the network edge weight; the SNPs enter the model as fixed effects, together with subject-specific random effects.
In order to apply the proposed methods, we use a genome-wide association study (GWAS) to screen the genetic data down to 100 SNPs. We implement a second screening by applying multinomial logistic regression to identify a smaller subset of the 100 SNPs that are potentially associated with disease status (CN/MCI/AD). This yields a subset of the top 10 SNPs, which are taken as the most important predictors; the other 90 SNPs are treated as sparse and ignored as not significant. We now have two models: the full model with all 100 SNPs, and the sub-model with the 10 selected SNPs. Finally, we construct the pretest and shrinkage estimators from the full model and sub-model.
We draw 1000 bootstrap samples with replacement from the corresponding data matrix $(\mathbf{y}_i, X_i)$. We report the RPEs of the estimators with respect to the full model ridge estimator, based on the bootstrap simulation, in Table 5. We observe that the sub-model, pretest, shrinkage, and positive-part shrinkage ridge estimators all outperform the full model estimator. Clearly, the sub-model ridge estimator has the smallest RPE, since it is computed under the assumption that the candidate sub-model is correct, i.e., $\beta_2 = \mathbf{0}$. Both shrinkage ridge estimators outperform the pretest ridge estimator; in particular, the positive-part shrinkage estimator performs better than the shrinkage estimator. The performance of both the shrinkage and pretest ridge estimators is better than that of the LASSO-type estimators. Thus, the data analysis is in line with our simulation and theoretical findings.
6. Conclusions
In this paper, we presented efficient estimation strategies for the linear mixed effects model when there exists multicollinearity among the predictor variables in high-dimensional data applications. We considered the estimation of the fixed-effects parameters in the linear mixed model when some of the predictors may have a very weak influence on the response of interest. We introduced pretest and shrinkage estimation in our model using ridge estimation as the reference estimator. In addition, we established the asymptotic properties of the pretest and shrinkage ridge estimators. Our theoretical findings demonstrate that the shrinkage ridge estimators outperform the full model ridge estimator and perform relatively better than the sub-model estimator over a wide range of the parameter space.
Additionally, a Monte Carlo simulation was conducted to assess the finite-sample behavior of the proposed estimators when the model is sparse (i.e., the restrictions on the parameters hold). As expected, the sub-model ridge estimator outperforms all other estimators when the restrictions hold. However, when this assumption is violated, the shrinkage and pretest ridge estimators outperform the sub-model estimator. Furthermore, when the number of sparse predictors is extremely large relative to the sample size, the shrinkage estimators outperform the pretest ridge estimator. These numerical results are consistent with our asymptotic results. We also assessed the relative performance of the LASSO-type estimators against our ridge-type estimators and observed that the pretest and shrinkage ridge estimators are superior to the LASSO-type estimators when the predictors are highly correlated. In our real data applications, the shrinkage ridge estimators are superior, with the smallest relative prediction errors compared with the LASSO-type estimators.
In summary, the results of the data analyses strongly confirm the findings of the simulation study and suggest the use of the shrinkage ridge estimation strategy when no prior information about the parameter subspace is available. The results of our simulation study and real data applications are consistent with available results in [27,28,29].
In our future work, we will focus on other penalty estimators, such as the Elastic Net, the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD) method, as estimation strategies in the LMM for high-dimensional data. These estimators will be assessed and compared with the proposed ridge-type estimators. Another interesting extension will be integrating two sub-models by incorporating ridge-type estimation strategies in linear mixed effects models. The goal is to improve the estimation accuracy of the non-sparse set of fixed-effects parameters by combining an over-fitted model estimator with an under-fitted one [27,29]. This approach will include combining two sub-models produced by two different variable selection techniques for the LMM [28].