1. Introduction
In many fields, such as bioinformatics, physical biology, and epidemiology, the response of interest is represented by repeated measures of some variables of interest that are collected over a specified time period for different independent subjects or individuals. These types of data are commonly encountered in medical research, where the responses are subject to various time-dependent and time-constant effects, such as pre- and post-treatment types, gender effects, and baseline measures, among others. A widely used statistical tool in the analysis and modeling of longitudinal and repeated measures data is the linear mixed effects model (LMM) [1,2]. This model provides an effective and flexible way to describe the means and covariance structures of a response variable after accounting for within-subject correlation.
The rapid growth in the size and scope of longitudinal data has created a need for innovative statistical strategies in longitudinal data analysis. Classical methods are based on the assumption that the number of predictors is less than the number of observations. However, there is an increasing demand for efficient prediction strategies for the analysis of high-dimensional data, where the number of observations (the sample size) is smaller than the number of predictors in a linear model context. Existing techniques that deal with high-dimensional data mostly rely on various penalized estimators. Due to the trade-off between model complexity and model prediction, statistical inference after model selection becomes an extremely important and challenging problem in high-dimensional data analysis.
Over the years, many penalized regularization approaches have been developed to perform variable selection and estimation simultaneously. Among them, the least absolute shrinkage and selection operator (LASSO) is commonly used [3]. It is a useful estimation technique, in part due to its convexity and computational efficiency. The LASSO approach is based on an $\ell_1$ penalty for regularization of the regression parameters. Ref. [4] provides a comprehensive summary of the consistency properties of the LASSO approach. Related penalized likelihood methods have been extensively studied in the literature; see, for example, [5,6,7,8,9,10]. The penalized likelihood methods have a close connection to Bayesian procedures: the LASSO estimate corresponds to a Bayes method that puts a Laplacian (double-exponential) prior on the regression coefficients [11,12].
In this paper, our interest lies in estimating the fixed-effects parameters of the LMM using a ridge estimation technique when it is assumed that some prior information is available in the form of potential linear restrictions on the parameters. One possible source of such prior information is a Bayesian analysis. Alternatively, prior information may be obtained from previous studies or from expert knowledge that searches for or assumes sparsity patterns.
We consider the problem of fixed-effects parameter estimation for LMMs when there exist many predictors relative to the sample size. These predictors may be classified into two groups: sparse and non-sparse. Thus, there are two choices to be considered: a full model with all predictors, and a sub-model that contains only the non-sparse predictors. When the sub-model based on the available subspace information is true (i.e., the assumed restriction holds), it provides more efficient statistical inferences than those based on a full model. In contrast, if the sub-model is not true, the estimates could become biased and inefficient. The consequences of incorporating subspace information therefore depend on the quality or reliability of the information being incorporated into the estimation procedure. One way to deal with uncertain subspace information is to use a pretest estimation strategy: the validity of the information is tested before it is incorporated into the final estimator. Another approach is shrinkage estimation, which shrinks the full model estimator toward the sub-model estimator by utilizing the subspace information. Besides these estimation strategies, there is a growing literature on simultaneous model selection and estimation. These approaches are known as penalty strategies. By shrinking some regression coefficients toward zero, the penalty methods simultaneously select a sub-model and estimate its regression parameters. Several authors have investigated pretest, shrinkage, and penalty estimation strategies in the partially linear model, the Poisson regression model, and the Weibull censored regression model [13,14,15].
To formulate the problem, we suppose that the vector of fixed-effects parameters $\beta$ in the LMM can be partitioned into two sub-vectors, $\beta = (\beta_1^\top, \beta_2^\top)^\top$, where $\beta_1$ is the coefficient vector of the non-sparse predictors and $\beta_2$ is the coefficient vector of the sparse predictors. Our interest lies in the estimation of $\beta_1$ when $\beta_2$ is close to zero. To deal with this problem in the context of low-dimensional data, ref. [16] proposes an improved estimation strategy using sub-model selection and post-estimation for the LMM. Within this framework, linear shrinkage and shrinkage pretest estimation strategies are developed, which combine the full model and sub-model estimators in an effective way as a trade-off between bias and variance. Ref. [17] extends this study by using a likelihood ratio test to develop James–Stein shrinkage and pretest estimation methods based on the LMM for longitudinal data. In addition, the non-penalty estimators are compared with several penalty estimators (LASSO, adaptive LASSO, and Elastic Net) for best performance.
In most real data situations, there is also the problem of multicollinearity among the predictor variables in high-dimensional data. Various biased estimation techniques, such as shrinkage estimation, partial least squares estimation [18], and Liu estimators [19], have been implemented to deal with this problem, but the most widely used technique is ridge estimation [20]. The ridge estimator overcomes the weakness of the least squares estimator with a smaller mean squared error. To combat multicollinearity, ref. [21] proposes pretest and Stein-type ridge regression estimators for linear and partially linear models. Furthermore, ref. [22] develops shrinkage estimation based on Liu regression to overcome multicollinearity in linear models.
Our primary focus is on the estimation and prediction problem for linear mixed effects models when there are many potential predictors that have a weak or no influence on the response of interest. Our approach simultaneously controls overfitting by using generalized least squares estimation with a roughness penalty. We propose pretest and shrinkage estimation strategies using the ridge estimation technique as the base estimator and numerically compare their performance with the LASSO and adaptive LASSO estimators. The proposed estimation strategies are applied to both high-dimensional and low-dimensional data.
The rest of this article is organized as follows. In Section 2, we present the linear mixed effects model and the proposed estimation techniques: we introduce the full model and sub-model estimators based on ridge estimation, and thereafter construct the pretest and shrinkage ridge estimators. Section 3 provides the asymptotic bias and risk of these estimators. A Monte Carlo simulation is used to evaluate the performance of the estimators, including a comparison with the LASSO-type estimators, and the results are reported in Section 4. Section 5 demonstrates the proposed methodology on high-dimensional resting-state effective brain connectivity and genetic data, and also illustrates the proposed estimation methods in an application to the low-dimensional Amsterdam Growth and Health Study. Section 6 presents a discussion with recommendations.
2. Model and Estimation Strategies
In this section, we present the linear mixed effects model and the proposed estimation strategies.
2.1. Linear Mixed Model
Suppose that we have a sample of $N$ subjects. For the $i$th subject, we collect the response variable $y_{ij}$ at the $j$th measurement occasion, where $i = 1, \ldots, N$ and $j = 1, \ldots, n_i$. Let $\mathbf{y}_i = (y_{i1}, \ldots, y_{in_i})^\top$ denote the $n_i \times 1$ vector of responses from the $i$th subject. Let $X_i$ and $Z_i$ be the $n_i \times p$ and $n_i \times q$ known fixed-effects and random-effects design matrices for the $i$th subject, of full rank $p$ and $q$, respectively. The linear mixed effects model [1] for the vector of repeated responses $\mathbf{y}_i$ on the $i$th subject is assumed to have the form
$$\mathbf{y}_i = X_i \beta + Z_i \mathbf{b}_i + \boldsymbol{\epsilon}_i, \quad i = 1, \ldots, N, \qquad (1)$$
where $\beta$ is the $p \times 1$ vector of unknown fixed-effects parameters or regression coefficients, $\mathbf{b}_i$ is the $q \times 1$ vector of unobservable random effects for the $i$th subject, assumed to come from a multivariate normal distribution with zero mean and covariance matrix $G$, where $G$ is an unknown $q \times q$ covariance matrix, and $\boldsymbol{\epsilon}_i$ denotes the $n_i \times 1$ vector of error terms, assumed to be normally distributed with zero mean and covariance matrix $\sigma^2 I_{n_i}$. Further, the $\boldsymbol{\epsilon}_i$ are assumed to be independent of the random effects $\mathbf{b}_i$.
The marginal distribution of the response $\mathbf{y}_i$ is normal with mean $X_i \beta$ and covariance matrix $V_i = Z_i G Z_i^\top + \sigma^2 I_{n_i}$. By stacking the vectors, the mixed model can be expressed as $\mathbf{y} = X\beta + Z\mathbf{b} + \boldsymbol{\epsilon}$, where $\mathbf{y} = (\mathbf{y}_1^\top, \ldots, \mathbf{y}_N^\top)^\top$, $X = (X_1^\top, \ldots, X_N^\top)^\top$, and $Z = \mathrm{diag}(Z_1, \ldots, Z_N)$. From Equation (1), the distribution of the stacked model follows $\mathbf{y} \sim N(X\beta, V)$, with covariance matrix $V = \mathrm{diag}(V_1, \ldots, V_N)$.
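To make the stacked quantities concrete, the following sketch (hypothetical dimensions and parameter values, using NumPy) assembles $X$, the block-diagonal marginal covariance $V = \mathrm{diag}(V_1, \ldots, V_N)$, and a simulated response from Equation (1). It is a minimal illustration, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
N, n_i, p, q = 20, 4, 6, 2           # hypothetical sizes
beta = rng.normal(size=p)            # true fixed effects
G = 0.5 * np.eye(q)                  # random-effects covariance
sigma2 = 1.0                         # error variance

X_blocks = [rng.normal(size=(n_i, p)) for _ in range(N)]
Z_blocks = [rng.normal(size=(n_i, q)) for _ in range(N)]

# Marginal covariance of y_i: V_i = Z_i G Z_i' + sigma^2 I
V_blocks = [Z @ G @ Z.T + sigma2 * np.eye(n_i) for Z in Z_blocks]

# Stacked model: y = X beta + Z b + eps, with V = diag(V_1, ..., V_N)
X = np.vstack(X_blocks)
V = block_diag(*V_blocks)

b = [rng.multivariate_normal(np.zeros(q), G) for _ in range(N)]
y = np.concatenate([Xi @ beta + Zi @ bi + rng.normal(0, np.sqrt(sigma2), n_i)
                    for Xi, Zi, bi in zip(X_blocks, Z_blocks, b)])
```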
2.2. Ridge Full Model and Sub-Model Estimator
The generalized least squares (GLS) estimator of $\beta$ is defined as
$$\hat{\beta}^{\mathrm{GLS}} = \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} \mathbf{y},$$
and the ridge full model estimator can be obtained by introducing a penalized regression, minimizing $(\mathbf{y} - X\beta)^\top V^{-1}(\mathbf{y} - X\beta) + k\,\beta^\top\beta$, so that
$$\hat{\beta}^{\mathrm{RFM}} = \left(X^\top V^{-1} X + k I_p\right)^{-1} X^\top V^{-1} \mathbf{y},$$
where $\hat{\beta}^{\mathrm{RFM}}$ is the ridge full model estimator and $k \ge 0$ is the tuning parameter. If $k = 0$, $\hat{\beta}^{\mathrm{RFM}}$ is the GLS estimator, and $\hat{\beta}^{\mathrm{RFM}} \to \mathbf{0}$ for $k$ sufficiently large. We select the value of $k$ using cross-validation.
We let $X = (X_1, X_2)$, where $X_1$ is a sub-matrix containing the $p_1$ non-sparse predictors and $X_2$ is a sub-matrix that contains the $p_2$ sparse predictors. Accordingly, $\beta = (\beta_1^\top, \beta_2^\top)^\top$, where $\beta_1$ and $\beta_2$ have dimensions $p_1$ and $p_2$, respectively, with $p_1 + p_2 = p$ and $p_i \ge 0$ for $i = 1, 2$.
A sub-model is defined as
$$\mathbf{y} = X_1 \beta_1 + Z\mathbf{b} + \boldsymbol{\epsilon},$$
which corresponds to the restriction $\beta_2 = \mathbf{0}$. The sub-model ridge estimator of $\beta_1$ has the form
$$\hat{\beta}_1^{\mathrm{RSM}} = \left(X_1^\top V^{-1} X_1 + k I_{p_1}\right)^{-1} X_1^\top V^{-1} \mathbf{y}.$$
We denote by $\hat{\beta}_1^{\mathrm{RFM}}$ the full model ridge estimator of $\beta_1$, given as the sub-vector of $\hat{\beta}^{\mathrm{RFM}}$ corresponding to $X_1$.
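The following is a minimal sketch of the full model and sub-model ridge estimators, assuming the stacked `X`, `V`, and `y` from the previous sketch and treating `V` as known; in practice $G$ and $\sigma^2$ would be estimated (e.g., by REML) and `k` chosen by cross-validation, and `p1` is a hypothetical choice.

```python
import numpy as np

def ridge_gls(X, V, y, k):
    """Ridge GLS estimator: (X' V^-1 X + k I)^-1 X' V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)                 # V^-1 X without explicit inversion
    A = X.T @ Vinv_X + k * np.eye(X.shape[1])
    return np.linalg.solve(A, Vinv_X.T @ y)

p1 = 3                                      # hypothetical number of non-sparse predictors
k = 0.1                                     # tuning parameter (cross-validated in practice)
beta_rfm = ridge_gls(X, V, y, k)            # full model ridge estimator
beta1_rfm = beta_rfm[:p1]                   # its non-sparse sub-vector
beta1_rsm = ridge_gls(X[:, :p1], V, y, k)   # sub-model ridge estimator (beta_2 = 0)
```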
2.3. Pretest Ridge Estimation Strategy
Generally, the sub-model estimator will be more efficient than the full model estimator if the information embodied in the imposed linear restrictions is valid, that is, if $\beta_2$ is close to zero. However, if the information is not valid, the sub-model estimator is likely to be biased and may have a higher risk than the full model estimator. There is, therefore, some doubt as to whether or not to impose the restrictions on the model parameters. In response to this uncertainty, a statistical test may be used to determine the validity of the proposed restrictions. Accordingly, the procedure to follow in practice is to pretest the validity of the restrictions: if the outcome of the pretest suggests that they are correct, then the model parameters are estimated incorporating the restrictions; if the pretest rejects the restrictions, then the parameters are estimated from the sample information alone. This motivates the consideration of the pretest estimation strategy for the LMM.
The pretest estimator is a combination of the full model estimator $\hat{\beta}_1^{\mathrm{RFM}}$ and the sub-model estimator $\hat{\beta}_1^{\mathrm{RSM}}$ through an indicator function $I(T_n \le c_{n,\alpha})$, where $T_n$ is an appropriate test statistic for testing $H_0: \beta_2 = \mathbf{0}$ versus $H_a: \beta_2 \ne \mathbf{0}$. Moreover, $c_{n,\alpha}$ is an $\alpha$-level critical value based on the distribution of $T_n$ under $H_0$. We define the test statistic based on the log-likelihood ratio test as
$$T_n = 2\left\{\ell_n\!\left(\hat{\beta}^{\mathrm{FM}}\right) - \ell_n\!\left(\hat{\beta}^{\mathrm{SM}}\right)\right\},$$
where $\ell_n(\cdot)$ denotes the log-likelihood. Under $H_0$, the test statistic $T_n$ follows an asymptotic chi-squared distribution with $p_2$ degrees of freedom. The pretest ridge estimator $\hat{\beta}_1^{\mathrm{RPT}}$ of $\beta_1$ is then defined by
$$\hat{\beta}_1^{\mathrm{RPT}} = \hat{\beta}_1^{\mathrm{RFM}} - \left(\hat{\beta}_1^{\mathrm{RFM}} - \hat{\beta}_1^{\mathrm{RSM}}\right) I\!\left(T_n \le \chi^2_{p_2,\alpha}\right),$$
where $\chi^2_{p_2,\alpha}$ is the upper $\alpha$-level critical value of the chi-squared distribution with $p_2$ degrees of freedom.
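The pretest rule can be sketched as follows, assuming the maximized log-likelihoods of the full and sub-models are available from an LMM fitting routine; the test statistic and critical value follow the likelihood ratio construction above.

```python
from scipy.stats import chi2

def pretest_ridge(beta1_rfm, beta1_rsm, loglik_full, loglik_sub, p2, alpha=0.05):
    """Pretest ridge estimator: keep the sub-model unless the LRT rejects H0: beta_2 = 0."""
    T_n = 2.0 * (loglik_full - loglik_sub)     # log-likelihood ratio statistic
    c_alpha = chi2.ppf(1 - alpha, df=p2)       # asymptotic chi^2_{p2} critical value
    return beta1_rsm if T_n <= c_alpha else beta1_rfm
```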
2.4. Shrinkage Ridge Estimation Strategy
The pretest estimator is a discontinuous function of the sub-model estimator $\hat{\beta}_1^{\mathrm{RSM}}$ and the full model estimator $\hat{\beta}_1^{\mathrm{RFM}}$, and it depends on the hard threshold $\chi^2_{p_2,\alpha}$. We address this limitation by defining the shrinkage ridge estimator based on soft thresholding. The shrinkage ridge estimator (RSE) of $\beta_1$, denoted $\hat{\beta}_1^{\mathrm{RSE}}$, is defined as
$$\hat{\beta}_1^{\mathrm{RSE}} = \hat{\beta}_1^{\mathrm{RSM}} + \left(\hat{\beta}_1^{\mathrm{RFM}} - \hat{\beta}_1^{\mathrm{RSM}}\right)\left\{1 - (p_2 - 2)\,T_n^{-1}\right\}, \qquad p_2 \ge 3.$$
Here, $\hat{\beta}_1^{\mathrm{RSE}}$ is a linear combination of the full model and sub-model estimates. If $T_n$ is large, a relatively large weight is placed on $\hat{\beta}_1^{\mathrm{RFM}}$; otherwise, more weight is placed on $\hat{\beta}_1^{\mathrm{RSM}}$. A drawback of $\hat{\beta}_1^{\mathrm{RSE}}$ is that it is not a convex combination of $\hat{\beta}_1^{\mathrm{RFM}}$ and $\hat{\beta}_1^{\mathrm{RSM}}$. This can cause over-shrinkage, which gives the estimator the opposite sign of $\hat{\beta}_1^{\mathrm{RFM}}$. This could happen if $(p_2 - 2)\,T_n^{-1}$ is larger than one. To counter this, we use the positive-part shrinkage ridge estimator (RPS), defined as
$$\hat{\beta}_1^{\mathrm{RPS}} = \hat{\beta}_1^{\mathrm{RSM}} + \left(\hat{\beta}_1^{\mathrm{RFM}} - \hat{\beta}_1^{\mathrm{RSM}}\right)\left\{1 - (p_2 - 2)\,T_n^{-1}\right\}^{+},$$
where $z^{+} = \max(0, z)$. The RPS estimator controls the possible over-shrinking in the RSE estimator.
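A sketch of the RSE and RPS estimators exactly as defined above; `T_n` is the likelihood ratio statistic from the pretest construction, and $p_2 \ge 3$ is assumed.

```python
def shrinkage_ridge(beta1_rfm, beta1_rsm, T_n, p2, positive_part=False):
    """Shrinkage ridge estimator (RSE); optional positive-part truncation (RPS)."""
    shrink = 1.0 - (p2 - 2) / T_n          # soft-threshold weight on the full model
    if positive_part:
        shrink = max(0.0, shrink)          # guards against over-shrinkage (sign reversal)
    return beta1_rsm + shrink * (beta1_rfm - beta1_rsm)
```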
3. Asymptotic Results
In this section, we derive the asymptotic distributional bias and risk of the estimators considered in Section 2. We examine the properties of the estimators for increasing $n = \sum_{i=1}^{N} n_i$ and as $\beta_2$ approaches the null vector under the sequence of local alternatives defined as
$$K_{(n)}: \beta_2 = \beta_{2(n)} = \frac{\omega}{\sqrt{n}},$$
where $\omega \in \mathbb{R}^{p_2}$ is a fixed vector. The vector $\omega$ is a measure of how far the local alternatives $K_{(n)}$ differ from the subspace information $\beta_2 = \mathbf{0}$. In order to evaluate the performance of the estimators, we define the asymptotic distributional bias (ADB) of an estimator $\hat{\beta}_1^{\ast}$ as
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\ast}\right) = \lim_{n \to \infty} E\left\{\sqrt{n}\left(\hat{\beta}_1^{\ast} - \beta_1\right)\right\}.$$
In order to compute the risk functions, we first compute the asymptotic covariance of the estimators. The asymptotic covariance of an estimator $\hat{\beta}_1^{\ast}$ is expressed as
$$\Gamma\!\left(\hat{\beta}_1^{\ast}\right) = \lim_{n \to \infty} E\left\{ n\left(\hat{\beta}_1^{\ast} - \beta_1\right)\left(\hat{\beta}_1^{\ast} - \beta_1\right)^\top \right\}.$$
Following the asymptotic covariance matrix, we define the asymptotic risk of an estimator $\hat{\beta}_1^{\ast}$ as
$$R\!\left(\hat{\beta}_1^{\ast}\right) = \mathrm{tr}\left\{ Q\,\Gamma\!\left(\hat{\beta}_1^{\ast}\right) \right\},$$
where $Q$ is a positive definite matrix of weights with dimensions $p_1 \times p_1$. We set $Q = I$ in this study.
Assumption 1. We make the following two regularity conditions to establish the asymptotic properties of the estimators.
1. $\frac{1}{n}\max_{1 \le i \le n} x_i^\top (X^\top X)^{-1} x_i \to 0$ as $n \to \infty$, where $x_i^\top$ is the $i$th row of $X$.
2. $\lim_{n \to \infty} n^{-1}\left(X^\top V^{-1} X\right) = B$ for some finite positive definite matrix $B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$.
Theorem 1. For $k = o(\sqrt{n})$, if Assumption 1 holds and $B$ is non-singular, the distribution of the full model ridge estimator $\hat{\beta}^{\mathrm{RFM}}$ satisfies
$$\sqrt{n}\left(\hat{\beta}^{\mathrm{RFM}} - \beta\right) \xrightarrow{d} N\!\left(\mathbf{0},\, B^{-1}\right),$$
where $\xrightarrow{d}$ denotes convergence in distribution.
Proposition 1. Assuming Assumption 1 and Theorem 1 hold, under the local alternatives $K_{(n)}$ we have
$$\begin{pmatrix} \vartheta_1 \\ \vartheta_3 \end{pmatrix} \xrightarrow{d} N\!\left( \begin{pmatrix} \mathbf{0} \\ \delta \end{pmatrix}, \begin{pmatrix} B_{11.2}^{-1} & \Phi \\ \Phi & \Phi \end{pmatrix} \right), \qquad \begin{pmatrix} \vartheta_2 \\ \vartheta_3 \end{pmatrix} \xrightarrow{d} N\!\left( \begin{pmatrix} -\delta \\ \delta \end{pmatrix}, \begin{pmatrix} B_{11}^{-1} & 0 \\ 0 & \Phi \end{pmatrix} \right),$$
where $\vartheta_1 = \sqrt{n}\left(\hat{\beta}_1^{\mathrm{RFM}} - \beta_1\right)$, $\vartheta_2 = \sqrt{n}\left(\hat{\beta}_1^{\mathrm{RSM}} - \beta_1\right)$, $\vartheta_3 = \sqrt{n}\left(\hat{\beta}_1^{\mathrm{RFM}} - \hat{\beta}_1^{\mathrm{RSM}}\right)$, $\delta = B_{11}^{-1} B_{12}\,\omega$, $\Phi = B_{11}^{-1} B_{12} B_{22.1}^{-1} B_{21} B_{11}^{-1}$, $B_{11.2} = B_{11} - B_{12} B_{22}^{-1} B_{21}$, $B_{22.1} = B_{22} - B_{21} B_{11}^{-1} B_{12}$, and $\Delta = \omega^\top B_{22.1}\,\omega$.
Theorem 2. Under the conditions of Theorem 1 and the local alternatives $K_{(n)}$, the ADBs of the proposed estimators are
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right) = \mathbf{0}, \qquad \mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RSM}}\right) = -\delta,$$
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RPT}}\right) = -\delta\, H_{p_2+2}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right),$$
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right) = -(p_2 - 2)\,\delta\, E\!\left\{\chi^{-2}_{p_2+2}(\Delta)\right\},$$
$$\mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RPS}}\right) = \mathrm{ADB}\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right) - \delta\, E\!\left[\left\{1 - (p_2-2)\,\chi^{-2}_{p_2+2}(\Delta)\right\} I\!\left(\chi^{2}_{p_2+2}(\Delta) \le p_2 - 2\right)\right],$$
where $H_v(x; \Delta)$ is the cumulative distribution function of the non-central chi-squared distribution with $v$ degrees of freedom and non-centrality parameter $\Delta$, and $E\{\chi^{-2}_v(\Delta)\}$ is the expected value of the inverse of a non-central chi-squared variable with $v$ degrees of freedom and non-centrality parameter $\Delta$.
Since the ADBs of the estimators are in non-scalar form, we define the asymptotic quadratic distributional bias (AQDB) of an estimator $\hat{\beta}_1^{\ast}$ by
$$\mathrm{AQDB}\!\left(\hat{\beta}_1^{\ast}\right) = \mathrm{ADB}\!\left(\hat{\beta}_1^{\ast}\right)^\top B_{11.2}\, \mathrm{ADB}\!\left(\hat{\beta}_1^{\ast}\right).$$
Corollary 1. Suppose Theorem 2 holds. Then, under $K_{(n)}$, the AQDBs of the estimators are
$$\mathrm{AQDB}\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right) = 0, \qquad \mathrm{AQDB}\!\left(\hat{\beta}_1^{\mathrm{RSM}}\right) = \delta^\top B_{11.2}\,\delta,$$
$$\mathrm{AQDB}\!\left(\hat{\beta}_1^{\mathrm{RPT}}\right) = \delta^\top B_{11.2}\,\delta \left\{H_{p_2+2}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right)\right\}^2,$$
$$\mathrm{AQDB}\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right) = (p_2-2)^2\, \delta^\top B_{11.2}\,\delta \left[E\!\left\{\chi^{-2}_{p_2+2}(\Delta)\right\}\right]^2,$$
with the analogous truncated expression for $\hat{\beta}_1^{\mathrm{RPS}}$. When $\omega = \mathbf{0}$, the AQDBs of all the estimators are equivalent, and the estimators are therefore asymptotically unbiased. If we assume that $\omega \ne \mathbf{0}$, the results for the bias of the estimators can be summarized as follows:
The AQDB of $\hat{\beta}_1^{\mathrm{RSM}}$ is an unbounded function of $\Delta$.
The AQDB of $\hat{\beta}_1^{\mathrm{RPT}}$ starts from zero at $\Delta = 0$, and as $\Delta$ increases, it increases to a maximum and then decreases to zero.
The characteristics of $\hat{\beta}_1^{\mathrm{RSE}}$ and $\hat{\beta}_1^{\mathrm{RPS}}$ are similar to those of $\hat{\beta}_1^{\mathrm{RPT}}$: their AQDBs similarly start from zero at $\Delta = 0$, increase to a point, and then decrease towards zero, since $E\{\chi^{-2}_{p_2+2}(\Delta)\}$ is a non-increasing function of $\Delta$.
Theorem 3. Suppose Theorem 1 holds. Under the local alternatives $K_{(n)}$, the asymptotic covariance matrices of the estimators are
$$\Gamma\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right) = B_{11.2}^{-1}, \qquad \Gamma\!\left(\hat{\beta}_1^{\mathrm{RSM}}\right) = B_{11}^{-1} + \delta\delta^\top,$$
$$\Gamma\!\left(\hat{\beta}_1^{\mathrm{RPT}}\right) = B_{11.2}^{-1} - \Phi\, H_{p_2+2}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right) + \delta\delta^\top \left\{ 2 H_{p_2+2}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right) - H_{p_2+4}\!\left(\chi^2_{p_2,\alpha};\, \Delta\right) \right\},$$
$$\Gamma\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right) = B_{11.2}^{-1} - (p_2-2)\,\Phi \left[ 2 E\!\left\{\chi^{-2}_{p_2+2}(\Delta)\right\} - (p_2-2)\, E\!\left\{\chi^{-4}_{p_2+2}(\Delta)\right\} \right] + (p_2-2)\,\delta\delta^\top \left[ 2 E\!\left\{\chi^{-2}_{p_2+2}(\Delta)\right\} - 2 E\!\left\{\chi^{-2}_{p_2+4}(\Delta)\right\} + (p_2-2)\, E\!\left\{\chi^{-4}_{p_2+4}(\Delta)\right\} \right],$$
and $\Gamma\!\left(\hat{\beta}_1^{\mathrm{RPS}}\right)$ equals $\Gamma\!\left(\hat{\beta}_1^{\mathrm{RSE}}\right)$ minus correction terms arising from the positive-part truncation, which ensure that the RPS estimator has risk no larger than that of the RSE estimator.
Corollary 2. Under the local alternatives $K_{(n)}$ and from Theorem 3, the risks of the estimators are obtained as $R(\hat{\beta}_1^{\ast}) = \mathrm{tr}\{Q\,\Gamma(\hat{\beta}_1^{\ast})\}$ for each estimator above. From Theorem 2, when $\Delta = 0$, the risk of each proposed estimator is smaller than that of the full model estimator, with the sub-model estimator attaining the smallest risk. If $\Delta > 0$, the results can be summarized as follows:
The risk of $\hat{\beta}_1^{\mathrm{RFM}}$ remains constant, while the risk of $\hat{\beta}_1^{\mathrm{RSM}}$ is an unbounded function of $\Delta$, since it grows with the quadratic term $\delta\delta^\top$.
The risk of $\hat{\beta}_1^{\mathrm{RPT}}$ increases as $\Delta$ moves away from zero, achieves its maximum, and then decreases towards the risk of the full model estimator.
The risk of $\hat{\beta}_1^{\mathrm{RPT}}$ is smaller than the risk of $\hat{\beta}_1^{\mathrm{RFM}}$ for small values of $\Delta$ in the neighborhood of zero; for the rest of the parameter space, $\hat{\beta}_1^{\mathrm{RFM}}$ outperforms $\hat{\beta}_1^{\mathrm{RPT}}$, that is, $R(\hat{\beta}_1^{\mathrm{RPT}}) \ge R(\hat{\beta}_1^{\mathrm{RFM}})$.
Comparing the risks of $\hat{\beta}_1^{\mathrm{RSE}}$ and $\hat{\beta}_1^{\mathrm{RPS}}$, the estimator $\hat{\beta}_1^{\mathrm{RPS}}$ outperforms $\hat{\beta}_1^{\mathrm{RSE}}$; that is, $R(\hat{\beta}_1^{\mathrm{RPS}}) \le R(\hat{\beta}_1^{\mathrm{RSE}})$ for all $\Delta$.
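The ADB and risk expressions above involve the non-central chi-squared CDF $H_v(x; \Delta)$ and inverse moments $E\{\chi^{-2j}_v(\Delta)\}$. A sketch of how these ingredients can be evaluated numerically (the inverse moment by quadrature, using SciPy) is given below; it is an illustrative aid for tracing the theoretical curves, not part of the estimation procedure itself.

```python
import numpy as np
from scipy.stats import ncx2
from scipy.integrate import quad

def H(v, x, delta):
    """CDF of the non-central chi-squared distribution, H_v(x; Delta)."""
    return ncx2.cdf(x, df=v, nc=delta)

def inv_moment(v, delta, j=1):
    """E[chi_v^{-2j}(Delta)], computed by numerical quadrature of t^{-j} * pdf(t)."""
    val, _ = quad(lambda t: t**(-j) * ncx2.pdf(t, df=v, nc=delta), 0.0, np.inf)
    return val
```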
4. Simulation Studies
In this section, we conduct a simulation study to assess the finite-sample performance of the suggested estimators. The criterion for comparing the performance of the estimators is the mean squared error (MSE). We simulate the response from the following LMM:
$$y_{ij} = x_{ij}^\top \beta + z_{ij}^\top \mathbf{b}_i + \epsilon_{ij}, \quad i = 1, \ldots, N,\; j = 1, \ldots, n_i, \qquad (3)$$
where the $\epsilon_{ij}$ are independent $N(0, \sigma^2)$ errors. We generate the random effects $\mathbf{b}_i$ from a multivariate normal distribution with zero mean and covariance matrix $\sigma_b^2 I_q$, where $I_q$ is the $q \times q$ identity matrix. The design matrix $X$ is generated from a $p$-variate normal distribution with zero mean vector and covariance matrix $\Sigma_x$. Furthermore, we assume that the off-diagonal elements of $\Sigma_x$ are equal to $\rho$, the coefficient of correlation between any two predictors, and several values of $\rho$ are considered. The ratio of the largest eigenvalue to the smallest eigenvalue of the matrix $X^\top X$ is calculated as the condition number index (CNI) [24], which assesses the existence of multicollinearity in the design matrix; if the CNI is larger than 30, then the model has significant multicollinearity. Our simulations are based on the linear mixed effects model in Equation (3) with repeated measurements on 100 subjects.
We consider a situation where the model is assumed to be sparse. In this study, our interest lies in testing the hypothesis $H_0: \beta_2 = \mathbf{0}$, and our goal is to estimate the fixed-effects coefficient vector $\beta_1$. We partition the fixed-effects coefficients as $\beta = (\beta_1^\top, \beta_2^\top)^\top$. The coefficients $\beta_1$ and $\beta_2$ are $p_1$- and $p_2$-dimensional vectors, respectively, with $p = p_1 + p_2$.
In order to investigate the behavior of the estimators, we define $\Delta^{\ast} = \lVert \beta - \beta^{(0)} \rVert^2$, where $\beta^{(0)} = (\beta_1^\top, \mathbf{0}^\top)^\top$ and $\lVert \cdot \rVert$ is the Euclidean norm. We consider $\Delta^{\ast}$ values between 0 and 4. If $\Delta^{\ast} = 0$, the response is generated under the null hypothesis; when $\Delta^{\ast} > 0$, the response is generated under the local alternative hypothesis. In our simulation study, we consider several values for the number of fixed-effects predictors. Each realization is repeated 5000 times to obtain consistent results, and the MSEs of the suggested estimators are computed.
Based on the simulated data, we calculate the MSE of each estimator as
$$\mathrm{MSE}\!\left(\hat{\beta}_1^{\ast}\right) = \frac{1}{5000} \sum_{j=1}^{5000} \left\lVert \hat{\beta}_{1,(j)}^{\ast} - \beta_1 \right\rVert^2,$$
where $\hat{\beta}_{1,(j)}^{\ast}$ denotes any one of $\hat{\beta}_1^{\mathrm{RFM}}$, $\hat{\beta}_1^{\mathrm{RSM}}$, $\hat{\beta}_1^{\mathrm{RPT}}$, $\hat{\beta}_1^{\mathrm{RSE}}$, and $\hat{\beta}_1^{\mathrm{RPS}}$ in the $j$th repetition. We use the relative mean squared efficiency (RMSE), or the ratio of MSEs, for risk performance comparison. The RMSE of an estimator $\hat{\beta}_1^{\ast}$ with respect to the baseline full model ridge estimator is defined as
$$\mathrm{RMSE}\!\left(\hat{\beta}_1^{\ast}\right) = \frac{\mathrm{MSE}\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right)}{\mathrm{MSE}\!\left(\hat{\beta}_1^{\ast}\right)},$$
where $\hat{\beta}_1^{\ast}$ is one of the suggested estimators under consideration; an RMSE larger than one indicates that $\hat{\beta}_1^{\ast}$ is superior to the full model estimator.
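The Monte Carlo loop can be sketched as follows; it reuses `ridge_gls`, `X`, `V`, `y`, `k`, `p1`, and `beta` from the earlier sketches (hypothetical configuration), shows only the RFM and RSM branches, and computes the RMSE defined above.

```python
import numpy as np

M = 5000                                   # Monte Carlo repetitions
names = ["RFM", "RSM"]                     # RPT/RSE/RPS are accumulated the same way
sq_err = {name: 0.0 for name in names}
for _ in range(M):
    # a fresh (y, X, V) would be simulated from Equation (3) here, as in the first sketch
    beta1_hat = {
        "RFM": ridge_gls(X, V, y, k)[:p1],        # full model, non-sparse sub-vector
        "RSM": ridge_gls(X[:, :p1], V, y, k),     # sub-model (beta_2 = 0)
    }
    for name in names:
        sq_err[name] += np.sum((beta1_hat[name] - beta[:p1]) ** 2)

mse = {name: s / M for name, s in sq_err.items()}
rmse = {name: mse["RFM"] / mse[name] for name in names}   # > 1 favors the estimator
```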
4.1. Simulation Results
In this subsection, we present the results of our simulation study. The results for the considered configurations of $(n, p_1, p_2)$ with different values of the correlation coefficient $\rho$ are shown in Table 1. Furthermore, we plot the RMSEs against $\Delta^{\ast}$ in Figure 1 and Figure 2. The findings can be summarized as follows:
When $\Delta^{\ast} = 0$, the sub-model estimator RSM outperforms all other estimators. As $\Delta^{\ast}$ moves away from zero, the RMSE of the sub-model estimator decreases and goes to zero.
The pretest ridge estimator RPT outperforms the shrinkage ridge and positive-part shrinkage ridge estimators in the case of $\Delta^{\ast} = 0$. However, for a large number of sparse predictors $p_2$, with $p_1$ and $n$ kept fixed, RPT is less efficient than RPS and RSE. When $\Delta^{\ast}$ is larger than zero, the RMSE of RPT decreases and remains below one for intermediate values of $\Delta^{\ast}$; thereafter, the RMSE of RPT increases and approaches one for larger values of $\Delta^{\ast}$.
RPS performs better than RSE over the entire parameter space induced by $\Delta^{\ast}$, as presented in Table 1 and Table 2. Similarly, both shrinkage estimators RPS and RSE outperform the full model ridge estimator irrespective of whether the selected sub-model is correct. This is consistent with the asymptotic theory presented in Section 3.
Since $\Delta^{\ast}$ measures the degree of deviation from the sparsity assumption on the parameter space, it is clear that one cannot go wrong with the use of shrinkage estimators even if the selected sub-model is wrongly specified. As is evident from Table 1 and Table 2 and Figure 1 and Figure 2, if the selected sub-model is correct, that is, $\Delta^{\ast} = 0$, then the shrinkage estimators are relatively efficient compared with the ridge full model estimator. On the other hand, if the sub-model is misspecified, the gain slowly diminishes. However, in terms of risk, the shrinkage estimators are at least as good as the full model ridge estimator. Therefore, the use of shrinkage estimators makes sense in applications where a sub-model cannot be correctly specified.
The RMSEs of the ridge-type estimators are increasing functions of the amount of multicollinearity. This indicates that the ridge-type estimators perform better than the classical estimator in the presence of multicollinearity among the predictor variables.
4.2. Comparison with LASSO-Type Estimators
We compare the listed estimators with the LASSO and adaptive LASSO (aLASSO) estimators. Ten-fold cross-validation is used to select the optimal value of the penalty parameter that minimizes the mean squared error for the LASSO-type estimators; a sketch of this comparison follows the list below. The results for the considered configurations of $(n, p_1, p_2)$ and $\rho$ are presented in Table 3. We observe the following from Table 3.
The performance of the sub-model estimator is the best among all estimators.
Apart from the sub-model estimator, the pretest ridge estimator performs better than the other estimators. However, for a larger number of sparse predictors, the shrinkage estimators outperform the pretest estimator.
The performance of the LASSO and aLASSO estimators is comparable when the correlation among the predictors is small. The pretest and shrinkage estimators remain stable for a given value of $\rho$.
For a large number of sparse predictors $p_2$, the shrinkage and pretest ridge estimators outperform the LASSO-type estimators. This indicates the superiority of the shrinkage estimators over the LASSO-type estimators; shrinkage estimators are therefore preferable when there is multicollinearity among the predictor variables.
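For the LASSO-type benchmark, a sketch using scikit-learn is shown below. Note that it fits a marginal linear model and therefore ignores the random-effect structure, a simplification relative to the LMM fits; the adaptive weights and the reuse of the cross-validated penalty are likewise illustrative choices, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LassoCV, Lasso

# LASSO with 10-fold cross-validation over the penalty path
lasso = LassoCV(cv=10).fit(X, y)

# Adaptive LASSO via the reweighting trick: penalize each coefficient by a
# data-driven weight derived from a preliminary (here, LASSO) fit.
w = 1.0 / (np.abs(lasso.coef_) + 1e-6)      # adaptive weights (small offset for zeros)
alasso = Lasso(alpha=lasso.alpha_).fit(X / w, y)   # reusing alpha_ is a simplification
alasso_coef = alasso.coef_ / w              # transform back to the original scale
```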
5. Real Data Application
We consider two real data analyses, using the Amsterdam Growth and Health Study data and a genetic and brain network connectivity edge-weight dataset, to illustrate the performance of the proposed estimators.
5.1. Amsterdam Growth and Health Data (AGHD)
The AGHD data are obtained from the Amsterdam Growth and Health Study [25]. The goal of this study is to investigate the relationship between lifestyle and health from adolescence into young adulthood. The response variable $Y$ is the total serum cholesterol measured over six time points. There are five covariates: $X_1$ is the baseline fitness level, measured as the maximum oxygen uptake on a treadmill; $X_2$ is the amount of body fat, estimated by the sum of the thickness of four skinfolds; $X_3$ is a smoking indicator (0 = no, 1 = yes); $X_4$ is gender (1 = female, 2 = male); and $X_5$ is the time of measurement. Subject-specific random effects are also included.
A total of 147 subjects participated in the study, and all variables were measured at six time occasions. In order to apply the proposed methods, we first apply an AIC-based variable selection procedure to select the sub-model. For the AGHD data, we fit a linear mixed model with all five covariates for both the fixed and subject-specific random effects, using a two-stage selection procedure to choose both the random and fixed effects. The analysis found two covariates to be significant for prediction of the response variable serum cholesterol; the other variables are ignored since they are not significantly important. Based on this information, the sub-model contains the two selected covariates, while the full model includes all the covariates. We construct the shrinkage estimators from the full model and sub-model. In terms of the null hypothesis, the restriction can be written as $H_0: \beta_2 = \mathbf{0}$, where $\beta = (\beta_1^\top, \beta_2^\top)^\top$, $\beta_1$ contains the coefficients of the two selected covariates ($p_1 = 2$), and $\beta_2$ contains the coefficients of the remaining three covariates ($p_2 = 3$).
To evaluate the performance of the estimators, we obtain the mean squared prediction error (MSPE) using bootstrap samples. We draw 1000 bootstrap samples of the 147 subjects from the data matrix $(\mathbf{y}_i, X_i)$. We then calculate the relative prediction error (RPE) of an estimator $\hat{\beta}_1^{\ast}$ with respect to the full model ridge estimator $\hat{\beta}_1^{\mathrm{RFM}}$. The RPE is defined as
$$\mathrm{RPE}\!\left(\hat{\beta}_1^{\ast}\right) = \frac{\mathrm{MSPE}\!\left(\hat{\beta}_1^{\ast}\right)}{\mathrm{MSPE}\!\left(\hat{\beta}_1^{\mathrm{RFM}}\right)},$$
where $\hat{\beta}_1^{\ast}$ is one of the listed estimators. If $\mathrm{RPE}(\hat{\beta}_1^{\ast}) < 1$, then $\hat{\beta}_1^{\ast}$ outperforms $\hat{\beta}_1^{\mathrm{RFM}}$.
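A sketch of the bootstrap MSPE/RPE computation, resampling subjects with replacement. The helper `fit_and_predict` is hypothetical; it stands for refitting a named estimator on a bootstrap sample and returning observed and predicted responses.

```python
import numpy as np

def bootstrap_rpe(subject_data, fit_and_predict, estimators, B=1000, seed=0):
    """Relative prediction error of each estimator vs. the full model ridge estimator."""
    rng = np.random.default_rng(seed)
    n_sub = len(subject_data)
    mspe = {name: 0.0 for name in estimators}
    for _ in range(B):
        idx = rng.integers(0, n_sub, size=n_sub)   # resample subjects with replacement
        sample = [subject_data[i] for i in idx]
        for name in estimators:
            y_obs, y_hat = fit_and_predict(sample, name)   # hypothetical refit/predict
            mspe[name] += np.mean((y_obs - y_hat) ** 2)
    return {name: mspe[name] / mspe["RFM"] for name in estimators}
```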
Table 4 reports the estimates and standard errors of the non-sparse predictors and the RPEs of the estimators with respect to the full model. As expected, the sub-model ridge estimator has the minimum RPE, because it is computed under the assumption that the sub-model is correct, that is, $\beta_2 = \mathbf{0}$. It is evident from the RPE values in Table 4 that the shrinkage estimators are superior to the LASSO-type estimators. Furthermore, the positive-part shrinkage ridge estimator is more efficient than the shrinkage ridge estimator.
5.2. Resting-State Effective Brain Connectivity and Genetic Data
These data comprise a longitudinal resting-state functional magnetic resonance imaging (rs-fMRI) effective brain connectivity network and genetic study [26] obtained from a sample of 111 subjects with a total of 319 rs-fMRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The 111 subjects comprise 36 cognitively normal (CN), 63 mild cognitive impairment (MCI), and 12 Alzheimer's Disease (AD) subjects. The response is a network connection between regions of interest estimated from an rs-fMRI scan within the Default Mode Network (DMN), and we observe a longitudinal sequence of such connections for each subject. The DMN consists of a set of brain regions that tend to be active in the resting state, when a subject is mind-wandering with no intended task. For this data analysis, we consider the network edge weight from the left intraparietal cortex to the posterior cingulate cortex (LIPC → PCC) as our response. The genetic data are single nucleotide polymorphisms (SNPs) from the non-sex chromosomes, i.e., chromosome 1 to chromosome 22. SNPs with a minor allele frequency less than 5% are removed, as are SNPs with a Hardy–Weinberg equilibrium p-value below a pre-specified threshold or a missing rate greater than 5%. After preprocessing, we are left with 1,220,955 SNPs and the longitudinal rs-fMRI effective connectivity network for the 111 subjects with rs-fMRI data. The response is the network edge weight; the SNPs enter the model as fixed effects, together with subject-specific random effects.
In order to apply the proposed methods, we use a genome-wide association study (GWAS) to screen the genetic data down to 100 SNPs. We implement a second screening by applying multinomial logistic regression to identify a smaller subset of the 100 SNPs that are potentially associated with disease status (CN/MCI/AD). This yields a subset of the top 10 SNPs, which are taken as the most important predictors; the other 90 SNPs are treated as sparse and ignored as not significant. We now have two models: the full model with all 100 SNPs, and the sub-model with the 10 selected SNPs. Finally, we construct the pretest and shrinkage estimators from the full model and sub-model.
We draw 1000 bootstrap samples with replacement from the corresponding data matrix $(\mathbf{y}_i, X_i)$. We report the RPEs of the estimators with respect to the full model ridge estimator, based on the bootstrap simulation, in Table 5. We observe that the sub-model, pretest, shrinkage, and positive-part shrinkage ridge estimators all outperform the full model estimator. Clearly, the sub-model ridge estimator has the smallest RPE, since it is computed under the assumption that the candidate sub-model is correct, i.e., $\beta_2 = \mathbf{0}$. Both shrinkage ridge estimators outperform the pretest ridge estimator; in particular, the positive-part shrinkage estimator performs better than the shrinkage estimator. The performance of both the shrinkage and pretest ridge estimators is better than that of the LASSO-type estimators. Thus, the data analysis is in line with our simulation and theoretical findings.
6. Conclusions
In this paper, we presented efficient estimation strategies for the linear mixed effects model when there exists multicollinearity among the predictor variables in high-dimensional data applications. We considered the estimation of the fixed-effects parameters in the linear mixed model when some of the predictors may have a very weak influence on the response of interest. We introduced pretest and shrinkage estimation in our model using ridge estimation as the reference estimator. In addition, we established the asymptotic properties of the pretest and shrinkage ridge estimators. Our theoretical findings demonstrate that the shrinkage ridge estimators outperform the full model ridge estimator and perform relatively better than the sub-model estimator over a wide range of the parameter space.
Additionally, a Monte Carlo simulation was conducted to assess the finite-sample behavior of the proposed estimators when the model is sparse (i.e., the restrictions on the parameters hold). As expected, the sub-model ridge estimator outperforms all other estimators when the restrictions hold. However, when this assumption is violated, the shrinkage and pretest ridge estimators outperform the sub-model estimator. Furthermore, when the number of sparse predictors is extremely large relative to the sample size, the shrinkage estimators outperform the pretest ridge estimator. These numerical results are consistent with our asymptotic results. We also assessed the relative performance of the LASSO-type estimators against our ridge-type estimators and observed that the pretest and shrinkage ridge estimators are superior to the LASSO-type estimators when the predictors are highly correlated. In our real data applications, the shrinkage ridge estimators are superior, with the smallest relative prediction errors compared with the LASSO-type estimators.
In summary, the results of the data analyses strongly confirm the findings of the simulation study and suggest the use of the shrinkage ridge estimation strategy when no prior information about the parameter subspace is available. The results of our simulation study and real data applications are consistent with available results in [27,28,29].
In our future work, we will focus on other penalty estimators, such as the Elastic Net, the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD) method, as estimation strategies in the LMM for high-dimensional data. These estimators will be assessed and compared with the proposed ridge-type estimators. Another interesting extension will be integrating two sub-models by incorporating ridge-type estimation strategies in linear mixed effects models. The goal is to improve the estimation accuracy of the non-sparse set of fixed-effects parameters by combining an over-fitted model estimator with an under-fitted one [27,29]. This approach will include combining two sub-models produced by two different variable selection techniques for the LMM [28].