Next Article in Journal
Collective Intelligence in Design Crowdsourcing
Next Article in Special Issue
Permutation Variation and Alternative Hyper-Sphere Decomposition
Previous Article in Journal
The Complex Systems for Conflict Interaction Modelling to Describe a Non-Trivial Epidemiological Situation
Previous Article in Special Issue
Bivariate Continuous Negatively Correlated Proportional Models with Applications in Schizophrenia Research
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data

1
School of Mathematics, Yunnan Normal University, Kunming 650092, China
2
Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong, China
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(4), 538; https://doi.org/10.3390/math10040538
Submission received: 8 January 2022 / Revised: 30 January 2022 / Accepted: 31 January 2022 / Published: 9 February 2022
(This article belongs to the Special Issue Recent Advances in Computational Statistics)

Abstract

:
Motivated by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data, the objective of integration of important biomarkers for the early detection of Mild Cognitive Impairment (MCI) to Alzheimer’s disease (AD) as a therapeutic intervention is most likely to be beneficial in the early stages of disease progression. Developing predictors for MCI to AD comes down to genotype variables such that the dimension of predictors increases as the sample becomes large. Thus, we consider the sparsity concept of coefficients in a high-dimensional regression model with clustered failure time data such as ADNI, which enables enhancing predictive performances and facilitates the model’s interpretability. In this study, we propose two MM algorithms (profile and non-profile) for the shared frailty survival model firstly and then extend the two proposed MM algorithms to regularized estimation in sparse high-dimensional regression model. The convergence properties of our proposed estimators are also established. Furthermore simulation studies and analysis of ADNI data are illustrated by our proposed methods.

1. Introduction

In biomedical research, we often encounter clustered failure time data in which individuals from the same cluster (e.g., family) share common genetic and/or environmental factors. An illustrative example comes from the Alzheimer’s Disease Neuroimaging Initiative study where each participant may develop Mild Cognitive Impairment (MCI) and/or develop Alzheimer’s disease (AD). The times to these two events for each individual are expected to be highly associated.
In order to account for the correlation of associated failure times, shared frailty or random effect models are commonly used by researchers ([1,2,3,4]). In particular, the gamma frailty model ([5,6,7,8]) is numerically convenient for such analysis since the likelihood function exhibits a closed form. For example, ref. [2] proposed a frailty model where individuals within the same cluster share a common random effect. Refs. [9,10,11] provided a good and comprehensive review of the applications of different frailty models for clustered survival data. The authors of [12] compared several commonly used methods and demonstrated the advantages of the gamma frailty model. Refs. [13,14] also explored the nonparametric frailty model and the between-within frailty model for correlated survival data.
For high-dimension regression analysis, an important and useful strategy is to exploit sparsity and assume that the true parameters lie in a low-dimensional subspace. In the past years, sparsity-restricted estimation has attracted a great deal of attention in high-dimensional regression models. That is, only a few regression coefficients are assumed to be nonzero ([15,16]). In addition to the widely used LASSO penalty, there exists other types of regularization methods such as ridge regression ([17]), bridge regression ([18]), and elastic net ([19]). In order to improve the performance of the LASSO, many modifications such as the adaptive Lasso, the smoothly clipped absolute deviation (SCAD, [20]), and minimax concave penalty (MCP, [21]) have also been proposed.
In this paper, we develop fast and efficient algorithms for regularized estimation in the general frailty model for clustered failure time data. As the model parameters consists of both the regression coefficients and the unknown nonparametric baseline hazard function, computation in the frailty model with survival data is usually intensive, especially when frailty does not result in a closed-form likelihood function. The regularized estimation poses even more challenges for the problem and further complicates computational burden. The existing approaches of general frailty model almost rely on the EM algorithm. The EM algorithm adopts Newton’s method in its maximization step, which involves matrix inversion and may not have good performance in a high-dimensional environment. The minorization-maximization (MM) algorithm possesses an ascent property, which driving the target likelihood function to increase and reliably converges to the maximum from well-chosen initial values ([22,23,24]). In particular, ref. [25] proposed three different MM algorithms for the gamma frailty model and further demonstrated their utility for the estimation in high-dimensional situations via decomposing a high-dimensional objective function into separable low-dimensional functions. In this paper, we first develop a pair of MM algorithms, including profile and non-profile, for the general frailty model. As demonstrated by [25] in numerical studies, the decomposition inosculated with regularized estimation well in sparse high-dimensional settings. For illustration, we chose to use concave penalties such as the smoothly clipped absolute deviations penalty (SCAD, [20]) and the minimax concave penalty (MCP, [21]) to explore the sparsity because they both possess the good property of unbiasedness.
The rest of the paper is organized as follows. In Section 2, we provide the model description, an overview of MM principle, and then propose a pair of MM algorithms. In Section 3, we derive regularized estimation methods via profile and non-profile MM algorithms in sparse high-dimensional regression setting. The convergence properties of the proposed algorithms are provided in Section 4. In Section 5, a series of simulation studies was conducted to assess the finite-sample performances of the proposed methods. Section 6 provides a real application to the ADNI data.

2. The Model and Estimation

2.1. The Model

Consider datasets from some population that contains M i 1 individuals in i-th subgroup of the population, i = 1 , , B . Individuals within the i-th subgroup have dependent event times due to some unobserved covariate information summarized in a frailty, ω i . Let Y i j be the event time, let C i j be the censoring time, and let X i j = ( X i j 1 , , X i j p ) denote the potential covariates for the j-th individual in the i-th subgroup. The censoring time C i j is assumed to be independent of Y i j , given covarites X i j and frailty ω i . Define t i j = min ( Y i j , C i j ) and δ i j = I ( Y i j C i j ) , where I ( · ) denotes the indicator function. Suppose that censorship is noninformative. Conditional on frailty ω i , the hazard rate function is of the following form:
λ ( t i j | X i j , ω i ) = ω i λ 0 ( t i j ) exp { X i j β } ,
where λ 0 ( · ) is an arbitrary baseline hazard rate, and β is a vector of unknown parameters. Assume that ω i , i = 1 , , B are independent and identically distributed with density function f ( ω i | θ ) on the domain W . Denote α = ( θ , β , Λ 0 ) ; we then propose profile MM and non-profile MM methods to estimate parameters based on the minorization-maximization (MM) principle.

2.2. An Overview of MM Principle

Before introducing the two proposed methods, let us review the MM principle. Assume that Y o b s is the observed data, ( α Y o b s ) is the log-likelihood function with unknown parameter α = ( α 1 , , α q ) T , and the maximum likelihood estimate of α is α ^ = argmax ( α Y o b s ) . The MM principle involves two M steps: One is a minorization step, and the other is maximization step. In a maximization problem, the first step is minorization, which aims to construct a minorization/surrogate function for the objective log-likelihood function ( α Y o b s ) to be maximized through a series of inequalities satisfying the following two conditions:
Q ( α α ( k ) ) ( α Y o b s ) , Q ( α ( k ) α ( k ) ) = ( α ( k ) Y o b s ) ,
where α ( k ) is the k-th approximation of α ^ . Once the minorization function Q ( α α ( k ) ) is successfully constructed for objective function ( α Y o b s ) , the following maximization step is to maximizing the surrogate function Q ( α α ( k ) ) to obtain the ( k + 1 ) -th approximation of α ^ rather than the objective function, i.e.,
α ( k + 1 ) = argmax   Q ( α α ( k ) ) .
By MM principle, we may have ( α ( k + 1 ) Y o b s ) Q ( α ( k + 1 ) α ( k ) ) Q ( α ( k ) α ( k ) ) = ( α ( k ) Y o b s ) , and the values of the objective function continue to increase until convergence.

2.3. Profile MM Estimation Procedure

For general shared frailty models, we can write the observed log-likelihood function ( α | Y o b s ) as follows:
( α | Y o b s ) = i = 1 B log W τ i ( ω i | α ) d ω i ,
where the following is the case.
τ i ( ω i | α ) = f ( ω i | θ ) j = 1 M i λ 0 ( t i j ) ω i exp ( X i j β ) δ i j exp Λ 0 ( t i j ) ω i exp ( X i j β ) .
Generally, the Laplace transform of the frailty’s distribution is theoretically intractable. Hence, the explicit form of marginal hazard is not available to us. However, that has not stopped us from developing the profile MM approach for estimation in general shared frailty model. Define the following:
v i ( ω i | α ( k ) ) = τ i ( ω i | α ( k ) ) W τ i ( ω i | α ( k ) ) d ω i ,
and rewrite the objective function as follows.
( α | Y o b s ) = i = 1 B log W τ i ( ω i | α ) v i ( ω i | α ( k ) ) · v i ( ω i | α ( k ) ) d ω i .
By Jensen’s inequality, we have the following:
φ X h ( x ) · g ( x ) d x X φ h ( x ) · g ( x ) d x ,
where X is a subset of the real line R , φ ( · ) is the concave function, h ( · ) is an arbitrary real-valued function defined on X , and g ( · ) is a density function defined on X . In Equation (4), v i ( ω i | α ( k ) ) is a density function, and choosing h ( x ) as τ i ( ω i | α ) / v i ( ω i | α ( k ) ) , we can apply the above Jensen’s inequality and construct the following surrogate function for ( α | Y o b s ) :
Q 1 ( α | α ( k ) ) = Q 11 ( θ | α ( k ) ) + Q 12 ( β , Λ 0 | α ( k ) ) ,
where the following is the case:
Q 11 ( θ | α ( k ) ) = i = 1 B W log [ f ( ω i | θ ) ] · v i ( ω i | θ ( k ) , α ( k ) ) d ω i ,
which only consists of parameter θ . Moreover, we have the following:
Q 12 ( β , Λ 0 | α ( k ) ) = i = 1 B j = 1 M i δ i j log ( λ 0 ( t i j ) ) + δ i j X i j β A i ( k ) Λ 0 ( t i j ) exp ( X i j β ) ,
where A i ( k ) = W ω i · v i ( ω i | α ( k ) ) d ω i , i = 1 , , B . The minorizing function Q 1 ( α | α ( k ) ) separates parameters θ and ( β , Λ 0 ) into (6) and (7), respectively. In the second M-step, the updated estimates of θ involve maximizing (6) numerically. Due to the presence of the nonparametric component Λ 0 , updating ( β , Λ 0 ) is still a big challenge. Following [7], we apply the profile estimation method to profile out Λ 0 in Q 12 ( β , Λ 0 | α ( k ) ) , which results in the estimate of Λ 0 given β .
d Λ ^ 0 ( t i j ) = δ i j r = 1 B s = 1 M r I ( t r s t i j ) A r ( k ) exp ( X r s β ) .
Substituting (8) into Q 12 ( β , Λ 0 | α ( k ) ) , we obtain the following:
Q 13 ( β | α ( k ) ) = i = 1 B j = 1 M i δ i j X i j β δ i j log r = 1 B s = 1 M r I ( t r s t i j ) A r ( k ) exp ( X r s β ) ,
which involves β only. In Equation (9), Newton’s method and large matrix inversion are required to update β in this procedure when there exist a large number of covariates. Here, we further construct minorizing functions for Q 13 ( β | α ( k ) ) to separate the regression parameters β 1 , , β q from each other under the MM principle. We first use the supporting hyperplane inequality:
log ( x ) log ( x 0 ) x x 0 x 0
to minorize Q 13 ( β | α ( k ) ) ; then, we have the following surrogate function:
Q 14 ( β | α ( k ) ) = i = 1 B j = 1 M i δ i j X i j β δ i j r = 1 B s = 1 M r I ( t r s t i j ) A r ( k ) exp ( X r s β ) r = 1 B s = 1 M r I ( t r s t i j ) A r ( k ) exp ( X r s β ( k ) ) + c ,
where c is a constant not depending on β . We apply Jensen’s inequality to the concave function exp ( · ) in Q 14 ( β | α ( k ) ) by rewriting the following:
X r s β = p = 1 q π p r s [ π p r s 1 X p r s ( β p β p ( k ) ) + X r s β ( k ) ] ,
where π p r s = | X p r s | / p = 1 q | X p r s | . Finally, the minorizing function for Q 14 ( β | α ( k ) ) is as follows:
Q 15 ( β 1 , , β q | α ( k ) ) = ^ p = 1 q Q 15 p ( β p | α ( k ) ) ,
where the following is the case.
Q 15 p ( β p | α ( k ) ) = i = 1 B j = 1 M i { δ i j X p i j β p δ i j r = 1 B s = 1 M r I ( t r s t i j ) A r ( k ) π p r s exp π p r s 1 X p r s ( β p β p ( k ) ) + X r s β ( k ) r = 1 B s = 1 M r I ( t r s t i j ) A r ( k ) exp ( X r s β ( k ) ) } , p = 1 , , q .
In general, the structure of minorizing function for the objective log-likelihood function ( α | Y o b s ) in (3) is the following:
Q p r o ( θ , β | α ( k ) ) = Q 11 ( θ | α ( k ) ) + p = 1 q Q 15 p ( β p | α ( k ) ) ,
with an explicit form update of d Λ 0 by (8). We may observe that the objective function to be maximized is decomposed into a sum of q + 1 univariate functions from (12) as Q 11 ( θ | α ( k ) ) only consists of one parameter usually. The next maximization step of this MM algorithm only involves q + 1 separate univariate optimizations and matrix inversion is unnecessary. Note that the success of the above profile MM algorithm requires the convergence of two improper integrals such as W ω i · v i ( ω i | α ( k ) ) d ω i and W log [ f ( ω i | θ ) ] · v i ( ω i | α ( k ) ) d ω i . Moreoveer, the convergences of these improper integrals are obvious when we assume the distribution of random effect comes from an exponential distribution family. The estimation proceeds by profile MM algorithm are summarized as follows:
Step 1.
Given initial values for θ , β , and Λ 0 ;
Step 2.
Update θ via (6). Update each β p via (11) for p = 1 , , q ;
Step 3.
Based on the update of β , compute the estimates of Λ 0 ( t i j ) via (8);
Step 4.
Iterate steps 2 and 3 until convergence.

2.4. Non-Profile MM Estimation Procedure

In this subsection, we bypass the profile estimation procedure in previous subsection and continue to develop new MM procedures for Equation (7) to separate parameters β and nuisance baseline hazard rate Λ 0 . Actually, to separate β and Λ 0 of (7) is to deal with the last term Λ 0 ( t i j ) exp ( X i j β ) . As in [26], we use the following arithmetic-geometric mean inequality.
i = 1 n x i a i i = 1 n a i | | a | | 1 x i | | a | | 1
Here, x i and a i are non-negative. Choosing x 1 = Λ 0 ( t i j ) / Λ 0 ( k ) ( t i j ) and x 2 = exp ( X i j β ) / exp ( X i j β ( k ) ) in inequality (13), we obtain the following surrogate function for (7):
Q 2 ( β , Λ 0 | α ( k ) ) = i = 1 B j = 1 M i δ i j log ( λ 0 ( t i j ) ) + δ i j X i j β A i ( k ) exp ( X i j β ( k ) ) 2 Λ 0 ( k ) ( t i j ) Λ 0 ( t i j ) 2 A i ( k ) Λ 0 ( k ) ( t i j ) 2 exp ( X i j β ( k ) ) exp ( 2 X i j β ) , = ^ Q 21 ( Λ 0 | α ( k ) ) + Q 22 ( β | α ( k ) )
where the following is the case.
Q 21 ( Λ 0 | α ( k ) ) = i = 1 B j = 1 M i δ i j log ( λ 0 ( t i j ) ) A i ( k ) exp ( X i j β ( k ) ) 2 Λ 0 ( k ) ( t i j ) Λ 0 ( t i j ) 2 ,
Q 22 ( β | α ( k ) ) = i = 1 B j = 1 M i δ i j X i j β A i ( k ) Λ 0 ( k ) ( t i j ) 2 exp ( X i j β ( k ) ) exp ( 2 X i j β ) .
In order to obtain the nonparametric estimate of Λ 0 , the maximization of (15) is required. For ease of computation, a one-step late skill is applied to the first order derivative of Equation (15); then, we obtain the estimate of λ 0 by the following:
d Λ ^ 0 ( t i j ) = δ i j r = 1 B s = 1 M r I ( t r s t i j ) A r ( k ) exp ( X r s β ( k ) ) ,
which is same as (8) in the profile estimation method. To update β , a similar technique as dealing with Q 14 ( β | α ( k ) ) is used. We apply Jensen’s inequality to the concave function exp ( · ) in Q 22 ( β | α ( k ) ) by rewriting the following:
2 X i j β = p = 1 q π p i j 2 π p i j 1 X p i j ( β p β p ( k ) ) + 2 X i j β ( k ) ,
where π p i j = | X p i j | / p = 1 q | X p i j | . In the end, the minorizing function for Q 22 ( β | α ( k ) ) is as follows:
Q 23 ( β 1 , , β q | α ( k ) ) = ^ p = 1 q Q 23 p ( β p | α ( k ) ) ,
where the following is obtained:
Q 23 p ( β p | α ( k ) ) = i = 1 B j = 1 M i δ i j X p i j β p π p i j A i ( k ) Λ 0 ( k ) ( t i j ) exp 2 π p i j 1 X p i j ( β p β p ( k ) ) + 2 X i j β ( k ) 2 exp ( X i j β ( k ) ) ,
for p = 1 , , q . As a result, we construct the surrogate function for the objective log-likelihood function via a non-profile MM principle as follows:
Q n o n p r o ( θ , β | α ( k ) ) = Q 11 ( θ | α ( k ) ) + p = 1 q Q 23 p ( β p | α ( k ) ) ,
with explicit form update of d Λ 0 by (17). From (20), we can find similar nice features as Q p r o ( θ , β | α ( k ) ) ; that is, Q n o n p r o ( θ , β | α ( k ) ) is a sum of q + 1 univariate functions, which means that the next maximization (second M) step only involves q + 1 simple univariate optimizations. It is worth noting that the parameter separated feature in the proposed profile MM and non-profile MM algorithms will help incoporate with the existing simple off-the-shelf accelerators well and brings about great effectiveness in computation time, as discussed in [25]. The estimation proceeds by non-profile MM algorithm are summarized as follows:
Step 1.
Given initial values of θ , β , and Λ 0 ;
Step 2.
Update the estimate of θ via (6). Update the estimate of β p based on (19) for p = 1 , , q ;
Step 3.
Using the updated estimate of β , compute the estimates of Λ 0 ( t i j ) via (17);
Step 4.
Iterate steps 2 and 3 until convergence.

3. Regularized Estimation Methods via MM Methods

Followed by Section 3, both proposed estimation approaches created nice parameter-separated surrogate functions to be maximized which may incorporate with regularization cleverly. In this part, we propose regularized estimation approaches based on MM algorithms in regression analysis under general shared frailty model. Many variable selection criteria arise as special cases of the general formulation as discussed in [20], where the penalized likelihood function takes the form
P ( α | Y o b s ) = ( θ , β , Λ 0 | Y o b s ) N p = 1 q P ( | β p | , λ ) ,
where ( θ , β , Λ 0 | Y o b s ) is the log-likelihood function for the shared frailty model, q is the dimension of β , P ( · , λ ) is a given nonnegative penalty function, and λ 0 is a tuning parameter, which is allowed to use λ p in more general cases. This penalty shrinks some of the coefficients to zero. Under the scope of general frailty topic, the computation of MLEs is extremely complicated and hard in terms of accuracy as parameters of interests involve three blocks θ ,   β , and Λ 0 , and even more complicated when there exists a large numbers of coefficients. It is worth mentioning that the proposed profile and non-profile MM algorithms decomposed the coefficient vector β from the other two blocks θ , and Λ 0 and different coefficient parameters are separated from each other. This nice feature of the proposed profile and non-profile MM algorithms may mesh well with various regularization problems in (21) to produce more sparse and accurate estimates. Under profile MM algorithmic technique in (12), we obtain the corresponding minorization function for (21) as follows.
Q p r o ( θ , β | α ( k ) ) N p = 1 q P ( | β p | , λ ) .
Under the non-profile MM algorithmic technique in (20), the minorization function for (21) is as follows.
Q n o n p r o ( θ , β | α ( k ) ) N p = 1 q P ( | β p | , λ ) .
When P ( · , λ ) is piecewise differentiable, nondecreasing, and concave on ( 0 , ) such as L 1 , MCP and SCAD penalties [27], explored a connection between the local quadratic approximation with MM algorithm. The penalty term P ( · , λ ) can be minorized by a local quadratic approximation form as follows:
P ( | β p | , λ ) P ( | β p ( k ) | , λ ) [ β p 2 ( β p ( k ) ) 2 ] P ( | β p ( k ) | + , λ ) 2 | β p ( k ) | = ^ Φ ( β p | β p ( k ) ) ,
which is actually a one-step minorizing process. By combining function (24) with (12) or (20), respectively, we obtain the final surrogate functions for penalized likelihood function (21) as follows:
Q p r o P ( θ , β | α ( k ) ) = Q p r o ( θ , β | α ( k ) ) N p = 1 q β p 2 · P ( | β p ( k ) | + , λ ) 2 | β p ( k ) | + c 1 ,
and the following is obtained.
Q n o n p r o P ( θ , β | α ( k ) ) = Q n o n p r o ( θ , β | α ( k ) ) N p = 1 q β p 2 · P ( | β p ( k ) | + , λ ) 2 | β p ( k ) | + c 2 .
Both Equations (25) and (26) are written as a sum of a series of univariate functions so that maximizing (25) and (26) will be easier than directly maximizing (21). Moreover, some simple off-the-shelf accelerators may also be used here to make the optimization problems more simplier and efficient. Regularized estimation proceeds by (profile/non-profile) MM algorithms and are summarized as follows:
Step 1.
Given initial values of θ , β and Λ 0 ;
Step 2.
Update the estimate of θ via (6);
Step 3.
For profile MM method, update β by maximizing
p = 1 q Q 15 p ( β p | α ( k ) ) N p = 1 q β p 2 · P ( | β p ( k ) | + , λ ) 2 | β p ( k ) | .
For non-profile MM method, update β by maximizing
p = 1 q Q 23 p ( β p | α ( k ) ) N p = 1 q β p 2 · P ( | β p ( k ) | + , λ ) 2 | β p ( k ) | ;
Step 4.
Using the updated estimate of β in Step 3, compute the estimates of Λ 0 ( t i j ) via (8) for profile MM method and via (17) for non-profile MM method, respectively;
Step 5.
Iterate steps 2 to 4 until convergence.

Model Selection

From the recent literature, tuning parameter λ may be selected by multiple data driven model selection criteria, such as Bayesian information criterion BIC ([28]) and generalized cross-validation GCV ([29]). In this paper, we consider a widely used BIC-type criterion, defined by the following:
B I C λ = 2 ( α ^ ) + C n ( S ^ + 1 ) log ( N ) ,
to select the tuning parameter λ , where C n = max { 1 , log [ log ( q + 1 ) ] } , q is the dimension of β , and the degrees of freedom S ^ are defined as the number of nonzero parameters in β ^ .

4. Theoretical Properties

We first consider the convergence properties for the profile and non-profile MM algorithms of maximizing ( α ) and P ( α ) . As discussed in Section 3 and Section 4, Q p r o ( θ , β | α ( k ) ) and Q n o n p r o ( θ , β | α ( k ) ) are the constructed minorizing functions of ( α ) via the profile and non-profile MM approach, Q p r o P ( θ , β | α ( k ) ) and Q n o n p r o P ( θ , β | α ( k ) ) are the constructed minorizing functions of penalized likelihood function P ( α ) via the profile and non-profile MM approach, θ and β are the parameter vectors, and α ( k ) is its current estimate. In this paper, we assume that the common component Q 11 ( θ | α ( k ) ) of all minorizing functions is strictly concave with respect to θ , and the second M step of MM principle is based on the Newton–Raphson method. As strict concavity holds for all surrogate functions Q p r o ( · | α ( k ) ) , Q n o n p r o ( · | α ( k ) ) , Q p r o P ( · | α ( k ) ) and Q n o n p r o P ( · | α ( k ) ) by using concave penalties L 1 , MCP and SCAD, we denote their unique maximizers by M p r o ( α ( k ) ) , M n o n p r o ( α ( k ) ) , M p r o P ( α ( k ) ) , and M n o n p r o P ( α ( k ) ) , respectively. Following Proposition 15.4.3 of Lange [30], we provide the following convergence properties.
Proposition 1.
Assume the differentiability and coerciveness of ( α ) hold, all stationary points of ( α ) are isolated and the subsets { α Ω : ( α ) ( α ( k ) ) } of parameter domain Ω are compact. Then, the profile iteration sequence ( θ ( k + 1 ) , β ( k + 1 ) ) = M p r o ( α ( k ) ) together with Λ 0 ( k + 1 ) in (8) and non-profile sequence ( θ ( k + 1 ) , β ( k + 1 ) ) = M n o n p r o ( α ( k ) ) together with Λ 0 ( k + 1 ) in (17) converge to the stationary point of ( α ) . If the strict concavity of ( α ) also hold, then the profile MM sequence and non-profile MM sequence converge to the same maximum point of ( α ) .
Proposition 2.
Suppose P ( · | λ ) is piecewise differentiable, nondecreasing and concave on ( 0 , ) , continuous at 0 and P ( 0 + | λ ) < . Then, for all β p 0 , Φ ( β q | β q ( k ) ) as defined in (24) minorizes P ( | β q | | λ ) at points ± | β q | . In particular, conditions in construction minorization function hold, that is, P ( | β q | | λ ) Φ ( β q | β q ( k ) ) for all β q with equality at β q = β q ( k ) and the ascent property P ( | β q ( k + 1 ) | | λ ) Φ ( β q ( k + 1 ) | β q ( k ) ) Φ ( β q ( k ) | β q ( k ) ) = P ( | β q ( k ) | | λ ) hold.
Based on Proposition 2, we can easily obtain the following convergence properties of profile MM and non-profile MM algorithms for maximizing penalized likelihood function P ( α ) defined in (21).
Proposition 3.
Assume the differentiability and coerciveness of P ( α ) hold, all stationary points of P ( α ) are isolated and the subsets { α Ω : P ( α ) P ( α ( k ) ) } of parameter domain Ω are compact. Then, the profile iteration sequence ( θ ( k + 1 ) , β ( k + 1 ) ) = M p r o P ( α ( k ) ) together with Λ 0 ( k + 1 ) in (8) and non-profile sequence ( θ ( k + 1 ) , β ( k + 1 ) ) = M n o n p r o P ( α ( k ) ) together with Λ 0 ( k + 1 ) in (17) converge to the stationary point of P ( α ) . If the strict concavity of P ( α ) also hold, then the profile MM sequence and non-profile MM sequence converge to the same maximum point of P ( α ) .

5. Numerical Examples

Example 1.
We independently simulate data from three different frailty models:
λ ( t | X i j , ω i ) = ω i λ 0 ( t ) exp { X i j β } , ω i L o g n o r m a l ( 0 , θ ) , θ = 0.25 , I n v e r s e G a u s s i a n ( θ , θ 2 ) , θ = 1 , G a m m a ( 1 / θ , 1 / θ ) , θ = 2 ,
with three different sample sizes ( B , M ) = { ( 15 , 20 ) , ( 30 , 13 ) , ( 50 , 10 ) } . The true value of regression vector β is set to be ( 2 6 , 1 6 , 1 6 , 2 6 , 3 6 ) with dimension q = 30 and all X i ’s are generated from independent uniform distribution between 0 and 0.5 . The censoring times are generated from independent uniform distribution to yield censoring proportion at around 15% or 30%. In this example, we numerically illustrate the efficiency of two proposed profile and non-profile MM algorithms under three different (Log-normal, Inverse Gaussian, and Gamma) frailty models with three different sample sizes at two censoring situations. Furthermore, we compare the performance of the two proposed profile and non-profile MM algorithms with the existing estimation approach by coxph function of the survival R package under gamma and log-normal frailty models since the coxph function of the survival R package only allows gamma and log-normal frailty. The computation time of MM algorithm can be improved by using simple off-the-shelf accelerators ([31,32]); here, we implement the accelerated profile MM and non-profile MM algorithms with the squared iterative method (SqS1). We set the stopping criterion of iteration as follows.
( α ( k + 1 ) | Y o b s ) ( α ( k ) | Y o b s ) ( α ( k ) | Y o b s ) + 1 < 10 6 .
Based on 500 replications, the average values of estimated frailty and regression parameters (MLE), their biases (Bias), their empirical standard deviations (SD), and run times (T) based on three estimation methods are summarized in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6. In general, with a sample size increase, the biases and empirical standard deviations of almost all parameters become smaller. Small sample size ( B , M ) = ( 15 , 20 ) always causes obvious biases for frailty parameters in Log-normal and Inverse Gaussian frailty models because the number of clusters is too small. Moreover, a larger censoring proportion usually results in greater empirical standard deviations for almostly all cases. From Table 1, Table 2, Table 3 and Table 4, we also observe that the existing estimation approach using coxph function is the fastest among the three methods since the survival R package is optimised using the C language. In terms of estimation accuracy, the non-profile MM algorithm performs the best for both frailty and regression parameters, almost always exhibiting smallest biases and empirical standard deviations in all situations. Even in a small sample size such as ( B , M ) = ( 15 , 20 ) , the non-profile MM estimates for frailty parameter still perform well, especially for Log-normal frailty and Gamma frailty models.
Example 2.
In this example, we simulated 200 realizations consisting of B = 50 clusters and M = 6 subjects in each cluster from the frailty model:
λ ( t | X i j , ω i ) = ω i λ 0 ( t ) exp { X i j β } ,
where the frailty terms are ω i G a m m a ( 1 / θ , 1 / θ ) with θ = 0.5 for i = 1 , , B , β = ( 1 , 3 , 0 46 , 2 , 4 ) , q = 50 , λ 0 ( t ) = 5 . The X i were marginally standard normal, and the correlation between X i and X j is ϱ | i j | with ϱ = 0.25 , 0.75 , respectively. The censoring times were generated from uniform distribution to yield a censoring proportion around 30 % . In this simulation experiment, the utility of our proposed profile MM and non-profile MM estimation methods for regularized estimation was illustrated in a sparse high-dimensional regression model (21) with three different penalties ( L 1 , MCP, and SCAD). The model error (ME) and relative model error (RME), which is the ratio of the model error of the regularized estimator and that of the ordinary maximum likelihood estimator, are calculated to evaluate the estimation accuracy. Based on 200 replications, we report the median of relative model errors (MRME) and the average number of correctly and incorrectly identified zero coefficients in Table 7 in which the column labeled “Correct" presents the average restricted only to the true zero coefficients, while the column labeled “Incorrect" depicts the average of coefficients erroneously set to 0. From Table 4, we can observe that the proposed profile and non-profile MM algorithms mesh very well with MCP and SCAD penalties and provide good results in both parameter estimation and variable selection, especially for MCP penalty. For L 1 penalty, we observe that it meshes well with profile and non-profile MM algorithms under lower correlation case at ϱ = 0.25 but tends to yield biased estimates and inaccurate variable selection results when the correlation between X i and X j becomes stronger. Furthermore, we report the average values of estimated parameters (MLE), their biases (Bias), and their empirical standard deviations (SD) under MCP and SCAD penalties based on 200 repetition in Table 8. It can be observed that both profile and non-profile MM methods perform similarly well in varying ϱ and different penalties, always showing small biases and empirical standard deviations.

6. Real Data Analysis

Alzheimer Disease’s (AD) is the most common dementia that causes progressive memory and other body function losses. According to Alzheimer Association, Alzheimer’s is the sixth leading cause of death in the United States. Many researches have studied the development of AD from Mild Cognitive Impairment (MCI). The authors of [33] analyzed the AUC score of whether MCI will transfer to AD using LASSO penalized logistic regression. The authors of [34] applied cognitive scores, ApoE genotype, and CSF biomarkers to predict the transition time from MCI to AD. Moreover, there are research studies such as [35] that used image data to analyze the transition time to dementia. In this paper, we will apply clinical data and selected SNP genotypes to conduct feature selection under three different frailty models (Gamma, Inverse Gaussian, and Log-normal) using SCAD and MCP penalties. The dataset was obtained from ADNI database (adni.loni.usc.edu). In this dataset, 267 people were recorded as cognitively normal (CN) during the first visit. Among these people, 78 of them were diagnosed with MCI before the last visit. Eventually, we observed 22 people transferred from the MCI stage to dementia. To predict the these two conversion times, 19 clinical predictors were applied mainly based on the age, marriage status, education level, and test score wuch as Alzheimer’s Disease Assessment Scale cognitive subscale (ADAS-Cog) and Functional Activities Questionnaire (FAQ). Other than these predictors, we also included genotypes of SNP from GWAS such as ApoE- ε 4 , which have a relationship with early onset dementia or late-onset dementia; 132 covariates are selected for model training.
In general, using the same notation from simulation, we have ( B , M ) = ( 276 , 2 ) . Individuals are independent, and two events of the same individual are grouped into the same cluster that share the same frailty. The dataset contains the following information. For individual ( i = 1 , 2 , , 276 ) and event ( j = 1 , 2 ), t i j is minimum value of event time Y i j and censoring time C i j . As shown by Table 9, censoring time C i j in this dataset is the time difference between the first observation date of certain event (CN or MCI) and last observation date of this patient. The event time is the state transition time, and Y i 1 is the time difference between the first observation date of this patient in state CN and date of transition from CN to MCI. If no transition is observed, censoring occurs where δ i 1 = 0 and t i 1 = C i 1 , otherwise, δ i 1 = 1 and t i 1 = Y i 1 . Similarly, Y i 2 is the time difference between the first observation date of this patient in state MCI and the date of transition from MCI to Dementia. If no transition is observed, δ i 2 = 0 and t i 2 = C i 2 , otherwise, δ i 2 = 1 and t i 2 = Y i 2 . X i j = ( X i j 1 , , X i j p ) denote the potential covariates (such as SNP genotypes and clinical predictors) where p = 132 . According to data description, it is reasonable to assume that censoring time C i j is independent of event time Y i j . Moreover, the two events from each individual are considered to be highly associated due to common genetic factors and/or certain habits. Thus, the frailty model is suitable for analyzing this dataset.
Three frailty models are applied for variable selection using SCAD and MCP penalties. According to the results presented by Table 10, Table 11 and Table 12, around 70 covariates were selected for Gamma Frailty and Log-normal Frailty model, and 27 covariates were selected by an Inverse Gaussian model. Similar covairates were selected by SCAD or MCP penalty for both Profile MM and Non-profile MM under the same model, which verifies the result from simulation study. For selected covariates, the ApoE- ε 4 is selected as a negative effect to the survival time as expected. In addition to ApoE- ε 4 , single SNPs such as rs2333227 and rs669 from the MPO and A2M gene are associated with the development of Alzheimer’s Disease. We can observe similar results from [36]. However, many selected SNPs are not recognized from other studies, and it needs to be analyzed further whether these SNPs can help to identify the risk of Alzheimer’s.
We use the model with the lowest BIC score for prediction by constructing prognostic index in order to see whether the selected covariates can help identify the individual with low risk or high risk. Let X be a collection of X i j . The full dataset is divided into 10 parts ( X ( k ) ( k = 1 , 2 , , 10 ) ). Similarly to the cross-validation process, without the part of dataset X ( k ) , covariates are selected using BIC criteria and the vector of parameters β ^ k is estimated correspondingly. The risk factor r ^ k = X ( k ) β ^ k ( k = 1 , 2 , , 10 ) is calculated using dataset X ( k ) without considering the individuals’ Latent variable. Individuals with estimated risk factors below median are classified into to low risk group, and and individual with estimated risk above median is classified into the high risk group. The survival plots for estimated high risk group and low risk group are constructed based on their true survival times. Figure 1 shows that the selected model has conducted a good estimation for the transition time from stage CN to stage MCI. Due to the lack of data for the second transition, their confidence intervals are overlapped. Therefore, the prediction result, especially from stage CN to stage MCI, has shown good estimation using the corresponding model, and two risk groups are distinguished by predicted risk levels.

7. Discussion

The profile and non-profile MM algorithms are proposed for high-dimension regression analysis with clustered failure time data where a general frailty is used to model within-cluster dependence and the penalty such as the SCAD and MCP is used for inducing sparsity. The proposed methods can separate the high-dimensional minorizing function into a sum of univariate functions after a sequence of minorization steps. These approaches avoid matrix inversion and provide a toolkit for developing more efficient algorithms in a broad range of in statistical optimization problems. Meshing well with sparsity penalties such as SCAD or MCP, the two regularized MM algorithms are further shown to exhibit certain numerical advantages in sparse high-dimensional regression analyses. The shared frailty model only represents a special and relatively simple model among the widely used frailty models for multivariate survival data. For example, the standard shared frailty model assumes that all subjects in the same cluster share a common frailty. This assumption can be relaxed to the correlated frailty terms among subjects in the same cluster. Correlated frailty models present the limitation that shared frailty models may only be used to fit positively correlated event times. Furthermore, frailty is assumed to be time-constant. However, unobserved heterogeneity may also be time dependent, which can be explained by an unobserved random process that unfolds over time. Based on this idea, several approaches have been proposed such as diffusion processes modeling or Levy processes modeling for frailty. As an approach based on birth-death Poisson or simpler, piecewise constant, frailty models have recently been proposed. It would be worthwhile to extend the proposed two MM algorithms in these applications. Lastly, spatially correlated survival data present another important and useful setting where the proposed MM algorithms can be further extended to accommodate.

Author Contributions

Conceptualization, J.X.; Data curation, X.H., J.X. and Y.Z.; Formal analysis, X.H. and Y.Z.; Investigation, Y.Z.; Methodology, X.H. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors are grateful to the Editor and referees for many helpful comments. This work has been partially supported by the National Natural Science Foundation of China (11901515).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Clayton, D.G. A model for association in bivariate life tables and its application in epidemiologic studies of familial tendency in chronic disease incidence. Biometrika 1978, 65, 141–151. [Google Scholar] [CrossRef]
  2. Clayton, D.G.; Cuzick, J. Multivariate generalizations of the proportional hazards model. J. R. Stat. Soc. Ser. 1985, 148, 82–117. [Google Scholar] [CrossRef]
  3. Oakes, D. Bivariate survival models induced by frailties. J. Am. Stat. Assoc. 1989, 84, 487–493. [Google Scholar] [CrossRef]
  4. Zeng, D.; Chen, Q.; Ibrahim, J. Gamma frailty transformation models for multivariate survival times. Biometrika 2009, 96, 277–291. [Google Scholar] [CrossRef] [PubMed]
  5. Andersen, P.K.; Klein, J.P.; Knudsen, K.M.; Palacios, R.T. Estimation of variance in Cox’s regression model with shared gamma frailties. Biometrics 1997, 53, 1475–1484. [Google Scholar] [CrossRef] [PubMed]
  6. Cox, D.R. Regression models and life tables (with discussion). J. R. Stat. Soc. Ser. B 1972, 34, 187–220. [Google Scholar]
  7. Klein, J.P. Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics 1992, 48, 795–806. [Google Scholar] [CrossRef]
  8. Nielsen, G.G.; Gill, R.D.; Andersen, P.K.; Sorensen, T.I.A. A counting process approach to maximum likelihood estimation in frailty models. Scand. J. Stat. 1992, 19, 25–43. [Google Scholar]
  9. Balan, T.A.; Putter, H. A tutorial on frailty models Stat. Methods Med Res. 2020, 29, 3424–3454. [Google Scholar] [CrossRef]
  10. Do Ha, I.; Lee, Y. A review of h-likelihood for survival analysis. Jpn. J. Stat. Data Sci. 2021, 4, 1157–1178. [Google Scholar]
  11. Duchateau, L.; Janssen, P. The Frailty Model; Springer Science & Business Media: New York, NY, USA, 2007. [Google Scholar]
  12. Glidden, D.V.; Vittinghoff, E. Modelling clustered survival data from multicentre clinical trials. Stat. Med. 2004, 23, 369–388. [Google Scholar] [CrossRef] [PubMed]
  13. Dahlqwist, E.; Pawitan, Y.; Sjölander, A. Regression standardization and attributable fraction estimation with between-within frailty models for clustered survival data. Stat. Methods Med Res. 2019, 28, 462–485. [Google Scholar] [CrossRef] [PubMed]
  14. Manda, S.O. A nonparametric frailty model for clustered survival data. Commun. Stat. Methods 2011, 40, 863–875. [Google Scholar] [CrossRef]
  15. Chen, S.S.; Donoho, D.L.; Saunders, M.A. Atomic Decomposition by Basis Pursuit. Siam J. Sci. Comput. 1998, 20, 33–61. [Google Scholar] [CrossRef]
  16. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  17. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  18. Frank, L.E.; Friedman, J.H. A Statistical View of Some Chemometrics Regression Tools. Technometrics 1993, 35, 109–135. [Google Scholar] [CrossRef]
  19. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. 2005, 67, 301–320. [Google Scholar] [CrossRef] [Green Version]
  20. Fan, J.; Li, R. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  21. Zhang, C.-H. Nearly Unbiased Variable Selection Under Minimax Concave Penalty. Ann. Stat. 2010, 38, 894–942. [Google Scholar] [CrossRef] [Green Version]
  22. Hunter, D.R.; Lange, K. A tutorial on MM algorithms. Am. Stat. 2004, 58, 30–37. [Google Scholar] [CrossRef]
  23. Lange, K.; Hunter, D.R.; Yang, I. Optimization transfer using surrogate objective functions (with discussions). J. Comput. Graph. Stat. 2000, 9, 1–20. [Google Scholar]
  24. Becker, M.P.; Yang, I.; Lange, K. EM algorithms without missing data. Stat. Methods Med. Res. 1997, 6, 38–54. [Google Scholar] [CrossRef] [PubMed]
  25. Huang, X.F.; Xu, J.F.; Tian, G.L. On profile MM algorithms for gamma frailty survival models. Stat. Sin. 2019, 29, 895–916. [Google Scholar] [CrossRef] [Green Version]
  26. Lange, K.; Zhou, H. MM algorithms for geometric and signomial programming. Math. Program. Ser. 2014, 143, 339–356. [Google Scholar] [CrossRef] [Green Version]
  27. Hunter, D.R.; Li, R. Variable selection using MM algorithms. Ann. Stat. 2005, 33, 1617–1642. [Google Scholar] [CrossRef] [Green Version]
  28. Schwarz, C. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  29. Craven, P.; Wahba, G. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 1979, 31, 377–403. [Google Scholar] [CrossRef]
  30. Lange, K. Numerical Analysis for Statisticians, 2nd ed.; Statistics and Computing; Springer: New York, NY, USA, 2010. [Google Scholar]
  31. Varadhan, R.; Roland, C. Simple and globally convergent methods for accelerating the convergence of any EM algorithms. Scand. J. Stat. 2008, 35, 335–353. [Google Scholar] [CrossRef]
  32. Zhou, H.; Alexander, D.; Lange, K. A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat. Comput. 2011, 21, 261–273. [Google Scholar] [CrossRef] [Green Version]
  33. Ye, J.P.; Farnum, M.; Yang, E.; Verbeeck, R.; Lobanov, V.; Raghavan, V.; Novak, G.; DiaBernardo, A.; Narayan, V.A. Sparse learning and stability selection for predicting mci to ad conversion using baseline adni data. BMC Neurol. 2012, 12, 46. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  34. Da, X.; Toledo, J.; Toledo, J.; Zee, J.; Wolk, D.A.; Xie, S.X.; Ou, Y.; Shacklett, A.; Parmpi, P.; Shaw, L.; et al. Integration and relative value of biomarkers for prediction of mci to ad progression: Spatial patterns of brain atrophy, cognitive scores, apoe genotype, and csf markers. Neuroimage Clin. 2014, 4, 164–173. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Kong, D.H.; Ibrahim, J.G.; Lee, E.; Zhu, H.T. Flcrm: Functional linear cox regression model. Biometrics 2018, 74, 109–117. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  36. Zappia, M.; Manna, I.; Serra, P.; Cittadella, R.; Andreoli, V.; La Russa, A.; Annesi, F.; Spadafora, P.; Romeo, N.; Nicoletti, G.; et al. Increased risk for alzheimer disease with the interaction of mpo and a2m polymorphisms. Arch. Neurol. 2004, 61, 341–344. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Survival curves of the transition times from CN to MCI and from MCI to AD for high-risk and low risk groups constructed using prognostic index.
Figure 1. Survival curves of the transition times from CN to MCI and from MCI to AD for high-risk and low risk groups constructed using prognostic index.
Mathematics 10 00538 g001
Table 1. Simulation results for Log-normal frailty model in Example 1 with different sample sizes at 15% censoring.
Table 1. Simulation results for Log-normal frailty model in Example 1 with different sample sizes at 15% censoring.
Par.Scenario 1: ( B , M ) = ( 15 , 20 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 8.5716 5.7928 0.0586
MLEBiasSDMLEBiasSDMLEBiasSD
θ 0.34380.09380.33590.29250.04250.31730.29550.04550.3458
β 1 −2.2157−0.21570.5034−2.0969-0.09690.5175−2.1697−0.16970.5842
β 5 −2.1316−0.13160.5725−2.1392-0.13920.5567−2.2005−0.20050.5747
β 10 −1.0976−0.09760.5523−1.0863−0.08630.5232−1.0854−0.08540.5794
β 15 1.07200.07200.51601.01260.01260.51851.12320.12320.5392
β 20 2.18710.18710.55842.09660.09660.52862.05600.05600.5217
β 25 3.26190.26190.55763.14120.14120.54553.23960.23960.6022
β 30 3.21560.21560.57753.15070.15070.54683.27810.27810.5669
Par.Scenario 2: ( B , M ) = ( 30 , 13 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 5.7458 7.5817 0.0893
MLEBiasSDMLEBiasSDMLEBiasSD
θ 0.32220.07220.30060.25200.00200.10990.28130.03130.1065
β 1 −2.1110−0.11100.4399−2.0930−0.09300.4531−2.0783−0.07830.4743
β 5 −2.1236−0.12360.4634−2.1208−0.12080.4615−2.1232−0.12320.4384
β 10 −1.0788−0.07880.4491−1.0528−0.05280.4517−1.0598−0.05980.4877
β 15 1.05070.05070.44670.9944−0.00560.44161.04570.04570.4539
β 20 2.06630.06630.46792.04210.04210.46652.14150.14150.4872
β 25 3.17340.17340.48243.05860.05860.44993.20540.20540.4726
β 30 3.18330.18330.45743.06660.06660.43713.12790.12790.4558
Par.Scenario 3: ( B , M ) = ( 50 , 10 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 6.2193 11.4968 0.1281
MLEBiasSDMLEBiasSDMLEBiasSD
θ 0.26830.01830.24270.2478−0.00220.08620.28660.03660.0884
β 1 −2.0630−0.06300.3870−2.0491−0.04910.3831−2.0548−0.05480.4021
β 5 −2.0668−0.06680.3975−2.0430−0.04300.3903−2.0340−0.03400.3716
β 10 −1.0286−0.02860.3875−1.0327−0.03270.3799−1.0583−0.05830.4058
β 15 1.06130.06130.40011.01310.01310.39151.01360.01360.3867
β 20 2.09460.09460.39842.03150.03150.38532.06970.06970.4497
β 25 3.15350.15350.41043.07390.07390.39733.11780.11780.4147
β 30 3.12660.12660.41873.03780.03780.40883.12160.12160.3875
Table 2. Simulation results for Log-normal frailty model in Example 1 with different sample sizes at 30% censoring.
Table 2. Simulation results for Log-normal frailty model in Example 1 with different sample sizes at 30% censoring.
Par.Scenario 1: ( B , M ) = ( 15 , 20 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 4.8084 5.7248 0.0541
MLEBiasSDMLEBiasSDMLEBiasSD
θ 0.27120.02120.14460.2491−0.00090.14340.30220.05220.1507
β 1 −2.2181−0.21810.6137−2.1211−0.12110.5712−2.1928−0.19280.6084
β 5 −2.1717−0.17170.6270−2.1431−0.14310.6230−2.2133−0.21330.5920
β 10 −1.1136−0.11360.6001−1.0786−0.07860.5881−1.0936−0.09360.6158
β 15 1.09260.09260.61171.04520.04520.56031.04990.04990.6049
β 20 2.22110.22110.62342.04230.04230.56672.20680.20680.6567
β 25 3.26670.26670.61563.13690.13690.61923.24810.24810.5558
β 30 3.24900.24900.61923.14520.14520.63553.25990.25990.5502
Par.Scenario 2: ( B , M ) = ( 30 , 13 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 5.56726 7.7258 0.0818
MLEBiasSDMLEBiasSDMLEBiasSD
θ 0.31890.06890.92710.2496−0.00040.11500.27490.02490.1077
β 1 −2.0826−0.08260.4974−2.0987−0.09870.4898−2.0761−0.07610.5023
β 5 −2.1135−0.11350.5391−2.0965−0.09650.4950−2.0974−0.09740.5200
β 10 −1.0764−0.07640.5065−1.0743−0.07430.5180−1.0510−0.05100.5055
β 15 1.06540.06540.51371.04330.04330.50281.07390.07390.5766
β 20 2.12500.12500.50332.05250.05250.49332.10290.10290.5137
β 25 3.17310.17310.52363.08210.08210.51053.13160.13160.5783
β 30 3.19660.19660.52073.07160.07160.50343.22360.22360.5268
Par.Scenario 3: ( B , M ) = ( 50 , 10 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 4.7591 8.4607 0.1188
MLEBiasSDMLEBiasSDMLEBiasSD
θ 0.27290.02290.42360.2434−0.00660.08650.27920.02920.0966
β 1 −2.0917−0.09170.3814−2.0714−0.07140.3741−2.1494−0.14940.4376
β 5 −2.0692−0.06920.4076−2.0495−0.04950.3990−2.0617−0.06170.4617
β 10 −1.0320−0.03200.3822−1.0341−0.03410.3725−1.0714−0.07140.4174
β 15 1.06060.06060.39281.01630.01630.38431.09910.09910.3947
β 20 2.05300.05300.38511.9865−0.01350.37732.07800.07800.4779
β 25 3.10580.10580.40003.01580.01580.38553.15310.15310.4395
β 30 3.10710.10710.38493.01820.01820.37523.12500.12500.4593
Table 3. Simulation results for Gamma frailty model in Example 1 with different sample sizes at 15% censoring.
Table 3. Simulation results for Gamma frailty model in Example 1 with different sample sizes at 15% censoring.
Par.Scenario 1: ( B , M ) = ( 15 , 20 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 22.3576 18.1613 0.0846
MLEBiasSDMLEBiasSDMLEBiasSD
θ 2.05840.05840.64942.10280.10280.70662.11970.11970.7023
β 1 −2.1343−0.13430.5452−2.1155−0.11550.4974−2.1027−0.10270.6415
β 5 −2.1700−0.17000.5426−2.1303−0.13030.4923−2.1633−0.16330.5354
β 10 −1.0561−0.05610.5395−1.0550−0.05500.5505−1.0768−0.07680.6086
β 15 1.05150.05150.57491.00730.00730.50461.08830.08830.6006
β 20 2.13850.13850.5422.13110.13110.56392.16850.16850.5608
β 25 3.17020.17020.58153.19280.19280.58933.21400.21400.5916
β 30 3.21850.21850.56233.12680.12680.60053.24330.24330.6320
Par.Scenario 2: ( B , M ) = ( 30 , 13 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 23.2786 41.7191 0.1438
MLEBiasSDMLEBiasSDMLEBiasSD
θ 2.09340.09340.48662.02700.02700.48172.1510.15100.6232
β 1 −2.1266−0.12660.4648−2.1027−0.10270.4600−2.1189−0.11890.5060
β 5 −2.1197−0.11970.4763−2.0886−0.08860.4549−2.1842−0.18420.4881
β 10 −1.0328−0.03280.4703−1.0685−0.06850.4789−1.0671−0.06710.4818
β 15 1.06040.06040.48711.02120.02120.48471.09560.09560.4446
β 20 2.13130.13130.46962.01850.01850.47452.09610.09610.4848
β 25 3.17880.17880.46953.07470.07470.49543.17490.17490.4888
β 30 3.19340.19340.51203.12870.12870.49533.19080.19080.5070
Par.Scenario 3: ( B , M ) = ( 50 , 10 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 27.4950 47.8408 0.2415
MLEBiasSDMLEBiasSDMLEBiasSD
θ 2.07920.07920.40912.02680.02680.38902.09050.09050.4287
β 1 −2.0896−0.08960.4017−2.0466−0.04660.4406−2.1372−0.13720.4366
β 5 −2.0619−0.06190.4060−2.0388−0.03880.4199−2.1541−0.15410.463
β 10 −1.0431−0.04310.4186−1.0190−0.01900.4047−1.0672−0.06720.4016
β 15 1.07390.07390.41901.03630.03630.40251.00060.00060.4269
β 20 2.06550.06550.41412.06940.06940.42722.07810.07810.4213
β 25 3.15420.15420.39193.06800.06800.44613.11710.11710.4331
β 30 3.10850.10850.42013.07070.07070.48003.11980.11980.4200
Table 4. Simulation results for Gamma frailty model in Example 1 with different sample sizes at 30% censoring.
Table 4. Simulation results for Gamma frailty model in Example 1 with different sample sizes at 30% censoring.
Par.Scenario 1: ( B , M ) = ( 15 , 20 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 18.1609 27.0720 0.0601
MLEBiasSDMLEBiasSDMLEBiasSD
θ 2.03250.03250.86202.08440.08440.75222.03320.03320.6807
β 1 −1.96280.03720.8105−2.2108−0.21080.6094−2.1984−0.19840.5954
β 5 −1.90760.09240.8051−2.1330−0.13300.5744−2.1072−0.10720.6045
β 10 −1.0357−0.03570.5701−1.0732−0.07320.5423−1.1172−0.11720.5989
β 15 0.9478−0.05220.64641.07310.07310.55961.05850.05850.5717
β 20 1.9482−0.05180.77832.09240.09240.54502.14510.14510.6499
β 25 2.9937−0.00631.03793.16670.16670.61973.27880.27880.5763
β 30 2.9867−0.01331.04853.24370.24370.62033.26310.26310.5981
Par.Scenario 2: ( B , M ) = ( 30 , 13 )
Profile MM approachNon-Profile MM ApproachCoxph
T 16.7308 30.7202 0.0995
MLEBiasSDMLEBiasSDMLEBiasSD
θ 2.08800.08800.52072.09620.09620.47922.19130.19130.6610
β 1 −2.1346−0.13460.5360−2.1102−0.11020.4817−2.1127−0.11270.5501
β 5 −2.0743−0.07430.5299−2.1301−0.13010.5327−2.0796−0.07960.5362
β 10 −1.0846−0.08460.5289−1.0862−0.08620.5136−1.0690−0.06900.4962
β 15 1.09800.09800.52421.00010.00010.51691.09320.09320.5306
β 20 2.13290.13290.52162.03540.03540.52782.11170.11170.5432
β 25 3.20890.20890.51833.13650.13650.51553.23830.23830.5587
β 30 3.17020.17020.53453.16950.16950.54953.23010.23010.5380
Par.Scenario 3: ( B , M ) = ( 50 , 10 )
Profile MM ApproachNon-Profile MM ApproachCoxph
T 16.3097 29.1778 0.1647
MLEBiasSDMLEBiasSDMLEBiasSD
θ 2.09990.09990.44802.04620.04620.38312.09480.09480.3700
β 1 −2.1151−0.11510.4605−2.0508−0.05080.4187−2.1324−0.13240.4554
β 5 −2.0767−0.07670.4197−2.0876−0.08760.4615−2.1001−0.10010.4394
β 10 −1.0963−0.09630.4629−1.0712−0.07120.4393−1.0513−0.05130.4306
β 15 1.04430.04430.43130.9910−0.00900.42342.03470.03470.4476
β 20 2.08130.08130.47752.03300.03300.46302.10390.10390.4538
β 25 3.10890.10890.48483.09780.09780.46603.18290.18290.4484
β 30 3.12160.12160.48053.14620.14620.50053.18340.18340.5308
Table 5. Simulation results for Inverse Gaussian model in Example 1 with different sample sizes at 15% censoring.
Table 5. Simulation results for Inverse Gaussian model in Example 1 with different sample sizes at 15% censoring.
Par.Scenario 1: ( B , M ) = ( 15 , 20 )
Profile MM ApproachNon-Profile MM Approach
T 3.0369 7.9534
MLEBiasSDMLEBiasSD
θ 0.7060−0.29400.69200.7130−0.28700.6511
β 1 −2.1516−0.15160.5664−2.1002−0.10020.4646
β 5 −2.1783−0.17830.5366−2.1062−0.10620.5173
β 10 −1.1028−0.10280.5526−1.0881−0.08810.4719
β 15 1.12840.12840.55230.9661−0.03390.5069
β 20 2.21620.21620.52251.9742−0.02580.4936
β 25 3.25910.25910.52692.9877−0.01230.5332
β 30 3.21390.21390.55533.02470.02470.5250
Par.Scenario 2: ( B , M ) = ( 30 , 13 )
Profile MM ApproachNon-Profile MM Approach
T 5.2872 27.2494
MLEBiasSDMLEBiasSD
θ 0.7537−0.24630.56230.9178−0.08220.4438
β 1 −2.1515−0.15150.4592−2.1441−0.14410.4455
β 5 −2.1300−0.13000.4646−2.1146−0.11460.4907
β 10 −1.0873−0.08730.4743−1.0830−0.08300.4316
β 15 1.06810.06810.47551.02230.02230.4445
β 20 2.14540.14540.46002.05910.05910.4625
β 25 3.23230.23230.49723.10310.10310.4807
β 30 3.22230.22230.46983.08550.08550.4907
Par.Scenario 3: ( B , M ) = ( 50 , 10 )
Profile MM ApproachNon-Profile MM Approach
T 54.9905 36.7796
MLEBiasSDMLEBiasSD
θ 0.8461−0.15390.41430.9724−0.02760.3678
β 1 −2.0933−0.09330.4056−2.0688−0.06880.4135
β 5 −2.1509−0.15090.4375−2.0664−0.06640.4318
β 10 −1.0878−0.08780.3992−1.0790−0.07900.3824
β 15 1.04370.04370.42170.9728−0.02720.4111
β 20 2.11370.11370.40772.02570.02570.4005
β 25 3.15110.15110.43113.07090.07090.4129
β 30 3.18460.18460.41373.05400.05400.4219
Table 6. Simulation results for Inverse Gaussian model in Example 1 with different sample sizes at 30% censoring.
Table 6. Simulation results for Inverse Gaussian model in Example 1 with different sample sizes at 30% censoring.
Par.Scenario 1: ( B , M ) = ( 15 , 20 )
Profile MM ApproachNon-Profile MM Approach
T 9.7444 16.1049
MLEBiasSDMLEBiasSD
θ 0.5343−0.46570.66650.7017−0.29830.6431
β 1 −2.1693−0.16930.6153−2.1795−0.17950.6054
β 5 −2.2288−0.22880.5688−2.1677−0.16770.6601
β 10 −1.0228−0.02280.5690−1.1006−0.10060.5843
β 15 1.16350.16350.55421.05770.05770.6157
β 20 2.27790.27790.64792.09020.09020.6168
β 25 3.27900.27900.67603.17110.17110.6672
β 30 3.39690.39690.68673.19390.19390.6412
Par.Scenario 2: ( B , M ) = ( 30 , 13 )
Profile MM ApproachNon-Profile MM Approach
T 22.7787 22.0124
MLEBiasSDMLEBiasSD
θ 0.6564−0.34360.52631.02030.02030.5297
β 1 −2.1801−0.18010.5395−2.1915−0.19150.4992
β 5 −2.0213−0.02130.4727−2.1508−0.15080.5142
β 10 −1.1390−0.13900.5202−1.0678−0.06780.5140
β 15 1.08240.08240.52760.9845−0.01550.5005
β 20 2.13540.13540.46392.02520.02520.5088
β 25 3.18090.18090.64213.13620.13620.5233
β 30 3.23750.23750.55423.08280.08280.5183
Par.Scenario 3: ( B , M ) = ( 50 , 10 )
Profile MM ApproachNon-Profile MM Approach
T 20.9400 32.6620
MLEBiasSDMLEBiasSD
θ 0.8502−0.14980.38560.9809−0.01910.3556
β 1 −2.1046−0.10460.4252−2.0688−0.06880.4124
β 5 −2.1536−0.15360.4224−2.1123−0.11230.4065
β 10 −1.0597−0.05970.3893−1.0542−0.05420.3806
β 15 1.06920.06920.39731.00670.00670.3841
β 20 2.11290.11290.41622.01840.01840.4005
β 25 3.17700.17700.41593.04890.04890.4005
β 30 3.21020.21020.42773.08320.08320.4120
Table 7. The median of relative model errors for gamma frailty model by L 1 , MCP, and SCAD penalties with sample size ( B , M ) = ( 50 , 6 ) based on 200 realizations in Example 2.
Table 7. The median of relative model errors for gamma frailty model by L 1 , MCP, and SCAD penalties with sample size ( B , M ) = ( 50 , 6 ) based on 200 realizations in Example 2.
MRMEZerosMRMEZeros
CorrectIncorrect CorrectIncorrect
Penalty ϱ = 0.25 ϱ = 0.75
Profile MM method
L 1 0.15945.77500.16344.390
MCP ( γ = 3 )0.0914600.066460
SCAD ( γ = 3.7 )0.14345.91500.10145.9650
Non-profile MM method
L 1 0.08945.7400.18844.2250
MCP ( γ = 3 )0.0514600.085460
SCAD ( γ = 3.7 )0.07745.88500.10645.880
Table 8. The average values of estimated parameters (MLE), their biases (Bias), and their empirical standard deviations (SD) under MCP and SCAD penalties in Example 2.
Table 8. The average values of estimated parameters (MLE), their biases (Bias), and their empirical standard deviations (SD) under MCP and SCAD penalties in Example 2.
PenaltyPar.Profile MM ApproachNon-Profile MM Approach
MLEBiasSDMLEBiasSD
MCP
(ϱ = 0.25)
θ 0.4835 0.0165 0.11200.4820 0.0180 0.1398
β 1 0.9993 0.0007 0.09500.9988 0.0012 0.1028
β 2 3.0090 0.0090 0.19842.9856 0.0144 0.1971
β 49 1.9940 0.0060 0.15131.9791 0.0209 0.1512
β 50 3.9697 0.0303 0.24983.9868 0.0132 0.2566
SCAD
(ϱ = 0.25)
θ 0.4967 0.0033 0.12670.4800 0.0200 0.1335
β 1 1.01330.01330.09600.9864 0.0136 0.0975
β 2 3.01670.01670.20232.9467 0.0532 0.1644
β 49 2.01830.01830.14171.9816 0.0184 0.1255
β 50 4.03150.03150.26573.9300 0.0700 0.2334
MCP
(ϱ = 0.75)
θ 0.4886 0.0114 0.12580.4842 0.0158 0.1480
β 1 0.9774 0.0226 0.13311.00710.00710.1369
β 2 3.02520.02520.20202.9743 0.0257 0.2570
β 49 2.00830.00830.18311.9857 0.0143 0.2203
β 50 4.01340.01340.26064.00260.00260.3055
SCAD
(ϱ = 0.75)
θ 0.4869 0.0131 0.12770.4839 0.0161 0.1341
β 1 1.01190.01190.13641.01520.01520.1560
β 2 2.9815 0.0185 0.20633.00350.00350.2422
β 49 2.00120.00120.22381.9984 0.0016 0.2052
β 50 4.01190.01190.26334.00210.00210.2855
Table 9. Illustration of survival years and censoring date for some patients from the ADNI dataset.
Table 9. Illustration of survival years and censoring date for some patients from the ADNI dataset.
Patient IDFrom CN to MCI
First Observation Date
of State CN
Date of
Transition to MCI
Last Date
of Observation
t ij
(yr)
δ ij
011_S_00028 September 200526 September 201218 October 20177.051
011_S_002124 October 2005-27 November 201712.100
100_S_00358 November 20058 December 20108 December 20105.081
131_S_01237 February 200623 February 201210 February 20166.051
127_S_025928 March 20069 April 201422 September 20178.041
Patient IDFrom MCI to Dementia
First Observation Date
of State MCI
Date of
Transition to Dementia
Last Date
of Observation
t ij
(yr)
δ ij
011_S_000226 September 2012-18 October 20175.060
011_S_0021-----
100_S_00358 December 2010-8 December 20100.000
131_S_012323 February 201212 February 201410 February 20161.971
127_S_02599 April 201410 April 201522 September 20171.001
Table 10. Training results for Gamma Frailty Model.
Table 10. Training results for Gamma Frailty Model.
Profile MMNon-Profile MM
PenaltyBICNo. of Non-Zero β BICNo. of Non-Zero β
MCP1154.70701154.7570
SCAD1150.59691150.8269
Table 11. Training results for Inverse Gaussian Model.
Table 11. Training results for Inverse Gaussian Model.
Profile MMNon-Profile MM
PenaltyBICNo. of Non-Zero β BICNo. of Non-Zero β
MCP1198.76271198.8227
SCAD1203.97271204.5027
Table 12. Training results for Log-normal Frailty Model.
Table 12. Training results for Log-normal Frailty Model.
Profile MMNon-Profile MM
PenaltyBICNo. of Non-Zero β BICNo. of Non-Zero β
MCP1149.10711148.8672
SCAD1149.10711148.8671
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Huang, X.; Xu, J.; Zhou, Y. Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data. Mathematics 2022, 10, 538. https://doi.org/10.3390/math10040538

AMA Style

Huang X, Xu J, Zhou Y. Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data. Mathematics. 2022; 10(4):538. https://doi.org/10.3390/math10040538

Chicago/Turabian Style

Huang, Xifen, Jinfeng Xu, and Yunpeng Zhou. 2022. "Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data" Mathematics 10, no. 4: 538. https://doi.org/10.3390/math10040538

APA Style

Huang, X., Xu, J., & Zhou, Y. (2022). Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data. Mathematics, 10(4), 538. https://doi.org/10.3390/math10040538

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop