Next Article in Journal
Positive Radially Symmetric Entire Solutions of p-k-Hessian Equations and Systems
Next Article in Special Issue
A New Cure Rate Model Based on Flory–Schulz Distribution: Application to the Cancer Data
Previous Article in Journal
IN-ME Position Error Compensation Algorithm for the Near-Field Beamforming of UAVs
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Regression Analysis of Multivariate Interval-Censored Failure Time Data under Transformation Model with Informative Censoring

1
School of Mathematics, Jilin University, Changchun 130012, China
2
The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen 518057, China
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(18), 3257; https://doi.org/10.3390/math10183257
Submission received: 31 July 2022 / Revised: 24 August 2022 / Accepted: 30 August 2022 / Published: 7 September 2022
(This article belongs to the Special Issue Statistical Methods and Models for Survival Data Analysis)

Abstract

:
We consider a regression analysis of multivariate interval-censored failure time data where the censoring may be informative. To address this, an approximated maximum likelihood estimation approach is proposed under a general class of semiparametric transformation models, and in the method, the frailty approach is employed to characterize the informative interval censoring. For the implementation of the proposed method, we develop a novel EM algorithm and show that the resulting estimators of the regression parameters are consistent and asymptotically normal. To evaluate the empirical performance of the proposed estimation procedure, we conduct a simulation study, and the results indicate that it performs well for the situations considered. In addition, we apply the proposed approach to a set of real data arising from an AIDS study.

1. Introduction

In this paper, we consider a regression analysis of multivariate interval-censored failure time data where the censoring may be informative. Interval-censored data arise when the failure time of interest is known or observed only to belong to an interval instead of being observed exactly (Finkelstein, 1986 [1]; Sun, 2006 [2]). It is apparent that one can treat right-censored data as a special case of interval-censored data. Multivariate interval-censored data occur if a failure time study involves more than one related failure time of interest for which only interval-censored data are available. Among others, one can often face multivariate interval-censored data in epidemiological studies and clinical trials. Informative censoring occurs if the censoring mechanism or the underlying process generating observations is related to the failure times of interest (Kalbfleisch and Prentice, 2002 [3]; Sun, 2006 [2]).
An example of informative censoring is given by a clinical trial or periodic study on a failure event such as death for which some symptoms may occur before the occurrence of the event. For the situation, the study subject may tend to pay more clinical visits when the symptoms occur rather than following the pre-specified schedule. Many authors have pointed out’ that with informative censoring, the analysis that ignores it could lead to serious biased estimators or analysis results (Wang et al., 2010 [4]; Sun, 2006 [2]; Zhang et al., 2005 [5], 2007 [6]). For example, Sun (1999) [7] studied the issue for univariate current status data, a special case of interval-censored data where the observed interval includes either zero or infinity, and showed that the analysis could yield misleading results if the informative censoring is ignored or treated to be non-informative censoring.
A large amount of literature has been established for the regression analysis of multivariate interval-censored failure time data or their special cases, multivariate current status data and bivariate interval-censored data (Chen et al., 2007 [8], 2009 [9], 2013 [10]; Goggins and Finkelstein, 2000 [11]; Shen, 2015 [12]; Tong et al., 2008 [13]; Wang et al., 2008 [14]; Zeng et al., 2017 [15]; Zhang et al., 2009 [16]; Zhou et al., 2017 [17]). For this, three types of methods are commonly used, including the copula model approach, the marginal model-based approach and the frailty model-based method. The first employs various copula models to characterize the relationship between correlated failure times of interest (Wang et al., 2008 [14]; Zhang et al., 2009 [16]), and among others, Sun and Ding (2019) [18] discussed this for bivariate cases under the framework of the two-parameter Archimedean copula model. The second mainly focuses on the marginal distribution and puts no assumption on the correlation between the failure times of interest (Wei et al. 1989 [19]). The authors who developed such methods include Chen et al. (2007) [8], Chen et al. (2013) [10] and Tong et al. (2008) [13].
The frailty model-based approach generally employs the frailty or latent variable to model the correlation between the correlated failure times. It has the advantage of allowing one to directly estimate the correlation. One main shortcoming of most of the existing methods for multivariate interval-censored data is that they assume independent or non-informative interval censoring, and it is apparent that this may not hold in practice as discussed above. In this paper, we will adopt the frailty model-based approach to develop a new estimation procedure that allows for dependent or informative interval censoring.
Several authors have considered a regression analysis of univariate informatively interval-censored failure time data. For example, Zhang (2005) [5], Wang et al. (2010) [4] and Wang et al. (2018) [20] investigated the problem for current status data, case II interval-censored data and case K interval-censored data, respectively. Case II means that each study subject is observed only twice, while case K refers to the situation where each subject is observed at a sequence of observation times, which is much more general than others (Sun, 2006) [2]. As mentioned above, most of the existing methods for multivariate interval-censored data apply only to the situation with independent interval censoring, except Yu et al. (2022) [21]. Yu et al. (2022) [21] only considered case II interval-censored data under the additive hazards model. In this paper, the focus will be on case K multivariate interval-censored data with informative censoring and the proposed methods apply to much more general situations than Yu et al. (2022) [21].
More specifically, in Section 2, some notation and assumptions will be first introduced as well as the data structures. In the proposed method, we will focus on the case where the failure time of interest marginally follows a general class of semiparametric transformation models. The proposed approximated maximum likelihood estimation procedure will be presented in Section 3, and for the implementation of the proposed method, a novel EM algorithm will be developed. The asymptotic properties of the resulting estimators of the regression parameters will be given in Section 4. Section 5 will present some simulation results obtained from a study performed to evaluate the performance of the proposed method, and they indicate that it performs as expected. In Section 6, we apply the proposed methodology to a set of real data arising from an AIDS clinical trial, and Section 7 contains some discussion and concluding remarks.

2. Assumptions and Background

In this section, we first introduce some notation and background and then describe the model and data structure. Suppose that there is a failure time study consisting of n independent subjects and concerning M failure events of interest that may be related. Define T i m to be the failure time of interest and X i m a p-dimensional vector of covariates both related to the ith subject and the event m. Furthermore, for each subject, suppose that there exists a sequence of potential observation times U i 0 = 0 < U i 1 < < U i K i and a follow-up or stopping time τ i , where K i denotes the number of potential observations, i = 1 , , n . For simplicity, we assume that for each subject, the observation times for different failure events are the same and the proposed method below can be easily generalized to more general situations.
For subject i, define the point process N ˜ i ( t ) = j = 1 K i I ( U i j t ) , describing the observation process on the subject that jumps only at the observation times, i = 1 , , n . Note that for the situation considered here, we have M + 1 processes, the M underlying failure time processes of interest and the observation process N ˜ i ( t ) , and as mentioned above, the focus below will be on the case where they may be related (Ma et al., 2015 [22]; Wang et al., 2016 [23]; Zhang et al., 2007 [6]). To describe their relationships and the possible covariate effects on them, we assume that there exists a vector of latent variables b i and another latent variable u i with mean zero, and given X i m , Z i m , b i and u i , the cumulative hazard function of T i m has the form
Λ i m ( t | X i m , Z i m , b i , u i ) = G m { exp ( X i m T β x m + Z i m T b i + u i β u m ) Λ m ( t ) } .
Here, G m ( . ) is a known strictly increasing transformation function, Λ m ( . ) is an unknown baseline cumulative hazard function, Z i m contains 1 and part of the covariates X i m , and β m = ( β x m T , β u m ) T denotes the vector of unknown regression parameters.
For the observation process, it will be assumed that N ˜ i ( t ) is a non-homogeneous Poisson process satisfying
λ i h ( t | X i , u i ) = λ 0 h ( t ) exp m = 1 M X i m T α m + u i = λ 0 h ( t ) exp X i T α + u i
for the intensity function given X i m and u i . Here, λ 0 h is an unknown continuous baseline intensity function, X i = ( X i 1 T , , X i M T ) T , and α T = ( α 1 T , , α M T ) , which is a vector of regression parameters as β m . In the following, it will be assumed that given X i m , b i and u i , T i 1 , , T i M are independent, and given X i m and u i , T i and N ˜ i are independent. Moreover, τ i is independent of T i and N ˜ i . We point out that models (1) and (2) with u i = 0 have been commonly used in the analysis of failure-time data (Klein and Moescherger, 2003 [24]) and the analysis of event history data (Cook and Lawless, 2007 [25]), respectively. The parameter β u m denotes the degree of the correlation between the failure-time process and the observation process, and they will be independent if β u m = 0 .
The semiparametric transformation model (1) with b i = 0 and u i = 0 is quite general and can give many specific models. In particular, one can express it as a class of frailty-induced transformations
G m ( x ) = log 0 exp ( x t ) f m ( t ) d t .
In the above, f m ( t ) denotes the density function of a frailty variable with support [ 0 , ] . By setting f m ( t ) to be the gamma density with mean 1 and variance r m , it gives the class of logarithmic transformations G m ( x ) = r m 1 log ( 1 + r m x ) with r m > 0 (Chen et al., 2002 [26]). In particular, it yields the proportional odds model with r m = 1 or G m ( x ) = log ( 1 + x ) and gives the proportional hazards model with r k = 0 or G m ( x ) = x .
To describe the observed data, define δ i m j = I ( U i j 1 < T i m U i j ) , indicating if the failure time T i m belongs to the interval ( U i j 1 , U i j ] . In the following, it will be assumed that the observed data have the form
O = { O i = ( τ i , X i m , Z i m , U i j , δ i m j , m = 1 , , M , j = 1 , , K i ) ; i = 1 , , n } ,
where K i = N ˜ i ( τ i ) . That is, we observe case K interval-censored data (Sun, 2006).

3. Maximum Likelihood Estimation

3.1. Estimation Procedure

Now, we discuss inference about models (1) and (2), and for this, we will propose a two-step or an approximate maximum likelihood estimation procedure by following Huang and Wang (2004) [27] and Wang et al. (2016) [23]. More specifically, we will first consider the estimation of model (2) and then estimation of ϕ T = ( β 1 T , , β M T ) , the parameter of interest. The first step will be based on the following two facts. One is that one can easily show that K i follows the Poisson distribution with the mean
Λ i h ( τ i ; X i , u i ) = Λ 0 h ( τ i ) exp ( X i T α + u i )
given X i and u i , i = 1 , , n . The other is that the observation times U i 1 , , U i K i can be seen as the order statistics of a set of i.i.d. random variables with the density function
π ( t ) = λ 0 h ( t ) exp ( X i T α + u i ) Λ 0 h ( τ i ) exp ( X i T α + u i ) I ( 0 t τ i ) = λ 0 h ( t ) Λ 0 h ( τ i ) I ( 0 t τ i ) .
One can see that the function above does not depend on neither X i nor u i , which suggests that the function Λ 0 h ( t ) can be estimated by
Λ ^ 0 h ( t ) = s ( l ) > t ( 1 d ( l ) R ( l ) ) .
In the above, the s ( l ) ’s denote the ordered and distinct values of observation times { U i k } , d ( l ) the number of the observation times equal to s ( l ) , and R ( l ) the number of observation times satisfying U i k s ( l ) τ i among all subjects.
Under the assumptions above, it is easy to show that E [ K i ; X i . , u i , τ i ] = Λ 0 h ( τ i ) exp ( X T α + u i ) . This yields
E u i [ E [ K i Λ 0 h 1 ( τ i ) ; X i m , u i , τ i ] ] = E ( e u i ) exp ( X i T α ) ,
and a class of estimating equations
i = 1 n ω i X i ( K i Λ ^ 0 h 1 ( τ i ) E ( e u i ) exp ( X i T α ) ) = 0
for estimation of α m , m = 1 , , M with the ω i ’s being some weights. Let α ^ m denote the estimator of α m given by the estimating equations above, which suggests that one can naturally estimate u i by
u ^ i = log K i Λ ^ 0 h ( τ i ) exp ( X i T α ) .
Now, consider estimation of ϕ as well as model (1). For this, note that if the u i ’s were known, it would be natural to maximize the likelihood function
L n ( ϕ , Λ , γ | u i s ) = i = 1 n m = 1 M j = 1 K i exp ( G m [ 0 U i , j 1 exp { x i m T β m + Z i m T b i } d Λ m ( t ) ] ) exp ( G m [ 0 U i j exp { x i m T β m + Z i m T b i } d Λ m ( t ) ] ) Δ i m j exp ( G m [ 0 U i K i exp { x i m T β m + Z i m T b i } d Λ m ( t ) ] ) 1 j = 1 K i Δ i m j f b ( b i | γ ) d b i ,
where Λ = ( Λ 1 , , Λ M ) , x i m = ( X i m T , u ^ i ) T , and f b denotes the density function of the b i ’s assumed to be known up to a vector of parameters γ . Define L i m = m a x { U i j : U i j < T i m , j = 0 , , K i } and R i m = m i n { U i j : U i j T i m , j = 1 , , K i + 1 } , where U i 0 = 0 and U i , K i + 1 = . Then, ( L i m , R i m ] represents the shortest time interval that brackets T i m and the likelihood function above can be rewritten as
L n ( ϕ , Λ , γ | u i s ) = i = 1 n m = 1 M exp ( G m [ 0 L i m exp ( x i m T β m + Z i m T b i ) d Λ m ( t ) ] ) exp ( G m [ 0 R i m exp ( x i m T β m + Z i m T b i ) d Λ m ( t ) ] ) f b ( b i γ ) d b i .
By following Huang and Wang (2004) [27] and others, it is natural to estimate ϕ and Λ by their values that maximize the approximated likelihood function L n ( ϕ , Λ , γ | u ^ i s ) .
For the maximization of L n ( ϕ , Λ , γ | u ^ i s ) , note that it involves the unknown functions Λ m ’s and integrations. To deal with them, for the former, we propose to adopt the nonparametric approach. More specifically, for each m = 1 , , M , let 0 = t m 0 < t m 1 < < t m k m < denote the ordered sequence of all L i m and R i m with R i m < and assume that Λ m is a step function that jumps only at the t m q ’s with the jump sizes λ m q ’s. Then, L n ( ϕ , Λ , γ | u i s ) can be expressed as
L n ( ϕ , Λ , γ | u i s ) = i = 1 n m = 1 M exp G m t m q L i m exp { x i m T β m + Z i m T b i } λ m q
exp G m t m q R i m exp { x i m T β m + Z i m T b i } λ m q f b ( b i γ ) d b i .
In the following, we will develop an EM algorithm for the maximization with the focus on the situation where f b is a multivariate normal distribution with the covariance matrix Σ ( γ ) depending on the q-dimensional unknown parameter γ . The algorithm is valid for other distributions and some comments on this will be given below. It is worth to point out that as mentioned above, the idea discussed above has been used by Huang and Wang (2004) [27] and Wang et al. (2016) [23], among others. However, the problem discussed here is different or much more general than the existing literature.

3.2. EM Algorithm

In this subsection, we will develop an EM algorithm for the maximization of L n ( ϕ , Λ , γ u ^ i s ) , and for this, we will first discuss the data augmentation. Let the ξ i m ’s denote the random sample of size n from the density f m ( t ) . Then, we can rewrite the observed likelihood function as
L n ( ϕ , Λ , γ | u i s ) = i = 1 n m = 1 M ξ i m exp { ξ i m t m q L i m λ m q exp ( x i m T β m + Z i m T b i ) } 1 exp { ξ i m L i m < t m q R i m λ m q exp ( x i m T β m + Z i m T b i ) } I ( R i m < ) f m ( ξ i m ) d ξ i m × f b ( b i γ ) d b i .
Moreover, let the W i m q ’s denote the random sample of size n from the Poisson distributions with means ξ i m λ m q exp ( x i m T β m + Z i m T b i ) given ξ i m , and define A i m = t m q L i m W i m q and B i m = I ( R i m < ) L i m < t m q R i m W i m q such that
P ( A i m = 0 , B i m > 0 | L i m , R i m , x i m ) = exp { ξ i m t m q L i m λ m q exp ( x i m T β m + Z i m T b i ) } 1 exp { ξ i m L i m < t m q R i m λ m q exp ( x i m T β m + Z i m T b i ) } I ( R i m < ) .
It is easy to see that the maximization of (4) is equivalent to maximizing the likelihood function based on the data ( L i m , R i m , x i m , A i m = 0 , B i m > 0 ) ( i = 1 , , n ; m = 1 , , M ) . Based on this, for the development of the EM algorithm, it is natural to use the W i m q ’s, ξ i m ’s and b i ’s to augment the observed data. As a consequence, one can derive the resulting pseudo complete data log-likelihood function as
l c ( ϕ , Λ , γ | u i s ) = i = 1 n m = 1 M q = 1 k m I ( t m q R i m ) W i m q log { ξ i m λ m q exp ( x i m T β + Z i m T b i ) } ξ i m λ m q exp ( x i m T β + Z i m T b i ) log ( W i m q ! ) + log f m ( ξ i m ) d i 2 log ( 2 π ) 1 2 log | Σ i ( γ ) | b i T Σ i ( γ ) 1 b i 2 ,
where R i m = L i m I ( R i m = ) + R i m I ( R i m < ) .
Now, we consider the E-step of the EM algorithm. At the ( s + 1 ) th iteration and given ( ϕ s , Λ s , γ s ) T , we need to determine
Q ( ϕ , Λ , γ | ϕ s , Λ s , γ s ) = E [ l c ( ϕ , Λ , γ | u i s , O , ϕ s , Λ s , γ s ) ]
= i = 1 n m = 1 M q = 1 k m I ( t m q R i m ) E W i m q log ξ i m λ m q exp ( x i m T β + Z i m T b i )
ξ i m λ m q exp ( x i m T β + Z i m T b i ) E log ( W i m q ! ) + log f m ( ξ i m )
d i 2 log ( 2 π ) 1 2 log | Σ i ( γ ) | E b i T Σ i ( γ ) 1 b i 2
under the multivariate normal distribution with the covariance matrix Σ i ( γ ) . To calculate the conditional expectations E [ ξ i m exp ( x i m T β + Z i m T b i ) ] , E [ W i m q ] and E [ b i T Σ 1 ( γ ) b i ] given the observed data, we need to employ the joint density of ξ i m and b i given the observed data, which is proportional to
m = 1 M exp ξ i m t m q L i m exp ( x i m T β + Z i m T b i ) λ m q I ( R i m < ) exp ξ i m t m q R i m exp ( x i m T β + Z i m T b i ) λ m q × f m ( ξ i m ) ( 2 π ) d i / 2 | Σ i ( γ ) | 1 / 2 exp { b i T Σ i ( γ ) 1 b i 2 } .
Note that the conditional expectation of W i m q for t m q R i m given ξ i m ( m = 1 , , M ) , b i and the observed data is given by
E ^ ( W i m q | ξ i m , b i ) = I ( L i m < t m q R i m ) λ m q ξ i m exp ( x i m T β + Z i m T b i ) 1 exp { L i m < t m q R i m λ m q ξ i m q exp ( x i m T β + Z i m T b i ) } .
In the M-step, we can employ the Newton–Raphson method to update β m based on the equation
i = 1 n m = 1 M q = 1 k m I ( t m q R i m ) E ^ ( W i m q ) x i m i = 1 n I ( t m q R i m ) E ^ { ξ i m exp ( x i m T β m + Z i m T b i ) } x i m i = 1 n I ( t m q R i m ) E ^ { ξ i m exp ( x i m T β m + Z i m T b i ) } = 0 .
For estimation of λ m q , we have the closed form expression
λ m q = i = 1 n I ( t m q R i m ) E ^ ( W i m q ) i = 1 n I ( t m q R i m ) E ^ { ξ i m exp ( x i m T β m + Z i m T b i ) } ,
for q = 1 , , k m and m = 1 , , M . To estimate γ , one can maximize log Σ i ( γ ) E ^ { b i T Σ i 1 ( γ ) b i } with the Σ i ’s given by Σ = n 1 i = 1 n E ^ ( b i T b i ) .
Now, we summarize the EM algorithm described above as follows.
Step 1. Choose initial estimates ϕ ( 0 ) , Λ ( 0 ) , γ ( 0 ) of ϕ , Λ , γ , respectively.
Step 2. In the ( s + 1 ) th iteration, calculate E ^ [ ξ i m exp ( x i m T β ( s ) + Z i m T b i ) ] , E ^ [ W i m q ] and E ^ [ b i T Σ 1   ( γ ( s ) ) b i ] by using, for example, the Gaussian quadrature method. s = 0 , 1 , 2 ,
Step 3. Update Λ ( s + 1 ) by (6) with current ϕ ( s ) , γ ( s ) , and then update γ ( s + 1 ) by maximizing log Σ i ( γ ) E ^ { b i T Σ i 1 ( γ ) b i } . In addition, estimate ϕ ( s + 1 ) by employing the one-step Newton–Raphson method.
Step 4. Repeat Steps 2–3 until the convergence such that the absolute difference of the log-likelihood values between two consecutive iterations is less than a given positive value ε such as 10 3 .

4. Asymptotic Properties

Let θ ^ = β ^ 1 T , , β ^ M T , γ ^ T , Λ ^ 1 , , Λ ^ M T denote the estimator of θ = ( β 1 T , , β M T , γ T , Λ 1 , , Λ M ) T defined above and θ 0 = ( β 01 T , , β 0 M T , γ 0 T , Λ 01 , , Λ 0 M ) T the true value of θ . Define ζ 0 = β 01 T , , β 0 M T , γ 0 T T and ζ ^ = β ^ 1 T , , β ^ M T , γ ^ T T . In this section, we will establish the asymptotic properties of θ ^ , and for this, we first describe the regularity conditions needed.
Define
Q m t , b ; β m , Λ m = exp G m 0 t exp { β m T x m + b T Z m } d Λ m ( s ) ,
D m ( U m , b ; β m , Λ m ) = l = 0 K m Δ m l { Q m ( U m l , b ; β m , Λ m ) Q m ( U m , l + 1 , b ; β m , Λ m ) } , U m = ( U m 1 , , U m , K m ) , Δ m l = I ( U m l T m < U m , l + 1 ) , and p ( b | γ ) = ( 2 π ) d / 2 | Σ ( γ ) | 1 / 2 exp ( b T Σ ( γ ) 1 b / 2 ) . For the asymptotic properties of θ ^ , we need the following regularity conditions.
Condition 1.
The true value ζ 0 belongs to a known compact set A B C , where A denotes a compact set of R pM , B a compact set in R M , and C a compact set of R q in the domain of γ such that Σ ( γ ) is a positive-definite matrix with eigenvalues bounded away from zero and ∞. In addition, the true value Λ 0 m ( · ) is continuously differentiable with positive derivatives in [ 0 , τ m ] .
Condition 2.
The covariate vector X m and Z k are bounded in [ 0 , τ m ] .
Condition 3.
For the transformation function G m , assume that it is twice continuously differentiable on [ 0 , ) with G m ( 0 ) = 0 , G m ( x ) > 0 and G m ( ) = .
Condition 4.
Assume that sup γ C b g ( b ) p ( j ) ( b | γ ) d b < for any smooth function g ( · ) and j = 0 , 1 , 2 . Here, p ( j ) ( b | γ ) denotes the jth derivative of p ( b | γ ) with respect to γ.
Condition 5.
If there exists a vector u and some constants v m , m = 1 , , M such that
u T ζ + m = 1 M v m y m ζ , y 1 , , y M = ζ 0 , Λ 10 c 1 , , Λ M 0 c M · log b m = 1 M D m ( U m , b ; β , β u , Λ m ) p ( b γ ) d b = 0
for each of these values, then u = 0 p M + M + q and v m = 0 . In addition, 0 p M + M + q denotes a ( p M + M + q ) -dimensional vector of zeros.
Condition 6.
Assume that P ( τ m τ 0 , exp ( u ) > 0 ) > 0 for the follow-up time τ m and latent variable u, where τ 0 denotes the longest study time and the variance of exp ( u ) is bounded and there exists a positive small constant ϵ > 0 such that exp ( u ) > ϵ almost surely. Moreover, for τ m and u, the function F ( s ) = E [ exp ( u ) I ( τ m s ) ] is continuous for s [ 0 , τ 0 ] .
Note that Conditions 1 and 2 are standard conditions in survival analysis, and it is easy to check that Condition 3 on the transformation function holds for the logarithmic family G r ( x ) = r 1 log ( 1 + r x ) ( r 0 ) and the Box–Cox family G d ( x ) = d 1 ( 1 + x ) d 1 ( d 0 ) . Moreover, Condition 4 holds for modeling multivariate data with frailty models, and Condition 5 is required for the identifiability of the model. In addition, Condition 6 describes the relationship between the latent variable u and the parameters of interest. Most of the conditions above are purely for technique purposes and hold in general, in particular, for periodic follow-up studies.
Let · denote the Euclidean norm and define P f = f ( x ) d P ( x ) and P n f = n 1 i = 1 n f X i for a function f and a random variable X with distribution P. The following two theorems give the asymptotic properties of θ ^ .
Theorem 1.
Suppose that Conditions 1–6 hold. Then, as n , we have that ζ ^ ζ 0 + m = 1 M sup t 0 , τ m Λ ^ m ( t ) Λ 0 m ( t ) 0 almost surely.
Theorem 2.
Suppose that Conditions 1–6 hold. Then, as n , we have that n ( ζ ^ n ζ 0 ) d N ( 0 , I 0 1 ) , where I 0 = P { l ˜ ( θ 0 ) l ˜ ( θ 0 ) T } with l ˜ ( θ 0 ) given in the Appendix A.
We will sketch the proof for the results described above in Appendix A. For inference about ζ , it is apparent that one needs to estimate the covariance matrix, and for this, one can see from Appendix A that it would be difficult to derive a consistent estimator of I 0 . Thus, we propose to employ the profile likelihood approach to estimate the covariance matrix of ζ ^ (Murphy & van der Vaart, 2000) [28]. Specifically, let C denote the set of all step functions with nonnegative jumps at t m q and define pl n ( ζ ) = max Λ C log L n ( ζ , Λ ) , the profile log-likelihood. Then, one can estimate the covariance matrix of ζ ^ by the negative inverse of the matrix with the ( j , k ) th element given by
pl n ( ζ ^ ) pl n ζ ^ + h n e k pl n ζ ^ + h n e j + pl n ζ ^ + h n e k + h n e j h n 2 .
In the above, e j denotes the jth canonical vector in R d and h n is a constant of order n 1 / 2 . Note that to calculate p l n ( ζ ) for each ζ , one can reuse the proposed EM algorithm with β held fixed and the only step in the EM algorithm is to explicitly evaluate E ^ ( W i m q ) and E ^ ( ξ i m ) to update λ m using above. The iteration converges quickly in general by setting λ ^ m to be the initial value.

5. A Simulation Study

In this section, we give some of the simulation results obtained from a study performed to evaluate the finite sample performance of the proposed method with the focus on estimation of the β m ’s. In the study, we considered the situation with M = 2 correlated failure times of interest and two covariates. For the covariates, it was assumed that the first covariate follows the Bernoulli distribution with the success probability of 0.5 and the second covariate, the uniform distribution over ( 0 , 1 ) . To generate the true failure times, we first set Z i m to be one and generated the latent variables b i ’s from the normal distribution N ( 0 , σ 2 ) with σ 2 = 0.25 and the latent variables u i ’s from the normal distribution with the mean 0 and variance 1. Then, given the X i m ’s, Z i m ’s, b i ’s and u i ’s, the T i 1 ’s and T i 2 ’s were generated under model (1) with G m ( x ) = r m 1 log ( 1 + r m x ) , Λ 1 ( t ) = log ( 1 + 0.5 t ) and Λ 2 ( t ) = 0.5 t for r 1 = r 2 = 0 , r 1 = r 2 = 0.5 or r 1 = r 2 = 1 , respectively.
For the generation of the observation process and the observed data, we first assumed that the τ i s follow the uniform distribution over the interval [2, 3] and generated the K i ’s from the Poisson distribution with the mean
Λ i h ( τ i ; X i , u i ) = τ i exp ( X i T α + u i )
given the X i ’s and u i ’s. Note that in the above, we took Λ 0 h ( t ) = t and α m = 1 . Given the K i ’s, we took U i 1 < < U i K i to be the order statistics of the random sample of size K i from the uniform distribution over ( 0 , τ i ) . In the following, we considered two sets of true values, ( 0 , 0 , 0 ) T and ( 0.5 , 0.5 , 0.5 ) T , for the regression parameters β 1 = ( β x 11 , β x 12 , β u 1 ) T and β 2 = ( β x 21 , β x 22 , β u 2 ) T , corresponding to T i 1 and T i 2 , respectively. The results given below are based on n = 200 or 400 with 1000 replications.
Table 1 gives the results on the estimation of β 1 and β 2 given by the proposed estimation procedure with r 1 = r 2 = 0 , r 1 = r 2 = 0.5 and r 1 = r 2 = 1 . Here, we calculated the estimated bias (Bias) given by the average of the estimates minus the true value, the sample standard error (SSE) of the estimates, the average of the estimated standard errors (ESE) and the 95 % empirical coverage probability (CP). The results suggest that the proposed estimator of the regression parameters seems to be unbiased and the variance estimation based on the profile likelihood approach also seems to be reasonable. Furthermore, the results on the empirical coverage probabilities indicate that the normal approximation to the distribution of the proposed estimator of the regression parameters appears to be appropriate. In addition, the results got better in general with the increasing sample size, as expected.
As mentioned before, the proposed estimation procedure can be applied to any distribution for the latent variables b i ’s. To see this, we repeated the study above, except that we generated the b i ’s from the uniform distribution over ( 1 , 1 ) , and Table 2 presents the obtained results on the estimation of β 1 and β 2 with n = 200 and r 1 = r 2 = 0 . One can see that they are similar to those given in Table 1 and again suggest that the proposed approach seems to work well for the situations considered. To see the performance of the proposed approach with different types of covariates, we also repeated the study giving Table 1, except that both covariates were assumed to follow the standard normal distribution and give the obtained results with n = 200 and r 1 = r 2 = 0 in Table 3. They indicate that the proposed estimation procedure seems to be robust to different types of covariates.
Note that in the proposed estimation procedure, it has been assumed that the observation process N ˜ i ( t ) is a non-homogeneous Poisson process and one may be interested in the performance of the proposed method if this assumption is not true. To see this, we repeated the study giving Table 1, except that the N ˜ i ( t ) ’s were assumed to be mixed Poisson processes with the intensity function
λ i h ( t | X i , u i ) = v i λ 0 h ( t ) exp X i T α + u i
given the v i ’s, where the v i ’s were generated from the gamma distribution. Table 4 presents the results on the estimation of β 1 and β 2 given by the proposed approach with n = 200 and r 1 = r 2 = 0 , and they indicate that the approach seems to be robust with the processes N ˜ i ( t ) ’s.
For the initial value in the EM algorithm here, we set ϕ ( 0 ) = 0 , λ m q ( 0 ) = 1 k m , q = 1 , , k m , and γ ( 0 ) = 0.25 . It is worth to point out that we did try other initial values and the proposed EM algorithm seems to be robust with respect to the selection of the initial values. In other words, we did not encounter non-convergence issue in the simulation study. We also considered some other setups, including multivariate cases and the case with more than one covariate and obtained similar results.

6. An Application

In the section, the estimation procedure proposed in the previous sections is applied to the set of bivariate interval-censored data arising from an AIDS clinical trial, AIDS Clinical Trial Group 181, described in Goggins and Finkelstein (2000) [11]. The study concerns the opportunistic infection cytomegalovirus (CMV) and examined the study patients periodically. At each clinical visit or observation, among other information, the blood and urine samples were collected and tested to detect the existence of the CMV virus in the sample, which is also commonly referred to as the shedding of the virus. In addition, for each patient, the CD4 count, indicating the status of a person’s immune system and being commonly used to measure the stage of HIV infection, was also recorded at the entry time. For the analysis here, we are mainly interested in if the baseline CD4 account, the indicator of the initial stage of HIV disease, is related to the CMV shedding risk in either blood or urine.
The data set consists of 204 subjects, and they belong to two groups based on their baseline CD4 counts, either less than 75 or otherwise. More specifically, the two groups have 111 and 93 patients, respectively. On the observation of the CMV shedding times, some patients gave left-censored observations and some right-censored observations. The others provided some intervals or interval-censored observations, given by the last negative and first positive test dates. That is, we have bivariate interval-censored data on the CMV shedding times in the blood and urine. The percentages of right-censored observations for the CMV shedding times in the blood and urine are about 85 % and 43 % , respectively, which indicate that the CMV shedding risk in the urine may be higher than that in the blood. For the application of the proposed estimation procedure, let T i 1 and T i 2 denote the CMV shedding times in the blood and urine associated with the ith patient, respectively, and define X i = 1 if the ith subject’s baseline CD4 count was less than 75 and 0 otherwise. As in the simulation study, we took G m ( x ) = r m 1 log ( 1 + r m x ) and set Z i = 1 for all patients.
Table 5 presents the estimation results given by the proposed approach for different combinations of r 1 = 0 , 0.5 , 1 and r 2 = 0 , 0.5 , 1 , and they include the estimated covariate effects, β ^ b l o o d and β ^ u r i n e , the estimated standard errors (SE) and the p-values for testing no covariate effect (P). In addition, we have calculated the Akaike Information Criterion (AIC, Akaike, 1973 [29]) and Bayesian Information Criterion (BIC, Schwarz, 1978 [30]) for the selection of the optimal model. One can see from the table that the AIC and BIC values are quite close for all combinations of r 1 and r 2 , and the same is true for the estimated effects. By choosing r 1 = r 2 = 0 , which correspond to the proportional hazards models for both the T i 1 ’s and T i 2 ’s, we have β ^ b l o o d = 2.312 and β ^ u r i n e = 1.143 with the estimated standard errors being 0.396 and 0.142 , respectively. They suggest that the patients with lower CD4 at the baseline experienced CMV shedding in both blood and urine significantly early. To provide a graphical view about the difference between the CMV shedding in the blood and urine, Figure 1 presents the estimates of the baseline marginal survival functions given by the proposed method with r 1 = r 2 = 0 for the CMV shedding times in the blood and urine, respectively. They suggest that as discussed above, the CMV shedding in the urine occurred much earlier than in the blood.
In addition, with r 1 = r 2 = 0 , the proposed method yielded β ^ u 1 = 2.845 and β ^ u 2 = 0.958 with the estimated standard errors of 0.112 and 0.116 , respectively. They indicate that the observation process was significantly correlated with the CMV shedding times in both blood and urine. That is, we had dependent or informative censoring. To investigate the effects of informative censoring on the covariate effects, we assumed that β u 1 = β u 2 = 0 , meaning independent interval censoring, and obtained β ^ b l o o d = 1.560 and β ^ u r i n e = 1.306 with the estimated standard errors being 0.514 and 0.326 , respectively. They would correspond to the p-values of 0.015 and 0.011 for testing β b l o o d = 0 and β u r i n e = 0 , respectively. Although these results are similar to those given above, it is apparent that they underestimated the effects of the baseline CD4 on the risks of the CMV shedding times.

7. Discussion and Concluding Remarks

In the preceding sections, the regression analysis of case K multivariate interval-censored failure-time data was discussed under a general class of semiparametric transformation models in the presence of informative censoring. For the problem, an approximate maximum likelihood estimation procedure was proposed and the resulting estimators of the regression parameters were shown to be consistent and asymptotically normal. In the method, the frailty approach was employed to characterize the informative censoring as well as the relationship among the correlated failure times of interest. To implement the proposed approach, a novel EM algorithm was developed and the numerical studies indicated that the proposed method works well in practical situations. In addition, it was applied to a set of real bivariate interval-censored data arising from an AIDS clinical trial.
The proposed approach can be seen as a generalization of the method given by Zeng et al. (2017) [15] to allow for informative interval censoring, which can occur quite often, as discussed above and in the literature. In particular, it has been shown that in the presence of informative censoring, the analysis that ignores it could lead to biased or misleading results and conclusions. The proposed method has the advantages that it does not need or impose an assumption on the distribution of the latent variables and it is quite flexible and can be easily implemented. Moreover, the type of the data considered here includes most types of the failure-time data discussed in the literature as special cases and the model (1) gives many commonly used models.
As discussed above, although model (1) is quite flexible, it may not be straightforward to choose an optimal model for a given set of data, and one commonly used procedure for this is to apply the AIC or BIC. As an alternative, one may prefer to develop a model-checking or data-driven technique. However, this may be difficult and such a method does not seem to exist even for simple types of multivariate interval-censored data. It is worth noting that instead of the proposed approximation maximum likelihood estimation method, one may consider a full maximum likelihood estimation procedure. For this, however, one would need to specify or postulate some distributions for the latent variables b i ’s, which may be hard to be verified, and also the implementation would be much more complicated.

Author Contributions

Formal analysis, M.Y.; Conceptualization, M.Y. and M.D.; Software, M.Y.; Supervision, M.D.; Funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China Grant (12101522).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the finding of this study are openly available in “ACTG 181” at https://doi.org/10.1007/0-387-37119-2 [2].

Acknowledgments

We want to thank the editor as well as the two reviewers for their many insightful and helpful comments and suggestions that greatly helped the paper. The R code about the implementation of the proposed method is available upon request to the first author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorems 1 and 2

In this Appendix, we will sketch the proof of the two theorems given above.
Proof of Theorem 1. 
To prove the consistency, we can verify the condition of Theorem 5.7 of Van der Vaart (1998) [31]. B V [ 0 , τ m ] denotes the functions whose total variation in [ 0 , τ m ] are bounded by a given constant. Define M = { θ : θ A B C B V M } , where B V M = B V [ 0 , τ 1 ] B V [ 0 , τ 2 ] B V [ 0 , τ M ] and M 0 is a similar space with M containing θ 0 . Moreover, define the metric ρ ( θ , θ 0 ) on the parameter space M as ρ ( θ , θ 0 ) = m = 1 M β x m β 0 x m 2 + m = 1 M β u m β 0 u m 2 + γ γ 0 2 + m = 1 M sup t [ 0 , τ m ] | Λ m ( t ) Λ 0 m ( t ) | . Let L ( θ ) be the likelihood, so the log-likelihood is
l ( θ ) = log m = 1 M D m U m , b ; β m , Λ m p ( b | γ ) d b
Then, the class of function D m U m , b ; β m , Λ m is a Donsker class. By condition 4, we know that l ( θ ) is bounded away from zero. Therefore, l ( θ , O ) belongs to some Donsker class due to the preservation property of the Donsker class under the Lipschitz-continuous transformations. Then, we can conclude that sup θ M | P n l ( θ , O ) P l ( θ 0 , O ) | converges in probability to 0 as n .
Next, we need to verify another condition of Theorem 5.7 of Van der Vaart (1998) [31], for any ϵ > 0 ,
sup ρ ( θ , θ 0 ) > ϵ P l ( θ , O ) < P l ( θ 0 , O ) .
Following Gibbs’ inequality, we have that P l ( θ , O ) P l ( θ 0 , O ) for all θ M with equality holds if and only if l ( θ , O ) = l ( θ 0 , O ) almost surely. Assume that sup ρ ( θ , θ 0 ) > ϵ P l ( θ ) = P l ( θ 0 ) . Then, there exists a sequence θ j such that P l ( θ j ) P l ( θ 0 ) and ρ ( θ j , θ 0 ) > ϵ . Because A B C are compact and B V M are uniformly bounded such that θ j m converges to θ j 0 , and θ j m is the subsequence of θ j , where θ j 0 may or not be in M , but in M 0 . Clearly, P l ( θ ) is continuous with respect to θ , such that P l ( θ j 0 ) = P l ( θ 0 ) . By Condition 5 and similar arguments to the proof of Theorem 2.1 of Chang et al. (2007) [32], we can show the identifiability of the model parameters, so that θ j 0 = θ 0 . However, ρ ( θ j m , θ 0 ) > ϵ , so θ j m cannot converge to θ 0 . This is a contradiction. Therefore, sup ρ ( θ , θ 0 ) > ϵ P l ( θ ) < P l ( θ 0 ) . Following Theorem 5.7 of Van der Vaart (1998) [31], we have ρ ( θ ^ , θ 0 ) = o p ( 1 ) , which completes the proof of Theorem 1. □
Proof of Theorem 2. 
Define
S β x m ( θ ) = l ( θ ) β x m , S β u m ( θ ) = l ( θ ) β u m , S γ ( θ ) = l ( θ ) γ ,
the score functions with respect to β x m , β u m and γ , respectively. For m = 1 , , M , let h m ( t ) be a nonnegative and nondecreasing function on [ 0 , τ m ] . Define H = { h = ( h 1 ( t ) , , h M ( t ) ) } , Λ ϵ ( t ) = ( Λ 1 , ϵ ( t ) , , Λ M , ϵ ( t ) ) , and
H m l ( t ; θ ) = B m t , U m l , U m , l + 1 , b ; β m , Λ m m = 1 , m m M D m U m , b ; β m , Λ m p { b γ } d b m = 1 M D m U m , b ; β m , Λ m p { b γ } d b ,
where Λ m , ϵ ( t ) = Λ m ( t ) + ϵ h m ( t ) and
B m t , s 1 , s 2 , b ; β m , Λ m = exp β x m T X m + u β u m + b T Z m × Q m s 2 , b ; β m , Λ m G m 0 v exp β m T x m + b T Z m d Λ m ( s ) I ( s 2 t ) Q m s 1 , b ; β m , Λ m G m 0 u exp β m T x m + b T Z m d Λ m ( s ) I ( s 1 t ) .
It follows that
S β x ( θ ) = m = 1 M l = 0 K m 0 τ m H m l ( t ; θ , A ) X m d Λ m ( t ) , S β u ( θ ) = m = 1 M l = 0 K m 0 τ m H m l ( t ; θ , A ) u d Λ m ( t ) , S γ ( θ ) = m = 1 M D m U m , b ; β m , Λ m p γ { b γ } d b m = 1 M D m U m , b ; β m , Λ m p { b γ } d b ,
where p γ { b γ } is the first-order derivative of p { b γ } with respect to γ , β x = ( β x 1 T , , β x M T ) T and β u = ( β u 1 , , β u M ) T .
To obtain the score operator for A , we consider submodels A ϵ ( h ) , where h = h 1 , , h M T is a vector of functions in L 2 0 , τ m . Then, we have that d Λ m , ϵ , h m = 1 + ϵ h m d Λ m , and the score function along the mth submodels for every Λ m , m = 1 , , M has the form
S Λ m ( θ ) ( h ) = l = 0 K m Δ m l 0 τ m H m l ( t ; θ ) h m ( t ) d Λ m ( t ) .
The efficient score for ζ at ζ 0 , Λ 0 is l ˜ ζ 0 , Λ 0 = S ζ ζ 0 , Λ 0 m = 1 M S Λ m ζ 0 , Λ 0 h m , where S ζ ζ 0 , Λ 0 = S β x 1 θ 0 , , S β x M θ 0 , S β u θ 0 , S γ θ 0 T , h m is a ( p M + M + q ) -vector function satisfying
P S ζ ζ 0 , Λ 0 m = 1 M S Λ m ζ 0 , Λ 0 h m T m = 1 M S Λ m ζ 0 , Λ 0 h m = 0 ,
for each h m in H .
By following similar calculations in Section 3 of Chang et al. (2007) [32], we can establish the existence of h m in the above equation. The efficient Fisher information matrix I 0 for ζ at ζ 0 , Λ 0 is defined as P l ˜ ζ 0 , Λ 0 l ˜ ζ 0 , Λ 0 T . In the following, we will show that I 0 is positive definite. If the I 0 is singular, then there exists a nonzero vector ν R ( p M + M + q ) such that ν T I 0 ν = 0 . It follows that, with probability one, the score function along the submodel ζ 0 + ϵ ν , Λ 10 + ϵ ν T h 1 , , Λ M 0 + ϵ ν T h M is zero. Therefore,
ν T ζ + m = 1 M h m y m ζ , y 1 , , y M = ζ 0 , Λ 10 c 1 , , Λ M 0 c M
· log b m = 1 M D m ( U m , b , β m , Λ m ) p ( b γ ) d b = 0 .
Using Condition 5, we know that ν = 0 , and this is a contradiction. Therefore, we can conclude that ν T I 0 ν = 0 implies ν = 0 . That is, the efficient Fisher information matrix is positive.
Define
S ζ , m ( θ ) h m = ϵ ϵ = 0 S ζ θ ; Λ m = Λ m ϵ ,
and
S m , j ( θ ) h ˜ m , h j = ϵ ϵ = 0 S Λ m θ ; Λ j = Λ j ϵ h ˜ k
for m = 1 , , M and j = 1 , , M , where / ϵ ϵ = 0 Λ j ϵ = h j . By Taylor expansion, we can obtain
P l ˜ ζ 0 , Λ = P l ˜ ζ 0 , Λ 0 + P m = 1 M S ζ m ( θ ) Λ m Λ m 0 m = 1 M j = 1 M S m , j ( θ ) h m , Λ m Λ m 0 + O p m = 1 M Λ m Λ m 0 2 .
Note that P l ˜ ζ 0 , Λ 0 = 0 , P S ζ ( θ ) S Λ m ( θ ) h m = P S ζ , m ( θ ) h m , P ( S Λ m ( θ ) h ˜ m S Λ j ( θ ) h j ) = P S m , j ( ζ ) h ˜ m , h j . By the consistency and the proof of Theorem of Zeng et al. (2017), we can conclude that P l ˜ ζ 0 , Λ ^ n = O p n 2 / 3 , which implies n P l ˜ ζ 0 , Λ ^ n = o p ( 1 ) .
We know from Example 19.11 of Van der Vaart (1998) [31] that the class of uniformly bounded functions with bounded variations is a Donsker class. By using Theorem 2.10.6 of Van der Vaart and Wellner (1996) [33], we can verify that l ˜ ( ζ , Λ ) is a uniformly bounded Donsker class. In addition, we have proved that θ ^ n is consistent. Therefore, n ( P n P ) ( l ˜ ( ζ ^ n , Λ ^ n ) l ˜ ( ζ 0 , Λ 0 ) ) = o p ( 1 ) . Due to the fact that P n l ˜ θ ^ n = P l ˜ θ 0 = 0 and P l ˜ ζ 0 , Λ ^ n = o p ( 1 ) , we can have
n P l ˜ θ ^ n l ˜ ζ 0 , Λ ^ n = n P n l ˜ θ 0 + o p ( 1 ) .
By the mean value theorem, we have
n P ζ l ˜ ζ , Λ ^ n ζ ^ n ζ 0 = n P n l ˜ θ 0 + o p ( 1 ) ,
where ζ is a point between ζ ^ n and ζ 0 . Because θ ^ n is consistency and P ζ l ˜ θ 0 = P l ˜ θ 0 l ˜ θ 0 T = I 0 , we can conclude that
n ζ ^ n ζ 0 = I 0 1 n P n l ˜ θ 0 + o p ( 1 ) d N 0 , I 0 1 .
This completes the proof of Theorem 2. □

References

  1. Finkelstein, D.M. A proportional hazards model for interval-censored failure time data. Biometrics 1986, 42, 845–854. [Google Scholar] [CrossRef] [PubMed]
  2. Sun, J. Statistical Analysis of Interval-Censored Failure Time Data; Springer: New York, NY, USA, 2006. [Google Scholar]
  3. Kalbfleisch, J.D.; Prentice, R.L. The Statistival Analysis of Failure Time Data, 2nd ed.; Wiley: New York, NY, USA, 2002. [Google Scholar]
  4. Wang, L.; Sun, J.; Tong, X. Regression analysis of case II interval censored failure time data with the additive hazards model. Stat. Sin. 2010, 20, 1709–1723. [Google Scholar] [PubMed]
  5. Zhang, Z.; Sun, J.; Sun, L. Statistical analysis of current status data with informative observation times. Stat. Med. 2005, 24, 1399–1407. [Google Scholar] [CrossRef] [PubMed]
  6. Zhang, Z.; Sun, L.; Sun, J.; Finkelstein, D. Regression analysis of failure time data with informative interval censoring. Stat. Med. 2007, 26, 2533–2546. [Google Scholar] [CrossRef]
  7. Sun, J. A nonparametric test for current status data with unequal censoring. J. R. Stat. Soc. B 1999, 61, 243–250. [Google Scholar] [CrossRef]
  8. Chen, M.; Tong, X.; Sun, J. The proportional odds model for multivariate interval-censored failure time data. Stat. Med. 2007, 26, 5147–5161. [Google Scholar] [CrossRef]
  9. Chen, M.; Tong, X.; Sun, J. A frailty model approach for regression analysis of multivariate current status data. Stat. Med. 2009, 28, 3424–3436. [Google Scholar] [CrossRef]
  10. Chen, M.; Tong, X.; Zhu, L. A linear transformation model for multivariate interval-censored failure time data. Can. J. Stat. 2013, 41, 275–290. [Google Scholar] [CrossRef]
  11. Goggins, W.B.; Finkelstein, D.M. A proportional hazards model for multivariate interval-censored failure time data. Biometrics 2000, 56, 940–943. [Google Scholar] [CrossRef] [PubMed]
  12. Shen, P. Additive transformation models for multivariate interval-censored data. Commun. Stat.-Theory Methods 2015, 44, 1065–1079. [Google Scholar] [CrossRef]
  13. Tong, X.; Chen, M.H.; Sun, J. Regression analysis of multivariate interval censored failure time data with application to tumorigenicity experiments. Biom. J. 2008, 33, 364–374. [Google Scholar] [CrossRef]
  14. Wang, L.; Sun, J.; Tong, X. Efficient estimation for the proportional hazards model with bivariate current status data. Lifetime Data Anal. 2008, 14, 134–153. [Google Scholar] [CrossRef]
  15. Zeng, D.; Gao, F.; Lin, D. Maximum likelihood estimation for semiparametric regression models with multivatiate interval-censored data. Biometrika 2017, 104, 505–525. [Google Scholar] [CrossRef]
  16. Zhang, B.; Tong, X.; Sun, J. Efficient Estimation for the proportional odds model with bivariate current status data. Far East J. Theor. Stat. 2009, 27, 113–132. [Google Scholar]
  17. Zhou, Q.; Hu, T.; Sun, J. A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data. J. Am. Stat. Assoc. 2017, 112, 664–672. [Google Scholar] [CrossRef]
  18. Sun, T.; Ding, Y. Copula-based semiparametric regression method for bivariate data under general interval censoring. Biostatistics 2021, 2, 315–330. [Google Scholar] [CrossRef]
  19. Wei, L.J.; Lin, D.Y.; Weissfeld, L. Regression analysis of multivariate incomplere failure time data by modeling marginal distributions. J. Am. Stat. Assoc. 1989, 84, 1065–1073. [Google Scholar] [CrossRef]
  20. Wang, S.; Wang, C.; Wang, P.; Sun, J. Semiparametric analysis of the additive hazards model with informatively interval-censored failure time data. Comput. Stat. Data Anal. 2018, 125, 1–9. [Google Scholar] [CrossRef]
  21. Yu, M.; Feng, Y.; Duan, R.; Sun, J. Regression analysis of multivariate interval-censored failure time data with informative censoring. Stat. Methods Med. Res. 2022, 31, 391–403. [Google Scholar] [CrossRef] [PubMed]
  22. Ma, L.; Hu, T.; Sun, J. Sieve maximum likelihood regression analysis of dependent current status data. Biometrika 2015, 102, 731–738. [Google Scholar] [CrossRef]
  23. Wang, P.; Zhao, H.; Sun, J. Regression analysis of case k interval-censored failure time data in the presence of informative censoring. Biometrics 2016, 72, 1103–1112. [Google Scholar] [CrossRef]
  24. Klein, J.P.; Moeschberger, M.L. Survival Analysis: Techniques for Censored and Truncated Data; Springer: New York, NY, USA, 2003. [Google Scholar]
  25. Cook, R.J.; Lawless, J. The Statistical Analysis of Recurrent Events; Springer: New York, NY, USA, 2007. [Google Scholar]
  26. Chen, K.; Jin, Z.; Ying, Z. Semiparametric Analysis of Transformation Models with Censored Data. Biometrika 2002, 89, 659–668. [Google Scholar] [CrossRef]
  27. Huang, C.Y.; Wang, M.C. Joint modeling and estimation for recurrent event processes and failure time data. J. Am. Stat. Assoc. 2004, 99, 1153–1165. [Google Scholar] [CrossRef]
  28. Murphy, S.A.; Van Der Vaart, A.W. On profile likelihood. J. Am. Stat. Assoc. 2000, 95, 449–465. [Google Scholar] [CrossRef]
  29. Akaike, H. Information Theory and an Extension of the Maximum Likelihood Principle. In Second International Symposium on Information Theory; Petrov, B., Csaki, F., Eds.; Academiae Kiado: Budapest, Hungary, 1973; pp. 267–281. [Google Scholar]
  30. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  31. Van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: New York, NY, USA, 1998. [Google Scholar]
  32. Chang, I.S.; Wen, C.C.; Wu, Y.J. A profile likelihood theory for the correlated gamma-frailty model with current status family data. Stat. Sin. 2007, 17, 1023–1046. [Google Scholar]
  33. Van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: New York, NY, USA, 1996. [Google Scholar]
Figure 1. Estimated marginal survival functions for the CMV shedding times in the blood and urine.
Figure 1. Estimated marginal survival functions for the CMV shedding times in the blood and urine.
Mathematics 10 03257 g001
Table 1. Simulation results on estimation of β with the b i ’s generated from the normal distribution.
Table 1. Simulation results on estimation of β with the b i ’s generated from the normal distribution.
True Value r 1 = r 2 = 0 r 1 = r 2 = 0.5 r 1 = r 2 = 1
BiasSSESEECPBiasSSESEECPBiasSSESEECP
n = 200
β x 11 00.0300.2480.2520.9450.0060.2890.2880.9510.0310.2510.2590.945
β x 12 00.0280.4120.4220.9570.0210.4790.4760.9530.0290.4180.4520.957
β u 1 0−0.0240.1410.1430.938−0.0260.1640.1630.945−0.0240.1420.1420.938
β x 21 00.0440.2360.2360.9530.0010.2780.2760.9480.0440.2420.2520.953
β x 22 00.0010.3900.4220.9490.0110.4590.4600.9490.0010.4030.4290.949
β u 2 0−0.0130.1380.1350.960−0.0260.1590.1580.945−0.0140.1380.1400.960
β x 11 0.50.0400.2330.2460.9440.0240.2910.2900.9550.0330.3140.3170.950
β x 12 0.50.0250.3780.4100.955−0.0070.4850.5010.9510.0200.5080.5330.948
β u 1 0.50.0100.1350.1420.956−0.0260.1650.1680.942−0.0220.1770.1790.948
β x 21 0.50.0280.2270.2380.9490.0170.2800.2820.9490.0620.3070.3160.947
β x 22 0.50.0110.3580.4360.9530.0110.4660.4840.9560.0540.4890.4980.949
β u 2 0.5−0.0050.1350.1310.943−0.0270.1600.1600.950−0.0220.1760.1790.958
n = 400
β x 11 00.0090.1710.1780.9520.0100.2000.2040.954−0.0010.2260.2280.949
β x 12 00.0010.2800.2830.952−0.0030.3270.3290.9540.0170.3700.3740.946
β u 1 0−0.0230.0970.1040.949−0.0360.1120.1130.9440.0360.1270.1300.942
β x 21 00.0090.1620.1800.9410.0070.1910.1940.9460.0090.2190.2220.942
β x 22 00.0010.2600.2870.9480.0140.3140.3210.9510.0030.3580.3660.955
β u 2 0−0.0250.0930.0960.942−0.0370.1080.1090.948−0.0360.1240.1230.946
β x 11 0.50.0050.1580.1630.9490.0120.1870.1900.9490.0310.2150.220.942
β x 12 0.50.0190.2530.2540.9540.0230.2990.3050.9570.0100.3440.3630.949
β u 1 0.5−0.0250.0900.0960.935−0.0310.1050.1050.937−0.0340.1210.1200.944
β x 21 0.50.0120.1530.1660.9520.0080.1830.1850.9420.0240.2110.2190.946
β x 22 0.50.0350.2390.2590.9560.0070.2890.3020.9490.0110.3330.3480.953
β u 2 0.5−0.0240.0890.0940.941−0.0320.1040.1040.94−0.030.1200.1150.943
Table 2. Simulation results on estimation of β with the b i ’s generated from the uniform distribution and r 1 = r 2 = 0 .
Table 2. Simulation results on estimation of β with the b i ’s generated from the uniform distribution and r 1 = r 2 = 0 .
True ValueBiasSSEESECP
β x 11 0−0.0050.2380.2400.948
β x 12 0−0.0040.4040.4040.953
β u 1 00.0010.1340.1340.946
β x 21 0−0.0010.2230.2320.950
β x 22 0−0.0040.3740.4120.945
β u 2 00.0030.1260.1260.957
β x 11 0.50.0310.2230.2310.948
β x 12 0.50.0350.3610.4010.956
β u 1 0.5−0.0210.1290.1340.947
β x 21 0.50.0220.2150.2310.948
β x 22 0.50.0540.3420.4120.947
β u 2 0.5−0.0120.1290.1290.955
Table 3. Simulation results on estimation of β with the covariates generated from the normal distribution and r 1 = r 2 = 0 .
Table 3. Simulation results on estimation of β with the covariates generated from the normal distribution and r 1 = r 2 = 0 .
True ValueBiasSSEESECP
β x 11 0−0.0030.1270.1250.949
β x 12 00.0030.1260.1260.953
β u 1 0−0.0060.1420.1450.950
β x 21 00.0030.1200.1190.938
β x 22 0−0.0010.1200.1190.941
β u 2 00.0040.1330.1410.957
β x 11 0.50.0370.1420.1410.946
β x 12 0.50.0270.1410.1410.954
β u 1 0.5−0.0110.1460.1520.948
β x 21 0.50.0370.1370.1350.945
β x 22 0.50.0330.1380.1340.939
β u 2 0.5−0.0080.1430.1480.949
Table 4. Simulation results on estimation of β with mixed Poisson observation processes and r 1 = r 2 = 0 .
Table 4. Simulation results on estimation of β with mixed Poisson observation processes and r 1 = r 2 = 0 .
True ValueBiasSSEESECP
β x 11 00.0020.2460.2500.952
β x 12 00.0130.4180.4290.951
β u 1 00.0010.1370.1360.951
β x 21 0−0.0020.2330.2410.948
β x 22 00.0060.3870.4190.946
β u 2 0−0.0010.1300.1310.949
β x 11 0.50.0370.2330.2410.948
β x 12 0.50.0750.3760.4000.949
β u 1 0.5−0.0340.1330.1360.945
β x 21 0.50.0370.2260.2570.953
β x 22 0.50.0380.3570.4350.949
β u 2 0.5−0.0290.1320.1420.952
Table 5. Analysis results for the AIDS clinical trials data.
Table 5. Analysis results for the AIDS clinical trials data.
r 1 r 2 β blood β urine SE blood SE urine P blood P urine AICBIC
002.3121.1430.3960.1420.0000.001727.917744.507
0.52.3331.2910.4010.2080.0000.002733.972750.563
12.3811.4370.4130.2560.0000.002738.959755.550
0.502.4831.1410.4240.1410.0000.001727.327743.918
0.52.5041.2880.4300.2060.0000.002733.405749.996
12.5521.4350.4420.2550.0000.002738.426755.016
102.6521.1400.4510.1400.0000.001726.826743.416
0.52.6701.2850.4570.2050.0000.002732.924749.514
12.7171.4330.4690.2540.0000.002737.981754.572
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Yu, M.; Du, M. Regression Analysis of Multivariate Interval-Censored Failure Time Data under Transformation Model with Informative Censoring. Mathematics 2022, 10, 3257. https://doi.org/10.3390/math10183257

AMA Style

Yu M, Du M. Regression Analysis of Multivariate Interval-Censored Failure Time Data under Transformation Model with Informative Censoring. Mathematics. 2022; 10(18):3257. https://doi.org/10.3390/math10183257

Chicago/Turabian Style

Yu, Mengzhu, and Mingyue Du. 2022. "Regression Analysis of Multivariate Interval-Censored Failure Time Data under Transformation Model with Informative Censoring" Mathematics 10, no. 18: 3257. https://doi.org/10.3390/math10183257

APA Style

Yu, M., & Du, M. (2022). Regression Analysis of Multivariate Interval-Censored Failure Time Data under Transformation Model with Informative Censoring. Mathematics, 10(18), 3257. https://doi.org/10.3390/math10183257

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop