Flexible Parametric Accelerated Hazard Model: Simulation and Application to Censored Lifetime Data with Crossing Survival Curves

Muse, Abdisalam Hassan; Chesneau, Christophe; Ngesa, Oscar; Mwalili, Samuel

doi:10.3390/mca27060104

¹

Institute for Basic Sciences, Technology and Innovation (PAUSTI), Pan African University, Nairobi 62000-00200, Kenya

²

Faculty of Science and Humanities, School of Postgraduate Studies and Research, Amoud University, Borama 25263, Somalia

³

Department of Mathematics, LMNO, CNRS-Université de Caen, Campus II, Science 3, 14032 Caen, France

⁴

Department of Mathematics and Physical Sciences, Taita Taveta University, Voi 635-80300, Kenya

⁵

Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi 62000-00200, Kenya

^*

Authors to whom correspondence should be addressed.

Math. Comput. Appl. 2022, 27(6), 104; https://doi.org/10.3390/mca27060104

Submission received: 2 November 2022 / Revised: 22 November 2022 / Accepted: 29 November 2022 / Published: 30 November 2022

(This article belongs to the Special Issue Statistical Inference in Linear Models)

Abstract

This study proposes a flexible, fully parametric hazard-based regression model for censored time-to-event data with crossing survival curves. We call it the accelerated hazard (AH) model. The AH model can be written with or without a baseline distribution for lifetimes. The former assumption results in parametric regression models, whereas the latter results in semi-parametric regression models, which are by far the most commonly used in time-to-event analysis. However, under certain conditions, a parametric hazard-based regression model may produce more efficient estimates than a semi-parametric model. The parametric AH model, on the other hand, is inappropriate when the baseline distribution is exponential because its hazard is constant over time; similarly, when the baseline distribution is the Weibull distribution, the AH model coincides with the accelerated failure time (AFT) and proportional hazard (PH) models. The use of a versatile parametric baseline distribution (generalized log-logistic distribution) for modeling the baseline hazard rate function is investigated. For the parameters of the proposed AH model, the classical (via maximum likelihood estimation) and Bayesian approaches using noninformative priors are discussed. A comprehensive simulation study was conducted to assess the performance of the proposed model’s estimators. A real-life right-censored gastric cancer dataset with crossover survival curves is used to demonstrate the tractability and utility of the proposed fully parametric AH model. The study concludes that the parametric AH model is effective and could be useful for assessing a variety of survival data types with crossover survival curves.

Keywords:

Bayesian inference; hazard-based regression model; survival analysis; accelerated hazard model; generalized log-logistic distribution; crossover survival curves; censored data; maximum likelihood estimation

1. Introduction

In the analysis of lifetime data, hazard-based regression models have played a pivotal role. Such models provide a versatile framework for modeling survival data, and their parameters are easy to interpret from a practical perspective. When using regression models to analyze lifetime data, the Cox proportional hazard (PH) [1,2] model is the most widely assumed semi-parametric framework. The PH model’s main assumption is that the hazard ratios are constant over time. When this assumption is not supported by the data, alternative survival regression models, such as the accelerated failure time (AFT) [3,4] and proportional odds (PO) [5] models, might be applied in the analysis. However, none of them are appropriate for capturing lifetime data with crossing survival and hazard curves [6].

This kind of issue is frequently associated with clinical trials, including control and treatment groups. The survival function (SF) of one group may degrade swiftly while the SF of the other group decays slowly. The curves tend to meet at some point, resulting in an inversion in terms of who is on the bottom/top. The study of this change is essential in many clinical studies because determining the crossing time reveals when the target treatment for an illness can be judged beneficial [6].

In practice, time-to-event data with crossing survival curves can occur for a variety of reasons. Crossing survival curves, according to Breslow [7], may occur when a treatment has an early rapid benefit and then becomes equal to or worse than the placebo treatment after some time. Additionally, as described in Diao et al. [8], crossing survival curves may occur in clinical studies when a particular intensive treatment (e.g., surgery) has negative consequences at first but shows good results in the long term.

Several techniques have been presented in the literature to handle this crossover feature in time-to-event data. The most often used are based on regression coefficients that change over time; see, for instance, Egge and Zahl [9], Putter et al. [10], Shyur et al. [11], and Zhang et al. [12]. Two recent works considering the modeling and analysis of time-to-event data with crossing survival curves are [6,13]. For this type of problem, Chen and Wang [14] developed a semi-parametric two-sample framework. The two-sample feature refers to a scenario in which there is a control group and a treatment group, which can be readily represented by a binary variable. The AH model is an intriguing choice because it is formulated similarly to the PH and AFT models. In their model, they leave the baseline hazard rate function (HRF) unspecified. As an alternative to the PO or AFT models, their model relaxes the proportional hazard assumption while still allowing for the inclusion of both time-independent and time-dependent factors.

Although they offered an exploratory visual examination of the model’s suitability, they did not completely cover statistical model checking of the proposed model. Chen and Jewell [15] presented the AH model and its applicability to censored survival data. They used the AH model to analyze real data from a randomized clinical study of biodegradable carmustine polymers for the treatment of brain cancer. This analysis illustrated the model’s useful applications and the recommended test statistics.

The semi-parametric AH model estimators, on the other hand, include the unknown distribution in the asymptotic variance. Thus, numerically demanding approaches are required to make an inference about this parameter. As a result, Lee [16] suggested a straightforward estimation method for the semi-parametric AH model in which estimators are asymptotically normal with a distribution-free asymptotic variance. This also yields several lack-of-fit tests. These tests are similar to Gill–Schumacher tests in that the estimating functions are assessed at two separate weight functions, generating two estimators that are close to each other. They demonstrated that the estimators and tests perform well for some weight functions using numerical experiments. For more information about the estimators and tests for the semi-parametric AH model, we refer to [17].

Cox [1] pioneered the use of semi-parametric hazard-based regression models for univariate time-to-event data with the PH model. Rubio et al. [18] and Khan [19] presented two influential papers that propose the use of extended lifetime distributions to substitute the baseline hazard in a time-to-event analysis. The formulation of parametric hazard-based regression models is a central issue in Lawless [20]. The authors explored the benefits of using parametric hazard-based regression models. It is noticed that the baseline-modified distribution should be chosen based on its flexibility to incorporate varied failure rate shapes. A few examples include: Muse et al. [21], Muse et al. [22], Ashraful-Ul-Alam and Khan [23], Alvares and Rubio [24], Muse et al. [25], Al-aziz et al. [26], and Khan and Khosa [27].

Despite the numerous advantages of the semi-parametric AH framework, its use in applications appears to be restricted, owing to the technical difficulties in implementing the theoretical results. In particular, estimating the covariance matrices is challenging when the data are censored, because the asymptotic covariance matrices for the regression estimators in this model involve the unknown baseline HRF and its derivative. Numerically demanding approaches, such as resampling techniques, can be used to approximate the covariance matrices. However, they are inefficient in practical settings due to their high computing cost [28].

To address the aforementioned concerns, the current study presents a fully parametric hazard-based regression model within the AH framework. The fundamental idea is to represent the baseline hazard by using a generalized log-logistic (GLL) distribution that is closed under both the AFT [25] and PH [22] frameworks and can accommodate various hazard rate shapes, including monotone and non-monotone shapes. Another advantage of this baseline is that it encompasses some of the most popular parametric distributions used in reliability and survival studies, such as the log-logistic (LL), Burr XII (in both its 2-parameter and 3-parameter cases), Weibull, and exponential distributions. The combination of the tractability of parametric regression models with the adaptability of semi-parametric regression models is another appealing aspect of the suggested parametric AH model.

Thus, the main contribution of this study is to introduce and study a novel, flexible, parametric AH model for right-censored lifetime data with crossing survival curves. This is done by assuming the GLL lifetime distribution for the baseline hazard in the parametric AH model. To the best of the authors’ knowledge, the use of a parametric AH model with a GLL baseline hazard to extend the original semi-parametric AH model has never been considered in the literature. The methods are studied by using the classical and Bayesian frameworks for a more comprehensive presentation of models for all statistical audiences to consider. A detailed simulation study is also conducted, with one binary and one continuous covariate included in the model. The reader should be aware that the majority of the single covariate scenarios have been researched in prominent references, such as [8].

Additionally, the following are some significant benefits of the methodology proposed here.

i. It possesses the adaptability of parametric survival regression models.
ii. It offers a continuous SF that makes it simple to find where two survival curves cross.
iii. It allows different shapes for the HRF and has the tractability of a parametric survival regression model.

The remainder of the article is organized as follows. Section 2 discusses the formulation of the parametric AH model and associated probabilistic functions. Section 3 presents the baseline distribution under consideration, as well as alternative competing lifetime distributions, including some of its special cases. The proposed parametric AH model with GLL baseline distribution HRF and its submodels are presented in Section 4. Section 5 discusses the model inferential procedures. Section 6 presents the simulation studies. Section 7 analyzes a real-life, right-censored cancer dataset with crossing survival curves. Section 8 concludes the study with some closing remarks and suggests future research.

2. AH Model Formulation

In order to handle lifetime data with crossing hazard and survival curves, Chen and Wang [14] proposed a hazard-based regression model known as the AH model that is expressed as follows:

h (t; x) = h_{0} (t ψ (x^{'} β)) = h_{0} (t e^{x^{'} β}),

(1)

where

ψ (x^{'} β) = e^{x^{'} β}

is the link function of the explanatory variables,

x = (x_{1}, x_{2}, \dots, x_{p})

is a vector of covariates,

β^{'} = (β_{1}, β_{2}, \dots, β_{p})

is a vector of coefficients of regression, and

h_{0} (t)

corresponds to the baseline HRF.

In this model,

e^{x^{'} β}

characterizes how the explanatory variables in x change the time scale of the underlying HRF. In this case,

β < 0

or

β > 0

imply deceleration or acceleration of the HRF’s time scale, respectively. For example, if one explanatory variable has a value of 1 for a treatment group and a value of 0 for a control group, then

e^{β} = \frac{1}{2}

indicates that the HRF of the treatment group advances at half the speed of that of the control group. The same reasoning applies to

e^{β} = 2

, which indicates that the HRF of the treatment group advances twice as quickly as that of the control group. There are no differences between the groups when

e^{β} = 1

.
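The time-scale interpretation above can be checked numerically. The following is a minimal sketch, not the authors' code: the baseline HRF `baseline_hazard` is an illustrative non-monotone choice, and the function names are hypothetical. It shows that with e^{β} = 1/2 the treatment group reaches a given hazard level at twice the time needed by the control group, while both groups share the same hazard at t = 0.

```python
import math

def baseline_hazard(t):
    """Illustrative non-monotone baseline HRF, h0(t) = 2t / (1 + t^2) (an assumption)."""
    return 2.0 * t / (1.0 + t * t)

def ah_hazard(t, x, beta):
    """HRF of Equation (1) with a single covariate: h(t; x) = h0(t * exp(x * beta))."""
    return baseline_hazard(t * math.exp(x * beta))

beta = math.log(0.5)  # exp(beta) = 1/2, i.e., deceleration of the time scale

# The control group (x = 0) reaches the peak of h0 at t = 1;
# the treatment group (x = 1) reaches the same hazard value at t = 2.
peak_control = ah_hazard(1.0, 0, beta)
peak_treatment = ah_hazard(2.0, 1, beta)
print(peak_control, peak_treatment)
```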

The AH model offers some appealing and intriguing characteristics. The AH model, unlike the AFT and PH models, can handle the crossing of survival and hazard curves [29]. Furthermore, the AH framework enables both the control and treatment groups’ hazard curves to begin at the same time point. This is especially beneficial in randomized controlled trials, because it is more reasonable to hypothesize that the hazard or risk between groups is comparable at

t = 0

[30].

A limitation of the parametric AH model that is not shared by the AFT and PH models is its inability to accommodate a baseline HRF that is constant over time (e.g., the exponential distribution) [28]. As a result, before implementing this model, it is crucial to assess the non-constancy of the baseline HRF. In addition, the AH model coincides with the AFT and PH models when the baseline distribution is Weibull [31].

As an alternative, the cumulative hazard function (CHF) can be used to represent the parametric AH model as follows:

H (t; x) = H_{0} (t e^{x^{'} β}) e^{- x^{'} β},

(2)

where

H_{0} (t)

denotes the baseline CHF.

The other probabilistic functions for the parametric AH model, associated with Equation (2), can be expressed as follows.

The SF for the parametric AH model is

S (t; x) = {[S_{0} (t e^{x^{'} β})]}^{e^{- x^{'} β}},

(3)

where

S_{0} (t)

denotes the baseline SF. The cumulative distribution function (CDF) for the parametric AH model is

F (t; x) = 1 - {[S_{0} (t e^{x^{'} β})]}^{e^{- x^{'} β}} .

(4)

The probability density function (PDF) for the parametric AH model is

f (t; x) = h_{0} (t e^{x^{'} β}) {[S_{0} (t e^{x^{'} β})]}^{e^{- x^{'} β}} = f_{0} (t e^{x^{'} β}) {[S_{0} (t e^{x^{'} β})]}^{e^{- x^{'} β} - 1},

(5)

where

f_{0} (t)

denotes the baseline PDF.
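The relationships among the CHF, SF, CDF, and PDF of the parametric AH model can be verified numerically. The sketch below is an illustration under stated assumptions: the baseline functions `h0`, `H0`, and `S0` are an arbitrary non-constant choice, `lp` stands for the linear predictor x'β, and the PDF is obtained from the standard identity f = hS. It checks that S = exp(-H) and that the numerical derivative of the CDF matches the PDF.

```python
import math

def h0(t):
    """Illustrative baseline HRF (an assumption, not the paper's baseline)."""
    return 2.0 * t / (1.0 + t * t)

def H0(t):
    """Baseline CHF, the integral of h0 from 0 to t."""
    return math.log1p(t * t)

def S0(t):
    """Baseline SF, exp(-H0(t))."""
    return math.exp(-H0(t))

def ah_cumhaz(t, lp):
    """Equation (2): H(t; x) = H0(t e^{lp}) e^{-lp}."""
    return H0(t * math.exp(lp)) * math.exp(-lp)

def ah_survival(t, lp):
    """Equation (3): S(t; x) = [S0(t e^{lp})]^{e^{-lp}}."""
    return S0(t * math.exp(lp)) ** math.exp(-lp)

def ah_cdf(t, lp):
    """Equation (4): F(t; x) = 1 - S(t; x)."""
    return 1.0 - ah_survival(t, lp)

def ah_density(t, lp):
    """PDF from the identity f = h * S, with h given by Equation (1)."""
    return h0(t * math.exp(lp)) * ah_survival(t, lp)

t, lp, step = 1.3, 0.4, 1e-5
numeric_density = (ah_cdf(t + step, lp) - ah_cdf(t - step, lp)) / (2 * step)
print(ah_density(t, lp), numeric_density)
```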

3. Baseline Hazard

Standard parametric models using several prominent survival distributions are commonly used in survival data analysis. The LL distribution is one of the most commonly utilized in oncology research, owing to the flexibility of its HRF and the ease of estimating its parameters. Datasets in medical research frequently demand more advanced parametric models. To this end, the literature has introduced new classes of parametric distributions based on modifications of the LL distribution. Examples include the GLL distribution [32], Kumaraswamy LL (KuLL) distribution [33], heavy-tailed LL (HTLL) distribution [34], tan LL (TLL) distribution [35], a novel LL (NLL) distribution [36], arctan LL distribution [37], and an extended LL (ELL) distribution [38], among others [39].

For fully parametric hazard-based regression models, we must assume a parametric form for the baseline hazard. There are infinitely many options, and the appropriate one generally depends on the situation. In this paper, we analyze a general-purpose candidate, the GLL distribution presented by Khan and Khosa [27]. The GLL distribution is embedded in the AH framework and then contrasted with various baseline hazards that can accommodate different hazard rate shapes, as well as with some of its special cases.

The HRF and the CHF of the GLL distribution are expressed as follows:

h_{G L L} (t; θ) = \frac{α k {(k t)}^{α - 1}}{1 + {(η t)}^{α}}, t \geq 0, k, α, η > 0,

(6)

H_{G L L} (t; θ) = \frac{k^{α}}{η^{α}} log [1 + {(η t)}^{α}], t \geq 0, k, α, η > 0,

(7)

where

θ

represents the vector of the involved parameters.
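As a quick consistency check of Equations (6) and (7), the CHF should equal the integral of the HRF. The sketch below is illustrative only: the parameter values are hypothetical, and `simpson` is a plain composite Simpson rule written for this example.

```python
import math

def gll_hazard(t, alpha, k, eta):
    """GLL HRF of Equation (6)."""
    return alpha * k * (k * t) ** (alpha - 1) / (1.0 + (eta * t) ** alpha)

def gll_cumhaz(t, alpha, k, eta):
    """GLL CHF of Equation (7)."""
    return (k / eta) ** alpha * math.log1p((eta * t) ** alpha)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    width = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * width)
    return total * width / 3.0

alpha, k, eta = 2.0, 0.8, 1.2  # hypothetical parameter values
numeric = simpson(lambda u: gll_hazard(u, alpha, k, eta), 0.0, 2.0)
closed_form = gll_cumhaz(2.0, alpha, k, eta)
print(numeric, closed_form)
```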

The HRF in Equation (6) consists of different submodels of the GLL distribution [32]. These distributions are listed as follows:

Log-logistic (LL): when

k = η

, Equation (6) reduces to the HRF of the LL distribution, which is

h_{L L} (t; θ) = \frac{α k {(k t)}^{α - 1}}{1 + {(k t)}^{α}}, t \geq 0, k, α > 0 .

(8)

Burr-XII (BXII): when

η = 1

, Equation (6) reduces to the HRF of the BXII distribution, which is

h_{B X I I} (t; θ) = \frac{α k {(k t)}^{α - 1}}{1 + t^{α}}, t \geq 0, k, α > 0 .

(9)

Weibull (W): when

η \to 0

, Equation (6) reduces to the HRF of the W distribution, which is

h_{W} (t; θ) = α k {(k t)}^{α - 1}, t \geq 0, k, α > 0 .

(10)
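The three reductions listed above can be confirmed numerically. In this sketch (function names and parameter values are hypothetical), the Weibull limit η → 0 is approximated by a very small positive η.

```python
def gll_hazard(t, alpha, k, eta):
    """GLL HRF of Equation (6)."""
    return alpha * k * (k * t) ** (alpha - 1) / (1.0 + (eta * t) ** alpha)

def ll_hazard(t, alpha, k):
    """LL HRF of Equation (8)."""
    return alpha * k * (k * t) ** (alpha - 1) / (1.0 + (k * t) ** alpha)

def bxii_hazard(t, alpha, k):
    """BXII HRF of Equation (9)."""
    return alpha * k * (k * t) ** (alpha - 1) / (1.0 + t ** alpha)

def weibull_hazard(t, alpha, k):
    """Weibull HRF of Equation (10)."""
    return alpha * k * (k * t) ** (alpha - 1)

t, alpha, k = 1.5, 1.7, 0.6  # hypothetical evaluation point and parameters
print(gll_hazard(t, alpha, k, k), ll_hazard(t, alpha, k))
print(gll_hazard(t, alpha, k, 1.0), bxii_hazard(t, alpha, k))
print(gll_hazard(t, alpha, k, 1e-8), weibull_hazard(t, alpha, k))
```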

In this work, we compare the proposed baseline hazard to its submodels as well as three additional baseline hazard candidates that can accommodate both monotone and non-monotone hazard rate shapes: the power generalized Weibull (PGW) model [40], exponentiated Weibull (EW) model [41], and the generalized gamma (GG) model [42]. The corresponding distributions have comparable levels of adaptability and tractability. The following are the HRFs for the PGW, GG, and EW distributions, respectively:

h_{P G W} (t; θ) = \frac{α}{η k^{α}} t^{α - 1} {[1 + {(\frac{t}{k})}^{α}]}^{(\frac{1}{η} - 1)}, t \geq 0, k, α, η > 0,

(11)

h_{G G} (t; θ) = \frac{\frac{η}{Γ (\frac{α}{η}) k^{α}} t^{α - 1} e^{- {(\frac{t}{k})}^{η}}}{1 - \frac{γ (\frac{α}{η}, {(\frac{t}{k})}^{η})}{Γ (\frac{α}{η})}}, t \geq 0, k, α, η > 0,

(12)

where

γ (t, x)

and

Γ (x)

denote the incomplete and complete gamma functions, respectively, and

h_{E W} (t; θ) = \frac{α k η {(k t)}^{α - 1} {[1 - e^{- {(k t)}^{α}}]}^{η - 1} e^{- {(k t)}^{α}}}{1 - {[1 - e^{- {(k t)}^{α}}]}^{η}}, t \geq 0, k, α, η > 0 .

(13)

We also used the gamma (G) and log-normal (LN) distributions, two additional popular classical distributions used in survival and reliability research.

4. The Proposed Model

There are several approaches to expressing parametric hazard-based regression models. The AH model formulation is one such strategy. The GLL hazard-based regression model can be written in the context of the AH framework by substituting the exponential function for the link function in Equation (1). We recall that the HRF under the

AH

framework is computed as follows:

h (t) = h_{0} (t e^{x^{'} β}) .

We begin with the GLL baseline distribution HRF with parameters

α, η

, and k (with the AH model notations). The HRF with an explanatory variable vector x is as follows:

h (t; θ, β, x) = h_{0} (t e^{x^{'} β}; θ) = \frac{α k {(k t^{*})}^{α - 1}}{1 + {(η t^{*})}^{α}},

(14)

which is the GLL HRF evaluated at the rescaled time

t^{*} = t e^{x^{'} β}

. In addition, the other probabilistic functions for the GLL–AH model are expressed as follows.

The SF for the GLL–AH model is

S (t; θ, β, x) = {[S_{0} (t e^{x^{'} β}; θ)]}^{e^{- x^{'} β}} = {[1 + {(η t e^{x^{'} β})}^{α}]}^{- \frac{k^{α} e^{- x^{'} β}}{η^{α}}} .

(15)

The CDF for the GLL–AH model is

F (t; θ, β, x) = 1 - {[S_{0} (t e^{x^{'} β}; θ)]}^{e^{- x^{'} β}} = 1 - {[1 + {(η t e^{x^{'} β})}^{α}]}^{- \frac{k^{α} e^{- x^{'} β}}{η^{α}}} .

(16)

The CHF for the GLL–AH model is

H (t; θ, β, x) = H_{0} (t e^{x^{'} β}; θ) e^{- x^{'} β} = (\frac{k^{α}}{η^{α}} log [1 + {(η t e^{x^{'} β})}^{α}]) e^{- x^{'} β} .

(17)

The PDF for the GLL–AH model is

f (t; θ, β, x) = h_{0} (t e^{x^{'} β}; θ) {[S_{0} (t e^{x^{'} β}; θ)]}^{e^{- x^{'} β}} = \frac{α k {(k t e^{x^{'} β})}^{α - 1}}{{[1 + {(η t e^{x^{'} β})}^{α}]}^{\frac{k^{α} e^{- x^{'} β}}{η^{α}} + 1}} .

(18)
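Because the GLL–AH survival function is available in closed form, the time at which two survival curves cross can be located with a simple root finder. The sketch below is an illustration, not the authors' procedure: it computes S = exp(-H) with H from Equation (17) for one binary covariate, uses hypothetical parameter values, and applies a hand-written bisection that assumes the bracket contains exactly one crossing.

```python
import math

def gll_ah_survival(t, x, beta, alpha, k, eta):
    """SF of the GLL-AH model as exp(-H), with H from Equation (17); single covariate."""
    lp = x * beta
    cumhaz = ((k / eta) ** alpha * math.exp(-lp)
              * math.log1p((eta * t * math.exp(lp)) ** alpha))
    return math.exp(-cumhaz)

def crossing_time(beta, alpha, k, eta, lo, hi, tol=1e-12):
    """Bisection on S(t; x=1) - S(t; x=0); requires a sign change on [lo, hi]."""
    def diff(t):
        return (gll_ah_survival(t, 1, beta, alpha, k, eta)
                - gll_ah_survival(t, 0, beta, alpha, k, eta))
    f_lo = diff(lo)
    if f_lo * diff(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = diff(mid)
        if f_lo * f_mid > 0:
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical values: alpha = 2 and exp(beta) = 2
alpha, k, eta, beta = 2.0, 0.5, 1.0, math.log(2.0)
t_cross = crossing_time(beta, alpha, k, eta, lo=0.1, hi=10.0)
print(t_cross)
```

For these particular values the crossing condition reduces to 1 + 4(ηt)^2 = [1 + (ηt)^2]^2, so the crossing time is \sqrt{2}/η, which the bisection reproduces. A crossing need not exist for every parameter combination, in which case the function raises an error.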

4.1. Submodels

The proposed parametric hazard-based GLL–AH model framework has three submodels that are also closed under the AH framework.

4.1.1. Submodel I: $η = 1$

If we put

η = 1

in Equation (14), we get the HRF of the BXII–AH model, which is expressed mathematically as

h (t; x) = \frac{α k {(k t e^{x^{'} β})}^{α - 1}}{1 + {(t e^{x^{'} β})}^{α}} .

(19)

4.1.2. Submodel II: $η = k$

If we put

η = k

in Equation (14), we obtain the HRF of the LL–AH model, which is stated mathematically as

h (t; x) = \frac{α k {(k t e^{x^{'} β})}^{α - 1}}{1 + {(k t e^{x^{'} β})}^{α}} .

(20)

4.1.3. Submodel III: $η^{α} \to 0$

If we put

η^{α} \to 0

in Equation (14), we obtain the HRF of the W–AH model, which is stated mathematically as

h (t; x) = α k {(k t e^{x^{'} β})}^{α - 1} .

(21)

5. Inferential Procedures

In this section, the parameters of the proposed parametric AH model with GLL baseline distribution HRF are estimated by using a classical approach (via the maximum likelihood estimation (MLE) method) and Bayesian inference using noninformative priors.

5.1. Classical Approach

In this subsection, we consider the full likelihood function for the proposed parametric AH model. The likelihood function is an important component not only in the Bayesian approach but also in classical inference, in which the standard approach for estimating parameters involves maximizing it. We assume noninformative and independent right censoring.

Suppose there are n individuals with survival times denoted by

T_{1}, T_{2}, \dots, T_{n}

. Assuming that the data are subject to right censoring, we observe

t_{i} = min (T_{i}, R C_{i})

, where

R C_{i} > 0

is the censoring time for individual i. Let

δ_{i} = I (T_{i} \leq R C_{i})

be the event indicator, which equals 1 if

T_{i} \leq R C_{i}

and 0 otherwise. The observed data for individual i then consist of

\{t_{i}, δ_{i}, x_{i}\}, i = 1, 2, \dots, n

, where

t_{i}

is a survival time or censoring time according to whether

δ_{i} = 1

or 0, respectively, and

x_{i} = {(x_{i 1}, x_{i 2}, \dots, x_{i p})}^{'}

is a

p \times 1

column vector of external covariates.

When considering a parametric AH model, the censored likelihood function can be written as follows:

\begin{matrix} L (θ, β; D) & = \prod_{i = 1}^{n} {[f (t_{i}; θ, β, x_{i})]}^{δ_{i}} {[S (t_{i}; θ, β, x_{i})]}^{1 - δ_{i}} \\ & = \prod_{i = 1}^{n} {[h (t_{i}; θ, β, x_{i}) S (t_{i}; θ, β, x_{i})]}^{δ_{i}} {[S (t_{i}; θ, β, x_{i})]}^{1 - δ_{i}} \\ & = \prod_{i = 1}^{n} {[h (t_{i}; θ, β, x_{i})]}^{δ_{i}} S (t_{i}; θ, β, x_{i}) \\ & = \prod_{i = 1}^{n} {[h (t_{i}; θ, β, x_{i})]}^{δ_{i}} exp [- H (t_{i}; θ, β, x_{i})] \\ & = \prod_{i = 1}^{n} {[h_{0} (t_{i} e^{x_{i}^{'} β}; θ)]}^{δ_{i}} exp [- H_{0} (t_{i} e^{x_{i}^{'} β}; θ) e^{- x_{i}^{'} β}], \end{matrix}

(22)

where

D = (t_{i}, δ_{i}, x_{i}, i = 1, 2, \dots, n)

represents the observed data, which include the observed times, censoring indicators, and covariates. In our expression, we recall that

θ

is the vector of baseline distributional parameters, and

β

is the vector of regression coefficients. An iterative optimization approach (e.g., the Newton–Raphson algorithm) can be used to obtain the MLE. Because the MLEs are asymptotically normal, hypothesis tests and confidence intervals for the model parameters can be constructed.

The log-likelihood function is expressed as follows:

ℓ (θ, β; D) = \sum_{i = 1}^{n} δ_{i} log [h_{0} (t_{i} e^{x_{i}^{'} β}; θ)] - \sum_{i = 1}^{n} H_{0} (t_{i} e^{x_{i}^{'} β}; θ) e^{- x_{i}^{'} β} .

(23)

The GLL–AH model’s full log-likelihood function is expressed as follows:

\begin{matrix} ℓ (θ, β; D) & = \sum_{i = 1}^{n} δ_{i} log (α) + \sum_{i = 1}^{n} δ_{i} α log (k) + (α - 1) \sum_{i = 1}^{n} δ_{i} log (t_{i} e^{x_{i}^{'} β}) \\ - \sum_{i = 1}^{n} δ_{i} log [1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}] - {(\frac{k}{η})}^{α} \sum_{i = 1}^{n} e^{- x_{i}^{'} β} log [1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}] . \end{matrix}

(24)
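The expanded log-likelihood in Equation (24) can be checked against the generic form in Equation (23). The following sketch is illustrative: the toy dataset and parameter values are hypothetical, and each observation is stored as a tuple (time, event indicator, covariate vector).

```python
import math

def gll_hazard(t, alpha, k, eta):
    """GLL HRF of Equation (6)."""
    return alpha * k * (k * t) ** (alpha - 1) / (1.0 + (eta * t) ** alpha)

def gll_cumhaz(t, alpha, k, eta):
    """GLL CHF of Equation (7)."""
    return (k / eta) ** alpha * math.log1p((eta * t) ** alpha)

def loglik_generic(alpha, k, eta, beta, data):
    """Equation (23): sum of delta * log h0(t*) minus H0(t*) * exp(-x'beta)."""
    total = 0.0
    for t, delta, x in data:
        lp = sum(b * xj for b, xj in zip(beta, x))
        ts = t * math.exp(lp)  # rescaled time t* = t exp(x'beta)
        total += delta * math.log(gll_hazard(ts, alpha, k, eta))
        total -= gll_cumhaz(ts, alpha, k, eta) * math.exp(-lp)
    return total

def loglik_expanded(alpha, k, eta, beta, data):
    """Equation (24), written term by term."""
    total = 0.0
    for t, delta, x in data:
        lp = sum(b * xj for b, xj in zip(beta, x))
        log_term = math.log1p((eta * t * math.exp(lp)) ** alpha)
        total += delta * (math.log(alpha) + alpha * math.log(k)
                          + (alpha - 1) * (math.log(t) + lp) - log_term)
        total -= (k / eta) ** alpha * math.exp(-lp) * log_term
    return total

# Hypothetical toy data: (time, event indicator, (x1, x2))
data = [(0.4, 1, (0, -0.3)), (1.1, 1, (1, 0.8)), (2.3, 0, (0, 1.2)),
        (0.9, 1, (1, -1.1)), (3.0, 1, (0, 0.1)), (1.7, 0, (1, 0.5))]
beta = (0.75, 0.5)
generic = loglik_generic(1.5, 0.8, 1.2, beta, data)
expanded = loglik_expanded(1.5, 0.8, 1.2, beta, data)
print(generic, expanded)
```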

To obtain the MLE of

θ^{'} = (k, α, η)

, and

β

, we can directly maximize Equation (24) with respect to

(k, α, η)

, and

β

. Alternatively, we can set the first derivatives of the log-likelihood function to zero and solve the resulting nonlinear equations.

With this aim, let us set

φ = (k, α, η, β)

. Then the first derivatives of the log-likelihood function are as follows:

\begin{matrix} \frac{\partial ℓ (φ)}{\partial α} & = \frac{1}{α} \sum_{i = 1}^{n} δ_{i} + \sum_{i = 1}^{n} δ_{i} log (k) + \sum_{i = 1}^{n} δ_{i} log (t_{i} e^{x_{i}^{'} β}) \\ - \sum_{i = 1}^{n} δ_{i} \frac{{(η t_{i} e^{x_{i}^{'} β})}^{α} log (η t_{i} e^{x_{i}^{'} β})}{1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}} \\ - {(\frac{k}{η})}^{α} log (k) \sum_{i = 1}^{n} e^{- x_{i}^{'} β} log [1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}] \\ + {(\frac{k}{η})}^{α} log (η) \sum_{i = 1}^{n} e^{- x_{i}^{'} β} log [1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}] \\ - {(\frac{k}{η})}^{α} \sum_{i = 1}^{n} \frac{e^{- x_{i}^{'} β} {(η t_{i} e^{x_{i}^{'} β})}^{α} log (η t_{i} e^{x_{i}^{'} β})}{1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}}, \end{matrix}

(25)

\begin{matrix} \frac{\partial ℓ (φ)}{\partial η} & = - (\frac{α}{η}) \sum_{i = 1}^{n} δ_{i} \frac{{(η t_{i} e^{x_{i}^{'} β})}^{α}}{1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}} \\ + (\frac{α}{η}) {(\frac{k}{η})}^{α} \sum_{i = 1}^{n} e^{- x_{i}^{'} β} log [1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}] \\ - (\frac{α}{η}) {(\frac{k}{η})}^{α} \sum_{i = 1}^{n} \frac{e^{- x_{i}^{'} β} {(η t_{i} e^{x_{i}^{'} β})}^{α}}{1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}}, \end{matrix}

(26)

\begin{matrix} \frac{\partial ℓ (φ)}{\partial k} = (\frac{α}{k}) \sum_{i = 1}^{n} δ_{i} - (\frac{α}{k}) {(\frac{k}{η})}^{α} \sum_{i = 1}^{n} e^{- x_{i}^{'} β} log [1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}] \end{matrix}

(27)

and

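\begin{matrix} \frac{\partial ℓ (φ)}{\partial β_{j}} & = (α - 1) \sum_{i = 1}^{n} δ_{i} x_{i j} - α \sum_{i = 1}^{n} δ_{i} x_{i j} \frac{{(η t_{i} e^{x_{i}^{'} β})}^{α}}{1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}} \\ & + {(\frac{k}{η})}^{α} \sum_{i = 1}^{n} x_{i j} e^{- x_{i}^{'} β} log [1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}] \\ & - α {(\frac{k}{η})}^{α} \sum_{i = 1}^{n} x_{i j} e^{- x_{i}^{'} β} \frac{{(η t_{i} e^{x_{i}^{'} β})}^{α}}{1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}} . \end{matrix}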

(28)
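One consequence of Equation (27) is worth noting: setting it to zero gives a closed-form value of k for fixed α, η, and β, namely k = η [\sum δ_{i} / \sum e^{- x_{i}^{'} β} log (1 + {(η t_{i} e^{x_{i}^{'} β})}^{α})]^{1 / α}. The sketch below checks this on hypothetical toy data with a single covariate; the function names are illustrative and this profiling step is our observation rather than a procedure described by the authors. It requires at least one uncensored observation.

```python
import math

def log_term(t, lp, alpha, eta):
    """log[1 + (eta * t * exp(lp))^alpha], shared by the likelihood and the score."""
    return math.log1p((eta * t * math.exp(lp)) ** alpha)

def loglik(alpha, k, eta, beta, data):
    """Equation (24) for a single covariate."""
    total = 0.0
    for t, delta, x in data:
        lp = beta * x
        lt = log_term(t, lp, alpha, eta)
        total += delta * (math.log(alpha) + alpha * math.log(k)
                          + (alpha - 1) * (math.log(t) + lp) - lt)
        total -= (k / eta) ** alpha * math.exp(-lp) * lt
    return total

def weighted_sum(alpha, eta, beta, data):
    """Sum over i of exp(-x_i beta) * log[1 + (eta t_i exp(x_i beta))^alpha]."""
    return sum(math.exp(-beta * x) * log_term(t, beta * x, alpha, eta)
               for t, _, x in data)

def score_k(alpha, k, eta, beta, data):
    """Equation (27): derivative of the log-likelihood with respect to k."""
    events = sum(delta for _, delta, _ in data)
    return (alpha / k) * (events - (k / eta) ** alpha
                          * weighted_sum(alpha, eta, beta, data))

def profile_k(alpha, eta, beta, data):
    """Root of Equation (27) in k for fixed alpha, eta, and beta."""
    events = sum(delta for _, delta, _ in data)
    return eta * (events / weighted_sum(alpha, eta, beta, data)) ** (1.0 / alpha)

# Hypothetical toy data: (time, event indicator, binary covariate)
data = [(0.4, 1, 0), (1.1, 1, 1), (2.3, 0, 0), (0.9, 1, 1), (3.0, 1, 0), (1.7, 0, 1)]
alpha, eta, beta = 1.5, 0.9, 0.3
k_hat = profile_k(alpha, eta, beta, data)
print(k_hat, score_k(alpha, k_hat, eta, beta, data))
```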

5.2. Bayesian Approach

In this subsection, the prior distributions for the parameters of the proposed model are first established, and these distributions are then multiplied by the likelihood function to create the Bayesian model.

5.2.1. Prior Distribution

The formulation of a prior distribution is a crucial step in every Bayesian approach. This is especially true for fully parametric survival regression models. Because we lack prior knowledge from historical data or from prior experiments, we use independent noninformative gamma priors, Gamma

(10, 10)

, for the baseline distribution parameters. Gamma distributions are flexible and include noninformative priors (uniform). The marginal prior distribution for each regression coefficient is taken as a normal distribution centered at zero with a large known variance, N

(0, 100)

. Numerous articles in the literature, such as [19,22,24,25,26,43], use these priors. Here, we consider

π (α) \sim G (a_{1}, b_{1}) = \frac{b_{1}^{a_{1}}}{Γ (a_{1})} α^{a_{1} - 1} e^{- b_{1} α}; a_{1}, b_{1}, α > 0,

(29)

π (η) \sim G (a_{2}, b_{2}) = \frac{b_{2}^{a_{2}}}{Γ (a_{2})} η^{a_{2} - 1} e^{- b_{2} η}; a_{2}, b_{2}, η > 0,

(30)

π (k) \sim G (a_{3}, b_{3}) = \frac{b_{3}^{a_{3}}}{Γ (a_{3})} k^{a_{3} - 1} e^{- b_{3} k}; a_{3}, b_{3}, k > 0 .

(31)

From historical data of the baseline distribution, it is simple to determine the hyperparameter values of the prior distributions [32]. When the regression coefficients are assigned a normal prior distribution, we have the following:

π (β^{'}) \sim N (a_{4}, b_{4}) .

(32)

The joint prior distribution of all unknown parameters has a PDF given by

π (α, k, η, β^{'}) = π (α) π (η) π (k) π (β^{'}) .

(33)

5.2.2. Likelihood Function

The likelihood function for the GLL–AH model is computed as follows:

\begin{matrix} L_{G L L - A H} (θ, β; D) & = \prod_{i = 1}^{n} {[h_{0} (t_{i} e^{x_{i}^{'} β}; θ)]}^{δ_{i}} exp [- H_{0} (t_{i} e^{x_{i}^{'} β}; θ) e^{- x_{i}^{'} β}] \\ & = \prod_{i = 1}^{n} {[\frac{α k {(k t_{i} e^{x_{i}^{'} β})}^{α - 1}}{1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}}]}^{δ_{i}} exp [- \{\frac{k^{α}}{η^{α}} log [1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}]\} e^{- x_{i}^{'} β}] . \end{matrix}

(34)

5.2.3. Posterior Distribution

The joint posterior PDF is expressed as the multiplication of the likelihood function in Equation (34) and the prior distribution in Equation (33):

\begin{matrix} p (α, k, η, β; t) & \propto \prod_{i = 1}^{n} {[\frac{α k {(k t_{i} e^{x_{i}^{'} β})}^{α - 1}}{1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}}]}^{δ_{i}} \\ exp [- \{\frac{k^{α}}{η^{α}} log [1 + {(η t_{i} e^{x_{i}^{'} β})}^{α}]\} e^{- x_{i}^{'} β}] \\ \times π (α, k, η, β^{'}), \end{matrix}

(35)

where the last factor on the right-hand side is the joint prior of the unknown parameters, that is, the product of the four prior terms in Equation (33).

The joint posterior PDF is analytically intractable because the required integrals are difficult to evaluate. Therefore, inference is based on Markov chain Monte Carlo (MCMC) simulation methods, such as the Gibbs sampler and Metropolis–Hastings algorithms, which generate samples from which features of the relevant marginal distributions can be inferred.
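To make the sampling step concrete, the following is a minimal random-walk Metropolis sketch for the posterior in Equation (35). It is an illustration under stated assumptions and not the sampler used in the article: the toy data, step size, starting point, and function names are hypothetical; a single covariate is used; the positive parameters are sampled on the log scale (hence the Jacobian term); and N(0, 100) is read as a normal prior with variance 100.

```python
import math
import random

# Hypothetical toy data: (time, event indicator, binary covariate)
data = [(0.4, 1, 0), (1.1, 1, 1), (2.3, 0, 0), (0.9, 1, 1), (3.0, 1, 0), (1.7, 0, 1)]

def log_likelihood(alpha, k, eta, beta, data):
    """Logarithm of Equation (34) for a single covariate."""
    total = 0.0
    for t, delta, x in data:
        lp = beta * x
        ts = t * math.exp(lp)
        log_term = math.log1p((eta * ts) ** alpha)
        total += delta * (math.log(alpha * k) + (alpha - 1) * math.log(k * ts) - log_term)
        total -= (k / eta) ** alpha * math.exp(-lp) * log_term
    return total

def log_posterior(u, data):
    """Unnormalized log posterior in u = (log alpha, log k, log eta, beta)."""
    alpha, k, eta = (math.exp(v) for v in u[:3])
    beta = u[3]
    # Gamma(10, 10) priors, up to an additive constant
    log_prior = sum(9.0 * math.log(v) - 10.0 * v for v in (alpha, k, eta))
    # N(0, 100) prior on beta, with 100 taken as the variance (an assumption)
    log_prior += -0.5 * beta ** 2 / 100.0
    jacobian = u[0] + u[1] + u[2]  # change of variables to the log scale
    return log_likelihood(alpha, k, eta, beta, data) + log_prior + jacobian

def metropolis(data, n_iter=2000, step=0.15, seed=1):
    """Random-walk Metropolis with independent Gaussian proposals."""
    rng = random.Random(seed)
    u = [0.0, 0.0, 0.0, 0.0]
    current = log_posterior(u, data)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [v + rng.gauss(0.0, step) for v in u]
        candidate = log_posterior(proposal, data)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            u, current = proposal, candidate
            accepted += 1
        draws.append((math.exp(u[0]), math.exp(u[1]), math.exp(u[2]), u[3]))
    return draws, accepted / n_iter

draws, acceptance_rate = metropolis(data)
posterior_mean_beta = sum(d[3] for d in draws) / len(draws)
print(acceptance_rate, posterior_mean_beta)
```

In practice, one would discard a burn-in period, tune the step size, and check convergence with several chains before summarizing the draws.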

6. Simulation Study

In this section, we offer a thorough Monte Carlo (MC) simulation analysis to assess how well the suggested model performs in terms of estimating the parameters of the baseline distribution and the regression coefficients. There are two inferential techniques used in the analysis.

I. Procedure I: maximum likelihood estimation.
II. Procedure II: Bayesian estimation with independent noninformative gamma priors for the baseline distribution parameters and a noninformative normal prior for the regression coefficients.

Two explanatory variables in an AH regression framework were considered in all simulations: one binary covariate

(x_{1})

generated from Bernoulli

(0.5)

distribution and one continuous covariate

(x_{2})

generated from the standard normal distribution. Regression parameter values were chosen to be

β = (0.75, 0.5)

corresponding to the covariate vector

x = {(x_{1}, x_{2})}^{'}

.

The GLL baseline distribution hazard was used to generate the survival data, and the exponential distribution with a rate parameter equal to the censoring proportion of 10% was used to generate the censoring times.
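Survival times from the GLL–AH model can be generated by inverting the survival function S = exp(-H), with H from Equation (17): solving S(t; x) = u gives t = e^{- x^{'} β} η^{-1} {[u^{- {(η / k)}^{α} e^{x^{'} β}} - 1]}^{1 / α}. The sketch below illustrates this inverse-transform approach; it is not the authors' simulation code. The baseline parameter values, the seed, and the value passed as `censor_rate` are placeholders, and the censoring rate would need tuning to reach a target censoring proportion.

```python
import math
import random

def gll_ah_survival(t, lp, alpha, k, eta):
    """SF of the GLL-AH model, exp(-H) with H from Equation (17); lp = x'beta."""
    return (1.0 + (eta * t * math.exp(lp)) ** alpha) ** (-(k / eta) ** alpha * math.exp(-lp))

def gll_ah_quantile(u, lp, alpha, k, eta):
    """Solve S(t; x) = u for t, with 0 < u <= 1."""
    power = (eta / k) ** alpha * math.exp(lp)
    return math.exp(-lp) / eta * (u ** (-power) - 1.0) ** (1.0 / alpha)

def simulate(n, beta, alpha, k, eta, censor_rate, seed=2022):
    """Return n rows (observed time, event indicator, x1, x2)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0      # Bernoulli(0.5) covariate
        x2 = rng.gauss(0.0, 1.0)                 # standard normal covariate
        lp = beta[0] * x1 + beta[1] * x2
        u = 1.0 - rng.random()                   # uniform on (0, 1]
        event_time = gll_ah_quantile(u, lp, alpha, k, eta)
        censor_time = rng.expovariate(censor_rate)
        delta = 1 if event_time <= censor_time else 0
        rows.append((min(event_time, censor_time), delta, x1, x2))
    return rows

# beta follows the text; the baseline values and censoring rate are placeholders
sample = simulate(n=200, beta=(0.75, 0.5), alpha=1.5, k=1.0, eta=1.0, censor_rate=0.10)
censored_share = 1.0 - sum(row[1] for row in sample) / len(sample)
print(censored_share)
```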

We were particularly interested in the performance and accuracy of the proposed model’s estimators in the simulation exercise, specifically the bias, standard error, and mean square error. The simulation’s findings were derived from 500 replications with sample sizes of 50, 100, 300, and 500 for each parameter value. The results are shown in Table 1, which includes the mean estimate (est), standard error (SE), average bias (AB), mean square error (MSE), and coverage probability for both inferential techniques. The estimates’ averages are extremely close, and generally, the AB and MSE decrease as the sample size increases. Additionally, as the sample size increases, the estimates of all evaluated parameters perform better. We also note that, compared to the MLEs, the Bayesian estimates have a lower SE.
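The performance measures named above can be computed from the replicate estimates as in the sketch below. The definitions are common conventions and are assumptions here, since the section does not state the exact formulas: AB is taken as the mean estimate minus the true value, SE as the empirical standard deviation of the estimates, and MSE as the mean squared deviation from the true value. The input values are hypothetical.

```python
import math

def summarize(estimates, true_value):
    """Mean estimate, empirical SE, average bias, and MSE over replications."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    variance = sum((e - mean_est) ** 2 for e in estimates) / (n - 1)
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return {"est": mean_est, "SE": math.sqrt(variance),
            "AB": mean_est - true_value, "MSE": mse}

# Hypothetical replicate estimates of a coefficient whose true value is 0.75
summary = summarize([0.70, 0.80, 0.75, 0.75], true_value=0.75)
print(summary)
```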

Similar results were obtained from a simulation analysis with around 20% censored observations for each dataset (data not shown). In conclusion, our simulation work has shown that the suggested parametric AH model may prove to be a highly helpful parametric hazard-based regression model to accurately represent survival data with or without crossover survival curves.

7. Applications

This section examines a right-censored dataset from an oncology clinical trial with crossing survival curves to show how the proposed parametric AH model can be used to model such lifetime data. First, a classical analysis based on the MLE technique is performed: the GLL–AH model is compared with its competing models, namely the PH, PO, and AFT models, and regression analyses are conducted by using the proposed baseline hazard (GLL) and the power generalized Weibull (PGW), generalized gamma (GG), exponentiated Weibull (EW), log-logistic (LL), and Weibull (W) distributions as baselines for the AH model. The fits are compared by using information criteria: the Akaike information criterion (AIC), consistent AIC (CAIC), and Hannan–Quinn information criterion (HQIC). A Bayesian analysis of the GLL–AH model, its competing models, and its submodels is then carried out with the Rstan package.

7.1. Gastric Cancer Dataset

We consider the gastric cancer dataset of the Gastrointestinal Tumor Study Group (1982). This dataset has frequently been used in studies of crossing survival curves; examples include Demarqui and Mayrink [6] and Diao et al. [8]. The dataset is freely accessible under the label "gastric" in the R package AmoudSurv [44].

This oncology clinical trial includes 90 patients who were diagnosed with locally advanced gastric cancer. The patients were randomly assigned to the following groups: (i) a control group of 45 patients who received chemotherapy; and (ii) a treatment group of 45 patients who received radiation therapy along with chemotherapy. The patients were followed for around 5 years. For each patient, three variables are reported in the dataset: the response time, which is either the failure time (time to death) or the right-censoring time (the censoring proportion in this dataset is around 12.22%); a binary failure indicator, which identifies patients who experienced the event of interest; and a binary group indicator identifying the type of treatment.

Figure 1 shows the overall survival curve for the gastric cancer dataset as well as the survival curves for the two types of therapy (chemotherapy vs. chemotherapy combined with radiotherapy) used to treat locally unresectable gastric cancer. Close inspection reveals that the two curves cross, which motivates the use of the AH model for this data analysis. The fundamental non-parametric plots for the survival time of the gastric cancer dataset are presented in Figure 2.
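To make the notion of crossing survival curves concrete, the following sketch implements the Kaplan–Meier (product-limit) estimator in plain Python and checks whether two arms cross. The two arms are hypothetical toy data, not the gastric cancer dataset, and the helper names are illustrative.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Group all observations tied at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed   # deaths and censored cases leave the risk set
    return curve

def surv_at(curve, t):
    """Evaluate the right-continuous step function S(t)."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
    return s

# Hypothetical arms: A has early deaths, B has later but faster deaths.
arm_a = kaplan_meier([1, 2, 3, 10, 12], [1, 1, 1, 1, 1])
arm_b = kaplan_meier([4, 5, 6, 7, 8], [1, 1, 1, 1, 1])
early_gap = surv_at(arm_a, 3) - surv_at(arm_b, 3)
late_gap = surv_at(arm_a, 9) - surv_at(arm_b, 9)
curves_cross = early_gap * late_gap < 0   # a sign change means the curves cross
```

A sign change in the difference between the two estimated curves is only a descriptive indication of crossing; it does not by itself quantify sampling uncertainty.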

7.2. Classical Analysis

The maximum likelihood estimates (MLEs) of the baseline distribution parameters and regression coefficients for the proposed AH model with different baseline distributions and for other survival regression models with the GLL baseline distribution are provided in Table 2 and Table 3.

Table 2 summarizes the results for the GLL–AH model and other survival regression models, including the PH, PO, and AFT models, all with the GLL baseline distribution. The GLL–AH model has the lowest AIC, CAIC, and HQIC values, which indicates that it outperforms its competing models.

Table 3 summarizes the results for the GLL–AH model and other AH models with different baseline distributions. Again, the GLL–AH model has the lowest AIC, CAIC, and HQIC values, so it outperforms the rival AH models.
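As a small worked check of the information criteria, the sketch below uses the standard definitions AIC $= -2\ell + 2p$ and HQIC $= -2\ell + 2p\log(\log n)$, where $\ell$ is the maximized log-likelihood, $p$ the number of parameters, and $n$ the sample size. The log-likelihood is backed out from the AIC reported for GLL–AH in Table 2 (with $p = 4$ and $n = 90$), which is an assumption about how the table was produced. The CAIC is left out because its definition varies across the literature and is not restated in this section.

```python
import math

def aic(loglik, p):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * p

def hqic(loglik, p, n):
    """Hannan-Quinn information criterion."""
    return -2.0 * loglik + 2.0 * p * math.log(math.log(n))

n, p = 90, 4                              # 90 patients; beta, alpha, k, eta
reported_aic = 244.318                    # GLL-AH row of Table 2
loglik = -(reported_aic - 2.0 * p) / 2.0  # implied maximized log-likelihood
hqic_value = hqic(loglik, p, n)           # compare with the reported 248.351
```

Under these assumptions the computed HQIC agrees with the reported value up to rounding.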

7.3. Likelihood Ratio Test

The proposed AH model with the GLL baseline distribution is compared to its submodels, which include the log-logistic AH, Burr-XII AH, and Weibull AH models, by using the likelihood ratio test (LRT). The LRT assesses whether reducing the number of parameters in a model significantly worsens its fit to the data. In Table 4, the test statistics and related p-values indicate that the GLL–AH model fits the gastric dataset with crossing survival curves better than its submodels.
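The p-values of the LRT can be reproduced from the test statistics with a chi-square reference distribution. The sketch below assumes one degree of freedom for each comparison, since each submodel imposes a single restriction on the GLL–AH model; for the Weibull limiting case this reference distribution should be read as an approximation. The statistics are those reported in Table 4, and the function name is illustrative.

```python
import math

def chi2_sf_1df(x):
    """Upper tail P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# LRT statistics as reported in Table 4.
lrt_stats = {
    "GLL-AH vs. BXII-AH": 6.999,
    "GLL-AH vs. LL-AH": 5.347,
    "GLL-AH vs. W-AH": 14.533,
}
p_values = {name: chi2_sf_1df(stat) for name, stat in lrt_stats.items()}
```

With the usual 5% level, all three restrictions are rejected, in line with the p-values reported in Table 4.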

7.4. Bayesian Analysis

We used Bayesian analysis to compare the proposed GLL–AH model with its competing models, such as the GLL–PH, GLL–PO, and GLL–AFT models, and some of its submodels, including the LL–AH, BXII–AH, and W–AH regression models. The baseline distribution parameters are assumed to have independent gamma priors, $α \sim G(a_{1}, b_{1})$, $η \sim G(a_{2}, b_{2})$, and $k \sim G(a_{3}, b_{3})$, with hyperparameter values $a_{1} = b_{1} = a_{2} = b_{2} = a_{3} = b_{3} = 10$, and the regression coefficients $β$ are assigned a noninformative normal prior, $N(0, 100)$. The Rstan package was utilized for our analysis [45].

7.4.1. Numerical Summary

In this section, we use the McMC samples to examine several posterior characteristics of interest and their numerical values. Table 5 reports them for the proposed fully parametric AH model and for the PO, AFT, and PH models with the GLL baseline distribution. Table 6 reports them for the submodels of the GLL–AH model.

7.4.2. Visual Summary

Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 provide the trace and autocorrelation (AC) plots for the baseline distribution parameters and regression coefficients of the proposed AH model and its submodels, plus other competing survival regression models, including the GLL–PH, GLL–PO, and GLL–AFT models, indicating convergence of the chains.

7.4.3. Posterior Predictive Checks

A fitted Bayesian parametric hazard-based regression model is considered adequate if it predicts future observations that are consistent with the current data. Posterior predictive check (PPC) plots, produced with the bayesplot R package, are used to visually evaluate model fit. The PPC in Figure 10 shows that the GLL–AH model fits the data quite well.

7.4.4. McMC Convergence Diagnostics

We applied both numerical and visual methods to evaluate the convergence of the McMC algorithm for the proposed models and their special cases. The summary results in Table 5 and Table 6 indicate that the McMC algorithm (HMC-NUTS) has converged to the joint posterior distribution: the potential scale reduction factor $\hat{R}$ is close to 1, the effective sample size $N_{eff}$ is greater than 400, and the MC error (SE) is less than 5% of the posterior standard deviation for all parameters.
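The three numerical criteria can be checked mechanically. The sketch below applies them to the GLL–AH rows of Table 5; the tolerance of 1.01 for $\hat{R}$ is an assumed operational reading of "equal to 1", and the dictionary layout and function name are illustrative. It also compares the reported MC error with the usual approximation SD$/\sqrt{N_{eff}}$.

```python
import math

# GLL-AH rows of Table 5: MC error (SE), posterior SD, N_eff, and R-hat.
gll_ah = {
    "beta":  {"mcse": 0.009, "sd": 0.476, "n_eff": 2684, "rhat": 1.001},
    "alpha": {"mcse": 0.002, "sd": 0.106, "n_eff": 3097, "rhat": 1.002},
    "k":     {"mcse": 0.004, "sd": 0.196, "n_eff": 2714, "rhat": 1.001},
    "eta":   {"mcse": 0.003, "sd": 0.191, "n_eff": 3023, "rhat": 1.001},
}

def passes_checks(row, rhat_tol=1.01, min_n_eff=400, mcse_frac=0.05):
    """Apply the three convergence criteria to one parameter."""
    return (row["rhat"] < rhat_tol            # assumed tolerance for R-hat
            and row["n_eff"] > min_n_eff
            and row["mcse"] < mcse_frac * row["sd"])

all_pass = all(passes_checks(row) for row in gll_ah.values())
# The MC error is approximately SD / sqrt(N_eff) for each parameter.
approx_mcse = {name: row["sd"] / math.sqrt(row["n_eff"])
               for name, row in gll_ah.items()}
```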

Convergence is often assessed visually by using AC and trace plots [23]. The trace plots in Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 show a stationary pattern fluctuating within a band, indicating convergence of the McMC algorithm. The AC plot in Figure 11 shows that the AC decreases rapidly to zero as the lag increases, indicating good mixing and convergence of the algorithm to the target posterior distribution. Finally, Figure 12 shows the posterior density plots for the GLL–AH model parameters.

7.4.5. Bayesian Model Selection

For the Bayesian model comparison, we used two information criteria: the Watanabe–Akaike information criterion (WAIC), proposed by Watanabe [46], and the leave-one-out information criterion (LOOIC), proposed by Vehtari et al. [47]. The model with the lowest WAIC and LOOIC values is preferred. In addition to the Stan fitting and the posterior predictive checks, the WAIC and LOOIC were computed by using the R package loo [47]. Table 7 shows that the GLL–AH model has the lowest values among its rival models. In addition, Table 8 shows that the GLL–AH model is also preferred over its submodels.

Figure 13 shows the Kaplan–Meier estimate and the fitted survival function (sf) for the proposed GLL–AH model.

Figure 14 and Figure 15 show the Kaplan–Meier estimate and the estimated survival curves for the proposed regression models with the GLL baseline distribution and for the AH model with various baseline hazards. In Figure 14, the GLL–AH model survival curve is closer to the KM survival curve than those of all other survival regression models. The same holds in Figure 15.
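For readers unfamiliar with the WAIC, the following sketch computes it from a matrix of pointwise log-likelihood values, using the form WAIC $= -2(\mathrm{lppd} - p_{\mathrm{WAIC}})$ described by Vehtari et al. [47], with $p_{\mathrm{WAIC}}$ taken as the sum of posterior variances of the pointwise log-likelihood. It is a plain-Python illustration, not the loo implementation, and the input matrix is hypothetical.

```python
import math

def waic(log_lik):
    """WAIC from log_lik[s][i]: log-likelihood of observation i under draw s."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        m = max(col)
        # Log of the posterior-mean likelihood, via log-sum-exp for stability.
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        # Sample variance of the pointwise log-likelihood across draws.
        mean = sum(col) / n_draws
        p_waic += sum((v - mean) ** 2 for v in col) / (n_draws - 1)
    return -2.0 * (lppd - p_waic)

# Hypothetical 3 draws x 2 observations log-likelihood matrix.
toy_waic = waic([[-1.0, -2.1], [-1.2, -1.9], [-0.9, -2.0]])
```

Lower values indicate better expected out-of-sample fit, which is how the values in Table 7 and Table 8 are read.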

The main advantage of this study is that, unlike other parametric survival regression models like the PH, PO, and AFT models, the parametric AH model may accommodate survival datasets with crossover survival curves. The proposed parametric model, on the other hand, is inappropriate when the baseline distribution is exponential, which is one of the study’s limitations. Another limitation is that when the baseline distribution is the Weibull distribution, the proposed model performs identically to existing parametric hazard-based regression models, such as PH and AFT models.

Extension of the AH model’s structure to incorporate survival datasets with or without crossover survival curves is one possible future endeavor. Additionally, this framework may include other parametric survival regression models, such as the additive hazards model.

8. Conclusions

This article proposes a fully parametric AH model for dealing with censored lifetime data with crossover survival curves as an extension of the semi-parametric AH model [14]. The primary distinction between this modification and others is that we used a modified baseline distribution that can capture different hazard rate shapes to provide a more flexible depiction of the baseline hazard. By adopting a flexible parametric baseline distribution like the GLL distribution, we showed that it is possible to carry out both Bayesian and classical likelihood inference using the rstan package of the R programming language.

This also defines the paper’s key contribution, as no other study combining these two characteristics (AH model and a modified baseline distribution) can be found in the time-to-event analysis field. Furthermore, employing both Bayesian and classical inference via MLE will address the semi-parametric AH model’s limited use due to a lack of efficient and trustworthy estimation methods. Additionally, using the GLL distribution as a baseline hazard offers several benefits compared to other parametric baseline distributions that can accommodate different hazard rate shapes, such as the gamma, GG, Weibull, EW, PGW, LL, Burr-XII, and LN distributions.

Following the simulation study, the paper presented a real-world application to a well-known dataset with crossover survival curves from a clinical study of patients with gastric cancer. In summary, the GLL–AH model outperforms the other competing parametric AH models with various baseline hazards and other survival regression models with the same baseline hazard. Finally, we developed an R package, “AHSurv”, to fit the proposed model in this study as an addendum to this paper; the source code is accessible at [48].

Author Contributions

Conceptualization, A.H.M., C.C., O.N. and S.M.; Data curation, A.H.M., C.C., O.N. and S.M.; Formal analysis, A.H.M., C.C., O.N. and S.M.; Investigation, A.H.M., C.C., O.N. and S.M.; Methodology, A.H.M., C.C., O.N. and S.M.; Software, A.H.M., C.C., O.N. and S.M.; Supervision, C.C., O.N. and S.M.; Validation, A.H.M., C.C., O.N. and S.M.; Visualization, A.H.M., C.C., O.N. and S.M.; Writing—original draft, A.H.M. and C.C.; Writing—review & Editing, A.H.M., C.C., O.N. and S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets are cited within the paper.

Acknowledgments

The authors would like to thank the academic editors and referees for their valuable suggestions and comments which improved the paper. The first author would like to thank Pan African University for supporting his work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–202.
2. Kalbfleisch, J.D. Non-parametric Bayesian analysis of survival time data. J. R. Stat. Soc. Ser. B (Methodol.) 1978, 40, 214–221.
3. Buckley, J.; James, I. Linear regression with censored data. Biometrika 1979, 66, 429–436.
4. Komárek, A.; Lesaffre, E. Bayesian accelerated failure time model with multivariate doubly interval-censored data and flexible distributional assumptions. J. Am. Stat. Assoc. 2008, 103, 523–533.
5. Bennett, S. Analysis of survival data by the proportional odds model. Stat. Med. 1983, 2, 273–277.
6. Demarqui, F.N.; Mayrink, V.D. Yang and Prentice model with piecewise exponential baseline distribution for modeling lifetime data with crossing survival curves. Braz. J. Probab. Stat. 2021, 35, 172–186.
7. Breslow, N.E.; Edler, L.; Berger, J. A two-sample censored-data rank test for acceleration. Biometrics 1984, 40, 1049–1062.
8. Diao, G.; Zeng, D.; Yang, S. Efficient semiparametric estimation of short-term and long-term hazard ratios with right-censored data. Biometrics 2013, 69, 840–849.
9. Egge, K.; Zahl, P.H. Survival of glaucoma patients. Acta Ophthalmol. Scand. 1999, 77, 397–401.
10. Putter, H.; Sasako, M.; Hartgrink, H.; Van de Velde, C.; Van Houwelingen, J. Long-term survival with non-proportional hazards: Results from the Dutch Gastric Cancer Trial. Stat. Med. 2005, 24, 2807–2821.
11. Shyur, H.J.; Elsayed, E.; Luxhøj, J.T. A general model for accelerated life testing with time-dependent covariates. Nav. Res. Logist. (NRL) 1999, 46, 303–321.
12. Zhang, H.; Wang, P.; Sun, J. Regression analysis of interval-censored failure time data with possibly crossing hazards. Stat. Med. 2018, 37, 768–775.
13. Demarqui, F.N.; Mayrink, V.D.; Ghosh, S.K. An Unified Semiparametric Approach to Model Lifetime Data with Crossing Survival Curves. arXiv 2019, arXiv:1910.04475.
14. Chen, Y.Q.; Wang, M.C. Analysis of accelerated hazards models. J. Am. Stat. Assoc. 2000, 95, 608–618.
15. Chen, Y.Q.; Jewell, N.P.; Yang, J. Accelerated hazards model: Method, theory and applications. Handb. Stat. 2003, 23, 431–441.
16. Lee, S.H. Some estimators and tests for accelerated hazards model using weighted cumulative hazard difference. J. Appl. Stat. 2009, 36, 473–482.
17. Lee, S.H. On the estimators and tests for the semiparametric hazards regression model. Lifetime Data Anal. 2016, 22, 531–546.
18. Rubio, F.J.; Remontet, L.; Jewell, N.P.; Belot, A. On a general structure for hazard-based regression models: An application to population-based cancer research. Stat. Methods Med. Res. 2019, 28, 2404–2417.
19. Khan, S.A. Exponentiated Weibull regression for time-to-event data. Lifetime Data Anal. 2018, 24, 328–354.
20. Lawless, J.F. Statistical Models and Methods for Lifetime Data; John Wiley & Sons: New York, NY, USA, 2011.
21. Muse, A.H.; Mwalili, S.; Ngesa, O.; Chesneau, C.; Alshanbari, H.M.; El-Bagoury, A.A.H. Amoud Class for Hazard-Based and Odds-Based Regression Models: Application to Oncology Studies. Axioms 2022, 11, 606.
22. Muse, A.H.; Ngesa, O.; Mwalili, S.; Alshanbari, H.M.; El-Bagoury, A.A.H. A Flexible Bayesian Parametric Proportional Hazard Model: Simulation and Applications to Right-Censored Healthcare Data. J. Healthc. Eng. 2022, 2022, 2051642.
23. Ashraf-Ul-Alam, M.; Khan, A.A. Generalized Topp-Leone-Weibull AFT modeling: A Bayesian Analysis with MCMC Tools Using R and Stan. Austrian J. Stat. 2021, 50, 52–76.
24. Alvares, D.; Rubio, F.J. A tractable Bayesian joint model for longitudinal and survival data. Stat. Med. 2021, 40, 4213–4229.
25. Muse, A.H.; Mwalili, S.; Ngesa, O.; Alshanbari, H.M.; Khosa, S.K.; Hussam, E. Bayesian and frequentist approach for the generalized log-logistic accelerated failure time model with applications to larynx-cancer patients. Alex. Eng. J. 2022, 61, 7953–7978.
26. Al-Aziz, S.N.; Muse, A.H.; Jawad, T.M.; Sayed-Ahmed, N.; Aldallal, R.; Yusuf, M. Bayesian inference in a generalized log-logistic proportional hazards model for the analysis of competing risk data: An application to stem-cell transplanted patients data. Alex. Eng. J. 2022, 61, 13035–13050.
27. Khan, S.A.; Khosa, S.K. Generalized log-logistic proportional hazard model with applications in survival analysis. J. Stat. Distrib. Appl. 2016, 3, 16.
28. Chen, Y.; Hanson, T.; Zhang, J. Accelerated hazards model based on parametric families generalized with Bernstein polynomials. Biometrics 2014, 70, 192–201.
29. Zhang, J.; Peng, Y. Crossing hazard functions in common survival models. Stat. Probab. Lett. 2009, 79, 2124–2130.
30. Muse, A.H.; Mwalili, S.; Ngesa, O.; Chesneau, C.; Al-Bossly, A.; El-Morshedy, M. Bayesian and Frequentist Approaches for a Tractable Parametric General Class of Hazard-Based Regression Models: An Application to Oncology Data. Mathematics 2022, 10, 3813.
31. Chen, Y.Q.; Jewell, N.P. On a general class of semiparametric hazards regression models. Biometrika 2001, 88, 687–702.
32. Muse, A.H.; Mwalili, S.; Ngesa, O.; Almalki, S.J.; Abd-Elmougod, G.A. Bayesian and classical inference for the generalized log-logistic distribution with applications to survival data. Comput. Intell. Neurosci. 2021, 2021, 5820435.
33. De Santana, T.V.F.; Ortega, E.M.; Cordeiro, G.M.; Silva, G.O. The Kumaraswamy-log-logistic distribution. J. Stat. Theory Appl. 2012, 11, 265–291.
34. Teamah, A.E.A.; Elbanna, A.A.; Gemeay, A.M. Heavy-tailed log-logistic distribution: Properties, risk measures and applications. Stat. Optim. Inf. Comput. 2021, 9, 910–941.
35. Muse, A.H.; Tolba, A.H.; Fayad, E.; Abu Ali, O.A.; Nagy, M.; Yusuf, M. Modeling the COVID-19 mortality rate with a new versatile modification of the log-logistic distribution. Comput. Intell. Neurosci. 2021, 2021, 8640794.
36. Mansour, M.M.; Ibrahim, M.; Aidi, K.; Shafique Butt, N.; Ali, M.M.; Yousof, H.M.; Hamed, M.S. A new log-logistic lifetime model with mathematical properties, copula, modified goodness-of-fit test for validation and real data modeling. Mathematics 2020, 8, 1508.
37. Alkhairy, I.; Nagy, M.; Muse, A.H.; Hussam, E. The Arctan-X family of distributions: Properties, simulation, and applications to actuarial sciences. Complexity 2021, 2021, 4689010.
38. Alfaer, N.M.; Gemeay, A.M.; Aljohani, H.M.; Afify, A.Z. The extended log-logistic distribution: Inference and actuarial applications. Mathematics 2021, 9, 1386.
39. Muse, A.H.; Mwalili, S.M.; Ngesa, O. On the log-logistic distribution and its generalizations: A survey. Int. J. Stat. Probab. 2021, 10, 93.
40. Haghighi, M.N.F. On the power generalized Weibull family: Model for cancer censored data. Metron 2009, 67, 75–86.
41. Mudholkar, G.S.; Hutson, A.D. The exponentiated Weibull family: Some properties and a flood data application. Commun. Stat.-Theory Methods 1996, 25, 3059–3083.
42. Stacy, E.W. A generalization of the gamma distribution. Ann. Math. Stat. 1962, 33, 1187–1192.
43. Elshahhat, A.; Muse, A.H.; Egeh, O.M.; Elemary, B.R. Estimation for Parameters of Life of the Marshall-Olkin Generalized-Exponential Distribution Using Progressive Type-II Censored Data. Complexity 2022, 2022, 8155929.
44. Muse, A.H.; Mwalili, S.; Ngesa, O.; Chesneau, C. AmoudSurv: An R Package for Tractable Parametric Odds-Based Regression Models. 2022. Available online: https://cran.r-project.org/web//packages/AmoudSurv/index.html (accessed on 1 November 2022).
45. Carpenter, B.; Gelman, A.; Hoffman, M.D.; Lee, D.; Goodrich, B.; Betancourt, M.; Brubaker, M.; Guo, J.; Li, P.; Riddell, A. Stan: A probabilistic programming language. J. Stat. Softw. 2017, 76, 1–32.
46. Watanabe, S. A widely applicable Bayesian information criterion. J. Mach. Learn. Res. 2013, 14, 867–897.
47. Vehtari, A.; Gelman, A.; Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 2017, 27, 1413–1432.
48. Muse, A.H.; Mwalili, S.; Ngesa, O.; Kilai, M. AHSurv: An R Package for Flexible Parametric Accelerated Hazards (AH) Regression Models. 2022. Available online: https://cran.r-project.org/web/packages/AHSurv/index.html (accessed on 1 November 2022).

Figure 1. Illustrating the overall survival curve and the crossing survival curves for the two types of treatment.

Figure 2. Fundamental plots for the survival time of the gastric cancer dataset.

Figure 3. The GLL–AH model posterior parameters trace plots of the gastric cancer data.

Figure 4. The GLL–PH model posterior parameters trace plots of the gastric cancer data.

Figure 5. The GLL–PO model posterior parameters trace plots of the gastric cancer data.

Figure 6. The GLL–AFT model posterior parameters trace plots of the gastric cancer data.

Figure 7. The LL–AH model posterior parameters trace plots of the gastric cancer data.

Figure 8. The W–AH model posterior parameters trace plots of the gastric cancer data.

Figure 9. The BXII–AH model posterior parameters trace plots of the gastric cancer data.

Figure 10. The empirical CDF (dotted line) and the CDF of the fitted model (smooth curve) show that the fitted GLL–AH model predicts future observations that are consistent with the current data.

Figure 11. The GLL–AH model posterior parameters AC plots of the gastric cancer data.

Figure 12. The GLL–AH model posterior parameters PDF plots of the gastric cancer data.

Figure 13. Kaplan–Meier and fitted survival curve for the GLL–AH model of the gastric cancer dataset.

Figure 14. Kaplan–Meier and estimated survival plots for the competitive regression models with the GLL baseline distribution of the gastric cancer dataset.

Figure 15. Kaplan–Meier and estimated survival plots for the competitive AH models of the gastric cancer dataset.

Table 1. Simulation study for the GLL–AH regression model. True values (True), estimates (Est.), standard error (SE), average bias (AB), and mean square error (MSE) are presented for the parameters under both approaches, with the coverage probability (CP, 95%) for the MLE approach and $\hat{R}$ for the Bayesian approach.

	True	Est.	SE	AB	MSE	CP	Est.	SE	AB	MSE	$\hat{R}$
					Set I $n = 50$
$M_{2}$		MLE Approach						Bayesian
$β_{1}$	0.75	0.800	0.100	0.050	0.037	93.85	0.790	0.002	0.040	0.036	1.002
$β_{2}$	0.5	0.558	0.042	0.058	0.024	94.50	0.512	0.003	0.012	0.011	1.002
$α$	1.50	1.590	0.010	0.090	0.008	95.20	1.505	0.001	0.005	0.003	1.000
k	0.75	0.900	0.435	0.150	0.063	92.05	0.850	0.005	0.100	0.045	1.002
$η$	1.20	1.265	0.011	0.065	0.046	94.25	1.212	0.000	0.012	0.004	1.003
	True	Est.	SE	AB	MSE	CP	Est.	SE	AB	MSE	$\hat{R}$
					Set II $n = 100$
$M_{2}$		MLE approach						Bayesian
$β_{1}$	0.75	0.790	0.100	0.040	0.036	94.10	0.770	0.001	0.020	0.018	1.000
$β_{2}$	0.5	0.530	0.030	0.030	0.024	94.80	0.510	0.002	0.010	0.010	1.001
$α$	1.50	1.610	0.040	0.110	0.087	93.40	1.553	0.001	0.053	0.041	1.003
k	0.75	0.850	0.250	0.100	0.056	93.20	0.800	0.004	0.050	0.037	1.002
$η$	1.20	1.250	0.008	0.050	0.034	94.80	1.205	0.000	0.005	0.003	1.001
	True	Est.	SE	AB	MSE	CP	Est.	SE	AB	MSE	$\hat{R}$
					Set III $n = 300$
$M_{2}$		MLE approach						Bayesian
$β_{1}$	0.75	0.78	0.092	0.030	0.032	94.40	0.768	0.001	0.018	0.016	1.000
$β_{2}$	0.5	0.525	0.013	0.025	0.021	93.90	0.503	0.001	0.003	0.002	1.000
$α$	1.50	1.592	0.021	0.042	0.030	93.85	1.506	0.001	0.006	0.006	1.001
k	0.75	0.844	0.212	0.094	0.049	93.46	0.798	0.003	0.048	0.036	1.000
$η$	1.20	1.252	0.008	0.052	0.034	94.60	1.205	0.000	0.005	0.003	1.001
	True	Est.	SE	AB	MSE	CP	Est.	SE	AB	MSE	$\hat{R}$
					Set IV $n = 500$
$M_{2}$		MLE approach						Bayesian
$β_{1}$	0.75	0.775	0.065	0.025	0.017	95.10	0.752	0.000	0.002	0.002	1.000
$β_{2}$	0.5	0.526	0.013	0.026	0.021	94.00	0.503	0.001	0.003	0.002	1.000
$α$	1.50	1.550	0.040	0.050	0.037	94.70	1.503	0.001	0.003	0.001	1.000
k	0.75	0.825	0.110	0.075	0.048	94.07	0.780	0.003	0.030	0.027	1.001
$η$	1.20	1.205	0.005	0.005	0.003	95.04	1.203	0.000	0.003	0.001	1.001

Table 2. Results from the fitted proposed fully parametric AH regression model and other survival regression models with the GLL baseline distribution to gastric cancer dataset.

Models	Parameter(s)	Estimate	SE	AIC	CAIC	HQIC
GLL-AH	$β$	2.690	0.021	244.318	242.845	248.351
	$α$	1.505	0.040
	k	0.542	0.036
	$η$	0.133	0.022
GLL-PO	$β$	0.750	0.101	251.816	250.522	255.848
	$α$	1.382	0.100
	k	0.650	0.074
	$η$	0.500	0.042
GLL-PH	$β$	0.130	0.241	255.565	254.345	259.598
	$α$	1.302	0.140
	k	0.759	0.136
	$η$	0.580	0.222
GLL-AFT	$β$	0.540	0.135	252.139	250.851	256.171
	$α$	1.545	0.127
	k	0.557	0.106
	$η$	0.728	0.231

Table 3. Results from the fitted proposed fully parametric AH regression model with different baseline distributions to gastric cancer dataset.

Models	Parameter(s)	Estimate	SE	AIC	CAIC	HQIC
GLL-AH	$β$	2.690	0.021	244.318	242.845	248.351
	$α$	1.505	0.040
	k	0.542	0.036
	$η$	0.133	0.022
PGW-AH	$β$	1.930	0.082	251.186	249.878	255.218
	$α$	1.687	0.142
	k	0.821	0.066
	$η$	2.226	0.102
GG-AH	$β$	2.688	0.130	252.645	251.368	256.677
	$α$	1.821	0.122
	k	0.482	0.236
	$η$	0.737	0.042
EW-AH	$β$	2.066	0.110	252.667	251.390	256.699
	$α$	0.789	0.212
	k	0.911	0.086
	$η$	2.283	0.052
LL-AH	$β$	1.097	0.020	247.492	246.686	250.517
	$α$	1.913	0.052
	k	1.213	0.019
LN-AH	$β$	0.261	0.120	263.830	263.197	266.854
	$α$	0.065	0.101
	k	1.260	0.032
BXII-AH	$β$	0.923	0.142	249.144	248.359	252.168
	$α$	0.880	0.119
	k	1.890	0.120
W-AH	$β$	2.581	0.214	256.776	256.078	259.800
	$α$	1.013	0.049
	k	1.818	0.112
G-AH	$β$	2.367	0.430	255.121	254.406	258.145
	$α$	1.495	0.039
	k	1.252	0.123

Table 4. LRT for the GLL–AH model and its submodels.

Model	Hypothesis	LRT	p-Value
GLL-AH vs. BXII-AH	$H_{0} : η = 1$, $H_{1} : H_{0}$ is false	6.999	0.008
GLL-AH vs. LL-AH	$H_{0} : η = k$, $H_{1} : H_{0}$ is false	5.347	0.021
GLL-AH vs. W-AH	$H_{0} : η^{α} \to 0$, $H_{1} : H_{0}$ is false	14.533	<0.001

Table 5. Results for the posterior properties of the GLL–AH, GLL–PO, GLL–PH and GLL–AFT models.

Models	Par(s)	Estimate	SE	SD	2.5%	Median	97.5%	$N_{eff}$	$\hat{R}$
GLL–AH	$β$	1.016	0.009	0.476	0.030	1.027	1.909	2684	1.001
	$α$	0.836	0.002	0.106	0.648	0.829	1.064	3097	1.002
	k	1.553	0.004	0.196	1.205	1.544	1.969	2714	1.001
	$η$	0.674	0.003	0.191	0.353	0.653	1.105	3023	1.001
GLL–PO	$β$	0.565	0.006	0.353	−0.135	0.562	1.268	3617	1.001
	$α$	1.414	0.003	0.156	1.136	1.405	1.741	3257	1.000
	k	0.804	0.002	0.115	0.600	0.796	1.054	2951	1.001
	$η$	0.806	0.004	0.214	0.429	0.792	1.262	2918	1.000
GLL–PH	$β$	0.106	0.004	0.224	−0.330	0.107	0.540	3216	1.000
	$α$	1.341	0.002	0.146	1.077	1.332	1.646	3588	1.001
	k	0.876	0.002	0.122	0.662	0.869	1.134	3068	1.001
	$η$	0.837	0.004	0.221	0.452	0.820	1.315	3239	1.001
GLL–AFT	$β$	0.418	0.005	0.269	−0.116	0.415	0.949	3396	1.000
	$α$	1.435	0.003	0.177	1.124	1.423	1.804	3373	1.000
	k	0.809	0.002	0.114	0.609	0.801	1.060	2963	1.000
	$η$	0.850	0.004	0.210	0.479	0.836	1.311	2728	1.000

Table 6. Results for the posterior properties of the submodels of the GLL–AH model including LL–AH, W–AH, and BXII–AH models.

Models	Par(s)	Estimate	SE	SD	2.5%	Median	97.5%	$N_{eff}$	$\hat{R}$
LL–AH	$β$	0.764	0.007	0.385	−0.073	0.800	1.421	3228	1.001
	$α$	1.636	0.004	0.197	1.261	1.629	2.039	2930	1.000
	k	0.879	0.002	0.107	0.688	0.873	1.109	3681	1.001
W–AH	$β$	−0.007	0.014	0.949	−1.850	−0.019	1.860	4377	1.000
	$α$	0.984	0.001	0.085	0.821	0.982	1.152	3521	1.000
	k	0.559	0.001	0.068	0.437	0.554	0.702	3875	1.001
BXII–AH	$β$	0.678	0.007	0.378	−0.135	0.697	1.345	3291	1.000
	$α$	1.627	0.004	0.209	1.247	1.620	2.062	3099	1.000
	k	0.949	0.002	0.115	0.740	0.943	1.186	3932	1.000

Table 7. Bayesian model comparison for the GLL–AH, GLL–PO, GLL–AFT, and GLL–PH models.

Model	WAIC	LOOIC
GLL–AH	243.20	243.20
GLL–PO	251.40	251.42
GLL–AFT	251.80	251.90
GLL–PH	254.80	254.82

Table 8. Bayesian model comparison for the GLL–AH and its special cases including LL–AH, W–AH, and BXII–AH models.

Model	WAIC	LOOIC
GLL–AH	243.20	243.20
LL–AH	249.30	249.40
W–AH	255.01	255.00
BXII–AH	247.05	247.08

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
