Focused Information Criterion for Restricted Mean Survival Times: Non-Parametric or Parametric Estimators

Nemes, Szilárd; Gustavsson, Andreas; Jauhiainen, Alexandra

doi:10.3390/e24050713


by Szilárd Nemes *, Andreas Gustavsson * and Alexandra Jauhiainen *

BioPharma Early Biometrics and Statistical Innovation, Data Science & AI, BioPharmaceuticals R&D, AstraZeneca, 43183 Gothenburg, Sweden

* Authors to whom correspondence should be addressed.

Entropy 2022, 24(5), 713; https://doi.org/10.3390/e24050713

Submission received: 4 April 2022 / Revised: 5 May 2022 / Accepted: 5 May 2022 / Published: 16 May 2022

(This article belongs to the Special Issue Applications of Information Theory in Statistics)


Abstract: Restricted Mean Survival Time (RMST), the average time without an event of interest until a specific time point, is a model-free, easy-to-interpret statistic. The heavy reliance on non-parametric or semi-parametric methods in survival analysis has drawn criticism because of the loss of efficiency compared to parametric methods. This criticism assumes that the parametric family used is the true one; otherwise, the gain in efficiency might be lost to interpretability problems due to bias. The Focused Information Criterion (FIC) considers the trade-off between bias and variance and offers an objective framework for selecting the optimal non-parametric or parametric estimator for scalar statistics. Herein, we present the FIC framework for selecting the RMST estimator with the best bias-variance trade-off. The aim is not to identify the true underlying distribution that generated the data, but to identify families of distributions that best approximate this process. Through simulation studies and theoretical reasoning, we highlight the effect of censoring on the performance of FIC. Applicability is illustrated with a real-life example. Censoring has a non-linear effect on FIC's performance that can be traced back to the asymptotic relative efficiency of the estimators. FIC's performance is sample size dependent; however, with censoring percentages common in practical applications, FIC selects the true model at a nominal probability (0.843) even with small or moderate sample sizes.

Keywords: parametric; non-parametric; information theory; model selection; survival analysis

1. Introduction

Restricted Mean Survival Time (RMST), the average survival time up to a given time point, is hailed as a model-free statistic that is easy to interpret causally when summarizing survival data [1]. RMST has seen a resurgence in practical applications as an alternative to classical analyses based on log-rank tests or Proportional Hazard (PH) models when assessing between-group differences in survival analysis [2,3]. For clinical trial planning, the power of different analysis methods needs to be considered. There are indications that log-rank or PH tests generally have higher statistical power than RMST; however, this depends on the setting [4,5]. When estimated non-parametrically, RMST is less efficient than hazard-based methods estimated via semi- or fully parametric models under the proportional hazards assumption [6].

The heavy reliance on non-parametric or semi-parametric methods in survival analysis has drawn some criticism [7,8]; however, as Meier and collaborators [9] point out, it is rather challenging to identify the correct parametric form for a given problem. In addition, censoring affects the efficiency of both parametric and non-parametric RMST estimators. Gardiner [10] used Kolmogorov–Smirnov, Anderson–Darling and Cramér–von Mises statistics to assess the goodness-of-fit of parametric distributions against the empirical Kaplan–Meier alternative prior to estimating RMST. Nemes and collaborators [11] concluded in a simulation study that, under model misspecification, the non-parametric RMST estimator has superior efficiency in terms of the mean squared error (MSE) compared to parametric alternatives. The authors also concluded that parametric estimators reduce type II error rates (i.e., increase statistical power) if the correct distribution is identified. The authors acknowledge that the percentage of censoring and the choice of restriction time directly affect the comparability of parametric and non-parametric estimators.

The MSE offers an objective way to compare estimators in simulation studies where the true value of a parameter of interest is known. However, the validity of MSE comparisons is limited in practical situations, as the bias is generally unknown and difficult to estimate. Building upon the Focused Information Criterion (FIC) by Claeskens and Hjort [12], Jullum and Hjort [13] developed a framework for objective comparison and model selection among parametric and non-parametric models, and this latest development of FIC is at the core of our study. FIC does not attempt to assess the overall fit of candidate models to observed data. Instead, candidate models are ranked based on the estimated precision of a parameter of primary interest. This ‘focus’ parameter does not need to be a specified parameter of a distribution, but can be any scalar summary of the data. As RMST captures the survival patterns in a single scalar measure, FIC offers a feasible framework for model selection.

In this paper, building upon Claeskens and Hjort [12] and Jullum and Hjort [13], we aim to establish the FIC framework for model selection for RMST. We describe the mathematical framework needed for implementation. Thereafter, we look at factors affecting the performance of FIC, such as censoring type and rate as well as sample size. In addition, with a real-life application, we illustrate possible gains in efficiency from using the parametric RMST estimators suggested by FIC without compromising interpretability. We also provide an indicative discussion of the interplay between the maximum follow-up time and the chosen restriction time.

2. Notation and Assumptions

2.1. Notation and Nomenclature

We assume that survival times $X_{1}, \dots, X_{n}$ for subjects $j = 1, \dots, n$ are independently and identically distributed (iid) according to the cumulative distribution function $F (x) = P (X \leq x)$, with survival function of interest $S (x) = 1 - F (x) = P (X > x)$. Similarly, we assume $C_{1}, \dots, C_{n}$ to be iid censoring times with distribution function $G (c)$ and survival function $1 - G (c)$. Thus, the actual observed time for subject $j$ is $T_{j} = \min (X_{j}, C_{j})$. Additionally, we have $δ_{j} = I \{X_{j} \leq C_{j}\}$ as an event indicator that takes a value of 1 if the event of interest takes place before or at the censoring time, and 0 otherwise. We assume independence between failure and censoring times. We let $t_{(1)} \leq \dots \leq t_{(n)}$ denote the ordered observed survival times and $δ_{(1)}, \dots, δ_{(n)}$ their associated indicator values.

In estimating the survival function $S$ from the observed censored data $\{(t_{(i)}, δ_{(i)})\}_{i = 1}^{n}$, the scientific literature almost exclusively uses the Kaplan–Meier Product-Limit estimator [14], expressed as

\begin{matrix} {\hat{S}}_{K M} (t) = \prod_{T_{i} \leq t} [1 - \frac{δ_{i}}{Y (T_{i})}] \end{matrix}

(1)

where $Y (t)$ is the number at risk at time $t$. If we have information about $F (x)$ and it is a member of a parametric family of distributions with $p$-variate parameter vector $θ$, then the likelihood function for the sample $(T_{j}, δ_{j})$, $j = 1, \dots, n$, is

\begin{matrix} L (θ | T_{j}, δ_{j}) = \prod_{j = 1}^{n} f {(T_{j}; θ)}^{δ_{j}} {\{1 - F (T_{j}; θ)\}}^{1 - δ_{j}} . \end{matrix}

(2)

Further, we denote the first and second derivatives of the log-likelihood function, $\log L (θ | T_{j}, δ_{j})$, as $u (T_{j}; θ)$ and $I (T_{j}; θ)$. We also define the matrices

\begin{matrix} J = - E_{F} \{I (T_{j}; θ)\} \quad \text{and} \quad K = {Var}_{F} \{u (T_{j}; θ)\} \end{matrix}

(3)

Generally, $K$ is considered an inefficient estimator of the information matrix; however, it plays an important role when robustness is of concern. Under some regularity conditions (see Chapter 6 in [15]), the maximum likelihood estimator of $θ$, ${\hat{θ}}_{M L E}$, satisfies

\begin{matrix} \sqrt{n} ({\hat{θ}}_{M L E} - θ_{0}) \overset{D}{\to} N_{p} \{0, Σ\}, \end{matrix}

(4)

where $θ_{0}$ is the least false parameter value, i.e., the unique minimizer of the Kullback–Leibler divergence from the true model to the parametric family, and $N_{p}$ is a mean-zero $p$-variate normal distribution with covariance matrix $Σ = J^{- 1} K J^{- 1}$. If the assumed parametric model is the true model, then $J = K$ and $Σ = J {(θ)}^{- 1}$. Below, the subscript $n p$ denotes the non-parametric estimator and $p m$ the parametric estimator, while the subscript 0 corresponds to the least false, or best approximating, value.

2.2. Restricted Mean Survival Time

Kaplan and Meier [14] suggested estimating the mean survival time ($μ$) as

\begin{matrix} {\hat{μ}}_{K M} = \int_{0}^{\infty} t d F_{n} (t), \end{matrix}

(5)

where $F_{n}$ is the empirical distribution function. However, this is rarely estimable due to censoring, and attention is instead paid to the $τ$-restricted mean survival time ($μ_{τ}$)

\begin{matrix} {\hat{μ}}_{K M, τ} = \int_{0}^{τ} t d F_{n} (t) = \int_{0}^{τ} {\hat{S}}_{K M} (t) d t . \end{matrix}

(6)

This approach disregards any information after $τ$; technically, this counts as Type I censoring, as the analysis is restricted to the interval $(0, τ]$.

Alternatively, based on the plug-in principle, we can use the maximum likelihood estimates to calculate $μ_{τ}$ with the assumed distribution function as

\begin{matrix} {\hat{μ}}_{p m, τ} = \int_{0}^{τ} S (t; {\hat{θ}}_{M L E}) d t . \end{matrix}

As the Kaplan–Meier estimator has an infinite number of parameters, its variance exceeds that of the parametric estimator, $σ_{n p}^{2} > σ_{p m}^{2}$. However, this presumes that $F (t)$ is correctly identified. If $F (t)$ is incorrectly selected, then the maximum likelihood estimator is asymptotically biased, resulting in an inflated MSE. Trading off bias against variance is a cornerstone of the FIC, described in the next section. In this setting, the non-parametric estimator is considered unbiased, thus

\begin{matrix} M S E_{n p} = 0^{2} + \frac{v_{n p}}{n}, \end{matrix}

(7)

and the MSE for the parametric estimator is given by

\begin{matrix} M S E_{p m} = b^{2} + \frac{v_{p m}}{n}, \end{matrix}

(8)

where $b$ is the bias of the parametric estimator, and $v_{n p}$ and $v_{p m}$ are the variances of the limiting distributions of the $\sqrt{n}$-scaled estimators.
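As a minimal sketch of the plug-in principle and of the decomposition in Equations (7) and (8), the snippet below assumes an exponential working model, for which the likelihood in Equation (2) yields the closed-form rate estimate (number of events divided by total observed time). The function names and the toy data are illustrative assumptions and are not taken from the original analysis.

```python
import math


def exponential_rate_mle(times, events):
    """MLE of the exponential rate under right censoring: events / total observed time."""
    return sum(events) / sum(times)


def rmst_exponential(rate, tau):
    """Plug-in RMST: integral of exp(-rate * t) over (0, tau]."""
    return (1.0 - math.exp(-rate * tau)) / rate


def mse(bias, v, n):
    """Bias-variance decomposition b^2 + v / n, as in Equations (7) and (8)."""
    return bias ** 2 + v / n


# Hypothetical toy sample: observed times (days) and event indicators.
times = [120.0, 365.0, 200.0, 365.0, 90.0]
events = [1, 0, 1, 0, 1]
rate_hat = exponential_rate_mle(times, events)
print(rmst_exponential(rate_hat, tau=365.0))
```

Other parametric families would generally require numerical maximization of the likelihood and numerical integration of the fitted survival function.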

3. Focused Information Criterion for RMST

We now aim to derive the FIC for RMST. We look at properties of the non-parametric estimator $\int_{0}^{τ} {\hat{S}}_{K M} (t) d t$ and a parametric alternative denoted by $\int_{0}^{τ} {\hat{S}}_{p m} (t) d t$. As $n \to \infty$, based on Jullum and Hjort [13], we note that

\begin{matrix} \begin{pmatrix} \sqrt{n} (\int_{0}^{τ} {\hat{S}}_{K M} (t) d t - \int_{0}^{τ} S (t) d t) \\ \sqrt{n} (\int_{0}^{τ} {\hat{S}}_{p m} (t) d t - \int_{0}^{τ} S_{0} (t) d t) \end{pmatrix} \overset{D}{\to} \begin{pmatrix} Z \\ c^{t} J^{- 1} U \end{pmatrix} \sim N (\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} v_{n p} & v_{c} \\ v_{c} & v_{p m} \end{pmatrix}) . \end{matrix}

(9)

Here, $(Z, U)$ are zero-mean normal variables with dimensions 1 and $p$, and $c$ denotes the gradient of the focus parameter with respect to $θ$ (cf. Equation (14)). Next, we need to establish estimators for the quantities in Equation (9). For the variance of the non-parametric RMST, the empirical analogue ${\hat{v}}_{n p} = n^{- 1} \sum_{i = 1}^{n} I F {(T_{i}, {\hat{F}}_{n})}^{2}$ of the expected squared influence function is a natural choice. Here, $I F$ is the influence function of a statistical functional $T (F)$, given by

\begin{matrix} I F (x; T, F) = lim_{ϵ \to 0} \frac{\{T [(1 - ϵ) F + ϵ δ_{x}] - T (F)\}}{ϵ}, \end{matrix}

(10)

if this limit exists. Reid [16] was the first to provide the $I F$ for censored data and for the restricted mean survival time. Building upon the representation of the cumulative hazard function as a functional of two subsurvival functions, $S^{u} = P (X > t, δ = 1)$ and $S^{c} = P (X > t, δ = 0)$, by Peterson [17], Reid [16] gives

\begin{matrix} I F (T_{i}, F_{n}, S^{u}, S^{c}) = \int_{0}^{τ} S (t) \{\frac{\mathbf{1} \{s \leq t\}}{(S^{u} + S^{c}) (s)} + \int_{0}^{min (s, t)} \frac{d S^{u}}{{(S^{u} + S^{c})}^{2} (u)}\} d t, \end{matrix}

(11)

with $τ < \infty$ and $S (τ) > 0$. This reduces to the well-known Greenwood plug-in estimator, which, as Eaton and collaborators [5] demonstrated based on Monte Carlo simulations, is closest to the empirical and asymptotic variances. The Greenwood estimator is given by

\begin{matrix} \hat{V} (μ_{τ}) = \sum_{t_{i} \leq τ} {[\int_{t_{i}}^{τ} {\hat{S}}_{K M} (t) d t]}^{2} \frac{δ_{i}}{Y (t_{i}) (Y (t_{i}) - δ_{i})} . \end{matrix}

(12)
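The following sketch shows one way to compute the Kaplan–Meier RMST together with the Greenwood-type variance of Equation (12). Some choices here are assumptions rather than part of the original text: tied event times are grouped (so the event indicator becomes a count of events at that time), terms where everyone at risk fails are skipped to avoid division by zero, and observations at or after $τ$ are ignored because they contribute no area. Note that this quantity estimates the variance of the RMST estimate itself, i.e., it is already on the $v_{n p} / n$ scale.

```python
def rmst_greenwood(times, events, tau):
    """Return (RMST, Greenwood variance) from right-censored data on (0, tau]."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []  # (event time, S just after the jump, events d, number at risk Y)
    i = 0
    while i < len(data) and data[i][0] < tau:
        t = data[i][0]
        d = 0
        removed = 0
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            removed += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            steps.append((t, surv, d, n_at_risk))
        n_at_risk -= removed

    # Walk backwards so that area_after holds the integral of S from t_i to tau.
    variance = 0.0
    area_after = 0.0
    next_t = tau
    for t, s, d, y in reversed(steps):
        area_after += s * (next_t - t)
        next_t = t
        if y > d:
            variance += area_after ** 2 * d / (y * (y - d))

    # S(t) = 1 before the first event time (or on all of (0, tau] if no events).
    rmst = area_after + next_t
    return rmst, variance


print(rmst_greenwood([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1], tau=4.0))
```

In the uncensored toy example, the variance coincides with the population variance of the restricted times divided by the sample size, which is a convenient sanity check.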

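The variance of the parametric estimator is defined from a model-agnostic viewpoint. The influence function of ${\hat{θ}}_{M L E} = M L E (F)$ is given by $I F (T, F) = lim_{ϵ \to 0} \{M L E (F_{ϵ}) - M L E (F)\} / ϵ = J {(θ)}^{- 1} u (T; θ)$, where $F_{ϵ} = (1 - ϵ) F + ϵ δ_{T}$ as in Equation (10). The limiting covariance matrix of the maximum likelihood estimator therefore takes the sandwich form $J {(θ)}^{- 1} K (θ) J {(θ)}^{- 1}$, with

\begin{matrix} J = - \frac{1}{n} \sum_{j = 1}^{n} I (T_{j}; \hat{θ}) \quad \text{and} \quad K = \frac{1}{n} \sum_{j = 1}^{n} u (T_{j}; \hat{θ}) u {(T_{j}; \hat{θ})}^{t} . \end{matrix}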

(13)

With the delta-method, this gives

\begin{matrix} v_{p m} = {\{\frac{\partial μ (\hat{θ})}{\partial θ}\}}^{t} J {(θ)}^{- 1} K (θ) J {(θ)}^{- 1} \{\frac{\partial μ (\hat{θ})}{\partial θ}\} . \end{matrix}

(14)

For the covariance, we have

\begin{matrix} v_{c} = \frac{1}{n} {\{\frac{\partial μ (\hat{θ})}{\partial θ}\}}^{t} J^{- 1} \sum_{j = 1}^{n} I F (T_{j}, {\hat{F}}_{n}) u (T_{j}; \hat{θ}) . \end{matrix}

(15)

In Equation (9), we made the claim that $\sqrt{n} (\int_{0}^{τ} {\hat{S}}_{K M} (t) d t - \int_{0}^{τ} S (t) d t)$ has a limiting normal distribution with mean zero and a certain variance, implicitly assuming that $\int_{0}^{τ} {\hat{S}}_{K M} (t) d t$ is asymptotically unbiased.

Generally, non-parametric estimators are unbiased; however, this is not true for the Kaplan–Meier integrals [14]. Meier [18] specified that ${\hat{S}}_{K M} (t)$ is “nearly unbiased”, with the bias vanishing at a rate of $e^{- Y (t)}$.

Gill [19] provided stronger bounds for the bias

\begin{matrix} - F (t) H^{n} (t) \leq S (t) - {\hat{S}}_{K M} (t) \leq 0 \end{matrix}

(16)

where $H^{n} (t) = P (Y (t) = 0)$ is the probability that the at-risk set is empty.

Mauro [20] demonstrated that $Bias 〈\int_{0}^{τ} t d F_{n} (t)〉 \leq 0$. Zhou [21] was the first to provide a lower bound for the bias

\begin{matrix} - \int_{0}^{τ} t H^{n} (t) F (d t) \leq Bias 〈\int_{0}^{τ} t d F_{n} (t)〉 . \end{matrix}

(17)

Stute [22] provided an improved version of the lower bound of the bias in the form of

\begin{matrix} - \int_{0}^{τ} t G (t) H^{n - 1} (t) F (d t) \leq Bias 〈\int_{0}^{τ} t d F_{n} (t)〉 . \end{matrix}

(18)

It is evident that if there is no censoring, the terms of the lower bound vanish, and as the bias is non-positive, $\int_{0}^{τ} x d F_{n} (x)$ is unbiased. This is expected, as in this case $\int_{0}^{τ} x d F_{n} (x) = n^{- 1} \sum_{i} X_{i}$. However, it is also evident that when censoring is present, the Kaplan–Meier integral can have a non-negligible bias even for large sample sizes. Maximum bias is observed at $τ_{H} = \inf \{t : H (t) = 1\}$, the least upper bound of the support of the distribution function of $T$. In real-life applications, $τ ≪ τ_{H}$. Additionally, the bias is more evident when $G$ has short tails compared to $F$. As a result, $H^{n} (t)$, or $H^{n - 1} (t)$, on the interval $(0, τ]$ is negligible and we can assume that the bias of ${\hat{S}}_{K M}$ is approximately zero.

The parametric estimator is asymptotically unbiased for its least false target $\int_{0}^{τ} S_{0} (t) d t$, and based on Equation (9), for the estimated bias $\hat{b} = {\hat{μ}}_{p m} - {\hat{μ}}_{n p}$, we have

\begin{matrix} \sqrt{n} (\hat{b} - b) \overset{D}{\to} c^{t} J^{- 1} U - Z \sim N (0, κ), \end{matrix}

(19)

where $κ = v_{p m} + v_{n p} - 2 v_{c}$.

Although $\hat{b}$ is an approximately unbiased estimator of $b$, ${\hat{b}}^{2}$ typically overestimates $b^{2}$, with $E_{F} \{{\hat{b}}^{2}\} = b^{2} + κ / n + o (n^{- 1})$. Jullum and Hjort [13] noted that it is possible that ${\hat{b}}^{2} < \hat{κ} / n$ and introduced the correction $\max (0, {\hat{b}}^{2} - \hat{κ} / n)$ in order to truncate negative estimates to zero (i.e., no bias).
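The overestimation follows from the usual variance decomposition. As a short sketch, taking the variance of $\hat{b}$ to be approximately $κ / n$ (Equation (19)) and its expectation to be approximately $b$:

```latex
E_{F} \{\hat{b}^{2}\} = \mathrm{Var}_{F} (\hat{b}) + \{E_{F} (\hat{b})\}^{2} \approx \frac{κ}{n} + b^{2},
\qquad \text{so that} \qquad
E_{F} \{\hat{b}^{2} - \hat{κ} / n\} \approx b^{2} .
```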

Having established the necessary estimators, we can define the FIC scores for the RMST $μ_{τ}$ as

\begin{matrix} F I C_{n p} & = \frac{{\hat{v}}_{n p}}{n} \end{matrix}

(20)

\begin{matrix} F I C_{p m} & = max \{0, {\hat{b}}^{2} - \frac{\hat{κ}}{n}\} + \frac{{\hat{v}}_{p m}}{n} \end{matrix}

(21)
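Equations (20) and (21) translate directly into code. The sketch below assumes that the estimated bias is the difference between the parametric and the non-parametric RMST estimates, and that all variance inputs are on the $\sqrt{n}$ scale (so a Greenwood-type variance of the estimate itself would first be multiplied by $n$). The function and argument names are hypothetical.

```python
def fic_nonparametric(v_np, n):
    """Equation (20): variance only, since the non-parametric estimator is taken as unbiased."""
    return v_np / n


def fic_parametric(mu_pm, mu_np, v_pm, v_np, v_c, n):
    """Equation (21): truncated squared-bias estimate plus variance."""
    b_hat = mu_pm - mu_np              # assumed definition of the estimated bias
    kappa_hat = v_pm + v_np - 2.0 * v_c
    return max(0.0, b_hat ** 2 - kappa_hat / n) + v_pm / n


# Hypothetical inputs, for illustration only.
scores = {
    "np": fic_nonparametric(v_np=6.0, n=100),
    "pm": fic_parametric(mu_pm=10.2, mu_np=10.0, v_pm=4.0, v_np=6.0, v_c=4.0, n=100),
}
print(min(scores, key=scores.get), scores)
```

The estimator with the smallest score is the one preferred by FIC; with several parametric candidates, one score is computed per candidate.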

Clinical trials mainly aim to compare two (or more) treatment arms, e.g., to test the difference in restricted mean survival times between two groups (denoted 1 and 2, below)

\begin{matrix} Δ = \int_{0}^{τ} S_{1} (t) d t - \int_{0}^{τ} S_{2} (t) d t . \end{matrix}

(22)

If $Δ$ is estimated based on non-parametric models, then

\begin{matrix} F I C_{n p}^{Δ} & = \frac{{\hat{v}}_{1_{n p}}}{n_{1}} + \frac{{\hat{v}}_{2_{n p}}}{n_{2}}, \end{matrix}

(23)

while if we use parametric estimators, then

\begin{matrix} F I C_{p m}^{Δ} & = max \{0, {({\hat{b}}_{1} - {\hat{b}}_{2})}^{2} - \frac{{\hat{κ}}_{1}}{n_{1}} - \frac{{\hat{κ}}_{2}}{n_{2}}\} + \frac{{\hat{v}}_{1_{p m}}}{n_{1}} + \frac{{\hat{v}}_{2_{p m}}}{n_{2}} . \end{matrix}

(24)

Naturally, a mix of distributions or a mix of parametric and non-parametric estimators is possible.
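For the two-arm comparison, Equations (23) and (24) can be sketched as follows. The inputs are the per-arm bias, $κ$ and variance estimates; all names and numbers are illustrative assumptions.

```python
def fic_delta_np(v1_np, n1, v2_np, n2):
    """Equation (23): sum of the per-arm non-parametric variance terms."""
    return v1_np / n1 + v2_np / n2


def fic_delta_pm(b1, b2, kappa1, kappa2, v1_pm, v2_pm, n1, n2):
    """Equation (24): truncated squared bias of the difference plus per-arm variances."""
    squared_bias = (b1 - b2) ** 2 - kappa1 / n1 - kappa2 / n2
    return max(0.0, squared_bias) + v1_pm / n1 + v2_pm / n2


# Hypothetical per-arm estimates.
print(fic_delta_np(6.0, 100, 8.0, 200))
print(fic_delta_pm(0.5, 0.1, 2.0, 4.0, 4.0, 6.0, 100, 200))
```

A mixed choice, such as a parametric estimator in one arm and the non-parametric estimator in the other, could be handled in this sketch by passing a zero bias and zero $κ$ together with the non-parametric variance for that arm.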

4. Operating Characteristics of FIC for RMST

Jullum and Hjort [13] (Corollary 1) provided the upper probability limit of FIC selecting the true parametric model over the non-parametric one ($α_{n}$) as $P r (χ_{1}^{2} < 2) = 0.843$. Most likely, $α_{n}$ is influenced by several factors that limit the amount of information available in the data. In the following, we assess how censoring and sample size affect $α_{n}$. In addition, we discuss how the choice of $τ$ and the relationship between $τ$ and the maximum follow-up time ($t_{m a x}$) might affect FIC.

The characteristics of the variance estimators for RMST have direct implications for FIC. The Greenwood variance estimator (Equation (12)) is a sum of a sequence of overlapping squared areas from $t_{i}$ to $τ$, weighted by the square of the coefficient of variation of $S (t)$ at $t_{i}$. As noted previously, $v_{n p} \geq v_{p m}$. When we have Type I censoring, the support sets for $v_{n p}$ and $v_{p m}$ coincide. If $t_{m a x} > τ$, then the domain of the non-parametric estimator is $(0, τ]$, while for the parametric estimator it is $(0, t_{m a x}]$. The proportion of the total Fisher information contained in the censored data is just the proportion of observations that are not censored [23], and given $X \perp\!\!\!\perp C \perp\!\!\!\perp τ$, we have

\begin{matrix} J (T, θ) = J (X, θ) P r (X < τ \land C) \end{matrix}

(25)

where $P r (X < τ \land C) = \int_{0}^{τ} F (x) g (x) d x$, and if $τ < t_{m a x}$, then $J_{t_{m a x}} {(θ)}^{- 1} < J_{τ} {(θ)}^{- 1}$. If the parametric model is correct and follow-up is not restricted to $(0, τ]$ (i.e., random censoring), with $J (T, θ) = J (X, θ) P r (X < C)$, FIC ought to select the true parametric estimator with higher probability.

Within a reasonable restriction time, we expect that $α_{n}$ is directly affected by the percentage of censored observations and the sample size. Here, we consider a scenario where the maximum follow-up time is $τ$, mirroring a clinical trial with Type I censoring at $τ$. The actual observed time for subject $j$ is $T_{j} = \min (X_{j}, C_{j} \land τ)$ and $δ_{j} = I \{X_{j} \leq C_{j} \land τ\}$, a mix of Type I and random censoring. We assume exponential survival times with rate $λ = 1 / 365$ and Type I censoring at $τ = 365$, and evaluate a series of random exponential censoring times with hazard $γ = 0.1 / 365, \dots, 3 / 365$, with increments of 0.029. This resulted in a minimum overall censoring of 36.7% and a maximum censoring of 75%. For each $γ$, we simulated a data set with $n = 100$ and estimated FIC for the non-parametric and for the exponential RMST. The simulations were repeated 1000 times. The aim was to assess the true positive rate of choosing the (true) exponential RMST over the non-parametric alternative.
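The data-generating step of this design can be sketched as below. The snippet is not the authors' simulation code: the function names and the seed are assumptions, and only the standard library is used. The closed-form censoring proportion follows from the independent exponential survival and censoring times with administrative censoring at $τ$.

```python
import math
import random


def expected_censoring(lam, gamma, tau):
    """1 - Pr(X <= min(C, tau)) for independent exponential X (rate lam) and C (rate gamma)."""
    total = lam + gamma
    p_event = lam / total * (1.0 - math.exp(-total * tau))
    return 1.0 - p_event


def simulate_trial(n, lam, gamma, tau, rng):
    """Draw observed times and event indicators under random plus Type I censoring."""
    times, events = [], []
    for _ in range(n):
        x = rng.expovariate(lam)                  # latent survival time
        c = min(rng.expovariate(gamma), tau)      # random censoring, capped at tau
        times.append(min(x, c))
        events.append(1 if x <= c else 0)
    return times, events


rng = random.Random(2022)  # arbitrary seed
times, events = simulate_trial(100, 1 / 365, 3 / 365, 365.0, rng)
print(1.0 - sum(events) / len(events), expected_censoring(1 / 365, 3 / 365, 365.0))
```

For $γ = 3 / 365$, the closed form evaluates to roughly 0.755, in line with the maximum of about 75% reported above. Each simulated data set would then be passed to the non-parametric and exponential RMST estimators and their FIC scores compared.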

As can be observed in Figure 1, with increasing censoring, the sensitivity of FIC initially decreased, reaching a minimum at around 60% censoring, followed by an increase in sensitivity. Next, with the censoring percentage at the point where the sensitivity was lowest ($γ = 0.00448$), we simulated survival data with sample sizes varying from 50 to 1000 subjects and estimated FIC. Each sample size was simulated 1000 times. As expected, the true positive rate of choosing the exponential distribution increased with the sample size. For a more detailed look at the patterns recorded in Figure 1, please see Appendix A.

5. Practical Application

The survival rate of melanoma has increased in recent decades, with approximately two-thirds of patients surviving 5 years or more after diagnosis and women generally having better survival than men [24]. Using a data set compiled by Drzewiecki and collaborators [25], we assess the possibility of improving the efficiency of an RMST analysis of sex-specific survival. Data from 126 female and 79 male melanoma patients are included in the analysis (the data can be found in the “timereg” R package). As can be observed in Figure 2, females have better survival prospects than males. Next, we analyse whether the restricted mean survival times at 3, 5 and 10 years differ between the sexes. As competing models, we consider the non-parametric estimator and the Exponential, Weibull, Gamma, Generalized Gamma and Log-logistic distributions. The combination of the Exponential distribution for men and the Gamma distribution for women was flagged by FIC as a better alternative than the purely non-parametric estimators (Table 1).

On average, women had 65 days longer survival in the initial 3 years. The bias of the parametric estimator was negligible (0.7 days). Additionally, the parametric model reduced the length of the 95% confidence interval (CI) by 27.5%, a considerable gain.

On average, women had 165 days longer survival in the initial 5 years. Just as for the 3-year survival, the Exponential-Gamma combination best described the data at 5 years, and reduced the 95% CI length by 18%. However, it should be noted that the bias was 14 days, which is an 8.5% bias. At 10 years, the RMST difference between men and women increased to 15.5 months. Parametric estimation still increased the efficiency; however, the reduction in CI length was less than 1%, a very minor gain compared to the non-parametric estimator.

6. Discussions

In this paper, we have introduced the FIC [12,13] as a tool for model selection for RMST. While FIC has a well-established theory and is applicable in a wide range of areas, using FIC to select the best RMST model has some characteristics that need to be considered. First, the non-parametric RMST estimator is not consistent and is biased. This will likely have minor implications in practical applications; nevertheless, researchers should consider this aspect. If the at-risk set at the chosen restriction time $τ$ is empty or contains very few participants, the bias can be non-negligible. Second, the censoring percentage and the type of censoring (type II or random, possibly hybrid) affect the efficiency of parametric and non-parametric estimators differently. Third, the variance of the Kaplan–Meier survival curve at any time $t$ is based on information up to $t -$. The parametric survival curve estimator uses information up to $τ$ in the case of Type I censoring, or $t_{m a x}$ in the case of random censoring.

Jullum and Hjort [13] concluded that the upper probability limit of FIC selecting the true parametric model over the non-parametric one ($α_{n}$) is $P r (χ_{1}^{2} < 2) = 0.843$, a probability that was replicated in our simulations. This probability was obtained when the exponential estimator was tested against the non-parametric estimator in a setting where the exponential model was the true one and the censoring was due to the restriction at $τ$. We observed that $α_{n}$ was dependent on the censoring percentage. In addition, reaching $α_{n} = 0.843$ is sample size dependent.

In clinical trials of chronic diseases, $τ$ coincides with the end of the follow-up. In observational studies, often $τ ≪ t_{m a x}$; thus, the information contained in $(τ, t_{m a x}]$ might offer an extra advantage for the parametric variance estimator. However, the same information in $(τ, t_{m a x}]$ might bias the parametric survival estimate up to $τ$ and induce bias in ${\hat{μ}}_{τ}$. This is more apparent when outliers are present, which usually appear in the right tail of the distribution. The extent depends on the assumed distribution, as Aranda [26] highlighted; e.g., exponential survival curves are less affected than Weibull survival curves.

As illustrated by the analysis of the melanoma data, FIC selected the fitted model that minimizes the estimated MSE. However, just like the Akaike or the Bayesian Information Criterion (AIC and BIC), it offers a ranking of competing models, but not a direct gauge of model fit or quality. At the 10-year restriction time, the parametric estimator was ranked first, but the statistical gain (i.e., lower MSE) of choosing the parametric estimator was negligible. Looking only at FIC ranks is likely not enough; one should also consider the distance between the competing models on the FIC scale. Just as with AIC and BIC, this requires further research.

One practical difficulty of parametric estimation of the RMST lies in the selection of the parametric distribution(s). A set of competing parametric families can be selected based on subject-specific disease knowledge and by graphical examination of the hazard. The aim should not be to identify the true underlying distribution that generated the data, but to identify families of distributions with similar shapes [27] and, by simultaneously looking at the bias and variance with FIC, to decide how much model misspecification can be tolerated [28] in order to increase efficiency.

In conclusion, we advocate the adoption of the FIC framework for model selection for RMST. Studies with relatively short restriction times (i.e., restriction times shorter than the mean/median survival time) can greatly benefit from moving from non-parametric to parametric estimation. For shorter follow-up times, it is relatively easy to identify families of distributions with shapes similar to the observed data, which would decrease the bias. In observational studies where $t_{max} > τ$, we recommend that a first analysis be conducted so that the support set of both parametric and non-parametric estimators is $(0, τ]$. This setting will likely result in a smaller bias for the parametric RMST estimator and would aid interpretability. Naturally, as FIC trades off bias against variance, a reduced variance might outweigh the bias of the parametric estimator on $(0, t_{max}]$. Yet another argument for restricting attention to $(0, τ]$ is that the distribution that FIC selects might convey important medical/biological information.

Author Contributions

Conceptualization, S.N., A.G. and A.J.; methodology, S.N., A.G. and A.J.; software, S.N., A.G. and A.J.; writing—original draft preparation, S.N., A.G. and A.J.; writing—review and editing, S.N., A.G. and A.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the anonymous reviewers for their insightful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CI	Confidence Interval
FIC	Focused Information Criterion
IF	Influence Function
KM	Kaplan–Meier
MLE	Maximum Likelihood Estimator
np	Non-parametric
pm	Parametric
RMST	Restricted Mean Survival Time

Appendix A. Behavior of FIC for RMST as a Function of Censoring

Figure 1a in the main text exhibited an interesting pattern, namely that the probability of selecting the true exponential RMST over the non-parametric estimator had a v-shaped curve, with $P r_{F_{n}} (F I C_{p m} \leq F I C_{n p})$ being lowest at around 65% of the data censored. Here, we look at FIC and its components to elucidate the mechanisms behind this pattern. As noted in the main text,

\begin{matrix} F I C_{p m} & = max \{0, {\hat{b}}^{2} - \frac{\hat{κ}}{n}\} + \frac{{\hat{v}}_{p m}}{n} . \end{matrix}

(A1)

If the parametric model is the true data-generating process, then, at least asymptotically, $\hat{b} = 0$, and in practical situations it is expected that $\hat{b}$ is close to zero and $F I C_{p m}$ is dominated by the variance. Please see Appendix B for more details.

Starting with the Greenwood variance estimator (Equation (12)) and then replacing each term by its asymptotic limiting counterpart, the non-parametric asymptotic variance ($A V a r$) is

\begin{matrix} A V a r (μ_{n p}) = \int_{0}^{τ} {\{\int_{x}^{τ} S (t) d t\}}^{2} \frac{h (x)}{n (1 - F (x)) (1 - G (x))} d x . \end{matrix}

(A2)

Assuming exponential survival times with rate $λ$ and exponential censoring times with rate $γ$, we obtain

\begin{matrix} A V a r (μ_{n p}) = \frac{1}{n} \int_{0}^{τ} \frac{{(e^{- λ τ} - e^{- x λ})}^{2}}{λ e^{- x (λ + γ)}} d x . \end{matrix}

(A3)

The variance of the parametric estimator when the assumed parametric model is the true model (and $J = K$) is given by

\begin{matrix} A V a r (μ_{p m}) = {\{\frac{e^{- λ τ} (1 + λ τ) - 1}{λ^{2}}\}}^{2} \frac{λ^{2}}{\sum_{i} δ_{i}} . \end{matrix}

(A4)

Here, the expected number of events is $\sum_{i} δ_{i} = n P r (X < C) = n λ {(λ + γ)}^{- 1}$. Next, we establish the Asymptotic Relative Efficiency ($A R E$) between the non-parametric (Equation (A3)) and parametric (Equation (A4)) variances as

\begin{matrix} A R E = \frac{A V a r (μ_{p m})}{A V a r (μ_{n p})} . \end{matrix}

Without providing a closed-form solution, we can observe that the sample size $n$ is factored out. Thus, for a given $τ$, the ARE is a function of the proportion of censored observations (Figure A1). The v-shaped curve of Figure 1 is present here as well. Miller [7] and Jullum and Hjort [8] have assessed the ARE of parametric and non-parametric variance estimators, concluding that the maximum ARE is 64%. As Figure A1 indicates, the maximum ARE is much higher; however, for realistic restriction times (≤ mean survival time), this is achieved at a high censoring percentage, i.e., when the censoring distribution has shorter tails than the survival distribution. This might result in $t_{m a x} < τ$, which complicates the estimation of $\int_{0}^{τ} \hat{S} (t) d t$. Kaplan and Meier [14] did not define $S (x)$ for $x > t_{m a x}$ when $δ_{t_{m a x}} = 0$. Efron [29] proposed a modification so that $S (x) = 0$ for all $x > t_{m a x}$, while Gill [19] proposed $S (x) = S (t_{m a x})$ for all $x > t_{m a x}$. Both modifications would bias RMST and, in light of the guidelines by Eaton [5], RMST should not be calculated in these settings.
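The ARE can be evaluated numerically. The sketch below integrates the exponential form of the non-parametric asymptotic variance (Equation (A3)) with Simpson's rule and combines it with Equation (A4), using the expected number of events $n λ {(λ + γ)}^{- 1}$; both variances are multiplied by $n$, which cancels in the ratio. The function names and the number of integration steps are assumptions.

```python
import math


def n_avar_np(lam, gamma, tau, steps=2000):
    """n * AVar(mu_np) from Equation (A3), via Simpson's rule (steps must be even)."""
    def integrand(x):
        num = (math.exp(-lam * tau) - math.exp(-lam * x)) ** 2
        return num / (lam * math.exp(-x * (lam + gamma)))

    h = tau / steps
    total = integrand(0.0) + integrand(tau)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


def n_avar_pm(lam, gamma, tau):
    """n * AVar(mu_pm) from Equation (A4), with sum(delta) replaced by n * lam / (lam + gamma)."""
    grad = (math.exp(-lam * tau) * (1.0 + lam * tau) - 1.0) / lam ** 2
    return grad ** 2 * lam ** 2 * (lam + gamma) / lam


def are(lam, gamma, tau):
    """Asymptotic relative efficiency AVar(mu_pm) / AVar(mu_np)."""
    return n_avar_pm(lam, gamma, tau) / n_avar_np(lam, gamma, tau)


lam, tau = 1 / 365, 365.0
for gamma_num in (0.5, 1.0, 2.0, 3.0):
    print(gamma_num, are(lam, gamma_num / 365, tau))
```

Evaluating the function over a grid of censoring rates for several values of $τ$ would trace curves of the kind summarized in Figure A1.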

In closing, we can conclude that the v-shaped curve of the true positive rate of FIC choosing the true exponential distribution is due to the relation of the asymptotic relative efficiency of parametric and non-parametric estimators to censoring. This concludes the discussion of Figure 1a.

Figure A1. Asymptotic relative efficiency of the parametric and non-parametric variance estimators for $RMST$ as a function of censoring percentage and restriction time ($τ$).

Appendix B. Limit Probability as a Function of Censoring and Sample Size

Figure 1b in the main text illustrates, with the help of simulation, the convergence to the limiting probability that $FIC$ would select a parametric model over the non-parametric one as a function of censoring and sample size. Using the notation and theory outlined in the appendix by Jullum and Hjort [13], we note that

$$Pr_{F_{n}}(FIC_{pm} \leq FIC_{np}) \to Pr\left\{χ_{1}^{2}\left(\frac{n(\hat{μ}_{np} - \hat{μ}_{pm})^{2}}{v_{np} - v_{pm}}\right) \leq 2\right\}.$$

Here, $χ_{1}^{2}(ζ)$ is a non-central chi-squared distributed variable with 1 degree of freedom and non-centrality parameter $ζ$. If the considered parametric model is the true one and we have unbiased estimates, then $ζ = 0$ and

$$Pr_{F_{n}}(FIC_{pm} \leq FIC_{np}) = 0.843.$$
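The limiting value 0.843 and its sensitivity to the non-centrality parameter can be checked with a few lines of code. The sketch below is illustrative only. It uses the fact that a non-central chi-squared variable with 1 degree of freedom and non-centrality ζ is distributed as the square of a normal variable with mean √ζ and unit variance. The function names are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_select_parametric(zeta, threshold=2.0):
    """Pr{chi2_1(zeta) <= threshold} for non-centrality parameter zeta >= 0.

    chi2_1(zeta) is distributed as (Z + sqrt(zeta))**2 with Z standard normal,
    so the event is -sqrt(threshold) <= Z + sqrt(zeta) <= sqrt(threshold).
    """
    root, shift = math.sqrt(threshold), math.sqrt(zeta)
    return std_normal_cdf(root - shift) - std_normal_cdf(-root - shift)


if __name__ == "__main__":
    for zeta in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(f"zeta = {zeta:.1f}: limit probability = {prob_select_parametric(zeta):.3f}")
```

At ζ = 0 the function returns the value quoted above. Any discrepancy between the two estimates that is large relative to the difference in their variances gives ζ > 0 and lowers the probability that the parametric model is selected.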

As noted in the main text, $\hat{μ}_{np}$ is downward biased and the bias may decrease to zero at a rate slower than $\sqrt{n}$ [30]. The simulation studies in Section 5 in the main text assumed exponential survival times. Maximum likelihood estimators are consistent; however, they can have a small sample bias. The bias of the maximum likelihood estimate of the rate parameter ($λ$) of the exponential distribution, $(n - 1)^{-1}λ$, rapidly decreases with increasing sample size. Let $β = E[X]$ denote the expectation of the uncensored survival times; then, with the help of the delta-method, we can establish the bias of $\hat{μ}_{pm}$ as

$$b_{pm} = -\frac{1}{2n} \frac{τ^{2} e^{-τβ^{-1}}}{β}.$$

Just as for the non-parametric estimate, we have a downward bias. In addition, $b_{pm} \to 0$ as $n \to \infty$ or $τ \to \infty$. Due to the bias of both parametric and non-parametric estimates and different convergence rates, we expect that $\hat{μ}_{np} - \hat{μ}_{pm} \neq 0$ in finite samples. We can conclude that if $n \to \infty$ then $Pr_{F_{n}}(FIC_{pm} \leq FIC_{np}) \to 0.843$.
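The delta-method bias of the parametric estimate can be compared with the exact finite-sample bias in the simplest case. The sketch below is an illustration under stated assumptions rather than the simulation code of the paper: survival times are exponential with mean β, there is no censoring, and the estimate of β is the sample mean, which then follows a Gamma distribution with shape n and scale β/n. The function names are hypothetical.

```python
import math


def rmst_exponential(beta, tau):
    """RMST of an exponential distribution with mean beta, restricted at tau."""
    return beta * (1.0 - math.exp(-tau / beta))


def delta_method_bias(beta, tau, n):
    """Second-order delta-method bias b_pm given in the text."""
    return -(tau**2) * math.exp(-tau / beta) / (2.0 * n * beta)


def exact_bias(beta, tau, n, steps=40000):
    """E[RMST(beta_hat)] - RMST(beta), with beta_hat ~ Gamma(shape=n, scale=beta/n).

    ASSUMPTION: uncensored data, so beta_hat is the sample mean.
    The expectation is computed with a midpoint rule on a truncated range.
    """
    scale = beta / n
    upper = beta * (1.0 + 12.0 / math.sqrt(n))  # about 12 standard deviations above the mean
    step = upper / steps
    log_norm = math.lgamma(n) + n * math.log(scale)
    expectation = 0.0
    for i in range(steps):
        x = (i + 0.5) * step
        log_density = (n - 1) * math.log(x) - x / scale - log_norm
        expectation += rmst_exponential(x, tau) * math.exp(log_density)
    return expectation * step - rmst_exponential(beta, tau)


if __name__ == "__main__":
    beta, tau = 1.0, 1.0
    for n in (10, 50, 200):
        print(n, delta_method_bias(beta, tau, n), exact_bias(beta, tau, n))
```

Both quantities are negative and shrink at rate 1/n, in line with the downward bias described above.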

The simulation in Section 5 assumed hybrid random and Type I censoring. This assumes that no information is recorded after $τ$. In observational studies, the maximum follow-up usually exceeds $τ$. In the setting described in Section 5, the parametric estimator based on $T_{j} = \min(X_{j}, C_{j} \land τ)$ has an asymptotic relative efficiency of around 87% compared with the estimator based on $T_{j} = \min(X_{j}, C_{j})$. Thus, considering the available data after $τ$ speeds up the convergence toward 0.843. This concludes the discussion of Figure 1b.

References

1. Stensrud, M.J.; Aalen, J.M.; Aalen, O.O.; Valberg, M. Limitations of hazard ratios in clinical trials. Eur. Heart J. 2019, 40, 1378–1383.
2. Uno, H.; Claggett, B.; Tian, L.; Inoue, E.; Gallo, P.; Miyata, T.; Schrag, D.; Takeuchi, M.; Uyama, Y.; Zhao, L.; et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J. Clin. Oncol. 2014, 32, 2380.
3. Hasegawa, T.; Misawa, S.; Nakagawa, S.; Tanaka, S.; Tanase, T.; Ugai, H.; Wakana, A.; Yodo, Y.; Tsuchiya, S.; Suganami, H.; et al. Restricted mean survival time as a summary measure of time-to-event outcome. Pharm. Stat. 2020, 19, 436–453.
4. Huang, B.; Kuan, P.F. Comparison of the restricted mean survival time with the hazard ratio in superiority trials with a time-to-event end point. Pharm. Stat. 2018, 17, 202–213.
5. Eaton, A.; Therneau, T.; Le-Rademacher, J. Designing clinical trials with (restricted) mean survival time endpoint: Practical considerations. Clin. Trials 2020, 17, 285–294.
6. Quartagno, M.; Morris, T.P.; White, I.R. Why restricted mean survival time methods are especially useful for non-inferiority trials. Clin. Trials 2021, 18, 743–745.
7. Miller, R.G., Jr. What price Kaplan-Meier? Biometrics 1983, 39, 1077–1081.
8. Jullum, M.; Hjort, N.L. What price semiparametric Cox regression? Lifetime Data Anal. 2019, 25, 406–438.
9. Meier, P.; Karrison, T.; Chappell, R.; Xie, H. The price of Kaplan–Meier. J. Am. Stat. Assoc. 2004, 99, 890–896.
10. Gardiner, J.C. Restricted Mean Survival Time Estimation: Nonparametric and Regression Methods. J. Stat. Theory Pract. 2021, 15, 1–15.
11. Nemes, S.; Bülow, E.; Gustavsson, A. A brief overview of restricted mean survival time estimators and associated variances. Stats 2020, 3, 107–119.
12. Claeskens, G.; Hjort, N.L. The Focused Information Criterion. J. Am. Stat. Assoc. 2003, 98, 900–916.
13. Jullum, M.; Hjort, N.L. Parametric or nonparametric: The FIC approach. Stat. Sin. 2017, 27, 951–981.
14. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
15. Boos, D.; Stefanski, L. Essential Statistical Inference: Theory and Methods; Springer: New York, NY, USA, 2013.
16. Reid, N. Influence Functions for Censored Data. Ann. Stat. 1981, 9, 78–92.
17. Peterson, A.V. Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions. J. Am. Stat. Assoc. 1977, 72, 854–858.
18. Meier, P. Estimation of a distribution function from incomplete observations. J. Appl. Probab. 1975, 12, 67–87.
19. Gill, R.D. Censoring and stochastic integrals. In Math Centre Tracts; Mathematisch Centrum: Amsterdam, The Netherlands, 1980; Volume 124.
20. Mauro, D. A combinatoric approach to the Kaplan-Meier estimator. Ann. Stat. 1985, 13, 142–149.
21. Zhou, M. Two-sided bias bound of the Kaplan-Meier estimator. Probab. Theory Relat. Fields 1988, 79, 165–173.
22. Stute, W. The bias of Kaplan-Meier integrals. Scand. J. Stat. 1994, 21, 475–484.
23. Zheng, G.; Gastwirth, J.L. On the Fisher information in randomly censored data. Stat. Probab. Lett. 2001, 52, 421–426.
24. Behbahani, S.; Maddukuri, S.; Cadwell, J.B.; Lambert, W.C.; Schwartz, R.A. Gender differences in cutaneous melanoma: Demographics, prognostic factors, and survival outcomes. Dermatol. Ther. 2020, 33, e14131.
25. Drzewiecki, K.; Ladefoged, C.; Christensen, H. Biopsy and prognosis for cutaneous malignant melanomas in clinical stage I. Scand. J. Plast. Reconstr. Surg. 1980, 14, 141–144.
26. Aranda-Ordaz, F.J. Relative efficiency of the Kaplan-Meier estimator under contamination. Commun. Stat.-Simul. Comput. 1987, 16, 987–997.
27. Klein, J.P.; Moeschberger, M.L. The robustness of several estimators of the survivorship function with randomly censored data. Commun. Stat.-Simul. Comput. 1989, 18, 1087–1112.
28. Claeskens, G.; Hjort, N.L. Model selection and model averaging. In Cambridge Books; Cambridge University Press: Cambridge, UK, 2008.
29. Efron, B. The two sample problem with censored data. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 1 January 1967; Volume 4, pp. 831–853.
30. Stute, W. The statistical analysis of Kaplan-Meier integrals. Lect. Notes-Monogr. Ser. 1995, 27, 231–254.

Figure 1. True positive rate of correctly identifying the exponential distribution as a function of censoring percentage (a) and sample size (b). The dashed horizontal line represents the theoretical limit ($α_{n} = 0.843$) of selecting the true parametric model over the non-parametric one.

Figure 2. Overall survival after melanoma diagnosis.

Table 1. Difference in $RMST$ ($Δ$) and associated $FIC$ for women compared to men, expressed in days, at 3, 5 and 10 years for melanoma patients, estimated with non-parametric Kaplan–Meier integrals and a combination of Exponential and Gamma distributions.

Timepoint	Model	$Δ$ (Days)	Bias of $Δ$	$\sqrt{FIC}$	95% CI for $Δ$
3 yrs	Non-param.	65.32	-	30.02	6.47; 124.17
	Param.	64.54	−0.78	21.74	21.92; 107.17
5 yrs	Non-param.	165.19	-	67.10	36.66; 299.72
	Param.	153.92	−14.83	54.83	46.46; 261.40
10 yrs	Non-param.	468.19	-	183.06	109.40; 826.99
	Param.	452.62	−15.57	181.51	96.86; 808.37

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

