Bootstrap Bandwidth Selection and Confidence Regions for Double Smoothed Default Probability Estimation

Peláez, Rebeca; Cao, Ricardo; Vilar, Juan M.

doi:10.3390/math10091523


by Rebeca Peláez *, Ricardo Cao and Juan M. Vilar

Research Group MODES, Department of Mathematics, CITIC, University of A Coruña, 15071 A Coruña, Spain

* Author to whom correspondence should be addressed.

Mathematics 2022, 10(9), 1523; https://doi.org/10.3390/math10091523

Submission received: 1 April 2022 / Revised: 20 April 2022 / Accepted: 26 April 2022 / Published: 2 May 2022

(This article belongs to the Special Issue Application of Survival Analysis in Economics, Finance and Insurance)


Abstract

For a fixed time, t, and a horizon time, b, the probability of default (PD) measures the probability that an obligor who has paid his/her credit until time t runs into arrears not later than time t + b. This probability is one of the most crucial elements influencing credit risk. Previous works have proposed nonparametric estimators for the probability of default derived from Beran’s estimator and from a doubly smoothed Beran’s estimator of the conditional survival function for censored data. They have also found asymptotic expressions for the bias and variance of the estimators, but they do not provide any practical way to choose the smoothing parameters involved. In this paper, resampling methods based on bootstrap techniques are proposed to approximate the bandwidths on which Beran and smoothed Beran’s estimators of the PD depend. Bootstrap algorithms for the calculation of confidence regions of the probability of default are also proposed. Extensive simulation studies show that the presented algorithms perform well. The bandwidth selector and the confidence region algorithm are applied to a German credit dataset to analyze the probability of default conditional on the credit scoring.

Keywords:

bootstrap; censored data; credit risk; kernel method; survival analysis

MSC:

62F40; 62N02; 62G05; 91G40

1. Introduction

Debts from clients with unpaid credits have an important impact on the solvency of banks and other credit institutions. According to the Basel Committee on Banking Supervision of the Bank for International Settlements, one of the crucial elements for the risk measurement of capital assignments is the probability of default. For a fixed time, t, and a horizon time, b, the probability of default (PD) can be defined as the probability that a credit that has been paid until time t becomes unpaid not later than time t + b. The PD is allowed to depend on the credit scoring x, which is usually a linear combination of informative covariates of the credit and the client. Standard methods to estimate the PD include logistic models and other binary response parametric regression models. See the studies of [1,2,3,4,5,6], among others.

In recent years, survival analysis has come to be regarded as a useful tool for credit risk problems. Since the work by [7], a body of literature has developed along this line; see [8,9,10,11,12,13,14]. Nonparametric estimators of the probability of default based on conditional survival function estimators are presented in [12,13], exploiting the fact that, in a right censored context, the probability of default can be written in terms of the conditional survival function. All these nonparametric estimators are based on covariate smoothing. In the recent work [14], a general nonparametric estimator of the probability of default with double smoothing, in both the covariate and the time variable, was proposed and studied.

In particular, Beran’s estimator and a doubly smoothed Beran’s estimator of the PD were presented in [13,14], respectively, and their asymptotic properties were analyzed. Simulation studies carried out in these papers show that the estimators perform well, especially the doubly smoothed Beran’s estimator. However, these studies used the theoretical smoothing parameters. Since the mean integrated squared error expressions are complex and depend on several population parameters, they are not useful in practice for obtaining plug-in estimates of these theoretical bandwidths. The goal of this work is to propose resampling techniques to approximate them.

The bootstrap has become a powerful tool in many statistical applications since it was first introduced by [15]. The bootstrap for right censored data was first proposed by [16], and the bootstrap method and its applications were studied in [17]. Asymptotic theory of the bootstrap for right censored data was established by [18,19]. In [20], the bootstrap for nonparametric regression with right censored observations at fixed covariate values was studied. A bootstrap approach for the nonparametric censored regression setup was studied in [21]. In [22], a local cross-validation bandwidth selector was proposed.

Our approach follows the ideas of [21] and is based on the obvious bootstrap. Both Beran and smoothed Beran’s estimators are bootstrapped in order to approximate their corresponding optimal bandwidths. The bootstrap is also useful for computing confidence regions: the existing theoretical result only allows one to compute pointwise theoretical confidence intervals, which cannot be computed in practice because the variance of the estimator again depends on unknown population quantities. A bootstrap algorithm to compute confidence regions for the probability of default is also proposed.

The remainder of this paper is organized as follows. In Section 2, bootstrap selectors for the bandwidths of Beran and smoothed Beran’s estimators are proposed. In Section 3, a simulation study shows the behavior of the PD estimators with bootstrap bandwidths. The issue of obtaining confidence regions for the probability of default, $PD(t | x)$, for a fixed value of $x \in I \subseteq \mathbb{R}$ and t covering the interval $I_{T} \subseteq \mathbb{R}^{+}$, is addressed in Section 4 using Beran and smoothed Beran’s estimators. A simulation study on the proposed methods for computing confidence regions is shown in Section 5. In Section 6, Beran’s estimator and the smoothed Beran’s estimator with bootstrap bandwidths are used to estimate the probability of default function conditional on the credit scoring for a German credit dataset. Section 7 contains some concluding remarks.

2. Bandwidth Selection for Beran and Smoothed Beran’s PD Estimators

Consider the right censored simple random sample $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$ of $(X, Z, δ)$, where $X_{i}$ represents the covariate, $Z_{i} = \min\{T_{i}, C_{i}\}$ the observed lifetime and $δ_{i} = I_{\{T_{i} \leq C_{i}\}}$ the censoring indicator, with $T_{i} \geq 0$ and $C_{i} \geq 0$ the time to occurrence of the event and the censoring time for the i-th individual of the sample, $i = 1, \dots, n$. In credit risk, X is usually the credit scoring, Z is the observed maturity, T is the time to default, and C is the time until the end of the study or the anticipated cancellation of the credit. The distribution function of T is denoted by $F(t)$ and the survival function by $S(t) = 1 - F(t)$. The functions $F(t | x)$ and $S(t | x)$ are the conditional distribution and survival functions of T evaluated at t given $X = x$. The conditional distribution function of Z is denoted by $H(t | x)$, and the conditional distribution function of C by $G(t | x)$. It is assumed that an unknown relationship between T and X exists. Given $X = x$, T and C are assumed to be conditionally independent.

Let $x \in I$ be a fixed value of the covariate X and $b > 0$ a fixed positive time. Then, the probability of default in a time horizon $t + b$ from a maturity time, t, is defined as follows:

PD (t | x) = P (T \leq t + b | T > t, X = x) = \frac{F (t + b | x) - F (t | x)}{1 - F (t | x)} = 1 - \frac{S (t + b | x)}{S (t | x)} .

(1)
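Definition (1) can be checked numerically. The Python sketch below, with hypothetical helper names, evaluates $PD(t | x)$ from any conditional survival function; the Weibull and exponential survival functions of the two simulation models of Section 3 are used only as illustrations. Because the exponential distribution is memoryless, its PD does not depend on t.

```python
import math
from typing import Callable

Survival = Callable[[float], float]


def pd_from_survival(surv: Survival, t: float, b: float) -> float:
    """Probability of default (1): PD(t|x) = 1 - S(t + b|x) / S(t|x)."""
    s_t = surv(t)
    if s_t <= 0.0:
        raise ValueError("PD(t|x) is undefined when S(t|x) = 0")
    return 1.0 - surv(t + b) / s_t


def weibull_survival(x: float, d: float = 2.0) -> Survival:
    """S(t|x) = exp(-Gamma(x) t^d) with Gamma(x) = 1 + 5x, as in Model 1 of Section 3."""
    gamma = 1.0 + 5.0 * x
    return lambda t: math.exp(-gamma * t ** d)


def exponential_survival(rate: float) -> Survival:
    """S(t|x) = exp(-rate * t); the rate plays the role of E(x) in Model 2 of Section 3."""
    return lambda t: math.exp(-rate * t)


s1 = weibull_survival(0.6)         # Gamma(0.6) = 4
s2 = exponential_survival(0.784)   # E(0.8) = 0.784
# Weibull with d = 2 has an increasing hazard, so the PD grows with t
print(pd_from_survival(s1, 0.2, 0.15), pd_from_survival(s1, 0.6, 0.15))
# Exponential lifetimes are memoryless: PD(t|x) = 1 - exp(-rate * b) for every t
print(pd_from_survival(s2, 0.1, 0.7), pd_from_survival(s2, 2.0, 0.7))
```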

In this section, methods for the automatic selection of the bandwidths for Beran’s estimator and the smoothed Beran’s estimator of the probability of default are proposed.

2.1. Beran’s Estimator

Beran’s estimator of the conditional survival function is given by

\hat{S}_{h} (t | x) = \prod_{i = 1}^{n} \left(1 - \frac{I_{\{Z_{i} \leq t, δ_{i} = 1\}} w_{h, i} (x)}{1 - \sum_{j = 1}^{n} I_{\{Z_{j} < Z_{i}\}} w_{h, j} (x)}\right)

(2)

where $w_{h, i} (x) = \frac{K ((x - X_{i}) / h)}{\sum_{j = 1}^{n} K ((x - X_{j}) / h)}$, $i = 1, \dots, n$, $K(u)$ is a kernel function and $h = h_{n}$ is the bandwidth that determines the smoothness introduced in the estimator through the covariate X.

Plugging (2) into (1), the probability of default estimator based on Beran’s survival estimator is obtained:

\widehat{PD}_{h} (t | x) = 1 - \frac{\hat{S}_{h} (t + b | x)}{\hat{S}_{h} (t | x)} .

(3)
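To make the role of the weights in (2) and (3) concrete, here is a minimal Python sketch of Beran’s estimator and the resulting PD estimator. It is not the authors’ R software: it assumes a Gaussian kernel (Section 3 uses a truncated Gaussian kernel with boundary reflection, which is omitted here), it costs O(n²) per evaluation, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def beran_weights(x: float, X: Sequence[float], h: float) -> List[float]:
    """Kernel weights w_{h,i}(x) used in (2)."""
    k = [gaussian_kernel((x - xi) / h) for xi in X]
    total = sum(k)
    if total == 0.0:
        raise ValueError("all kernel weights vanish; increase h")
    return [ki / total for ki in k]


def beran_survival(t: float, x: float, X: Sequence[float], Z: Sequence[float],
                   delta: Sequence[int], h: float) -> float:
    """Beran's estimator (2) of S(t|x)."""
    w = beran_weights(x, X, h)
    surv = 1.0
    for zi, di, wi in zip(Z, delta, w):
        if di == 1 and zi <= t:
            # 1 - sum_j I(Z_j < Z_i) w_j: weight still at risk just before Z_i
            at_risk = 1.0 - sum(wj for zj, wj in zip(Z, w) if zj < zi)
            if at_risk > 0.0:
                surv *= max(0.0, 1.0 - wi / at_risk)
    return surv


def beran_pd(t: float, b: float, x: float, X: Sequence[float], Z: Sequence[float],
             delta: Sequence[int], h: float) -> float:
    """Beran's PD estimator (3); NaN when the estimated S(t|x) is zero."""
    s_t = beran_survival(t, x, X, Z, delta, h)
    if s_t == 0.0:
        return math.nan
    return 1.0 - beran_survival(t + b, x, X, Z, delta, h) / s_t
```

With a very large h all weights are (almost) equal to 1/n and (2) reduces to the Kaplan–Meier estimator, which gives a convenient sanity check.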

In this section, bootstrap methods are proposed for the automatic selection of h. There are two classic methods for bootstrap resampling in a censoring context: the obvious bootstrap and the simple bootstrap. The equivalence between both methods in an unconditional setup is proved in [16]. In [21], this result is extended to the case where a covariate is involved, assuming there are no ties in the sample values of the covariate, by proving the equivalence of two resampling methods: the obvious bootstrap and the simple weighted bootstrap. In this paper, the following obvious bootstrap method, combined with a smoothed bootstrap for the covariate, is proposed.

2.1.1. Algorithm for Bootstrap Resampling

Let $I_{1} \subseteq \mathbb{R}$ be an interval containing appropriate bandwidth values and let $r \in I_{1}$ be the pilot bandwidth for the bootstrap resampling:

1. Obtain $U_{1}, \dots, U_{n}$ iid with $U_{i} \sim U (0, 1)$ and $V_{1}, \dots, V_{n}$ iid with common density K.
2. For each $i = 1, \dots, n$, define

$X_{i}^{*} = X_{[n U_{i}] + 1} + r V_{i},$

where $[u]$ is the integer part of u. Generate $T_{i}^{*}$ from Beran’s estimator of the conditional distribution of T using the sample $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$ and bandwidth r, denoted by $\hat{F}_{r} (t | X_{i}^{*})$, and $C_{i}^{*}$ from Beran’s estimator of the conditional distribution of C using the sample $\{(X_{i}, Z_{i}, 1 - δ_{i})\}_{i = 1}^{n}$ and bandwidth r, denoted by $\hat{G}_{r} (t | X_{i}^{*})$.
3. The estimators $\hat{F}_{r} (t | X_{i}^{*})$ and $\hat{G}_{r} (t | X_{i}^{*})$ are forced to be equal to one from the last observed lifetime ($\max\{Z_{i} : i = 1, \dots, n\}$) onwards.
4. For each $i = 1, \dots, n$, obtain

$Z_{i}^{*} = \min\{T_{i}^{*}, C_{i}^{*}\},$

$δ_{i}^{*} = I (T_{i}^{*} \leq C_{i}^{*}) .$
5. Consider the bootstrap resample $\{(X_{i}^{*}, Z_{i}^{*}, δ_{i}^{*})\}_{i = 1}^{n}$.
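A possible Python sketch of the resampling steps above, assuming a Gaussian kernel both for the weights and for the density of $V_i$, and drawing $T_i^*$ and $C_i^*$ by inverse transform from the step functions $\hat{F}_r$ and $\hat{G}_r$. Boundary reflection is not included, tied values of Z are merged into a single step, and the helper names are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple


def beran_weights(x: float, X: Sequence[float], h: float) -> List[float]:
    # Gaussian kernel; its normalizing constant cancels in the ratio
    k = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in X]
    total = sum(k)
    if total == 0.0:
        raise ValueError("all kernel weights vanish; increase the bandwidth")
    return [ki / total for ki in k]


def beran_cdf_steps(x: float, X: Sequence[float], Z: Sequence[float],
                    delta: Sequence[int], h: float) -> Tuple[List[float], List[float]]:
    """Beran's estimate of the conditional cdf at the distinct sorted values of Z."""
    w = beran_weights(x, X, h)
    order = sorted(range(len(Z)), key=lambda i: Z[i])
    times: List[float] = []
    cdf: List[float] = []
    surv, weight_before, k = 1.0, 0.0, 0   # weight_before = sum of w_j with Z_j < current time
    while k < len(order):
        z, group = Z[order[k]], []
        while k < len(order) and Z[order[k]] == z:   # tied observations share one step
            group.append(order[k])
            k += 1
        at_risk = 1.0 - weight_before
        for i in group:
            if delta[i] == 1 and at_risk > 0.0:
                surv *= max(0.0, 1.0 - w[i] / at_risk)
        weight_before += sum(w[i] for i in group)
        times.append(z)
        cdf.append(1.0 - surv)
    cdf[-1] = 1.0   # Step 3: force the estimate to one at max Z_i
    return times, cdf


def smoothed_bootstrap_resample(X, Z, delta, r: float, rng: random.Random):
    """One resample {(X_i*, Z_i*, delta_i*)} following Steps 1-5."""
    n = len(X)
    cens = [1 - d for d in delta]   # indicators 1 - delta_i for Beran's estimator of G
    resample = []
    for _ in range(n):
        # Steps 1-2: X* = X_{[nU]+1} + r V, with 0-based index int(n U) and Gaussian V
        x_star = X[min(n - 1, int(n * rng.random()))] + r * rng.gauss(0.0, 1.0)
        t_times, f_hat = beran_cdf_steps(x_star, X, Z, delta, r)
        c_times, g_hat = beran_cdf_steps(x_star, X, Z, cens, r)
        # inverse transform: smallest time whose estimated cdf reaches u
        t_star = t_times[bisect.bisect_left(f_hat, rng.random())]
        c_star = c_times[bisect.bisect_left(g_hat, rng.random())]
        # Step 4: censor the bootstrap lifetime
        resample.append((x_star, min(t_star, c_star), int(t_star <= c_star)))
    return resample


X = [i / 20 for i in range(20)]
Z = [0.1 * (i + 1) for i in range(20)]
delta = [0 if i % 3 == 0 else 1 for i in range(20)]
print(smoothed_bootstrap_resample(X, Z, delta, r=0.2, rng=random.Random(2022))[:3])
```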

In this paper, we want to estimate the probability of default function, $PD(t | x)$, for a fixed $x \in I$ and t covering the interval $I_{T} \subseteq \mathbb{R}^{+}$. Therefore, our goal is to obtain the bandwidth $h_{MISE} \in I_{1}$ that minimizes the mean integrated squared error given by

MISE_{x} (h) = E \left(\int_{I_{T}} \left(\widehat{PD}_{h} (t | x) - PD (t | x)\right)^{2} d t\right)

(4)

whose bootstrap approximation is

MISE_{x}^{*} (h) = E \left(\int_{I_{T}} \left(\widehat{PD}_{h}^{*} (t | x) - \widehat{PD}_{r} (t | x)\right)^{2} d t\right)

where $\widehat{PD}_{r} (t | x)$ is the estimate of the theoretical PD with pilot bandwidth r, computed from the sample $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$, and $\widehat{PD}_{h}^{*} (t | x)$ is the bootstrap estimate of PD with bandwidth h, computed from the bootstrap resample $\{(X_{i}^{*}, Z_{i}^{*}, δ_{i}^{*})\}_{i = 1}^{n}$.

The resampling distribution of $\widehat{PD}_{h}^{*} (t | x)$ cannot be computed in closed form, so the Monte Carlo method is used. It consists of obtaining B bootstrap resamples and computing $\widehat{PD}_{h}^{*} (t | x)$ for each of them. Thus, the distribution of $\widehat{PD}_{h}^{*} (t | x)$ is approximated by the empirical distribution of $\widehat{PD}_{h}^{*, 1} (t | x), \dots, \widehat{PD}_{h}^{*, B} (t | x)$, obtained from B bootstrap resamples, and the bootstrap version of the estimation error committed by Beran’s estimator for any smoothing parameter h is given by

MISE_{x}^{*} (h) \simeq \frac{1}{B} \sum_{k = 1}^{B} \int_{I_{T}} \left(\widehat{PD}_{h}^{*, k} (t | x) - \widehat{PD}_{r} (t | x)\right)^{2} d t .

(5)

In turn, the integral over $I_{T}$ is approximated by a Riemann sum.
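The Monte Carlo approximation (5), combined with the Riemann sum, reduces to an average of squared distances between curves evaluated on a common time grid. The sketch below assumes an equally spaced grid over $I_T$ with spacing `dt` and takes the bootstrap and pilot PD curves as precomputed lists; the function name is hypothetical.

```python
from typing import Sequence


def bootstrap_mise(pd_boot: Sequence[Sequence[float]],
                   pd_pilot: Sequence[float], dt: float) -> float:
    """Monte Carlo approximation (5) of MISE*_x(h) on an equally spaced t-grid.

    pd_boot[k][m] is the bootstrap estimate from resample k at grid point t_m,
    pd_pilot[m] is the pilot estimate at t_m, and dt is the grid spacing, so
    each integral over I_T becomes the Riemann sum sum_m (...)^2 * dt.
    """
    if not pd_boot:
        raise ValueError("need at least one bootstrap curve")
    total = 0.0
    for curve in pd_boot:
        if len(curve) != len(pd_pilot):
            raise ValueError("bootstrap and pilot curves must share the t-grid")
        total += sum((c - p) ** 2 for c, p in zip(curve, pd_pilot)) * dt
    return total / len(pd_boot)
```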

2.1.2. Algorithm for Bootstrap Bandwidth Selector Based on Beran’s Estimator

Let $x \in I$ be a fixed value of the covariate, $t \in I_{T}$ and $r \in I_{1}$:

1. Compute $\widehat{PD}_{r} (t | x)$ from the original sample $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$.
2. Obtain B bootstrap resamples of the form $\{(X_{i}^{*, k}, Z_{i}^{*, k}, δ_{i}^{*, k})\}_{i = 1}^{n}$, $k = 1, \dots, B$, using the smoothed bootstrap with pilot bandwidth $r \in I_{1}$, and calculate $\widehat{PD}_{h}^{*, k} (t | x)$ for each of them.
3. Approximate $MISE_{x}^{*} (h)$ according to (5).
4. Repeat Steps 1–3 for values of h in a grid of $I_{1}$.
5. Select the value of h that provides the smallest $MISE_{x}^{*} (h)$ as the bootstrap bandwidth $h^{*}$.
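Steps 1–5 amount to a grid search over $I_1$. In the following Python sketch the PD estimator and the resampler are passed in as functions, so the driver does not depend on the kernel. Two choices are assumptions of the sketch rather than statements of the paper: because neither the pilot estimate (Step 1) nor the resamples (Step 2) depend on h, they are computed once and shared by every h on the grid, which also makes the estimated $MISE_x^*(h)$ curve smoother in h; and the integral uses an equally spaced time grid with spacing `dt`. The simulations of Section 3 minimize the bootstrap MISE with L-BFGS-B rather than a plain grid. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = List[Tuple[float, float, int]]            # (X_i, Z_i, delta_i)
PDCurve = Callable[[Sample, float], List[float]]     # (sample, bandwidth) -> PD on the t-grid
Resampler = Callable[[Sample, float, random.Random], Sample]


def bootstrap_bandwidth(sample: Sample, h_grid: Sequence[float], r: float,
                        pd_curve: PDCurve, resample: Resampler, dt: float,
                        B: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    rng = random.Random(seed)
    pilot = pd_curve(sample, r)                              # Step 1: pilot estimate
    boots = [resample(sample, r, rng) for _ in range(B)]     # Step 2: computed once, reused for every h
    mise_star = []
    for h in h_grid:                                         # Step 4: loop over the grid of I_1
        err = 0.0
        for bs in boots:                                     # Step 3: Monte Carlo average (5)
            curve = pd_curve(bs, h)
            err += sum((c - p) ** 2 for c, p in zip(curve, pilot)) * dt   # Riemann sum
        mise_star.append(err / B)
    best = min(range(len(h_grid)), key=lambda j: mise_star[j])   # Step 5: argmin
    return h_grid[best], mise_star
```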

Concerning the auxiliary bandwidth $r \in I_{1}$, a preliminary analysis (not shown here) suggests the following pilot bandwidth as suitable in this context:

r = \frac{3}{4} \left(Q_{X} (0.975) - Q_{X} (0.025)\right) \left(\sum_{i = 1}^{n} δ_{i}\right)^{- 1 / 3}

(6)

where $Q_{X} (u)$ is the u-quantile of the sample $\{X_{i}\}_{i = 1}^{n}$.
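The pilot bandwidth (6) requires sample quantiles, and the quantile definition is not specified in the text. The sketch below assumes linear interpolation between order statistics (the default `type = 7` rule of R’s `quantile`); the helper names are hypothetical. The same function applied to the observed times $Z_i$ with exponent 1/7 gives the pilot bandwidth s of (11) in Section 2.2.2.

```python
import math
from typing import Sequence


def sample_quantile(values: Sequence[float], p: float) -> float:
    """Linear-interpolation sample quantile (R's default type 7 rule)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def pilot_bandwidth(values: Sequence[float], delta: Sequence[int], exponent: float) -> float:
    """(3/4) (Q(0.975) - Q(0.025)) (number of uncensored observations)^(-exponent)."""
    n_uncensored = sum(delta)
    if n_uncensored == 0:
        raise ValueError("no uncensored observations")
    spread = sample_quantile(values, 0.975) - sample_quantile(values, 0.025)
    return 0.75 * spread * n_uncensored ** (-exponent)


# r of (6) uses the covariate values and exponent 1/3;
# s of (11) would use the observed times Z and exponent 1/7
X = [i / 40 for i in range(41)]
delta = [1] * 30 + [0] * 11
print(pilot_bandwidth(X, delta, 1 / 3))
```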

Note that the proposed algorithm also yields a bootstrap approximation of the optimal bandwidth for estimating $PD(t | x)$ at fixed values $t \in I_{T}$ and $x \in I$, by replacing $MISE_{x}^{*} (h)$ with $MSE_{t, x}^{*} (h)$, the bootstrap analogue of

MSE_{t, x} (h) = E \left(\left(\widehat{PD}_{h} (t | x) - PD (t | x)\right)^{2}\right) .

2.2. Smoothed Beran’s Estimator

Nonparametric estimators of the probability of default such as Beran’s estimator, $\widehat{PD}_{h} (t | x)$, are smoothed in the covariate X. It is interesting to consider estimators with double smoothing, both in the covariate, X, and in the time variable, T. This idea was previously used in [23,24,25] to obtain doubly smoothed estimators of the conditional survival function. In [14], a doubly smoothed version of Beran’s PD estimator was proposed, based on a further smoothing, in time, of Beran’s estimator of the conditional survival function ([26]). Asymptotic properties and simulation studies carried out in [14] show that the doubly smoothed Beran’s estimator performs better than the classical Beran’s estimator when estimating the probability of default curve. This section presents a method for the automatic selection of the bandwidths on which it depends.

Let $\hat{S}_{h} (t | x)$ be Beran’s estimator of the conditional survival function given in (2), with $h = h_{n}$ being the smoothing parameter for the covariate. Then, the doubly smoothed Beran’s estimator of the conditional survival function defined in [25] is

\tilde{S}_{h, g} (t | x) = 1 - \sum_{i = 1}^{n} s_{(i)} \mathbb{K} \left(\frac{t - Z_{(i)}}{g}\right)

(7)

with $s_{(i)} = \hat{S}_{h} (Z_{(i - 1)} | x) - \hat{S}_{h} (Z_{(i)} | x)$, where $Z_{(i)}$ is the i-th element of the sorted sample of Z, $\mathbb{K} (t) = \int_{- \infty}^{t} K (u) d u$ is the distribution function of the kernel K, and $g = g_{n}$ is the smoothing parameter for the time variable. Then, plugging (7) into (1), the probability of default estimator based on the smoothed Beran’s survival estimator is obtained:

\widetilde{PD}_{h, g} (t | x) = 1 - \frac{\tilde{S}_{h, g} (t + b | x)}{\tilde{S}_{h, g} (t | x)} .

(8)
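A minimal Python sketch of (7) and (8), assuming a Gaussian kernel in both directions, so that the integrated kernel appearing in (7) is the standard normal cdf (written with `math.erf`). The weights $s_{(i)}$ are the jumps of $1 - \hat{S}_h(\cdot | x)$ at the ordered observed times, with tied times merged into one jump; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_cdf(u: float) -> float:
    """Integrated Gaussian kernel, used as the kernel distribution function in (7)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def beran_survival_steps(x: float, X: Sequence[float], Z: Sequence[float],
                         delta: Sequence[int], h: float) -> Tuple[List[float], List[float]]:
    """Beran's estimate (2) of S(t|x) at each distinct sorted value of Z."""
    k = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in X]   # Gaussian kernel, constant cancels
    total = sum(k)
    if total == 0.0:
        raise ValueError("all kernel weights vanish; increase h")
    w = [ki / total for ki in k]
    order = sorted(range(len(Z)), key=lambda i: Z[i])
    times, surv_vals = [], []
    surv, weight_before, idx = 1.0, 0.0, 0
    while idx < len(order):
        z, group = Z[order[idx]], []
        while idx < len(order) and Z[order[idx]] == z:
            group.append(order[idx])
            idx += 1
        at_risk = 1.0 - weight_before
        for i in group:
            if delta[i] == 1 and at_risk > 0.0:
                surv *= max(0.0, 1.0 - w[i] / at_risk)
        weight_before += sum(w[i] for i in group)
        times.append(z)
        surv_vals.append(surv)
    return times, surv_vals


def smoothed_beran_survival(t, x, X, Z, delta, h: float, g: float) -> float:
    """Doubly smoothed estimator (7): 1 - sum_i s_(i) Kcdf((t - Z_(i)) / g)."""
    times, surv_vals = beran_survival_steps(x, X, Z, delta, h)
    prev, smoothed_cdf = 1.0, 0.0          # Beran's estimate equals 1 before the first time
    for z, s in zip(times, surv_vals):
        smoothed_cdf += (prev - s) * gaussian_cdf((t - z) / g)   # jump s_(i) at Z_(i)
        prev = s
    return 1.0 - smoothed_cdf


def smoothed_beran_pd(t, b, x, X, Z, delta, h: float, g: float) -> float:
    """Smoothed Beran's PD estimator (8); NaN when the estimated S(t|x) is zero."""
    s_t = smoothed_beran_survival(t, x, X, Z, delta, h, g)
    if s_t == 0.0:
        return math.nan
    return 1.0 - smoothed_beran_survival(t + b, x, X, Z, delta, h, g) / s_t
```

As g tends to zero, $\tilde{S}_{h, g}(t | x)$ approaches Beran’s step function at any t that is not an observed time.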

A bootstrap method is proposed for the automatic selection of the bivariate bandwidth $(h, g)$.

2.2.1. Algorithm for Bootstrap Resampling

Let $I_{1} \subseteq \mathbb{R}$ and $I_{2} \subseteq \mathbb{R}$ be intervals containing appropriate bandwidth values and let $r \in I_{1}$ and $s \in I_{2}$ be pilot bandwidths for the smoothed resampling of X, T and C:

1. Obtain $U_{1}, \dots, U_{n}$ iid with $U_{i} \sim U (0, 1)$, and $V_{1}, \dots, V_{n}$, $W_{1}^{1}, \dots, W_{n}^{1}$ and $W_{1}^{2}, \dots, W_{n}^{2}$, each iid with common density K.
2. For each $i = 1, \dots, n$, obtain

$X_{i}^{*} = X_{[n U_{i}] + 1} + r V_{i},$

$T_{i}^{*} = T_{0, i}^{*} + s W_{i}^{1},$

$C_{i}^{*} = C_{0, i}^{*} + s W_{i}^{2},$

where $T_{0, i}^{*}$ is resampled from $\hat{F}_{r} (t | X_{i}^{*})$, constructed using Beran’s estimator with the sample $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$, and $C_{0, i}^{*}$ is resampled from $\hat{G}_{r} (t | X_{i}^{*})$, constructed using Beran’s estimator with the sample $\{(X_{i}, Z_{i}, 1 - δ_{i})\}_{i = 1}^{n}$.
3. For each $i = 1, \dots, n$, obtain

$Z_{i}^{*} = \min\{T_{i}^{*}, C_{i}^{*}\},$

$δ_{i}^{*} = I (T_{i}^{*} \leq C_{i}^{*}) .$
4. Consider the bootstrap resample $\{(X_{i}^{*}, Z_{i}^{*}, δ_{i}^{*})\}_{i = 1}^{n}$.
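The doubly smoothed resampling adds the perturbations $sW_i^1$ and $sW_i^2$ to times drawn from Beran’s estimates. In this Python sketch the kernel is assumed Gaussian, $\hat{F}_r$ and $\hat{G}_r$ are assumed to be forced to one at the largest observed time as in Section 2.1.1, and no correction is applied if a perturbed time becomes negative, since the algorithm above does not say how such values are treated. Helper names are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple


def beran_cdf_steps(x: float, X: Sequence[float], Z: Sequence[float],
                    delta: Sequence[int], h: float) -> Tuple[List[float], List[float]]:
    """Beran's conditional cdf at the distinct sorted Z values, forced to one at max Z."""
    k = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in X]   # Gaussian kernel
    total = sum(k)
    if total == 0.0:
        raise ValueError("all kernel weights vanish; increase the bandwidth")
    w = [ki / total for ki in k]
    order = sorted(range(len(Z)), key=lambda i: Z[i])
    times, cdf = [], []
    surv, weight_before, idx = 1.0, 0.0, 0
    while idx < len(order):
        z, group = Z[order[idx]], []
        while idx < len(order) and Z[order[idx]] == z:
            group.append(order[idx])
            idx += 1
        at_risk = 1.0 - weight_before
        for i in group:
            if delta[i] == 1 and at_risk > 0.0:
                surv *= max(0.0, 1.0 - w[i] / at_risk)
        weight_before += sum(w[i] for i in group)
        times.append(z)
        cdf.append(1.0 - surv)
    cdf[-1] = 1.0
    return times, cdf


def doubly_smoothed_resample(X, Z, delta, r: float, s: float, rng: random.Random):
    """One resample following Steps 1-4 of the doubly smoothed bootstrap."""
    n = len(X)
    cens = [1 - d for d in delta]
    resample = []
    for _ in range(n):
        x_star = X[min(n - 1, int(n * rng.random()))] + r * rng.gauss(0.0, 1.0)  # X* = X_{[nU]+1} + rV
        t_times, f_hat = beran_cdf_steps(x_star, X, Z, delta, r)
        c_times, g_hat = beran_cdf_steps(x_star, X, Z, cens, r)
        t0 = t_times[bisect.bisect_left(f_hat, rng.random())]   # T*_0 drawn from F_r(.|X*)
        c0 = c_times[bisect.bisect_left(g_hat, rng.random())]   # C*_0 drawn from G_r(.|X*)
        t_star = t0 + s * rng.gauss(0.0, 1.0)                   # T* = T*_0 + s W^1
        c_star = c0 + s * rng.gauss(0.0, 1.0)                   # C* = C*_0 + s W^2
        resample.append((x_star, min(t_star, c_star), int(t_star <= c_star)))  # Step 3
    return resample
```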

The conditional distribution functions of $T^{*} | X^{*}$ and $C^{*} | X^{*}$ are, respectively, the smoothed Beran’s estimators $\tilde{F}_{r, s} (t | X_{i}^{*})$ and $\tilde{G}_{r, s} (t | X_{i}^{*})$.

The optimal bivariate bandwidth, $(h_{MISE}, g_{MISE}) \in I_{1} \times I_{2}$, is defined as the pair of bandwidths that minimizes the mean integrated squared error given by

MISE_{x} (h, g) = E \left(\int_{I_{T}} \left(\widetilde{PD}_{h, g} (t | x) - PD (t | x)\right)^{2} d t\right) .

(9)

The bootstrap version of $MISE_{x} (h, g)$ is given by

MISE_{x}^{*} (h, g) = E \left(\int_{I_{T}} \left(\widetilde{PD}_{h, g}^{*} (t | x) - \widetilde{PD}_{r, s} (t | x)\right)^{2} d t\right),

where $\widetilde{PD}_{r, s} (t | x)$ is the smoothed Beran’s PD estimate with pilot bandwidths $(r, s) \in I_{1} \times I_{2}$, computed from the sample $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$, and $\widetilde{PD}_{h, g}^{*} (t | x)$ is the bootstrap estimate of PD with bandwidths $(h, g)$, computed from the bootstrap resample $\{(X_{i}^{*}, Z_{i}^{*}, δ_{i}^{*})\}_{i = 1}^{n}$. Since the sampling distribution of $\widetilde{PD}_{h, g}^{*} (t | x)$ is unknown, the Monte Carlo method gives the following approximation:

MISE_{x}^{*} (h, g) \simeq \frac{1}{B} \sum_{k = 1}^{B} \int_{I_{T}} \left(\widetilde{PD}_{h, g}^{*, k} (t | x) - \widetilde{PD}_{r, s} (t | x)\right)^{2} d t,

(10)

based on the empirical distribution of $\widetilde{PD}_{h, g}^{*} (t | x)$ obtained from B bootstrap resamples. The integral is approximated by a Riemann sum.

2.2.2. Algorithm for Bootstrap Bandwidth Selector Based on the Smoothed Beran’s Estimator

Let $x \in I$ be a fixed value of the covariate, $t \in I_{T}$ and $(r, s) \in I_{1} \times I_{2}$:

1. Compute $\widetilde{PD}_{r, s} (t | x)$ from the original sample $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$.
2. Obtain B bootstrap resamples of the form $\{(X_{i}^{*, k}, Z_{i}^{*, k}, δ_{i}^{*, k})\}_{i = 1}^{n}$, $k = 1, \dots, B$, using the doubly smoothed bootstrap, and calculate $\widetilde{PD}_{h, g}^{*, k} (t | x)$ for each of them.
3. Approximate $MISE_{x}^{*} (h, g)$ according to (10).
4. Repeat Steps 1–3 for pairs of values $(h, g)$ in a grid of $I_{1} \times I_{2}$.
5. Select the pair $(h, g)$ that provides the smallest $MISE_{x}^{*} (h, g)$ as the bootstrap bandwidth $(h^{*}, g^{*})$.
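As in the univariate case, Steps 1–5 can be organized as a search over a grid of $I_1 \times I_2$. This Python sketch (hypothetical names) takes the PD estimator and the resampler as functions, computes the pilot estimate and the B resamples once and reuses them for every pair $(h, g)$, which is a simplification assumed here, and approximates the integral in (10) with an equally spaced time grid of spacing `dt`.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = List[Tuple[float, float, int]]                    # (X_i, Z_i, delta_i)
PDCurve2 = Callable[[Sample, float, float], List[float]]     # (sample, h, g) -> PD on the t-grid
Resampler2 = Callable[[Sample, float, float, random.Random], Sample]


def bootstrap_bandwidth_2d(sample: Sample, h_grid: Sequence[float], g_grid: Sequence[float],
                           pilots: Tuple[float, float], pd_curve: PDCurve2,
                           resample: Resampler2, dt: float, B: int = 500, seed: int = 0
                           ) -> Tuple[Tuple[float, float], Dict[Tuple[float, float], float]]:
    rng = random.Random(seed)
    r, s = pilots
    pilot = pd_curve(sample, r, s)                              # Step 1
    boots = [resample(sample, r, s, rng) for _ in range(B)]     # Step 2, shared by all (h, g)
    mise_star: Dict[Tuple[float, float], float] = {}
    for h, g in itertools.product(h_grid, g_grid):              # Step 4: grid of I_1 x I_2
        err = 0.0
        for bs in boots:                                        # Step 3: approximation (10)
            curve = pd_curve(bs, h, g)
            err += sum((c - p) ** 2 for c, p in zip(curve, pilot)) * dt
        mise_star[(h, g)] = err / B
    best = min(mise_star, key=mise_star.get)                    # Step 5: argmin over pairs
    return best, mise_star
```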

The auxiliary bandwidth $r \in I_{1}$ was defined in (6). The pilot bandwidth $s \in I_{2}$ for the time variable smoothing is

s = \frac{3}{4} \left(Q_{Z} (0.975) - Q_{Z} (0.025)\right) \left(\sum_{i = 1}^{n} δ_{i}\right)^{- 1 / 7}

(11)

where $Q_{Z} (u)$ is the u-quantile of the sample $\{Z_{i}\}_{i = 1}^{n}$.

3. Simulation Study for Bandwidth Selection

A simulation study was conducted to assess the behavior of the bootstrap bandwidth selectors for Beran’s and smoothed Beran’s estimators proposed in Section 2. Two models are considered: one with Weibull lifetime and censoring time distributions and another with exponential distributions.

Model 1 considers a $U (0, 1)$ distribution for X. The time to occurrence of the event conditional on the covariate, $T | X = x$, follows a Weibull distribution with shape parameter $d = 2$ and scale parameter $\Gamma (x)^{- 1 / d}$, where $\Gamma (x) = 1 + 5 x$, and the censoring time conditional on the covariate, $C | X = x$, follows a Weibull distribution with shape parameter $d = 2$ and scale parameter $\Delta (x)^{- 1 / d}$, where $\Delta (x) = 10 + d_{1} x + 20 x^{2}$. In this case, the conditional survival function and the censoring conditional probability are given by

S (t | x) = e^{- \Gamma (x) t^{d}},

P (δ = 0 | X = x) = \frac{\Delta (x)}{\Gamma (x) + \Delta (x)} .

Having set the value of the covariate at $x = 0.6$, the value of $d_{1}$ is chosen so that the censoring conditional probability is 0.2 and 0.5. These values are $d_{1} = - 27$ and $d_{1} = - 22$, respectively. The conditional survival function for this model is estimated on a time grid of size $n_{T}$, $0 < t_{1} < \dots < t_{n_{T}}$, where $t_{n_{T}} + b = F^{- 1} (0.95 | x) = 0.8654$ and $b = 0.15$, i.e., about 20% of the time grid range for the value of the covariate $x = 0.6$. Therefore, in this case, $I_{T} = (0, 0.8654)$.
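The constants of Model 1 can be verified directly. With $\Gamma(0.6) = 4$, the censoring probability $\Delta(0.6)/(\Gamma(0.6) + \Delta(0.6))$ equals 0.2 for $d_1 = -27$ and 0.5 for $d_1 = -22$, and $F^{-1}(0.95 | 0.6) = (-\log 0.05 / 4)^{1/2} \approx 0.8654$. The Python sketch below reproduces these numbers and adds a sampler that draws $(X, Z, δ)$ by inverting the Weibull survival functions; it is an illustration, not the authors’ simulation code, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def gamma_fn(x: float) -> float:
    """Gamma(x) = 1 + 5x."""
    return 1.0 + 5.0 * x


def delta_fn(x: float, d1: float) -> float:
    """Delta(x) = 10 + d1 x + 20 x^2."""
    return 10.0 + d1 * x + 20.0 * x * x


def censoring_prob(x: float, d1: float) -> float:
    """P(delta = 0 | X = x) = Delta(x) / (Gamma(x) + Delta(x))."""
    return delta_fn(x, d1) / (gamma_fn(x) + delta_fn(x, d1))


def weibull_quantile(p: float, x: float, d: float = 2.0) -> float:
    """F^{-1}(p | x) for S(t | x) = exp(-Gamma(x) t^d)."""
    return (-math.log(1.0 - p) / gamma_fn(x)) ** (1.0 / d)


def simulate_model1(n: int, d1: float, rng: random.Random,
                    d: float = 2.0) -> List[Tuple[float, float, int]]:
    """Draw n triples (X, Z, delta) from Model 1."""
    sample = []
    for _ in range(n):
        x = rng.random()
        # if E ~ Exp(1), then (E / Gamma(x))^(1/d) has survival exp(-Gamma(x) t^d)
        t = (rng.expovariate(1.0) / gamma_fn(x)) ** (1.0 / d)
        c = (rng.expovariate(1.0) / delta_fn(x, d1)) ** (1.0 / d)
        sample.append((x, min(t, c), int(t <= c)))
    return sample


print(censoring_prob(0.6, -27), censoring_prob(0.6, -22), weibull_quantile(0.95, 0.6))
```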

Model 2 also considers a $U (0, 1)$ distribution for X. The time to occurrence of the event conditional on the covariate, $T | X = x$, follows an exponential distribution with rate parameter $E (x) = 2 + 58 x - 160 x^{2} + 107 x^{3}$, and the censoring time conditional on the covariate, $C | X = x$, follows an exponential distribution with rate parameter $\Theta (x) = 10 + c_{1} x + 20 x^{2}$. In this scenario, the conditional survival function and the censoring conditional probability are the following:

S (t | x) = e^{- E (x) t},

P (δ = 0 | X = x) = \frac{\Theta (x)}{E (x) + \Theta (x)} .

Having set the value of the covariate at $x = 0.8$, the value of $c_{1}$ is chosen so that the censoring conditional probability is approximately 0.2 and 0.5. These values are $c_{1} = - 113 / 4$ and $c_{1} = - 55 / 2$, respectively. The conditional survival function is estimated on a time grid of size $n_{T}$, $0 < t_{1} < \dots < t_{n_{T}}$, where $t_{n_{T}} + b = F^{- 1} (0.95 | x) = 3.8211$ and $b = 0.7$, i.e., about 20% of the time grid range for the value of the covariate $x = 0.8$. Therefore, in this case, $I_{T} = (0, 3.8211)$.
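A similar check for Model 2: $E(0.8) = 0.784$, so $F^{-1}(0.95 | 0.8) = -\log(0.05)/0.784 \approx 3.8211$, while $c_1 = -113/4$ and $c_1 = -55/2$ give $\Theta(0.8) = 0.2$ and $0.8$, that is, censoring probabilities of about 0.203 and 0.505. The sketch below, with hypothetical names, reproduces these values and includes an illustrative sampler.

```python
import math
import random
from typing import List, Tuple


def e_rate(x: float) -> float:
    """E(x) = 2 + 58x - 160x^2 + 107x^3, rate of T | X = x."""
    return 2.0 + 58.0 * x - 160.0 * x ** 2 + 107.0 * x ** 3


def theta_rate(x: float, c1: float) -> float:
    """Theta(x) = 10 + c1 x + 20x^2, rate of C | X = x."""
    return 10.0 + c1 * x + 20.0 * x ** 2


def censoring_prob(x: float, c1: float) -> float:
    """P(delta = 0 | X = x) = Theta(x) / (E(x) + Theta(x))."""
    return theta_rate(x, c1) / (e_rate(x) + theta_rate(x, c1))


def exp_quantile(p: float, x: float) -> float:
    """F^{-1}(p | x) for S(t | x) = exp(-E(x) t)."""
    return -math.log(1.0 - p) / e_rate(x)


def simulate_model2(n: int, c1: float, rng: random.Random) -> List[Tuple[float, float, int]]:
    """Draw n triples (X, Z, delta) from Model 2."""
    sample = []
    for _ in range(n):
        x = rng.random()
        t = rng.expovariate(e_rate(x))          # expovariate takes the rate parameter
        c = rng.expovariate(theta_rate(x, c1))
        sample.append((x, min(t, c), int(t <= c)))
    return sample


print(censoring_prob(0.8, -113 / 4), censoring_prob(0.8, -55 / 2), exp_quantile(0.95, 0.8))
```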

It can be proved that Model 1 is close to a proportional hazards model, while Model 2 moves away from this parametric model. These two models were used in the simulation study carried out by [14].

The boundary effect is corrected using the reflection principle, and the truncated Gaussian kernel with truncation range $(- 50, 50)$ is considered. The size of the lifetime grid is $n_{T} = 100$. The sample size is $n = 400$. The simulation study is carried out with software developed in R by the authors. In order to minimize the MISE error function without increasing CPU time more than necessary, L-BFGS-B, a limited-memory algorithm for solving large nonlinear optimization problems, is used. It was proposed by [27] for solving optimization problems subject to simple bounds on the variables, in which information on the Hessian matrix is difficult to obtain. Results of numerical studies of this method are shown in [27]. It is available in the R stats package, distributed through the Comprehensive R Archive Network (CRAN), which uses Fortran 77 subroutines (see [28]).

3.1. Simulation Study for Beran’s Estimator

In this subsection, the behavior of the bootstrap bandwidth selector for Beran’s estimator is shown. For each model, the estimation error function $MISE_{x} (h)$ is approximated via Monte Carlo using 300 simulated samples. The bandwidth that minimizes $MISE_{x} (h)$ is obtained and denoted by $h_{MISE}$. The values of $h_{MISE}$ and $MISE_{x} (h_{MISE})$ are used as a benchmark.

In the simulation study, $N = 300$ simulated samples are used. For each sample, $B = 500$ bootstrap resamples are obtained to approximate the bootstrap MISE function, $MISE_{x}^{*} (h)$, and to obtain the bootstrap bandwidth $h_{j}^{*}$ associated with each simulated sample, $j = 1, 2, \dots, N$. The mean value of the N bootstrap bandwidths and their standard deviation are defined as follows:

\bar{h}^{*} = \frac{1}{N} \sum_{j = 1}^{N} h_{j}^{*}, \quad sd (h^{*}) = \sqrt{\frac{1}{N} \sum_{j = 1}^{N} \left(h_{j}^{*} - \bar{h}^{*}\right)^{2}} .

As a relative measure of the difference between the bootstrap bandwidth and the optimal one, we compute

H_{j}^{*} = \frac{h_{j}^{*} - h_{MISE}}{h_{MISE}}

with $j = 1, \dots, N$. The mean of the absolute values of these relative deviations, $\bar{H}^{*} = \frac{1}{N} \sum_{j = 1}^{N} | H_{j}^{*} |$, is a good measure of how close the bootstrap bandwidth is to the optimal one.

For each sample, the estimation error committed by Beran’s estimator with the corresponding bootstrap bandwidth,

MISE_{x} (h_{j}^{*}) = E \left(\int_{I_{T}} \left(\widehat{PD}_{h_{j}^{*}} (t | x) - PD (t | x)\right)^{2} d t\right),

and its square root, $RMISE_{x} (h_{j}^{*})$, are approximated via Monte Carlo using 300 simulated samples. The mean of these estimation errors,

\overline{RMISE_{x} (h^{*})} = \frac{1}{N} \sum_{j = 1}^{N} RMISE_{x} (h_{j}^{*}),

is used as a measure of the estimation error made with the bootstrap bandwidth, to be compared with the estimation error made with the MISE bandwidth.

As a relative measure of the difference between the estimation errors obtained with the bootstrap and the MISE bandwidths, the following ratios are defined:

R_{j}^{*} = \frac{RMISE_{x} (h_{j}^{*}) - RMISE_{x} (h_{MISE})}{RMISE_{x} (h_{MISE})},

which satisfy $R_{j}^{*} \geq 0$ for all $j = 1, \dots, N$, since $h_{MISE}$ minimizes the error. The mean of the values $R_{j}^{*}$, $j = 1, \dots, N$, is denoted by $\bar{R}^{*} = \frac{1}{N} \sum_{j = 1}^{N} R_{j}^{*}$. Small values (close to zero) of $\bar{H}^{*}$ and $\bar{R}^{*}$ indicate good behavior of the bootstrap bandwidth. Values of the bootstrap bandwidths, estimation errors and relative measures for Models 1 and 2 are included in Table 1. The results show that the proposed bootstrap selector performs well.
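The summary measures of this subsection are simple averages over the N simulated samples. The sketch below, with hypothetical names, computes $\bar{h}^*$, $sd(h^*)$ with divisor N as in the text, $\bar{H}^*$ and $\bar{R}^*$ from lists of bootstrap bandwidths and RMISE values; the toy inputs in the usage line are illustrative only and are not results from the paper.

```python
import math
from typing import Sequence, Tuple


def bandwidth_summary(h_boot: Sequence[float], h_mise: float) -> Tuple[float, float, float]:
    """Return (h-bar*, sd(h*), H-bar*) with H_j* = (h_j* - h_MISE) / h_MISE."""
    N = len(h_boot)
    mean_h = sum(h_boot) / N
    sd_h = math.sqrt(sum((h - mean_h) ** 2 for h in h_boot) / N)   # divisor N, as in the text
    mean_abs_H = sum(abs((h - h_mise) / h_mise) for h in h_boot) / N
    return mean_h, sd_h, mean_abs_H


def error_summary(rmise_boot: Sequence[float], rmise_opt: float) -> Tuple[float, float]:
    """Return (mean RMISE_x(h_j*), R-bar*) with R_j* = (RMISE_x(h_j*) - RMISE_x(h_MISE)) / RMISE_x(h_MISE)."""
    N = len(rmise_boot)
    mean_rmise = sum(rmise_boot) / N
    mean_R = sum((e - rmise_opt) / rmise_opt for e in rmise_boot) / N
    return mean_rmise, mean_R


# Toy inputs, for illustration only
print(bandwidth_summary([0.1, 0.3], 0.2), error_summary([1.1, 1.3], 1.0))
```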

Figure 1 and Figure 2 show the function $MISE_{x} (h)$ along with the Monte Carlo approximations of $MISE_{x}^{*} (h)$ for some simulated samples, and the boxplots of $H_{j}^{*}$ and $R_{j}^{*}$, $j = 1, \dots, N$, for Models 1 and 2. The bootstrap bandwidth $h^{*}$ tends to slightly underestimate $h_{MISE}$ in Model 1 and to overestimate it in Model 2, which is reflected in the boxplots of $H_{j}^{*}$. Nevertheless, these figures show that the $MISE_{x} (h)$ curve is fairly flat, so variations in the selection of h do not substantially increase the estimation error.

In order to illustrate the results, Figure 3 shows the theoretical probability of default function and Beran’s estimates with the MISE and bootstrap bandwidths for one sample from Models 1 and 2 when the conditional probability of censoring is 0.5. For large values of time, the performance of the estimator deteriorates, because few data fall in that region, most of them censored, and they therefore carry little information.

3.2. Simulation Study for the Smoothed Beran’s Estimator

In this subsection, a simulation study on the bootstrap bandwidth selector of the smoothed Beran’s estimator in (8) is carried out, using the resampling technique and the Monte Carlo approximation of the MISE presented in Section 2.2.

For each model, the error function $MISE_{x} (h, g)$ is approximated via Monte Carlo from 300 simulated samples, and the bivariate bandwidth that minimizes $MISE_{x} (h, g)$ is obtained and denoted by $(h_{MISE}, g_{MISE})$. The values of $(h_{MISE}, g_{MISE})$ and $MISE_{x} (h_{MISE}, g_{MISE})$ are used as a benchmark.

In the study, $N = 300$ samples are simulated. For each simulated sample, the corresponding bootstrap bandwidths are approximated from $B = 500$ resamples, obtaining $(h_{j}^{*}, g_{j}^{*})$, $j = 1, \dots, N$. The mean values of the N bootstrap bandwidths and their standard deviations are the following:

(\bar{h}^{*}, \bar{g}^{*}) = \left(\frac{1}{N} \sum_{j = 1}^{N} h_{j}^{*}, \frac{1}{N} \sum_{j = 1}^{N} g_{j}^{*}\right),

sd (h^{*}) = \sqrt{\frac{1}{N} \sum_{j = 1}^{N} \left(h_{j}^{*} - \bar{h}^{*}\right)^{2}}, \quad sd (g^{*}) = \sqrt{\frac{1}{N} \sum_{j = 1}^{N} \left(g_{j}^{*} - \bar{g}^{*}\right)^{2}} .

In order to measure the distance of the bootstrap bidimensional bandwidth of the j-th sample, $(h_{j}^{*}, g_{j}^{*})$, from the corresponding MISE bandwidth, $(h_{MISE}, g_{MISE})$, consider the vector

D_{j}^{*} = \left(\frac{h_{j}^{*} - h_{MISE}}{h_{MISE}}, \frac{g_{j}^{*} - g_{MISE}}{g_{MISE}}\right) \in \mathbb{R}^{2}

and its Euclidean norm, denoted by $H_{j}^{*} = \| D_{j}^{*} \|_{2}$, $j = 1, \dots, N$. The mean value, $\bar{H}^{*} = \frac{1}{N} \sum_{j = 1}^{N} H_{j}^{*}$, is a measure of how close the bootstrap bandwidths are to the MISE one.

For each sample, the estimation error committed by the smoothed Beran’s estimator with the corresponding bootstrap bandwidth,

MISE_{x} (h_{j}^{*}, g_{j}^{*}) = E \left(\int_{I_{T}} \left(\widetilde{PD}_{h_{j}^{*}, g_{j}^{*}} (t | x) - PD (t | x)\right)^{2} d t\right),

and its square root, $RMISE_{x} (h_{j}^{*}, g_{j}^{*})$, are approximated via Monte Carlo using 300 simulated samples. The mean of these estimation errors,

\overline{RMISE_{x} (h^{*}, g^{*})} = \frac{1}{N} \sum_{j = 1}^{N} RMISE_{x} (h_{j}^{*}, g_{j}^{*}),

is used as a measure of the estimation error committed by the bootstrap bidimensional bandwidth in the model.

The ratio

R_{j}^{*} = \frac{RMISE_{x} (h_{j}^{*}, g_{j}^{*}) - RMISE_{x} (h_{MISE}, g_{MISE})}{RMISE_{x} (h_{MISE}, g_{MISE})}

is defined as a relative measure of the difference between the errors committed by the estimator with the bootstrap bandwidth and with the MISE bandwidth. The mean of these nonnegative values $R_{j}^{*}$, $j = 1, \dots, N$, is denoted by $\bar{R}^{*} = \frac{1}{N} \sum_{j = 1}^{N} R_{j}^{*}$. Values of the bootstrap bivariate bandwidths, estimation errors and relative measures for Models 1 and 2 are included in Table 2.
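In the bivariate case, $H_j^*$ is the Euclidean norm of the vector of relative deviations $D_j^*$. The short sketch below uses hypothetical names and `math.hypot` for the two-dimensional norm; the example values are illustrative only.

```python
import math
from typing import Sequence, Tuple


def mean_bivariate_deviation(hg_boot: Sequence[Tuple[float, float]],
                             hg_mise: Tuple[float, float]) -> float:
    """H-bar*: mean Euclidean norm of D_j* = ((h_j* - h_MISE)/h_MISE, (g_j* - g_MISE)/g_MISE)."""
    h0, g0 = hg_mise
    norms = [math.hypot((h - h0) / h0, (g - g0) / g0) for h, g in hg_boot]
    return sum(norms) / len(norms)


def mean_relative_excess(rmise_boot: Sequence[float], rmise_opt: float) -> float:
    """R-bar*: mean of R_j* = (RMISE_x(h_j*, g_j*) - RMISE_x(h_MISE, g_MISE)) / RMISE_x(h_MISE, g_MISE)."""
    return sum((e - rmise_opt) / rmise_opt for e in rmise_boot) / len(rmise_boot)


# Illustrative values only: D = (0.3, 0.4) has norm 0.5, and the second pair equals the MISE pair
print(mean_bivariate_deviation([(0.13, 0.7), (0.1, 0.5)], (0.1, 0.5)))
```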

Figure 4 shows the $MISE_{x} (h, g)$ function of the smoothed Beran’s estimator and its bootstrap approximation for one sample of both Models 1 and 2 when the conditional probability of censoring is 0.5. The functions are approximated on a grid of 50 values of h and 50 values of g. Note that, for each fixed value of h, the $MISE_{x} (h, g)$ and $MISE_{x}^{*} (h, g)$ curves are quite similar in the region close to the minimum value of $MISE_{x}^{*} (h, g)$. Thus, the influence of the covariate smoothing parameter h is weak when the PD is estimated with values of the bandwidth g close to the optimal one.

Figure 5 and Figure 6 show the boxplots of $H_{j}^{*}$ and $R_{j}^{*}$, $j = 1, \dots, N$. In general, the selector tends to underestimate the bandwidths. Owing to the behavior of the $MISE_{x} (h, g)$ curves mentioned above, this does not lead to a significant increase in the estimation error.

Figure 7 shows the theoretical probability of default function and the smoothed Beran’s estimates with the MISE and bootstrap bandwidths for one sample from Models 1 and 2 when the conditional probability of censoring is 0.5. Comparing this figure with the equivalent one for Beran’s estimator, shown in Figure 3, the improvement in estimation due to the double smoothing is remarkable.

The results shown in Table 1 and Table 2 are summarized in Table 3 to compare the behavior of Beran (BERAN) and smoothed Beran’s (SBERAN) estimators of the PD and to evaluate whether the improvement that smoothing in the time variable provides for PD estimation is preserved when the smoothing parameters are approximated by resampling techniques. Table 3 shows the estimation errors committed by Beran and smoothed Beran’s estimators of the probability of default using bootstrap bandwidths. In order to measure the increase in estimation error resulting from using Beran’s estimator, the following ratio is defined:

R_{S} = \frac{\bar{R M I S E_{x} (h^{*})} - \bar{R M I S E (h^{*}, g^{*})}}{\bar{R M I S E (h^{*}, g^{*})}}

and included in Table 3.
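As a quick illustration of how $R_S$ is read, the sketch below computes the ratio from per-sample RMISE values, assuming the bars denote averages over the N simulated samples. The function name and the input numbers are hypothetical and are not taken from Table 3.

```python
from statistics import fmean


def relative_error_increase(rmise_beran, rmise_smoothed):
    """R_S = (mean RMISE of Beran - mean RMISE of smoothed Beran) / mean RMISE of smoothed Beran.

    Each argument is a sequence of per-sample RMISE values (assumed to be
    averaged over the simulated samples to form the barred quantities).
    """
    beran = fmean(rmise_beran)
    smoothed = fmean(rmise_smoothed)
    if smoothed <= 0:
        raise ValueError("mean RMISE of the smoothed estimator must be positive")
    return (beran - smoothed) / smoothed


# Hypothetical inputs, not values from Table 3
r_s = relative_error_increase([0.12, 0.12, 0.12], [0.10, 0.10, 0.10])
```

Here `r_s` is approximately 0.2, which would be read as a 20% larger estimation error for Beran’s estimator.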

In Model 1, the estimation error committed by Beran’s estimator is 20% larger than the error committed by the smoothed Beran’s estimator when the conditional probability of censoring is 0.2, and 50% larger when it is 0.5. In Model 2, these differences are even more pronounced: the estimation error increases by up to 80% when Beran’s estimator with bootstrap bandwidth is used instead of the doubly smoothed Beran’s estimator.

4. Confidence Regions Using Beran and Smoothed Beran’s Estimators

Let $x \in I$ be a fixed value of the covariate and consider the probability of default curve $PD(t | x)$ with $t \in I_{T}$. The curve $PD(t | x)$ belongs to the function space $F(I_{T})$, whose elements are real-valued functions with domain $I_{T}$. From the sample $\{(X_{i}, Z_{i}, δ_{i}), i = 1, \dots, n\}$, Beran’s estimation of $PD(t | x)$, ${\hat{PD}}_{h}(t | x)$, is obtained, and a confidence region of $PD(t | x)$ at confidence level $1 - α$ associated with Beran’s estimator can be constructed. A similar construction is done for the smoothed Beran’s estimator. This confidence region of $PD(t | x)$ is a random subset of $F(I_{T})$, denoted by $R_{α}$, that satisfies

P(PD(t | x) \in R_{α}, \forall t \in I_{T}) = 1 - α .

In this section, a method for constructing confidence regions $R_{α}$ based on Beran’s and the smoothed Beran’s estimators is developed.

First, Beran’s estimator of the probability of default, ${\hat{PD}}_{h}(t | x)$, given in (3), is used. This method follows the ideas of [29] to obtain prediction regions. It is based on finding the value $λ_{α} \in \mathbb{R}^{+}$ such that

P(|{\hat{PD}}_{h}(t | x) - PD(t | x)| < λ_{α} σ(t), \forall t \in I_{T}) = 1 - α

with $σ^{2}(t) = \mathrm{Var}({\hat{PD}}_{h}(t | x))$. Thus, the theoretical confidence region is defined by

R_{α} = \{({\hat{PD}}_{h}(t | x) - λ_{α} σ(t), {\hat{PD}}_{h}(t | x) + λ_{α} σ(t)) : t \in I_{T}\} .

Since $λ_{α}$ and $σ(t)$ are unknown, they are approximated by means of a bootstrap technique. The bootstrap confidence region is defined as follows:

R_{α}^{*} = \{({\hat{PD}}_{h}^{*}(t | x) - λ_{α}^{*} σ^{*}(t), {\hat{PD}}_{h}^{*}(t | x) + λ_{α}^{*} σ^{*}(t)) : t \in I_{T}\} ,

where ${\hat{PD}}_{h}^{*}(t | x)$ is the bootstrap estimation of $PD(t | x)$ with bandwidth h, and $λ_{α}^{*}$ and $σ^{*}(t)$ are the bootstrap analogues of $λ_{α}$ and $σ(t)$. Denoting by ${\hat{PD}}_{r}(t | x)$ Beran’s estimator computed from the original sample with a pilot bandwidth r, the confidence region $R_{α}^{*}$ satisfies

p(λ_{α}^{*}) = P({\hat{PD}}_{r}(t | x) \in R_{α}^{*}, \forall t \in I_{T}) = 1 - α .

(12)

From the original sample $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$, Beran’s estimator of $PD(t | x)$, ${\hat{PD}}_{h}(t | x)$, is obtained with an appropriate bandwidth h. The algorithm to obtain the bootstrap confidence region for $PD(t | x)$ at confidence level $1 - α$ associated with ${\hat{PD}}_{h}(t | x)$ is explained below. The Monte Carlo method is used to approximate $σ^{*}(t)$, and an iterative method is used to approximate the value of $λ_{α}^{*}$ so that the confidence region has a confidence level approximately equal to $1 - α$.

4.1. Confidence Region Based on Beran’s Estimator

1. Compute Beran’s estimator ${\hat{PD}}_{r}(t | x)$ from the original sample $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$ and pilot bandwidth $r \in I_{1}$.
2. Generate B bootstrap resamples of the form $\{(X_{i}^{*, k}, Z_{i}^{*, k}, δ_{i}^{*, k})\}_{i = 1}^{n}$, $k = 1, \dots, B$, by means of the resampling algorithm presented in Section 2.1 with pilot bandwidth r.
3. For $k = 1, \dots, B$, compute ${\hat{PD}}_{h}^{*, k}(t | x)$ with the k-th bootstrap resample and bandwidth h, obtaining $\{{\hat{PD}}_{h}^{*, k}(t | x)\}_{k = 1}^{B}$.
4. Approximate the standard deviation of ${\hat{PD}}_{h}^{*}(t | x)$ by

   $σ^{*}(t) ≃ \left(\frac{1}{B} \sum_{k = 1}^{B} \left({\hat{PD}}_{h}^{*, k}(t | x) - \frac{1}{B} \sum_{l = 1}^{B} {\hat{PD}}_{h}^{*, l}(t | x)\right)^{2}\right)^{1/2}, \quad t \in I_{T} .$
5. Use an iterative method to obtain an approximation of the value $λ_{α}^{*}$ defined in (12).
6. The confidence region is given by

   $R_{α} = \{({\hat{PD}}_{h}(t | x) - λ_{α}^{*} σ^{*}(t), {\hat{PD}}_{h}(t | x) + λ_{α}^{*} σ^{*}(t)) : t \in I_{T}\} .$
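Steps 4 and 6 reduce to array operations once the bootstrap curves are evaluated on a time grid. The sketch below assumes those curves are already stored as a $(B, n_T)$ array produced by Steps 1 to 3 (the resampling algorithm of Section 2.1 is not reproduced), and that λ is supplied by the iterative method of Step 5; the function names are hypothetical.

```python
import numpy as np


def bootstrap_sd(boot_curves):
    """Step 4: sigma*(t) from a (B, n_T) array of bootstrap PD curves.

    ddof=0 gives the 1/B divisor used in the formula above.
    """
    boot_curves = np.asarray(boot_curves, dtype=float)
    return boot_curves.std(axis=0, ddof=0)


def confidence_band(pd_hat, sigma_star, lam):
    """Step 6: lower and upper bounds pd_hat(t) -/+ lam * sigma*(t) on the time grid."""
    pd_hat = np.asarray(pd_hat, dtype=float)
    sigma_star = np.asarray(sigma_star, dtype=float)
    return pd_hat - lam * sigma_star, pd_hat + lam * sigma_star


# Two bootstrap curves on a two-point time grid (illustrative numbers)
boot = np.array([[0.1, 0.2],
                 [0.3, 0.2]])
sigma = bootstrap_sd(boot)                        # approximately [0.1, 0.0]
lower, upper = confidence_band([0.2, 0.2], sigma, lam=2.0)
```

Note that Step 6 centers the band at the estimate from the original sample, not at the bootstrap curves; only σ* and λ* come from the resamples.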

4.2. Iterative Method to Approximate $λ_{α}^{*}$

The iterative method to approximate the value of $λ_{α}^{*} \in \mathbb{R}^{+}$ so that the confidence region $R_{α}^{*}$ has a confidence level approximately equal to $1 - α$ is explained below. Because it is a bisection scheme that halves the search interval at each iteration, this algorithm allows the parameter $λ_{α}^{*}$ to be approximated quickly and efficiently.

Let $\{{\hat{PD}}_{h}^{*, k}(t | x)\}_{k = 1}^{B}$ be Beran’s estimations of the PD with bandwidth h over a set of B bootstrap resamples of $\{(X_{i}, Z_{i}, δ_{i})\}_{i = 1}^{n}$. For any $λ \in \mathbb{R}^{+}$, define the Monte Carlo approximation of $p(λ)$ in (12) as follows:

p(λ) ≃ \frac{1}{B} \sum_{k = 1}^{B} I\{{\hat{PD}}_{r}(t | x) \in ({\hat{PD}}_{h}^{*, k}(t | x) - λ σ^{*}(t), {\hat{PD}}_{h}^{*, k}(t | x) + λ σ^{*}(t)), \forall t \in I_{T}\} .

(13)

Let $λ_{L}, λ_{H} \in \mathbb{R}^{+}$ be such that $p(λ_{L}) \leq 1 - α \leq p(λ_{H})$, and let $ζ > 0$ be a tolerance, for example, $ζ = 10^{-4}$.

1. Obtain $λ_{M} = \frac{λ_{L} + λ_{H}}{2}$ and compute Monte Carlo approximations of $p(λ_{L})$, $p(λ_{M})$ and $p(λ_{H})$ according to (13).
2. If $p(λ_{M}) = 1 - α$ or $p(λ_{H}) - p(λ_{L}) < ζ$, then $λ_{α}^{*} = λ_{M}$. Otherwise:
   2.1 If $1 - α < p(λ_{M})$, then set $λ_{H} = λ_{M}$ and return to Step 1.
   2.2 If $p(λ_{M}) < 1 - α$, then set $λ_{L} = λ_{M}$ and return to Step 1.
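The bisection above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors’ code: the pilot curve, the bootstrap curves and σ*(t) are taken as precomputed arrays on a common time grid, the function names are hypothetical, and p(λ_L), p(λ_H) are cached between iterations instead of being recomputed in Step 1 (which gives the same result). The `max_iter` cap is an added safeguard: the Monte Carlo p(λ) is a step function taking values k/B, so if 1 − α falls exactly on a jump the tolerance test may never be met.

```python
import numpy as np


def coverage_prob(lam, pd_pilot, boot_curves, sigma_star):
    """Monte Carlo approximation (13) of p(lambda).

    pd_pilot: (n_T,) pilot estimate PD_r(t|x); boot_curves: (B, n_T) bootstrap
    estimates PD_h^{*,k}(t|x); sigma_star: (n_T,) bootstrap standard deviation.
    """
    lower = boot_curves - lam * sigma_star
    upper = boot_curves + lam * sigma_star
    inside = (pd_pilot > lower) & (pd_pilot < upper)   # open interval, as in (13)
    return inside.all(axis=1).mean()                   # whole curve covered, averaged over k


def find_lambda(pd_pilot, boot_curves, sigma_star, alpha=0.05,
                lam_low=0.0, lam_high=10.0, tol=1e-4, max_iter=200):
    """Bisection for lambda*_alpha following Steps 1, 2, 2.1 and 2.2."""
    pd_pilot = np.asarray(pd_pilot, dtype=float)
    boot_curves = np.asarray(boot_curves, dtype=float)
    sigma_star = np.asarray(sigma_star, dtype=float)
    target = 1.0 - alpha
    p_low = coverage_prob(lam_low, pd_pilot, boot_curves, sigma_star)
    p_high = coverage_prob(lam_high, pd_pilot, boot_curves, sigma_star)
    if not p_low <= target <= p_high:
        raise ValueError("need p(lam_low) <= 1 - alpha <= p(lam_high)")
    for _ in range(max_iter):
        lam_mid = 0.5 * (lam_low + lam_high)
        p_mid = coverage_prob(lam_mid, pd_pilot, boot_curves, sigma_star)
        if p_mid == target or p_high - p_low < tol:     # Step 2
            return lam_mid
        if target < p_mid:                              # Step 2.1
            lam_high, p_high = lam_mid, p_mid
        else:                                           # Step 2.2
            lam_low, p_low = lam_mid, p_mid
    return 0.5 * (lam_low + lam_high)


# Toy check: pilot curve at 0, four bootstrap curves shifted by 0.1, ..., 0.4
pilot = np.zeros(3)
boot = np.repeat([[0.1], [0.2], [0.3], [0.4]], 3, axis=1)
lam = find_lambda(pilot, boot, np.full(3, 0.1), alpha=0.5)
```

In the toy check, a shifted curve covers the pilot only when λ·0.1 exceeds its shift, so any λ in (2, 3] gives p(λ) = 0.5.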

A preliminary analysis not shown here suggests the following pilot bandwidth:

r = \frac{3}{4} (Q_{X} (0.975) - Q_{X} (0.025)) {(\sum_{i = 1}^{n} δ_{i})}^{- 1 / 3} .
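The pilot bandwidth r is a direct computation from the covariate sample and the censoring indicators. A minimal sketch, assuming $Q_X$ is the empirical quantile of the covariate (NumPy’s default linear interpolation is used, which is an assumption) and that δ = 1 marks uncensored observations; the function name is hypothetical.

```python
import numpy as np


def pilot_bandwidth(x, delta):
    """r = (3/4) * (Q_X(0.975) - Q_X(0.025)) * (number of uncensored observations)^(-1/3).

    Q_X is taken to be the empirical quantile of the covariate sample
    (NumPy's default linear interpolation); this choice is an assumption.
    """
    x = np.asarray(x, dtype=float)
    n_uncensored = int(np.sum(delta))
    if n_uncensored == 0:
        raise ValueError("at least one uncensored observation (delta = 1) is required")
    q_low, q_high = np.quantile(x, [0.025, 0.975])
    return 0.75 * (q_high - q_low) * n_uncensored ** (-1.0 / 3.0)


x = np.linspace(0.0, 1.0, 1001)   # toy covariate sample
delta = [1] * 8 + [0] * 993        # 8 uncensored observations
r = pilot_bandwidth(x, delta)      # 0.75 * 0.95 * 8 ** (-1/3), about 0.356
```

Using the number of uncensored observations rather than n means that heavier censoring yields a larger pilot bandwidth.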

This method to obtain confidence regions for the curve $PD(t | x)$, for fixed $x \in I$ and t covering $I_{T}$, based on Beran’s estimator can be adapted to the doubly smoothed Beran’s estimator. Simply replace Beran’s estimator ${\hat{PD}}_{h}(t | x)$ by the smoothed Beran’s estimator ${\tilde{PD}}_{h, g}(t | x)$, given in (8), where necessary, and obtain the analogous bootstrap approximations of $λ_{α}$ and $σ(t)$. The confidence region is then given by

R_{α} = \{({\tilde{PD}}_{h, g}(t | x) - λ_{α}^{*} σ^{*}(t), {\tilde{PD}}_{h, g}(t | x) + λ_{α}^{*} σ^{*}(t)) : t \in I_{T}\} .

Denote the lower and upper bounds of the confidence region by $l(t, x)$ and $u(t, x)$, respectively. The lower bound may be less than 0, or the upper bound greater than 1, at some points $(t_{0}, x_{0})$. Since a probability must lie in $[0, 1]$, when this happens we set $l(t_{0}, x_{0}) = 0$ or $u(t_{0}, x_{0}) = 1$, as appropriate.
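The truncation rule can be sketched as follows, assuming the center curve (Beran’s or the smoothed Beran’s PD estimate), σ*(t) and λ* are already available on the time grid; the function name is hypothetical. `np.maximum` and `np.minimum` are used rather than a two-sided clip so that only the two corrections described in the text are applied.

```python
import numpy as np


def truncated_band(center, sigma_star, lam):
    """Band center(t) -/+ lam * sigma*(t), with l set to 0 where negative and u set to 1 where above 1.

    `center` can be either Beran's or the smoothed Beran's PD estimate on the time grid.
    """
    center = np.asarray(center, dtype=float)
    sigma_star = np.asarray(sigma_star, dtype=float)
    lower = np.maximum(center - lam * sigma_star, 0.0)   # l(t0, x0) = 0 if negative
    upper = np.minimum(center + lam * sigma_star, 1.0)   # u(t0, x0) = 1 if above 1
    return lower, upper


lower, upper = truncated_band([0.05, 0.5, 0.95], [0.05, 0.05, 0.05], lam=2.0)
```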

The pilot bandwidths defined in (6) and (11) are used for the confidence region algorithms based on Beran’s and the smoothed Beran’s estimators.

5. Simulation Study for Confidence Regions

A simulation study is carried out to test the performance of the proposed bootstrap confidence regions. Models 1 and 2, described in Section 3, are considered in this study with the same settings. The methods of Section 4 are applied with both Beran’s and the smoothed Beran’s estimators. When Beran’s estimator is used, the bandwidth that minimizes the mean integrated squared error, $h = h_{MISE}$, is used. Similarly, when the smoothed Beran’s estimator is used, the two-dimensional bandwidth that minimizes the mean integrated squared error, $(h, g) = (h_{MISE}, g_{MISE})$, is used. These bandwidths are unknown in practice, but they allow a fair comparison of the methods in the simulation study.

The simulation set-up is the one explained in Section 3. Two conditional probabilities of censoring are considered for each model: $P(δ = 0 | x) = 0.2$ and $P(δ = 0 | x) = 0.5$. The number of bootstrap resamples drawn from each sample is $B = 500$, and $N = 300$ simulated samples of each model are obtained. The sample size is $n = 400$. The confidence level is $1 - α$ with $α = 0.05$.

Figure 8 shows Beran’s estimations of the PD for $B = 500$ resamples from one sample of Model 2 when the conditional probability of censoring is 0.5. The theoretical probability of default is also plotted in the figure. The PD is estimated on a time grid $t_{1} = 0 < t_{2} < \dots < t_{n_{T}}$ such that $t_{n_{T}} + b = F^{-1}(0.95 | x)$. Because of the high censoring, the data provide little information in the right tail of this time distribution. There, the method yields extremely wide confidence regions or degenerates to zero, as happens in Model 2 (see Figure 8). Therefore, in this section the time grid is restricted to the interval where sufficient information is available.

For this section, we consider the problem of obtaining the bootstrap confidence region for the probability of default on a time grid $t_{1} = 0 < t_{2} < \dots < t_{n_{T}}$ such that $t_{k} \in I_{T} \subseteq \mathbb{R}^{+}$ for all $k = 1, \dots, n_{T}$ and $t_{n_{T}} + b = F^{-1}(0.70 | x)$, with b approximately equal to 20% of the grid length. For Model 1, with the covariate set to $x = 0.6$, the time horizon is $b = 0.1$ (20% of the time range) and $t_{n_{T}} + b = F^{-1}(0.70 | x = 0.6) = 0.55$. For Model 2, with the covariate set to $x = 0.8$, the time horizon is $b = 0.3$ (20% of the time range) and $t_{n_{T}} + b = F^{-1}(0.70 | x = 0.8) = 1.55$. Table 4 contains the bandwidths that minimize the MISE function for Beran’s estimator and the smoothed Beran’s estimator along this new time grid.

For each model, the confidence region is obtained according to the method explained in Section 4 using both Beran’s estimator and the smoothed Beran’s estimator. The criteria for comparing the methods are set out below.

A confidence region performs well if its coverage is close to the nominal one, in this case $1 - α = 0.95$, and it has a small area or average width. Denote

l(t, x) = {\hat{PD}}_{h}(t | x) - λ_{α}^{*} σ^{*}(t), \quad u(t, x) = {\hat{PD}}_{h}(t | x) + λ_{α}^{*} σ^{*}(t)

when using Beran’s estimator, or

l(t, x) = {\tilde{PD}}_{h, g}(t | x) - λ_{α}^{*} σ^{*}(t), \quad u(t, x) = {\tilde{PD}}_{h, g}(t | x) + λ_{α}^{*} σ^{*}(t)

when using the smoothed Beran’s estimator. The following values measure the performance of the confidence region and allow the results to be compared.

Let $l_{j}(t, x)$ and $u_{j}(t, x)$ denote the bounds of the confidence region obtained from the j-th simulated sample, $j = 1, \dots, N$.

Coverage is the percentage of bootstrap regions that contain the whole theoretical probability of default curve. It is defined as

\frac{1}{N} \sum_{j = 1}^{N} I\{PD(t_{k} | x) \in (l_{j}(t_{k}, x), u_{j}(t_{k}, x)), \forall k = 1, \dots, n_{T}\} .

The mean pointwise coverage is the mean, over the N samples, of the proportion of time grid points at which the confidence region contains the theoretical probability of default curve. It is given by

\frac{1}{N} \sum_{j = 1}^{N} \left(\frac{1}{n_{T}} \sum_{k = 1}^{n_{T}} I\{PD(t_{k} | x) \in (l_{j}(t_{k}, x), u_{j}(t_{k}, x))\}\right) .

The average width of the bootstrap confidence region is defined by

\frac{1}{N} \sum_{j = 1}^{N} \left(\frac{1}{n_{T}} \sum_{k = 1}^{n_{T}} (u_{j}(t_{k}, x) - l_{j}(t_{k}, x))\right) .
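The three criteria can be computed together from the bounds of all N simulated samples. A minimal sketch, assuming the bounds are stacked into $(N, n_T)$ arrays whose j-th rows are $l_j$ and $u_j$, and the theoretical curve is evaluated on the same grid; the function name and toy numbers are hypothetical.

```python
import numpy as np


def region_metrics(pd_true, lower, upper):
    """Coverage, mean pointwise coverage and average width.

    pd_true: (n_T,) theoretical PD(t_k | x); lower, upper: (N, n_T) arrays whose
    j-th rows are the bounds l_j, u_j obtained from the j-th simulated sample.
    """
    pd_true = np.asarray(pd_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (lower < pd_true) & (pd_true < upper)   # open intervals, shape (N, n_T)
    coverage = inside.all(axis=1).mean()             # whole curve inside the region
    pointwise = inside.mean(axis=1).mean()           # share of grid points, then mean over j
    width = (upper - lower).mean(axis=1).mean()      # mean width over grid, then over j
    return coverage, pointwise, width


cov, pw, width = region_metrics([0.5, 0.5],
                                [[0.4, 0.4], [0.4, 0.6]],
                                [[0.6, 0.6], [0.6, 0.7]])
```

In the toy input the second region misses the curve at one of the two grid points, so coverage is 0.5 while mean pointwise coverage is 0.75, which shows why the pointwise measure is always at least as large as the simultaneous one.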

The Winkler score (see [30]) is also used to compare the behavior of the methods. For classical confidence or prediction intervals, it is defined as the length of the interval plus a penalty if the theoretical value lies outside the interval, so it combines width and coverage. For values that fall within the interval, the Winkler score is simply the length of the interval, so low scores are associated with narrow intervals. When the theoretical value falls outside the interval, the penalty is proportional to the distance between that value and the interval. As a function of the time and covariate variables, the Winkler score (WS) is

\begin{aligned} WS(t, x) = {} & u(t, x) - l(t, x) + \frac{2}{α} (l(t, x) - PD(t | x)) I(PD(t | x) < l(t, x)) \\ & + \frac{2}{α} (PD(t | x) - u(t, x)) I(PD(t | x) > u(t, x)) . \end{aligned}

Since we are working with confidence regions for fixed $x \in I$ and t varying over the interval $I_{T}$, the integrated Winkler score is proposed as a criterion for comparing the confidence regions. It is defined by

IWS(x) = \int_{I_{T}} WS(t, x) \, dt ,

and the lower the value of IWS, the better the performance of the confidence region.
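The Winkler score and its integral can be sketched as follows. The pointwise score follows the formula above; the integral over $I_T$ is approximated here by the trapezoidal rule on the time grid, which is an assumption since the quadrature used is not specified. Function names and the toy inputs are hypothetical.

```python
import numpy as np


def winkler_score(lower, upper, truth, alpha):
    """Pointwise WS(t, x): width plus (2 / alpha) times the distance to the band when truth is outside."""
    lower, upper, truth = (np.asarray(a, dtype=float) for a in (lower, upper, truth))
    score = upper - lower
    score = score + (2.0 / alpha) * (lower - truth) * (truth < lower)
    score = score + (2.0 / alpha) * (truth - upper) * (truth > upper)
    return score


def integrated_winkler_score(t_grid, lower, upper, truth, alpha):
    """IWS(x): trapezoidal approximation of the integral of WS over the time grid (assumed rule)."""
    t_grid = np.asarray(t_grid, dtype=float)
    ws = winkler_score(lower, upper, truth, alpha)
    return float(np.sum(0.5 * (ws[1:] + ws[:-1]) * np.diff(t_grid)))


t = [0.0, 1.0, 2.0]
iws = integrated_winkler_score(t, [0.1] * 3, [0.3] * 3, [0.2, 0.4, 0.0], alpha=0.05)
```

With α = 0.05 the penalty factor is 2/α = 40, so in the toy input a miss of only 0.1 adds 4.0 to a width of 0.2; this is why regions that undercover are heavily penalized by IWS even when they are narrow.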

The results obtained are shown in Table 5. The high values of pointwise coverage in all scenarios are remarkable. Furthermore, these coverage percentages are preserved when using double smoothing, while the average width of the confidence regions is halved. This is reflected in the IWS, which takes much larger values for the confidence regions based on Beran’s estimator.

This analysis is also illustrated in Figures 9 and 10, which show the confidence region for the probability of default of one sample from Models 1 and 2, respectively. These graphs show the higher variability of Beran’s estimations across resamples compared with the smoothed Beran’s estimations, which leads to much wider confidence regions, especially in the right tail of the time distribution.

6. Application to Real Data

In this section, the bandwidth selectors for Beran’s and the smoothed Beran’s estimators are applied to the German Credit dataset, and the confidence region of the probability of default is obtained. This dataset is publicly available at http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data) (accessed on 15 September 2021) and was previously analyzed in [31]. It includes information about 1000 credits, of which 293 were classified as bad credits and 707 as good credits. Hence, the censoring ratio of this dataset is 70.7%. The duration of the credits in months is available along with the credit amount, checking account, savings amount and time of employment, among others.

The duration of the credits is set as the time to default, Z; the bad (defaulted) credits are denoted by $δ = 1$ and the good credits by $δ = 0$. Let us denote the credit scoring by $X = (1, θ_{2}, θ_{3}, θ_{4})(X_{1}, X_{2}, X_{3}, X_{4})^{\top}$.

Since some of the original covariates are ordinal (interval) variables, they are converted into numerical variables following the criteria explained in [31]: $X_{1}$ is already a continuous variable denoting the amount of credit in DM, $X_{2} \in \{-0.05, 0.01, 0.25, 0\}$ denotes the amount of money in the checking account in thousands of DM, $X_{3} \in \{0, 0.05, 0.25, 0.75, 1.25\}$ denotes the savings amount in thousands of DM, and $X_{4} \in \{0, 0.5, 2.5, 5.5, 8.5\}$ denotes the years of employment. The single-index method proposed by [31] is used to estimate $(1, θ_{2}, θ_{3}, θ_{4})$. The credit scoring is obtained as follows:

X = X_{1} + 3.2091 X_{2} + 0.2312 X_{3} + 2.1891 X_{4} .
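The scoring rule is a fixed linear combination, so scoring a client is a single line of arithmetic. A minimal sketch with the coefficients reported above; the function name and the client values are hypothetical, and inputs are taken as already coded as described in the text (the text does not state whether $X_1$ is rescaled before scoring).

```python
def credit_score(x1, x2, x3, x4):
    """X = X1 + 3.2091 X2 + 0.2312 X3 + 2.1891 X4, with the coefficients reported above.

    x1: credit amount, x2: checking account, x3: savings (both in thousands of DM),
    x4: years of employment, each coded as described in the text.
    """
    return x1 + 3.2091 * x2 + 0.2312 * x3 + 2.1891 * x4


score = credit_score(0.0, 0.01, 0.05, 0.5)   # hypothetical client, X1 = 0 for simplicity
```

Here `score` is 0.032091 + 0.01156 + 1.09455 = 1.138201.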

Figure 11 shows the scatter plot of credit scoring against follow-up time, distinguishing between censored and uncensored (and therefore defaulted) credits. A dependence relationship between the two variables can be identified in the plot.

The probability of default, $PD(t | x)$, is estimated for $x = 0.85$, a value close to the sample mean of the credit scoring, and $t \in [0, 60]$. The bandwidth selector presented in Section 2.1 is used to approximate the optimal bandwidth for Beran’s estimator, giving $h^{*} = 0.500$. The bandwidth selector presented in Section 2.2 gives the bootstrap approximation of the optimal bivariate bandwidth for the smoothed Beran’s estimator, $(h^{*}, g^{*}) = (0.102, 13.614)$. The estimations of the conditional survival function and the probability of default by means of Beran’s and the smoothed Beran’s estimators with the corresponding bootstrap bandwidths are shown in Figure 12. The poor behavior of Beran’s PD estimator for large values of time is evident. The results obtained by the doubly smoothed Beran’s estimator seem more appropriate, since the roughness of Beran’s estimation is not expected in this type of curve. Supporting this conclusion, the probability of default estimates over time obtained in [32,33] for the assessment of risk in portfolios and bond rating have shapes similar to those obtained here.

Finally, the confidence region methods proposed in Section 4 are applied. Since the MISE bandwidths are unknown in this context, bootstrap bandwidths are used. The bootstrap resamples and the resulting confidence regions at confidence level 95% for each estimator are shown in Figure 13. The average width of the confidence region based on Beran’s estimator is 0.5581, and that of the region based on the smoothed Beran’s estimator is 0.1438. Note that the confidence region of $PD(t | x)$ is computed over the time interval $[0, 40]$. Since the information provided in the right tail of the time distribution is sparse, Beran’s estimator performs very poorly there, leading to extremely wide confidence regions. This problem is less severe for the doubly smoothed Beran’s estimator, so its confidence region can be computed for larger values of time. Figure 14 shows the confidence region of $PD(t | x)$ based on the smoothed Beran’s estimator with $t \in [0, 60]$. The average width of this confidence region is 0.2398.

In practice, a financial institution measures different features of its clients, such as age, amount of money in the bank account, salary, years of employment, etc. These covariates are summarized into a single variable, the credit scoring, usually by logistic regression. Techniques such as those shown in this paper then allow the probability of default at horizon b to be calculated for all clients. For each time t, the curve $PD(t | x)$ gives the probability that a client with credit scoring x who has not defaulted by time t will default within the following period of length b.
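Since $PD(t | x) = P(T \leq t + b \mid T > t, X = x)$, it can be written as $1 - S(t + b | x) / S(t | x)$, which is how a PD curve is obtained from an estimated conditional survival function such as those in Figure 12. The sketch below assumes the survival values are already computed on an increasing time grid; treating S as a right-continuous step function between grid points and holding it at its last value beyond the grid are assumptions of this sketch, and the function name is hypothetical.

```python
import numpy as np


def pd_from_survival(t_grid, surv, b):
    """PD(t|x) = 1 - S(t + b | x) / S(t | x) for survival values given on an increasing grid.

    S is treated as a right-continuous step function between grid points and held
    at its last value beyond the grid (both are assumptions of this sketch).
    """
    t_grid = np.asarray(t_grid, dtype=float)
    surv = np.asarray(surv, dtype=float)

    def s_at(u):
        idx = np.searchsorted(t_grid, u, side="right") - 1   # last grid point <= u
        return surv[np.clip(idx, 0, surv.size - 1)]

    s_t = s_at(t_grid)
    s_tb = s_at(t_grid + b)
    pd = np.full_like(s_t, np.nan)   # undefined where S(t | x) = 0
    ok = s_t > 0
    pd[ok] = 1.0 - s_tb[ok] / s_t[ok]
    return pd


t = np.arange(0.0, 10.5, 0.5)
pd = pd_from_survival(t, np.exp(-t), b=1.0)   # exponential toy model: PD = 1 - exp(-1)
```

For the memoryless exponential toy model the PD is constant in t, except near the end of the grid where $t + b$ falls outside it and the held-constant assumption takes over.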

7. Conclusions and Future Lines

This article proposes an automatic bandwidth selector and confidence region algorithms based on the bootstrap for Beran’s estimator and the smoothed Beran’s estimator of the probability of default in credit risk. The proposed resampling methods and bootstrap selectors make it possible to approximate the MISE bandwidths of each estimator: the covariate smoothing bandwidth for Beran’s estimator and the two-dimensional covariate and time smoothing bandwidth for the doubly smoothed estimator. In view of the simulation study, it can be concluded that the bandwidth selectors work properly. The doubly smoothed Beran’s estimator with bootstrap bandwidths commits smaller estimation errors than Beran’s estimator. The simulation results also show the good behavior of the confidence regions, especially those based on the doubly smoothed Beran’s estimator: they have a smaller average width, reducing the uncertainty about where the true probability of default curve is located, while preserving a high coverage.

The main limitation of the proposed methods is their high computational cost. Approximating the bootstrap bandwidth or computing a confidence region with 500 resamples from a sample of size 100 requires one minute, while a sample of size 500 requires 25 min. These times are similar for both estimators. They seem to grow quadratically with the sample size (a fivefold increase in n multiplies the time by 25), which may lead to prohibitive times for very large samples. Subsampling techniques are an appealing option for optimizing these methods in future work.

In a financial context, credit scoring typically summarizes several relevant features of clients in order to measure their creditworthiness. However, this work could be extended to the case of a multidimensional covariate $(X_{1}, \dots, X_{q})$, where each $X_{i}$ is a feature of the individual. Methods such as single-index models can be useful for this purpose to avoid the curse of dimensionality; an approach along the lines of [31] could be used.

Author Contributions

Conceptualization, R.C. and J.M.V.; Data curation, R.P.; Formal analysis, R.P.; Investigation, R.P.; Supervision, R.C. and J.M.V.; Visualization, R.P.; Writing—original draft, R.P.; Writing—review & editing, R.P., R.C. and J.M.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been supported by MICINN Grant PID2020-113578RB-100, and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14 and Centro Singular de Investigación de Galicia ED431G 2019/01), all of them through the ERDF.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data) (accessed on 15 September 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wiginton, J.C. A note on the comparison of logit and discriminant models of consumer credit behaviour. J. Financ. Quant. Anal. 1980, 15, 757–770.
2. Srinivasan, V.; Kim, Y.H. Credit granting: A comparative analysis of classification procedures. J. Financ. 1987, 42, 665–681.
3. Steenackers, A.; Goovaerts, M.J. A credit scoring model for personal loans. Insur. Math. Econ. 1989, 8, 31–34.
4. Thomas, L.C.; Crook, J.N.; Edelman, D.B. Credit Scoring and Credit Control; Oxford University Press: Oxford, UK, 1992.
5. Baba, N.; Goko, H. Survival Analysis of Hedge Funds; Bank of Japan, Working Papers Series; Bank of Japan: Tokyo, Japan, 2006.
6. Samreen, A.; Zaidi, F. Design and development of credit scoring model for the commercial banks of Pakistan: Forecasting creditworthiness of individual borrowers. Int. J. Bus. Soc. Sci. 2012, 17, 155–166.
7. Naraim, B. Survival analysis and the credit granting decision. In Credit Scoring and Credit Control; Thomas, L.C., Crook, J.N., Edelman, D.B., Eds.; Oxford University Press: Oxford, UK, 1992; pp. 109–121.
8. Schuermann, T.; Hanson, S.G. Estimating probabilities of default. In Staff Report Federal Reserve Bank of New York; Federal Reserve Bank of New York: New York, NY, USA, 2004; pp. 923–947.
9. Glennon, D.; Nigro, P. Measuring the default risk of small business loans: A survival analysis approach. J. Money Credit. Bank. 2005, 37, 923–947.
10. Allen, L.N.; Rose, L.C. Financial survival analysis of defaulted debtors. J. Oper. Res. Soc. 2006, 57, 630–636.
11. Beran, J.; Djaïdja, A.K. Credit risk modeling based on survival analysis with immunes. Stat. Methodol. 2007, 4, 251–276.
12. Cao, R.; Vilar, J.M.; Devia, A. Modelling consumer credit risk via survival analysis (with discussion). Stat. Oper. Res. Trans. 2009, 33, 3–30.
13. Peláez, R.; Cao, R.; Vilar, J.M. Probability of default estimation in credit risk using a nonparametric approach. TEST 2021, 30, 383–405.
14. Peláez, R.; Cao, R.; Vilar, J.M. Nonparametric estimation of probability of default with double smoothing. SORT 2021, 45, 93–120.
15. Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979, 7, 1–26.
16. Efron, B. Censored data and the bootstrap. J. Am. Stat. Assoc. 1981, 76, 312–319.
17. Efron, B.; Tibshirani, R. An Introduction to the Bootstrap; Chapman and Hall: London, UK, 1993.
18. Akritas, M. Bootstrapping the Kaplan-Meier estimator. J. Am. Stat. Assoc. 1986, 81, 1032–1039.
19. Lo, S.H.; Singh, K. The product-limit estimator and the bootstrap: Some asymptotic representations. Probab. Theory Relat. Fields 1986, 71, 455–465.
20. Van Keilegom, I.; Veraverbeke, N. Estimation and bootstrap with censored data in fixed design nonparametric regression. Ann. Inst. Stat. Math. 1997, 49, 467–491.
21. Li, G.; Datta, S. A bootstrap approach to nonparametric regression for right censored data. Ann. Inst. Stat. Math. 2001, 53, 708–729.
22. Geerdens, C.; Acar, E.F.; Janssen, P. Conditional copula models for right-censored clustered event time data. Biostatistics 2017, 19, 247–262.
23. Földes, A.; Rejtø, L.; Winter, B.B. Strong consistency properties of nonparametric estimators for randomly censored data, II: Estimation of density and failure rate. Period. Math. Hung. 1981, 12, 15–29.
24. Leconte, E.; Poiraud-Casanova, S.; Thomas-Agnan, C. Smooth conditional distribution function and quantiles under random censorship. Lifetime Data Anal. 2002, 8, 229–246.
25. Peláez, R.; Cao, R.; Vilar, J.M. Nonparametric Estimation of the Conditional Survival Function with Double Smoothing; Technical Report; Universidade da Coruña: A Coruña, Spain, 2021.
26. Beran, R. Nonparametric Regression with Randomly Censored Survival Data; Technical Report; University of California: Los Angeles, CA, USA, 1981.
27. Byrd, R.H.; Lu, P.; Nocedal, J.; Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 1995, 16, 1190–1208.
28. Zhu, C.; Byrd, R.H.; Lu, P.; Nocedal, J. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw. 1997, 23, 550–560.
29. Cao, R.; Francisco-Fernández, M.; Quinto, E. A random effect multiplicative heteroscedastic model for bacterial growth. BMC Bioinform. 2010, 11, 77.
30. Winkler, R.L. A decision-theoretic approach to interval estimation. J. Am. Stat. Assoc. 1972, 67, 187–191.
31. Strzalkowska-Kominiak, E.; Cao, R. Maximum likelihood estimation for conditional distribution single-index models under censoring. J. Multivar. Anal. 2013, 114, 74–98.
32. Barnard, B. Rating Migration and Bond Valuation: A Historical Interest Rate and Default Probability Term Structures; University of the Witwatersrand, Wits Business School: Johannesburg, South Africa, 2017.
33. dos Reis, G.; Smith, G. Robust and consistent estimation of generators in credit risk. Quant. Financ. 2018, 18, 983–1001.

Figure 1. $\mathrm{MISE}_{x}(h)$ function (black line) approximated via Monte Carlo and $\mathrm{MISE}_{x}^{*}(h)$ functions (gray lines) for $N = 300$ samples (top), boxplot of $H_{1}^{*}, \dots, H_{N}^{*}$ values (middle) and boxplot of $R_{1}^{*}, \dots, R_{N}^{*}$ values (bottom) when the conditional probability of censoring is 0.2 (left) and 0.5 (right) in Model 1.

Figure 2. $\mathrm{MISE}_{x}(h)$ function (black line) approximated via Monte Carlo and $\mathrm{MISE}_{x}^{*}(h)$ functions (gray lines) for $N = 300$ samples (top), boxplot of $H_{1}^{*}, \dots, H_{N}^{*}$ values (middle) and boxplot of $R_{1}^{*}, \dots, R_{N}^{*}$ values (bottom) when the conditional probability of censoring is 0.2 (left) and 0.5 (right) in Model 2.

Figure 3. Theoretical probability of default function $PD(t | x)$ (solid line), Beran’s estimation with MISE bandwidth (dotted line) and Beran’s estimation with bootstrap bandwidth (dashed line) for one sample from Model 1 (left) and Model 2 (right) with $P(δ = 0 | x) = 0.5$.

Figure 4. $\mathrm{MISE}_{x}(h, g)$ function approximated via Monte Carlo (left) and $\mathrm{MISE}_{x}^{*}(h, g)$ function approximated via bootstrap (right) for one sample from Model 1 (top) and Model 2 (bottom) when $P(δ = 0 | x) = 0.5$.

Figure 5. Boxplot of $H_{1}^{*}, \dots, H_{N}^{*}$ values (top) and boxplot of $R_{1}^{*}, \dots, R_{N}^{*}$ values (bottom) when the conditional probability of censoring is 0.2 (left) and 0.5 (right) in Model 1.

Figure 6. Boxplot of $H_{1}^{*}, \dots, H_{N}^{*}$ values (top) and boxplot of $R_{1}^{*}, \dots, R_{N}^{*}$ values (bottom) when the conditional probability of censoring is 0.2 (left) and 0.5 (right) in Model 2.

Figure 7. Theoretical probability of default function $PD(t | x)$ (solid line), smoothed Beran’s estimation with MISE bandwidth (dotted line) and smoothed Beran’s estimation with bootstrap bandwidth (dashed line) for one sample from Model 1 (left) and Model 2 (right) with $P(δ = 0 | x) = 0.5$.

Figure 8. Theoretical $PD(t | x)$ (red solid line), Beran’s estimation of $PD(t | x)$ with MISE bandwidths (black dashed line) and bootstrap versions of Beran’s estimation of $PD(t | x)$ from $B = 500$ resamples (gray dashed lines) of one sample from Model 2 when $P(δ = 0 | x) = 0.5$.

Figure 9. Theoretical $PD(t | x)$ (red solid line) and estimation with MISE bandwidths (black dashed line) along with the bootstrap estimations of $PD(t | x)$ from $B = 500$ resamples (gray dashed lines) in the left panel, and 95% confidence region (black dotted lines) in the right panel, by means of Beran’s estimator (top) and the smoothed Beran’s estimator (bottom) for one sample from Model 1 when $P(δ = 0 | x) = 0.5$.

Figure 10. Theoretical $PD(t | x)$ (red solid line) and estimation with MISE bandwidths (black dashed line) along with the bootstrap estimations of $PD(t | x)$ from $B = 500$ resamples (gray dashed lines) in the left panel, and 95% confidence region (black dotted lines) in the right panel, by means of Beran’s estimator (top) and the smoothed Beran’s estimator (bottom) for one sample from Model 2 when $P(δ = 0 | x) = 0.5$.

Figure 11. Scatter plot of credit scoring and duration of the credit in the censored group (red circles) and the uncensored group (blue triangles) of the German credit data.

Figure 12. Conditional survival function estimation (left) and probability of default estimation (right) by means of Beran’s estimator (dashed line) and the smoothed Beran’s estimator (solid line) with bootstrap bandwidths when $x = 0.85$ in the German credit dataset.
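Figure 12 shows the two estimated quantities side by side. By the definition of the PD in the abstract (default not later than $t + b$ given no default up to $t$), $PD(t \mid x) = 1 - S(t + b \mid x)/S(t \mid x)$, so any estimate of the conditional survival function yields a plug-in PD estimate. The Python sketch below uses Beran’s conditional Kaplan-Meier estimator, $\hat{S}_h(t \mid x) = \prod_{T_i \le t,\ \delta_i = 1} \bigl(1 - w_i(x) / \sum_{j:\, T_j \ge T_i} w_j(x)\bigr)$ with Nadaraya-Watson weights $w_i(x)$. It is an illustrative sketch under stated assumptions, not the authors' code: the Gaussian kernel, the function names `beran_survival` and `beran_pd`, and the assumption of no tied times are our choices, and the smoothed Beran’s estimator (which also smooths in time with a second bandwidth) is not covered.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density; the kernel choice is an assumption of this sketch."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def beran_survival(t, x, times, deltas, covariates, h):
    """Beran's estimate of the conditional survival function S(t | x).

    times: observed times T_i = min(Y_i, C_i); deltas: 1 = default observed,
    0 = censored; covariates: X_i (e.g. credit scoring); h: covariate bandwidth.
    Assumes no tied times and at least one covariate close enough to x.
    """
    times = np.asarray(times, dtype=float)
    deltas = np.asarray(deltas, dtype=int)
    covariates = np.asarray(covariates, dtype=float)
    w = gaussian_kernel((x - covariates) / h)
    w = w / w.sum()  # Nadaraya-Watson weights, summing to one
    surv = 1.0
    for i in np.argsort(times):
        if times[i] > t:
            break  # only event times up to t enter the product
        if deltas[i] == 1:
            at_risk = w[times >= times[i]].sum()  # weighted risk set at T_i
            surv *= 1.0 - w[i] / at_risk
    return surv


def beran_pd(t, b, x, times, deltas, covariates, h):
    """Plug-in PD(t | x) = 1 - S(t + b | x) / S(t | x); requires S(t | x) > 0."""
    s_t = beran_survival(t, x, times, deltas, covariates, h)
    s_tb = beran_survival(t + b, x, times, deltas, covariates, h)
    return 1.0 - s_tb / s_t


# Huge bandwidth: all weights equal, so Beran reduces to Kaplan-Meier.
print(beran_pd(1.5, 1.0, 0.25, [1, 2, 3, 4], [1, 1, 1, 1], [0.1, 0.2, 0.3, 0.4], h=1e6))
```

In the usage line the four uncensored times give $S(1.5) = 0.75$ and $S(2.5) = 0.5$, hence a PD of $1 - 0.5/0.75 = 1/3$ for $t = 1.5$ and $b = 1$. With a small $h$, only obligors whose scoring is close to $x$ carry weight, which is what makes the estimate conditional on the credit scoring.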

Figure 13. Estimation of $PD(t \mid x)$ with bootstrap bandwidths (black line) along with bootstrap estimations of PD (gray lines) from $B = 500$ resamples (left) and 95% confidence region (right) by Beran’s estimator (top) and the smoothed Beran’s estimator (bottom) when $x = 0.85$ and $t \in [0, 40]$ in the German credit dataset.

Figure 14. Estimation of $PD(t \mid x)$ with bootstrap bandwidths (black line) along with bootstrap estimations of PD (gray lines) from $B = 500$ resamples (left) and 95% confidence region (right) by the smoothed Beran’s estimator when $x = 0.85$ and $t \in [0, 60]$ in the German credit dataset.

Table 1. MISE bandwidths, average bootstrap bandwidths and estimation errors of Beran’s PD estimator for each level of the conditional censoring probability in Models 1 and 2. Numbers in parentheses are standard deviations.

| | Model 1 | Model 1 | Model 2 | Model 2 |
|---|---|---|---|---|
| $P(\delta = 0 \mid X = x)$ | 0.2 | 0.5 | 0.2 | 0.5 |
| $h_{\mathrm{MISE}}$ | 0.37576 | 0.35909 | 0.09494 | 0.10959 |
| $\mathrm{RMISE}_{x}(h_{\mathrm{MISE}})$ | 0.05520 | 0.11144 | 0.27942 | 0.49991 |
| $\bar{h}^{*}$ (sd) | 0.27856 (0.092) | 0.30892 (0.110) | 0.21763 (0.041) | 0.23091 (0.068) |
| $\bar{H}^{*}$ | 0.31431 | 0.29306 | 1.29211 | 1.10692 |
| $\overline{\mathrm{RMISE}_{x}(h^{*})}$ | 0.05700 | 0.11405 | 0.29671 | 0.50824 |
| $\bar{R}^{*}$ | 0.03260 | 0.02336 | 0.06188 | 0.01666 |

Table 2. MISE bandwidths, average bootstrap bandwidths and estimation errors of the smoothed Beran’s PD estimator for each level of the conditional censoring probability in Models 1 and 2. Numbers in parentheses are standard deviations.

| | Model 1 | Model 1 | Model 2 | Model 2 |
|---|---|---|---|---|
| $P(\delta = 0 \mid X = x)$ | 0.2 | 0.5 | 0.2 | 0.5 |
| $h_{\mathrm{MISE}}$ | 0.21633 | 0.16735 | 0.11122 | 0.37551 |
| $g_{\mathrm{MISE}}$ | 0.09286 | 0.14612 | 1.27755 | 1.68878 |
| $\mathrm{RMISE}_{x}(h_{\mathrm{MISE}}, g_{\mathrm{MISE}})$ | 0.03710 | 0.05094 | 0.09829 | 0.12322 |
| $\bar{h}^{*}$ (sd) | 0.11736 (0.051) | 0.11219 (0.057) | 0.19813 (0.180) | 0.16593 (0.218) |
| $\bar{g}^{*}$ (sd) | 0.12647 (0.039) | 0.19671 (0.054) | 0.60005 (0.375) | 1.45428 (0.711) |
| $\bar{H}^{*}$ | 0.68121 | 0.63604 | 1.22609 | 0.89164 |
| $\overline{\mathrm{RMISE}_{x}(h^{*}, g^{*})}$ | 0.04620 | 0.06793 | 0.22135 | 0.28342 |
| $\bar{R}^{*}$ | 0.24517 | 0.33357 | 1.25199 | 1.30003 |
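The $\bar{R}^{*}$ rows of Tables 1 and 2 can be compared with the relative excess error incurred by using the bootstrap bandwidths instead of the MISE bandwidths, $\bigl(\overline{\mathrm{RMISE}_x(h^{*})} - \mathrm{RMISE}_x(h_{\mathrm{MISE}})\bigr)/\mathrm{RMISE}_x(h_{\mathrm{MISE}})$. Reading $\bar{R}^{*}$ this way is our inference from the numbers, not a definition stated in this section: the values computed from the table columns agree with the reported ones to within about $10^{-4}$. The small residual differences would be expected if $\bar{R}^{*}$ averages per-sample ratios rather than taking the ratio of averages, but that too is an assumption. The check below is a plain Python sketch with the table values copied in.

```python
# Columns: (Model 1, 0.2), (Model 1, 0.5), (Model 2, 0.2), (Model 2, 0.5),
# where the second entry is P(delta = 0 | X = x). Values copied from Tables 1 and 2.
TABLES = {
    "Beran": {
        "rmise_mise": [0.05520, 0.11144, 0.27942, 0.49991],
        "rmise_boot": [0.05700, 0.11405, 0.29671, 0.50824],
        "reported_R": [0.03260, 0.02336, 0.06188, 0.01666],
    },
    "smoothed Beran": {
        "rmise_mise": [0.03710, 0.05094, 0.09829, 0.12322],
        "rmise_boot": [0.04620, 0.06793, 0.22135, 0.28342],
        "reported_R": [0.24517, 0.33357, 1.25199, 1.30003],
    },
}


def relative_excess(rmise_boot, rmise_mise):
    """Relative increase in error from the bootstrap bandwidth(s) over the MISE ones."""
    return (rmise_boot - rmise_mise) / rmise_mise


for name, cols in TABLES.items():
    for boot, mise, reported in zip(cols["rmise_boot"], cols["rmise_mise"], cols["reported_R"]):
        print(f"{name:15s} computed {relative_excess(boot, mise):.5f}  reported {reported:.5f}")
```

Under this reading, the bootstrap selector costs Beran’s estimator at most about 6% extra error, while for the smoothed Beran’s estimator in Model 2 the loss exceeds 100%, reflecting the much larger variability of $g^{*}$ visible in Table 2.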

Table 3. Comparison of the estimation errors of Beran’s estimator and the smoothed Beran’s estimator in Models 1 and 2.

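| | | Model 1 | Model 1 | Model 2 | Model 2 |
|---|---|---|---|---|---|
| | $P(\delta = 0 \mid X = x)$ | 0.2 | 0.5 | 0.2 | 0.5 |
| BERAN | $\overline{\mathrm{RMISE}_{x}(h^{*})}$ | 0.05579 | 0.11206 | 0.28593 | 0.49916 |
| SBERAN | $\overline{\mathrm{RMISE}_{x}(h^{*}, g^{*})}$ | 0.04629 | 0.07216 | 0.20007 | 0.27611 |
| $R_{S}$ | | 0.20523 | 0.55294 | 0.42915 | 0.80783 |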
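The $R_S$ row of Table 3 matches, to all five reported decimals, the relative excess error of Beran’s estimator over the smoothed Beran’s estimator, $R_S = \bigl(\overline{\mathrm{RMISE}}_{\mathrm{BERAN}} - \overline{\mathrm{RMISE}}_{\mathrm{SBERAN}}\bigr)/\overline{\mathrm{RMISE}}_{\mathrm{SBERAN}}$. This formula is inferred from the table values rather than quoted from the paper. For instance, in Model 2 with $P(\delta = 0 \mid X = x) = 0.5$, $(0.49916 - 0.27611)/0.27611 \approx 0.80783$, so Beran’s estimator has roughly 81% more error there. The short Python check below reproduces the whole row; the function name `relative_error_gap` is ours.

```python
# Mean RMISE with bootstrap bandwidths, copied from Table 3. Columns:
# (Model 1, 0.2), (Model 1, 0.5), (Model 2, 0.2), (Model 2, 0.5).
RMISE_BERAN = [0.05579, 0.11206, 0.28593, 0.49916]
RMISE_SBERAN = [0.04629, 0.07216, 0.20007, 0.27611]


def relative_error_gap(err_beran, err_sberan):
    """(RMISE_BERAN - RMISE_SBERAN) / RMISE_SBERAN; formula inferred from Table 3."""
    return (err_beran - err_sberan) / err_sberan


r_s = [round(relative_error_gap(b, s), 5) for b, s in zip(RMISE_BERAN, RMISE_SBERAN)]
print(r_s)  # [0.20523, 0.55294, 0.42915, 0.80783]
```

Every entry is positive, so under this reading the smoothed Beran’s estimator has the smaller error in all four scenarios, and the gap widens with heavier censoring.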

Table 4. MISE bandwidths and RMISE of Beran’s and the smoothed Beran’s estimators for each level of the conditional censoring probability in Models 1 and 2 when $t_{n_T} + b = F^{-1}(0.70 \mid x)$.

| | | Model 1 | Model 1 | Model 2 | Model 2 |
|---|---|---|---|---|---|
| | $P(\delta = 0 \mid X = x)$ | 0.2 | 0.5 | 0.2 | 0.5 |
| BERAN | $h_{\mathrm{MISE}}$ | 0.375510 | 0.320408 | 0.041837 | 0.057755 |
| | $\mathrm{RMISE}_{x}(h_{\mathrm{MISE}})$ | 0.019403 | 0.025943 | 0.193334 | 0.220733 |
| SBERAN | $h_{\mathrm{MISE}}$ | 0.230612 | 0.196939 | 0.094286 | 0.154490 |
| | $g_{\mathrm{MISE}}$ | 0.073673 | 0.083469 | 0.908163 | 1.071429 |
| | $\mathrm{RMISE}_{x}(h_{\mathrm{MISE}}, g_{\mathrm{MISE}})$ | 0.013658 | 0.018165 | 0.026161 | 0.029007 |

Table 5. Coverage, average width and IWS of the 95% confidence regions by means of Beran’s and the smoothed Beran’s estimators using $N = 300$ simulated samples from Models 1 and 2.

| Model 1 | BERAN | SBERAN | BERAN | SBERAN |
|---|---|---|---|---|
| $P(\delta = 0 \mid X = x)$ | 0.2 | 0.2 | 0.5 | 0.5 |
| Coverage (%) | 96.33 | 90.67 | 90.00 | 85.33 |
| Mean pointwise coverage (%) | 99.94 | 98.05 | 99.63 | 96.85 |
| Mean width | 0.21997 | 0.09539 | 0.24827 | 0.10937 |
| IWS | 0.09869 | 0.04537 | 0.11218 | 0.05571 |

| Model 2 | BERAN | SBERAN | BERAN | SBERAN |
|---|---|---|---|---|
| $P(\delta = 0 \mid X = x)$ | 0.2 | 0.2 | 0.5 | 0.5 |
| Coverage (%) | 97.33 | 83.00 | 91.46 | 98.00 |
| Mean pointwise coverage (%) | 99.88 | 98.53 | 99.65 | 99.85 |
| Mean width | 0.50514 | 0.17969 | 0.55581 | 0.33033 |
| IWS | 0.62590 | 0.22825 | 0.71009 | 0.40845 |
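Table 5 reports two coverage summaries. A natural reading, assumed here because the definitions are not part of this section, is that "Coverage" is the percentage of the $N = 300$ simulated samples whose region contains the whole true curve $PD(\cdot \mid x)$ on the evaluation grid, while "Mean pointwise coverage" averages, over grid points, the percentage of samples whose region contains $PD(t \mid x)$ at that point. Under this reading the second can never be smaller than the first, which is consistent with every column of the table. The Python sketch below computes both, plus the mean width; IWS is not reproduced because its definition is not given here, and `band_metrics` is a hypothetical helper.

```python
import numpy as np


def band_metrics(true_curve, lower, upper):
    """Coverage summaries for N confidence regions evaluated on a common t-grid.

    true_curve: shape (G,), true PD(t | x) on the grid.
    lower, upper: shape (N, G), one region per simulated sample.
    Returns (coverage %, mean pointwise coverage %, mean width).
    """
    true_curve = np.asarray(true_curve, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (lower <= true_curve) & (true_curve <= upper)  # (N, G) booleans
    coverage = 100.0 * inside.all(axis=1).mean()  # whole curve inside the region
    pointwise = 100.0 * inside.mean(axis=0).mean()  # per grid point, then averaged
    mean_width = float((upper - lower).mean())
    return coverage, pointwise, mean_width


# Two toy regions on a 3-point grid: the second misses the true curve at one point.
true_pd = [0.1, 0.2, 0.3]
lower = [[0.0, 0.1, 0.2], [0.0, 0.25, 0.2]]
upper = [[0.2, 0.3, 0.4], [0.2, 0.45, 0.4]]
print(band_metrics(true_pd, lower, upper))
```

In the toy example one of the two regions fails at a single grid point, so coverage is 50% while mean pointwise coverage is 5/6, about 83.3%. This is the same pattern as in Table 5, where a region can miss the true curve over a short stretch of $t$ and lose simultaneous coverage while still scoring high pointwise.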

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Peláez, R.; Cao, R.; Vilar, J.M. Bootstrap Bandwidth Selection and Confidence Regions for Double Smoothed Default Probability Estimation. Mathematics 2022, 10, 1523. https://doi.org/10.3390/math10091523


Article Menu

Bootstrap Bandwidth Selection and Confidence Regions for Double Smoothed Default Probability Estimation

Abstract

1. Introduction

2. Bandwidth Selection for Beran and Smoothed Beran’s PD Estimators

2.1. Beran’s Estimator

2.1.1. Algorithm for Bootstrap Resampling

2.1.2. Algorithm for Bootstrap Bandwidth Selector Based on Beran’s Estimator

2.2. Smoothed Beran’s Estimator

2.2.1. Algorithm for Bootstrap Resampling

2.2.2. Algorithm for Bootstrap Bandwidth Selector Based on the Smoothed Beran’s Estimator

3. Simulation Study for Bandwidth Selection

3.1. Simulation Study for Beran’s Estimator

3.2. Simulation Study for the Smoothed Beran’s Estimator

4. Confidence Regions Using Beran and Smoothed Beran’s Estimators

4.1. Confidence Region Based on Beran’s Estimator

4.2. Iterative Method to Approximate $λ_{α}^{*}$

5. Simulation Study for Confidence Regions

6. Application to Real Data

7. Conclusions and Future Lines

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

4.2. Iterative Method to Approximate $λ_{α}^{*}$