1. Introduction
In actuarial science, one of the fundamental problems is that of predicting the future claims of an individual risk given the past experience of a collective of heterogeneous risks. Credibility is a ratemaking technique for forecasting future premiums for a group of insurance contracts for which we have some experience, when much more experience is available for a larger collection of contracts that are similar but not identical.
In the insurance industry, some legislated rules indicate that changes have occurred over time across the claim distribution. Therefore, it is essential to examine these changes at different points of the distribution. An empirical distribution function provides a way to model and sample cumulative probabilities for a data sample that does not fit a standard probability distribution. Its value at a given point is equal to the proportion of observations from the sample that are less than or equal to that point.
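As a minimal illustration of the definition above, the empirical distribution function can be sketched in a few lines of Python. The function name `ecdf` and the sample values are hypothetical and are not taken from the datasets used later in the paper.

```python
def ecdf(sample, x):
    """Empirical distribution function: share of observations <= x."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(1 for value in sample if value <= x) / len(sample)


claims = [320, 800, 1000, 2000, 3000]  # hypothetical claim amounts
for point in (100, 1000, 5000):
    print(point, ecdf(claims, point))
```

For this hypothetical sample, three of the five observations are less than or equal to 1000, so the function returns 0.6 at that point, 0.0 below the smallest observation and 1.0 above the largest.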
In non-life insurance practice, actuaries are often faced with the challenge of predicting the number of claims and the claim amounts to be incurred at any given time, which are needed for fair pricing and adequate reserving given the nature of the risk. Actuaries usually deal with uncertain events and their economic consequences. The aim of this paper is to carry out the credibility estimation of empirical distribution functions as a tool for measuring and managing these uncertainties.
In the first part of this paper, we extend the work of Jewell (1974b) in terms of forecasting the distribution of individual risk in cases where the observations are weighted for the non-homogeneous and homogeneous models. Here, the weights (sizes)
,
,
are now changing in time. The contract
j might result from a grouping and averaging of
observations in a contract with several independent and identically distributed observations
,
, during the year
i, i.e.,
, and then taking the conditional mean of the identity function
. Alternatively, in the case of raw data, the contract
j might result from the grouping and averaging of identity functions within the year
i,
and then taking the
.
Here, we proceed with the former considering the credibility distribution estimation as a point estimate approach of
. Optimal linearized estimators of
are obtained by the classical least squares approach as well as by the optimal projection theorem of random variables on planes as presented by De Vylder (1976, 1996).
In the second part of this paper, we consider credibility distribution estimation based on grouped data formed by aggregating the individual observations of a variable into groups. The construction of the empirical distribution based on grouped data can be performed by obtaining the point values of the empirical distribution function whenever possible. Then, we approximate the distribution functions by connecting those points with straight lines and applying premium estimation in a credibility framework. An alternative credibility estimator is also obtained, analogous to the Bühlmann and Straub (1970) model.
2. Weighted Credibility Distribution Estimation
In the following, we consider the credibility model with several contracts and weighted observations. For an insurance portfolio, are the average losses of observations for contract and period . For industry portfolios, denotes the average returns (losses/gains) of firms in portfolios for period .
2.1. Assumptions
We have the following assumptions:
- (i)
The contracts are independent and the variables are identically distributed.
- (ii)
, where is an indicator function that is equal to 1 if and 0 otherwise.
- (iii)
where if and 0 otherwise.
2.2. Structural Parameters
The structural parameters , , and are as follows:
- (SP1)
- (SP2)
;
- (SP3)
2.3. Notation
Here, we present the weighted empirical distribution function as well as some notations that are useful for the derivation of the credibility distribution estimation.
Lemma 1 (Expectation and Covariance Relations). Based on the above assumptions, we can obtain expressions for the conditional expectations and covariances as follows:
Proof. Relation (2) is straightforward. Relation (3) results from
The first part of (4) results from
Similarly, we can prove the second and third parts of (4). For the proof of the first part of relation (5), we have
In the same way, we can prove the second and third parts of (5). Finally, (6) can be proved as
□
As in Bühlmann and Straub (1970), the following theorems provide the optimal linearized non-homogeneous and homogeneous credibility estimators, together with some useful estimators for the structural parameters.
Theorem 1 (Linearized non-homogeneous credibility distribution estimator). Under the assumptions –, the optimal linearized non-homogeneous estimator of is obtained by
with and as in (1).
Proof. We have to find
in
such that
is minimum. Differentiating (9) with respect to
, we have
Substituting the value of
in (9) and differentiating with respect to
, we obtain
The right-hand side of (11) becomes
Then, (11) implies that
Multiplying (12) by
and summing with respect to
, (i.e.,
), we obtain
Since the probability distribution of
is invariant under permutations of
and
is uniquely defined, it must hold that
. Then, (8) becomes
which provides (7). □
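To make the structure of the non-homogeneous estimator concrete, the following sketch evaluates a credibility-weighted distribution value at a single point x for one contract. It assumes the usual Bühlmann and Straub form, namely a credibility factor z = a w / (s² + a w) applied to the weighted individual empirical distribution, with the complement 1 − z applied to the collective distribution value. The function names and arguments are hypothetical, the structural parameters `s2` (assumed positive) and `a` and the collective value are treated as known inputs, and the code is an illustration rather than a transcription of (7).

```python
def weighted_ecdf(observations, weights, x):
    """Weighted share of a contract's observations that are <= x."""
    total = sum(weights)
    return sum(w for obs, w in zip(observations, weights) if obs <= x) / total


def credibility_factor(total_weight, s2, a):
    """Buhlmann-Straub type factor z = a*w / (s2 + a*w); assumes s2 > 0."""
    return a * total_weight / (s2 + a * total_weight)


def credibility_cdf(observations, weights, x, collective_cdf, s2, a):
    """Credibility-weighted value z * F_j(x) + (1 - z) * F(x) for one contract."""
    z = credibility_factor(sum(weights), s2, a)
    individual = weighted_ecdf(observations, weights, x)
    return z * individual + (1.0 - z) * collective_cdf
```

With a total weight of 4, `s2 = 2.0` and `a = 0.5`, the factor is 0.5, so the estimate lies halfway between the individual and the collective distribution values. When `a = 0` the factor vanishes and the estimate coincides with the collective value.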
Theorem 2 (Linearized homogeneous credibility distribution estimator). Under the assumptions –, the optimal linearized homogeneous estimator of is obtained by
with , and as defined in (1).
Proof. Letting
we have to minimize
such that
holds under the restrictions
, with the Lagrange multiplier
. The following quantity leads to
From (16), we obtain
Differentiating (17) with respect to
, we obtain
Multiplying both sides by
and taking the sum over
, we obtain for each
:
Substituting (20) into (19), we obtain
Then, the optimal linearized homogeneous estimator of
becomes
resulting in (13). □
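The homogeneous case can be sketched in the same spirit. The code below assumes that, as in the classical Bühlmann and Straub model, the collective distribution value is replaced by the credibility-weighted average of the individual values, i.e., the sum of z_j F_j(x) divided by the sum of z_j. The function name and inputs are hypothetical, and the sketch requires a > 0 and s2 > 0 so that the factors do not all vanish.

```python
def homogeneous_credibility_cdfs(individual_cdfs, total_weights, s2, a):
    """Homogeneous credibility values at one point x for all contracts.

    individual_cdfs: individual empirical distribution values F_j(x).
    total_weights: total weight w_j of each contract.
    Returns the per-contract estimates and the estimated collective value.
    """
    factors = [a * w / (s2 + a * w) for w in total_weights]
    collective = sum(z * f for z, f in zip(factors, individual_cdfs)) / sum(factors)
    estimates = [z * f + (1.0 - z) * collective
                 for z, f in zip(factors, individual_cdfs)]
    return estimates, collective
```

For two contracts with equal weights and individual values 0.2 and 0.6, the estimated collective value is 0.4 and each estimate is pulled toward it in proportion to 1 − z.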
The following theorem will prove that has a smaller variance than , i.e., based on the heterogeneity and the fluctuation of the risk, has a minimal mean square error.
Theorem 3. The is the minimum for all , such that , for .
Proof. We have to minimize the following quantity
Taking the derivative of (23) with respect to
for
and
, we obtain
This is the same as
This gives
We know that
We therefore obtain
and from (26), we have
□
Theorem 4. Under assumptions –, the quadratic loss for the credibility distribution estimator is given by
Proof. We have
which provides (29). □
2.4. Optimal Projection Theorem
In the following, De Vylder’s (1976, 1996) optimal projection theorem of random variables in the plane is applied in order to derive the optimal estimator of
and
. Practically,
is replaced by
in (7).
Theorem 5. The optimal estimator of in the plane , , ) is
Proof. Directly from (2) and (5). □
Theorem 6. The optimal credibility estimator of based on is
Proof. In order to prove (30), it is sufficient to prove the unbiasedness and covariance conditions of the optimal projection theorem of random variables on planes not through the origin (see De Vylder (1976, 1996)), that is
and
The unbiasedness condition results from (2) and
The covariance condition results from the independence of the contracts and the covariance relations of Lemma 1, which gives
□
2.5. Unbiased Estimators
Lemma 2. The following estimators of the structural parameters , and , presented in Section 2.2, are unbiased. Based on De Vylder (1978), an unbiased estimator of can take the form
Proof. The unbiasedness of
is straightforward and is omitted. The unbiasedness of
follows from
resulting in (33). For the proof of the unbiasedness of (34), we refer to Bühlmann and Straub (1970). Finally, the unbiasedness of
in (35) results from
which implies (35). □
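The following sketch shows how structural parameters of this kind could be estimated from data at a fixed point x. It applies the standard Bühlmann and Straub type estimators (as given, for example, in textbook treatments of the model) to the indicators of the events "observation ≤ x", which is an assumption about the form of the estimators in Lemma 2 rather than a transcription of them. All names are hypothetical, every contract is assumed to have the same number n ≥ 2 of periods, at least two contracts are required, and a negative between-contract estimate is truncated at zero.

```python
def structural_parameters(observations, weights, x):
    """Estimate F_j(x), the collective F(x), s2 and a at a fixed point x.

    observations[j][i], weights[j][i]: contract j, period i.
    """
    J = len(observations)
    n = len(observations[0])
    indicators = [[1.0 if v <= x else 0.0 for v in row] for row in observations]
    w_j = [sum(row) for row in weights]
    w_total = sum(w_j)
    # Weighted individual and collective empirical distribution values at x.
    F_j = [sum(w * y for w, y in zip(weights[j], indicators[j])) / w_j[j]
           for j in range(J)]
    F_bar = sum(w * f for w, f in zip(w_j, F_j)) / w_total
    # Within-contract variance component, averaged over contracts.
    s2 = sum(
        sum(w * (y - F_j[j]) ** 2 for w, y in zip(weights[j], indicators[j])) / (n - 1)
        for j in range(J)
    ) / J
    # Between-contract variance component with the usual bias correction.
    shares = [w / w_total for w in w_j]
    c = (J - 1) / J / sum(p * (1.0 - p) for p in shares)
    between = sum(p * (f - F_bar) ** 2 for p, f in zip(shares, F_j))
    a_raw = c * (J / (J - 1) * between - J * s2 / w_total)
    return F_j, F_bar, s2, max(a_raw, 0.0)
```

When x exceeds every observation, all indicators equal one, both variance components are zero, and the credibility factor built from these estimates is undefined or zero. This kind of degenerate case is why truncating the between-contract estimate at zero is a common convention.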
3. Credible Distribution for Grouped Data
Grouped data are formed by aggregating the individual observations of a variable into groups. For example, a histogram is a density approximation for grouped data. The construction of the empirical distribution based on grouped data can be achieved by obtaining the point values of the empirical distribution function whenever possible. Then, we can approximate the distribution function by connecting those point values with straight lines.
The empirical distribution for grouped data is evaluated at a point x. We consider the case where x lies at a group boundary and the case where x lies between boundaries.
3.1. Empirical Distribution for Grouped Data at Boundary
For contract
j, let the group boundaries be
, where
and
. Let
be the number of observations in the interval
,
,
and
be the total number of observations for contract
j. For grouped data, the empirical distribution function at each group boundary
is defined as
For grouped data, there is no problem if the distribution function has to be estimated at a boundary. When all of the information is available, working with the empirical estimate of the distribution function is straightforward (see Klugman et al. (2012)). We have the following assumptions:
3.1.1. Assumptions
- (i*)
The contracts are independent and the variables are identically distributed. The observations have finite variance,
- (ii*)
- (iii*)
3.1.2. Structural Parameters
3.1.3. Notation
Here, we adopt the following notation:
Based on the above assumptions, a credibility distribution estimator for
is obtained as
With the following theorem, we can obtain the credibility distribution estimator of .
Theorem 7. Under the assumptions –, the credibility factor in (40) is given by
with as in (38) and as in (39).
Proof. The proof of the theorem can be obtained by minimizing the expression
with respect to
. □
3.1.4. Credibility Estimators
Lemma 3. The credibility point estimators of , and are given as follows:
Proof. Similarly to the proof of Lemma 2. □
3.2. Empirical Distribution for Grouped Data at Value x between Boundaries
Now, suppose that the value of
x is between the boundaries
and
. Then, for contract
j, the empirical distribution function is given by
This function is differentiable at all values except for the group boundaries. Based on (41), we can obtain the following
and
Note that the above estimator is biased although it is an unbiased estimator of the true interpolated value (see Klugman et al. (2012)).
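The grouped-data construction described in Sections 3.1 and 3.2 (cumulative proportions at the group boundaries, joined by straight lines between them) can be sketched as follows. The function name, the boundaries and the counts are hypothetical, and the sketch assumes that all group boundaries, including the last one, are finite.

```python
from bisect import bisect_right
from itertools import accumulate


def grouped_ecdf(boundaries, counts, x):
    """Grouped-data empirical distribution with linear interpolation.

    boundaries: c_0 < c_1 < ... < c_r (all finite).
    counts: number of observations in each interval (c_{k-1}, c_k].
    """
    total = sum(counts)
    # Cumulative proportions at the boundaries c_0, c_1, ..., c_r.
    cdf = [0.0] + [c / total for c in accumulate(counts)]
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return 1.0
    k = bisect_right(boundaries, x)  # boundaries[k-1] <= x < boundaries[k]
    lo, hi = boundaries[k - 1], boundaries[k]
    share = (x - lo) / (hi - lo)
    return (1.0 - share) * cdf[k - 1] + share * cdf[k]
```

With boundaries 0, 10, 20, 40 and counts 2, 5, 3, the boundary values are 0.2, 0.7 and 1.0, and a point such as x = 15 receives the value halfway between 0.2 and 0.7.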
The conditional variance of the empirical distribution is
where
and
Then, we can proceed as in Section 3.1 to obtain the credibility distribution estimator of
, when the value of
x is between boundaries.
4. Alternative Credibility Distribution Approach for Grouped Data
For grouped data, the previous approaches yield credibility point estimates. If we want to find the credibility estimation in the framework of Bühlmann and Straub (1970), we may assume a uniform distribution within each interval
and the first two moments can be estimated from
for
. Thus, for contract
j, the empirical estimate of the mean
is the weighted average of the interval midpoints where the weight
for an interval is the proportion of the observations that are in the interval (histogram), i.e.,
Letting
and assuming that
and
, the credibility estimation based on grouped data can be obtained similarly as in the Bühlmann and Straub (1970) model
with parameters
Theorem 8. The following are unbiased estimators for μ, , and a:
and
or
5. Numerical Illustrations
In this section, we use two datasets, one with motor insurance claims data and a second with monthly financial returns data.
5.1. Numerical Example with Insurance Data
The dataset is provided by Insurance Europe (2022) and includes a database with figures on the European insurance industry during the period 2004–2020 for 32 European countries. Our numerical illustration is based on a complete dataset of 10 selected countries for the years 2004–2018. Our dataset also contains the motor claims paid and the number of motor claims for each country and each year. The selected countries are the following: Austria (AT); Germany (DE); Finland (FI); Greece (GR); Hungary (HR); Italy (IT); Norway (NO); Poland (PL); Portugal (PT); and Sweden (SE).
Table 1 shows the summary statistics of the motor claim amounts and the claim numbers for countries
and years
.
Table 2 illustrates the results of the credibility distribution function for motor claim amount data during the years 2004–2018. More specifically, the upper part of the table shows the individual empirical distribution
of claim amounts
(
x = 320, 800, 1000, 2000, 3000, 23,800, 23,896, 23,897) and the corresponding credibility distribution estimators
are shown in the middle part of the table. The estimated credibility factors
, as well as the estimated parameters
,
,
, are presented in the lower part of
Table 2. Note that
means that the value of all claims
and
if claims
x.
In
Table 2, we observe a lack of monotonicity of the estimated credibility distributions for all contracts. In order to obtain monotonicity, we proceed as in Cai et al. (2015) by restricting the credibility factor
to be a constant free of
x. The results are shown in
Table 3. Although monotonicity has been restored, from a risk management perspective (which serves fair pricing and reserving given the nature of the risk) more investigation is required, especially at the points where monotonicity breaks down.
Remark 1. Another way of obtaining monotonicity of the credibility-estimated distribution functions is to sort the resulting estimates. In the relevant literature, there are methods for extracting a monotone function from non-monotonic data. One such method is monotonic regression, which achieves monotonicity and smoothness of the regression by introducing a regularization term and solving a constrained optimization problem. Some key references are Friedman and Tibshirani (1984), Mukerjee (1988), Shively et al. (2009) and Zhang (2004). Similarly, the above approaches could be applied to our model.
By letting the values of motor claims be larger than x = 23,800 and less than or equal to x = 23,897 (x = 23,897 is the maximum threshold of contract DE, which is the contract with the largest values of motor claims, as shown in Table 1), the values of the estimated credibility distribution
remain the same up to the fifth decimal place. By letting
x > 23,897, the estimated credibility distribution goes to 1 (see
Table 2).
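The two routes to monotonicity mentioned in this section, restricting the credibility factor to a constant free of x and sorting the pointwise estimates as in Remark 1, can be illustrated with a short sketch. The function names and the numerical values are hypothetical, and the code is not the procedure of Cai et al. (2015). It only shows why a constant factor between 0 and 1 preserves monotonicity: the result is then a convex combination of two non-decreasing functions evaluated on the same grid of x values.

```python
def rearrange_monotone(estimates):
    """Sort pointwise estimates taken on an increasing grid of x values."""
    return sorted(estimates)


def constant_factor_cdf(individual, collective, z):
    """Convex combination with one credibility factor z for every x."""
    return [z * f_j + (1.0 - z) * f for f_j, f in zip(individual, collective)]


def is_non_decreasing(values):
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical pointwise estimates that break monotonicity at the third point.
raw = [0.10, 0.40, 0.35, 0.90]
print(is_non_decreasing(raw), is_non_decreasing(rearrange_monotone(raw)))
```

Sorting changes which x value each estimate is attached to, so it is best viewed as a simple repair step rather than as an estimator with known optimality properties.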
Remark 2. As in the Bühlmann and Straub (1970) model, the estimate of the between-risk variance parameter a can be negative. This means that there is no detectable difference between the risks. In this case we set the estimate to zero, as in our cases for x = 23,800, 23,896, 23,897.
Figure 1 displays the individual empirical distribution in each contract. Note that the red bullets indicate the corresponding credibility estimate at specific points presented in Table 2.
5.2. Example of Credibility Distribution Estimation with Financial Data
The dataset was created (see Fama and French (2022)) as follows: each NYSE, AMEX, and NASDAQ stock was assigned to an industry portfolio at the end of June of year t based on its four-digit SIC code at that time. Compustat SIC codes were used for the fiscal year ending in calendar year t − 1. Whenever Compustat SIC codes were not available, CRSP SIC codes for June of year t were used. Then, returns from July of year t to June of year t + 1 were computed. The weights are the number of firms in portfolios.
In particular, the portfolios are constructed from monthly returns from July 1926 to July 2022, and the dataset contains value returns for 10 industry portfolios. The credibility distribution for each of these portfolios needs to be estimated. We treat the return as a random variable X, where positive values represent a profit (P) and negative values represent a loss (L). The 10 industry portfolios are as follows:
- (1) NoDur: Consumer non-durables (food, tobacco, textiles, apparel, leather, and toys).
- (2) Durbl: Consumer durables (cars, TVs, furniture, household appliances).
- (3) Manuf: Manufacturing (machinery, trucks, planes, chemicals, office furniture, and paper).
- (4) Enrgy: Oil, gas, and coal extraction and products.
- (5) HiTec: Business equipment (computers, software, and electronic equipment).
- (6) Telcm: Telephone and television transmission.
- (7) Shops: Wholesale, retail, and some services (laundries, repair shops).
- (8) Hlth: Healthcare, medical equipment, and drugs.
- (9) Utils: Utilities.
- (10) Other: Other (mines, construction, building material, transportation, hotels, bus service, entertainment, and finance).
Table 5 provides some descriptive statistics of the (P/L) monthly returns of the 10 industry portfolios. The number of observations in each portfolio is
.
Table 6 illustrates the results of the credibility distribution function for monthly returns for 10 industry portfolios from July 1926 to July 2022. More specifically, the upper part of the table shows the individual empirical distribution
of the returns
(
) and the corresponding credibility distribution estimators
are shown in the middle part of the table. The estimated credibility factors
, as well as the estimated parameters
,
,
, are presented in the lower part of
Table 6. The monotonicity of the estimated distribution function is shown in
Table 6. By letting the values of returns be larger than
x = 59 and less than or equal to
x = 79.79 (
x = 79.79 is the maximum threshold of portfolio Durbl, which is the portfolio with the largest return values, as shown in
Table 5), the values of the estimated credibility distribution
remain the same up to the fifth decimal place. By letting
,
goes to 1 (see
Table 6).
Figure 2 displays the individual empirical distribution in each contract. Again, note that the red bullets indicate the corresponding credibility estimate at specific points presented in
Table 6.
Credibility Coefficients for Industry Portfolios Data
Here, we provide an intuitive interpretation for the form of the credibility distribution estimator for the monthly returns for the 10 industry portfolios, by presenting the following credibility coefficients.
Table 7 illustrates the coefficient of variation
, the average within-risk coefficient of variation
, and the credibility coefficient
for the industry portfolio data.
Remark 4. The results of Table 6 and Remark 2, for x = 50, 0 and 0.0469, imply that BRV = 0 and .
5.3. Example of Credibility Distribution Estimation with Financial Grouped Data
The empirical distribution function for the grouped data was depicted by the step function of the Fama and French (2022) data. The grouping (see Table 8) is a subjective element in this fit, and other analysts might choose different groupings. The total number of observations in each portfolio is the same (
).
Table 9 illustrates the results of the credibility distribution function for monthly returns for 10 industry portfolios from July 1926 to July 2022. More specifically, the upper part of the table shows the individual empirical distribution
of returns
(
) and the corresponding credibility distribution estimators
are shown in the middle part of the table. The estimated credibility factors
, as well as the estimated parameters
,
,
, are presented in the lower part of
Table 9. The monotonicity of the estimated distribution function is shown in
Table 9, but the convergence to one of the estimated credibility distribution for grouped data should be further investigated.
Figure 3 displays the smoothed individual empirical distribution for grouped data in each contract. Again, the red bullets indicate the corresponding credibility estimate at specific points presented in
Table 9.
Credibility Coefficients for Financial Grouped Data
Table 10 illustrates the coefficient of variation
, the average within-risk coefficient of variation
, and the credibility coefficient
for the industry portfolios of grouped data.
5.4. Example of the Classical Credibility Estimation with Financial Grouped Data
For grouped data, the previous approach gives a credibility point estimate. If we want to derive the classical credibility estimation, we can apply the concept of uniform distribution within each interval of returns and take the interval midpoints as the value of return. The weights are the number of observations in each interval.
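The midpoint rule described above can be sketched as follows. Under the assumption of a uniform distribution within each interval, the k-th raw moment contributed by an interval (lo, hi] is (hi^(k+1) − lo^(k+1)) / ((k + 1)(hi − lo)), which reduces to the interval midpoint for k = 1. The function name, the boundaries and the counts are hypothetical, and the boundaries are assumed to be finite.

```python
def grouped_moment(boundaries, counts, k=1):
    """k-th raw moment of grouped data, uniform within each interval.

    boundaries: c_0 < c_1 < ... < c_r (all finite).
    counts: number of observations in each interval.
    For k = 1 this is the count-weighted average of the interval midpoints.
    """
    total = sum(counts)
    moment = 0.0
    for lo, hi, n in zip(boundaries, boundaries[1:], counts):
        interval_moment = (hi ** (k + 1) - lo ** (k + 1)) / ((k + 1) * (hi - lo))
        moment += n / total * interval_moment
    return moment


# Hypothetical grouped returns: boundaries in percent and counts per interval.
mean_return = grouped_moment([-10, 0, 10, 30], [4, 5, 1])
second_moment = grouped_moment([-10, 0, 10, 30], [4, 5, 1], k=2)
print(mean_return, second_moment - mean_return ** 2)
```

The grouped mean of each portfolio, together with its total number of observations as weight, could then be passed to a standard Bühlmann and Straub estimator to obtain credibility-weighted average returns.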
Table 11 shows the individual average return for the 10 industry portfolios
, the credibility estimation of returns for these portfolios
, along with the credibility factor
and the estimated parameters
,
and
.
6. Concluding Remarks
The objective of this paper was to present a credibility distribution model that adequately describes insurance losses and can be used for risk management purposes.
The main contribution of the paper is that it embeds the empirical distribution into credibility modeling in the form of the Bühlmann and Straub (1970) model. In the first part of the paper, we present the model of the weighted credibility distribution, and in the second part, a model that applies to data grouped into intervals.
With our models, we examine two datasets, one with motor claim amounts and the number of motor claims from 10 selected European countries during the period 2004–2018, and a second with monthly returns from July 1926 to July 2022 for 10 industry portfolios. To apply our credibility distribution model with grouped data, we grouped the second dataset (Fama/French financial data) into intervals of returns. Under this setting, the grouping is subjective, the weights are the number of points within each interval, and the total weight is the same for each portfolio.
The monotonicity (or non-monotonicity) and the convergence to one of the estimated distribution functions are shown numerically in Table 2, Table 3, Table 6 and Table 9. From a theoretical point of view, the monotonicity as well as the convergence of the estimated distribution functions need further investigation. Furthermore, sufficient conditions for the asymptotic optimality of the empirical credibility distribution estimators could also be investigated in future work.