Article

Dependence Modelling for Heavy-Tailed Multi-Peril Insurance Losses

Department of Statistics and Actuarial Science, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
* Author to whom correspondence should be addressed.
Risks 2024, 12(6), 97; https://doi.org/10.3390/risks12060097
Submission received: 28 April 2024 / Revised: 10 June 2024 / Accepted: 13 June 2024 / Published: 16 June 2024
(This article belongs to the Special Issue Statistical Modelling in Risk Management)

Abstract

The Danish fire loss dataset records commercial fire losses under three insurance coverages: building, contents, and profits. Existing research has primarily focused on the heavy-tail behaviour of the losses but has ignored the relationship among the different insurance coverages. In this paper, we aim to model the aggregate loss for all three coverages. To study the pairwise dependence of claims from all types of coverage, an independent model, a hierarchical model, and some copula-based models are proposed for the frequency component. Meanwhile, we apply composite distributions to capture the heavy-tailed severity component. It is shown that accounting for dependence among the multi-peril frequencies (i) significantly enhances model goodness-of-fit and (ii) provides more accurate risk measures of the losses aggregated over all types of coverage.

1. Introduction

Insurance provides financial compensation to individuals or companies after a particular event occurs. The insurance business relies on the diversification effect obtained by pooling the risks of the policyholders, and it is supported by the premiums collected from them, which are primarily determined by the expected claim amount of each policyholder. In this regard, two components must be considered for effective risk management: the heavy-tail behaviour of insurance claims and the possible dependence among different insurance coverages. Heavy tails can undermine risk mitigation because risk pooling is inherently based on the total expected claim amount, whereas a single excessively large claim could threaten the solvency of the insurance portfolio. Such impacts could be more substantial if the portfolio provides multiple types of coverage and the claims from different types of coverage are positively correlated. In this paper, we focus on a reinsurance dataset, the Danish multi-peril commercial fire losses, and aim to incorporate both the dependence among different insurance coverages and the heavy-tailed losses when modelling the monthly aggregate loss for the company.
Many approaches have been proposed to handle the heavy-tail behaviour of insurance claims. One example is the peak-over-threshold (POT) approach. However, for a reinsurance company, it is important to consider the financial loss over the entire range instead of focusing only on the extreme cases. Alternatively, a finite mixture model, which is a linear combination of multiple distributions, can be used to handle heavy-tail behaviour (Hong and Martin 2018; Miljkovic and Grün 2016). A unique advantage of such models is the ability to construct a multimodal distribution. Different from finite mixture models, a composite model splices distributions (usually continuous ones) on adjacent ranges, with continuity and differentiability imposed at the splicing points. It allows the fitting of different distributions with desirable distributional properties on certain ranges of data, especially to accommodate the heavy-tail nature of the data. For example, (Cooray and Cheng 2015; Pigeon and Denuit 2011) focus on composite lognormal–Pareto models, and (Scollnik and Sun 2012) applied composite Weibull–Pareto models. Considering both model complexity and the ability to capture realistic loss behaviour, we utilize composite models for the loss severities (claim amounts) in this paper.
In addition to the heavy tail, the dependence among risks is another feature that cannot be ignored. Two different methods are applied in this study: the hierarchical and the copula-based modelling frameworks. Using a multilevel modelling technique, the hierarchical framework links different events through the premise that risks from a common environment are not independently distributed. For instance, (Fung et al. 2023) proposed a hierarchical modelling approach that first models the number of certain climate events and then the associated claim counts. Recently, (Jeong 2024) considered a multivariate Tweedie distribution where the correlated random effects are modelled only with their moments. An alternative way of studying dependence is the copula method, which flexibly connects random variables through a dependence structure. The authors of (Lee and Shi 2019) suggested a copula-based collective risk model for describing various dependencies in longitudinal insurance claims data. The authors of (Oh et al. 2021) provided a copula-based collective risk model for microlevel multi-year claims data. The authors of (Jeong et al. 2023) considered a factor copula model to capture dependence among claim counts from multiple lines of business.
There are existing studies in the literature that analyze the heavy-tailed behaviour of Danish fire losses. For example, (McNeil 1997) applied the generalized Pareto distribution for the total losses for all coverages and tested the goodness-of-fit. Additionally, (Resnick 1997) suggested some alternative methodologies to study the tail behaviour. However, there is a lack of studies that focus on the dependency among losses under different coverages. They are needed, from the practice perspective, in order to appropriately price and set reserves for multi-peril insurance products. We conduct a comprehensive study to address both issues for effective risk management purposes regarding the aggregate Danish fire loss on a monthly basis. More specifically, under the framework of collective risk models, we model the claim frequency from various types of insurance coverage first and then study the aggregate loss by adding the losses from all insurance coverages and using the composite model for loss amounts.
In this study, we propose three different types of frequency models: a fully independent model as a benchmark model, a hierarchical model, and some copula-based models to model the frequency component. The hierarchical model and copula-based models proposed incorporate the dependency among different coverages. Meanwhile, several two-component composite models are implemented to model the severity component. After conducting statistical and risk analyses, by comparing with the benchmark model, the fully independent model, we conclude that the models with dependency structure significantly improve model goodness-of-fit and provide more accurate risk measures of the aggregate losses for all types of coverages in total. However, there are limitations regarding our proposed models. Our proposed models do not consider the dependency between the frequency and severity. In the literature, for example, (Vernic et al. 2021) proposed a Sarmanov distribution for modelling dependence between the frequency and the average severity of insurance claims. Additionally, there could be models other than copula ones to model the dependence between the claim counts of different types.
The remainder of this article is organized as follows. Section 2 introduces the dataset that motivates our research. Section 3 provides a statistical framework to model the dependent claim frequency from multiple types of insurance coverage. Section 4 provides a framework for the severity component to capture the heavy-tail behaviour of insurance claims. Section 5 provides the estimation results from different models, associated with their implications in risk management. Section 6 concludes this article.

2. Data Exploration

We start by introducing the dataset of Danish multi-peril fire losses, which is available in the R package CASdatasets. It was recorded by the Copenhagen Reinsurance Company in Denmark and contains 2167 commercial fire loss records from 1980 to 1990. Each recorded claim includes the loss amounts for three coverages: building, contents, and profits, which are adjusted for inflation using 1985 as the base year. The building, contents, and profits columns show the losses in millions of Danish Krone, and the total column is the sum of the three. Table 1 shows the first 5 rows of the dataset.
We recall that we are interested in analyzing the Danish reinsurance aggregate loss on a monthly basis. The collective risk modelling framework is used to model the total loss for each single coverage, and the sum of the losses from the three coverages is the aggregate loss of interest. To this end, we aggregate the claim records over the 132 months and use the following notation:
  • $M_t$: Number of reported accidents during month $t = 1, \ldots, 132$;
  • $N_{jt}$: Number of claims from the $j$th line of insurance during month $t$, where $j = 1, 2, 3$ represent the building, contents, and profits;
  • $Y_{jtk}$: $k$th individual loss amount from the $j$th line of insurance during month $t$ for $k = 1, \ldots, N_{jt}$;
  • $S_{jt}$: Aggregate loss amount from the $j$th line of insurance during month $t$, which is defined as
    $$S_{jt} := \sum_{k=1}^{N_{jt}} Y_{jtk}, \quad N_{jt} > 0, \tag{1}$$
    and 0 otherwise. We use a compound risk model (CRM) to describe $S_{jt}$;
  • $S_t$: Aggregate loss for all lines of insurance during month $t$, which is defined as
    $$S_t := S_{1t} + S_{2t} + S_{3t}. \tag{2}$$
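The notation above can be made concrete with a short Python sketch. It is only an illustration: the record layout (a month index plus three loss amounts per accident), the rule that a line counts a claim when its loss is positive, and the toy numbers are all assumptions, not the authors' code or data.

```python
from collections import defaultdict

def aggregate_monthly(records, n_lines=3):
    """Build M_t, N_jt, S_jt and S_t from per-accident records.

    records: iterable of (t, [loss_1, loss_2, loss_3]), one entry per accident.
    Assumption: line j has a claim for an accident when its loss is > 0.
    """
    M = defaultdict(int)                          # M_t: accidents per month
    N = defaultdict(lambda: [0] * n_lines)        # N_jt: claims per line
    S_line = defaultdict(lambda: [0.0] * n_lines) # S_jt: loss per line
    for t, losses in records:
        M[t] += 1
        for j, y in enumerate(losses):
            if y > 0:
                N[t][j] += 1
                S_line[t][j] += y
    months = sorted(M)
    S = {t: sum(S_line[t]) for t in months}       # S_t = S_1t + S_2t + S_3t
    return (dict(M), {t: N[t] for t in months},
            {t: S_line[t] for t in months}, S)

# Hypothetical toy records (month, [building, contents, profits]).
toy = [(1, [1.0, 0.5, 0.0]), (1, [2.0, 0.0, 0.0]), (2, [0.0, 1.5, 0.25])]
M, N, S_line, S = aggregate_monthly(toy)
```

By construction, each $N_{jt}$ is at most $M_t$ under this counting rule, which is the property the binomial thinning model relies on.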
For the $j$th line of insurance, $j = 1$ represents the damage to the building, $j = 2$ the related contents, and $j = 3$ the profit line. Since we aggregate the data monthly, we examine whether the monthly aggregate claims $S_{1t}$, $S_{2t}$, and $S_{3t}$ can be assumed to be time-independent. Figure 1 shows the boxplots of the claim numbers in the three insurance lines for different months.
The plots show no significant seasonal effect. We also checked, with a separate exploratory analysis not included in this article, that claims in the current month do not affect claims in the following month. Therefore, it is reasonable to treat $S_1$, $S_2$, $S_3$, and $S$ as the aggregate claim random variables of interest, where $S_{1t}$, $S_{2t}$, $S_{3t}$, and $S_t$ are i.i.d. samples of $S_1$, $S_2$, $S_3$, and $S$, respectively. However, we can detect some dependence among the claim numbers from the three lines of business.
We calculate the Pearson correlation coefficient for each pair of claim-number series. The claim numbers for buildings are highly correlated with those for contents, with a Pearson coefficient of 0.876580. For the contents and profits coverages, the coefficient is 0.7454898. The relationship between buildings and profits is weaker, with a coefficient of 0.574442. Overall, this exploratory analysis shows the necessity of modelling dependence among the claim counts from multiple coverages.
It is known that losses in property insurance are mostly heavy-tailed, and the given data are no exception. In this regard, (McNeil 1997; Resnick 1997) worked on extreme value analyses using this dataset, where they applied the peak-over-threshold approach and estimated the parameters of generalized Pareto distributions. In each business line, we observe several data points with relatively large losses. In Table 2, for all business lines, the averages of the observed losses are higher than the corresponding third quartiles.
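For reference, the pairwise Pearson coefficient used in this exploratory step can be computed as in the following minimal sketch. The monthly counts shown are hypothetical placeholders, not the Danish data, and the function name is ours.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly claim counts for two lines (illustration only).
building = [11, 18, 8, 15, 20, 9]
contents = [9, 15, 6, 12, 17, 8]
rho = pearson(building, contents)
```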

3. Dependence Modelling for Multivariate Claim Frequencies

To model possible dependence among the claim frequencies from the three types of coverage, we consider three types of models: a fully independent model (Section 3.1), a binomial thinning model (Section 3.2), and copula-based models (Section 3.3). The independent model is used as a benchmark model for comparison. Other models consider the dependence among the claim numbers in different lines. The binomial thinning model utilizes a hierarchical framework to model such a dependence. However, it only allows fixed dependency structures between any two margins. Copula-based models are used to construct the joint distribution flexibly.

3.1. Benchmark Model: Independent Frequency Model

The fully independent frequency model assumes independent relationships among the margins of the frequencies (and, subsequently, severities) from multiple types of coverage.
The claim numbers data are over-dispersed. To capture this behaviour, we model the numbers of building, contents, and profit claims as well as the number of reported accidents using negative binomial random variables, $N_1$, $N_2$, $N_3$, and $M$, respectively, instead of using the Poisson distribution that implicitly assumes equi-dispersion. The negative binomial also performs better than the Poisson in terms of in-sample goodness-of-fit measures such as AIC, BIC, and the log-likelihood in our dataset. For all $j = 1, 2, 3$, we assume $N_j \sim \mathrm{NB}(\lambda_j, r_j)$ and $M \sim \mathrm{NB}(\lambda, r)$ with the following parameterization:
$$f_{N_j}(N_j = n_j) = \frac{\Gamma(r_j + n_j)}{\Gamma(r_j)\,\Gamma(n_j + 1)} \left(\frac{\lambda_j}{r_j + \lambda_j}\right)^{n_j} \left(\frac{r_j}{r_j + \lambda_j}\right)^{r_j},$$
$$f_M(M = m) = \frac{\Gamma(r + m)}{\Gamma(r)\,\Gamma(m + 1)} \left(\frac{\lambda}{r + \lambda}\right)^{m} \left(\frac{r}{r + \lambda}\right)^{r},$$
where $r$ and $r_j$ are the size parameters of the negative binomial distributions. Instead of using the probability as the second parameter, we use $\lambda_j$ and $\lambda$, which stand for the means of the random variables $N_j$ and $M$. The likelihood function of the negative binomial parameters is given by:
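$$
\begin{aligned}
L(\theta \mid D) &= \prod_{t=1}^{132} f_{N_1, N_2, N_3, M}(N_{1t}, N_{2t}, N_{3t}, M_t; \theta) \\
&\overset{\text{ind.}}{=} \prod_{t=1}^{132} f_M(M_t; r, \lambda) \cdot f_{N_1}(N_{1t}; r_1, \lambda_1) \cdot f_{N_2}(N_{2t}; r_2, \lambda_2) \cdot f_{N_3}(N_{3t}; r_3, \lambda_3) \\
&= \prod_{t=1}^{132} \frac{\Gamma(r + m_t)}{\Gamma(m_t + 1)\,\Gamma(r)} \frac{\lambda^{m_t}\, r^{r}}{(r + \lambda)^{r + m_t}} \cdot \prod_{j=1}^{3} \frac{\Gamma(r_j + n_{jt})}{\Gamma(n_{jt} + 1)\,\Gamma(r_j)} \frac{\lambda_j^{n_{jt}}\, r_j^{r_j}}{(r_j + \lambda_j)^{r_j + n_{jt}}},
\end{aligned}
$$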
where θ is a vector of all the parameters for the negative binomial distributions for the reported accident numbers and the claim numbers from each coverage. We let D denote the available data.
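As an illustration of how this likelihood can be evaluated, the following Python sketch codes the mean/size parameterization of the negative binomial and sums the log-probabilities over months. The function names, the toy counts, and the parameter values are assumptions made for the example; in practice the log-likelihood would be passed to a numerical optimizer.

```python
import math

def nb_logpmf(n, lam, r):
    """Log-pmf of a negative binomial with mean lam and size r."""
    return (math.lgamma(r + n) - math.lgamma(r) - math.lgamma(n + 1)
            + n * math.log(lam / (r + lam)) + r * math.log(r / (r + lam)))

def independent_loglik(rows, params):
    """Log-likelihood of the fully independent frequency model.

    rows:   one tuple (m_t, n_1t, n_2t, n_3t) per month.
    params: four (mean, size) pairs in the same order as the tuple.
    """
    return sum(nb_logpmf(c, lam, r)
               for row in rows
               for c, (lam, r) in zip(row, params))

# Hypothetical toy counts and parameter values, for illustration only.
toy_rows = [(12, 11, 9, 4), (20, 18, 15, 7), (8, 8, 6, 2)]
toy_params = [(15.0, 6.0), (14.0, 6.0), (11.0, 5.0), (5.0, 3.0)]
toy_ll = independent_loglik(toy_rows, toy_params)
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions for large counts.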

3.2. Binomial Thinning Model

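To investigate the possible dependence of the frequencies in this dataset, we note that the claim numbers ($N_j$, $j = 1, 2, 3$) cannot be larger than the number of accidents reported ($M$), by definition. In this case, the binomial distribution is a natural fit for $N_j$ given $M$, with the $N_j$ being conditionally independent given $M$.
More specifically, we use the negative binomial distribution to model $M$ due to the observed over-dispersion, namely $M \sim \mathrm{NB}(\lambda, r)$. We also set $N_j \mid M = m \sim \mathrm{BN}(m, \lambda_j / \lambda)$ for the three sources of claim numbers so that the joint distribution of $(M, N_1, N_2, N_3)$ can be expressed as:
$$
\begin{aligned}
f_{N_1, N_2, N_3, M}(n_1, n_2, n_3, m) &= f_{N_1 \mid M}(n_1 \mid M = m) \cdot f_{N_2 \mid M}(n_2 \mid M = m) \cdot f_{N_3 \mid M}(n_3 \mid M = m) \cdot f_M(m) \\
&= \prod_{j=1}^{3} \binom{m}{n_j} \frac{\lambda_j^{n_j} (\lambda - \lambda_j)^{m - n_j}}{\lambda^{m}} \cdot \frac{\Gamma(r + m)}{\Gamma(m + 1)\,\Gamma(r)} \left(\frac{\lambda}{r + \lambda}\right)^{m} \left(\frac{r}{r + \lambda}\right)^{r}.
\end{aligned}
$$
We note that the marginal distribution of $N_j$ is negative binomial with size parameter $r$ and mean $\lambda_j$, as shown below:
$$
\begin{aligned}
f_{N_j}(n_j) &= \sum_{m = n_j}^{\infty} f_{N_j}(n_j \mid M = m) \cdot f_M(m) \\
&= \sum_{m = n_j}^{\infty} \frac{\Gamma(r + m)}{\Gamma(m + 1)\,\Gamma(r)} \left(\frac{\lambda}{r + \lambda}\right)^{m} \left(\frac{r}{r + \lambda}\right)^{r} \cdot \binom{m}{n_j} \frac{\lambda_j^{n_j} (\lambda - \lambda_j)^{m - n_j}}{\lambda^{m}} \\
&= \frac{\Gamma(r + n_j)}{\Gamma(n_j + 1)\,\Gamma(r)} \sum_{m = n_j}^{\infty} \binom{r + m - 1}{m - n_j} \left(\frac{\lambda - \lambda_j}{\lambda}\right)^{m - n_j} \left(\frac{\lambda_j}{\lambda}\right)^{n_j} \left(\frac{\lambda}{\lambda + r}\right)^{m} \left(\frac{r}{\lambda + r}\right)^{r} \\
&= \frac{\Gamma(r + n_j)}{\Gamma(n_j + 1)\,\Gamma(r)} \left(\frac{\lambda_j}{\lambda_j + r}\right)^{n_j} \left(\frac{r}{\lambda_j + r}\right)^{r} \sum_{m = n_j}^{\infty} \binom{r + m - 1}{m - n_j} \frac{(\lambda_j + r)^{n_j + r} (\lambda - \lambda_j)^{m - n_j}}{(\lambda + r)^{m + r}} \\
&= \frac{\Gamma(r + n_j)}{\Gamma(n_j + 1)\,\Gamma(r)} \left(\frac{\lambda_j}{\lambda_j + r}\right)^{n_j} \left(\frac{r}{\lambda_j + r}\right)^{r}, \quad n_j = 0, 1, \ldots.
\end{aligned}
$$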
The last step follows from the previous one because the term inside the summation is the probability mass function of a negative binomial distribution (in $m - n_j$, with size parameter $n_j + r$), so the sum equals one. Another way to show that the marginal distribution of $N_j$ is negative binomial is to use either probability generating or characteristic functions.
Compared with the independent model, the binomial thinning model accounts for the dependence among the three lines of business. However, an obvious drawback is its fixed dependence structure, as the dependence is tied to the marginal distributions.
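The marginal result for the binomial thinning model can also be checked numerically. The sketch below truncates the sum over $m$ at a large value and compares it with the negative binomial probability with mean $\lambda_j$ and size $r$; the parameter values and the truncation point `m_max` are arbitrary assumptions chosen for illustration.

```python
import math

def nb_pmf(n, lam, r):
    """Negative binomial pmf with mean lam and size r."""
    return math.exp(math.lgamma(r + n) - math.lgamma(r) - math.lgamma(n + 1)
                    + n * math.log(lam / (r + lam)) + r * math.log(r / (r + lam)))

def binom_pmf(k, m, p):
    """Binomial pmf with m trials and success probability p."""
    return math.comb(m, k) * p ** k * (1.0 - p) ** (m - k)

def thinned_marginal(n_j, lam, r, lam_j, m_max=400):
    """P(N_j = n_j) obtained by summing out M (sum truncated at m_max)."""
    p = lam_j / lam
    return sum(binom_pmf(n_j, m, p) * nb_pmf(m, lam, r)
               for m in range(n_j, m_max + 1))

# Hypothetical parameter values, for illustration only.
lam, r, lam_j = 10.0, 3.0, 4.0
max_gap = max(abs(thinned_marginal(n, lam, r, lam_j) - nb_pmf(n, lam_j, r))
              for n in range(15))
```

With these values `max_gap` should be at the level of floating-point rounding, which is consistent with the closed-form derivation.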

3.3. Copula-Based Frequency Model

To overcome the drawback of the binomial thinning model, one can use copulas, which were originally defined by (Sklar 1959), where the joint distribution of $N_1, \ldots, N_k$ (denoted by $H$) can be written as a combination of a copula $C$ and the corresponding marginal distributions $F_1, \ldots, F_k$ as follows:
$$H(n_1, \ldots, n_k) = P(N_1 \le n_1, \ldots, N_k \le n_k) = C(F_1(n_1), \ldots, F_k(n_k)).$$
As we consider the frequencies from three types of insurance coverage, one can write the joint probability of the claim frequencies via a copula $C$ as follows:
$$
\begin{aligned}
P(N_1 = n_1, N_2 = n_2, N_3 = n_3) ={}& C(F_1(n_1), F_2(n_2), F_3(n_3)) \\
&- C(F_1(n_1 - 1), F_2(n_2), F_3(n_3)) - C(F_1(n_1), F_2(n_2 - 1), F_3(n_3)) - C(F_1(n_1), F_2(n_2), F_3(n_3 - 1)) \\
&+ C(F_1(n_1 - 1), F_2(n_2 - 1), F_3(n_3)) + C(F_1(n_1 - 1), F_2(n_2), F_3(n_3 - 1)) + C(F_1(n_1), F_2(n_2 - 1), F_3(n_3 - 1)) \\
&- C(F_1(n_1 - 1), F_2(n_2 - 1), F_3(n_3 - 1)).
\end{aligned}
$$
To maintain consistency in our analysis, we again assume the same marginal distributions for $N_j$, which means that $N_j \sim \mathrm{NB}(\lambda_j, r_j)$ for $j = 1, 2, 3$. Regarding the copula families, we use the following three-dimensional copulas:
  • Gaussian:
    $$C(u_1, u_2, u_3) = \Phi_{3 \mid \Sigma}\left(\Phi^{-1}(u_1), \Phi^{-1}(u_2), \Phi^{-1}(u_3)\right),$$
    where $\Phi_{3 \mid \Sigma}$ is the joint distribution function of the trivariate normal distribution with mean 0 and the (exchangeable) covariance matrix $\Sigma = \sigma J_3 J_3^{\top} + (1 - \sigma) I_3$ with $J_3 = (1, 1, 1)^{\top}$, $I_3$ is an identity matrix of size 3 for $\sigma \in (-1, 1)$, and $\Phi^{-1}$ is the quantile function of a standard normal random variable. It is implicitly assumed that all pairwise correlations in the correlation matrix are the same, which means that the Gaussian copula has an exchangeable structure.
  • Gumbel:
    $$C(u_1, u_2, u_3) = \exp\left\{ -\left[ (-\log u_1)^{\sigma_G} + (-\log u_2)^{\sigma_G} + (-\log u_3)^{\sigma_G} \right]^{1/\sigma_G} \right\},$$
    where $\sigma_G \ge 1$ is the parameter of the Gumbel copula. A larger $\sigma_G$ value indicates that any pair of marginals is more positively related.
  • Joe:
    $$C(u_1, u_2, u_3) = 1 - \left\{ 1 - \left[1 - (1 - u_1)^{\sigma_J}\right] \left[1 - (1 - u_2)^{\sigma_J}\right] \left[1 - (1 - u_3)^{\sigma_J}\right] \right\}^{1/\sigma_J},$$
    where $\sigma_J \ge 1$ is the parameter of the Joe copula. Similar to the Gumbel copula, a larger $\sigma_J$ induces stronger positive dependence between any pair of marginals.
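To make the inclusion–exclusion formula for the joint probability concrete, here is a minimal Python sketch that evaluates it for the Gumbel and Joe copulas with negative binomial margins. All function names and parameter values are illustrative assumptions; the Gaussian copula is omitted because it needs a trivariate normal distribution function, which the standard library does not provide.

```python
import itertools
import math

def nb_cdf(n, lam, r):
    """Negative binomial cdf (mean lam, size r); zero for n < 0."""
    if n < 0:
        return 0.0
    total = sum(math.exp(math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
                         + k * math.log(lam / (r + lam))
                         + r * math.log(r / (r + lam)))
                for k in range(n + 1))
    return min(1.0, total)

def gumbel_copula(u, sigma):
    if min(u) <= 0.0:
        return 0.0  # a copula is zero when any argument is zero
    s = sum(max(0.0, -math.log(ui)) ** sigma for ui in u)
    return math.exp(-(s ** (1.0 / sigma)))

def joe_copula(u, sigma):
    prod = 1.0
    for ui in u:
        prod *= 1.0 - (1.0 - ui) ** sigma
    return 1.0 - (1.0 - prod) ** (1.0 / sigma)

def copula_pmf(n, margins, copula, sigma):
    """P(N_1 = n_1, N_2 = n_2, N_3 = n_3) by inclusion-exclusion.

    Each corner shifts a subset of the n_j by one; the sign is
    (-1) to the number of shifted coordinates.
    """
    total = 0.0
    for shifts in itertools.product((0, 1), repeat=len(n)):
        u = [nb_cdf(nj - d, lam, r)
             for nj, d, (lam, r) in zip(n, shifts, margins)]
        total += (-1) ** sum(shifts) * copula(u, sigma)
    return total

# Hypothetical (mean, size) pairs for the three margins.
margins = [(3.0, 2.0), (2.0, 2.0), (1.0, 1.5)]
p_gumbel = copula_pmf((2, 1, 0), margins, gumbel_copula, 2.0)
p_joe = copula_pmf((2, 1, 0), margins, joe_copula, 2.0)
```

A useful sanity check is that both copulas reduce to independence when the parameter equals 1, in which case the joint probability is the product of the marginal probabilities.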

4. Composite Models for Heavy-Tailed Severities

Several positive continuous distributions can be used to study the claim amounts distribution. While some distributions such as gamma and lognormal are good candidates for modelling the low-cost range, they might not be able to capture the heavy-tail behaviour. In this regard, we consider some two-component composite models to model both the body and tail parts in a balanced way.
A two-component composite model combines the body part of a light-tailed distribution with the tail part of a heavy-tailed distribution. Different from mixture distributions, there is no overlap between the supports of these components. By denoting the light-tailed and heavy-tailed densities/distribution functions as $g_1(y)/G_1(y)$ and $g_2(y)/G_2(y)$, respectively, one can write the density of a composite random variable with two components as follows:
$$
g_{comp}(y) =
\begin{cases}
\dfrac{1}{1 + \phi} \dfrac{g_1(y)}{G_1(u)}, & y < u; \\[2ex]
\dfrac{\phi}{1 + \phi} \dfrac{g_2(y)}{1 - G_2(u)}, & y \ge u,
\end{cases}
$$
where $\phi$ is the weight parameter, and $u$ is the threshold to separate the two components. The cumulative distribution function of a composite model can be expressed as follows:
$$
G_{comp}(y) =
\begin{cases}
\dfrac{1}{1 + \phi} \dfrac{G_1(y)}{G_1(u)}, & y < u; \\[2ex]
\dfrac{1}{1 + \phi} + \dfrac{\phi}{1 + \phi} \dfrac{G_2(y) - G_2(u)}{1 - G_2(u)}, & y \ge u.
\end{cases}
$$
Regarding the estimation scheme, one can use the maximum likelihood estimation to find the optimal parameters for the body and tail distributions. We note that the threshold u and weight parameter ϕ are not estimated but determined to guarantee the continuity and differentiability of the composite distribution at the threshold with the following constraints:
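  • Continuity:
    $$\lim_{y \to u^-} g_{comp}(y) = \lim_{y \to u^+} g_{comp}(y) \;\Rightarrow\; \phi = \frac{\lim_{y \to u^-} \dfrac{g_1(y)}{G_1(u)}}{\lim_{y \to u^+} \dfrac{g_2(y)}{1 - G_2(u)}} = \frac{g_1(u)\,(1 - G_2(u))}{g_2(u)\,G_1(u)}; \tag{15}$$
  • Differentiability:
    $$\frac{1}{1 + \phi} \lim_{y \to u^-} \frac{d}{dy} \frac{g_1(y)}{G_1(u)} = \frac{\phi}{1 + \phi} \lim_{y \to u^+} \frac{d}{dy} \frac{g_2(y)}{1 - G_2(u)} \;\Rightarrow\; \frac{d}{du} \ln \frac{g_1(u)}{g_2(u)} = 0. \tag{16}$$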
For example, assume that $Y_1$ and $Y_2$ follow gamma and inverse gamma distributions, that is, $Y_1 \sim \mathcal{G}(\alpha_1, \theta_1)$ and $Y_2 \sim \mathcal{IG}(\alpha_2, \theta_2)$, with the following density functions:
$$g_1(y_1) = \frac{(y_1 / \theta_1)^{\alpha_1} e^{-y_1 / \theta_1}}{y_1\, \Gamma(\alpha_1)}, \quad y_1 \ge 0,$$
$$g_2(y_2) = \frac{(\theta_2 / y_2)^{\alpha_2} e^{-\theta_2 / y_2}}{y_2\, \Gamma(\alpha_2)}, \quad y_2 \ge 0.$$
With Equations (15) and (16), one can find the threshold and weight parameters as functions of the distribution parameters as follows:
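$$
\begin{aligned}
0 = \frac{d}{du} \ln \frac{g_1(u)}{g_2(u)} &= \frac{d}{du} \ln \frac{\dfrac{(u / \theta_1)^{\alpha_1} e^{-u / \theta_1}}{u\, \Gamma(\alpha_1)}}{\dfrac{(\theta_2 / u)^{\alpha_2} e^{-\theta_2 / u}}{u\, \Gamma(\alpha_2)}} = \frac{d}{du} \left[ \alpha_1 \ln u - \frac{u}{\theta_1} + \alpha_2 \ln u + \frac{\theta_2}{u} \right] \\
&= \frac{\alpha_1 + \alpha_2}{u} - \frac{1}{\theta_1} - \frac{\theta_2}{u^2}
\;\Longleftrightarrow\; (\alpha_1 + \alpha_2)\, u - \frac{u^2}{\theta_1} - \theta_2 = 0,
\end{aligned}
$$
$$u = \frac{\alpha_1 + \alpha_2 + \sqrt{(\alpha_1 + \alpha_2)^2 - 4\, \theta_2 / \theta_1}}{2 / \theta_1},$$
$$\phi = \frac{g_1(u)\,(1 - G_2(u))}{g_2(u)\,G_1(u)}.$$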
Other composite models considered are listed in Table 3 with corresponding equations to determine the threshold values u. We note that the weight parameter ϕ is given by (15).

5. Empirical Analysis and Implications for Risk Management

5.1. Estimation Results

Estimating the parameters of the benchmark model is straightforward: one can directly maximize the joint log-likelihood with a numerical routine, for example, the optim function in R. Table 4 shows the estimates for the binomial thinning model and the benchmark independent frequency model. The point estimates of $\lambda$, $\lambda_1$, $\lambda_2$, and $\lambda_3$ under the two models are similar. Additionally, the standard errors of the mean parameters are smaller than those of the dispersion parameters $r$, $r_1$, $r_2$, and $r_3$. Incorporating dependence via the common factor yields a significant improvement in the log-likelihood. The improvements in AIC and BIC are even greater because the binomial thinning model is more parsimonious than the independent model.
As mentioned in the previous section, the joint distribution of a copula-based model combines marginal distributions with a copula function. Here, we use the inference functions for margins (IFM) method, so that the marginal distributions in the independent model are considered as given, while only the copula part is additionally estimated. Table 5 shows the estimated copula parameters and the log-likelihood values of each of the copula models. We note that the parameter estimated with the Gaussian copula model implies positive relationships among the three lines. In terms of the log-likelihood, the Gaussian copula outperforms the others.
For the severity components, we use several composite models. For the body part (modelled with a light-tailed distribution), we consider the gamma and exponential distributions. For the tail part (modelled with a heavy-tailed distribution), we use inverse-gamma, Pareto, and lognormal distributions. Table 6, Table 7 and Table 8 report the model selection criteria for the composite models fitted to the building, contents, and profits severity data, respectively.
In the case of building losses, the gamma–lognormal ($\mathcal{G}$ and $\mathcal{LN}$) and gamma–Pareto ($\mathcal{G}$ and $\mathcal{P}a$) composites show the best goodness-of-fit. Likewise, we find that gamma–lognormal ($\mathcal{G}$ and $\mathcal{LN}$) is the best for modelling contents losses, and gamma–Pareto ($\mathcal{G}$ and $\mathcal{P}a$) fits the profits losses well. Table 9 shows the point estimates of the parameters of the three composite distributions, given the best combination for each coverage, along with the splicing points and the corresponding weight parameter values implied by the parameter estimates. We note that some transformation is required to make the weight parameter meaningful. For example, in the case of building losses, we can interpret $\frac{1}{1+\phi} = 0.2433$ as the proportion of $Y$ that comes from the body part, modelled by the gamma distribution. On the other hand, $\frac{\phi}{1+\phi} = 0.7567$ of $Y$ comes from the tail part, modelled by the lognormal distribution. Specifically, a larger weight parameter value indicates that the composite model is more heavy-tailed, and vice versa. The splicing point parameter, $u$, marks the change of distribution components. For the building coverage, the splicing point is 2.08943, which means that building losses greater than 2.08943 million Danish Krone are modelled by a lognormal distribution. Additionally, we observe that the losses from the profit line are more heavy-tailed compared with the losses from the other two lines.

5.2. Empirical Findings for Risk Management

In the insurance industry, estimating the risk level for a product or portfolio is critical for determining appropriate levels of the premium and reserve. We recall that $S_j$ and $S$ denote the aggregate loss amount from the $j$th line and the aggregate loss for all lines of insurance, as defined by (1) and (2). It is straightforward to see that $\mathbb{E}[S] = \sum_{j=1}^{3} \mathbb{E}[S_j]$ due to the additivity of expectation. However, such a property generally does not hold for other types of risk measures, so it is important to properly analyze the risk level of the total claims $S$, rather than summing up the risk levels of $S_1$, $S_2$, and $S_3$. For our risk analysis, we use the following well-known risk measures:
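  • Value at Risk (VaR): $\mathrm{VaR}_\alpha(Y) = \min\{ y \in \mathbb{R} : F_Y(y) \ge \alpha \}$, $\alpha \in [0, 1]$;
  • Tail Value at Risk (TVaR): $\mathrm{TVaR}_\alpha(Y) = \mathbb{E}[\, Y \mid Y \ge \mathrm{VaR}_\alpha(Y) \,]$, $\alpha \in [0, 1]$;
  • Proportional Hazard (PH) Risk Measure (Wang 1995): $\mathrm{PH}_\alpha(X) = \int_0^\infty (1 - F(y))^{1/\alpha} \, dy$, $\alpha \ge 1$;
  • Dual Power (DP) Risk Measure: $\mathrm{DP}_\beta(X) = \int_0^\infty \left[ 1 - (F(y))^{\beta} \right] dy$, $\beta \ge 1$.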
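The four risk measures above can be approximated from a simulated or observed sample by plugging in the empirical distribution function. The sketch below is one possible way to do this for non-negative losses; the function names and the toy sample are assumptions for illustration.

```python
def value_at_risk(sample, alpha):
    """Smallest observation y with empirical cdf F(y) >= alpha."""
    xs = sorted(sample)
    n = len(xs)
    for k in range(1, n + 1):
        if k / n >= alpha:
            return xs[k - 1]
    return xs[-1]

def tail_value_at_risk(sample, alpha):
    """Average of the observations at or above the empirical VaR."""
    v = value_at_risk(sample, alpha)
    tail = [x for x in sample if x >= v]
    return sum(tail) / len(tail)

def distortion_measure(sample, g):
    """Integral of g(1 - F(y)) over [0, inf) for the empirical cdf F.

    Assumes a non-negative sample; the empirical survival function is a
    step function, so the integral is a finite sum over the order statistics.
    """
    xs = sorted(sample)
    n = len(xs)
    total, prev = 0.0, 0.0
    for i, x in enumerate(xs):
        total += (x - prev) * g((n - i) / n)
        prev = x
    return total

def ph_measure(sample, alpha):
    return distortion_measure(sample, lambda s: s ** (1.0 / alpha))

def dp_measure(sample, beta):
    return distortion_measure(sample, lambda s: 1.0 - (1.0 - s) ** beta)

# Hypothetical toy sample, for illustration only.
toy_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_var = value_at_risk(toy_sample, 0.9)
toy_tvar = tail_value_at_risk(toy_sample, 0.9)
```

With $\alpha = 1$ or $\beta = 1$ both distortion measures reduce to the sample mean, and larger parameter values load the tail more heavily.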
While TVaR is not a coherent risk measure unless the underlying distribution is continuous, it is reasonable to treat the TVaR of $S_1$, $S_2$, $S_3$, and $S$ as coherent, as we mainly focus on the tail part, where the claim amounts are strictly positive and the underlying distributions are continuous. We also note that (Wang 1994) showed that the integration of the transformed distribution is coherent when the transformation is a concave function, so that PH and DP are both coherent.
To compare the risk measures under each of the models, we apply a Monte Carlo simulation to evaluate them numerically. More specifically, we simulate 100,000 data points: the number of accidents $M$, the claim numbers for the three business lines $N_1$, $N_2$, and $N_3$, and, subsequently, the aggregate claim amounts $S_1$, $S_2$, and $S_3$ under each of the model specifications with the estimated parameters shown in Section 5.1.
The simulation for the independent model is straightforward. Because of the independence, we generate negative binomial random numbers to simulate the claim numbers $N_1$, $N_2$, and $N_3$ for all business lines. Unlike the independent model, the binomial thinning model first requires the simulation of the reported number of accidents $M$. After that, binomial random numbers with the size parameter equal to the simulated number of accidents are generated to obtain the claim numbers $N_1$, $N_2$, and $N_3$ for the three lines of business. The logic for the copula models is similar. We first generate trivariate uniform random numbers from the copula functions. With the generated uniform random numbers, we obtain the claim numbers $N_1$, $N_2$, and $N_3$ using the inverse of the marginal distributions. Once the claim frequencies $N_1$, $N_2$, and $N_3$ are generated, the severity components are generated subsequently. For example, if $N_1$ is given, then uniform random numbers are generated $N_1$ times and converted to individual severities via the inverse distribution (or quantile) function of the composite distribution for building losses. Lastly, these values are summed up as $S_1$.
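The simulation steps for the binomial thinning model can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the negative binomial draw uses the gamma–Poisson mixture representation, the severity quantile function is an exponential placeholder rather than a fitted composite distribution, and all parameter values are hypothetical.

```python
import math
import random

def rpois(rng, mu):
    """Poisson draw by Knuth's multiplication method (moderate mu only)."""
    limit = math.exp(-mu)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def rnbinom(rng, lam, r):
    """Negative binomial (mean lam, size r) as a gamma-Poisson mixture."""
    return rpois(rng, rng.gammavariate(r, lam / r))

def rbinom(rng, m, p):
    """Binomial draw as a sum of m Bernoulli trials."""
    return sum(1 for _ in range(m) if rng.random() < p)

def simulate_month(rng, lam, r, lam_js, quantiles):
    """One month under binomial thinning: M, (N_1..N_3), (S_1..S_3), S."""
    m = rnbinom(rng, lam, r)
    counts = [rbinom(rng, m, lam_j / lam) for lam_j in lam_js]
    # Inverse-transform sampling of severities, one quantile function per line.
    losses = [sum(q(rng.random()) for _ in range(n))
              for n, q in zip(counts, quantiles)]
    return m, counts, losses, sum(losses)

# Hypothetical settings: parameters and an exponential placeholder quantile.
placeholder_q = lambda u: -2.0 * math.log(1.0 - u)
rng = random.Random(2024)
sims = [simulate_month(rng, 10.0, 3.0, [4.0, 3.0, 1.5], [placeholder_q] * 3)
        for _ in range(2000)]
total_losses = [s[3] for s in sims]
```

The simulated `total_losses` can then be fed into empirical estimators of VaR, TVaR, PH, and DP to compare model specifications.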
Figure 2, Figure 3 and Figure 4 show the scatterplots of the combinations of building, content, and profit claims for observed and simulated frequency data. Based on the plots of observed data, there are apparent positive relationships among the marginal frequencies. As we expected, however, the independent model cannot capture such dependent behaviours. In the case of the other models, the binomial thinning model shows a substantial linear relationship between the building and contents claim numbers, which is the most similar to the observed. In the case of Figure 3, however, the Joe copula best captures the relationship between the building and profit frequencies.
Lastly, Table 10 shows the approximated risk measures under different models. The independent model produces relatively smaller values of VaR and TVaR for the aggregated claims $S = S_1 + S_2 + S_3$, whereas the calculated risk measure values for each coverage, $S_1$, $S_2$, and $S_3$, are more or less the same, regardless of the chosen model. This is quite natural as, regardless of the (assumed) dependence structure, the marginal distributions for $N_1$, $N_2$, and $N_3$ (and, subsequently, $S_1$, $S_2$, and $S_3$) are the same. As a result, the TVaR for $S$ under the independent model is severely underestimated compared to the observed (or empirical) TVaR, while the dependent models reproduce the empirical TVaR for $S$ with smaller deviations. This implies that possible dependence among different types of insurance coverage must be considered for effective enterprise risk management.

6. Conclusions and Discussions

In conclusion, we bridged the connection among different coverages by considering the loss amounts incurred under different coverages due to a fire accident and taking into account the heavy-tail behaviour. From the insurance aspect, we assessed several risk measures, which can be interpreted differently. For example, the Value at Risk with α can be interpreted as the assets that should be reserved to reduce the bankruptcy possibility to 1 α . By comparing with the fully independent model, we found that both dependent modelling frameworks performed better from both statistical and insurance aspects. Specifically, the binomial thinning model captured the behaviour of the observed claim numbers better than the independent model from the calculated model evaluation criterion. All binomial thinning and copula-based models provided more reasonable and consistent risk measures.
Additionally, we presented two modelling frameworks to capture the dependency: binomial thinning and copula-based. Although, from the approximate risk measures results, we could not conclude which is the best, we could still observe the flexibility of copula-based models. The binomial thinning model suggested a certain dependent structure. However, by implying different copula functions, we may capture the dependence based on different joint distributions.
There are some concerns and limitations of the current research. Firstly, while we used copulas with discrete random variables, (Genest and Nešlehová 2007) discussed the limitations of applying copula with discrete random variables. Thus, we shall carefully interpret the results relative to the dependent measures because of the lack of uniqueness of the copula functions. However, it is still effective when the discrete random variables’ probability mass is spread widely enough on their support. Secondly, we also implicitly assumed that the frequency and severity components are independent, whereas some existing literature show the presence of dependence among the frequency and severity components, including, but not limited to, (Jeong and Valdez 2020) and (Vernic et al. 2021).
For future research, Geenens (2020) proposed a method of incorporating dependency for discrete random variables using an idea similar to the copula method. Building on and further improving this method, one could provide more rigorous analyses of dependent multivariate discrete data. Additionally, one can also study the dependency between the frequency and severity components on top of the dependence among the claims from multiple types of coverage.

Author Contributions

Conceptualization, H.J. and Y.L.; methodology, H.J. and Y.L.; software, T.Y. and H.J.; validation, T.Y.; formal analysis, T.Y.; investigation, T.Y. and H.J.; data curation, T.Y. and H.J.; writing—original draft preparation, T.Y. and H.J.; writing—review and editing, H.J. and Y.L.; visualization, T.Y.; supervision, H.J. and Y.L.; project administration, H.J. and Y.L.; funding acquisition, H.J. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSERC Discovery grant number R611467/R611851 and CANSSI GSE, Scholarship number R619645.

Data Availability Statement

The research data used in this article are available in the R package CASdatasets. The R code to reproduce the results in this article is available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
VaR: Value at Risk
TVaR: Tail Value at Risk
PH: Proportional Hazard
DP: Dual Power

References

  1. Cooray, Kahadawala, and Chin-I Cheng. 2015. Bayesian estimators of the lognormal–Pareto composite distribution. Scandinavian Actuarial Journal 2015: 500–15.
  2. Fung, Tsz Chai, Himchan Jeong, and George Tzougas. 2023. Investigating the effect of climate-related hazards on claim frequency prediction in motor insurance. SSRN Electronic Journal, SSRN 4638074.
  3. Geenens, Gery. 2020. Copula modeling for discrete random vectors. Dependence Modeling 8: 417–40.
  4. Genest, Christian, and Johanna Nešlehová. 2007. A Primer on Copulas for Count Data. ASTIN Bulletin 37: 475–515.
  5. Hong, Liang, and Ryan Martin. 2018. Dirichlet process mixture models for insurance loss data. Scandinavian Actuarial Journal 2018: 545–54.
  6. Jeong, Himchan, and Emiliano A. Valdez. 2020. Predictive compound risk models with dependence. Insurance: Mathematics and Economics 94: 182–95.
  7. Jeong, Himchan, George Tzougas, and Tsz Chai Fung. 2023. Multivariate claim count regression model with varying dispersion and dependence parameters. Journal of the Royal Statistical Society Series A: Statistics in Society 186: 61–83.
  8. Jeong, Himchan. 2024. Tweedie multivariate semi-parametric credibility with the exchangeable correlation. Insurance: Mathematics and Economics 115: 13–21.
  9. Lee, Gee Y., and Peng Shi. 2019. A dependent frequency–severity approach to modeling longitudinal insurance claims. Insurance: Mathematics and Economics 87: 115–29.
  10. McNeil, Alexander J. 1997. Estimating the Tails of Loss Severity Distributions Using Extreme Value Theory. ASTIN Bulletin 27: 117–37.
  11. Miljkovic, Tatjana, and Bettina Grün. 2016. Modeling loss data using mixtures of distributions. Insurance: Mathematics and Economics 70: 387–96.
  12. Oh, Rosy, Himchan Jeong, Jae Youn Ahn, and Emiliano A. Valdez. 2021. A multi-year microlevel collective risk model. Insurance: Mathematics and Economics 100: 309–28.
  13. Pigeon, Mathieu, and Michel Denuit. 2011. Composite Lognormal–Pareto model with random threshold. Scandinavian Actuarial Journal 2011: 177–92.
  14. Resnick, Sidney I. 1997. Discussion of the Danish Data on Large Fire Insurance Losses. ASTIN Bulletin 27: 139–51.
  15. Scollnik, David P., and Chenchen Sun. 2012. Modeling with Weibull-Pareto models. North American Actuarial Journal 16: 260–72.
  16. Sklar, Abe. 1959. Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut Statistique de l’Université de Paris VIII: 229–31.
  17. Vernic, Raluca, Catalina Bolancé, and Ramon Alemany. 2021. Sarmanov distribution for modeling dependence between the frequency and the average severity of insurance claims. Insurance: Mathematics and Economics 102: 111–25.
  18. Wang, Shaun. 1994. Premium Calculation by Transforming the Layer Premium Density. ASTIN Bulletin 26: 71–92.
  19. Wang, Shaun. 1995. Insurance Pricing and Increased Limits Ratemaking by Proportional Hazards Transforms. Insurance: Mathematics and Economics 17: 43–54.
Figure 1. Exploration of seasonal effects with the Danish reinsurance dataset.
Figure 2. Observed and simulated building vs. contents claims.
Figure 3. Observed and simulated building vs. profits claims.
Figure 4. Observed and simulated contents vs. profits claims.
Table 1. Excerpt from the Danish Fire Dataset.

| Date | Building | Contents | Profits | Total |
|---|---|---|---|---|
| 3 January 1980 | 1.09809663 | 0.58565150 | 0.00000000 | 1.683748 |
| 4 January 1980 | 1.75695461 | 0.33674960 | 0.00000000 | 2.093704 |
| 5 January 1980 | 1.73258126 | 0.00000000 | 0.00000000 | 1.732581 |
| 7 January 1980 | 0.00000000 | 1.30537600 | 0.47437775 | 1.779754 |
| 7 January 1980 | 1.24450952 | 3.36749600 | 0.00000000 | 4.612006 |
Table 2. Summary of loss amount for three business lines.

| Source | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Building | 0.02319 | 0.96618 | 1.32013 | 1.98668 | 1.97860 | 152.41321 |
| Contents | 0.00083 | 0.29000 | 0.57570 | 1.70178 | 1.44648 | 132.01320 |
| Profits | 0.00408 | 0.10011 | 0.26619 | 0.85180 | 0.67929 | 61.93265 |
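Table 3. Threshold values $u$ for various composite models (G: gamma, E: exponential, IG: inverse-gamma, LN: lognormal, and Pa: Pareto).

| Name | Head Dist. | Tail Dist. | $u$ |
|---|---|---|---|
| G and IG | $\dfrac{(x/\theta_1)^{\alpha_1} e^{-x/\theta_1}}{x\,\Gamma(\alpha_1)}$ | $\dfrac{(\theta_2/x)^{\alpha_2} e^{-\theta_2/x}}{x\,\Gamma(\alpha_2)}$ | $u = \dfrac{\alpha_1+\alpha_2+\sqrt{(\alpha_1+\alpha_2)^2 - 4\theta_2/\theta_1}}{2/\theta_1}$ |
| G and LN | $\dfrac{(x/\theta_1)^{\alpha_1} e^{-x/\theta_1}}{x\,\Gamma(\alpha_1)}$ | $\dfrac{\exp\left(-\frac{(\ln x-\mu_2)^2}{2\sigma_2^2}\right)}{x\,\sigma_2\sqrt{2\pi}}$ | $0 = \alpha_1 - \dfrac{u}{\theta_1} + \dfrac{\ln u - \mu_2}{\sigma_2^2}$ |
| G and Pa | $\dfrac{(x/\theta_1)^{\alpha_1} e^{-x/\theta_1}}{x\,\Gamma(\alpha_1)}$ | $\dfrac{\alpha_2\,\theta_2^{\alpha_2}}{(x+\theta_2)^{\alpha_2+1}}$ | $u = \dfrac{\alpha_1+\alpha_2-\frac{\theta_2}{\theta_1}+\sqrt{\left(\alpha_1+\alpha_2-\frac{\theta_2}{\theta_1}\right)^2 + 4\frac{\theta_2}{\theta_1}(\alpha_1-1)}}{2/\theta_1}$ |
| E and IG | $\dfrac{e^{-x/\theta_1}}{\theta_1}$ | $\dfrac{(\theta_2/x)^{\alpha_2} e^{-\theta_2/x}}{x\,\Gamma(\alpha_2)}$ | $u = \dfrac{\alpha_2+1+\sqrt{(\alpha_2+1)^2 - 4\theta_2/\theta_1}}{2/\theta_1}$ |
| E and LN | $\dfrac{e^{-x/\theta_1}}{\theta_1}$ | $\dfrac{\exp\left(-\frac{(\ln x-\mu_2)^2}{2\sigma_2^2}\right)}{x\,\sigma_2\sqrt{2\pi}}$ | $0 = -\dfrac{1}{\theta_1} + \dfrac{1}{u} + \dfrac{\ln u - \mu_2}{u\,\sigma_2^2}$ |
| E and Pa | $\dfrac{e^{-x/\theta_1}}{\theta_1}$ | $\dfrac{\alpha_2\,\theta_2^{\alpha_2}}{(x+\theta_2)^{\alpha_2+1}}$ | $u = (\alpha_2+1)\,\theta_1 - \theta_2$ |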
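Table 4. Parameter estimates for the binomial thinning and independent frequency models.

| | Binomial Thinning: Estimates | Binomial Thinning: CI Lower (95%) | Binomial Thinning: CI Upper (95%) | Binomial Thinning: Std. Err. | Independent: Estimates | Independent: CI Lower (95%) | Independent: CI Upper (95%) | Independent: Std. Err. |
|---|---|---|---|---|---|---|---|---|
| $\lambda_1$ | 15.08 | 14.24 | 15.91 | 0.43 | 15.08 | 14.21 | 15.95 | 0.44 |
| $r_1$ | - | - | - | - | 20.74 | 8.96 | 32.52 | 6.01 |
| $\lambda_2$ | 12.72 | 11.97 | 13.47 | 0.2 | 12.72 | 11.92 | 13.52 | 0.41 |
| $r_2$ | - | - | - | - | 17.59 | 7.55 | 27.64 | 5.12 |
| $\lambda_3$ | 4.67 | 4.27 | 5.07 | 0.20 | 4.67 | 4.11 | 5.22 | 0.28 |
| $r_3$ | - | - | - | - | 3.62 | 1.94 | 5.30 | 0.86 |
| $\lambda$ | 16.42 | 15.53 | 17.30 | 0.45 | 16.42 | 15.53 | 17.30 | 0.45 |
| $r$ | 25.32 | 10.03 | 40.62 | 7.80 | 25.24 | 10.05 | 40.43 | 7.75 |
| $\log L$ | −1183.47 | | | | −1516.57 | | | |
| AIC | 2382.94 | | | | 3043.14 | | | |
| BIC | 2406.00 | | | | 3057.56 | | | |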
Table 5. The estimates and log-likelihood of copula models.

| | Gaussian Copula | Gumbel Copula | Joe Copula |
|---|---|---|---|
| Est. parameter | 0.70452 | 1.83147 | 2.17170 |
| $\log L$ | −1015.953 | −1021.079 | −1033.461 |
Table 6. Log-likelihoods of composite models for building severity.

| | G and IG | G and Pa | G and LN | E and IG | E and Pa | E and LN |
|---|---|---|---|---|---|---|
| $\log L$ | −2800.93 | −2771.15 | −2771.14 | −3181.33 | −3220.69 | −3220.72 |
| AIC | 5609.87 | 5550.30 | 5550.29 | 6368.65 | 6447.37 | 6447.43 |
| BIC | 5632.25 | 5572.68 | 5572.67 | 6385.44 | 6464.16 | 6464.22 |
Table 7. Log-likelihoods of composite models for content severity.

| | G and IG | G and Pa | G and LN | E and IG | E and Pa | E and LN |
|---|---|---|---|---|---|---|
| $\log L$ | −2187.88 | −2039.52 | −2037.59 | −2102.97 | −2102.81 | −2102.16 |
| AIC | 4383.77 | 4087.04 | 4083.18 | 4211.95 | 4211.61 | 4210.32 |
| BIC | 4405.47 | 4108.74 | 4104.88 | 4228.23 | 4227.89 | 4226.60 |
Table 8. Log-likelihoods of composite models for profit severity.

| | G and IG | G and Pa | G and LN | E and IG | E and Pa | E and LN |
|---|---|---|---|---|---|---|
| $\log L$ | −309.19 | −297.19 | −427.81 | −305.93 | −304.53 | −304.48 |
| AIC | 626.38 | 602.39 | 863.62 | 617.86 | 615.06 | 614.97 |
| BIC | 644.07 | 620.08 | 881.31 | 631.13 | 628.33 | 628.24 |
Table 9. Parameter estimates for the severity components.

| | Building: G and LN | Contents: G and LN | Profits: G and Pa |
|---|---|---|---|
| Head Dist. | $\alpha_1 = 3.71085$ | $\alpha_1 = 1.98766$ | $\alpha_1 = 1.55072$ |
| | $\theta_1 = 0.37198$ | $\theta_1 = 0.21591$ | $\theta_1 = 0.10144$ |
| Tail Dist. | $\mu_2 = -331.88884$ | $\mu_2 = -1.34871$ | $\alpha_2 = 1.41237$ |
| | $\sigma_2 = 13.20987$ | $\sigma_2 = 1.69228$ | $\theta_2 = 0.37195$ |
| $u$ | 2.08943 | 0.47466 | 0.11282 |
| $\phi$ | 0.32151 | 1.34244 | 2.92302 |
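As a consistency check, the thresholds reported in Table 9 can be recomputed from the other estimates using the conditions in Table 3. The sketch below assumes the gamma-lognormal condition $0 = \alpha_1 - u/\theta_1 + (\ln u - \mu_2)/\sigma_2^2$ and takes $\mu_2$ as negative (−331.88884 and −1.34871), the sign under which this condition reproduces the reported $u$. The function names and the bisection solver are illustrative choices, not the authors' code.

```python
import math

def threshold_gamma_pareto(a1, t1, a2, t2):
    """Closed-form threshold for the gamma head / Pareto tail composite model."""
    b = a1 + a2 - t2 / t1
    return (b + math.sqrt(b * b + 4.0 * (t2 / t1) * (a1 - 1.0))) / (2.0 / t1)

def threshold_gamma_lognormal(a1, t1, mu2, s2, hi=1e3, tol=1e-10):
    """Larger root of 0 = a1 - u/t1 + (ln u - mu2)/s2^2, found by bisection."""
    def g(u):
        return a1 - u / t1 + (math.log(u) - mu2) / s2 ** 2
    lo = t1 / s2 ** 2  # g attains its maximum here; the larger root lies to the right
    if g(lo) <= 0 or g(hi) >= 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Estimates from Table 9 (mu2 taken as negative, see the note above).
u_building = threshold_gamma_lognormal(3.71085, 0.37198, -331.88884, 13.20987)
u_contents = threshold_gamma_lognormal(1.98766, 0.21591, -1.34871, 1.69228)
u_profits = threshold_gamma_pareto(1.55072, 0.10144, 1.41237, 0.37195)
print(u_building, u_contents, u_profits)
```

The recomputed values can be compared with the $u$ row of Table 9. Small differences are expected because the published estimates are rounded to five decimals.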
Table 10. Values of risk measures under different models.

| Measure | Model | Building ($S_1$) | Content ($S_2$) | Profit ($S_3$) | Aggregate ($S$) |
|---|---|---|---|---|---|
| $\mathrm{VaR}_{0.90}$ | Observations | 43.36753 | 37.61073 | 8.927676 | 84.95829 |
| | Independent | 45.83948 | 40.13501 | 8.853674 | 82.54968 |
| | Bin. Thin. | 45.55415 | 39.81174 | 8.330259 | 87.44851 |
| | Gaussian | 46.00964 | 39.93148 | 8.925007 | 88.03041 |
| | Gumbel | 45.94382 | 40.4155 | 8.80955 | 88.64008 |
| | Joe | 46.00862 | 40.25366 | 8.792391 | 88.37843 |
| $\mathrm{TVaR}_{0.90}$ | Observations | 67.63288 | 61.7309 | 17.34992 | 130.9614 |
| | Independent | 62.29311 | 64.19727 | 21.77994 | 114.7784 |
| | Bin. Thin. | 62.18439 | 63.46745 | 22.57028 | 122.2398 |
| | Gaussian | 62.57564 | 63.71817 | 24.13904 | 123.6911 |
| | Gumbel | 62.72085 | 64.09827 | 24.19948 | 124.4939 |
| | Joe | 63.20349 | 63.48082 | 22.36632 | 122.8164 |
| $\mathrm{TVaR}_{0.95}$ | Observations | 88.59588 | 82.65819 | 24.1481 | 171.8545 |
| | Independent | 75.19079 | 82.92127 | 32.76977 | 140.1614 |
| | Bin. Thin. | 75.28895 | 82.01287 | 34.9973 | 149.5173 |
| | Gaussian | 75.53303 | 82.18786 | 37.40043 | 151.7549 |
| | Gumbel | 75.80739 | 82.24379 | 37.66422 | 152.5675 |
| | Joe | 76.64548 | 81.35734 | 34.09076 | 149.5723 |
| $\mathrm{PH}_{2}$ | Observations | 51.93275 | 42.81760 | 11.31421 | 93.62481 |
| | Independent | 54.19547 | 56.37977 | 28.35901 | 102.4392 |
| | Bin. Thin. | 58.84183 | 56.89981 | 40.01965 | 114.9348 |
| | Gaussian | 53.31081 | 50.2014 | 63.63473 | 134.7637 |
| | Gumbel | 53.52596 | 49.71846 | 55.63991 | 126.7038 |
| | Joe | 60.19573 | 47.35576 | 38.58851 | 111.7715 |
| $\mathrm{DP}_{3}$ | Observations | 43.22733 | 35.96571 | 8.24774 | 82.29961 |
| | Independent | 41.57237 | 35.75419 | 9.25285 | 76.3109 |
| | Bin. Thin. | 41.50927 | 35.52621 | 9.46211 | 79.59646 |
| | Gaussian | 41.69046 | 35.60265 | 9.98022 | 80.09476 |
| | Gumbel | 41.69769 | 35.82457 | 9.98923 | 80.20223 |
| | Joe | 41.82949 | 35.57191 | 9.42534 | 79.60271 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yan, T.; Lu, Y.; Jeong, H. Dependence Modelling for Heavy-Tailed Multi-Peril Insurance Losses. Risks 2024, 12, 97. https://doi.org/10.3390/risks12060097
