1. Introduction
Loss reserving is the process of estimating the reserve an insurer should hold to meet the future claims payments arising from policies which it has underwritten. Insurers underwrite risks and receive premiums to cover claims arising over a specified period. The amount and timing of claim payments are uncertain, and so the insurer is required to set aside sufficient reserves to meet these obligations as and when they fall due. An insurer mitigates the risks to an extent by pooling similar risks. However, there is still uncertainty regarding the timing and quantum of payments, which may cause liquidity strain for the insurer. The failure to generate sufficient liquid assets to meet liabilities in a timely manner will affect business continuity. Therefore, it is crucial for insurers to project future claims payments and estimate the associated volatility accurately.
The accurate projection of future claims liabilities is important for numerous aspects of an insurer's operations. From a pricing perspective, an understanding of the expected amount and timing of future claims liabilities enables more precise technical pricing. This allows an insurer to price risks more appropriately and improves its competitiveness within the market. From a reserving perspective, more accurate projection of future claims reduces uncertainty and the risk margin, which is an amount reflecting an assessment of the uncertainty associated with insurance risk (Risk Margin Working Group 2009). From a capital perspective, greater accuracy in claims projections enables better allocation of capital to its most appropriate use. Therefore, loss reserving is critical for an insurer, as it plays a vital role in informing underwriting, pricing, capital, and planning decisions. For shareholders, reserves and related items form a material portion of an insurer's financial statements. Mis-reserving constitutes insurance/actuarial risk, which leads to increased capital requirements (for example, under the Solvency II regime in the European Union). Under-reserving has a direct impact on an insurer's profitability, while over-reserving is also problematic, as capital is not directed to its most appropriate use to generate returns. Regulators are also closely interested in the sufficiency of reserves to ensure business continuity and the protection of policyholders.
The amount and timing of a claim are highly uncertain for several reasons. Firstly, there is a delay between the occurrence of an event leading to a loss and the notification of the event to the insurer. A claim may also develop over time, leading to multiple losses being generated. Further delays exist between claim notification, assessment, and settlement. The amount of payment varies depending on the development of the claim over time.
Traditional reserving approaches developed to estimate future claims liabilities are largely deterministic, including the chain-ladder and Bornhuetter–Ferguson techniques (Bornhuetter and Ferguson 1972). Stochastic methodologies linked to the chain-ladder technique have also been developed to better estimate loss reserve variability. These include the chain-ladder approach in Mack (1993) and the bootstrap method in England and Verrall (1999).
With advancements in computer processing, machine learning approaches are increasingly adopted to solve problems for which large quantities of data are available. Predictive modeling, from generalized linear models (Haberman and Renshaw 1996) to machine learning techniques (Gao et al. 2019), has been widely explored and applied in insurance. For insurance reserving, non-parametric individual claim reserving using decision trees is first explored in Baudry and Robert (2017). Wüthrich (2018) refines Mack's chain-ladder method using neural networks. More recently, Kuo (2019) proposes a novel approach to loss reserving based on deep neural networks in the form of DeepTriangle. The deep neural network approach in Kuo (2019) jointly models paid losses and outstanding claims with minimal feature engineering. The model has shown improvements in predictive accuracy (as measured by the root mean squared percentage error and mean absolute percentage error) compared to existing stochastic methods across multiple lines of business.
This paper builds on the loss reserving approach in Kuo (2019) and generalizes DeepTriangle for non-life insurance reserving. The generalized approach offers more flexibility and accuracy in solving actuarial reserving problems than existing techniques. It predicts claims outstanding weighted by exposure instead of loss ratio, removing the subjectivity associated with premium weighting. Chain-ladder predicted outstanding claims are used as part of the multi-task learning to remove the dependence on case estimates. Enhancements to the categorical embedding component of the model architecture may further improve model accuracy. A grid search is introduced for hyperparameter tuning to improve model performance. The performance of the generalized approach is compared to the traditional chain-ladder, AutoML, and the original DeepTriangle. Results show that the Generalized DeepTriangle approach outperforms the traditional and existing machine learning methods.
The rest of the paper is organized as follows: Section 2 describes the evolution of actuarial reserving methods over time leading up to this paper, and Section 3 describes our generalized model architecture. Section 4 describes the dataset used, details the evaluation metrics for assessing model performance, and discusses the results. Lastly, Section 5 concludes the paper and suggests potential future developments.
2. Related Work, Notation, and Terminologies
This section describes the evolution of reserving methods leading up to our paper. It also introduces the notation and terminology associated with actuarial reserving. Note that only a high-level description of the reserving methods relevant to this paper is provided. For a comprehensive overview of the development of reserving approaches over time, refer to Carrato and Visintin (2019).
2.1. The Chain-Ladder Method on Cumulative Data
The most common reserving approach for estimating the ultimate cost in non-life insurance is the chain-ladder approach. The chain-ladder method in Mack (1993) is often considered a fundamental form of the approach. It forecasts future claims development based on historical cumulative claims development aggregated by accident and development periods. A distribution-free formula for evaluating the standard error of chain-ladder reserve estimates is also derived.
Let $X_{i,j}$ be the incremental claim paid for accident year $i$ and development year $j$, for $1 \le i \le I$ and $1 \le j \le J$, and let $C_{i,j}$ be the cumulative claim of accident year $i$ up to development year $j$. Then $C_{i,j} = \sum_{k=1}^{j} X_{i,k}$. It is assumed that there exist development factors $f_1, \dots, f_{J-1} > 0$ such that
$$E\left[C_{i,j+1} \mid C_{i,1}, \dots, C_{i,j}\right] = f_j \, C_{i,j},$$
and variance parameters $\sigma_1^2, \dots, \sigma_{J-1}^2$ such that
$$\operatorname{Var}\left[C_{i,j+1} \mid C_{i,1}, \dots, C_{i,j}\right] = \sigma_j^2 \, C_{i,j}.$$
Mack (1993) proposes the following estimators:
$$\hat{f}_j = \frac{\sum_{i=1}^{I-j} C_{i,j+1}}{\sum_{i=1}^{I-j} C_{i,j}}, \qquad \hat{\sigma}_j^2 = \frac{1}{I-j-1} \sum_{i=1}^{I-j} C_{i,j} \left( \frac{C_{i,j+1}}{C_{i,j}} - \hat{f}_j \right)^2,$$
where an estimator for $\hat{\sigma}_{J-1}^2$ may be obtained by means of extrapolation.
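To make the estimators concrete, the following is a minimal Python sketch (not from the original paper) of how the development factors $\hat{f}_j$ and the corresponding chain-ladder ultimates could be computed from a cumulative run-off triangle; the array layout and function names are illustrative assumptions.

```python
import numpy as np

def chain_ladder_factors(triangle):
    """Estimate development factors f_j from a cumulative triangle.

    triangle: (I x J) array of cumulative paid claims C_{i,j};
    unobserved cells (future calendar periods) are np.nan.
    """
    I, J = triangle.shape
    factors = []
    for j in range(J - 1):
        # Only use accident years where both C_{i,j} and C_{i,j+1} are observed.
        mask = ~np.isnan(triangle[:, j]) & ~np.isnan(triangle[:, j + 1])
        factors.append(triangle[mask, j + 1].sum() / triangle[mask, j].sum())
    return np.array(factors)

def chain_ladder_ultimates(triangle):
    """Project each accident year to ultimate using the estimated factors."""
    factors = chain_ladder_factors(triangle)
    ultimates = []
    for row in triangle:
        j_last = np.flatnonzero(~np.isnan(row)).max()  # latest observed development year
        ultimates.append(row[j_last] * np.prod(factors[j_last:]))
    return np.array(ultimates)
```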
2.2. Regression on Individual Loss Data
The chain-ladder approach is a reserving method based on claims experience aggregated by accident and development periods. Regression based on individual loss data, first proposed by Norberg (1993) and Hesselager (1994), enables more granular data to be used for predicting future claims development.
Let $N_i$ be the number of claims for accident year $i$. Denote by $C_{i,j}^{(h)}$ the cumulative payment up to time $j$ of the $h$-th claim of accident year $i$. The total cumulative payment up to time $j$ for the accident year $i$ is:
$$C_{i,j} = \sum_{h=1}^{N_i} C_{i,j}^{(h)}.$$
Therefore, for an individual claim, the following equation holds true:
$$E\left[C_{i,j+1}^{(h)} \mid C_{i,j}^{(h)}\right] = f_j \, C_{i,j}^{(h)}.$$
Then, the following estimator can be used to predict the ultimate loss:
$$\hat{C}_{i,J} = \sum_{h=1}^{N_i} C_{i,I-i+1}^{(h)} \prod_{j=I-i+1}^{J-1} \hat{f}_j.$$
2.3. Clustering on Individual Loss Data
The chain-ladder model in Mack (1993) assumes that claims are homogeneous, which does not always hold for an entire population in practice. To address this, clustering of claims into homogeneous groups is proposed, assuming that a linear model is applicable within each group of claims. Let $K$ be the total number of clusters for a portfolio. The total cumulative claim payment up to time $j$ for accident year $i$ and cluster $k$ is:
$$C_{i,j}^{(k)} = \sum_{h=1}^{N_i^{(k)}} C_{i,j}^{(k,h)},$$
where $C_{i,j}^{(k,h)}$ represents the $h$-th claim which belongs to the $k$-th cluster in calendar year $j$, and the total cumulative payment up to time $j$ for accident year $i$ is:
$$C_{i,j} = \sum_{k=1}^{K} C_{i,j}^{(k)}.$$
Therefore, for each cluster, the following equation holds true:
$$E\left[C_{i,j+1}^{(k)} \mid C_{i,j}^{(k)}\right] = f_j^{(k)} \, C_{i,j}^{(k)}.$$
The following estimator can be used to predict the ultimate loss:
$$\hat{C}_{i,J}^{(k)} = C_{i,I-i+1}^{(k)} \prod_{j=I-i+1}^{J-1} \hat{f}_j^{(k)},$$
where $\hat{f}_j^{(k)}$ has a similar definition to $\hat{f}_j$ in (4), but with $C_{i,j}$ being replaced by $C_{i,j}^{(k)}$.
Individual claim reserving models using large amounts of granular information sit at the opposite end of the spectrum of loss reserving approaches to aggregate reserving methods like the chain-ladder, which use relatively limited data. Clustering enables the forecasting of claims reserves at a segment level, balancing the granularity of reserving at an individual level with the reduced volatility of aggregate reserving approaches.
2.4. Dual Input Paid-Incurred Model on Individual Loss Data
Incurred claims cost is the sum of the paid-to-date amount and case estimates on open claims. The inclusion of case estimates in future claims prediction is often beneficial, as it allows for situations where few payments have been made to date but payments are expected in the future. Hence, joint models accounting for both paid-to-date and incurred costs increase accuracy.
In addition to cumulative claims paid, incurred claim amounts can also be included as an input to the modeling. Let $K$ be the total number of clusters of the portfolio. The total incurred claim amount $I_{i,j}^{(k)}$ for accident year $i$, in calendar year $j$, for cluster $k$ is:
$$I_{i,j}^{(k)} = \sum_{h=1}^{N_i^{(k)}} I_{i,j}^{(k,h)},$$
where $I_{i,j}^{(k,h)}$ indicates the $h$-th incurred claim for the $k$-th cluster in calendar year $j$. The following estimator can be used to predict the incurred loss:
$$\hat{I}_{i,J}^{(k)} = I_{i,I-i+1}^{(k)} \prod_{j=I-i+1}^{J-1} \hat{g}_j^{(k)},$$
where the development factors $\hat{g}_j^{(k)}$ are estimated from the incurred amounts in the same way as $\hat{f}_j^{(k)}$ is estimated from the paid amounts.
However, not all lines of business have case reserves, so the approach is not universally applicable.
2.5. Artificial Neural Networks (ANN)
Advancements in artificial intelligence and machine learning have led to novel approaches to solving actuarial problems using big data. Wüthrich (2018) proposes the application of neural networks to chain-ladder reserving.
The DeepTriangle architecture in Kuo (2019) uses a feed-forward network with fully connected layers; see the illustration in Figure 1. The output $y$ is predicted from the input vector $x$. Hidden layers, represented by $z^{(1)}, \dots, z^{(L-1)}$, transform the input into representations which gradually increase in predictive power for the output as we move across each layer $l$. Each node $z_k^{(l)}$ is computed iteratively as:
$$z_k^{(l)} = \phi\left( w_k^{(l)} z^{(l-1)} + b_k^{(l)} \right), \qquad l = 1, \dots, L, \quad k = 1, \dots, n_l,$$
where $L$ represents the total number of layers, $n_l$ represents the number of components of the $l$-th layer, $\phi$ is the activation function, which is chosen to be nonlinear, $z^{(l-1)}$ is the activation column vector of the previous layer, $w_k^{(l)}$ is the row weights vector, and $b_k^{(l)}$ is the bias scalar. Conventionally, $z^{(0)} = x$ and $z^{(L)} = y$. The weights and biases are the parameters of the neural network learned during training. They are selected by the neural network to maximize prediction accuracy.
The chain-ladder factors for artificial neural networks are found by minimizing an appropriate loss function. Each development period $j$ has its own neural network architecture to be optimized with respect to the loss function. The loss function measures how close the model predictions are to the actual values.
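As a concrete illustration of the layer recursion above, here is a minimal NumPy sketch (our own illustration, not code from the paper) of a forward pass through a fully connected network; the layer sizes and the choice of ReLU/identity activations are assumptions for the example.

```python
import numpy as np

def forward_pass(x, weights, biases):
    """Compute y = z^(L) from input x = z^(0) through fully connected layers.

    weights: list of matrices W^(l) with shape (n_l, n_{l-1})
    biases:  list of vectors  b^(l) with shape (n_l,)
    """
    z = x
    for l, (W, b) in enumerate(zip(weights, biases), start=1):
        pre_activation = W @ z + b
        if l < len(weights):
            z = np.maximum(pre_activation, 0.0)  # nonlinear activation (ReLU) on hidden layers
        else:
            z = pre_activation                   # identity activation on the output layer
    return z

# Toy example: 4-dimensional input, two hidden layers, scalar output.
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 1]
weights = [rng.normal(size=(sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
biases = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]
y = forward_pass(rng.normal(size=4), weights, biases)
```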
2.6. DeepTriangle
Kuo (2019) proposed DeepTriangle as a novel approach to loss reserving based on the deep neural networks described in Section 2.5. It jointly models paid losses and claims outstanding as stated in Section 2.3 and incorporates heterogeneous inputs as in Section 2.2. The key components of the model architecture are described below.
2.6.1. Sequence-to-Sequence Architecture
The architecture uses a class of algorithms called sequence-to-sequence learning (Sutskever et al. 2014). Instead of relying on single data points, the model takes a sequence of ordered events as input and predicts a sequence into the future, making it well suited to predicting claims development for reserving.
We have previously defined $X_{i,j}$ to be the incremental claims paid. Here we define $OS_{i,j}$ to be the total claims outstanding for accident year $i$ and development year $j$, where $1 \le i \le I$ and $1 \le j \le J$. Then, at the end of calendar year $I$, we have access to the observed data
$$\left\{ X_{i,j}, \; OS_{i,j} : \; i = 1, \dots, I; \; j = 1, \dots, I - i + 1 \right\}.$$
Then
$$UL_i = \sum_{j=1}^{J} X_{i,j}$$
is the ultimate loss for accident year $i$, which can be estimated by
$$\widehat{UL}_i = \sum_{j=1}^{I-i+1} X_{i,j} + \sum_{j=I-i+2}^{J} \hat{X}_{i,j}.$$
The gated recurrent unit (GRU) in Chung et al. (2014) is used to process the paid losses and claims outstanding sequences. Here, we use the notation as in Kuo (2019) and define the activation $h_t$ at time $t$ as follows:
$$
\begin{aligned}
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t, \\
\tilde{h}_t &= \tanh\left( W x_t + U (r_t \odot h_{t-1}) + b \right), \\
z_t &= \sigma\left( W_z x_t + U_z h_{t-1} + b_z \right), \\
r_t &= \sigma\left( W_r x_t + U_r h_{t-1} + b_r \right),
\end{aligned}
$$
where $x_t$ represents the input values, $\sigma$ represents the logistic sigmoid function, $W$, $U$, $W_z$, $U_z$, $W_r$, and $U_r$ represent weight matrices, and $b$, $b_z$, and $b_r$ represent biases to be learnt. Each activation retains values from earlier points of the input sequence and gives a certain weight to the estimated current state $\tilde{h}_t$ and the previous state $h_{t-1}$.
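For illustration, the following NumPy sketch implements a single GRU update of the form given above; it is our own simplified rendering rather than the implementation used in Kuo (2019) or Chung et al. (2014), and the dimensions, initialization, and function names are assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, params):
    """One GRU update: returns h_t given input x_t and previous state h_{t-1}."""
    W_z, U_z, b_z = params["update"]     # update gate parameters
    W_r, U_r, b_r = params["reset"]      # reset gate parameters
    W_h, U_h, b_h = params["candidate"]  # candidate state parameters

    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)               # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)               # reset gate
    h_tilde = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde                 # blended new state

# Toy example: 2-dimensional input (e.g., paid and outstanding), 4-dimensional state.
rng = np.random.default_rng(1)
dims = dict(x=2, h=4)
params = {name: (rng.normal(size=(dims["h"], dims["x"])),
                 rng.normal(size=(dims["h"], dims["h"])),
                 np.zeros(dims["h"]))
          for name in ("update", "reset", "candidate")}
h = np.zeros(dims["h"])
for x_t in rng.normal(size=(3, dims["x"])):   # run the cell over a short input sequence
    h = gru_step(x_t, h, params)
```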
2.6.2. Multi-Task Learning
DeepTriangle simultaneously models two sequences as input and two as output. This means that one task can reuse insights derived from the other.
Kuo (2019) proposes the use of paid losses and case reserves by accident and development year as the dual input sequences. Kuo (2019) defines the inputs and outputs in terms of
$$Y_{i,j} = \left( \frac{X_{i,j}}{NPE_i}, \; \frac{OS_{i,j}}{NPE_i} \right),$$
with the observed history $\left( Y_{i,1}, \dots, Y_{i,I-i+1} \right)$ forming the input sequences and the future values $\left( Y_{i,I-i+2}, \dots, Y_{i,J} \right)$ forming the output sequences, where $NPE_i$ represents the net earned premium for accident year $i$. Note that the model takes in and predicts loss ratios to normalize the inputs and outputs.
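To illustrate how such normalized input and output sequences might be assembled from an aggregated triangle, here is a small pandas/NumPy sketch; the column names, calendar-year convention, and function name are our own assumptions and do not correspond to the original implementation.

```python
import numpy as np
import pandas as pd

def build_sequences(df, valuation_year):
    """For each accident year, split normalized (paid, outstanding) cells into
    the observed input sequence and the future output sequence.

    Assumed columns: accident_year, development_year (1-based),
    paid_loss (incremental), claims_outstanding, net_earned_premium.
    """
    samples = {}
    for ay, grp in df.groupby("accident_year"):
        grp = grp.sort_values("development_year")
        npe = grp["net_earned_premium"].iloc[0]
        ratios = grp[["paid_loss", "claims_outstanding"]].to_numpy() / npe  # Y_{i,j}
        # Cells on or before the valuation diagonal are observed history.
        observed = (grp["accident_year"] + grp["development_year"] - 1 <= valuation_year)
        samples[ay] = {"inputs": ratios[observed.to_numpy()],
                       "targets": ratios[~observed.to_numpy()]}
    return samples
```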
2.6.3. Categorical Embedding
Company codes are passed to an embedding layer, with each company represented by a vector in $\mathbb{R}^{d}$ for some embedding dimension $d$, as in Guo and Berkhahn (2016). Company codes are mapped onto a multi-dimensional vector space, where segments with similar implicit behaviors are placed closer together. In other words, it implicitly finds the relationships between segments, serving as a proxy for company characteristics.
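A minimal Keras-style sketch of such a categorical embedding is given below; the vocabulary size, embedding dimension, and layer wiring are illustrative assumptions rather than the architecture used in the paper.

```python
import tensorflow as tf

n_companies = 200      # assumed number of distinct company codes
embedding_dim = 8      # assumed embedding dimension

company_code = tf.keras.Input(shape=(1,), dtype="int32", name="company_code")
# Map each integer company code to a dense vector; similar companies end up
# close together in the learned embedding space.
embedded = tf.keras.layers.Embedding(input_dim=n_companies,
                                     output_dim=embedding_dim)(company_code)
embedded = tf.keras.layers.Flatten()(embedded)
# The flattened embedding can then be concatenated with the sequence features
# before the downstream dense layers.
```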
4. Data, Experiments, and Results
This section first details the data source and data pre-processing and then describes the evaluation metrics used to assess the model performance against the benchmark models before illustrating the results.
4.1. Data
Kuo (2019) uses the National Association of Insurance Commissioners (NAIC) Schedule P dataset (Meyers and Peng 2019). The dataset includes claims over accident years 1988–1997 and 10 development years for each accident year. Schedule P data are aggregated by accident year and development year, by line of business and group code. It includes both aggregated premium and claims information. However, Schedule P data have the following two limitations:
The dataset does not include information on the number of lives or policy start and end dates, meaning that it is not possible to use exposure years as a weight.
The dataset is aggregated with only line of business and company code segmentations, making it difficult to conduct modeling and analysis at a more granular level. This also limits our ability to understand the drivers of the experience.
Extensive research has been conducted into publicly available insurance data sources to identify the most suitable dataset. The individual claims history simulation machine in Gabrielli and Wüthrich (2018) produces insurance datasets that are more suitable and addresses the limitations of Schedule P data. Gabrielli and Wüthrich (2018) developed a stochastic simulation machine that generates individual claims histories of non-life insurance claims. The simulation machine enables users to simulate a synthetic insurance portfolio of individual claims histories based on real non-life insurance data.
The final dataset is a simulated dataset that corresponds to claims over accident years 1994 to 2005, over 12 development years of experience. It contains the feature information for each claim listed in Table 1.
The benefits of the simulation machine dataset include:
The dataset is at an individual claim line level, enabling exposure-weighting. It also offers more flexibility in the level of granularity used for modeling.
The existence of multiple features (line of business, claims code, age, and injury part) offers more information on the claims and enables more granular segmentation.
4.2. Data Processing
Table 2 outlines the parameters adopted to simulate the individual claims dataset for our analysis. The only potential limitation of these parameter choices is that the claims volatility is not varied; this was a deliberate decision so as not to add further complexity when interpreting model results, given that each LOB already has intrinsic characteristics within its data.
Recovery payments have been excluded from the input dataset as the model adopts an activation function that predicts nonnegative cash flows.
The individual claims dataset is aggregated for the purpose of this paper. Aggregation is performed by accident year, development lag, line of business, and claims code, with the number of claims, exposure years, and paid losses summarized for analysis. We have separately repeated the modeling by line of business only to understand the impact of segmentation on model predictiveness. More details on the methodology are provided in the following subsection.
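The aggregation step could look something like the following pandas sketch; the column names are assumptions for illustration and do not necessarily match the simulation machine's output schema.

```python
import pandas as pd

def aggregate_claims(individual_claims: pd.DataFrame) -> pd.DataFrame:
    """Aggregate individual claim records into reserving cells.

    Assumed columns: accident_year, development_lag, line_of_business,
    claims_code, exposure_years, paid_loss.
    """
    keys = ["accident_year", "development_lag", "line_of_business", "claims_code"]
    return (individual_claims
            .groupby(keys, as_index=False)
            .agg(claim_count=("paid_loss", "size"),
                 exposure_years=("exposure_years", "sum"),
                 paid_loss=("paid_loss", "sum")))
```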
We have split the data into the following segments for model prediction and validation:
We assess the model predictiveness based on cumulative predicted payments for development year 10.
4.3. Performance Evaluation Metrics
A range of validation methods have been proposed for evaluating the performance of reserving models. This paper uses the Mean Absolute Percentage Error (MAPE) and Root Mean Square Percentage Error (RMSPE) in the model evaluation process. MAPE and RMSPE are adopted for consistency with Kuo (2019). Percentage errors enable unit-free measurement over each segment. In this case, the segment is defined by the categorical variable passed through the embedding layer. The actual and predicted cumulative ultimate losses as at development year 10 by segment are compared to evaluate model performance.
For line of business $l$,
$$MAPE_l = \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} \left| \frac{\widehat{UL}_c - UL_c}{UL_c} \right|$$
and
$$RMSPE_l = \sqrt{ \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} \left( \frac{\widehat{UL}_c - UL_c}{UL_c} \right)^2 },$$
where $\mathcal{C}$ is the set of possible levels which the categorical input can take, $|\mathcal{C}|$ represents the count of all possible values that the categorical variable can take (the number of elements in $\mathcal{C}$), and $UL_c$ and $\widehat{UL}_c$ are the actual and predicted cumulative ultimate losses for the $c$-th categorical level as at development year 10.
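A direct implementation of these two metrics could be as simple as the following sketch (our own illustration, not code from the paper):

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error over segments (e.g., categorical levels)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs((predicted - actual) / actual))

def rmspe(actual, predicted):
    """Root mean square percentage error over segments."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean(((predicted - actual) / actual) ** 2))

# Example: ultimate losses at development year 10 for three segments.
print(mape([100.0, 250.0, 80.0], [95.0, 260.0, 85.0]))
print(rmspe([100.0, 250.0, 80.0], [95.0, 260.0, 85.0]))
```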
4.4. Benchmark Models
To assess the performance of the generalized DeepTriangle approach, the model's MAPE and RMSPE are compared against those for the chain-ladder method in Mack (1993) and the AutoML model adopted in the original DeepTriangle approach in Kuo (2019).
The chain-ladder method in Mack (1993) enables a comparison against a traditional, judgement-free reserving technique. The AutoML model, which is developed through automated searches over common machine learning techniques, enables a comparison of model performance against alternative machine learning techniques. It is trained over an ensemble involving a random forest, an extremely randomized forest, a random grid of gradient boosting machines, and a random grid of deep feedforward neural networks (H2O.ai 2018). An iterative forecasting scheme is used to predict each timestep.
4.5. Parameterization and Implementation
Table 3 below details the key model parameters used for training the model.
We use the average mean squared error over the forecasted time steps as the loss function of the prediction. For each accident and development year pair $(i, j)$, the per-sample loss function is defined as:
$$\ell_{i,j} = \frac{1}{J - j + 1} \sum_{k=j}^{J} \left\| \hat{Y}_{i,k} - Y_{i,k} \right\|^{2},$$
where $\hat{Y}_{i,k}$ and $Y_{i,k}$ denote the predicted and actual output values at development year $k$.
We create an ensemble of 10 models trained with the same model architecture but different initial seeds, and we take the average predicted ultimate claims at development year 10 for performance evaluation. This is done to reduce the variation in predicted targets associated with neural network models. Note that increasing the number of models would lead to further variance reduction but requires a longer training time.
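A simple way to realize such a seed ensemble is sketched below; `train_fn` (and the `build_and_train_model` placeholder in the usage line) is a hypothetical stand-in for the full training pipeline.

```python
import numpy as np

def ensemble_ultimates(train_fn, data, n_models=10):
    """Train n_models copies with different seeds and average their predictions.

    train_fn(data, seed) is assumed to return predicted ultimate claims
    at development year 10 as a NumPy array (one value per segment).
    """
    predictions = [train_fn(data, seed=seed) for seed in range(n_models)]
    return np.mean(predictions, axis=0)  # element-wise average across the ensemble

# Usage (hypothetical): ultimates = ensemble_ultimates(build_and_train_model, data)
```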
4.6. Results and Discussions
We have applied the benchmark models and the generalized DeepTriangle architecture to predict ultimate claims payment.
Table 4 provides a comparison of model performance, where DeepTriangle (Kuo 2019) is the ultimate claims prediction using the original DeepTriangle methodology, Generalized DeepTriangle (aggregated) is the prediction at an aggregate level (that is, without claim code segmentation), and Generalized DeepTriangle is the prediction using claim code categories as the categorical embedding. It can be seen that the Generalized DeepTriangle outperforms the benchmark models both at a portfolio level and across each line of business.
For the Generalized DeepTriangle (aggregated), the results using batch size 2 are reported as they yield the best overall performance at an aggregate level. The results under batch size 32 are used for the Generalized DeepTriangle for the same reason. It is worth noting that line of business 3 has lower exposure and greater volatility than the other lines of business. This has led to higher prediction uncertainty when using machine learning approaches compared to Mack's traditional chain-ladder approach.
There is also an optimal range for batch size depending on the level of segmentation. Table 5 and Table 6 compare MAPE by batch size. The optimal batch size for the aggregate prediction (between 2 and 8) is materially lower than for the more granular prediction by claim code (between 32 and 256). This is intuitive, as larger batch sizes group more claim codes together, reducing the variance. Therefore, the addition of a grid search for hyperparameter optimization enhances model performance.
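A batch-size grid search of the kind described here could be organized as in the following sketch; the candidate values mirror those discussed above, while `train_and_validate` is a hypothetical helper assumed to return a validation MAPE.

```python
def grid_search_batch_size(train_and_validate, data,
                           candidates=(2, 4, 8, 16, 32, 64, 128, 256)):
    """Return the batch size with the lowest validation MAPE.

    train_and_validate(data, batch_size) is assumed to train the model
    and return its validation MAPE as a float.
    """
    scores = {bs: train_and_validate(data, batch_size=bs) for bs in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```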
Analysis has also been performed on RMSPE in addition to MAPE in Table 7, and it also shows better overall performance for the generalized approach. However, because RMSPE is based on squared errors, it emphasizes uncertainty on more volatile portfolios and favors performance on less volatile portfolios. Given the volatilities observed for AutoML, Kuo's DeepTriangle, and the generalized approach, MAPE enables a better comparison of results.
5. Conclusions and Potential Further Extensions
This paper proposes several extensions to the DeepTriangle methodology developed in Kuo (2019), as described in Section 3.
On a practical note, reserving requires significant regulatory oversight, making applications of machine learning techniques difficult. Not only does the result need to be accurate, but it also needs to be explainable and stable. Improving model interpretability and reducing volatility remain ongoing areas of research as more advanced machine learning techniques are developed.
To best enable advancement in this field, we need to develop both short-term applications and ongoing model improvements to make the approach usable in a corporate context. In the short term, the Generalized DeepTriangle can be used as a guide to supplement existing reserving methodologies. The Generalized DeepTriangle picks up on subtler changes in claims behavior and claims profiles, which may be difficult to identify in a timely manner under traditional aggregated reserving approaches. Compared to other machine learning methods for predicting claims behavior, the Generalized DeepTriangle is the closest in structure, and the most comparable, to traditional reserving methods, as it predicts by accident and development period from historic claims experience. Therefore, it may supplement existing reserving methodologies and inform reserving trends in a rapidly changing post-pandemic environment.
There is potential for further model enhancements. The first option is to conduct principal component analysis on key categorical variables to determine the optimal segmentation to feed into the embedding layer. Alternatively, the model architecture could be modified to embed multiple categorical variables.