Risks

Research

26 pages, 1035 KiB

Open Access | Article

Concordance Probability for Insurance Pricing Models

by Jolien Ponnet, Robin Van Oirbeek and Tim Verdonck

Risks 2021, 9(10), 178; https://doi.org/10.3390/risks9100178 - 8 Oct 2021

Cited by 3 | Viewed by 2493

Abstract

The concordance probability, also called the C-index, is a popular measure of the discriminatory ability of a predictive model. In this article, the definition of this measure is adapted to the specific needs of the frequency and severity models typically used during the technical pricing of a non-life insurance product. For the frequency model, the need for two different groups is tackled by defining three new types of concordance probability. Secondly, these adapted definitions deal with the concept of exposure, which is the duration of a policy or insurance contract. Frequency data typically have a large sample size, and therefore we present two fast and accurate estimation procedures for big data. Their good performance is illustrated on two real-life datasets. Using these examples, we also estimate the concordance probability developed for severity models.
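
To make the basic measure concrete, the following is a minimal sketch of the plain pairwise concordance probability for a binary outcome (claim or no claim). It is not the exposure-adjusted definitions or the fast estimation procedures proposed in the article; the function name, the handling of tied predictions, and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_probability(observed, predicted):
    """Share of comparable pairs that the predictions rank in the same order as the outcomes."""
    concordant = 0.0
    comparable = 0
    for (y_i, p_i), (y_j, p_j) in combinations(zip(observed, predicted), 2):
        if y_i == y_j:
            continue  # pairs with equal outcomes carry no ranking information
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # assumption: tied predictions count as half-concordant
        elif (p_i > p_j) == (y_i > y_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: 1 = policy with a claim, 0 = policy without a claim.
claims = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
c_index = concordance_probability(claims, scores)
print(c_index)
```

This naive loop visits every pair, so its cost grows quadratically with the number of policies, which is the practical reason fast approximations matter for large frequency datasets.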

(This article belongs to the Special Issue Data Mining in Actuarial Science: Theory and Applications)

19 pages, 1404 KiB

Open Access | Article

Synthetic Dataset Generation of Driver Telematics

by Banghee So, Jean-Philippe Boucher and Emiliano A. Valdez

Risks 2021, 9(4), 58; https://doi.org/10.3390/risks9040058 - 24 Mar 2021

Cited by 19 | Viewed by 6387

Abstract

This article describes the techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The synthetic dataset has 100,000 policies that include observations of drivers’ claims experience, together with associated classical risk variables and telematics-related variables. This work aims to produce a resource that can be used to advance models to assess risks for usage-based insurance. It follows a three-stage process that uses machine learning algorithms. In the first stage, a synthetic portfolio of the space of feature variables is generated by applying an extended SMOTE algorithm. The second stage simulates values for the number of claims as multiple binary classifications using feedforward neural networks. The third stage simulates values for the aggregated amount of claims as a regression using feedforward neural networks, with the number of claims included in the set of feature variables. The resulting dataset is evaluated by comparing the synthetic and real datasets when Poisson and gamma regression models are fitted to the respective data. Other visualizations and data summaries produce remarkably similar statistics between the two datasets. We hope that researchers interested in obtaining telematics datasets to calibrate models or learning algorithms will find our work to be valuable.
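
The first stage relies on SMOTE-style oversampling. As a rough illustration of the core idea only, the sketch below creates one synthetic feature vector by interpolating between a randomly chosen record and one of its nearest neighbours. It shows plain SMOTE interpolation, not the extended algorithm used in the article; the function name, the parameter `k`, the fixed seed, and the toy points are all assumptions.

```python
import math
import random


def smote_sample(points, k=2, rng=None):
    """Return one synthetic point on the segment between a record and a near neighbour."""
    if len(points) < 2:
        raise ValueError("need at least two points to interpolate")
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    base = rng.choice(points)
    # The k nearest neighbours of the chosen record, by Euclidean distance.
    neighbours = sorted(
        (p for p in points if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # interpolation weight in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))


# Hypothetical two-dimensional feature vectors.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
synthetic = smote_sample(points, k=2)
print(synthetic)
```

Because each synthetic record is a convex combination of two real records, it stays inside the range spanned by the observed data, which is why this kind of interpolation only suits numerical features without modification.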

(This article belongs to the Special Issue Data Mining in Actuarial Science: Theory and Applications)

33 pages, 962 KiB

Open Access | Article

Alleviating Class Imbalance in Actuarial Applications Using Generative Adversarial Networks

by Kwanda Sydwell Ngwenduna and Rendani Mbuvha

Risks 2021, 9(3), 49; https://doi.org/10.3390/risks9030049 - 8 Mar 2021

Cited by 14 | Viewed by 4232

Abstract

To build adequate predictive models, a substantial amount of data is desirable. However, when expanding to new or unexplored territories, this required level of information is rarely available. To build such models, actuaries often have to procure data from local providers, use limited and unsuitable industry and public research, or rely on extrapolations from other better-known markets. Another common pathology when applying machine learning techniques in actuarial domains is the prevalence of imbalanced classes, where risk events of interest, such as mortality and fraud, are under-represented in the data. In this work, we show how an implicit model using the Generative Adversarial Network (GAN) can alleviate these problems through the generation of adequate-quality data from very limited or highly imbalanced samples. We provide an introduction to GANs and how they are used to synthesize data that accurately enhance the data resolution of very infrequent events and improve model robustness. Overall, we show a significant superiority of GANs for boosting predictive models when compared to competing approaches on benchmark data sets. This work offers numerous contributions to actuaries, with applications to, inter alia, new sample creation, data augmentation, boosting predictive models, anomaly detection, and missing data imputation.

(This article belongs to the Special Issue Data Mining in Actuarial Science: Theory and Applications)

19 pages, 550 KiB

Open Access | Article

Applications of Clustering with Mixed Type Data in Life Insurance

by Shuang Yin, Guojun Gan, Emiliano A. Valdez and Jeyaraj Vadiveloo

Risks 2021, 9(3), 47; https://doi.org/10.3390/risks9030047 - 3 Mar 2021

Cited by 8 | Viewed by 4108

Abstract

Death benefits are generally the largest cash flow items that affect the financial statements of life insurers, yet some insurers may still not have a systematic process to track and monitor death claims. In this article, we explore data clustering to examine and understand how actual death claims differ from what is expected, an early stage of developing a monitoring system crucial for risk management. We extended the k-prototype clustering algorithm to draw inferences from a life insurance dataset using only the insured’s characteristics and policy information without regard to known mortality. This clustering has the feature of efficiently handling categorical, numerical, and spatial attributes. Using gap statistics, the optimal clusters obtained from the algorithm are then used to compare actual to expected death claims experience of the life insurance portfolio. Our empirical data contained observations of approximately 1.14 million policies with a total insured amount of over 650 billion dollars. For this portfolio, the algorithm produced three natural clusters, with each cluster having lower actual to expected death claims but with differing variability. The analytical results provide management with a process to identify policyholders’ attributes that dominate significant mortality deviations, and thereby enhance decision making for taking necessary actions.

(This article belongs to the Special Issue Data Mining in Actuarial Science: Theory and Applications)

14 pages, 953 KiB

Open Access | Article

Mining Actuarial Risk Predictors in Accident Descriptions Using Recurrent Neural Networks

by Jean-Thomas Baillargeon, Luc Lamontagne and Etienne Marceau

Risks 2021, 9(1), 7; https://doi.org/10.3390/risks9010007 - 29 Dec 2020

Cited by 7 | Viewed by 2889

Abstract

One crucial task of actuaries is to structure data so that observed events are explained by their inherent risk factors. They are proficient at generalizing important elements to obtain useful forecasts. Although this expertise is beneficial when paired with conventional statistical models, it becomes limited when faced with massive unstructured datasets. Moreover, it does not take advantage of the representation capabilities of recent machine learning algorithms. In this paper, we present an approach to automatically extract textual features from a large corpus that departs from the traditional actuarial approach. We design a neural architecture that can be trained to predict a phenomenon using words represented as dense embeddings. We then extract features identified as important by the model to assess the relationship between the words and the phenomenon. The technique is illustrated through a case study that estimates the number of cars involved in an accident using the accident’s description as input to a Poisson regression model. We show that our technique yields models that perform better and are more interpretable than some common actuarial data mining baselines.

(This article belongs to the Special Issue Data Mining in Actuarial Science: Theory and Applications)

23 pages, 961 KiB

Open Access | Article

Application of a Vine Copula for Multi-Line Insurance Reserving

by Himchan Jeong and Dipak Dey

Risks 2020, 8(4), 111; https://doi.org/10.3390/risks8040111 - 21 Oct 2020

Cited by 5 | Viewed by 3051

Abstract

This article introduces a novel use of the vine copula, which captures dependence among multi-line claim triangles, especially when an insurance portfolio consists of more than two lines of business. First, we suggest a way to choose an optimal joint loss development model for multiple lines of business that considers marginal distribution, vine copula structure, and choice of family for each pair of copulas. The performance of the model is also demonstrated with Bayesian model diagnostics and out-of-sample validation measures. Finally, we provide an implication of the dependence modeling, which allows a company to analyze and establish the risk capital for the whole portfolio.

(This article belongs to the Special Issue Data Mining in Actuarial Science: Theory and Applications)

12 pages, 827 KiB

Open Access | Feature Paper | Article

Address Identification Using Telematics: An Algorithm to Identify Dwell Locations

by Christopher Grumiau, Mina Mostoufi, Solon Pavlioglou and Tim Verdonck

Risks 2020, 8(3), 92; https://doi.org/10.3390/risks8030092 - 1 Sep 2020

Cited by 1 | Viewed by 3380

Abstract

In this work, a method is proposed for exploiting the predictive power of a geo-tagged dataset as a means of identifying user-relevant points of interest (POI). The proposed methodology is subsequently applied in an insurance context for the automatic identification of a driver’s residence address, based solely on the driver’s pattern of movements on the map. The analysis is performed on a real-life telematics dataset. We have anonymized the considered dataset for the purpose of this study to respect privacy regulations. The model performance is evaluated on an independent batch of the dataset for which the address is known to be correct. The model is capable of predicting the residence postal code of the user with a high level of accuracy, with an F1 score of 0.83. A reliable result of the proposed method could generate benefits beyond the area of fraud, such as general data quality inspections, one-click quotations, and better-targeted marketing.
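
For readers unfamiliar with the reported metric, the sketch below shows how an F1 score is computed as the harmonic mean of precision and recall. The counts are invented for illustration and are not taken from the article, and how the authors aggregate the score across postal codes is not stated in the abstract.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_pos == 0:
        return 0.0  # no correct positive predictions, so F1 is zero by convention
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct predictions, 2 false alarms, 2 misses.
example_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
print(round(example_f1, 2))
```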

(This article belongs to the Special Issue Data Mining in Actuarial Science: Theory and Applications)

Review

26 pages, 657 KiB

Open Access | Editor’s Choice | Review

Machine Learning in P&C Insurance: A Review for Pricing and Reserving

by Christopher Blier-Wong, Hélène Cossette, Luc Lamontagne and Etienne Marceau

Risks 2021, 9(1), 4; https://doi.org/10.3390/risks9010004 - 23 Dec 2020

Cited by 33 | Viewed by 15937

Abstract

In the past 25 years, computer scientists and statisticians have developed machine learning algorithms capable of modeling highly nonlinear transformations and interactions of input features. While actuaries use GLMs frequently in practice, only in the past few years have they begun studying these newer algorithms to tackle insurance-related tasks. In this work, we aim to review the applications of machine learning to the actuarial science field and present the current state of the art in ratemaking and reserving. We first give an overview of neural networks, then briefly outline applications of machine learning algorithms in actuarial science tasks. Finally, we summarize the future trends of machine learning for the insurance industry.

(This article belongs to the Special Issue Data Mining in Actuarial Science: Theory and Applications)
