Article

Analyzing the Influence of Telematics-Based Pricing Strategies on Traditional Rating Factors in Auto Insurance Rate Regulation

Global Management Studies, Ted Rogers School of Management, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada
Mathematics 2024, 12(19), 3150; https://doi.org/10.3390/math12193150
Submission received: 14 September 2024 / Revised: 5 October 2024 / Accepted: 7 October 2024 / Published: 8 October 2024

Abstract

This study examines how telematics variables such as annual percentage driven, total miles driven, and driving patterns influence the distributional behaviour of conventional rating factors when incorporated into predictive models for capturing auto insurance risk in rate regulation. To effectively manage the complexity inherent in telematics data, we advocate for the adoption of non-negative sparse principal component analysis (NSPCA) as a structured approach for data dimensionality reduction. By emphasizing sparsity and non-negativity constraints, NSPCA enhances the interpretability and predictive power of models concerning both loss severity and claim counts. This methodological innovation aims to advance statistical analyses within insurance pricing frameworks, ensuring the robustness of predictive models and providing insights crucial for rate regulation strategies specific to the auto insurance sector. Results show that, to enhance auto insurance risk pricing models, it is essential to address data dimension reduction challenges when integrating telematics data variables. Our findings underscore that integrating telematics variables into predictive models maintains the integrity of risk relativity estimates associated with traditional policy variables.

1. Introduction

Insurance pricing, particularly in auto insurance, has been a pivotal element of the insurance industry, serving not only as a crucial business component but also as an essential tool for ensuring fairness and regulatory compliance [1]. Auto insurance pricing, encompassing both additive and multiplicative algorithms [2,3], has traditionally focused on determining risk relativities for rating factors across various levels. Predictive models, such as generalized linear models (GLMs), have been employed to streamline this process. In auto insurance, a revolutionary approach has emerged, known as usage-based insurance (UBI) [4,5,6,7]. UBI takes advantage of telematics data to predict future claim loss and claim frequency, employing this predictive insight to establish risk factor-level relativities for insurance pricing [8]. This cutting-edge methodology involves harnessing real-time data from telematics devices installed in vehicles, with the primary objective of capturing and analyzing the driving patterns of individuals [9,10,11]. Unlike conventional approaches, UBI introduces a dynamic and personalized dimension to insurance pricing. However, the influence of incorporating telematics data into predictive models for insurance losses remains largely unexplored, particularly from the perspective of rate regulation and the management of aggregate risk. In rate regulation, the primary focus is on determining the relativity of major risk factors used to assess the underlying risks associated with drivers and vehicles. It is important to assess the impact of incorporating telematics variables on the estimation of risk relativity for traditional major risk factors.
One of the primary advantages of adopting UBI lies in its ability to incentivize responsible driving behaviour [12,13,14]. By closely monitoring individual driving habits through telematics data, UBI allows for a more accurate assessment of risk. As a result, drivers who exhibit safe and responsible driving patterns stand to be rewarded with more favourable premium rates. This not only creates a fairer and more personalized insurance pricing structure but also aligns with broader societal goals of promoting road safety and reducing accidents [15,16]. Furthermore, UBI contributes to a paradigm shift in the insurance industry by fostering a proactive approach to risk management. Traditional pricing models rely heavily on historical claim data and generalized assumptions of loss distribution, whereas UBI embraces real-time information, providing insurers with a more granular understanding of risk factors [17]. This not only enhances the accuracy of pricing but also enables insurers to promptly respond to changing driving behaviours and emerging trends [18]. Nevertheless, the implementation of UBI contracts remains relatively slow in North America, a phenomenon that [19] helps to explain using economic theory. Studying how UBI data can enhance the accuracy of insurance pricing and ensure actuarial fairness, as required in regulatory practices, may aid in promoting the adoption of UBI contracts.
As the insurance pricing landscape continues to evolve, embracing innovative solutions like UBI becomes imperative for insurers to stay competitive, adaptive, and aligned with the evolving expectations of both regulators and policyholders. The pressing question at hand revolves around the impact that the inclusion of telematics risk factors in predictive models has on the estimation of loss severity and claim frequency [20,21]. In rate regulation for auto insurance, the traditional methodology involves determining the relativity associated with major factors, such as age group, insured gender, car age, car use, and marital status [22,23]. Companies then employ these estimated risk relativities as benchmarks for individualized insurance pricing, where more risk factors may be involved. However, as telematics data come into play in the pricing process [14,24,25], introducing variables like annual percentage driven, total miles driven, percentage driven during a day, and percentage driven in a week into predictive models, the dynamics of the claim probability distribution or loss cost distribution for each traditional rating variable may undergo changes. This alteration, essentially an impact of telematics variables, has the potential to exert varying effects on the established regulating variables. It becomes crucial to explore whether these changes lead to substantial shifts in the traditional regulating variables or if their influence remains relatively unaffected. This prompts a compelling need to delve into a comprehensive study on the effect of telematics data on the estimation of both claim frequency and claim severity. By understanding how telematics variables influence the outcome of rate regulation in auto insurance, we can gain valuable insights into the relationship between traditional and evolving factors.
In predictive modelling, managing model complexity is essential for understanding the contribution of each variable. To enhance the efficiency of models estimating claim probability and claim amounts using telematics data, we applied non-negative sparse principal component analysis (NSPCA) [26,27] for dimension reduction. This approach simplifies the model and improves interpretability by systematically reducing the dimensionality of the telematics data, providing a more focused input for the predictive models. NSPCA is particularly suitable for telematics data, as variables like driving speed and acceleration are inherently non-negative, aligning naturally with NSPCA’s non-negativity constraint. Additionally, NSPCA’s sparsity constraint identifies a smaller set of influential factors, focusing on key driving behaviours. This enhances interpretability while reducing complexity, which is crucial for developing accurate models for predicting claim counts and amounts. Our method involves applying NSPCA to each telematics risk factor at the observation level, reducing the dimensionality by retaining only the most significant principal components. The integration of sparsity ensures that only the most influential levels, contributing meaningfully to these components, are preserved. This deliberate reduction streamlines the data and enhances the interpretability of the major components. Furthermore, the non-negativity constraint ensures that the resultant combinations of variable levels are both meaningful and practically relevant, reinforcing the effectiveness of the predictive models.
This study aims to explore the impact of various pricing strategies on estimating the risk relativity of traditional policy variables, providing insights into how telematics-based auto insurance pricing influences rate regulation. Following the reduction of telematics data dimensions using NSPCA, three predictive models were constructed for claim counts and claim amounts. Initially, a model was built solely on traditional policy variables, excluding telematics data. Subsequently, a model incorporating telematics data was considered, followed by a combined model using both traditional policy variables and telematics-derived variables. The distributional behaviours of the resulting relativities from these models were compared to empirically calculated ones. This study not only contributes to the refinement of predictive models but also aids in the formulation of more accurate and responsive rate regulation strategies that align with the dynamic landscape of the auto insurance industry. In essence, it allows us to adapt and optimize the regulatory framework to better cater to the evolving needs and complexities of the auto insurance market.
The novelty of this research is the introduction of NSPCA to the analysis of telematics, which helps identify essential components of each telematics variable and reduces dimensionality, enabling more effective management of telematics data. The findings of this study carry substantial implications for auto insurance rate regulation. They affirm that the emphasis on traditional policy variables in rate regulation practices is justified. The integration of telematics data alongside traditional policy variables does not result in a significant deviation in the distribution of key regulatory risk factors. Hence, it represents a suitable pricing strategy for insurance markets that incorporate UBI.

2. Literature Review

The existing literature has predominantly concentrated on the development of statistical models using telematics data, with a primary focus on achieving more precise pricing for auto insurance policies and fostering responsible driving habits. For instance, ref. [24] employed data from a Belgian telematics product specifically targeting young drivers. The research studied how to determine car insurance premiums based on telematics data. By using statistical modelling techniques, specifically, generalized additive models, the research convincingly demonstrated that incorporating telematics variables significantly improves predictive accuracy. The study revealed that gender, once considered a crucial rating variable, becomes unnecessary when assessing expected claim frequency in the presence of telematics variables. The work conducted in [24] shares some similarities with our work, but we address the impact of telematics variables on traditional policy variables including gender from a rate regulation perspective.
Modelling the annual usage of a driver is a key aspect in UBI. In [28], the research contributes to this aspect by using telematics data to enhance auto insurance rating mechanisms. The analysis incorporates the distance travelled per year into a zero-inflated Poisson model, strategically predicting the excess of zero claims. The findings unveil a learning effect, suggesting that extended driving durations may lead to higher premiums, but drivers accumulating longer distances over time may benefit from potential discounts. This underscores the potential of telematics information in not only refining insurance models but also bolstering efforts to enhance overall traffic safety. Another interesting study using telematics data by [29] employed anomaly detection algorithms on vehicles’ trip summaries. The study developed routine and anomaly profiles for each vehicle, finding that features extracted from the vehicles’ profile enhance claim classification in the framework of anomaly-based insurance. Other recent research that uses telematics data for insurance pricing, or for discovering driving patterns that reveal the associated risk level, has appeared in [30,31,32,33].
In addition to telematics data, the utilization of GPS data for insurance ratemaking has emerged as a significant avenue of exploration. A notable study by [34] integrates GPS data into motor insurance ratemaking, highlighting the insurance sector’s potential to determine premium rates based on driver behaviour. Employing count data regression models, the approach incorporates parameters that capture various characteristics of automobile usage known to influence claiming behaviour. By augmenting classical frequency models with telematics information derived from usage-based insurance policies, the study demonstrates the substantial impact of both the distance travelled and driver habits on the expected number of accidents. This, in turn, directly influences the premium of insurance coverage.
Similarly, the investigation carried out by [35] studied the integration of second-by-second GPS data into auto insurance pricing structures. The study showcases the effective use of real-time GPS trajectory data obtained from a traffic app and survey data. Notably, the research underscores the potential of incorporating unique contextual-based risk measurements, extending beyond traditional UBI factors. By comparing driving speed to others on the same road segment, the study introduces an innovative approach to assessing risk. The findings reveal compelling correlations between driving behaviours, such as hard brakes, hard starts, peak-time travel, speeding, deviations from traffic flow, and accident rates. These insights provide valuable guidance for insurance companies venturing into the UBI sector, aiding in the establishment of accurate auto insurance premium rates and the mitigation of underwriting losses. In our research, we delved into the correlation between various telematics variables and their influence on the relative risk associated with conventional rating factors.
In the work conducted by [36], the research scrutinizes the predictive capabilities of telematics data features extracted through speed acceleration heatmaps. The telematics covariates under investigation, which involved K-means classification, principal components, and bottleneck activations from a neural network, exhibited a superiority in out-of-sample prediction for claim frequencies when compared to conventional factors such as driver’s age, as demonstrated in the conducted case study. The numerical examples presented in the study strongly advocate for the adoption of these telematics covariates, especially emphasizing the efficacy of the first principal component and bottleneck activations, as a recommended approach to enhance the precision of car insurance pricing models. Concurrently, the study presented in [37] introduces innovative pricing strategies for German car insurance, specifically designed to seamlessly integrate telematics data into actuarial pricing decisions. The outcomes of the study reveal a substantial impact on pricing decisions, enabling the implementation of usage-based insurance premiums. This is achieved by applying discounts or surcharges based on driving behaviour, marking a significant shift in the traditional approach to car insurance pricing. The study not only showcases the feasibility of incorporating telematics data but also emphasizes its potential to revolutionize the landscape of actuarial decision-making in the context of car insurance.
In a complementary vein, the study conducted by [38] analyzed UBI adopters and discovered the impact of UBI on driving behaviour. The study revealed a 21% reduction in daily average hard-brake frequency after six months of UBI adoption, indicating a substantial improvement in the safety of driving habits. Furthermore, the research identified heterogeneous effects across demographic groups, highlighting that younger drivers and women tend to exhibit more significant improvements. Moreover, the correlation between negative feedback and economic incentives with enhanced driving behaviour emphasizes the societal benefits of UBI, particularly in the realm of improved road safety. These insights collectively contribute to the multifaceted impacts of UBI on both insurers and drivers. As the study demonstrates that UBI pricing strategies may rely on traditional rating variables, particularly those used for regulatory purposes, it is crucial to explore the extent of these influences.
While there exists limited research focusing on addressing the regulatory impact of UBI on auto insurance pricing, a notable study by [6] fills this gap by investigating the effects of UBI on private passenger auto liability insurers’ underwriting performance. The findings of this study reveal that UBI significantly enhances underwriting performance by reducing the loss ratio, especially among early adopters. Notably, as UBI matures, its benefits become more pronounced, leading to a notable increase in market share for early adopters and an overall positive impact on return on assets and return on equity. This underscores the growing importance of UBI in reshaping the landscape of auto insurance pricing, with tangible benefits emerging over time.

3. Materials and Methods

3.1. Synthetic Data

Telematics or GPS data, while valuable for quantifying the relationship between driving behaviour and insurance claims, are often limited in scope for research purposes. This is due to several factors: access restrictions, privacy concerns, and the proprietary nature of telematics data, which are typically owned by insurance companies. As a result, researchers may only have access to small or incomplete datasets that are insufficient for robust statistical analysis or for developing highly predictive models. To address these limitations, researchers often resort to data augmentation techniques such as oversampling or using synthetic data generation methods like the Synthetic Minority Oversampling Technique (SMOTE). These techniques are particularly useful when the available data are imbalanced or too small for effective model training. Oversampling involves replicating data observations from underrepresented classes to balance the dataset, while SMOTE generates new synthetic data points by interpolating between existing observations. In the case of telematics data, SMOTE can be applied to create synthetic driving behaviour patterns that mimic real-world variations, helping to enlarge the dataset and reduce the risk of model overfitting. By using synthetic data generation techniques, researchers can create larger, more balanced datasets, which not only improve model training but also enhance the reliability of predictive models. This approach allows for better generalization, making it possible to test models on a more comprehensive range of driving behaviours, and, ultimately, improving the robustness of research findings.
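To make the interpolation idea behind SMOTE concrete, the following minimal R sketch generates synthetic observations from a numeric feature matrix; the function name smote_like, the neighbourhood size, and the example columns are illustrative assumptions rather than part of the study's actual data pipeline.

```r
# Minimal SMOTE-style interpolation for numeric telematics features (illustrative only).
smote_like <- function(X, n_new, k = 5) {
  X <- as.matrix(X)
  idx <- sample(nrow(X), n_new, replace = TRUE)          # seed observations
  new_pts <- t(sapply(idx, function(i) {
    d  <- sqrt(rowSums(sweep(X, 2, X[i, ])^2))           # Euclidean distances to all rows
    nn <- order(d)[2:(k + 1)]                            # k nearest neighbours (skip itself)
    j  <- sample(nn, 1)                                  # pick one neighbour at random
    u  <- runif(1)                                       # interpolation weight in (0, 1)
    X[i, ] + u * (X[j, ] - X[i, ])                       # synthetic point on the segment
  }))
  as.data.frame(new_pts)
}

# Hypothetical usage: enlarge the minority class of claim records.
# synthetic <- smote_like(claims_minority[, c("Total.miles.driven", "Annual.pct.driven")], n_new = 500)
```

In practice, an established SMOTE implementation would normally be preferred; the sketch only shows how each synthetic point is placed on the line segment between an observation and one of its nearest neighbours.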
The dataset used in this research originates from a synthetic dataset comprising 100,000 auto usage-based insurance (UBI) policies, as provided by [39]. This synthetic dataset encompasses three distinct types of variables: traditional policy variables, driving-pattern-related variables, and response variables, specifically claim amounts and claim occurrence. Across the dataset, there are 52 variables, encompassing essential factors such as insured age group, car age range, annual mileage driven, and credit scores. The dataset also includes telematics variables that detail driving patterns, including metrics such as acceleration and braking speeds, as well as the daily percentage of driving within a week. Additionally, certain variables capture the intensity of left or right turns. In the dataset, each telematics data variable is linked to a specific level of a corresponding telematics risk factor. These variables are expected to demonstrate correlations, as they authentically align with the same underlying risk factor. For example, the percentage of driving is presented in terms of the weekly distribution of driving relative frequencies for each day within a week. Consequently, we end up with seven correlated variables, each capturing the relative frequency of driving for a specific day. Analogous variables include those related to acceleration, braking, the percentage of driving over an extended period during a day, the percentage of driving during morning and afternoon rush hours, as well as the intensities of left and right turns. In essence, these variables share commonality in their measurement due to their connection to broader driving behaviours and patterns. These data variables mentioned above require dimension reduction due to their correlation among levels within the factor. The process of dimension reduction and how we deal with the reduced levels of the telematics variable is illustrated in Figure 1. There is also a set of single-level factor variables such as annual percentage driven, total mileage driven, and the average number of days driven per week.
It is essential to highlight that this dataset follows a transactional structure, where each record denotes the presence or absence of a claim. In cases where a claim is reported, the associated record includes the corresponding loss amount. For a more detailed exploration of telematics variables and a comprehensive descriptive analysis of the dataset’s variables, interested readers can refer to [39].

3.2. Non-Negative Sparse PCA

In statistical multivariate analysis, NSPCA extends classical PCA by imposing constraints of non-negativity and sparsity on the principal components. Another related method is Non-negative Sparse Matrix Factorization (NSMF), designed to decompose matrices while adhering to sparsity and non-negativity constraints. However, the PCA-based approach is deemed more advantageous than matrix decomposition in this research. This is attributed to its focus on concentrating variation into major principal components, facilitating an additional layer of sparsity control. This paper utilized such an approach to achieve dimension reduction, as detailed in the subsequent sections. Let $X^{(i)}$ denote the data matrix corresponding to the $i$th group of telematics data variables, where $i = 1, 2, \ldots, 7$ in this work. They include daily percentage driven per week, acceleration intensity, braking intensity, rush hours percentage driven, left turn intensity, and right turn intensity. The objective function for NSPCA is given by
$$\underset{Z^{(i)},\, D^{(i)}}{\text{minimize}} \;\; \big\| X^{(i)} - Z^{(i)} D^{(i)} \big\|_F^2 + \lambda \big\| D^{(i)} \big\|_1, \quad \text{subject to } {Z^{(i)}}^{\top} Z^{(i)} = I, \;\; Z^{(i)} \geq 0,\; D^{(i)} \geq 0.$$
Here, $\|\cdot\|_F$ denotes the Frobenius norm and $\|\cdot\|_1$ the $L_1$-norm. Each $X^{(i)}$ has $n$ samples and $p^{(i)}$ variables, where $n = 100{,}000$ and the value of $p^{(i)}$ depends on $i$; for instance, for daily percentage driven per week, $p^{(i)} = 7$. $Z^{(i)}$ is the matrix of principal component scores of $X^{(i)}$, $D^{(i)}$ is the matrix of principal component loadings, and $\lambda$ is the sparsity parameter. When $Z^{(i)}$ and $D^{(i)}$ are not subject to the non-negativity constraint, the minimization problem reduces to sparse PCA, and if, further, the sparsity control is removed (i.e., $\lambda = 0$), the problem becomes traditional PCA via matrix decomposition. The introduction of sparsity to regulate the cardinality of the principal component loadings represents the first layer of our approach. Additionally, we incorporate a second layer of sparsity control, in which we selectively retain only the most significant principal components. These retained components serve as the foundation for our telematics-based predictors, which, in turn, form the basis for the predictive models we have developed. To obtain these non-negative sparse principal components, the nsprcomp [40] package for R 4.3.0 (see https://www.r-project.org) was used for implementation.
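As a minimal sketch of this dimension-reduction step, the nsprcomp function can be applied to one group of telematics variables as follows; the matrix name X_drive, the number of components, and the cardinality bound are illustrative choices, and note that nsprcomp imposes the non-negativity constraint on the loading vectors.

```r
library(nsprcomp)

# X_drive: an n x 7 numeric matrix of daily percentage driven per week (one column per weekday).
set.seed(1)
nspca_drive <- nsprcomp(X_drive,
                        ncomp = 4,    # second layer: keep only the leading components
                        k     = 3,    # first layer: cardinality bound per loading vector
                        nneg  = TRUE) # non-negativity constraint on the loadings

nspca_drive$rotation                              # sparse, non-negative loadings (the D matrix)
scores <- nspca_drive$x                           # component scores (the Z matrix) used as predictors
nspca_drive$sdev^2 / sum(apply(X_drive, 2, var))  # approximate share of variance explained
```

The retained score columns then enter the claim frequency and severity models described in the following subsections.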
As we mentioned, NSPCA is an extension of classical PCA that has been widely adopted for high dimensional data analysis, especially when interpretability is a priority. The non-negativity constraint allows for more meaningful interpretations of the principal components, especially when dealing with data that should not take negative values, such as telematics variables. Applications of NSPCA have been explored in fields like genomics, image processing, and finance, where it has proven useful in extracting interpretable components from complex datasets while reducing dimensionality [41,42]. The incorporation of sparsity further enhances the model’s utility by identifying a subset of factor levels that are most influential, thereby reducing noise and simplifying interpretation. This is especially advantageous in telematics data, where only a few driving behaviours may meaningfully impact claim outcomes.
Despite its advantages, NSPCA does introduce certain computational and practical challenges. One key limitation is its computational complexity, especially when applied to large datasets. The iterative optimization process required to enforce both non-negativity and sparsity can be computationally intensive, particularly when dealing with high dimensional data. This is why we chose to reduce the dimensionality within each individual telematics variable, rather than combining all telematics variables together. Moreover, the choice of the sparsity parameter, λ , is crucial to balance between sparsity and model fit, and determining an optimal λ often involves cross-validation, adding further computational demands. The Expectation Maximization (EM) algorithm [40] was proposed to address some of these challenges, but the convergence of this method can be slow in practice, especially for large datasets.
Another practical consideration is the model interpretability–performance trade-off [43]. While the non-negativity constraint enhances the interpretability of the principal components, it may limit the model’s ability to capture more subtle relationships between variables. This trade-off has been discussed in various applications, with studies showing that PCA can sometimes result in a loss of predictive power [44] compared to more flexible models that do not impose such constraints. In our application, we mitigate this risk by incorporating an additional layer of sparsity control, retaining only the most significant components, which ensures that the model focuses on the most critical driving behaviours influencing claim risk.

3.3. Modelling with Traditional Policy Variables

Auto insurance pricing based on traditional rating variables involves assessing risk factors such as age group, insured gender, car age, car use, and marital status. Insurance companies estimate the risk relativities associated with these variables to establish benchmarks for pricing. The traditional approach relies on historical data and generalized assumptions to determine the impact of these variables on the likelihood and severity of claims. This method provides a standardized framework for pricing, with premiums adjusted based on the perceived risk associated with individual drivers and their vehicles. In predicting claim frequency using traditional policy variables, the following generalized linear model is constructed:
Occurrence ~ Insured.AgeGroup + Insured.sex + Car.age.range + Marital + Car.use + Credit.score.range + Location.Cluster + Annual.miles.drive.range + Years.noclaims.range.
Similarly, the model used to predict the claim amount is given as follows:
Amount ~ Insured.AgeGroup + Insured.sex + Car.age.range + Marital + Car.use + Credit.score.range + Location.Cluster + Annual.miles.drive.range + Years.noclaims.range.
It should be noted that all traditional policy variables are grouped and their coefficients are estimated for each level within the factor. The purpose of grouping is to improve the credibility of variables used in pricing and to maintain relative stability for each risk level.
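A minimal sketch of fitting these two models in R is given below, assuming the policy records sit in a data frame named ubi with the column names used above; the Poisson (identity link) and Gamma (log link) specifications follow the distributional choices described in Section 3.5, and restricting the severity model to records with a positive claim amount is an assumption of the sketch.

```r
# Frequency model on traditional policy variables
# (an identity-link Poisson fit may require starting values via `start` in practice).
freq_trad <- glm(Occurrence ~ Insured.AgeGroup + Insured.sex + Car.age.range +
                   Marital + Car.use + Credit.score.range + Location.Cluster +
                   Annual.miles.drive.range + Years.noclaims.range,
                 family = poisson(link = "identity"), data = ubi)

# Severity model on the same rating factors, fitted to claim records only.
sev_trad <- glm(Amount ~ Insured.AgeGroup + Insured.sex + Car.age.range +
                  Marital + Car.use + Credit.score.range + Location.Cluster +
                  Annual.miles.drive.range + Years.noclaims.range,
                family = Gamma(link = "log"),
                data = subset(ubi, Amount > 0))

summary(freq_trad)  # level-wise coefficient estimates for each grouped factor
summary(sev_trad)
```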

3.4. Modelling with Telematics Data Only

When auto insurance pricing is exclusively based on telematics data, the insurance industry undergoes a significant shift towards a more dynamic and personalized model. Telematics data, encompassing real-time information about driving behaviour collected through devices installed in vehicles, take precedence as the primary factor influencing insurance premiums. This approach allows for a more detailed assessment of risk by focusing on individual driving patterns and habits. One of the notable implications of relying solely on telematics data is the heightened level of personalization in insurance premiums. Pricing becomes intricately tailored to the specific driving behaviour of each individual, potentially resulting in lower premiums for safe drivers and higher costs for those engaging in riskier behaviours. This departure from traditional models emphasizes a more nuanced understanding of risk based on real-time data.
The models used to predict claim probability and claim amounts become the following based on the data we used:
Occurrence ~ Annual.pct.driven + Total.miles.driven + Pct.drive1 + Pct.drive2 + Pct.drive3 + Pct.drive4 + Pct.drive.hrs + Rush.ampm + Avgdays.week + Accel1 + Accel2 + Brake1 + Brake2 + Left.turn1 + Left.turn2 + Right.turn1 + Right.turn2,
where the variables Pct.drive1, Pct.drive2, Pct.drive3, and Pct.drive4 represent the first four retained PCs of the daily percentage of driving over a week; Accel1 and Accel2 are the first two retained PCs of the acceleration levels; the variables Brake1 and Brake2 correspond to the first two PCs of the braking levels. Similarly, the variables Left.turn1 and Left.turn2 represent the first two PCs of left turn intensity, while Right.turn1 and Right.turn2 represent the first two PCs of right turn intensity. The dimension reduction via non-negative sparse PCA makes the model more interpretable, as the number of variables included in the model has been significantly reduced. Similarly, the model used to predict the claim amount is given as follows:
Amount ~ Annual.pct.driven + Total.miles.driven + Pct.drive1 + Pct.drive2 + Pct.drive3 + Pct.drive4 + Pct.drive.hrs + Rush.ampm + Avgdays.week + Accel1 + Accel2 + Brake1 + Brake2 + Left.turn1 + Left.turn2 + Right.turn1 + Right.turn2.
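Assuming the retained NSPCA score columns and the single-level telematics factors have been gathered into one data frame, the telematics-only models can be assembled as in the sketch below; the object names ubi_tel, fit_drive, fit_accel, fit_brake, fit_left, and fit_right are illustrative, not objects defined in the paper.

```r
# Combine single-level telematics factors with the retained NSPCA scores.
ubi_tel <- data.frame(
  Annual.pct.driven  = ubi$Annual.pct.driven,
  Total.miles.driven = ubi$Total.miles.driven,
  Pct.drive.hrs      = ubi$Pct.drive.hrs,
  Rush.ampm          = ubi$Rush.ampm,
  Avgdays.week       = ubi$Avgdays.week,
  Pct.drive1 = fit_drive$x[, 1], Pct.drive2 = fit_drive$x[, 2],
  Pct.drive3 = fit_drive$x[, 3], Pct.drive4 = fit_drive$x[, 4],
  Accel1 = fit_accel$x[, 1], Accel2 = fit_accel$x[, 2],
  Brake1 = fit_brake$x[, 1], Brake2 = fit_brake$x[, 2],
  Left.turn1  = fit_left$x[, 1],  Left.turn2  = fit_left$x[, 2],
  Right.turn1 = fit_right$x[, 1], Right.turn2 = fit_right$x[, 2],
  Occurrence = ubi$Occurrence,
  Amount     = ubi$Amount
)

# Telematics-only frequency and severity models (same family/link choices as before).
freq_tel <- glm(Occurrence ~ . - Amount, family = poisson(link = "identity"), data = ubi_tel)
sev_tel  <- glm(Amount ~ . - Occurrence, family = Gamma(link = "log"),
                data = subset(ubi_tel, Amount > 0))
```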
Moreover, the use of telematics data enables real-time adjustments in pricing. With continuous updates from telematics devices, insurers can promptly adapt their pricing structures to reflect any changes in driving habits. This responsiveness contributes to a more adaptive and immediate approach to assessing and managing risk factors compared to traditional models that rely on historical data and assumptions. While the move towards telematics-based pricing brings several benefits, challenges and considerations exist. Issues related to privacy, data security, and potential bias in interpreting telematics data must be carefully addressed. Additionally, the transition from traditional rating variables to a telematics-centric approach requires thoughtful consideration of regulatory frameworks and industry standards. In essence, the industry’s evolution towards telematics-based pricing represents a broader shift towards a more data-driven, personalized, and responsive model in the auto insurance landscape.

3.5. Modelling with Traditional Policy Variables and Telematics Variables

The last modelling strategy was to include both traditional policy variables and telematics variables. This hybrid approach offers a more comprehensive and accurate assessment of individual risk profiles. By integrating demographic factors like age and gender with real-time driving behaviour insights, insurers can create better models that reflect both historical characteristics and current habits. This precision in risk assessment contributes to fairer and more personalized insurance premiums. Moreover, the inclusion of traditional rating variables alongside telematics data enhances the overall fairness and equity of the pricing model. This is particularly significant in ensuring that insurance premiums are not solely determined by a specific set of factors, mitigating potential biases. The combination of traditional and telematics data creates a more balanced and objective approach, aligning with industry efforts to promote transparency and fairness in insurance pricing. The integration of traditional and telematics data also enhances the robustness of predictive models against data gaps and uncertainties. In situations where telematics data may be incomplete or unavailable, traditional variables act as stabilizing factors, ensuring the reliability and adaptability of the pricing model. This resilience is crucial for insurers to navigate changing circumstances without compromising the accuracy of risk assessments.
From a regulatory perspective, the impact of combining traditional and telematics data on auto insurance rate regulation is significant. Many regulators require the consideration of specific traditional rating variables. By incorporating these alongside telematics data, insurers can ensure compliance with regulatory standards. The comprehensive approach aligns with regulatory expectations while allowing insurers to benefit from the valuable insights derived from driving behaviour data. In the analysis section, we will explore the relativity patterns of traditional policy variables based on the prediction results of claim occurrence and claim amount for each policyholder.
In this study, we used a generalized linear model to analyze the data, opting for a Poisson distribution to represent the number of claims and a Gamma distribution to account for the variability in claim amounts. We utilized the identity link function for the Poisson distribution and applied a log-link function for the Gamma distribution, aligning with common practices in auto insurance rate regulation for modelling claim frequency and severity. For telematics data, this is a prudent strategy for several reasons. Firstly, telematics data often exhibit characteristics such as skewness and heteroscedasticity in loss amounts, which are well-captured by the flexibility of the gamma distribution. This distribution allows for modelling non-negative continuous data with a right-skewed distribution, making it suitable for capturing the variability in loss amounts observed in insurance claims data. Secondly, the Poisson distribution is commonly employed for modelling count data, such as the frequency of insurance claims in telematics datasets. It naturally accommodates the discrete nature of claim counts and is particularly effective when dealing with rare events, which is often the case in insurance claims data. By incorporating these distributions into a GLM framework, we can effectively model both the frequency and severity of claims simultaneously, providing a comprehensive understanding of the underlying risk factors in telematics data. This approach enables insurers to better assess and manage risks, leading to more accurate pricing and improved decision-making processes. Although alternative error functions were not explored in this work, the rationale behind our model selection is rooted in its practical applicability within the insurance industry.

3.6. Some Discussions on Methodologies

In predictive modelling, achieving a manageable model size is pivotal for gaining deeper insights into the contribution of each variable to the model-building process. This underscores the primary rationale behind our decision to implement dimension reduction on each telematics factor, characterized by different levels. This strategic approach is aimed at effectively reducing the model’s complexity and facilitating a better understanding of the contribution of each factor level.
NSPCA and NSMF are two methods often employed in machine learning and high dimensional data analysis that share certain fundamental principles. Both methods incorporate a non-negativity constraint, ensuring that the derived components or factors are non-negative, aligning with practical interpretations in various real-world problems. Additionally, both NSPCA and NSMF leverage a sparsity constraint, aiming to identify a set of components or factors with a substantial number of zero elements. This sparsity control is advantageous for promoting interpretability and efficiency in representing complex datasets like telematics data. Moreover, both techniques often find application in dimensionality reduction, seeking to condense the information in the data into a reduced set of non-negative and sparse components that capture essential patterns. Nonetheless, despite these commonalities, NSPCA and NSMF diverge in their primary objectives and structures. NSPCA focuses on extracting a set of sparse and non-negative principal components that maximize the variance of the data. These components are orthogonal to each other, emphasizing the capture of uncorrelated directions of maximum variance. In contrast, NSMF involves decomposing a given matrix into two non-negative matrices. The objective is to find a sparse and non-negative representation of the input data, where the components do not need to be orthogonal. This reflects a different approach, emphasizing additive combinations of basis vectors to approximate the input matrix.
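To make the contrast concrete, the two problems can be written side by side in generic notation; the NSMF formulation shown is one common variant with an $L_1$ penalty on the coefficient matrix, included for illustration rather than as the specific formulation of any cited work.

NSPCA: $\min_{Z \geq 0,\; D \geq 0} \; \| X - Z D \|_F^2 + \lambda \| D \|_1, \quad \text{subject to } Z^{\top} Z = I;$

NSMF: $\min_{W \geq 0,\; H \geq 0} \; \| X - W H \|_F^2 + \lambda \| H \|_1.$

The orthogonality constraint on $Z$ is what distinguishes NSPCA’s variance-concentrating components from the purely additive factorization sought by NSMF.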
In this research, NSPCA stood out as a more suitable technique for the dimension reduction of telematics data used in the predictive modelling of claim counts and claim amounts for several reasons. Our telematics data involve variables such as driving speed, acceleration, and other behavioural factors, which inherently should not have negative values. The non-negativity constraint enforced by NSPCA aligns seamlessly with the nature of telematics data, preserving their inherent characteristics in the process of dimensionality reduction. Moreover, the sparsity constraint imposed by NSPCA is particularly advantageous in handling the high-dimensional telematics data. This constraint promotes a sparse set of principal components, enabling the identification of a subset of influential features. In the context of telematics data, where specific driving behaviours may significantly impact claims, having sparse principal components facilitates a more focused and interpretable representation of the telematics data. The interpretability of the extracted components is a crucial aspect, especially when dealing with telematics data for insurance purposes. NSPCA’s ability to induce sparsity enhances interpretability, allowing actuaries and insurance professionals to attribute significance to specific driving behaviours. This feature contributes to a better understanding of the impact of various factors on claim counts and amounts. In addition to interpretability, the dimension reduction capabilities of NSPCA are particularly relevant for telematics data. By reducing dimensionality while preserving the most relevant features, NSPCA aids in building predictive models that avoid overfitting and enhance generalization performance. By focusing on capturing the most significant variations in the data, this technique identifies key driving behaviours that contribute to a higher likelihood of claims or affect claim amounts. This emphasis on influential factors can contribute to the development of more accurate predictive models based on telematics data.
However, the application of NSPCA introduces a potential trade-off between enhancing interpretability and maintaining predictive power. While NSPCA enforces non-negativity, which aligns well with the nature of telematics variables, this restriction may limit the flexibility of the model. In many machine learning methods, data transformations like standardization, where variables are centred around the mean and scaled to unit variance, are common pre-processing steps that allow variables to assume both positive and negative values. Such transformations can improve interpretability by representing deviations from the mean and enabling a more balanced understanding of variable contributions. The non-negativity constraint in NSPCA may, however, hinder the ability to capture such variations, potentially leading to a reduction in predictive power. Standardized variables may reveal patterns or relationships that could be obscured when constrained to non-negative values. To address this issue, one possible remedy is to reverse the data transformation (i.e., de-standardize) prior to applying NSPCA. By transforming the data back to their original non-negative scale after standardization, the method’s non-negativity constraint can be upheld while still benefiting from pre-processing techniques that aid in balancing and scaling the data.
Alternatively, NSPCA could be applied selectively, avoiding pre-processing steps that introduce negative values where this constraint would conflict with model assumptions. This selective approach would ensure that NSPCA’s strengths in promoting interpretability through sparsity are maintained without undermining the potential for predictive accuracy. Ultimately, careful consideration must be given to whether the enhanced interpretability of NSPCA compensates for any loss in predictive power, with additional strategies employed to mitigate the effect of non-negativity restrictions on standardized data. This balance between interpretability and predictive accuracy is crucial and should be tailored to the specific goals and characteristics of the data being analyzed.

4. Results

In this section, we provide a comprehensive analysis of the results obtained through our proposed methodology. Table 1 and Table 2 showcase the principal component loading vectors for each telematics variable across different principal components. As there are seven distinct telematics variables, each with varying levels, the maximum number of components varies accordingly. An insightful observation from the results presented in the tables is the discernible impact of imposed sparsity, resulting in a reduction in the cardinality of each vector. This reduction facilitates the exclusion of less significant levels within each variable, enabling a focused capture of the primary effects from more substantial levels. Furthermore, the obtained principal components exhibit clear differentiation, aiding us in the selection of the final retained principal component for predictive modelling of both claim counts and claim amounts. Overall, these findings underscore the efficacy of our methodology in distilling meaningful insights from complex telematics data, thereby enhancing the predictive accuracy and interpretability crucial for informed decision-making in insurance analytics.
The telematics variable representing the percentage of daily driving per week is segmented into seven levels, each corresponding to a specific day of the week. Through the application of non-negative sparse PCA, our analysis reveals that the primary variation (approximately 75%) is captured within the first four components. By leveraging the sparsity constraint on the principal component loading vector, we discern distinct driving patterns: the first component characterizes weekend driving, the second elucidates patterns observed on Wednesday and Thursday, the third predominantly reflects Friday’s patterns, and the fourth corresponds to Monday and Tuesday driving behaviours. This clustering by non-negative sparse PCA effectively categorizes driving habits into four distinct groups, each aligned with specific days or combinations of days within the week. Such insights offer a nuanced understanding of driving behaviours over the course of a week. Two additional driving metrics pertain to extended durations and rush-hour travel. Extended driving is categorized into three levels: more than 2 h, more than 3 h, and more than 4 h. Meanwhile, rush-hour driving encompasses both morning and afternoon peak periods. Our analysis reveals that maintaining multiple levels for these factors is unnecessary. Through a weighted average of the percentage of driving across the original levels, we can consolidate these two factors into single-level constructs. These newly derived factors serve as refined predictors for predictive modelling of claim counts and claim amounts. This streamlined approach enhances the efficiency of our predictive models by capturing the essence of extended and rush-hour driving in a more concise and representative manner.
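As a small illustration of this consolidation (the column names and weights below are hypothetical rather than taken from the dataset documentation), the single-level construct can be computed as a weighted average of the level-wise driving percentages:

```r
# Collapse the three extended-driving levels into one predictor via a weighted average.
w <- c(2, 3, 4)          # hypothetical weights for the 2 h, 3 h, and 4 h levels
w <- w / sum(w)
ubi$Pct.drive.hrs <- as.numeric(
  as.matrix(ubi[, c("Pct.drive.2hrs", "Pct.drive.3hrs", "Pct.drive.4hrs")]) %*% w
)
```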
Acceleration and braking intensity serve as pivotal telematics variables aimed at capturing driving behaviour. Within the dataset, each of these variables is delineated into six levels of intensity, reflecting the degree to which a vehicle accelerates or decelerates. Higher intensities in acceleration or deceleration correlate with increased driver risk. Remarkably, both acceleration and deceleration exhibit a distinct threshold of intensity, typically falling between 9 and 11 miles per hour. Consequently, the driving patterns associated with these actions can be effectively categorized into two distinct levels: intensities below 10 miles per hour and intensities above 10 miles per hour. This delineation suggests that merely two principal components, collectively explaining approximately 90% of the data variation, suffice for subsequent predictive modelling pertaining to these variables. Comparable to acceleration and deceleration, the intensity of left and right turns serves as a crucial metric for gauging driving safety. Analysis indicates a discernible cut-off range between 9 and 10 miles per hour, indicating the need for two principal components to effectively encapsulate these data. Consequently, a new factor is constructed, with intensity below 10 miles per hour constituting one level, and intensity above 10 miles per hour forming the other. Together, these two principal components elucidate nearly 100% of the variation in turn intensity, providing comprehensive insights into driving behaviours.
We further examined how various pricing strategies impact traditional policy variables, encompassing years of no claims, annual mileage driven, credit scores, insured age, and car age. Our focus on these particular variables stemmed from their availability within the dataset. Figure 2 and Figure 3 present a comparison of claim probability and loss costs concerning traditional policy variables between empirical estimates and predicted values derived from a model incorporating only telematics variables. The telematics variables utilized for forecasting claim counts and loss costs comprise a reduced set obtained through non-negative sparse PCA. Notably, we discern significant disparities for specific levels within the policy variables. For instance, concerning years of no claims surpassing 40, the model overestimates loss costs. Similarly, for loss costs linked to credit scores, those associated with lower scores, such as 700 and below, are underestimated. A similar situation arises with insured age and car age, wherein certain levels witness substantial underestimation while others experience overestimation within the policy variable.
A finding emerges from the analysis: the predictive model relying solely on telematics variables demonstrates superior performance in recovering empirical patterns for predicting claim counts compared to loss prediction. This observation suggests the potential capacity of telematics variables to reflect the influence of driving patterns on traditional policy variables, thus underscoring the critical necessity of factoring in such impacts when regulating insurance rates. However, it becomes evident that solely considering telematics variables in predicting claim counts or claim amounts may not suffice, given the substantial deviation from empirical results. This highlights the complexity of insurance risk assessment and the importance of incorporating multiple factors beyond just telematics data to achieve more accurate predictions and better inform rate-making practices.
Telematics variables are directly related to driving behaviour, such as acceleration, braking, and cornering. By analyzing these driving behaviours, insurers can more accurately assess the risk profile of individual drivers and predict the likelihood of future accidents, thereby influencing claim frequency predictions. However, when it comes to modelling claim amounts, telematics data may have a lesser impact, as factors influencing severity, such as the cost of repairs or medical expenses, are often influenced by external variables beyond driving behaviour. While telematics data can still provide valuable insights into driving habits that may indirectly affect claim amounts, such as the frequency of highway driving, its primary strength lies in its ability to improve the accuracy of claim frequency predictions by identifying risky driving behaviours.
Figure 4 and Figure 5 illustrate the comparison among empirical estimates, predictions made by the model using only traditional policy variables, and those obtained from the model incorporating both traditional policy variables and a reduced set of telematics data. Notably, we observe no significant deviation in either predicted claim frequency or predicted loss cost. This observation underscores the importance of including traditional policy variables when forecasting both claim counts and claim amounts. Given that telematics variables play a role in determining individual premiums and their predicted outcomes significantly impact the distribution of claim probabilities or claim amounts for traditional policy variables, we infer that predictive modelling using both types of variables surpasses alternative pricing strategies.
Table 3 and Table 4 present the estimated model coefficients and their associated standard deviations for both predictive models used in predicting claim probability and claim amounts. These models incorporate both traditional policy variables and a reduced set of telematics variables. Our analysis reveals that the inclusion of insured gender and marital status does not yield a significant impact on either claim probability or claim amount predictions. However, variables such as insured age, car age, credit score, location, annual miles driven, and years of no claims exhibit statistically significant effects when predicting claim frequency. For predicting claim amounts, however, the set of significant variables is streamlined to car age, credit score, and location. This finding suggests that, while more traditional policy variables play a significant role in predicting claim frequency, claim amounts are predominantly influenced by car age, credit score, and location, aligning with common practices in auto insurance pricing, where the most influential variables are identified. Moreover, all telematics variables, except for right turn, demonstrate statistical significance. Additionally, there are instances where certain levels within a variable may not appear statistically significant compared to the base level. This observation hints at the possibility of further reducing the telematics variable set. However, given their manageable size, a further reduction could potentially diminish the predictive power of the model. Thus, despite the potential for further refinement, maintaining the current set of telematics variables is advisable to preserve the model’s effectiveness in predicting claim frequency and claim amounts.
After computing and comparing the risk relativities using loss cost based on the model coefficients obtained with and without the inclusion of telematics data, we present the findings in Figure 6 and Table 5. Our analysis reveals noteworthy implications regarding the impact of telematics variables on risk relativities. Surprisingly, the inclusion of telematics variables in the predictive models for claim probability and claim amount does not result in significant changes in loss cost relativities for most factor levels, with the exception of a few levels of location and the majority of levels of the annual miles driven variable.
Specifically, we observe a significant increase in risk relativity for most levels following the inclusion of telematics variables, particularly in relation to the usage of vehicles. This finding sheds light on the importance of vehicle usage patterns in determining risk profiles and underscores the value of telematics data in refining risk assessment methodologies. By revealing variations in risk relativities associated with different levels of location and annual miles driven, the study highlights the nuanced impact of telematics variables on insurance risk, offering insights that can inform more tailored pricing strategies and underwriting practices.
Overall, while the inclusion of telematics variables may not lead to substantial changes in loss cost relativities across most factor levels, the observed increase in risk relativity for certain variables underscores the significance of telematics data in enhancing risk assessment accuracy and refining insurance pricing models. These findings provide valuable insights for insurers seeking to leverage telematics technology to better understand and mitigate insurance risk, ultimately leading to more informed decision-making and improved competitiveness in the insurance market.
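The relativity comparison summarized above can be sketched as follows, assuming fitted frequency and severity models from the combined specification (here named freq_comb and sev_comb) and a data frame ubi_all holding both the traditional policy variables and the reduced telematics predictors; taking a level’s loss cost relativity as its mean predicted loss cost relative to the overall mean is an illustrative convention, not necessarily the paper’s exact procedure.

```r
# Predicted loss cost per policy record: expected claim count times expected claim amount.
pred_cost <- predict(freq_comb, newdata = ubi_all, type = "response") *
             predict(sev_comb,  newdata = ubi_all, type = "response")

# Loss cost relativity of each level of a rating factor, normalized to the overall mean.
rel_by_level <- function(factor_level) {
  tapply(pred_cost, factor_level, mean) / mean(pred_cost)
}

rel_by_level(ubi_all$Car.age.range)            # relativities across car-age levels
rel_by_level(ubi_all$Annual.miles.drive.range) # relativities across annual-mileage levels
```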

5. Some Discussions on Statistical Analysis

This study provides an in-depth exploration of the influence of telematics variables in conjunction with traditional rating variables within the framework of rate regulation. While some variables, such as gender and marital status, did not emerge as statistically significant in the model, their inclusion is crucial for assessing pricing fairness by estimating risk relativity across various demographic groups. From a regulatory perspective, retaining these variables is essential, as they ensure that risk relativity is adequately captured within the model. Although these variables may not strongly enhance predictive power, they play a critical role in helping maintain fairness in pricing, which is a central objective in rate regulation. It is important to distinguish between the objectives of rate regulation and those of insurance pricing models used by insurance companies. In pricing models, where numerous rating variables from diverse sources are employed for determining insurance premiums, variable selection is a key step in reducing redundancy and enhancing efficiency. However, the goal of rate regulation is not merely to improve predictive performance but to establish benchmark values for regulatory risk factors. These benchmarks, often based on traditional policy variables selected through actuarial best practices, help ensure that pricing remains equitable and consistent across different demographic groups, or other criteria.
In this study, instead of focusing on variable selection for inclusion or exclusion, model selection criteria such as the Akaike Information Criterion (AIC) and log-likelihood were used to evaluate the impact of different model configurations, while keeping the same set of predictors. This approach allowed us to assess model performance without compromising the integrity of key variables that are essential for regulatory purposes.
In many European countries, the use of gender as a rating factor in insurance pricing is prohibited, and a similar approach is followed in Canada. However, in the Canadian context, gender and marital status are combined with age groups to form a rating factor known as ‘Type of Use’, which is widely and lawfully employed by auto insurance companies. From a rate regulation perspective, the inclusion of these variables is crucial as it allows for the derivation of their risk relativity.
Telematics variables may also correlate with traditional policy variables such as gender and marital status; for example, male drivers tend to take turns at higher speeds and may exhibit more aggressive driving behaviours than female drivers. These variables represent a new frontier in understanding how driving-pattern data can be integrated into insurance rate regulation, and this study is one of the first attempts to investigate their impact on modelling claim frequency and severity specifically for rate regulation purposes. The telematics variables employed here demonstrate considerable potential for regulating UBI.

6. Impact of Telematics Data on Pricing and Rate Regulation

Examining how telematics data affect pricing strategies and the rate regulation framework from a risk and insurance perspective reveals several significant impacts. First, telematics data enhance risk assessment by providing insurers and regulators with a more detailed understanding of individual driving behaviour, allowing more accurate discrimination between high- and low-risk drivers and, in turn, more precise and fair pricing and underwriting decisions. Second, telematics data facilitate dynamic pricing models, such as usage-based or pay-as-you-drive insurance, in which premiums are adjusted according to actual driving patterns; this promotes actuarial fairness by aligning the premiums charged more closely with individual risk profiles. Finally, telematics data enable insurers to strengthen loss prevention efforts: by identifying risky driving behaviours such as speeding or harsh braking, insurers can proactively mitigate risk and offer incentives or rewards for safe driving, ultimately reducing both the frequency and the severity of claims.
However, the use of telematics data in pricing and rate regulation also brings forth regulatory considerations. Regulators must balance the benefits of technology innovation and risk discrimination with consumer protection and fairness. This may involve developing guidelines or standards for the collection, usage, and protection of telematics data to ensure transparency and accountability in insurance practices. Furthermore, challenges and ethical considerations arise with the use of telematics data. Concerns about data privacy, consent, and potential biases in algorithmic decision-making may need to be addressed when implementing telematics-based pricing. Insurers and regulators must navigate these issues to maintain consumer trust and confidence in telematics-based insurance products.
Beyond these considerations, the use of predictive models that incorporate telematics data for predicting claim counts and claim amounts can significantly affect the auto insurance industry from an actuarial fairness perspective. Such models enable insurers to sharpen risk discrimination by analyzing individual driving behaviour, allowing premiums to be distributed more equitably and aligned more closely with the actual risk posed by each insured driver. From an actuarial pricing perspective, each driver is expected to bear the cost of their own risk; actuarial fairness is therefore strengthened as premiums become more reflective of expected claims costs, reducing the cross-subsidization in which lower-risk policyholders subsidize the premiums of higher-risk ones. Moreover, telematics-based insurance models introduce incentives for safe driving practices, further reinforcing actuarial fairness. This approach benefits individual drivers and contributes to fairer pricing by aligning premiums with actual risk exposure, while the added transparency into how premiums are calculated helps policyholders evaluate the fairness of pricing decisions and enhances overall confidence in the insurance industry. Nevertheless, telematics-based predictive modelling requires ongoing evaluation, refinement, and regulatory oversight to ensure equitable outcomes for all stakeholders.

7. Conclusions

Our comprehensive analysis of the results obtained through the proposed methodology sheds light on the effectiveness of leveraging telematics data in insurance analytics. The NSPCA approach employed in our study enabled us to distill meaningful insights from complex telematics data, enhancing predictive accuracy and interpretability, both of which are crucial for informed decision-making in the auto insurance industry. The findings from our analysis of telematics variables, such as daily driving patterns, extended driving durations, and driving intensities, provide a clearer understanding of driving behaviours. By categorizing driving habits into distinct groups based on days of the week and intensity levels, we gained valuable insights into the variability of risk profiles associated with different driving patterns. In addition, our examination of various pricing strategies and their impact on traditional policy variables highlighted the complex interplay between telematics data and traditional risk factors.
The incorporation of telematics variables into the predictive modelling of claim counts and claim amounts could lead to a more dynamic and personalized approach to premium calculation. The insight obtained from such predictive modelling enables insurers to differentiate between high- and low-risk drivers more accurately than with traditional rating factors such as age, gender, and location alone. However, regulators may need to adapt existing frameworks to accommodate these advancements, ensuring that ratemaking practices remain fair and transparent for consumers. Our study revealed that, while the inclusion of telematics variables in predictive models may not lead to significant changes in loss cost relativities across most factor levels, it does offer valuable insights into the impact of vehicle usage patterns on insurance risk. The observed increase in risk relativity for certain variables underscores the importance of telematics data in refining risk assessment methodologies and informing more tailored pricing strategies.
Our study has also shown that relying solely on telematics data for pricing could markedly alter the risk relativities associated with the traditional rating variables commonly employed in current regulatory practices. Consequently, our research suggests that pricing insurance on telematics data alone may not be advisable. Rather, we advocate a pricing strategy that integrates telematics variables with existing rating factors. By incorporating both, insurers can take a more sustainable approach to discerning different levels of risk among drivers, ensuring a balanced consideration of traditional and telematics-derived factors and fostering greater stability in insurance pricing practices. Our future work will focus on using advanced machine learning techniques, such as deep learning or ensemble methods, to develop more robust models capable of capturing the intricate relationships between the telematics-derived factors and the response variables on which we focused. By refining these models, insurers can improve their accuracy and predictive power, leading to more informed decision-making in auto insurance pricing and rate regulation.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

Figure 1. Flowchart illustrating the modelling process with dimensionality reduction applied to telematics variables.
Figure 2. The predicted probability and loss cost by different levels of years of no claims, annual mileage driven, and credit scores, obtained from the model using only telematics data variables, compared to the empirical results.
Figure 3. The predicted probability and loss cost by different levels of insured age and car age, obtained from the model using only telematics data variables, compared to the empirical results.
Figure 4. The predicted probability and loss cost by different levels of insured age, car age, and credit scores, obtained from the model using traditional variables only (labelled as 1 in the figure legend), traditional plus telematics variables (labelled as 2 in the figure legend), and compared to the empirical results.
Figure 5. The predicted probability and loss cost by different levels of insured age and car age, obtained from the model using traditional variables only (labelled as 1 in the figure legend), traditional plus telematics variables (labelled as 2 in the figure legend), and compared to the empirical results.
Figure 6. The comparison of estimated risk relativity for claim probability, claim amount and loss cost obtained from the predictive models with and without the use of telematics variables.
Table 1. The principal component load vectors obtained from the NSPCA of each telematics risk factor. The dimensions for the factors of weekly percentage driven, acceleration, braking, percentage driven for extended periods, and percentage driven during morning and afternoon are, respectively, 7, 6, 6, 3 and 2.

Weekly percentage driven
                         PC1      PC2      PC3      PC4      PC5      PC6      PC7
Pct.drive.Mon            0        0        0        0.564    0.959    0.227    0
Pct.drive.Tue            0        0        0        0.826    0        0        0.104
Pct.drive.Wed            0        0.690    0        0.016    0        0.924    0
Pct.drive.Thr            0        0.723    0.316    0        0.249    0        0
Pct.drive.Fri            0        0        0.920    0        0        0.307    0.215
Pct.drive.Sat            0.722    0.020    0.233    0        0        0        0
Pct.drive.Sun            0.692    0        0        0        0.134    0        0.971
Standard Deviation       1.140    1.035    0.979    1.018    0.808    0.702    0.600
Proportion of Variance   0.221    0.183    0.163    0.176    0.111    0.084    0.061
Cumulative Proportion    0.221    0.404    0.567    0.744    0.855    0.939    1

Acceleration
                         PC1      PC2      PC3      PC4      PC5      PC6
Accel.06miles            0        0.497    0.999    0.046    0.010    0.054
Accel.08miles            0        0.637    0        0.226    0        0
Accel.09miles            0        0.590    0        0        0.977    0
Accel.11miles            0.570    0        0        0.973    0.211    0.007
Accel.12miles            0.589    0        0        0        0        0.998
Accel.14miles            0.573    0        0.053    0        0        0
Standard Deviation       1.692    1.532    0.649    0.257    0.225    0.136
Proportion of Variance   0.496    0.407    0.073    0.011    0.009    0.003
Cumulative Proportion    0.496    0.904    0.977    0.988    0.997    1

Braking
                         PC1      PC2      PC3      PC4      PC5      PC6
Brake.06miles            0        0.500    1.000    0        0        0.00004
Brake.08miles            0        0.646    0        0        0.099    0.016
Brake.09miles            0        0.577    0        0.996    0        0
Brake.11miles            0.574    0        0        0        0        0
Brake.12miles            0.585    0        0        0.018    0        1.000
Brake.14miles            0.573    0        0.025    0.090    0.995    0
Standard Deviation       1.707    1.517    0.656    0.266    0.202    0.107
Proportion of Variance   0.505    0.399    0.075    0.012    0.007    0.002
Cumulative Proportion    0.505    0.904    0.979    0.991    0.998    1

Percentage driven for extended periods
                         PC1      PC2      PC3
Pct.drive.2hrs           0.544    1        0
Pct.drive.3hrs           0.615    0        0
Pct.drive.4hrs           0.570    0        1
Standard Deviation       1.556    0.533    0.308
Proportion of Variance   0.865    0.101    0.034
Cumulative Proportion    0.865    0.966    1

Percentage driven during morning and afternoon
                         PC1      PC2
Pct.drive.rush am        0.697    1
Pct.drive.rush pm        0.717    0
Standard Deviation       1.162    0.578
Proportion of Variance   0.801    0.199
Cumulative Proportion    0.801    1
Table 2. The principal component load vectors obtained from the NSPCA of each telematics risk factor. The dimensions for the factors of left and right turns are 5.

Left turns
                           PC1      PC2      PC3      PC4      PC5
Left.turn.intensity08      0        0.707    0.263    0        0
Left.turn.intensity09      0        0.707    0        0.964    0
Left.turn.intensity10      0.576    0.021    0.965    0.265    0
Left.turn.intensity11      0.579    0        0        0        1
Left.turn.intensity12      0.577    0        0        0        0
Standard Deviation         1.726    1.412    0.107    0.052    0.042
Proportion of Variance     0.597    0.400    0.002    0.001    0.0004
Cumulative Proportion      0.597    0.997    0.999    1.000    1

Right turns
                           PC1      PC2      PC3      PC4      PC5
Right.turn.intensity08     0        0.707    0.271    0.881    0.426
Right.turn.intensity09     0        0.707    0        0        0
Right.turn.intensity10     0.576    0.034    0.963    0        0
Right.turn.intensity11     0.581    0        0        0        0.905
Right.turn.intensity12     0.576    0        0        0.474    0
Standard Deviation         1.721    1.408    0.132    0.096    0.071
Proportion of Variance     0.595    0.398    0.004    0.002    0.001
Cumulative Proportion      0.595    0.994    0.997    0.999    1
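As a rough illustration of how loading vectors of the kind reported in Tables 1 and 2 can be obtained, the sketch below extracts a leading non-negative sparse loading vector for a single telematics factor block using a projected power iteration with soft-thresholding. This is a simplified stand-in for the NSPCA procedure applied in this study, not the exact algorithm, and the synthetic data and threshold value are purely illustrative.

import numpy as np

def nonneg_sparse_loading(X, threshold=0.10, n_iter=200):
    """Leading non-negative sparse loading vector of one telematics factor block.

    X is an (n_policies, p) matrix for a single factor (e.g., the seven
    weekly percentage-driven columns); threshold is the fraction of the
    largest entry below which loadings are zeroed at each iteration.
    Simplified stand-in for NSPCA, not the algorithm used in the paper.
    """
    S = np.cov(X, rowvar=False)                          # p x p sample covariance
    v = np.full(S.shape[0], 1.0 / np.sqrt(S.shape[0]))   # non-negative start
    for _ in range(n_iter):
        v = S @ v                                        # power-iteration step
        v = np.maximum(v - threshold * v.max(), 0.0)     # sparsity + non-negativity
        v = v / np.linalg.norm(v)                        # unit-length loading vector
    return v

# Synthetic stand-in for weekly percentage driven: a weekday and a weekend block.
rng = np.random.default_rng(0)
weekday = rng.random((1000, 1)) * np.array([1, 1, 1, 1, 1, 0, 0])
weekend = rng.random((1000, 1)) * np.array([0, 0, 0, 0, 0, 1, 1])
X = weekday + weekend + 0.05 * rng.random((1000, 7))

print(np.round(nonneg_sparse_loading(X), 3))             # sparse, non-negative PC1

On this synthetic example, the weekend columns are zeroed out and the weekday columns receive roughly equal positive loadings, mirroring the block-sparse, non-negative structure visible in the tables above.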
Table 3. Traditional variable coefficients and their standard errors obtained from the GLM model with both traditional policy variables and telematics variables.

Dependent variable:                   Num_Claim (Poisson, link = log)   AMT_Claim (Gamma, link = log)
Variable                              Coefficient    Std. Error         Coefficient    Std. Error
Insured.AgeGroup23–35                 −0.473 ***     (0.112)            −0.030         (0.140)
Insured.AgeGroup36–45                 −0.527 ***     (0.116)            −0.222         (0.146)
Insured.AgeGroup46–65                 −0.336 ***     (0.118)            −0.200         (0.147)
Insured.AgeGroup65+                   −0.561 ***     (0.131)            −0.134         (0.164)
Insured.sexMale                        0.045         (0.031)             0.010         (0.041)
Car.age.range1–5                      −0.210 ***     (0.049)            −0.077         (0.064)
Car.age.range6–10                     −0.489 ***     (0.053)            −0.263 ***     (0.070)
Car.age.range11–15                    −0.781 ***     (0.076)            −0.546 ***     (0.098)
Car.age.range ≥16                     −1.290 ***     (0.263)            −0.514         (0.323)
MaritalSingle                          0.046         (0.035)            −0.006         (0.046)
Car.useCommute                        −0.095         (0.081)             0.063         (0.107)
Car.useFarmer                         −0.602 *       (0.234)            −0.805 ***     (0.293)
Car.usePrivate                        −0.047         (0.086)            −0.073         (0.113)
Credit.score.range601–700             −0.127 *       (0.074)            −0.037         (0.095)
Credit.score.range701–800             −0.328 ***     (0.069)            −0.261 ***     (0.089)
Credit.score.range801–900             −0.791 ***     (0.068)            −0.511 ***     (0.089)
factor(Location.Cluster)2              0.032         (0.093)            −0.028         (0.122)
factor(Location.Cluster)3             −1.080 ***     (0.412)            −0.679         (0.554)
factor(Location.Cluster)4              0.057         (0.069)             0.238 ***     (0.091)
factor(Location.Cluster)5              0.298         (0.307)            −0.120         (0.395)
factor(Location.Cluster)6             −0.284 ***     (0.082)            −0.040         (0.107)
factor(Location.Cluster)7             −0.070         (0.111)             0.522 ***     (0.141)
factor(Location.Cluster)8             −0.801 *       (0.451)             1.397 *       (0.553)
factor(Location.Cluster)9             −0.099         (0.067)             0.009         (0.087)
factor(Location.Cluster)10             0.150 *       (0.073)             0.233 *       (0.095)
factor(Location.Cluster)11             0.096         (0.503)             1.678 ***     (0.617)
factor(Location.Cluster)12            −0.572 ***     (0.118)             0.118         (0.150)
factor(Location.Cluster)13            −0.163 *       (0.076)             0.068         (0.100)
factor(Location.Cluster)14             0.012         (0.066)             0.121         (0.086)
Annual.miles.drive.range5000–10,000    0.733 ***     (0.131)             0.041         (0.168)
Annual.miles.drive.range10,000–15,000  0.829 ***     (0.133)             0.159         (0.170)
Annual.miles.drive.range15,000–20,000  0.839 ***     (0.144)             0.126         (0.186)
Annual.miles.drive.range20,000–25,000  0.903 ***     (0.193)             0.018         (0.242)
Annual.miles.drive.range25,000+        0.299         (0.351)            −0.696         (0.472)
Years.noclaims.range21–40             −0.160 ***     (0.046)             0.045         (0.059)
Years.noclaims.range41–60             −0.136 *       (0.065)            −0.379 ***     (0.086)
Years.noclaims.range61–80             −0.279         (0.194)            −0.159         (0.243)
Note: * p < 0.1; *** p < 0.01.
Table 4. Telematics variable coefficients and their standard errors obtained from the GLM model with both traditional policy variables and telematics variables.

Dependent variable:          Num_Claim (Poisson, link = log)   AMT_Claim (Gamma, link = log)
Variable                     Coefficient    Std. Error          Coefficient    Std. Error
Annual.pct.driven             2.336 ***     (0.074)             −0.392 ***     (0.102)
Total.miles.driven            0.00003 ***   (0.00000)           −0.00001 ***   (0.00001)
Pct.drive1                    0.392         (0.252)             −0.531         (0.437)
Pct.drive2                    0.459 *       (0.199)             −0.439         (0.344)
Pct.drive3                    0.478 *       (0.234)             −0.548         (0.410)
Pct.drive4                    0.310 *       (0.158)             −0.300         (0.272)
Pct.drive.hrs                 0.014 ***     (0.005)             −0.065 ***     (0.020)
Rush.ampm                    −0.065 ***     (0.016)             −0.015         (0.021)
Avgdays.week                  0.067 ***     (0.018)             −0.102 ***     (0.027)
Accel1                       −0.133 ***     (0.039)             −0.054         (0.123)
Accel2                        0.050 ***     (0.019)              0.027         (0.033)
Brake1                        0.038         (0.030)              0.175         (0.135)
Brake2                        0.088 ***     (0.011)              0.088 ***     (0.027)
Left.turn1                   −0.030 *       (0.015)              0.045 *       (0.021)
Left.turn2                    0.059 ***     (0.018)             −0.065 ***     (0.025)
Right.turn1                   0.026         (0.020)              0.040         (0.060)
Right.turn2                  −0.014         (0.027)             −0.046         (0.078)
Constant                     −4.500 ***     (0.240)              9.502 ***     (0.316)
Observations                  100,000                            3864
Log Likelihood               −15,608.270                        −35,038.040
Akaike Inf. Crit.             31,326.540                         70,186.090
Note: * p < 0.1; *** p < 0.01.
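The coefficients in Tables 3 and 4 come from a Poisson GLM for claim counts and a Gamma GLM with a log link for claim amounts. The Python sketch below (statsmodels) shows one way such a frequency and severity pair could be fitted; the file and column names are hypothetical placeholders, only a subset of rating factors and telematics components is listed for brevity, and the restriction of the severity model to positive claim amounts is an assumption consistent with the observation counts reported above.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical merged policy and telematics data; column names are placeholders.
policies = pd.read_csv("policies_with_telematics.csv")

terms = ("C(insured_age_group) + C(car_age_range) + C(credit_score_range)"
         " + C(location_cluster) + annual_pct_driven + total_miles_driven")

# Frequency: Poisson GLM (log link) for claim counts, fitted on all policies.
freq = smf.glm("num_claim ~ " + terms, data=policies,
               family=sm.families.Poisson()).fit()

# Severity: Gamma GLM (log link) for claim amounts, fitted on claim records only.
claims = policies[policies["amt_claim"] > 0]
sev = smf.glm("amt_claim ~ " + terms, data=claims,
              family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# Coefficient and standard-error tables analogous to Tables 3 and 4.
print(freq.summary())
print(sev.summary())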
Table 5. Relativity estimate for the cases with and without telematics variables used in the predictive models for predicting the probability of claim occurrence and claim amounts. The overall relativity from both claim probability and claim amount in terms of loss cost are also obtained and compared. The bolded values in the table indicate more significant changes in relativities.

                                 With Telematics                        Without Telematics
Variable                    Claim Prob  Claim Amounts  Loss Cost   Claim Prob  Claim Amounts  Loss Cost   Rel. Change
                            (Poisson)   (Gamma)                    (Poisson)   (Gamma)
Insured.Age16–22            1.00        1.00           1.00        1.00        1.00           1.00         0.00
Insured.Age23–35            0.62        0.97           0.60        0.69        0.91           0.63        −0.31
Insured.Age36–45            0.59        0.80           0.47        0.65        0.75           0.49        −0.28
Insured.Age46–65            0.71        0.82           0.58        0.71        0.79           0.57        −0.21
Insured.Age65+              0.57        0.87           0.50        0.53        0.89           0.47        −0.40
Female                      1.00        1.00           1.00        1.00        1.00           1.00         0.00
Male                        1.05        1.01           1.06        0.99        1.03           1.03         0.02
Car.age≤0                   1.00        1.00           1.00        1.00        1.00           1.00         0.00
Car.age1–5                  0.81        0.93           0.75        0.91        0.91           0.83        −0.16
Car.age6–10                 0.61        0.77           0.47        0.64        0.83           0.53        −0.36
Car.age11–15                0.46        0.58           0.27        0.41        0.64           0.26        −0.38
Car.age≥16                  0.28        0.60           0.16        0.24        0.66           0.16        −0.49
Married                     1.00        1.00           1.00        1.00        1.00           1.00         0.00
Single                      1.05        0.99           1.04        1.03        0.99           1.03         0.05
Car.Use.Commercial          1.00        1.00           1.00        1.00        1.00           1.00         0.00
Car.useCommute              0.91        1.06           0.97        0.81        1.05           0.85        −0.09
Car.useFarmer               0.55        0.45           0.24        0.42        0.44           0.18        −0.19
Car.usePrivate              0.95        0.93           0.89        0.81        0.96           0.77        −0.07
Credit.Score.≤600           1.00        1.00           1.00        1.00        1.00           1.00         0.00
Credit.score.range601–700   0.88        0.96           0.85        0.93        1.06           0.98        −0.21
Credit.score.range701–800   0.72        0.77           0.56        0.73        0.78           0.57        −0.22
Credit.score.range801–900   0.45        0.60           0.27        0.45        0.60           0.27        −0.33
Location.Cluster1           1.00        1.00           1.00        1.00        1.00           1.00         0.00
Location.Cluster2           1.03        0.97           1.00        1.18        0.95           1.12         0.05
Location.Cluster3           0.34        0.51           0.17        0.35        0.45           0.16        −0.27
Location.Cluster4           1.06        1.27           1.34        1.23        1.36           1.67        −0.02
Location.Cluster5           1.35        0.89           1.20        1.48        0.95           1.40         0.25
Location.Cluster6           0.75        0.96           0.72        0.79        0.89           0.70        −0.17
Location.Cluster7           0.93        1.69           1.57        1.07        1.57           1.69         0.00
Location.Cluster8           0.45        4.04           1.81        0.49        3.55           1.74         1.73
Location.Cluster9           0.91        1.01           0.91        1.00        0.99           1.00        −0.08
Location.Cluster10          1.16        1.26           1.47        1.20        1.26           1.51         0.20
Location.Cluster11          1.10        5.35           5.89        1.04        5.04           5.23         0.85
Location.Cluster12          0.56        1.13           0.64        0.60        1.07           0.65        −0.44
Location.Cluster13          0.85        1.07           0.91        0.93        1.03           0.96        −0.13
Location.Cluster14          1.01        1.13           1.14        1.11        1.21           1.34        −0.07
Annual.miles.0–5000         1.00        1.00           1.00        1.00        1.00           1.00         0.00
Annual.miles.5000–10,000    2.08        1.04           2.17        2.32        0.99           2.31         1.17
Annual.miles.10,000–15,000  2.29        1.17           2.68        2.67        1.12           2.99         1.57
Annual.miles.15,000–20,000  2.31        1.13           2.62        2.98        0.95           2.82         1.68
Annual.miles.20,000–25,000  2.47        1.02           2.51        3.87        0.89           3.46         1.62
Annual.miles.25,000+        1.35        0.50           0.67        2.16        0.31           0.66         0.37
Years.noclaims.0–20         1.00        1.00           1.00        1.00        1.00           1.00         0.00
Years.noclaims21–40         0.85        1.05           0.89        0.85        1.04           0.88        −0.15
Years.noclaims41–60         0.87        0.68           0.60        0.79        0.66           0.52        −0.06
Years.noclaims61–80         0.76        0.85           0.65        0.60        0.79           0.48        −0.15
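The relativities in Table 5 are consistent with exponentiating the GLM coefficients in Tables 3 and 4 relative to the base level, with the loss-cost relativity taken as the product of the claim-probability and claim-amount relativities. The minimal sketch below reproduces one row under that assumption; the exact derivation used for the regulatory benchmark may involve additional normalization.

import math

# Insured.AgeGroup23-35 coefficients from Table 3 (model with telematics variables).
coef_claim_count = -0.473      # Poisson claim-count model
coef_claim_amount = -0.030     # Gamma claim-amount model

rel_claim_prob = math.exp(coef_claim_count)        # ~0.62, as in Table 5
rel_claim_amount = math.exp(coef_claim_amount)     # ~0.97, as in Table 5
rel_loss_cost = rel_claim_prob * rel_claim_amount  # ~0.60, as in Table 5

print(round(rel_claim_prob, 2), round(rel_claim_amount, 2), round(rel_loss_cost, 2))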
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
