Figure 1.
Distribution of the log of the severity for all datasets.
Figure 2.
Visualization of the results from tuning random forests to predict claim frequency on the BI data. Here, we show the change in logloss across different values of max depth of each tree in the random forest. The left pane shows the results when mtry is 7, and the right pane shows the results when mtry is 20. Each pane contains two lines of different colors representing different minimum split improvements.
Figure 3.
Visualization of the results from tuning gradient boosted forests to predict claim frequency on the BI data. Here, we show the change in logloss across different values of max depth of each tree in the gradient boosted forest. The left pane shows the results when the number of trees is 300, the middle pane shows the results when the number of trees is 500, and the right pane shows the results when the number of trees is 1000. Each pane contains two lines of different colors representing different learning rates.
Figure 4.
Visualization of the results from tuning deep learning models to predict claim frequency on the BI data. Here, we show the change in logloss across different values of learning rate for the deep neural network. The left pane shows the results when there is one layer, the middle pane shows the results when there are two layers, and the right pane shows the results when there are three layers. The coloring of the lines corresponds to the total number of nodes in the neural network.
Figure 5.
Visualization of the results from tuning random forests to predict severity on the BI data. MAE (as computed on the validation set) is shown on the y-axis as it varies according to the max depth of each tree in the forest. Each pane shows a different minimum split improvement value, with the different colors representing different values of the mtry tuning parameter.
Figure 6.
Visualization of the results from tuning gradient boosted forests to predict severity on the BI data (note that only models with the distribution tuning parameter set to “Laplace” are shown, since their MAE was much lower than that of the other models). Validation MAE is shown on the y-axis as it changes according to the maximum depth of each tree in the gradient boosted forest. Each pane shows a different number of trees in the forest, while each colored line represents a different learning rate.
Figure 7.
Visualization of the results from tuning deep learning models to predict severity on the BI data. Validation MAE is shown as a function of the total nodes in the neural network, with each pane showing a different distributional assumption of the network and each line representing a different learning rate.
Figure 8.
Visualization of the results from tuning random forests to predict claim frequency on the PD data. Here, we show the change in logloss across different values of max depth of each tree in the random forest. The left pane shows the results when mtry is 7, and the right pane shows the results when mtry is 20. Each pane contains two lines of different colors representing different minimum split improvements.
Figure 9.
Visualization of the results from tuning gradient boosted forests to predict claim frequency on the PD data. Here, we show the change in logloss across different values of max depth of each tree in the gradient boosted forest. The left pane shows the results when the number of trees is 300, the middle pane shows the results when the number of trees is 500, and the right pane shows the results when the number of trees is 1000. Each pane contains two lines of different colors representing different learning rates.
Figure 10.
Visualization of the results from tuning deep learning models to predict claim frequency on the PD data. Here, we show the change in logloss across different values of learning rate for the deep neural network. The left pane shows the results when there is one layer, the middle pane shows the results when there are two layers, and the right pane shows the results when there are three layers. The coloring of the points corresponds to the total number of nodes in the neural network.
Figure 11.
Visualization of the results from tuning random forests to predict severity on the PD data. MAE (as computed on the validation set) is shown on the y-axis as it varies according to the max depth of each tree in the forest. Each pane shows a different minimum split improvement value, with the different colors representing different values of the mtry tuning parameter.
Figure 12.
Visualization of the results from tuning gradient boosted forests to predict severity on the PD data (note that only models with the distribution tuning parameter set to “Laplace” are shown, since their MAE was much lower than that of the other models). Validation MAE is shown on the y-axis as it changes according to the maximum depth of each tree in the gradient boosted forest. Each pane shows a different number of trees in the forest, while each colored line represents a different learning rate.
Figure 13.
Visualization of the results from tuning deep learning models to predict severity on the PD data. Validation MAE is shown as a function of the total nodes in the neural network, with each pane showing a different distributional assumption of the network and each line representing a different learning rate.
Figure 14.
Visualization of the results from tuning random forests to predict claim frequency on the COLL data. Here, we show the change in logloss across different values of max depth of each tree in the random forest. The left pane shows the results when mtry is 7, and the right pane shows the results when mtry is 20. Each pane contains two lines of different colors representing different minimum split improvements.
Figure 15.
Visualization of the results from tuning gradient boosted forests to predict claim frequency on the COLL data. Here, we show the change in logloss across different values of max depth of each tree in the gradient boosted forest. The left pane shows the results when the number of trees is 300, the middle pane shows the results when the number of trees is 500, and the right pane shows the results when the number of trees is 1000. Each pane contains two lines of different colors representing different learning rates.
Figure 16.
Visualization of the results from tuning deep learning models to predict claim frequency on the COLL data. Here, we show the change in logloss across different values of learning rate for the deep neural network. The left pane shows the results when there is one layer, the middle pane shows the results when there are two layers, and the right pane shows the results when there are three layers. The coloring of the points corresponds to the total number of nodes in the neural network.
Figure 17.
Visualization of the results from tuning random forests to predict severity on the COLL data. MAE (as computed on the validation set) is shown on the y-axis as it varies according to the max depth of each tree in the forest. Each pane shows a different minimum split improvement value, with the different colors representing different values of the mtry tuning parameter.
Figure 18.
Visualization of the results from tuning gradient boosted forests to predict severity on the COLL data (note that only models with the distribution tuning parameter set to “Laplace” are shown, since their MAE was much lower than that of the other models). Validation MAE is shown on the y-axis as it changes according to the maximum depth of each tree in the gradient boosted forest. Each pane shows a different number of trees in the forest, while each colored line represents a different learning rate.
Figure 19.
Visualization of the results from tuning deep learning models to predict severity on the COLL data. Validation MAE is shown as a function of the total nodes in the neural network, with each pane showing a different distributional assumption of the network and each line representing a different learning rate.
Figure 20.
Comparison of the logloss of the frequency model to the MAE of the overall two-part model for the BI data. MAE of the entire two-part model is shown on the y-axis, while logloss of the frequency model is shown on the x-axis to illustrate how the logloss can affect the overall performance of the two-part model. Different types of frequency models are shown in each pane, while severity model types are shown by different colored points.
Figure 21.
Comparison of the logloss of the frequency model to the MAE of the overall two-part model for the PD data. MAE of the entire two-part model is shown on the y-axis, while logloss of the frequency model is shown on the x-axis to illustrate how the logloss can affect the overall performance of the two-part model. Different types of frequency models are shown in each pane, while severity model types are shown by different colored points.
Figure 22.
Comparison of the logloss of the frequency model to the MAE of the overall two-part model for the COLL data. MAE of the entire two-part model is shown on the y-axis, while logloss of the frequency model is shown on the x-axis to illustrate how the logloss can affect the overall performance of the two-part model. Different types of frequency models are shown in each pane, while severity model types are shown by different colored points.
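The interaction these figures illustrate can be sketched numerically. In a two-part model, the standard combination multiplies the frequency part's predicted claim probability by the severity part's predicted cost, so frequency error propagates into overall MAE. A minimal sketch with hypothetical probabilities, severities, and actual costs (not outputs of the paper's models):

```python
# Two-part prediction: expected cost = P(claim) * E[severity | claim].
# All numbers below are hypothetical, for illustration only.

def two_part_prediction(p_claim, severity):
    """Combine the frequency and severity parts into an expected cost."""
    return p_claim * severity

def mae(actual, predicted):
    """Mean absolute error between actual and predicted costs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical policies: (claim probability, predicted severity, actual cost)
policies = [(0.02, 8000, 0), (0.10, 3000, 450), (0.01, 20000, 0)]
preds = [two_part_prediction(p, s) for p, s, _ in policies]
error = mae([a for _, _, a in policies], preds)  # overall two-part MAE
```

A miscalibrated claim probability (higher frequency logloss) shifts every expected cost and therefore the overall MAE, which is the relationship plotted in Figures 20 to 22.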
Figure 23.
A summary of mSHAP values from 50,000 observations in the COLL test dataset. The 10 most important variables are shown with a beeswarm plot corresponding to the estimated SHAP values from the dataset. Color corresponds to the actual value of the variable, not its importance.
Figure 24.
An explanation of three different individual predictions on the COLL test dataset using the mSHAP values. The yellow line shows the model prediction, and the grey line shows the average model prediction across the training set. Colored bars represent the impact of the 10 most important variables.
Table 1.
General description of the three datasets.
Description | Num Records | % | Exposure | Amount | % | Claim Count | % |
---|---|---|---|---|---|---|---|
BI | 30,342,067 | 100% | 3,830,558 | 634,080,483 | 100.00% | 32,293 | 100.00% |
Zero exposure | 6,724,652 | 22.16% | - | 6,958,737 | 1.10% | 367 | 1.14% |
Negative exposure | 3,885,178 | 12.80% | (33) | 10,848,560 | 1.71% | 606 | 1.88% |
PD | 20,201,841 | 100.00% | 2,665,037 | 520,665,847 | 100.00% | 151,842 | 100.00% |
Zero exposure | 4,138,323 | 20.48% | - | 6,981,221 | 1.34% | 1898 | 1.25% |
Negative exposure | 2,590,939 | 12.83% | (129) | 9,330,567 | 1.79% | 2487 | 1.64% |
COLL | 30,285,873 | 100.00% | 3,835,828 | 443,291,671 | 100.00% | 135,419 | 100.00% |
Zero exposure | 6,634,314 | 21.91% | - | 5,078,430 | 1.15% | 1621 | 1.20% |
Negative exposure | 3,889,473 | 12.84% | (118) | 7,738,811 | 1.75% | 2291 | 1.69% |
Table 2.
Summary of EARNED_EXPOSURE across the three datasets.
Dataset | Minimum | First Quartile | Median | Mean | Third Quartile | Maximum |
---|---|---|---|---|---|---|
BI | 0.0001 | 0.0822 | 0.2110 | 0.2385 | 0.3973 | 1.0028 |
PD | 0.0001 | 0.0822 | 0.2138 | 0.2418 | 0.4028 | 1.9996 |
COLL | 0.0001 | 0.0822 | 0.2110 | 0.2384 | 0.3973 | 0.9998 |
Table 3.
Percentages of positive ULTIMATE_AMOUNT in the different datasets.
Dataset | % of Rows with ULTIMATE_AMOUNT above 0 |
---|---|
BI | 0.16% |
PD | 1.06% |
COLL | 0.64% |
Table 4.
Counts of the rounded values of ULTIMATE_CLAIM_COUNT.
Claim Count | BI Num Rows | BI % | PD Num Rows | PD % | COLL Num Rows | COLL % |
---|---|---|---|---|---|---|
0 | 19,700,470 | 99.8% | 13,329,135 | 98.9% | 19,636,069 | 99.4% |
1 | 31,531 | 0.16% | 141,107 | 1.05% | 120,681 | 0.61% |
2 | 218 | 0.001% | 2239 | 0.02% | 5260 | 0.03% |
3+ | 18 | 0.0001% | 98 | 0.001% | 76 | 0.0004% |
Table 5.
Summary statistics for the severity for each dataset.
Dataset | Minimum | First Quartile | Median | Mean | Third Quartile | Maximum |
---|---|---|---|---|---|---|
BI | 1.1 | 2842 | 8821 | 19,231 | 21,300 | 732,953 |
PD | −9506 | 856 | 2095 | 3468 | 4553 | 88,494 |
COLL | −2850 | 1061 | 2027 | 3294 | 3979 | 101,203 |
Table 6.
Characteristics of the predictor variables.
Variable | Unique Values | Description |
---|---|---|
X_VAR1 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR2 | 2 | Binary, 0 or 1. Nearly all observations are 0 so it is thrown out in modeling |
X_VAR3 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR4 | 31 | Integer, ranging from 1 to 31 inclusive |
X_VAR5 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR6 | 17 | Integer, ranging from 0 to 16 inclusive |
X_VAR7 | 17 | Integer, ranging from 0 to 16 inclusive |
X_VAR8 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR9 | 10 | Integer, ranging from 1 to 10 inclusive |
X_VAR10 | 2 | Character, “A” or “B” |
X_VAR11 | 8 | Integer, ranging from 1 to 8 inclusive |
X_VAR12 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR13 | 4 | Character, “B”, “C”, “D”, or “E” |
X_VAR14 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR15 | 4 | Integer, ranging from 0 to 3 inclusive |
X_VAR16 | 9 | Integer, ranging from 1 to 9 inclusive |
X_VAR17 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR18 | 7 | Character, “A”, “B”, or “E” through “I” |
X_VAR19 | 2062 | A string with the form of an integer between 1 and 2529 inclusive preceded by an “A”. Left out of modeling due to the high number of unique values. |
X_VAR20 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR21 | 17 | Integer, ranging from 0 to 16 inclusive |
X_VAR22 | 23 | Integer, ranging from 1 to 23 inclusive |
X_VAR23 | 11 | Character, either a digit 1 through 9 inclusive, the letter “U” or missing |
X_VAR24 | 11 | Integer, ranging from 1 to 11 inclusive |
X_VAR25 | 24 | Integer, ranging from 1 to 24 inclusive |
X_VAR26 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR27 | 48 | String of two letters, alphabetically between “AA” and “BY” inclusive. This variable represents state, but the mapping of the levels to the states is unknown. |
X_VAR28 | 6 | Integer, ranging from 1 to 6 inclusive |
X_VAR29 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR30 | 8 | Integer, ranging from 1 to 8 inclusive |
X_VAR31 | 26 | Integer, ranging from 1 to 26 inclusive |
X_VAR32 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR33 | 6 | Integer, ranging from 1 to 6 inclusive |
X_VAR34 | 56,817 | String consisting of the letter “A” followed by a number. Due to the high number of unique levels, it is left out of modeling. |
X_VAR35 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR36 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR37 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR38 | 31 | A string consisting of the letter “B” followed by an integer between 1 and 31 inclusive |
X_VAR39 | 16 | Integer, ranging from 1 to 16 inclusive |
X_VAR40 | 17 | Integer, ranging from 0 to 16 inclusive |
X_VAR41 | 4 | Integer, ranging from 1 to 4 inclusive. This variable is the year of the policy, but the policies are not evenly dispersed across the 4 levels |
X_VAR42 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR43 | 11 | Integer, ranging from 1 to 11 inclusive |
X_VAR44 | 5 | Integer, ranging from 0 to 4 inclusive |
X_VAR45 | 17 | Integer, ranging from 0 to 16 inclusive |
X_VAR46 | 31,064 | String with 5 characters, all numbers and letters. Due to the high number of unique values, it is left out of modeling |
Table 7.
Tuning Parameters for Random Forest Frequency Models.
Model Parameter | Possible Values |
---|---|
ntrees | 100 |
max_depth | 20, 15, 10, 7, 5 |
min_split_improvement | 0.001, 0.01 |
mtries | 20, 7 |
histogram_type | “RoundRobin” |
sample_rate | 0.632 |
categorical_encoding | “EnumLimited” |
col_sample_rate_per_tree | 0.8 |
seed | 16 |
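The grid above can be enumerated as a Cartesian product of the multi-valued parameters: with ntrees and the other single-valued parameters fixed, the 5 max depths, 2 minimum split improvements, and 2 mtries values yield 20 candidate random forests. A minimal sketch in plain Python (independent of any particular modeling library):

```python
from itertools import product

# Multi-valued tuning parameters for the random forest frequency models
# (Table 7). Single-valued parameters (ntrees = 100, sample_rate = 0.632,
# histogram_type = "RoundRobin", etc.) are fixed and omitted from the product.
max_depth = [20, 15, 10, 7, 5]
min_split_improvement = [0.001, 0.01]
mtries = [20, 7]

grid = [
    {"max_depth": d, "min_split_improvement": m, "mtries": t}
    for d, m, t in product(max_depth, min_split_improvement, mtries)
]
n_models = len(grid)  # 5 * 2 * 2 = 20 candidate models
```

The same enumeration applies to the gradient boosted forest and deep learning grids in Tables 8 and 9, with their respective parameter lists.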
Table 8.
Tuning Parameters for Gradient Boosted Forest Frequency Models.
Model Parameter | Possible Values |
---|---|
ntrees | 300, 500, 1000 |
max_depth | 1, 2, 3, 5, 7, 10 |
learn_rate | 0.001, 0.0001 |
min_split_improvement | 0.0001 |
distribution | “multinomial” |
sample_rate | 0.632 |
nbins_cats | 56 |
categorical_encoding | “Eigen” |
col_sample_rate_per_tree | 0.8 |
seed | 16 |
Table 9.
Tuning Parameters for Deep Learning Frequency Models.
Model Parameter | Possible Values |
---|---|
activation | “Tanh” |
hidden | 100, [100, 100], [200, 200], [100, 100, 100] |
adaptive_rate | FALSE |
rate | 0.1, 0.01, 0.005, 0.001, 0.0005 |
rate_decay | 0.5 |
momentum_start | 0.5 |
momentum_stable | 0.99 |
input_dropout_ratio | 0.1 |
initial_weight_distribution | “Normal” |
initial_weight_scale | 1 |
loss | “Automatic” |
distribution | “multinomial” |
stopping_metric | “logloss” |
stopping_tolerance | 0.001 |
categorical_encoding | “EnumLimited” |
seed | 16 |
mini_batch_size | 100 |
Table 10.
Tuning Parameters for Random Forest Severity Models.
Model Parameter | Possible Values |
---|---|
ntrees | 100, 200 |
max_depth | 3, 5, 7, 10, 15, 20, 30 |
min_split_improvement | 0.01, 0.001, 0.0001 |
mtries | −1, 20, 7 |
histogram_type | “UniformAdaptive”, “RoundRobin” |
sample_rate | 0.632 |
categorical_encoding | “EnumLimited” |
col_sample_rate_per_tree | 0.8 |
seed | 16 |
Table 11.
Tuning Parameters for Gradient Boosted Forest Severity Models.
Model Parameter | Possible Values |
---|---|
ntrees | 300, 500, 1000 |
max_depth | 1, 2, 3, 5, 7, 10 |
learn_rate | 0.001, 0.0001 |
min_split_improvement | 0.0001 |
distribution | “gaussian”, “gamma”, “laplace” *, “huber” * |
sample_rate | 0.632 |
nbins_cats | 56 |
categorical_encoding | “Eigen” |
col_sample_rate_per_tree | 0.8 |
seed | 16 |
Table 12.
Tuning Parameters for Deep Learning Severity Models.
Model Parameter | Possible Values |
---|---|
activation | “Tanh” |
hidden | 100, [100, 100], [200, 200], [100, 100, 100] |
adaptive_rate | FALSE |
rate | 0.01, 0.001, 0.0001, 0.00001 |
rate_decay | 0.5 |
momentum_start | 0.5 |
momentum_stable | 0.99 |
input_dropout_ratio | 0.1 |
initial_weight_distribution | “Normal” |
initial_weight_scale | 1 |
loss | “Automatic” |
distribution | “gaussian”, “laplace” |
stopping_metric | “MAE” |
stopping_tolerance | 0.001 |
categorical_encoding | “EnumLimited” |
seed | 16 |
mini_batch_size | 50 |
Table 13.
GLM validation data metrics for all three datasets.
Dataset | Severity GLM MAE | Frequency GLM Logloss |
---|---|---|
BI | 18,574 | 0.9239 |
PD | 2739 | 1.1518 |
COLL | 2459 | 1.1262 |
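The two metrics reported in Table 13 can be computed directly from validation predictions. A minimal sketch with hypothetical predictions (not the paper's model outputs); the logloss here is the binary-outcome version for illustration, whereas the paper's frequency models are multinomial:

```python
import math

def logloss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for binary outcomes (claim / no claim)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error, the severity metric used throughout."""
    return sum(abs(y - yhat) for y, yhat in zip(y_true, y_pred)) / len(y_true)

# Hypothetical validation data
freq_ll = logloss([0, 1, 0, 1], [0.1, 0.8, 0.3, 0.6])
sev_mae = mae([2000, 5000, 800], [2500, 4000, 1000])
```

Lower is better for both metrics, which is how the GLM baselines above are compared against the tuned machine learning models.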