Figure 1.
The concept of multi-layer perceptron (MLP) approximation by virtual discretized interpretable multi-layer perceptron (VDIMLP) networks, with the MLP as a subnetwork of the VDIMLP. The weight values of the two networks are equal from the first hidden layer of the VDIMLP. Between the input layer and the first hidden layer, the VDIMLP approximates the identity function with a staircase activation function. This approximation allows us to generate propositional rules.
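As a minimal sketch of the staircase idea in the caption above, the following quantizes its input into a fixed number of flat steps, so that it approximates the identity on a bounded interval; the function name, the interval `[lo, hi]`, and the number of steps are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def staircase(x, n_steps=5, lo=0.0, hi=1.0):
    """Piecewise-constant (staircase) approximation of the identity on [lo, hi].

    The input is clipped to [lo, hi] and snapped down to the nearest of
    n_steps equally spaced levels; the approximation error is bounded by
    the step width (hi - lo) / n_steps.  Illustrative sketch only.
    """
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    step = (hi - lo) / n_steps
    return lo + np.floor((x - lo) / step) * step
```

Increasing `n_steps` makes the staircase converge to the identity, which is what lets the VDIMLP mimic the MLP while keeping discrete, rule-extractable activations.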
Figure 2.
The max-pooling operator applied to a 4 × 4 matrix with four non-overlapping regions of size 2 × 2.
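The max-pooling operation in the caption above can be sketched as follows; this is a generic NumPy implementation for non-overlapping 2 × 2 regions (the function name is ours, not from the paper).

```python
import numpy as np

def max_pool_2x2(x):
    """Max pooling over non-overlapping 2x2 regions of a 2-D array.

    Reshapes the (h, w) input into (h/2, 2, w/2, 2) blocks and takes
    the maximum inside each block, yielding an (h/2, w/2) output.
    """
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

m = np.arange(16).reshape(4, 4)   # the 4x4 example matrix
print(max_pool_2x2(m))            # 2x2 output: one maximum per region
```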
Figure 3.
Coding of a propositional rule with three antecedents. The first layer of weights specifies the antecedents, while the second layer achieves the logical “and” of the antecedents. The activation function of the neurons is a step function.
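The rule coding described above can be sketched as a two-layer network with step activations: the first layer tests one antecedent per neuron, and a single second-layer neuron fires only when all antecedents hold. The encoding of thresholds via `signs` is an illustrative assumption about the weight layout, not the paper's exact construction.

```python
import numpy as np

def step(v):
    """Heaviside step activation: 1 if v >= 0, else 0."""
    return (np.asarray(v) >= 0).astype(float)

def rule_net(x, thresholds, signs):
    """Two-layer step network encoding a conjunctive propositional rule.

    First layer: neuron i fires when antecedent i holds, i.e. x[i] > t[i]
    for signs[i] = +1, or x[i] <= t[i] for signs[i] = -1.
    Second layer: one neuron computes the logical AND, firing only when
    all first-layer neurons are active (sum reaches the antecedent count).
    """
    antecedents = step(signs * (x - thresholds))               # hidden layer
    return float(step(antecedents.sum() - len(thresholds) + 0.5))  # AND neuron
```

For example, with three antecedents (x1 > 0.5) AND (x2 <= 0.4) AND (x3 > 0.7), the network outputs 1 only when all three conditions are met.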
Figure 4.
Centroids of 16 rules with their antecedents generated in the fully connected layers.
Figure 5.
Centroids of 16 rules with their antecedents generated in the fully connected layers.
Figure 6.
Centroids of 16 rules with their antecedents generated in the fully connected layers.
Figure 7.
Centroids of 16 rules with their antecedents generated in the fully connected layers.
Figure 8.
Centroids of nine rules with their antecedents generated in the fully connected layers.
Figure 9.
An example of four training samples belonging to the “malignant” class, correctly classified by the same rule extracted from a discretized interpretable convolutional network (DICON).
Figure 10.
Three training samples belonging to the “malignant” class. The example in the middle activates a single rule, while the examples on the left and right each activate an additional rule.
Figure 11.
Three training samples belonging to the “malignant” class. The example in the middle is covered by a single rule, while the examples on the left and right activate four and two rules, respectively.
Figure 12.
Centroids of nine rules with their antecedents generated in the fully connected layers.
Figure 13.
Samples belonging to the “malignant” class, sharing a rule. The first and the third case activate a single rule, while the second is covered by two rules.
Figure 14.
Samples belonging to the “malignant” class, activating the same rule.
Figure 15.
Samples belonging to the “malignant” class, sharing a rule. The first and the third case activate a single rule, while the second is covered by two rules.
Figure 16.
Centroids of nine rules with their antecedents generated in the fully connected layers.
Figure 17.
Samples belonging to the “benign” class, sharing a rule. The third and the fifth case are covered by a single rule, while the others present a higher number of colored dots, corresponding to up to four activated rules.
Figure 18.
Samples belonging to the “benign” class, sharing a rule. The first and the fifth case are covered by a single rule, while the others present a higher number of colored dots, corresponding to up to three activated rules.
Figure 19.
Samples belonging to the “benign” class, activating the same rule.
Figure 20.
Samples belonging to the “benign” class, activating the same rule.
Figure 21.
Two histograms corresponding to the values of two pixels located in the top left corner. Each histogram characterizes the distribution of the training samples with respect to class “malignant” (yellow bars) and class “benign” (white bars).
Figure 22.
A case classified as “benign” by both C4.5 (left) and a CNN (right). Colored dots represent the rule antecedents.
Table 1.
Average results on the thyroid dataset. From left to right: predictive accuracy, fidelity, predictive accuracy of the rules, predictive accuracy of the rules when rules and network agree, number of rules, number of antecedents. Numbers in brackets designate standard deviations.
| | Tst. Acc. | Fid. | Rul. Acc. (1) | Rul. Acc. (2) | #Rul. | #Ant. |
|---|---|---|---|---|---|---|
| MLP | 97.5 (0.3) | - | - | - | - | - |
| VDIMLP0 | 97.3 (0.2) | 98.5 (0.3) | 97.9 (0.3) | 98.3 (0.2) | 38.3 (6.5) | 4.3 (0.2) |
| VDIMLP1 | 97.5 (0.3) | 98.3 (0.3) | 96.9 (0.3) | 98.1 (0.3) | 79.7 (11.7) | 4.1 (0.2) |
| VDIMLP2 | 97.5 (0.3) | 99.7 (0.1) | 97.4 (0.4) | 97.6 (0.3) | 9.8 (1.4) | 2.6 (0.2) |
Table 2.
Average results obtained by discretized interpretable multi-layer perceptron (DIMLP) ensembles, single DIMLP networks, and C4.5 decision trees.
| | Tst. Acc. | Fid. | Rul. Acc. (1) | Rul. Acc. (2) | #Rul. | #Ant. |
|---|---|---|---|---|---|---|
| DIMLP-ens | 98.6 (0.1) | 99.5 (0.2) | 98.7 (0.2) | 99.0 (0.1) | 24.5 (6.1) | 3.5 (0.2) |
| DIMLP [34] | - | - | 99.3 (0.0) | - | 16.5 (-) | 3.4 (-) |
| C4.5 [34] | 99.4 (0.0) | - | 99.4 (0.0) | - | 7.0 (0.0) | 2.0 (0.0) |
Table 3.
Results obtained by a convolutional neural network (CNN) and its approximation with a virtual discretized interpretable multi-layer perceptron (VDIMLP) subnetwork in the top layers.
| | Tr. Acc. | Tst. Acc. | Fid. | Rul. Acc. (1) | Rul. Acc. (2) | #Rul. | Avg. #Ant. |
|---|---|---|---|---|---|---|---|
| CNN | 99.55 | 99.39 | - | - | - | - | - |
| VDIMLP () | 99.45 | 99.31 | 98.16 | 97.68 | 99.44 | 1734 | 11.4 |
| VDIMLP () | 99.45 | 99.36 | 98.27 | 97.82 | 99.47 | 1570 | 11.6 |
Table 4.
Results obtained by a DIMLP ensemble and C4.5 decision trees (DTs). The first and second rows take into account 11 × 11 images, while the third relates to 28 × 28 images.
| | Tr. Acc. | Tst. Acc. | Fid. | Rul. Acc. (1) | Rul. Acc. (2) | #Rul. | Avg. #Ant. |
|---|---|---|---|---|---|---|---|
| DIMLP-ens (11 × 11) | 99.7 | 98.0 | 90.7 | 89.6 | 98.7 | 7144 | 9.1 |
| C4.5 (11 × 11) | 97.9 | 90.1 | - | 89.2 | - | 452 | 9.9 |
| C4.5 (28 × 28) | 97.7 | 89.3 | - | 88.4 | - | 392 | 10.6 |
Table 5.
Results obtained by CNNs, decision trees, and ensembles of DIMLPs on the skin-cancer dataset with 28 × 28 images. The first row relates to results with data augmentation during training.
| | Tr. Acc. | Tst. Acc. | Fid. | Rul. Acc. (1) | Rul. Acc. (2) | #Rul. | Avg. #Ant. |
|---|---|---|---|---|---|---|---|
| CNN (augm. data) | 85.2 (1.2) | 82.0 (1.0) | - | - | - | - | - |
| CNN | 98.9 (0.8) | 81.9 (0.7) | - | - | - | - | - |
| C4.5 | 98.5 (0.2) | 69.8 (1.7) | - | 70.9 (0.7) | - | 26.1 (6.7) | 4.6 (0.6) |
| DIMLP-ens | 80.5 (0.3) | 75.6 (0.2) | 90.2 (1.4) | 73.7 (1.0) | 77.3 (0.5) | 339.2 (30.6) | 4.9 (0.2) |
Table 6.
Results obtained by a CNN and its approximation with a VDIMLP subnetwork in the top layers.
| | Tr. Acc. | Tst. Acc. | Fid. | Rul. Acc. (1) | Rul. Acc. (2) | #Rul. | Avg. #Ant. |
|---|---|---|---|---|---|---|---|
| VDIMLP () | 82.6 | 81.8 | 94.5 | 80.3 | 82.9 | 222 | 6.9 |
Table 7.
Results obtained by transfer learning with a VGG network. The first row relates to VDIMLPs in the upper layers, while the second row concerns ensembles of DIMLPs replacing the VDIMLPs.
| | Tr. Acc. | Tst. Acc. | Fid. | Rul. Acc. (1) | Rul. Acc. (2) | #Rul. | Avg. #Ant. |
|---|---|---|---|---|---|---|---|
| VDIMLP | 86.0 (0.9) | 83.8 (1.2) | 95.0 (0.9) | 83.1 (1.1) | 85.2 (1.2) | 199.1 (11.9) | 5.7 (0.2) |
| DIMLP-ens | 87.1 (0.3) | 84.9 (0.3) | 95.4 (0.7) | 83.9 (0.8) | 86.0 (0.4) | 181.2 (29.0) | 5.8 (0.5) |