Figure 1.
Illustration of an autoencoder model (encoder and decoder) using the two different partially impossible reconstruction losses, together with sampling examples. (a) Variation I: input and target are images of the same scenery, but under different illumination conditions. (b) Variation II: a new scenery of the same class label is selected as the target for the input.
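To make the two variations concrete, here is a minimal sketch of how the (input, target) pairs could be assembled; the function name `pirl_batch` and the `dataset_by_scene` layout are illustrative assumptions, not the paper's actual implementation:

```python
import random
import torch

def pirl_batch(dataset_by_scene, variation="I"):
    """Assemble (input, target) pairs for the partially impossible
    reconstruction loss. `dataset_by_scene` maps a scene id to a list of
    (image, label) tensors of that scenery under different illumination
    conditions (hypothetical data layout)."""
    inputs, targets = [], []
    for scene_id, samples in dataset_by_scene.items():
        img, label = random.choice(samples)
        if variation == "I":
            # Variation I: same scenery, different illumination.
            tgt, _ = random.choice(samples)
        else:
            # Variation II: a different scenery with the same class label.
            pool = [s for sid, ss in dataset_by_scene.items() if sid != scene_id
                    for s in ss if s[1] == label]
            tgt, _ = random.choice(pool)
        inputs.append(img)
        targets.append(tgt)
    return torch.stack(inputs), torch.stack(targets)

# The loss reconstructs the target instead of the input, so perfectly
# reconstructing the input is (partially) impossible by construction:
# loss = torch.nn.functional.mse_loss(decoder(encoder(x)), target)
```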
Figure 2.
Example images for the different SVIRO dataset variations. Each seat position (left, middle and right) can be empty, or an adult, a child seat, an infant seat or an everyday object can be placed on it. (a) SVIRO: Base dataset with sceneries for ten different vehicle interiors. (b) SVIRO-Illumination: Scenery is rendered under several illumination conditions. (c) SVIRO-NoCar: Scenery is rendered with a different background and without a vehicle. (d) SVIRO-Uncertainty: Fine-grained splits with unknown everyday objects for uncertainty estimation.
Figure 3.
Comparison of the training latent space representations (t-SNE projections) produced by different autoencoder models on different datasets: MNIST (first block), Fashion-MNIST (second block), CIFAR10 (third block) and GTSRB (fourth block). We either used the triplet loss (TAE), the second variation of the PIRL (II-AE) or just the vanilla reconstruction loss (AE). Different colours represent different classes. (a) AE. (b) TAE. (c) II-AE (Ours).
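The t-SNE projections shown here can be reproduced in a few lines; the array names and shapes below are placeholders for the actual encoder outputs:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Stand-ins for the encoded training set: (N, latent_dim) codes and class ids.
latents = np.random.randn(1000, 64).astype(np.float32)
labels = np.random.randint(0, 10, size=1000)

# Project the latent space to 2D; colours encode the class label.
embedded = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(latents)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab10", s=4)
plt.savefig("latent_tsne.png", dpi=150)
```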
Figure 4.
Comparison of the input (first row), reconstruction (second row) and target (third row) of the training data after the last epoch. The results are shown for different autoencoder models on different datasets: MNIST (first block), Fashion-MNIST (second block), CIFAR10 (third block) and GTSRB (fourth block). We either used the triplet loss (TAE), the second variation of the PIRL (II-AE) or just the vanilla reconstruction loss (AE). (a) AE. (b) TAE. (c) II-AE (Ours).
Figure 5.
Extended Yale Face Database B. Illumination is removed from the training samples (first row) to form the reconstructions (second row). The test samples (third row) cannot always be reconstructed reliably (fourth row). Reconstructing the nearest neighbour (fifth row) preserves the head pose and position and removes the illumination, such that classification or landmark detection could potentially be performed on the latter instead of on the input image.
Figure 6.
The encoder–decoder model transforms the input images of the same scenery (first row) into a cleaned version (second row) by removing all illumination and environment information (see the background through the window). The test image (third row) cannot be reconstructed perfectly (fourth row). Choosing the nearest neighbour in the latent space and reconstructing the latter leads to a class-preserving reconstruction (fifth row).
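A minimal sketch of the nearest-neighbour step used in Figures 5 and 6, assuming an `encoder`, a `decoder` and a precomputed tensor of training latents (all names are illustrative):

```python
import torch

@torch.no_grad()
def nearest_neighbour_reconstruction(encoder, decoder, x_test, train_latents):
    """Encode the test image, look up its nearest neighbour among the
    encoded training samples and decode that latent instead, which tends
    to preserve the class of the input."""
    z = encoder(x_test)                    # (1, latent_dim)
    dists = torch.cdist(z, train_latents)  # (1, N) pairwise distances
    idx = dists.argmin(dim=1)              # index of the closest training latent
    return decoder(train_latents[idx])     # class-preserving reconstruction
```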
Figure 7.
Comparison of the reconstructions of the 5 nearest neighbours (columns 3 to 7) for different encoder–decoder latent spaces (a–c). The reconstruction (second column) of the test sample (first column) is also reported. We used the I-PIRL combined with either the triplet loss (TAE), the variational autoencoder (VAE) or just the vanilla reconstruction loss (AE). The triplet regularization is the most reliable and consistent across all 5 neighbours. Notice the class changes across neighbours for the AE and VAE models. (a) I-AE. (b) I-VAE. (c) I-TAE.
Figure 8.
Examples for the Webcam Clip Art dataset. The encoder–decoder model removes the environmental features from the images (first row) to form the output images (second row). Vehicles and people are removed from the scenery, and skies, rivers and beaches are smoothed out.
Figure 9.
The reconstruction of the first scenery (first recon) is compared against the furthest reconstruction of all sceneries (max recon). First recon is also used to determine the closest and furthest scenery. The model does not learn to focus its reconstruction on a single training sample.
Figure 10.
Reconstruction results of unseen real data (a) from the TICaM dataset: (b) E-AE trained on Tesla SVIRO, (c) E-AE trained on Kodiaq SVIRO-Illumination, (d) I-E-AE trained on Kodiaq SVIRO-Illumination, (e) E-AE, (f) I-E-AE, (g) II-E-AE, (h) E-TAE, (i) I-E-TAE, (j) II-E-TAE and (k) nearest neighbour of (j). Models (e–k) are all trained on our new dataset. A red (wrong) or green (correct) box highlights whether the semantics are preserved by the reconstruction.
Figure 11.
Comparison of the training performance distribution for each epoch over 250 epochs. II-E-TAE (extractor triplet autoencoder using the II-PIRL) is compared against training the extractor from scratch and against fine-tuning the layers that follow the features used by the extractor in our approach. (a) Resnet-50. (b) VGG-11. (c) Densenet-121.
Figure 12.
Reconstruction of real digits when trained on MNIST. We either used the triplet loss (T or TAE), the second variation of the PIRL (II) and/or the extractor (E), or just the vanilla reconstruction loss (AE). The II-PIRL provides the best class-preserving reconstructions.
Figure 13.
Comparison of OOD images (first row) with the corresponding reconstructions (second row) for several forward passes with dropout enabled. The results are shown for different autoencoder models on different datasets: CIFAR10 (first block), LSUN (second block), Places365 (third block) and SVHN (fourth block). We either used the triplet loss (TAE), the second variation of the PIRL (II-AE) or just the vanilla reconstruction loss (AE). (a) MC-AE. (b) MC-TAE. (c) MC-II-AE (Ours).
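The several inferences with dropout enabled amount to repeated stochastic forward passes; a minimal sketch, where `autoencoder` is a placeholder for any of the MC model variants:

```python
import torch

@torch.no_grad()
def mc_reconstructions(autoencoder, x, n_passes=10):
    """Keep dropout active at inference time and collect one
    reconstruction per stochastic forward pass."""
    autoencoder.train()  # train mode keeps the dropout layers stochastic
    return torch.stack([autoencoder(x) for _ in range(n_passes)])
```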
Figure 14.
Comparison of entropy histograms between the in-distribution dataset (GTSRB, filled blue bars) and several out-of-distribution datasets (unfilled bars, coloured according to the dataset used) for different methods. We either used the triplet loss (TAE), the second variation of the PIRL (II-AE) or just the vanilla reconstruction loss (AE) and compare them against MC Dropout and an ensemble of models. MC-II-AE provides the best separation between in-distribution and out-of-distribution data. Moreover, the different out-of-distribution datasets have more similar distributions compared to the other models, which we also evaluate quantitatively. Notice the non-linear scale on the y-axis (number count per bin) to ease visualization of smaller values. (a) MC-AE. (b) MC-TAE. (c) MC-II-AE (Ours). (d) MC Dropout. (e) Ensemble.
Table 1.
Mean and standard deviation of the test accuracy over 10 runs for a linear SVM classifier trained in the latent space of different autoencoders after the latter have finished training. We either used the triplet loss (TAE), the second variation of the PIRL (II-AE) or just the vanilla reconstruction loss (AE).
Dataset | II-AE (Ours) | TAE | AE |
---|---|---|---|
MNIST | 99.3 ± 0.1 | 99.1 ± 0.1 | 93.2 ± 0.5 |
Fashion | 91.9 ± 0.2 | 90.7 ± 0.3 | 78.9 ± 0.3 |
CIFAR10 | 65.9 ± 0.6 | 60.4 ± 1.4 | 18.6 ± 2.0 |
GTSRB | 98.9 ± 0.4 | 98.8 ± 0.3 | 95.7 ± 0.5 |
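The latent-space classification behind Table 1 can be sketched as follows; the random arrays merely stand in for the frozen encoder outputs of the trained autoencoder:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Placeholder latents: in practice these are the encoder outputs on the
# training and test sets after autoencoder training has finished.
train_latents, train_labels = np.random.randn(500, 64), np.random.randint(0, 10, 500)
test_latents, test_labels = np.random.randn(100, 64), np.random.randint(0, 10, 100)

svm = LinearSVC().fit(train_latents, train_labels)  # linear-kernel SVM
print(accuracy_score(test_labels, svm.predict(test_latents)))
```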
Table 2.
Mean accuracy and variance over 5 repeated training runs on each of the three vehicle interiors. F = fine-tuned pretrained model, ES = early stopping with 80:20 split, NS = no early stopping and V = vanilla reconstruction loss. We either used the triplet loss (TAE), the variational autoencoder (VAE) or the plain autoencoder (AE), each with the first variation of the PIRL (I-) or with the vanilla reconstruction loss (V). The PIRL improves the vanilla version and, with the nearest neighbour search, outperforms all other models.
Model | Cayenne | Kodiaq | Kona |
---|---|---|---|
MobileNet-ES | 62.9 ± 3.1 | 71.8 ± 4.3 | 73.0 ± 0.8 |
VGG11-ES | 64.4 ± 35.0 | 74.0 ± 19.0 | 75.5 ± 5.7 |
ResNet50-ES | 72.3 ± 3.7 | 77.9 ± 35.0 | 76.6 ± 9.9 |
MobileNet-NS | 72.7 ± 3.8 | 77.0 ± 4.1 | 77.4 ± 2.2 |
VGG11-NS | 74.1 ± 5.8 | 71.2 ± 14.0 | 78.4 ± 2.6 |
ResNet50-NS | 76.2 ± 18.0 | 83.1 ± 1.1 | 82.0 ± 3.2 |
MobileNet-F | 85.8 ± 2.0 | 90.6 ± 1.2 | 88.6 ± 0.6 |
VGG11-F | 90.5 ± 2.0 | 90.3 ± 1.2 | 89.2 ± 0.9 |
ResNet50-F | 87.9 ± 2.0 | 89.7 ± 6.1 | 88.5 ± 1.0 |
AE-V | 74.1 ± 0.7 | 80.1 ± 1.8 | 73.3 ± 0.9 |
VAE-V | 73.4 ± 1.3 | 79.5 ± 0.6 | 73.0 ± 0.9 |
TAE-V | 90.8 ± 0.3 | 91.7 ± 0.2 | 89.9 ± 0.6 |
I-AE (Ours) | 86.8 ± 0.3 | 86.7 ± 1.5 | 86.7 ± 0.9 |
I-VAE (Ours) | 81.4 ± 0.5 | 86.6 ± 0.9 | 85.9 ± 0.8 |
I-TAE (Ours) | 92.4 ± 1.5 | 93.5 ± 0.9 | 93.0 ± 0.3 |
Table 3.
For each experiment, the best performance on real vehicle interior images (TICaM) across all epochs is taken, and then the mean and maximum of those values across all 10 runs are reported. The model weights achieving the maximum performance per run are evaluated on SVIRO, where they perform better as well. We used the triplet loss (TAE), the first (I) or the second (II) variation of the PIRL and the extractor module (E).
Model | TICaM Mean | TICaM Max | SVIRO Mean | SVIRO Max |
---|---|---|---|---|
VGG | | | | |
Scratch | | | | |
Pretrained | | | | |
E-TAE | | | | |
I-E-TAE | | | | |
II-E-TAE | | | | |
Resnet | | | | |
Scratch | | | | |
Pretrained | | | | |
E-TAE | | | | |
I-E-TAE | | | | |
II-E-TAE | | | | |
Densenet | | | | |
Scratch | | | | |
Pretrained | | | | |
E-TAE | | | | |
I-E-TAE | | | | |
II-E-TAE | | | | |
Table 4.
For each of the 10 experimental runs per method, after 250 epochs and using the VGG-11 extractor, we trained different classifiers in the latent space: k-nearest neighbour (KNN), random forest (RForest) and support vector machine with a linear kernel (SVM). We either used the triplet loss (TAE), the first (I) or the second (II) variation of the PIRL and the extractor module (E). Most of the contribution to the synthetic-to-real generalization on TICaM is due to the II variation of the PIRL cost function.
Variant | KNN | RForest | SVM |
---|---|---|---|
E-AE | | | |
I-E-AE | | | |
II-E-AE | | | |
E-TAE | | | |
Table 5.
Different model variations trained on MNIST. The classifiers were trained in the latent space of the training data and evaluated on real digits: k-nearest neighbour (KNN), random forest (RForest) and support vector machine with a linear kernel (SVM). We used the triplet loss (TAE), the first (I) or the second (II) variation of the PIRL and the extractor module (E).
Model | KNN | RForest | SVM |
---|---|---|---|
AE | 15.7 | 12.5 | 11.6 |
TAE | 11.1 | 11.6 | 8.4 |
II-AE | 27.8 | 20.2 | 23.6 |
II-TAE | 21.8 | 17.9 | 23.9 |
E-AE | 27.3 | 23.1 | 26.5 |
E-TAE | 26.1 | 19.1 | 23.3 |
II-E-AE | 65.0 | 61.9 | 65.6 |
II-E-TAE | 64.1 | 63.7 | 63.7 |
Table 6.
Comparison of AUROC (in percentage, larger is better) of our method against MC dropout and an ensemble of models (Ensemble), as well as vanilla (AE) and triplet autoencoders (TAE), either using the second variation of the PIRL (II) or not. We repeated the experiments for 10 runs and report the mean and standard deviation. The first column denotes the in-distribution dataset and the second column the out-of-distribution dataset; if both are identical, we report the result on the in-distribution test set. Best results are highlighted in grey.
In-Distribution | Out-of-Distribution | MC-II-AE (Ours) | MC-TAE | MC-AE | MC Dropout | Ensemble |
---|---|---|---|---|---|---|
MNIST | MNIST | 93.6 ± 0.4 | 93.3 ± 1.0 | 84.9 ± 0.8 | 90.2 ± 0.8 | 81.0 ± 1.6 |
MNIST | CIFAR10 | 99.1 ± 0.8 | 97.5 ± 1.1 | 81.0 ± 5.1 | 91.8 ± 1.8 | 91.9 ± 1.8 |
MNIST | Fashion | 97.3 ± 0.7 | 95.0 ± 1.1 | 77.2 ± 6.1 | 88.5 ± 2.4 | 82.6 ± 2.5 |
MNIST | Omniglot | 99.4 ± 0.4 | 97.6 ± 0.7 | 82.6 ± 9.0 | 93.2 ± 4.0 | 95.8 ± 2.3 |
MNIST | SVHN | 99.1 ± 1.1 | 98.1 ± 1.0 | 81.5 ± 7.1 | 94.9 ± 1.9 | 94.2 ± 1.6 |
Fashion | Fashion | 86.2 ± 0.5 | 85.8 ± 0.8 | 83.5 ± 0.8 | 82.1 ± 0.4 | 79.8 ± 0.8 |
Fashion | CIFAR10 | 96.6 ± 1.3 | 91.7 ± 1.9 | 91.2 ± 3.2 | 88.6 ± 1.2 | 91.0 ± 1.0 |
Fashion | MNIST | 91.5 ± 1.7 | 87.2 ± 2.3 | 76.4 ± 6.4 | 83.2 ± 2.0 | 88.4 ± 0.8 |
Fashion | Omniglot | 97.7 ± 1.2 | 89.0 ± 3.0 | 77.7 ± 8.7 | 91.7 ± 2.4 | 96.9 ± 0.9 |
Fashion | SVHN | 95.7 ± 2.5 | 90.5 ± 2.7 | 92.1 ± 3.4 | 90.0 ± 1.1 | 93.6 ± 1.1 |
GTSRB | GTSRB | 94.6 ± 0.9 | 93.4 ± 0.8 | 87.9 ± 1.6 | 85.7 ± 1.2 | 83.2 ± 0.9 |
GTSRB | CIFAR10 | 92.6 ± 3.3 | 80.7 ± 1.8 | 75.4 ± 2.5 | 79.0 ± 0.8 | 69.2 ± 1.0 |
GTSRB | LSUN | 93.9 ± 3.5 | 81.2 ± 1.9 | 76.6 ± 2.3 | 80.3 ± 0.6 | 68.3 ± 0.8 |
GTSRB | Places365 | 93.6 ± 3.6 | 82.0 ± 1.7 | 76.2 ± 2.1 | 79.4 ± 0.5 | 68.7 ± 0.8 |
GTSRB | SVHN | 92.8 ± 3.0 | 83.6 ± 2.4 | 76.1 ± 3.7 | 82.8 ± 1.2 | 72.7 ± 0.7 |
SVIRO-U | CIFAR10 | 87.2 ± 7.2 | 71.5 ± 21.8 | 78.5 ± 4.9 | 70.8 ± 10.6 | 83.8 ± 2.2 |
SVIRO-U | GTSRB | 74.5 ± 9.4 | 68.1 ± 18.4 | 82.5 ± 4.8 | 76.6 ± 6.2 | 87.0 ± 1.9 |
SVIRO-U | LSUN | 84.4 ± 8.2 | 71.0 ± 21.2 | 77.5 ± 4.4 | 74.0 ± 8.5 | 82.7 ± 1.8 |
SVIRO-U | Places365 | 85.7 ± 7.8 | 71.2 ± 21.4 | 79.4 ± 3.8 | 75.3 ± 7.4 | 83.7 ± 1.5 |
SVIRO-U | SVHN | 92.7 ± 4.9 | 72.4 ± 22.6 | 79.0 ± 4.8 | 66.2 ± 13.0 | 84.4 ± 3.0 |
SVIRO-U | Adults (A) | 97.6 ± 0.8 | 48.7 ± 48.7 | 88.4 ± 1.1 | 91.9 ± 0.8 | 87.3 ± 0.7 |
SVIRO-U | Seats (S) | 93.3 ± 2.8 | 46.4 ± 46.4 | 74.9 ± 2.1 | 89.7 ± 4.2 | 89.0 ± 2.8 |
SVIRO-U | Objects (O) | 75.5 ± 4.2 | 39.5 ± 39.6 | 73.4 ± 1.6 | 73.7 ± 2.5 | 73.4 ± 4.5 |
SVIRO-U | A, S | 92.4 ± 2.0 | 45.7 ± 45.7 | 75.0 ± 1.7 | 87.8 ± 1.7 | 79.0 ± 1.7 |
SVIRO-U | A, O | 81.8 ± 2.3 | 66.2 ± 16.2 | 76.4 ± 1.0 | 80.5 ± 1.9 | 81.8 ± 0.8 |
SVIRO-U | A, S, O | 77.1 ± 2.6 | 37.9 ± 37.9 | 69.6 ± 1.2 | 75.3 ± 1.9 | 77.8 ± 1.3 |
SVIRO-U | Tesla (OOD) | 79.0 ± 11.8 | 65.3 ± 16.5 | 72.7 ± 4.8 | 81.3 ± 3.8 | 80.3 ± 2.2 |
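A minimal sketch of how the entropies and AUROC values in Table 6 could be computed; the classifier head implied by `model` is an assumption, and the paper's exact uncertainty pipeline may differ:

```python
import numpy as np
import torch
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def mc_entropy(model, x, n_passes=20):
    """Predictive entropy under MC dropout: average the class probabilities
    over several stochastic passes and take the entropy of the mean."""
    model.train()  # keep dropout active at inference time
    probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_passes)])
    mean = probs.mean(dim=0)
    return -(mean * torch.log(mean + 1e-12)).sum(dim=1)  # (batch,)

# AUROC for separating in-distribution (label 0) from OOD (label 1) samples
# by their entropy; the arrays below are hypothetical stand-ins.
ent_id = np.random.rand(100)
ent_ood = np.random.rand(100) + 0.5
scores = np.concatenate([ent_id, ent_ood])
labels = np.concatenate([np.zeros(100), np.ones(100)])
print(roc_auc_score(labels, scores))
```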
Table 7.
We calculated the sum of the Wasserstein distances between the in-distribution dataset and all out-of-distribution datasets (TD, larger is better) separately, and the sum of the distances between the (out-of-distribution) CIFAR10 and all other out-of-distribution datasets (OD, smaller is better), over 10 runs. We report the mean and standard deviation and compare our method (MC-II-AE) against MC dropout and an ensemble of models (Ensemble) as well as vanilla (AE) and triplet autoencoders (TAE).
| MC-II-AE (Ours) | MC-TAE | MC-AE | MC Dropout | Ensemble |
---|---|---|---|---|---|
OD | | | | | |
TD | | | | | |
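The TD and OD scores reduce to sums of one-dimensional Wasserstein distances between entropy samples; a sketch with hypothetical entropy arrays:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Hypothetical per-sample entropies; in practice these come from the
# in-distribution test set and from each OOD dataset.
ent_in = np.random.rand(1000)
ent_ood = {"CIFAR10": np.random.rand(1000) + 1.0,
           "LSUN": np.random.rand(1000) + 1.1,
           "SVHN": np.random.rand(1000) + 0.9}

# TD: distance between in-distribution and every OOD dataset (larger is better).
td = sum(wasserstein_distance(ent_in, e) for e in ent_ood.values())
# OD: distance between CIFAR10 and the other OOD datasets (smaller is better,
# i.e. the OOD entropy distributions should resemble each other).
od = sum(wasserstein_distance(ent_ood["CIFAR10"], e)
         for name, e in ent_ood.items() if name != "CIFAR10")
print(td, od)
```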