Figure 1.
Self-training with Noisy Student for skin lesion image segmentation. (a) Train the teacher model on labeled data and generate pseudo-labels. (b) Train the student model on labeled and pseudo-labeled data and generate new pseudo-labels. (c) Train the student* model on labeled and new pseudo-labeled data. Blue arrows represent the model training process; red arrows indicate the flow of pseudo-labels.
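The three-stage pipeline in the caption can be expressed as a short training loop. Below is a minimal sketch, assuming hypothetical helpers `make_model`, `train`, and `predict_masks` and dataset objects that support concatenation with `+`; it is not the authors' actual code.

```python
# Noisy Student self-training for segmentation (sketch of Figure 1).
# make_model, train, and predict_masks are hypothetical placeholders.

def noisy_student(labeled_ds, unlabeled_images, iterations=2):
    # (a) Train the teacher on labeled data only.
    model = train(make_model(), labeled_ds)
    for _ in range(iterations):
        # The teacher (or previous student) generates pseudo-labels.
        pseudo_ds = predict_masks(model, unlabeled_images)
        # (b), (c) Train a noised student on labeled + pseudo-labeled data.
        model = train(make_model(), labeled_ds + pseudo_ds)
    # After two iterations this is the student* model.
    return model
```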
Figure 2.
DeepLabV3 model architecture [14].
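The paper does not name a specific implementation; one common way to instantiate DeepLabV3 with the ResNet backbones used here is the segmentation_models_pytorch library (a library choice assumed for illustration, not confirmed by the source):

```python
import torch
import segmentation_models_pytorch as smp

# DeepLabV3 with a ResNet34 encoder and a single-channel lesion output.
model = smp.DeepLabV3(
    encoder_name="resnet34",     # the paper also evaluates "resnet18"
    encoder_weights="imagenet",  # ImageNet-pretrained backbone
    in_channels=3,               # RGB dermoscopic images
    classes=1,                   # binary lesion mask
)

x = torch.randn(2, 3, 256, 256)  # dummy batch
logits = model(x)                # -> shape (2, 1, 256, 256)
```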
Figure 3.
Number of real and pseudo-labeled samples in train, validation, and test datasets.
Figure 4.
Examples of images for which masks were not created or were too small and thus not included in the dataset.
Figure 5.
Influence of training dataset composition on IoU for different backbones on the ISIC 2017 + 2018 dataset. The ratio of labeled to unlabeled data is 1/m, where m = 0 denotes the teacher model.
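Varying the composition amounts to concatenating the labeled set with a pseudo-labeled subset roughly m times its size. A minimal PyTorch sketch, where the sampling scheme is an assumption rather than the paper's exact procedure:

```python
import random
from torch.utils.data import ConcatDataset, Subset

def mixed_train_set(labeled_ds, pseudo_ds, m):
    """Build a train set with a 1:m real- to pseudo-label ratio.

    Sketch only; the paper's exact sampling scheme may differ.
    """
    n_pseudo = min(len(pseudo_ds), m * len(labeled_ds))
    idx = random.sample(range(len(pseudo_ds)), n_pseudo)
    return ConcatDataset([labeled_ds, Subset(pseudo_ds, idx)])
```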
Figure 6.
Original skin lesion image and its augmented versions. Each augmented version illustrates one of the augmentation sets used in the study as input noise for training the student model.
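These augmentation sets map naturally onto Albumentations transforms. In the sketch below, the library choice, the contents of the "simple" set (flips and 90° rotations), and all parameters are assumptions for illustration:

```python
import numpy as np
import albumentations as A

# The five augmentation sets compared in the study (assumed composition).
augmentation_sets = {
    "simple": A.Compose([A.HorizontalFlip(p=0.5), A.VerticalFlip(p=0.5),
                         A.RandomRotate90(p=0.5)]),
    "coarse dropout": A.Compose([A.CoarseDropout(p=0.5)]),
    "elastic transform": A.Compose([A.ElasticTransform(p=0.5)]),
    "grid distortion": A.Compose([A.GridDistortion(p=0.5)]),
    "optical distortion": A.Compose([A.OpticalDistortion(p=0.5)]),
}

# Apply jointly to image and mask so the pseudo-label stays aligned.
image = np.zeros((256, 256, 3), dtype=np.uint8)  # dummy image
mask = np.zeros((256, 256), dtype=np.uint8)      # dummy mask
augmented = augmentation_sets["optical distortion"](image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```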
Figure 7.
Influence of the real- to pseudo-label ratio on IoU for different augmentation sets on the ISIC 2017 + 2018 dataset. The ratio of labeled to unlabeled data is 1/m, where m = 0 denotes the teacher model.
Figure 8.
Progress in each iteration of self-training for DeepLabV3 with ResNet18 and ResNet34 backbones on the ISIC 2017 + 2018 dataset.
Figure 9.
Progress in each iteration of self-training for DeepLabV3 with ResNet34 backbone with simple augmentations and optical distortion on the ISIC 2017 + 2018 dataset.
Figure 10.
Progress in each iteration of self-training on the ISIC 2018 dataset. Results for DeepLabV3 (DLV3) with ResNet34 (R34) and ResNet18 (R18) backbones.
Figure 11.
Progress in each iteration of self-training on the PH2 dataset. Results for DeepLabV3 (DLV3) with ResNet34 (R34) and ResNet18 (R18) backbones.
Figure 12.
Influence of the real- to pseudo-label ratio on student and student* models on the ISIC 2017 + 2018 dataset. The ratio of labeled to unlabeled data is 1/m, where m = 0 denotes the teacher model. For student*, IoU improves only at a 1:1 real- to pseudo-label ratio and performance starts to decrease at higher ratios, while the student model performs best at a 1:4 ratio.
Figure 13.
Distribution of IoU on the ISIC 2017 + 2018 test set. The dashed line marks the threshold of correct segmentation.
Figure 14.
Images with the worst acquired IoU scores. Subjective assessment of some ground truth masks shows that these images are annotated incorrectly. Nevertheless, the model predicted masks that resemble the actual lesions.
Figure 15.
Images with the best acquired IoU scores. We demonstrate that each iteration of self-training leads to more detailed skin lesion boundary segmentation.
Table 1.
Comparison of model architectures for skin lesion segmentation.
| Model | IoU | Dice | Precision | Recall |
|---|---|---|---|---|
| U-Net | 0.6941 | 0.7396 | 0.8758 | 0.6460 |
| U-Net++ | 0.8137 | 0.8978 | 0.8902 | 0.8202 |
| DeepLabV3 | 0.8205 | 0.8822 | 0.9170 | 0.8080 |
| DeepLabV3+ | 0.8166 | 0.8843 | 0.9140 | 0.7915 |
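The metrics reported in Table 1 and below follow the standard confusion-matrix definitions: IoU = TP/(TP + FP + FN), Dice = 2TP/(2TP + FP + FN), precision = TP/(TP + FP), and recall = TP/(TP + FN). A minimal sketch for binary masks:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """IoU, Dice, precision, and recall for binary masks.

    pred and target are arrays of the same shape; eps guards against
    division by zero for empty masks.
    """
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    return {
        "iou": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
    }
```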
Table 2.
Results of the student model with simple augmentations (ResNet18 backbone), depending on training set composition, on the ISIC 2018 dataset.
| Train set | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8659 | 0.9194 | 0.8781 | 0.9308 |
| m = 1 | 0.8789 | 0.9355 | 0.9030 | 0.9303 |
| m = 2 | 0.8713 | 0.9215 | 0.8690 | 0.9482 |
| m = 4 | 0.8657 | 0.9245 | 0.8760 | 0.9382 |
| m = 8 | 0.8714 | 0.9292 | 0.8808 | 0.9396 |
Table 3.
Results of the student model with simple augmentations (ResNet34 backbone), depending on training set composition, on the ISIC 2018 dataset.
| Train set | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8568 | 0.9133 | 0.8408 | 0.9596 |
| m = 1 | 0.8647 | 0.9173 | 0.8498 | 0.9561 |
| m = 2 | 0.8601 | 0.9159 | 0.8622 | 0.9389 |
| m = 4 | 0.8567 | 0.9180 | 0.8368 | 0.9579 |
| m = 8 | 0.8593 | 0.9124 | 0.8448 | 0.9571 |
Table 4.
Results of the student model with simple augmentations (ResNet18 backbone), depending on training set composition, on the PH2 dataset.
| Train set | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8649 | 0.9273 | 0.8962 | 0.9385 |
| m = 1 | 0.8740 | 0.9382 | 0.8969 | 0.9503 |
| m = 2 | 0.8723 | 0.9333 | 0.8816 | 0.9652 |
| m = 4 | 0.8578 | 0.9224 | 0.8590 | 0.9671 |
| m = 8 | 0.8734 | 0.9346 | 0.8877 | 0.9590 |
Table 5.
Results of the student model with simple augmentations (ResNet34 backbone), depending on training set composition, on the PH2 dataset.
| Train set | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8574 | 0.9192 | 0.8502 | 0.9760 |
| m = 1 | 0.8554 | 0.9181 | 0.8459 | 0.9782 |
| m = 2 | 0.8575 | 0.9200 | 0.8475 | 0.9791 |
| m = 4 | 0.8603 | 0.9222 | 0.8484 | 0.9811 |
| m = 8 | 0.8537 | 0.9135 | 0.8380 | 0.9825 |
Table 6.
IoU on ISIC 2017 + 2018 for DeepLabV3 with ResNet34 backbone under different augmentations. m denotes the ratio of real- to pseudo-labeled data (1:m) in the student training dataset.
| Augmentation | Teacher IoU | Student IoU | m |
|---|---|---|---|
| simple augmentations | 0.8475 | 0.8540 | 4 |
| coarse dropout | 0.8360 | 0.8501 | 2 |
| elastic transform | 0.8169 | 0.8299 | 1 |
| grid distortion | 0.8499 | 0.8532 | 1 |
| optical distortion | 0.8519 | 0.8550 | 2 |
Table 7.
Results on PH2 for DeepLabV3 with ResNet34 backbone with optical distortion augmentation. m denotes the ratio of real- to pseudo-labeled data (1:m) in the student training dataset.
| Train set | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8512 | 0.9226 | 0.8451 | 0.9820 |
| m = 1 | 0.8586 | 0.9253 | 0.8536 | 0.9804 |
| m = 2 | 0.8550 | 0.9222 | 0.8454 | 0.9843 |
| m = 4 | 0.8546 | 0.9217 | 0.8473 | 0.9806 |
| m = 8 | 0.8484 | 0.9227 | 0.8425 | 0.9826 |
Table 8.
Results on ISIC 2018 for DeepLabV3 with ResNet34 backbone with optical distortion augmentation. m denotes the ratio of real- to pseudo-labeled data (1:m) in the student training dataset.
| Train set | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8598 | 0.9209 | 0.8404 | 0.9710 |
| m = 1 | 0.8559 | 0.9207 | 0.8470 | 0.9603 |
| m = 2 | 0.8681 | 0.9220 | 0.8596 | 0.9590 |
| m = 4 | 0.8657 | 0.9245 | 0.8760 | 0.9382 |
| m = 8 | 0.8439 | 0.9104 | 0.8367 | 0.9477 |
Table 9.
Results of teacher, student, and student* on the ISIC 2018 dataset (ResNet18 backbone).
| Model | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8659 | 0.9194 | 0.8781 | 0.9308 |
| student | 0.8789 | 0.9355 | 0.9030 | 0.9303 |
| student* | 0.8800 | 0.9373 | 0.9002 | 0.9342 |
Table 10.
Results of teacher, student, and student* on the ISIC 2018 dataset (ResNet34 backbone).
| Model | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8568 | 0.9133 | 0.8408 | 0.9596 |
| student | 0.8567 | 0.9180 | 0.8368 | 0.9579 |
| student* | 0.8670 | 0.9213 | 0.8593 | 0.9565 |
Table 11.
Results of teacher, student, and student* on the PH2 dataset (ResNet18 backbone).
| Model | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8649 | 0.9273 | 0.8962 | 0.9385 |
| student | 0.8740 | 0.9382 | 0.8969 | 0.9503 |
| student* | 0.8754 | 0.9372 | 0.8858 | 0.9651 |
Table 12.
Results of teacher, student, and student* on the PH2 dataset (ResNet34 backbone).
| Model | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8574 | 0.9192 | 0.8502 | 0.9760 |
| student | 0.8603 | 0.9222 | 0.8484 | 0.9811 |
| student* | 0.8652 | 0.9259 | 0.8554 | 0.9810 |
Table 13.
Results of teacher, student, and student* on the ISIC 2018 dataset (ResNet34 backbone with optical distortion augmentation).
| Model | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8598 | 0.9209 | 0.8404 | 0.9710 |
| student | 0.8681 | 0.9220 | 0.8596 | 0.9590 |
| student* | 0.8686 | 0.9189 | 0.8632 | 0.9545 |
Table 14.
Results of teacher, student, and student* on the PH2 dataset (ResNet34 backbone with optical distortion augmentation).
| Model | mIoU | Dice | Precision | Recall |
|---|---|---|---|---|
| teacher | 0.8512 | 0.9226 | 0.8451 | 0.9820 |
| student | 0.8550 | 0.9222 | 0.8454 | 0.9843 |
| student* | 0.8556 | 0.9182 | 0.8404 | 0.9804 |
Table 15.
Segmentation performance metrics for the proposed method and other state-of-the-art methods on the ISIC 2018 dataset. Note that the authors of the Polar method reported median IoU (result marked with *).
| Method | mIoU | Dice | Ref. |
|---|---|---|---|
| DuAT | 0.867 | 0.923 | [24] |
| BAT | 0.843 | 0.912 | [23] |
| DoubleU-Net | 0.821 | 0.896 | [49] |
| Polar | 0.874 * | 0.925 | [26] |
| Ours (ResNet18) | 0.880 | 0.937 | [this work] |
| Ours (ResNet34) | 0.869 | 0.919 | [this work] |