ProMatch: Semi-Supervised Learning with Prototype Consistency
Abstract
1. Introduction
- We introduce a novel loss component called PC Loss, which addresses the limitations of traditional pseudo-labeling methods. By deriving the pseudo-label from the prediction-prototype of the labeled data, the PC Loss yields more precise and stable label propagation in semi-supervised learning (a simplified sketch of this idea is given after this list).
- By integrating the PC Loss with the techniques employed by the MixMatch family of methods, we establish the ProMatch training framework. This framework combines the benefits of the PC Loss with existing approaches, resulting in improved performance on semi-supervised learning tasks.
- Extensive experimental results demonstrate that ProMatch achieves significant performance gains over previous algorithms on popular benchmark datasets such as CIFAR-10, CIFAR-100, SVHN, and Mini-ImageNet. These results validate the effectiveness and superiority of our proposed approach in the field of SSL.
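To ground the idea of a prediction-prototype, the snippet below shows one straightforward way such prototypes could be built from a labeled batch: the softmax predictions of labeled samples are averaged class by class, and each unlabeled sample's pseudo-label is later taken from the prototype it is most similar to. This is a minimal sketch under standard PyTorch assumptions, not the authors' implementation; the function name `build_prediction_prototypes` is illustrative.

```python
import torch

def build_prediction_prototypes(labeled_probs, labels, num_classes):
    """Average the softmax predictions of labeled samples class by class.

    labeled_probs: (N, C) softmax outputs of the model on labeled data
    labels:        (N,)   ground-truth class indices
    Returns a (num_classes, C) tensor: one prediction-prototype per class.
    """
    prototypes = labeled_probs.new_zeros(num_classes, labeled_probs.size(1))
    for k in range(num_classes):
        mask = labels == k
        if mask.any():                      # skip classes absent from this batch
            prototypes[k] = labeled_probs[mask].mean(dim=0)
    return prototypes
```

In practice such prototypes would typically be accumulated across batches (for example with an exponential moving average) rather than recomputed from a single labeled batch.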
2. Related Work
2.1. Consistency Regularization
2.2. Pseudo-Labeling
2.3. The Combination of Consistency Regularization and Pseudo-Labeling
3. Methods
3.1. Preliminaries
3.2. Prototype Generation
3.3. Prototype Consistency Loss
3.4. Total Objective
4. Experiments
4.1. Datasets
4.2. Baseline
- VAT [24]: maintains the consistency of the model’s output under adversarial perturbations.
- MeanTeacher [10]: enforces consistency between the student's predictions and those of a teacher model whose weights are an exponential moving average of the student's.
- MixMatch [9]: guesses low-entropy labels for data-augmented unlabeled examples and mixes labeled and unlabeled data using Mix-up.
- PLCB [36]: proposes to learn from unlabeled data by generating soft pseudo-labels using the network predictions.
- ReMixMatch [16]: introduces distribution alignment and augmentation anchoring to improve MixMatch.
- FixMatch [19]: simplifies its predecessors by introducing a confidence threshold into the unsupervised objective, encouraging consistent predictions between weakly and strongly augmented views of the same unlabeled sample (see the sketch after this list).
- SimPLE [20]: introduces a similarity threshold and focuses on the similarity among unlabeled samples.
- FlexMatch [30]: proposes Curriculum Pseudo Labeling, a curriculum learning approach that utilizes unlabeled samples according to the model's learning status.
- DoubleMatch [37]: combines the pseudo-labeling technique with a self-supervised loss, enabling the model to utilize all unlabeled data in the training process.
- NP-Match [38]: adjusts neural processes (NPs) to semi-supervised learning and proposes an uncertainty-guided skew-geometric JS divergence to replace the original KL divergence in NPs.
- Bad GAN [39]: a generative-model-based SSL method built on the assumption that good semi-supervised learning requires a bad generator.
- Triple-GAN [40]: also a generative-model-based SSL method, which is formulated as a three-player minimax game consisting of a generator, a classifier, and a discriminator.
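For readers unfamiliar with two mechanisms that recur throughout this list, the sketch below illustrates Mix-up interpolation (as used by the MixMatch family) and a FixMatch-style confidence-thresholded pseudo-label loss. Both functions are simplified illustrations under standard PyTorch assumptions, not the referenced implementations; alpha = 0.75 is the Beta parameter commonly used in this family, and the remaining names are placeholders.

```python
import torch
import torch.nn.functional as F

def mixup(x1, y1, x2, y2, alpha=0.75):
    """Mix-up interpolation: convex combination of two samples and their (soft) labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    lam = torch.max(lam, 1.0 - lam)          # keep the mixed sample closer to x1
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

def fixmatch_unsup_loss(weak_logits, strong_logits, threshold=0.95):
    """FixMatch-style unsupervised term: pseudo-label from the weak view,
    cross-entropy on the strong view, masked by a confidence threshold."""
    probs = weak_logits.softmax(dim=1).detach()
    conf, pseudo = probs.max(dim=1)
    mask = (conf >= threshold).float()
    loss = F.cross_entropy(strong_logits, pseudo, reduction="none")
    return (mask * loss).mean()
```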
4.3. Implementation Details
4.4. Results
4.5. Ablation Study
Algorithm 1: ProMatch training procedure (outline).
1. Generate the semantic-prototypes and store them by label k.
2. Generate the prediction-prototypes and store them by label k.
3. Apply weak data augmentation to the batch.
4. Apply strong data augmentation to the batch.
5. Compute the semantic features across the augmented views using EMA.
6. Compute the average predictions across the augmented views using EMA.
7. Apply temperature sharpening to the average prediction.
8. Compute the predictions across the augmented views using EMA.
9. Compute the similarity between the semantic features and the semantic-prototypes.
10. Compute the similarity between the predictions and the prediction-prototypes.
11. Obtain the most similar semantic-prototype.
12. Obtain the most similar prediction-prototype.
13. Combine the individual loss terms into the total loss.

A simplified code sketch of these steps is given below.
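To make the flow of Algorithm 1 easier to follow, here is a heavily simplified single-batch sketch in PyTorch. It covers only the prediction-prototype branch (the semantic-prototype branch and the Mix-up/augmentation machinery of the MixMatch family are omitted), and every name in it (`sharpen`, `promatch_step`, `lambda_pc`) is an illustrative assumption rather than the authors' code; the MSE consistency term is one plausible instantiation, while T = 0.5 and the 0.95 confidence threshold mirror the hyperparameter values reported in the experiments.

```python
import torch
import torch.nn.functional as F

def sharpen(p, T=0.5):
    """Temperature sharpening of a probability distribution (lower T -> more peaked)."""
    p = p ** (1.0 / T)
    return p / p.sum(dim=1, keepdim=True)

def promatch_step(model, x_l, y_l, x_u_weak, x_u_strong,
                  pred_protos, tau=0.95, lambda_pc=1.0):
    """One heavily simplified training step in the spirit of Algorithm 1.

    pred_protos: (C, C) prediction-prototypes built from labeled data
                 (e.g., EMA-averaged class-wise predictions).
    """
    # Supervised cross-entropy on labeled data.
    loss_sup = F.cross_entropy(model(x_l), y_l)

    # Average prediction on the weakly augmented unlabeled view, then sharpen.
    target = sharpen(model(x_u_weak).softmax(dim=1)).detach()

    # Confidence mask, as in threshold-based pseudo-labeling.
    conf, _ = target.max(dim=1)
    mask = (conf >= tau).float()

    # Pseudo-label each unlabeled sample with its most similar prediction-prototype.
    sim = F.normalize(target, dim=1) @ F.normalize(pred_protos, dim=1).t()
    proto_targets = pred_protos[sim.argmax(dim=1)].detach()

    # Prototype-consistency term on the strongly augmented view.
    probs_strong = model(x_u_strong).softmax(dim=1)
    per_sample = F.mse_loss(probs_strong, proto_targets, reduction="none").mean(dim=1)
    loss_pc = (mask * per_sample).mean()

    return loss_sup + lambda_pc * loss_pc
```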
- The first row and the 7th–9th rows of Table 5 show the accuracy of ProMatch under different values of the confidence threshold; these results are also plotted in Figure 7a. We observe that the accuracy is positively correlated with the confidence threshold. Specifically, ProMatch achieves its best accuracy of 72.71% at a confidence threshold of 0.95.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
2. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
3. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
4. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
5. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
6. Su, X.; Huang, T.; Li, Y.; You, S.; Wang, F.; Qian, C.; Zhang, C.; Xu, C. Prioritized architecture sampling with monto-carlo tree search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10968–10977.
7. Tang, K.; Ma, Y.; Miao, D.; Song, P.; Gu, Z.; Tian, Z.; Wang, W. Decision fusion networks for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–14.
8. Zhu, P.; Hong, J.; Li, X.; Tang, K.; Wang, Z. SGMA: A novel adversarial attack approach with improved transferability. Complex Intell. Syst. 2023, 1–13.
9. Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; Raffel, C.A. MixMatch: A holistic approach to semi-supervised learning. Adv. Neural Inf. Process. Syst. 2019, 32, 5049–5059.
10. Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv. Neural Inf. Process. Syst. 2017, 30, 1195–1204.
11. Lee, D.H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proceedings of the Workshop on Challenges in Representation Learning, ICML, Atlanta, GA, USA, 20–21 June 2013; Volume 3, p. 896.
12. Chapelle, O.; Zien, A. Semi-supervised classification by low density separation. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, PMLR, Bridgetown, Barbados, 6–8 January 2005; pp. 57–64.
13. Verma, V.; Kawaguchi, K.; Lamb, A.; Kannala, J.; Solin, A.; Bengio, Y.; Lopez-Paz, D. Interpolation consistency training for semi-supervised learning. Neural Netw. 2022, 145, 90–106.
14. Tang, K.; Shi, Y.; Lou, T.; Peng, W.; He, X.; Zhu, P.; Gu, Z.; Tian, Z. Rethinking perturbation directions for imperceptible adversarial attacks on point clouds. IEEE Internet Things J. 2022, 10, 5158–5169.
15. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
16. Berthelot, D.; Carlini, N.; Cubuk, E.D.; Kurakin, A.; Sohn, K.; Zhang, H.; Raffel, C. ReMixMatch: Semi-supervised learning with distribution alignment and augmentation anchoring. arXiv 2019, arXiv:1911.09785.
17. Kim, D.J.; Choi, J.; Oh, T.H.; Yoon, Y.; Kweon, I.S. Disjoint multi-task learning between heterogeneous human-centric tasks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1699–1708.
18. Kuo, C.W.; Ma, C.Y.; Huang, J.B.; Kira, Z. FeatMatch: Feature-based augmentation for semi-supervised learning. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVIII; Springer: Berlin/Heidelberg, Germany, 2020; pp. 479–495.
19. Sohn, K.; Berthelot, D.; Carlini, N.; Zhang, Z.; Zhang, H.; Raffel, C.A.; Cubuk, E.D.; Kurakin, A.; Li, C.L. FixMatch: Simplifying semi-supervised learning with consistency and confidence. Adv. Neural Inf. Process. Syst. 2020, 33, 596–608.
20. Hu, Z.; Yang, Z.; Hu, X.; Nevatia, R. SimPLE: Similar pseudo label exploitation for semi-supervised classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 15099–15108.
21. Zheng, M.; You, S.; Huang, L.; Wang, F.; Qian, C.; Xu, C. SimMatch: Semi-supervised learning with similarity matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 14471–14481.
22. Sajjadi, M.; Javanmardi, M.; Tasdizen, T. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. Adv. Neural Inf. Process. Syst. 2016, 29, 1171–1179.
23. Laine, S.; Aila, T. Temporal ensembling for semi-supervised learning. arXiv 2016, arXiv:1610.02242.
24. Miyato, T.; Maeda, S.; Koyama, M.; Ishii, S. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1979–1993.
25. Shi, W.; Gong, Y.; Ding, C.; Ma, Z.; Tao, X.; Zheng, N. Transductive semi-supervised deep learning using min-max features. In Proceedings of the Computer Vision–ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 311–327.
26. Xie, Q.; Luong, M.T.; Hovy, E.; Le, Q.V. Self-training with noisy student improves ImageNet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10687–10698.
27. Wang, G.H.; Wu, J. Repetitive reprediction deep decipher for semi-supervised learning. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
28. Rizve, M.N.; Duarte, K.; Rawat, Y.S.; Shah, M. In defense of pseudo-labeling: An uncertainty-aware pseudo-label selection framework for semi-supervised learning. arXiv 2021, arXiv:2101.06329.
29. Xie, Q.; Dai, Z.; Hovy, E.; Luong, T.; Le, Q. Unsupervised data augmentation for consistency training. Adv. Neural Inf. Process. Syst. 2020, 33, 6256–6268.
30. Zhang, B.; Wang, Y.; Hou, W.; Wu, H.; Wang, J.; Okumura, M.; Shinozaki, T. FlexMatch: Boosting semi-supervised learning with curriculum pseudo labeling. Adv. Neural Inf. Process. Syst. 2021, 34, 18408–18419.
31. Xu, Y.; Shang, L.; Ye, J.; Qian, Q.; Li, Y.F.; Sun, B.; Li, H.; Jin, R. Dash: Semi-supervised learning with dynamic thresholding. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 11525–11536.
32. Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 702–703.
33. Bhattacharyya, A. On a measure of divergence between two multinomial populations. Sankhyā Indian J. Stat. 1946, 401–406.
34. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report TR-2009; University of Toronto: Toronto, ON, Canada, 2009; Volume 5, p. 6.
35. Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading digits in natural images with unsupervised feature learning. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain, 12–17 December 2011.
36. Arazo, E.; Ortego, D.; Albert, P.; O’Connor, N.E.; McGuinness, K. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8.
37. Wallin, E.; Svensson, L.; Kahl, F.; Hammarstrand, L. DoubleMatch: Improving semi-supervised learning with self-supervision. In Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; pp. 2871–2877.
38. Wang, J.; Lukasiewicz, T.; Massiceti, D.; Hu, X.; Pavlovic, V.; Neophytou, A. NP-Match: When neural processes meet semi-supervised learning. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 22919–22934.
39. Dai, Z.; Yang, Z.; Yang, F.; Cohen, W.W.; Salakhutdinov, R.R. Good semi-supervised learning that requires a bad GAN. Adv. Neural Inf. Process. Syst. 2017, 30, 6510–6520.
40. Li, C.; Xu, T.; Zhu, J.; Zhang, B. Triple generative adversarial nets. Adv. Neural Inf. Process. Syst. 2017, 30, 4088–4098.
41. Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146.
42. Bottou, L. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 421–436.
43. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
Hyperparameter settings for each benchmark (row labels for some hyperparameters are symbols that do not render here):

| | CIFAR-10 | SVHN | CIFAR-100 | CIFAR-100 | Mini-ImageNet |
|---|---|---|---|---|---|
| 0.95 | | | | | |
| 1 | | | | | |
| 1 | | | | | |
| 4 | | | | | |
| 0.003 | 0.02 | | | | |
| K | 7 | 4 | 2 | 7 | |
| T | 0.5 | | | | |
| 0.75 | | | | | |
| weight decay | 0.0005 | 0.001 | 0.04 | 0.02 | |
| batch size | 64 | 16 | | | |
| EMA decay | 0.999 | | | | |
| backbone network | WRN 28-2 | WRN 28-8 | WRN 28-2 | ResNet 18 | |
| optimizer | SGD | AdamW | | | |
| momentum | 0.9 | \ | | | |
| lr scheduler | cosine decay | \ | | | |
| lr decay rate | \ | | | | |
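The optimization settings above (SGD with momentum 0.9, cosine learning-rate decay, per-dataset weight decay, and an EMA decay of 0.999) can be wired up in a few lines of PyTorch. The sketch below is illustrative only: the backbone is a stand-in for the Wide-ResNet/ResNet models named in the table, and the learning rate and training length are assumed values, not taken from the paper.

```python
import torch

# Stand-in backbone; substitute the WRN 28-2 / WRN 28-8 / ResNet-18 from the table.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))

# SGD with momentum 0.9 and weight decay 0.0005 (CIFAR-10 column); lr is illustrative.
optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, weight_decay=5e-4, nesterov=True)

# Cosine learning-rate decay over an assumed number of training steps.
total_steps = 500_000
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

@torch.no_grad()
def ema(old, new, decay=0.999):
    """Generic exponential-moving-average update (decay 0.999 from the table),
    usable for averaged predictions, features, or prototypes."""
    return decay * old + (1.0 - decay) * new
```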
Accuracy on CIFAR-100 with 10,000 labels:

| Method | 10,000 Labels | Backbone |
|---|---|---|
| MixMatch [9] | 64.01% | WRN 28-2 |
| MixMatch Enhanced | 67.12% | WRN 28-2 |
| SimPLE [20] | 70.82% | WRN 28-2 |
| ProMatch | 72.71% | WRN 28-2 |
| MixMatch [9] | 71.69% | WRN 28-8 |
| ReMixMatch [16] | 76.97% | WRN 28-8 |
| FixMatch [19] | 77.40% | WRN 28-8 |
| SimPLE [20] | 78.11% | WRN 28-8 |
| FlexMatch [30] | 78.10% | WRN 28-8 |
| DoubleMatch [37] | 78.78% | WRN 28-8 |
| NP-Match [38] | 78.78% | WRN 28-8 |
| ProMatch | 78.85% | WRN 28-8 |
Accuracy on CIFAR-10 and SVHN:

| Method | CIFAR-10 (1000 Labels) | CIFAR-10 (4000 Labels) | SVHN (1000 Labels) | SVHN (4000 Labels) |
|---|---|---|---|---|
| VAT [24] | 81.36% | 88.95% | 94.02% | 95.80% |
| MeanTeacher [10] | 82.68% | 89.64% | 96.25% | 96.61% |
| Bad GAN [39] | 79.37% | 85.59% | 95.75% | 96.03% |
| MixMatch [9] | 92.25% | 93.76% | 96.73% | 97.11% |
| ReMixMatch [16] | 94.27% | 94.86% | 97.17% | 97.58% |
| Triple-GAN-V2 [40] | 85.00% | 89.99% | 96.55% | 96.92% |
| FixMatch [19] | - | 95.69% | 97.64% | - |
| SimPLE [20] | 94.84% | 94.95% | 97.54% | 97.31% |
| FlexMatch [30] | - | 95.81% | 93.28% | - |
| DoubleMatch [37] | - | 95.35% | 97.90% | - |
| ProMatch | 95.01% | 95.83% | 97.79% | 97.88% |

Fully supervised training reaches 96.23% on CIFAR-10 and 98.17% on SVHN.
Results on Mini-ImageNet with 4000 labels:

| Method | Accuracy (4000 Labels) | Error Rate of Pseudo-Labels |
|---|---|---|
| Mean Teacher [10] | 27.49% | 38.87% |
| PLCB [36] | 43.51% | 22.63% |
| MixMatch [9] | 48.46% | 18.32% |
| FixMatch [19] | 50.21% | 14.77% |
| SimPLE [20] | 49.39% | 15.35% |
| ProMatch | 50.83% | 13.21% |
Table 5. Ablation study on CIFAR-100 (the two unlabeled columns correspond to hyperparameter symbols that do not render here):

| Ablation | Augmentation Type | | Confidence Threshold | K | | Accuracy |
|---|---|---|---|---|---|---|
| ProMatch | RandAugment | 1 | 0.95 | 2 | 4 | 72.71% |
| ProMatch | RandAugment | 1 | 0.95 | 7 | 4 | 73.38% |
| w/o | RandAugment | 0 | 0.95 | 2 | 4 | 69.07% |
| w/o | RandAugment | 0 | 0.95 | 7 | 4 | 69.94% |
| w/o RandAugment | fixed | 1 | 0.95 | 2 | 4 | 67.66% |
| w/o RandAugment, w/o | fixed | 0 | 0.95 | 2 | 4 | 67.41% |
| = 0 | RandAugment | 1 | 0 | 2 | 4 | 71.54% |
| = 0.5 | RandAugment | 1 | 0.5 | 2 | 4 | 71.53% |
| = 0.75 | RandAugment | 1 | 0.75 | 2 | 4 | 71.89% |
| = 0.5 | RandAugment | 0.5 | 0.95 | 2 | 4 | 72.55% |
| = 2 | RandAugment | 2 | 0.95 | 2 | 4 | 71.81% |
| = 3 | RandAugment | 3 | 0.95 | 2 | 4 | 72.42% |
| = 4 | RandAugment | 4 | 0.95 | 2 | 4 | 72.76% |
| = 3 | RandAugment | 1 | 0.95 | 2 | 3 | 71.83% |
| = 5 | RandAugment | 1 | 0.95 | 2 | 5 | 72.01% |
| = 10 | RandAugment | 1 | 0.95 | 2 | 10 | 71.82% |
| = 20 | RandAugment | 1 | 0.95 | 2 | 20 | 72.13% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cheng, Z.; Wang, X.; Li, J. ProMatch: Semi-Supervised Learning with Prototype Consistency. Mathematics 2023, 11, 3537. https://doi.org/10.3390/math11163537