FGCM: Noisy Label Learning via Fine-Grained Confidence Modeling
Abstract
1. Introduction
- We propose fine-grained training-sample categories based on how difficult samples are to learn and whether their labels are correct (an illustrative sketch of such a categorization follows this list).
- We propose a simple yet effective framework, FGCM, for noisy label learning, which maximizes the utilization of the training data while preventing models from over-fitting to noisy labels.
- Extensive experiments show the effectiveness and robustness of FGCM under different ratios and types of noise, outperforming previous methods.
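To make the fine-grained categorization concrete, the following is a minimal sketch (our illustration, not the authors' released code) that clusters training samples with a Gaussian Mixture Model over simple per-sample statistics. The feature choice (per-sample loss and prediction/label agreement) and the default of five clusters are assumptions loosely motivated by the ablation studies in Section 4.3; the function name `categorize_samples` is hypothetical.

```python
# Minimal sketch (our illustration, not the authors' released code) of
# fine-grained sample categorization: fit a Gaussian Mixture Model to simple
# per-sample training statistics and read off the cluster assignments.
# The feature choice and the default of five clusters are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def categorize_samples(per_sample_loss, agreement, n_clusters=5, seed=0):
    """Cluster training samples into fine-grained confidence groups.

    per_sample_loss: (N,) cross-entropy loss of each sample after warm-up.
    agreement:       (N,) 1.0 if the model's prediction matches the given label,
                     else 0.0 (a crude proxy for label correctness).
    Returns an (N,) array of cluster ids; cluster semantics (e.g. clean & easy
    vs. noisy & hard) would be assigned afterwards, e.g. by ranking clusters
    by their mean loss.
    """
    features = np.stack([np.asarray(per_sample_loss), np.asarray(agreement)], axis=1)
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(features)
    return gmm.predict(features)
```

On CIFAR-10, for example, `per_sample_loss` and `agreement` would each be length-50,000 arrays collected at the end of the warm-up epochs, and the resulting clusters would be interpreted by inspecting their loss statistics.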
2. Related Work
3. Proposed Method
3.1. Generalization of Sample Clusters
Algorithm 1: Generalization of Sample Clusters and Cluster Refinements
3.2. Cluster Refinement
3.3. Mixed Semi-Training
4. Experiments
4.1. Experiment Settings
4.1.1. Datasets
4.1.2. Implementation Details
4.2. Experimental Results and Analysis
Compared Methods
- DivideMix. DivideMix trains two networks simultaneously and iteratively divides samples into a clean set and a noisy set via a two-component mixture model fitted to per-sample losses. Semi-supervised learning is then performed, with the clean set treated as labeled data and the noisy set as unlabeled data.
- ELR. Using semi-supervised learning techniques, ELR first estimates target probabilities based on the model's outputs in the early training stage. It then hinders memorization of noisy labels through a regularization term that maximizes the inner product between these targets and the model outputs (a minimal sketch of this regularizer follows this list).
- MOIT+. MOIT+ first pretrains the networks with supervised contrastive learning and divides samples based on the learned features. A classifier is then trained in a semi-supervised manner on the divided data.
- Sel-CL+. The training of Sel-CL+ is warmed up with unsupervised contrastive learning. Using the obtained low-dimensional representations, Sel-CL+ selects confident sample pairs and trains the model with a supervised contrastive learning technique.
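For reference, the ELR regularizer summarized above can be written in a few lines. The sketch below is our paraphrase under standard assumptions (a softmax classifier, per-sample dataset indices available in each batch, and all tensors on the same device); the class name `ELRRegularizer` and the defaults `momentum=0.7` and `lam=3.0` are illustrative rather than the tuned values reported by the authors.

```python
# Minimal sketch (our paraphrase, not the original ELR implementation):
# early-learning regularization keeps a running average t of past softmax
# predictions and penalizes the current prediction p when <t, p> is small,
# which discourages late-stage memorization of noisy labels.
import torch
import torch.nn.functional as F

class ELRRegularizer:
    def __init__(self, num_samples, num_classes, momentum=0.7, lam=3.0):
        # momentum and lam are illustrative defaults, not the paper's tuned values
        self.targets = torch.zeros(num_samples, num_classes)  # temporal ensemble t
        self.momentum = momentum
        self.lam = lam

    def __call__(self, logits, indices):
        # logits: (B, C) model outputs; indices: (B,) dataset indices of the batch
        probs = F.softmax(logits, dim=1).clamp(1e-4, 1.0 - 1e-4)
        probs = probs / probs.sum(dim=1, keepdim=True)
        with torch.no_grad():  # update the running targets for this batch
            self.targets[indices] = (
                self.momentum * self.targets[indices]
                + (1.0 - self.momentum) * probs.detach()
            )
        inner = (self.targets[indices] * probs).sum(dim=1)  # <t, p> per sample
        # log(1 - <t, p>) -> -inf as <t, p> -> 1, so minimizing it maximizes <t, p>
        return self.lam * torch.log(1.0 - inner).mean()
```

The returned penalty is added to the usual cross-entropy loss; since log(1 - ⟨t, p⟩) decreases toward negative infinity as ⟨t, p⟩ approaches 1, minimizing it keeps the model's predictions aligned with its own early-training targets rather than with possibly noisy labels.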
Method / Noise ratio | CIFAR-10 Sym 20% | CIFAR-10 Sym 50% | CIFAR-10 Sym 80% | CIFAR-100 Sym 20% | CIFAR-100 Sym 50% | CIFAR-100 Sym 80%
---|---|---|---|---|---|---
Cross-Entropy | 86.8 | 79.4 | 62.9 | 61.8 | 37.3 | 8.8 |
Fcorr [20] | 86.8 | 79.8 | 63.3 | 61.5 | 46.6 | 19.9 |
Co-teaching+ [18] | 89.5 | 85.7 | 67.4 | 65.6 | 51.8 | 27.9 |
Pcorr [38] | 92.4 | 89.1 | 77.5 | 69.4 | 57.5 | 31.1 |
FINE [39] | 91.0 | 87.3 | 69.4 | 70.3 | 64.2 | 25.6 |
Meta-Learning [40] | 92.9 | 89.3 | 77.4 | 68.5 | 59.2 | 42.4 |
Mcorr [8] | 94.0 | 92.0 | 86.8 | 73.9 | 66.1 | 48.2 |
DivideMix [19] | 95.2 | 94.2 | 93.0 | 75.2 | 72.8 | 58.3 |
ELR [15] | 93.8 | 92.6 | 88.0 | 74.5 | 70.2 | 45.2 |
MSLC [41] | 93.5 | 90.5 | 69.9 | 72.5 | 68.9 | 24.3 |
MOIT+ [28] | 94.1 | 91.8 | 81.1 | 75.9 | 70.6 | 47.6 |
Sel-CL+ [12] | 95.5 | 93.9 | 89.2 | 76.5 | 72.4 | 59.6 |
FGCM | 95.6 | 94.9 | 94.1 | 77.1 | 74.9 | 61.1 |
Method / Noise ratio | CIFAR-10 Asym 10% | CIFAR-10 Asym 20% | CIFAR-10 Asym 30% | CIFAR-10 Asym 40%
---|---|---|---|---
Cross-Entropy | 88.8 | 86.1 | 81.7 | 76.0 |
GCE [24] | 89.5 | 85.6 | 80.6 | 76.0 |
Pcorr [38] | 93.1 | 92.9 | 92.6 | 91.6 |
Mcorr [8] | 89.6 | 91.8 | 92.2 | 91.2 |
DivideMix [19] | 93.8 | 93.2 | 92.5 | 91.4 |
ELR [15] | 94.4 | 93.3 | 91.5 | 85.3 |
MOIT+ [28] | 94.2 | 94.3 | 94.3 | 93.3 |
Sel-CL+ [12] | 95.6 | 95.2 | 94.5 | 93.4 |
FGCM | 95.6 | 95.4 | 94.7 | 93.8 |
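For readers unfamiliar with the noise protocols named in the tables above, the following sketch illustrates how symmetric (Sym) and asymmetric (Asym) label noise are commonly injected into CIFAR-10. It reflects common practice in the literature (the asymmetric class mapping follows the convention popularized by Patrini et al. [20]) and is not necessarily the exact corruption code used in this paper.

```python
# Illustrative sketch of the standard synthetic corruption protocols for
# CIFAR-10 (common practice in the literature, not necessarily the authors'
# exact corruption code).
import numpy as np

# CIFAR-10 class indices: 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer,
# 5 dog, 6 frog, 7 horse, 8 ship, 9 truck.
# A commonly used asymmetric mapping (cf. Patrini et al. [20]):
# truck -> automobile, bird -> airplane, deer -> horse, cat <-> dog.
ASYM_MAP = {9: 1, 2: 0, 4: 7, 3: 5, 5: 3}

def inject_symmetric_noise(labels, noise_ratio, num_classes=10, seed=0):
    """Replace a fraction of labels with labels drawn uniformly over all
    classes (one common convention; some works exclude the true class)."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    n_flip = int(noise_ratio * len(noisy))
    chosen = rng.choice(len(noisy), size=n_flip, replace=False)
    noisy[chosen] = rng.integers(0, num_classes, size=n_flip)
    return noisy

def inject_asymmetric_noise(labels, noise_ratio, seed=0):
    """Flip a fraction of each source class to its semantically similar class."""
    rng = np.random.default_rng(seed)
    original = np.asarray(labels)
    noisy = original.copy()
    for src, dst in ASYM_MAP.items():
        idx = np.where(original == src)[0]  # select sources from the clean labels
        chosen = rng.choice(idx, size=int(noise_ratio * len(idx)), replace=False)
        noisy[chosen] = dst
    return noisy
```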
4.3. Ablations
4.3.1. Ablation Study on Warm-Up Epoch
4.3.2. Ablation Study on Cluster Number
4.3.3. Ablation Study on Clustering Methods
4.4. Training Time Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Arpit, D.; Jastrzebski, S.; Ballas, N.; Krueger, D.; Bengio, E.; Kanwal, M.S. A closer look at memorization in deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 7–9 August 2017.
- Toneva, M.; Sordoni, A.; des Combes, R.T.; Trischler, A.; Bengio, Y.; Gordon, G. An empirical study of example forgetting during deep neural network learning. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Zhang, C.; Recht, B.; Bengio, S.; Hardt, M.; Vinyals, O. Understanding deep learning requires rethinking generalization. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26–30 June 2016.
- Tanaka, D.; Ikami, D.; Yamasaki, T.; Aizawa, K. Joint optimization framework for learning with noisy labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
- Liu, T.; Tao, D. Classification with Noisy Labels by Importance Reweighting. IEEE Trans. PAMI 2016, 38, 447–461.
- Arazo, E.; Ortego, D.; Albert, P.; O’Connor, N.E.; McGuinness, K. Unsupervised label noise modeling and loss correction. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
- Zhang, Z.; Zhang, H.; Arik, S.O.; Lee, H.; Pfister, T. Distilling effective supervision from severe label noise. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020.
- Gu, N.; Fan, M.; Meng, D. Robust Semi-Supervised Classification for Noisy Labels Based on Self-Paced Learning. IEEE SPL 2016, 23, 1806–1810.
- Yao, J.; Wang, J.; Tsang, I.; Zhang, Y.; Sun, J.; Zhang, C.; Zhang, R. Deep Learning from Noisy Image Labels with Quality Embedding. IEEE Trans. IP 2019, 28, 1909–1922.
- Li, S.; Xia, X.; Ge, S.; Liu, T. Selective-Supervised Contrastive Learning with Noisy Labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022.
- Yi, R.; Huang, Y. TC-Net: Detecting Noisy Labels via Transform Consistency. IEEE Trans. Multimed. 2021, 24, 4328–4341.
- Bernhardt, M.; Castro, D.C.; Tanno, R.; Schwaighofer, A.; Tezcan, K.C.; Monteiro, M.; Bannur, S.; Lungren, M.P.; Nori, A.; Glocker, B.; et al. Active label cleaning for improved dataset quality under resource constraints. Nat. Commun. 2022, 13, 1161.
- Liu, S.; Niles-Weed, J.; Razavian, N.; Fernandez-Granda, C. Early-learning regularization prevents memorization of noisy labels. In Proceedings of the Conference on Neural Information Processing Systems, Virtual, 6–12 December 2020.
- Jiang, L.; Zhou, Z.; Leung, T.; Li, L.; Li, F.-F. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
- Han, B.; Yao, Q.; Yu, X.; Niu, G.; Xu, M.; Hu, W. Co-teaching: Robust training of deep neural networks with extremely noisy labels. In Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada, 2–8 December 2018.
- Yu, X.; Han, B.; Yao, J.; Niu, G.; Tsang, I.W.; Sugiyama, M. How does disagreement help generalization against label corruption? In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
- Li, J.; Socher, R.; Hoi, S. Dividemix: Learning with noisy labels as semi-supervised learning. In Proceedings of the International Conference on Learning Representations, Virtual, 26–30 April 2020.
- Patrini, G.; Rozza, A.; Menon, A.K.; Nock, R.; Qu, L. Making deep neural networks robust to label noise: A loss correction approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017.
- Chen, X.; Gupta, A. Webly supervised learning of convolutional networks. In Proceedings of the International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015.
- Xiao, T.; Xia, T.; Yang, Y.; Huang, C.; Wang, X. Learning from massive noisy labeled data for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015.
- Ghosh, A.; Kumar, H.; Sastry, P. Robust loss functions under label noise for deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. In Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada, 2–8 December 2018.
- Wang, Y.; Ma, X.; Chen, Z.; Luo, Y.; Yi, J.; Bailey, J. Symmetric cross entropy for robust learning with noisy labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–31 October 2019.
- Chen, P.; Liao, B.B.; Chen, G.; Zhang, S. Understanding and utilizing deep neural networks trained with noisy labels. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
- Nguyen, D.T.; Mummadi, C.K.; Ngo, T.P.N.; Nguyen, T.H.P.; Beggel, L.; Brox, T. SELF: Learning to filter noisy labels with self-ensembling. In Proceedings of the International Conference on Learning Representations, Virtual, 26–30 April 2020.
- Ortego, D.; Arazo, E.; Albert, P.; O’Connor, N.E. Multi-objective interpolation training for robustness to label noise. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Virtual, 20–25 June 2021.
- Permuter, H.; Francos, J.; Jermyn, I. A study of Gaussian mixture models of color and texture features for image classification and segmentation. Pattern Recognit. 2006, 39, 695–706.
- Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. R. Stat. Soc. 1977, 39, 1–38.
- Sohn, K.; Berthelot, D.; Carlini, N.; Zhang, Z.; Zhang, H.; Raffel, C. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. In Proceedings of the Conference on Neural Information Processing Systems, Virtual, 6–12 December 2020.
- Krizhevsky, A.; Nair, V.; Hinton, G. CIFAR-10 and CIFAR-100 Datasets; Canadian Institute for Advanced Research: Toronto, ON, Canada, 2021.
- Li, W.; Wang, L.; Li, W.; Agustsson, E.; Gool, L.V. Webvision database: Visual learning and understanding from web data. arXiv 2017, arXiv:1708.02862.
- Song, H.; Kim, M.; Park, D.; Lee, J. Prestopping: How does early stopping help generalization against label noise? In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26–30 June 2016.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Yi, K.; Wu, J. Probabilistic end-to-end noise correction for learning with noisy labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Kim, T.; Ko, J.; Choi, J.; Yun, S.Y. Fine samples for learning with noisy labels. In Proceedings of the Conference on Neural Information Processing Systems, Virtual, 7–10 December 2021.
- Li, J.; Wong, Y.; Zhao, Q.; Kankanhalli, M.S. Learning to learn from noisy labeled data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Wu, Y.; Shu, J.; Xie, Q.; Zhao, Q.; Meng, D. Learn To Purify Noisy Labels via Meta Soft Label Corrector. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021.
- Zhang, W.; Wang, Y.; Qiao, Y. Metacleaner: Learning to hallucinate clean representations for noisy-labeled visual recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Malach, E.; Shalev-Shwartz, S. Decoupling “when to update” from “how to update”. In Proceedings of the Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–7 December 2017.
- Ma, X.; Wang, Y.; Houle, M.; Zhou, S.; Erfani, S.; Xia, S.; Wijewickrema, S.; Bailey, J. Dimensionality-driven learning with noisy labels. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
- Li, J.; Xiong, C.; Hoi, S. Learning from noisy data with robust representation learning. In Proceedings of the International Conference on Computer Vision, Virtual, 11–17 October 2021.
- Yang, M.; Huang, Z.; Hu, P.; Li, T.; Lv, J.; Peng, X. Learning with Twin Noisy Labels for Visible-Infrared Person Re-Identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022.
- Wang, Y.; Baldwin, T.; Verspoor, K. Noisy Label Regularisation for Textual Regression. In Proceedings of the International Conference on Computational Linguistics, Gyeongju, Korea, 10–12 October 2022.
Category | Methods |
---|---|
Noise Transition Matrix | Fcorr [20], Webly learning [21], Probabilistic noise modeling [22]
Robust Loss Function | Robust MAE [23], Generalized Cross Entropy [24], Symmetric Cross Entropy [25] |
Sample Selection | MentorNet [16], Co-teaching [17], Co-teaching+ [18], Iterative-CV [26] |
Hybrid Methods | SELF [27], Mcorr [8], DivideMix [19], MOIT+ [28], Sel-CL+ [12] |
Method | TCNet [13] | Meta-Cleaner [42] | ELR [15] | FINE [39] | MSLC [41] | DivideMix+ [19] | ELR+ [15] | FGCM
---|---|---|---|---|---|---|---|---
Accuracy (%) | 71.15 | 72.50 | 72.87 | 72.91 | 74.02 | 74.76 | 74.81 | 74.91
Method | WebVision Top-1 | WebVision Top-5 | ILSVRC12 Top-1 | ILSVRC12 Top-5
---|---|---|---|---
Fcorr [20] | 61.15 | 82.68 | 57.36 | 82.36 |
Decoupling [43] | 62.54 | 84.74 | 58.26 | 82.26 |
D2L [44] | 62.68 | 84.00 | 57.80 | 81.36 |
MentorNet [16] | 63.00 | 81.40 | 57.80 | 79.92 |
Co-teaching [17] | 63.58 | 85.20 | 61.48 | 84.70 |
Iterative-CV [26] | 65.24 | 85.34 | 61.60 | 84.98 |
DivideMix+ [19] | 77.32 | 91.64 | 75.20 | 90.84 |
ELR [15] | 76.26 | 91.26 | 68.71 | 87.84 |
ELR+ [15] | 77.78 | 91.68 | 70.29 | 89.76 |
ProtoMix [45] | 76.3 | 91.5 | 73.3 | 91.2 |
FGCM | 77.84 | 91.76 | 74.56 | 90.24 |
Method / Noise ratio | CIFAR-10 Sym 20% | CIFAR-10 Sym 80%
---|---|---
FGCM (CS + NS + CH + NHI + NHR + cluster refinement) | 95.6 | 94.1 |
FGCM (CS + NS + CH + NHI + NHR) | 95.2 | 93.6
FGCM (CS + CH + NHI + NHR) | 94.5 | 93.0 |
FGCM (CS + NS + CH) | 93.8 | 76.9 |
FGCM w/o fine-grained sample categorization | 88.25 | 50.0 |
Cross Entropy | 86.8 | 62.9 |
Method (clustering, cluster number) | Sym 20% | Sym 50% | Sym 80% | Asym 40%
---|---|---|---|---|
ANNO-GMM-2 | 99.70 | 97.94 | 96.73 | 99.79 |
FGCM-GMM-3 | 97.67 | 97.20 | 96.19 | 95.94 |
FGCM-GMM-4 | 99.10 | 97.42 | 95.40 | 95.40 |
FGCM-KMEANS-5 | 99.75 | 98.33 | 95.93 | 95.13 |
FGCM-GMM-5 | 99.75 | 99.02 | 97.23 | 97.04 |
Method (clustering, cluster number) | Sym 20% | Sym 50% | Sym 80% | Asym 40%
---|---|---|---|---|
ANNO-GMM-2 | 77.05 | 52.09 | 16.44 | 41.39 |
FGCM-GMM-3 | 79.68 | 73.28 | 35.84 | 58.72 |
FGCM-GMM-4 | 83.80 | 71.04 | 32.57 | 63.20 |
FGCM-KMEANS-5 | 77.56 | 66.40 | 33.78 | 60.83 |
FGCM-GMM-5 | 79.35 | 71.42 | 29.57 | 60.59 |
Method | Co-Teaching+ | Pcorr | Meta-Learning | DivideMix | Sel-CL+ | FGCM
---|---|---|---|---|---|---
Training time | 4.3 h | 6.0 h | 8.6 h | 5.2 h | 7.2 h | 5.4 h
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).