Variational Disentangle Zero-Shot Learning
Abstract
1. Introduction
- We identify the non-injective problem that results from a lack of instance-level attributes for ZSL classification tasks.
- We introduce a novel VDZSL method that leverages variational inference to disentangle instance-specific attributes from shared class-specific information.
- Extensive experiments are conducted on three benchmark datasets, and our model generally outperforms other state-of-the-art methods.
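The second contribution relies on variational inference. As a rough illustration only, the sketch below shows two generic building blocks of a Gaussian variational encoder: reparameterized sampling and the KL term against a standard normal prior. The split of the latent code into a class-specific part and an instance-specific part, the dimensions, the numeric values, and all names (`reparameterize`, `kl_to_standard_normal`, `class_mu`, `inst_mu`) are assumptions made for illustration; this is not the VDZSL architecture itself.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one image. The latent code is split into a
# class-specific part and an instance-specific part (an assumed layout).
class_mu, class_log_var = [0.8, -0.3], [-1.0, -1.2]
inst_mu, inst_log_var = [0.1, 0.0, -0.2], [0.0, 0.1, -0.1]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
z_class = reparameterize(class_mu, class_log_var, rng)
z_inst = reparameterize(inst_mu, inst_log_var, rng)

# Regularizing only the instance-specific part toward the prior is one possible
# design choice; the actual objective may differ.
kl_inst = kl_to_standard_normal(inst_mu, inst_log_var)
```

In a setup like this, the KL term is zero exactly when the encoder outputs the prior (zero mean, zero log-variance) and grows as the instance-specific posterior drifts away from it.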
2. Related Work
2.1. Zero-Shot Learning Variations
2.2. Variational Autoencoder
2.3. Generative Model for ZSL
3. Method
3.1. Problem Definition
3.2. Variational Disentangle Network
3.3. Margin Regularizer
3.4. Zero-Shot Learning
3.5. Final Training Objective
4. Experiment Set-Up
4.1. Datasets and Settings
4.1.1. Input Space
4.1.2. Evaluation Metrics
5. Discussion
5.1. Quantitative Results Discussion
5.2. Latent Space Visualization
5.3. Margin Analysis
5.4. Limitations and Future Work
5.4.1. Limitation
5.4.2. Future Work
- Designing end-to-end training strategies for zero-shot learning (ZSL) recognition that avoid reliance on pretrained features. Task-specific features may improve recognition accuracy.
- Investigating more advanced distance measures (e.g., the Wasserstein distance from the earth mover’s distance family) and their effect on the ZSL task.
- Connecting advanced generative models [40] with ZSL. Generating images conditioned on attributes could produce a larger-scale dataset suitable for the ZSL task.
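To make the second item more concrete, the sketch below computes the 2-Wasserstein distance between two Gaussians with diagonal covariance, for which a closed form exists: the squared distance is the squared difference of the means plus the squared difference of the standard deviations. Treating the compared distributions as diagonal Gaussians is an assumption made here for illustration, and the function name `w2_diag_gaussians` is hypothetical.

```python
import math


def w2_diag_gaussians(mu_a, sigma_a, mu_b, sigma_b):
    """2-Wasserstein distance between N(mu_a, diag(sigma_a^2)) and N(mu_b, diag(sigma_b^2))."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return math.sqrt(mean_term + std_term)


# Same spread, means shifted by (3, 4): the distance is the Euclidean shift.
shifted = w2_diag_gaussians([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])  # 5.0

# Identical distributions are at distance zero.
same = w2_diag_gaussians([1.0, 2.0], [0.5, 0.5], [1.0, 2.0], [0.5, 0.5])  # 0.0
```

Unlike the KL divergence, this distance is symmetric and stays finite even when the two distributions barely overlap, which is one reason it is often considered as an alternative.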
5.4.3. Connecting to Real-World Applications
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ZSL | Zero-shot learning |
References
- Palatucci, M.; Pomerleau, D.; Hinton, G.E.; Mitchell, T.M. Zero-shot learning with semantic output codes. In Proceedings of the Advances in Neural Information Processing Systems 22 (NIPS 2009), Vancouver, BC, Canada, 7–10 December 2009.
- Frome, A.; Corrado, G.S.; Shlens, J.; Bengio, S.; Dean, J.; Ranzato, M.; Mikolov, T. Devise: A deep visual-semantic embedding model. In Proceedings of the Advances in Neural Information Processing Systems 26 (NIPS 2013), Lake Tahoe, NV, USA, 5–10 December 2013.
- Akata, Z.; Perronnin, F.; Harchaoui, Z.; Schmid, C. Label-embedding for attribute-based classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), Portland, OR, USA, 23–28 June 2013.
- Weston, J.; Bengio, S.; Usunier, N. Wsabie: Scaling up to large vocabulary image annotation. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011), Barcelona, Spain, 16–22 July 2011; Volume 3, pp. 2764–2770.
- Toutanova, K.; Chen, D.; Pantel, P.; Poon, H.; Choudhury, P.; Gamon, M. Representing text for joint embedding of text and knowledge bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, 17–21 September 2015; pp. 1499–1509.
- Suthaharan, S. Support vector machine. In Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning; Springer: Berlin/Heidelberg, Germany, 2016; pp. 207–235.
- Romera-Paredes, B.; Torr, P. An embarrassingly simple approach to zero-shot learning. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, France, 6–11 July 2015; pp. 2152–2161.
- Xian, Y.; Akata, Z.; Sharma, G.; Nguyen, Q.; Hein, M.; Schiele, B. Latent embeddings for zero-shot classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 69–77.
- Kodirov, E.; Xiang, T.; Gong, S. Semantic autoencoder for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 3174–3183.
- Lampert, C.H.; Nickisch, H.; Harmeling, S. Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 453–465.
- Zhang, L.; Xiang, T.; Gong, S. Learning a deep embedding model for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 2021–2030.
- Lu, C.; Krishna, R.; Bernstein, M.; Fei-Fei, L. Visual relationship detection with language priors. In Proceedings of the European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 852–869.
- Stafylakis, T.; Tzimiropoulos, G. Zero-shot keyword spotting for visual speech recognition in-the-wild. In Proceedings of the European Conference on Computer Vision (ECCV 2018), Munich, Germany, 8–14 September 2018.
- Long, Y.; Liu, L.; Shao, L.; Shen, F.; Ding, G.; Han, J. From Zero-shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017.
- Kodirov, E.; Xiang, T.; Fu, Z.; Gong, S. Unsupervised Domain Adaptation for Zero-Shot Learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015.
- Lampert, C.H.; Nickisch, H.; Harmeling, S. Learning to detect unseen object classes by between-class attribute transfer. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009.
- Xian, Y.; Schiele, B.; Akata, Z. Zero-Shot Learning-The Good, the Bad and the Ugly. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017.
- Rohrbach, M.; Ebert, S.; Schiele, B. Transfer learning in a transductive setting. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS 2013), Lake Tahoe, NV, USA, 5–8 December 2013.
- Song, J.; Shen, C.; Yang, Y.; Liu, Y.; Song, M. Transductive Unbiased Embedding for Zero-Shot Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018.
- Demirel, B.; Cinbis, R.G.; Ikizler-Cinbis, N. Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017.
- Chen, L.; Zhang, H.; Xiao, J.; Liu, W.; Chang, S.F. Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018.
- Xian, Y.; Lorenz, T.; Schiele, B.; Akata, Z. Feature Generating Networks for Zero-Shot Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018.
- Kumar Verma, V.; Arora, G.; Mishra, A.; Rai, P. Generalized Zero-Shot Learning via Synthesized Examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018.
- Zhu, Y.; Elhoseiny, M.; Liu, B.; Peng, X.; Elgammal, A. A Generative Adversarial Approach for Zero-Shot Learning From Noisy Texts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018.
- Felix, R.; Kumar, V.B.G.; Reid, I.; Carneiro, G. Multi-modal Cycle-consistent Generalized Zero-Shot Learning. In Proceedings of the European Conference on Computer Vision (ECCV 2018), Munich, Germany, 8–14 September 2018.
- Zhang, H.; Long, Y.; Guan, Y.; Shao, L. Triple Verification Network for Generalized Zero-Shot Learning. IEEE Trans. Image Process. 2019, 28, 506–517.
- Mensink, T.; Verbeek, J.; Perronnin, F.; Csurka, G. Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In Proceedings of the European Conference on Computer Vision (ECCV 2012), Florence, Italy, 7–13 October 2012.
- Changpinyo, S.; Chao, W.L.; Gong, B.; Sha, F. Synthesized Classifiers for Zero-Shot Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016.
- Farhadi, A.; Endres, I.; Hoiem, D.; Forsyth, D. Describing objects by their attributes. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009.
- Zhang, Z.; Saligrama, V. Zero-shot learning via semantic similarity embedding. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015.
- Li, Y.; Zhang, J.; Zhang, J.; Huang, K. Discriminative Learning of Latent Features for Zero-Shot Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018.
- Jiang, H.; Wang, R.; Shan, S.; Chen, X. Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition. In Proceedings of the European Conference on Computer Vision (ECCV 2018), Munich, Germany, 8–14 September 2018.
- Long, Y.; Liu, L.; Shen, Y.; Shao, L. Towards Affordable Semantic Searching: Zero-shot Retrieval via Dominant Attributes. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), New Orleans, LA, USA, 2–7 February 2018.
- Hoffer, E.; Ailon, N. Deep metric learning using triplet network. In Proceedings of the International Workshop on Similarity-Based Pattern Recognition (SIMBAD 2015), Copenhagen, Denmark, 12–14 October 2015.
- Welinder, P.; Branson, S.; Mita, T.; Wah, C.; Schroff, F.; Belongie, S.; Perona, P. Caltech-UCSD birds 200. In Computation & Neural Systems Technical Report; California Institute of Technology: Pasadena, CA, USA, 2010.
- Nilsback, M.E.; Zisserman, A. Automated flower classification over a large number of classes. In Proceedings of the 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), Bhubaneswar, India, 16–19 December 2008.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Wang, J.; Zhou, F.; Wen, S.; Liu, X.; Lin, Y. Deep metric learning with angular loss. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 2593–2601.
- Amyar, A.; Ruan, S.; Vera, P.; Decazes, P.; Modzelewski, R. RADIOGAN: Deep convolutional conditional generative adversarial network to generate PET images. In Proceedings of the 7th International Conference on Bioinformatics Research and Applications (ICBRA 2020), Berlin, Germany, 13–15 September 2020; pp. 28–33.
Methods | ZSL AwA2 | ZSL CUB | ZSL FLO | GZSL AwA2 ts | GZSL AwA2 tr | GZSL AwA2 H | GZSL CUB ts | GZSL CUB tr | GZSL CUB H | GZSL FLO ts | GZSL FLO tr | GZSL FLO H |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DAP | 46.1 | 40.0 | - | 0.0 | 84.7 | 0.0 | 1.7 | 67.9 | 3.3 | - | - | - |
IAP | 35.9 | 24.0 | - | 0.9 | 87.6 | 1.8 | 0.2 | 72.8 | 0.4 | - | - | - |
LATEM | 55.8 | 49.3 | 40.4 | 11.5 | 77.3 | 20.0 | 15.2 | 57.3 | 24.0 | 6.6 | 47.6 | 21.5 |
ALE | 62.5 | 54.9 | 48.5 | 14.0 | 81.8 | 23.9 | 23.7 | 62.8 | 34.4 | 13.3 | 61.6 | 21.9 |
DEVISE | 59.7 | 52.0 | 45.9 | 17.1 | 74.7 | 27.8 | 23.8 | 53.0 | 32.8 | 9.9 | 44.2 | 16.2 |
SJE | 61.9 | 53.9 | 53.4 | 8.0 | 73.9 | 14.4 | 23.5 | 59.2 | 33.6 | 13.9 | 47.6 | 21.5 |
ESZSL | 58.6 | 53.9 | 51.0 | 5.9 | 77.8 | 11.0 | 12.6 | 63.8 | 21.0 | 11.4 | 56.8 | 19.0 |
SAE | 58.1 | 42.0 | 45.6 | 1.1 | 82.8 | 2.2 | 17.4 | 50.7 | 25.9 | - | - | - |
VDZSL (Ours) | 70.0 | 53.0 | 60.0 | 20.0 | 85.0 | 32.3 | 24.8 | 48.5 | 32.9 | 23.5 | 79.7 | 36.4 |
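The H columns of the generalized ZSL results appear to follow the usual benchmark convention, in which ts is the accuracy on unseen (test) classes, tr is the accuracy on seen (training) classes, and H is their harmonic mean. A minimal sketch under that assumption (the function name `harmonic_mean` is illustrative, and the inputs are the rounded table values, so small rounding differences against the reported H are possible):

```python
def harmonic_mean(ts, tr):
    """Harmonic mean H = 2 * ts * tr / (ts + tr) of unseen- and seen-class accuracy."""
    if ts + tr == 0:
        return 0.0  # both accuracies are zero, so H is defined as zero
    return 2.0 * ts * tr / (ts + tr)


# ALE on AwA2 (generalized ZSL): ts = 14.0, tr = 81.8
h_ale_awa2 = round(harmonic_mean(14.0, 81.8), 1)    # 23.9

# LATEM on AwA2 (generalized ZSL): ts = 11.5, tr = 77.3
h_latem_awa2 = round(harmonic_mean(11.5, 77.3), 1)  # 20.0

# DAP on AwA2: an unseen-class accuracy of 0.0 forces H to 0.0
h_dap_awa2 = harmonic_mean(0.0, 84.7)               # 0.0
```

Because the harmonic mean is dominated by the smaller of its two inputs, a method with high seen-class accuracy but near-zero unseen-class accuracy still receives a low H.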
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Su, J.; Wan, J.; Li, T.; Li, X.; Ye, Y. Variational Disentangle Zero-Shot Learning. Mathematics 2023, 11, 3578. https://doi.org/10.3390/math11163578