Looking Back to Lower-Level Information in Few-Shot Learning
Abstract
1. Introduction
- We propose Looking-Back, a novel few-shot learning (FSL) meta-learning method that utilizes lower-level information from hidden layers, in contrast to existing FSL methods, which use only the feature embedding of the last layer during meta-training.
- We implement Looking-Back with a graph neural network, exploiting the advantages of graph structures for few-shot learning to absorb the lower-level information contained in the network's hidden layers.
- We evaluate Looking-Back on two popular FSL datasets, miniImageNet and tieredImageNet, and achieve new state-of-the-art results, providing supporting evidence that using lower-level information can yield better meta-learners in FSL tasks.
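The core idea in the bullets above can be sketched in a toy example. Assuming a Zhou et al.-style label propagation step on a Gaussian-similarity graph (the mechanism used by TPN, both cited below), a hypothetical helper might run propagation separately on the embeddings from several layers and combine the resulting scores. The function names, the per-layer weighting, and all hyperparameters here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def propagate_labels(features, y_support, n_classes, alpha=0.99, sigma=1.0):
    """Label propagation (Zhou et al.) on a Gaussian-similarity graph.

    features: (N, d) embeddings for support + query examples (support rows first).
    y_support: integer labels for the first len(y_support) rows.
    Returns (N, n_classes) class scores for all examples.
    """
    n = features.shape[0]
    # Pairwise squared distances -> Gaussian affinity matrix W, no self-loops.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1) + 1e-12)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot seed labels for support rows, zeros for query rows.
    Y = np.zeros((n, n_classes))
    Y[np.arange(len(y_support)), y_support] = 1.0
    # Closed-form propagation: F* = (I - alpha * S)^{-1} Y.
    return np.linalg.solve(np.eye(n) - alpha * S, Y)

def looking_back_scores(layer_features, y_support, n_classes, weights=None):
    """Combine propagation scores from several layers' embeddings (sketch)."""
    weights = weights or [1.0] * len(layer_features)
    return sum(w * propagate_labels(f, y_support, n_classes)
               for w, f in zip(weights, layer_features))
```

In this sketch, each element of `layer_features` would be the (flattened or pooled) activations of one hidden layer for the same episode, so lower-level information contributes its own propagation graph rather than being discarded after the final embedding.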
2. Related Work
2.1. Meta-Learning
2.1.1. Metric-Based Meta-Learning
2.1.2. Optimization-Based Meta-Learning
2.1.3. Graph-Based Meta-Learning
2.2. Transfer Learning
3. Proposed Method
3.1. Problem Definition
3.2. Feature Extractor Module
3.3. Graph Construction Module
3.4. Classification Loss
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Results and Discussion
4.3.1. Overall Performance
4.3.2. Comparing Looking-Back and TPN Training in a “Higher Shot” Setting
4.3.3. Influence of Higher-Shot Training on Looking-Back
4.3.4. Why We Use Only the Last Layer’s Information during Inference
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep learning for generic object detection: A survey. Int. J. Comput. Vis. 2020, 128, 261–318.
- Wani, M.A.; Bhat, F.A.; Afzal, S.; Khan, A.I. Supervised deep learning in face recognition. In Advances in Deep Learning; Springer: Berlin/Heidelberg, Germany, 2020; pp. 95–110.
- Wang, W.; Liang, D.; Chen, Q.; Iwamoto, Y.; Han, X.H.; Zhang, Q.; Hu, H.; Lin, L.; Chen, Y.W. Medical image classification using deep learning. In Deep Learning in Healthcare; Springer: Berlin/Heidelberg, Germany, 2020; pp. 33–51.
- Raschka, S.; Patterson, J.; Nolet, C. Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence. Information 2020, 11, 193.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. 2019, 53, 63.
- Kang, B.; Liu, Z.; Wang, X.; Yu, F.; Feng, J.; Darrell, T. Few-shot object detection via feature reweighting. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 8420–8429.
- Bao, Y.; Wu, M.; Chang, S.; Barzilay, R. Few-shot text classification with distributional signatures. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3630–3638.
- Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4077–4087.
- Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.S.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208.
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1126–1135.
- Ravi, S.; Larochelle, H. Optimization as a model for few-shot learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Qi, H.; Brown, M.; Lowe, D.G. Low-shot learning with imprinted weights. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5822–5830.
- Gidaris, S.; Komodakis, N. Dynamic few-shot visual learning without forgetting. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4367–4375.
- Qiao, S.; Liu, C.; Shen, W.; Yuille, A.L. Few-shot image recognition by predicting parameters from activations. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7229–7238.
- Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2009, 20, 61–80.
- Duvenaud, D.K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R.P. Convolutional networks on graphs for learning molecular fingerprints. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2224–2232.
- Kipf, T.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2017, arXiv:1609.02907.
- Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. arXiv 2018, arXiv:1710.10903.
- Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 1263–1272.
- Garcia, V.; Estrach, J.B. Few-shot learning with graph neural networks. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Liu, Y.; Lee, J.; Park, M.; Kim, S.; Yang, E.; Hwang, S.J.; Yang, Y. Learning to propagate labels: Transductive propagation network for few-shot learning. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Kim, J.; Kim, T.; Kim, S.; Yoo, C.D. Edge-labeling graph neural network for few-shot learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 11–20.
- Ren, M.; Triantafillou, E.; Ravi, S.; Snell, J.; Swersky, K.; Tenenbaum, J.B.; Larochelle, H.; Zemel, R.S. Meta-learning for semi-supervised few-shot classification. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Li, X.; Sun, Q.; Liu, Y.; Zhou, Q.; Zheng, S.; Chua, T.S.; Schiele, B. Learning to self-train for semi-supervised few-shot classification. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 10276–10286.
- Yu, Z.; Chen, L.; Cheng, Z.; Luo, J. TransMatch: A transfer-learning scheme for semi-supervised few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12856–12864.
- Xing, C.; Rostamzadeh, N.; Oreshkin, B.; Pinheiro, P.O. Adaptive cross-modal few-shot learning. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 4848–4858.
- Schonfeld, E.; Ebrahimi, S.; Sinha, S.; Darrell, T.; Akata, Z. Generalized zero- and few-shot learning via aligned variational autoencoders. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 8247–8255.
- Li, W.; Wang, L.; Xu, J.; Huo, J.; Gao, Y.; Luo, J. Revisiting local descriptor based image-to-class measure for few-shot learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 7253–7260.
- Lifchitz, Y.; Avrithis, Y.; Picard, S.; Bursuc, A. Dense classification and implanting for few-shot learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9250–9259.
- Nichol, A.; Achiam, J.; Schulman, J. On first-order meta-learning algorithms. arXiv 2018, arXiv:1803.02999.
- Mallya, A.; Lazebnik, S. PackNet: Adding multiple tasks to a single network by iterative pruning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7765–7773.
- Mishra, N.; Rohaninejad, M.; Chen, X.; Abbeel, P. A simple neural attentive meta-learner. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Oreshkin, B.; López, P.R.; Lacoste, A. TADAM: Task dependent adaptive metric for improved few-shot learning. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 721–731.
- Lee, K.; Maji, S.; Ravichandran, A.; Soatto, S. Meta-learning with differentiable convex optimization. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 10657–10665.
- Sun, Q.; Liu, Y.; Chua, T.S.; Schiele, B. Meta-transfer learning for few-shot learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 403–412.
- Chung, F.R. Spectral Graph Theory; American Mathematical Society: Providence, RI, USA, 1997.
- Zhou, D.; Bousquet, O.; Lal, T.N.; Weston, J.; Schölkopf, B. Learning with local and global consistency. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 13–18 December 2004; pp. 321–328.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Fort, S. Gaussian prototypical networks for few-shot learning on omniglot. arXiv 2018, arXiv:1708.02735.
- Cao, T.; Law, M.T.; Fidler, S. A theoretical analysis of the number of shots in few-shot learning. arXiv 2020, arXiv:1909.11722.
Method | Extract. Net. | 1-Shot | 5-Shot |
---|---|---|---|
Matching Net [9] | Conv-64 | 43.56 ± 0.84 | 55.31 ± 0.73 |
Prototypical Net [42] | Conv-64 | 49.42 ± 0.78 | 68.20 ± 0.66 |
Relation Net [11] | Conv-64 | 50.44 ± 0.82 | 65.32 ± 0.70 |
Reptile [32] | Conv-64 | 49.97 ± 0.32 | 65.99 ± 0.58 |
GNN [22] | Conv-64 | 49.02 ± 0.98 | 63.50 ± 0.84 |
MAML [12] | Conv-64 | 48.70 ± 1.84 | 63.11 ± 0.92 |
TPN [23] | Conv-64 | 53.75 ± 0.86 | 69.43 ± 0.67 |
Looking-Back | Conv-64 | 55.91 ± 0.86 | 70.99 ± 0.68 |
Method | Extract. Net. | 1-Shot | 5-Shot |
---|---|---|---|
Prototypical Net [42] | Conv-64 | 53.31 ± 0.89 | 72.69 ± 0.74 |
Relation Net [11] | Conv-64 | 54.48 ± 0.93 | 71.31 ± 0.78 |
Reptile [32] | Conv-64 | 52.36 ± 0.23 | 71.03 ± 0.22 |
MAML [12] | Conv-64 | 51.67 ± 1.81 | 70.30 ± 1.75 |
TPN [23] | Conv-64 | 57.53 ± 0.96 | 72.85 ± 0.74 |
Looking-Back | Conv-64 | 58.97 ± 0.97 | 73.59 ± 0.74 |
Dataset | Method | 1-Shot | 5-Shot |
---|---|---|---|
miniImageNet | TPN | 55.51 ± 0.86 | 69.86 ± 0.65 |
miniImageNet | Looking-Back | 56.49 ± 0.83 | 70.47 ± 0.66 |
tieredImageNet | TPN | 59.91 ± 0.94 | 73.30 ± 0.75 |
tieredImageNet | Looking-Back | 61.19 ± 0.92 | 73.78 ± 0.74 |
Training Approach | Dataset | 1-Shot | 5-Shot |
---|---|---|---|
Same | miniImageNet | 2.16 | 1.56 |
Same | tieredImageNet | 1.44 | 0.74 |
Higher | miniImageNet | 0.98 | 0.61 |
Higher | tieredImageNet | 1.28 | 0.48 |
Dataset | 1-Shot | 5-Shot |
---|---|---|
miniImageNet | 0.58 | −0.52 |
tieredImageNet | 2.22 | 0.19 |
Dataset | Setting | 2nd Layer | 3rd Layer | 4th Layer |
---|---|---|---|---|
miniImageNet | 1-shot | 42.24 ± 0.76 | 50.87 ± 0.81 | 55.91 ± 0.86 |
miniImageNet | 5-shot | 58.10 ± 0.72 | 67.07 ± 0.69 | 70.99 ± 0.68 |
tieredImageNet | 1-shot | 46.25 ± 0.87 | 54.70 ± 0.93 | 58.97 ± 0.97 |
tieredImageNet | 5-shot | 61.12 ± 0.75 | 69.94 ± 0.74 | 73.59 ± 0.74 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yu, Z.; Raschka, S. Looking Back to Lower-Level Information in Few-Shot Learning. Information 2020, 11, 345. https://doi.org/10.3390/info11070345