Ensemble Learning of Lightweight Deep Learning Models Using Knowledge Distillation for Image Classification
Abstract
1. Introduction
- We designed and implemented an ensemble model that combines feature-based, response-based, and relation-based lightweight knowledge distillation models.
- We conducted extensive experiments on various knowledge distillation models and our proposed ensemble models under the same conditions for a fair comparison.
- We showed that our proposed ensemble model outperforms other state-of-the-art distillation models, as well as the large teacher network, on two different datasets (CIFAR-10 and CIFAR-100) at a lower computational cost (a conceptual sketch of the ensemble is given after this list).
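As a rough conceptual sketch (not the authors' released implementation), such an ensemble can combine the independently distilled students by averaging their softmax outputs at inference time; the model and tensor names below are placeholders.

```python
import torch

def ensemble_predict(students, images):
    """Average the softmax outputs of several distilled student networks.

    `students` is any iterable of trained student models (e.g., the
    response-, feature-, and relation-based students); `images` is a
    batch of input tensors. This is a minimal illustrative sketch of
    output averaging, not the paper's exact code.
    """
    probs = []
    with torch.no_grad():
        for model in students:
            model.eval()
            logits = model(images)                      # (batch, num_classes)
            probs.append(torch.softmax(logits, dim=1))
    avg_probs = torch.stack(probs, dim=0).mean(dim=0)   # ensemble probability
    return avg_probs.argmax(dim=1)                      # predicted class per image
```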
2. Related Work
2.1. Model Compression
2.1.1. Low-Rank Factorization
2.1.2. Parameter Sharing and Pruning
2.1.3. Transferred/Compact Convolutional Filters
2.2. Knowledge Distillation
2.2.1. Response-Based Knowledge
2.2.2. Feature-Based Knowledge
2.2.3. Relation-Based Knowledge
3. Proposed Methods
3.1. Image Augmentation
3.2. Knowledge Distillation
3.2.1. Student Loss
3.2.2. Distilled Loss of the Response-Based Model
3.2.3. Distilled Loss of the Feature-Based Model
3.2.4. Distilled Loss of the Relation-Based Model
3.3. Ensemble of KD Models
4. Experiments and Results
4.1. Dataset
4.2. Experimental Settings
4.3. Results
5. Discussion
5.1. Comparison between Our Experimental Results and Others
5.2. Computational Advantage/Disadvantage of Our Method
5.3. The Size of Training Data
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A
Method | CIFAR-10 Top-1 | CIFAR-10 Top-2 | CIFAR-10 Top-3 | CIFAR-10 Top-4 | CIFAR-10 Top-5 | CIFAR-100 Top-1 | CIFAR-100 Top-2 | CIFAR-100 Top-3 | CIFAR-100 Top-4 | CIFAR-100 Top-5
---|---|---|---|---|---|---|---|---|---|---
Teacher | 94.1 | 98.06 | 99.28 | 99.67 | 99.79 | 72.69 | 83.13 | 87.95 | 90.51 | 92.25 |
Student | 92.46 | 97.67 | 98.96 | 99.55 | 99.8 | 69.05 | 80.47 | 86.32 | 89.07 | 91.13 |
Logits [26] | 93.14 | 97.91 | 99.15 | 99.63 | 99.82 | 70.19 | 81.96 | 87.34 | 90.23 | 92.26 |
Soft target [27] | 92.89 | 97.68 | 99.03 | 99.61 | 99.81 | 70.42 | 81.71 | 86.93 | 90.01 | 91.81 |
AT [29] | 93.44 | 97.86 | 99.19 | 99.69 | 99.85 | 69.69 | 81.22 | 86.46 | 89.61 | 91.41 |
Fitnet [28] | 92.59 | 97.58 | 99.05 | 99.62 | 99.8 | 69.1 | 81.42 | 86.65 | 89.79 | 91.87 |
NST [36] | 92.93 | 97.61 | 99.1 | 99.62 | 99.82 | 69.09 | 81.19 | 86.53 | 89.53 | 91.51 |
PKT [47] | 93.11 | 97.89 | 99.15 | 99.48 | 99.73 | 68.96 | 80.62 | 86.06 | 89.05 | 91.03 |
FSP [46] | 92.43 | 97.58 | 99.07 | 99.64 | 99.79 | 69.63 | 81.5 | 86.92 | 89.74 | 91.91 |
FT [37] | 93.32 | 97.92 | 99.12 | 99.58 | 99.74 | 70.11 | 81.62 | 87.09 | 90.16 | 91.83 |
RKD [30] | 93.21 | 97.91 | 99.14 | 99.5 | 99.76 | 69.32 | 81.19 | 86.35 | 89.35 | 91.33 |
AB [38] | 93.04 | 97.56 | 99.08 | 99.62 | 99.78 | 69.66 | 81.37 | 86.64 | 89.72 | 91.51 |
SP [48] | 92.97 | 97.71 | 99.13 | 99.65 | 99.85 | 70.09 | 81.61 | 86.74 | 89.92 | 91.79 |
Sobolev [39] | 92.62 | 97.63 | 98.96 | 99.57 | 99.81 | 68.53 | 80.96 | 86.61 | 89.81 | 91.78 |
BSS [40] | 92.56 | 97.67 | 99.01 | 99.53 | 99.79 | 69.57 | 81.59 | 87.13 | 90.17 | 92.01 |
CC [31] | 92.74 | 97.62 | 99.16 | 99.6 | 99.8 | 69.06 | 80.87 | 86.12 | 89.48 | 91.4 |
IRG [49] | 93.05 | 97.99 | 99.14 | 99.62 | 99.81 | 69.88 | 81.63 | 86.87 | 89.92 | 91.64 |
VID [41] | 92.37 | 97.6 | 99.04 | 99.59 | 99.84 | 68.84 | 80.83 | 86.3 | 89.47 | 91.38 |
OFD [42] | 92.86 | 97.52 | 98.92 | 99.54 | 99.82 | 69.77 | 81.4 | 86.86 | 89.98 | 91.85 |
AFD [43] | 92.96 | 97.75 | 99.03 | 99.51 | 99.78 | 68.86 | 80.77 | 86.29 | 89.54 | 91.6 |
CRD [44] | 92.67 | 97.76 | 99.03 | 99.61 | 99.81 | 71.01 | 81.99 | 87.53 | 90.19 | 92.17 |
DML [45] | 92.87 | 97.88 | 99.21 | 99.71 | 99.87 | 70.53 | 82.43 | 87.44 | 90.05 | 92.02 |
Ens-AT-Logits | 93.97 | 98.14 | 99.36 | 99.73 | 99.91 | 73.61 | 84.7 | 89.29 | 91.87 | 93.49 |
Ens-RKD-Logits | 93.82 | 98.26 | 99.32 | 99.67 | 99.8 | 73.32 | 84.39 | 89.08 | 91.64 | 93.41 |
Ens-AT-RKD | 94.16 | 98.18 | 99.37 | 99.69 | 99.84 | 73.75 | 84.27 | 89.04 | 91.73 | 93.39 |
Ens-all | 94.41 | 98.32 | 99.42 | 99.74 | 99.92 | 74.55 | 85.53 | 89.88 | 92.34 | 94.07 |
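The Top-1 to Top-5 figures in the table above are standard Top-k accuracies; as a minimal sketch (assuming PyTorch-style score tensors, with the function name being illustrative), they can be computed as follows.

```python
import torch

def topk_accuracy(probs, targets, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    probs:   (N, num_classes) tensor of class scores or probabilities.
    targets: (N,) tensor of ground-truth class indices.
    """
    topk = probs.topk(k, dim=1).indices            # (N, k) indices of the top-k classes
    correct = (topk == targets.unsqueeze(1)).any(dim=1)
    return correct.float().mean().item() * 100.0   # percentage, as reported in the tables
```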
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2012; pp. 1097–1105. [Google Scholar]
- Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep learning for computer vision: A brief review. Comput. Intell. Neurosci. 2018, 2018, 1–13. [Google Scholar] [CrossRef]
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Lebedev, V.; Ganin, Y.; Rakhuba, M.; Oseledets, I.; Lempitsky, V. Speeding-up convolutional neural networks using fine-tuned cp-decomposition. arXiv 2014, arXiv:1412.6553. [Google Scholar]
- Tai, C.; Xiao, T.; Zhang, Y.; Wang, X. Convolutional neural networks with low-rank regularization. arXiv 2015, arXiv:1511.06067. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
- Gupta, S.; Agrawal, A.; Gopalakrishnan, K.; Narayanan, P. Deep learning with limited numerical precision. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Volume 37, pp. 1737–1746. [Google Scholar]
- Vanhoucke, V.; Senior, A.; Mao, M.Z. Improving the speed of neural networks on CPUs. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain, 12–17 December 2011. [Google Scholar]
- Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
- Courbariaux, M.; Bengio, Y. Binarynet: Training deep neural networks with weights and activations constrained to +1 or −1. arXiv 2016, arXiv:1602.02830. [Google Scholar]
- Courbariaux, M.; Bengio, Y.; David, J.P. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; pp. 3123–3131. [Google Scholar]
- Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. Xnor-net: Imagenet classification using binary convolutional neural networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland; pp. 525–542. [Google Scholar]
- Chen, W.; Wilson, J.; Tyree, S.; Weinberger, K.; Chen, Y. Compressing neural networks with the hashing trick. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2285–2294. [Google Scholar]
- Wen, W.; Wu, C.; Wang, Y.; Chen, Y.; Li, H. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems; NIPS Proceedings: Red Hook, NY, USA, 2016; pp. 2074–2082. [Google Scholar]
- Yu, R.; Li, A.; Chen, C.F.; Lai, J.H.; Morariu, V.I.; Han, X.; Gao, M.; Lin, C.Y.; Davis, L.S. Nisp: Pruning networks using neuron importance score propagation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9194–9203. [Google Scholar]
- Sindhwani, V.; Sainath, T.; Kumar, S. Structured transforms for small-footprint deep learning. In Advances in Neural Information Processing Systems; NIPS Proceedings: Red Hook, NY, USA, 2015; pp. 3088–3096. [Google Scholar]
- Kailath, T.; Chun, J. Generalized displacement structure for block-Toeplitz, Toeplitz-block, and Toeplitz-derived matrices. SIAM J. Matrix Anal. Appl. 1994, 15, 114–128. [Google Scholar] [CrossRef] [Green Version]
- Rakhuba, M.V.; Oseledets, I.V. Fast multidimensional convolution in low-rank tensor formats via cross approximation. SIAM J. Sci. Comput. 2015, 37, A565–A582. [Google Scholar] [CrossRef] [Green Version]
- Cohen, T.; Welling, M. Group equivariant convolutional networks. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2990–2999. [Google Scholar]
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv 2016, arXiv:1602.07261. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856. [Google Scholar]
- Huang, G.; Liu, S.; Van der Maaten, L.; Weinberger, K.Q. Condensenet: An efficient DenseNet using learned group convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2752–2761. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Ba, L.J.; Caruana, R. Do deep nets really need to be deep? In Advances in Neural Information Processing Systems; NIPS Proceedings: Red Hook, NY, USA, 2014; pp. 2654–2662. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. FitNets: Hints for thin deep nets. arXiv 2015, arXiv:1412.6550. [Google Scholar]
- Zagoruyko, S.; Komodakis, N. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv 2017, arXiv:1612.03928. [Google Scholar]
- Park, W.; Kim, D.; Lu, Y.; Cho, M. Relational knowledge distillation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3967–3976. [Google Scholar]
- Peng, B.; Jin, X.; Liu, J.; Li, D.; Wu, Y.; Liu, Y.; Zhou, S.; Zhang, Z. Correlation congruence for knowledge distillation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 5007–5016. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Meng, Z.; Li, J.; Zhao, Y.; Gong, Y. Conditional teacher-student learning. In Proceedings of the 44th International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, 12–17 May 2019; pp. 6445–6449. [Google Scholar]
- Huang, Z.; Wang, N. Like what you like: Knowledge distill via neuron selectivity transfer. arXiv 2017, arXiv:1707.01219. [Google Scholar]
- Kim, J.; Park, S.; Kwak, N. Paraphrasing complex network: Network compression via factor transfer. In Advances in Neural Information Processing Systems; NIPS Proceedings: Red Hook, NY, USA, 2018; pp. 2760–2769. [Google Scholar]
- Heo, B.; Lee, M.; Yun, S.; Choi, J.Y. Knowledge transfer via distillation of activation boundaries formed by hidden neurons. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 33, pp. 3779–3787. [Google Scholar]
- Czarnecki, W.M.; Osindero, S.; Jaderberg, M.; Swirszcz, G.; Pascanu, R. Sobolev training for neural networks. In Advances in Neural Information Processing Systems; NIPS Proceedings: Red Hook, NY, USA, 2017; pp. 4278–4287. [Google Scholar]
- Heo, B.; Lee, M.; Yun, S.; Choi, J.Y. Knowledge distillation with adversarial samples supporting decision boundary. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 33, pp. 3771–3778. [Google Scholar]
- Ahn, S.; Hu, S.X.; Damianou, A.; Lawrence, N.D.; Dai, Z. Variational information distillation for knowledge transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9163–9171. [Google Scholar]
- Heo, B.; Kim, J.; Yun, S.; Park, H.; Kwak, N.; Choi, J.Y. A comprehensive overhaul of feature distillation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 1921–1930. [Google Scholar]
- Wang, K.; Gao, X.; Zhao, Y.; Li, X.; Dou, D.; Xu, C. Pay attention to features, transfer learn faster CNNs. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020; Available online: https://openreview.net/forum?id=ryxyCeHtPB (accessed on 4 July 2020).
- Tian, Y.; Krishnan, D.; Isola, P. Contrastive representation distillation. arXiv 2019, arXiv:1910.10699. [Google Scholar]
- Zhang, Y.; Xiang, T.; Hospedales, T.M.; Lu, H. Deep mutual learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4320–4328. [Google Scholar]
- Yim, J.; Joo, D.; Bae, J.; Kim, J. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4133–4141. [Google Scholar]
- Passalis, N.; Tefas, A. Learning deep representations with probabilistic knowledge transfer. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 268–284. [Google Scholar]
- Tung, F.; Mori, G. Similarity-preserving knowledge distillation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 1365–1374. [Google Scholar]
- Liu, Y.; Cao, J.; Li, B.; Yuan, C.; Hu, W.; Li, Y.; Duan, Y. Knowledge distillation via instance relationship graph. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 7096–7104. [Google Scholar]
- GitHub. Available online: https://github.com/AberHu/Knowledge-Distillation-Zoo (accessed on 2 July 2020).
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
- Rachmadi, M.F.; Valdés-Hernández, M.D.C.; Agan, M.L.F.; Di Perri, C.; Komura, T. Segmentation of white matter hyperintensities using convolutional neural networks with global spatial information in routine clinical brain MRI with none or mild vascular pathology. Comput. Med. Imaging Graph. 2018, 66, 28–43. [Google Scholar] [CrossRef] [Green Version]
- Holden, D.; Komura, T.; Saito, J. Phase-functioned neural networks for character control. ACM Trans. Graph. (TOG) 2017, 36, 1–13. [Google Scholar] [CrossRef]
- Mousas, C.; Newbury, P.; Anagnostopoulos, C.N. Evaluating the covariance matrix constraints for data-driven statistical human motion reconstruction. In Proceedings of the 30th Spring Conference on Computer Graphics, Smolenice, Slovakia, 28–30 May 2014; pp. 99–106. [Google Scholar]
- Mousas, C.; Newbury, P.; Anagnostopoulos, C.N. Data-driven motion reconstruction using local regression models. In Proceedings of the 10th International Conference Artificial Intelligence Applications and Innovations, Rhodes, Greece, 19–21 September 2014; pp. 364–374. [Google Scholar]
- Suk, H.I.; Wee, C.Y.; Lee, S.W.; Shen, D. State-space model with deep learning for functional dynamics estimation in resting-state fMRI. NeuroImage 2016, 129, 292–307. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Chéron, G.; Laptev, I.; Schmid, C. P-CNN: Pose-based CNN features for action recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3218–3226. [Google Scholar]
- Saito, S.; Wei, L.; Hu, L.; Nagano, K.; Li, H. Photorealistic facial texture inference using deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5144–5153. [Google Scholar]
- Li, R.; Si, D.; Zeng, T.; Ji, S.; He, J. Deep convolutional neural networks for detecting secondary structures in protein density maps from cryo-electron microscopy. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Shenzhen, China, 15–18 December 2016; pp. 41–46. [Google Scholar]
- Li, Z.; Zhou, Y.; Xiao, S.; He, C.; Huang, Z.; Li, H. Auto-conditioned recurrent networks for extended complex human motion synthesis. arXiv 2017, arXiv:1707.05363. [Google Scholar]
- Abdel-Hamid, O.; Mohamed, A.R.; Jiang, H.; Penn, G. Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 4277–4280. [Google Scholar]
- Rekabdar, B.; Mousas, C. Dilated Convolutional Neural Network for Predicting Driver’s Activity. In Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 3245–3250. [Google Scholar]
- Rekabdar, B.; Mousas, C.; Gupta, B. Generative adversarial network with policy gradient for text summarization. In Proceedings of the IEEE 13th International Conference on Semantic Computing (ICSC), Newport Beach, CA, USA, 30 January–1 February 2019; pp. 204–207. [Google Scholar]
- Bilmes, J.A.; Bartels, C. Graphical model architectures for speech recognition. IEEE Signal Process. Mag. 2005, 22, 89–100. [Google Scholar] [CrossRef]
- Tenenbaum, J.B.; De Silva, V.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323. [Google Scholar] [CrossRef]
- Cao, L.J.; Chua, K.S.; Chong, W.K.; Lee, H.P.; Gu, Q.M. A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 2003, 55, 321–336. [Google Scholar] [CrossRef]
- Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326. [Google Scholar] [CrossRef] [Green Version]
- Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003, 15, 1373–1396. [Google Scholar] [CrossRef] [Green Version]
- Ngiam, J.; Khosla, A.; Kim, M.; Nam, J.; Lee, H.; Ng, A.Y. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011. [Google Scholar]
- Nam, J.; Herrera, J.; Slaney, M.; Smith, J.O., III. Learning Sparse Feature Representations for Music Annotation and Retrieval. In Proceedings of the 13th International Society for Music Information Retrieval Conference, Porto, Portugal, 8–12 October 2012; pp. 565–570. [Google Scholar]
- Mousas, C.; Anagnostopoulos, C.N. Learning motion features for example-based finger motion estimation for virtual characters. 3D Res. 2017, 8, 25. [Google Scholar] [CrossRef]
Method | CIFAR-10 Top-1 | CIFAR-10 Top-2 | CIFAR-100 Top-1 | CIFAR-100 Top-5
---|---|---|---|---
Teacher | 94.10 | 98.06 | 72.69 | 92.25 |
Student | 92.46 | 97.67 | 69.05 | 91.13 |
Logits [26] | 93.14 | 97.91 | 70.19 | 92.26 |
Soft target [27] | 92.89 | 97.68 | 70.42 | 91.81 |
AT [29] | 93.44 | 97.86 | 69.69 | 91.41 |
Fitnet [28] | 92.59 | 97.58 | 69.1 | 91.87 |
NST [36] | 92.93 | 97.61 | 69.09 | 91.51 |
PKT [47] | 93.11 | 97.89 | 68.96 | 91.03 |
FSP [46] | 92.43 | 97.58 | 69.63 | 91.91 |
FT [37] | 93.32 | 97.92 | 70.11 | 91.83 |
RKD [30] | 93.21 | 97.91 | 69.32 | 91.33 |
AB [38] | 93.04 | 97.56 | 69.66 | 91.51 |
SP [48] | 92.97 | 97.71 | 70.09 | 91.79 |
Sobolev [39] | 92.62 | 97.63 | 68.53 | 91.78 |
BSS [40] | 92.56 | 97.67 | 69.57 | 92.01 |
CC [31] | 92.74 | 97.62 | 69.06 | 91.4 |
IRG [49] | 93.05 | 97.99 | 69.88 | 91.64 |
VID [41] | 92.37 | 97.6 | 68.84 | 91.38 |
OFD [42] | 92.86 | 97.52 | 69.77 | 91.85 |
AFD [43] | 92.96 | 97.75 | 68.86 | 91.6 |
CRD [44] | 92.67 | 97.76 | 71.01 | 92.17 |
DML [45] | 92.87 | 97.88 | 70.53 | 92.02 |
Ens-AT-Logits | 93.97 | 98.14 | 73.61 | 93.49 |
Ens-RKD-Logits | 93.82 | 98.26 | 73.32 | 93.41 |
Ens-AT-RKD | 94.16 | 98.18 | 73.75 | 93.39 |
Ens-all | 94.41 | 98.32 | 74.55 | 94.07 |
Method | CIFAR-10 Time (s) | CIFAR-10 Params (MB) | CIFAR-100 Time (s) | CIFAR-100 Params (MB)
---|---|---|---|---
Teacher | 3.09 | 1.73 | 3.02 | 1.74 |
Student | 1.02 | 0.27 | 0.99 | 0.28 |
Ens-AT-Logits | 1.78 | 0.54 | 1.76 | 0.55 |
Ens-RKD-Logits | 1.74 | 0.54 | 1.72 | 0.55 |
Ens-AT-RKD | 1.78 | 0.54 | 1.76 | 0.55 |
Ens-all | 2.53 | 0.81 | 2.51 | 0.83 |
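For context, figures of the kind reported above (per-pass inference time and parameter size) can be estimated with a sketch like the following, assuming a PyTorch model with 32-bit weights; the exact measurement protocol used in the paper may differ.

```python
import time
import torch

def model_size_mb(model):
    """Approximate parameter size in MB, assuming 32-bit (4-byte) weights."""
    n_params = sum(p.numel() for p in model.parameters())
    return n_params * 4 / (1024 ** 2)

def inference_time_s(model, loader, device="cpu"):
    """Wall-clock time for one full pass over a test data loader."""
    model.to(device).eval()
    start = time.perf_counter()
    with torch.no_grad():
        for images, _ in loader:
            model(images.to(device))
    return time.perf_counter() - start
```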