EffShuffNet: An Efficient Neural Architecture for Adopting a Multi-Model
Abstract
1. Introduction
- We propose the EffShuff and EffShuff-Dense blocks, which are modifications of the ShuffleNet-v2 unit. Specifically, the “depth-wise convolution” in the transition stage of the ShuffleNet-v2 unit is replaced with “average pooling”. This modification enables EffShuffNet to employ dense connections effectively, enhancing the representation capability of the model by incorporating context from all lower layers (a minimal sketch of this transition is given after this list).
- We evaluate the proposed architecture on age and gender prediction as a representative task requiring multiple lightweight models. Additionally, we test the EffShuff-Dense unit on challenging fine-grained classification tasks to demonstrate that dense connections, which recent lightweight models lack, are a primary factor in improving model performance.
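To make the first contribution concrete, the following is a minimal PyTorch sketch of a downsampling EffShuff transition in which the stride-2 depth-wise convolution of the ShuffleNet-v2 unit is replaced by 2 × 2 average pooling. It is an illustrative reconstruction rather than the authors' released code: the class name `EffShuffTransition`, the exact branch layout, and the channel split are assumptions; only the average-pooling substitution is taken from the text above.

```python
# Illustrative sketch (assumed details): a ShuffleNet-v2-style downsampling unit
# whose stride-2 depth-wise convolutions are replaced by 2x2 average pooling,
# as described in the first contribution bullet. Not the authors' implementation.
import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Standard ShuffleNet channel shuffle: interleave channels across groups."""
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


class EffShuffTransition(nn.Module):
    """Downsampling unit: both branches use average pooling instead of a stride-2 depth-wise conv."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        branch_ch = out_ch // 2
        # Left branch: reduce resolution with average pooling, then project with a 1x1 conv.
        self.left = nn.Sequential(
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_ch, branch_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(branch_ch),
            nn.ReLU(inplace=True),
        )
        # Right branch: 1x1 conv, average pooling for downsampling, then another 1x1 conv.
        self.right = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(branch_ch),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(branch_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([self.left(x), self.right(x)], dim=1)
        return channel_shuffle(out, groups=2)


if __name__ == "__main__":
    # 24 stem channels at 56x56 -> 116 channels at 28x28, matching Stage 1 of the
    # EffShuffNet configuration table later in the paper.
    x = torch.randn(1, 24, 56, 56)
    print(EffShuffTransition(24, 116)(x).shape)  # torch.Size([1, 116, 28, 28])
```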
2. Related Work
3. Methodology
3.1. Age and Gender Classification
3.2. EffShuffNet
3.3. EffShuffDenseNet
4. Experimental Results
4.1. Environmental Settings
4.2. Dataset
- The Adience dataset [30] is a well-known benchmark for age and gender classification that was published in 2014. Its images were captured under unconstrained real-world conditions, with wide variation in appearance, pose, lighting, and image quality. The dataset contains 15,163 images for age classification, divided into 10 classes, and 15,300 images for gender classification, divided into four classes.
4.3. Results
4.4. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
2. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
4. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
5. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
6. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
7. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017.
8. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164.
9. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
10. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803.
11. Park, J.; Woo, S.; Lee, J.-Y.; Kweon, I.S. BAM: Bottleneck Attention Module. arXiv 2018, arXiv:1807.06514.
12. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521.
13. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
14. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021.
15. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
16. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
17. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
18. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
19. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
20. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
21. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
22. Yu, J.; Wang, Z.; Vasudevan, V.; Yeung, L.; Seyedhosseini, M.; Wu, Y. CoCa: Contrastive Captioners are Image-Text Foundation Models. arXiv 2022, arXiv:2205.01917.
23. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021; Volume 139, pp. 8748–8763.
24. Kang, J.; Gwak, J. Ensemble learning of lightweight deep learning models using knowledge distillation for image classification. Mathematics 2020, 8, 1652.
25. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
26. Tang, R.; Lu, Y.; Liu, L.; Mou, L.; Vechtomova, O.; Lin, J. Distilling task-specific knowledge from BERT into simple neural networks. arXiv 2019, arXiv:1903.12136.
27. Xie, Q.; Luong, M.-T.; Hovy, E.; Le, Q.V. Self-training with Noisy Student improves ImageNet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10687–10698.
28. Wang, G.; Li, Q.; Wang, L.; Zhang, Y.; Liu, Z. Elderly fall detection with an accelerometer using lightweight neural networks. Electronics 2019, 8, 1354.
29. Li, W.; Zhang, L.; Wu, C.; Cui, Z.; Niu, C. A new lightweight deep neural network for surface scratch detection. Int. J. Adv. Manuf. Technol. 2022, 123, 1995–2015.
30. Eidinger, E.; Enbar, R.; Hassner, T. Age and gender estimation of unfiltered faces. IEEE Trans. Inf. Forensics Secur. 2014, 9, 2170–2179.
31. Gerry. Butterfly & Moths Image Classification 100 Species. Available online: https://www.kaggle.com/datasets/gpiosenka/butterfly-images40-species (accessed on 6 March 2023).
32. Van Horn, G.; Branson, S.; Farrell, R.; Haber, S.; Barry, J.; Ipeirotis, P.; Perona, P.; Belongie, S. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 595–604.
Layers | Stage | Output Size | Output Filters | Repeat |
---|---|---|---|---|
Input | General Stem | 224 × 224 | 3 | 1 |
Convolution | | 112 × 112 | 24 | 1 |
MaxPool | | 56 × 56 | 24 | 1 |
EffShuff transition | Stage 1 | 28 × 28 | 116 | 1 |
EffShuff block | | 28 × 28 | 116 | 1 |
EffShuff block | | 28 × 28 | 116 | 2 |
EffShuff transition | Stage 2 | 14 × 14 | 232 | 1 |
EffShuff block | | 14 × 14 | 232 | 1 |
EffShuff block | | 14 × 14 | 232 | 6 |
EffShuff transition | Stage 3 | 7 × 7 | 464 | 1 |
EffShuff block | | 7 × 7 | 464 | 1 |
EffShuff block | | 7 × 7 | 464 | 2 |
GAP | Classifier | 1 × 1 | 464 | 1 |
Softmax | | 1 × 1 | No. of classes | 1 |
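Read as a recipe, each stage in the table above opens with one downsampling EffShuff transition and then stacks the listed number of stride-1 EffShuff blocks at a constant channel width (e.g., 1 + 2 = 3 blocks at 116 channels in Stage 1). The sketch below illustrates that stride-1 unit and the stage bookkeeping in the style of the ShuffleNet-v2 basic unit; the class name `EffShuffBlock` and its internal layers are assumptions, since the paper's exact block design is not reproduced here.

```python
# Hedged sketch (assumed internals): a stride-1 unit in the ShuffleNet-v2 style
# (channel split -> transform half -> concatenate -> channel shuffle), used here
# only to illustrate how the repeated "EffShuff block" rows keep the channel
# count constant within a stage. Not the authors' exact block.
import torch
import torch.nn as nn


class EffShuffBlock(nn.Module):
    """Stride-1 unit: half of the channels pass through unchanged, half are transformed."""

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=1, bias=False),
            nn.BatchNorm2d(half),
            nn.ReLU(inplace=True),
            nn.Conv2d(half, half, kernel_size=3, padding=1, groups=half, bias=False),  # depth-wise
            nn.BatchNorm2d(half),
            nn.Conv2d(half, half, kernel_size=1, bias=False),
            nn.BatchNorm2d(half),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity, transformed = x.chunk(2, dim=1)
        out = torch.cat([identity, self.branch(transformed)], dim=1)
        # Channel shuffle with two groups so the untouched half mixes with the transformed half.
        n, c, h, w = out.size()
        out = out.view(n, 2, c // 2, h, w).transpose(1, 2).contiguous()
        return out.view(n, c, h, w)


if __name__ == "__main__":
    # Stage 1 of the table: three stride-1 blocks at 116 channels and 28x28 resolution.
    stage1_blocks = nn.Sequential(*[EffShuffBlock(116) for _ in range(3)])
    x = torch.randn(1, 116, 28, 28)
    print(stage1_blocks(x).shape)  # torch.Size([1, 116, 28, 28])
```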
Layers | Stage | Output Size | Output Filters | Growth Rate (K) |
---|---|---|---|---|
Input | General Stem | 224 × 224 | 3 | |
Convolution | | 112 × 112 | 64 | |
MaxPool | | 56 × 56 | 64 | |
EffShuff transition | Stage 1 | 28 × 28 | 92 | |
EffShuff-Dense block | | 28 × 28 | 108 | 16 |
EffShuff-Dense block | | 28 × 28 | 124 | 16 |
EffShuff-Dense block | | 28 × 28 | 140 | 16 |
EffShuff transition | Stage 2 | 14 × 14 | 176 | |
EffShuff-Dense block | | 14 × 14 | 192 | 16 |
EffShuff-Dense block | | 14 × 14 | 208 | 16 |
EffShuff-Dense block | | 14 × 14 | 224 | 16 |
EffShuff-Dense block | | 14 × 14 | 240 | 16 |
EffShuff-Dense block | | 14 × 14 | 256 | 16 |
EffShuff-Dense block | | 14 × 14 | 272 | 16 |
EffShuff-Dense block | | 14 × 14 | 288 | 16 |
EffShuff transition | Stage 3 | 7 × 7 | 440 | |
EffShuff-Dense block | | 7 × 7 | 456 | 16 |
EffShuff-Dense block | | 7 × 7 | 472 | 16 |
EffShuff-Dense block | | 7 × 7 | 488 | 16 |
GAP | Classifier | 1 × 1 | 488 | |
Softmax | | 1 × 1 | No. of classes | |
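In the EffShuffDenseNet configuration above, the Output Filters column grows by the growth rate K = 16 after every EffShuff-Dense block (e.g., 92 → 108 → 124 → 140 in Stage 1), which is the hallmark of dense connectivity: each block produces K new feature maps that are concatenated with everything before it. The sketch below illustrates only this channel bookkeeping; the hypothetical `EffShuffDenseBlock` internals are assumptions, not the authors' exact design.

```python
# Hedged sketch (assumed internals): dense-connection bookkeeping with growth
# rate K = 16, matching the channel progression in the EffShuffDenseNet table.
# Each block emits 16 new feature maps and concatenates them with its input.
import torch
import torch.nn as nn


class EffShuffDenseBlock(nn.Module):
    """Produces `growth_rate` new channels and concatenates them with the input (dense connection)."""

    def __init__(self, in_ch: int, growth_rate: int = 16):
        super().__init__()
        self.new_features = nn.Sequential(
            nn.Conv2d(in_ch, growth_rate, kernel_size=1, bias=False),
            nn.BatchNorm2d(growth_rate),
            nn.ReLU(inplace=True),
            nn.Conv2d(growth_rate, growth_rate, kernel_size=3, padding=1,
                      groups=growth_rate, bias=False),  # depth-wise
            nn.BatchNorm2d(growth_rate),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([x, self.new_features(x)], dim=1)  # channels grow by growth_rate


if __name__ == "__main__":
    # Stage 1 of the table: 92 input channels, three dense blocks -> 92 + 3 * 16 = 140 channels.
    stage1 = nn.Sequential(*[EffShuffDenseBlock(92 + 16 * i) for i in range(3)])
    x = torch.randn(1, 92, 28, 28)
    print(stage1(x).shape)  # torch.Size([1, 140, 28, 28])
```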
Model | Top-1 Accuracy (%) | Params | FLOPs |
---|---|---|---|
ResNet50 (PreTrained) | 98.06 | 24.59 M | 124.33 G |
ResNet50 (Scratch) | 96.3 | 24.59 M | 124.33 G |
DenseNet201 (PreTrained) | 98.94 | 19.26 M | 138.39 G |
DenseNet201 (Scratch) | 95.91 | 19.26 M | 138.39 G |
EfficientNetB0 (PreTrained) | 96.63 | 4.67 M | 13.01 G |
EfficientNetB0 (Scratch) | 93.27 | 4.67 M | 13.01 G |
MobileNetV2 (PreTrained) | 98.15 | 2.88 M | 10.00 G |
MobileNetV2 (Scratch) | 95.31 | 2.88 M | 10.00 G |
ShuffleNetV2 (Scratch) | 95.97 | 1.95 M | 5.86 G |
EffShuffNet | 96.37 | 1.13 M | 4.85 G |
EffShuffDenseNet | 97.42 | 1.43 M | 7.44 G |
Model | Top-1 Accuracy (%) | Params | FLOPs |
---|---|---|---|
ResNet50 (PreTrained) | 98.75 | 24.59 M | 124.33 G |
ResNet50 (Scratch) | 97.18 | 24.59 M | 124.33 G |
DenseNet201 (PreTrained) | 98.15 | 19.26 M | 138.39 G |
DenseNet201 (Scratch) | 96.73 | 19.26 M | 138.39 G |
EfficientNetB0 (PreTrained) | 92.94 | 4.67 M | 13.01 G |
EfficientNetB0 (Scratch) | 95.09 | 4.67 M | 13.01 G |
MobileNetV2 (PreTrained) | 98.43 | 2.88 M | 10.00 G |
MobileNetV2 (Scratch) | 96.92 | 2.88 M | 10.00 G |
ShuffleNetV2 (Scratch) | 96.53 | 1.95 M | 5.86 G |
EffShuffNet | 97.58 | 1.13 M | 4.85 G |
EffShuffDenseNet | 97.84 | 1.43 M | 7.44 G |
Model | Top-1 Accuracy (%) | Params | FLOPs |
---|---|---|---|
ResNet50 (PreTrained) | 99.68 | 33.62 M | 124.33 G |
ResNet50 (Scratch) | 94.29 | 33.62 M | 124.33 G |
DenseNet201 (PreTrained) | 99.60 | 27.73 M | 138.39 G |
DenseNet201 (Scratch) | 96.91 | 27.73 M | 138.39 G |
EfficientNetB0 (PreTrained) | 99.20 | 10.32 M | 13.01 G |
EfficientNetB0 (Scratch) | 94.06 | 10.32 M | 13.01 G |
MobileNetV2 (PreTrained) | 99.76 | 8.53 M | 10.00 G |
MobileNetV2 (Scratch) | 95.64 | 8.53 M | 10.00 G |
ShuffleNetV2 (Scratch) | 95.56 | 2.05 M | 5.86 G |
EffShuffNet | 97.70 | 1.21 M | 4.85 G |
EffShuffDenseNet | 98.25 | 1.47 M | 7.44 G |
Model | Top-1 Accuracy (%) | Params | FLOPs |
---|---|---|---|
ResNet50 (PreTrained) | 93.18 | 79.28 M | 248.67 G |
ResNet50 (Scratch) | 90.42 | 79.28 M | 248.67 G |
DenseNet201 (PreTrained) | 95.22 | 70.53 M | 276.79 G |
DenseNet201 (Scratch) | 91.47 | 70.53 M | 276.79 G |
EfficientNetB0 (PreTrained) | 87.41 | 38.85 M | 26.01 G |
EfficientNetB0 (Scratch) | 90.63 | 38.85 M | 26.01 G |
MobileNetV2 (PreTrained) | 94.17 | 37.06 M | 20.00 G |
MobileNetV2 (Scratch) | 90.60 | 37.06 M | 20.00 G |
ShuffleNetV2 (Scratch) | 87.41 | 2.51 M | 11.73 G |
EffShuffNet | 88.26 | 1.64 M | 9.70 G |
EffShuffNet-Plus | 91.68 | 1.92 M | 10.53 G |
EffShuffDenseNet | 91.08 | 1.69 M | 14.89 G |