Learning a Hierarchical Global Attention for Image Classification
Abstract
1. Introduction
- We propose a hierarchical global attention (HGA) mechanism for comprehensively aggregating feature information. HGA hierarchically captures the spatial relations among features through a multi-scale structural design and uses nonlinear exploration to learn an adaptive attention.
- We provide several patterns for applying HGA to existing network backbones, demonstrating its flexibility across different architectures.
- Experimental results show that HGA boosts the image classification capacity of various advanced network architectures while adding only limited computational complexity and parameters.
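The outline above names three components (multi-scale feature exploitation, hierarchical global average pooling, excitation exploration) without giving their equations. The following is only a rough, hypothetical sketch of how an SE-style channel attention with multi-scale pooling might be wired; every shape, scale choice, and aggregation step here is an assumption for illustration, not the authors' definition:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hierarchical_gap(x, scales=(1, 2, 4)):
    """Pool a (C, H, W) feature map at several spatial grid scales and
    combine the per-scale descriptors into one C-dim vector.
    Sketch only: the exact pooling hierarchy and the max-aggregation
    across grid cells are assumptions."""
    c, h, w = x.shape
    descs = []
    for s in scales:
        # average-pool into an s x s grid of cell means
        pooled = x.reshape(c, s, h // s, s, w // s).mean(axis=(2, 4))
        # nonlinear aggregation across cells (assumed): take the max cell mean
        descs.append(pooled.reshape(c, -1).max(axis=1))
    return np.mean(descs, axis=0)  # (C,)

def hga_block(x, w1, w2):
    """SE-style excitation on top of the hierarchical descriptor:
    bottleneck MLP + sigmoid gate, applied channel-wise."""
    z = hierarchical_gap(x)                  # squeeze: (C,)
    a = sigmoid(w2 @ np.maximum(w1 @ z, 0))  # excitation: (C,) in (0, 1)
    return x * a[:, None, None]              # reweight channels

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8))
w1 = rng.standard_normal((16, 64)) * 0.1  # assumed reduction ratio r = 4
w2 = rng.standard_normal((64, 16)) * 0.1
y = hga_block(x, w1, w2)
```

Because the gate lies in (0, 1), the block can only attenuate channels, never amplify them, which mirrors the behavior of SE-style attention.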
2. Related Works
2.1. CNN-Based Image Classification
2.2. Attention Mechanism for Image Classification
3. Hierarchical Global Attention
3.1. Multi-Scale Feature Exploitation
3.2. Hierarchical Global Average Pooling
3.3. Excitation Exploration
4. Implementation and Discussion
5. Experiments
5.1. Results
5.2. Ablation Study
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
2. Florea, G.; Mihailescu, R.C. Deep Learning for Group Activity Recognition in Smart Office Environments. Future Internet 2020, 12, 133.
3. Song, X.; Yang, H.; Zhou, C. Pedestrian Attribute Recognition with Graph Convolutional Network in Surveillance Scenarios. Future Internet 2019, 11, 245.
4. Liu, W.; Qian, J.; Yao, Z.; Pan, J. Convolutional Two-Stream Network Using Multi-Facial Feature Fusion for Driver Fatigue Detection. Future Internet 2019, 11, 115.
5. Song, A.; Wu, Z.; Ding, X.; Hu, Q.; Di, X. Neurologist Standard Classification of Facial Nerve Paralysis with Deep Neural Networks. Future Internet 2018, 10, 111.
6. Roychowdhury, S.; Hage, P.; Vasquez, J. Azure-Based Smart Monitoring System for Anemia-Like Pallor. Appl. Sci. 2020, 10, 1681.
7. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
8. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual Attention Network for Image Classification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 6450–6458.
9. Wang, X.; Girshick, R.B.; Gupta, A.; He, K. Non-Local Neural Networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803.
10. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; Computer Science Department, University of Toronto: Toronto, ON, USA, 2009; Volume 1.
11. Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
12. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–6 December 2012.
13. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
14. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.E.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
15. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
16. Gao, S.; Cheng, M.; Zhao, K.; Zhang, X.; Yang, M.; Torr, P.H.S. Res2Net: A New Multi-scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2019.
17. Xie, S.; Girshick, R.B.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995.
18. Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Zhang, Z.; Lin, H.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R.; et al. ResNeSt: Split-Attention Networks. arXiv 2020, arXiv:2004.08955.
19. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
20. Zagoruyko, S.; Komodakis, N. Wide Residual Networks. In Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, 19–22 September 2016.
21. Han, D.; Kim, J.; Kim, J. Deep Pyramidal Residual Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 6307–6315.
22. Larsson, G.; Maire, M.; Shakhnarovich, G. FractalNet: Ultra-Deep Neural Networks without Residuals. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017.
23. Zhang, T.; Qi, G.; Xiao, B.; Wang, J. Interleaved Group Convolutions for Deep Neural Networks. arXiv 2017, arXiv:1707.02725.
24. Xie, G.; Wang, J.; Zhang, T.; Lai, J.; Hong, R.; Qi, G. IGCV2: Interleaved Structured Sparse Convolutional Neural Networks. arXiv 2018, arXiv:1804.06202.
25. Sun, K.; Li, M.; Liu, D.; Wang, J. IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks. In Proceedings of the British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, 3–6 September 2018.
26. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
27. Zhang, X.; Li, Z.; Loy, C.C.; Lin, D. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 3900–3908.
28. Xie, L.; Yuille, A.L. Genetic CNN. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017; pp. 1388–1397.
29. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8697–8710.
30. Liu, C.; Zoph, B.; Neumann, M.; Shlens, J.; Hua, W.; Li, F.; Yuille, A.L.; Huang, J.; Murphy, K. Progressive Neural Architecture Search. In Proceedings of the Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018.
31. Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 2820–2828.
32. Zhong, Z.; Yan, J.; Wu, W.; Shao, J.; Liu, C. Practical Block-Wise Neural Network Architecture Generation. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2423–2432.
33. Baker, B.; Gupta, O.; Naik, N.; Raskar, R. Designing Neural Network Architectures using Reinforcement Learning. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017.
34. Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable Architecture Search. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019.
35. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 510–519.
Output Size | ResNet-50 [15] | HGA-ResNet-50 | HGA-ResNeXt-50
---|---|---|---
| GAP, 1000-d fc, softmax | GAP, 1000-d fc, softmax | GAP, 1000-d fc, softmax
Model | Metric | ResNet-50 | ResNet-101 | ResNet-152 | ResNeXt-50 | ResNeXt-101
---|---|---|---|---|---|---
Vanilla | Top-1 err. (%) | 24.80 | 23.17 | 22.42 | 22.11 | 21.18
Vanilla | Top-5 err. (%) | 7.48 | 6.52 | 6.34 | 5.90 | 5.57
Vanilla | FLOPs (G) | 4.11 | 7.83 | 11.55 | 4.25 | 8.01
Vanilla | Params (M) | 25.55 | 44.54 | 60.19 | 25.02 | 44.17
HGA | Top-1 err. (%) | 22.48 (−2.32) | 21.50 (−1.67) | 20.63 (−1.79) | 20.22 (−1.89) | 19.82 (−1.36)
HGA | Top-5 err. (%) | 6.22 (−1.26) | 5.60 (−0.92) | 5.23 (−1.11) | 4.88 (−1.02) | 4.59 (−0.98)
HGA | FLOPs (G) | 4.38 | 8.23 | 12.13 | 4.53 | 8.71
HGA | Params (M) | 26.73 | 47.07 | 63.92 | 26.21 | 46.70
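The FLOPs and parameter counts reported above let us quantify how cheap the attention module is relative to each backbone; on ResNet-50, for example, HGA adds about 6.6% extra FLOPs and 4.6% extra parameters. A quick script (values transcribed from the table):

```python
# Relative overhead of HGA over each vanilla backbone,
# computed from the (FLOPs in G, params in M) pairs reported above.
vanilla = {"ResNet-50": (4.11, 25.55), "ResNet-101": (7.83, 44.54),
           "ResNet-152": (11.55, 60.19), "ResNeXt-50": (4.25, 25.02),
           "ResNeXt-101": (8.01, 44.17)}
hga = {"ResNet-50": (4.38, 26.73), "ResNet-101": (8.23, 47.07),
       "ResNet-152": (12.13, 63.92), "ResNeXt-50": (4.53, 26.21),
       "ResNeXt-101": (8.71, 46.70)}

overhead = {}
for name, (f0, p0) in vanilla.items():
    f1, p1 = hga[name]
    overhead[name] = (100 * (f1 - f0) / f0,   # % extra FLOPs
                      100 * (p1 - p0) / p0)   # % extra params

f_over, p_over = overhead["ResNet-50"]
print(round(f_over, 1), round(p_over, 1))  # 6.6 4.6
```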
Model | Metric | ResNet-50 | ResNet-101 | ResNet-152 | ResNeXt-50 | ResNeXt-101
---|---|---|---|---|---|---
Vanilla | Top-1 err. (%) | 24.80 | 23.17 | 22.42 | 22.11 | 21.18
SENet [7] | Top-1 err. (%) | 23.29 (−1.51) | 22.38 (−0.79) | 21.57 (−0.85) | 21.10 (−1.01) | 20.70 (−0.48)
SENet [7] | FLOPs (G) | 4.11 | 7.83 | 11.56 | 4.26 | 8.01
SKNet [35] | Top-1 err. (%) | - | - | - | 20.79 (−1.23) | 20.19 (−0.84)
SKNet [35] | FLOPs (G) | - | - | - | 4.47 | 8.46
HGA | Top-1 err. (%) | 22.48 (−2.32) | 21.50 (−1.67) | 20.63 (−1.79) | 20.22 (−1.89) | 19.82 (−1.36)
HGA | FLOPs (G) | 4.38 | 8.23 | 12.13 | 4.53 | 8.71
Top-1 err. (%) | R-110 [15] | R-164 [15] | SE-R-110 [7] | SE-R-164 [7] | HGA-R-110 | HGA-R-164
---|---|---|---|---|---|---
CIFAR-10 | 6.37 | 5.46 | 5.21 | 4.39 | 4.52 | 3.98 |
CIFAR-100 | 26.88 | 24.33 | 23.85 | 21.31 | 22.02 | 20.82 |
MFE | HGA | EE | ResNet-101 Top-1 err. (%) | ResNet-101 Top-5 err. (%) | ResNeXt-101 Top-1 err. (%) | ResNeXt-101 Top-5 err. (%)
---|---|---|---|---|---|---
✓ | ✓ | ✓ | 21.50 | 5.60 | 19.82 | 4.59 |
✗ | ✓ | ✓ | 21.91 | 5.73 | 20.22 | 4.81 |
✓ | ✗ | ✓ | 21.58 | 5.63 | 19.92 | 4.62 |
✓ | ✓ | ✗ | 21.63 | 5.69 | 20.01 | 4.69 |
✓ | ✗ | ✗ | 21.85 | 5.69 | 20.03 | 4.63 |
✗ | ✓ | ✗ | 22.40 | 6.20 | 20.48 | 5.09 |
✗ | ✗ | ✓ | 22.00 | 5.75 | 20.34 | 4.88 |
✗ | ✗ | ✗ | 23.17 | 6.52 | 21.18 | 5.57 |
| | | | | ResNet-101 Top-1 err. (%) | ResNet-101 Top-5 err. (%) | ResNeXt-101 Top-1 err. (%) | ResNeXt-101 Top-5 err. (%) |
|---|---|---|---|---|---|---|---|
| ✓ | | | | 21.88 | 5.72 | 20.19 | 4.79 |
| | ✓ | | | 21.76 | 5.69 | 20.11 | 4.71 |
| | | ✓ | | 21.70 | 5.67 | 20.08 | 4.69 |
| | | | ✓ | 21.67 | 5.66 | 20.01 | 4.66 |
| ✓ | ✓ | ✓ | ✓ | 21.50 | 5.60 | 19.82 | 4.59 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cao, K.; Gao, J.; Choi, K.-n.; Duan, L. Learning a Hierarchical Global Attention for Image Classification. Future Internet 2020, 12, 178. https://doi.org/10.3390/fi12110178