HCAM-CL: A Novel Method Integrating a Hierarchical Cross-Attention Mechanism with CNN-LSTM for Hierarchical Image Classification
Abstract
1. Introduction
- We introduce HCAM-CL, an end-to-end model for hierarchical image classification (HIC) that treats HIC as a sequence generation task, which lets it handle label hierarchies of variable length.
- An enhanced hierarchical cross-attention mechanism helps the model distinguish the dependencies between labels at different levels of the hierarchy.
- An empirical evaluation on hierarchical benchmarks, including CIFAR-10, CIFAR-100, and a design patent image dataset, shows that our model outperforms the baseline models on these HIC tasks.
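
The sequence-generation framing in the first contribution can be illustrated with a minimal sketch. It assumes that each image's labels form a coarse-to-fine path and that a dedicated end token closes the sequence, so paths of different depth share one output format. The label names, the `END` token, and the helper functions are hypothetical and are not taken from the paper.

```python
# Hypothetical end-of-sequence marker; the paper's actual token scheme is not shown here.
END = "<end>"

def build_vocab(paths):
    """Map END and every label that occurs in any path to an integer id."""
    labels = sorted({label for path in paths for label in path})
    return {label: idx for idx, label in enumerate([END] + labels)}

def encode(path, vocab):
    """Turn a coarse-to-fine label path into a token sequence closed by END."""
    return [vocab[label] for label in path] + [vocab[END]]

def decode(tokens, vocab):
    """Recover the label path, stopping at the first END token."""
    inverse = {idx: label for label, idx in vocab.items()}
    path = []
    for token in tokens:
        if inverse[token] == END:
            break
        path.append(inverse[token])
    return path

# Illustrative label paths of different depth (assumed, not from the datasets).
paths = [
    ["animal", "mammal", "dog"],  # three-level hierarchy
    ["vehicle", "car"],           # two-level hierarchy
]
vocab = build_vocab(paths)
encoded = [encode(path, vocab) for path in paths]
```

A decoder trained on such targets emits one label per step and stops at the end token, which is how a single model can cover hierarchies of different depth.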
2. Related Work
3. Methodology
3.1. Problem Statement
3.2. The Framework of Our Model
3.2.1. Image Feature Extraction
3.2.2. Hierarchical Cross-Attention Mechanism
3.2.3. Hierarchical Classification
4. Experimental Results
4.1. Experimental Dataset and Settings
4.2. Evaluation Metric and Baseline Models
4.3. Results and Analysis
4.4. Ablations
- Removing the attention mechanism reduced the model's accuracy on the design patent image dataset from 82.98% to 70.05%. This drop underlines the central role of the cross-attention mechanism in capturing the correlations between images and their associated labels.
- Reducing the number of cross-attention levels reduced the model's accuracy on the design patent image dataset from 82.98% to 79.54%. This decrease suggests that a multi-level cross-attention framework improves the model's performance on sequence generation tasks.
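
For readers unfamiliar with the component being ablated, the following is a minimal sketch of generic scaled dot-product cross-attention, in which one query vector (for example, a decoder state) attends over a set of image-region features. It is not the authors' hierarchical variant: how attention is arranged across hierarchy levels is not reproduced here, and the toy vectors and function names are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query, keys, values):
    """Scaled dot-product attention of one query over key/value vectors."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the value vectors.
    context = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return context, weights

# Toy inputs (assumed): one decoder state and three image-region features.
decoder_state = [1.0, 0.0]
region_features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
context, weights = cross_attention(decoder_state, region_features, region_features)
```

The region most aligned with the query receives the largest weight, so the context vector passed to the next decoding step is dominated by the image regions relevant to the label being generated.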
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- [1] Zhao, H.; Hu, Q.; Zhu, P.; Wang, Y.; Wang, P. A Recursive Regularization Based Feature Selection Framework for Hierarchical Classification. IEEE Trans. Knowl. Data Eng. 2019, 33, 2833–2846.
- [2] Lima, H.C.S.C.; Otero, F.E.B.; Merschmann, L.H.C.; Souza, M.J.F. A Novel Hybrid Feature Selection Algorithm for Hierarchical Classification. IEEE Access 2021, 9, 127278–127292.
- [3] Fu, R.; Li, B.; Gao, Y.; Wang, P. CNN with coarse-to-fine layer for hierarchical classification. IET Comput. Vis. 2018, 12, 892–899.
- [4] Kowsari, K.; Sali, R.; Ehsan, L.; Adorno, W.; Ali, A.; Moore, S.; Amadi, B.; Kelly, P.; Syed, S.; Brown, D. HMIC: Hierarchical Medical Image Classification, A Deep Learning Approach. Information 2020, 11, 318.
- [5] He, T.; Zhang, Z.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 558–567.
- [6] Gao, D.; Yang, W.; Zhou, H.; Wei, Y.; Hu, Y.; Wang, H. Deep Hierarchical Classification for Category Prediction in E-commerce System. arXiv 2020, arXiv:2005.06692.
- [7] LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- [8] Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90.
- [9] Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
- [10] Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- [11] Minaee, S.; Boykov, Y.Y.; Porikli, F.; Plaza, A.J.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542.
- [12] Yan, Z.; Zhang, H.; Piramuthu, R.; Jagadeesh, V.; DeCoste, D.; Di, W.; Yu, Y. HD-CNN: Hierarchical deep convolutional neural networks for large scale visual recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 2740–2748.
- [13] Zhu, X.; Bain, M. B-CNN: Branch convolutional neural network for hierarchical classification. arXiv 2017, arXiv:1709.09890.
- [14] Lin, T.Y.; RoyChowdhury, A.; Maji, S. Bilinear CNN models for fine-grained visual recognition. arXiv 2015, arXiv:1504.07889.
- [15] Guo, Y.; Liu, Y.; Bakker, E.M.; Guo, Y.; Lew, M.S. CNN-RNN: A large-scale hierarchical image classification framework. Multimedia Tools Appl. 2017, 77, 10251–10271.
- [16] Koo, J.; Klabjan, D.; Utke, J. Combined convolutional and recurrent neural networks for hierarchical classification of images. arXiv 2018, arXiv:1809.09574.
- [17] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- [18] Li, R.; Lin, C.; Collinson, M.; Li, X.; Chen, G. A hierarchical-attention hierarchical recurrent neural network for dialogue act classification. arXiv 2018, arXiv:1810.09154.
- [19] Chen, T.; Wu, W.; Gao, Y.; Dong, L.; Luo, X.; Lin, L. Fine-grained representation learning and recognition by exploiting hierarchical semantic embedding. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 2023–2031.
- [20] Chen, Q.; Liu, Q.; Lin, E. A knowledge-guide hierarchical learning method for long-tailed image classification. Neurocomputing 2021, 459, 408–418.
- [21] Pizarro, I.; Nanculef, R.; Valle, C. An Attention-Based Architecture for Hierarchical Classification with CNNs. IEEE Access 2023, 11, 32972–32995.
- [22] Seo, Y.; Shin, K.-S. Hierarchical convolutional neural networks for fashion image classification. Expert Syst. Appl. 2018, 116, 328–339.
- [23] Zhang, X.; Tang, L.; Luo, H.; Zhong, S.; Guan, Z.; Chen, L.; Zhao, C.; Peng, J.; Fan, J. Hierarchical bilinear convolutional neural network for image classification. IET Comput. Vis. 2021, 15, 197–207.
- [24] Taoufiq, S.; Nagy, B.; Benedek, C. HierarchyNet: Hierarchical CNN-based urban building classification. Remote Sens. 2020, 12, 3794.
- [25] Noor, K.T.; Robles-Kelly, A.; Kusy, B. A capsule network for hierarchical multi-label image classification. In Structural, Syntactic, and Statistical Pattern Recognition; Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR); Springer International Publishing: Cham, Switzerland, 2022.
- [26] He, G.; Huo, Y.; He, M.; Zhang, H.; Fan, J. A novel orthogonality loss for deep hierarchical multi-task learning. IEEE Access 2020, 8, 67735–67744.
- [27] He, G.; Ji, J.; Zhang, H.; Xu, Y.; Fan, J. Feature Selection-Based Hierarchical Deep Network for Image Classification. IEEE Access 2020, 8, 15436–15447.
- [28] He, G.; Li, F.; Wang, Q.; Bai, Z.; Xu, Y. A hierarchical sampling based triplet network for fine-grained image classification. Pattern Recognit. 2021, 115, 107889.
- [29] Kuang, Z.; Li, Z.; Zhao, T.; Fan, J. Deep multi-task learning for large-scale image classification. In Proceedings of the IEEE Third International Conference on Multimedia Big Data, Laguna Hills, CA, USA, 19–21 April 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 310–317.
- [30] Kuang, Z.; Yu, J.; Yu, Z.; Fan, J. Ontology-driven hierarchical deep learning for fashion recognition. In Proceedings of the IEEE Conference on Multimedia Information Processing and Retrieval, Miami, FL, USA, 10–12 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 19–24.
- [31] Kuang, Z.; Zhang, X.; Yu, J.; Li, Z.; Fan, J. Deep embedding of concept ontology for hierarchical fashion recognition. Neurocomputing 2020, 425, 191–206.
- [32] Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
Model | Coarse1 Accuracy | Coarse2 Accuracy | Fine Accuracy | F1-Score |
---|---|---|---|---|
CNN-RNN [15] | 0.9577 | 0.8286 | 0.7801 | 0.8587 |
CNN-LSTM [16] | 0.9674 | 0.85 | 0.7973 | 0.8746 |
B-CNN [13] | 0.9592 | 0.9049 | 0.8862 | 0.9284 |
H-CNN [22] | 0.9604 | 0.9019 | 0.8809 | 0.9144 |
ML-CapsNet [25] | 0.9752 | 0.8927 | 0.8572 | 0.9057 |
BA-CNN [21] | 0.9836 | 0.9230 | 0.8880 | 0.9315 |
Ours (HCAM-CL) | 0.9882 | 0.9291 | 0.8936 | 0.9345 |
Model | Coarse1 Accuracy | Coarse2 Accuracy | Fine Accuracy | F1-Score |
---|---|---|---|---|
CNN-RNN [15] | 0.7685 | 0.5946 | 0.5446 | 0.6546 |
CNN-LSTM [16] | 0.852 | 0.7446 | 0.7005 | 0.7767 |
B-CNN [13] | 0.9388 | 0.7849 | 0.7392 | 0.8194 |
H-CNN [22] | 0.9476 | 0.8019 | 0.7796 | 0.8268 |
ML-CapsNet [25] | 0.9587 | 0.8654 | 0.8059 | 0.8856 |
BA-CNN [21] | 0.9606 | 0.8790 | 0.8184 | 0.9015 |
Ours (HCAM-CL) | 0.9674 | 0.8888 | 0.8298 | 0.9078 |
Model | Fine Accuracy |
---|---|
BA-CNN [21] | 0.6147 |
B-CNN [13] | 0.6442 |
ML-CapsNet [25] | 0.6462 |
H-CNN [22] | 0.6923 |
CNN-RNN [15] | 0.7226 |
DHC [6] | 0.7591 |
HCAM-CL | 0.7794 |
Model | Coarse1 Accuracy | Coarse2 Accuracy | Fine Accuracy | F1-Score |
---|---|---|---|---|
Without cross attention | 0.852 | 0.7446 | 0.7005 | 0.7767 |
With cross attention | 0.9518 | 0.8590 | 0.7954 | 0.8758 |
With hierarchical cross attention | 0.9624 | 0.8888 | 0.8298 | 0.9008 |
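
As a quick check on the ablation table above, the fine-level accuracy lost by each reduced variant relative to the full model can be computed directly from the reported values. The snippet below is only a worked arithmetic example; the dictionary keys are shorthand for the table rows.

```python
# Fine-level accuracy of each ablation variant, copied from the table above.
fine_accuracy = {
    "without cross attention": 0.7005,
    "with cross attention": 0.7954,
    "with hierarchical cross attention": 0.8298,
}

full_model = fine_accuracy["with hierarchical cross attention"]

# Drop relative to the full model, expressed in percentage points.
drops = {
    name: round((full_model - accuracy) * 100, 2)
    for name, accuracy in fine_accuracy.items()
    if name != "with hierarchical cross attention"
}

for name, drop in drops.items():
    print(f"{name}: {drop:.2f} percentage points below the full model")
```

Expressing the gaps in percentage points keeps them directly comparable with the accuracies quoted in the ablation discussion.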
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Su, J.; Liang, J.; Zhu, J.; Li, Y. HCAM-CL: A Novel Method Integrating a Hierarchical Cross-Attention Mechanism with CNN-LSTM for Hierarchical Image Classification. Symmetry 2024, 16, 1231. https://doi.org/10.3390/sym16091231