Multi-Modal Vision Transformer with Explainable Shapley Additive Explanations Value Embedding for Cymbidium goeringii Quality Grading
Abstract
1. Introduction
2. Related Works
2.1. Orchid Variety Classification
2.2. Flower Quality Grading
2.3. Agricultural Product Quality Grading
3. Materials and Methods
3.1. Dataset Acquisition and Definition
3.2. Transformer Encoder
3.3. UNet Architecture
3.4. Model Architecture Design
3.5. SHAP Algorithm for Feature Importance Calculation
4. Results and Discussion
4.1. The Combination of Image and Text Achieved Basic Quality Grading
4.2. ViT and Global Fine-Grained Features Achieved Explainable Feature Representation
4.3. Concatenation Replaced the Dot Product, Adequately Reflecting the Relationship Between Key Features and Values
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Image | Petal Type | Longest Leaf | Widest Leaf | Leaves Height | Leaves Number | Bud Number | Seedlings Number | Label (Grade) |
|---|---|---|---|---|---|---|---|---|
| (photo) | Lotus | 20 | 0.6 | - | - | - | - | 1 |
| (photo) | Lotus | 12 | 0.6 | - | 5 | 2 | 2 | 2 |
| (photo) | Lotus | 23 | 1.1 | - | 27 | - | 5 | 3 |
| (photo) | Plum | 39 | 1.2 | - | - | - | 3 | 4 |
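
To make the schema of the table above concrete, the following is a minimal Python sketch of how one multi-modal record could be represented. The class and field names are illustrative assumptions, not the authors' actual data format; the table's "-" placeholder for a missing trait is mapped to `None`.

```python
# A hypothetical record type implied by the dataset-definition table:
# a photograph paired with numerical traits and a grade label.
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrchidSample:
    image_path: str                # photograph of the potted plant
    petal_type: str                # e.g., "Lotus" or "Plum"
    longest_leaf: Optional[float]  # "-" in the table becomes None
    widest_leaf: Optional[float]
    leaves_height: Optional[float]
    leaves_number: Optional[int]
    bud_number: Optional[int]
    seedlings_number: Optional[int]
    grade: int                     # label, 1 (lowest price band) to 7 (highest)


def parse_trait(cell: str) -> Optional[float]:
    """Map the table's '-' placeholder for a missing trait to None."""
    cell = cell.strip()
    return None if cell == "-" else float(cell)
```
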
| Label (Grade) | Reference Price Level (RMB) | Number of Samples | Size of Training Set | Size of Validation Set | Size of Test Set |
|---|---|---|---|---|---|
| 1 | 0–100 | 722 | 587 | 73 | 62 |
| 2 | 100–300 | 696 | 556 | 70 | 70 |
| 3 | 300–600 | 673 | 538 | 66 | 69 |
| 4 | 600–1500 | 668 | 534 | 65 | 69 |
| 5 | 1500–3000 | 624 | 499 | 63 | 62 |
| 6 | 3000–5000 | 593 | 474 | 60 | 59 |
| 7 | >5000 | 580 | 464 | 58 | 58 |
| Total | - | 4556 | 3652 | 455 | 449 |
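
The split sizes above are internally consistent: within every grade, the train, validation, and test counts sum exactly to the number of samples, giving roughly an 80/10/10 stratified split per grade. A short self-contained check, with the numbers copied from the table:

```python
# Consistency check for the split table: per grade, train + val + test
# equals the sample count, and the totals match the "Total" row.
splits = {
    # grade: (samples, train, validation, test)
    1: (722, 587, 73, 62),
    2: (696, 556, 70, 70),
    3: (673, 538, 66, 69),
    4: (668, 534, 65, 69),
    5: (624, 499, 63, 62),
    6: (593, 474, 60, 59),
    7: (580, 464, 58, 58),
}
for grade, (n, train, val, test) in splits.items():
    assert train + val + test == n
    print(f"grade {grade}: {train / n:.1%} / {val / n:.1%} / {test / n:.1%}")
assert sum(n for n, *_ in splits.values()) == 4556  # matches the Total row
```
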
| Encoder Input | Validation Accuracy (HE-CNN) | Validation Accuracy (CNN) | Validation Accuracy (ViT) | Test Accuracy (HE-CNN) | Test Accuracy (CNN) | Test Accuracy (ViT) |
|---|---|---|---|---|---|---|
| Numerical features ⊗ RGB ⊕ global feature from HE-CNN, CNN, or ViT | 81.22% | 74.46% | 94.19% | 76.58% | 70.63% | 93.13% |
| Training ID | Encoder Input | Validation Accuracy (VQ-VAE Encoder) | Validation Accuracy (C-VAE Encoder) | Validation Accuracy (Transformer Encoder) | Test Accuracy (VQ-VAE Encoder) | Test Accuracy (C-VAE Encoder) | Test Accuracy (Transformer Encoder) |
|---|---|---|---|---|---|---|---|
| #1 | Numerical features | 57.21% | 51.36% | 63.02% | 52.56% | 48.71% | 59.36% |
| #2 | Numerical features ⊗ RGB | 75.67% | 71.32% | 84.92% | 72.81% | 69.10% | 81.69% |
| #3 | Numerical features ⊗ RGB ⊕ global feature from ViT | 85.36% | 81.29% | 94.19% | 82.48% | 77.88% | 93.13% |
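
Row #3's input, "Numerical features ⊗ RGB ⊕ global feature from ViT", implies a two-stage fusion. Below is a minimal PyTorch sketch that reads ⊗ as an element-wise product after projecting both modalities to a common width, and ⊕ as concatenation with the ViT global feature, followed by a transformer encoder and a 7-way grading head. The layer sizes, the projection layers, and this interpretation of ⊗ are assumptions for illustration, not the authors' implementation.

```python
# A sketch of the fusion pipeline implied by ablation row #3.
# Assumed: 7 numerical traits, 768-d RGB and ViT embeddings, 7 grades.
import torch
import torch.nn as nn


class FusionGrader(nn.Module):
    def __init__(self, num_feat_dim=7, rgb_dim=768, vit_dim=768,
                 d_model=256, num_grades=7):
        super().__init__()
        self.num_proj = nn.Linear(num_feat_dim, d_model)  # numerical traits
        self.rgb_proj = nn.Linear(rgb_dim, d_model)       # RGB image embedding
        self.vit_proj = nn.Linear(vit_dim, d_model)       # ViT global feature
        layer = nn.TransformerEncoderLayer(d_model=2 * d_model, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(2 * d_model, num_grades)

    def forward(self, num_feats, rgb_embed, vit_global):
        fused = self.num_proj(num_feats) * self.rgb_proj(rgb_embed)    # "⊗"
        fused = torch.cat([fused, self.vit_proj(vit_global)], dim=-1)  # "⊕"
        tokens = fused.unsqueeze(1)        # treat fusion as one token
        return self.head(self.encoder(tokens).squeeze(1))


# Usage sketch with random inputs (batch of 4):
model = FusionGrader()
logits = model(torch.randn(4, 7), torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 7])
```

Under this reading, the jump from row #1 to row #2 corresponds to adding the ⊗ image term, and from row #2 to row #3 to concatenating the ViT global feature, which matches the stepwise accuracy gains in the table.
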
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite

MDPI and ACS Style
Wang, Z.; He, X.; Wang, Y.; Li, X. Multi-Modal Vision Transformer with Explainable Shapley Additive Explanations Value Embedding for Cymbidium goeringii Quality Grading. Appl. Sci. 2024, 14, 10157. https://doi.org/10.3390/app142210157

AMA Style
Wang Z, He X, Wang Y, Li X. Multi-Modal Vision Transformer with Explainable Shapley Additive Explanations Value Embedding for Cymbidium goeringii Quality Grading. Applied Sciences. 2024; 14(22):10157. https://doi.org/10.3390/app142210157

Chicago/Turabian Style
Wang, Zhen, Xiangnan He, Yuting Wang, and Xian Li. 2024. "Multi-Modal Vision Transformer with Explainable Shapley Additive Explanations Value Embedding for Cymbidium goeringii Quality Grading" Applied Sciences 14, no. 22: 10157. https://doi.org/10.3390/app142210157

APA Style
Wang, Z., He, X., Wang, Y., & Li, X. (2024). Multi-Modal Vision Transformer with Explainable Shapley Additive Explanations Value Embedding for Cymbidium goeringii Quality Grading. Applied Sciences, 14(22), 10157. https://doi.org/10.3390/app142210157