A Deep Learning Model for Accurate Maize Disease Detection Based on State-Space Attention and Feature Fusion
Abstract
1. Introduction
- State-space attention mechanism: In traditional deep learning models, although CNNs can effectively recognize image features, they often overlook the temporal information and spatial relationships in images, especially the dynamic changes that occur as a disease evolves. For this reason, we designed an innovative state-space attention mechanism. This mechanism not only focuses on the local features of maize leaves but also captures the temporal evolution and spatial distribution of leaf diseases through the state-space model. Through this method, the model can finely perceive the subtle changes from the initial stage to the development stage of the disease, significantly improving recognition in the early stages of the disease, which is crucial for early intervention and management.
- State-space function: The state-space function is introduced to more efficiently adjust and optimize the weight distribution in the attention mechanism. Traditional attention mechanisms are often fixed or change little, making it difficult to adapt to complex and variable practical application scenarios. Our state-space function dynamically adjusts weights according to the real-time health condition of the maize leaves, allowing the model to adaptively adjust its focus according to the specific development stage and type of disease. This adaptive adjustment can greatly enhance the model’s accuracy in recognizing different disease states in practical applications, thereby more precisely determining the type and development trend of the disease.
- State loss function: To optimize the performance of the state-space attention mechanism, we developed a new loss function, the state loss function. This loss function is designed specifically for the characteristics of disease detection and can effectively reduce the occurrence of misdiagnosis and missed diagnosis during the detection process. By weighting the prediction errors in different states, it more accurately guides model learning, improving the model's generalization ability and robustness in complex agricultural environments and thereby its overall detection accuracy. (A loose illustrative sketch of this weighting idea follows this list.)
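As a loose illustration of the weighting idea in the last point, the sketch below applies state-dependent weights to a per-sample cross-entropy loss. The function name, the stage labels, and the weight values are hypothetical; the actual state loss function is defined in Section 3.2.5 and is not reproduced here.

```python
import torch
import torch.nn.functional as F


def state_weighted_loss(logits, targets, stage_ids, stage_weights):
    """Weight the prediction error of each sample by its disease state.

    Illustrative only: `stage_ids` (e.g. 0 = early, 1 = developed) and
    `stage_weights` are assumed inputs, not part of the paper's formulation.
    """
    per_sample = F.cross_entropy(logits, targets, reduction="none")  # error per sample
    weights = stage_weights[stage_ids]                               # weight by disease state
    return (weights * per_sample).mean()


# Example: early-stage samples weighted more heavily than developed ones.
logits = torch.randn(4, 6)                 # 4 samples, 6 disease classes
targets = torch.tensor([0, 2, 5, 1])
stage_ids = torch.tensor([0, 1, 0, 1])     # 0 = early, 1 = developed
stage_weights = torch.tensor([2.0, 1.0])
print(state_weighted_loss(logits, targets, stage_ids, stage_weights))
```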
2. Related Work
2.1. Transformer
2.2. MAMBA
2.3. Attention Mechanism
3. Materials and Method
3.1. Materials
3.1.1. Dataset Collection
3.1.2. Data Augmentation
3.1.3. Dataset Construction
3.2. Proposed Method
3.2.1. Overall
3.2.2. State-Space Attention Mechanism
1. Input features ($x_t$): The input is the feature representation at each time step $t$, with dimensions of $H \times W \times C$, where $H$ and $W$ represent the height and width of the feature map and $C$ is the number of channels.
2. State representation ($h_t$): The state-space model introduces the state representation ($h_t$) on top of the input features to capture spatiotemporal information. The state representation is updated at each time step according to the input features.
3. State update: The state update process involves a series of matrix operations, including multiplication and summation. First, the input features ($x_t$) are transformed by the state matrix ($B_t$) to generate state information. The state matrix is defined as $B_t = F(h_{t-1}, x_{t-1})$, where $F$ is a function that updates the state matrix based on the previous state ($h_{t-1}$) and the input at the previous time step ($x_{t-1}$). This function facilitates learning of the dynamics between the evolving state and the new input features. The state information is then combined, through a weighted sum with the previous state scaled by the weight matrix $A$, to obtain the new state representation: $h_t = A h_{t-1} + B_t x_t$.
4. Output generation: After the state update is completed, the model generates the output ($y_t$) by further processing the state information: $y_t = C h_t + D x_t$.
- Modeling capability in the temporal dimension: During the state update, the model uses the multiplication and summation operations of the matrices $A$ and $B_t$ to model the associations between the input features at time $t$ and the states from previous time steps. Thus, this mechanism can capture changes in the input features over time.
- Feature fusion and context awareness: By performing a weighted summation of the state representation and the input features through the matrices $C$ and $D$, the state-space attention mechanism achieves a fusion of local features and global context, further enhancing the model's representation capability. A minimal code sketch of this update-and-output loop is given after this list.
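To make the recurrence concrete, the following PyTorch sketch implements the state update $h_t = A h_{t-1} + B_t x_t$ and the output $y_t = C h_t + D x_t$ over a flattened feature map. It is an illustrative sketch only, not the authors' released implementation: the class name, the linear hyper-network used for $F(h_{t-1}, x_{t-1})$, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn


class StateSpaceAttentionSketch(nn.Module):
    """Sketch only: the H x W x C feature map is flattened to a sequence
    x_1..x_T (T = H * W); F(h_{t-1}, x_{t-1}) is realised as a small linear
    hyper-network whose output is reshaped into the state matrix B_t; and
    A, C, D are learned matrices shared across all time steps."""

    def __init__(self, dim: int, state_dim: int):
        super().__init__()
        self.dim, self.state_dim = dim, state_dim
        self.A = nn.Parameter(0.9 * torch.eye(state_dim))          # state-transition matrix A
        self.C = nn.Parameter(0.02 * torch.randn(dim, state_dim))  # state -> output matrix C
        self.D = nn.Parameter(0.02 * torch.randn(dim, dim))        # input skip-path matrix D
        # F(h_{t-1}, x_{t-1}) -> B_t, produced as state_dim * dim values and reshaped
        self.F = nn.Linear(state_dim + dim, state_dim * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, dim); returns y of the same shape
        batch, T, _ = x.shape
        h = x.new_zeros(batch, self.state_dim)   # initial state h_0
        x_prev = x.new_zeros(batch, self.dim)    # placeholder for the input before x_1
        outputs = []
        for t in range(T):
            x_t = x[:, t]
            # B_t = F(h_{t-1}, x_{t-1}), reshaped into a (state_dim x dim) matrix per sample
            B_t = self.F(torch.cat([h, x_prev], dim=-1)).view(batch, self.state_dim, self.dim)
            # state update: h_t = A h_{t-1} + B_t x_t
            h = h @ self.A.T + torch.bmm(B_t, x_t.unsqueeze(-1)).squeeze(-1)
            # output generation: y_t = C h_t + D x_t
            outputs.append(h @ self.C.T + x_t @ self.D.T)
            x_prev = x_t
        return torch.stack(outputs, dim=1)


# Example: a 14 x 14 feature map with 64 channels, flattened to T = 196 steps.
y = StateSpaceAttentionSketch(dim=64, state_dim=16)(torch.randn(2, 196, 64))
print(y.shape)  # torch.Size([2, 196, 64])
```

Processing the flattened map sequentially lets the state carry information from earlier positions (and, across frames, earlier time steps) into each output, which is the temporal modeling property described above.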
3.2.3. State-Space Function
3.2.4. Multi-Scale Fusion Module
- Input feature extraction and linear projection: After being processed by the visual encoder, the input tokens generate a series of feature mappings with dimensions of $T \times D$, where $T$ represents the time steps of the input features and $D$ is the feature dimension. After the first stage of linear projection, the feature dimension is further increased.
- Multi-stage feature extraction: The multi-scale fusion module in the model contains multiple stages of feature extraction. At each stage $i$, the input feature mappings are processed by a state update module, generating new features $F_i$ with dimensions of $T \times \alpha_i D$, where $\alpha_i$ is the amplification factor for feature extraction at that stage. The feature dimensions increase progressively across the stages to extract higher-level features.
- Fusion process and feature concatenation: In the multi-scale fusion module, the features extracted from each stage are fused through a concatenation operation. Assuming that the model has $N$ stages, the fused features can be expressed as $F_{\mathrm{fused}} = \mathrm{Concat}(F_1, F_2, \ldots, F_N)$. By concatenating features at different scales, the model obtains a feature map ($F_{\mathrm{fused}}$) that integrates multi-scale features with dimensions of $T \times D_{\mathrm{fused}}$, where $D_{\mathrm{fused}}$ is the dimension of the fused features. The fused features contain rich information from local to global and from low levels to high levels.
- Feature transformation and linear mapping: The fused feature map ($F_{\mathrm{fused}}$) undergoes a series of linear mapping operations to further adjust the feature dimensions to meet the requirements of downstream tasks. Rather than being used directly in its raw concatenated form, the fused feature is first projected to an effective working dimension; during the linear mapping process the feature dimension is then expanded and, after processing through a nonlinear activation function, reduced back to the original feature dimension $D$. Through this linear mapping operation, the feature dimensions are unified, and the features are sufficiently fused both in terms of space and scale.
- Feature classification and output: After multi-scale feature fusion is completed, the features processed by linear mapping are sent to the classification module (TS mixer classification). The classification module performs the final classification operation on the fused features to obtain the disease detection results. A simplified sketch of this fusion-and-classification pipeline is given after this list.
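As a rough illustration of the pipeline above, the sketch below stands in for the $N$ extraction stages with simple linear projections, concatenates their outputs, applies an expand-activate-reduce linear mapping, and classifies. The class name, the amplification factors, the GELU activation, and the mean-pooling step are assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn


class MultiScaleFusionSketch(nn.Module):
    """Sketch only: each stage is approximated by one linear projection with
    its own amplification factor; in the full model each stage would be a
    state-update block rather than a single linear layer."""

    def __init__(self, dim: int, amp_factors=(1, 2, 4), num_classes: int = 6):
        super().__init__()
        self.stages = nn.ModuleList([nn.Linear(dim, a * dim) for a in amp_factors])
        fused_dim = sum(a * dim for a in amp_factors)
        self.fuse = nn.Sequential(
            nn.Linear(fused_dim, 2 * fused_dim),  # expand the fused dimension
            nn.GELU(),                            # nonlinear activation
            nn.Linear(2 * fused_dim, dim),        # reduce back to the original dimension D
        )
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, T, dim) sequence from the visual encoder
        stage_feats = [stage(tokens) for stage in self.stages]  # multi-scale stage features F_i
        fused = torch.cat(stage_feats, dim=-1)                  # concatenation fusion F_fused
        fused = self.fuse(fused)                                 # linear mapping + activation
        pooled = fused.mean(dim=1)                               # pool over the T tokens
        return self.classifier(pooled)                           # disease class logits


# Example: six disease classes, matching the dataset in Section 3.1.
logits = MultiScaleFusionSketch(dim=64)(torch.randn(2, 196, 64))
print(logits.shape)  # torch.Size([2, 6])
```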
3.2.5. State Loss Function
3.3. Experimental Setup
3.3.1. Hardware and Software Platform
3.3.2. Experimental Configuration
3.3.3. Baseline
3.3.4. Evaluation Metrics
4. Results and Discussion
4.1. Maize Disease Detection Results
4.2. Results Analysis
4.2.1. Analysis of Different Diseases
4.2.2. Confusion Matrix Analysis
4.3. Ablation Experiments and Discussion
4.3.1. Discussion on Different Attention Mechanisms
4.3.2. Discussion on Different Loss Functions
4.4. Limitations and Future Work
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Mukhtar, T.; Vagelas, I.; Javaid, A. New trends in integrated plant disease management. Front. Agron. 2023, 4, 1104122. [Google Scholar] [CrossRef]
- Kotwal, J.; Kashyap, R.; Pathan, S. Agricultural plant diseases identification: From traditional approach to deep learning. Mater. Today Proc. 2023, 80, 344–356. [Google Scholar] [CrossRef]
- Zhang, Y.; Wa, S.; Liu, Y.; Zhou, X.; Sun, P.; Ma, Q. High-accuracy detection of maize leaf diseases CNN based on multi-pathway activation function module. Remote. Sens. 2021, 13, 4218. [Google Scholar] [CrossRef]
- Jasrotia, S.; Yadav, J.; Rajpal, N.; Arora, M.; Chaudhary, J. Convolutional neural network based maize plant disease identification. Procedia Comput. Sci. 2023, 218, 1712–1721. [Google Scholar] [CrossRef]
- Masood, M.; Nawaz, M.; Nazir, T.; Javed, A.; Alkanhel, R.; Elmannai, H.; Dhahbi, S.; Bourouis, S. MaizeNet: A deep learning approach for effective recognition of maize plant leaf diseases. IEEE Access 2023, 11, 52862–52876. [Google Scholar] [CrossRef]
- Zhang, Y.; Wa, S.; Zhang, L.; Lv, C. Automatic plant disease detection based on tranvolution detection network with GAN modules using leaf images. Front. Plant Sci. 2022, 13, 875693. [Google Scholar] [CrossRef]
- Xiong, H.; Li, J.; Wang, T.; Zhang, F.; Wang, Z. EResNet-SVM: An overfitting-relieved deep learning model for recognition of plant diseases and pests. J. Sci. Food Agric. 2024, 104, 6018–6034. [Google Scholar] [CrossRef]
- Zhang, Y.; Yang, X.; Liu, Y.; Zhou, J.; Huang, Y.; Li, J.; Zhang, L.; Ma, Q. A time-series neural network for pig feeding behavior recognition and dangerous detection from videos. Comput. Electron. Agric. 2024, 218, 108710. [Google Scholar] [CrossRef]
- Jamjoom, M.; Elhadad, A.; Abulkasim, H.; Abbas, S. Plant leaf diseases classification using improved k-means clustering and svm algorithm for segmentation. Comput. Mater. Contin. 2023, 76, 367–382. [Google Scholar] [CrossRef]
- Reddy, S.R.; Varma, G.S.; Davuluri, R.L. Deep neural network (DNN) mechanism for identification of diseased and healthy plant leaf images using computer vision. Ann. Data Sci. 2024, 11, 243–272. [Google Scholar] [CrossRef]
- Rudenko, M.; Kazak, A.; Oleinikov, N.; Mayorova, A.; Dorofeeva, A.; Nekhaychuk, D.; Shutova, O. Intelligent Monitoring System to Assess Plant Development State Based on Computer Vision in Viticulture. Computation 2023, 11, 171. [Google Scholar] [CrossRef]
- Zhang, Y.; Lv, C. TinySegformer: A lightweight visual segmentation model for real-time agricultural pest detection. Comput. Electron. Agric. 2024, 218, 108740. [Google Scholar] [CrossRef]
- Patil, M.A.; Manur, M. Sensitive crop leaf disease prediction based on computer vision techniques with handcrafted features. Int. J. Syst. Assur. Eng. Manag. 2023, 14, 2235–2266. [Google Scholar] [CrossRef]
- Kaya, Y.; Gürsoy, E. A novel multi-head CNN design to identify plant diseases using the fusion of RGB images. Ecol. Inform. 2023, 75, 101998. [Google Scholar] [CrossRef]
- Pramudhita, D.A.; Azzahra, F.; Arfat, I.K.; Magdalena, R.; Saidah, S. Strawberry Plant Diseases Classification Using CNN Based on MobileNetV3-Large and EfficientNet-B0 Architecture. J. Ilm. Tek. Elektro Komput. Inform. 2023, 9, 522–534. [Google Scholar]
- Rachman, R.K.; Susanto, A.; Nugroho, K.; Islam, H.M.M. Enhanced Vision Transformer and Transfer Learning Approach to Improve Rice Disease Recognition. J. Comput. Theor. Appl. 2024, 1, 446–460. [Google Scholar] [CrossRef]
- Chen, Z.; Wang, G.; Lv, T.; Zhang, X. Using a Hybrid Convolutional Neural Network with a Transformer Model for Tomato Leaf Disease Detection. Agronomy 2024, 14, 673. [Google Scholar] [CrossRef]
- Brown, D.; De Silva, M. Plant Disease Detection on Multispectral Images using Vision Transformers. In Proceedings of the 25th Irish Machine Vision and Image Processing Conference (IMVIP), Galway, Ireland, 25–26 August 2016; Volume 30. [Google Scholar]
- Zeng, Q.; Sun, J.; Wang, S. DIC-Transformer: Interpretation of plant disease classification results using image caption generation technology. Front. Plant Sci. 2024, 14, 1273029. [Google Scholar] [CrossRef]
- Mehta, S.; Kukreja, V.; Srivastava, P. Agriculture Breakthrough: Federated ConvNets for Unprecedented Maize Disease Detection and Severity Estimation. In Proceedings of the 2023 International Conference on Circuit Power and Computing Technologies (ICCPCT), Kollam, India, 10–11 August 2023; IEEE: New York, NY, USA, 2023; pp. 375–380. [Google Scholar]
- Mehta, S.; Kukreja, V.; Gupta, A. Revolutionizing Maize Disease Management with Federated Learning CNNs: A Decentralized and Privacy-Sensitive Approach. In Proceedings of the 2023 4th International Conference for Emerging Technology (INCET), Belgaum, India, 26–28 May 2023; IEEE: New York, NY, USA, 2023; pp. 1–6. [Google Scholar]
- Li, E.; Wang, L.; Xie, Q.; Gao, R.; Su, Z.; Li, Y. A novel deep learning method for maize disease identification based on small sample-size and complex background datasets. Ecol. Inform. 2023, 75, 102011. [Google Scholar] [CrossRef]
- Vaswani, A. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Patwardhan, N.; Marrone, S.; Sansone, C. Transformers in the real world: A survey on nlp applications. Information 2023, 14, 242. [Google Scholar] [CrossRef]
- Kmetty, Z.; Kollányi, B.; Boros, K. Boosting classification reliability of NLP transformer models in the long run. arXiv 2023, arXiv:2302.10016. [Google Scholar]
- Dubey, S.R.; Singh, S.K. Transformer-based generative adversarial networks in computer vision: A comprehensive survey. IEEE Trans. Artif. Intell. 2024, 5, 4851–4867. [Google Scholar] [CrossRef]
- Roy, A.M.; Bhaduri, J. A computer vision enabled damage detection model with improved yolov5 based on transformer prediction head. arXiv 2023, arXiv:2303.04275. [Google Scholar]
- Li, Q.; Ren, J.; Zhang, Y.; Song, C.; Liao, Y.; Zhang, Y. Privacy-Preserving DNN Training with Prefetched Meta-Keys on Heterogeneous Neural Network Accelerators. In Proceedings of the 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 9–13 July 2023; IEEE: New York, NY, USA, 2023; pp. 1–6. [Google Scholar]
- Li, Q.; Zhang, Y.; Ren, J.; Li, Q.; Zhang, Y. You Can Use But Cannot Recognize: Preserving Visual Privacy in Deep Neural Networks. arXiv 2024, arXiv:2404.04098. [Google Scholar]
- Xia, C.; Wang, X.; Lv, F.; Hao, X.; Shi, Y. Vit-comer: Vision transformer with convolutional multi-scale feature interaction for dense predictions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5493–5502. [Google Scholar]
- Tabbakh, A.; Barpanda, S.S. A deep features extraction model based on the transfer learning model and vision transformer “tlmvit” for plant disease classification. IEEE Access 2023, 11, 45377–45392. [Google Scholar] [CrossRef]
- Li, G.; Wang, Y.; Zhao, Q.; Yuan, P.; Chang, B. PMVT: A lightweight vision transformer for plant disease identification on mobile devices. Front. Plant Sci. 2023, 14, 1256773. [Google Scholar] [CrossRef]
- Wu, P.; Wang, Z.; Zheng, B.; Li, H.; Alsaadi, F.E.; Zeng, N. AGGN: Attention-based glioma grading network with multi-scale feature extraction and multi-modal information fusion. Comput. Biol. Med. 2023, 152, 106457. [Google Scholar] [CrossRef]
- Zhang, X.; Mu, W. GMamba: State space model with convolution for Grape leaf disease segmentation. Comput. Electron. Agric. 2024, 225, 109290. [Google Scholar] [CrossRef]
- Shi, D.; Li, C.; Shi, H.; Liang, L.; Liu, H.; Diao, M. A Hierarchical Feature-Aware Model for Accurate Tomato Blight Disease Spot Detection: Unet with Vision Mamba and ConvNeXt Perspective. Agronomy 2024, 14, 2227. [Google Scholar] [CrossRef]
- Zhang, H.; Zhu, Y.; Wang, D.; Zhang, L.; Chen, T.; Wang, Z.; Ye, Z. A survey on visual mamba. Appl. Sci. 2024, 14, 5683. [Google Scholar] [CrossRef]
- Li, Q.; Zhang, Y. Confidential Federated Learning for Heterogeneous Platforms against Client-Side Privacy Leakages. In Proceedings of the ACM Turing Award Celebration Conference 2024, Changsha, China, 5–7 July 2024; pp. 239–241. [Google Scholar]
- Liu, X.; Zhang, C.; Zhang, L. Vision mamba: A comprehensive survey and taxonomy. arXiv 2024, arXiv:2405.04404. [Google Scholar]
- Qu, H.; Ning, L.; An, R.; Fan, W.; Derr, T.; Xu, X.; Li, Q. A Survey of Mamba. arXiv 2024, arXiv:2408.01129. [Google Scholar]
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
- Lu, S.; Liu, M.; Yin, L.; Yin, Z.; Liu, X.; Zheng, W. The multi-modal fusion in visual question answering: A review of attention mechanisms. PeerJ Comput. Sci. 2023, 9, e1400. [Google Scholar] [CrossRef]
- Samo, M.; Mafeni Mase, J.M.; Figueredo, G. Deep learning with attention mechanisms for road weather detection. Sensors 2023, 23, 798. [Google Scholar] [CrossRef]
- Zhong, C.; Hu, L.; Zhang, Z.; Xia, S. Attt2m: Text-driven human motion generation with multi-perspective attention mechanism. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 509–519. [Google Scholar]
- Li, X.; Li, M.; Yan, P.; Li, G.; Jiang, Y.; Luo, H.; Yin, S. Deep learning attention mechanism in medical image analysis: Basics and beyonds. Int. J. Netw. Dyn. Intell. 2023, 2, 93–116. [Google Scholar] [CrossRef]
- Huang, Z.; Su, L.; Wu, J.; Chen, Y. Rock image classification based on EfficientNet and triplet attention mechanism. Appl. Sci. 2023, 13, 3180. [Google Scholar] [CrossRef]
- Sunil, C.; Jaidhar, C.; Patil, N. Tomato plant disease classification using multilevel feature fusion with adaptive channel spatial and pixel attention mechanism. Expert Syst. Appl. 2023, 228, 120381. [Google Scholar]
- Alirezazadeh, P.; Schirrmann, M.; Stolzenburg, F. Improving deep learning-based plant disease classification with attention mechanism. Gesunde Pflanz. 2023, 75, 49–59. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Disease | Quantity |
---|---|
Maize Northern Leaf Blight | 1596 |
Maize Southern Leaf Blight | 988 |
Maize Smut | 1013 |
Maize Head Smut | 1280 |
Maize Round Spot | 1612 |
Maize Brown Spot | 854 |
Model | Precision | Recall | Accuracy | F1 Score |
---|---|---|---|---|
AlexNet | 0.81 | 0.79 | 0.80 | 0.80 |
GoogLeNet | 0.85 | 0.83 | 0.84 | 0.84 |
ResNet | 0.88 | 0.86 | 0.87 | 0.87 |
EfficientNet | 0.90 | 0.87 | 0.89 | 0.88 |
ViT | 0.91 | 0.89 | 0.90 | 0.90 |
MaizeNet [5] | 0.92 | 0.91 | 0.91 | 0.91 |
Tiny-segformer [12] | 0.92 | 0.88 | 0.89 | 0.90 |
Proposed Method | 0.95 | 0.94 | 0.94 | 0.94 |
Disease | Precision | Recall | Accuracy | F1 Score |
---|---|---|---|---|
Maize Northern Leaf Blight | 0.94 | 0.99 | 0.98 | 0.95 |
Maize Southern Leaf Blight | 0.92 | 0.91 | 0.91 | 0.98 |
Maize Smut | 0.96 | 0.96 | 0.91 | 0.98 |
Maize Head Smut | 0.99 | 0.91 | 0.92 | 0.91 |
Maize Round Spot | 0.93 | 0.94 | 0.95 | 0.92 |
Maize Brown Spot | 0.96 | 0.90 | 0.94 | 0.92 |
Model | Precision | Recall | Accuracy | F1 Score |
---|---|---|---|---|
Standard Self-Attention | 0.74 | 0.70 | 0.72 | 0.72 |
Convolutional Block Attention Module | 0.87 | 0.83 | 0.85 | 0.85 |
State-Space Attention | 0.95 | 0.94 | 0.94 | 0.94 |
Model | Precision | Recall | Accuracy | F1 Score |
---|---|---|---|---|
Cross-Entropy Loss | 0.69 | 0.65 | 0.67 | 0.67 |
Focal Loss | 0.83 | 0.80 | 0.81 | 0.81 |
State-Space Loss | 0.95 | 0.94 | 0.94 | 0.94 |