A Lightweight Pine Wilt Disease Detection Method Based on Vision Transformer-Enhanced YOLO
Abstract
1. Introduction
- A lightweight Multi-Scale Attention (MSA) module is introduced to construct an EfficientViT feature extraction network, which achieves efficient global information extraction and multi-scale learning through hardware-efficient operations, reducing the network's computational complexity (a sketch of the core attention mechanism follows this list);
- A Content-Aware Cross-Scale bidirectional fusion neck network (CACSNet) is proposed, which uses the Content-Aware ReAssembly of FEatures (CARAFE) operator in place of bilinear interpolation for upsampling in PANet (Path Aggregation Network) and applies cross-scale weighting for feature fusion, improving the representation of fine-grained features of diseased trees, preventing the loss of small-target features, and raising detection accuracy;
- The loss function is optimized by introducing the EIOU (Efficient Intersection over Union) loss, helping the model better balance the size and shape information of targets and improving the accuracy and robustness of PWD detection.
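To make the first contribution concrete, the following is a minimal PyTorch sketch of ReLU linear attention, the operation at the core of EfficientViT's lightweight MSA (Cai et al., cited in the references). The multi-scale token aggregation (small depthwise convolutions applied to Q/K/V) is omitted for brevity, and all module and parameter names are illustrative assumptions rather than the authors' code:

```python
# Minimal sketch of ReLU linear attention (the core of EfficientViT's MSA).
# Illustrative only; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReLULinearAttention(nn.Module):
    """Linear attention with ReLU feature maps: cost is O(N) in token count N."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        assert dim % heads == 0, "dim must be divisible by heads"
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split heads: (batch, heads, tokens, head_dim)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        # ReLU kernel replaces softmax, enabling the associativity trick:
        # (Q K^T) V  ->  Q (K^T V), so the N x N attention map is never built.
        q, k = F.relu(q), F.relu(k)
        kv = torch.einsum("bhnd,bhne->bhde", k, v)            # (b, h, hd, hd)
        z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)   # normalized output
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.proj(out)
```

Because the softmax kernel is replaced by a ReLU feature map, the attention computation can be regrouped so that cost grows linearly with the number of tokens, which is what makes the module suitable for lightweight deployment.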
2. Related Works
2.1. Visual Transformer in Remote Sensing
2.2. Lightweight Multi-Scale Attention
3. Materials and Methods
3.1. Baseline Network YOLOv5
3.2. Redesign of Backbone Feature Extraction Network
- (1) The backbone network comprises an input stem and four stages, with feature map size diminishing and channel count increasing from stage to stage;
- (2) Lightweight MSAs are integrated into stages 3 and 4;
- (3) For downsampling, the model employs MBConv with a stride of 2 (a simplified sketch of this layout follows).
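Under the assumptions above, the layout can be summarized in a short illustrative skeleton. Channel widths follow the variants table reproduced later in this document (C0 = 8 through C4 = 128); MSABlock wraps the ReLULinearAttention sketch from the Introduction, and block counts, names, and activation choices are assumptions, not the paper's implementation:

```python
# Illustrative skeleton of the redesigned backbone: input stem plus four stages,
# MBConv with stride 2 for downsampling, lightweight MSA in stages 3 and 4 only.
import torch.nn as nn

def mbconv(c_in, c_out, stride=1, expand=4):
    """MobileNetV2-style inverted residual: expand -> depthwise -> project."""
    hidden = c_in * expand
    return nn.Sequential(
        nn.Conv2d(c_in, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.Hardswish(),
        nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),  # depthwise
        nn.BatchNorm2d(hidden), nn.Hardswish(),
        nn.Conv2d(hidden, c_out, 1, bias=False), nn.BatchNorm2d(c_out),      # linear projection
    )

class MSABlock(nn.Module):
    """Flatten a feature map, apply the ReLULinearAttention sketch from the
    Introduction, and restore the spatial shape (with a residual connection)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = ReLULinearAttention(dim)  # defined in the earlier sketch

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)                  # (b, h*w, c)
        out = self.attn(tokens).transpose(1, 2).view(b, c, h, w)
        return out + x

class EfficientViTBackbone(nn.Module):
    def __init__(self, channels=(8, 16, 32, 64, 128)):
        super().__init__()
        c0, c1, c2, c3, c4 = channels
        self.stem = nn.Sequential(nn.Conv2d(3, c0, 3, 2, 1, bias=False),
                                  nn.BatchNorm2d(c0), nn.Hardswish())
        self.stage1 = mbconv(c0, c1, stride=2)                        # convolutional only
        self.stage2 = mbconv(c1, c2, stride=2)
        self.stage3 = nn.Sequential(mbconv(c2, c3, stride=2), MSABlock(c3))  # MSA here
        self.stage4 = nn.Sequential(mbconv(c3, c4, stride=2), MSABlock(c4))  # and here

    def forward(self, x):
        p2 = self.stage2(self.stage1(self.stem(x)))
        p3 = self.stage3(p2)
        p4 = self.stage4(p3)
        return p2, p3, p4  # multi-scale features handed to the neck
```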
3.3. Design of CACSNet Neck Networks
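This section's neck design replaces bilinear upsampling in PANet with CARAFE. Below is a minimal, self-contained sketch of the CARAFE operator following Wang et al. (cited in the references): a lightweight kernel-prediction branch produces a softmax-normalized k_up x k_up reassembly kernel for every output location, which then reweights the corresponding input neighborhood. Hyperparameters (k_up, k_enc, the compressed width mid) are illustrative defaults, not necessarily the paper's settings:

```python
# Minimal sketch of CARAFE (content-aware reassembly of features),
# used here in place of bilinear interpolation for upsampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    def __init__(self, channels, scale=2, k_up=5, k_enc=3, mid=64):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(channels, mid, 1)
        # Predict a (k_up * k_up) reassembly kernel for every output location.
        self.encode = nn.Conv2d(mid, (scale * k_up) ** 2, k_enc, padding=k_enc // 2)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        b, c, h, w = x.shape
        sh, sw = h * self.scale, w * self.scale
        # 1. Kernel prediction: per-location kernels, softmax-normalized.
        kernels = self.shuffle(self.encode(self.compress(x)))      # (b, k_up^2, sh, sw)
        kernels = F.softmax(kernels, dim=1)
        # 2. Reassembly: gather k_up x k_up neighborhoods of the input ...
        patches = F.unfold(x, self.k_up, padding=self.k_up // 2)   # (b, c*k_up^2, h*w)
        patches = patches.view(b, c * self.k_up ** 2, h, w)
        # ... and repeat them by nearest-neighbor upsampling so every output
        # location sees the patch around its source pixel.
        patches = F.interpolate(patches, scale_factor=self.scale, mode="nearest")
        patches = patches.view(b, c, self.k_up ** 2, sh, sw)
        # 3. Weighted sum of each patch with its predicted kernel.
        return (patches * kernels.unsqueeze(1)).sum(dim=2)
```

Unlike a fixed bilinear kernel, the predicted kernels depend on the feature content, which helps preserve the fine details of small diseased-tree targets during upsampling.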
3.4. Optimization of Loss Function
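EIOU augments the plain IoU loss with a center-distance penalty plus explicit width and height penalties, each normalized by the smallest enclosing box, which is what lets the regressor balance target size and shape. Below is a minimal PyTorch sketch assuming corner-format (x1, y1, x2, y2) boxes; it follows the common EIoU definition in the literature rather than the authors' exact code:

```python
# Minimal sketch of the EIoU loss: 1 - IoU + center, width, and height penalties.
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Intersection and union for plain IoU.
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    union = wp * hp + wt * ht - inter + eps
    iou = inter / union
    # Smallest enclosing box: its diagonal, width, and height normalize the penalties.
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Center-distance penalty (as in DIoU/CIoU).
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2
            + (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    # EIoU adds explicit width and height penalties on top of DIoU.
    return (1 - iou + rho2 / c2
            + (wp - wt) ** 2 / (cw ** 2 + eps)
            + (hp - ht) ** 2 / (ch ** 2 + eps))
```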
4. Experiment and Performance Analysis
4.1. Research Area and Data Acquisition
4.2. Experimental Configuration
4.3. Experimental Indicators
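The result tables later in this section report Recall, mAP@0.5, and mAP@0.5:0.95, which follow the standard detection definitions (TP, FP, and FN denote true positives, false positives, and false negatives):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{AP} = \int_0^1 P(R)\,\mathrm{d}R, \qquad
\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i
```

Here mAP@0.5 evaluates AP at an IoU threshold of 0.5, while mAP@0.5:0.95 averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.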
4.4. Performance Comparison of Different Methods
4.4.1. Performance Comparison of Different Methods
4.4.2. Ablation Experiment
4.4.3. Feature Extraction Performance Analysis of EfficientViT
4.4.4. Performance Analysis of the Training Process
5. Conclusions
- The method proposed in this paper has been experimentally verified on a standard platform. The next step is to deploy the algorithm on a drone hardware platform to further verify its feasibility and potential economic benefits;
- Combining the method proposed in this paper with satellite-based forest monitoring to further strengthen the monitoring of pine tree discoloration caused by pine wilt disease, integrating drone images with satellite images for multi-scale analysis from both macroscopic and local perspectives, and comprehensively monitoring the disease through data fusion and analysis;
- Applying the method proposed in this paper to the detection of other forest diseases, such as bark beetle damage.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Pan, C. Development of studies on pinewood nematodes diseases. J. Xiamen Univ. 2011, 50, 476–483. [Google Scholar]
- Liu, F.; Su, H.; Ding, T.; Huang, J.; Liu, T.; Ding, N.; Fang, G. Refined Assessment of Economic Loss from Pine Wilt Disease at the Subcompartment Scale. Forests 2023, 14, 139. [Google Scholar] [CrossRef]
- Duarte, A.; Borralho, N.; Cabral, P.; Caetano, M. Recent advances in forest insect pests and diseases monitoring using UAV-based data: A systematic review. Forests 2022, 13, 911. [Google Scholar] [CrossRef]
- Zhang, X.; Yang, H.; Cai, P.; Chen, G.; Li, X.; Zhu, K. Research progress on remote sensing monitoring of pine wilt disease. Trans. Chin. Soc. Agric. Eng. 2022, 38, 184–194. [Google Scholar]
- Cai, P.; Chen, G.; Yang, H.; Li, X.; Zhu, K.; Wang, T.; Liao, P.; Han, M.; Gong, Y.; Wang, Q.; et al. Detecting Individual Plants Infected with Pine Wilt Disease Using Drones and Satellite Imagery: A Case Study in Xianning, China. Remote Sens. 2023, 15, 2671. [Google Scholar] [CrossRef]
- You, J.; Zhang, R.; Lee, J. A deep learning-based generalized system for detecting pine wilt disease using RGB-based UAV images. Remote Sens. 2022, 14, 150. [Google Scholar] [CrossRef]
- Qin, J.; Wang, B.; Wu, Y.; Lu, Q.; Zhu, H. Identifying pine wood nematode disease using UAV images and deep learning algorithms. Remote Sens. 2021, 13, 162. [Google Scholar] [CrossRef]
- Wu, B.; Liang, A.; Zhang, H.; Zhu, T.; Zou, Z.; Yang, D.; Tang, W.; Li, J.; Su, J. Application of conventional UAV-based high-throughput object detection to the early diagnosis of pine wilt disease by deep learning. For. Ecol. Manag. 2021, 486, 118986. [Google Scholar] [CrossRef]
- Gong, H.; Ding, Y.; Li, D.; Wang, W.; Li, Z. Recognition of Pine Wood Affected by Pine Wilt Disease Based on YOLOv5. In Proceedings of the 2022 China Automation Congress (CAC), Xiamen, China, 25–27 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 4753–4757. [Google Scholar]
- Sun, Z.; Ibrayim, M.; Hamdulla, A. Detection of pine wilt nematode from drone images using UAV. Sensors 2022, 22, 4704. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; Volume 30. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Naseer, M.M.; Ranasinghe, K.; Khan, S.H.; Hayat, M.; Shahbaz Khan, F.; Yang, M.H. Intriguing properties of vision transformers. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2021; Volume 34, pp. 23296–23308. [Google Scholar]
- Park, N.; Kim, S. How do vision transformers work? arXiv 2022, arXiv:2202.06709. [Google Scholar]
- Hao, S.; Wu, B.; Zhao, K.; Ye, Y.; Wang, W. Two-stream swin transformer with differentiable sobel operator for remote sensing image classification. Remote Sens. 2022, 14, 1507. [Google Scholar] [CrossRef]
- Ma, J.; Li, M.; Tang, X.; Zhang, X.; Liu, F.; Jiao, L. Homo–heterogenous transformer learning framework for RS scene classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 2223–2239. [Google Scholar] [CrossRef]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2021; Volume 34, pp. 12077–12090. [Google Scholar]
- Guo, M.H.; Lu, C.Z.; Hou, Q.; Liu, Z.; Cheng, M.M.; Hu, S.M. SegNeXt: Rethinking convolutional attention design for semantic segmentation. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2022; Volume 35, pp. 1140–1156. [Google Scholar]
- Wang, G.; Li, B.; Zhang, T.; Zhang, S. A network combining a transformer and a convolutional neural network for remote sensing image change detection. Remote Sens. 2022, 14, 2228. [Google Scholar] [CrossRef]
- Cai, H.; Li, J.; Hu, M.; Gan, C.; Han, S. EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation. arXiv 2023, arXiv:2205.14756. [Google Scholar]
- Katharopoulos, A.; Vyas, A.; Pappas, N.; Fleuret, F. Transformers are RNNs: Fast autoregressive transformers with linear attention. In Proceedings of the International Conference on Machine Learning, Virtual Event, 13–18 July 2020; PMLR; pp. 5156–5165. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-aware reassembly of features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; Volume 28. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
- Srinivas, A.; Lin, T.Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16519–16529. [Google Scholar]
- Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1489–1500. [Google Scholar] [CrossRef] [PubMed]
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742. [Google Scholar]
- Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetV2: Enhance cheap operation with long-range attention. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2022; Volume 35, pp. 9969–9982. [Google Scholar]
| Variants | Input Stem | Stage 1 | Stage 2 | Stage 3 | Stage 4 |
|---|---|---|---|---|---|
| C | C0 = 8 | C1 = 16 | C2 = 32 | C3 = 64 | C4 = 128 |
| L | L0 = 1 | L1 = 2 | L2 = 2 | L3 = 2 | L4 = 2 |
| H | 640 | 640 | 640 | 640 | 640 |
| W | 640 | 640 | 640 | 640 | 640 |
| Platform | Configuration |
|---|---|
| Operating system | Linux 3.10.0 |
| CPU | Intel(R) Xeon(R) Gold 6138 CPU @ 2.00 GHz |
| GPU | Tesla V100-PCIE-32GB |
| GPU accelerator | CUDA 10.2 |
| Deep learning framework | PyTorch 1.10.1 |
| Development environment | PyCharm and Anaconda |
| Scripting language | Python 3.7 |
| Method | Recall (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|
| Faster R-CNN | 75.4 | 75.2 | 66.2 |
| RetinaNet | 96.6 | 95.9 | 92.5 |
| YOLOv6 | 96.7 | 95.9 | 80.8 |
| YOLOv7 | 93.9 | 82.5 | 55.9 |
| YOLOX | 96.9 | 96.0 | 84.3 |
| YOLOv5 | 96.1 | 97.6 | 90.8 |
| Light-ViTeYOLO | 95.7 | 97.2 | 94.3 |
| Method | Parameters (M) | GFLOPs | FPS (frames/s) |
|---|---|---|---|
| Faster R-CNN | 41.1 | 78.1 | 15.5 |
| RetinaNet | 36.1 | 81.6 | 12.3 |
| YOLOv6 | 17.1 | 21.8 | 26.3 |
| YOLOv7 | 6.5 | 13.9 | 39.5 |
| YOLOX | 8.9 | 13.3 | 46.5 |
| YOLOv5 | 7.1 | 15.8 | 67.0 |
| Light-ViTeYOLO | 3.89 | 7.4 | 57.9 |
| Model | Parameters (M) | GFLOPs | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|
| Baseline | 7.02 | 15.8 | 97.6 | 90.8 |
| +EfficientViT | 3.74 | 6.8 | 97.22 | 93.6 |
| +CACSNet | 3.89 | 7.4 | 97.20 | 94.0 |
| +EIOU | 3.89 | 7.4 | 97.27 | 94.3 |
| Model | Parameters (M) | GFLOPs | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|
| YOLOv5 + ViT | 7.02 | 15.6 | 97.2 | 95.8 |
| YOLOv5 + BoTNet | 6.69 | 15.5 | 97.2 | 95.9 |
| YOLOv5 + CoTNet | 8.19 | 16.8 | 97.2 | 95.4 |
| YOLOv5 + ShuffleNetV2 | 3.79 | 7.9 | 96.07 | 87.87 |
| YOLOv5 + MobileNetV3 | 3.19 | 5.9 | 94.19 | 86.83 |
| YOLOv5 + RepVGG | 7.19 | 16.3 | 97.13 | 85.93 |
| YOLOv5 + GhostNet | 3.68 | 8.1 | 96.49 | 86.06 |
| YOLOv5 + EfficientViT | 3.74 | 6.8 | 97.22 | 93.6 |