Few-Shot Image Classification of Crop Diseases Based on Vision–Language Models
Abstract
1. Introduction
- Enhancing few-shot crop leaf disease classification accuracy through multimodal integration: By fine-tuning CLIP, we integrate image and text information, providing a paradigm for multimodal development in the agricultural field. Experimental results show that our method achieves excellent classification performance in few-shot scenarios, effectively mitigating the challenge of data scarcity.
- Fine-grained disease feature description driven by VLM: We leverage Qwen-VL’s fine-grained recognition capability to generate detailed textual descriptions of crop diseases from a set of infected crop leaf images and use them as prompt texts to build discriminative classifier weights. This enhances the model’s sensitivity and accuracy in identifying complex diseases from images of infected crop leaves.
- Enhancing key textual features with cross-attention and SE Attention: When processing the prompt texts, we apply cross-attention in the training-free mode and SE Attention in the training-required mode to guide the model’s attention to important textual features. By dynamically re-weighting the most informative prompt text features, we improve the quality of the model’s classification weights.
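To make the SE-based re-weighting concrete, the following is a minimal PyTorch sketch of a squeeze-and-excitation gate applied to prompt text features. It is illustrative only: the module name `SETextGate`, the mean-pooling over prompts, and the default reduction ratio are our assumptions, not the exact module used in VLCD-T (the reported ablation favors r = 32).

```python
import torch
import torch.nn as nn

class SETextGate(nn.Module):
    """Squeeze-and-excitation style gate over the channels of prompt-text features.

    Illustrative sketch only; the paper's exact placement of the SE module and
    its hyper-parameters may differ.
    """
    def __init__(self, dim: int, r: int = 32):  # r = 32 performs best in the reported ablation
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // r),
            nn.ReLU(inplace=True),
            nn.Linear(dim // r, dim),
            nn.Sigmoid(),
        )

    def forward(self, text_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (num_prompts, dim) CLIP text embeddings of the prompt texts
        squeezed = text_feats.mean(dim=0)   # "squeeze": summary over all prompts
        gate = self.fc(squeezed)            # "excitation": per-channel weights in (0, 1)
        return text_feats * gate            # re-weight every prompt feature channel-wise
```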
2. Related Work
2.1. Development of Crop Disease Image Classification Technology
2.2. Vision–Language Models and Fine-Tuning
2.3. Automatic Generation of Prompt Text
2.4. Attention Mechanism
3. Method
3.1. Background
3.1.1. Zero-Shot CLIP
3.1.2. Cache Model
3.1.3. APE
- The relationship between the test image feature f and the textual features W, determined using Equation (1), represents the cosine similarity between the test image and the prompt texts.
- The relationship between f and the cached training-set features F′ can be calculated with a similar cosine-similarity measure, as described in Equation (2).
- The relationship between F′ and W involves APE’s zero-shot CLIP prediction on the training data, obtained from F′ and W. To evaluate CLIP’s downstream recognition capability, the KL-divergence between this prediction and the one-hot training labels is calculated.
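For readers unfamiliar with APE, the following minimal PyTorch sketch illustrates the three relations described above, assuming L2-normalized CLIP features and a class-major cache. APE’s channel-refinement step is omitted and the weighting of the KL term is simplified; the values of alpha and beta are assumptions.

```python
import torch
import torch.nn.functional as F

def ape_relations(f, W, F_cache, labels_onehot, alpha=1.0, beta=5.5):
    """Sketch of APE's three relations on L2-normalized CLIP features.

    f:             (D,)     test image feature
    W:             (C, D)   textual (prompt) features, one row per class
    F_cache:       (CK, D)  cached training image features (K shots per class, class-major)
    labels_onehot: (CK, C)  one-hot labels of the cached features
    """
    # (1) test image vs. prompt texts: cosine similarity (zero-shot CLIP logits)
    R_fW = f @ W.t()                                      # (C,)
    # (2) test image vs. training images: exponential affinity used by the cache model
    R_fF = torch.exp(-beta * (1.0 - f @ F_cache.t()))     # (CK,)
    # (3) training images vs. prompt texts: KL divergence between CLIP's zero-shot
    # prediction on each cached sample and its one-hot label
    log_pred = F.log_softmax(F_cache @ W.t(), dim=-1)     # (CK, C)
    kl = -(labels_onehot * log_pred).sum(dim=-1)          # KL(one-hot || pred) = -log p(true class)
    cache_weights = torch.exp(-kl)                        # lower divergence -> higher weight (simplified)
    # combine: zero-shot logits + weighted cache prediction
    logits = R_fW + alpha * (R_fF * cache_weights) @ labels_onehot   # (C,)
    return logits
```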
3.1.4. APE-T
- For W in Equation (1), APE-T first pads the learnable E-channel residual into D channels by filling the extra channels with zeros. The padded residual is added to W, updating CLIP’s zero-shot prediction through the optimized textual features.
- For F′ in Equation (4), APE-T first broadcasts the C-row residual to match the training-set features by repeating it for each category’s samples. Next, APE-T adds the expanded residual to F′ element-wise, improving the cache model’s few-shot prediction by optimizing the training-set features.
- For the cache scores in Equation (5), APE-T makes them learnable during training, enabling adaptive learning of optimal cache scores for different training-set features and determining their contribution to the predictions.
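The padding and broadcasting of the learnable residuals can be sketched as follows. The module name, the zero initialization, and the class-major ordering of the cache are assumptions for illustration rather than the exact APE-T implementation; during training, only the residual (and the learnable cache scores) would be updated while the CLIP backbone stays frozen.

```python
import torch
import torch.nn as nn

class APETResiduals(nn.Module):
    """Sketch of APE-T style learnable residuals shared by W and the cache keys F'."""
    def __init__(self, num_classes: int, e_channels: int, shots: int):
        super().__init__()
        # (C, E) residual defined on the E refined channels
        self.res = nn.Parameter(torch.zeros(num_classes, e_channels))
        self.shots = shots

    def update_text(self, W: torch.Tensor, refined_idx: torch.Tensor) -> torch.Tensor:
        # Pad the E-channel residual to D channels (zeros elsewhere) and add it to W (C, D).
        pad = torch.zeros_like(W)
        pad[:, refined_idx] = self.res
        return W + pad

    def update_cache(self, F_cache: torch.Tensor, refined_idx: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-class residual to the K shots of each class, then add to F' (CK, D).
        expanded = self.res.repeat_interleave(self.shots, dim=0)   # (CK, E)
        pad = torch.zeros_like(F_cache)
        pad[:, refined_idx] = expanded
        return F_cache + pad
```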
3.2. Image-Driven Prompt Text Generation
- Representative image selection: For simplicity, we extract traditional image features for clustering, recognizing the importance of color and texture cues in crop disease recognition. These features are stable and intuitive, especially for distinguishing subtle and complex disease types. To this end, we use the K-means clustering algorithm to select representative images based on both color and texture features. Specifically, we compute the average color and GLCM (Gray-Level Co-occurrence Matrix) texture features for each image and then cluster the combined features with K-means. Each category is clustered separately, and M representative images are selected for each of the C categories, giving one set of representative images per class (see the sketch after this list).
- Prompt text generation: We sequentially feed the selected representative images of each category into Qwen-VL, using the following unified template for querying: “Can you help me describe this [CLASS] leaf?”, where [CLASS] is replaced with the specific disease category. This step aims to generate detailed descriptions that are closely aligned with the image content. As shown in Figure 2, we present three crop disease categories together with the prompt text generated by Qwen-VL under image-driven guidance and the prompt texts generated by the language model GPT-3.5 [40] and by Qwen-VL without image-driven guidance. For instance, in the image-driven prompt text generated for the “grape leaf blight” category, “dark brown spots” are identified as crucial indicators of the disease, and the generated text also describes additional information about the leaf surface. In contrast, without image-driven guidance, the texts generated by GPT-3.5 and Qwen-VL still mention relevant disease features but are not as detailed as the prompt text generated with image-driven input.
- Text information integration: We consolidate and organize the textual descriptions generated for the representative images of all categories. The resulting prompt texts serve as the basis for the subsequent classification weights.
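As referenced in the first bullet above, the selection and prompt-querying steps might look roughly like the sketch below. The 7-dimensional feature (mean RGB plus four GLCM statistics), the closest-to-centroid selection rule, and the decoding defaults are our assumptions; the Qwen-VL call follows the chat interface documented for the Qwen/Qwen-VL-Chat checkpoint.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans
from skimage.feature import graycomatrix, graycoprops

def color_texture_features(path: str) -> np.ndarray:
    """Mean RGB color plus four GLCM texture statistics for one leaf image."""
    img = Image.open(path).convert("RGB").resize((224, 224))
    mean_color = np.asarray(img, dtype=np.float32).reshape(-1, 3).mean(axis=0) / 255.0
    gray = np.asarray(img.convert("L"))
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    texture = np.array([graycoprops(glcm, p)[0, 0]
                        for p in ("contrast", "homogeneity", "energy", "correlation")])
    return np.concatenate([mean_color, texture])   # 7-dimensional feature

def select_representatives(image_paths: list, m: int) -> list:
    """K-means over the combined features of one class; keep the image closest
    to each of the m centroids as that class's representatives."""
    feats = np.stack([color_texture_features(p) for p in image_paths])
    km = KMeans(n_clusters=m, n_init=10, random_state=0).fit(feats)
    return [image_paths[int(np.argmin(np.linalg.norm(feats - c, axis=1)))]
            for c in km.cluster_centers_]

# Querying Qwen-VL-Chat for each representative image
# (interface as documented on the Qwen/Qwen-VL-Chat model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
vlm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="auto",
                                           trust_remote_code=True).eval()

def describe_leaf(image_path: str, class_name: str) -> str:
    query = tok.from_list_format([
        {"image": image_path},
        {"text": f"Can you help me describe this {class_name} leaf?"},
    ])
    response, _ = vlm.chat(tok, query=query, history=None)
    return response
```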
3.3. Text Feature Fusion in Training-Free (VLCD) Mode
3.4. Text Feature Enhancement in Training-Required (VLCD-T) Mode
4. Experiment
4.1. Settings
4.1.1. Dataset
4.1.2. Implementation Details
4.2. Performance Analysis
4.3. Ablation Study
4.3.1. Different Prompt Texts
4.3.2. Representative Image Selection Strategy
4.3.3. Effectiveness of Attention Mechanisms
4.3.4. Different Network Backbones
5. Visualization
5.1. Performance Comparison of Different Prompt Text Generation Strategies
5.2. Dynamic Changes in Accuracy under the SE Attention Module
6. Summary
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, L.; Zhang, S.; Wang, B. Plant disease detection and classification by deep learning—A review. IEEE Access 2021, 9, 56683–56698. [Google Scholar] [CrossRef]
- Cheng, X.; Wu, Y.; Zhang, Y.; Yue, Y. Image recognition of stored grain pests: Based on deep convolutional neural network. Chin. Agric. Sci. Bull. 2018, 34, 154–158. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR: New York, NY, USA, 2021; pp. 8748–8763. [Google Scholar]
- Zhang, J.; Huang, J.; Jin, S.; Lu, S. Vision-Language Models for Vision Tasks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5625–5644. [Google Scholar] [CrossRef] [PubMed]
- Bai, J.; Bai, S.; Yang, S.; Wang, S.; Tan, S.; Wang, P.; Lin, J.; Zhou, C.; Zhou, J. Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond. arXiv 2023, arXiv:2308.12966. [Google Scholar]
- Zhang, R.; Zhang, W.; Fang, R.; Gao, P.; Li, K.; Dai, J.; Qiao, Y.; Li, H. Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification. In European Conference on Computer Vision; Springer Nature Switzerland: Cham, Switzerland, 2022; pp. 493–510. [Google Scholar]
- Zhu, X.; Zhang, R.; He, B.; Zhou, A.; Wang, D.; Zhao, B.; Gao, P. Not All Features Matter: Enhancing Few-Shot CLIP with Adaptive Prior Refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 2605–2615. [Google Scholar]
- Ng, A. AI Doesn’t Have to Be Too Complicated or Expensive for Your Business. Harvard Business Review. 2021. Available online: https://hbr.org/2021/07/ai-doesnt-have-to-be-too-complicated-or-expensive-for-your-business (accessed on 4 July 2024).
- Hamid, O.H. Data-Centric and Model-Centric AI: Twin Drivers of Compact and Robust Industry 4.0 Solutions. Appl. Sci. 2023, 13, 2753. [Google Scholar] [CrossRef]
- Irmak, G.; Saygılı, A. A novel approach for tomato leaf disease classification with deep convolutional neural networks. J. Agric. Sci. 2024, 30, 367–385. [Google Scholar] [CrossRef]
- Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
- Guo, P.; Liu, T.; Li, N. Design of automatic recognition of cucumber disease image. Inf. Technol. J. 2014, 13, 2129. [Google Scholar] [CrossRef]
- Zhang, S.; Wu, X.; You, Z.; Zhang, L. Leaf image based cucumber disease recognition using sparse representation classification. Comput. Electron. Agric. 2017, 134, 135–141. [Google Scholar] [CrossRef]
- Kaya, A.; Keceli, A.S.; Catal, C.; Yalic, H.Y.; Temucin, H.; Tekinerdogan, B. Analysis of transfer learning for deep neural network based plant classification models. Comput. Electron. Agric. 2019, 158, 20–29. [Google Scholar] [CrossRef]
- Bai, Y.; Hou, F.; Fan, X.; Lin, W.; Lu, J.; Zhou, J.; Fan, D.; Li, L. An interpretable high-accuracy method for rice disease detection based on multi-source data and transfer learning. Agriculture 2023, 13, 1–23. [Google Scholar]
- Li, Y.; Chao, X. Semi-supervised few-shot learning approach for plant diseases recognition. Plant Methods 2021, 17, 1–10. [Google Scholar] [CrossRef] [PubMed]
- Nuthalapati, S.V.; Tunga, A. Multi-domain few-shot learning and dataset for agricultural applications. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 10–17 October 2021. [Google Scholar]
- Li, X.; Wen, C.; Hu, Y.; Yuan, Z.; Zhu, X.X. Vision-language models in remote sensing: Current progress and future trends. IEEE Geosci. Remote Sens. Mag. 2024, 12, 32–66. [Google Scholar] [CrossRef]
- Bossard, L.; Guillaumin, M.; Van Gool, L. Food-101–mining discriminative components with random forests. In Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Part VI. Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 446–461. [Google Scholar]
- Kiela, D.; Firooz, H.; Mohan, A.; Goswami, V.; Singh, A.; Ringshia, P.; Testuggine, D. The hateful memes challenge: Detecting hate speech in multimodal memes. Adv. Neural Inf. Process. Syst. 2020, 33, 2611–2624. [Google Scholar]
- Zhou, K.; Yang, J.; Loy, C.C.; Liu, Z. Learning to prompt for vision-language models. Int. J. Comput. Vis. 2022, 130, 2337–2348. [Google Scholar] [CrossRef]
- Zhou, K.; Yang, J.; Loy, C.C.; Liu, Z. Conditional Prompt Learning for Vision-Language Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16816–16825. [Google Scholar]
- Yao, H.; Zhang, R.; Xu, C. Visual-language prompt tuning with knowledge-guided context optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6757–6767. [Google Scholar]
- Gao, P.; Geng, S.; Zhang, R.; Ma, T.; Fang, R.; Zhang, Y.; Li, H.; Qiao, Y. Clip-adapter: Better vision-language models with feature adapters. Int. J. Comput. Vis. 2023, 132, 581–595. [Google Scholar] [CrossRef]
- Yu, T.; Lu, Z.; Jin, X.; Chen, Z.; Wang, X. Task residual for tuning vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 10899–10909. [Google Scholar]
- Li, X.; Lian, D.; Lu, Z.; Bai, J.; Chen, Z.; Wang, X. GraphAdapter: Tuning Vision-Language Models with Dual Knowledge Graph. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 3–6 December 2023; pp. 13448–13466. [Google Scholar]
- Lu, Z.; Bai, J.; Li, X.; Xiao, Z.; Wang, X. Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models. In Proceedings of the Forty-first International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024. [Google Scholar]
- Lewis, K.M.; Mu, E.; Dalca, A.V.; Guttag, J. Gist: Generating image-specific text for fine-grained object classification. arXiv 2023, arXiv:2307.11315. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 29 July 2024).
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S. GPT-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Martins, A.; Astudillo, R. From softmax to sparsemax: A sparse model of attention and multi-label classification. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; PMLR: New York, NY, USA, 2016; pp. 1614–1623. [Google Scholar]
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489. [Google Scholar]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Li, K.; Wang, Y.; Zhang, J.; Gao, P.; Song, G.; Liu, Y.; Li, H.; Qiao, Y. Uniformer: Unifying convolution and self-attention for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12581–12600. [Google Scholar] [CrossRef]
- Lei, Z.; Zhang, G.; Wu, L.; Zhang, K.; Liang, R. A multi-level mesh mutual attention model for visual question answering. Data Sci. Eng. 2022, 7, 339–353. [Google Scholar] [CrossRef]
- Meinhardt, T.; Kirillov, A.; Leal-Taixe, L.; Feichtenhofer, C. Trackformer: Multi-object tracking with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8844–8854. [Google Scholar]
- Maniparambil, M.; Vorster, C.; Molloy, D.; Murphy, N.; McGuinness, K.; O’Connor, N.E. Enhancing CLIP with GPT-4: Harnessing visual descriptions as prompts. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 262–271. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744. [Google Scholar]
- Singh, D.; Jain, N.; Jain, P.; Kayal, P.; Kumawat, S.; Batra, N. PlantDoc: A dataset for visual plant disease detection. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Hyderabad, India, 5–7 January 2020; pp. 249–253. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Virtual Event, 3–7 May 2021. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Shen, S.; Li, L.H.; Tan, H.; Bansal, M.; Rohrbach, A.; Chang, K.-W.; Yao, Z.; Keutzer, K. How much can clip benefit vision-and-language tasks? arXiv 2021, arXiv:2107.06383. [Google Scholar]
Disease Category | Number of Images | Disease Category | Number of Images |
---|---|---|---|
apple black rot | 621 | strawberry leaf scorch | 1109 |
apple cedar apple rust | 275 | tomato bacterial spot | 2127 |
apple healthy | 1645 | tomato early blight | 1000 |
apple scab | 630 | tomato healthy | 1591 |
blueberry healthy | 1502 | tomato late blight | 1915 |
cherry healthy | 854 | tomato leaf mold | 952 |
cherry powdery mildew | 1052 | tomato mosaic virus | 373 |
corn cercospora leaf spot gray leaf spot | 513 | tomato septoria leaf spot | 1771 |
corn common rust | 1192 | tomato spider mites | 1676 |
corn healthy | 1162 | tomato target spot | 1404 |
corn northern blight | 985 | tomato yellow leaf curl virus | 3357 |
grape black rot | 1180 | coffee healthy | 282 |
grape black measles | 1383 | coffee red spider mite | 136 |
grape healthy | 423 | coffee rust | 282 |
grape leaf blight | 1076 | cotton diseased | 288 |
orange huanglongbing | 5507 | cotton healthy | 427 |
peach bacterial spot | 2297 | cucumber diseased | 227 |
peach healthy | 360 | cucumber healthy | 241 |
pepper bell bacterial spot | 997 | lemon diseased | 67 |
pepper bell healthy | 1491 | lemon healthy | 149 |
potato early blight | 1000 | mango diseased | 255 |
potato healthy | 152 | mango healthy | 159 |
potato late blight | 1000 | pomegranate diseased | 261 |
raspberry healthy | 371 | pomegranate healthy | 277 |
soybean healthy | 5090 | rice bacterial leaf blight | 40 |
squash powdery mildew | 1835 | rice brown spot | 40 |
strawberry healthy | 456 | rice leaf smut | 40 |
Few-Shot Setup | 1 | 2 | 4 | 8 | 16 |
---|---|---|---|---|---
Zero-shot CLIP [3]: 13.72 | |||||
Training-free | |||||
Tip-Adapter [6] | 25.42 | 38.31 | 53.95 | 67.54 | 73.46 |
APE [7] | 48.06 | 61.13 | 68.34 | 74.70 | 77.20 |
VLCD | 49.31 | 62.55 | 69.20 | 75.14 | 77.44 |
Training-required | |||||
CoOp [21] | 43.44 | 42.01 | 64.14 | 70.79 | 85.59 |
KgCoOp [23] | 25.49 | 27.52 | 32.26 | 25.49 | 55.24 |
CLIP-Adapter [24] | 19.38 | 20.76 | 20.76 | 33.53 | 52.86 |
Tip-Adapter-F [6] | 37.89 | 42.62 | 56.02 | 75.14 | 83.76 |
APE-T [7] | 53.43 | 62.18 | 71.38 | 79.68 | 85.99 |
VLCD-T | 58.16 | 66.07 | 72.32 | 80.95 | 87.31 |
Few-Shot Setup | 0 | 1 | 2 | 4 | 8 | 16 |
---|---|---|---|---|---|---
Training-free | ||||||
Random selection | 21.62 | 48.96 | 61.41 | 67.98 | 75.08 | 77.39 |
Cluster selection | 22.59 | 49.31 | 62.55 | 69.20 | 75.14 | 77.44 |
Training-required | ||||||
Random selection | - | 57.83 | 65.72 | 72.19 | 80.65 | 87.08 |
Cluster selection | - | 58.16 | 66.07 | 72.32 | 80.95 | 87.31 |
Setup | Average (Intra-Class Processing) | Cross-Attention (Intra-Class Processing) | Without SE (Inter-Class Processing) | With SE (Inter-Class Processing) | 0-shot | 1-shot | 2-shot | 4-shot | 8-shot | 16-shot
---|---|---|---|---|---|---|---|---|---|---
Training-free | ✓ | - | - | - | 20.80 | 48.06 | 61.13 | 68.34 | 74.70 | 77.20
Training-free | - | ✓ | - | - | 22.59 | 49.31 | 62.55 | 69.20 | 75.14 | 77.44
Training-required | ✓ | - | ✓ | - | - | 53.43 | 62.18 | 71.38 | 79.68 | 85.96
Training-required | - | ✓ | ✓ | - | - | 54.04 | 62.35 | 71.47 | 79.88 | 86.26
Training-required | ✓ | - | - | ✓ | - | 56.61 | 65.14 | 72.06 | 79.96 | 86.71
Training-required | - | ✓ | - | ✓ | - | 58.16 | 66.07 | 72.32 | 80.95 | 87.31
Few-Shot Setup | 1 | 2 | 4 | 8 | 16 |
---|---|---|---|---|---
r = 2 | 56.36 | 64.57 | 70.02 | 78.58 | 84.91 |
r = 4 | 56.89 | 65.03 | 71.08 | 79.33 | 85.18 |
r = 8 | 58.13 | 66.04 | 70.68 | 78.51 | 84.98 |
r = 16 | 55.56 | 65.20 | 69.97 | 79.90 | 84.91 |
r = 32 | 58.16 | 66.07 | 72.32 | 80.95 | 87.31 |
Epoch | SE Module | 1-shot | 2-shot | 4-shot | 8-shot | 16-shot
---|---|---|---|---|---|---
epoch = 10 | without SE | 41.44 | 55.40 | 59.46 | 70.70 | 78.30
epoch = 10 | with SE | 50.41 | 62.37 | 67.35 | 76.06 | 82.75
epoch = 10 | gain | +8.97 | +6.97 | +7.89 | +5.36 | +4.45
epoch = 20 | without SE | 52.49 | 60.76 | 68.73 | 76.06 | 82.49
epoch = 20 | with SE | 58.01 | 66.06 | 71.87 | 79.88 | 86.14
epoch = 20 | gain | +5.52 | +5.30 | +3.14 | +3.73 | +2.68
epoch = 30 | without SE | 54.04 | 62.35 | 71.07 | 79.68 | 86.26
epoch = 30 | with SE | 58.16 | 66.07 | 72.32 | 80.95 | 87.31
epoch = 30 | gain | +4.12 | +3.72 | +1.25 | +1.27 | +1.05
epoch = 40 | without SE | 54.48 | 62.35 | 71.35 | 80.36 | 86.76
epoch = 40 | with SE | 58.36 | 65.62 | 74.45 | 82.10 | 88.00
epoch = 40 | gain | +3.88 | +3.13 | +3.10 | +1.74 | +1.24
epoch = 50 | without SE | 54.44 | 63.00 | 72.39 | 81.28 | 87.60
epoch = 50 | with SE | 58.16 | 65.88 | 74.89 | 82.49 | 88.16
epoch = 50 | gain | +3.72 | +2.88 | +2.50 | +1.21 | +0.56
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).