Grape Cluster Real-Time Detection in Complex Natural Scenes Based on YOLOv5s Deep Learning Network
Abstract
1. Introduction
2. Materials and Methods
2.1. Materials
2.2. Grape Cluster Detection Based on YOLOv5s Deep Learning Algorithm
2.2.1. YOLOv5s Network Frame
- Mosaic data augmentation: combines four randomly scaled and cropped training images into a single image, which enriches the detection backgrounds and makes the network more robust.
- Convolution, Batch Normalization and Leaky-ReLU (CBL): A module composed of a Convolution layer, a Batch Normalization layer and a Leaky-ReLU activation function.
- Res unit: By drawing on the residual structure in the Resnet network, the network can be built deeper.
- CSP structure: There are two CSP structures in the network. The CSP1_X structure is applied to the Backbone network, which can reduce the amount of computation while ensuring detection precision. The CSP2_X structure is applied to the Neck, which can strengthen the network feature fusion abilities [26].
- SPP: A spatial pyramid pooling layer, which mainly converts convolutional features of different sizes into pooled features of the same length [29].
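The SPP idea above can be illustrated with a minimal pure-Python sketch (not the authors' implementation): max-pooling a feature map over a fixed set of pyramid levels yields a vector whose length is constant regardless of the input's spatial size. The pyramid levels (1, 2, 4) here are an assumption chosen for illustration.

```python
# Spatial pyramid pooling (SPP) sketch: pooling at fixed pyramid levels
# converts feature maps of different spatial sizes into vectors of the
# same length. Pure-Python, single-channel, for illustration only.

def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map (list of lists) over an n x n grid of
    bins for each pyramid level n, and concatenate all bin maxima."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                r0, r1 = i * h // n, (i + 1) * h // n
                c0, c1 = j * w // n, (j + 1) * w // n
                bin_vals = [feature_map[r][c]
                            for r in range(r0, max(r1, r0 + 1))
                            for c in range(c0, max(c1, c0 + 1))]
                pooled.append(max(bin_vals))
    return pooled

# Feature maps of different sizes map to the same output length:
small = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
large = [[float(r * 13 + c) for c in range(13)] for r in range(13)]
assert len(spp(small)) == len(spp(large)) == 1 + 4 + 16  # 21 values each
```

In the real network the pooling is applied per channel with max-pooling kernels of several sizes; the fixed-length property shown here is what lets the detector accept convolutional features of varying spatial extent.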
2.2.2. Fine-Tuning and Training of YOLOv5s Grape Detection Model
- Organization of data. After the grape image data was downloaded, preprocessing operations such as cleaning, screening and resolution adjustment were performed. The grape clusters in all images were manually labeled, and the data set was divided into a training set (6059 images), a validation set (866 images) and a testing set (1732 images) in a ratio of 7:1:2.
- Fine-tuning of model parameters. To obtain a better grape cluster detection effect based on YOLOv5s, the model parameters were fine-tuned, mainly including network input size, batch size, classes, epoch, learning rate, “conf-thres” and “Iou-thres.” The parameters of YOLOv5s used in this work are shown in Table 2.
- Network training and testing. The YOLOv5s grape cluster detection model was trained by using the training set and the validation set. After the training, the weight file of the detection model was obtained, and the performance of the model was evaluated by using the test set. The network produced the location box and probability of the identified grape target.
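The 7:1:2 split described above can be sketched as follows. The file names, the random seed, and the integer rounding are illustrative assumptions; the paper's reported counts (6059/866/1732) imply a slightly different rounding of the same ratio.

```python
# Sketch of a 7:1:2 train/validation/test split of 8657 labeled images.
import random

def split_dataset(items, parts=(7, 1, 2), seed=0):
    """Shuffle items and split them into train/val/test subsets in the
    ratio parts[0]:parts[1]:parts[2] (out of sum(parts))."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n, total = len(items), sum(parts)
    n_train = n * parts[0] // total
    n_val = n * parts[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# 8657 labeled grape images in total, as in the data organization step:
images = [f"grape_{i:04d}.jpg" for i in range(8657)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 6059 865 1733 with this rounding
```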
2.2.3. Training Results of Grape Detection Model Based on YOLOv5s
2.3. Model Performance Evaluation
3. Results
3.1. Grape Cluster Target Detection Results and Analysis
3.2. Comparison of Different Target Detection Algorithms
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Torres-Sánchez, J.; Mesas-Carrascosa, F.J.; Santesteban, L.-G.; Jiménez-Brenes, F.M.; Oneka, O.; Villa-Llop, A.; Loidi, M.; López-Granados, F. Grape cluster detection using UAV photogrammetric point clouds as a low-cost tool for yield forecasting in vineyards. Sensors 2021, 21, 3083. [Google Scholar] [CrossRef]
- Gennaro, S.F.D.; Toscano, P.; Cinat, P.; Berton, A.; Matese, A. A low-cost and unsupervised image recognition methodology for yield estimation in a vineyard. Front. Plant Sci. 2019, 10, 559. [Google Scholar] [CrossRef] [PubMed]
- Marani, R.; Milella, A.; Petitti, A.; Reina, G. Deep neural networks for grape bunch segmentation in natural images from a consumer-grade camera. Precis. Agric. 2021, 22, 387–413. [Google Scholar] [CrossRef]
- Liu, S.; Whitty, M. Automatic grape bunch detection in vineyards with an SVM classifier. J. Appl. Log. 2015, 13, 643–653. [Google Scholar] [CrossRef]
- Liu, S.; Cossell, S.; Tang, J.; Dunn, G.; Whitty, M. A computer vision system for early stage grape yield estimation based on shoot detection. Comput. Electron. Agric. 2017, 137, 88–101. [Google Scholar] [CrossRef]
- Cecotti, H.; Rivera, A.; Farhadloo, M.; Pedroza, M.A. Grape detection with convolutional neural networks. Expert Syst. Appl. 2020, 159, 113588. [Google Scholar] [CrossRef]
- Ghiani, L.; Sassu, A.; Palumbo, F.; Mercenaro, L.; Gambella, F. In-Field automatic detection of grape bunches under a totally uncontrolled environment. Sensors 2021, 21, 3908. [Google Scholar] [CrossRef]
- Aquino, A.; Diago, M.P.; Millán, B.; Tardáguila, J. A new methodology for estimating the grapevine-berry number per cluster using image analysis. Biosyst. Eng. 2017, 156, 80–95. [Google Scholar] [CrossRef]
- Nuske, S.; Wilshusen, K.; Achar, S.; Yoder, L.; Narasimhan, S.; Singh, S. Automated visual yield estimation in vineyards. J. Field Robot. 2014, 31, 837–860. [Google Scholar] [CrossRef]
- Badeka, E.; Kalabokas, T.; Tziridis, K.; Nicolaou, A.; Vrochidou, E.; Mavridou, E.; Papakostas, G.A.; Pachidis, T. Grapes Visual Segmentation for Harvesting Robots Using Local Texture Descriptors. Comput. Vis. Syst. 2019, 11754, 98–109. [Google Scholar] [CrossRef]
- Luo, L.; Tang, Y.; Lu, Q.; Chen, X.; Zhang, P.; Zou, X. A vision methodology for harvesting robot to detect cutting points on peduncles of double overlapping grape clusters in a vineyard. Comput. Ind. 2018, 99, 130–139. [Google Scholar] [CrossRef]
- Pérez-Zavala, R.; Torres-Torriti, M.; Cheein, F.A.; Troni, G. A pattern recognition strategy for visual grape bunch detection in vineyards. Comput. Electron. Agric. 2018, 151, 136–149. [Google Scholar] [CrossRef]
- Santos, T.T.; de Souza, L.L.; dos Santos, A.A.; Avila, S. Grape detection, segmentation, and tracking using deep neural networks and three-dimensional association. Comput. Electron. Agric. 2020, 170, 105247. [Google Scholar] [CrossRef]
- Milella, A.; Marani, R.; Petitti, A.; Reina, G. In-field high throughput grapevine phenotyping with a consumer-grade depth camera. Comput. Electron. Agric. 2019, 156, 293–306. [Google Scholar] [CrossRef]
- Grimm, J.; Herzog, K.; Rist, F.; Kicherer, A.; Töpfer, R.; Steinhage, V. An adaptable approach to automated visual detection of plant organs with applications in grapevine breeding. Biosyst. Eng. 2019, 183, 170–183. [Google Scholar] [CrossRef]
- Zabawa, L.; Kicherer, A.; Klingbeil, L.; Töpferc, R.; Kuhlmanna, H.; Roscher, R. Counting of grapevine berries in images via semantic segmentation using convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2020, 164, 73–83. [Google Scholar] [CrossRef]
- Aguiar, A.S.; Magalhães, S.A.; dos Santos, F.N.; Castro, L.; Pinho, T.; Valente, J.; Martins, R.; Boaventura-Cunha, J. Grape bunch detection at different growth stages using deep learning quantized models. Agronomy 2021, 11, 1890. [Google Scholar] [CrossRef]
- Yin, W.; Wen, H.; Ning, Z.; Ye, J.; Dong, Z.; Luo, L. Fruit detection and pose estimation for grape cluster–harvesting robot using binocular imagery based on deep neural networks. Front. Robot. AI 2021, 8, 626989. [Google Scholar] [CrossRef]
- Yan, B.; Fan, P.; Lei, X.; Liu, Z.; Yang, F. A Real-Time Apple Targets Detection Method for Picking Robot Based on Improved YOLOv5. Remote Sens. 2021, 13, 1619. [Google Scholar] [CrossRef]
- Bargoti, S.; Underwood, J. Deep fruit detection in orchards. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3626–3633. [Google Scholar] [CrossRef]
- Yao, J.; Qi, J.; Zhang, J.; Shao, H.; Yang, J.; Li, X. A Real-Time Detection Algorithm for Kiwifruit Defects Based on YOLOv5. Electronics 2021, 10, 1711. [Google Scholar] [CrossRef]
- Tzutalin. LabelImg. Available online: https://github.com/tzutalin/labelImg (accessed on 15 June 2022).
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Ultralytics. Yolov5. Available online: https://github.com/ultralytics/yolov5 (accessed on 15 June 2022).
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. Scaled-YOLOv4: Scaling Cross Stage Partial Network. arXiv 2020, arXiv:2011.08036. [Google Scholar]
Grape Variety | Number of Original Images | Number of Images after Amplification | Interfering Factors
---|---|---|---
Kyoho | 128 | 1408 | scene differences, image size, quality differences, shooting techniques, occlusion and overlap, light changes
Summer Black | 119 | 1309 | scene differences, image size, quality differences, shooting techniques, occlusion and overlap, light changes
Cabernet Sauvignon | 108 | 1188 | scene differences, image size, quality differences, shooting techniques, occlusion and overlap, light changes
Midknight Beauty | 100 | 1100 | scene differences, image size, quality differences, shooting techniques, occlusion and overlap, light changes
Manicure Finger | 98 | 1078 | scene differences, image size, quality differences, shooting techniques, occlusion and overlap, light changes
Fujiminori | 120 | 1320 | scene differences, image size, quality differences, shooting techniques, occlusion and overlap, light changes
Syrah | 114 | 1254 | scene differences, image size, quality differences, shooting techniques, occlusion and overlap, light changes
Parameters | Values |
---|---
Input size | 640 × 640 |
Batch_size | 8 |
Classes | 1 |
Epoch | 300 |
Learning rate | 1.0 × 10−2 |
Conf-thres | 0.001 |
Iou-thres | 0.6 |
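The “Iou-thres” parameter in the table is the intersection-over-union (IoU) threshold used when suppressing overlapping detection boxes. A minimal sketch of IoU for two axis-aligned boxes in (x1, y1, x2, y2) form, provided for illustration rather than as the authors' code:

```python
# Intersection over union of two boxes given as (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # overlap's top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # overlap's bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.142857
```

During non-maximum suppression, a lower-scoring box whose IoU with a kept box exceeds the threshold (0.6 here) is discarded as a duplicate detection.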
Evaluation Indicator | Precision (%) | Recall (%) | mAP (%) | F1 (%) | Detection Speed (fps) | Size (MB) |
---|---|---|---|---|---|---
Results | 99.40 | 99.40 | 99.40 | 99.40 | 344.83 | 13.67 |
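The indicators in the table relate as follows: precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 is the harmonic mean of precision and recall. The sketch below illustrates the arithmetic; the TP/FP/FN counts are hypothetical and chosen only to reproduce values near those in the table, not the paper's raw detection counts.

```python
# Precision, recall and F1 from true-positive, false-positive and
# false-negative counts (counts below are illustrative).
def precision_recall_f1(tp, fp, fn):
    p = tp / (tp + fp)            # fraction of detections that are correct
    r = tp / (tp + fn)            # fraction of true clusters that are found
    f1 = 2 * p * r / (p + r)      # harmonic mean of precision and recall
    return p, r, f1

p, r, f1 = precision_recall_f1(tp=994, fp=6, fn=6)
print(round(p * 100, 2), round(r * 100, 2), round(f1 * 100, 2))  # 99.4 99.4 99.4
```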
Network Model | Precision (%) | Recall (%)
---|---|---
Mask R-CNN | 78.80 | 75.90
YOLOv2 | 55.90 | 45.50
YOLOv3 | 58.70 | 38.90
YOLOv5s (ours) | 99.40 | 99.40
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, C.; Ding, H.; Shi, Q.; Wang, Y. Grape Cluster Real-Time Detection in Complex Natural Scenes Based on YOLOv5s Deep Learning Network. Agriculture 2022, 12, 1242. https://doi.org/10.3390/agriculture12081242