Cherry Tomato Detection for Harvesting Using Multimodal Perception and an Improved YOLOv7-Tiny Neural Network
Abstract
1. Introduction
- (1) Multimodal RGB-D images are combined with simple preprocessing to screen regions of interest (ROIs) for cherry tomato detection, improving efficiency (a rough ROI-screening sketch follows this list).
- (2) To better exploit depth information for cherry tomato detection, the normal vector angles of a point cloud are introduced and combined with the non-luminance color channels of the RGB image in Lab color space as the input to the neural network.
- (3) In addition to the multimodal image input, YOLOv7-tiny is improved in three aspects: the “Objectness” output layer is eliminated, a new “Classness” measure is defined for prediction boxes, and the non-maximum suppression (NMS) is improved with a hybrid method.
- (4) The proposed approach has been evaluated on a cherry tomato harvesting robot in a commercial greenhouse farm. It outperforms several state-of-the-art detection neural networks in precision, recall, and accuracy while running at 26 FPS on an Nvidia Jetson TX1, and cherry tomato picking based on the detection results shows promising potential for practical applications.
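The exact preprocessing pipeline is described in Section 2.2.2; as a rough illustration only (not the paper’s exact method), the Python sketch below screens ROIs by thresholding the a* channel in Lab space and cleaning the mask with morphological filtering [31]. The threshold, kernel size, and minimum area are illustrative assumptions.

```python
import cv2
import numpy as np

def screen_rois(bgr, a_min=150, min_area=100):
    """Rough ROI screening for red cherry tomatoes: threshold the a* channel
    in Lab space, clean the mask morphologically, and return bounding boxes
    of candidate regions. a_min and min_area are illustrative values only."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    a = lab[:, :, 1]                                  # a*: green (low) to red (high)
    mask = (a >= a_min).astype(np.uint8) * 255
    # Opening removes small speckles; closing fills holes inside fruit blobs
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Connected components become ROI candidates (x, y, w, h)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    rois = [tuple(stats[i, :4]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]
    return rois, mask
```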
2. Materials and Methods
2.1. Cherry Tomato Picking Robot
2.1.1. Main Hardware Modules
2.1.2. Main Software Modules
2.2. Multimodal Perception
2.2.1. Obstacle Avoidance
2.2.2. Cherry Tomato ROI Image Patches
2.2.3. Normal Vector Angles of a Point Cloud
- (1) The integral image of the Z data of the point cloud is constructed; its value at pixel (u, v) is the sum of all Z coordinate values in the rectangular area from (0, 0) to (u, v).
- (2) The horizontal and vertical tangent vectors are computed from box-averaged depth differences, obtained in constant time from the integral image [33].
- (3) The local point cloud normal vector, as shown in Figure 6a, is obtained from the cross product of the horizontal and vertical vectors (see the sketch after this list).
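The exact equations are given in the paper; the sketch below is a minimal Python illustration of integral-image normal estimation in the spirit of Holzer et al. [33]. The window radius, border handling, and explicit loops (kept for clarity; vectorize in practice) are assumptions, not the paper’s parameters.

```python
import numpy as np

def estimate_normals(Z, r=5):
    """Surface normals from an organized depth map Z (H x W) using an
    integral image of Z. Minimal sketch; r and borders are assumptions."""
    H, W = Z.shape
    # Integral image: I[u+1, v+1] = sum of Z over rows 0..u, cols 0..v
    I = np.zeros((H + 1, W + 1), dtype=np.float64)
    I[1:, 1:] = np.cumsum(np.cumsum(Z.astype(np.float64), axis=0), axis=1)

    def box_mean(r0, c0, r1, c1):
        # O(1) mean of Z over rows r0..r1-1, cols c0..c1-1
        s = I[r1, c1] - I[r0, c1] - I[r1, c0] + I[r0, c0]
        return s / ((r1 - r0) * (c1 - c0))

    N = np.zeros((H, W, 3))
    for u in range(2 * r, H - 2 * r):
        for v in range(2 * r, W - 2 * r):
            # Smoothed depth to the right/left and below/above the pixel
            z_right = box_mean(u - r, v + 1, u + r, v + 2 * r)
            z_left  = box_mean(u - r, v - 2 * r + 1, u + r, v)
            z_down  = box_mean(u + 1, v - r, u + 2 * r, v + r)
            z_up    = box_mean(u - 2 * r + 1, v - r, u, v + r)
            # Horizontal/vertical tangent vectors and their cross product
            vh = np.array([2.0 * r, 0.0, z_right - z_left])
            vv = np.array([0.0, 2.0 * r, z_down - z_up])
            n = np.cross(vh, vv)
            N[u, v] = n / (np.linalg.norm(n) + 1e-12)
    return N
```

The normal vector angle map fed to the network could then be derived as the angle between each normal and the camera axis, e.g., `np.arccos(np.clip(N[..., 2], -1, 1))`; this particular definition is an assumption.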
2.2.4. Depth Value in the Prediction Box
2.2.5. YOLOv7-Tiny-CTD: Improvement of YOLOv7-Tiny for Cherry Tomato Detection
- (1) Replacing the color image input with a multimodal image input. Most object detection models, including YOLOv7, take a color image as input. Some methods that use RGB-D input first feed the color image into the model to obtain preliminary results and then combine them with the depth map for further processing, which struggles to make full use of the depth information. Hybrid RGB-D DNNs have been proposed to better utilize both color and depth information [34,35,36], but they usually introduce significant complexity. For simplicity, we adjust the input layer of YOLOv7-tiny and add the depth map as a separate channel alongside the color channels in Lab space. As shown in Figure 8, the RGB image is replaced by a 4-channel image formed by adding the mask map described above (see the input-construction sketch after this list).
- (2) Eliminating the “Objectness” output layer of YOLOv7-tiny: since cherry tomatoes are the only detection target, the object inside every prediction box must be a cherry tomato.
- (3) Defining a new “Classness” measure for each prediction box.
- (4) Improving the NMS with a hybrid method that combines the strengths of existing variants (compared in the NMS table below).
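The exact channel layout follows Figure 8 of the paper; the sketch below shows one plausible construction of the 4-channel input, assuming the three Lab channels plus the depth-derived mask map as the fourth channel. The normalization and input size are assumptions.

```python
import cv2
import numpy as np

def make_multimodal_input(bgr, mask, size=(640, 640)):
    """Build a 4-channel network input: Lab color channels of the RGB image
    plus the depth-derived mask map as a fourth channel (cf. Figure 8).
    Normalization and input size here are illustrative assumptions."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32) / 255.0
    m = mask.astype(np.float32) / 255.0        # depth-derived mask map
    x = np.dstack([lab, m])                    # H x W x 4
    x = cv2.resize(x, size)
    return x.transpose(2, 0, 1)[None]          # 1 x 4 x H x W for the network
```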
2.2.6. Evaluation Metrics
2.3. Eye–Hand Calibration
2.4. End Effector Trajectory Planning and Cherry Tomato Picking
3. Experimental Results
3.1. Dataset
3.2. Model Training and Testing
3.3. YOLOv7-Tiny-CTD Compared with Existing Models
3.4. Cherry Tomato Robot Picking Results
4. Discussion
5. Conclusions
6. Patents
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Bogue, R. Fruit picking robots: Has their time come? Ind. Robot Int. J. Robot. Res. Appl. 2020, 47, 141–145.
2. Ceres, R.; Pons, J.L.; Jimenez, A.; Martin, J.; Calderon, L. Design and implementation of an aided fruit-harvesting robot (Agribot). Ind. Robot Int. J. 1998, 25, 337–346.
3. Muscato, G.; Prestifilippo, M.; Abbate, N.; Rizzuto, I. A prototype of an orange picking robot: Past history, the new robot and experimental results. Ind. Robot Int. J. 2005, 32, 128–138.
4. Scarfe, A.J.; Flemmer, R.C.; Bakker, H.; Flemmer, C.L. Development of an autonomous kiwifruit picking robot. In Proceedings of the 4th International Conference on Autonomous Robots and Agents, Wellington, New Zealand, 10–12 February 2009; pp. 380–384.
5. Hua, X.; Li, H.; Zeng, J.; Han, C.; Chen, T.; Tang, L.; Luo, Y. A review of target recognition technology for fruit picking robots: From digital image processing to deep learning. Appl. Sci. 2023, 13, 4160.
6. Pal, A.; Leite, A.C.; From, P.J. A novel end-to-end vision-based architecture for agricultural human–robot collaboration in fruit picking operations. Robot. Auton. Syst. 2024, 172, 104567.
7. Chen, B.; Gong, L.; Yu, C.; Du, X.; Chen, J.; Xie, S.; Le, X.; Li, Y.; Liu, C. Workspace decomposition based path planning for fruit-picking robot in complex greenhouse environment. Comput. Electron. Agric. 2023, 215, 108353.
8. Bulanon, D.M.; Kataoka, T.; Ota, Y.; Hiroma, T. AE—Automation and emerging technologies: A segmentation algorithm for the automatic recognition of Fuji apples at harvest. Biosyst. Eng. 2002, 83, 405–412.
9. Payne, A.B.; Walsh, K.B.; Subedi, P.; Jarvis, D. Estimation of mango crop yield using image analysis–segmentation method. Comput. Electron. Agric. 2013, 91, 57–64.
10. Senthilnath, J.; Dokania, A.; Kandukuri, M.; Ramesh, K.; Anand, G.; Omkar, S. Detection of tomatoes using spectral-spatial methods in remotely sensed RGB images captured by UAV. Biosyst. Eng. 2016, 146, 16–32.
11. Luo, L.; Tang, Y.; Zou, X.; Wang, C.; Zhang, P.; Feng, W. Robust grape cluster detection in a vineyard by combining the AdaBoost framework and multiple color components. Sensors 2016, 16, 2098.
12. Teixidó, M.; Font, D.; Pallejà, T.; Tresanchez, M.; Nogués, M.; Palacín, J. Definition of linear color models in the RGB vector color space to detect red peaches in orchard images taken under natural illumination. Sensors 2012, 12, 7701–7718.
13. Kurtulmus, F.; Lee, W.S.; Vardar, A. Immature peach detection in colour images acquired in natural illumination conditions using statistical classifiers and neural network. Precis. Agric. 2014, 15, 57–79.
14. Zhou, W.; Meng, F.; Li, K. A cherry tomato classification-picking robot based on the K-means algorithm. J. Phys. Conf. Ser. 2020, 1651, 012126.
15. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90.
16. Chen, J.; Wang, Z.; Wu, J.; Hu, Q.; Zhao, C.; Tan, C.; Teng, L.; Luo, T. An improved YOLOv3 based on dual path network for cherry tomatoes detection. J. Food Process Eng. 2021, 44, e13803.
17. Zheng, H.; Wang, G.; Li, X. YOLOX-Dense-CT: A detection algorithm for cherry tomatoes based on YOLOX and DenseNet. J. Food Meas. Charact. 2022, 16, 4788–4799.
18. Yan, Y.; Zhang, J.; Bi, Z.; Wang, P. Identification and location method of cherry tomato picking point based on Si-YOLO. In Proceedings of the IEEE 13th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Qinhuangdao, China, 11–14 July 2023; pp. 373–378.
19. Wang, C.; Wang, C.; Wang, L.; Wang, J.; Liao, J.; Li, Y.; Lan, Y. A lightweight cherry tomato maturity real-time detection algorithm based on improved YOLOv5n. Agronomy 2023, 13, 2106.
20. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
21. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
22. Zhou, H.; Li, Z.; Ning, C.; Tang, J. CAD: Scale invariant framework for real-time object detection. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 760–768.
23. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000.
24. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586.
25. Hosang, J.; Benenson, R.; Schiele, B. Learning non-maximum suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4507–4515.
26. Cui, B.; Zeng, Z.; Tian, Y. A YOLOv7 cherry tomato identification method that integrates depth information. In Proceedings of the Third International Conference on Optics and Image Processing (ICOIP 2023), Hangzhou, China, 14–16 April 2023; Volume 12747, pp. 312–320.
27. Gursoy, E.; Navarro, B.; Cosgun, A.; Kulić, D.; Cherubini, A. Towards vision-based dual arm robotic fruit harvesting. In Proceedings of the IEEE 19th International Conference on Automation Science and Engineering (CASE), Auckland, New Zealand, 26–30 August 2023; pp. 1–6.
28. Wang, H.; Cui, B.; Wen, X.; Jiang, Y.; Gao, C.; Tian, Y. Pallet detection and estimation with RGB-D salient feature learning. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; pp. 8914–8919.
29. Durmus, D. CIELAB color space boundaries under theoretical spectra and 99 test color samples. Color Res. Appl. 2020, 45, 796–802.
30. Tian, Y. Dynamic focus window selection using a statistical color model. Digit. Photogr. II 2006, 6069, 98–106.
31. Serra, J.; Vincent, L. An overview of morphological filtering. Circuits Syst. Signal Process. 1992, 11, 47–108.
32. Fabbri, R.; Costa, L.D.F.; Torelli, J.C.; Bruno, O.M. 2D Euclidean distance transform algorithms: A comparative survey. ACM Comput. Surv. 2008, 40, 1–44.
33. Holzer, S.; Rusu, R.B.; Dixon, M.; Gedikli, S.; Navab, N. Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 2684–2689.
34. Zia, S.; Yuksel, B.; Yuret, D.; Yemez, Y. RGB-D object recognition using deep convolutional neural networks. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 896–903.
35. Gené-Mola, J.; Vilaplana, V.; Rosell-Polo, J.R.; Morros, J.-R.; Ruiz-Hidalgo, J.; Gregorio, E. Multi-modal deep learning for Fuji apple detection using RGB-D cameras and their radiometric capabilities. Comput. Electron. Agric. 2019, 162, 689–698.
36. Eitel, A.; Springenberg, J.T.; Spinello, L.; Riedmiller, M.; Burgard, W. Multimodal deep learning for robust RGB-D object recognition. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 681–687.
37. Guan, L.; Wang, F.; Li, B.; Tang, R.; Wei, R.; Deng, H.; Tian, Y. Adaptive automotive chassis welding joint inspection using a cobot and a multi-modal vision sensor. In Proceedings of the International Conference on Digital Economy and Artificial Intelligence, Shenzhen, China, 24–26 June 2024; pp. 841–849.
38. Jiang, J.; Luo, X.; Luo, Q.; Qiao, L.; Li, M. An overview of hand-eye calibration. Int. J. Adv. Manuf. Technol. 2022, 119, 77–97.
39. Enebuse, I.; Foo, M.; Ibrahim, B.S.K.K.; Ahmed, H.; Supmak, F.; Eyobu, O.S. A comparative review of hand-eye calibration techniques for vision guided robots. IEEE Access 2021, 9, 113143–113155.
40. Zhou, Q.; Zhang, W.; Li, R.; Wang, J.; Zhen, S.; Niu, F. Improved YOLOv5-S object detection method for optical remote sensing images based on contextual transformer. J. Electron. Imaging 2022, 31, 043049.
41. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
42. Kim, S.; Na, S.; Kong, B.Y.; Choi, J.; Park, I.-C. Real-time SSDLite object detection on FPGA. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2021, 29, 1192–1205.
43. Fukaya, N.; Toyama, S.; Asfour, T.; Dillmann, R. Design of the TUAT/Karlsruhe humanoid hand. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Takamatsu, Japan, 30 October–5 November 2000; Volume 3, pp. 1754–1759.
44. Parlikar, S.; Jagannath, V. Application of pneumatic soft actuators as end-effectors on a humanoid torso playing percussion instrument. In Proceedings of the 8th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 17–19 March 2021; pp. 676–680.
45. Ramón, J.L.; Calvo, R.; Trujillo, A.; Pomares, J.; Felicetti, L. Trajectory optimization and control of a free-floating two-arm humanoid robot. J. Guid. Control Dyn. 2022, 45, 1661–1675.
NMS | Advantage | Disadvantage |
---|---|---|
Regular | Simple | Sequential processing; IoU threshold chosen empirically |
Weighted [22] | High precision | Sequential processing; low efficiency |
DIoU [23] | High recall; can be combined with other methods | Low efficiency; abnormal behavior when box centers coincide |
CIoU [24] | Overcomes the DIoU anomaly | Low efficiency; increased number of iterations |
Learning-based [25] | No hand-crafted settings | Complex implementation |
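The hybrid NMS used in YOLOv7-tiny-CTD combines ideas from the variants above; its exact formulation is given in the paper. As an illustration of one ingredient, the sketch below is a minimal DIoU-NMS [23] in Python; the suppression threshold is an assumption.

```python
import numpy as np

def diou_nms(boxes, scores, thresh=0.5):
    """Minimal DIoU-NMS [23]: like regular NMS, but the overlap term is
    penalized by the normalized center distance, so boxes whose centers are
    far apart are less likely to be suppressed. boxes are (x1, y1, x2, y2)."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Plain IoU between box i and the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-12)
        # Squared center distance over squared diagonal of the enclosing box
        cxi, cyi = (boxes[i, 0] + boxes[i, 2]) / 2, (boxes[i, 1] + boxes[i, 3]) / 2
        cxr, cyr = (boxes[rest, 0] + boxes[rest, 2]) / 2, (boxes[rest, 1] + boxes[rest, 3]) / 2
        ex1 = np.minimum(boxes[i, 0], boxes[rest, 0])
        ey1 = np.minimum(boxes[i, 1], boxes[rest, 1])
        ex2 = np.maximum(boxes[i, 2], boxes[rest, 2])
        ey2 = np.maximum(boxes[i, 3], boxes[rest, 3])
        d2 = (cxi - cxr) ** 2 + (cyi - cyr) ** 2
        c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-12
        # Keep boxes whose distance-penalized overlap is below the threshold
        order = rest[(iou - d2 / c2) <= thresh]
    return keep
```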
Environment | Parameters/Version |
---|---|
Operating system | Ubuntu 18.04 |
CPU | Intel i7-10700F |
Memory | 16 GB |
GPU | NVIDIA RTX 3070 |
CUDA | 11.2 |
cuDNN | 8.1.1 |
Python | 3.8 |
PaddlePaddle-GPU | 2.2.1 |
Model | Input | mAP | Recall | Accuracy |
---|---|---|---|---|
YOLOv5-s | RGB | 86.0% | 87.6% | 88.2% |
YOLOv5-s | RGB-D | 91.1% | 89.7% | 91.2% |
Faster R-CNN | RGB | 88.0% | 90.1% | 89.1% |
Faster R-CNN | RGB-D | 91.8% | 91.3% | 91.5% |
SSDLite | RGB | 87.3% | 87.9% | 88.7% |
SSDLite | RGB-D | 89.6% | 90.2% | 91.1% |
YOLOv7-tiny | RGB | 90.4% | 90.2% | 89.9% |
YOLOv7-tiny | RGB-D | 92.8% | 91.6% | 91.8% |
YOLOv7-tiny-CTD | RGB-D | 94.9% | 96.1% | 95.7% |
Trial | Fruits with Picking Action | Successes on 1st Attempt | Cumulative Successes After 2 Attempts | Success Rate with Two Attempts |
---|---|---|---|---|
1 | 77 | 56 | 62 | 81% |
2 | 78 | 57 | 63 | 81% |
3 | 78 | 60 | 65 | 83% |
4 | 72 | 61 | 63 | 87% |
5 | 71 | 61 | 61 | 86% |

The success rate is the cumulative number of successes after two attempts divided by the number of fruits on which a picking action was executed (e.g., Trial 1: 62/77 ≈ 81%).
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).