RGB-D Heterogeneous Image Feature Fusion for YOLOfuse Apple Detection Model
Abstract
1. Introduction
1.1. Related Works
1.2. Highlights
2. Materials and Methods
2.1. Materials
2.1.1. Datasets
2.1.2. Image Enhancement
2.2. YOLOfuse Apple Detection Model
2.2.1. YOLOv5s Network Framework
2.2.2. Embedding the CSPDarknet53-Tiny in Backbone
2.2.3. Design Feature Integration Strategy
2.2.4. Embedded CA Attention Module in the Network
2.2.5. Soft-NMS Algorithm
3. Results
3.1. Experiments
3.2. Experimental Results
3.2.1. Training Results
3.2.2. Performance Results
3.2.3. Ablation Experiment of Soft-NMS
3.2.4. Ablation Experiment of Attention
4. Discussion
4.1. Results Discussion
4.2. Discussion Summary
- (1)
- On the public test set, YOLOfuse achieves higher detection accuracy but lower detection speed than the RGB-only baseline, indicating that RGB-D heterogeneous images supply the network with richer features than RGB images alone. This helps the model make more accurate predictions, but it also increases the computational burden of the whole vision detection system and reduces detection speed.
- (2)
- The ablation experiments comparing Soft-NMS with standard NMS demonstrate that Soft-NMS brings a 0.5% AP improvement on the apple fruit detection task, verifying that the Soft-NMS algorithm has a positive impact on detecting occluded and overlapping targets.
- (3)
- Ablation experiments with three attention modules, CA, SE, and CBAM, show that each brings some accuracy improvement on the apple fruit detection task, and the performance of each variant is analyzed. Weighing detection accuracy against detection speed, the two factors most important for picking robots, we conclude that the CA attention module is the best fit for this task.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liu, X.; Zhao, D.; Jia, W.; Ruan, C.; Ji, W. Fruits Segmentation Method Based on Superpixel Features for Apple Harvesting Robot. Trans. Chin. Soc. Agric. Mach. 2019, 50, 15–23. [Google Scholar]
- Ultralytics. YOLOv5. Available online: https://github.com/ultralytics/YOLOv5 (accessed on 1 March 2022).
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 1–11. [Google Scholar]
- Zhang, Z.Y.; Luo, M.Y.; Guo, S.X.; Li, S.; Zhang, Y. Cherry Fruit Detection Method in Natural Scene Based on Improved YOLO v5. Trans. Chin. Soc. Agric. Mach. 2022, 53, 232–240. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Yan, B.; Fan, P.; Wang, M.; Shi, S.; Lei, X.; Yang, F. Real-time Apple Picking Pattern Recognition for Picking Robot Based on Improved YOLOv5m. Trans. Chin. Soc. Agric. Mach. 2022, 53, 28–38+59. [Google Scholar]
- He, B.; Zhang, Y.B.; Gong, J.L.; Fu, G.; Zhao, Y.; Wu, R. Fast Recognition of Tomato Fruit in Greenhouse at Night Based on Improved YOLO v5. Trans. Chin. Soc. Agric. Mach. 2022, 53, 201–208. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
- Sun, F.G.; Wang, Y.L.; Lan, P.; Zhang, X.D.; Chen, X.D.; Wang, Z.J. Identification of apple fruit diseases using improved YOLOv5s and transfer learning. Trans. Chin. Soc. Agric. Mach. 2022, 38, 171–179. [Google Scholar]
- Huang, T.M.; Huang, H.Q.; Li, Z.; Shilei, L.; Xiuyun, X.; Qiufang, D.; Wei, W. Citrus fruit recognition method based on the improved model of YOLOv5. J. Huazhong Agric. Univ. 2022, 41, 170–177. [Google Scholar] [CrossRef]
- Lyu, S.; Li, R.; Zhao, Y.; Li, Z.; Fan, R.; Liu, S. Green citrus detection and counting in orchards based on YOLOv5-CS and AI edge system. Sensors 2022, 22, 576. [Google Scholar] [CrossRef]
- Xu, L.; Wang, Y.; Shi, X.; Tang, Z.; Chen, X.; Wang, Y.; Zou, Z.; Huang, P.; Liu, B.; Yang, N.; et al. Real-time and accurate detection of citrus in complex scenes based on HPL-YOLOv4. Comput. Electron. Agric. 2023, 205, 107590. [Google Scholar] [CrossRef]
- Jiang, Z.; Zhao, L.; Li, S.; Jia, Y. Real-time object detection method based on improved YOLOv4-tiny. arXiv 2020, arXiv:2011.04244. [Google Scholar]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 21–25 June 2021; pp. 13713–13722. [Google Scholar]
- Ferrer-Ferrer, M.; Ruiz-Hidalgo, J.; Gregorio, E.; Vilaplana, V.; Morros, J.R.; Gené-Mola, J. Simultaneous Fruit Detection and Size Estimation Using Multitask Deep Neural Networks. Available online: https://www.grap.udl.cat/en/publications/papple_rgb-d-size-dataset (accessed on 14 August 2022).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
- Hosang, J.; Benenson, R.; Schiele, B. Learning non-maximum suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4507–4515. [Google Scholar]
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS: Improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
| Environment | Configuration |
|---|---|
| Training and testing platform | ecloud |
| CPU | Intel(R) Xeon(R) Gold 5118 CPU @ 2.30 GHz |
| RAM | 376 GB |
| Cloud storage space | 50 GB |
| GPU | NVIDIA Tesla V100 16 GB |
| OS | Ubuntu 18.04.3 LTS |
| Virtual environment | Anaconda |
| NVIDIA GPU driver | 450.51.05 |
| Programming language | Python 3.7.6 |
| Deep learning framework | PyTorch 1.8 |
| CUDA version | 11.0 |
| cuDNN version | 7.6.5 |
| Models | Precision | Recall | F1 | AP | FPS | Param/M | Size (MB) |
|---|---|---|---|---|---|---|---|
| YOLOv3 | 94.7% | 87.1% | 90.8% | 91.8% | 45.676 | 61.524 | 235 |
| YOLOv4 | 91.8% | 90.0% | 90.9% | 91.7% | 36.018 | 63.938 | 245 |
| YOLOv5s | 95.0% | 87.4% | 91.0% | 93.6% | 54.645 | 7.022 | 26.9 |
| YOLOv5m | 94.7% | 90.7% | 92.7% | 94.4% | 41.827 | 20.871 | 79.9 |
| YOLOv5l | 92.7% | 91.5% | 92.1% | 95.7% | 33.311 | 46.138 | 176 |
| YOLOv5x | 94.0% | 90.7% | 92.3% | 96.1% | 28.602 | 86.217 | 329 |
| YOLOv4-Tiny | 93.1% | 85.3% | 89.0% | 91.9% | 154.357 | 5.874 | 22.4 |
| Faster RCNN | 57.0% | 92.7% | 70.6% | 92.0% | 17.599 | 28.275 | 109 |
| YOLOfuse | 95.4% | 89.1% | 92.1% | 94.2% | 51.761 | 10.701 | 41.1 |
| Model | Precision | Recall | F1 | AP |
|---|---|---|---|---|
| YOLOfuse with NMS | 95.2% | 88.3% | 91.6% | 93.7% |
| YOLOfuse with Soft-NMS | 95.4% | 89.1% | 92.1% | 94.2% |
| Models | Precision | Recall | F1 | AP | FPS (V100) | Param/M | Size (MB) |
|---|---|---|---|---|---|---|---|
| YOLOfuse SE | 94.9% | 88.2% | 91.4% | 93.6% | 52.482 | 10.685 | 41.0 |
| YOLOfuse CBAM | 95.6% | 88.6% | 92.0% | 94.3% | 48.273 | 10.783 | 41.1 |
| YOLOfuse CA | 95.4% | 89.1% | 92.1% | 94.2% | 51.761 | 10.701 | 41.1 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, L.; Hao, P. RGB-D Heterogeneous Image Feature Fusion for YOLOfuse Apple Detection Model. Agronomy 2023, 13, 3080. https://doi.org/10.3390/agronomy13123080