An Improved YOLOv5-Based Underwater Object-Detection Framework
Abstract
1. Introduction
- The backbone of RTMDet [32] is introduced into the YOLOv5 object-detection network. Its core CSPLayer block uses 5 × 5 large-kernel convolutions, which enlarge the effective receptive field and improve the accuracy and robustness of the overall detector (a minimal sketch of such a block is given after this list).
- Inspired by the BoT block in BoTNet, a new BoT3 module is introduced into the neck of the YOLOv5 framework. Its multi-head self-attention (MHSA) strengthens feature extraction by capturing global and contextual information, helping the network identify and localize objects accurately (see the MHSA/BoT3 sketch after this list).
- Finally, the improved YOLOv5 architecture was evaluated on the URPC2019 and URPC2020 datasets with an MLLE-based data-augmentation strategy named union dataset augmentation (UDA) (see the UDA sketch after this list). With UDA, mAP@0.5 reached 79.8% on URPC2019 and 79.4% on URPC2020, surpassing even YOLOv7 and YOLOv8 and demonstrating the effectiveness of the proposed improvements.
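The snippet below is a minimal PyTorch sketch of a CSPLayer-style block in the spirit of the RTMDet/CSPNeXt design referenced above, where a 5 × 5 depthwise convolution enlarges the receptive field. Class names (ConvBNAct, CSPNeXtBlock, CSPLayer) and hyperparameters are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of a CSPLayer-style block with a 5x5 depthwise convolution (RTMDet/CSPNeXt spirit).
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    """Conv -> BatchNorm -> SiLU, the basic unit used throughout YOLOv5-style models."""
    def __init__(self, c_in, c_out, k=1, s=1, groups=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, groups=groups, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class CSPNeXtBlock(nn.Module):
    """Residual unit: 3x3 conv followed by a 5x5 depthwise conv for a larger receptive field."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = ConvBNAct(channels, channels, k=3)
        self.conv2 = ConvBNAct(channels, channels, k=5, groups=channels)  # 5x5 depthwise

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

class CSPLayer(nn.Module):
    """Cross-Stage Partial layer: split channels, run blocks on one branch, then fuse."""
    def __init__(self, c_in, c_out, n_blocks=1):
        super().__init__()
        c_mid = c_out // 2
        self.main = ConvBNAct(c_in, c_mid, k=1)
        self.shortcut = ConvBNAct(c_in, c_mid, k=1)
        self.blocks = nn.Sequential(*[CSPNeXtBlock(c_mid) for _ in range(n_blocks)])
        self.fuse = ConvBNAct(2 * c_mid, c_out, k=1)

    def forward(self, x):
        return self.fuse(torch.cat([self.blocks(self.main(x)), self.shortcut(x)], dim=1))

if __name__ == "__main__":
    # Example: a stride-8 feature map (80x80) with 128 channels keeps its shape.
    layer = CSPLayer(128, 128, n_blocks=3)
    print(layer(torch.randn(1, 128, 80, 80)).shape)  # torch.Size([1, 128, 80, 80])
```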
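The next sketch illustrates the BoT3 idea named above: a C3-like module whose inner bottleneck replaces the 3 × 3 convolution with multi-head self-attention over the spatial positions of the feature map. For simplicity it uses a learned absolute positional embedding instead of BoTNet's relative position encoding; all module names and shapes are assumptions for illustration.

```python
# Sketch of a BoT3-style block: C3 structure with an MHSA bottleneck instead of a 3x3 conv.
import torch
import torch.nn as nn

class MHSA(nn.Module):
    """Multi-head self-attention over an HxW feature map with a learned positional embedding."""
    def __init__(self, channels, height, width, num_heads=4):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, height * width, channels))
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2) + self.pos   # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)               # every position attends to all others
        return out.transpose(1, 2).reshape(b, c, h, w)

class BoT3(nn.Module):
    """1x1 reduce -> MHSA bottleneck(s) -> concat with a 1x1 shortcut -> 1x1 fuse."""
    def __init__(self, c_in, c_out, height, width, n=1):
        super().__init__()
        c_mid = c_out // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1)
        self.cv2 = nn.Conv2d(c_in, c_mid, 1)
        self.m = nn.Sequential(*[MHSA(c_mid, height, width) for _ in range(n)])
        self.cv3 = nn.Conv2d(2 * c_mid, c_out, 1)

    def forward(self, x):
        return self.cv3(torch.cat([self.m(self.cv1(x)), self.cv2(x)], dim=1))

if __name__ == "__main__":
    # Example: applied at the lowest-resolution neck level (20x20 at stride 32),
    # where the quadratic cost of self-attention stays affordable.
    block = BoT3(512, 512, height=20, width=20, n=1)
    print(block(torch.randn(1, 512, 20, 20)).shape)  # torch.Size([1, 512, 20, 20])
```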
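One plausible reading of union dataset augmentation (UDA) is that the training set becomes the union of the original images and their MLLE-enhanced copies [19], each reusing the original YOLO-format labels. The sketch below follows that reading; `enhance_mlle` is a stand-in (plain CLAHE on the L channel), not the actual MLLE algorithm, and the directory layout and file names are assumptions.

```python
# Sketch of union dataset augmentation: original images plus enhanced copies, same labels.
from pathlib import Path
import shutil
import cv2

def enhance_mlle(image):
    """Stand-in for MLLE [19]: CLAHE on the L channel only, NOT the real MLLE algorithm."""
    lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    return cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

def build_union_dataset(src_images: Path, src_labels: Path, dst_images: Path, dst_labels: Path):
    dst_images.mkdir(parents=True, exist_ok=True)
    dst_labels.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(src_images.glob("*.jpg")):
        label_path = src_labels / (img_path.stem + ".txt")
        if not label_path.exists():
            continue
        # 1. keep the original image and its YOLO-format label
        shutil.copy(img_path, dst_images / img_path.name)
        shutil.copy(label_path, dst_labels / label_path.name)
        # 2. add an enhanced copy; boxes are unchanged, so the label file is reused
        enhanced = enhance_mlle(cv2.imread(str(img_path)))
        cv2.imwrite(str(dst_images / f"{img_path.stem}_mlle.jpg"), enhanced)
        shutil.copy(label_path, dst_labels / f"{img_path.stem}_mlle.txt")
```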
2. Related Work
2.1. Object Detection
2.2. A Review of YOLOv5
2.3. Transformer
2.4. Data Augmentation
3. Method
3.1. Overall Structure
3.2. Improved YOLOv5
3.2.1. Backbone Network
3.2.2. Neck Network with Bottleneck Transformer
3.3. Dataset Augmentation
4. Experimental Details
4.1. Datasets
4.1.1. URPC2019
4.1.2. URPC2020
4.2. Evaluation Indication
4.3. Experiment Settings
5. Experiment
5.1. Experimental Results Obtained with Improved YOLOv5 on URPC2019
5.2. Ablation Experiment
5.3. Comparative Experiment
6. Future Work
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
MHSA | Multi-Head Self-Attention |
MLLE | Minimal Color Loss and Locally Adaptive Contrast Enhancement |
UDA | Union Dataset Augmentation |
TPH | Transformer Prediction Heads |
BoT | Bottleneck Transformer |
URPC2019 | Underwater Robot Picking Contest 2019 |
mAP | mean Average Precision |
SOTA | state-of-the-art |
CSPLayer | Cross-Stage Partial Layer |
References
- Lee, M.F.R.; Chen, Y.C. Artificial Intelligence Based Object Detection and Tracking for a Small Underwater Robot. Processes 2023, 11, 312. [Google Scholar] [CrossRef]
- Song, P.; Li, P.; Dai, L.; Wang, T.; Chen, Z. Boosting R-CNN: Reweighting R-CNN samples by RPN’s error for underwater object detection. Neurocomputing 2023, 530, 150–164. [Google Scholar] [CrossRef]
- Javaid, M.; Maqsood, M.; Aadil, F.; Safdar, J.; Kim, Y. An Efficient Method for Underwater Video Summarization and Object Detection Using YoLoV3. Intell. Autom. Soft Comput. 2023, 35, 1295–1310. [Google Scholar] [CrossRef]
- Li, M. Deep-learning-based research on detection algorithms for marine fish. In Proceedings of the Third International Conference on Computer Vision and Data Mining (ICCVDM 2022), Changchun, China, 21–23 July 2023; Volume 12511, pp. 635–639. [Google Scholar]
- Haider, A.; Arsalan, M.; Nam, S.H.; Sultan, H.; Park, K.R. Computer-aided Fish Assessment in an Underwater Marine Environment Using Parallel and Progressive Spatial Information Fusion. J. King Saud-Univ.-Comput. Inf. Sci. 2023, 35, 211–226. [Google Scholar] [CrossRef]
- Hung, K.C.; Lin, S.F. An Adaptive Dynamic Multi-Template Correlation Filter for Robust Object Tracking. Appl. Sci. 2022, 12, 10221. [Google Scholar] [CrossRef]
- Qureshi, S.A.; Hussain, L.; Chaudhary, Q.u.a.; Abbas, S.R.; Khan, R.J.; Ali, A.; Al-Fuqaha, A. Kalman filtering and bipartite matching based super-chained tracker model for online multi object tracking in video sequences. Appl. Sci. 2022, 12, 9538. [Google Scholar] [CrossRef]
- Majstorović, I.; Ahac, M.; Madejski, J.; Lakušić, S. Influence of the Analytical Segment Length on the Tram Track Quality Assessment. Appl. Sci. 2022, 12, 10036. [Google Scholar] [CrossRef]
- Peng, L.; Zhu, C.; Bian, L. U-shape transformer for underwater image enhancement. In Proceedings of the Computer Vision—ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; Part II, pp. 290–307. [Google Scholar]
- Zhou, J.; Sun, J.; Zhang, W.; Lin, Z. Multi-view underwater image enhancement method via embedded fusion mechanism. Eng. Appl. Artif. Intell. 2023, 121, 105946. [Google Scholar] [CrossRef]
- Jiang, Q.; Zhang, Y.; Bao, F.; Zhao, X.; Zhang, C.; Liu, P. Two-step domain adaptation for underwater image enhancement. Pattern Recognit. 2022, 122, 108324. [Google Scholar] [CrossRef]
- Shang, Y.; Guo, Y.; Tang, J. Spectroscopy and chromaticity characterization of yellow to light-blue iron-containing beryl. Sci. Rep. 2022, 12, 10765. [Google Scholar] [CrossRef] [PubMed]
- Ali-Bik, M.W.; Sadek, M.F.; Hassan, S.M. Basement rocks around the eastern sector of Baranis-Aswan road, Egypt: Remote sensing data analysis and petrology. Egypt. J. Remote Sens. Space Sci. 2022, 25, 113–124. [Google Scholar] [CrossRef]
- Wang, H.; Sun, S.; Ren, P. Meta underwater camera: A smart protocol for underwater image enhancement. ISPRS J. Photogramm. Remote Sens. 2023, 195, 462–481. [Google Scholar] [CrossRef]
- Zhou, J.; Pang, L.; Zhang, D.; Zhang, W. Underwater Image Enhancement Method via Multi-Interval Subhistogram Perspective Equalization. IEEE J. Ocean. Eng. 2023, 1–15. [Google Scholar] [CrossRef]
- Jebadass, J.R.; Balasubramaniam, P. Low contrast enhancement technique for color images using interval-valued intuitionistic fuzzy sets with contrast limited adaptive histogram equalization. Soft Comput. 2022, 26, 4949–4960. [Google Scholar] [CrossRef]
- Zhang, W.; Wang, Y.; Li, C. Underwater image enhancement by attenuated color channel correction and detail preserved contrast enhancement. IEEE J. Ocean. Eng. 2022, 47, 718–735. [Google Scholar] [CrossRef]
- Wang, J. Research on Underwater Image Semantic Segmentation Method Based on SegNet; Springer: Berlin/Heidelberg, Germany, 2023. [Google Scholar]
- Zhang, W.; Zhuang, P.; Sun, H.H.; Li, G.; Kwong, S.; Li, C. Underwater image enhancement via minimal color loss and locally adaptive contrast enhancement. IEEE Trans. Image Process. 2022, 31, 3997–4010. [Google Scholar] [CrossRef]
- Xu, S.; Zhang, M.; Song, W.; Mei, H.; He, Q.; Liotta, A. A Systematic Review and Analysis of Deep Learning-based Underwater Object Detection. Neurocomputing 2023, 527, 204–232. [Google Scholar] [CrossRef]
- Mehranian, A.; Wollenweber, S.D.; Walker, M.D.; Bradley, K.M.; Fielding, P.A.; Su, K.H.; Johnsen, R.; Kotasidis, F.; Jansen, F.P.; McGowan, D.R. Image enhancement of whole-body oncology [18F]-FDG PET scans using deep neural networks to reduce noise. Eur. J. Nucl. Med. Mol. Imaging 2022, 49, 539–549. [Google Scholar] [CrossRef] [PubMed]
- Azhar, A.S.B.M.; Harun, N.H.B.; Yusoff, N.B.; Hassan, M.G.B.; Chu, K.B. Image Enhancement on Underwater Images for Protozoan White Spot Fish Disease Detection. In Proceedings of the 2022 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 18–20 May 2022; pp. 1–4. [Google Scholar]
- Yang, G.; Tian, Z.; Bi, Z.; Cui, Z.; Liu, Q. Adjacent Frame Difference with Dynamic Threshold Method in Underwater Flash Imaging LiDAR. Electronics 2022, 11, 2547. [Google Scholar] [CrossRef]
- Zhou, J.; Yao, J.; Zhang, W.; Zhang, D. Multi-scale retinex-based adaptive gray-scale transformation method for underwater image enhancement. Multimed. Tools Appl. 2022, 81, 1811–1831. [Google Scholar] [CrossRef]
- Tang, C.; von Lukas, U.F.; Vahl, M.; Wang, S.; Wang, Y.; Tan, M. Efficient underwater image and video enhancement based on Retinex. Signal, Image Video Process. 2019, 13, 1011–1018. [Google Scholar] [CrossRef]
- Du, Y.; Yuan, C.; Li, B.; Zhao, L.; Li, Y.; Hu, W. Interaction-aware spatio-temporal pyramid attention networks for action classification. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 373–389. [Google Scholar]
- Srinivas, A.; Lin, T.Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16519–16529. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Peng, W.Y.; Peng, Y.T.; Lien, W.C.; Chen, C.S. Unveiling of How Image Restoration Contributes to Underwater Object Detection. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), Penghu, Taiwan, 16–18 June 2021; pp. 1–2. [Google Scholar]
- Chen, S.; Wu, Y.; Liu, S.; Yang, Y.; Wan, X.; Yang, X.; Zhang, K.; Wang, B.; Yan, X. Development of Electromagnetic Current Meter for Marine Environment. J. Mar. Sci. Eng. 2023, 11, 206. [Google Scholar] [CrossRef]
- Blasiak, R.; Jouffray, J.B.; Amon, D.J.; Claudet, J.; Dunshirn, P.; Søgaard Jørgensen, P.; Pranindita, A.; Wabnitz, C.C.; Zhivkoplias, E.; Österblom, H. Making marine biotechnology work for people and nature. Nat. Ecol. Evol. 2023, 1–4. [Google Scholar] [CrossRef] [PubMed]
- Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDet: An Empirical Study of Designing Real-Time Object Detectors. arXiv 2022, arXiv:2212.07784. [Google Scholar]
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1627–1645. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Jocher, G. YOLOv5 by Ultralytics. 2022. Available online: https://github.com/ultralytics/yolov5 (accessed on 22 December 2022).
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Zhang, X.; Fang, X.; Pan, M.; Yuan, L.; Zhang, Y.; Yuan, M.; Lv, S.; Yu, H. A marine organism detection framework based on the joint optimization of image enhancement and object detection. Sensors 2021, 21, 7205. [Google Scholar] [CrossRef]
- Han, F.; Yao, J.; Zhu, H.; Wang, C. Underwater image processing and object detection based on deep CNN method. J. Sens. 2020, 2020, 6707328. [Google Scholar] [CrossRef]
- Liu, H.; Song, P.; Ding, R. Towards domain generalization in underwater object detection. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 25–28 October 2020; pp. 1971–1975. [Google Scholar]
- Li, L.; Wang, Z.; Zhang, T. GBH-YOLOv5: Ghost convolution with BottleneckCSP and tiny target prediction head incorporating YOLOv5 for PV panel defect detection. Electronics 2023, 12, 561. [Google Scholar] [CrossRef]
- Wen, G.; Li, S.; Liu, F.; Luo, X.; Er, M.J.; Mahmud, M.; Wu, T. YOLOv5s-CA: A Modified YOLOv5s Network with Coordinate Attention for Underwater Target Detection. Sensors 2023, 23, 3367. [Google Scholar] [CrossRef]
- Tian, Z.; Huang, J.; Yang, Y.; Nie, W. KCFS-YOLOv5: A High-Precision Detection Method for Object Detection in Aerial Remote Sensing Images. Appl. Sci. 2023, 13, 649. [Google Scholar] [CrossRef]
- Yu, H.; Li, X.; Feng, Y.; Han, S. Multiple attentional path aggregation network for marine object detection. Appl. Intell. 2022, 53, 2434–2451. [Google Scholar] [CrossRef]
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
- Sethi, R.; Sreedevi, I.; Verma, O.P.; Jain, V. An optimal underwater image enhancement based on fuzzy gray world algorithm and bacterial foraging algorithm. In Proceedings of the 2015 Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Patna, India, 16–19 December 2015; pp. 1–4. [Google Scholar]
- Reza, A.M. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. J. VLSI Signal Process. Syst. Signal Image Video Technol. 2004, 38, 35–44. [Google Scholar] [CrossRef]
- Weng, C.C.; Chen, H.; Fuh, C.S. A novel automatic white balance method for digital still cameras. In Proceedings of the 2005 IEEE International Symposium on Circuits and Systems (ISCAS), Kobe, Japan, 23–26 May 2005; pp. 3801–3804. [Google Scholar]
- Lee, S. An efficient content-based image enhancement in the compressed domain using retinex theory. IEEE Trans. Circuits Syst. Video Technol. 2007, 17, 199–213. [Google Scholar] [CrossRef]
- Parihar, A.S.; Singh, K. A study on Retinex based method for image enhancement. In Proceedings of the 2018 2nd International Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 19–20 January 2018; pp. 619–624. [Google Scholar]
- Fu, X.; Zhuang, P.; Huang, Y.; Liao, Y.; Zhang, X.P.; Ding, X. A retinex-based enhancing approach for single underwater image. In Proceedings of the 2014 IEEE international conference on image processing (ICIP), Paris, France, 27–30 October 2014; pp. 4572–4576. [Google Scholar]
- Zhu, D.; Liu, Z.; Zhang, Y. Underwater image enhancement based on colour correction and fusion. IET Image Process. 2021, 15, 2591–2603. [Google Scholar] [CrossRef]
- Peng, Y.T.; Cosman, P.C. Underwater image restoration based on image blurriness and light absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594. [Google Scholar] [CrossRef]
- Yang, S.; Chen, Z.; Feng, Z.; Ma, X. Underwater image enhancement using scene depth-based adaptive background light estimation and dark channel prior algorithms. IEEE Access 2019, 7, 165318–165327. [Google Scholar] [CrossRef]
- Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31 × 31: Revisiting large kernel design in CNNs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11963–11975. [Google Scholar]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
- Xie, C.; Wu, Y.; Maaten, L.v.d.; Yuille, A.L.; He, K. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 501–509. [Google Scholar]
- Han, F.; Yao, J.; Zhu, H.; Wang, C. Marine organism detection and classification from underwater vision based on the deep CNN method. Math. Probl. Eng. 2020, 2020, 3937580. [Google Scholar] [CrossRef]
Ablation study of the proposed components on URPC2019 (✔ = component enabled):

| CSPLayer | BoT3 | UDA | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| | | | 0.753 | 0.413 |
| ✔ | | | 0.76 | 0.426 |
| | ✔ | | 0.765 | 0.42 |
| | | ✔ | 0.766 | 0.455 |
| ✔ | ✔ | | 0.769 | 0.427 |
| ✔ | ✔ | ✔ | 0.798 | 0.442 |
Per-class AP and mAP@0.5 on URPC2019:

| Method | Echinus | Starfish | Holothurian | Scallop | Waterweeds | mAP@0.5 |
|---|---|---|---|---|---|---|
| YOLOv5_x (baseline) | 0.924 | 0.885 | 0.731 | 0.826 | 0.4 | 0.753 |
| YOLOv5_n | 0.919 | 0.863 | 0.584 | 0.718 | 0.176 | 0.625 |
| YOLOv5_m | 0.921 | 0.886 | 0.729 | 0.822 | 0.34 | 0.74 |
| YOLOv5_l | 0.924 | 0.891 | 0.736 | 0.828 | 0.366 | 0.746 |
| Faster-RCNN | 0.8744 | 0.884 | 0.7896 | 0.701 | 0.6018 | 0.7708 |
| YOLOv7 | 0.906 | 0.896 | 0.777 | 0.835 | 0.46 | 0.775 |
| YOLOv8 | - | - | - | - | - | 0.76 |
| Ours | 0.917 | 0.865 | 0.738 | 0.842 | 0.626 | 0.798 |
Per-class AP and mAP@0.5 on URPC2020:

| Method | Holothurian | Echinus | Scallop | Starfish | mAP@0.5 |
|---|---|---|---|---|---|
| YOLOv5_x (baseline) | 0.675 | 0.879 | 0.751 | 0.814 | 0.78 |
| YOLOv7 | - | - | - | - | 0.793 |
| Faster-RCNN | 0.575 | 0.777 | 0.514 | 0.745 | 0.653 |
| Zhang et al. [46] | - | - | - | - | 0.694 |
| YOLOv8 | - | - | - | - | 0.783 |
| Ours | 0.694 | 0.879 | 0.777 | 0.825 | 0.794 |