Enhanced YOLOv7 for Improved Underwater Target Detection
Abstract
1. Introduction
- (1) The K-Means algorithm is used to re-cluster the bounding boxes of the underwater target dataset; the resulting anchor boxes better match the small size of the underwater targets studied in this paper, which accelerates model convergence and improves the detection accuracy of small targets.
- (2) PConv (Partial Convolution) is introduced to optimize the multi-scale feature fusion module, reducing the parameter count and computational cost while improving detection speed.
- (3) The ShapeIoU_NWD loss function replaces the CIoU loss function. By accounting for the shape and scale of the bounding box itself, which CIoU ignores, and by incorporating the Normalized Gaussian Wasserstein Distance (NWD) for very small targets, it effectively addresses the slow convergence and reduced prediction accuracy observed during training.
- (4) The SimAM attention mechanism is introduced after the multi-scale feature fusion module, enhancing the detection accuracy of the network model without adding any parameters.
2. Materials and Methods
2.1. YOLOv7 Model
2.2. K-Means
- Randomly select k samples from the dataset as the initial cluster centers.
- For each sample in the dataset, compute its distance to each of the k cluster centers and assign it to the cluster whose center is nearest.
- For each cluster, recompute its center as the centroid of all samples assigned to it.
- Repeat steps 2 and 3 until the cluster centers no longer change; a minimal implementation for anchor generation is sketched after this list.
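The following is a minimal NumPy sketch of this procedure, applied to the ground-truth box sizes (w, h) of a dataset to generate anchor boxes. The function name and the use of plain Euclidean distance are illustrative assumptions; YOLO-style anchor clustering often substitutes 1 − IoU as the distance, and the paper's exact implementation is not reproduced here.

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int, seed: int = 0, max_iter: int = 300) -> np.ndarray:
    """Cluster ground-truth box sizes into k anchors.

    wh: (N, 2) array of box widths and heights. Returns a (k, 2) array of anchors.
    """
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k samples as the initial cluster centers.
    centers = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every sample to the nearest cluster center.
        dists = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the centroid of its cluster
        # (keep the old center if a cluster ends up empty).
        new_centers = np.array([wh[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers[np.argsort(centers.prod(axis=1))]  # sort anchors by area

# Example: 9 anchors, to be split 3/3/3 across the 80x80, 40x40, and 20x20 heads.
# anchors = kmeans_anchors(dataset_wh, k=9)
```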
2.3. ELAN_PC
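As a concrete illustration of the Partial Convolution (PConv) operation that ELAN_PC builds on (contribution (2) above), here is a minimal PyTorch sketch following the FasterNet formulation (Chen et al., "Run, Don't Walk"): only the first 1/n of the channels pass through a 3 × 3 convolution while the rest are forwarded unchanged, which is what cuts parameters and FLOPs. The class name and the split ratio n_div = 4 (FasterNet's default) are assumptions, not values confirmed by this paper.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial Convolution: convolve a fraction of the channels, pass the rest through."""

    def __init__(self, channels: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = channels // n_div          # channels that are convolved
        self.dim_keep = channels - self.dim_conv   # channels forwarded unchanged
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

# Shape is preserved, but only 1/4 of the channels incur convolution cost:
# y = PConv(64)(torch.randn(1, 64, 40, 40))  # -> (1, 64, 40, 40)
```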
2.4. ShapeIoU_NWD
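Judging from the name and the cited references, ShapeIoU_NWD combines the Shape-IoU metric (Zhang and Zhang) with the Normalized Gaussian Wasserstein Distance (Wang et al.). The exact blend used in the paper is not reproduced here; the sketch below implements only the NWD term, under the standard modeling of each box (cx, cy, w, h) as a 2D Gaussian. The constant c is dataset-dependent; the value 12.8 is an assumption borrowed from common NWD implementations.

```python
import torch

def nwd(box1: torch.Tensor, box2: torch.Tensor, c: float = 12.8) -> torch.Tensor:
    """Normalized Gaussian Wasserstein Distance between boxes in (cx, cy, w, h) format."""
    # Squared 2nd-order Wasserstein distance between the two box Gaussians:
    # || (cx1, cy1, w1/2, h1/2) - (cx2, cy2, w2/2, h2/2) ||^2
    w2_sq = ((box1[..., 0] - box2[..., 0]) ** 2
             + (box1[..., 1] - box2[..., 1]) ** 2
             + ((box1[..., 2] - box2[..., 2]) / 2) ** 2
             + ((box1[..., 3] - box2[..., 3]) / 2) ** 2)
    return torch.exp(-torch.sqrt(w2_sq) / c)

# Illustrative (assumed) combination with Shape-IoU for the regression loss:
# loss = alpha * (1 - shape_iou) + (1 - alpha) * (1 - nwd(pred, target))
```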
2.5. SimAM
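SimAM computes a closed-form, parameter-free attention weight for every activation from an energy function based on the channel-wise mean and variance, which is why it improves accuracy without adding parameters. A minimal PyTorch sketch following the formulation released with Yang et al. (ICML 2021) is shown below; e_lambda = 1e-4 is the authors' default regularizer.

```python
import torch

def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
    """Parameter-free SimAM attention over a (B, C, H, W) feature map."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)  # squared deviation per position
    v = d.sum(dim=(2, 3), keepdim=True) / n            # channel-wise variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5             # closed-form inverse energy
    return x * torch.sigmoid(e_inv)                    # scale activations by attention weight
```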
2.6. Proposed Improved Algorithm
3. Experiments
3.1. Experimental Dataset
3.2. Experimental Environment and Parameterization
3.3. Experimental Design
3.3.1. Evaluation Metrics
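The excerpt does not reproduce the paper's metric definitions, so the standard formulations assumed for the precision, recall, AP, and mAP values reported below are:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
AP = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad
mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```

where TP, FP, and FN are true positives, false positives, and false negatives, and N is the number of target classes (four here: holothurian, echinus, scallop, and starfish).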
3.3.2. Experimental Results
4. Discussion
4.1. Ablation Experiments
4.2. Comparative Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhou, X.; Ding, W.; Jin, W. Microwave-assisted extraction of lipids, carotenoids, and other compounds from marine resources. In Innovative and Emerging Technologies in the Bio-Marine Food Sector; Academic Press: Cambridge, MA, USA, 2022; pp. 375–394. [Google Scholar]
- Wang, S.; Liu, X.; Yu, S.; Zhu, X.; Chen, B.; Sun, X. Design and Implementation of SSS-Based AUV Autonomous Online Object Detection System. Electronics 2024, 13, 1064. [Google Scholar] [CrossRef]
- Yu, G.; Cai, R.; Su, J.; Hou, M.; Deng, R. U-YOLOv7: A network for underwater organism detection. Ecol. Inform. 2023, 75, 102108. [Google Scholar] [CrossRef]
- Jia, R.; Lv, B.; Chen, J.; Liu, H.; Cao, L.; Liu, M. Underwater Object Detection in Marine Ranching Based on Improved YOLOv8. J. Mar. Sci. Eng. 2023, 12, 55. [Google Scholar] [CrossRef]
- Wu, B.; Liu, C.; Jiang, F.; Li, J.; Yang, Z. Dynamic identification and automatic counting of the number of passing fish species based on the improved DeepSORT algorithm. Front. Environ. Sci. 2023, 11, 1059217. [Google Scholar] [CrossRef]
- Zhao, H.; Cui, H.; Qu, K.; Zhu, J.; Li, H.; Cui, Z.; Wu, Y. A fish appetite assessment method based on improved ByteTrack and spatiotemporal graph convolutional network. Biosyst. Eng. 2024, 240, 46–55. [Google Scholar] [CrossRef]
- Liu, Y.; An, D.; Ren, Y.; Zhao, J.; Zhang, C.; Cheng, J.; Liu, J.; Wei, Y. DP-FishNet: Dual-path Pyramid Vision Transformer-based underwater fish detection network. Expert Syst. Appl. 2024, 238, 122018. [Google Scholar] [CrossRef]
- Sahu, P.; Gupta, N.; Sharma, N. A survey on underwater image enhancement techniques. Int. J. Comput. Appl. 2014, 87, 333–338. [Google Scholar] [CrossRef]
- Christensen, L.; de Gea Fernández, J.; Hildebrandt, M.; Koch, C.E.S.; Wehbe, B. Recent advances in AI for navigation and control of underwater robots. Curr. Robot. Rep. 2022, 3, 165–175. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Wei, X. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
- Jocher, G. YOLOv8 by Ultralytics. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 15 February 2023).
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 7–12 December 2015; pp. 91–99. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
- Wen, G.; Li, S.; Liu, F.; Luo, X.; Er, M.-J.; Mahmud, M.; Wu, T. YOLOv5s-CA: A modified YOLOv5s network with coordinate attention for underwater target detection. Sensors 2023, 23, 3367. [Google Scholar] [CrossRef] [PubMed]
- Zhang, M.; Xu, S.; Song, W.; He, Q.; Wei, Q. Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion. Remote Sens. 2021, 13, 4706. [Google Scholar] [CrossRef]
- Sinaga, K.P.; Yang, M. Unsupervised K-Means Clustering Algorithm. IEEE Access 2020, 8, 80716–80727. [Google Scholar] [CrossRef]
- Arthur, D.; Vassilvitskii, S. K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035. [Google Scholar]
- Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient long-range attention network for image super-resolution. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Part XVII. Springer: Cham, Switzerland, 2022; pp. 649–667. [Google Scholar]
- Lee, Y.; Hwang, J.W.; Lee, S.; Bae, Y.; Park, J. An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR 2020), Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
- Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, Don't Walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
- Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. arXiv 2020, arXiv:2005.03572. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and Efficient IOU Loss for Accurate Bounding Box Regression. arXiv 2021, arXiv:2101.08158. [Google Scholar] [CrossRef]
- Zhang, H.; Zhang, S. Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale. arXiv 2023, arXiv:2312.17663. [Google Scholar]
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar]
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Virtual Event, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
Feature Map Size | 80 × 80 | 40 × 40 | 20 × 20
---|---|---|---
YOLOv7 | (21, 17) | (52, 35) | (80, 51)
 | (28, 24) | (44, 46) | (82, 79)
 | (35, 32) | (57, 56) | (140, 117)
Attention Mechanism | Holothurian (%) | Echinus (%) | Scallop (%) | Starfish (%) | mAP (%) |
---|---|---|---|---|---|
CBAM | 77.3 | 92.7 | 83.1 | 88.3 | 85.3 |
CPCA | 77.8 | 92.1 | 80.1 | 87.8 | 84.4 |
CA | 77.5 | 92.3 | 83.1 | 88.5 | 85.3 |
GAM | 77.3 | 85.6 | 79.8 | 88.6 | 82.8 |
SimAM | 77.6 | 92.9 | 83.2 | 89.0 | 85.7 |
Algorithm | Params (M) | FLOPs (G) | mAP (%) | FPS (frame/s) |
---|---|---|---|---|
YOLOv7 | 36.5 | 103.2 | 84.7 | 112.5 |
YOLOv7 + K-Means | 36.5 | 103.2 | 85.4 | 113.2
YOLOv7 + K-Means + ELAN_PC | 28.3 | 75.4 | 84.7 | 126.3
YOLOv7 + K-Means + ELAN_PC + ShapeIoU_NWD | 28.3 | 75.4 | 85.5 | 125.3
YOLOv7 + K-Means + ELAN_PC + ShapeIoU_NWD + SimAM | 28.7 | 76.6 | 85.7 | 122.9
Algorithm | Holothurian (%) | Echinus (%) | Scallop (%) | Starfish (%) | mAP (%) | FPS (frame/s) |
---|---|---|---|---|---|---|
Faster RCNN | 56.0 | 82.0 | 52.0 | 79.0 | 67.5 | 40.7 |
SSD | 63.0 | 77.0 | 46.0 | 74.0 | 65.0 | 167.9 |
YOLOv5 | 73.7 | 91.4 | 80.6 | 85.1 | 82.7 | 270.0 |
YOLOv7 | 76.5 | 92.1 | 82.8 | 87.4 | 84.7 | 112.5 |
YOLOv7-W6 | 74.2 | 90.8 | 75.2 | 87.1 | 81.8 | 114.2 |
YOLOv7-E6E | 74.6 | 89.7 | 49.2 | 87.6 | 75.3 | 51.9 |
YOLOv8 | 74.9 | 91.4 | 80.3 | 88.3 | 83.7 | 131.9 |
Ours | 77.6 | 92.9 | 83.2 | 89.0 | 85.7 | 122.9 |
Algorithm | Holothurian (%) | Echinus (%) | Scallop (%) | Starfish (%) | All Classes (%) |
---|---|---|---|---|---|
Faster RCNN | 35.4 | 53.2 | 37.8 | 51.5 | 44.5 |
SSD | 80.6 | 89.0 | 83.1 | 84.4 | 84.3 |
YOLOv5 | 76.5 | 88.3 | 83.4 | 80.5 | 82.2 |
YOLOv7 | 75.1 | 88.0 | 84.6 | 81.5 | 82.3 |
YOLOv7-W6 | 75.5 | 86.3 | 79.7 | 79.6 | 80.3 |
YOLOv7-E6E | 79.9 | 88.6 | 34.8 | 85.2 | 72.1 |
YOLOv8 | 77.8 | 89.0 | 85.6 | 83.4 | 83.9 |
Ours | 78.0 | 88.7 | 85.1 | 81.6 | 83.3 |
Algorithm | Holothurian (%) | Echinus (%) | Scallop (%) | Starfish (%) | All Classes (%) |
---|---|---|---|---|---|
Faster RCNN | 65.6 | 88.2 | 61.5 | 85.1 | 75.1 |
SSD | 50.5 | 51.4 | 27.7 | 55.1 | 46.2 |
YOLOv5 | 70.1 | 86.0 | 73.3 | 79.5 | 77.2 |
YOLOv7 | 71.8 | 86.4 | 72.9 | 82.8 | 78.5 |
YOLOv7-W6 | 68.5 | 85.9 | 66.5 | 82.7 | 75.9 |
YOLOv7-E6E | 66.3 | 82.4 | 64.1 | 79.6 | 73.1 |
YOLOv8 | 68.4 | 83.1 | 68.4 | 82.0 | 75.5 |
Ours | 73.3 | 87.4 | 72.5 | 83.3 | 79.1 |