Small-Sample Underwater Target Detection: A Joint Approach Utilizing Diffusion and YOLOv7 Model
Abstract
1. Introduction
- (1) The DDPM is applied for the first time to generate small-sample SSS data, yielding excellent results in our experiments; this eases the difficulty of acquiring SSS data with an AUV and reduces data collection costs (a minimal sketch of the DDPM training objective follows this list).
- (2) The YOLOv7 model is improved by introducing the ECANet attention mechanism into the network, strengthening feature extraction for underwater targets and improving the detection accuracy of small targets in SSS images (a sketch of the ECA module is also given after this list).
- (3) A dataset of small underwater targets in SSS imagery is constructed, and a comprehensive comparison of current mainstream data generation methods and object detection methods is carried out on it, demonstrating the effectiveness of the proposed approach.
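Contribution (1) builds on the standard denoising diffusion probabilistic model of Ho et al. The following is a minimal sketch of the forward (noising) process and the noise-prediction training loss in PyTorch; the number of steps T, the linear beta schedule, and the `eps_model` U-Net placeholder are illustrative assumptions, not the settings used in this paper.

```python
import torch
import torch.nn.functional as F

# Linear beta schedule as in Ho et al. (2020); T and the beta range are
# illustrative defaults, not necessarily the values used for SSS generation.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product alpha_bar_t

def q_sample(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    a_bar = alpha_bars.to(x0.device)[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

def ddpm_loss(eps_model, x0: torch.Tensor) -> torch.Tensor:
    """Simple DDPM objective: the network predicts the noise added at a random step t."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return F.mse_loss(eps_model(x_t, t), noise)
```

At sampling time, the learned `eps_model` is applied step by step in reverse, starting from pure Gaussian noise, which is how additional synthetic SSS training samples can be generated.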
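Contribution (2) refers to the ECA module of Wang et al. (ECA-Net, CVPR 2020). The sketch below is a standard PyTorch implementation of that module; exactly where it is inserted into the YOLOv7 backbone or neck is an implementation detail of the paper and is not asserted here.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling followed by a 1-D
    convolution across channels with an adaptively chosen kernel size."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # k = |log2(C)/gamma + b/gamma|, rounded to the nearest odd number.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a detection backbone stage.
        y = self.avg_pool(x)                    # (B, C, 1, 1)
        y = y.squeeze(-1).transpose(-1, -2)     # (B, 1, C)
        y = self.conv(y)                        # local cross-channel interaction
        y = y.transpose(-1, -2).unsqueeze(-1)   # (B, C, 1, 1)
        return x * self.sigmoid(y)              # channel-wise re-weighting
```

Unlike SE, ECA avoids channel dimensionality reduction and adds only a handful of parameters, which makes it a lightweight addition to an existing detector.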
2. Methods
2.1. Denoising Diffusion Probabilistic Models
2.1.1. Forward Process
2.1.2. Reverse Process
2.2. Improved YOLOv7 Model
2.2.1. YOLOv7 Overview
2.2.2. Efficient Channel Attention
3. Results
3.1. Model Evaluation Metrics
3.2. Dataset Preparation
3.3. Comparison of Data Generated by DDPM and GANs
3.4. Performance Comparison of YOLOv7 Networks Trained with DatasetA and DatasetB
3.5. Performance Comparison of YOLOv7 Integrated with Different Attention Mechanisms
3.6. Performance Comparison of Improved YOLOv7 Network against Original YOLOv7 and Other Networks
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Category | DatasetA Train | DatasetA Val | DatasetA Test | DatasetA Total | DatasetB Train | DatasetB Val | DatasetB Test | DatasetB Total |
|---|---|---|---|---|---|---|---|---|
| Cone | 30 | 11 | 12 | 53 | 257 | 11 | 12 | 280 |
| Cylinder | 30 | 12 | 13 | 55 | 255 | 12 | 13 | 280 |
| Non-target | 168 | 56 | 56 | 280 | 168 | 56 | 56 | 280 |
| Metric | Cone | Cylinder | Non-Target | All Data |
|---|---|---|---|---|
| FID (dim = 64) | 2.17 | 0.69 | 0.11 | 0.39 |
| FID (dim = 192) | 4.64 | 1.37 | 0.25 | 0.87 |
| FID (dim = 768) | 0.20 | 0.16 | 0.13 | 0.08 |
| FID (dim = 2048) | 59.51 | 56.84 | 36.94 | 31.51 |
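The table reports the Fréchet Inception Distance computed from InceptionV3 features of four dimensionalities (64, 192, 768, 2048); lower values mean the generated images are statistically closer to the real data. A minimal sketch of how such values can be obtained with `torchmetrics` (which needs the `torch-fidelity` backend installed) is shown below; this is an assumption about tooling, not necessarily the authors' evaluation code.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# `feature` selects the InceptionV3 activation used for FID: 64, 192, 768, or 2048,
# matching the "dim" rows of the table above.
fid = FrechetInceptionDistance(feature=2048)

# Placeholders: real and generated SSS images as uint8 tensors of shape (N, 3, H, W).
real_images = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)
generated_images = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)        # accumulate statistics of the real set
fid.update(generated_images, real=False)  # accumulate statistics of the generated set
print(float(fid.compute()))               # lower is better
```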
| Category | DatasetA Precision | DatasetA Recall | DatasetA mAP.5 | DatasetA mAP.5:.95 | DatasetB Precision | DatasetB Recall | DatasetB mAP.5 | DatasetB mAP.5:.95 |
|---|---|---|---|---|---|---|---|---|
| All | 0.930 | 0.621 | 0.615 | 0.290 | 0.877 | 0.902 | 0.894 | 0.517 |
| Cone | 1.000 | 0.250 | 0.255 | 0.153 | 0.844 | 0.917 | 0.894 | 0.577 |
| Cylinder | 0.909 | 0.769 | 0.746 | 0.388 | 0.812 | 1.000 | 0.966 | 0.583 |
| Non-target | 0.931 | 0.844 | 0.845 | 0.329 | 0.974 | 0.788 | 0.822 | 0.392 |
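For clarity on the metric columns: mAP.5 is the mean average precision at an IoU threshold of 0.5, and mAP.5:.95 averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (COCO convention). A minimal sketch using `torchmetrics` (an assumed tool, which may additionally require `pycocotools`; the paper's own evaluation script may differ) is:

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision(iou_type="bbox")  # COCO-style mAP

# One image with a single predicted box (xyxy pixel coordinates) and one ground-truth box.
preds = [dict(
    boxes=torch.tensor([[50.0, 50.0, 120.0, 120.0]]),
    scores=torch.tensor([0.87]),
    labels=torch.tensor([0]),          # hypothetical class index, e.g., cone
)]
targets = [dict(
    boxes=torch.tensor([[55.0, 48.0, 118.0, 125.0]]),
    labels=torch.tensor([0]),
)]

metric.update(preds, targets)
result = metric.compute()
print(result["map_50"], result["map"])  # map_50 -> mAP.5, map -> mAP.5:.95
```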
| Method | DatasetA Precision | DatasetA Recall | DatasetA mAP.5 | DatasetA mAP.5:.95 | DatasetB Precision | DatasetB Recall | DatasetB mAP.5 | DatasetB mAP.5:.95 |
|---|---|---|---|---|---|---|---|---|
| YOLOv7 | 0.930 | 0.621 | 0.615 | 0.290 | 0.877 | 0.902 | 0.894 | 0.517 |
| YOLOv7+SE | 0.923 | 0.647 | 0.632 | 0.297 | 0.899 | 0.918 | 0.910 | 0.521 |
| YOLOv7+BAM | 0.935 | 0.620 | 0.614 | 0.285 | 0.878 | 0.897 | 0.896 | 0.515 |
| YOLOv7+CBAM | 0.937 | 0.632 | 0.645 | 0.308 | 0.884 | 0.901 | 0.893 | 0.519 |
| YOLOv7+CA | 0.925 | 0.654 | 0.653 | 0.311 | 0.902 | 0.924 | 0.917 | 0.523 |
| YOLOv7+ECA | 0.928 | 0.653 | 0.649 | 0.312 | 0.903 | 0.922 | 0.914 | 0.524 |
| Category | DatasetA Precision | DatasetA Recall | DatasetA mAP.5 | DatasetA mAP.5:.95 | DatasetB Precision | DatasetB Recall | DatasetB mAP.5 | DatasetB mAP.5:.95 |
|---|---|---|---|---|---|---|---|---|
| All | 0.928 | 0.653 | 0.649 | 0.312 | 0.903 | 0.922 | 0.914 | 0.524 |
| Cone | 1.000 | 0.333 | 0.338 | 0.212 | 0.800 | 1.000 | 0.971 | 0.628 |
| Cylinder | 0.852 | 0.769 | 0.745 | 0.388 | 0.923 | 0.923 | 0.928 | 0.557 |
| Non-target | 0.932 | 0.858 | 0.865 | 0.337 | 0.976 | 0.844 | 0.842 | 0.388 |
| Method | DatasetA Precision | DatasetA Recall | DatasetA mAP.5 | DatasetA mAP.5:.95 | DatasetB Precision | DatasetB Recall | DatasetB mAP.5 | DatasetB mAP.5:.95 |
|---|---|---|---|---|---|---|---|---|
| SSD | 0.861 | 0.706 | 0.587 | 0.279 | 0.879 | 0.904 | 0.886 | 0.468 |
| DETR | 0.913 | 0.635 | 0.612 | 0.288 | 0.910 | 0.917 | 0.876 | 0.499 |
| Faster-RCNN | 0.882 | 0.689 | 0.602 | 0.243 | 0.824 | 0.836 | 0.857 | 0.471 |
| EfficientDet-D0 | 0.931 | 0.643 | 0.610 | 0.299 | 0.885 | 0.916 | 0.899 | 0.518 |
| YOLOv5 | 0.905 | 0.617 | 0.623 | 0.296 | 0.879 | 0.899 | 0.901 | 0.513 |
| YOLOv7 | 0.930 | 0.621 | 0.615 | 0.290 | 0.877 | 0.902 | 0.894 | 0.517 |
| Co-DETR | 0.925 | 0.510 | 0.642 | 0.311 | 0.899 | 0.924 | 0.910 | 0.526 |
| YOLOv7+ECA | 0.928 | 0.653 | 0.649 | 0.312 | 0.903 | 0.922 | 0.914 | 0.524 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cheng, C.; Hou, X.; Wen, X.; Liu, W.; Zhang, F. Small-Sample Underwater Target Detection: A Joint Approach Utilizing Diffusion and YOLOv7 Model. Remote Sens. 2023, 15, 4772. https://doi.org/10.3390/rs15194772