SACuP: Sonar Image Augmentation with Cut and Paste Based DataBank for Semantic Segmentation
Abstract
1. Introduction
- We propose SACuP, a novel data augmentation pipeline for sonar imagery that builds a cut-and-paste-based DataBank for semantic segmentation (a minimal sketch of the pipeline follows this list).
- The DataBank is constructed from existing images and masks alone, so the method requires no additional annotation effort and preserves the characteristic noise of sonar imagery.
- We show that the proposed method outperforms both training on real data alone and existing augmentation methods.
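To make the pipeline concrete, the sketch below illustrates the core cut-and-paste mechanics: objects are cut out of labeled sonar images, the holes are inpainted so the backgrounds remain reusable, and banked objects are pasted into new scenes with a brightness adjustment. This is a minimal illustration under our own assumptions (grayscale uint8 images, OpenCV's Telea inpainting, mean-ratio brightness matching), not the authors' released implementation.

```python
import numpy as np
import cv2  # OpenCV, used here only for inpainting

def extract_object(image, mask, class_id):
    """Cut an object (and its binary mask) out of a labeled sonar image."""
    obj_mask = (mask == class_id).astype(np.uint8)
    ys, xs = np.nonzero(obj_mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1].copy(), obj_mask[y0:y1, x0:x1]

def inpaint_background(image, mask, class_id):
    """Fill the hole left behind so the background can be reused."""
    hole = ((mask == class_id) * 255).astype(np.uint8)
    return cv2.inpaint(image, hole, inpaintRadius=5, flags=cv2.INPAINT_TELEA)

def paste_object(background, bg_mask, patch, patch_mask, class_id, top_left):
    """Paste a banked object onto a background, matching local brightness."""
    y, x = top_left
    h, w = patch_mask.shape
    region = background[y:y + h, x:x + w]
    sel = patch_mask.astype(bool)
    # Brightness adjustment (our assumption): scale the patch so its mean
    # matches the mean of the destination region.
    scale = region.mean() / max(patch[sel].mean(), 1e-6)
    region[sel] = np.clip(patch[sel] * scale, 0, 255).astype(background.dtype)
    bg_mask[y:y + h, x:x + w][sel] = class_id
    return background, bg_mask
```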
2. Related Works
2.1. Semantic Segmentation
2.2. Data Augmentation in 2D Images
2.3. Data Augmentation in Sonar Images
3. Methods
3.1. DataBank Generation
3.1.1. Image Extraction
3.1.2. Information Extraction
3.2. Background Inpainting
3.3. Object Insertion
3.3.1. Selecting the Object
3.3.2. Object Positioning
3.3.3. Checking Object Overlap
3.3.4. Adjusting Brightness
3.4. Shadow Generation
4. Experiments
4.1. Dataset
4.2. Experiment Setup
4.3. Statistical Analysis
4.4. Results of Comparison Experiments with Existing Methods
4.5. Ablation Study
4.6. Effects of the Augmentation Ratio
4.7. Application to Various Architectures
4.8. Results of Experiments on USI Datasets
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542.
- Valdenegro-Toro, M. Object recognition in forward-looking sonar images with convolutional neural networks. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; pp. 1–6.
- Valdenegro-Toro, M. End-to-end object detection and recognition in forward-looking sonar images with convolutional neural networks. In Proceedings of the 2016 IEEE/OES Autonomous Underwater Vehicles (AUV), Tokyo, Japan, 6–9 November 2016; pp. 144–150.
- Hansen, C.H. Fundamentals of acoustics. In Occupational Exposure to Noise: Evaluation, Prevention and Control; World Health Organization: Geneva, Switzerland, 2001; Volume 1, pp. 23–52.
- Steiniger, Y.; Kraus, D.; Meisen, T. Survey on deep learning based computer vision for sonar imagery. Eng. Appl. Artif. Intell. 2022, 114, 105157.
- Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 843–852.
- Wang, M.; Deng, W. Deep visual domain adaptation: A survey. Neurocomputing 2018, 312, 135–153.
- Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. 2020, 53, 1–34.
- Figueira, A.; Vaz, B. Survey on synthetic data generation, evaluation methods and GANs. Mathematics 2022, 10, 2733.
- Yang, S.; Xiao, W.; Zhang, M.; Guo, S.; Zhao, J.; Shen, F. Image data augmentation for deep learning: A survey. arXiv 2022, arXiv:2204.08610.
- DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with Cutout. arXiv 2017, arXiv:1708.04552.
- Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. arXiv 2014, arXiv:1406.2661.
- Song, T.; Wang, Y.; Gao, C.; Chen, H.; Li, J. MSLAN: A two-branch multidirectional spectral–spatial LSTM attention network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5528814.
- Sheng, Y.; Xiao, L. Manifold augmentation based self-supervised contrastive learning for few-shot remote sensing scene classification. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2022), Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 2239–2242.
- Zhang, B.; Zhou, T.; Shi, Z.; Xu, C.; Yang, K.; Yu, X. An underwater small target boundary segmentation method in forward-looking sonar images. Appl. Acoust. 2023, 207, 109341.
- Gibou, F.; Fedkiw, R.; Osher, S. A review of level-set methods and some recent applications. J. Comput. Phys. 2018, 353, 82–109.
- Zhao, D.; Ge, W.; Chen, P.; Hu, Y.; Dang, Y.; Liang, R.; Guo, X. Feature pyramid U-Net with attention for semantic segmentation of forward-looking sonar images. Sensors 2022, 22, 8468.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023; pp. 4015–4026.
- Wang, L.; Ye, X.; Zhu, L.; Wu, W.; Zhang, J.; Xing, H.; Hu, C. When SAM meets sonar images. arXiv 2023, arXiv:2306.14109.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
- Lee, E.H.; Park, B.; Jeon, M.H.; Jang, H.; Kim, A.; Lee, S. Data augmentation using image translation for underwater sonar image segmentation. PLoS ONE 2022, 17, e0272602.
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60.
- Zhang, J.; Zhang, Y.; Xu, X. ObjectAug: Object-level data augmentation for semantic image segmentation. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8.
- Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.Y.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 2918–2928.
- Illarionova, S.; Nesteruk, S.; Shadrin, D.; Ignatiev, V.; Pukalchik, M.; Oseledets, I. Object-based augmentation improves quality of remote sensing semantic segmentation. arXiv 2021, arXiv:2105.05516.
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
- Manhães, M.M.M.; Scherer, S.A.; Voss, M.; Douat, L.R.; Rauschenbach, T. UUV Simulator: A Gazebo-based package for underwater intervention and multi-robot simulation. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; pp. 1–8.
- DeMarco, K.J.; West, M.E.; Howard, A.M. A computationally-efficient 2D imaging sonar model for underwater robotics simulations in Gazebo. In Proceedings of the OCEANS 2015 MTS/IEEE Washington, Washington, DC, USA, 19–22 October 2015; pp. 1–7.
- Cerqueira, R.; Trocoli, T.; Neves, G.; Joyeux, S.; Albiez, J.; Oliveira, L. A novel GPU-based sonar simulator for real-time applications. Comput. Graph. 2017, 68, 66–76.
- Cerqueira, R.; Trocoli, T.; Albiez, J.; Oliveira, L. A rasterized ray-tracer pipeline for real-time, multi-device sonar simulation. Graph. Model. 2020, 111, 101086.
- Choi, W.S.; Olson, D.R.; Davis, D.; Zhang, M.; Racson, A.; Bingham, B.; McCarrin, M.; Vogt, C.; Herman, J. Physics-based modelling and simulation of multibeam echosounder perception for autonomous underwater manipulation. Front. Robot. AI 2021, 8, 706646.
- Koenig, N.; Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan, 28 September–2 October 2004; Volume 3, pp. 2149–2154.
- Sung, M.; Kim, J.; Kim, J.; Yu, S.C. Realistic sonar image simulation using generative adversarial network. IFAC-PapersOnLine 2019, 52, 291–296.
- Lee, S.; Park, B.; Kim, A. Deep learning from shallow dives: Sonar image generation and training for underwater object detection. arXiv 2018, arXiv:1810.07990.
- Singh, D.; Valdenegro-Toro, M. The marine debris dataset for forward-looking sonar semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3741–3749.
- SoundMetrics. ARIS Explorer 3000: See What Others Can’t. Available online: http://www.soundmetrics.com/products/aris-sonars/ARIS-Explorer-3000/015335_RevD_ARIS-Explorer-3000_Brochure (accessed on 7 August 2023).
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034.
- Park, T.; Efros, A.A.; Zhang, R.; Zhu, J.Y. Contrastive learning for unpaired image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV 2020), Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 319–345.
Object | Shadow Length | Shadow Intensity | Standard Deviation |
---|---|---|---|
bottle | 76 | 0.88 | 21.19 |
can | 112 | 0.94 | 25.28 |
drink-carton | 122 | 0.81 | 21.04 |
hook | 143 | 0.73 | 20.27 |
propeller | 132 | 0.84 | 25.30 |
tire | 90 | 0.75 | 27.51 |
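The statistics above can drive a simple shadow synthesizer for pasted objects. The following sketch is our own reading of the table, assuming shadow length is in pixels, "intensity" is the fraction of backscatter suppressed inside the shadow, and the standard deviation supplies additive pixel noise so the shadow keeps sonar texture; the paper's actual shadow generation procedure (Section 3.4) may differ.

```python
import numpy as np

# Per-class shadow statistics copied from the table above:
# (length in pixels -- assumed unit, intensity, standard deviation).
SHADOW_PARAMS = {
    "bottle":       (76, 0.88, 21.19),
    "can":          (112, 0.94, 25.28),
    "drink-carton": (122, 0.81, 21.04),
    "hook":         (143, 0.73, 20.27),
    "propeller":    (132, 0.84, 25.30),
    "tire":         (90, 0.75, 27.51),
}

def add_shadow(image, obj_mask, class_name, rng=None):
    """Synthesize an acoustic shadow behind a pasted object (illustration)."""
    rng = rng or np.random.default_rng()
    length, intensity, std = SHADOW_PARAMS[class_name]
    out = image.astype(np.float32)
    ys, xs = np.nonzero(obj_mask)
    # Cast the shadow straight down-range from the object; the true direction
    # should follow the sonar's insonification geometry, simplified away here.
    for x in np.unique(xs):
        y0 = ys[xs == x].max() + 1
        y1 = min(y0 + length, out.shape[0])
        col = out[y0:y1, x]
        noise = rng.normal(0.0, std, size=col.shape)
        # Treating 'intensity' as suppressed backscatter is our assumption.
        out[y0:y1, x] = col * (1.0 - intensity) + noise
    return np.clip(out, 0, 255).astype(image.dtype)
```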
Object | Objects (Train) | Objects (Test) | Objects (Val) | Pixels (Train) | Pixels (Test) | Pixels (Val)
---|---|---|---|---|---|---
bottle | 273 | 91 | 100 | 396,846 | 137,895 | 149,593 |
can | 177 | 39 | 55 | 214,502 | 49,086 | 79,850 |
chain | 185 | 71 | 62 | 728,342 | 277,067 | 326,391 |
drink-carton | 185 | 74 | 75 | 129,130 | 55,511 | 57,594 |
hook | 112 | 30 | 25 | 136,062 | 36,249 | 32,692 |
propeller | 126 | 39 | 29 | 339,402 | 110,211 | 89,275 |
shampoo-bottle | 42 | 27 | 27 | 81,300 | 60,334 | 53,563 |
standing-bottle | 40 | 8 | 14 | 79,161 | 15,851 | 26,567 |
tire | 371 | 127 | 113 | 1,020,156 | 331,350 | 300,764 |
valve | 149 | 43 | 42 | 88,608 | 29,417 | 24,629 |
wall | 554 | 179 | 186 | 2,207,620 | 673,196 | 626,694 |
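The pixel counts above show a pronounced class imbalance: the wall class has roughly 27 times as many training pixels as shampoo-bottle. Purely as an illustration of how such counts are commonly used (the paper does not state that it weights its loss), inverse-frequency class weights can be derived directly from the table:

```python
import numpy as np

# Training-split pixel counts from the table above (same class order).
pixels = np.array([396_846, 214_502, 728_342, 129_130, 136_062, 339_402,
                   81_300, 79_161, 1_020_156, 88_608, 2_207_620])

# Inverse-frequency weights, normalized to mean 1, for a weighted loss.
freq = pixels / pixels.sum()
weights = (1.0 / freq) / (1.0 / freq).mean()
print(dict(zip(["bottle", "can", "chain", "drink-carton", "hook",
                "propeller", "shampoo-bottle", "standing-bottle",
                "tire", "valve", "wall"], weights.round(2))))
```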
Object | Baseline | TA | CutOut | CutMix | ObjectAug | Sim2Real | Ours |
---|---|---|---|---|---|---|---
background | 99.28 | 99.29 | 99.26 | 99.26 | 99.28 | 99.28 | 99.29 |
bottle | 76.03 | 76.01 | 75.15 | 75.72 | 75.52 | 79.30 | 76.64 |
can | 56.44 | 58.12 | 55.34 | 56.43 | 55.21 | 57.02 | 58.99 |
chain | 63.48 | 63.44 | 62.00 | 61.83 | 62.35 | 62.96 | 64.25 |
drink-carton | 73.75 | 74.65 | 72.44 | 74.31 | 73.31 | 74.30 | 75.95 |
hook | 67.73 | 68.87 | 68.41 | 67.62 | 68.18 | 68.47 | 69.41 |
propeller | 73.19 | 74.37 | 72.88 | 73.67 | 74.85 | 73.03 | 74.89 |
shampoo-bottle | 78.07 | 79.91 | 78.18 | 78.51 | 79.47 | 78.88 | 78.61 |
standing-bottle | 79.83 | 80.00 | 79.66 | 78.90 | 82.67 | 79.54 | 81.23 |
tire | 88.00 | 87.64 | 87.63 | 87.49 | 87.65 | 87.61 | 87.92 |
valve | 58.11 | 58.33 | 58.27 | 58.95 | 58.36 | 58.47 | 59.56 |
wall | 87.74 | 88.75 | 88.24 | 88.07 | 88.38 | 88.31 | 88.17 |
mIoU | 75.14 | 75.78 | 74.79 | 75.06 | 75.44 | 75.35 | 76.24 |
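All values in this and the following tables are per-class intersection-over-union (IoU) scores in percent, and mIoU is their unweighted mean. For reference, a generic implementation of the metric (not the authors' evaluation script) looks like this:

```python
import numpy as np

def miou(pred, target, num_classes):
    """Per-class IoU and their unweighted mean from integer label maps.

    pred, target: arrays of the same shape holding class indices.
    """
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:  # class absent from both maps: skip it
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return ious, float(np.mean(ious))
```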
Model | Cut & Paste | Adjust Brightness | Shadow Generation | mIoU |
---|---|---|---|---
Baseline | | | | 75.14
Cut & Paste | ✓ | | | 75.79
Cut & Paste + Adjust brightness | ✓ | ✓ | | 75.85
Cut & Paste + Shadow generation | ✓ | | ✓ | 75.90
Ours | ✓ | ✓ | ✓ | 76.24 |
Architecture | Encoder | Baseline | Ours |
---|---|---|---|
U-Net | ResNet-18 | 75.14 | 76.24
U-Net | EfficientNet-B0 | 75.71 | 76.40
DeepLabV3+ | ResNet-18 | 75.98 | 76.45
DeepLabV3+ | EfficientNet-B0 | 75.27 | 76.19
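The four encoder/decoder pairings in the table can be instantiated with an off-the-shelf library. The snippet below uses the segmentation_models_pytorch package, our choice for illustration only; the paper does not name its implementation, and the single input channel and 12-class output (11 object classes plus background) are assumptions based on the grayscale sonar data and the class tables above.

```python
import segmentation_models_pytorch as smp

NUM_CLASSES = 12  # 11 object classes + background (assumed from the tables)

# U-Net and DeepLabV3+ decoders paired with the two encoders from the table.
models = {
    ("U-Net", "ResNet-18"): smp.Unet(
        encoder_name="resnet18", encoder_weights="imagenet",
        in_channels=1, classes=NUM_CLASSES),
    ("U-Net", "EfficientNet-B0"): smp.Unet(
        encoder_name="efficientnet-b0", encoder_weights="imagenet",
        in_channels=1, classes=NUM_CLASSES),
    ("DeepLabV3+", "ResNet-18"): smp.DeepLabV3Plus(
        encoder_name="resnet18", encoder_weights="imagenet",
        in_channels=1, classes=NUM_CLASSES),
    ("DeepLabV3+", "EfficientNet-B0"): smp.DeepLabV3Plus(
        encoder_name="efficientnet-b0", encoder_weights="imagenet",
        in_channels=1, classes=NUM_CLASSES),
}
```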
Object | Baseline | TA | CutOut | CutMix | ObjectAug | Synthetic | Ours |
---|---|---|---|---|---|---|---
background | 99.69 | 99.72 | 99.74 | 99.75 | 99.70 | 99.67 | 99.78 |
object | 59.48 | 62.32 | 64.03 | 65.60 | 60.25 | 56.81 | 68.60 |
mIoU | 79.59 | 81.02 | 81.88 | 82.68 | 79.98 | 78.24 | 84.19 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Park, S.; Choi, Y.; Hwang, H. SACuP: Sonar Image Augmentation with Cut and Paste Based DataBank for Semantic Segmentation. Remote Sens. 2023, 15, 5185. https://doi.org/10.3390/rs15215185