Cross-Granularity Infrared Image Segmentation Network for Nighttime Marine Observations
Abstract
1. Introduction
- A cross-granularity infrared image segmentation network, CGSegNet, is proposed for high-quality nighttime marine observation. We construct a hybrid CNN–Transformer–HOG framework with multi-granularity features to enhance segmentation accuracy in complex marine scenarios.
- We design an adaptive multi-scale fusion module (AMF) that combines CNN-based local feature extraction with Transformer-based global context modeling, effectively aligning features across the granularity disparity between the two branches.
- To address the noisy, low-contrast nature of infrared images, we introduce handcrafted HOG features into the model, improving boundary delineation and segmentation stability under low-contrast conditions.
- Our method achieves state-of-the-art results on a public infrared segmentation dataset, outperforming leading baseline segmentation models and demonstrating its robustness and effectiveness for infrared-based marine perception.
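The HOG branch mentioned above relies on histogram-of-oriented-gradients descriptors, which summarize local edge orientation and are comparatively robust to the low contrast typical of infrared imagery. As a rough illustration of the idea only (not the paper's implementation; the cell size and bin count here are assumptions), a minimal per-cell HOG can be sketched as:

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    """Per-cell histogram of oriented gradients (unsigned, L2-normalized).

    Illustrative sketch: cell size and bin count are assumed defaults,
    not the values used by CGSegNet.
    """
    img = np.asarray(img, dtype=float)
    # Central-difference gradients (borders left at zero)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation

    h_cells, w_cells = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((h_cells, w_cells, bins))
    bin_width = 180.0 / bins
    for i in range(h_cells):
        for j in range(w_cells):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            idx = np.minimum((a // bin_width).astype(int), bins - 1)
            # Magnitude-weighted orientation histogram for this cell
            hist[i, j] = np.bincount(idx, weights=m, minlength=bins)
    # L2-normalize each cell histogram
    hist /= np.linalg.norm(hist, axis=-1, keepdims=True) + 1e-6
    return hist
```

A vertical intensity edge, for instance, produces purely horizontal gradients and therefore a histogram dominated by the 0° bin in the cells it crosses.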
2. Related Work
2.1. Infrared Image Semantic Segmentation
2.2. Visual Semantic Segmentation in Maritime Environments
3. Method
3.1. Overall Structure
3.2. Local Granularity Branch
3.3. Global Granularity Branch
3.4. HOG Feature Branch
3.5. Adaptive Multi-Scale Fusion (AMF)
3.6. HOG and Deep-Learning Feature Fusion (HDF)
4. Experiment
4.1. Experimental Datasets
4.2. Experimental Setups and Evaluation Metrics
4.3. Comparison to Other Baseline Methods
4.4. Ablation Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhang, R.; Su, Y.; Li, Y.; Zhang, L.; Feng, J. Infrared and visible image fusion methods for unmanned surface vessels with marine applications. J. Mar. Sci. Eng. 2022, 10, 588. [Google Scholar] [CrossRef]
- Wang, Y.; Wang, B.; Huo, L.; Fan, Y. GT-YOLO: Nearshore infrared ship detection based on infrared images. J. Mar. Sci. Eng. 2024, 12, 213. [Google Scholar] [CrossRef]
- Wang, H.Y.; Fang, H.M.; Chiang, Y.C. Application of unmanned aerial vehicle–based infrared images in determining characteristics of sea surface temperature distribution. J. Mar. Sci. Technol. 2023, 31, 2. [Google Scholar] [CrossRef]
- O’Byrne, M.; Pakrashi, V.; Schoefs, F.; Ghosh, B. Semantic segmentation of underwater imagery using deep networks trained on synthetic imagery. J. Mar. Sci. Eng. 2018, 6, 93. [Google Scholar] [CrossRef]
- Zhang, K.; Zhang, L.; Song, H.; Zhou, W. Active contours with selective local or global segmentation: A new formulation and level set method. Image Vis. Comput. 2010, 28, 668–676. [Google Scholar] [CrossRef]
- Xue, H.; Chen, X.; Zhang, R.; Wu, P.; Li, X.; Liu, Y. Deep learning-based maritime environment segmentation for unmanned surface vehicles using superpixel algorithms. J. Mar. Sci. Eng. 2021, 9, 1329. [Google Scholar] [CrossRef]
- Xu, H.; Zhang, X.; He, J.; Geng, Z.; Yu, Y.; Cheng, Y. Panoptic water surface visual perception for USVs using monocular camera sensor. IEEE Sens. J. 2024, 24, 24263–24274. [Google Scholar] [CrossRef]
- Xu, H.; Zhang, X.; He, J.; Geng, Z.; Pang, C.; Yu, Y. Surround-view water surface BEV segmentation for autonomous surface vehicles: Dataset, baseline and hybrid-BEV network. IEEE Trans. Intell. Veh. 2024, 1–15. [Google Scholar] [CrossRef]
- Zhang, L.; Sun, X.; Li, Z.; Kong, D.; Liu, J.; Ni, P. Boundary enhancement-driven accurate semantic segmentation networks for unmanned surface vessels in complex marine environments. IEEE Sens. J. 2024, 24, 24972–24987. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Liu, F.; Fang, M. Semantic segmentation of underwater images based on improved Deeplab. J. Mar. Sci. Eng. 2020, 8, 188. [Google Scholar] [CrossRef]
- He, J.; Chen, J.; Xu, H.; Yu, Y. SonarNet: Hybrid CNN-Transformer-HOG framework and multi-feature fusion mechanism for forward-looking sonar image segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4203217. [Google Scholar] [CrossRef]
- Wan, M.; Huang, Q.; Xu, Y.; Gu, G.; Chen, Q. Global and local multi-feature fusion-based active contour model for infrared image segmentation. Displays 2023, 78, 102452. [Google Scholar] [CrossRef]
- Zhao, Y.; Li, K.; Cheng, Z.; Qiao, P.; Zheng, X.; Ji, R.; Liu, C.; Yuan, L.; Chen, J. GraCo: Granularity-controllable interactive segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 3501–3510. [Google Scholar]
- Zhang, M.; Zhang, R.; Yang, Y.; Bai, H.; Zhang, J.; Guo, J. ISNet: Shape matters for infrared small target detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 877–886. [Google Scholar]
- Tong, X.; Su, S.; Wu, P.; Guo, R.; Wei, J.; Zuo, Z.; Sun, B. MSAFFNet: A multiscale label-supervised attention feature fusion network for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5002616. [Google Scholar] [CrossRef]
- Chan, T.F.; Vese, L.A. Active contours without edges. IEEE Trans. Image Process. 2001, 10, 266–277. [Google Scholar] [CrossRef]
- Li, C.; Kao, C.Y.; Gore, J.C.; Ding, Z. Minimization of region-scalable fitting energy for image segmentation. IEEE Trans. Image Process. 2008, 17, 1940–1949. [Google Scholar] [PubMed]
- Fang, J.; Liu, H.; Zhang, L.; Liu, J.; Liu, H. Active contour driven by weighted hybrid signed pressure force for image segmentation. IEEE Access 2019, 7, 97492–97504. [Google Scholar] [CrossRef]
- Liu, H.; Fang, J.; Zhang, Z.; Lin, Y. A novel active contour model guided by global and local signed energy-based pressure force. IEEE Access 2020, 8, 59412–59426. [Google Scholar] [CrossRef]
- Yao, L.; Kanoulas, D.; Ji, Z.; Liu, Y. ShorelineNet: An efficient deep learning approach for shoreline semantic segmentation for unmanned surface vehicles. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 5403–5409. [Google Scholar]
- Zhan, W.; Xiao, C.; Wen, Y.; Zhou, C.; Yuan, H.; Xiu, S.; Zou, X.; Xie, C.; Li, Q. Adaptive semantic segmentation for unmanned surface vehicle navigation. Electronics 2020, 9, 213. [Google Scholar] [CrossRef]
- Girisha, S.; Verma, U.; Pai, M.M.; Pai, R.M. UVid-Net: Enhanced semantic segmentation of UAV aerial videos by embedding temporal information. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4115–4127. [Google Scholar] [CrossRef]
- Ding, L.; Terwilliger, J.; Sherony, R.; Reimer, B.; Fridman, L. Value of temporal dynamics information in driving scene segmentation. IEEE Trans. Intell. Veh. 2021, 7, 113–122. [Google Scholar] [CrossRef]
- Shi, H.; Li, R.; Liu, F.; Lin, G. Temporal feature matching and propagation for semantic segmentation on 3D point cloud sequences. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7491–7502. [Google Scholar] [CrossRef]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Nirgudkar, S.; DeFilippo, M.; Sacarny, M.; Benjamin, M.; Robinette, P. Massmind: Massachusetts maritime infrared dataset. Int. J. Robot. Res. 2023, 42, 21–32. [Google Scholar] [CrossRef]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 205–218. [Google Scholar]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Zhang, T.; Cao, S.; Pu, T.; Peng, Z. AGPCNet: Attention-guided pyramid context networks for infrared small target detection. arXiv 2021, arXiv:2111.03580. [Google Scholar]
Per-class IoU [%] with overall mIoU [%] and F1 [%]:

Method | Sky | Water | Bridge | Obstacle | Person | Background | Others | mIoU | F1
---|---|---|---|---|---|---|---|---|---
PSPNet [29] | 95.92 | 96.76 | 47.60 | 35.58 | 31.52 | 67.35 | 85.47 | 65.74 | 75.32 |
FCN [10] | 96.21 | 97.42 | 50.85 | 38.32 | 36.76 | 70.14 | 86.25 | 67.99 | 76.46 |
U-Net [11] | 96.83 | 97.48 | 51.64 | 40.69 | 38.51 | 72.87 | 86.94 | 69.28 | 78.35 |
DeepLabv3+ [27] | 97.40 | 97.91 | 55.19 | 46.32 | 41.59 | 76.60 | 87.83 | 71.83 | 80.91 |
Swin Transformer [30] | 97.87 | 97.83 | 59.93 | 50.75 | 46.97 | 80.54 | 89.24 | 74.73 | 84.57 |
Swin-UNet [31] | 96.92 | 97.59 | 57.48 | 49.18 | 45.84 | 78.38 | 87.56 | 73.27 | 83.42 |
SegFormer [32] | 97.61 | 98.24 | 61.45 | 52.09 | 48.61 | 82.44 | 90.39 | 75.83 | 85.16 |
ISNet [16] | 96.37 | 97.31 | 58.36 | 48.90 | 45.07 | 77.35 | 87.91 | 73.04 | 83.85 |
AGPCNet [33] | 97.58 | 97.88 | 60.23 | 51.41 | 47.94 | 80.03 | 88.27 | 74.76 | 85.14 |
MSAFFNet [17] | 98.02 | 98.37 | 62.38 | 52.83 | 49.39 | 81.56 | 89.63 | 76.02 | 86.52 |
Ours | 98.52 | 98.75 | 66.04 | 57.21 | 53.71 | 83.07 | 90.85 | 78.30 | 89.37 |
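The per-class IoU, mIoU, and F1 scores reported above can be computed from a confusion matrix over flattened prediction and ground-truth label maps. The following is a minimal sketch; treating the reported F1 as a macro average over classes is an assumption about the evaluation protocol:

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Per-class IoU, mIoU, and macro F1 from integer label maps."""
    pred, gt = np.ravel(pred), np.ravel(gt)
    # Confusion matrix: rows = ground truth, columns = prediction
    cm = np.bincount(gt * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c but actually another class
    fn = cm.sum(axis=1) - tp  # class c pixels missed
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    return iou, iou.mean(), f1.mean()
```

For example, with predictions `[0, 0, 1, 1]` against ground truth `[0, 1, 1, 1]` and two classes, the per-class IoUs are 1/2 and 2/3, giving an mIoU of 7/12.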
Method | Parameters (MB) | Inference Speed (FPS) |
---|---|---|
PSPNet [29] | 47.02 | 39.1 |
FCN [10] | 11.05 | 215.3 |
U-Net [11] | 25.31 | 112.4 |
DeepLabv3+ [27] | 55.84 | 30.2 |
Swin Transformer [30] | 147.5 | 18.5 |
Swin-UNet [31] | 41.59 | 22.1 |
SegFormer [32] | 182.3 | 15.3 |
ISNet [16] | 39.2 | 25.6 |
AGPCNet [33] | 107.3 | 17.4 |
MSAFFNet [17] | 52.3 | 21.2 |
Ours | 138.2 | 19.1 |
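The model sizes and inference speeds above can be measured with two small utilities. This is a generic sketch, not the paper's benchmarking code: it assumes float32 parameters (4 bytes each) for the megabyte figure, and the warmup/run counts are arbitrary; in practice a framework such as PyTorch would supply the real parameter tensors and forward pass:

```python
import time
import numpy as np

def param_size_mb(params):
    """Total size of float32 parameter arrays in megabytes (4 bytes/element)."""
    return sum(p.size for p in params) * 4 / 1e6

def measure_fps(forward, inputs, warmup=3, runs=10):
    """Average end-to-end frames per second of a forward function.

    A few warmup calls are made first so one-time setup costs are
    excluded from the timed runs.
    """
    for _ in range(warmup):
        forward(inputs)
    start = time.perf_counter()
    for _ in range(runs):
        forward(inputs)
    return runs / (time.perf_counter() - start)
```

For example, a single 1000 × 250 float32 weight matrix amounts to exactly 1.0 MB under this convention. When benchmarking on a GPU, the device should additionally be synchronized around the timer so asynchronous kernel launches are not undercounted.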
Method | mIoU | F1 Score | Inference Speed (FPS) |
---|---|---|---|
Baseline | |||
Local granularity (LG) branch | 71.98 | 80.62 | 48.5 |
Global granularity (GG) branch | 74.21 | 85.04 | 25.0 |
Histogram of oriented gradients (HOG) feature branch | 63.91 | 74.32 | 72.1 |
Cross-granularity | |||
LG + GG | 76.98 | 87.49 | 23.3 |
LG + HOG | 74.15 | 85.20 | 30.2 |
GG + HOG | 77.39 | 88.65 | 26.5 |
LG + GG + HOG (Ours) | 78.30 | 89.37 | 19.1 |
Global–Local Granularity Fusion | HOG–Deep Features Fusion | mIoU | F1 Score | Inference Speed (FPS) |
---|---|---|---|---|
Add | HDF | 74.19 | 83.69 | 23.5 |
Concat | HDF | 75.37 | 86.52 | 17.5 |
CBAM | HDF | 76.03 | 87.59 | 20.3 |
AMF | HDF | 78.30 | 89.37 | 19.1 |
AMF | Add | 75.65 | 87.32 | 21.5 |
AMF | Concat | 76.85 | 87.91 | 18.2 |
AMF | CBAM | 76.20 | 88.54 | 19.4 |
AMF | HDF | 78.30 | 89.37 | 19.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xu, H.; Yu, Y.; Zhang, X.; He, J. Cross-Granularity Infrared Image Segmentation Network for Nighttime Marine Observations. J. Mar. Sci. Eng. 2024, 12, 2082. https://doi.org/10.3390/jmse12112082