TCNet: A Transformer–CNN Hybrid Network for Marine Aquaculture Mapping from VHSR Images
Abstract
1. Introduction
- (1) This study presents a Transformer–CNN hybrid network (TCNet) for mapping marine aquaculture from VHSR images;
- (2) A hierarchical lightweight Transformer module is proposed to capture long-range dependencies, which helps to identify various marine aquaculture areas in complex marine environments;
- (3) An attention-mechanism-based structure is employed to refine the feature space in long-span skip connections, which helps to delineate the detailed structures of marine aquaculture areas.
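As an illustrative aside, the long-range dependencies targeted by a lightweight Transformer module are commonly captured with self-attention whose keys and values are spatially reduced (average-pooled) before attention, so the cost scales with the pooled map rather than the full image. The sketch below is a hypothetical single-head NumPy example under that assumption, not the authors' implementation; all weights, shapes, and the function name `sr_attention` are placeholders.

```python
import numpy as np

def sr_attention(x, reduction=8, d=32, rng=None):
    # Single-head self-attention with spatially reduced keys/values:
    # queries keep full resolution, K/V come from an average-pooled map,
    # so the attention matrix is (H*W, (H/r)*(W/r)) instead of (H*W, H*W).
    rng = np.random.default_rng(0) if rng is None else rng
    H, W, C = x.shape
    N = H * W
    # Random projections stand in for learned weight matrices.
    Wq = rng.standard_normal((C, d)) * 0.1
    Wk = rng.standard_normal((C, d)) * 0.1
    Wv = rng.standard_normal((C, d)) * 0.1
    q = x.reshape(N, C) @ Wq
    # Average-pool the feature map by `reduction` before forming K/V.
    h, w = H // reduction, W // reduction
    pooled = x[: h * reduction, : w * reduction]
    pooled = pooled.reshape(h, reduction, w, reduction, C).mean(axis=(1, 3))
    kv = pooled.reshape(h * w, C)
    k, v = kv @ Wk, kv @ Wv
    # Scaled dot-product attention with a numerically stable softmax.
    scores = q @ k.T / np.sqrt(d)
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = scores / scores.sum(axis=1, keepdims=True)
    return (attn @ v).reshape(H, W, d)

feat = np.random.default_rng(1).standard_normal((32, 32, 16))
out = sr_attention(feat, reduction=8)
print(out.shape)  # (32, 32, 32)
```

With `reduction=8`, each of the 1024 query positions attends to only 16 pooled positions, which is the kind of saving that makes Transformer blocks practical on large VHSR tiles.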
2. Study Area
3. Materials and Methods
3.1. Data and Preprocessing
3.2. Transformer–CNN Hybrid Network
3.2.1. Encoder Based on CNN
3.2.2. Hierarchical Lightweight Transformer Module
3.2.3. Detailed Structure Refinement
3.3. Experimental Details
3.4. Comparison Methods
3.4.1. FCN-Based Methods
3.4.2. Transformer-Based Methods
3.5. Accuracy Assessment
4. Results and Comparison
4.1. FCN vs. Our Approach
4.2. Transformer vs. Our Approach
5. Discussion
5.1. Integrating Transformer and CNN for Semantic Segmentation
5.2. Additional Spectral Bands Analysis
5.3. Ablation Analysis
5.4. Advantages and Limitations
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
| Methods | RCA F1-Score | RCA IoU | CCA F1-Score | CCA IoU | Mean F1-Score | Mean IoU |
|---|---|---|---|---|---|---|
| FCN-32s | 0.89 | 0.80 | 0.90 | 0.83 | 0.90 | 0.81 |
| DeepLab v2 | 0.93 | 0.86 | 0.94 | 0.89 | 0.93 | 0.88 |
| UNet | 0.92 | 0.86 | 0.95 | 0.91 | 0.94 | 0.89 |
| HCNet | 0.93 | 0.87 | 0.95 | 0.91 | 0.94 | 0.89 |
| HCHNet | 0.91 | 0.84 | 0.93 | 0.86 | 0.92 | 0.85 |
| Attention UNet | 0.93 | 0.87 | 0.96 | 0.92 | 0.95 | 0.90 |
| Ours-TCNet | 0.93 | 0.88 | 0.97 | 0.94 | 0.95 | 0.91 |
| Methods | RCA F1-Score | RCA IoU | CCA F1-Score | CCA IoU | Mean F1-Score | Mean IoU |
|---|---|---|---|---|---|---|
| SETR | 0.89 | 0.81 | 0.86 | 0.76 | 0.88 | 0.78 |
| Swin-UNet | 0.91 | 0.83 | 0.94 | 0.88 | 0.92 | 0.86 |
| TransUNet | 0.92 | 0.85 | 0.96 | 0.92 | 0.94 | 0.88 |
| SegFormer | 0.92 | 0.85 | 0.93 | 0.88 | 0.93 | 0.87 |
| Ours-TCNet | 0.93 | 0.88 | 0.97 | 0.94 | 0.95 | 0.91 |
| Methods | RCA F1-Score (%) | RCA IoU (%) | CCA F1-Score (%) | CCA IoU (%) | Mean F1-Score (%) | Mean IoU (%) |
|---|---|---|---|---|---|---|
| TCNet-4 | 92.6 | 86.3 | 94.7 | 89.9 | 93.7 | 88.1 |
| TCNet-8 | 93.5 | 87.8 | 96.9 | 94.0 | 95.2 | 90.9 |
| Methods | RCA F1-Score (%) | RCA IoU (%) | CCA F1-Score (%) | CCA IoU (%) | Mean F1-Score (%) | Mean IoU (%) |
|---|---|---|---|---|---|---|
| Baseline | 92.9 | 86.8 | 95.3 | 91.0 | 94.1 | 88.9 |
| +HLT | 93.4 | 87.7 | 96.4 | 93.1 | 95.0 | 90.4 |
| +HLT + FSR | 93.5 | 87.8 | 96.9 | 94.0 | 95.2 | 90.9 |
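The F1-score and IoU reported in the tables above are standard per-class segmentation metrics derived from pixel-level true positives (TP), false positives (FP), and false negatives (FN). A minimal pure-Python sketch (the function name and toy masks are illustrative, not taken from the paper):

```python
def f1_iou(pred, truth):
    """Per-class F1-score and IoU from flat binary prediction/ground-truth masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)       # predicted 1, truly 1
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)   # predicted 1, truly 0
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)   # predicted 0, truly 1
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou

# Toy masks with 3 TP, 1 FP, 1 FN:
pred  = [1, 1, 1, 1, 0, 0]
truth = [1, 1, 1, 0, 1, 0]
print(f1_iou(pred, truth))  # (0.75, 0.6)
```

Note that the two metrics are monotonically related (F1 = 2·IoU / (1 + IoU)), which is why methods rank almost identically under either column.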
Share and Cite
Fu, Y.; Zhang, W.; Bi, X.; Wang, P.; Gao, F. TCNet: A Transformer–CNN Hybrid Network for Marine Aquaculture Mapping from VHSR Images. Remote Sens. 2023, 15, 4406. https://doi.org/10.3390/rs15184406