Floodwater Extraction from UAV Orthoimagery Based on a Transformer Model
Abstract
1. Introduction
2. Materials and Methods
2.1. Research Area Overview
2.2. Related Methods and Water Body Extraction Overall Framework
2.2.1. Attention Mechanism
- (1) Based on the query and a given key, Key_i, calculate the similarity or correlation between the two. Different functions and computational mechanisms can be used for this calculation; the most common are the dot product of the two vectors, their cosine similarity, or an additional neural network that outputs the value.
- (2) Because the similarity values obtained in step (1) have different ranges depending on the calculation method used, a SoftMax-style calculation is introduced to transform the scores numerically. This step normalizes the raw scores into a probability distribution in which the element weights sum to 1; at the same time, the inherent mechanism of SoftMax further highlights the weights of important elements.
- (3) The result a_i from step (2) is the weight coefficient corresponding to Value_i, and the attention value is obtained by weighted summation.
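The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the dot product is chosen as the similarity function (one of the options listed in step (1)), the function names and the toy query, key, and value vectors are hypothetical, and the scaling by the square root of the vector dimension that transformers commonly apply is omitted for brevity.

```python
import math

def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Return (attention value, weights) for a single query."""
    # Step (1): similarity between the query and every Key_i.
    scores = [dot(query, k) for k in keys]
    # Step (2): SoftMax turns the scores into weights a_i.
    weights = softmax(scores)
    # Step (3): weighted sum of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights

# Toy data (hypothetical): the first key matches the query best.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
output, weights = attention(query, keys, values)
print(weights, output)
```

Because the first key is the most similar to the query, its value receives the largest weight, which is the "highlighting" effect of SoftMax described in step (2).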
2.2.2. Water Body Extraction Overall Framework
- (1) Dataset acquisition and production: The acquired remote sensing images are annotated both automatically and manually to construct a comprehensive floodwater body dataset.
- (2) Model training and knowledge transfer: Using the constructed dataset, self-supervised pre-training is performed first to obtain a pre-trained model, which is then trained on the fully annotated dataset to produce the water body extraction model.
- (3) Extraction and result processing of floodwater contours: For the input remote sensing images, the method consists of five steps: data reading, preprocessing, water body extraction, morphological processing, and vector conversion.
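As an illustration of the morphological processing step, the sketch below applies a binary opening (erosion followed by dilation) to a small water mask. It rests on assumptions: the text does not say which morphological operations are used, so opening with a 3×3 neighborhood is only one plausible choice, and the function names and the toy mask are hypothetical.

```python
def erode(mask):
    """3x3 binary erosion; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(w)] for r in range(h)]

def dilate(mask):
    """3x3 binary dilation; a pixel becomes 1 if any neighbor is 1."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(w)] for r in range(h)]

def opening(mask):
    """Erosion then dilation: removes specks smaller than the neighborhood."""
    return dilate(erode(mask))

# Toy segmentation mask (1 = water): a 3x3 water patch plus one noisy pixel.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(mask)
for row in cleaned:
    print(row)
```

In this toy case the isolated pixel disappears while the 3×3 patch is preserved, which gives cleaner contours for the subsequent vector conversion. A production pipeline would more likely rely on an image processing library than on nested loops.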
2.3. Algorithm Technical Route and Process
2.3.1. Dataset Acquisition and Production
- (1) GID Dataset
- (2) GLH-Water Dataset
- (3) Floodwater Body Dataset
- (1) Rivers and ponds (Figure 5a).
- (2) Submerged urban roads: buildings must not be included; vehicles parked on the street can be ignored (Figure 5b).
- (3) Submerged agricultural land: large areas of forest and vegetation should not be included in the water body area (Figure 5c).
- (4) Reflective water bodies in satellite images (Figure 5d).
2.3.2. Model Training and Knowledge Transfer
- (1) Model Pre-Training and Knowledge Transfer Based on Deep Learning for Remote Sensing Data
- (2) Water Body Extraction Based on Deep Learning Semantic Segmentation Technology
- (1) A pre-trained transformer model is used as the backbone network to extract hierarchical features of water bodies in remote sensing images. The transformer model offers stronger feature expression capabilities, enabling the network to extract multi-scale water body features effectively.
- (2) The multi-scale water body features are aligned and concatenated to achieve feature fusion. Once the model has been trained end to end, it is used for floodwater body extraction, yielding the water body segmentation results.
- (3) The transformer block structure is depicted in Figure 8. Each block comprises multiple identical transformer layers, and each transformer layer consists of three main components: layer norm, self-attention, and a multi-layer perceptron (MLP). The MLP [30,31] is a fundamental type of feedforward artificial neural network composed of an input layer, one or more hidden layers, and an output layer. Each layer consists of multiple neurons, and each neuron is connected to all neurons in the previous layer, transmitting and processing information through weights and activation functions. The self-attention operation is particularly important, as it is the primary reason the transformer structure has such strong representation capabilities.
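To make the three components concrete, the following plain-Python sketch wires layer norm, self-attention, and an MLP into one transformer layer. Several choices here are assumptions rather than details taken from the text or from Figure 8: the pre-norm ordering, the residual connections, the ReLU activation, the identity query/key/value projections, the single attention head, and all function names and weight values are hypothetical simplifications.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and (near) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def matvec(weights, x):
    """Fully connected layer without bias: every output sees every input."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mlp(x, w_hidden, w_out):
    """Input -> hidden layer with ReLU -> output layer."""
    hidden = [max(0.0, h) for h in matvec(w_hidden, x)]
    return matvec(w_out, hidden)

def self_attention(tokens):
    """Single-head self-attention with identity projections (simplified)."""
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * t[j] for w, t in zip(weights, tokens))
                    for j in range(len(q))])
    return out

def transformer_layer(tokens, w_hidden, w_out):
    """Assumed pre-norm layout: x + Attn(LN(x)), then x + MLP(LN(x))."""
    normed = [layer_norm(t) for t in tokens]
    attended = self_attention(normed)
    tokens = [[a + b for a, b in zip(t, u)] for t, u in zip(tokens, attended)]
    normed = [layer_norm(t) for t in tokens]
    return [[a + b for a, b in zip(t, mlp(n, w_hidden, w_out))]
            for t, n in zip(tokens, normed)]

# Three toy tokens of dimension 4; arbitrary fixed weights for the MLP.
tokens = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
w_hidden = [[0.1 * (i + j) for j in range(4)] for i in range(8)]  # 4 -> 8
w_out = [[0.05 * (i - j) for j in range(8)] for i in range(4)]    # 8 -> 4
output = transformer_layer(tokens, w_hidden, w_out)
print(output)
```

The layer keeps the number of tokens and the feature dimension unchanged, which is what allows several identical layers to be stacked into a block.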
3. Results
3.1. Application in the Haihe River ‘23·7’ Basin Catastrophic Flood
3.2. Accuracy Assessment
- (1) Intersection over Union (IoU)
- (2) Evaluation Metrics
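For reference, IoU and the F1-score reported in the results table can both be computed from pixel-level counts of true positives (TP), false positives (FP), and false negatives (FN) for the water class, using the standard definitions IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN). The sketch below is a minimal illustration; the function names and the toy masks are hypothetical, and returning 1.0 when both masks are empty is a convention assumed here.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN for the water class over two binary masks."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn

def iou(tp, fp, fn):
    """Intersection over Union: overlap divided by the union of both masks."""
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # assumed convention for empty masks

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # assumed convention for empty masks

# Toy masks (1 = water): two pixels agree, one is a false alarm, one is missed.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
tp, fp, fn = confusion_counts(pred, truth)
print(iou(tp, fp, fn))       # 0.5
print(f1_score(tp, fp, fn))
```

For a single class evaluated on the same set of pixels, the two metrics are linked by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU.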
4. Discussion
- (1) Diverse water body types: Remote sensing images feature various water bodies, such as rivers, lakes, and wetlands, each displaying different spectral characteristics due to terrain, sediment content, and microorganism density. This variability complicates accurate model identification.
- (2) Seasonal variations: Seasonal climate changes alter the texture and spectral properties of water bodies, so they can be mistaken for shadows cast by vegetation or buildings. The lack of post-disaster training samples further hampers the accuracy of water body boundary extraction.
- (3) Limited high-resolution datasets: High-resolution remote sensing datasets with pixel-level annotations are scarce, and the available samples of different water body types are often imbalanced. This scarcity makes it challenging to train robust and generalizable extraction algorithms.
- (4) Computational efficiency: The extensive spatial distribution of water bodies in remote sensing images requires algorithms that are both resource-efficient and capable of rapid computation.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Nagasawa, R.; Mas, E.; Moya, L.; Koshimura, S. Model-based analysis of multi-UAV path planning for surveying postdisaster building damage. Sci. Rep. 2021, 11, 18588.
- Garnica-Peña, R.J.; Alcántara-Ayala, I. The use of UAVs for landslide disaster risk research and disaster risk management: A literature review. J. Mt. Sci. 2021, 18, 1–17.
- Dong, Z.; Zhang, M.; Li, L.; Liu, Q.; Wen, Q.; Wang, W.; Luo, W.; Wu, Z.; Tang, T.; Ji, W. A multiscale building detection method based on boundary preservation for remote sensing images: Taking the Yangbi M6.4 earthquake as an example. Nat. Hazards Res. 2022, 2, 121–131.
- Busetti, A.; Leone, C.; Corradetti, A.; Fracaros, S.; Spadotto, S.; Rai, P.; Zini, L.; Calligaris, C. Coastal Storm-Induced Sinkholes: Insights from Unmanned Aerial Vehicle Monitoring. Remote Sens. 2024, 16, 3681.
- Li, Y.; Dang, B.; Li, W.; Zhang, Y. GLH-Water: A Large-Scale Dataset for Global Surface Water Detection in Large-Size Very-High-Resolution Satellite Imagery. arXiv 2023, arXiv:2303.09310.
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv 2014, arXiv:1412.7062.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 833–851.
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.
- Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. ICNet for Real-Time Semantic Segmentation on High-Resolution Images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 405–420.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFS. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- He, J.; Deng, Z.; Qiao, Y. Dynamic Multi-Scale Filters for Semantic Segmentation. In Proceedings of the International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3562–3572.
- Zhao, H.P. Research on Water Body Recognition Combining Spectral and Deep Learning Features Under Big Data. Master’s Thesis, Dalian Jiaotong University, Dalian, China, 2018.
- Chen, Y.; Tang, L.L.; Kan, Z.H.; Bilal, M.; Li, Q.Q. A novel water body extraction neural network (WBENN) for optical high resolution multispectral imagery. J. Hydrol. 2020, 588, 125092.
- Chen, Y.; Fan, R.S.; Yang, X.C.; Wang, J.X.; Latif, A. Extraction of urban water bodies from high-resolution remote-sensing imagery using deep learning. Water 2018, 10, 585.
- Lv, Y.L.; Tian, S.W.; Yu, L.; Zhang, R. Water body recognition based on CNN_SVM with joint spectral features. Comput. Eng. Des. 2019, 40, 243–247.
- Weng, L.G.; Xu, Y.M.; Xia, M.; Zhang, Y.H. Water areas segmentation from remote sensing images using a separable residual SegNet network. Int. J. Geo-Inf. 2020, 9, 256.
- Zhang, Z.X.; Liu, Q.J.; Wang, Y.H. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
- Qiu, C.P.; Mou, L.C.; Schmitt, M.; Zhu, X.X. Fusing multiseasonal Sentinel-2 imagery for urban land cover classification with multibranch residual convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1787–1791.
- Cai, Y.H.; Guo, Y.J.; Lang, S.N.; Liu, J.Q.; Hu, S.B. Classification of hyperspectral images by spectral-spatial dense-residual network. J. Appl. Remote Sens. 2020, 14, 036513.
- Li, Z.Y.; Wang, R.; Zhang, W.; Hu, F.M.; Meng, L.K. Multiscale features supported DeepLabV3+ optimization scheme for accurate water semantic segmentation. IEEE Access 2019, 7, 155787–155804.
- Duan, L.H.; Hu, X.Y. Multiscale refinement network for water body segmentation in high resolution satellite imagery. IEEE Geosci. Remote Sens. Lett. 2020, 17, 686–690.
- Feng, W.Q.; Sui, H.G.; Huang, W.M.; Xu, C.; An, K.Q. Water body extraction from very high-resolution remote sensing imagery using deep U-Net and a super pixel-based conditional random field model. IEEE Geosci. Remote Sens. Lett. 2019, 16, 618–622.
- Tong, X.-Y.; Xia, G.-S.; Lu, Q.; Shen, H.; Li, S.; You, S.; Zhang, L. Land-cover classification with high-resolution remote sensing images using transferable deep models. Remote Sens. Environ. 2020, 237, 111322.
- Minsky, M.; Papert, S. Perceptrons: An Introduction to Computational Geometry; MIT Press: Cambridge, MA, USA, 1969.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Gugger, S.; Howard, J. AdamW and Super-Convergence Is Now the Fastest Way to Train Neural Nets. fast.ai blog. 2018. Available online: https://www.fast.ai/posts/2018-07-02-adam-weight-decay.html (accessed on 1 April 2023).
Method | IoU (%) | F1-Score (%) |
---|---|---|
MECNet | 44.67 | 61.75 |
MSResNet | 69.76 | 82.18 |
MSCENet | 74.81 | 85.58 |
FCN8s (General) | 73.66 | 84.83 |
PSPNet (General) | 75.19 | 85.84 |
DeepLab v3 | 79.8 | 88.76 |
HRNet-48 | 78.6 | 88.01 |
STDC-1446 | 75.82 | 86.25 |
MagNet (High-Res) | 62.77 | - |
FCtL (High-Res) | 74.92 | 85.66 |
ISDNet (High-Res) | 53.04 | - |
PCL | 82.26 | 90.27 |
Our Method | 92.25 | 91.91 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wu, Z.; Dong, Z.; Yang, K.; Liu, Q.; Wang, W. Floodwater Extraction from UAV Orthoimagery Based on a Transformer Model. Remote Sens. 2024, 16, 4052. https://doi.org/10.3390/rs16214052