Cloud Detection Using a UNet3+ Model with a Hybrid Swin Transformer and EfficientNet (UNet3+STE) for Very-High-Resolution Satellite Imagery
Abstract
1. Introduction
2. Materials
2.1. KOMPSAT Imagery
2.2. Training and Testing Datasets
3. Methodology
3.1. UNet3+
3.2. Proposed Network Integrating a CNN and a Transformer
3.3. Deep Supervision for Efficiently Conducting Training
4. Experimental Results
4.1. Training the Deep Learning Model
4.2. Assessment of the Proposed Deep Learning Model
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Dwyer, J.L.; Roy, D.P.; Sauer, B.; Jenkerson, C.B.; Zhang, H.K.; Lymburner, L. Analysis Ready Data: Enabling Analysis of the Landsat Archive. Remote Sens. 2018, 10, 1363.
- Foga, S.; Scaramuzza, P.L.; Guo, S.; Zhu, Z.; Dilley, R.D.; Beckmann, T.; Schmidt, G.L.; Dwyer, J.L.; Joseph Hughes, M.; Laue, B. Cloud Detection Algorithm Comparison and Validation for Operational Landsat Data Products. Remote Sens. Environ. 2017, 194, 379–390.
- Frantz, D.; Haß, E.; Uhl, A.; Stoffels, J.; Hill, J. Improvement of the Fmask algorithm for Sentinel-2 images: Separating clouds from bright surfaces based on parallax effects. Remote Sens. Environ. 2018, 215, 471–481.
- Frey, R.A.; Ackerman, S.A.; Liu, Y.; Strabala, K.I.; Zhang, H.; Key, J.R.; Wang, X. Cloud Detection with MODIS. Part I: Improvements in the MODIS Cloud Mask for Collection 5. J. Atmos. Ocean. Technol. 2008, 25, 1057–1072.
- Stöckli, R.; Bojanowski, J.S.; John, V.O.; Duguay-Tetzlaff, A.; Bourgeois, Q.; Schulz, J.; Hollmann, R. Cloud Detection with Historical Geostationary Satellite Sensors for Climate Applications. Remote Sens. 2019, 11, 1052.
- Mahajan, S.; Fataniya, B. Cloud detection methodologies: Variants and development—A review. Complex Intell. Syst. 2019, 6, 251–261.
- Lee, S.; Choi, J. Daytime Cloud Detection Algorithm Based on a Multitemporal Dataset for GK-2A Imagery. Remote Sens. 2021, 13, 3215.
- Zhu, Z.; Woodcock, C.E. Automated cloud, cloud shadow, and snow detection in multitemporal Landsat data: An algorithm designed specifically for monitoring land cover change. Remote Sens. Environ. 2014, 152, 217–234.
- Zhu, Z.; Wang, S.; Woodcock, C.E. Improvement and expansion of the Fmask algorithm: Cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sens. Environ. 2015, 159, 269–277.
- Qiu, S.; Zhu, Z.; He, B. Fmask 4.0: Improved cloud and cloud shadow detection in Landsats 4–8 and Sentinel-2 imagery. Remote Sens. Environ. 2019, 231, 111205.
- Main-Knorn, M.; Pflug, B.; Louis, J.; Debaecker, V.; Müller-Wilm, U.; Gascon, F. Sen2Cor for Sentinel-2. In Proceedings of the Image and Signal Processing for Remote Sensing XXIII, Warsaw, Poland, 4 October 2017; p. 1042704.
- Bai, T.; Li, D.; Sun, K.; Chen, Y.; Li, W. Cloud Detection for High-Resolution Satellite Imagery Using Machine Learning and Multi-Feature Fusion. Remote Sens. 2016, 8, 715.
- Chen, X.; Liu, L.; Gao, Y.; Zhang, X.; Xie, S. A Novel Classification Extension-Based Cloud Detection Method for Medium-Resolution Optical Images. Remote Sens. 2020, 12, 2365.
- Wei, J.; Huang, W.; Li, Z.; Sun, L.; Zhu, X.; Yuan, Q.; Liu, L.; Cribb, M. Cloud Detection for Landsat Imagery by Combining the Random Forest and Superpixels Extracted via Energy-Driven Sampling Segmentation Approaches. Remote Sens. Environ. 2020, 248, 112005.
- Yao, X.; Guo, Q.; Li, A.; Shi, L. Optical remote sensing cloud detection based on random forest only using the visible light and near-infrared image bands. Eur. J. Remote Sens. 2022, 55, 150–167.
- Pirinen, A.; Abid, N.; Paszkowsky, N.A.; Timoudas, T.O.; Scheirer, R.; Ceccobello, C.; Kovács, G.; Persson, A. Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI. Remote Sens. 2024, 16, 694.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 1, pp. 1097–1105.
- Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
- Jégou, S.; Drozdzal, M.; Vazquez, D.; Romero, A.; Bengio, Y. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1175–1183.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. Segformer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
- Dai, Z.; Liu, H.; Le, Q.V.; Tan, M. CoAtNet: Marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 2021, 34, 3965–3977.
- Yan, H.; Li, Z.; Li, W.; Wang, C.; Wu, M.; Zhang, C. ConTNet: Why not use convolution and transformer at the same time? arXiv 2021, arXiv:2104.13497.
- Jin, Y.; Han, D.; Ko, H. TrSeg: Transformer for semantic segmentation. Pattern Recognit. Lett. 2021, 148, 29–35.
- Zhang, X.; Zhang, Y. Conv-PVT: A fusion architecture of convolution and pyramid vision transformer. Int. J. Mach. Learn. Cybern. 2023, 14, 2127–2136.
- Gao, L.; Liu, H.; Yang, M.; Chen, L.; Wan, Y.; Xiao, Z.; Qian, Y. STransFuse: Fusing Swin Transformer and convolutional neural network for remote sensing image semantic segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 10990–11003.
- Wang, L.; Li, R.; Wang, D.; Duan, C.; Wang, T.; Meng, X. Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images. Remote Sens. 2021, 13, 3065.
- Zhang, W.; Tan, Z.; Lv, Q.; Li, J.; Zhu, B.; Liu, Y. An Efficient Hybrid CNN-Transformer Approach for Remote Sensing Super-Resolution. Remote Sens. 2024, 16, 880.
- Yao, M.; Zhang, Y.; Liu, G.; Pang, D. SSNet: A Novel Transformer and CNN Hybrid Network for Remote Sensing Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3023–3037.
- Segal-Rozenhaimer, M.; Li, A.; Das, K.; Chirayath, V. Cloud detection algorithm for multi-modal satellite imagery using convolutional neural-networks (CNN). Remote Sens. Environ. 2020, 237, 111446.
- Pu, W.; Wang, Z.; Liu, D.; Zhang, Q. Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion. Remote Sens. 2022, 14, 4312.
- Li, K.; Ma, N.; Sun, L. Cloud Detection of Multi-Type Satellite Images Based on Spectral Assimilation and Deep Learning. Int. J. Remote Sens. 2023, 44, 3106–3121.
- Pasquarella, V.J.; Brown, C.F.; Czerwinski, W.; Rucklidge, W.J. Comprehensive Quality Assessment of Optical Satellite Imagery Using Weakly Supervised Video Learning. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–24 June 2023; pp. 2125–2135.
- Hughes, M.J.; Hayes, D.J. Automated Detection of Cloud and Cloud Shadow in Single-Date Landsat Imagery Using Neural Networks and Spatial Post-Processing. Remote Sens. 2014, 6, 4907–4926.
- Li, J.; Wu, Z.; Hu, Z.; Jian, C.; Luo, S.; Mou, L.; Zhu, X.X.; Molinier, M. A lightweight deep learning-based cloud detection method for Sentinel-2A imagery fusing multiscale spectral and spatial features. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–19.
- He, Q.; Sun, X.; Yan, Z.; Fu, K. DABnet: Deformable contextual and boundary-weighted network for cloud detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5601216.
- López-Puigdollers, D.; Mateo-García, G.; Gómez-Chova, L. Benchmarking Deep Learning Models for Cloud Detection in Landsat-8 and Sentinel-2 Images. Remote Sens. 2021, 13, 992.
- Kim, B.; Oh, H. AI Training Dataset for Cloud Detection of KOMPSAT Images. GEO DATA 2020, 2, 56–62.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Huang, H.M.; Lin, L.F.; Tong, R.F.; Hu, H.J.; Zhang, Q.W.; Iwamoto, Y.; Han, X.H.; Chen, Y.W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. In Proceedings of the ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Virtual, 4–8 May 2020; pp. 1055–1059.
- Mo, J.; Seong, S.; Oh, J.; Choi, J. SAUNet3+ CD: A Siamese-attentive UNet3+ for change detection in remote sensing images. IEEE Access 2022, 10, 101434–101444.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2820–2828.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3349–3364.
| Satellite Sensor | | KOMPSAT-3 | KOMPSAT-3A |
|---|---|---|---|
| Ground sample distance | Multispectral | 2.8 m | 2.2 m |
| | Panchromatic | 0.7 m | 0.55 m |
| Spectral wavelength | Panchromatic | 450–900 nm | 450–900 nm |
| | Blue | 450–520 nm | 450–520 nm |
| | Green | 520–600 nm | 520–600 nm |
| | Red | 630–690 nm | 630–690 nm |
| | NIR | 760–900 nm | 760–900 nm |
| Orbit altitude | | 685 km | 528 km |
| Swath width | | >15 km (at nadir) | >12 km (at nadir) |
| Radiometric resolution | | 14 bits | 14 bits |
| Characteristics | Variables |
|---|---|
| Size | Satellite image: ; Labeling data: |
| Number | Training data: 9628; Validation data: 1069 |
| Hardware and Hyperparameters | | Value |
|---|---|---|
| Hardware | CPU | Intel Xeon W-2235 (3.8 GHz) |
| | GPU | NVIDIA Quadro RTX A5000 |
| | RAM | 128 GB |
| | OS | Linux |
| | Framework | PyTorch |
| Hyperparameter | Batch size | 3 |
| | Optimizer | AdamW |
| | Number of epochs | 100 |
| | Loss function | Categorical cross-entropy |
| | Base learning rate | 0.0001 |
| | Max learning rate | 0.001 |
| | Step size | 5 |
| | Gamma | 0.995 |
| | Mode | Exponential range |
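The training setup in this table maps onto standard PyTorch components. The following is a minimal sketch, not the authors' released code: the one-layer stand-in model, the 256 × 256 patch size, and the reading of "step size 5" as five epochs' worth of iterations are assumptions, while the optimizer, loss function, batch size, and cyclic learning-rate values come from the table.

```python
import torch
import torch.nn as nn

# Stand-in for UNet3+STE: any model mapping (B, 4, H, W) -> (B, 3, H, W).
# KOMPSAT patches have 4 multispectral bands (B, G, R, NIR); 3 classes:
# clear sky, cloud, cloud shadow. This 1x1-conv model is a hypothetical
# placeholder, as is the 256 x 256 patch size.
model = nn.Conv2d(4, 3, kernel_size=1)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Cyclic learning rate matching the table: base 1e-4, max 1e-3,
# "exponential range" mode with gamma = 0.995. PyTorch measures the
# step size in iterations, so the table's "step size 5" is assumed
# here to mean five epochs' worth of batches.
iters_per_epoch = 9628 // 3  # 9628 training patches, batch size 3
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=1e-4,
    max_lr=1e-3,
    step_size_up=5 * iters_per_epoch,
    mode="exp_range",
    gamma=0.995,
    cycle_momentum=False,  # AdamW has no classical momentum buffer
)

criterion = nn.CrossEntropyLoss()  # categorical cross-entropy

# One synthetic step to illustrate the loop; real training iterates
# over the KOMPSAT patch DataLoader for 100 epochs.
images = torch.randn(3, 4, 256, 256)        # batch of 3 patches
masks = torch.randint(0, 3, (3, 256, 256))  # per-pixel class labels
loss = criterion(model(images), masks)
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```

In PyTorch's `CyclicLR`, `mode="exp_range"` shrinks the cycle amplitude by a factor of `gamma` at every iteration, which corresponds to the "exponential range" mode listed above; `cycle_momentum=False` is required because AdamW does not expose a classical momentum parameter.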
| Test Image ID | CD-FM3SF | HRNet | Segformer | UNet3+ | UNet3+STE |
|---|---|---|---|---|---|
| 1 | 0.9421 | 0.9541 | 0.9433 | 0.9527 | 0.9555 |
| 2 | 0.9187 | 0.9213 | 0.9252 | 0.9141 | 0.9340 |
| 3 | 0.9244 | 0.9232 | 0.9339 | 0.9193 | 0.9331 |
| 4 | 0.9707 | 0.9735 | 0.9725 | 0.9713 | 0.9750 |
| 5 | 0.9543 | 0.9569 | 0.9570 | 0.9527 | 0.9587 |
| 6 | 0.9465 | 0.9490 | 0.9541 | 0.9413 | 0.9529 |
| 7 | 0.9876 | 0.9894 | 0.9915 | 0.9882 | 0.9900 |
| 8 | 0.9683 | 0.9727 | 0.9754 | 0.9686 | 0.9733 |
| 9 | 0.9284 | 0.9346 | 0.9325 | 0.9284 | 0.9383 |
| 10 | 0.9552 | 0.9554 | 0.9592 | 0.9539 | 0.9587 |
| 11 | 0.9147 | 0.9754 | 0.9418 | 0.9581 | 0.9780 |
| 12 | 0.9626 | 0.9628 | 0.9732 | 0.9640 | 0.9733 |
| 13 | 0.8828 | 0.9161 | 0.9085 | 0.9091 | 0.9235 |
| 14 | 0.9776 | 0.9778 | 0.9780 | 0.9787 | 0.9813 |
| Mean | 0.9453 | 0.9544 | 0.9533 | 0.9500 | 0.9590 |
| Model | FLOPs | Parameters |
|---|---|---|
| CD-FM3SF | 23.0 G | 0.66 M |
| HRNet | 183.6 G | 137.55 M |
| Segformer | 99.8 G | 84.60 M |
| UNet3+ | 203.0 G | 6.75 M |
| UNet3+STE | 65.7 G | 5.93 M |
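Complexity figures like these are usually obtained with a profiling utility. The sketch below uses the third-party `thop` package as one possible way to reproduce such numbers (an assumption; the paper does not state which tool was used), with a hypothetical placeholder model and input size:

```python
import torch
import torch.nn as nn
from thop import profile  # third-party: pip install thop

# Placeholder model; substitute the actual UNet3+STE implementation.
model = nn.Conv2d(4, 3, kernel_size=1)

# Hypothetical input: one 4-band 256 x 256 patch.
dummy = torch.randn(1, 4, 256, 256)
macs, params = profile(model, inputs=(dummy,))
print(f"MACs: {macs / 1e9:.1f} G, parameters: {params / 1e6:.2f} M")
```

Note that `thop` reports multiply-accumulate operations (MACs), while some papers quote FLOPs as 2 × MACs, so the convention should be checked before comparing against the table.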
| Method | Clear Sky Precision | Clear Sky Recall | Clear Sky F1 | Cloud Precision | Cloud Recall | Cloud F1 | Cloud Shadow Precision | Cloud Shadow Recall | Cloud Shadow F1 | Mean F1 Score |
|---|---|---|---|---|---|---|---|---|---|---|
| UNet3+STE without deep supervision | 0.9726 | 0.9810 | 0.9766 | 0.9077 | 0.9081 | 0.9056 | 0.8380 | 0.7224 | 0.7576 | 0.9585 |
| UNet3+STE | 0.9744 | 0.9791 | 0.9767 | 0.9132 | 0.9089 | 0.9068 | 0.8517 | 0.7377 | 0.7613 | 0.9590 |
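The per-class scores follow the standard pixel-wise definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 × precision × recall / (precision + recall). The Mean F1 Score column matches the per-image mean reported earlier (0.9590 for UNet3+STE) rather than the average of the three class-wise F1 values. Below is a minimal, self-contained sketch of the per-class computation; the toy label arrays are illustrative only:

```python
import numpy as np

def per_class_scores(pred, ref, num_classes=3):
    """Precision, recall, and F1 per class from flat label arrays."""
    scores = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (ref == c))
        fp = np.sum((pred == c) & (ref != c))
        fn = np.sum((pred != c) & (ref == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((precision, recall, f1))
    return scores

# Toy example: classes 0 = clear sky, 1 = cloud, 2 = cloud shadow.
ref = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
for c, (p, r, f) in enumerate(per_class_scores(pred, ref)):
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```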