CDUNet: Cloud Detection UNet for Remote Sensing Imagery
Abstract
1. Introduction
- Cloud locations are complex and diverse, and cloud distributions are often irregular and discontinuous. Traditional threshold methods usually rely on experience and manual calibration, so their prediction accuracy and generality are poor. Moreover, existing neural networks are weak at extracting global information and easily lose the relative position information between clouds and shadows, which leads to category misclassification;
- Cloud boundaries often exhibit irregular geometric features. We therefore paid particular attention to the prediction of cloud boundaries during model training, hoping to complete this refinement directly in the end-to-end training process without the help of boundary postprocessing;
- Traditional segmentation networks lose part of the detailed information in the downsampling stage. To refine edges, we designed a high-frequency feature extractor (HFE) to recover the detailed information of the image. Convolution has proven to be an effective feature extraction tool and can also be regarded as a special filter, so we designed a Multiscale Convolution module (MSC) to “filter” the image: while suppressing high-frequency noise, it extracts effective texture information, enabling the model to predict more refined edges (illustrative sketches follow in Sections 2.2 and 2.3);
- Deep feature maps contain lower-frequency signals with stronger intraclass consistency but weakened interclass information. To re-establish interclass information, we designed a Spatial Prior Self-Attention (SPSA) block for the convolutional neural network, which better distinguishes cloud, cloud shadow, and background (a sketch follows in Section 2.4);
- Segmentation networks based on an encoder–decoder inevitably suffer semantic dilution in the feature fusion stage. Other studies [14,16,17,18] simply combine high-level context information with low-level spatial information to address this, which easily produces information redundancy, so the convolution kernel cannot receive effective feature information. In our method, we designed a parallel Spatial Channel Attention Block (SCAB) for the feature fusion stage, so that the model can quickly capture effective information and improve prediction accuracy (a sketch follows in Section 2.5).
2. Methodology
2.1. Network Architecture
2.2. High-Frequency Feature Extractor
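The Introduction describes the HFE as recovering the detailed, high-frequency information that downsampling discards. As a minimal sketch of that general idea, one can build a high-pass residual: estimate the low-frequency content with pooling, subtract it to isolate edge and texture detail, and re-inject that detail into the feature map. The class name, the pooling-based low-pass filter, and the 1×1 projection below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighFrequencyExtractor(nn.Module):
    """Hypothetical high-pass residual block: estimate the low-frequency
    content with average pooling, subtract it to isolate edge/texture
    detail, and re-inject that detail. A sketch of the general technique,
    not the paper's exact HFE."""

    def __init__(self, channels: int, pool_size: int = 2):
        super().__init__()
        self.pool_size = pool_size
        # A 1x1 convolution lets the block reweight the recovered detail.
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-pass estimate: downsample, then upsample back to input size.
        low = F.avg_pool2d(x, self.pool_size)
        low = F.interpolate(low, size=x.shape[-2:], mode="bilinear",
                            align_corners=False)
        high = x - low              # high-frequency residual (edges, texture)
        return x + self.proj(high)  # feature map with detail re-injected
```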
2.3. Multiscale Convolution
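The ablation study (Section 3.4) notes that the MSC produces six groups of multiscale information with different receptive fields. A plausible minimal sketch is six parallel 3×3 branches with increasing dilation rates whose outputs are concatenated; the branch layout and the specific dilation rates are assumptions, not the published specification.

```python
import torch
import torch.nn as nn

class MultiscaleConvolution(nn.Module):
    """Illustrative MSC: six parallel 3x3 branches with increasing dilation,
    yielding six receptive-field scales as mentioned in the ablation study.
    Branch layout and dilation rates are assumptions, not the paper's spec."""

    def __init__(self, in_channels: int, out_channels: int,
                 dilations=(1, 2, 3, 4, 5, 6)):
        super().__init__()
        # out_channels should be divisible by len(dilations) so the
        # concatenated branches return exactly the requested width.
        branch_channels = out_channels // len(dilations)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch "filters" the input at a different scale; concatenation
        # yields multiscale texture information.
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```

For instance, `MultiscaleConvolution(256, 252)` would produce six 42-channel scale groups from a 256-channel input.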
2.4. Spatial Prior Self-Attention
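The ablation study mentions two ingredients of the SPSA: a channel reduction ratio r and learnable spatial priors. The sketch below shows one way these can combine, assuming non-local-style attention [27] with the query/key channels compressed by r and a learnable positional bias added to the attention logits. The specific form of the prior here, a dense position-pair bias tied to a fixed input resolution, is our assumption.

```python
import torch
import torch.nn as nn

class SpatialPriorSelfAttention(nn.Module):
    """Sketch of self-attention with a channel reduction ratio r and a
    learnable spatial prior added to the attention logits. The prior is a
    simple learnable bias over position pairs; the paper's exact priors are
    not reproduced, so treat this as an assumption-laden illustration."""

    def __init__(self, channels: int, height: int, width: int, r: int = 4):
        super().__init__()
        reduced = channels // r  # r >= 2 filters redundant channel info
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        n = height * width
        # Learnable spatial prior: one bias per (query, key) position pair.
        # Note: this ties the block to the fixed (height, width) given here.
        self.prior = nn.Parameter(torch.zeros(n, n))
        self.gamma = nn.Parameter(torch.zeros(1))  # residual gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)      # (b, hw, c/r)
        k = self.key(x).flatten(2)                        # (b, c/r, hw)
        v = self.value(x).flatten(2)                      # (b, c, hw)
        attn = torch.softmax(q @ k + self.prior, dim=-1)  # (b, hw, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out
```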
2.5. Spatial Channel Attention Block
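The Introduction describes the SCAB as a parallel channel/spatial attention block applied in the feature fusion stage. Below is a minimal sketch in the spirit of CBAM [31], but with the two branches run in parallel rather than in sequence; the internals of each branch are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class SpatialChannelAttentionBlock(nn.Module):
    """Sketch of a parallel channel/spatial attention block. The parallel
    arrangement follows the Introduction's description; the branch details
    are assumptions loosely modeled on CBAM [31]."""

    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        # Channel branch: squeeze spatially, then excite per channel.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // r, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: squeeze channels into a per-pixel weight map.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Parallel reweighting: channel weights select informative scale
        # groups; spatial weights highlight cloud/shadow locations.
        return x * self.channel(x) + x * self.spatial(x)
```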
3. Experiment
3.1. Dataset Introduction
3.1.1. Cloud and Cloud Shadow Dataset
3.1.2. SPARCS Dataset
3.2. Training Details
3.3. Loss Function
3.4. Ablation Study
- Ablation for the HFE: Existing networks easily lose high-frequency information in the encoding stage because of the sampling operations. The high-frequency feature extractor recovers some of the high-frequency components of the feature map; these include the texture characteristics of clouds, which benefit the prediction. As shown in Table 4, the HFE raised the MIoU of the model from 92.28% to 92.56% and the PA from 96.68% to 96.81% owing to the recovered details;
- Ablation for the MSC: After high-frequency feature enhancement, the feature maps carry more information, and multiscale convolution can extract effective multiscale information from them. As the results in Table 4 show, the MSC module improved the MIoU of the model by 0.2% and the PA by 0.1%;
- Ablation for the SPSA: The Spatial Prior Self-Attention module can establish the positional relationship between pixels, which greatly improves prediction accuracy. At the same time, however, the module adds a large number of parameters and computations. The MobileNet work [37] studied the influence of network width on performance, showing that feature-map channels contain redundancy that channel compression can remove, while overcompressing the channels limits the information the network can capture. To filter redundant channel information while reducing the parameter count, we suggest a channel reduction ratio r greater than or equal to 2. As shown in Table 5, performance was best when r = 4, with a reasonable parameter and computation budget (a parameter-count sketch of this trade-off follows this list). In addition, we ran an ablation of the learnable spatial priors. The comparison in Table 6 shows that the model with the priors performed best, which also demonstrates how the introduced learnable parameters help the model extract spatial information;
- Ablation for the SCAB: The feature map extracted by the multiscale module carries six groups of multiscale information from different receptive fields, but these groups contribute to the prediction with different weights; the SPSA module establishes the positional information of the feature-map pixels but ignores channel information. We therefore adjust the channel and spatial weights of the feature map to reduce the redundancy of the multiscale information and strengthen spatial location information to highlight the key regions. In our experiments, the MIoU of the model improved to 93.52%, and the PA improved to 97.22%.
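As referenced in the SPSA ablation above, the snippet below makes the parameter/r trade-off concrete by counting the parameters of the `SpatialPriorSelfAttention` sketch from Section 2.4 at the reduction rates tested in Table 5. The absolute numbers reflect that sketch and the assumed feature size, not the paper's implementation.

```python
# Parameter count of the Section 2.4 SPSA sketch at the reduction rates
# from Table 5. Assumes a 256-channel, 16x16 feature map; the numbers
# describe the sketch, not the paper's implementation.
for r in (2, 4, 6, 8):
    block = SpatialPriorSelfAttention(channels=256, height=16, width=16, r=r)
    n_params = sum(p.numel() for p in block.parameters())
    print(f"r={r}: {n_params / 1e6:.3f} M parameters")
```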
3.5. Comparison Test of the Cloud and Cloud Shadow Datasets
3.6. Comparison Test of the SPARCS Dataset
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kegelmeyer, W.P., Jr. Extraction of Cloud Statistics from Whole Sky Imaging Cameras (No. SAND-94-8222); Sandia National Lab. (SNL-CA): Livermore, CA, USA, 1994.
- Zhu, Z.; Woodcock, C.E. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens. Environ. 2012, 118, 83–94.
- Candra, D.S.; Phinn, S.; Scarth, P. Automated cloud and cloud shadow masking for Landsat 8 using multitemporal images in a variety of environments. Remote Sens. 2019, 11, 2060.
- Cheng, G.; Wang, Y.; Xu, S.; Wang, H.; Xiang, S.; Pan, C. Automatic Road Detection and Centerline Extraction via Cascaded End-to-End Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3322–3337.
- Song, L.; Xia, M.; Jin, J.; Qian, M.; Zhang, Y. SUACDNet: Attentional change detection network based on siamese U-shaped structure. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102597.
- Wang, W.; Shi, Z. An All-Scale Feature Fusion Network with Boundary Point Prediction for Cloud Detection. IEEE Geosci. Remote Sens. Lett. 2021, 9548325.
- Xia, M.; Wang, T.; Zhang, Y.; Liu, J.; Xu, Y. Cloud/shadow segmentation based on global attention feature fusion residual network for remote sensing imagery. Int. J. Remote Sens. 2021, 42, 2022–2045.
- Mohajerani, S.; Saeedi, P. Cloud-Net: An End-to-End Cloud Detection Algorithm for Landsat 8 Imagery. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1029–1032.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Zhan, Y.; Wang, J.; Shi, J.; Cheng, G.; Yao, L.; Sun, W. Distinguishing Cloud and Snow in Satellite Images via Deep Convolutional Network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1785–1789.
- Drönner, J.; Korfhage, N.; Egli, S.; Mühling, M.; Thies, B.; Bendix, J.; Freisleben, B.; Seeger, B. Fast Cloud Segmentation Using Convolutional Neural Networks. Remote Sens. 2018, 10, 1782.
- Chai, D.; Newsam, S.; Zhang, H.K.; Qiu, Y.; Huang, J. Cloud and cloud shadow detection in Landsat imagery based on deep convolutional neural networks. Remote Sens. Environ. 2019, 225, 307–316.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Qu, Y.; Xia, M.; Zhang, Y. Strip pooling channel spatial attention network for the segmentation of cloud and cloud shadow. Comput. Geosci. 2021, 157, 104940.
- Xia, M.; Liu, W.; Wang, K.; Song, W.; Chen, C.; Li, Y. Non-intrusive load disaggregation based on composite deep long short-term memory network. Expert Syst. Appl. 2020, 160, 113669.
- Wang, Z.; Xia, M.; Lu, M.; Pan, L.; Liu, J. Parameter Identification in Power Transmission Systems Based on Graph Convolution Network. IEEE Trans. Power Deliv. 2021, 1.
- Xia, M.; Liu, W.; Shi, B.; Weng, L.; Liu, J. Cloud/snow recognition for multispectral satellite imagery based on a multidimensional deep residual network. Int. J. Remote Sens. 2018, 40, 156–170.
- Xia, M.; Zhang, X.; Liu, W.; Weng, L.; Xu, Z. Multi-Stage Feature Constraints Learning for Age Estimation. IEEE Trans. Inf. Forensics Secur. 2020, 15, 2417–2428.
- Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Chen, C.F.; Fan, Q.; Mallinar, N.; Sercu, T.; Feris, R. Big-little net: An efficient multiscale feature representation for visual and speech recognition. arXiv 2018, arXiv:1807.03848.
- Chen, Y.; Fan, H.; Xu, B.; Yan, Z.; Kalantidis, Y.; Rohrbach, M.; Yan, S.; Feng, J. Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 3435–3444.
- Cheng, B.; Xiao, R.; Wang, J.; Huang, T.; Zhang, L. High frequency residual learning for multiscale image classification. arXiv 2019, arXiv:1905.02649.
- Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 4905–4913.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 3141–3149.
- Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-Cross Attention for Semantic Segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 603–612.
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 510–519.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521.
- Hughes, M. L8 SPARCS Cloud Validation Masks; US Geological Survey: Sioux Falls, SD, USA, 2016.
- Hughes, M.J.; Hayes, D.J. Automated detection of cloud and cloud shadow in single-date Landsat imagery using neural networks and spatial postprocessing. Remote Sens. 2014, 6, 4907–4926.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in PyTorch. In Proceedings of the NIPS 2017 Autodiff Workshop, Long Beach, CA, USA, 9 December 2017.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.
- Yang, M.; Yu, K.; Zhang, C.; Li, Z.; Yang, K. DenseASPP for Semantic Segmentation in Street Scenes. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3684–3692.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 5686–5696.
Landsat 8 spectral bands and resolutions.

Band | Wavelength (nm) | Resolution (m)
---|---|---
1 (Coastal) | 430–450 | 30
2 (Blue) | 450–515 | 30
3 (Green) | 525–600 | 30
4 (Red) | 630–680 | 30
5 (NIR) | 845–885 | 30
6 (SWIR-1) | 1560–1660 | 30
7 (SWIR-2) | 2100–2300 | 30
8 (PAN) | 503–676 | 15
Backbone | PA (%) | MIoU (%) | Parameters (M) | FLOPs (G) |
---|---|---|---|---|
ResNet34 | ||||
ResNet50 | ||||
ResNet101 |
Method | Booster | PA (%) | MIoU (%) | Parameters (M) |
---|---|---|---|---|
ResNet50 | × | |||
ResNet50 | √ |
Table 4. Module-by-module ablation results.

Method | PA (%) | MIoU (%) | Parameters (M) | FLOPs (G) |
---|---|---|---|---|
ResNet50 | ||||
ResNet50 + HFE | ||||
ResNet50 + HFE + MSC | ||||
ResNet50 + HFE + MSC + SPSA | ||||
ResNet50 + HFE + MSC + SPSA + SCAB | 97.22 | 93.52 |
Table 5. Ablation of the channel reduction ratio r in the SPSA module.

Reduction Rate r | PA (%) | MIoU (%) | Parameters (M) | FLOPs (G) |
---|---|---|---|---|
2 | ||||
4 | 97.14 | 93.36 | ||
6 | ||||
8 |
Table 6. Ablation of the learnable spatial priors.

Method | PA (%) | MIoU (%) | Parameters (M) | FLOPs (G) |
---|---|---|---|---|
- | ||||
SA | ||||
SPSA | 97.22 | 93.52 |
Method | PA (%) | MPA (%) | F1 (%) | FWIoU (%) | MIoU (%) |
---|---|---|---|---|---|
SegNet [13] | |||||
BiSeNetv2 [39] | |||||
DenseASPP [40] | |||||
FCN8s [11] | |||||
DeepLabv3+ [41] | |||||
UNet | |||||
GAFRNet [7] | |||||
CloudNet [8] | |||||
PSPNet | |||||
CCNet | |||||
HRNet [42] | |||||
Ours | 97.22 | 96.60 | 95.07 | 94.61 | 93.52 |

Method | PA (%) | MPA (%) | F1 (%) | FWIoU (%) | MIoU (%) |
---|---|---|---|---|---|
FCN8s |||||
CCNet | |||||
PSPNet | |||||
Ours | 97.51 | 96.89 | 95.56 | 95.15 | 94.12 |
Method | PA (%) | MPA (%) | F1 (%) | FWIoU (%) | MIoU (%) |
---|---|---|---|---|---|
SegNet | |||||
FCN8s | |||||
DenseASPP | |||||
PSPNet | |||||
CCNet | |||||
CloudNet | |||||
DeepLabv3+ | |||||
HRNet | |||||
UNet | |||||
Ours | 95.20 | 92.24 | 90.46 | 91.08 | 87.36 |
Method | Cloud (%) | Cloud Shadow (%) | Snow/Ice (%) | Waters (%) | Background (%) | Overall (%) |
---|---|---|---|---|---|---|
SegNet | ||||||
FCN8s | ||||||
DenseASPP | ||||||
PSPNet | ||||||
CCNet | ||||||
CloudNet | ||||||
DeepLabv3+ | ||||||
HRNet | ||||||
UNet | ||||||
Ours | 88.29 | 72.74 | 93.73 | 87.87 | 94.15 | 87.36 |