EffResUNet: Encoder Decoder Architecture for Cloud-Type Segmentation
Abstract
1. Introduction
- Accurate cloud predictions support early rainfall and climate forecasting, helping farmers take timely action.
- Climate preparedness helps people prepare for a calamity beforehand, and such predictions become even more accurate when the specific cloud types are known.
- We implemented efficient pre-processing and post-processing techniques using Albumentations, together with test-time augmentation (TTA), to help achieve state-of-the-art results with limited resources.
- We suggest techniques such as the attention mechanism and transfer learning to reach state-of-the-art results efficiently in less training time.
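The test-time augmentation mentioned above can be illustrated with a minimal NumPy sketch: predict on the original image and on flipped copies, undo each flip on the prediction, and average. The stand-in `model` callable and the choice of flips are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def predict_with_tta(model, image):
    """Average model predictions over flip-based test-time augmentations."""
    preds = [model(image)]
    # Predict on a horizontally flipped copy, then flip the prediction back.
    preds.append(np.flip(model(np.flip(image, axis=1)), axis=1))
    # Same for a vertical flip.
    preds.append(np.flip(model(np.flip(image, axis=0)), axis=0))
    return np.mean(preds, axis=0)
```

Averaging over augmentations smooths out orientation-dependent errors; for a perfectly flip-equivariant model the TTA average coincides with the plain prediction.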
2. Related Work
2.1. Standard Encoder–Decoder Network Architecture (UNet)
2.2. Previous Methodologies
3. Dataset and Data Processing
3.1. Dataset
3.2. Pre-Processing
3.3. Post-Processing
4. Modules Used
- EfficientNet encoder;
- Residual block decoder;
- Attention mechanism.
4.1. EfficientNet
4.2. Residual Blocks
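A residual block, following He et al., adds the block's input back to its transformed output so that gradients can bypass the transformation. A minimal dense-layer sketch in NumPy (layer shapes and the two-weight structure are illustrative, not the paper's convolutional blocks):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = ReLU(x + f(x)): the identity shortcut preserves the input signal."""
    h = relu(x @ w1)          # inner transformation f
    return relu(x + h @ w2)   # add the skip connection, then activate

# Shapes: x is (n, d), w1 is (d, k), w2 is (k, d), so the shortcut sum is valid.
```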
4.3. Attention Mechanism
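The additive attention gate of Attention U-Net (Oktay et al.) scales skip-connection features by coefficients computed from the features and a coarser gating signal. A dense NumPy sketch (the projection shapes are illustrative; the original uses 1×1 convolutions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, wx, wg, psi):
    """Scale skip features x by attention coefficients derived from x and g."""
    q = np.maximum(x @ wx + g @ wg, 0.0)  # joint projection, ReLU
    alpha = sigmoid(q @ psi)              # per-sample coefficients in (0, 1)
    return x * alpha                      # suppress irrelevant skip features
```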
5. Proposed Architecture
5.1. Encoder-Decoder Network
5.2. Training and Testing
- Custom data generators were created for training and validation with a batch size of 8 and an image input size of 320 × 480. This batch size and input size were chosen due to CPU memory constraints.
- A combination (sum) of these losses was used, which gives clearer boundaries and performs much better than either loss function individually.
- The model was trained for 30 epochs. As a pre-trained encoder was used, the model tended to overfit beyond 30 epochs, thereby showing no significant improvement in the scores.
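The combined loss described above (sum of the Dice and binary cross-entropy losses from the next subsections) can be sketched in NumPy as follows; the smoothing and clipping constants are illustrative choices, not the paper's settings:

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1.0):
    """1 - Dice coefficient; penalizes poor mask overlap."""
    inter = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy; penalizes per-pixel probability errors."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def combined_loss(y_true, y_pred):
    """Sum of Dice and BCE: region overlap plus per-pixel accuracy."""
    return dice_loss(y_true, y_pred) + bce_loss(y_true, y_pred)
```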
5.2.1. Dice Loss
5.2.2. Binary Cross-Entropy
5.2.3. Optimizer
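The reference list cites Dozat's Nadam (Adam with Nesterov momentum). Assuming that is the optimizer used, one update step can be sketched as below; the hyperparameter defaults are illustrative, not the paper's settings:

```python
import numpy as np

def nadam_step(theta, grad, m, v, t, lr=2e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Nadam update: Adam with a Nesterov-style lookahead on the momentum."""
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    # Nesterov correction: blend bias-corrected momentum with the raw gradient.
    m_bar = b1 * m_hat + (1 - b1) * grad / (1 - b1 ** t)
    theta = theta - lr * m_bar / (np.sqrt(v_hat) + eps)
    return theta, m, v
```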
6. Results
6.1. Evaluation Metrics
6.1.1. Intersection-Over-Union (IoU) [31]
6.1.2. F1 Score
6.1.3. Dice Coefficient
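For binary masks, these metrics reduce to simple set overlaps, and the Dice coefficient coincides with the F1 score. A NumPy sketch of the binary-mask case (a definitional illustration, not the paper's evaluation code):

```python
import numpy as np

def iou(y_true, y_pred):
    """Intersection over union of two binary masks."""
    inter = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return inter / union

def dice_coefficient(y_true, y_pred):
    """2|A∩B| / (|A| + |B|); for binary masks this equals the F1 score."""
    inter = np.logical_and(y_true, y_pred).sum()
    return 2.0 * inter / (y_true.sum() + y_pred.sum())
```

The two are monotonically related: Dice = 2·IoU / (1 + IoU), so they rank models identically while differing in absolute value.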
6.2. Output Comparison
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Arking, A. The Radiative Effects of Clouds and their Impact on Climate. Bull. Am. Meteorol. Soc. 1991, 72, 795–814.
- Song, X.; Liu, Z.; Zhao, Y. Cloud detection and analysis of MODIS image. In Proceedings of the 2004 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2004), Anchorage, AK, USA, 20–24 September 2004; Volume 4, pp. 2764–2767.
- Mahajan, S.; Fataniya, B. Cloud detection methodologies: Variants and development—A review. Complex Intell. Syst. 2019, 6, 251–261.
- Audebert, N.; Le Saux, B.; Lefèvre, S. Segment-before-Detect: Vehicle Detection and Classification through Semantic Segmentation of Aerial Images. Remote Sens. 2017, 9, 368.
- Henry, C.; Azimi, S.M.; Merkle, N. Road Segmentation in SAR Satellite Images With Deep Fully Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1867–1871.
- Arbelaez, P.; Hariharan, B.; Gu, C.; Gupta, S.; Bourdev, L.; Malik, J. Semantic segmentation using regions and parts. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012), Providence, RI, USA, 16–21 June 2012; pp. 3378–3385.
- Shanmugam, D.; Blalock, D.; Balakrishnan, G.; Guttag, J. Better Aggregation in Test-Time Augmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 1214–1223.
- Mane, D.; Bidwe, R.; Zope, B.; Ranjan, N. Traffic Density Classification for Multiclass Vehicles Using Customized Convolutional Neural Network for Smart City. In Communication and Intelligent Systems; Sharma, H., Shrivastava, V., Kumari Bharti, K., Wang, L., Eds.; Springer Nature: Singapore, 2022; pp. 1015–1030.
- Brezočnik, L.; Fister, I.; Podgorelec, V. Swarm Intelligence Algorithms for Feature Selection: A Review. Appl. Sci. 2018, 8, 1521.
- Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
- Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; Volume 97, pp. 6105–6114.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; Volume 30.
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.C.H.; Heinrich, M.P.; Misawa, K.; Mori, K.; McDonagh, S.G.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
- Baheti, B.; Innani, S.; Gajre, S.S.; Talbar, S.N. Eff-UNet: A Novel Architecture for Semantic Segmentation in Unstructured Environment. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Virtual, 14–19 June 2020; pp. 1473–1481.
- Li, D.; Dharmawan, D.A.; Ng, B.P.; Rahardja, S. Residual U-Net for Retinal Vessel Segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1425–1429.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Kalinin, A.A.; Iglovikov, V.I.; Rakhlin, A.; Shvets, A.A. Medical Image Segmentation Using Deep Neural Networks with Pre-trained Encoders. In Deep Learning Applications; Wani, M.A., Kantardzic, M., Sayed-Mouchaweh, M., Eds.; Springer: Singapore, 2020; pp. 39–52.
- Bae, M.H.; Pan, R.; Wu, T.; Badea, A. Automated segmentation of mouse brain images using extended MRF. NeuroImage 2009, 46, 717–725.
- Understanding Clouds from Satellite Images Crowd-Sourcing Activity. Available online: https://www.zooniverse.org/projects/raspstephan/sugar-flower-fish-or-gravel (accessed on 7 October 2022).
- Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125.
- Wang, G.; Li, W.; Ourselin, S.; Vercauteren, T. Automatic Brain Tumor Segmentation using Convolutional Neural Networks with Test-Time Augmentation. arXiv 2018, arXiv:1810.07884.
- Lee, J.; Won, T.; Hong, K. Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network. arXiv 2020, arXiv:2001.06268.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Chen, X.; Yao, L.; Zhang, Y. Residual Attention U-Net for Automated Multi-Class Segmentation of COVID-19 Chest CT Images. arXiv 2020, arXiv:2004.05645.
- Jadon, S. A survey of loss functions for semantic segmentation. In Proceedings of the 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Virtual, 27–29 October 2020; pp. 1–7.
- Moltz, J.H.; Hänsch, A.; Lassen-Schmidt, B.; Haas, B.; Genghi, A.; Schreier, J.; Morgas, T.; Klein, J. Learning a Loss Function for Segmentation: A Feasibility Study. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 357–360.
- Dozat, T. Incorporating Nesterov Momentum into Adam. In Proceedings of the ICLR Workshop, San Juan, PR, USA, 2–4 May 2016.
- Wang, Z.; Wang, E.; Zhu, Y. Image Segmentation Evaluation: A Survey of Methods. Artif. Intell. Rev. 2020, 53, 5637–5674.
- Van Beers, F.; Lindström, A.; Okafor, E.; Wiering, M. Deep Neural Networks with Intersection over Union Loss for Binary Image Segmentation. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods, Prague, Czech Republic, 19–21 February 2019; Volume 1, pp. 438–445.
- Bidwe, R.V.; Mishra, S.; Patil, S.; Shaw, K.; Vora, D.R.; Kotecha, K.; Zope, B. Deep Learning Approaches for Video Compression: A Bibliometric Analysis. Big Data Cogn. Comput. 2022, 6, 44.
- Zope, B.; Mishra, S.; Shaw, K.; Vora, D.R.; Kotecha, K.; Bidwe, R.V. Question Answer System: A State-of-Art Representation of Quantitative and Qualitative Analysis. Big Data Cogn. Comput. 2022, 6, 109.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
| Title | Remark |
|---|---|
| Fully Convolutional Networks for Semantic Segmentation [17] | The first approach to bring CNNs to semantic segmentation; however, it suffered from loss of spatial features and considerable training time. |
| U-Net: Convolutional Networks for Biomedical Image Segmentation [11] | Novel encoder–decoder architecture for image segmentation that usually performs better with pre-trained encoders; feature extraction is its main strength. |
| Residual U-Net for Retinal Vessel Segmentation [16] | UNet with residual blocks improved results significantly and sped up convergence without tampering with the spatial information. |
| Attention U-Net: Learning Where to Look for the Pancreas [14] | Attention mechanisms help in learning patterns that vary in shape and size; the mechanism removes the need for explicit object localization and allows the model to learn by itself which parts to focus on. |
| Eff-UNet: A Novel Architecture for Semantic Segmentation in Unstructured Environment [15] | Exploits EfficientNet's compound scaling combined with UNet, which works as a better feature extractor. |
| Dropout | 0.05 | 0.1 | 0.15 | 0.2 |
|---|---|---|---|---|
| Dice coefficient | 0.509 | 0.518 | 0.503 | 0.496 |
| Model Used | Validation Loss | IoU | Dice Coefficient | F1 Score |
|---|---|---|---|---|
| ResUNet | 0.7639 | 0.4078 | 0.5437 | 0.5553 |
| EfficientNet encoder | 0.7580 | 0.4227 | 0.5537 | 0.5733 |
| EfficientNet encoder + residual blocks | 0.7535 | 0.4147 | 0.5504 | 0.5629 |
| Our Model | 0.7424 | 0.4239 | 0.5557 | 0.5735 |
| Improvement w.r.t. ResUNet | −2.81% | +3.95% | +2.2% | +3.28% |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nalwar, S.; Shah, K.; Bidwe, R.V.; Zope, B.; Mane, D.; Jadhav, V.; Shaw, K. EffResUNet: Encoder Decoder Architecture for Cloud-Type Segmentation. Big Data Cogn. Comput. 2022, 6, 150. https://doi.org/10.3390/bdcc6040150