Bayesian U-Net: Estimating Uncertainty in Semantic Segmentation of Earth Observation Images
Abstract
1. Introduction
2. Material
2.1. Method
2.1.1. Bayesian Learning
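Since the equations themselves are not reproduced in this outline, the standard Bayesian predictive formulation that this subsection builds on may help; the notation below is ours and follows common usage, not necessarily the paper's exact equations.

```latex
% Predictive distribution: average the likelihood over the weight posterior.
% The true posterior p(W | X, Y) is intractable for a deep network, so it is
% replaced by an approximating distribution q(W) (variational inference):
\begin{align}
  p(y \mid x, X, Y) &= \int p(y \mid x, W)\, p(W \mid X, Y)\, \mathrm{d}W \\
  &\approx \int p(y \mid x, W)\, q(W)\, \mathrm{d}W
   \approx \frac{1}{T} \sum_{t=1}^{T} p\left(y \mid x, \widehat{W}_t\right),
\end{align}
% where \widehat{W}_t \sim q(W) are T sampled weight configurations; with
% Monte Carlo Dropout, each sample is one stochastic forward pass.
```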
2.1.2. Monte Carlo Dropout
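In the same spirit, here is a minimal PyTorch sketch of Monte Carlo Dropout at inference time. The toy model, the function names, T = 20 samples, and the entropy-based uncertainty map are illustrative assumptions, not the authors' implementation; the key point is that dropout stays active at test time and the T stochastic passes are averaged.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy stand-in for the Bayesian U-Net: a conv head with spatial dropout."""
    def __init__(self, in_ch=3, n_classes=2, p=0.5):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p),              # remains stochastic at test time
            nn.Conv2d(16, n_classes, 1),
        )

    def forward(self, x):
        return self.body(x)

@torch.no_grad()
def mc_dropout_predict(model, x, T=20):
    """T stochastic forward passes with dropout enabled; returns the mean
    class probabilities and the per-pixel predictive entropy."""
    model.train()  # keeps dropout sampling on (safe here: no BatchNorm)
    probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(T)])
    mean_p = probs.mean(dim=0)                              # (B, C, H, W)
    entropy = -(mean_p * torch.log(mean_p + 1e-12)).sum(1)  # (B, H, W)
    return mean_p, entropy

if __name__ == "__main__":
    net = TinySegNet()
    image = torch.rand(1, 3, 64, 64)
    mean_p, uncertainty = mc_dropout_predict(net, image)
    labels = mean_p.argmax(dim=1)  # segmentation map
    print(labels.shape, uncertainty.shape)
```

If the real network contains BatchNorm layers, only the dropout modules should be switched to train mode (e.g., calling `.train()` on the `nn.Dropout` instances alone), so that normalization statistics stay frozen during sampling.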
2.1.3. Relation between Deep Learning, Bayesian learning and Monte Carlo Dropout
2.1.4. Architecture
2.2. Dataset
2.3. Metrics
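The tables in Section 3 report per-class precision, recall, F-score, IoU, and accuracy. As a reference for how such numbers are typically derived, here is a short confusion-matrix sketch; the function and variable names are ours, and the paper's exact conventions (e.g., for the Overall column) may differ.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Per-class precision, recall, F-score and IoU, plus overall accuracy,
    from flat integer label arrays of equal length."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)      # cm[i, j]: truth i, prediction j
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                # predicted as the class, wrongly
    fn = cm.sum(axis=1) - tp                # pixels of the class that were missed
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f_score = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    iou = tp / np.maximum(tp + fp + fn, 1)
    accuracy = tp.sum() / cm.sum()          # overall pixel accuracy
    return precision, recall, f_score, iou, accuracy
```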
2.4. Experiments
Three experiments are conducted; a minimal code sketch of the corresponding perturbations follows this list.

1. Models are trained on the training samples of the dataset. The prediction is made on the test images, to which Gaussian noise of 5 dB is added (see Figure 3b).
2. Models are trained on the training samples of the dataset. The prediction is made on the test images, to which salt-and-pepper noise is added: 20% of the pixels are randomly selected and set to 0 or 1 with equal probability (see Figure 3c).
3. Models are trained on a noisy database (the label-noise setting). To generate this noisy database, we randomly selected building polygons and relabeled them as non-building, iterating until at least 40% of the building pixels had been swapped (see Figure 4).
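The sketch below is an illustrative reading of the protocol above, not the authors' code: images are assumed to be scaled to [0, 1], "5 dB" is interpreted as a signal-to-noise ratio, and the `polygons` id map passed to `corrupt_labels` is a hypothetical input standing in for the building footprints of the database.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, snr_db=5.0):
    """Experiment 1: additive Gaussian noise scaled to a given SNR (in dB)."""
    signal_power = np.mean(img.astype(float) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noisy = img + rng.normal(0.0, np.sqrt(noise_power), img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_and_pepper(img, fraction=0.2):
    """Experiment 2: set `fraction` of the pixels (all bands of an (H, W, C)
    image together) to 0 or 1 with equal probability."""
    out = img.copy()
    mask = rng.random(img.shape[:2]) < fraction
    out[mask] = rng.integers(0, 2, size=int(mask.sum()))[:, None]
    return out

def corrupt_labels(label, polygons, target=0.4):
    """Experiment 3: relabel randomly chosen building polygons as background
    until at least `target` of the building pixels have been swapped.
    `polygons` is an integer map giving each footprint a unique id (0 = none)."""
    out = label.copy()
    total = (label == 1).sum()
    flipped = 0
    for pid in rng.permutation(np.unique(polygons[polygons > 0])):
        sel = polygons == pid
        out[sel] = 0
        flipped += sel.sum()
        if flipped >= target * total:
            break
    return out
```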
3. Results
3.1. Massachusetts Dataset
3.2. INRIA Dataset
3.3. ISPRS Vaihingen Dataset
3.4. Toulouse Dataset
3.5. Noise Robustness
4. Discussion
- There are many areas that are well predicted but with low confidence (dark green and black areas). These areas correspond either to classes with wide spectral variability, such as the background class, which gathers very different surfaces (e.g., vegetation, road, dirt, water; see Figure 5d and Figure 6d), or to cases where two classes are visually very close (e.g., low vegetation and tree in the ISPRS dataset; see Figure 7d).
- There are some areas where the predicted label is wrong but the network is very confident in its prediction (red in the uncertainty maps). A closer look shows that, most of the time, these areas correspond to label errors in the reference database. In the Massachusetts results, we can clearly see in the RGB image that some buildings are not referenced in the database, while other building footprints are incomplete (see Figure 9). Similar errors are reported in the Toulouse dataset. A major issue that arises when labeling airborne or satellite images is overlapping classes. In Figure 10, we observe in the ground truth that some walking paths are located within a small forest. One can imagine that there are indeed walking paths, but it is impossible to detect them in the image. Therefore, both results can be considered relevant: there are walking paths (labeled in the ground truth as background), but the image actually shows trees.
5. Conclusions and Perspectives
5.1. Conclusions
5.2. Perspectives
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
| Label | Background | Building | Overall |
|---|---|---|---|
| Precision | 59.34 | 95.69 | 77.52 |
| Recall | 82.85 | 87.04 | 84.95 |
| F-Score | 69.15 | 91.16 | 80.16 |
| IoU | 52.85 | 83.76 | 68.30 |
| Accuracy | 86.26 | 86.26 | 86.26 |
| | 0.61 | 0.65 | |
| Label | Background | Building | Overall |
|---|---|---|---|
| Precision | 93.49 | 94.43 | 93.96 |
| Recall | 98.95 | 72.05 | 85.50 |
| F-Score | 96.14 | 81.74 | 88.94 |
| IoU | 92.57 | 69.11 | 80.84 |
| Accuracy | 93.63 | 93.63 | 93.63 |
| | 0.78 | 0.93 | |
| Label | Impervious Surface | Building | Low Vegetation | Tree | Car | Clutter/Background | Overall |
|---|---|---|---|---|---|---|---|
| Precision | 97.36 | 94.04 | 89.24 | 78.07 | 80.45 | 93.58 | 88.79 |
| Recall | 96.94 | 91.70 | 91.86 | 99.49 | 85.98 | 94.22 | 93.27 |
| F-Score | 97.15 | 92.85 | 90.53 | 87.49 | 83.12 | 93.90 | 90.84 |
| IoU | 94.46 | 86.66 | 82.70 | 77.76 | 71.11 | 88.50 | 83.53 |
| Accuracy | 98.78 | 95.00 | 94.84 | 99.96 | 99.76 | 98.10 | 93.22 |
| | 0.96 | 0.89 | 0.87 | 0.87 | 0.83 | 0.93 | 0.93 |
| Label | Background | Buildings | Water | Vegetation | Overall |
|---|---|---|---|---|---|
| Precision | 83.67 | 74.01 | 95.45 | 75.05 | 82.80 |
| Recall | 84.66 | 85.01 | 89.76 | 66.61 | 81.51 |
| F-Score | 84.17 | 79.13 | 92.52 | 70.58 | 81.60 |
| IoU | 72.66 | 65.47 | 86.08 | 54.53 | 69.69 |
| Accuracy | 80.92 | 93.79 | 99.50 | 87.33 | 80.77 |
| | 0.60 | 0.76 | 0.92 | 0.63 | 0.80 |
| Setting | Method | Precision | Recall | F-Score | IoU | Accuracy | |
|---|---|---|---|---|---|---|---|
| Standard training | Baseline | 53.27 | 76.10 | 75.90 | 63.01 | 83.15 | 0.58 |
| Standard training | BU-Net | 59.34 | 82.85 | 80.16 | 68.30 | 86.26 | 0.65 |
| Gaussian noise (5 dB) | Baseline | 54.13 | 71.54 | 75.54 | 62.58 | 83.06 | 0.57 |
| Gaussian noise (5 dB) | BU-Net | 63.21 | 77.05 | 79.83 | 68.25 | 86.17 | 0.65 |
| Salt-and-pepper noise | Baseline | 32.23 | 73.57 | 60.30 | 44.95 | 66.34 | 0.37 |
| Salt-and-pepper noise | BU-Net | 35.98 | 86.55 | 64.03 | 48.49 | 68.87 | 0.43 |
| Label noise | Baseline | 33.61 | 42.39 | 60.43 | 47.27 | 73.72 | 0.30 |
| Label noise | BU-Net | 64.46 | 63.41 | 77.62 | 65.65 | 86.62 | 0.60 |