Automated Processing of Remote Sensing Imagery Using Deep Semantic Segmentation: A Building Footprint Extraction Case
Abstract
1. Introduction
2. Related Work
3. Dataset
4. Method Description
4.1. Image Preparation
4.2. Predictions Fusion
4.3. Model Implementation and Training
5. Results and Discussion
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References

Architecture | Backbone | I/O Size | Val. IoU | Val. Accuracy | Combined |
---|---|---|---|---|---|
U-Net | ResNet-34 | 384 × 384 | 71.53 | 96.79 | 84.16 |
FPN | ResNet-34 | 384 × 384 | 72.74 | 96.88 | 84.81 |
LinkNet | ResNet-34 | 384 × 384 | 70.87 | 96.67 | 83.77 |
PSPNet | ResNet-34 | 384 × 384 | 65.89 | 96.07 | 80.98 |
FPN | SEResNet-34 | 384 × 384 | 72.25 | 96.83 | 84.54 |
FPN | ResNeXt-50 | 384 × 384 | 73.62 | 96.91 | 85.26 |
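
The Combined column appears to be the arithmetic mean of the validation IoU and the validation accuracy. This is inferred from the tabulated values, not stated in the table itself. A minimal Python sketch that recomputes the column under that assumption and picks the best-scoring configuration (all function and variable names are illustrative):

```python
# Rows transcribed from the table above:
# (architecture, backbone, val. IoU, val. accuracy, combined)
ROWS = [
    ("U-Net", "ResNet-34", 71.53, 96.79, 84.16),
    ("FPN", "ResNet-34", 72.74, 96.88, 84.81),
    ("LinkNet", "ResNet-34", 70.87, 96.67, 83.77),
    ("PSPNet", "ResNet-34", 65.89, 96.07, 80.98),
    ("FPN", "SEResNet-34", 72.25, 96.83, 84.54),
    ("FPN", "ResNeXt-50", 73.62, 96.91, 85.26),
]


def combined_score(val_iou, val_acc):
    """Assumed definition: arithmetic mean of validation IoU and accuracy."""
    return (val_iou + val_acc) / 2.0


def max_deviation(rows):
    """Largest gap between the recomputed mean and the tabulated Combined value."""
    return max(abs(combined_score(iou, acc) - comb) for _, _, iou, acc, comb in rows)


def best_configuration(rows):
    """Return (architecture, backbone) of the row with the highest Combined value."""
    return max(rows, key=lambda row: row[4])[:2]


if __name__ == "__main__":
    print("max deviation from the mean:", max_deviation(ROWS))
    print("best configuration:", best_configuration(ROWS))
```

Under this reading, every tabulated Combined value lies within rounding distance of the recomputed mean, and the FPN architecture with the ResNeXt-50 backbone scores highest.
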

Area | Austin | | Chicago | | Kitsap County | | Western Tyrol | | Vienna | |
---|---|---|---|---|---|---|---|---|---|---|
Image | Acc. | IoU | Acc. | IoU | Acc. | IoU | Acc. | IoU | Acc. | IoU |
1 | 97.69 | 85.52 | 96.52 | 78.69 | 99.94 | 72.87 | 98.53 | 80.94 | 96.60 | 82.50 |
2 | 97.33 | 85.31 | 91.65 | 76.95 | 99.83 | 82.84 | 99.14 | 81.68 | 96.43 | 87.70 |
3 | 97.13 | 83.15 | 93.75 | 74.47 | 99.17 | 86.78 | 98.07 | 83.04 | 93.97 | 84.63 |
4 | 98.22 | 84.65 | 94.83 | 75.87 | 98.75 | 49.01 | 98.74 | 84.54 | 96.41 | 88.32 |
5 | 98.20 | 80.98 | 95.97 | 78.32 | 99.22 | 54.22 | 99.26 | 89.88 | 96.38 | 83.12 |
6 | 97.82 | 76.31 | 93.70 | 78.98 | 99.50 | 74.60 | 98.96 | 83.88 | 95.57 | 68.58 |
All | 97.73 | 83.28 | 94.40 | 77.16 | 99.40 | 73.61 | 98.78 | 84.02 | 95.89 | 84.00 |
Overall | Accuracy: 97.24% | IoU: 81.27% |

Area | Austin | | Chicago | | Kitsap County | | Western Tyrol | | Vienna | |
---|---|---|---|---|---|---|---|---|---|---|
Model | Acc. | IoU | Acc. | IoU | Acc. | IoU | Acc. | IoU | Acc. | IoU |
1 | 97.73 | 83.28 | 94.40 | 77.16 | 99.40 | 73.61 | 98.78 | 84.02 | 95.89 | 84.00 |
2 | 96.65 | 81.03 | 95.02 | 79.50 | 98.07 | 59.45 | 99.42 | 83.76 | 95.02 | 86.93 |
3 | 95.69 | 79.97 | 95.41 | 82.57 | 98.55 | 73.03 | 98.87 | 80.91 | 94.80 | 86.57 |
4 | 96.23 | 81.39 | 95.28 | 82.98 | 98.35 | 73.48 | 98.59 | 83.86 | 94.95 | 86.28 |
5 | 97.59 | 84.25 | 94.72 | 82.47 | 96.88 | 67.94 | 98.52 | 81.50 | 95.97 | 86.29 |
6 | 98.55 | 86.80 | 93.37 | 77.80 | 98.43 | 73.21 | 99.20 | 82.68 | 96.73 | 83.13 |
All | 97.07 | 82.32 | 94.70 | 80.48 | 98.28 | 69.84 | 98.90 | 82.80 | 95.56 | 85.84 |
Overall | Accuracy: 96.90% | IoU: 82.23% |

Area | TP [pixels] | TN [pixels] | FP [pixels] | FN [pixels] | TP [%] | TN [%] | FP [%] | FN [%] |
---|---|---|---|---|---|---|---|---|
Austin | 122,607,064 | 751,054,005 | 13,598,792 | 12,740,139 | 13.62 | 83.45 | 1.51 | 1.42 |
Chicago | 196,643,143 | 655,649,061 | 25,363,026 | 22,344,770 | 21.85 | 72.85 | 2.82 | 2.48 |
Kitsap County | 35,833,949 | 848,692,240 | 7,614,202 | 7,859,609 | 3.98 | 94.30 | 0.85 | 0.87 |
Western Tyrol | 47,767,134 | 842,307,306 | 5,101,470 | 4,824,090 | 5.31 | 93.59 | 0.57 | 0.54 |
Vienna | 242,199,369 | 617,841,648 | 22,563,000 | 17,395,983 | 26.91 | 68.65 | 2.51 | 1.93 |
Overall | 645,050,659 | 3,715,544,260 | 74,240,490 | 65,164,591 | 14.33 | 82.57 | 1.65 | 1.45 |
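
The accuracy and IoU percentages can be recomputed from these pixel counts with the standard definitions, accuracy = (TP + TN) / (TP + TN + FP + FN) and IoU = TP / (TP + FP + FN). A short Python sketch follows, with the counts copied from the table above and helper names chosen only for illustration:

```python
# Pixel counts copied from the table above: area -> (TP, TN, FP, FN).
COUNTS = {
    "Austin": (122_607_064, 751_054_005, 13_598_792, 12_740_139),
    "Chicago": (196_643_143, 655_649_061, 25_363_026, 22_344_770),
    "Kitsap County": (35_833_949, 848_692_240, 7_614_202, 7_859_609),
    "Western Tyrol": (47_767_134, 842_307_306, 5_101_470, 4_824_090),
    "Vienna": (242_199_369, 617_841_648, 22_563_000, 17_395_983),
}


def accuracy(tp, tn, fp, fn):
    """Pixel accuracy in percent: correctly labeled pixels over all pixels."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def iou(tp, tn, fp, fn):
    """Intersection over union in percent for the building class (TN is unused)."""
    return 100.0 * tp / (tp + fp + fn)


def pooled(counts):
    """Sum the confusion counts over all areas before computing any metric."""
    return tuple(sum(c[i] for c in counts.values()) for i in range(4))


if __name__ == "__main__":
    for area, c in COUNTS.items():
        print(f"{area}: accuracy {accuracy(*c):.2f}%, IoU {iou(*c):.2f}%")
    total = pooled(COUNTS)
    print(f"Overall: accuracy {accuracy(*total):.2f}%, IoU {iou(*total):.2f}%")
```

The pooled counts reproduce the Overall row of this table and give 96.90% accuracy and 82.23% IoU. Because IoU is a ratio of pooled pixel counts, the overall IoU is not the plain average of the per-area IoU values.
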

Area | Bellingham | | Bloomington | | Innsbruck | | San Francisco | | Eastern Tyrol | |
---|---|---|---|---|---|---|---|---|---|---|
Thresh. | Acc. | IoU | Acc. | IoU | Acc. | IoU | Acc. | IoU | Acc. | IoU |
0.30 | 97.27 | 73.79 | 97.39 | 72.97 | 97.26 | 77.19 | 92.01 | 76.46 | 98.20 | 80.33 |
0.35 | 97.32 | 73.90 | 97.38 | 72.56 | 97.29 | 77.28 | 92.01 | 76.05 | 98.23 | 80.41 |
0.40 | 97.35 | 73.90 | 97.36 | 72.05 | 97.32 | 77.31 | 91.88 | 75.28 | 98.24 | 80.35 |
0.45 | 97.37 | 73.81 | 97.32 | 71.44 | 97.33 | 77.28 | 91.63 | 74.17 | 98.24 | 80.16 |
0.50 | 97.37 | 73.65 | 97.28 | 70.73 | 97.34 | 77.21 | 91.28 | 72.73 | 98.23 | 79.86 |
Comb. | 97.35 | 73.90 | 97.39 | 72.97 | 97.32 | 77.31 | 92.01 | 76.46 | 98.23 | 80.41 |
Overall | Accuracy: 96.46% | IoU: 76.27% |

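
Area by area, the Comb. row coincides with the threshold row that has the highest IoU; the Bellingham tie between 0.35 and 0.40 falls to the row with the higher accuracy. This reading is inferred from the numbers and is not stated in the table. A minimal Python sketch of such a per-area threshold selection, with values copied from the table above and all names illustrative:

```python
AREAS = ["Bellingham", "Bloomington", "Innsbruck", "San Francisco", "Eastern Tyrol"]

# threshold -> list of (accuracy, IoU) pairs, one per area, in AREAS order.
RESULTS = {
    0.30: [(97.27, 73.79), (97.39, 72.97), (97.26, 77.19), (92.01, 76.46), (98.20, 80.33)],
    0.35: [(97.32, 73.90), (97.38, 72.56), (97.29, 77.28), (92.01, 76.05), (98.23, 80.41)],
    0.40: [(97.35, 73.90), (97.36, 72.05), (97.32, 77.31), (91.88, 75.28), (98.24, 80.35)],
    0.45: [(97.37, 73.81), (97.32, 71.44), (97.33, 77.28), (91.63, 74.17), (98.24, 80.16)],
    0.50: [(97.37, 73.65), (97.28, 70.73), (97.34, 77.21), (91.28, 72.73), (98.23, 79.86)],
}


def best_threshold_per_area(results, areas):
    """For each area, pick the threshold with the highest IoU.

    Ties on IoU are broken by the higher accuracy (an assumption that
    happens to reproduce the Comb. row for Bellingham).
    """
    best = {}
    for i, area in enumerate(areas):
        thr = max(results, key=lambda t: (results[t][i][1], results[t][i][0]))
        acc, iou = results[thr][i]
        best[area] = (thr, acc, iou)
    return best


if __name__ == "__main__":
    for area, (thr, acc, iou) in best_threshold_per_area(RESULTS, AREAS).items():
        print(f"{area}: threshold {thr:.2f}, accuracy {acc:.2f}%, IoU {iou:.2f}%")
```

With the tabulated values, the selected thresholds are 0.40 for Bellingham and Innsbruck, 0.30 for Bloomington and San Francisco, and 0.35 for Eastern Tyrol.
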
© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Milosavljević, A. Automated Processing of Remote Sensing Imagery Using Deep Semantic Segmentation: A Building Footprint Extraction Case. ISPRS Int. J. Geo-Inf. 2020, 9, 486. https://doi.org/10.3390/ijgi9080486