Road Extraction Convolutional Neural Network with Embedded Attention Mechanism for Remote Sensing Imagery
Abstract
1. Introduction
- This study adopts the U-Net structure to achieve end-to-end road network extraction from remote sensing images.
- This study designs an attention module that combines spatial attention and channel attention to enhance spatial detail information and better exploit spectral features.
- Building on the U-Net structure, this study embeds residual densely connected blocks to enable information flow and feature reuse, and uses a residual dilated convolution module for multiscale information extraction.
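The channel-attention idea in the second bullet can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the function name, the random weights, and the squeeze-and-excitation-style bottleneck are assumptions for illustration.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Minimal channel-attention sketch: squeeze each channel to a scalar
    by global average pooling, pass the result through a two-layer
    bottleneck MLP, and rescale the input channels by the sigmoid gates.

    x:  feature map of shape (C, H, W)
    w1: reduction weights of shape (C // r, C)
    w2: expansion weights of shape (C, C // r)
    """
    squeeze = x.mean(axis=(1, 2))                    # (C,) per-channel statistic
    hidden = np.maximum(w1 @ squeeze, 0.0)           # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid weights in (0, 1)
    return x * gates[:, None, None]                  # rescale each channel

# Toy usage: 32 channels with a reduction ratio of 16 (hypothetical sizes).
rng = np.random.default_rng(0)
C, r = 32, 16
x = rng.standard_normal((C, 8, 8))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = channel_attention(x, w1, w2)
print(y.shape)  # (32, 8, 8): spatial size unchanged, channels reweighted
```

The key property is that the output keeps the input's shape, so the attention block can be dropped into the U-Net encoder/decoder without changing any downstream layer.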
2. Methodology
2.1. U-Net Framework with Embedded Attention Mechanism
2.2. U-Net Framework
2.3. Channel and Spatial Attention Module (CSAM)
2.4. Residual Densely Connected Blocks (RDCB)
2.5. Residual Dilated Convolution Module (RDCM)
3. Experiments and Analysis
3.1. Experimental Dataset Information
3.2. Road Extraction Network Training Configuration Information
3.3. Comparison Algorithms and Quantitative Evaluation Metrics
3.4. Visual Evaluation Results
3.4.1. Visual Evaluation Results for Dense Road Network
3.4.2. Visual Evaluation Results for Sparse Road Network
3.5. Quantitative Evaluation Results
3.6. Discussion
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Module Name | Module Composition | Parameter Configuration |
|---|---|---|
| Channel and Spatial Attention Module | channel attention block | see the Channel Attention Block configuration below |
| | spatial attention block | see the Spatial Attention Block configuration below |
| | convolutional layer | input channel = nFeat × 2, output channel = nFeat, kernel size = 3 × 3, stride = 1, pad = 1; ReLU |
| Channel Attention Block | pooling layer | adaptive average pooling |
| | pooling layer | adaptive max pooling |
| | fully connected layer 1 | input channel = nFeat, output channel = nFeat/16, kernel size = 1 × 1, stride = 1, pad = 0; ReLU |
| | fully connected layer 2 | input channel = nFeat/16, output channel = nFeat, kernel size = 1 × 1, stride = 1, pad = 0; Sigmoid |
| | fully connected layer 3 | input channel = nFeat, output channel = nFeat/16, kernel size = 1 × 1, stride = 1, pad = 0; ReLU |
| | fully connected layer 4 | input channel = nFeat/16, output channel = nFeat, kernel size = 1 × 1, stride = 1, pad = 0; Sigmoid |
| | convolutional layer | input channel = nFeat × 2, output channel = nFeat, kernel size = 3 × 3, stride = 1, pad = 1; ReLU |
| Spatial Attention Block | convolutional layer 1 | input channel = nFeat, output channel = nFeat, kernel size = 3 × 3, stride = 1, pad = 1; ReLU |
| | convolutional layer 2 | input channel = nFeat, output channel = 1, kernel size = 1 × 1, stride = 1, pad = 0; Sigmoid |
| | convolutional layer 3 | input channel = nFeat, output channel = nFeat, kernel size = 3 × 3, stride = 1, pad = 1; ReLU |
| Residual Dilated Convolution Module | dilated convolutional layer 1 | input channel = nFeat, output channel = nFeat, kernel size = 3 × 3, stride = 1, pad = 1, dilation = 1; ReLU |
| | dilated convolutional layer 2 | input channel = nFeat, output channel = nFeat, kernel size = 3 × 3, stride = 1, pad = 2, dilation = 2; ReLU |
| | dilated convolutional layer 3 | input channel = nFeat, output channel = nFeat, kernel size = 3 × 3, stride = 1, pad = 4, dilation = 4; ReLU |
| | dilated convolutional layer 4 | input channel = nFeat, output channel = nFeat, kernel size = 3 × 3, stride = 1, pad = 8, dilation = 8; ReLU |
| Residual Densely Connected Blocks | convolutional layer | input channel = nFeat, output channel = nFeat/2, kernel size = 3 × 3, stride = 1, pad = 1; ReLU |
| | 1 × 1 convolutional layer | input channel = nFeat/2, output channel = nFeat, kernel size = 1 × 1, stride = 1, pad = 0 |
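As a quick check on the dilation schedule above, stacking stride-1 dilated 3 × 3 convolutions grows the receptive field geometrically while keeping the per-layer cost of an ordinary 3 × 3 convolution. A small sketch (the arithmetic is standard; only the script itself is ours):

```python
# Receptive-field growth of stacked 3x3 dilated convolutions with stride 1.
# Each layer with dilation d widens the receptive field by (kernel - 1) * d.
kernel_size = 3
dilations = [1, 2, 4, 8]  # layers 1..4 as configured above

rf = 1
for d in dilations:
    rf += (kernel_size - 1) * d
    print(f"after dilation={d}: receptive field = {rf} x {rf}")
# The four layers together cover a 31 x 31 window of the input feature map.
```

This is why the module can aggregate multiscale context without pooling: each layer sees progressively wider surroundings at unchanged resolution.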
| Method | Accuracy | Precision | Recall | F1_Score | IoU |
|---|---|---|---|---|---|
| U-Net | 0.988 | 0.769 | 0.782 | 0.763 | 0.630 |
| LinkNet34 | 0.978 | 0.539 | 0.966 | 0.680 | 0.528 |
| D-LinkNet | 0.982 | 0.584 | 0.975 | 0.724 | 0.576 |
| RENA | 0.989 | 0.784 | 0.770 | 0.764 | 0.631 |
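The metrics in the table above follow their standard binary-segmentation definitions. A short sketch with toy confusion counts (illustrative values, not the paper's data) shows how they are computed and why the F1 and IoU columns move together:

```python
# Standard binary-segmentation metrics from confusion counts.
# Toy values chosen so that road pixels are rare, as in the tables:
# that class imbalance is why Accuracy stays near 0.99 for every method
# while Precision/Recall/F1/IoU separate the methods much more clearly.
tp, fp, fn, tn = 820, 180, 200, 98_800

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
iou = tp / (tp + fp + fn)

print(f"Accuracy={accuracy:.3f} Precision={precision:.3f} "
      f"Recall={recall:.3f} F1={f1:.3f} IoU={iou:.3f}")
# For counts drawn from a single confusion matrix, IoU = F1 / (2 - F1).
assert abs(iou - f1 / (2 - f1)) < 1e-12
```

Note the identity holds exactly only for metrics aggregated over one confusion matrix; scores averaged over many test images, as results tables often are, satisfy it only approximately.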
| Method | Accuracy | Precision | Recall | F1_Score | IoU |
|---|---|---|---|---|---|
| RENA | 0.989 | 0.784 | 0.770 | 0.764 | 0.631 |
| CAB removed | 0.988 | 0.807 | 0.722 | 0.747 | 0.611 |
| SAB removed | 0.988 | 0.784 | 0.745 | 0.750 | 0.611 |
| RDCM removed | 0.987 | 0.758 | 0.748 | 0.734 | 0.595 |
| RDCB removed | 0.987 | 0.739 | 0.763 | 0.732 | 0.596 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shao, S.; Xiao, L.; Lin, L.; Ren, C.; Tian, J. Road Extraction Convolutional Neural Network with Embedded Attention Mechanism for Remote Sensing Imagery. Remote Sens. 2022, 14, 2061. https://doi.org/10.3390/rs14092061