HRU-Net: High-Resolution Remote Sensing Image Road Extraction Based on Multi-Scale Fusion
Abstract
1. Introduction
2. Methods
2.1. U-Net
- (1) Convolution: Convolution filters the input image or feature map to extract its feature information [26]. In the U-Net, convolution layers usually apply a 3 × 3 kernel to the input feature map to obtain the output feature map. The convolution operation can be expressed by Equation (1):
- (2) Pooling: Pooling downsamples the input feature map, reducing its resolution and dimensions to improve computational efficiency and lower model complexity [27]. There are two main types: max pooling and average pooling. Max pooling slides a fixed-size window across the input feature map and outputs the maximum value within the window, which accentuates prominent features while reducing the spatial resolution. Average pooling instead outputs the mean value within the window, which smooths the feature map while also reducing the spatial resolution. The U-Net typically uses 2 × 2 max pooling, which takes the maximum of four adjacent pixels in the input feature map.
- (3) Activation Function: The activation function applies a nonlinear transformation to the outputs of the convolution and pooling operations, increasing the network’s expressive capability. The HRU-Net uses the rectified linear unit (ReLU), which offers rapid convergence and good generalization [28]. The ReLU function can be expressed by Equation (2):

  $$f(x) = \max(0, x) \qquad (2)$$

- (4) Normalization Layer: Normalization layers are widely used in neural networks to stabilize and regulate the distribution of the data passing through them. In this study, batch normalization was employed to normalize the input of each layer within the network [29]. The resulting data distribution is more stable, which accelerates training and improves generalization.
- (5) Upsampling: Upsampling enlarges low-resolution feature maps into high-resolution ones [30] and is typically paired with downsampling in feature extraction. Deep learning models mainly use two upsampling techniques: transposed convolution (also referred to as deconvolution) and bilinear interpolation. In this study, we chose bilinear interpolation as the upsampling module for three reasons. Firstly, bilinear interpolation is computationally cheaper and faster than transposed convolution. Secondly, it has no learnable parameters, which simplifies the model and reduces the risk of overfitting. Lastly, it avoids the “checkerboard effect” that transposed convolution can produce, giving a smoother output image [31].
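The five building blocks above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`conv2d_3x3`, `max_pool_2x2`, `relu`, `batch_norm`, `bilinear_upsample`) are hypothetical, the convolution is single-channel with no padding (and, as in most deep learning frameworks, is computed as a cross-correlation), the normalization omits the learned scale and shift, and the upsampling assumes corner-aligned sampling.

```python
# Illustrative, single-channel versions of the U-Net building blocks.
# All names are hypothetical; a real model would use a deep learning framework.

def conv2d_3x3(fmap, kernel):
    """3x3 convolution without padding (computed as cross-correlation)."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += fmap[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keep the largest of four adjacent pixels."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


def relu(fmap):
    """ReLU activation, f(x) = max(0, x), applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]


def batch_norm(values, eps=1e-5):
    """Normalize one channel's activations to zero mean and unit variance.

    Simplification: the learned scale and shift parameters are omitted.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def bilinear_upsample(fmap, scale=2):
    """Bilinear upsampling with corner-aligned sampling (an assumption)."""
    h, w = len(fmap), len(fmap[0])
    out_h, out_w = h * scale, w * scale
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional input row.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
            bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # kernel that copies the center pixel
features = relu(conv2d_3x3(image, identity))  # 2x2 feature map
pooled = max_pool_2x2(image)                  # 2x2 map of window maxima
upsampled = bilinear_upsample(pooled)         # back to 4x4
```

Note that `bilinear_upsample` contains no weights at all, which is the property the text cites as reducing model size and the risk of overfitting relative to transposed convolution.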
2.2. HRNet
2.3. HRU-Net
2.3.1. Multi-Feature Module Construction
- (a)
- UMR Module
- (b)
- MRF Module
2.3.2. Realization Principle
2.3.3. Network Structure
3. Experiments and Results
3.1. Dataset Descriptions
3.1.1. Massachusetts Road Dataset
3.1.2. DeepGlobe Road Dataset
3.2. Experimental Settings
3.2.1. Hyperparameter Settings
3.2.2. Training Environment Description
3.2.3. Evaluation Metrics
3.3. Results and Analysis
3.3.1. Test on Massachusetts Road Dataset
3.3.2. Test on DeepGlobe Road Dataset
3.4. Ablation Experiment
3.4.1. Exploring the Impact of Modules on the Network
3.4.2. Exploring the Effect of the Number of Modules on the Network
3.5. Computational Efficiency
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, W.; Yang, N.; Zhang, Y.; Wang, F.; Cao, T.; Eklund, P. A review of road extraction from remote sensing images. J. Traffic Transp. Eng. 2016, 3, 271–282.
- Huang, X.; Zhang, L. Road centreline extraction from high-resolution imagery based on multiscale structural features and support vector machines. Int. J. Remote Sens. 2009, 30, 1977–1987.
- Bicego, M.; Dalfini, S.; Vernazza, G.; Murino, V. Automatic road extraction from aerial images by probabilistic contour tracking. In Proceedings of the 2003 International Conference on Image Processing (Cat. No.03CH37429), Barcelona, Spain, 14–17 September 2003; Volume 3, p. III-585.
- Baumgartner, A.; Steger, C.; Mayer, H.; Eckstein, W.; Ebner, H. Automatic road extraction based on multi-scale, grouping, and context. Photogramm. Eng. Remote Sens. 1999, 65, 777–786.
- Xu, Y.; Xie, Z.; Feng, Y.; Chen, Z. Road Extraction from High-Resolution Remote Sensing Imagery Using Deep Learning. Remote Sens. 2018, 10, 1461.
- Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-Of-The-Art Review. Remote Sens. 2020, 12, 1444.
- Choi, S.; Do, M. Development of the Road Pavement Deterioration Model Based on the Deep Learning Method. Electronics 2020, 9, 3.
- Mnih, V.; Hinton, G.E. Learning to Detect Roads in High-Resolution Aerial Images. In Computer Vision—ECCV 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 210–223.
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. Available online: https://openaccess.thecvf.com/content_cvpr_2015/html/Long_Fully_Convolutional_Networks_2015_CVPR_paper.html (accessed on 6 March 2023).
- Buslaev, A.; Seferbekov, S.; Iglovikov, V.; Shvets, A. Fully Convolutional Network for Automatic Road Extraction From Satellite Imagery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 207–210. Available online: https://openaccess.thecvf.com/content_cvpr_2018_workshops/w4/html/Buslaev_Fully_Convolutional_Network_CVPR_2018_paper.htm (accessed on 6 March 2023).
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
- Hou, Y.; Liu, Z.; Zhang, T.; Li, Y. C-UNet: Complement UNet for Remote Sensing Road Extraction. Sensors 2021, 21, 2153.
- Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
- Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet With Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 182–186. Available online: https://openaccess.thecvf.com/content_cvpr_2018_workshops/w4/html/Zhou_D-LinkNet_LinkNet_With_CVPR_2018_paper.html (accessed on 6 March 2023).
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2016, arXiv:1511.07122.
- Zhu, Q.; Li, Z.; Zhang, Y.; Guan, Q. Building extraction from high spatial resolution remote sensing images via multiscale-aware and segmentation-prior conditional random fields. Remote Sens. 2020, 12, 3983.
- Cheng, G.; Zhu, F.; Xiang, S.; Wang, Y.; Pan, C. Accurate urban road centerline extraction from VHR imagery via multiscale segmentation and tensor voting. Neurocomputing 2016, 205, 407–420.
- Du, S.; Du, S.; Liu, B.; Zhang, X. Context-enabled extraction of large-scale urban functional zones from very-high-resolution images: A multiscale segmentation approach. Remote Sens. 2019, 11, 1902.
- Salembier, P.; Serra, J.C. Morphological multiscale image segmentation. In Proceedings of the Visual Communications and Image Processing’92, Boston, MA, USA, 16 November 1992; pp. 620–631.
- Wu, Y.; Xia, Y.; Song, Y.; Zhang, Y.; Cai, W. Multiscale network followed network model for retinal vessel segmentation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2018, Proceedings of the 21st International Conference, Granada, Spain, 16–20 September 2018, Proceedings, Part II 11; Springer: Berlin/Heidelberg, Germany, 2018; pp. 119–126.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703. Available online: https://openaccess.thecvf.com/content_CVPR_2019/html/SunDeep_High-Resolution_Representation_Learning_for_Human_Pose_Estimation_CVPR_2019_paper.html (accessed on 6 March 2023).
- Xiao, D.; Yin, L.; Fu, Y. Open-Pit Mine Road Extraction From High-Resolution Remote Sensing Images Using RATT-UNet. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Abdollahi, A.; Pradhan, B.; Alamri, A. VNet: An End-to-End Fully Convolutional Neural Network for Road Extraction From High-Resolution Remote Sensing Data. IEEE Access 2020, 8, 179424–179436.
- Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6.
- Santos, C.D.; Tan, M.; Xiang, B.; Zhou, B. Attentive Pooling Networks. arXiv 2016, arXiv:1602.03609.
- Yarotsky, D. Error bounds for approximations with deep ReLU networks. Neural Netw. 2017, 94, 103–114.
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer Normalization. arXiv 2016, arXiv:1607.06450.
- Joint Bilateral Upsampling|ACM Transactions on Graphics. Available online: https://dl.acm.org/doi/abs/10.1145/1276377.1276497 (accessed on 7 March 2023).
- Bilinear Interpolation of Digital Images—ScienceDirect. Available online: https://www.sciencedirect.com/science/article/abs/pii/0304399181900619 (accessed on 8 July 2023).
- Chen, D.; Zhong, Y.; Zheng, Z.; Ma, A.; Lu, X. Urban road mapping based on an end-to-end road vectorization mapping network framework. ISPRS J. Photogramm. Remote Sens. 2021, 178, 345–365.
- Jiang, X.; Li, Y.; Jiang, T.; Xie, J.; Wu, Y.; Cai, Q.; Jiang, J.; Xu, J.; Zhang, H. RoadFormer: Pyramidal deformable vision transformers for road network extraction with remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2022, 113, 102987.
- Zhong, Z.; Li, J.; Cui, W.; Jiang, H. Fully convolutional networks for building and road extraction: Preliminary results. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1591–1594.
- Mena, J.B. State of the art on automatic road extraction for GIS update: A novel classification. Pattern Recognit. Lett. 2003, 24, 3037–3058.
- Tan, J.; Gao, M.; Yang, K.; Duan, T. Remote sensing road extraction by road segmentation network. Appl. Sci. 2021, 11, 5050.
- Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542.
- Lian, R.; Wang, W.; Mustafa, N.; Huang, L. Road extraction methods in high-resolution remote sensing images: A comprehensive review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5489–5507.
- Zhu, Q.; Zhang, Y.; Wang, L.; Zhong, Y.; Guan, Q.; Lu, X.; Zhang, L.; Li, D. A global context-aware and batch-independent network for road extraction from VHR satellite imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 353–365.
Real Category | Predicted: Road | Predicted: Non-Road |
---|---|---|
Road | True positive (TP) | False negative (FN) |
Non-road | False positive (FP) | True negative (TN) |
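
The confusion matrix above is the basis for the precision, recall, and IoU columns reported in the tables that follow. As a rough sketch, assuming the standard pixel-wise definitions for the road class (the formulas as the authors state them are not reproduced here, and the reported IoU may be averaged differently), the three metrics can be computed as shown below. The helper names `confusion_counts` and `road_metrics` and the toy masks are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for flat binary masks (1 = road, 0 = non-road)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def road_metrics(tp, fp, fn):
    """Standard road-class metrics (assumed definitions).

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    IoU       = TP / (TP + FP + FN)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou


# Toy example: five pixels, flattened prediction and ground-truth masks.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
tp, fp, fn, tn = confusion_counts(pred, truth)
precision, recall, iou = road_metrics(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} IoU={iou:.3f}")
# prints: precision=0.667 recall=0.667 IoU=0.500
```

True negatives do not enter any of the three formulas, so these metrics are not inflated by the large share of non-road background pixels typical of road imagery.
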
Scheme | Network | Precision (%) | Recall (%) | IoU (%) |
---|---|---|---|---|
One | U-Net | 78.82 | 83.76 | 77.67 |
Two | ResNet | 78.93 | 83.25 | 77.90 |
Three | Deeplabv3 | 79.56 | 83.90 | 75.45 |
Four | ResUnet | 79.29 | 83.85 | 77.97 |
Five | HRNet | 77.96 | 83.90 | 77.54 |
Six | HRU-Net (ours) | 80.09 | 84.85 | 78.62 |
Network | Precision (%) | Recall (%) | IoU (%) |
---|---|---|---|
HRNet | 81.55 | 83.09 | 73.23 |
U-Net | 83.43 | 84.45 | 75.36 |
HRU-Net (ours) | 86.06 | 85.20 | 77.23 |
Scheme | Precision (%) | Recall (%) | IoU (%) |
---|---|---|---|
Remove UMR | 78.92 | 83.14 | 77.91 |
Remove MRF | 79.19 | 83.47 | 78.06 |
HRU-Net (ours) | 80.09 | 84.85 | 78.62 |
Network | Precision (%) | Recall (%) | IoU (%) |
---|---|---|---|
Baseline Model | 83.91 | 84.76 | 75.83 |
Plan 1 | 84.45 | 85.03 | 76.32 |
Plan 2 | 85.76 | 85.12 | 76.71 |
HRU-Net (ours) | 86.06 | 85.20 | 77.23 |
Network | Parameters (M) | FLOPs (GFLOPs) |
---|---|---|
U-Net | 29.95 | 5.64 |
ResNet | 25.56 | 5.40 |
Deeplabv3 | 5.87 | 6.61 |
ResUnet | 38.52 | 8.64 |
HRNet | 28.53 | 4.66 |
HRU-Net (ours) | 32.78 | 6.06 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yin, A.; Ren, C.; Yan, Z.; Xue, X.; Yue, W.; Wei, Z.; Liang, J.; Zhang, X.; Lin, X. HRU-Net: High-Resolution Remote Sensing Image Road Extraction Based on Multi-Scale Fusion. Appl. Sci. 2023, 13, 8237. https://doi.org/10.3390/app13148237