RoadFormer: Road Extraction Using a Swin Transformer Combined with a Spatial and Channel Separable Convolution
Abstract
1. Introduction
2. Related Works
2.1. Road Extraction Methods
2.2. Transformer-Based Approaches
2.3. Feature Separation
3. Method
3.1. RoadFormer Overall Design
3.2. Encoder
3.3. Bottleneck
3.4. Decoder and Loss Function
4. Experimental Results and Analysis
4.1. Datasets and Experiment Implementation
- Deepglobe Dataset: Deepglobe is the dataset prepared for the 2018 Deepglobe road extraction challenge. It includes 6226 images with a resolution of 0.5 m and a size of pixels. These RGB images in JPG format cover Thailand, India, and Indonesia, and include cement, asphalt, and mountain roads. Each annotation is a three-channel binary image in PNG format, which uses (255, 255, 255) and (0, 0, 0) to represent roads and background, respectively. In our experiments, the dataset was split into a training set (4987 images) and a test set (1246 images).
- Massachusetts Dataset: The Massachusetts road dataset consists of 1108 images for training, 14 images for validation, and 49 images for testing, all of which are in size. According to [44], the resolution of the Massachusetts imagery can be inferred to be about 1.5 m. Each source image is a three-channel color image in TIF format, and its label is a binary image in TIFF format that uses white and black to distinguish roads from background. Cement and asphalt roads are the main types in this dataset.
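The label encoding described above, with (255, 255, 255) for road and (0, 0, 0) for background, can be reduced to a single-channel 0/1 mask before training. The following is a minimal sketch in plain Python and is not taken from the paper: the function names `rgb_label_to_binary` and `road_fraction` are hypothetical, and the tiny 2 × 3 label is made-up illustration data standing in for a decoded annotation image.

```python
def rgb_label_to_binary(label_rgb, road_value=(255, 255, 255)):
    """Map an H x W grid of (R, G, B) tuples to an H x W grid of 0/1 flags.

    A pixel is treated as road only if it equals road_value exactly;
    every other color is treated as background (an assumption).
    """
    road = tuple(road_value)
    return [[1 if tuple(px) == road else 0 for px in row] for row in label_rgb]


def road_fraction(mask):
    """Return the share of pixels flagged as road in a 0/1 mask."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


# Made-up 2 x 3 annotation: white pixels are road, black pixels are background.
label = [
    [(0, 0, 0), (255, 255, 255), (0, 0, 0)],
    [(0, 0, 0), (255, 255, 255), (255, 255, 255)],
]
mask = rgb_label_to_binary(label)
print(mask)                 # [[0, 1, 0], [0, 1, 1]]
print(road_fraction(mask))  # 0.5
```

In practice the grid would come from an image decoder, and an array library would be faster; the exact-match rule is kept here because the text describes the labels as strictly binary.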
4.2. Evaluation Metrics
4.3. Ablation Experiments
4.4. Comparative Experiments
4.4.1. Experiments on the Deepglobe Dataset
4.4.2. Experiments on the Massachusetts Dataset
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Wei, Y.; Zhang, K.; Ji, S. Simultaneous Road Surface and Centerline Extraction from Large-Scale Remote Sensing Images Using CNN-Based Segmentation and Tracing. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8919–8931.
- Yang, F.; Wang, H.; Jin, Z. A Fusion Network for Road Detection via Spatial Propagation and Spatial Transformation. Pattern Recognit. 2020, 100, 107141.
- Valero, S.; Chanussot, J.; Benediktsson, J.A.; Talbot, H.; Waske, B. Advanced Directional Mathematical Morphology for the Detection of the Road Network in Very High Resolution Remote Sensing Images. Pattern Recognit. Lett. 2010, 31, 1120–1127.
- Chaudhuri, D.; Kushwaha, N.K.; Samal, A. Semi-Automated Road Detection From High Resolution Satellite Images by Directional Morphological Enhancement and Segmentation Techniques. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 1538–1544.
- Bae, Y.; Lee, W.-H.; Choi, Y.; Jeon, Y.W.; Ra, J.B. Automatic Road Extraction From Remote Sensing Images Based on a Normalized Second Derivative Map. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1858–1862.
- Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
- Patil, D.; Jadhav, S. Road Extraction Techniques from Remote Sensing Images: A Review. In Innovative Data Communication Technologies and Application; Raj, J.S., Iliyasu, A.M., Bestak, R., Baig, Z.A., Eds.; Springer: Singapore, 2021; pp. 663–677.
- Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-Of-The-Art Review. Remote Sens. 2020, 12, 1444.
- Mendes, C.C.T.; Frémont, V.; Wolf, D.F. Exploiting Fully Convolutional Neural Networks for Fast Road Detection. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–20 May 2016; pp. 3174–3179.
- Alshehhi, R.; Marpu, P.R. Hierarchical Graph-Based Segmentation for Extracting Road Networks from High-Resolution Satellite Images. ISPRS J. Photogramm. Remote Sens. 2017, 126, 245–260.
- Costea, D.; Leordeanu, M. Aerial Image Geolocalization from Recognition and Matching of Roads and Intersections. arXiv 2016, arXiv:1605.08323.
- Bastani, F.; He, S.; Abbar, S.; Alizadeh, M.; Balakrishnan, H.; Chawla, S.; Madden, S.; DeWitt, D. RoadTracer: Automatic Extraction of Road Networks from Aerial Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4720–4728.
- Liu, Y.; Yao, J.; Lu, X.; Xia, M.; Wang, X.; Liu, Y. RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes from High-Resolution Remotely Sensed Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2043–2056.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
- Liu, X.; Gao, H.; Miao, Q.; Xi, Y.; Ai, Y.; Gao, D. MFST: Multi-Modal Feature Self-Adaptive Transformer for Infrared and Visible Image Fusion. Remote Sens. 2022, 14, 3233.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 10–17 October 2021; pp. 9992–10002.
- Zhong, Z.; Li, J.; Cui, W.; Jiang, H. Fully Convolutional Networks for Building and Road Extraction: Preliminary Results. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1591–1594.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. CoRR 2018. Available online: https://link.springer.com/chapter/10.1007/978-3-030-00889-5_1 (accessed on 12 February 2023).
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 833–851.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Chaurasia, A.; Culurciello, E. LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4.
- Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 192–1924.
- Li, J.; Liu, Y.; Zhang, Y.; Zhang, Y. Cascaded Attention DenseUNet (CADUNet) for Road Extraction from Very-High-Resolution Images. ISPRS Int. J. Geo-Inf. 2021, 10, 329.
- Newell, A.; Yang, K.; Deng, J. Stacked Hourglass Networks for Human Pose Estimation. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 483–499.
- Batra, A.; Singh, S.; Pang, G.; Basu, S.; Jawahar, C.; Paluri, M. Improved Road Connectivity by Joint Learning of Orientation and Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Zhou, G.; Chen, W.; Gui, Q.; Li, X.; Wang, L. Split Depth-Wise Separable Graph-Convolution Network for Road Extraction in Complex Environments From High-Resolution Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5614115.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv 2021, arXiv:2105.05537.
- Liu, P.; Wang, Q.; Yang, G.; Li, L.; Zhang, H. Survey of Road Extraction Methods in Remote Sensing Images Based on Deep Learning. PFG 2022, 90, 135–159.
- Shao, Y.; Guo, B.; Hu, X.; Di, L. Application of a Fast Linear Feature Detector to Road Extraction From Remotely Sensed Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2011, 4, 626–631.
- Alshehhi, R.; Marpu, P.R.; Woon, W.L.; Mura, M.D. Simultaneous Extraction of Roads and Buildings in Remote Sensing Imagery with Convolutional Neural Networks. ISPRS J. Photogramm. Remote Sens. 2017, 130, 139–149.
- Cui, F.; Feng, R.; Wang, L.; Wei, L. Joint Superpixel Segmentation and Graph Convolutional Network Road Extration for High-Resolution Remote Sensing Imagery. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 2178–2181.
- Kestur, R.; Farooq, S.; Abdal, R.; Qadri, E.; Narasipura, O.; Mudigere, M. UFCN: A Fully Convolutional Neural Network for Road Extraction in RGB Imagery Acquired by Remote Sensing from an Unmanned Aerial Vehicle. J. Appl. Remote Sens. 2018, 12, 016020.
- Varia, N.; Dokania, A.; Senthilnath, J. DeepExt: A Convolution Neural Network for Road Extraction Using RGB Images Captured by UAV. In Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bengaluru, India, 18–21 November 2018; pp. 1890–1895.
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected Crfs. arXiv 2014, arXiv:1412.7062.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
- Park, N.; Kim, S. How Do Vision Transformers Work? arXiv 2022, arXiv:2202.06709.
- Tao, C.; Qi, J.; Li, Y.; Wang, H.; Li, H. Spatial Information Inference Net: Road Extraction Using Road-Specific Contextual Information. ISPRS J. Photogramm. Remote Sens. 2019, 158, 155–166.
- Sifre, L.; Mallat, S. Rigid-Motion Scattering for Texture Classification. arXiv 2014, arXiv:1403.1687.
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2015, arXiv:1511.07122.
- Chen, Z.; Deng, L.; Luo, Y.; Li, D.; Marcato Junior, J.; Nunes Gonçalves, W.; Awal Md Nurunnabi, A.; Li, J.; Wang, C.; Li, D. Road Extraction in Remote Sensing Data: A Survey. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102833.
Backbone | Params | Precision (%) | Recall (%) | IoU (%) | F1-Score (%) |
---|---|---|---|---|---|
ResNet-50 | 21.66 M | 84.91 | 78.61 | 68.14 | 81.64 |
Swin-T | 28.30 M | 84.05 | 81.34 | 69.91 | 82.67 |
Swin-S | 49.59 M | 85.29 | 82.51 | 72.18 | 83.88 |
Swin-B | 88.07 M | 85.76 | 83.17 | 73.11 | 84.50 |
Methods | Feature Separation | Dilated Block | Precision (%) | Recall (%) | IoU (%) | F1-Score (%) |
---|---|---|---|---|---|---|
RoadFormer | × | × | 83.71 | 80.70 | 69.66 | 82.18 |
RoadFormer | √ | × | 82.83 | 83.68 | 71.28 | 83.26 |
RoadFormer | √ | channel | 86.79 | 79.45 | 70.86 | 82.95 |
RoadFormer | √ | spatial | 85.29 | 82.51 | 72.18 | 83.88 |
RoadFormer | √ | channel + spatial | 86.47 | 80.80 | 71.77 | 83.54 |
Methods | Precision (%) | Recall (%) | IoU (%) | F1-Score (%) | Params | FLOPs |
---|---|---|---|---|---|---|
FCN | 83.1 | 75.5 | 64.8 | 79.1 | 47.1 M | 197.7 G |
U-Net | 82.6 | 64.0 | 55.3 | 72.1 | 29.1 M | 202.5 G |
PSPNet | 84.7 | 70.1 | 60.1 | 76.7 | 49.0 M | 178.4 G |
DeeplabV3 | 78.9 | 58.3 | 50.0 | 67.0 | 65.7 M | 270.0 G |
Seg-Net | 69.5 | 73.0 | 55.3 | 71.2 | - | - |
LinkNet | 78.3 | 78.8 | 64.7 | 78.6 | - | - |
D-LinkNet | 84.9 | 78.6 | 68.1 | 81.6 | - | - |
HourGlass | 79.4 | 80.1 | 66.3 | 79.8 | - | - |
Batra et al. | 83.8 | 84.1 | 72.4 | 84.0 | - | - |
SwinUnet | 82.1 | 73.3 | 62.9 | 77.7 | 27.1 M | 254.8 G |
RoadFormer (Swin-T) | 84.1 | 81.3 | 69.9 | 82.7 | 28.3 M | 176.5 G |
RoadFormer (Swin-S) | 85.3 | 82.5 | 72.2 | 83.9 | 49.6 M | 269.4 G |
RoadFormer (Swin-B) | 85.8 | 83.2 | 73.1 | 84.5 | 89.0 M | 447.7 G |
Methods | Precision (%) | Recall (%) | IoU (%) | F1-Score (%) | Params | FLOPs |
---|---|---|---|---|---|---|
FCN | 82.8 | 68.1 | 59.7 | 74.7 | 47.1 M | 197.7 G |
U-Net | 82.3 | 70.37 | 61.1 | 75.9 | 29.1 M | 202.5 G |
U-Net++ | 80.9 | 72.4 | 61.8 | 76.4 | - | - |
PSPNet | 77.9 | 76.3 | 62.7 | 77.1 | 49.0 M | 178.4 G |
DeepLabV3 | 78.3 | 74.0 | 61.4 | 76.1 | 65.7 M | 270.0 G |
Seg-Net | 82.5 | 72.1 | 62.5 | 76.9 | - | - |
CADUNet | 79.5 | 76.6 | 64.1 | 77.9 | - | - |
Batra et al. | 81.9 | 69.3 | 60.1 | 75.1 | - | - |
SGCN | 84.8 | 73.9 | 65.3 | 79.0 | - | - |
SwinUnet | 78.5 | 75.8 | 62.8 | 77.1 | 27.1 M | 254.8 G |
RoadFormer (Swin-B) | 80.7 | 77.6 | 65.5 | 79.2 | 89.0 M | 447.7 G |
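The Precision, Recall, IoU, and F1-Score columns in the tables above can be related to pixel counts with a short sketch. The definitions used here are the standard pixel-wise ones for binary segmentation and are assumed rather than taken from the text of this section; the helper names (`confusion_counts`, `road_metrics`, `f1_from`) and the toy prediction are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    over two equally long sequences of 0/1 pixel labels (1 = road)."""
    tp = fp = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t and not p:
            fn += 1
    return tp, fp, fn


def road_metrics(tp, fp, fn):
    """Standard pixel-wise metrics; each falls back to 0.0 on an empty denominator."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    f1 = f1_from(precision, recall)
    return {"precision": precision, "recall": recall, "iou": iou, "f1": f1}


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy flattened masks (made-up data): 3 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 0, 0, 1, 1]
print(road_metrics(*confusion_counts(pred, truth)))

# Check against the ResNet-50 backbone row: precision 84.91 and recall 78.61.
print(round(f1_from(84.91, 78.61), 2))  # 81.64
```

The last line reproduces the F1-Score listed for the ResNet-50 backbone from its precision and recall, which suggests that the F1 column follows the harmonic-mean definition; how the reported values were aggregated across images is not stated here.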
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, X.; Wang, Z.; Wan, J.; Zhang, J.; Xi, Y.; Liu, R.; Miao, Q. RoadFormer: Road Extraction Using a Swin Transformer Combined with a Spatial and Channel Separable Convolution. Remote Sens. 2023, 15, 1049. https://doi.org/10.3390/rs15041049