Deep Dual-Resolution Road Scene Segmentation Networks Based on Decoupled Dynamic Filter and Squeeze–Excitation Module
Abstract
1. Introduction
- (1) DDF replaces the ordinary convolutions in each module of the network, which reduces the number of network parameters and enables the network to adjust the weights of its convolution kernels dynamically according to the input image (see the sketch after this list).
- (2) An SE-Module is added to each module of the network so that local feature maps can incorporate global context, reducing the impact of local image interference on the segmentation result (also illustrated in the sketch after this list).
- (3) We conducted experiments on the Cityscapes dataset to evaluate the proposed DDF&SE-DDRNet. The experimental results show that DDF&SE-DDRNet outperforms state-of-the-art methods.
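For concreteness, here is a minimal PyTorch sketch of the two building blocks named in contributions (1) and (2). This is our illustration, not the authors' released code: SEModule follows Hu et al.'s squeeze-and-excitation design, and DDFConv is a simplified decoupled dynamic depthwise filter in the spirit of Zhou et al. (the original DDF's filter normalization and the exact placement of both blocks inside DDRNet's residual units are omitted).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEModule(nn.Module):
    """Squeeze-and-excitation: global average pooling ("squeeze") followed by a
    two-layer bottleneck MLP ("excitation") that rescales every channel, letting
    local feature maps absorb global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # per-channel gates
        return x * w

class DDFConv(nn.Module):
    """Simplified decoupled dynamic filter: a per-sample channel branch and a
    per-pixel spatial branch each predict a k x k filter, and their product is
    applied as a depthwise convolution."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.channel_branch = nn.Sequential(  # GAP -> 1x1 conv: a k*k filter per channel
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * kernel_size ** 2, 1),
        )
        self.spatial_branch = nn.Conv2d(channels, kernel_size ** 2, 1)  # a k*k filter per pixel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k2 = self.k ** 2
        ch_f = self.channel_branch(x).view(b, c, k2, 1)      # (B, C, k*k, 1)
        sp_f = self.spatial_branch(x).view(b, 1, k2, h * w)  # (B, 1, k*k, H*W)
        patches = F.unfold(x, self.k, padding=self.k // 2)   # (B, C*k*k, H*W)
        out = (patches.view(b, c, k2, h * w) * ch_f * sp_f).sum(dim=2)
        return out.view(b, c, h, w)
```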
2. Related Work
2.1. Deep Dual-Resolution Networks
2.2. Attention Mechanism Based on Squeeze-and-Excitation
2.3. Decoupled Dynamic Filter
3. Approach
3.1. Structure of DDF&SE-DDRNet
3.2. DDF in the Proposed Network
3.3. SE-Module in the Proposed Network
4. Experiment
4.1. Implementation Details
4.2. Evaluation Methodology
4.3. Road Scene Segmentation on Cityscapes Dataset
- (1) DDRNet and DDF&SE-DDRNet outperform BiSeNet V2 and ResNet-50 in both segmentation quality and inference speed.
- (2) DDF&SE-DDRNet has fewer parameters than the original DDRNet.
- (3) This reduction in parameters is the main reason the proposed DDF&SE-DDRNet infers faster than the original DDRNet: it runs at 60 FPS, exceeding the original DDRNet's inference speed.
- (4) Among all networks, LECNN has the highest inference speed and the fewest parameters, but also the lowest mIoU.
- (5) In terms of segmentation accuracy, the PA, mPA, and mIoU of DDF&SE-DDRNet are higher than those of LECNN, DDF-DDRNet, and DDRNet (these metrics can be computed as in the sketch after this list).
- (6) Unlike LECNN, DDF&SE-DDRNet does not sacrifice segmentation accuracy excessively to gain inference speed, because we believe that advances in hardware will greatly improve inference speed. Overall, DDF&SE-DDRNet achieves the highest segmentation accuracy while maintaining a satisfactory inference speed.
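The accuracy metrics above have their standard definitions; as a reference, this short NumPy sketch (our illustration) computes PA, mPA, and mIoU from a K x K class confusion matrix:

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """PA, mPA, and mIoU from a confusion matrix where conf[i, j] counts
    pixels of true class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    gt = conf.sum(axis=1)    # pixels belonging to each true class
    pred = conf.sum(axis=0)  # pixels predicted as each class
    with np.errstate(divide="ignore", invalid="ignore"):
        pa = tp.sum() / conf.sum()                 # pixel accuracy
        mpa = np.nanmean(tp / gt)                  # mean per-class pixel accuracy
        miou = np.nanmean(tp / (gt + pred - tp))   # mean intersection over union
    return pa, mpa, miou
```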
4.4. Ablation Studies
4.5. Visualized Road Scene Segmentation Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Xu, J.; Park, S.H.; Zhang, X.; Hu, J. The improvement of road driving safety guided by visual inattentional blindness. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4972–4981.
- Chen, J.; Wang, Q.; Peng, W.; Xu, H.; Li, X.; Xu, W. Disparity-based multiscale fusion network for transportation detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18855–18863.
- Xiong, S.; Li, B.; Zhu, S. DCGNN: A single-stage 3D object detection network based on density clustering and graph neural network. Complex Intell. Syst. 2023, 9, 3399–3408.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Chen, J.; Wang, Q.; Cheng, H.H.; Peng, W.; Xu, W. A review of vision-based traffic semantic understanding in ITSs. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19954–19979.
- Ren, X.; Ahmad, S.; Zhang, L.; Xiang, L.; Nie, D.; Yang, F.; Wang, Q.; Shen, D. Task decomposition and synchronization for semantic biomedical image segmentation. IEEE Trans. Image Process. 2020, 29, 7497–7510.
- Jing, L.; Chen, Y.; Tian, Y. Coarse-to-fine semantic segmentation from image-level labels. IEEE Trans. Image Process. 2019, 29, 225–236.
- Cira, C.-I.; Kada, M.; Manso-Callejo, M.-Á.; Alcarria, R.; Bordel Sanchez, B. Improving road surface area extraction via semantic segmentation with conditional generative learning for deep inpainting operations. ISPRS Int. J. Geo-Inf. 2022, 11, 43.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 325–341.
- Du, L.; Zhang, Y.; Liu, B.; Yan, H. An urban road semantic segmentation method based on bilateral segmentation network. In Proceedings of the 2023 3rd International Conference on Neural Networks, Information and Communication Engineering (NNICE), Guangzhou, China, 24–26 February 2023; pp. 503–507.
- Kherraki, A.; Maqbool, M.; El Ouazzani, R. Lightweight and efficient convolutional neural network for road scene semantic segmentation. In Proceedings of the 2022 IEEE 18th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, 22–24 September 2022; pp. 135–139.
- Romera, E.; Alvarez, J.M.; Bergasa, L.M.; Arroyo, R. Efficient ConvNet for real-time semantic segmentation. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 1789–1794.
- Badrinarayanan, V.; Handa, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv 2015, arXiv:1505.07293.
- Hong, Y.; Pan, H.; Sun, W.; Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv 2021, arXiv:2101.06085.
- Zhou, J.; Jampani, V.; Pi, Z.; Liu, Q.; Yang, M.-H. Decoupled dynamic filter networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 19–25 June 2021; pp. 6647–6656.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Li, B.; Lu, Y.; Pang, W.; Xu, H. Image colorization using CycleGAN with semantic and spatial rationality. Multimed. Tools Appl. 2023, 82, 21641–21655.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 3213–3223.
- Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
Comparison with other road scene segmentation methods on the Cityscapes dataset (Section 4.3):

| Method | PA (%) | mPA (%) | mIoU (%) | FPS | Parameters |
|---|---|---|---|---|---|
| LECNN | - | - | 65.9 | 106 | 0.64 M |
| BiSeNet V2 | - | - | 75.8 | 47 | 49 M |
| ResNet-50 | - | - | 76.0 | 40 | 25.6 M |
| DDRNet | 96.1 | 85.1 | 77.2 | 51 | 20.3 M |
| DDF&SE-DDRNet | 96.4 | 86.2 | 79.2 | 60 | 18.6 M |
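The parameter counts and FPS figures in the table can be reproduced with standard PyTorch utilities. The sketch below is a rough illustration of such a measurement; the 1024 × 2048 input resolution (full Cityscapes frames) and single-image batching are our assumptions about the evaluation setup, not details confirmed by the paper.

```python
import time
import torch

def parameter_count_m(model: torch.nn.Module) -> float:
    """Trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

@torch.no_grad()
def measure_fps(model: torch.nn.Module, size=(1, 3, 1024, 2048),
                warmup: int = 10, iters: int = 100) -> float:
    """Average single-image inference throughput (frames per second)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(size, device=device)
    for _ in range(warmup):          # warm up kernels / cuDNN autotuning
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()     # timing must reflect finished GPU work
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```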
Ablation results (Section 4.4); the ranges after √ denote the network modules to which the SE-Module is applied:

| Method | DDF | SE-Module | mIoU (%) | Parameters |
|---|---|---|---|---|
| DDRNet | - | - | 77.20 | 20.3 M |
| DDF-DDRNet | √ | - | 78.52 | 18.4 M |
| SE-DDRNet1 | - | √ (1–3) | 77.52 | 20.4 M |
| SE-DDRNet2 | - | √ (4–5) | 77.63 | 20.4 M |
| SE-DDRNet3 | - | √ (1–5) | 77.92 | 20.6 M |
| DDF&SE-DDRNet | √ | √ (1–5) | 79.20 | 18.6 M |