Multi-Feature Aggregation for Semantic Segmentation of an Urban Scene Point Cloud
Abstract
1. Introduction
1.1. Background
1.2. Related Work
2. Study Area and Materials
3. Methodology
3.1. RandLA-Net Network
3.2. Multi-Feature Aggregation
3.2.1. RandLA-Net++ Network
3.2.2. RandLA-Net3+ Network
3.3. Dilated Convolution
3.4. Loss Function
4. Results
4.1. Experiment Design
4.2. Evaluation Metrics
4.3. Experimental Results and Analysis
4.3.1. SensatUrban Dataset
4.3.2. Toronto-3D Dataset
4.3.3. NJSeg-3D Dataset
4.3.4. Efficiency of RandLA-Net++ and RandLA-Net3+
4.3.5. The Impact of Imbalanced Class Distribution
5. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kasznar, A.P.P.; Hammad, A.W.; Najjar, M.; Linhares Qualharini, E.; Figueiredo, K.; Soares, C.A.P.; Haddad, A.N. Multiple dimensions of smart cities’ infrastructure: A review. Buildings 2021, 11, 73.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Yang, B.; Liang, F.; Huang, R. Progress, challenges and perspectives of 3D LiDAR point cloud processing. Acta Geod. Cartogr. Sin. 2017, 46, 1509–1516.
- Liao, X. Scientific and technological progress and development prospect of the earth observation in China in the past 20 years. Natl. Remote Sens. Bull. 2021, 25, 267–275.
- Yi, L.; Su, H.; Guo, X.; Guibas, L.J. SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6584–6592.
- Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1912–1920.
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Nießner, M. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2432–2443.
- Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J.D.; Schindler, K.; Pollefeys, M. SEMANTIC3D.NET: A new large-scale point cloud classification benchmark. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, IV-1/W1, 91–98.
- Hu, Q.; Yang, B.; Khalid, S.; Xiao, W.; Trigoni, N.; Markham, A. Towards semantic segmentation of urban-scale 3D point clouds: A dataset, benchmarks and challenges. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 4977–4987.
- Tan, W.; Qin, N.; Ma, L.; Li, Y.; Du, J.; Cai, G.; Yang, K.; Li, J. Toronto-3D: A large-scale mobile LiDAR dataset for semantic segmentation of urban roadways. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 202–203.
- Yang, B.; Dong, Z. Progress and perspective of point cloud intelligence. Acta Geod. Cartogr. Sin. 2019, 48, 1575–1585.
- Yu, B.; Dong, C.; Liu, Y. Deep learning based point cloud segmentation: A survey. Comput. Eng. Appl. 2020, 56, 38–45.
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep learning for 3D point clouds: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4338–4364.
- Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 945–953.
- Maturana, D.; Scherer, S. VoxNet: A 3D convolutional neural network for real-time object recognition. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–3 October 2015.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. RandLA-Net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117.
- Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. PointCNN: Convolution on X-transformed points. Adv. Neural Inf. Process. Syst. 2018, 31, 820–830.
- Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. KPConv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 6411–6420.
- Wang, C.; Samari, B.; Siddiqi, K. Local spectral graph convolution for point set feature learning. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 52–66.
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 2019, 38, 1–12.
- Himmelsbach, M.; Hundelshausen, F.V.; Wuensche, H.J. Fast segmentation of 3D point clouds for ground vehicles. In Proceedings of the 2010 IEEE Intelligent Vehicles Symposium, La Jolla, CA, USA, 21–24 June 2010; pp. 560–565.
- Liu, F.; Li, S.; Zhang, L.; Zhou, C.; Ye, R.; Wang, Y.; Lu, J. 3DCNN-DQN-RNN: A deep reinforcement learning framework for semantic parsing of large-scale 3D point clouds. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5678–5687.
- Zhao, C.; Zhou, W.; Lu, L.; Zhao, Q. Pooling scores of neighboring points for improved 3D point cloud segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1475–1479.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 2019, 39, 1856–1867.
- Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.; Wu, J. UNet 3+: A full-scale connected UNet for medical image segmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, 4–8 May 2020; pp. 1055–1059.
- Pang, Y.; Li, Y.; Shen, J.; Shao, L. Towards bridging semantic gap to improve semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 4230–4239.
- Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Jorge Cardoso, M. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep Learning in Medical Image Analysis and Multi-Modal Learning for Clinical Decision Support; Springer: Cham, Switzerland, 2017; pp. 240–248.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Cortinhal, T.; Tzelepis, G.; Aksoy, E.E. SalsaNext: Fast semantic segmentation of LiDAR point clouds for autonomous driving. arXiv 2020, arXiv:2003.03653.
Semantic segmentation results on the SensatUrban dataset (OA, mIoU and per-class IoU, %):

| Method | OA | mIoU | Ground | Veg. | Building | Wall | Bridge | Parking | Rail | Traffic | Street | Car | Footpath | Bike | Water |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RandLA-Net [20] | 89.8 | 52.7 | 80.1 | 98.1 | 91.6 | 48.9 | 40.8 | 51.6 | 0.0 | 56.7 | 33.2 | 80.1 | 32.6 | 0.0 | 71.3 |
| Ours (++) | 91.9 | 57.1 | 84.1 | 98.2 | 94.4 | 58.6 | 59.8 | 53.4 | 10.7 | 54.6 | 42.6 | 78.2 | 38.2 | 0.0 | 69.7 |
| Ours (3+) | 91.9 | 55.4 | 83.7 | 98.2 | 94.8 | 50.9 | 53.9 | 60.2 | 6.9 | 56.1 | 39.1 | 76.8 | 38.9 | 0.0 | 61.4 |
Semantic segmentation results on the Toronto-3D dataset (OA, mIoU and per-class IoU, %):

| Method | OA | mIoU | Road | Rd Mrk. | Natural | Building | Util. Line | Pole | Car | Fence |
|---|---|---|---|---|---|---|---|---|---|---|
| RandLA-Net [20] | 93.0 | 77.7 | 94.6 | 42.6 | 96.9 | 93.0 | 86.5 | 78.1 | 92.9 | 37.1 |
| Ours (++) | 96.9 | 80.9 | 96.4 | 63.7 | 96.2 | 94.8 | 86.8 | 77.7 | 87.6 | 43.6 |
| Ours (3+) | 97.0 | 79.9 | 96.8 | 70.0 | 96.1 | 92.3 | 86.3 | 80.4 | 91.5 | 29.4 |
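For reference, the OA, mIoU and per-class values reported in the two tables above are standard confusion-matrix statistics. The sketch below is a minimal NumPy illustration (not code from the paper) of how they can be computed once a class confusion matrix has been accumulated over the test points.

```python
import numpy as np

def segmentation_metrics(conf):
    """OA, per-class IoU and mIoU from a C x C confusion matrix.

    conf[i, j] = number of test points with ground-truth class i
    that were predicted as class j.
    """
    conf = conf.astype(np.float64)
    tp = np.diag(conf)                    # true positives per class
    fp = conf.sum(axis=0) - tp            # false positives per class
    fn = conf.sum(axis=1) - tp            # false negatives per class
    union = tp + fp + fn
    iou = tp / np.maximum(union, 1.0)     # per-class IoU (0 if class absent)
    miou = iou.mean()                     # mean IoU over all classes
    oa = tp.sum() / max(conf.sum(), 1.0)  # overall accuracy
    return oa, iou, miou

# Toy example with 3 classes: rows = ground truth, columns = prediction.
conf = np.array([[90, 5, 5], [10, 80, 10], [0, 5, 95]])
oa, iou, miou = segmentation_metrics(conf)
```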
Per-class results on the NJSeg-3D dataset (all values in %):

| Method | Type | OA | Recall | Accuracy | F1 | IoU | mIoU |
|---|---|---|---|---|---|---|---|
| RandLA-Net [20] | building | 94.5 | 93.5 | 86.0 | 91.5 | 80.1 | 82.1 |
| | road | | 82.4 | 75.4 | 78.7 | 65.0 | |
| | tree | | 91.5 | 97.5 | 95.5 | 91.4 | |
| | others | | 96.3 | 95.8 | 96.1 | 92.4 | |
| RandLA-Net++ | building | 95.1 | 96.5 | 91.9 | 94.7 | 89.9 | 85.5 |
| | road | | 84.3 | 79.5 | 81.8 | 69.3 | |
| | tree | | 91.9 | 96.5 | 95.2 | 90.1 | |
| | others | | 97.6 | 94.9 | 96.2 | 92.7 | |
| RandLA-Net3+ | building | 95.0 | 94.9 | 90.2 | 93.5 | 87.8 | 85.3 |
| | road | | 83.2 | 75.9 | 79.4 | 65.8 | |
| | tree | | 96.9 | 98.8 | 96.8 | 93.8 | |
| | others | | 97.8 | 95.7 | 96.7 | 93.7 | |
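The recall, accuracy and F1 columns of the NJSeg-3D table can be derived from the same confusion matrix. The sketch below assumes that the per-class "Accuracy" column corresponds to precision (TP / (TP + FP)); that interpretation is an assumption, since the table itself does not define the column.

```python
import numpy as np

def per_class_recall_precision_f1(conf):
    """Per-class recall, precision and F1 from a C x C confusion matrix.

    Assumption: the "Accuracy" column in the table above is read here as
    per-class precision; the paper may define it differently.
    """
    conf = conf.astype(np.float64)
    tp = np.diag(conf)
    recall = tp / np.maximum(conf.sum(axis=1), 1.0)     # TP / (TP + FN)
    precision = tp / np.maximum(conf.sum(axis=0), 1.0)  # TP / (TP + FP)
    f1 = 2.0 * precision * recall / np.maximum(precision + recall, 1e-12)
    return recall, precision, f1
```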
Model efficiency comparison:

| Model | Parameters (Millions) | Training Time (Hours) | Inference Time (Seconds) |
|---|---|---|---|
| RandLA-Net | 14.9 | 16 ± 1 | 362 ± 5 |
| RandLA-Net++ | 17.6 | 20 ± 1 | 371 ± 5 |
| RandLA-Net3+ | 18.6 | 18 ± 1 | 375 ± 5 |
Impact of the loss function on the NJSeg-3D dataset (OA, mIoU and per-class IoU, %):

| Method (loss) | OA | mIoU | Building | Tree | Road | Others |
|---|---|---|---|---|---|---|
| RandLA-Net (Lwce) | 94.5 | 82.1 | 80.1 | 91.4 | 65.0 | 92.4 |
| RandLA-Net++ (Lwce) | 95.1 | 85.5 | 89.9 | 90.1 | 69.3 | 92.7 |
| RandLA-Net++ (Lls) | 95.8 | 86.2 | 93.2 | 86.7 | 70.9 | 93.8 |
| RandLA-Net++ (Lwce + Lls) | 96.0 | 86.8 | 93.1 | 88.5 | 71.4 | 94.0 |
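Lwce in the table above denotes a class-weighted cross-entropy loss that compensates for the imbalanced class distribution, while Lls is a second loss term combined with it, presumably as a (possibly weighted) sum. The sketch below illustrates one common form of weighted cross-entropy with inverse-frequency class weights; both the weighting scheme and the identity of Lls (plausibly a Lovász-softmax term as used by SalsaNext, cited above) are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_freq, eps=1e-12):
    """Class-weighted cross-entropy averaged over N points.

    probs:      (N, C) softmax probabilities from the network.
    labels:     (N,)   integer ground-truth class of each point.
    class_freq: (C,)   relative frequency of each class in the training set;
                inverse-frequency weighting (one common choice, assumed here)
                gives rarer classes a larger weight.
    """
    weights = 1.0 / (np.asarray(class_freq, dtype=np.float64) + eps)
    weights = weights / weights.sum()                   # normalise the weights
    p_true = probs[np.arange(labels.shape[0]), labels]  # probability of the true class
    loss = -weights[labels] * np.log(p_true + eps)      # weighted negative log-likelihood
    return loss.mean()

# The "Lwce + Lls" rows presumably combine this term with a second loss Lls
# (not implemented here), e.g. total_loss = weighted_cross_entropy(...) + lls(...).
```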
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Chen, J.; Zhao, Y.; Meng, C.; Liu, Y. Multi-Feature Aggregation for Semantic Segmentation of an Urban Scene Point Cloud. Remote Sens. 2022, 14, 5134. https://doi.org/10.3390/rs14205134