Dual-Branch Dynamic Object Segmentation Network Based on Spatio-Temporal Information Fusion
Abstract
1. Introduction
- Inspired by video object segmentation tasks [23], an appearance–motion fusion (AMF) module is designed, consisting of a shared attention mechanism and a motion correction method, to strengthen the extraction of appearance features with motion information (see the fusion sketch after this list);
- A majority voting strategy (MVS) post-processing method is proposed, which integrates temporal point cloud semantic information to update the current predictions and thereby addresses the boundary blurring and semantic label misclassification caused by re-projection (see the voting sketch after this list);
- On the test set, the IoU of the proposed method reaches 72.19%, exceeding leading dynamic object segmentation networks such as LMNet and MotionSeg3D by 9.68% and 4.86%, respectively.
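To make the fusion idea concrete, below is a minimal PyTorch sketch of a dual-branch fusion block combining a shared attention mechanism with a residual motion correction. The block name `AMFBlock`, the channel layout, and the specific layer choices are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class AMFBlock(nn.Module):
    """Appearance-motion fusion block (illustrative sketch only)."""

    def __init__(self, channels: int):
        super().__init__()
        # Shared attention: the same 1x1 conv scores both feature streams,
        # so appearance and motion are attended with shared weights.
        self.shared_att = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Motion correction: fold attended motion cues back into the
        # appearance stream through a residual 3x3 conv.
        self.correct = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, appearance: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        # Both inputs: (B, C, H, W) range-image feature maps.
        att_a = self.shared_att(appearance)  # attention map from appearance
        att_m = self.shared_att(motion)      # same weights applied to motion
        fused = torch.cat([appearance * att_m, motion * att_a], dim=1)
        return appearance + self.correct(fused)  # motion-corrected appearance


# Example: fuse two 64-channel feature maps from a 64x512 range image.
amf = AMFBlock(channels=64)
out = amf(torch.randn(1, 64, 64, 512), torch.randn(1, 64, 64, 512))
print(out.shape)  # torch.Size([1, 64, 64, 512])
```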
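Likewise, a minimal sketch of the majority voting idea: per-point labels from the current scan and from k earlier scans are tallied, and each point keeps the majority label. The function name and the (k, N) input layout are assumptions for illustration; the pose alignment and point association step implied by "temporal point cloud semantic information" is outside this sketch.

```python
import numpy as np


def majority_vote(label_history: np.ndarray) -> np.ndarray:
    """Fuse per-point labels from k aligned scans by majority vote.

    label_history: (k, N) int array. Row 0 holds the current prediction;
    rows 1..k-1 hold labels carried over from earlier scans, assumed to be
    already re-projected and associated to the current N points.
    Returns: (N,) array with the majority label per point.
    """
    k, n = label_history.shape
    num_classes = int(label_history.max()) + 1
    votes = np.zeros((n, num_classes), dtype=np.int64)
    for scan_labels in label_history:          # tally one scan at a time
        votes[np.arange(n), scan_labels] += 1
    return votes.argmax(axis=1)                # ties fall to the lower class id


# Example with 3 scans, 4 points, labels {0: static, 1: moving}:
history = np.array([[1, 0, 0, 1],
                    [1, 1, 0, 0],
                    [1, 0, 1, 0]])
print(majority_vote(history))  # [1 0 0 0]
```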
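For reference, the reported score is the standard per-class intersection-over-union, and the quoted margins follow directly from the figures in the comparison table below:

```latex
\mathrm{IoU} = \frac{TP}{TP + FP + FN}, \qquad
72.19 - 62.51 = 9.68 \;(\text{vs. LMNet}), \qquad
72.19 - 67.33 = 4.86 \;(\text{vs. MotionSeg3D-v1}).
```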
2. Method
2.1. Overall Network Structure
2.2. Motion Feature Representation
2.3. Meta-Kernel Convolution
2.4. Appearance–Motion Feature Fusion Module
2.5. Majority Voting Strategy
3. Experiments
3.1. Experiment Setups
3.2. Implementation Details
3.3. Analysis of Experimental Results
3.4. Ablation Study
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chen, X.; Milioto, A.; Palazzolo, E.; Giguere, P.; Behley, J.; Stachniss, C. SuMa++: Efficient LiDAR-based semantic SLAM. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; IEEE: New York, NY, USA, 2019; pp. 4530–4537.
- Baur, S.A.; Emmerichs, D.J.; Moosmann, F.; Pinggera, P.; Ommer, B.; Geiger, A. SLIM: Self-supervised LiDAR scene flow and motion segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 13126–13136.
- Tishchenko, I.; Lombardi, S.; Oswald, M.R.; Pollefeys, M. Self-supervised learning of non-rigid residual flow and ego-motion. In Proceedings of the 2020 International Conference on 3D Vision (3DV), Fukuoka, Japan, 25–28 November 2020; IEEE: New York, NY, USA, 2020; pp. 150–159.
- Chen, P.; Pei, J.; Lu, W.; Li, M. A deep reinforcement learning based method for real-time path planning and dynamic obstacle avoidance. Neurocomputing 2022, 497, 64–75.
- Pomerleau, F.; Krüsi, P.; Colas, F.; Furgale, P.; Siegwart, R. Long-term 3D map maintenance in dynamic environments. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; IEEE: New York, NY, USA, 2014; pp. 3712–3719.
- Underwood, J.P.; Gillsjö, D.; Bailey, T.; Vlaskine, V. Explicit 3D change detection using ray-tracing in spherical coordinates. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; IEEE: New York, NY, USA, 2013; pp. 4735–4741.
- Kim, G.; Kim, A. Remove, then revert: Static point cloud map construction using multiresolution range images. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; IEEE: New York, NY, USA, 2020; pp. 10758–10765.
- Schauer, J.; Nüchter, A. The peopleremover: Removing dynamic objects from 3-D point cloud data by traversing a voxel occupancy grid. IEEE Robot. Autom. Lett. 2018, 3, 1679–1686.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 5105–5114.
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-end learning for point cloud based 3D object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018; pp. 4490–4499.
- Cortinhal, T.; Tzelepis, G.; Erdal Aksoy, E. SalsaNext: Fast, uncertainty-aware semantic segmentation of LiDAR point clouds. In Proceedings of the Advances in Visual Computing: 15th International Symposium, San Diego, CA, USA, 5–7 October 2020; Springer: Berlin, Germany, 2020; pp. 207–222.
- Wang, D.F.; Shang, H.; Cao, J.; Wang, T.; Xia, X.; Han, Y. Semantic segmentation of point clouds in autonomous driving scenes based on self-attention mechanism. Automot. Eng. 2022, 44, 1656–1664.
- Mersch, B.; Chen, X.; Vizzo, I.; Nunes, L.; Behley, J.; Stachniss, C. Receding moving object segmentation in 3D LiDAR data using sparse 4D convolutions. IEEE Robot. Autom. Lett. 2022, 7, 7503–7510.
- Wang, N.; Shi, C.; Guo, R.; Lu, H.; Zheng, Z.; Chen, X. InsMOS: Instance-aware moving object segmentation in LiDAR data. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; IEEE: New York, NY, USA, 2023; pp. 7598–7605.
- Graham, B.; Engelcke, M.; Van Der Maaten, L. 3D semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018; pp. 9224–9232.
- Yan, X.; Gao, J.; Li, J.; Zhang, R.; Li, Z.; Huang, R.; Cui, S. Sparse single sweep LiDAR point cloud segmentation via learning contextual shape priors from scene completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; AAAI: Menlo Park, CA, USA, 2021; Volume 35, pp. 3101–3109.
- Chen, X.; Li, S.; Mersch, B.; Wiesmann, L.; Gall, J.; Behley, J.; Stachniss, C. Moving object segmentation in 3D LiDAR data: A learning-based approach exploiting sequential data. IEEE Robot. Autom. Lett. 2021, 6, 6529–6536.
- Kim, J.; Woo, J.; Im, S. RVMOS: Range-view moving object segmentation leveraged by semantic and motion features. IEEE Robot. Autom. Lett. 2022, 7, 8044–8051.
- Sun, J.; Dai, Y.; Zhang, X.; Xu, J.; Ai, R.; Gu, W.; Chen, X. Efficient spatial-temporal information fusion for LiDAR-based 3D moving object segmentation. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; IEEE: New York, NY, USA, 2022; pp. 11456–11463.
- Wang, N.; Hou, Z.Q.; Zhao, M.Q.; Yu, W.; Ma, S. Semantic segmentation algorithm combined with edge detection. Comput. Eng. 2021, 47, 257–265.
- Wu, B.; Wan, A.; Yue, X.; Keutzer, K. SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; IEEE: New York, NY, USA, 2018; pp. 1887–1893.
- Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. RangeNet++: Fast and accurate LiDAR semantic segmentation. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; IEEE: New York, NY, USA, 2019; pp. 4213–4220.
- Yang, S.; Zhang, L.; Qi, J.; Lu, H.; Wang, S.; Zhang, X. Learning motion-appearance co-attention for zero-shot video object segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 1564–1573.
- Wang, T.; Wang, W.J.; Cai, Y. Research on semantic segmentation methods for 3D point clouds based on deep learning. Comput. Eng. Appl. 2021, 57, 18–26.
- Xia, X.T.; Wang, D.F.; Cao, J.; Zhang, G.; Zhang, J. Semantic segmentation of vehicle-mounted LiDAR point clouds based on sparse convolutional neural networks. Automot. Eng. 2022, 44, 26–35.
- Fan, L.; Xiong, X.; Wang, F.; Wang, N.; Zhang, Z. RangeDet: In defense of range view for LiDAR-based 3D object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 2918–2927.
- Stergiou, A.; Poppe, R. AdaPool: Exponential adaptive pooling for information-retaining downsampling. IEEE Trans. Image Process. 2022, 32, 251–266.
- Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. LIO-SAM: Tightly-coupled LiDAR inertial odometry via smoothing and mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; IEEE: New York, NY, USA, 2020.
- Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 9297–9307.
Methods | IoU [%] | | Params [M] | Inference Time [ms]
---|---|---|---|---
LMNet | 62.51 | 74.48 | 6.71 | 35
SalsaNext | 46.6 | 52.69 | 6.73 | 41.67
LiMoSeg | 52.6 | – | – | 8
MotionSeg3D-v1 | 67.33 | 77.50 | 10.41 | 42.53
MotionSeg3D-v2 | 71.42 | 79.58 | 21.77 | 112
RVMOS | 71.2 | – | 2.63 | 29
InsMOS | 73.2 | 82.12 | 25.35 | 127
Ours | 72.19 | 81.76 | 12.18 | 48.23
Methods | Params [M] | FLOPs [G] | Inference Time [ms]
---|---|---|---
Baseline Model | 10.41 | 523.28 | 40.78
Baseline Model + CRF | – | – | 51.05
Baseline Model + k-NN | – | – | 42.53
Baseline Model + MVS | 12.18 | 553.54 | 48.23
Baseline Model | AMF | MVS | IoU [%] | Improvement
---|---|---|---|---
√ | | | 61.93 | Baseline
√ | √ | | 65.69 | +3.76
√ | | √ | 68.43 | +6.50
√ | √ | √ | 72.19 | +10.26
Post-Processing | IoU [%] | Improvement
---|---|---
No Post-Processing | 65.69 | Baseline
CRF | 64.53 | −1.16
k-NN | 67.66 | +1.97
MVS | 72.19 | +6.50
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).