Point-Rich: Enriching Sparse Light Detection and Ranging Point Clouds for Accurate Three-Dimensional Object Detection
Abstract
1. Introduction
- We propose Point-Rich, a lightweight plug-and-play module comprising two key components: the HighDensity (HD) module and the HighLight (HL) module. Seamlessly integrating these modules into existing 3D object detection networks significantly improves overall detection performance.
- The HighDensity (HD) module increases the density of the point cloud around each object, yielding more accurate and consistent geometric measurements.
- The HighLight (HL) module enriches the point cloud with additional semantic features from images, improving discrimination between foreground objects and the background.
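The two contributions above can be pictured as simple point-cloud transforms. The sketch below is illustrative only, not the authors' implementation: the function names, the jitter-based densification stand-in for HD, and the toy projection are all assumptions made for the example; HL follows the general paint-points-with-image-scores idea of PointPainting-style fusion.

```python
import numpy as np

def highlight(points, sem_scores, proj):
    """HL sketch: project each LiDAR point into the image and append the
    per-pixel semantic class scores (C channels) to its feature vector.
    `proj` maps (x, y, z) camera coordinates to (u, v) pixel coordinates."""
    H, W, C = sem_scores.shape
    uv = proj(points[:, :3])                         # (N, 2) pixel coords
    u = np.clip(uv[:, 0].astype(int), 0, W - 1)
    v = np.clip(uv[:, 1].astype(int), 0, H - 1)
    return np.hstack([points, sem_scores[v, u]])     # (N, 4 + C)

def highdensity(points, point_labels, jitter=0.05, k=2, seed=0):
    """HD sketch (illustrative stand-in): spawn k virtual points around each
    foreground point by small random perturbations, raising local density
    near objects while leaving the background untouched."""
    rng = np.random.default_rng(seed)
    fg = points[point_labels > 0]                    # foreground points only
    if len(fg) == 0:
        return points
    virt = np.repeat(fg, k, axis=0)
    virt[:, :3] += rng.normal(scale=jitter, size=(len(virt), 3))
    return np.vstack([points, virt])
```

Because both transforms only change the number of points and their channel width, they can be prepended to any detector whose input layer accepts variable-sized point sets, which is what makes the design plug-and-play.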
2. Related Work
2.1. LiDAR-Based 3D Learning
2.2. Multi-Modal Fusion-Based 3D Object Detection
3. Methodology
3.1. HighDensity Module
3.2. HighLight Module
Algorithm 1: Generate dense semantic virtual points to initially enhance the point cloud
Algorithm 2: Recode the LiDAR point cloud
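The bodies of the two algorithms are not reproduced in this extract, so the following is a hedged sketch of what their captions describe: Algorithm 1 unprojects foreground pixels of a segmentation mask through a completed depth map into 3D virtual points, and Algorithm 2 merges real and virtual points while tagging each with an indicator channel. The function names, the camera-intrinsics form, and the single indicator channel are assumptions for illustration.

```python
import numpy as np

def generate_virtual_points(depth, sem_mask, K):
    """Algorithm 1 sketch: unproject every foreground pixel of the 2D
    segmentation mask into camera coordinates via the completed depth map.
    K is a 3x3 pinhole intrinsics matrix (assumed form)."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.nonzero(sem_mask)          # foreground pixel coordinates
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # (M, 3) virtual points

def recode_points(points, virtual_pts, flag_real=1.0, flag_virtual=0.0):
    """Algorithm 2 sketch: merge real LiDAR returns and generated virtual
    points, appending an indicator channel so the downstream detector can
    distinguish measured points from synthesized ones."""
    real = np.hstack([points[:, :3], np.full((len(points), 1), flag_real)])
    virt = np.hstack([virtual_pts, np.full((len(virtual_pts), 1), flag_virtual)])
    return np.vstack([real, virt])
```

A detector consuming the recoded cloud can then weight real and virtual evidence differently, which is the usual motivation for keeping the indicator channel explicit.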
3.3. Three-Dimensional Object Detection
4. Experiments
4.1. Dataset
4.2. Experiment Settings
4.3. Main Results
4.4. Ablation Study
4.5. Comparison with the State of the Art
4.6. Limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Vora, S.; Lang, A.H.; Helou, B.; Beijbom, O. PointPainting: Sequential fusion for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 4604–4612.
- Wang, C.; Ma, C.; Zhu, M.; Yang, X. PointAugmenting: Cross-modal augmentation for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11794–11803.
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D object detection from RGB-D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927.
- Wang, Z.; Jia, K. Frustum ConvNet: Sliding frustums to aggregate local point-wise features for amodal 3D object detection. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 1742–1749.
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915.
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D proposal generation and object detection from view aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–8.
- Huang, T.; Liu, Z.; Chen, X.; Bai, X. EPNet: Enhancing point features with image semantics for 3D object detection. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XV; Springer: Berlin/Heidelberg, Germany, 2020; pp. 35–52.
- Liu, Z.; Huang, T.; Li, B.; Chen, X.; Wang, X.; Bai, X. EPNet++: Cascade bi-directional fusion for multi-modal 3D object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 8324–8341.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 5105–5114.
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-based 3D single-stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11040–11048.
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 770–779.
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705.
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-end learning for point cloud based 3D object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499.
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
- Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10529–10538.
- Yoo, J.H.; Kim, Y.; Kim, J.; Choi, J.W. 3D-CVF: Generating joint camera and LiDAR features using cross-view spatial feature fusion for 3D object detection. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXVII; Springer: Berlin/Heidelberg, Germany, 2020; pp. 720–736.
- Liang, M.; Yang, B.; Chen, Y.; Hu, R.; Urtasun, R. Multi-task multi-sensor fusion for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7345–7353.
- Liang, M.; Yang, B.; Wang, S.; Urtasun, R. Deep continuous fusion for multi-sensor 3D object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 641–656.
- Li, Y.; Yu, A.W.; Meng, T.; Caine, B.; Ngiam, J.; Peng, D.; Shen, J.; Lu, Y.; Zhou, D.; Le, Q.V.; et al. DeepFusion: LiDAR-camera deep fusion for multi-modal 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17182–17191.
- Chen, Z.; Li, Z.; Zhang, S.; Fang, L.; Jiang, Q.; Zhao, F.; Zhou, B.; Zhao, H. AutoAlign: Pixel-instance feature aggregation for multi-modal 3D object detection. arXiv 2022, arXiv:2201.06493.
- Bai, X.; Hu, Z.; Zhu, X.; Huang, Q.; Chen, Y.; Fu, H.; Tai, C.L. TransFusion: Robust LiDAR-camera fusion for 3D object detection with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1090–1099.
- Wu, H.; Wen, C.; Shi, S.; Li, X.; Wang, C. Virtual sparse convolution for multimodal 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 21653–21662.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
- Wu, H.; Deng, J.; Wen, C.; Li, X.; Wang, C.; Li, J. CasA: A cascade attention network for 3-D object detection from LiDAR point clouds. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11.
- Smith, L.N.; Topin, N. Super-convergence: Very fast training of neural networks using large learning rates. In Proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, Baltimore, MD, USA, 15–17 April 2019; Volume 11006, pp. 369–386.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Neuhold, G.; Ollmann, T.; Rota Bulò, S.; Kontschieder, P. The Mapillary Vistas dataset for semantic understanding of street scenes. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4990–4999.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Cheng, B.; Collins, M.D.; Zhu, Y.; Liu, T.; Huang, T.S.; Adam, H.; Chen, L.C. Panoptic-DeepLab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 12475–12485.
- Wu, X.; Peng, L.; Yang, H.; Xie, L.; Huang, C.; Deng, C.; Liu, H.; Cai, D. Sparse Fuse Dense: Towards high-quality 3D detection with depth completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5418–5427.
| Method | Year | Modality | 3D Det. Easy | 3D Det. Mod. | 3D Det. Hard | BEV Easy | BEV Mod. | BEV Hard | Orient. Easy | Orient. Mod. | Orient. Hard |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PointPillars [13] | 2019 | L | 87.91 | 76.40 | 73.76 | 92.53 | 87.52 | 85.39 | 95.54 | 91.57 | 90.84 |
| PointRCNN [12] | 2019 | L | 92.04 | 82.63 | 80.55 | 93.19 | 90.31 | 88.48 | 95.90 | 94.01 | 91.84 |
| 3DSSD [11] | 2020 | L | 92.37 | 83.11 | 80.13 | 95.22 | 91.31 | 88.72 | 98.63 | 95.07 | 92.50 |
| 3DSSD_SASA [25] | 2022 | L | 92.23 | 85.28 | 82.58 | 95.38 | 91.81 | 89.27 | 98.62 | 95.52 | 94.78 |
| SFD [32] | 2022 | L + C | 92.73 | 87.82 | 84.59 | 95.78 | 92.68 | 90.19 | 99.05 | 96.72 | 94.95 |
| VirConv-L [23] | 2023 | L + C | 93.21 | 88.02 | 85.61 | 96.16 | 93.50 | 91.39 | 99.22 | 97.52 | 95.05 |
| VirConv-L + Point-Rich | – | L + C | 93.03 | 88.22 | 85.75 | 95.98 | 93.44 | 91.34 | 99.15 | 97.62 | 95.02 |
| Method | mAP (Mod.) | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard |
|---|---|---|---|---|---|---|---|---|---|---|
| PointPillars | 66.88 | 92.53 | 87.52 | 85.39 | 59.63 | 52.82 | 48.32 | 83.97 | 60.31 | 56.31 |
| PointPillars + Point-Rich | 72.41 | 93.15 | 87.82 | 85.56 | 66.51 | 60.58 | 56.52 | 89.12 | 68.84 | 64.55 |
| Delta | +5.53 | +0.62 | +0.30 | +0.17 | +6.88 | +7.76 | +8.20 | +5.15 | +8.53 | +8.24 |
| Method | Det. Easy | Det. Mod. | Det. Hard | BEV Easy | BEV Mod. | BEV Hard | Orient. Easy | Orient. Mod. | Orient. Hard |
|---|---|---|---|---|---|---|---|---|---|
| PointRCNN | 91.45 | 82.28 | 78.33 | 93.08 | 88.73 | 86.34 | 95.78 | 93.19 | 89.57 |
| PointRCNN + Point-Rich | 92.04 | 82.63 | 80.55 | 93.19 | 90.31 | 88.48 | 95.90 | 94.01 | 91.84 |
| Delta | +0.59 | +0.35 | +2.22 | +0.11 | +1.58 | +2.14 | +0.12 | +0.82 | +2.27 |
| PointPillars | HD Module | HL Module | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ✓ | | | 92.53 | 87.52 | 85.39 | 59.63 | 52.82 | 48.32 | 83.97 | 60.31 | 56.31 |
| ✓ | ✓ | | 93.81 | 87.32 | 85.13 | 59.40 | 52.52 | 48.46 | 83.29 | 59.65 | 55.97 |
| ✓ | | ✓ | 93.19 | 87.71 | 86.99 | 64.35 | 59.57 | 56.28 | 89.07 | 68.57 | 65.03 |
| ✓ | ✓ | ✓ | 93.15 | 87.82 | 85.56 | 66.51 | 60.58 | 56.52 | 89.12 | 68.84 | 64.55 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, Y.; Zheng, Y.; Zhu, D.; Wu, Q.; Zeng, H.; Gu, L.; Zhai, X.B. Point-Rich: Enriching Sparse Light Detection and Ranging Point Clouds for Accurate Three-Dimensional Object Detection. Mathematics 2023, 11, 4809. https://doi.org/10.3390/math11234809