LiDAR-Based Intensity-Aware Outdoor 3D Object Detection
Abstract
1. Introduction
2. Related Work
2.1. Grid-Based Feature Encoders
2.2. Point-Set-Based Feature Encoders
2.3. Hybrid-Based Feature Encoders
3. Methodology
3.1. Problem Formulation
3.2. 3D Object Detection Pipeline
3.2.1. Intensity-Aware Voxel Feature Encoding
3.2.2. 3D and 2D Backbone Stages
3.3. Experimental Setup
3.3.1. Datasets
3.3.2. Training and Evaluation
4. Results
5. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Anderson, M. The road ahead for self-driving cars: The AV industry has had to reset expectations, as it shifts its focus to Level 4 autonomy [News]. IEEE Spectr. 2020, 57, 8–9.
- Deng, J.; Shi, S.; Li, P.; Zhou, W.; Zhang, Y.; Li, H. Voxel R-CNN: Towards high performance voxel-based 3D object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 1201–1209.
- Wu, X.; Peng, L.; Yang, H.; Xie, L.; Huang, C.; Deng, C.; Liu, H.; Cai, D. Sparse fuse dense: Towards high quality 3D detection with depth completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5418–5427.
- Pire, T.; Corti, J.; Grinblat, G. Online object detection and localization on stereo visual SLAM system. J. Intell. Robot. Syst. 2020, 98, 377–386.
- Xu, X.; Zhang, L.; Yang, J.; Cao, C.; Wang, W.; Ran, Y.; Tan, Z.; Luo, M. A review of multi-sensor fusion SLAM systems based on 3D LiDAR. Remote Sens. 2022, 14, 2835.
- Ghasemi, Y.; Jeong, H.; Choi, S.H.; Park, K.B.; Lee, J.Y. Deep learning-based object detection in augmented reality: A systematic review. Comput. Ind. 2022, 139, 103661.
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11618–11628.
- Feng, D.; Haase-Schütz, C.; Rosenbaum, L.; Hertlein, H.; Glaeser, C.; Timm, F.; Wiesbeck, W.; Dietmayer, K. Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1341–1360.
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. arXiv 2017, arXiv:1706.02413.
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-end learning for point cloud based 3D object detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499.
- Hegde, S.; Gangisetty, S. PIG-Net: Inception based deep learning architecture for 3D point cloud segmentation. Comput. Graph. 2021, 95, 13–22.
- Shi, S.; Wang, Z.; Shi, J.; Wang, X.; Li, H. From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2647–2664.
- Hahner, M.; Sakaridis, C.; Dai, D.; Van Gool, L. Fog simulation on real LiDAR point clouds for 3D object detection in adverse weather. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 15283–15292.
- Kilic, V.; Hegde, D.; Sindagi, V.; Cooper, A.B.; Foster, M.A.; Patel, V.M. LiDAR light scattering augmentation (LISA): Physics-based simulation of adverse weather conditions for 3D object detection. arXiv 2021, arXiv:2107.07004.
- Arnold, E.; Al-Jarrah, O.Y.; Dianati, M.; Fallah, S.; Oxtoby, D. A survey on 3D object detection methods for autonomous driving applications. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3782–3795.
- Charron, N.; Phillips, S.; Waslander, S.L. De-noising of LiDAR point clouds corrupted by snowfall. In Proceedings of the 2018 15th Conference on Computer and Robot Vision (CRV), Toronto, ON, Canada, 8–10 May 2018; pp. 254–261.
- Heinzler, R.; Piewak, F.; Schindler, P.; Stork, W. CNN-based LiDAR point cloud de-noising in adverse weather. IEEE Robot. Autom. Lett. 2020, 5, 2514–2521.
- Lindell, D.B.; Wetzstein, G. Three-dimensional imaging through scattering media based on confocal diffuse tomography. Nat. Commun. 2020, 11, 4517.
- Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A survey of autonomous driving: Common practices and emerging technologies. IEEE Access 2020, 8, 58443–58469.
- Park, J.I.; Park, J.; Kim, K.S. Fast and accurate desnowing algorithm for LiDAR point clouds. IEEE Access 2020, 8, 160202–160212.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012.
- Chang, M.F.; Lambert, J.; Sangkloy, P.; Singh, J.; Bak, S.; Hartnett, A.; Wang, D.; Carr, P.; Lucey, S.; Ramanan, D.; et al. Argoverse: 3D tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Waymo. Waymo Open Dataset: An Autonomous Driving Dataset; Waymo: Mountain View, CA, USA, 2020.
- Li, B.; Zhang, T.; Xia, T. Vehicle detection from 3D LiDAR using fully convolutional network. arXiv 2016, arXiv:1608.07916.
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915.
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D proposal generation and object detection from view aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–8.
- Liang, M.; Yang, B.; Wang, S.; Urtasun, R. Deep continuous fusion for multi-sensor 3D object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Sindagi, V.A.; Zhou, Y.; Tuzel, O. MVX-Net: Multimodal VoxelNet for 3D object detection. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 7276–7282.
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 770–779.
- Yang, Z.; Sun, Y.; Liu, S.; Shen, X.; Jia, J. STD: Sparse-to-dense 3D object detector for point cloud. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1951–1960.
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 2019, 38, 146.
- Chen, Y.; Liu, S.; Shen, X.; Jia, J. Fast Point R-CNN. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
- Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10526–10535.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Pitropov, M.; Garcia, D.E.; Rebello, J.; Smart, M.; Wang, C.; Czarnecki, K.; Waslander, S. Canadian Adverse Driving Conditions dataset. Int. J. Robot. Res. 2020, 40, 681–690.
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. 2012. Available online: http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html (accessed on 28 April 2024).
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D object detection from RGB-D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927.
- He, C.; Zeng, H.; Huang, J.; Hua, X.S.; Zhang, L. Structure aware single-stage 3D object detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11873–11882.
- Liang, M.; Yang, B.; Chen, Y.; Hu, R.; Urtasun, R. Multi-task multi-sensor fusion for 3D object detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7337–7345.
- Qian, K.; Zhu, S.; Zhang, X.; Li, L.E. Robust multimodal vehicle detection in foggy weather using complementary LiDAR and radar signals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
- Agarwal, S.; Vora, A.; Pandey, G.; Williams, W.; Kourous, H.; McBride, J. Ford Multi-AV Seasonal Dataset. Int. J. Robot. Res. 2020, 39, 1367–1376.
- Maddern, W.; Pascoe, G.; Gadd, M.; Barnes, D.; Yeomans, B.; Newman, P. Real-time kinematic ground truth for the Oxford RobotCar dataset. arXiv 2020, arXiv:2002.10152.
Difficulty Level | Min. Bounding Box Height | Max. Occlusion Level | Max. Truncation
---|---|---|---
Easy | 40 px | 0 (fully visible) | 15%
Moderate | 25 px | 1 (partly occluded) | 30%
Hard | 25 px | 2 (difficult to see) | 50%
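For concreteness, the sketch below maps a KITTI-format ground-truth label (2D bounding box in pixels, occlusion flag in {0, 1, 2, 3}, truncation ratio in [0, 1]) to a difficulty level using the thresholds in the table above. The function name and example values are illustrative, not from the paper.

```python
# Minimal sketch of KITTI-style difficulty assignment, using the
# thresholds from the table above. Field names follow the KITTI label
# format; any helper names here are illustrative, not from the paper.

def kitti_difficulty(bbox, occluded, truncated):
    """Return 'easy', 'moderate', 'hard', or None (not evaluated)."""
    height = bbox[3] - bbox[1]  # y_max - y_min of the 2D box, in pixels
    if height >= 40 and occluded <= 0 and truncated <= 0.15:
        return "easy"
    if height >= 25 and occluded <= 1 and truncated <= 0.30:
        return "moderate"
    if height >= 25 and occluded <= 2 and truncated <= 0.50:
        return "hard"
    return None  # too small, occluded, or truncated to be scored

# Example: a 30 px tall, partly occluded, 20%-truncated box
print(kitti_difficulty((100, 180, 140, 210), occluded=1, truncated=0.20))
# -> 'moderate'
```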
3D and BEV detection results (AP, %) for the Car class (L: LiDAR; R: RGB image):

Method | Modality | 3D Easy | 3D Moderate | 3D Hard | BEV Easy | BEV Moderate | BEV Hard
---|---|---|---|---|---|---|---
MV3D [27] | R + L | 74.97 | 63.63 | 54.00 | 86.62 | 78.93 | 69.80
AVOD-FPN [28] | R + L | 83.07 | 71.76 | 65.73 | 90.99 | 84.82 | 79.62
F-PointNet [40] | R + L | 82.19 | 69.79 | 60.59 | 91.17 | 84.67 | 74.77
UberATG-MMF | R + L | 88.40 | 77.43 | 70.22 | 93.67 | 88.21 | 81.99
SECOND [31] | L | 83.34 | 72.55 | 65.82 | 89.39 | 83.77 | 78.59
PointPillars [9] | L | 82.58 | 74.31 | 68.99 | 90.07 | 86.56 | 82.81
PointRCNN [32] | L | 86.96 | 75.64 | 70.70 | 92.13 | 87.39 | 82.72
STD [33] | L | 87.95 | 79.71 | 75.09 | 94.74 | 89.19 | 86.42
Part-A² Net [14] | L | 85.94 | 77.86 | 72.00 | 89.52 | 84.76 | 81.47
PV-RCNN [36] | L | 90.25 | 81.43 | 76.82 | 94.98 | 90.65 | 86.14
Voxel R-CNN [2] | L | 90.90 | 81.62 | 77.06 | 95.52 | 91.25 | 88.99
Ours | L | 88.88 | 79.27 | 74.27 | 92.97 | 88.70 | 85.97
3D and BEV detection results (AP, %) for the Pedestrian class:

Method | Modality | 3D Easy | 3D Moderate | 3D Hard | BEV Easy | BEV Moderate | BEV Hard
---|---|---|---|---|---|---|---
AVOD-FPN [28] | R + L | 50.46 | 42.27 | 39.04 | 58.49 | 50.32 | 46.98
F-PointNet [40] | R + L | 50.53 | 41.15 | 38.08 | 57.13 | 49.57 | 45.48
SECOND [31] | L | 51.45 | 41.92 | 38.89 | 58.69 | 50.13 | 46.84
PointPillars [9] | L | 51.85 | 41.58 | 39.37 | 58.77 | 50.35 | 46.13
PointRCNN [32] | L | 53.29 | 43.47 | 38.35 | 60.02 | 48.72 | 44.55
STD [33] | L | 54.49 | 44.50 | 42.36 | 59.72 | 51.12 | 48.04
Part-A² Net [14] | L | 53.42 | 43.29 | 40.29 | 59.86 | 50.57 | 46.74
PV-RCNN [36] | L | 53.77 | 43.59 | 40.29 | 59.80 | 50.57 | 46.74
Voxel R-CNN [2] | L | 54.95 | 44.52 | 41.25 | 60.74 | 50.58 | 46.74
Ours | L | 61.62 | 53.74 | 47.90 | 65.75 | 58.76 | 52.30
3D and BEV detection results (AP, %) for the Cyclist class:

Method | Modality | 3D Easy | 3D Moderate | 3D Hard | BEV Easy | BEV Moderate | BEV Hard
---|---|---|---|---|---|---|---
AVOD-FPN [28] | R + L | 63.76 | 50.55 | 44.93 | 69.39 | 57.12 | 51.09
F-PointNet [40] | R + L | 72.27 | 56.12 | 49.01 | 77.26 | 61.37 | 53.78
SECOND [31] | L | 71.33 | 52.08 | 45.83 | 76.50 | 56.05 | 49.45
PointPillars [9] | L | 77.10 | 58.65 | 51.92 | 79.90 | 62.73 | 55.58
PointRCNN [32] | L | 74.96 | 58.82 | 52.53 | 82.56 | 67.24 | 60.28
STD [33] | L | 78.69 | 61.59 | 55.30 | 81.36 | 67.23 | 59.35
Part-A² Net [14] | L | 78.58 | 62.73 | 57.74 | 81.91 | 68.12 | 61.92
PV-RCNN [36] | L | 78.60 | 63.71 | 57.65 | 82.49 | 68.89 | 62.41
Ours | L | 83.61 | 65.88 | 61.94 | 85.71 | 69.12 | 64.85
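The values in the three tables above appear to be average precision scores in percent on the KITTI benchmark. As a minimal sketch of the underlying metric, assuming the KITTI-style interpolated AP sampled at 40 recall positions (function and variable names are illustrative, not from the paper):

```python
import numpy as np

# Sketch of KITTI-style interpolated average precision: the mean, over
# sampled recall thresholds, of the best precision achieved at recall
# >= that threshold (0 if the recall level is never reached).

def interpolated_ap(recalls, precisions, n_points=40):
    recalls = np.asarray(recalls)
    precisions = np.asarray(precisions)
    thresholds = np.linspace(1.0 / n_points, 1.0, n_points)
    ap = 0.0
    for t in thresholds:
        mask = recalls >= t
        ap += precisions[mask].max() if mask.any() else 0.0
    return 100.0 * ap / n_points  # in percent, as in the tables

# Toy precision-recall curve for a single class and difficulty level
r = [0.1, 0.3, 0.5, 0.7, 0.9]
p = [1.0, 0.95, 0.90, 0.80, 0.60]
print(f"AP = {interpolated_ap(r, p):.2f}%")
```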
Runtime comparison by feature-encoding scheme:

Encoding Scheme | Model | Modality | Hardware | FPS
---|---|---|---|---
Grid-Based | VoxelNet [12] | L | Titan X | 4.4
Grid-Based | MVX-Net [30] | L | - | -
Grid-Based | SECOND [31] | L | GTX 1080 Ti | 30.4
Grid-Based | SA-SSD [41] | L | GTX 1080 Ti | 25.0
Grid-Based | Voxel R-CNN [2] | L | RTX 2080 Ti | 25.2
Grid-Based | Ours | L | RTX 3080 | 40.7
Point-Based | PointNet [10] | L + R | GTX 1080 | 1.3
Point-Based | MV3D [27] | L + R | Titan X | 2.8
Hybrid | PointPillars [9] | L | GTX 1080 Ti | 42.4
Hybrid | PV-RCNN [36] | L | GTX 1080 | 8.9
Hybrid | MMF [42] | L + R | GTX 1080 | 0.08
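FPS figures such as these are conventionally obtained as the inverse of the mean per-frame inference latency, and since the hardware column differs across rows they are indicative rather than directly comparable. A minimal timing sketch, where `model` and `frames` are placeholders rather than components of the paper:

```python
import time

# Sketch of how per-model FPS figures are typically measured: average
# wall-clock latency over a warm run, reported as frames per second.

def measure_fps(model, frames, warmup=10):
    for f in frames[:warmup]:        # warm-up pass: exclude startup costs
        model(f)
    start = time.perf_counter()
    for f in frames[warmup:]:        # timed pass over the remaining frames
        model(f)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# e.g. 40.7 FPS corresponds to roughly 24.6 ms per point cloud
print(f"{1000 / 40.7:.1f} ms/frame")
```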
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).