SPV-SSD: An Anchor-Free 3D Single-Stage Detector with Supervised-PointRendering and Visibility Representation
Abstract
1. Introduction
- We propose supervised-PointRendering, a novel scheme that suppresses the effect of erroneous image semantics on object detection and significantly improves detection precision across all categories.
- We introduce laser visibility features into sequential-fusion-based 3D object detection, supplementing scene context that further boosts detection performance.
- We design an anchor-free detection head powered by 3D optimal transport assignment (OTA-3D) for voxel-based approaches, achieving a good balance between precision and speed. Our model attains precision comparable to state-of-the-art single-stage approaches on both the KITTI and nuScenes datasets at an ultra-high inference speed, demonstrating its generality and efficiency across different traffic scenarios.
2. Related Work
2.1. 3D Object Detection with Point Cloud
2.2. Multi-Modal Fusion
2.3. Visibility Representation
3. Proposed Method
3.1. Network Architecture
3.2. Supervised-PointRendering
3.2.1. PointRendering
3.2.2. Point-Wise Supervision Task
3.3. Visibility Representation
3.3.1. Ray Casting Algorithm
3.3.2. Spatial Visibility States
Algorithm 1: Vanilla voxel traversal algorithm.
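As a companion to Algorithm 1, the following is a minimal sketch of a vanilla voxel traversal in the Amanatides–Woo style, applied as in this section: each LiDAR ray is marched from the sensor origin to its return point, voxels crossed along the way are marked free, the terminal voxel is marked occupied, and all untouched voxels remain unknown. This is an illustrative reconstruction, not the paper's implementation; the names `traverse_voxels` and `cast_rays`, the state labels and the grid parameters are assumptions.

```python
import numpy as np

UNKNOWN, OCCUPIED, FREE = 0, 1, 2   # illustrative state labels

def traverse_voxels(origin, endpoint, voxel_size, grid_min, grid_shape):
    """Yield indices of voxels crossed by the ray origin -> endpoint,
    using the classic Amanatides-Woo digital differential analyzer."""
    origin = np.asarray(origin, dtype=float)
    endpoint = np.asarray(endpoint, dtype=float)
    direction = endpoint - origin
    length = np.linalg.norm(direction)
    if length == 0.0:
        return
    direction = direction / length
    idx = np.floor((origin - grid_min) / voxel_size).astype(int)       # current voxel
    end_idx = np.floor((endpoint - grid_min) / voxel_size).astype(int)  # voxel of the return
    step = np.sign(direction).astype(int)
    # Ray length to the first boundary per axis (t_max) and the length
    # needed to cross one voxel per axis (t_delta).
    next_boundary = grid_min + (idx + (step > 0)) * voxel_size
    with np.errstate(divide="ignore", invalid="ignore"):
        t_max = np.where(direction != 0, (next_boundary - origin) / direction, np.inf)
        t_delta = np.where(direction != 0, voxel_size / np.abs(direction), np.inf)
    while np.all((idx >= 0) & (idx < grid_shape)):
        yield tuple(int(i) for i in idx)
        if np.array_equal(idx, end_idx):
            break
        axis = int(np.argmin(t_max))        # cross the nearest voxel boundary
        idx[axis] += step[axis]
        t_max[axis] += t_delta[axis]

def cast_rays(points, origin, voxel_size, grid_min, grid_shape):
    """Build the visibility grid: traversed voxels become FREE, end points OCCUPIED."""
    grid = np.full(grid_shape, UNKNOWN)
    for p in points:
        for vox in traverse_voxels(origin, p, voxel_size, grid_min, np.asarray(grid_shape)):
            grid[vox] = FREE
        end = tuple(np.floor((np.asarray(p) - grid_min) / voxel_size).astype(int))
        if all(0 <= e < s for e, s in zip(end, grid_shape)):
            grid[end] = OCCUPIED   # the return overrides the FREE mark of its own voxel
    return grid
```

Running `cast_rays` once per LiDAR sweep yields the per-voxel unknown/occupied/free states used in Section 3.3.2.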
3.4. Anchor-Free Detection Head with OTA-3D
3.4.1. OTA-3D
- Calculating the cost matrix: each element indicates the pair-wise matching cost between a positive candidate box and a ground truth. Different from the 2D OTA, the pair-wise cost is composed of the classification loss $L^{cls}_{ij}$, the bounding-box regression loss $L^{reg}_{ij}$ and the BEV rotated-IoU loss $L^{iou}_{ij}$ between a positive candidate $i$ and the ground-truth box $j$:

  $$c_{ij} = L^{cls}_{ij} + \lambda_{1} L^{reg}_{ij} + \lambda_{2} L^{iou}_{ij},$$

  where $\lambda_{1}$ and $\lambda_{2}$ are balancing weights.
- Calculating the dynamic k: for each ground truth, the IoU (intersection over union) score is calculated with all positive candidates. We select the top $N$ box predictions and sum their IoU scores; applying the floor operation to this sum yields $k$, the number of positive samples assigned to that ground-truth bounding box.
- Selecting, for each ground truth, the top $k$ predictions with the least cost.
- Filtering repeated predictions: if the same prediction matches multiple ground-truth bounding boxes, it is assigned only to the ground truth with the least cost. (A minimal sketch of the full assignment procedure follows this list.)
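To make the four steps concrete, here is a minimal NumPy sketch of the OTA-3D assignment, assuming the pair-wise cost matrix and the BEV rotated IoUs have already been computed as described above. The function name `ota3d_assign` is illustrative, and the default `top_n` is a placeholder: the paper's actual value of $N$ is not reproduced here.

```python
import numpy as np

def ota3d_assign(cost, ious, top_n=10):
    """Dynamic-k label assignment over a precomputed cost matrix (sketch).

    cost: (G, P) pair-wise matching cost between G ground truths and P
          positive candidates (classification + regression + BEV rotated-IoU losses).
    ious: (G, P) BEV rotated IoU between ground truths and candidates.
    top_n: number of highest-IoU candidates whose scores are summed to obtain k
           (placeholder default; the paper's value of N is not reproduced here).
    Returns a (G, P) 0/1 assignment matrix.
    """
    num_gt, num_pred = cost.shape
    assign = np.zeros((num_gt, num_pred), dtype=int)

    for g in range(num_gt):
        # Dynamic k: floor of the summed IoU of the top-N candidates, at least 1.
        topn_ious = np.sort(ious[g])[::-1][: min(top_n, num_pred)]
        k = max(int(np.floor(topn_ious.sum())), 1)
        # Select the k candidates with the least cost for this ground truth.
        least_cost = np.argsort(cost[g])[:k]
        assign[g, least_cost] = 1

    # Filter repeated predictions: a candidate matched to several ground
    # truths keeps only the match with the least cost.
    multi = np.where(assign.sum(axis=0) > 1)[0]
    for p in multi:
        g_best = int(np.argmin(np.where(assign[:, p] == 1, cost[:, p], np.inf)))
        assign[:, p] = 0
        assign[g_best, p] = 1
    return assign
```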
3.4.2. Anchor-Free Detection Head
3.5. Loss Function
4. Experiments and Discussion
4.1. Datasets
4.2. Setup of Supervised-PointRendering
4.2.1. Setup on KITTI Dataset
4.2.2. Setup on nuScenes Dataset
4.3. Setup of Visibility Representation
4.3.1. Visibility Consistency in Data Augmentation
4.3.2. Spatial Visibility Feature Extraction
4.4. Choice of Key Parameter
- Setting 1: the unknown state is set to 0, the occupied state to 1 and the free state to −1.
- Setting 2: following the occupancy probabilities of OctoMap [39], the unknown state is set to 0.5, the occupied state to 0.7 and the free state to 0.4. (A sketch of applying both encodings follows below.)
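For illustration, a minimal sketch of how these two encodings could be applied to the discrete visibility states produced by ray casting (Section 3.3); the state constants and the helper `encode_visibility` are assumptions, not the paper's code.

```python
import numpy as np

# Discrete visibility states from ray casting (Section 3.3.2): every voxel
# starts UNKNOWN, voxels crossed by a ray become FREE, and voxels
# containing a LiDAR return become OCCUPIED.
UNKNOWN, OCCUPIED, FREE = 0, 1, 2

# Encodings compared in Section 4.4, indexed as [UNKNOWN, OCCUPIED, FREE].
SETTING_1 = np.array([0.0, 1.0, -1.0])
SETTING_2 = np.array([0.5, 0.7, 0.4])   # OctoMap-style probabilities [39]

def encode_visibility(state_grid, encoding=SETTING_2):
    """Map a grid of discrete visibility states to scalar feature values."""
    return encoding[state_grid]

# Example: a 2x2x2 grid with one occupied and one free voxel.
states = np.full((2, 2, 2), UNKNOWN)
states[0, 0, 0] = OCCUPIED
states[1, 1, 1] = FREE
features = encode_visibility(states, SETTING_1)   # 1.0, -1.0, zeros elsewhere
```

The resulting scalar grid is what gets fused with the voxel features as the spatial visibility representation.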
4.5. Comparison with State of the Art
4.5.1. Results on the KITTI Dataset
4.5.2. Results on the nuScenes Dataset
4.6. Ablation Study
4.6.1. Study on Anchor-Free Detection Head with OTA-3D
4.6.2. Study on Supervised-PointRendering
4.6.3. Study on Spatial Visibility Fusion
4.6.4. Run Time and Computation Efficiency
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Wang, B.; Lan, J.; Gao, J. LiDAR filtering in 3D object detection based on improved RANSAC. Remote Sens. 2022, 14, 2110.
2. Deng, S.; Liang, Z.; Sun, L.; Jia, K. VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 8448–8457.
3. Peng, K.; Fei, J.; Yang, K.; Roitberg, A.; Zhang, J.; Bieder, F.; Heidenreich, P.; Stiller, C.; Stiefelhagen, R. MASS: Multi-attentional semantic segmentation of LiDAR data for dense top-view understanding. IEEE Trans. Intell. Transp. Syst. 2022, 23, 15824–15840.
4. Vora, S.; Lang, A.H.; Helou, B.; Beijbom, O. PointPainting: Sequential Fusion for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 4603–4611.
5. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
6. Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 770–779.
7. Kamal, A.; Dhakal, P.; Javaid, A.Y.; Devabhaktuni, V.K.; Kaur, D.; Zaientz, J.; Marinier, R. Recent advances and challenges in uncertainty visualization: A survey. J. Vis. 2021, 24, 861–890.
8. Yang, L.; Hyde, D.; Grujic, O.; Scheidt, C.; Caers, J. Assessing and visualizing uncertainty of 3D geological surfaces using level sets with stochastic motion. Comput. Geosci. 2019, 122, 54–67.
9. Choi, J.; Chun, D.; Kim, H.; Lee, H.J. Gaussian YOLOv3: An accurate and fast object detector using localization uncertainty for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 502–511.
10. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11618–11628.
11. Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely Embedded Convolutional Detection. Sensors 2018, 18, 3337.
12. Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast Encoders for Object Detection From Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 12689–12697.
13. Ge, R.; Ding, Z.; Hu, Y.; Wang, Y.; Chen, S.; Huang, L.; Li, Y. AFDet: Anchor Free One Stage 3D Object Detection. arXiv 2020, arXiv:2006.12671.
14. Yin, T.; Zhou, X.; Krähenbühl, P. Center-based 3D Object Detection and Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 11779–11788.
15. Ge, Z.; Liu, S.; Li, Z.; Yoshie, O.; Sun, J. OTA: Optimal Transport Assignment for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 303–312.
16. Yang, L.; Hou, W.; Cui, C.; Cui, J. GOSIM: A multi-scale iterative multiple-point statistics algorithm with global optimization. Comput. Geosci. 2016, 89, 57–70.
17. Chen, Y.; Tai, L.; Sun, K.; Li, M. MonoPair: Monocular 3D object detection using pairwise spatial relationships. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 12093–12102.
18. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv 2018, arXiv:1802.02611.
19. Zhao, Z.Q.; Zheng, P.; Xu, S.-T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232.
20. Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. arXiv 2019, arXiv:1905.05055.
21. Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4490–4499.
22. Shi, W.; Rajkumar, R.R. Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1708–1716.
23. He, C.; Zeng, H.; Huang, J.; Hua, X.; Zhang, L. Structure Aware Single-Stage 3D Object Detection From Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11870–11879.
24. Deng, J.; Shi, S.; Li, P.; Zhou, W.; Zhang, Y.; Li, H. Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021.
25. Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 10526–10535.
26. Shi, S.; Wang, Z.; Wang, X.; Li, H. Part-A2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud. arXiv 2019, arXiv:1907.03670.
27. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
28. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
29. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
30. Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D Object Detection Network for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6526–6534.
31. Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–8.
32. Liang, M.; Yang, B.; Wang, S.; Urtasun, R. Deep Continuous Fusion for Multi-sensor 3D Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
33. Chen, Z.; Li, Z.; Zhang, S.; Fang, L.; Jiang, Q.; Zhao, F.; Zhou, B.; Zhao, H. AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection. arXiv 2022, arXiv:2201.06493.
34. Liu, Z.; Tang, H.; Amini, A.; Yang, X.; Mao, H.; Rus, D.; Han, S. BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation. arXiv 2022, arXiv:2205.13542.
35. Qi, C.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 918–927.
36. Wang, Z.; Jia, K. Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 1742–1749.
37. Fürst, M.; Wasenmüller, O.; Stricker, D. LRPD: Long Range 3D Pedestrian Detection Leveraging Specific Strengths of LiDAR and RGB. In Proceedings of the IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–7.
38. Buhmann, J.M.; Burgard, W.; Cremers, A.B.; Fox, D.; Hofmann, T.; Schneider, F.E.; Strikos, J.; Thrun, S. The Mobile Robot Rhino. In Proceedings of the SNN Symposium on Neural Networks, Nijmegen, The Netherlands, 14–15 September 1995.
39. Hornung, A.; Wurm, K.M.; Bennewitz, M.; Stachniss, C.; Burgard, W. OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Auton. Robot. 2013, 34, 189–206.
40. Richter, S.; Wirges, S.; Königshof, H.; Stiller, C. Fusion of range measurements and semantic estimates in an evidential framework. TM-Tech. Mess. 2019, 86, 102–106.
41. Hu, P.; Ziglar, J.; Held, D.; Ramanan, D. What You See is What You Get: Exploiting Visibility for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 10998–11006.
42. Zheng, W.; Tang, W.; Chen, S.; Jiang, L.; Fu, C.W. CIA-SSD: Confident IoU-Aware Single-Stage Object Detector From Point Cloud. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021.
43. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS: Improving Object Detection with One Line of Code. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5562–5570.
44. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
45. Zhou, D.; Fang, J.; Song, X.; Guan, C.; Yin, J.; Dai, Y.; Yang, R. IoU Loss for 2D/3D Object Detection. In Proceedings of the International Conference on 3D Vision (3DV), Quebec City, QC, Canada, 16–19 September 2019; pp. 85–94.
46. Lin, T.Y.; Goyal, P.; Girshick, R.B.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
47. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
48. Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid Task Cascade for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4969–4978.
49. nuImages. 2020. Available online: https://www.nuscenes.org/nuimages (accessed on 4 December 2022).
50. Pang, S.; Morris, D.D.; Radha, H. CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 10386–10393.
51. Chen, X.; Zhang, T.; Wang, Y.; Wang, Y.; Zhao, H. FUTR3D: A unified sensor fusion framework for 3D detection. arXiv 2022, arXiv:2203.10642.
52. Yin, J.; Shen, J.; Guan, C.; Zhou, D.; Yang, R. LiDAR-based online 3D video object detection with graph-based message passing and spatiotemporal transformer attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11495–11504.
53. Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-Based 3D Single Stage Object Detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11037–11045.
54. Zhu, X.; Zhou, H.; Wang, T.; Hong, F.; Li, W.; Ma, Y.; Li, H.; Yang, R.; Lin, D. Cylindrical and asymmetrical 3D convolution networks for LiDAR-based perception. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 6807–6822.
55. Zhu, B.; Jiang, Z.; Zhou, X.; Li, Z.; Yu, G. Class-balanced grouping and sampling for point cloud 3D object detection. arXiv 2019, arXiv:1908.09492.
56. Chen, Q.; Sun, L.; Cheung, E.; Yuille, A.L. Every view counts: Cross-view consistency in 3D object detection with hybrid-cylindrical-spherical voxelization. Adv. Neural Inf. Process. Syst. 2020, 33, 21224–21235.
Method | Car Easy (%) | Car Mod. (%) | Car Hard (%) | Ped. Easy (%) | Ped. Mod. (%) | Ped. Hard (%) | Cyc. Easy (%) | Cyc. Mod. (%) | Cyc. Hard (%)
---|---|---|---|---|---|---|---|---|---
PointPillars [12] | 87.22 | 76.95 | 73.52 | 57.75 | 52.29 | 47.91 | 82.29 | 63.26 | 59.82
Painted PointPillars [4] | 86.26 | 76.77 | 70.25 | 61.50 | 56.15 | 50.03 | 79.12 | 64.18 | 60.79
Delta | −0.96 | −0.18 | −3.27 | +3.75 | +3.86 | +2.12 | −3.17 | +0.92 | +0.97
PointRCNN [6] | 86.96 | 75.64 | 70.70 | 47.98 | 39.37 | 36.01 | 74.96 | 58.82 | 52.53
Painted PointRCNN [4] | 82.11 | 71.70 | 67.08 | 50.32 | 40.97 | 37.87 | 77.63 | 63.78 | 55.89
Delta | −4.85 | −3.94 | −3.62 | +2.34 | +1.60 | +1.86 | +2.67 | +4.96 | +3.36
Method | Det. Head | 3D AP Easy (%) | 3D AP Mod. (%) | 3D AP Hard (%) | BEV AP Easy (%) | BEV AP Mod. (%) | BEV AP Hard (%)
---|---|---|---|---|---|---|---
SECOND [11] | anchor-based | 87.43 | 76.48 | 69.10 | 89.96 | 87.07 | 79.66
PointPillars [12] | anchor-based | 83.73 | 76.04 | 69.12 | 89.68 | 86.34 | 84.38
AFDet [13] | anchor-free | 85.68 | 75.57 | 69.31 | 89.42 | 85.45 | 80.56
SECOND | anchor-based | 84.65 | 75.96 | 68.71 | 89.39 | 83.77 | 78.59
PointPillars | anchor-based | 82.58 | 74.31 | 68.99 | 90.07 | 96.56 | 82.81
CenterPoint [14] | anchor-free | 81.17 | 73.96 | 69.48 | 88.47 | 85.05 | 81.19
Visibility Encoding [O, U, F] | Ped. BEV AP Easy (%) | Ped. BEV AP Mod. (%) | Ped. BEV AP Hard (%) | BEV mAP (%) | Ped. 3D AP Easy (%) | Ped. 3D AP Mod. (%) | Ped. 3D AP Hard (%) | 3D mAP (%)
---|---|---|---|---|---|---|---|---
[1, 0, −1] | 62.86 | 58.08 | 52.88 | 57.94 | 58.76 | 55.90 | 47.74 | 54.13
[0.7, 0.5, 0.4] | 64.57 | 60.15 | 54.50 | 59.74 | 60.84 | 57.98 | 49.77 | 56.20
Delta | +1.71 | +2.07 | +1.62 | +1.80 | +2.08 | +2.08 | +2.03 | +2.07
Method | Modality | Car Easy (%) | Car Mod. (%) | Car Hard (%) | Ped. Easy (%) | Ped. Mod. (%) | Ped. Hard (%) | Cyc. Easy (%) | Cyc. Mod. (%) | Cyc. Hard (%) | FPS
---|---|---|---|---|---|---|---|---|---|---|---
VoxelNet [21] | L | 77.47 | 65.11 | 57.73 | 39.48 | 33.69 | 31.51 | 61.22 | 48.36 | 44.37 | 4.4
SECOND [11] | L | 84.65 | 75.96 | 68.71 | 45.31 | 35.52 | 33.14 | 75.83 | 60.82 | 53.67 | 20
PointPillars [12] | L | 82.58 | 74.31 | 68.99 | 51.45 | 41.92 | 38.89 | 77.10 | 58.65 | 51.92 | 42
PointRCNN [6] | L | 86.96 | 75.64 | 70.70 | 47.98 | 39.37 | 36.01 | 74.96 | 58.82 | 52.53 | -
CenterPoint [14] | L | 81.17 | 73.96 | 69.48 | 47.25 | 39.28 | 36.78 | 73.04 | 56.67 | 50.60 | -
CLOCs_SecCas [50] | L | 86.38 | 78.45 | 72.45 | - | - | - | - | - | - | -
SA-SSD [23] | L | 88.75 | 79.79 | 74.16 | - | - | - | - | - | - | 25
CIA-SSD [42] | L | 89.59 | 80.28 | 72.87 | - | - | - | - | - | - | 32
MV3D [30] | L & I | 74.97 | 63.63 | 54.00 | - | - | - | - | - | - | 2.8
AVOD-FPN [31] | L & I | 83.07 | 71.76 | 65.73 | 50.46 | 42.27 | 39.04 | 63.76 | 50.55 | 44.93 | 10
F-PointNet [35] | L & I | 82.19 | 69.79 | 60.59 | 50.53 | 42.15 | 38.08 | 72.27 | 56.12 | 49.01 | 5.9
Ours | L & I | 87.22 | 80.34 | 75.40 | 45.83 | 38.45 | 36.03 | 78.36 | 64.40 | 56.92 | 33
Method | Car Easy (%) | Car Mod. (%) | Car Hard (%) | Ped. Easy (%) | Ped. Mod. (%) | Ped. Hard (%) | Cyc. Easy (%) | Cyc. Mod. (%) | Cyc. Hard (%)
---|---|---|---|---|---|---|---|---|---
PointPillars [12] | 87.22 | 76.95 | 73.52 | 57.75 | 52.29 | 47.91 | 82.29 | 63.26 | 59.82
Painted PointPillars [4] | 86.26 | 76.77 | 70.25 | 61.50 | 56.15 | 50.03 | 79.12 | 64.18 | 60.79
Delta | −0.96 | −0.18 | −3.27 | +3.75 | +3.86 | +2.12 | −3.17 | +0.92 | +0.97
VoxelNet [21] | 81.97 | 65.46 | 62.85 | 57.86 | 53.42 | 48.87 | 67.17 | 47.65 | 45.11
SECOND [11] | 87.43 | 76.48 | 69.10 | - | - | - | - | - | -
SA-SSD [23] | 90.15 | 79.91 | 78.78 | - | - | - | - | - | -
CIA-SSD [42] | 90.04 | 79.81 | 78.80 | - | - | - | - | - | -
AFDet [13] | 85.68 | 75.57 | 69.31 | - | - | - | - | - | -
Ours | 89.16 | 82.97 | 79.49 | 60.75 | 55.67 | 50.20 | 83.67 | 69.59 | 64.21
Methods | Stages | NDS | mAP | Car | Truck | Bus | Trailer | Cons. Veh. | Ped. | Motor. | Bicycle | Tr. Cone | Barrier
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
WYSIWYG [41] | One | 41.9 | 35.0 | 79.1 | 30.4 | 46.6 | 40.1 | 7.1 | 65.0 | 18.2 | 0.1 | 28.8 | 34.7 |
PointPillars [12] | One | 45.3 | 30.5 | 68.4 | 23.0 | 28.2 | 23.4 | 4.1 | 59.7 | 27.4 | 1.1 | 30.8 | 38.9 |
3DVID [52] | One | 53.1 | 45.4 | 79.7 | 33.6 | 47.1 | 43.1 | 18.1 | 76.5 | 40.7 | 7.9 | 58.8 | 48.8 |
3DSSD [53] | One | 56.4 | 42.6 | 81.2 | 47.2 | 61.4 | 30.5 | 12.6 | 70.2 | 36.0 | 8.6 | 31.1 | 47.9 |
Cylinder3D [54] | One | 61.6 | 50.6 | - | - | - | - | - | - | - | - | - | - |
CenterPoint [14] | Two | 65.5 | 58.0 | 84.6 | 51.0 | 60.2 | 53.2 | 17.5 | 83.4 | 53.7 | 28.7 | 76.7 | 70.9 |
CGBS [55] | One | 63.3 | 52.8 | 81.1 | 48.5 | 54.9 | 42.9 | 10.5 | 80.1 | 51.5 | 22.3 | 70.9 | 65.7 |
CVCNet [56] | One | 64.2 | 55.8 | 82.6 | 49.5 | 59.4 | 51.1 | 16.2 | 83.0 | 61.8 | 38.8 | 69.7 | 69.7 |
Ours | One | 69.6 | 65.3 | 86.6 | 56.5 | 65.8 | 60.4 | 29.7 | 86.5 | 68.3 | 42.9 | 80.6 | 75.5 |
Methods | Stages | NDS | mAP | Car | Truck | Bus | Trailer | Cons. Veh. | Ped. | Motor. | Bicycle | Tr. Cone | Barrier |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PointPillars [12] | One | 45.3 | 29.6 | 70.5 | 25.0 | 34.5 | 20.0 | 4.5 | 59.9 | 16.8 | 1.7 | 29.6 | 33.2 |
SECOND [11] | One | 48.4 | 27.1 | 75.5 | 21.9 | 29.0 | 13.0 | 0.4 | 59.9 | 16.9 | 0.0 | 22.5 | 32.2 |
MEGVII [55] | One | 62.5 | 50.7 | 81.6 | 51.7 | 67.2 | 37.5 | 14.8 | 77.7 | 42.6 | 17.4 | 57.4 | 59.2 |
FUTR3D [51] | One | 68.0 | 64.2 | 86.3 | 61.5 | 71.9 | 42.1 | 26.0 | 82.6 | 73.6 | 63.3 | 70.1 | 64.4 |
Ours | One | 69.2 | 64.7 | 86.8 | 61.0 | 72.6 | 42.1 | 27.1 | 86.5 | 71.8 | 56.2 | 73.3 | 69.7 |
OTA-3D Head | 3D D-IoU | IoU Pred. | Car Mod. (%) | Ped. Mod. (%) | Cyc. Mod. (%)
---|---|---|---|---|---
 | | | 76.48 | 51.14 | 66.74
✔ | | | 77.37 | 52.68 | 67.23
✔ | ✔ | | 78.56 | 54.55 | 67.65
✔ | ✔ | ✔ | 81.85 | 53.70 | 68.13
Class_ID | One-Hot | VR | Seg. Score | SUPV | Car Mod. (%) | Ped. Mod. (%) | Cyc. Mod. (%)
---|---|---|---|---|---|---|---
 | | | | | 81.85 | 53.70 | 68.13
✔ | | | | | 78.45 | 57.36 | 64.48
 | ✔ | | | | 77.92 | 55.59 | 65.12
 | | ✔ | | | 79.10 | 54.19 | 64.25
 | | | ✔ | | 79.35 | 55.10 | 67.20
 | | | ✔ | ✔ | 83.21 | 57.64 | 68.17
Vis. (Early) | Vis. (Late) | SUPV | Car Mod. (%) | Ped. Mod. (%) | Cyc. Mod. (%)
---|---|---|---|---|---
 | | | 81.85 | 53.70 | 68.13
✔ | | | 79.33 | 57.98 | 69.54
 | ✔ | | 78.26 | 55.91 | 69.03
✔ | | ✔ | 82.97 | 55.67 | 69.59
Method | Point-GNN | Associate-3Ddet | SA-SSD | 3DSSD | TANet | Ours (1-Stage)
---|---|---|---|---|---|---
Time (ms) | 643 | 60 | 40.1 | 38 | 34.75 | 30.33