Recent Advances in 3D Object Detection for Self-Driving Vehicles: A Survey
Abstract
1. Introduction
- A detailed study of 3D object detection methods, categorized into three groups: methods using only RGB images, methods relying on LiDAR point clouds, and approaches that fuse RGB and point cloud data for improved accuracy and robustness.
- A summary of recent advancements in multi-modal 3D object detection, with a side-by-side comparison of different techniques, highlighting their strengths and weaknesses.
- An extensive survey of various sensor fusion strategies implemented in autonomous vehicles, with a comparative analysis of their performance in different scenarios.
2. Background
2.1. Autonomous Vehicles
2.2. 3D Object Detection in Autonomous Vehicles
2.2.1. Early Beginnings
2.2.2. Advancement in Sensor Technologies
2.2.3. Multi-Sensor Fusion
2.3. Sensors in 3D Object Detection
2.3.1. LiDAR (Light Detection and Ranging)
2.3.2. Radar (Radio Detection and Ranging)
2.3.3. Camera
2.3.4. Ultrasonic Sensors
2.3.5. Infrared Sensors (IR)
2.3.6. ToF (Time-of-Flight) Cameras
Sensor Type | Strengths | Weaknesses | Use Cases |
---|---|---|---|
LiDAR | High accuracy and detail in 3D mapping; precise distance measurement; capable of detecting small and complex objects | High cost; performance can degrade in adverse weather conditions like fog and heavy rain | Navigation and object identification in AVs |
Radar | Effective in adverse weather; measures velocity and distance; robust and reliable | Lower resolution; limited in detecting fine details | Adaptive cruise control; collision avoidance |
Stereo Cameras | Natural depth perception; effective in well-lit environments; relatively low cost | Requires significant computational resources; less effective in low light conditions | Object detection and recognition; navigation in complex environments |
Monocular Cameras | Simpler setup and lower cost compared to stereo cameras | Requires complex processing for depth estimation; accuracy can suffer in static environments | Cost-effective visual sensing; traffic sign and lane detection |
Ultrasonic Sensors | Effective for short-range detection; low cost; works in various lighting conditions | Limited to short-range; lower resolution; slower response time | Parking assistance; obstacle detection in tight spaces |
Infrared Sensors (IR) | Effective in low-light conditions; can detect warm objects against cooler backgrounds | Performance can degrade in foggy or dusty conditions | Night vision applications; detecting living creatures |
ToF Cameras | Provides rapid depth information; effective in real-time applications | Struggles with surfaces that absorb or reflect light unevenly | Real-time 3D mapping; interactive applications |
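The depth-sensing principles behind several of these sensors reduce to simple geometry. As a rough illustration (not tied to any specific product or dataset), the sketch below computes depth from stereo disparity (Z = f·B/d) and from a time-of-flight round trip (Z = c·t/2); the focal length, baseline, and timing values are illustrative assumptions.

```python
import numpy as np

def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth from a stereo pair: Z = f * B / d (zeros in the disparity map are invalid)."""
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = focal_px * baseline_m / d
    depth[~np.isfinite(depth)] = 0.0  # mark invalid pixels
    return depth

def tof_depth(round_trip_s, c=299_792_458.0):
    """Depth from a time-of-flight measurement: Z = c * t / 2."""
    return c * np.asarray(round_trip_s, dtype=float) / 2.0

# Illustrative values: 64 px disparity, 700 px focal length, 0.54 m baseline (KITTI-like rig)
print(stereo_depth(np.array([[64.0]]), 700.0, 0.54))  # ~5.9 m
print(tof_depth(40e-9))                               # ~6.0 m for a 40 ns round trip
```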
3. Data Processing and Sensor Fusion
3.1. Challenges in Data Processing
3.1.1. Sensor Data Integration and Fusion
3.1.2. Real-Time Processing
3.1.3. Handling Environmental Variability
3.1.4. Accuracy and Reliability
3.1.5. Scalability and Efficiency
3.1.6. Data Annotation and Model Training
3.1.7. Regulatory and Safety Standards
3.2. Sensor Fusion Approaches
3.2.1. Early Fusion (Raw Data Fusion)
3.2.2. Feature-Level Fusion (Intermediate Fusion)
3.2.3. Decision-Level Fusion (Late Fusion)
Fusion Approach | Differences | Advantages | Disadvantages | Algorithms/Papers |
---|---|---|---|---|
Early Fusion | Integrates raw data from multiple sensors before processing. | Utilizes complete data, capturing all potential interactions; enhances feature extraction. | High computational burden; requires synchronization of sensor data. | “Multi-sensor Fusion Framework for 3D Object Detection in Autonomous Driving” by X. Wang et al. [14] |
Feature-Level Fusion | Combines features extracted from sensor data after initial processing. | Reduces computational load; utilizes robust features from each sensor. | Complexity in feature compatibility and extraction design. | “Feature fusion for robust patch matching with compact binary descriptors” by A. Migliorati et al. [60] |
Decision-Level Fusion | Aggregates final decisions from each sensor’s independent analysis. | Lower computational demands; flexible in decision-making. | Possible loss of detail from raw and feature-level data; less accurate if individual decisions are weak. | “Decision fusion for signalized intersection control” by S. Elkosantini et al. [61] |
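To make the distinction between the three fusion levels in the table above concrete, the following minimal sketch contrasts where the combination happens in each pipeline. The feature extractors and detection head are placeholder functions invented for illustration, not the algorithms cited in the table; a real system would use learned backbones per modality.

```python
import numpy as np

def extract_img_features(image):      # stand-in for an image backbone
    return image.mean(axis=(0, 1))    # e.g., global color statistics

def extract_lidar_features(points):   # stand-in for a point-cloud backbone
    return points.mean(axis=0)        # e.g., centroid of the points

def detect(features):                 # stand-in for a detection head
    return {"score": float(np.tanh(features.sum()))}

image  = np.random.rand(32, 32, 3)    # synthetic camera frame
points = np.random.rand(128, 3)       # synthetic LiDAR point cloud

# Early fusion: combine raw data from both sensors before any processing.
raw = np.concatenate([image.reshape(-1, 3), points], axis=0)  # naive raw-level stack
early_out = detect(extract_lidar_features(raw))

# Feature-level fusion: extract features per sensor, then combine the features.
fused_feat = np.concatenate([extract_img_features(image), extract_lidar_features(points)])
feature_out = detect(fused_feat)

# Decision-level fusion: run a full detector per sensor, then merge the decisions.
img_out, lidar_out = detect(extract_img_features(image)), detect(extract_lidar_features(points))
late_out = {"score": 0.5 * (img_out["score"] + lidar_out["score"])}  # simple averaging
```

The structural difference is only where the combination happens: before any processing (early), after per-sensor feature extraction (feature-level), or after per-sensor detection (decision-level).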
4. 3D Object Detection Algorithms
4.1. Traditional Object Detection Techniques
4.1.1. Stereo Vision
4.1.2. Laser Range Finders and LiDAR
4.1.3. Template Matching
4.1.4. Basic Machine Learning Techniques
4.1.5. Feature-Based Approaches
4.2. Deep Learning Approaches to 3D Object Detection
4.3. Recent Developments for 3D Object Detection Algorithms
4.3.1. 3D Object Detection Algorithms for Point Cloud Data Sparsity
Transformers, Attention Mechanisms, and Self-Supervision
GAN-Based, Denoising, and Upsampling Methods
Feature Extraction and Enhancement
Three-Dimensional Reconstruction and Radar-Based Detection
Fusion and Multi-Modal Techniques
4.3.2. 3D Object Detection Algorithms for Multi-Modal Fusion
Projection-Based Fusion
Alignment and Distillation Techniques
Segmentation-Guided Fusion
Transformers and Attention Mechanisms
5. Current Challenges and Limitations
5.1. Sensor Performance Under Varying Environmental Conditions
5.2. Efficient Sensor Fusion
5.3. Accurate Object Detection in Dynamic Environments
5.4. Computational Resources
5.5. Processing Large Volumes of Data
5.6. Evolving Detection Algorithms
5.7. Development of Regulatory Frameworks
6. Future Directions and Emerging Trends
6.1. Multi-Sensor Fusion Advances
6.2. Use of Transformers in 3D Object Detection
6.3. Algorithm Improvements for Sparse Point Clouds
6.3.1. Integration of Deep Learning with Sensor Fusion
6.3.2. Handling Sparse and Noisy Data
6.3.3. Improvement of Computational Efficiency
6.3.4. Enhanced Feature Extraction Techniques
6.3.5. Ethical, Security, and Privacy Considerations
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Buehler, M.; Iagnemma, K.; Singh, S. The DARPA Urban Challenge: Autonomous Vehicles in City Traffic; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009; Volume 56. [Google Scholar]
- Patz, B.J.; Papelis, Y.; Pillat, R.; Stein, G.; Harper, D. A practical approach to robotic design for the DARPA urban challenge. J. Field Robot. 2008, 25, 528–566. [Google Scholar] [CrossRef]
- Faisal, A.; Kamruzzaman, M.; Yigitcanlar, T.; Currie, G. Understanding autonomous vehicles. J. Transp. Land Use 2019, 12, 45–72. [Google Scholar] [CrossRef]
- Parekh, D.; Poddar, N.; Rajpurkar, A.; Chahal, M.; Kumar, N.; Joshi, G.P.; Cho, W. A review on autonomous vehicles: Progress, methods and challenges. Electronics 2022, 11, 2162. [Google Scholar] [CrossRef]
- Sun, Z.; Lin, M.; Chen, W.; Dai, B.; Ying, P.; Zhou, Q. A case study of unavoidable accidents of autonomous vehicles. Traffic Inj. Prev. 2024, 25, 8–13. [Google Scholar] [CrossRef] [PubMed]
- Dixit, V.V.; Chand, S.; Nair, D.J. Autonomous vehicles: Disengagements, accidents and reaction times. PLoS ONE 2016, 11, e0168054. [Google Scholar] [CrossRef] [PubMed]
- Hopkins, D.; Schwanen, T. Talking about automated vehicles: What do levels of automation do? Technol. Soc. 2021, 64, 101488. [Google Scholar] [CrossRef]
- SAE On-Road Automated Vehicle Standards Committee. Taxonomy and definitions for terms related to on-road motor vehicle automated driving systems. SAE Stand. J. 2014, 3016, 1. [Google Scholar]
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum pointnets for 3d object detection from rgb-d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927. [Google Scholar]
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3d object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915. [Google Scholar]
- Wang, Y.; Ye, J. An overview of 3d object detection. arXiv 2020, arXiv:2010.15614. [Google Scholar]
- Wu, Y.; Wang, Y.; Zhang, S.; Ogai, H. Deep 3D object detection networks using LiDAR data: A review. IEEE Sens. J. 2020, 21, 1152–1171. [Google Scholar] [CrossRef]
- Ma, X.; Ouyang, W.; Simonelli, A.; Ricci, E. 3d object detection from images for autonomous driving: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 3537–3556. [Google Scholar] [CrossRef]
- Wang, X.; Li, K.; Chehri, A. Multi-sensor fusion technology for 3D object detection in autonomous driving: A review. IEEE Trans. Intell. Transp. Syst. 2024, 25, 1148–1165. [Google Scholar] [CrossRef]
- Alaba, S.Y.; Ball, J.E. A survey on deep-learning-based lidar 3d object detection for autonomous driving. Sensors 2022, 22, 9577. [Google Scholar] [CrossRef]
- SAE International. Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles; SAE International: Warrendale, PA, USA, 2021. [Google Scholar]
- SAE International. Levels of Driving AutomationTM Refined for Clarity and International Audience; SAE International: Warrendale, PA, USA, 2021. [Google Scholar]
- Channon, M.; McCormick, L.; Noussia, K. The Law and Autonomous Vehicles; Taylor & Francis: Abingdon, UK, 2019. [Google Scholar]
- Ilková, V.; Ilka, A. Legal aspects of autonomous vehicles—An overview. In Proceedings of the 2017 21st International Conference on Process Control (PC), Strbske Pleso, Slovakia, 6–9 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 428–433. [Google Scholar]
- Gibson, B. Analysis of Autonomous Vehicle Policies; Technical Report; Transportation Cabinet: Lexington, KY, USA, 2017. [Google Scholar]
- Kilanko, V. Government Response and Perspective on Autonomous Vehicles. In Government Response to Disruptive Innovation: Perspectives and Examinations; IGI Global: Hershey, PA, USA, 2023; pp. 137–153. [Google Scholar]
- Carranza-García, M.; Torres-Mateo, J.; Lara-Benítez, P.; García-Gutiérrez, J. On the performance of one-stage and two-stage object detectors in autonomous vehicles using camera data. Remote Sens. 2020, 13, 89. [Google Scholar] [CrossRef]
- Yeong, D.J.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and sensor fusion technology in autonomous vehicles: A review. Sensors 2021, 21, 2140. [Google Scholar] [CrossRef]
- Gulzar, M.; Muhammad, Y.; Muhammad, N. A survey on motion prediction of pedestrians and vehicles for autonomous driving. IEEE Access 2021, 9, 137957–137969. [Google Scholar] [CrossRef]
- Trauth, R.; Moller, K.; Betz, J. Toward safer autonomous vehicles: Occlusion-aware trajectory planning to minimize risky behavior. IEEE Open J. Intell. Transp. Syst. 2023, 4, 929–942. [Google Scholar] [CrossRef]
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
- Moravec, H.P. The Stanford cart and the CMU rover. Proc. IEEE 1983, 71, 872–884. [Google Scholar] [CrossRef]
- Wandinger, U. Introduction to lidar. In Lidar: Range-Resolved Optical Remote Sensing of the Atmosphere; Springer: Berlin/Heidelberg, Germany, 2005; pp. 1–18. [Google Scholar]
- Shan, J.; Toth, C.K. Topographic Laser Ranging and Scanning: Principles and Processing; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
- Royo, S.; Ballesta-Garcia, M. An overview of lidar imaging systems for autonomous vehicles. Appl. Sci. 2019, 9, 4093. [Google Scholar] [CrossRef]
- Wang, Z.; Wu, Y.; Niu, Q. Multi-sensor fusion in automated driving: A survey. IEEE Access 2019, 8, 2847–2868. [Google Scholar] [CrossRef]
- Earnest, L. Stanford Cart; Stanford University: Stanford, CA, USA, 2012. [Google Scholar]
- Rosenfeld, A. Digital Picture Processing; Academic Press: Cambridge, MA, USA, 1976. [Google Scholar]
- Moody, S.E. Commercial applications of lidar: Review and outlook. Opt. Remote Sens. Ind. Environ. Monit. 1998, 3504, 41–44. [Google Scholar]
- Grimson, W.E.L. Object Recognition by Computer: The Role of Geometric Constraints; MIT Press: Cambridge, MA, USA, 1991. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
- Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499. [Google Scholar]
- Shi, S.; Wang, X.; Li, H. Pointrcnn: 3d object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 770–779. [Google Scholar]
- Ma, F.; Karaman, S. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4796–4803. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10529–10538. [Google Scholar]
- Liang, M.; Yang, B.; Wang, S.; Urtasun, R. Deep continuous fusion for multi-sensor 3d object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 641–656. [Google Scholar]
- Lee, S. Deep learning on radar centric 3D object detection. arXiv 2020, arXiv:2003.00851. [Google Scholar]
- Li, P.; Chen, X.; Shen, S. Stereo r-cnn based 3d object detection for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7644–7652. [Google Scholar]
- Zhou, Y.; He, Y.; Zhu, H.; Wang, C.; Li, H.; Jiang, Q. Monocular 3d object detection: An extrinsic parameter free approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7556–7566. [Google Scholar]
- Nesti, T.; Boddana, S.; Yaman, B. Ultra-sonic sensor based object detection for autonomous vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 210–218. [Google Scholar]
- Komatsu, S.; Markman, A.; Mahalanobis, A.; Chen, K.; Javidi, B. Three-dimensional integral imaging and object detection using long-wave infrared imaging. Appl. Opt. 2017, 56, D120–D126. [Google Scholar] [CrossRef]
- Hansard, M.; Lee, S.; Choi, O.; Horaud, R.P. Time-of-Flight Cameras: Principles, Methods and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
- He, Y.; Chen, S. Recent advances in 3D data acquisition and processing by time-of-flight camera. IEEE Access 2019, 7, 12495–12510. [Google Scholar] [CrossRef]
- Wang, K.; Zhou, T.; Li, X.; Ren, F. Performance and challenges of 3D object detection methods in complex scenes for autonomous driving. IEEE Trans. Intell. Veh. 2022, 8, 1699–1716. [Google Scholar] [CrossRef]
- Balasubramaniam, A.; Pasricha, S. Object detection in autonomous vehicles: Status and open challenges. arXiv 2022, arXiv:2201.07706. [Google Scholar]
- Csurka, G. Domain adaptation for visual applications: A comprehensive survey. arXiv 2017, arXiv:1702.05374. [Google Scholar]
- Liu, J.; Li, T.; Xie, P.; Du, S.; Teng, F.; Yang, X. Urban big data fusion based on deep learning: An overview. Inf. Fusion 2020, 53, 123–133. [Google Scholar] [CrossRef]
- Peli, T.; Young, M.; Knox, R.; Ellis, K.K.; Bennett, F. Feature-level sensor fusion. In Proceedings of the Sensor Fusion: Architectures, Algorithms, and Applications III, Orlando, FL, USA, 7–9 April 1999; SPIE: Bellingham, WA, USA, 1999; Volume 3719, pp. 332–339. [Google Scholar]
- Rashinkar, P.; Krushnasamy, V. An overview of data fusion techniques. In Proceedings of the 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bengaluru, India, 21–23 February 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 694–697. [Google Scholar]
- Migliorati, A.; Fiandrotti, A.; Francini, G.; Lepsoy, S.; Leonardi, R. Feature fusion for robust patch matching with compact binary descriptors. In Proceedings of the 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), Vancouver, BC, Canada, 29–31 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar]
- Elkosantini, S.; Frikha, A. Decision fusion for signalized intersection control. Kybernetes 2015, 44, 57–76. [Google Scholar] [CrossRef]
- Han, Y. Reliable template matching for image detection in vision sensor systems. Sensors 2021, 21, 8176. [Google Scholar] [CrossRef] [PubMed]
- Szeliski, R. Computer Vision: Algorithms and Applications; Springer Nature: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
- Dong, N.; Liu, F.; Li, Z. Crowd Density Estimation Using Sparse Texture Features. J. Converg. Inf. Technol. 2010, 5, 125–137. [Google Scholar]
- Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Proceedings of the Computer Vision—ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Part I; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417. [Google Scholar]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 886–893. [Google Scholar]
- Ramachandram, D.; Taylor, G.W. Deep multimodal learning: A survey on recent advances and trends. IEEE Signal Process. Mag. 2017, 34, 96–108. [Google Scholar] [CrossRef]
- Zeller, M.; Sandhu, V.S.; Mersch, B.; Behley, J.; Heidingsfeld, M.; Stachniss, C. Radar Instance Transformer: Reliable Moving Instance Segmentation in Sparse Radar Point Clouds. IEEE Trans. Robot. 2024, 40, 2357–2372. [Google Scholar] [CrossRef]
- Ando, A.; Gidaris, S.; Bursuc, A.; Puy, G.; Boulch, A.; Marlet, R. Rangevit: Towards vision transformers for 3d semantic segmentation in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5240–5250. [Google Scholar]
- Wang, H.; Shi, C.; Shi, S.; Lei, M.; Wang, S.; He, D.; Schiele, B.; Wang, L. Dsvt: Dynamic sparse voxel transformer with rotated sets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13520–13529. [Google Scholar]
- Hu, Y.; Li, S.; Weng, W.; Xu, K.; Wang, G. NSAW: An Efficient and Accurate Transformer for Vehicle LiDAR Object Detection. IEEE Trans. Instrum. Meas. 2023, 72, 5028310. [Google Scholar] [CrossRef]
- Boulch, A.; Sautier, C.; Michele, B.; Puy, G.; Marlet, R. Also: Automotive lidar self-supervision by occupancy estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13455–13465. [Google Scholar]
- Lu, G.; He, Z.; Zhang, S.; Huang, Y.; Zhong, Y.; Li, Z.; Han, Y. A Novel Method for Improving Point Cloud Accuracy in Automotive Radar Object Recognition. IEEE Access 2023, 11, 78538–78548. [Google Scholar] [CrossRef]
- Chai, R.; Li, B.; Liu, Z.; Li, Z.; Knoll, A.; Chen, G. GAN Inversion Based Point Clouds Denoising in Foggy Scenarios for Autonomous Driving. In Proceedings of the 2023 IEEE International Conference on Development and Learning (ICDL), Macau, China, 9–11 November 2023; pp. 107–112. [Google Scholar] [CrossRef]
- Liu, Z.S.; Wang, Z.; Jia, Z. Arbitrary Point Cloud Upsampling Via Dual Back-Projection Network. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 8–11 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1470–1474. [Google Scholar]
- Zhang, S.; Yao, T.; Wang, J.; Feng, T.; Wang, Z. Dynamic Object Classification of Low-Resolution Point Clouds: An LSTM-Based Ensemble Learning Approach. IEEE Robot. Autom. Lett. 2023, 8, 8255–8262. [Google Scholar] [CrossRef]
- Xiang, Y.; Mu, A.; Tang, L.; Yang, X.; Wang, G.; Guo, S.; Cui, G.; Kong, L.; Yang, X. Person Identification Method Based on PointNet++ and Adversarial Network for mmWave Radar. IEEE Internet Things J. 2024, 11, 10104–10114. [Google Scholar] [CrossRef]
- Su, M.; Chang, C.; Liu, Z.; Tan, P. A Train Identification Method Based on Sparse Point Clouds Scan Dataset. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 5532–5536. [Google Scholar]
- Yu, F.; Lu, Z. Road Traffic Marking Extraction Algorithm Based on Fusion of Single Frame Image and Sparse Point Cloud. IEEE Access 2023, 11, 88881–88894. [Google Scholar] [CrossRef]
- Han, Z.; Fang, H.; Yang, Q.; Bai, Y.; Chen, L. Online 3D Reconstruction Based On Lidar Point Cloud. In Proceedings of the 2023 42nd Chinese Control Conference (CCC), Tianjin, China, 24–26 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 4505–4509. [Google Scholar]
- Hu, K.; Hu, X.; Qi, L.; Lu, G.; Zhong, Y.; Han, Y. RADNet: A Radar Detection Network for Target Detection Using 3D Range-Angle-Doppler Tensor. In Proceedings of the 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE), Auckland, New Zealand, 26–30 August 2023; pp. 1–6. [Google Scholar] [CrossRef]
- Rong, Y.; Wei, X.; Lin, T.; Wang, Y.; Kasneci, E. DynStatF: An Efficient Feature Fusion Strategy for LiDAR 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 3237–3246. [Google Scholar]
- Zhao, C.; Xu, H.; Xu, H.; Lai, K.; Cen, M. Spatio-Temporal Fusion: A Fusion Approach for Point Cloud Sparsity Problem. In Proceedings of the 2023 35th Chinese Control and Decision Conference (CCDC), Yichang, China, 20–22 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 4836–4841. [Google Scholar]
- Deng, P.; Zhou, L.; Chen, J. VRVP: Valuable Region and Valuable Point Anchor-Free 3D Object Detection. IEEE Robot. Autom. Lett. 2024, 9, 33–40. [Google Scholar] [CrossRef]
- Liu, Z.; Tang, H.; Amini, A.; Yang, X.; Mao, H.; Rus, D.L.; Han, S. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2774–2781. [Google Scholar]
- Jacobson, P.; Zhou, Y.; Zhan, W.; Tomizuka, M.; Wu, M.C. Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based Objects. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 8312–8318. [Google Scholar]
- Hu, Z.K.; Jhong, S.Y.; Hwang, H.W.; Lin, S.H.; Hua, K.L.; Chen, Y.Y. Bi-Directional Bird’s-Eye View Features Fusion for 3D Multimodal Object Detection and Tracking. In Proceedings of the 2023 International Automatic Control Conference (CACS), Penghu, Taiwan, 26–29 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
- Wang, J.; Kong, X.; Nishikawa, H.; Lian, Q.; Tomiyama, H. Dynamic Point-Pixel Feature Alignment for Multi-modal 3D Object Detection. IEEE Internet Things J. 2023, 11, 11327–11340. [Google Scholar] [CrossRef]
- Klingner, M.; Borse, S.; Kumar, V.R.; Rezaei, B.; Narayanan, V.; Yogamani, S.; Porikli, F. X3kd: Knowledge distillation across modalities, tasks and stages for multi-camera 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13343–13353. [Google Scholar]
- Milli, E.; Erkent, Ö.; Yılmaz, A.E. Multi-Modal Multi-Task (3MT) Road Segmentation. IEEE Robot. Autom. Lett. 2023, 8, 5408–5415. [Google Scholar] [CrossRef]
- Wang, Y.; Jiang, K.; Wen, T.; Jiao, X.; Wijaya, B.; Miao, J.; Shi, Y.; Fu, Z.; Yang, M.; Yang, D. SGFNet: Segmentation Guided Fusion Network for 3D Object Detection. IEEE Robot. Autom. Lett. 2023, 8, 8239–8246. [Google Scholar] [CrossRef]
- Sai, S.S.; Kumaraswamy, H.; Kumari, M.U.; Reddy, B.R.; Baitha, T. Implementation of Object Detection for Autonomous Vehicles by LiDAR and Camera Fusion. In Proceedings of the 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI), Gwalior, India, 14–16 March 2024; IEEE: Piscataway, NJ, USA, 2024; Volume 2, pp. 1–6. [Google Scholar]
- Wang, H.; Tang, H.; Shi, S.; Li, A.; Li, Z.; Schiele, B.; Wang, L. UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 6792–6802. [Google Scholar]
- Kim, Y.; Shin, J.; Kim, S.; Lee, I.J.; Choi, J.W.; Kum, D. Crn: Camera radar net for accurate, robust, efficient 3d perception. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 17615–17626. [Google Scholar]
- Appiah, E.O.; Mensah, S. Object detection in adverse weather condition for autonomous vehicles. Multimed. Tools Appl. 2024, 83, 28235–28261. [Google Scholar] [CrossRef]
- Mao, J.; Shi, S.; Wang, X.; Li, H. 3D object detection for autonomous driving: A comprehensive survey. Int. J. Comput. Vis. 2023, 131, 1909–1963. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, B.; Qin, J.; Hu, F.; Hao, J. CooPercept: Cooperative Perception for 3D Object Detection of Autonomous Vehicles. Drones 2024, 8, 228. [Google Scholar] [CrossRef]
- Xiao, Y.; Liu, Y.; Luan, K.; Cheng, Y.; Chen, X.; Lu, H. Deep LiDAR-radar-visual fusion for object detection in urban environments. Remote Sens. 2023, 15, 4433. [Google Scholar] [CrossRef]
- Aher, V.A.; Jondhale, S.R.; Agarkar, B.S.; George, S.; Shaikh, S.A. Advances in Deep Learning-Based Object Detection and Tracking for Autonomous Driving: A Review and Future Directions. In Proceedings of the International Conference on Multi-Strategy Learning Environment, Dehradun, India, 12–13 January 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 569–581. [Google Scholar]
- Padmaja, B.; Moorthy, C.V.; Venkateswarulu, N.; Bala, M.M. Exploration of issues, challenges and latest developments in autonomous cars. J. Big Data 2023, 10, 61. [Google Scholar] [CrossRef]
- Liang, L.; Ma, H.; Zhao, L.; Xie, X.; Hua, C.; Zhang, M.; Zhang, Y. Vehicle detection algorithms for autonomous driving: A review. Sensors 2024, 24, 3088. [Google Scholar] [CrossRef]
[Table: category-level strengths, limitations, and contributions of point cloud sparsity methods, grouped as Transformers, Attention Mechanisms, and Self-Supervision Techniques; GAN-Based, Denoising, and Upsampling Techniques; Feature Extraction and Enhancement Techniques; 3D Reconstruction and Radar-Based Detection Techniques; and Fusion and Multi-Modal Techniques. Cell contents were not preserved in this extract.]
Ref. Paper | Year | Strengths | Limitations | Performance |
---|---|---|---|---|
Transformers, Attention Mechanisms, and Self-Supervision Techniques | ||||
Ref. [69] | 2024 | Excels in moving instance segmentation, minimal computational resources | Limited to scenarios involving movement, may not perform as well in static environments | Reliable moving instance segmentation in sparse radar point clouds |
Ref. [70] | 2023 | Robust representation learning, enhances 3D semantic segmentation accuracy | Dependency on pre-trained ViTs, complexity in converting 3D point clouds to 2D images | Improved segmentation accuracy in sparse and noisy LiDAR point clouds |
Ref. [71] | 2023 | Superior feature propagation, geometric information preservation, state-of-the-art performance | Complexity in window partitioning and processing, potential challenges in scaling | State-of-the-art performance on several benchmarks, effective in managing sparse voxel data |
Ref. [72] | 2023 | Improved feature extraction, significant enhancement in detection accuracy, accelerated model convergence | Focus on non-empty windows may miss relevant information in adjacent sparse areas | Substantial improvements in 3D object detection accuracy, efficiency demonstrated on KITTI dataset |
Ref. [73] | 2023 | Enhanced semantic information capture, improved performance on downstream tasks | Effectiveness dependent on quality of pretext task, may require extensive pre-training data | Significant improvements in handling sparse point clouds across various datasets, effective in semantic segmentation and object detection tasks |
GAN-Based, Denoising, and Upsampling Techniques | ||||
Ref. [74] | 2023 | Enriches semantic information, significant improvements in classification accuracy and segmentation performance | Computationally intensive, performance might be constrained by the quality of the training data | Effective in improving object detection and segmentation in sparse data scenarios |
Ref. [75] | 2023 | Enhances point cloud quality in adverse weather conditions, particularly fog | Balancing noise removal and detail preservation can be challenging | Outperforms other denoising techniques in foggy scenarios, improves reliability of autonomous driving perception systems |
Ref. [76] | 2023 | Restores detailed geometric information, lowest point set matching losses on uniform and non-uniform sparse point clouds | Computational intensity, ensuring added points accurately reflect underlying geometry | Achieves superior performance in restoring geometric detail and increasing point cloud density, outperforms state-of-the-art methods |
Feature Extraction and Enhancement Techniques | ||||
Ref. [77] | 2023 | Enhances dynamic object classification, captures temporal changes, integrates diverse network outputs | Computationally intensive, dependency on quality of temporal data | Superior classification accuracy in low-resolution, sparse point clouds, effective in dynamic object classification |
Ref. [78] | 2023 | Filters noise, enhances point cloud density and quality, robust feature extraction | Balancing noise filtering and detail preservation can be challenging | Significant improvement in person identification accuracy, robust for various applications |
Ref. [79] | 2024 | High recognition accuracy and efficiency, effective temporal consistency | Requires accurate temporal analysis, computational complexity in projection techniques | Accurate railway train identification in sparse environments, robust temporal and spatial consistency |
Ref. [80] | 2023 | Improved feature extraction and accuracy, high recall rate, F1 score, and error reduction | Ensuring accurate combination of image and point cloud data, computational intensity in data fusion | Enhanced identification and extraction of road traffic markings, effective integration of visual and spatial information |
3D Reconstruction and Radar-Based Detection Techniques | ||||
Ref. [81] | 2023 | Enhances density and geometric consistency, robust and fast for large-scale scenes | May struggle in highly dynamic environments | Accurate 3D reconstructions from sparse LiDAR data, effective for large-scale scene reconstruction |
Ref. [82] | 2023 | Enhances real-time target detection, reduces false alarms, manages sparsity and noise in radar data | Requires significant computational resources, complexity in processing multiple frames | Average accuracy of 80.34% and recall of 85.84% in various driving scenarios, effective in dynamic environments |
Fusion and Multi-Modal Techniques | ||||
Ref. [83] | 2023 | Enhances feature extraction in low point density regions, robust against noisy 2D predictions | May require significant computational resources to process multi-scale features | Superior performance on benchmarks like KITTI and Waymo, effective in scenarios with sparse LiDAR returns |
Ref. [83] | 2023 | Richer feature extraction through dynamic–static integration, effective use of temporal information | Ensuring accurate fusion of dynamic–static data can be challenging | State-of-the-art results on datasets like nuScenes, boosts the performance of existing frameworks |
Ref. [84] | 2023 | Provides a denser and more informative representation, reduces noise, improves data quality | Processing multi-frame aggregations can be computationally intensive | Significant boost in 3D object detection performance on benchmark datasets, effective temporal aggregation |
Ref. [85] | 2024 | Fine-grained multi-scale feature encoding, effective in small object detection | Ensuring accurate fusion of valuable point data is crucial | Exceptional performance in detecting small objects like pedestrians and cyclists, robust feature extraction |
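Many of the sparsity-oriented methods above (e.g., the sparse voxel transformers of [71,72]) start from the observation that only a small fraction of the voxel grid is occupied. The sketch below shows the basic voxel grouping step such backbones operate on; the voxel size and per-voxel point cap are illustrative assumptions, not values from the cited papers.

```python
import numpy as np

def voxelize(points, voxel_size=0.2, max_points_per_voxel=32):
    """Group a sparse point cloud into occupied voxels (VoxelNet-style preprocessing).

    points: (N, 3) array of x, y, z coordinates in meters.
    Returns a dict mapping voxel grid index -> array of points falling in that voxel.
    """
    idx = np.floor(points / voxel_size).astype(np.int64)  # voxel index per point
    voxels = {}
    for p, key in zip(points, map(tuple, idx)):
        bucket = voxels.setdefault(key, [])
        if len(bucket) < max_points_per_voxel:             # cap points per voxel
            bucket.append(p)
    return {k: np.stack(v) for k, v in voxels.items()}

# Example: 1,000 random points scattered over a 20 m x 20 m x 4 m volume
pts = np.random.rand(1000, 3) * np.array([20.0, 20.0, 4.0])
vox = voxelize(pts)
print(len(vox), "occupied voxels; only these need to be processed by a sparse backbone")
```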
[Table: category-level strengths, limitations, and contributions of multi-modal fusion methods, grouped as Projection-Based Fusion; Alignment and Distillation Techniques; Segmentation-Guided Fusion; and Transformers and Attention Mechanisms. Cell contents were not preserved in this extract.]
Ref. Paper | Year | Strengths | Limitations | Performance |
---|---|---|---|---|
Projection-Based Fusion Methods | ||||
Ref. [86] | 2023 | Preserves detailed geometric and semantic information, superior detection results, robustness under various conditions | Higher computational demand | State-of-the-art performance on datasets like nuScenes and Waymo, comprehensive projection approach |
Ref. [87] | 2023 | Enhanced computational efficiency, maintains high detection accuracy, reduces number of features to be fused | Trades some richness in feature representation | +4.9% improvement in Mean Average Precision (mAP) and +2.4% in nuScenes Detection Score (NDS) over LiDAR-only baseline, suitable for real-time applications |
Ref. [88] | 2023 | Addresses misalignment issues, enhances detection and tracking performance, comprehensive feature integration | More complex processing steps | Significant improvements in precision and recall, effective in various environmental conditions |
Alignment and Distillation Techniques | ||||
Ref. [89] | 2023 | Robust handling of noise, effective feature interaction, low computational complexity | Complexity in dynamic alignment | State-of-the-art performance on KITTI dataset, excels in detecting small objects |
Ref. [90] | 2023 | Leverages privileged LiDAR information, robust detection without LiDAR during inference | Requires extensive training data and complex distillation processes | Significant improvement in mAP and NDS metrics on nuScenes and Waymo datasets |
Ref. [91] | 2023 | Effective feature integration, high accuracy in road segmentation, robust performance under various conditions | Complexity in integrating features from multiple sensors | High accuracy and robust performance on KITTI and Cityscapes datasets, versatile in real-time applications |
Segmentation-Guided Fusion Techniques | ||||
Ref. [92] | 2023 | Preserves more image content during projection, detailed and robust feature extraction | Complexity in hierarchical feature map projection | State-of-the-art performance on KITTI and nuScenes datasets, significant improvements in detecting small, occluded, and truncated objects |
Ref. [93] | 2024 | High accuracy in object detection and distance estimation, efficient feature extraction and fusion | May not preserve as much detailed feature information as hierarchical approaches | Object detection accuracy of 98% and distance estimation accuracy of around 97% on KITTI benchmark dataset |
Transformers and Attention Mechanisms | ||||
Ref. [94] | 2023 | Efficient feature interaction and alignment, handles multiple sensor modalities simultaneously | Complexity in managing shared parameters for different modalities | State-of-the-art performance on nuScenes dataset, significant improvements in 3D object detection and BEV map segmentation tasks |
Ref. [95] | 2023 | Robustness and accuracy in long-range detection, effective handling of spatial misalignment | Complexity in transforming and aligning features from different modalities | State-of-the-art performance on nuScenes dataset, excels in long-range detection scenarios |
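The projection-based fusion methods summarized above all rely on the standard pinhole projection of LiDAR points into the camera image, so that image features can be sampled at each projected point. The sketch below shows that core step; the intrinsic matrix and extrinsic transform are illustrative placeholders rather than calibration from any of the cited datasets.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_from_lidar, K):
    """Project LiDAR points into the camera image (the core step of projection-based fusion).

    points_lidar: (N, 3) points in the LiDAR frame.
    T_cam_from_lidar: (4, 4) extrinsic transform from LiDAR to camera coordinates.
    K: (3, 3) camera intrinsic matrix.
    Returns (N, 2) pixel coordinates and a mask of points in front of the camera.
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # homogeneous coords
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]                      # into camera frame
    in_front = pts_cam[:, 2] > 0.1                                       # keep points ahead of camera
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                                        # perspective division
    return uv, in_front

# Illustrative calibration (not from any specific dataset); identity extrinsics assume the
# points are already expressed in a camera-like frame with z pointing forward.
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
pts = np.array([[1.0, 0.5, 10.0], [-2.0, 0.0, 5.0]])
uv, valid = project_lidar_to_image(pts, T, K)
print(uv[valid])  # pixel locations where image features can be sampled for each point
```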