Singular and Multimodal Techniques of 3D Object Detection: Constraints, Advancements and Research Direction
Abstract
1. Introduction
1.1. Paper Motivation
1.2. Paper Contribution
- The fundamental concepts of 3D object detection are illustrated in this paper (Section 2);
- Three-dimensional object detection is booming in different sectors. This paper presents the 3D object detection techniques applied in these sectors, surveys benchmark datasets across fields and compares sensor types (Section 3);
- The most frequently used evaluation metrics are discussed (Section 4);
- This paper compares the speed and performance of different methods on a popular benchmark dataset (Section 5);
- A SWOT (Strengths, Weaknesses, Opportunities and Threats) analysis is presented on the singular and multimodal techniques (Section 5);
- After thorough analysis, future directions and limitations of the existing methods are provided (Section 6).
2. Three-dimensional Object Detection Techniques
2.1. Singular Modalities
2.1.1. Point Clouds
Direct Point Cloud Processing
Projection of Point Cloud in Bird’s Eye View (BEV) or 2D Plane
2.1.2. Camera/Vision
Monocular (RGB)
Stereo
2.2. Multimodal Methods
2.2.1. Two-dimensional Image and Point Cloud
2.2.2. RADAR and 2D Image
2.2.3. Other 3D Object Detection Methods
3. Branches of 3D Object Detection Techniques
- Autonomous Vehicle navigation: LiDAR is a very popular modality in this field and is used in both singular and multimodal methods. Its long-range laser scanning makes it suitable as the sole sensor of a standalone end-to-end 3D object detection system. RGB-D sensors, by contrast, have a short range (usually below 10 m); owing to this constraint, research on autonomous vehicle navigation does not use this modality.
- Robot Vision: The RGB-D sensor is the most popular choice here, serving in both singular and multimodal robot vision techniques. Most of this research is conducted in indoor environments, so long-range detection is not required, which makes RGB-D well suited to perceiving both the color and the depth of indoor objects. RGB-D cameras combine an RGB camera (for color perception) with infrared sensors (for depth perception).
- Precision Agriculture: In agriculture, LiDAR is used for long-range 3D detection; in particular, precision agriculture involving UAVs at high altitude benefits from it. LiDAR has been applied as a singular modality or fused with other sensors such as RGB cameras or narrow-beam SONAR (Sound Navigation and Ranging). Monocular cameras are also used as a singular modality in the multiview 3D detection technique, in which 2D images captured from different angles around an object are combined for 3D detection. In multimodal setups, RGB cameras can be paired with RGB-D sensors. Monocular cameras, having a shorter range than LiDAR, serve for 3D object detection at closer range.
- Human activity/pose detection: Monocular RGB cameras are widely used for human pose detection via the multiview 3D object detection technique (a minimal sketch of the underlying triangulation step follows this list). For long-range detection, researchers have used LiDAR, sometimes adding other modalities such as an inertial measurement unit (IMU) to improve detection performance. In indoor robot vision, human activity detection and precision agriculture, however, RADAR is not usually preferred for 3D object detection. The likely reason is its low spatial resolution (compared with LiDAR or cameras), which makes thin or closely spaced objects difficult and ambiguous to detect.
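The multiview technique mentioned above lifts 2D observations from multiple calibrated cameras to a 3D point. Below is a minimal sketch of the classic linear (DLT) triangulation step; the camera intrinsics and poses are illustrative assumptions, not values from any cited dataset.

```python
import numpy as np

def triangulate_point(proj_mats, pixels):
    """Linear (DLT) triangulation of one 3D point.

    proj_mats: list of 3x4 camera projection matrices P = K [R | t]
    pixels:    list of (u, v) observations of the same point, one per camera
    """
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        # Each view contributes two linear constraints on the homogeneous point X
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The solution is the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]

# Hypothetical two-camera rig: a reference camera and one shifted 1 m along x
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0, 1.0])           # ground-truth point (homogeneous)
px = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P0, P1)]
print(triangulate_point([P0, P1], px))            # ~ [0.5, 0.2, 4.0]
```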
Domain | Ref. | Sensor Data Type | Dataset | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LiDAR | RGB-D | Monocular (RGB) image | Stereo Image Pair | RADAR | other | KITTI [9] | nuScenes [12] | Waymo [13] | SUN RGB-D [10] | ScanNet [11] | Others | ||
Autonomous Vehicle navigation (Detecting vehicles and people in road scene) | [37] | ✓ | ✓ | ||||||||||
[31] | ✓ | ✓ | |||||||||||
[43] | ✓ | ✓ | |||||||||||
[44] | ✓ | ✓ | |||||||||||
[56] | ✓ | ✓ | |||||||||||
[59] | ✓ | ✓ | |||||||||||
[60] | ✓ | ✓ | |||||||||||
[53] | ✓ | ✓ | |||||||||||
[42] | ✓ | ✓ | ✓ | ||||||||||
[54] | ✓ | ✓ | |||||||||||
[104] | ✓ | ||||||||||||
[105] | ✓ | ✓ | ✓ | ||||||||||
Indoor objects (Robotic Vision) | [104] | ✓ | ✓ | ✓ | |||||||||
[105] | ✓ | ✓ | ✓ | ||||||||||
[29] | ✓ | [106,107] | |||||||||||
[30] | ✓ | ✓ | [107,108] | ||||||||||
[32] | ✓ | ✓ | ✓ | ||||||||||
[109] | ✓ | [110] | |||||||||||
[111] | ✓ | Self-made | |||||||||||
[112] | ✓ | TUM RGB-D | |||||||||||
Precision Agriculture | [113] | Multi spectral | Self-made | ||||||||||
[97] | ✓ | Self-made | |||||||||||
[114] | ✓ (Multi View) | Self-made | |||||||||||
Human Pose/ Activity Detection | [115] | ✓ (Multi View) | [116,117,118] | ||||||||||
[119] | ✓ (Multi View) | [116,117,118] |
Domain | Ref. | Sensor Data Type | Dataset | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LiDAR | RGB-D | Monocular (RGB) image | Stereo Image | RADAR | Other | KITTI [9] | nuScenes [12] | Waymo [13] | SUN RGB-D [10] | ScanNet [11] | Others | ||
Autonomous Vehicle navigation | [61] | ✓ | ✓ | ✓ | |||||||||
[120] | ✓ | ✓ | ✓ | ||||||||||
[90] | ✓ | ✓ | ✓ | [121] | |||||||||
[67] | ✓ | ✓ | ✓ | ✓ | |||||||||
[92] | ✓ | ✓ | ✓ | ||||||||||
[94] | ✓ | ✓ | ✓ | [122] | |||||||||
[95] | ✓ | ✓ | ✓ | ✓ | |||||||||
Indoor objects (Robotic Vision) | [61] | ✓ | ✓ | ✓ | |||||||||
Precision Agriculture | [123] | ✓ | ✓ | Self-made | |||||||||
[124] | ✓ | ✓ | Self-made | ||||||||||
[125] | ✓ | ✓ | narrow beam SONAR | Self-made | |||||||||
Human Pose/ Activity Detection | [126] | ✓ | inertial measurement unit (IMU) | [127] |
4. Evaluation Metrics
5. Observation and Analysis
6. Advancement, Challenges and Future Directions
- Point cloud-based 3D object detection can be performed in both indoor and outdoor environments. However, LiDAR can generate point clouds at longer range and in variable weather conditions, whereas RGB-D or Kinect-based point clouds are limited in both range and weather tolerance. For this reason, point cloud-based 3D object detection in autonomous vehicle navigation research is performed with LiDAR sensors. RGB-D sensors, on the other hand, are less expensive, and their point clouds have been applied successfully in close-range work on precision agriculture and indoor robotic vision;
- Three-dimensional object detection technology is significantly supported by deep learning. Deep learning networks are composed of multilayer neural networks that learn the patterns of data. In prominent 3D object detection networks such as PointNet, PointNet++, VoxelNet and CenterNet, deep learning is used to learn object information from points or groups of points (a minimal sketch of this point-based building block follows this list). In two-stage networks where RGB images are first used for region proposals, deep learning is likewise applied to predict the object regions. Future research may leverage further opportunities of deep learning, such as transfer learning, in 3D object detection;
- The development of end-to-end 3D object detection networks is becoming popular because of their ease of application. End-to-end networks take raw sensor data directly as input and output 3D bounding-box predictions. Developing such a network requires choosing the appropriate sensor type (LiDAR, camera or RADAR), pre-processing the data, designing a neural network that learns features from the data, and training, validating and evaluating the model (a simplified sketch of the pre-processing stage follows this list). Developers of end-to-end networks therefore need both hardware and software knowledge;
- Data collection and annotation for 3D object detection are more complex than for 2D object detection. Collecting 3D object detection data involves fusing the outputs of different sensor types, such as LiDAR, monocular or stereo cameras and RADAR, which requires calibrating the devices and synchronizing their data. Annotation must describe not only an object's location but also its dimensions, position and orientation in space, using parameters such as length, width, height, yaw, pitch, roll and occlusion amount (a box-corner sketch follows this list). Annotating data for 3D object detection therefore demands more expertise in 3D geometry;
- The main limitation of point cloud-based object detection, especially in outdoor environments, is sparsity; thin object detection therefore remains an open problem in this field. Comparing the average precision values of cars and pedestrians in Table 6, it is clearly visible that detecting the thin profile of pedestrians achieves far lower precision. How to increase the precision of thin object detection with point cloud methods is thus an open research question;
- Data scarcity is one of the main constraints on 3D object detection research. Thanks to the support and sponsorship of automobile companies pursuing self-driving cars, several rich benchmark datasets, such as KITTI, Waymo and nuScenes, are widely available. A few indoor benchmark datasets, such as SUN RGB-D and ScanNet, also support robot vision research. There is, however, a scarcity of open benchmark datasets in other fields. In particular, 3D object detection is becoming popular in precision agriculture, yet the published studies rely on self-collected datasets that are not publicly available, which constrains 3D object detection research in agriculture.
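As referenced in the deep learning item above, the following is a minimal PyTorch sketch of the point-based building block used by PointNet-style networks: a shared per-point MLP followed by a symmetric max-pool, which makes the learned global feature invariant to point ordering. The layer widths and class count are illustrative assumptions, not the published PointNet configuration.

```python
import torch
import torch.nn as nn

class PointNetBlock(nn.Module):
    """PointNet-style feature learner: shared per-point MLP + symmetric max-pool."""
    def __init__(self, num_classes=10):
        super().__init__()
        # 1x1 convolutions act as an MLP shared across all points
        self.shared_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.classifier = nn.Linear(1024, num_classes)

    def forward(self, points):            # points: (batch, 3, num_points)
        feats = self.shared_mlp(points)   # (batch, 1024, num_points)
        # Max over the point dimension: invariant to the order of the points
        global_feat = feats.max(dim=2).values
        return self.classifier(global_feat)

model = PointNetBlock()
cloud = torch.randn(2, 3, 2048)           # two clouds of 2048 xyz points each
print(model(cloud).shape)                 # torch.Size([2, 10])
```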
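For the end-to-end item, here is a simplified sketch of the pre-processing stage that voxel-based end-to-end detectors such as VoxelNet or PointPillars place in front of the network: quantizing a raw LiDAR point cloud into a bird's-eye-view grid that a 2D CNN backbone and detection head can consume. The ranges and cell size loosely follow common KITTI settings but are assumptions here, and real pipelines keep per-cell point features rather than plain occupancy.

```python
import numpy as np

def points_to_bev_occupancy(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
                            cell=0.2):
    """Quantize an (N, 3) xyz point cloud into a BEV occupancy grid."""
    nx = int((x_range[1] - x_range[0]) / cell)   # 352 cells along x
    ny = int((y_range[1] - y_range[0]) / cell)   # 400 cells along y
    grid = np.zeros((nx, ny), dtype=np.float32)
    # Keep only points inside the detection range
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    ix = ((pts[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((pts[:, 1] - y_range[0]) / cell).astype(int)
    grid[ix, iy] = 1.0
    return grid

cloud = np.random.uniform([0, -40, -2], [70.4, 40, 1], size=(10000, 3))
bev = points_to_bev_occupancy(cloud)
print(bev.shape)   # (352, 400), ready for a 2D convolutional backbone
```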
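For the annotation item, a small sketch of how a 7-parameter 3D box label (center, dimensions, yaw) expands into the eight corners an annotation tool draws. Pitch and roll are taken as zero, as is common in driving datasets, and the axis conventions are illustrative assumptions since they vary across datasets.

```python
import numpy as np

def box_corners_3d(x, y, z, l, w, h, yaw):
    """Return the 8 corners of a 3D box given center, dimensions and heading."""
    dx, dy, dz = l / 2, w / 2, h / 2
    # Axis-aligned corners around the origin
    corners = np.array([[sx * dx, sy * dy, sz * dz]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    # Rotate by yaw around the vertical (z) axis, then translate to the center
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return corners @ R.T + np.array([x, y, z])

# Hypothetical car-sized box: 4.2 m x 1.8 m x 1.5 m, heading 30 degrees
print(box_corners_3d(10.0, 2.0, 0.9, 4.2, 1.8, 1.5, np.deg2rad(30)).round(2))
```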
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; Available online: http://pjreddie.com/yolo/ (accessed on 19 January 2023).
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; Available online: http://pjreddie.com/yolo9000/ (accessed on 19 January 2023).
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
- Thuan, D. Evolution of YOLO Algorithm and YOLOv5: The State-of-the-Art Object Detection Algorithm. 2021. Available online: http://www.theseus.fi/handle/10024/452552 (accessed on 19 January 2023).
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. Available online: https://github.com/rbgirshick/ (accessed on 19 January 2023).
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; Available online: https://github.com/ (accessed on 19 January 2023).
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar] [CrossRef]
- Song, S.; Lichtenberg, S.P.; Xiao, J. SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; Available online: https://rgbd.cs.princeton.edu/ (accessed on 11 January 2023).
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Nießner, M. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; Available online: http://www.scan-net.org/ (accessed on 11 January 2023).
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11621–11631. [Google Scholar]
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; Available online: https://waymo.com/open/ (accessed on 12 January 2023).
- Shahbazi, M.; Ménard, P.; Sohn, G.; Théau, J. Unmanned aerial image dataset: Ready for 3D reconstruction. Data Brief 2019, 25, 103962. [Google Scholar] [CrossRef] [PubMed]
- Vélez, S.; Vacas, R.; Martín, H.; Ruano-Rosa, D.; Álvarez, S. High-Resolution UAV RGB Imagery Dataset for Precision Agriculture and 3D Photogrammetric Reconstruction Captured over a Pistachio Orchard (Pistacia vera L.) in Spain. Data 2022, 7, 157. [Google Scholar] [CrossRef]
- Li, T.; Liu, J.; Zhang, W.; Ni, Y.; Wang, W.; Li, Z. UAV-Human: A Large Benchmark for Human Behavior Understanding With Unmanned Aerial Vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16266–16275. [Google Scholar]
- Singh, S.P.; Wang, L.; Gupta, S.; Goli, H.; Padmanabhan, P.; Gulyás, B. 3D Deep Learning on Medical Images: A Review. Sensors 2020, 20, 5097. [Google Scholar] [CrossRef] [PubMed]
- Fernandes, D.; Silva, A.; Névoa, R.; Simões, C.; Gonzalez, D.; Guevara, M.; Novais, P.; Monteiro, J.; Melo-Pinto, P. Point-cloud based 3D object detection and classification methods for self-driving applications: A survey and taxonomy. Inf. Fusion 2020, 68, 161–191. [Google Scholar] [CrossRef]
- Zamanakos, G.; Tsochatzidis, L.; Amanatiadis, A.; Pratikakis, I. A comprehensive survey of LIDAR-based 3D object detection methods with deep learning for autonomous driving. Comput. Graph. 2021, 99, 153–181. [Google Scholar] [CrossRef]
- Arnold, E.; Al-Jarrah, O.Y.; Dianati, M.; Fallah, S.; Oxtoby, D.; Mouzakitis, A. A Survey on 3D Object Detection Methods for Autonomous Driving Applications. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3782–3795. [Google Scholar] [CrossRef]
- Liang, W.; Xu, P.; Guo, L.; Bai, H.; Zhou, Y.; Chen, F. A survey of 3D object detection. Multimedia Tools Appl. 2021, 80, 29617–29641. [Google Scholar] [CrossRef]
- Mao, J.; Shi, S.; Wang, X.; Li, H. 3D Object Detection for Autonomous Driving: A Comprehensive Survey. Int. J. Comput. Vis. 2023, 131, 1–55. [Google Scholar] [CrossRef]
- Drobnitzky, M.; Friederich, J.; Egger, B.; Zschech, P. Survey and Systematization of 3D Object Detection Models and Methods. Vis. Comput. 2023, 1–47. [Google Scholar] [CrossRef]
- Wu, Y.; Wang, Y.; Zhang, S.; Ogai, H. Deep 3D Object Detection Networks Using LiDAR Data: A Review. IEEE Sens. J. 2021, 21, 1152–1171. [Google Scholar] [CrossRef]
- Hoque, S.; Arafat, Y.; Xu, S.; Maiti, A.; Wei, Y. A Comprehensive Review on 3D Object Detection and 6D Pose Estimation with Deep Learning. IEEE Access 2021, 9, 143746–143770. [Google Scholar] [CrossRef]
- Mohan, N.; Kumar, M. Room layout estimation in indoor environment: A review. Multimedia Tools Appl. 2022, 81, 1921–1951. [Google Scholar] [CrossRef]
- Hasan, M.; Hanawa, J.; Goto, R.; Suzuki, R.; Fukuda, H.; Kuno, Y.; Kobayashi, Y. LiDAR-based detection, tracking, and property estimation: A contemporary review. Neurocomputing 2022, 506, 393–405. [Google Scholar] [CrossRef]
- Tong, G.; Li, Y.; Chen, D.; Sun, Q.; Cao, W.; Xiang, G. CSPC-Dataset: New LiDAR Point Cloud Dataset and Benchmark for Large-Scale Scene Semantic Segmentation. IEEE Access 2020, 8, 87695–87718. [Google Scholar] [CrossRef]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017, arXiv:1706.02413. [Google Scholar]
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. 2018. Available online: http://arxiv.org/abs/1812.04244 (accessed on 29 November 2023).
- Qi, C.R.; Litany, O.; He, K.; Guibas, L. Deep Hough Voting for 3D Object Detection in Point Clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; Available online: http://arxiv.org/abs/1904.09664 (accessed on 29 November 2023).
- Huang, X.; Wang, P.; Cheng, X.; Zhou, D.; Geng, Q.; Yang, R. The ApolloScape Open Dataset for Autonomous Driving and Its Application. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2702–2719. [Google Scholar] [CrossRef] [PubMed]
- Casas, S.; Gulino, C.; Liao, R.; Urtasun, R. SpAGNN: Spatially-Aware Graph Neural Networks for Relational Behavior Forecasting from Sensor Data. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 9491–9497. [Google Scholar] [CrossRef]
- Halder, S.; Lalonde, J.-F.; De Charette, R. Physics-Based Rendering for Improving Robustness to Rain. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10203–10212. Available online: https://team.inria.fr/rits/computer-vision/weather-augment/ (accessed on 21 January 2023).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; Available online: http://arxiv.org/abs/1711.06396 (accessed on 29 November 2023).
- Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-View Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 945–953. Available online: http://vis-www.cs.umass.edu/mvcnn (accessed on 21 January 2023).
- Qi, C.R.; Su, H.; Nießner, M.; Dai, A.; Yan, M.; Guibas, L.J. Volumetric and Multi-View CNNs for Object Classification on 3D Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 5648–5656. [Google Scholar]
- Premebida, C.; Carreira, J.; Batista, J.; Nunes, U. Pedestrian detection combining RGB and dense LIDAR data. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; pp. 4112–4117. [Google Scholar] [CrossRef]
- Gonzalez, A.; Villalonga, G.; Xu, J.; Vazquez, D.; Amores, J.; Lopez, A.M. Multiview random forest of local experts combining RGB and LIDAR data for pedestrian detection. In Proceedings of the IEEE Intelligent Vehicles Symposium, Seoul, Republic of Korea, 28 June–1 July 2015; pp. 356–361. [Google Scholar] [CrossRef]
- Yin, T.; Zhou, X.; Krähenbühl, P. Center-based 3D Object Detection and Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; Available online: http://arxiv.org/abs/2006.11275 (accessed on 29 November 2023).
- Simon, M.; Milz, S.; Amende, K.; Gross, H.-M. Complex-YOLO: Real-Time 3D Object Detection on Point Clouds. 2018. Available online: http://arxiv.org/abs/1803.06199 (accessed on 29 November 2023).
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast Encoders for Object Detection from Point Clouds. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Long Beach, CA, USA, 15–20 June 2019; Available online: http://arxiv.org/abs/1812.05784 (accessed on 29 November 2023).
- Mahayuddin, Z.R.; Saif, A.F.M.S. Edge Feature based Moving Object Detection Using Aerial Images: A Comparative Study. In Proceedings of the 6th International Conference on Computing, Engineering, and Design, ICCED 2020, Sukabumi, Indonesia, 15–16 October 2020. [Google Scholar]
- Mahayuddin, Z.R.; Saif, A.F.M.S. Moving Object Detection Using Semantic Convolutional Features. J. Inf. Syst. Technol. Manag. 2022, 7, 24–41. [Google Scholar] [CrossRef]
- Saif, A.F.M.S.; Mahayuddin, Z.R.; Arshad, H. Vision-Based Efficient Collision Avoidance Model Using Distance Measurement. In Soft Computing Approach for Mathematical Modeling of Engineering Problems; CRC Press: Boca Raton, FL, USA, 2021; pp. 191–202. [Google Scholar] [CrossRef]
- Mahayuddin, Z.R.; Saif, A.S. View of A Comparative Study of Three Corner Feature Based Moving Object Detection Using Aerial Images. Malays. J. Comput. Sci. 2019, 25–33. Available online: http://adum.um.edu.my/index.php/MJCS/article/view/21461/10985 (accessed on 13 February 2023).
- Saif, A.F.M.S.; Mahayuddin, Z.R. Crowd Density Estimation from Autonomous Drones Using Deep Learning: Challenges and Applications. J. Eng. Sci. Res. 2021, 5, 1–6. [Google Scholar] [CrossRef]
- Zhang, H.; Wang, G.; Lei, Z.; Hwang, J.-N. Eye in the Sky. In Proceedings of the 27th ACM International Conference on Multimedia, New York, NY, USA, 21–25 October 2019; pp. 899–907. [Google Scholar] [CrossRef]
- Saif, S.; Zainal, F.; Mahayuddin, R. Vision based 3D Object Detection using Deep Learning: Methods with Challenges and Applications towards Future Directions. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 203–214. [Google Scholar] [CrossRef]
- Brazil, G.; Liu, X. M3D-RPN: Monocular 3D Region Proposal Network for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; Available online: http://arxiv.org/abs/1907.06038 (accessed on 29 November 2023).
- Liu, Z.; Wu, Z.; Tóth, R. SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 13–19 June 2020; Available online: http://arxiv.org/abs/2002.10111 (accessed on 29 November 2023).
- Wang, T.; Zhu, X.; Pang, J.; Lin, D. FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; Available online: https://paperswithcode.com/paper/fcos3d-fully-convolutional-one-stage (accessed on 11 January 2023).
- Shapii, A.; Pichak, S.; Mahayuddin, Z.R. 3D Reconstruction Technique from 2d Sequential Human Body Images in Sports: A Review. Technol. Rep. Kansai Univ. 2020, 62, 4973–4988. Available online: https://www.researchgate.net/publication/345392953 (accessed on 13 February 2023).
- Wang, Y.; Chao, W.-L.; Garg, D.; Hariharan, B.; Campbell, M.; Weinberger, K.Q. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; Available online: http://arxiv.org/abs/1812.07179 (accessed on 29 November 2023).
- You, Y.; Wang, Y.; Chao, W.L.; Garg, D.; Pleiss, G.; Hariharan, B.; Campbell, M.; Weinberger, K.Q. Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving. 2019. Available online: http://arxiv.org/abs/1906.06310 (accessed on 29 November 2023).
- Chen, Y.; Huang, S.; Liu, S.; Yu, B.; Jia, J. DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors. IEEE Trans. Pattern Anal. Mach. Intell. 2022. Available online: http://arxiv.org/abs/2204.03039 (accessed on 29 November 2023).
- Li, P.; Chen, X.; Shen, S. Stereo R-CNN based 3D Object Detection for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; Available online: http://arxiv.org/abs/1902.09738 (accessed on 29 November 2023).
- Qin, Z.; Wang, J.; Lu, Y. Triangulation Learning Network: From Monocular to Stereo 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; Available online: http://arxiv.org/abs/1906.01193 (accessed on 29 November 2023).
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; Available online: http://arxiv.org/abs/1711.08488 (accessed on 29 November 2023).
- Wang, Z.; Jia, K. Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Macau, China, 3–8 November 2019; pp. 1742–1749. [Google Scholar] [CrossRef]
- Shin, K.; Kwon, Y.P.; Tomizuka, M. RoarNet: A Robust 3D object detection based on region approximation refinement. In Proceedings of the IEEE Intelligent Vehicles Symposium, Paris, France, 9–12 June 2019; pp. 2510–2515. [Google Scholar] [CrossRef]
- Paigwar, A.; Sierra-Gonzalez, D.; Erkent, Ö.; Laugier, C. Frustum-PointPillars: A Multi-Stage Approach for 3D Object Detection Using RGB Camera and LiDAR. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021. [Google Scholar]
- Du, X.; Ang, M.H.; Karaman, S.; Rus, D. A General Pipeline for 3D Detection of Vehicles. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 3194–3200. [Google Scholar] [CrossRef]
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely Embedded Convolutional Detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef]
- Vora, S.; Lang, A.H.; Helou, B.; Beijbom, O. PointPainting: Sequential Fusion for 3D Object Detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4604–4612. [Google Scholar]
- Xu, S.; Zhou, D.; Fang, J.; Yin, J.; Bin, Z.; Zhang, L. FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, Indianapolis, IN, USA, 19–22 September 2021; pp. 3047–3054. [Google Scholar] [CrossRef]
- Simon, M.; Amende, K.; Kraus, A.; Honer, J.; Samann, T.; Kaulbersch, H.; Milz, S.; Gross, H.M. Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
- Meyer, G.P.; Charland, J.; Hegde, D.; Laddha, A.; Vallespi-Gonzalez, C. Sensor Fusion for Joint 3D Object Detection and Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
- Wang, S.; Suo, S.; Ma, W.-C.; Pokrovsky, A.; Urtasun, R. Deep Parametric Continuous Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2589–2597. [Google Scholar]
- Liang, M.; Yang, B.; Wang, S.; Urtasun, R. Deep Continuous Fusion for Multi-Sensor 3D Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 641–656. [Google Scholar]
- Sindagi, A.V.; Zhou, Y.; Tuzel, O. MVX-net: Multimodal VoxelNet for 3D object detection. In Proceedings of the IEEE International Conference on Robotics and Automation, Montreal, Canada, 20–24 May 2019; pp. 7276–7282. [Google Scholar] [CrossRef]
- Li, Y.; Yu, A.W.; Meng, T.; Caine, B.; Ngiam, J.; Peng, D.; Shen, J.; Lu, Y.; Zhou, D.; Le, Q.V.; et al. DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17182–17191. Available online: https://github.com/NVIDIA/semantic-segmentation (accessed on 13 January 2023).
- Zhang, Y.; Chen, J.; Huang, D. CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 908–917. [Google Scholar]
- Yoo, J.H.; Kim, Y.; Kim, J.; Choi, J.W. 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-view Spatial Feature Fusion for 3D Object Detection. Lect. Notes Comput. Sci. 2020, 12372, 720–736. [Google Scholar]
- Chen, X.; Zhang, T.; Wang, Y.; Wang, Y.; Zhao, H. FUTR3D: A Unified Sensor Fusion Framework for 3D Detection. arXiv 2022, arXiv:2203.10642. [Google Scholar] [CrossRef]
- Liu, Z.; Tang, H.; Amini, A.; Yang, X.; Mao, H.; Rus, D.L.; Han, S. BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation. arXiv 2022, arXiv:2205.13542. [Google Scholar] [CrossRef]
- Chen, Z.; Li, Z.; Zhang, S.; Fang, L.; Jiang, Q.; Zhao, F.; Zhou, B.; Zhao, H. AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection. Int. Jt. Conf. Artif. Intell. 2022, 827–833. [Google Scholar] [CrossRef]
- Bai, X.; Hu, Z.; Zhu, X.; Huang, Q.; Chen, Y.; Fu, H.; Tai, C.L. TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1090–1099. [Google Scholar]
- Dou, J.; Xue, J.; Fang, J. SEG-VoxelNet for 3D vehicle detection from RGB and LiDAR data. In Proceedings of the 2019 International Conference on Robotics and Automation, Montreal, Canada, 20–24 May 2019; pp. 4362–4368. [Google Scholar] [CrossRef]
- Chen, Y.; Li, H.; Gao, R.; Zhao, D. Boost 3-D Object Detection via Point Clouds Segmentation and Fused 3-D GIoU-L Loss. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 762–773. [Google Scholar] [CrossRef]
- Wang, C.; Ma, C.; Zhu, M.; Yang, X.; Key, M. PointAugmenting: Cross-Modal Augmentation for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11794–11803. [Google Scholar]
- Xu, D.; Anguelov, D.; Jain, A. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 244–253. [Google Scholar]
- Huang, T.; Liu, Z.; Chen, X.; Bai, X. EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Springer Science and Business Media Deutschland GmbH, Glasgow, UK, 23–28 August 2020; pp. 35–52. [Google Scholar]
- Xie, L.; Xiang, C.; Yu, Z.; Xu, G.; Yang, Z.; Cai, D.; He, X. PI-RCNN: An Efficient Multi-Sensor 3D Object Detector with Point-Based Attentive Cont-Conv Fusion Module. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12460–12467. [Google Scholar] [CrossRef]
- Wang, Z.; Zhao, Z.; Jin, Z.; Che, Z.; Tang, J.; Shen, C.; Peng, Y. Multi-Stage Fusion for Multi-Class 3D Lidar Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Nashville, TN, USA, 20–25 June 2021; pp. 3120–3128. [Google Scholar]
- Zhu, M.; Ma, C.; Ji, P.; Yang, X. Cross-Modality 3D Object Detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3772–3781. [Google Scholar]
- Li, Y.; Qi, X.; Chen, Y.; Wang, L.; Li, Z.; Sun, J.; Jia, J. Voxel Field Fusion for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1120–1129. Available online: https://github.com/dvlab-research/VFF (accessed on 30 May 2023).
- Liang, M.; Yang, B.; Chen, Y.; Hu, R.; Urtasun, R. Multi-Task Multi-Sensor Fusion for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7345–7353. [Google Scholar]
- An, P.; Liang, J.; Yu, K.; Fang, B.; Ma, J. Deep structural information fusion for 3D object detection on LiDAR–camera system. Comput. Vis. Image Underst. 2022, 214, 103295. [Google Scholar] [CrossRef]
- Nabati, R.; Qi, H. CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; Available online: https://github.com/mrnabati/CenterFusion (accessed on 26 January 2023).
- Nabati, R.; Qi, H. Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles. 2020. Available online: http://arxiv.org/abs/2009.08428 (accessed on 29 November 2023).
- Nobis, F.; Geisslinger, M.; Weber, M.; Betz, J.; Lienkamp, M. A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection. In Proceedings of the 2019 Symposium on Sensor Data Fusion: Trends, Solutions, Applications, SDF 2019, Bonn, Germany, 15–17 October 2019. [Google Scholar] [CrossRef]
- Wang, L.; Chen, T.; Anklam, C.; Goldluecke, B. High Dimensional Frustum PointNet for 3D Object Detection from Camera, LiDAR, and Radar. In Proceedings of the IEEE Intelligent Vehicles Symposium, Las Vegas, NV, USA, 19 October–13 November 2020; pp. 1621–1628. [Google Scholar] [CrossRef]
- Chen, X.; Huang, H.; Liu, Y.; Li, J.; Liu, M. Robot for automatic waste sorting on construction sites. Autom. Constr. 2022, 141, 104387. [Google Scholar] [CrossRef]
- Gené-Mola, J.; Sanz-Cortiella, R.; Rosell-Polo, J.R.; Morros, J.-R.; Ruiz-Hidalgo, J.; Vilaplana, V.; Gregorio, E. Fruit detection and 3D location using instance segmentation neural networks and structure-from-motion photogrammetry. Comput. Electron. Agric. 2020, 169, 105165. [Google Scholar] [CrossRef]
- Teng, P.; Zhang, Y.; Yamane, T.; Kogoshi, M.; Yoshida, T.; Ota, T.; Nakagawa, J. Accuracy Evaluation and Branch Detection Method of 3D Modeling Using Backpack 3D Lidar SLAM and UAV-SfM for Peach Trees during the Pruning Period in Winter. Remote Sens. 2023, 15, 408. [Google Scholar] [CrossRef]
- Parmar, H.S.; Nutter, B.; Long, R.; Antani, S.; Mitra, S. Deep learning of volumetric 3D CNN for fMRI in Alzheimer’s disease classification. In Medical Imaging 2020: Biomedical Applications in Molecular, Structural, and Functional Imaging; SPIE: Bellingham, WA, USA, 2020; Volume 11317, pp. 66–71. [Google Scholar] [CrossRef]
- Wegmayr, V.; Aitharaju, S.; Buhmann, J. Classification of brain MRI with big data and deep 3D convolutional neural networks. In Medical Imaging 2018: Computer-Aided Diagnosis; SPIE: Bellingham, WA, USA, 2018; Volume 10575, pp. 406–412. [Google Scholar] [CrossRef]
- Nie, D.; Zhang, H.; Adeli, E.; Liu, L.; Shen, D. 3D Deep Learning for Multi-modal Imaging-Guided Survival Time Prediction of Brain Tumor Patients. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; Proceedings, Part II; Springer: Cham, Switzerland, 2016; pp. 212–220. [Google Scholar] [CrossRef]
- Tang, Z.; Chen, K.; Pan, M.; Wang, M.; Song, Z. An Augmentation Strategy for Medical Image Processing Based on Statistical Shape Model and 3D Thin Plate Spline for Deep Learning. IEEE Access 2019, 7, 133111–133121. [Google Scholar] [CrossRef]
- Han, C.; Kitamura, Y.; Kudo, A.; Ichinose, A.; Rundo, L.; Furukawa, Y.; Umemoto, K.; Li, Y.; Nakayama, H. Synthesizing Diverse Lung Nodules Wherever Massively: 3D Multi-Conditional GAN-Based CT Image Augmentation for Object Detection. In Proceedings of the 2019 International Conference on 3D Vision, 3DV 2019, Québec, Canada, 16–19 September 2019; pp. 729–737. [Google Scholar] [CrossRef]
- Feng, M.; Gilani, S.Z.; Wang, Y.; Zhang, L.; Mian, A. Relation Graph Network for 3D Object Detection in Point Clouds. IEEE Trans. Image Process. 2021, 30, 92–107. [Google Scholar] [CrossRef]
- Pan, X.; Xia, Z.; Song, S.; Li, L.E.; Huang, G. 3D Object Detection with Pointformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
- Armeni, I. 3D Semantic Parsing of Large-Scale Indoor Spaces. Available online: http://buildingparser.stanford.edu/ (accessed on 9 September 2023).
- Princeton ModelNet. Available online: https://modelnet.cs.princeton.edu/ (accessed on 11 January 2023).
- SHREC15. Non-Rigid 3D Shape Retrieval. Available online: https://www.icst.pku.edu.cn/zlian/representa/3d15/dataset/index.htm (accessed on 13 February 2023).
- Wang, L.; Li, R.; Sun, J.; Liu, X.; Zhao, L.; Seah, H.S.; Quah, C.K.; Tandianus, B. Multi-View Fusion-Based 3D Object Detection for Robot Indoor Scene Perception. Sensors 2019, 19, 4092. [Google Scholar] [CrossRef]
- Hua, B.-S.; Pham, Q.-H.; Nguyen, D.T.; Tran, M.-K.; Yu, L.-F.; Yeung, S.-K. SceneNN: A scene meshes dataset with aNNotations. In Proceedings of the 2016 4th International Conference on 3D Vision, 3DV, Stanford, CA, USA, 25–28 October 2016; pp. 92–101. [Google Scholar] [CrossRef]
- Tao, C.; Gao, Z.; Yan, J.; Li, C.; Cui, G. Indoor 3D Semantic Robot VSLAM based on mask regional convolutional neural network. IEEE Access 2020, 8, 52906–52916. [Google Scholar] [CrossRef]
- Guan, H.; Qian, C.; Wu, T.; Hu, X.; Duan, F.; Ye, X. A Dynamic Scene Vision SLAM Method Incorporating Object Detection and Object Characterization. Sustainability 2023, 15, 3048. [Google Scholar] [CrossRef]
- Comba, L.; Biglia, A.; Aimonino, D.R.; Gay, P. Unsupervised detection of vineyards by 3D point-cloud UAV photogrammetry for precision agriculture. Comput. Electron. Agric. 2018, 155, 84–95. [Google Scholar] [CrossRef]
- Ge, L.; Zou, K.; Zhou, H.; Yu, X.; Tan, Y.; Zhang, C.; Li, W. Three dimensional apple tree organs classification and yield estimation algorithm based on multi-features fusion and support vector machine. Inf. Process. Agric. 2022, 9, 431–442. [Google Scholar] [CrossRef]
- Tu, H.; Wang, C.; Zeng, W. VoxelPose: Towards Multi-camera 3D Human Pose Estimation in Wild Environment. In Proceedings of the European Conference on Computer Vision, Springer Science and Business Media Deutschland GmbH, Glasgow, UK, 23–28 August 2020; pp. 197–212. [Google Scholar]
- Belagiannis, V.; Amin, S.; Andriluka, M.; Schiele, B.; Navab, N.; Ilic, S. 3D Pictorial Structures for Multiple Human Pose Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 1669–1676. [Google Scholar]
- Joo, H.; Park, H.S.; Sheikh, Y. MAP Visibility Estimation for Large-Scale Dynamic 3D Reconstruction. In Proceedings of the Computer Vision and Pattern Recognition Conference, Columbus, OH, USA, 23–28 June 2014; Available online: http://www.cs.cmu.edu/ (accessed on 29 November 2023).
- Joo, H.; Liu, H.; Tan, L.; Gui, L.; Nabbe, B.; Matthews, I.; Kanade, T.; Nobuhara, S.; Sheikh, Y. Panoptic Studio: A Massively Multiview System for Social Motion Capture. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; Available online: http://www.cs.cmu.edu/ (accessed on 29 November 2023).
- Liu, H.; Wu, J.; He, R. Center point to pose: Multiple views 3D human pose estimation for multi-person. PLoS ONE 2022, 17, e0274450. [Google Scholar] [CrossRef]
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; Available online: http://arxiv.org/abs/1712.02294 (accessed on 29 November 2023).
- Yang, B.; Luo, W.; Urtasun, R. PIXOR: Real-time 3D Object Detection from Point Clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Computer Vision Group—Datasets—RGB-D SLAM Dataset and Benchmark. Available online: https://cvg.cit.tum.de/data/datasets/rgbd-dataset (accessed on 10 September 2023).
- Kang, H.; Chen, C. Fruit detection, segmentation and 3D visualisation of environments in apple orchards. Comput. Electron. Agric. 2020, 171, 105302. [Google Scholar] [CrossRef]
- Wu, G.; Li, B.; Zhu, Q.; Huang, M.; Guo, Y. Using color and 3D geometry features to segment fruit point cloud and improve fruit recognition accuracy. Comput. Electron. Agric. 2020, 174, 105475. [Google Scholar] [CrossRef]
- Pretto, A.; Aravecchia, S.; Burgard, W.; Chebrolu, N.; Dornhege, C.; Falck, T.; Fleckenstein, F.V.; Fontenla, A.; Imperoli, M.; Khanna, R.; et al. Building an Aerial-Ground Robotics System for Precision Farming: An Adaptable Solution. IEEE Robot. Autom. Mag. 2021, 28, 29–49. [Google Scholar] [CrossRef]
- Patil, A.K.; Balasubramanyam, A.; Ryu, J.Y.; N, P.K.B.; Chakravarthi, B.; Chai, Y.H. Fusion of multiple lidars and inertial sensors for the real-time pose tracking of human motion. Sensors 2020, 20, 5342. [Google Scholar] [CrossRef] [PubMed]
- Trumble, M.; Gilbert, A.; Malleson, C.; Hilton, A.; Collomosse, J. Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors. In Proceedings of the 28th British Machine Vision Conference, London, UK, 21–24 November 2017; pp. 1–13. Available online: https://openresearch.surrey.ac.uk/esploro/outputs/conferencePresentation/Total-Capture-3D-Human-Pose-Estimation-Fusing-Video-and-Inertial-Sensors/99512708202346 (accessed on 1 June 2023).
- Chen, Y.; Liu, S.; Shen, X.; Jia, J. DSGN: Deep Stereo Geometry Network for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–18 June 2020. [Google Scholar]
- Mousavian, A.; Anguelov, D.; Flynn, J.; Košecká, J. 3D Bounding Box Estimation Using Deep Learning and Geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Maxwell, A.E.; Warner, T.A.; Guillén, L.A. Accuracy assessment in convolutional neural network-based deep learning remote sensing studies—Part 1: Literature review. Remote Sens. 2021, 13, 2450. [Google Scholar] [CrossRef]
- Hung, W.-C.; Kretzschmar, H.; Casser, V.; Hwang, J.-J.; Anguelov, D. LET-3D-AP: Longitudinal Error Tolerant 3D Average Precision for Camera-Only 3D Detection. 2022. Available online: http://arxiv.org/abs/2206.07705 (accessed on 29 November 2023).
- Chen, X.; Jin, Z.; Zhang, Q.; Wang, P. Research on Comparison of LiDAR and Camera in Autonomous Driving. J. Phys. Conf. Ser. 2021, 2093, 012032. [Google Scholar] [CrossRef]
- Wu, H.; Wen, C.; Shi, S.; Li, X.; Wang, C. Virtual Sparse Convolution for Multimodal 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; Available online: http://openaccess.thecvf.com/content/CVPR2023/html/Wu_Virtual_Sparse_Convolution_for_Multimodal_3D_Object_Detection_CVPR_2023_paper.html (accessed on 15 September 2023).
- Li, X.; Ma, T.; Hou, Y.; Shi, B.; Yang, Y.; Liu, Y.; Wu, X.; Chen, Q.; Li, Y.; Qiao, Y.; et al. LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; Available online: https://github.com/sankin97/LoGoNet (accessed on 29 November 2023).
- Wu, H.; Deng, J.; Wen, C.; Li, X.; Wang, C.; Li, J. CasA: A cascade attention network for 3-D object detection from LiDAR point clouds. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11. Available online: https://ieeexplore.ieee.org/abstract/document/9870747/ (accessed on 17 September 2023).
- Chen, J.; Wang, Q.; Peng, W.; Xu, H.; Li, X.; Xu, W. Disparity-Based Multiscale Fusion Network for Transportation Detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18855–18863. [Google Scholar] [CrossRef]
- Ye, Q.; Jiang, L.; Zhen, W.; Du, Y. Consistency of Implicit and Explicit Features Matters for Monocular 3D Object Detection. arXiv 2022, arXiv:2207.07933. [Google Scholar]
- Hu, H.-N.; Yang, Y.-H.; Fischer, T.; Darrell, T.; Yu, F.; Sun, M. Monocular Quasi-Dense 3D Object Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 1992–2008. [Google Scholar] [CrossRef]
Ref. | Modality | Application Domain | Content | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Point Cloud | Camera | RADAR | Multi Modal | Autonomous Vehicles | Robot Vision | Precision Agriculture | Human Pose | ||||
LiDAR | RGB-D | Mono | Stereo | ||||||||
[21] | H | M | H | L | L | L | H | H | L | L | This research focuses on point cloud and monocular camera-based 3D object detection. However, the scope is limited to autonomous vehicle navigation and robot vision. |
[22] | H | M | H | H | M | H | H | L | L | L | This research elaborates on the performance of different modalities (camera, LiDAR, RADAR), but its scope is limited to the autonomous vehicle sector.
[23] | H | H | H | M | L | M | H | H | L | L | This paper explores 3D object detection focusing on autonomous vehicles and indoor robot vision. |
[24] | H | L | M | L | L | M | H | L | L | L | This article discusses 3D object detection techniques with point cloud methods, especially in the autonomous vehicle navigation sector. However, the other modalities such as stereo sensors are not discussed. |
[25] | H | H | H | M | L | M | H | H | L | M | This research focuses on 3D object detection techniques based on point cloud and monocular camera. |
[26] | L | H | H | M | L | M | L | H | L | L | This research discusses 3D object detection techniques for room shape assessment. |
[27] | H | L | L | L | L | M | L | M | L | H | This article discusses 3D object detection techniques specifically with LiDAR sensors. However, the scope is limited to human detection. |
Dataset | Modalities | Annotated Object Category | Constraint | ||||
---|---|---|---|---|---|---|---|
LiDAR | RGB Camera | Stereo Camera | RADAR | RGB-D | |||
KITTI [9] | ✓ | ✓ | ✓ (Greyscale) | ✗ | ✗ | 11 classes | Focuses on road scenes. Not suitable for indoor robotic vision research.
nuScenes [12] | ✓ | ✓ | ✗ | ✓ | ✗ | 23 classes | Focuses on the autonomous driving scenario.
Waymo [13] | ✓ | ✓ | ✗ | ✓ | ✗ | 23 classes | Exclusively built for autonomous driving.
SUN RGB-D [10] | ✗ | ✓ | ✗ | ✗ | ✓ | 700 classes | Focuses only on indoor scenes.
ScanNet [11] | ✗ | ✓ | ✗ | ✗ | ✓ | 20 classes | Focuses on indoor scenes.
Performance Criterion | RADAR | LiDAR | RGB Camera |
---|---|---|---|
Object detection | Good | Good | Fair |
Object classification | Poor | Fair | Good |
Distance inference | Good | Good | Fair |
Detecting edge | Poor | Good | Good |
Visibility range | Good | Fair | Fair |
Adverse weather performance | Good | Fair | Poor |
Performance in low light condition | Good | Good | Fair |
Modality | Method | Reference | Category | Easy | Moderate | Hard | Run Time |
---|---|---|---|---|---|---|---|
LiDAR+ RGB | VirConv-S | [133] | Car | 92.48% | 87.20% | 82.45% | 0.09 s |
LiDAR+ RGB | LoGoNet | [134] | Car | 91.80% | 85.06% | 80.74% | 0.1 s |
LiDAR+ RGB | LoGoNet | [134] | Pedestrian | 54.04% | 47.43% | 44.56% | 0.1 s |
LiDAR | CasA++ | [135] | Car | 90.68% | 84.04% | 79.69% | 0.1 s |
LiDAR | CasA++ | [135] | Pedestrian | 56.33% | 49.29% | 46.70% | 0.1 s |
Stereo | DSGN++ | [58] | Car | 83.21% | 67.37% | 59.91% | 0.2 s |
Stereo | DSGN++ | [58] | Pedestrian | 43.05% | 32.74% | 29.54% | 0.2 s |
Stereo | DMF | [136] | Car | 77.55% | 67.33% | 62.44% | 0.2 s |
Stereo | DMF | [136] | Pedestrian | 37.21% | 29.77% | 27.62% | 0.2 s |
Monocular (RGB) | CIE + DM3D | [137] | Car | 35.96% | 25.02% | 21.47% | 0.1 s |
Monocular (RGB) | QD-3DT [LSTM on RGB] | [138] | Car | 12.81% | 9.33% | 7.86% | 0.03 s |
Monocular (RGB) | QD-3DT [LSTM on RGB] | [138] | Pedestrian | 5.53% | 3.37% | 3.02% | 0.03 s |
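The accuracy figures in the table above are average precision (AP) values at easy, moderate and hard difficulty on the KITTI benchmark. As a minimal sketch of the interpolated-AP idea behind such scores, assuming detections have already been matched to ground truth with an IoU threshold, the scores and match flags below are made-up inputs rather than benchmark data.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """11-point interpolated AP over a ranked detection list (PASCAL/KITTI style)."""
    order = np.argsort(-np.asarray(scores))          # rank by descending confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / num_gt
    # Average the best precision attainable at 11 evenly spaced recall levels
    return float(np.mean([precision[recall >= r].max() if (recall >= r).any()
                          else 0.0 for r in np.linspace(0, 1, 11)]))

# Made-up example: five ranked detections against four ground-truth objects
print(average_precision([0.9, 0.8, 0.7, 0.6, 0.5],
                        [True, True, False, True, False], num_gt=4))
```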
Strengths | Weaknesses
---|---
Opportunities | Threats

Strengths | Weaknesses
---|---
Opportunities | Threats

Strengths | Weaknesses
---|---
Opportunities | Threats