3D Instance Segmentation and Object Detection Framework Based on the Fusion of Lidar Remote Sensing and Optical Image Sensing
Abstract
:1. Introduction
- (1)
- The fusion application of Lidar remote sensing technology and optical image sensing technology are fully leveraged to determine the pre-fusion and alignment of the field of view, reduce redundant data processing, and reduce the algorithm complexity of a certain level.
- (2)
- The stereo regional proposal selective search-driven DAGNN expands the receptive field under the trick of the dilated convolution, avoids the scale loss, and the redouble loss function effectively integrates the positioning and semantic information. The detection result for small objects, object occlusion, and object stacking all have significant feedback.
- (3)
- Similarly to superpixels, the point cloud data are calculated at a certain granularity while considering the unique and innovative 2D object information, point cloud color, texture, size, and physical concave-convex geometric features of the 3D point cloud voxelization and hypervoxel clustering boundary. The proposed point cloud instance segmentation is excellent, and the octree-driven voxel storage and cluster growth calculation also make the layout of the segmentation class more accurate and precise, and the calculation becomes faster.
- (4)
- Finally, the visualization of 2D and 3D object boundary box mapping is carried out, which provides certain accurate positioning information and semantic information, and can provide an essential basis for intelligent navigation and path planning. The proposed framework of multi-sensor, multi-dimensional data, multi-mode fusion, and multi-layer interaction remedies a single sensor failure under complex weather, vehicle transportation environment, and lighting conditions. The framework can be used as a lever or alternative application.
2. Related Works
- Model-based point cloud segmentation algorithms.
- Attribute-based point cloud segmentation algorithms.
- Boundary-based point cloud segmentation algorithms.
- Region-based point cloud segmentation algorithms.
- Graph-based point cloud segmentation algorithms.
- Learning-based point cloud segmentation algorithms.
3. The Proposed Framework Overview
3.1. Data Fusion
- The projection matrix of Lidar point cloud data to the image plane is calculated with Equation (1):
- The point cloud is sampled at the granularity size of the unit radius , and the near points of the obstacles in the image plane are removed, i.e., the laser point cloud with negative behind the camera plane is deleted, so that the detection range is forward, and the near point is removed.
- As the task flow of preliminary sensor fusion and alignment, the projective transformation of the 3D point cloud and the image plane is performed by the projection matrix . The image points in the homogeneous coordinate system are calculated and normalized.
- The results of data fusion between Lidar point clouds and images are drawn in the 2D space plane, and the color values are assigned to the depth of the dot matrix to represent the colormap of the depth of field.
3.2. Stereo Regional Proposal Selective Search-Driven DAGNN
Algorithm 1. Region Selective Search |
3.3. Octree-Guided Hypervoxels Over-Segmentation
- Firstly, a set of three-dimensional cube grids is established on the input data of point clouds.
- Then, each three-dimensional cube has meshed, and all points in the grid are approximated by the center points of all point clouds’ data.
- Finally, the voxel cloud data are generated.
- When , i.e., , the connection between the two super-voxels is concave.
- Otherwise, when , i.e., , the relative connection between the two super-voxels is convex.
- The concave-convex degree is the sum of and .
3.4. 3D Object Instance Segmentation and Bounding Box Mapping
Algorithm 2. Process for point cloud calibration and boundary box mapping |
4. Results and Discussion
4.1. Implementation Details and Inputs
Implementation, testing and evaluation platform and database configuration |
2×PointGray Flea2 color cameras (FL2-14S3C-C), 1.4 Megapixels, 1/2” Sony ICX267 CCD, global shutter |
4.2. Analysis of Fusion
4.3. Results of 2D Detection and 3D Segmentation
4.4. Evaluation and Discussion for 3D Segmentation of 2D-Driven Results
- The point cloud accuracy (PA) represents the proportion of the point cloud of the predicted category to the point cloud aligned by the fusion of Lidar remote sensing and optical image sensing.
- The accuracy of the category point cloud (CPA) indicates the accuracy of the point cloud that actually belongs to category i in the prediction of category i.
- The mean point cloud accuracy (MPA) represents the average proportion of all types of point clouds in the point cloud after the sensor and camera sensor are fused and aligned.
- The average accuracy of the point cloud category (MCPA) indicates the average proportion of each predicted category point cloud to all categories.
- The average intersection of union (MIOU) represents the ratio of the intersection and union of the predicted category point cloud and the ground truth.
- The horizontal positioning error (HPE) represents the difference between the centroid of the point cloud of the predicted category object and the ground truth in the north and east, that is, the component of the x-y coordinate axis.
- Object positioning error (OPE) indicates the difference of 3D rigid body motion between the centroid of the point cloud of the predicted category object and the ground truth.
- The average horizontal positioning error (MHPE) and the average object positioning error (MOPE) represent the average HPE and average OPE between the object point cloud and the ground truth for all prediction categories.
4.5. Object Detection and Visualized Mapping Results
4.6. Comparison and Discussion for Object Detection
Three kinds of object detection difficulty degree |
|
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Pang, C.; Zhong, X.; Hu, H.; Tian, J.; Peng, X.; Zeng, J. Adaptive Obstacle Detection for Mobile Robots in Urban Environments Using Downward-Looking 2D LiDAR. Sensors 2018, 18, 1749. [Google Scholar] [CrossRef] [Green Version]
- Guo, Y.; Bennamoun, M.; Sohel, F.; Lu, M.; Wan, J. 3D Object Recognition in Cluttered Scenes with Local Surface Features: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 2270–2287. [Google Scholar] [CrossRef]
- Guo, Y.; Wen, C.; Sun, X.; Wang, C.; Li, J. Partial 3D Object Retrieval and Completeness Evaluation for Urban Street Scene. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1252–1255. [Google Scholar]
- Zou, K.; Zhang, Z.; Zhang, J.; Zhang, Q. A 3D model feature extraction method using curvature-based shape distribution. In Proceedings of the IEEE International Conference on Fuzzy Systems & Knowledge Discovery, Zhangjiajie, China, 15–17 August 2015; pp. 1809–1813. [Google Scholar]
- Mokhtarian, F.; Khalili, N.; Yuen, P. Curvature Computation on Free-Form 3-D Meshes at Multiple Scales. Comput. Vis. Image Underst. 2001, 83, 118–139. [Google Scholar] [CrossRef] [Green Version]
- Hung, C.-C.; Kulkarni, S.; Kuo, B.-C. A New Weighted Fuzzy C-Means Clustering Algorithm for Remotely Sensed Image Classification. IEEE J. Sel. Top. Signal Process. 2010, 5, 543–553. [Google Scholar] [CrossRef]
- Garro, V.; Giachetti, A. Scale Space Graph Representation and Kernel Matching for Non-Rigid and Textured 3D Shape Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 1258–1271. [Google Scholar] [CrossRef] [PubMed]
- Chen, B.; Chen, H.; Yuan, D.; Yu, L. 3D Fast Object Detection Based on Discriminant Images and Dynamic Distance Threshold Clustering. Sensors 2020, 20, 7221. [Google Scholar] [CrossRef] [PubMed]
- Zhou, W.; Pan, S.; Lei, J.; Yu, L.; Zhou, X.; Luo, T. Three-branch architecture for stereoscopic 3D salient object detection. Digital Signal Process. 2020, 106, 1051–2004. [Google Scholar]
- Luo, Q.; Ma, H.; Tang, L.; Wang, Y.; Xiong, R. 3D-SSD: Learning hierarchical features from RGB-D images for amodal 3D object detection. arXiv 2017, arXiv:1711.00238. [Google Scholar] [CrossRef] [Green Version]
- Ong, J.; Vo, B.-T.; Kim, D.Y.; Nordholm, S. A Bayesian Filter for Multi-view 3D Multi-object Tracking with Occlusion Handling. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 12, 6009–6027. [Google Scholar] [CrossRef]
- Awadallah, M.; Abbott, L.; Ghannam, S. Segmentation of sparse noisy point clouds using active contour models. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 6061–6065. [Google Scholar]
- Wang, Y.; Shi, H. A Segmentation Method for Point Cloud Based on Local Sample and Statistic Inference. Geoinform. Resour. Manag. Sustain. Ecosyst. 2015, 482, 274–282. [Google Scholar]
- Li, L.; Yang, F.; Zhu, H.; Li, D.; Li, Y.; Tang, L. An Improved RANSAC for 3D Point Cloud Plane Segmentation Based on Normal Distribution Transformation Cells. Remote Sens. 2017, 9, 433. [Google Scholar] [CrossRef] [Green Version]
- Zhao, C.; Guo, H.; Lu, J.; Yu, D.; Zhou, X.; Lin, Y. A new approach for roof segmentation from airborne LiDAR point clouds. Remote Sens. Lett. 2021, 12, 377–386. [Google Scholar] [CrossRef]
- Xu, B.; Jiang, W.; Shan, J.; Zhang, J.; Li, L. Investigation on the Weighted RANSAC Approaches for Building Roof Plane Segmentation from LiDAR Point Clouds. Remote Sens. 2016, 8, 5. [Google Scholar] [CrossRef] [Green Version]
- Zhang, J.; Cao, J.; Liu, X.; Chen, H.; Li, B.; Liu, L. Multi-Normal Estimation via Pair Consistency Voting. IEEE Trans. Vis. Comput. Graph. 2018, 25, 1693–1706. [Google Scholar] [CrossRef]
- Dey, E.; Kurdi, F.T.; Awrangjeb, M.; Stantic, B. Effective Selection of Variable Point Neighbourhood for Feature Point Extraction from Aerial Building Point Cloud Data. Remote Sens. 2021, 13, 1520. [Google Scholar] [CrossRef]
- Bergamasco, F.; Pistellato, M.; Albarelli, A.; Torsello, A. Cylinders extraction in non-oriented point clouds as a clustering problem. Pattern Recognit. 2020, 107, 107443. [Google Scholar] [CrossRef]
- Dirk, H.; Stefan, H.; Radu, B.R.; Sven, B. Real-Time Plane Segmentation Using RGB-D Cameras. In Robot Soccer World Cup; Springer: Cham, Switzerland, 2011; Volume 7416, pp. 306–317. [Google Scholar]
- Hu, F.; Tian, Z.; Li, Y.; Huang, S.; Feng, M. A Combined Clustering and Image Mapping based Point Cloud Segmentation for 3D Object Detection. In Proceedings of the Chinese Control and Decision Conference, Shenyang, China, 9–11 June 2018; pp. 1664–1669. [Google Scholar]
- Luo, H.; Zheng, Q.; Wang, C.; Guo, W. Boundary-Aware and Semiautomatic Segmentation of 3-D Object in Point Clouds. IEEE Geosci. Remote Sens. Lett. 2021, 18, 910–914. [Google Scholar] [CrossRef]
- Vo, A.-V.; Truong-Hong, L.; Laefer, D.; Bertolotto, M. Octree-based region growing for point cloud segmentation. ISPRS J. Photogramm. Remote Sens. 2015, 104, 88–100. [Google Scholar] [CrossRef]
- Li, L.; Yao, J.; Tu, J.; Liu, X.; Li, Y.; Guo, L. Roof Plane Segmentation from Airborne LiDAR Data Using Hierarchical Clustering and Boundary Relabeling. Remote Sens. 2020, 12, 1363. [Google Scholar] [CrossRef]
- Hasirci, Z.; Ozturk, M. The comparison of region growing algorithms with using EMST for point clouds. In Proceedings of the International Conference on Telecommunications and Signal Processing (TSP), Prague, Czech Republic, 9–11 July 2015; pp. 1–5. [Google Scholar]
- Wu, H.; Zhang, X.; Shi, W.; Song, S.; Tristan, A.C.; Li, K. An Accurate and Robust Region-Growing Algorithm for Plane Segmentation of TLS Point Clouds Using a Multiscale Tensor Voting Method. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4160–4168. [Google Scholar] [CrossRef]
- Strom, J.; Richardson, A.; Olson, E. Graph-based segmentation for colored 3D laser point clouds. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010; pp. 2131–2136. [Google Scholar]
- Tatavarti, A.; Papadakis, J.; Willis, A.R. Towards real-time segmentation of 3D point cloud data into local planar regions. In Proceedings of the SoutheastCon, Concord, NC, USA, 30 March–2 April 2017; pp. 1–6. [Google Scholar]
- Zhang, S.; Cui, S.; Ding, Z. Hypergraph Spectral Clustering for Point Cloud Segmentation. IEEE Signal Process. Lett. 2020, 27, 1655–1659. [Google Scholar] [CrossRef]
- Sd, A.; Mhb, C.; Nk, D.; Pk, A. Combining graph-cut clustering with object-based stem detection for tree segmentation in highly dense airborne lidar point clouds. ISPRS J. Photogramm. Remote Sens. 2021, 172, 207–222. [Google Scholar]
- Charles, R.Q.; Wei, L.; Chenxia, W.; Hao, S.; Leonidas, J.G. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927. [Google Scholar]
- Lin, Z.-H.; Huang, S.Y.; Wang, Y.-C.F. Learning of 3D Graph Convolution Networks for Point Cloud Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2021. [Google Scholar] [CrossRef] [PubMed]
- Cui, Y.; Liu, X.; Liu, H.; Zhang, J.; Zare, A.; Fan, B. Geometric attentional dynamic graph convolutional neural networks for point cloud analysis. Neurocomputing 2021, 432, 300–310. [Google Scholar] [CrossRef]
- Wang, J.; Xu, C.; Dai, L.; Zhang, J.; Zhong, R.Y. An Unequal Learning Approach for 3D Point Cloud Segmentation. IEEE Trans. Ind. Inform. 2021. [Google Scholar] [CrossRef]
- Nagy, B.; Benedek, C. On-the-Fly Camera and Lidar Calibration. Remote Sens. 2020, 12, 1137. [Google Scholar] [CrossRef] [Green Version]
- Geiger, A.; Moosmann, F.; Car, O.; Schuster, B. A toolbox for automatic calibration of range and camera sensors using a single shot. In Proceedings of the International Conference on Robotics and Automation (ICRA), Saint Paul, MN, USA, 14–18 May 2012. [Google Scholar]
- Bai, L.; Li, Y.; Hu, F.; Zhao, F. Region-proposal Convolutional Network-driven Point Cloud Voxelization and Over-segmentation for 3D Object Detection. In Proceedings of the IEEE Chinese Control and Decision Conference (CCDC), Nanchang, China, 3–5 June 2019; pp. 3553–3558. [Google Scholar]
- Qingming, Z.; Yubin, L.; Yinghui, X. Color-Based Segmentation of Point Clouds. Laser Scanning 2009, 38, 155–161. [Google Scholar]
- Stein, S.C.; Schoeler, M.; Papon, J.; Worgotter, F. Object partitioning using local convexity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 304–311. [Google Scholar]
- Golovinskiy, A.; Funkhouser, T. Min-cut based segmentation of point clouds. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV), Kyoto, Japan, 27 September–4 October 2009; pp. 39–46. [Google Scholar]
- Rabbani, T.; Van Den Heuvel, F.; Vosselmann, G. Segmentation of point clouds using smoothness constraint. Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 2006, 36, 248–253. [Google Scholar]
- Papon, J.; Abramov, A.; Schoeler, M.; Worgotter, F. Voxel Cloud Connectivity Segmentation—Supervoxels for Point Clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2027–2034. [Google Scholar]
- David, C.; Nafornita, C.; Gui, V.; Campeanu, A.; Carrie, G.; Monnerat, M. GNSS Localization in Constraint Environment by Image Fusing Techniques. Remote Sens. 2021, 13, 2021. [Google Scholar] [CrossRef]
- Beltran, J.; Guindel, C.; Moreno, F.M.; Cruzado, D.; Garcia, F.; De La Escalera, A. BirdNet: A 3D Object Detection Framework from LiDAR Information. In Proceedings of the IEEE 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 1–7. [Google Scholar]
- Reading, C.; Harakeh, A.; Chae, J.; Waslander, S.L. Categorical depth distribution network for monocular 3d object detection. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, Paris, France, 19–25 June 2021; pp. 1–11. [Google Scholar]
- Shi, X.; Ye, Q.; Chen, X.; Chen, C.; Chen, Z.; Kim, T.K. Geometry-based distance decomposition for monocular 3D object de-tection. arXiv 2021, arXiv:2104.03775. [Google Scholar]
- Liu, Y.; Wang, L.; Liu, M. Yolostereo3D: A step back to 2D for efficient stereo 3D detection. arXiv 2021, arXiv:2103.09422. [Google Scholar]
- Zeng, Y.; Hu, Y.; Liu, S.; Ye, J.; Han, Y.; Li, X.; Sun, N. RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point Cloud for Autonomous Driving. IEEE Robot. Autom. Lett. 2018, 3, 3434–3440. [Google Scholar] [CrossRef]
- Hu, H.N.; Yang, Y.H.; Fischer, T.; Darrell, T.; Yu, F.; Sin, M. Monocular quasi-dense 3D object tracking. arXiv 2021, arXiv:2103.07351. [Google Scholar]
Layer | Size Dimension | Kernel Size | Stride | Padding |
---|---|---|---|---|
conv1_1 | 224 × 224 × 64 | 3 × 3 | 1 | 1 |
conv1_2 | 224 × 224 × 64 | 3 × 3 | 1 | 1 |
pooling1 | 112 × 112 × 64 | 2 × 2 | 2 | 0 |
conv2_1 | 112 × 112 × 128 | 3 × 3 | 1 | 1 |
conv2_2 | 112 × 112 × 128 | 3 × 3 | 1 | 1 |
pooling2 | 56 × 56 × 128 | 2 × 2 | 2 | 0 |
conv3_1 | 56 × 56 × 256 | 3 × 3 | 1 | 1 |
conv3_2 | 56 × 56 × 256 | 3 × 3 | 1 | 1 |
conv3_3 | 56 × 56 × 256 | 3 × 3 | 1 | 1 |
pooling3 | 28 × 28 × 256 | 2 × 2 | 2 | 0 |
conv4_1 | 28 × 28 × 512 | 3 × 3 | 1 | 1 |
conv4_2 | 28 × 28 × 512 | 3 × 3 | 1 | 1 |
dilated4_3_2 | 28 × 28 × 512 | 5 × 5 | 1 | 0 |
pooling4 | 14 × 14 × 512 | 2 × 2 | 2 | 0 |
conv5_1 | 14 × 14 × 512 | 3 × 3 | 1 | 1 |
conv5_2 | 14 × 14 × 512 | 3 × 3 | 1 | 1 |
dilated5_3_2 | 14 × 14 × 512 | 5 × 5 | 1 | 0 |
dilated5_4_1 | 14 × 14 × 512 | 5 × 5 | 1 | 0 |
dilated5_4_2 | 14 × 14 × 512 | 9 × 9 | 1 | 0 |
dilated5_4_4 | 14 × 14 × 512 | 17 × 17 | 1 | 0 |
Pooling6 | 7 × 7 × 512 | 2 × 2 | 2 | 0 |
fc6_1 | 1 × 1 × 4096 | - | - | - |
fc6_2 | 1 × 1 × 4096 | - | - | - |
Methods | PA | CPA | MCPA | MPA | MIOU | |||||
O1 | O2 | O3 | O1 | O2 | O3 | |||||
Color-based region grow [38] | / | 0.48% | / | / | 24.68% | / | 8.23% | 0.16% | 8.26% | |
Peer hypervoxel [39] | 3.33% | 1.98% | / | 95.00% | 100.00% | / | 65.00% | 1.77% | 91.30% | |
Min-cut of graph model [40] | / | 0.45% | / | / | 23.38% | / | 7.79% | 0.15% | 7.83% | |
Distance cluster [41] | / | 0.43% | 0.48% | / | 22.08% | 100.00% | 40.69% | 0.30% | 13.04% | |
Octree voxel [42] | 2.91% | 1.75% | 0.45% | 82.86% | 90.91% | 100.00% | 91.26% | 1.70% | 86.52% | |
OURS | 3.46% | 1.55% | 0.33% | 98.57% | 80.52% | 100.00% | 93.03% | 1.78% | 92.61% | |
Methods | HPE(m) | OPE(m) | MHPE(m) | MOPE(m) | Runtime(s) | RN(%) | ||||
O1 | O2 | O3 | O1 | O2 | O3 | |||||
Color-based region grow | / | 0.7989 | / | / | 1.1473 | / | 0.9330 | 1.0491 | 1.426 | 90% |
Peer hypervoxel | 0.1082 | 0.0927 | / | 0.1082 | 0.0945 | / | 0.4003 | 0.4009 | 1.155 | 70% |
Min-cut of graph model | / | 0.5492 | / | / | 1.1856 | / | 0.8497 | 1.0619 | 1.287 | 70% |
Distance cluster | / | 0.5266 | 0.4057 | / | 0.5529 | 0.4591 | 0.6441 | 0.6707 | 2.812 | 90% |
Octree voxel | 0.0156 | 0.0594 | 0.9409 | 0.1210 | 0.1548 | 0.9594 | 0.3387 | 0.4117 | 1.216 | 80% |
OURS | 0.0900 | 0.3779 | 0.0859 | 0.0901 | 0.3781 | 0.0942 | 0.1846 | 0.1874 | 0.068 | 100% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bai, L.; Li, Y.; Cen, M.; Hu, F. 3D Instance Segmentation and Object Detection Framework Based on the Fusion of Lidar Remote Sensing and Optical Image Sensing. Remote Sens. 2021, 13, 3288. https://doi.org/10.3390/rs13163288
Bai L, Li Y, Cen M, Hu F. 3D Instance Segmentation and Object Detection Framework Based on the Fusion of Lidar Remote Sensing and Optical Image Sensing. Remote Sensing. 2021; 13(16):3288. https://doi.org/10.3390/rs13163288
Chicago/Turabian StyleBai, Ling, Yinguo Li, Ming Cen, and Fangchao Hu. 2021. "3D Instance Segmentation and Object Detection Framework Based on the Fusion of Lidar Remote Sensing and Optical Image Sensing" Remote Sensing 13, no. 16: 3288. https://doi.org/10.3390/rs13163288
APA StyleBai, L., Li, Y., Cen, M., & Hu, F. (2021). 3D Instance Segmentation and Object Detection Framework Based on the Fusion of Lidar Remote Sensing and Optical Image Sensing. Remote Sensing, 13(16), 3288. https://doi.org/10.3390/rs13163288