Graph Convolutional Network for 3D Object Pose Estimation in a Point Cloud
Abstract
1. Introduction
- We propose a one-stage object detection and pose estimation approach that uses a graph convolutional network operating on point clouds.
- We design a point cloud-based graph convolutional network with a keypoint attention mechanism.
- On the RGB-D Dataset 7-Scenes, we estimate 3D objects with nine degrees of freedom (DOF) and reduce rotation error, achieving performance comparable to state-of-the-art systems.
2. Related Work
2.1. Graph Neural Network Models
2.2. One-Stage and Two-Stage Object Detection Models
2.3. Three-Dimensional Intersection over Union
3. Proposed Method
3.1. Keypoint Extraction and Graph Matrix
3.1.1. Keypoint Extraction for Graph Representation
3.1.2. Graph Matrix
3.2. Graph Convolutional Network for Object Pose Estimation
3.2.1. Convolutional Layer
3.2.2. Skip Connection
3.2.3. Loss
3.3. Intersection over Union and Bounding Box
3.3.1. Quaternion
3.3.2. 3D Intersection over Union
3.3.3. Bounding Box
4. Experimental Evaluation
4.1. Experimental Setup
4.1.1. Dataset
4.1.2. Implementation Details
4.2. Results
5. Further Analyses
5.1. Keypoint Attention Mechanism Performance Comparison Analysis
5.2. Iterations of the GCN Module
5.3. Point Cloud Down-Sampling and Runtime Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Weingarten, J.W.; Gruener, G.; Siegwart, R. A state-of-the-art 3D sensor for robot navigation. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), Sendai, Japan, 28 September–2 October 2004; IEEE: Piscataway, NJ, USA; pp. 2155–2160.
- Cortés Gallardo Medina, E.; Velazquez Espitia, V.M.; Chípuli Silva, D.; Fernández Ruiz de las Cuevas, S.; Palacios Hirata, M.; Zhu Chen, A.; González González, J.Á.; Bustamante-Bello, R.; Moreno-García, C.F. Object Detection, Distributed Cloud Computing and Parallelization Techniques for Autonomous Driving Systems. Appl. Sci. 2021, 11, 2925.
- Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A Survey of Autonomous Driving: Common Practices and Emerging Technologies. IEEE Access 2020, 8, 58443–58469.
- Wolcott, R.W.; Eustice, R.M. Visual localization within LIDAR maps for automated urban driving. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; IEEE: Piscataway, NJ, USA; pp. 176–183.
- Li, X.; Guo, W.; Li, M.; Chen, C.; Sun, L. Generating colored point cloud under the calibration between TOF and RGB cameras. In Proceedings of the 2013 IEEE International Conference on Information and Automation (ICIA), Yinchuan, China, 26–28 August 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 483–488.
- Qi, X.; Liao, R.; Jia, J.; Fidler, S.; Urtasun, R. 3D graph neural networks for RGBD semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5209–5218.
- Virtanen, J.P.; Daniel, S.; Turppa, T.; Zhu, L.; Julin, A.; Hyyppä, H.; Hyyppä, J. Interactive dense point clouds in a game engine. ISPRS J. Photogramm. Remote Sens. 2020, 163, 375–389.
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep learning for 3D point clouds: A survey. In IEEE Transactions on Pattern Analysis and Machine Intelligence; IEEE: Piscataway, NJ, USA, 2020; Volume 43, pp. 4338–4364.
- Alakwaa, W.; Nassef, M.; Badr, A. Lung cancer detection and classification with 3D convolutional neural network (3D-CNN). Int. J. Adv. Comput. Sci. Appl. 2017, 8, 409.
- Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3D object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499.
- Qi, C.R.; Su, H.; Kaichun, M.; Guibas, L.J. Pointnet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 77–85.
- Riegler, G.; Ulusoy, A.O.; Geiger, A. OctNet: Learning deep 3D representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6620–6629.
- Li, G.; Mueller, M.; Qian, G.; Perez, I.C.D.; Abualshour, A.; Thabet, A.K.; Ghanem, B. DeepGCNs: Making GCNs go as deep as CNNs. In IEEE Transactions on Pattern Analysis and Machine Intelligence; IEEE: Piscataway, NJ, USA, 2021.
- Landrieu, L.; Simonovsky, M. Large-scale point cloud semantic segmentation with superpoint graphs. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4558–4567.
- Bi, Y.; Chadha, A.; Abbas, A.; Bourtsoulatze, E.; Andreopoulos, Y. Graph-based object classification for neuromorphic vision sensing. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 491–501.
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. (TOG) 2019, 38, 146.
- Shi, W.; Rajkumar, R. Point-GNN: Graph neural network for 3D object detection in a point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1708–1716.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations, ICLR, Toulon, France, 24–26 April 2017.
- Jung, T.-W.; Jeong, C.-S.; Kwon, S.-C.; Jung, K.-D. Point-Graph Neural Network Based Novel Visual Positioning System for Indoor Navigation. Appl. Sci. 2021, 11, 9187.
- Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2021, 1, 57–81.
- Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016.
- Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems 30 NIPS, Long Beach, CA, USA, 4–9 December 2017.
- Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. In Proceedings of the ICLR, Toulon, France, 24–26 April 2017.
- He, C.; Zeng, H.; Huang, J.; Hua, X.S.; Zhang, L. Structure aware single-stage 3D object detection from point cloud. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11870–11879.
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-based 3D single stage object detector. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11037–11045.
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 12689–12697.
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
- Zheng, W.; Tang, W.; Jiang, L.; Fu, C.W. SE-SSD: Self-Ensembling Single-Stage Object Detector from Point Cloud. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online, 18 October 2021; pp. 14489–14498.
- Mousavian, A.; Anguelov, D.; Flynn, J.; Kosecka, J. 3D Bounding Box Estimation Using Deep Learning and Geometry. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5632–5640.
- Ahmadyan, A.; Zhang, L.; Wei, J.; Ablavatski, A.; Wei, J.; Grundmann, M. Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online, 18 October 2021; pp. 7818–7827.
- Bentley, J.L.; Stanat, D.F.; Williams, E.H., Jr. The complexity of finding fixed-radius near neighbors. Inf. Process. Lett. 1977, 6, 209–212.
- Shotton, J.; Glocker, B.; Zach, C.; Izadi, S.; Criminisi, A.; Fitzgibbon, A. Scene coordinate regression forests for camera relocalization in RGB-D images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 2930–2937.
- Melekhov, I.; Ylioinas, J.; Kannala, J.; Rahtu, E. Image-based localization using hourglass networks. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 879–886.
- Wu, J.; Ma, L.; Hu, X. Delving deeper into convolutional neural networks for camera relocalization. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 5644–5651.
- Valada, A.; Radwan, N.; Burgard, W. Deep auxiliary learning for visual localization and odometry. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–26 May 2018; pp. 6939–6946.
- Radwan, N.; Valada, A.; Burgard, W. VLocNet++: Deep multitask learning for semantic visual localization and odometry. IEEE Robot. Autom. Lett. 2018, 3, 4407–4414.
- Balntas, V.; Li, S.; Prisacariu, V. RelocNet: Continuous metric learning relocalisation using neural nets. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 782–799.
- Brachmann, E.; Krull, A.; Nowozin, S.; Shotton, J.; Michel, F.; Gumhold, S.; Rother, C. DSAC-differentiable RANSAC for camera localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6684–6692.

Metric | Chess | Fire | Heads | Office | Pumpkin | RedKitchen | Stairs |
---|---|---|---|---|---|---|---|
cls_loss | 0.0060 | 0.0027 | 0.0310 | 0.0330 | 0.0300 | 0.0330 | 0.0340 |
loc_loss | 0.0135 | 0.0100 | 0.0080 | 0.1580 | 0.0120 | 0.0390 | 0.0080 |
recall | 0.8598 | 0.9286 | 1.0000 | 0.9752 | 0.8756 | 1.0000 | 1.0000 |
precision | 0.9200 | 0.8667 | 0.8333 | 0.9100 | 0.9910 | 0.9518 | 0.9302 |

Class | Chess | Fire | Heads | Office | Pumpkin | RedKitchen | Stairs |
---|---|---|---|---|---|---|---|
IoU | 0.8719 | 0.9247 | 0.9118 | 0.8372 | 0.9112 | 0.8782 | 0.8782 |
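
The IoU row above scores the overlap between a predicted and a ground-truth 3D bounding box. As a minimal sketch of the idea, the following Python computes IoU for axis-aligned boxes only. This is a simplifying assumption: the paper estimates nine-DOF boxes, which include rotation, and an oriented-box IoU requires a polyhedron intersection that is not shown here. The function name `aabb_iou_3d` and the `(min_xyz, max_xyz)` box layout are hypothetical choices made for illustration.

```python
import numpy as np


def aabb_iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes.

    Each box is an array of shape (2, 3): row 0 is the minimum corner,
    row 1 is the maximum corner. Axis alignment is an assumption of this
    sketch; rotated boxes need a different intersection routine.
    """
    box_a = np.asarray(box_a, dtype=float)
    box_b = np.asarray(box_b, dtype=float)

    # Overlap extent along each axis, clamped to zero when the boxes are disjoint.
    inter_min = np.maximum(box_a[0], box_b[0])
    inter_max = np.minimum(box_a[1], box_b[1])
    inter_dims = np.clip(inter_max - inter_min, 0.0, None)
    inter_vol = float(np.prod(inter_dims))

    vol_a = float(np.prod(box_a[1] - box_a[0]))
    vol_b = float(np.prod(box_b[1] - box_b[0]))
    union = vol_a + vol_b - inter_vol

    # Guard against degenerate (zero-volume) inputs.
    return inter_vol / union if union > 0 else 0.0
```

For example, two unit cubes offset by half a side along x intersect in a volume of 0.5 and have a union of 1.5, so their IoU is 1/3.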

7-Scenes | Hourglass-Pose [33] | BranchNet [34] | VLocNet [35] | VLocNet++ [36] | RelocNet [37] | DSAC [38] | Ours |
---|---|---|---|---|---|---|---|
Chess | 0.15 m, 6.53° | 0.18 m, 5.17° | 0.036 m, 1.71° | 0.023 m, 1.44° | 0.12 m, 4.14° | 0.02 m, 1.2° | 0.032 m, 1.44° |
Fire | 0.27 m, 10.84° | 0.34 m, 8.99° | 0.039 m, 5.34° | 0.018 m, 1.39° | 0.26 m, 10.4° | 0.04 m, 1.5° | 0.021 m, 1.24° |
Heads | 0.19 m, 11.63° | 0.20 m, 14.15° | 0.046 m, 6.64° | 0.016 m, 0.99° | 0.14 m, 10.5° | 0.03 m, 2.7° | 0.021 m, 2.81° |
Office | 0.21 m, 8.48° | 0.30 m, 7.05° | 0.039 m, 1.95° | 0.024 m, 1.14° | 0.18 m, 5.32° | 0.04 m, 1.6° | 0.052 m, 1.78° |
Pumpkin | 0.25 m, 7.01° | 0.27 m, 5.10° | 0.037 m, 2.28° | 0.024 m, 1.45° | 0.26 m, 4.17° | 0.05 m, 2.0° | 0.028 m, 2.12° |
RedKitchen | 0.27 m, 10.84° | 0.33 m, 7.40° | 0.039 m, 2.20° | 0.025 m, 2.27° | 0.23 m, 5.08° | 0.05 m, 2.0° | 0.032 m, 2.53° |
Stairs | 0.29 m, 12.46° | 0.38 m, 10.26° | 0.097 m, 6.48° | 0.021 m, 1.08° | 0.28 m, 7.53° | 1.17 m, 33.1° | 0.031 m, 3.26° |
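
Each cell in the table above pairs a translation error in metres with a rotation error in degrees. One common way to compute such errors, assuming each pose is given as a translation vector and a unit quaternion in (w, x, y, z) order, is sketched below. The function names are hypothetical, and how the paper aggregates errors over frames (for example, median or mean) is not something this sketch takes a position on.

```python
import numpy as np


def translation_error(t_pred, t_true):
    """Euclidean distance between predicted and ground-truth positions."""
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_true, dtype=float)
    return float(np.linalg.norm(diff))


def rotation_error_deg(q_pred, q_true):
    """Smallest rotation angle, in degrees, between two orientations.

    Quaternions are assumed to be in (w, x, y, z) order. They are
    normalised here, and the absolute value of the dot product is used
    because q and -q describe the same rotation.
    """
    q_pred = np.asarray(q_pred, dtype=float)
    q_true = np.asarray(q_true, dtype=float)
    q_pred = q_pred / np.linalg.norm(q_pred)
    q_true = q_true / np.linalg.norm(q_true)

    # Clamp to 1.0 so rounding error cannot push arccos out of its domain.
    dot = min(1.0, abs(float(np.dot(q_pred, q_true))))
    return float(np.degrees(2.0 * np.arccos(dot)))
```

With the identity quaternion as ground truth, a prediction rotated by 90° about the z axis, `(cos 45°, 0, 0, sin 45°)`, gives a rotation error of 90°.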

Without KAT | Chess | Fire | Heads | Office | Pumpkin | RedKitchen | Stairs |
---|---|---|---|---|---|---|---|
IoU | 0.4890 | 0.7708 | 0.8423 | 0.8936 | 0.7704 | 0.8935 | 0.7113 |

GCN Layer | 1st Iteration | 2nd Iteration | 3rd Iteration | 4th Iteration |
---|---|---|---|---|
Error Distance | 0.012 m | 0.023 m | 0.014 m | 0.019 m |
Error Angle | 1.768° | 1.562° | 1.244° | 1.400° |
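
The table above reports pose error after each iteration of the GCN module. To illustrate what a single graph-convolution step computes, here is a minimal NumPy sketch of the generic propagation rule from Kipf and Welling (listed in the references), H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W). This is the textbook layer, not the authors' network: the keypoint attention mechanism and skip connections named in the section headings are not modelled, and the function name `gcn_layer` is hypothetical.

```python
import numpy as np


def gcn_layer(adjacency, features, weights):
    """One generic graph-convolution step (Kipf and Welling form).

    adjacency: (N, N) symmetric 0/1 matrix without self-loops
    features:  (N, F_in) node feature matrix H
    weights:   (F_in, F_out) learnable weight matrix W
    """
    adjacency = np.asarray(adjacency, dtype=float)
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Add self-loops so each node keeps its own features.
    a_hat = adjacency + np.eye(adjacency.shape[0])

    # Symmetric degree normalisation: D^{-1/2} A_hat D^{-1/2}.
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Aggregate neighbour features, project, then apply ReLU.
    return np.maximum(a_norm @ features @ weights, 0.0)
```

Stacking several such calls, each with its own weight matrix, corresponds loosely to the "iterations" in the table: every extra step lets a node draw on features from nodes one hop further away.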

Running Time (s) | Chess | Fire | Heads | Office | Pumpkin | RedKitchen | Stairs |
---|---|---|---|---|---|---|---|
voxel size 0.2 m | 0.0287 | 0.0490 | 0.0396 | 0.0507 | 0.0360 | 0.0486 | 0.0501 |
voxel size 0.4 m | 0.0279 | 0.0472 | 0.0386 | 0.0447 | 0.0355 | 0.0452 | 0.0477 |
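
The runtime table above varies the voxel size used for point cloud down-sampling. A minimal sketch of voxel-grid down-sampling follows, under the assumption that every occupied voxel is replaced by the centroid of the points it contains. The paper's actual down-sampling routine is not shown in this section, so the function name `voxel_downsample` and the centroid rule are illustrative choices.

```python
import numpy as np


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel by their centroid.

    points:     (N, 3) array of xyz coordinates in metres
    voxel_size: edge length of the cubic voxels in metres
    Returns an (M, 3) array with one point per occupied voxel, M <= N.
    """
    points = np.asarray(points, dtype=float)

    # Integer voxel coordinates for every point.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)

    # Group points by voxel; `inverse` maps each point to its voxel group.
    _, inverse, counts = np.unique(
        voxel_idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # flatten, since the shape varies across NumPy versions

    # Sum the coordinates per voxel, then divide by the point count.
    sums = np.zeros((counts.shape[0], 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]
```

A larger voxel size merges more points per voxel, so the network sees fewer input points. That is consistent with the slightly lower running times in the 0.4 m row compared with the 0.2 m row.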
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jung, T.-W.; Jeong, C.-S.; Kim, I.-S.; Yu, M.-S.; Kwon, S.-C.; Jung, K.-D. Graph Convolutional Network for 3D Object Pose Estimation in a Point Cloud. Sensors 2022, 22, 8166. https://doi.org/10.3390/s22218166