Dense Feature Pyramid Deep Completion Network
Abstract
1. Introduction
2. Methodology
2.1. Dense Networks: DenseNet and Its Variants
2.2. Network Architecture
2.2.1. Encoder
- Before the subsequent convolution operations, features of the input RGB image and depth map must be extracted and fused. This paper uses 3 × 3 convolutional layers to map them into 16-channel and 48-channel feature maps, respectively, which are then concatenated into a 64-channel joint feature map.
- The joint feature map is merely a concatenation along the channel dimension, so further convolution is required before multi-scale feature extraction. In this study, we apply a 3 × 3 dilated convolution to the joint feature map, with a dilation rate of 3, a stride of 1, and the channel count unchanged (a minimal sketch of this front end is given after this list). As illustrated in Figure 2, compared with regular convolutions, the dilated convolutions used here provide a larger receptive field and can therefore extract more features. While preserving the feature-map size, they gather a broader range of contextual information, supporting more accurate feature extraction and prediction.
- The average pooling in the transition layers is removed, retaining only batch normalization, ReLU, and a 1 × 1 convolution. Pooling causes information loss and shrinks the feature map; removing it keeps the feature-map size unchanged and preserves more detail.
- Each dense block is combined with its subsequent transition layer into a dense convolutional block. The channel numbers of the feature maps produced by the four dense convolutional blocks are set to 128, 256, 512, and 1024, and the feature-map sizes are reduced to 1/2, 1/4, 1/8, and 1/16 of the input, respectively. Through these four downsampling steps, features at every scale are captured as fully as possible, while outliers and noise are suppressed.
- The global average pooling layer, the final fully connected layer, and the softmax function are removed, and the feature maps from the fourth dense convolutional block are connected directly to the decoder.
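For illustration only, the following is a minimal PyTorch sketch of the encoder front end described above, assuming the channel widths stated in this section (RGB → 16 channels and depth → 48 channels in the order listed, concatenated into 64 joint channels), the 3 × 3 dilated convolution with dilation rate 3, and the pooling-free transition layer. Module names such as `InputFusion` and `TransitionNoPool` are illustrative, not the authors' identifiers, and the dense blocks themselves follow the standard DenseNet design and are omitted here.

```python
import torch
import torch.nn as nn


class InputFusion(nn.Module):
    """Map RGB and depth into 16- and 48-channel features, concatenate them
    into a 64-channel joint feature map, then apply the dilated convolution."""

    def __init__(self):
        super().__init__()
        self.rgb_conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.depth_conv = nn.Conv2d(1, 48, kernel_size=3, padding=1)
        # 3x3 dilated convolution (dilation 3, stride 1): keeps spatial size
        # and channel count while enlarging the receptive field.
        self.dilated = nn.Conv2d(64, 64, kernel_size=3, stride=1,
                                 padding=3, dilation=3)

    def forward(self, rgb, depth):
        joint = torch.cat([self.rgb_conv(rgb), self.depth_conv(depth)], dim=1)
        return self.dilated(joint)


class TransitionNoPool(nn.Module):
    """Transition layer with average pooling removed: BN + ReLU + 1x1 conv,
    so the feature-map size is preserved."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )

    def forward(self, x):
        return self.block(x)


if __name__ == "__main__":
    fusion = InputFusion()
    rgb = torch.randn(1, 3, 352, 1216)    # KITTI-sized RGB input (assumed)
    depth = torch.randn(1, 1, 352, 1216)  # sparse depth map
    print(fusion(rgb, depth).shape)       # torch.Size([1, 64, 352, 1216])
```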
2.2.2. Decoder
3. Results and Discussion
3.1. Datasets and Their Processing
- Rotation: RGB and depth images are rotated by a random angle within the range of [−5, 5] degrees.
- Horizontal flipping: RGB and depth images are horizontally flipped with a set probability.
- Color distortion: Brightness, contrast, and saturation are scaled by a random distortion factor within the range of [0.5, 1.5] (a minimal sketch of this augmentation pipeline follows the list).
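The sketch below implements these three augmentations with torchvision functional transforms. The flip probability of 0.5 is an assumption, since the text does not state the value, and the function name `augment_pair` is illustrative rather than taken from the authors' code.

```python
import random
import torchvision.transforms.functional as TF


def augment_pair(rgb, depth, flip_prob=0.5):
    """Jointly augment an RGB image and its depth map (PIL images or tensors).

    flip_prob is an assumed value; the paper only states that flipping is
    applied with some probability.
    """
    # Rotation: random angle in [-5, 5] degrees, applied to both inputs.
    angle = random.uniform(-5.0, 5.0)
    rgb = TF.rotate(rgb, angle)
    depth = TF.rotate(depth, angle)

    # Horizontal flipping with the given probability, applied to both inputs.
    if random.random() < flip_prob:
        rgb = TF.hflip(rgb)
        depth = TF.hflip(depth)

    # Color distortion on the RGB image only: scale brightness, contrast,
    # and saturation by random factors in [0.5, 1.5].
    rgb = TF.adjust_brightness(rgb, random.uniform(0.5, 1.5))
    rgb = TF.adjust_contrast(rgb, random.uniform(0.5, 1.5))
    rgb = TF.adjust_saturation(rgb, random.uniform(0.5, 1.5))
    return rgb, depth
```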
3.2. Loss Function
3.3. Environment Configuration and Parameter Setting
3.4. Analysis of Experimental Results
3.4.1. Comparison with Other Networks on KITTI Dataset
3.4.2. Ablation Experiment
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chen, S. Research on depth completion algorithm based on fusion of LiDAR and camera. Master's Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2022.
- Xiu, Y.L.; Yang, J.L.; Tzionas, D.; Black, M.J. ICON: Implicit clothed humans obtained from normals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 13286–13296.
- Yin, T.W.; Zhou, X.Y.; Krahenbuhl, P. Center-based 3D object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 11779–11788.
- Tu, Y.; Zhang, X.; Zhang, J.; Hu, L. Depth image super-resolution reconstruction guided by edge features. Comput. Appl. Softw. 2017, 34, 220–225.
- He, K.M.; Sun, J.; Tang, X.O. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1397–1409.
- Di, W.; Zhang, X.; Hu, L.; Duan, L. Second-order total generalized variation for depth map super-resolution reconstruction constrained by color images. J. Image Graph. 2014, 19, 1162–1167.
- Wang, Y.; Pu, Y.; Sun, R. Depth map super-resolution reconstruction combined with color image of the same scene. Acta Opt. Sin. 2017, 37, 810002.
- Chen, J.; Li, R. A depth map super-resolution reconstruction based on improved MRF. Microprocessors 2017, 38, 60–63, 71.
- Park, J.; Kim, H.; Tai, Y.W.; Brown, M.S.; Kweon, I. High quality depth map upsampling for 3D-TOF cameras. In Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 1623–1630.
- Rock, J.; Gupta, T.; Thorsen, J.; Gwak, J.Y.; Shin, D.; Hoiem, D. Completing 3D object shape from one depth image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 2484–2493.
- Qian, G.C.; Abualshour, A.; Li, G.H.; Thabet, A.; Ghanem, B. PU-GCN: Point cloud upsampling using graph convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 11678–11687.
- Achlioptas, P.; Diamanti, O.; Mitliagkas, I.; Guibas, L. Learning representations and generative models for 3D point clouds. In Proceedings of the International Conference on Machine Learning, Macau, China, 26–28 February 2018; pp. 40–49.
- Yang, Y.Q.; Feng, C.; Shen, Y.R.; Tian, D. FoldingNet: Point cloud auto-encoder via deep grid deformation. arXiv 2017, arXiv:1712.07262.
- Tchapmi, L.P.; Kosaraju, V.; Rezatofighi, H.; Reid, I.; Savarese, S. TopNet: Structural point cloud decoder. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 383–392.
- Huang, Z.T.; Yu, Y.K.; Xu, J.W.; Ni, F.; Le, X. PF-Net: Point fractal network for 3D point cloud completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7659–7667.
- Hu, J.; Bao, C.; Ozay, M.; Fan, C.; Gao, Q.; Liu, H.; Lam, T.L. Deep depth completion from extremely sparse data: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 8244–8264.
- Cheng, X.; Wang, P.; Yang, R. Depth estimation via affinity learned with convolutional spatial propagation network. In Proceedings of the European Conference on Computer Vision (ECCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 103–119.
- Jaritz, M.; de Charette, R.; Wirbel, E.; Perrotton, X.; Nashashibi, F. Sparse and dense data with CNNs: Depth completion and semantic segmentation. In Proceedings of the International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 52–60.
- Ma, F.; Cavalheiro, G.V.; Karaman, S. Self-supervised sparse-to-dense: Self-supervised depth completion from LiDAR and monocular camera. In Proceedings of the International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3288–3295.
- Du, W.C.; Chen, H.; Yang, H.Y.; Zhang, Y. Depth completion using geometry-aware embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
| System/Platform | Configuration/Version |
| --- | --- |
| CPU (Intel Corporation, Santa Clara, CA, USA) | 12th Gen Intel(R) Core(TM) i5-12400F |
| Memory | 64 GB |
| GPU (NVIDIA Corporation, Santa Clara, CA, USA) | NVIDIA GeForce RTX 3060 (16 GB) |
| OS | Ubuntu 16.04 |
| Programming Language | Python 3.7 |
| Deep Learning Framework | PyTorch 1.13.1 |
| Network | RMSE (mm) | MAE (mm) | iRMSE (1/km) | iMAE (1/km) |
| --- | --- | --- | --- | --- |
| CSPN | 919.64 | 279.46 | 2.63 | 1.25 |
| Spade-RGBsD | 907.34 | 234.81 | 2.17 | 0.95 |
| Sparse-to-Dense | 814.73 | 249.95 | 2.80 | 1.21 |
| GAENET | 793.90 | 231.29 | 2.27 | 1.08 |
| Ours | 756.75 | 223.15 | 2.21 | 1.13 |
| Loss Function | RMSE (mm) | MAE (mm) | iRMSE (1/km) | iMAE (1/km) |
| --- | --- | --- | --- | --- |
| LH | 808.63 | 263.98 | 2.74 | 1.25 |
| Lap | 786.50 | 254.69 | 2.59 | 1.18 |
| Ours (LH + Lap) | 756.75 | 223.15 | 2.27 | 1.13 |
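The four columns reported above are the standard KITTI depth-completion metrics: RMSE and MAE on depth in millimetres, and iRMSE and iMAE on inverse depth in 1/km. The snippet below is a generic sketch of how these metrics are commonly computed, not the authors' evaluation code; it assumes predictions and ground truth in millimetres and evaluates only valid (non-zero) ground-truth pixels.

```python
import torch


def kitti_metrics(pred_mm, gt_mm):
    """Return RMSE/MAE in mm and iRMSE/iMAE in 1/km over valid pixels."""
    valid = gt_mm > 0
    pred = pred_mm[valid].clamp(min=1.0)  # avoid division by zero for inverse depth
    gt = gt_mm[valid]

    rmse = torch.sqrt(torch.mean((pred - gt) ** 2))
    mae = torch.mean(torch.abs(pred - gt))

    # Inverse-depth errors: convert mm to km (1 km = 1e6 mm), then invert.
    inv_pred = 1.0 / (pred / 1e6)
    inv_gt = 1.0 / (gt / 1e6)
    irmse = torch.sqrt(torch.mean((inv_pred - inv_gt) ** 2))
    imae = torch.mean(torch.abs(inv_pred - inv_gt))
    return rmse.item(), mae.item(), irmse.item(), imae.item()
```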