Expanding Sparse Radar Depth Based on Joint Bilateral Filter for Radar-Guided Monocular Depth Estimation
Abstract
1. Introduction
- A radar expansion scheme that employs a joint bilateral filter and computes a corresponding confidence map for each expanded depth value (an illustrative sketch follows this list).
- A range-aware window size for expanding radar points, yielding a better expansion region and higher resolution than a fixed window.
- An expansion method that increases the number of radar points by more than 1000 times while keeping the intrinsic error low.
- A flexible method that requires no lidar supervision during training and can therefore be applied to lidar-free or unsupervised datasets.
- Superior depth estimation performance compared with previously proposed radar preprocessing methods under the same model settings, across various evaluation metrics.
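To make the idea concrete, below is a minimal, illustrative sketch of a joint bilateral expansion with a range-aware window and a per-pixel confidence map. It is a sketch under stated assumptions, not the paper's exact Algorithm 1: the window-size schedule, the default sigmas, the confidence threshold (`conf_thresh`), the base window size (`base_win`), and the rule for overlapping expansions are all placeholders chosen for illustration.

```python
# Illustrative sketch of joint bilateral expansion of sparse radar depth.
# Parameter names, default values, and the window-size schedule are
# assumptions for illustration, not the paper's exact Algorithm 1.
import numpy as np

def gaussian(x2, sigma):
    """Unnormalized Gaussian evaluated on a squared distance x2."""
    return np.exp(-x2 / (2.0 * sigma ** 2))

def jbf_expand(image, radar_depth, sigma_s=25.0, sigma_r=10.0,
               base_win=64, conf_thresh=0.5):
    """Expand sparse radar depth guided by the RGB image.

    image:       (H, W, 3) float array, guidance image.
    radar_depth: (H, W) float array, 0 where there is no radar return.
    Returns the expanded depth map and a confidence map in [0, 1].
    """
    H, W = radar_depth.shape
    depth_out = np.zeros((H, W), dtype=np.float32)
    conf_out = np.zeros((H, W), dtype=np.float32)

    ys, xs = np.nonzero(radar_depth)
    for y, x in zip(ys, xs):
        d = radar_depth[y, x]
        # Range-aware window (assumed schedule): nearer returns cover
        # more image pixels, so they receive a larger expansion window.
        win = max(8, int(base_win / max(d / 10.0, 1.0)))
        y0, y1 = max(0, y - win), min(H, y + win + 1)
        x0, x1 = max(0, x - win), min(W, x + win + 1)

        yy, xx = np.mgrid[y0:y1, x0:x1]
        spatial = gaussian((yy - y) ** 2 + (xx - x) ** 2, sigma_s)
        # Range (photometric) kernel computed on the guidance image.
        diff = image[y0:y1, x0:x1] - image[y, x]
        photometric = gaussian((diff ** 2).sum(axis=-1), sigma_r)

        w = spatial * photometric
        # Resolve overlaps: keep the most confident radar depth per pixel.
        patch_c = conf_out[y0:y1, x0:x1]
        patch_d = depth_out[y0:y1, x0:x1]
        upd = (w > conf_thresh) & (w > patch_c)
        patch_d[upd] = d
        patch_c[upd] = w[upd]

    return depth_out, conf_out
```

When windows from nearby radar returns overlap, this sketch simply keeps the depth with the highest confidence; the resulting confidence map can then be fed to the depth estimation network as an additional input (cf. Section 4.7).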
2. Related Works
2.1. Monocular Depth Estimation
2.2. Camera-Lidar Depth Completion
2.3. Radar-Guided Monocular Depth Estimation
3. Methodology
Algorithm 1: Proposed Joint Bilateral Expansion.
3.1. Joint Bilateral Filter
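For reference, the standard joint bilateral filter [35,37] replaces the value at pixel $p$ by a weighted average over a window $\Omega$, with weights drawn from a spatial kernel and a range kernel computed on a guidance image; in this paper's setting, the filtered signal is the sparse radar depth and the guidance is the camera image:

```latex
J_p = \frac{1}{W_p} \sum_{q \in \Omega}
      G_{\sigma_s}\!\bigl(\lVert p - q \rVert\bigr)\,
      G_{\sigma_r}\!\bigl(\lVert \tilde{I}_p - \tilde{I}_q \rVert\bigr)\, I_q,
\qquad
W_p = \sum_{q \in \Omega}
      G_{\sigma_s}\!\bigl(\lVert p - q \rVert\bigr)\,
      G_{\sigma_r}\!\bigl(\lVert \tilde{I}_p - \tilde{I}_q \rVert\bigr),
```

where $I$ is the signal being filtered, $\tilde{I}$ is the guidance image, and $G_{\sigma_s}$, $G_{\sigma_r}$ are Gaussian kernels with spatial and range standard deviations $\sigma_s$ and $\sigma_r$.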
3.2. Proposed Expansion Method
3.3. Intrinsic Error
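One plausible way to formalize the intrinsic error, consistent with how the expanded radar maps are evaluated against ground truth in the tables below, is the RMSE between the expanded depth $\tilde{D}$ and the ground-truth depth $D^{\mathrm{gt}}$ over the set $S$ of expanded pixels with valid ground truth; this formulation is an assumption for illustration, and the paper's exact definition may differ:

```latex
E_{\mathrm{int}} = \sqrt{\frac{1}{\lvert S \rvert}
  \sum_{q \in S} \bigl( \tilde{D}_q - D^{\mathrm{gt}}_q \bigr)^2 }
```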
4. Experiments
4.1. Implementation Details
4.2. Evaluation Metrics
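The tables below use the standard monocular depth metrics; for a predicted depth $\hat{d}_i$ and ground truth $d_i$ over $N$ valid pixels, they are defined as:

```latex
\delta_k = \frac{1}{N}\,\Bigl\lvert \Bigl\{ i :
  \max\Bigl(\tfrac{\hat{d}_i}{d_i}, \tfrac{d_i}{\hat{d}_i}\Bigr)
  < 1.25^{k} \Bigr\} \Bigr\rvert, \quad k \in \{1, 2, 3\},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \bigl(\hat{d}_i - d_i\bigr)^2},
\qquad
\mathrm{AbsRel} = \frac{1}{N}\sum_{i=1}^{N}
  \frac{\lvert \hat{d}_i - d_i \rvert}{d_i}.
```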
4.3. Radar-Guided Depth Estimation
4.4. Radar Inference Experiments
4.5. Selection of Spatial and Range Sigma
4.6. Effects of Employing Only a Single Kernel
4.7. Impact of Using Additional Confidence Map
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chang, J.; Chen, Y. Pyramid Stereo Matching Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 5410–5418. [Google Scholar]
- Zhang, F.; Prisacariu, V.; Yang, R.; Torr, P. GA-Net: Guided Aggregation Net for End-To-End Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 185–194. [Google Scholar]
- Eigen, D.; Puhrsch, C.; Fergus, R. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2366–2374. [Google Scholar]
- Eigen, D.; Fergus, R. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2650–2658. [Google Scholar]
- Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Tao, D. Deep Ordinal Regression Network for Monocular Depth Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2002–2011. [Google Scholar]
- Lee, J.; Han, M.; Ko, D.; Suh, I. From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv 2019, arXiv:1907.10326. [Google Scholar]
- Godard, C.; Mac Aodha, O.; Brostow, G. Unsupervised Monocular Depth Estimation with Left-Right Consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6602–6611. [Google Scholar]
- Godard, C.; Mac Aodha, O.; Firman, M.; Brostow, G. Digging into Self-Supervised Monocular Depth Prediction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3827–3837. [Google Scholar]
- Ma, F.; Karaman, S. Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 4796–4803. [Google Scholar]
- Ma, F.; Cavalheiro, G.; Karaman, S. Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3288–3295. [Google Scholar]
- Siddiqui, S.; Vierling, A.; Berns, K. Multi-Modal Depth Estimation Using Convolutional Neural Networks. In Proceedings of the IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Abu Dhabi, United Arab Emirates, 4–6 November 2020; pp. 354–359. [Google Scholar]
- Boettcher, W.; Hoyer, L.; Unal, O.; Li, K.; Dai, D. LiDAR Meta Depth Completion. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 7750–7756. [Google Scholar]
- Lin, J.; Dai, D.; Van Gool, L. Depth Estimation from Monocular Images and Sparse Radar Data. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24–29 October 2020; pp. 10233–10240. [Google Scholar]
- Lo, C.-C.; Vandewalle, P. Depth Estimation From Monocular Images and Sparse Radar Using Deep Ordinal Regression Network. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 3343–3347. [Google Scholar]
- Long, Y.; Morris, D.; Liu, X.; Castro, M.; Chakravarty, P.; Narayanan, P. Radar-Camera Pixel Depth Association for Depth Completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 12502–12511. [Google Scholar]
- Lee, W.; Jovanov, L.; Philips, W. Semantic-Guided Radar-Vision Fusion for Depth Estimation and Object Detection. In Proceedings of the 32nd British Machine Vision Conference (BMVC), Online, 22–25 November 2021. [Google Scholar]
- Caesar, H.; Bankiti, V.; Lang, A.; Vora, S.; Liong, V.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11618–11628. [Google Scholar]
- Huang, Y.; Liu, Y.; Wu, T.; Su, H.; Chang, Y.; Tsou, T.; Wang, Y.; Hsu, W. S³: Learnable Sparse Signal Superdensity for Guided Depth Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 16701–16711. [Google Scholar]
- Singh, A.; Ba, Y.; Sarker, A.; Zhang, H.; Kadambi, A.; Soatto, S.; Srivastava, M.; Wong, A. Depth Estimation From Camera Image and mmWave Radar Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 9275–9285. [Google Scholar]
- Saxena, A.; Chung, S.; Ng, A. Learning depth from single monocular images. In Proceedings of the Advances in Neural Information Processing Systems 18 (NIPS 2005), Vancouver, BC, Canada, 5–8 December 2005. [Google Scholar]
- Saxena, A.; Sun, M.; Ng, A. Make3d: Learning 3d scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 824–840. [Google Scholar] [CrossRef] [PubMed]
- Aich, S.; Vianney, J.; Islam, A.; Kaur, M.; Liu, B. Bidirectional Attention Network for Monocular Depth Estimation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11746–11752. [Google Scholar]
- Zhu, S.; Brazil, G.; Liu, X. The edge of depth: Explicit constraints between segmentation and depth. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13113–13122. [Google Scholar]
- Vijayanarasimhan, S.; Ricco, S.; Schmid, C.; Sukthankar, R.; Fragkiadaki, K. SfMNet: Learning of structure and motion from video. arXiv 2017, arXiv:1704.07804. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Ranftl, R.; Bochkovskiy, A.; Koltun, V. Vision Transformers for Dense Prediction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 12159–12168. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021. [Google Scholar]
- Wong, A.; Cicek, S.; Soatto, S. Learning topology from synthetic data for unsupervised depth completion. IEEE Robot. Autom. Lett. 2021, 6, 1495–1502. [Google Scholar] [CrossRef]
- Hu, M.; Wang, S.; Li, B.; Ning, S.; Fan, L.; Gong, X. PENet: Towards Precise and Efficient Image Guided Depth Completion. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 13656–13662. [Google Scholar]
- Van Gansbeke, W.; Neven, D.; De Brabandere, B.; Van Gool, L. Sparse and Noisy LiDAR Completion with RGB Guidance and Uncertainty. In Proceedings of the International Conference on Machine Vision and Applications (MVA), Tokyo, Japan, 27–31 May 2019. [Google Scholar]
- Li, A.; Yuan, Z.; Ling, Y.; Chi, W.; Zhang, S.; Zhang, C. A Multi-Scale Guided Cascade Hourglass Network for Depth Completion. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Snowmass, CO, USA, 1–5 March 2020; pp. 32–40. [Google Scholar]
- Qiu, J.; Cui, Z.; Zhang, Y.; Zhang, X.; Liu, S.; Zeng, B.; Pollefeys, M. Deeplidar: Deep surface normal guided depth prediction for outdoor scene from sparse lidar data and single color image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3308–3317. [Google Scholar]
- Jaritz, M.; De Charette, R.; Wirbel, E.; Perrotton, X.; Nashashibi, F. Sparse and dense data with CNNs: Depth completion and semantic segmentation. In Proceedings of the International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 52–60. [Google Scholar]
- Lo, C.-C.; Vandewalle, P. RCDPT: Radar-Camera Fusion Dense Prediction Transformer. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Bombay, India, 4–7 January 1998; pp. 839–846. [Google Scholar]
- Paris, S.; Kornprobst, P.; Tumblin, J.; Durand, F. Bilateral Filtering: Theory and Applications; Now Foundations and Trends: Norwell, MA, USA, 2009; Available online: https://ieeexplore.ieee.org/document/8187212 (accessed on 18 August 2009).
- Kopf, J.; Cohen, M.; Lischinski, D.; Uyttendaele, M. Joint bilateral upsampling. ACM Trans. Graph. 2007, 26, 96–es. [Google Scholar] [CrossRef]
- Yang, Q.; Yang, R.; Davis, J.; Nister, D. Spatial-depth super resolution for range images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar]
- Eisemann, E.; Durand, F. Flash photography enhancement via intrinsic relighting. ACM Trans. Graph. 2004, 23, 673–678. [Google Scholar] [CrossRef]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style High-Performance Deep Learning Library. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; pp. 8026–8037. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Roy, A.; Todorovic, S. Monocular Depth Estimation Using Neural Regression Forest. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5506–5514. [Google Scholar]
- Lo, C.-C.; Vandewalle, P. How Much Depth Information can Radar Contribute to a Depth Estimation Model? In Proceedings of the IS&T International Symposium on Electronic Imaging, San Francisco, CA, USA, 19 January 2023; pp. 122-1–122-7. [Google Scholar]
| Method | Lidar Supervision | δ1 ↑ | δ2 ↑ | RMSE ↓ | # Points ↑ | Density (%) ↑ |
|---|---|---|---|---|---|---|
| Raw | - | 0.41 | 0.61 | 29.93 | 39.01 | 0.01 |
| Height-extend [14] | - | 0.46 | 0.67 | 21.52 | 9187 | 2.55 |
| S³ (ad hoc) [18] | - | 0.43 | 0.63 | 28.04 | 6518 | 1.81 |
| MER [15] | ✓ | 0.73 | 0.85 | 11.29 | 25,370 | 7.05 |
| JBF | - | 0.59 | 0.77 | 14.64 | 103,249 | 28.68 |
| Method | Radar Format | δ1 ↑ | δ2 ↑ | δ3 ↑ | RMSE ↓ | AbsRel ↓ |
|---|---|---|---|---|---|---|
| DORN [5] | - | 0.872 | 0.952 | 0.978 | 5.382 | 0.117 |
| S2D [9] | - | 0.862 | 0.948 | 0.976 | 5.613 | 0.126 |
| DPT [26] | - | 0.886 | 0.957 | 0.980 | 5.244 | 0.106 |
| S2D [9] | Raw | 0.876 | 0.949 | 0.974 | 5.628 | 0.115 |
| DORNradar (1-stage) [14] | Height-extend | 0.889 | 0.961 | 0.984 | 5.191 | 0.109 |
| DORNradar (2-stage) [14] | Height-extend | 0.890 | 0.960 | 0.983 | 5.260 | 0.108 |
| Lin (1-stage) [13] | Raw | 0.884 | 0.953 | 0.977 | 5.409 | 0.112 |
| Lin (2-stage) [13] | Raw | 0.901 | 0.958 | 0.978 | 5.180 | 0.100 |
| Lee [16] | Raw | 0.895 | 0.958 | 0.978 | 5.209 | 0.104 |
| S³ [18] | | 0.798 | 0.921 | 0.962 | 6.77 | 0.161 |
| S³ (GDC) [18] | | 0.799 | 0.921 | 0.962 | 6.76 | 0.160 |
| FusionNet [19] | RadarNet | 0.87 | 0.95 | 0.98 | 5.79 | 0.12 |
| RC-PDA [15] | MER | 0.830 | 0.917 | 0.956 | 6.942 | 0.128 |
| DPT-Early [34] | MER | 0.892 | 0.956 | 0.978 | 5.401 | 0.099 |
| DPT-Late [34] | MER | 0.888 | 0.958 | 0.981 | 5.207 | 0.104 |
| RCDPT [34] | MER | 0.901 | 0.961 | 0.981 | 5.165 | 0.095 |
| DORNradar (1-stage) [14] | JBF | 0.901 | 0.962 | 0.981 | 5.228 | 0.104 |
| RCDPT [34] | JBF | 0.909 | 0.964 | 0.985 | 4.873 | 0.089 |
| Model | Input Radar | CAP (m) | δ1 ↑ | δ2 ↑ | RMSE ↓ | AbsRel ↓ |
|---|---|---|---|---|---|---|
| Lo [14] | Raw radar | 80 | 0.716 | 0.774 | 7.817 | 0.260 |
| Lo [14] | Height-extend [14] | 80 | 0.763 | 0.844 | 6.582 | 0.232 |
| Lo [14] | MER [15] | 80 | 0.736 | 0.902 | 7.781 | 0.227 |
| Lo [14] | JBF | 80 | 0.786 | 0.902 | 7.684 | 0.196 |
| Lin [13] | Raw radar | 80 | 0.714 | 0.768 | 8.151 | 0.247 |
| Lin [13] | Height-extend [14] | 80 | 0.783 | 0.865 | 6.404 | 0.220 |
| Lin [13] | MER [15] | 80 | 0.801 | 0.890 | 7.290 | 0.155 |
| Lin [13] | JBF | 80 | 0.785 | 0.901 | 7.698 | 0.179 |
| (σs, σr) | δ1 ↑ | δ2 ↑ | δ3 ↑ | RMSE ↓ | AbsRel ↓ |
|---|---|---|---|---|---|
| (10, 5) | 0.901 | 0.961 | 0.981 | 5.175 | 0.093 |
| (25, 10) | 0.909 | 0.964 | 0.985 | 4.873 | 0.089 |
| (50, 20) | 0.891 | 0.959 | 0.980 | 5.317 | 0.102 |
| Kernel | (σs, σr) | δ1 ↑ | δ2 ↑ | RMSE ↓ | # Points ↑ | Density (%) ↑ |
|---|---|---|---|---|---|---|
| JBF | (25, 10) | 0.59 | 0.77 | 14.64 | 103,249 | 28.68 |
| Range | (-, 10) | 0.54 | 0.69 | 19.62 | 181,609 | 50.44 |
| Spatial | (25, -) | 0.52 | 0.66 | 22.18 | 166,668 | 46.29 |
| Confidence Map | δ1 ↑ | δ2 ↑ | δ3 ↑ | RMSE ↓ | AbsRel ↓ |
|---|---|---|---|---|---|
| No | 0.909 | 0.964 | 0.985 | 4.873 | 0.089 |
| Yes | 0.911 | 0.967 | 0.986 | 4.735 | 0.087 |
Citation: Lo, C.-C.; Vandewalle, P. Expanding Sparse Radar Depth Based on Joint Bilateral Filter for Radar-Guided Monocular Depth Estimation. Sensors 2024, 24, 1864. https://doi.org/10.3390/s24061864