Conditional Generative Adversarial Network for Monocular Image Depth Map Prediction
Abstract
1. Introduction
2. Related Work
2.1. Depth Estimation Methods for Monocular Images Based on Deep Learning
2.2. Current Status of Generative Adversarial Networks
3. Methods
3.1. Network Structure
3.2. Generator
3.3. Discriminator
3.4. Loss Function
4. Experimental Results and Analysis
4.1. Experimental Design
4.2. Training Method of Model
Algorithm 1: Pseudo-code of monocular image depth prediction based on the cGAN
For the number of training iterations, do:
    For k steps do:

    End for

End for
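The update steps inside Algorithm 1 are not shown above. As a minimal sketch, assuming the algorithm follows the standard alternating schedule of the original GAN formulation (k discriminator updates followed by one generator update per training iteration), the loop structure could look as follows. The names `train_cgan`, `discriminator_step`, and `generator_step` are hypothetical placeholders and are not taken from the paper; the actual gradient updates are left to the caller.

```python
from typing import Callable, Dict


def train_cgan(
    num_iterations: int,
    k: int,
    discriminator_step: Callable[[], float],
    generator_step: Callable[[], float],
) -> Dict[str, int]:
    """Run the alternating schedule: k discriminator updates, then one generator update.

    Both callbacks are hypothetical: each is assumed to sample a minibatch,
    apply one optimizer update, and return the resulting loss value.
    """
    counts = {"d_updates": 0, "g_updates": 0}
    for _ in range(num_iterations):      # "For the number of training iterations, do"
        for _ in range(k):               # "For k steps do"
            discriminator_step()         # assumed: update the discriminator
            counts["d_updates"] += 1
        generator_step()                 # assumed: update the generator once
        counts["g_updates"] += 1
    return counts
```

With `k = 1` the schedule reduces to a strict one-to-one alternation between the two networks; larger values of `k` give the discriminator more updates per generator update.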
4.3. Experimental Results and Conclusions
4.3.1. NYU-V2 Dataset
- 1449 densely annotated aligned image-depth pairs;
- Data from 464 new scenes across 3 cities;
- 407,024 unannotated frames.
- Random noise addition: noise sampled from a Gaussian distribution with a mean of 0 and a variance of 1 is added to each random vector during each training epoch.
- Conditional vector addition: Use room type, indoor furniture, lighting, and other information vectors as conditional inputs to the generator to generate realistic images.
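The two operations above can be illustrated with a short sketch. It assumes that the random vector is a flat list of floats and that the conditional information is encoded as a one-hot vector concatenated to it; the encoding, the function names (`perturb_latent`, `one_hot`, `build_generator_input`), and the vector sizes are illustrative assumptions, not details given in the paper.

```python
import random
from typing import List


def perturb_latent(z: List[float], rng: random.Random) -> List[float]:
    """Add zero-mean, unit-variance Gaussian noise to every entry of z."""
    # Variance 1 corresponds to a standard deviation of 1.
    return [v + rng.gauss(0.0, 1.0) for v in z]


def one_hot(index: int, size: int) -> List[float]:
    """Hypothetical encoding of a categorical condition such as a room type."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_generator_input(z: List[float], condition: List[float]) -> List[float]:
    """Concatenate the noise vector and the condition vector (assumed scheme)."""
    return list(z) + list(condition)


rng = random.Random(0)                      # fixed seed for reproducibility
z = perturb_latent([0.0] * 8, rng)          # re-drawn in each training epoch
cond = one_hot(2, 5)                        # e.g. the third of five room types
generator_input = build_generator_input(z, cond)
```

Concatenation is only one way to inject a condition; how the paper's generator consumes the conditional vector is not specified in this section.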
4.3.2. Make3D Dataset
- Scaling: the input and target images are scaled with the corresponding depth data divided by .
- Rotation: the input and target images are rotated by degrees.
- Color adjustment: the input image is multiplied by a random RGB value .
- Flips: the input and target images are horizontally flipped with a probability of 0.5.
- Conditional vector addition: information vectors such as urban or rural setting, lighting, and roads can be added as conditional inputs to the generator to generate realistic images.
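Two of the augmentations listed above, the horizontal flip and the color adjustment, can be sketched in a few lines. This is an illustrative sketch only: images are represented as nested lists of RGB tuples and depth maps as nested lists of floats, the color range is a caller-supplied parameter because its value is not given here, and scaling and rotation are omitted because they require image resampling. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]
Depth = List[List[float]]


def horizontal_flip(image: Image, depth: Depth) -> Tuple[Image, Depth]:
    """Mirror the input image and the target depth map together."""
    return [row[::-1] for row in image], [row[::-1] for row in depth]


def color_adjust(image: Image, rgb_scale: Pixel) -> Image:
    """Multiply each channel of the input image by its own factor."""
    r, g, b = rgb_scale
    return [[(p[0] * r, p[1] * g, p[2] * b) for p in row] for row in image]


def augment(
    image: Image,
    depth: Depth,
    rng: random.Random,
    color_range: Tuple[float, float],
    flip_prob: float = 0.5,
) -> Tuple[Image, Depth]:
    """Apply a random flip (to both maps) and a random color factor (input only)."""
    if rng.random() < flip_prob:
        image, depth = horizontal_flip(image, depth)
    # color_range is an assumption supplied by the caller, not a value from the paper.
    low, high = color_range
    rgb_scale = (rng.uniform(low, high), rng.uniform(low, high), rng.uniform(low, high))
    return color_adjust(image, rgb_scale), depth
```

The flip is applied to the image and the depth map jointly so that pixel correspondences are preserved, whereas the color factor touches only the input image because depth values do not depend on color.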
5. Limitations
- The primary drawback of GANs is that they are difficult to train, and they remain so despite the various empirical tricks used to improve training efficiency (such as the batch normalization used in our proposed method).
- Compared with the latest monocular image depth estimation algorithms, the performance of our algorithm is not yet competitive, possibly because the generator structure has not been sufficiently optimized; in particular, no attention module has been added. Experiments have shown that the attention mechanism can significantly improve the accuracy and detail extraction of image depth estimation.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hao, S.; Zhang, L.; Qiu, K.; Zhang, Z. Conditional Generative Adversarial Network for Monocular Image Depth Map Prediction. Electronics 2023, 12, 1189. https://doi.org/10.3390/electronics12051189