Noise-Resilient Depth Estimation for Light Field Images Using Focal Stack and FFT Analysis
Abstract
1. Introduction
2. Our Contribution
- To reduce the dependence of depth accuracy on the RGB values of the individual pixels compared within image patches, we propose a method that uses frequency-domain analysis to estimate the depth map of light field images.
- The key contribution of our approach is noise resilience in depth estimation. Our analysis confirms the hypothesis that comparing focal stack image patches in the frequency domain improves depth map accuracy, especially in the presence of noise. We show that our algorithm outperforms current state-of-the-art benchmark algorithms in the presence of noise.
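The frequency-domain patch comparison described above can be sketched as a toy in NumPy. This is an illustration of the general idea, not the paper's implementation: the function names and the magnitude-spectrum distance are assumptions, and the paper's actual matching cost may differ.

```python
import numpy as np

def patch_distance_rgb(p, q):
    """Spatial-domain comparison: sum of squared per-pixel differences."""
    return float(np.sum((p - q) ** 2))

def patch_distance_fft(p, q):
    """Frequency-domain comparison: distance between FFT magnitude
    spectra. Discarding phase makes the score invariant to circular
    shifts and less sensitive to per-pixel perturbations."""
    mp = np.abs(np.fft.fft2(p))
    mq = np.abs(np.fft.fft2(q))
    return float(np.sum((mp - mq) ** 2))

rng = np.random.default_rng(0)
p = rng.random((8, 8))        # reference focal-stack patch
q = np.roll(p, 2, axis=1)     # same content, circularly shifted
# Pixel-wise SSD reports a large mismatch for the shifted copy,
# while the magnitude spectrum is identical up to rounding error.
```

Note that by Parseval's theorem the SSD between the full complex spectra equals the spatial SSD (up to normalization); any extra robustness comes specifically from comparing magnitudes and discarding phase.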
3. Background
4. Related Work
4.1. Depth Estimation Using Stereo Matching
4.2. Depth Estimation Using Epipolar Plane Images
4.3. Depth Estimation Using Defocus
4.4. Depth Estimation Using Convolutional Neural Networks
5. Methodology
5.1. Initial Depth Estimation
5.2. Focal Stack Generation and Image Pre-Processing
5.3. Patch Generation and Comparison
5.4. Depth Map Refinement
6. Misdetection Analysis
7. Experimental Results
7.1. Synthetic LF Images
7.2. Real LF Images
7.3. Noisy Image Analysis
7.4. Runtime Complexity Analysis
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhou, J.; Yang, D.; Cui, Z.; Wang, S.; Sheng, H. LRFNet: An Occlusion Robust Fusion Network for Semantic Segmentation with Light Field. In Proceedings of the 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), Virtual, 1–3 November 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1168–1178. [Google Scholar]
- Hu, X.; Yang, K.; Fei, L.; Wang, K. Acnet: Attention based network to exploit complementary features for rgbd semantic segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1440–1444. [Google Scholar]
- Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42. [Google Scholar] [CrossRef]
- Chen, C.; Lin, H.; Yu, Z.; Bing Kang, S.; Yu, J. Light field stereo matching using bilateral statistics of surface cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1518–1525. [Google Scholar]
- Zhang, Y.; Lv, H.; Liu, Y.; Wang, H.; Wang, X.; Huang, Q.; Xiang, X.; Dai, Q. Light-Field Depth Estimation via Epipolar Plane Image Analysis and Locally Linear Embedding. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 739–747. [Google Scholar] [CrossRef]
- Schechner, Y.Y.; Kiryati, N. Depth from Defocus vs. Stereo: How Different Really Are They? Int. J. Comput. Vis. 2000, 39, 141–162. [Google Scholar] [CrossRef]
- Feng, M.; Wang, Y.; Liu, J.; Zhang, L.; Zaki, H.F.M.; Mian, A. Benchmark Data Set and Method for Depth Estimation from Light Field Images. IEEE Trans. Image Process. 2018, 27, 3586–3598. [Google Scholar] [CrossRef] [PubMed]
- Heber, S.; Yu, W.; Pock, T. Neural EPI-Volume Networks for Shape from Light Field. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2271–2279. [Google Scholar] [CrossRef]
- Strecke, M.; Alperovich, A.; Goldluecke, B. Accurate depth and normal maps from occlusion-aware focal stack symmetry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 2814–2822. [Google Scholar]
- Wang, T.; Efros, A.A.; Ramamoorthi, R. Occlusion-Aware Depth Estimation Using Light-Field Cameras. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3487–3495. [Google Scholar] [CrossRef]
- Zhang, S.; Sheng, H.; Li, C.; Zhang, J.; Xiong, Z. Robust depth estimation for light field via spinning parallelogram operator. Comput. Vis. Image Underst. 2016, 145, 148–159. [Google Scholar] [CrossRef]
- Shin, C.; Jeon, H.G.; Yoon, Y.; Kweon, I.S.; Kim, S.J. Epinet: A fully-convolutional neural network using epipolar geometry for depth from light field images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4748–4757. [Google Scholar]
- Ng, R. Digital Light Field Photography; Stanford University: Stanford, CA, USA, 2006. [Google Scholar]
- Wu, G.; Masia, B.; Jarabo, A.; Zhang, Y.; Wang, L.; Dai, Q.; Chai, T.; Liu, Y. Light field image processing: An overview. IEEE J. Sel. Top. Signal Process. 2017, 11, 926–954. [Google Scholar] [CrossRef] [Green Version]
- Mousnier, A.; Vural, E.; Guillemot, C. Partial light field tomographic reconstruction from a fixed-camera focal stack. arXiv 2015, arXiv:1503.01903. [Google Scholar]
- Wilburn, B.; Joshi, N.; Vaish, V.; Talvala, E.V.; Antunez, E.; Barth, A.; Adams, A.; Horowitz, M.; Levoy, M. High performance imaging using large camera arrays. ACM Trans. Graph. TOG 2005, 24, 765–776. [Google Scholar] [CrossRef] [Green Version]
- Ng, R.; Levoy, M.; Brédif, M.; Duval, G.; Horowitz, M.; Hanrahan, P. Light field photography with a hand-held plenoptic camera. Comput. Sci. Tech. Rep. CSTR 2005, 2, 1–11. [Google Scholar]
- Levoy, M.; Hanrahan, P. Light field rendering. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’96), New Orleans, LA, USA, 4–9 August 1996; ACM: New York, NY, USA, 1996; pp. 31–42. [Google Scholar]
- Kolmogorov, V.; Zabih, R. Multi-camera scene reconstruction via graph cuts. In Proceedings of the European Conference on Computer Vision, Copenhagen, Denmark, 28–31 May 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 82–96. [Google Scholar]
- Woodford, O.; Torr, P.; Reid, I.; Fitzgibbon, A. Global stereo reconstruction under second-order smoothness priors. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 2115–2128. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Bleyer, M.; Rother, C.; Kohli, P. Surface stereo with soft segmentation. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1570–1577. [Google Scholar]
- Johannsen, O.; Sulc, A.; Goldluecke, B. What Sparse Light Field Coding Reveals about Scene Structure. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3262–3270. [Google Scholar] [CrossRef]
- Wanner, S.; Goldluecke, B. Reconstructing reflective and transparent surfaces from epipolar plane images. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); LNCS: Tokyo, Japan, 2013; Volume 8142, pp. 1–10. [Google Scholar] [CrossRef] [Green Version]
- Criminisi, A.; Kang, S.B.; Swaminathan, R.; Szeliski, R.; Anandan, P. Extracting layers and analyzing their specular properties using epipolar-plane-image analysis. Comput. Vis. Image Underst. 2005, 97, 51–85. [Google Scholar] [CrossRef]
- Wanner, S.; Goldluecke, B. Globally Consistent Depth Labeling of 4D Light Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 41–48. [Google Scholar] [CrossRef]
- Zhu, X.; Cohen, S.; Schiller, S.; Milanfar, P. Estimating Spatially Varying Defocus Blur from a Single Image. IEEE Trans. Image Process. 2013, 22, 4879–4891. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zhuo, S.; Sim, T. Defocus map estimation from a single image. Pattern Recognit. 2011, 44, 1852–1858. [Google Scholar] [CrossRef]
- Nayar, S.K.; Nakagawa, Y. Shape from focus. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 824–831. [Google Scholar] [CrossRef] [Green Version]
- Tao, M.W.; Hadap, S.; Malik, J.; Ramamoorthi, R. Depth from Combining Defocus and Correspondence Using Light-Field Cameras. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 673–680. [Google Scholar] [CrossRef]
- Heber, S.; Pock, T. Convolutional Networks for Shape from Light Field. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3746–3754. [Google Scholar] [CrossRef]
- 3dMD Laser Scanner. 2018. Available online: https://3dmd.com (accessed on 18 December 2021).
- Honauer, K.; Johannsen, O.; Kondermann, D.; Goldluecke, B. A dataset and evaluation methodology for depth estimation on 4d light fields. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Springer: Berlin/Heidelberg, Germany, 2017; pp. 19–34. [Google Scholar]
- Rerabek, M.; Ebrahimi, T. New light field image dataset. In Proceedings of the 8th International Conference on Quality of Multimedia Experience (QoMEX), Lisbon, Portugal, 6–8 June 2016. [Google Scholar]
| Dots | FFT Patch Depth Map | RGB Patch Depth Map |
|---|---|---|
Badpix 0.07 | 0.9705 | 0.7760 |
Badpix 0.03 | 0.8853 | 0.5967 |
Badpix 0.01 | 0.3880 | 0.2391 |
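The Badpix scores in this and the following tables appear to report the fraction of pixels whose absolute disparity error falls within a threshold (0.07, 0.03, or 0.01), so higher is better. A minimal sketch under that assumption (the function name is illustrative, not from the paper):

```python
import numpy as np

def within_threshold_score(est, gt, t):
    """Fraction of pixels whose absolute disparity error is at most t.
    1.0 means every pixel is within the threshold."""
    return float(np.mean(np.abs(est - gt) <= t))

gt  = np.zeros((4, 4))                    # ground-truth disparity
est = np.zeros((4, 4)); est[0, 0] = 0.05  # one pixel off by 0.05
print(within_threshold_score(est, gt, 0.07))  # 1.0   (all 16 within 0.07)
print(within_threshold_score(est, gt, 0.03))  # 0.9375 (15 of 16 within 0.03)
```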
| | Backgammon | Dots | Kitchen | Medieval2 | Museum |
|---|---|---|---|---|---|
Proposed Results | |||||
Badpix7 | 0.8230 | 0.9605 | 0.7010 | 0.9362 | 0.8440 |
Badpix3 | 0.7324 | 0.8853 | 0.5941 | 0.8528 | 0.7772 |
Badpix1 | 0.4910 | 0.3880 | 0.3749 | 0.5514 | 0.5305 |
Strecke et al. [9] | |||||
Badpix7 | 0.9580 | 0.6273 | 0.7224 | 0.9608 | 0.8578 |
Badpix3 | 0.9283 | 0.4514 | 0.6282 | 0.8895 | 0.7615 |
Badpix1 | 0.6606 | 0.1777 | 0.4644 | 0.6469 | 0.5256 |
Wang et al. [10] | |||||
Badpix7 | 0.8753 | 0.8801 | 0.6300 | 0.5136 | 0.8522 |
Badpix3 | 0.4525 | 0.2485 | 0.3991 | 0.1119 | 0.6902 |
Badpix1 | 0.0544 | 0.0456 | 0.1772 | 0.0370 | 0.2741 |
Zhang et al. [11] | |||||
Badpix7 | 0.7889 | 0.7358 | 0.6379 | 0.9580 | 0.8940 |
Badpix3 | 0.3762 | 0.4810 | 0.3165 | 0.7513 | 0.5413 |
Badpix1 | 0.1057 | 0.4810 | 0.0997 | 0.2658 | 0.1899 |
Shin et al. [12] | |||||
Badpix7 | 0.9777 | 0.9473 | 0.7931 | 0.9847 | 0.9598 |
Badpix3 | 0.9594 | 0.7957 | 0.7209 | 0.9584 | 0.9053 |
Badpix1 | 0.8265 | 0.5122 | 0.4809 | 0.7263 | 0.6478 |
| | Pillows | Platonic | Pyramids | Stripes | Tomb |
|---|---|---|---|---|---|
Proposed Results | |||||
Badpix7 | 0.9212 | 0.9747 | 0.9920 | 0.8853 | 0.9696 |
Badpix3 | 0.8769 | 0.9447 | 0.9582 | 0.8275 | 0.9100 |
Badpix1 | 0.6096 | 0.7600 | 0.7485 | 0.6732 | 0.6423 |
Strecke et al. [9] | |||||
Badpix7 | 0.9710 | 0.9645 | 0.9969 | 0.8741 | 0.9813 |
Badpix3 | 0.8687 | 0.9230 | 0.9927 | 0.8556 | 0.9252 |
Badpix1 | 0.4914 | 0.7792 | 0.9417 | 0.4925 | 0.6875 |
Wang et al. [10] | |||||
Badpix7 | 0.9387 | 0.6583 | 0.9843 | 0.8231 | 0.7953 |
Badpix3 | 0.5611 | 0.4620 | 0.7520 | 0.0048 | 0.4134 |
Badpix1 | 0.1492 | 0.1889 | 0.0737 | 0.0004 | 0.1359 |
Zhang et al. [11] | |||||
Badpix7 | 0.9398 | 0.9906 | 0.8958 | 0.8373 | 0.9622 |
Badpix3 | 0.5066 | 0.7454 | 0.1885 | 0.5243 | 0.7500 |
Badpix1 | 0.1869 | 0.2946 | 0.0634 | 0.5243 | 0.2871 |
Shin et al. [12] | |||||
Badpix7 | 0.9939 | 0.9981 | 0.9972 | 0.9894 | 0.9963 |
Badpix3 | 0.9772 | 0.9941 | 0.9917 | 0.9865 | 0.9826 |
Badpix1 | 0.7727 | 0.7273 | 0.8673 | 0.8869 | 0.6453 |
| Dots | Proposed Result | Strecke et al. [9] | Wang et al. [10] | Zhang et al. [11] | Shin et al. [12] |
|---|---|---|---|---|---|
Badpix7 | 0.9605 | 0.6273 | 0.8800 | 0.7357 | 0.9473 |
Badpix3 | 0.8853 | 0.4514 | 0.2485 | 0.4810 | 0.7957 |
Badpix1 | 0.3880 | 0.1777 | 0.0456 | 0.4810 | 0.5122 |
| | Backgammon | Dots | Kitchen | Medieval2 | Museum |
|---|---|---|---|---|---|
Proposed Results | |||||
Badpix7 | 0.7408 | 0.9620 | 0.6341 | 0.8171 | 0.6921 |
Badpix3 | 0.5126 | 0.8561 | 0.4323 | 0.5584 | 0.4889 |
Badpix1 | 0.2101 | 0.2808 | 0.1733 | 0.2290 | 0.2023 |
Strecke et al. [9] | |||||
Badpix7 | 0.2781 | 0.3975 | 0.2309 | 0.3024 | 0.1938 |
Badpix3 | 0.1312 | 0.1895 | 0.1140 | 0.1419 | 0.0919 |
Badpix1 | 0.0450 | 0.0634 | 0.0402 | 0.0487 | 0.0318 |
Wang et al. [10] | |||||
Badpix7 | 0.0022 | 0.8619 | 0.1229 | 0.0892 | 0.1868 |
Badpix3 | 0.0008 | 0.7684 | 0.0570 | 0.0340 | 0.0790 |
Badpix1 | 0.0003 | 0.1351 | 0.0197 | 0.0083 | 0.0260 |
Zhang et al. [11] | |||||
Badpix7 | 0.0144 | 0.0002 | 0.1587 | 0.2456 | 0.1054 |
Badpix3 | 0.0057 | 0.0001 | 0.0666 | 0.1022 | 0.0517 |
Badpix1 | 0.0019 | 0.0000 | 0.0217 | 0.0322 | 0.0174 |
Shin et al. [12] | |||||
Badpix7 | 0.5778 | 0.8990 | 0.5035 | 0.6512 | 0.5237 |
Badpix3 | 0.3265 | 0.6624 | 0.3090 | 0.3898 | 0.3112 |
Badpix1 | 0.1162 | 0.3034 | 0.1247 | 0.1451 | 0.1181 |
| | Pillows | Platonic | Pyramids | Stripes | Tomb |
|---|---|---|---|---|---|
Proposed Results | |||||
Badpix7 | 0.6823 | 0.8457 | 0.9891 | 0.3582 | 0.8008 |
Badpix3 | 0.5099 | 0.6272 | 0.9108 | 0.1982 | 0.5089 |
Badpix1 | 0.2417 | 0.2600 | 0.4835 | 0.0930 | 0.1877 |
Strecke et al. [9] | |||||
Badpix7 | 0.3212 | 0.3093 | 0.6557 | 0.1388 | 0.1404 |
Badpix3 | 0.1698 | 0.1476 | 0.3456 | 0.0641 | 0.0615 |
Badpix1 | 0.0613 | 0.0511 | 0.1136 | 0.0212 | 0.0206 |
Wang et al. [10] | |||||
Badpix7 | 0.1472 | 0.2515 | 0.6136 | 0.2488 | 0.0561 |
Badpix3 | 0.0700 | 0.1126 | 0.0870 | 0.1106 | 0.0209 |
Badpix1 | 0.0257 | 0.0366 | 0.0141 | 0.0014 | 0.0069 |
Zhang et al. [11] | |||||
Badpix7 | 0.1794 | 0.0394 | 0.2569 | 0.0419 | 0.0530 |
Badpix3 | 0.0814 | 0.0148 | 0.0917 | 0.0219 | 0.0214 |
Badpix1 | 0.0273 | 0.0045 | 0.0115 | 0.0078 | 0.0075 |
Shin et al. [12] | |||||
Badpix7 | 0.6000 | 0.6621 | 0.9729 | 0.2084 | 0.3957 |
Badpix3 | 0.4383 | 0.3836 | 0.8325 | 0.1051 | 0.1904 |
Badpix1 | 0.2193 | 0.1461 | 0.4132 | 0.0372 | 0.0664 |
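The noisy-image comparison above requires corrupting the light field views before depth estimation. The exact noise model is not reproduced in this section; a common setup, sketched under that assumption, adds zero-mean Gaussian noise and clips back to the valid intensity range:

```python
import numpy as np

def add_gaussian_noise(img, sigma, seed=0):
    """Corrupt an image (values in [0, 1]) with zero-mean Gaussian
    noise of standard deviation sigma, then clip to the valid range.
    A fixed seed keeps the benchmark reproducible."""
    rng = np.random.default_rng(seed)
    noisy = img + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

clean = np.linspace(0.0, 1.0, 16).reshape(4, 4)
noisy = add_gaussian_noise(clean, sigma=0.1)
```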
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sharma, R.; Perry, S.; Cheng, E. Noise-Resilient Depth Estimation for Light Field Images Using Focal Stack and FFT Analysis. Sensors 2022, 22, 1993. https://doi.org/10.3390/s22051993