FPattNet: A Multi-Scale Feature Fusion Network with Occlusion Awareness for Depth Estimation of Light Field Images
1. Introduction
- Occlusion awareness: The depth estimation network is specifically designed to handle occluded scenes. We employ a multi-scale feature fusion strategy that combines contextual information from several feature levels, enabling the model to handle challenging regions such as textureless and occluded areas.
- Scene-specific depth estimation: The optimal combination of views for accurate depth estimation can vary from scene to scene. With a view selection module, our network can dynamically adapt its focus to the most relevant views in each scene.
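The view selection idea in the second contribution can be illustrated with a minimal sketch. It assumes that each view receives one scalar score and that the scores are normalized with a softmax; the function name `select_views`, the score vector, and the softmax normalization are illustrative assumptions rather than the module described in Section 3.3, which may weight views differently.

```python
import numpy as np


def select_views(view_features, view_scores):
    """Reweight per-view features with normalized attention weights.

    view_features: array of shape (N, H, W, C), one feature map per view.
    view_scores:   N unnormalized scores (hypothetical; in a trained
                   network these would be produced by a learned layer).
    Returns the reweighted features and the weights that were applied.
    """
    scores = np.asarray(view_scores, dtype=np.float64)
    # Numerically stable softmax over the view axis (assumed normalization).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Broadcast one weight per view over the spatial and channel axes.
    weighted = view_features * weights[:, None, None, None]
    return weighted, weights


# Toy usage: three views, the third one scored as most relevant.
features = np.ones((3, 2, 2, 4))
weighted, weights = select_views(features, [0.0, 1.0, 2.0])
```

Views with higher scores contribute more to whatever is built from the reweighted features, which is the sense in which the network can adapt its focus per scene.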
2. Related Work
2.1. Quality Assessment of Light Field Images
2.2. Optimization-Based Depth Estimation Methods
2.2.1. Methods Based on Multi-View Stereo Matching
2.2.2. Methods Based on EPI
2.2.3. Refocusing-Based Methods
2.3. Learning-Based Depth Estimation Methods
3. Methodology
3.1. Feature Extraction and Feature Pyramid Network
3.2. Cost Volume Construction
3.3. View Selection Module
3.4. Disparity Regression and Loss
4. Experiment and Discussion
4.1. 4D Light Field Dataset
4.2. Implementation Details
4.3. Evaluation
4.4. Ablation Experiment
4.4.1. The Effect of the Feature Pyramid Networks
4.4.2. The Effect of the View Selection Module
5. Conclusions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A Survey of Autonomous Driving: Common Practices and Emerging Technologies. IEEE Access 2020, 8, 58443–58469.
2. De Jesus, J.C.; Kich, V.A.; Kolling, A.H.; Grando, R.B.; Guerra, R.S.; Drews, P.L.J. Depth-CUPRL: Depth-Imaged Contrastive Unsupervised Prioritized Representations in Reinforcement Learning for Mapless Navigation of Unmanned Aerial Vehicles. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 10579–10586.
3. Li, C.-C.; Shuai, H.-H.; Wang, L.-C. Efficiency-Reinforced Learning with Auxiliary Depth Reconstruction for Autonomous Navigation of Mobile Devices. In Proceedings of the 2022 23rd IEEE International Conference on Mobile Data Management (MDM), Paphos, Cyprus, 6–9 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 458–463.
4. Kim, C.; Zimmer, H.; Pritch, Y.; Sorkine-Hornung, A.; Gross, M. Scene Reconstruction from High Spatio-Angular Resolution Light Fields. ACM Trans. Graph. 2013, 32, 73.
5. Geiger, A.; Ziegler, J.; Stiller, C. StereoScan: Dense 3D Reconstruction in Real-Time. In Proceedings of the 2011 IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, 5–9 June 2011; IEEE: New York, NY, USA, 2011; pp. 963–968.
6. El Jamiy, F.; Marsh, R. Survey on Depth Perception in Head Mounted Displays: Distance Estimation in Virtual Reality, Augmented Reality, and Mixed Reality. IET Image Process. 2019, 13, 707–712.
7. Choi, M.-H.; Yi, W.-J.; Choi, S.-E.; Kang, S.-R.; Yoo, J.-Y.; Yang, S.; Kim, J.-E.; Huh, K.-H.; Lee, S.-S.; Heo, M.-S. Markerless Registration for Augmented-Reality Surgical Navigation System Based on Monocular Depth Estimation. Trans. Korean Inst. Electr. Eng. 2021, 70, 1898–1905.
8. Tao, Y.; Xiong, S.; Conway, S.J.; Muller, J.-P.; Guimpier, A.; Fawdon, P.; Thomas, N.; Cremonese, G. Rapid Single Image-Based DTM Estimation from ExoMars TGO CaSSIS Images Using Generative Adversarial U-Nets. Remote Sens. 2021, 13, 2877.
9. Lore, K.G.; Reddy, K.; Giering, M.; Bernal, E.A. Generative Adversarial Networks for Depth Map Estimation from RGB Video. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; IEEE: New York, NY, USA, 2018; pp. 1258–1266.
10. Tao, Y.; Muller, J.-P.; Xiong, S.; Conway, S.J. MADNet 2.0: Pixel-Scale Topography Retrieval from Single-View Orbital Imagery of Mars Using Deep Learning. Remote Sens. 2021, 13, 4220.
11. Raytrix|3D Light Field Camera Technology. Available online: https://raytrix.de/ (accessed on 1 July 2023).
12. Heber, S.; Pock, T. Shape from Light Field Meets Robust PCA. In Computer Vision – ECCV 2014; Lecture Notes in Computer Science; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; Volume 8694, pp. 751–767. ISBN 978-3-319-10598-7.
13. Jeon, H.-G.; Park, J.; Choe, G.; Park, J.; Bok, Y.; Tai, Y.-W.; Kweon, I.S. Accurate Depth Map Estimation from a Lenslet Light Field Camera. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; IEEE: New York, NY, USA, 2015; pp. 1547–1555.
14. Zhang, S.; Sheng, H.; Li, C.; Zhang, J.; Xiong, Z. Robust Depth Estimation for Light Field via Spinning Parallelogram Operator. Comput. Vis. Image Underst. 2016, 145, 148–159.
15. Wanner, S.; Goldluecke, B. Globally Consistent Depth Labeling of 4D Light Fields. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; IEEE: New York, NY, USA, 2012; pp. 41–48.
16. Shin, C.; Jeon, H.-G.; Yoon, Y.; Kweon, I.S.; Kim, S.J. EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; IEEE: New York, NY, USA, 2018; pp. 4748–4757.
17. Leistner, T.; Schilling, H.; Mackowiak, R.; Gumhold, S.; Rother, C. Learning to Think Outside the Box: Wide-Baseline Light Field Depth Estimation with EPI-Shift. In Proceedings of the 2019 International Conference on 3D Vision (3DV), Quebec City, QC, Canada, 16–19 September 2019; pp. 249–257.
18. Shi, L.; Zhao, S.; Chen, Z. BELIF: Blind Quality Evaluator of Light Field Image with Tensor Structure Variation Index. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 3781–3785.
19. Shi, L.; Zhou, W.; Chen, Z.; Zhang, J. No-Reference Light Field Image Quality Assessment Based on Spatial-Angular Measurement. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4114–4128.
20. Zhou, W.; Shi, L.; Chen, Z.; Zhang, J. Tensor Oriented No-Reference Light Field Image Quality Assessment. IEEE Trans. Image Process. 2020, 29, 4070–4084.
21. Meng, C.; An, P.; Huang, X.; Yang, C.; Shen, L.; Wang, B. Objective Quality Assessment of Lenslet Light Field Image Based on Focus Stack. IEEE Trans. Multimed. 2022, 24, 3193–3207.
22. Bishop, T.E.; Favaro, P. The Light Field Camera: Extended Depth of Field, Aliasing, and Superresolution. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 972–986.
23. Yu, Z.; Guo, X.; Ling, H.; Lumsdaine, A.; Yu, J. Line Assisted Light Field Triangulation and Stereo Matching. In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, NSW, Australia, 1–8 December 2013; IEEE: New York, NY, USA, 2013; pp. 2792–2799.
24. Williem; Park, I.K.; Lee, K.M. Robust Light Field Depth Estimation Using Occlusion-Noise Aware Data Costs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2484–2497.
25. Chen, C.; Lin, H.; Yu, Z.; Kang, S.B.; Yu, J. Light Field Stereo Matching Using Bilateral Statistics of Surface Cameras. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; IEEE: New York, NY, USA, 2014; pp. 1518–1525.
26. Bolles, R.; Baker, H.; Marimont, D. Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion. Int. J. Comput. Vis. 1987, 1, 7–55.
27. Matoušek, M.; Werner, T.; Hlaváč, V. Accurate Correspondences from Epipolar Plane Images. In Proceedings of the Computer Vision Winter Workshop, Brno, Czech Republic, 5–7 February 2001; Citeseer: University Park, PA, USA, 2001; pp. 181–189.
28. Criminisi, A.; Kang, S.B.; Swaminathan, R.; Szeliski, R.; Anandan, P. Extracting Layers and Analyzing Their Specular Properties Using Epipolar-Plane-Image Analysis. Comput. Vis. Image Underst. 2005, 97, 51–85.
29. Tao, M.W.; Hadap, S.; Malik, J.; Ramamoorthi, R. Depth from Combining Defocus and Correspondence Using Light-Field Cameras. In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, NSW, Australia, 1–8 December 2013; IEEE: New York, NY, USA, 2013; pp. 673–680.
30. Mousnier, A.; Vural, E.; Guillemot, C. Partial Light Field Tomographic Reconstruction from a Fixed-Camera Focal Stack. arXiv 2015, arXiv:1503.01903.
31. Heber, S.; Yu, W.; Pock, T. Neural EPI-Volume Networks for Shape from Light Field. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: New York, NY, USA, 2017; pp. 2271–2279.
32. Heber, S.; Pock, T. Convolutional Networks for Shape from Light Field. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 3746–3754.
33. Tsai, Y.-J.; Liu, Y.-L.; Ouhyoung, M.; Chuang, Y.-Y. Attention-Based View Selection Networks for Light-Field Disparity Estimation. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Association for the Advancement of Artificial Intelligence: Palo Alto, CA, USA, 2020; Volume 34, pp. 12095–12103.
34. Lin, J.C.Z. Attention-Based Multi-Level Fusion Network for Light Field Depth Estimation. Available online: https://aaai.org/papers/01009-attention-based-multi-level-fusion-network-for-light-field-depth-estimation/ (accessed on 23 July 2023).
35. Wang, Y.; Wang, L.; Liang, Z.; Yang, J.; An, W.; Guo, Y. Occlusion-Aware Cost Constructor for Light Field Depth Estimation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 19777–19786.
36. Wang, Y.; Wang, L.; Wu, G.; Yang, J.; An, W.; Yu, J.; Guo, Y. Disentangling Light Fields for Super-Resolution and Disparity Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 425–443.
37. Yu, F.; Koltun, V.; Funkhouser, T. Dilated Residual Networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017; pp. 636–644.
38. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; IEEE: New York, NY, USA, 2015; pp. 1440–1448.
39. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
40. Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017; pp. 936–944.
41. Chang, J.-R.; Chen, Y.-S. Pyramid Stereo Matching Network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; IEEE: New York, NY, USA, 2018; pp. 5410–5418.
42. Zbontar, J.; LeCun, Y. Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches. J. Mach. Learn. Res. 2016, 17, 65.
43. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision – ECCV 2018, Part VII, Munich, Germany, 8–14 September 2018; Springer International Publishing AG: Cham, Switzerland, 2018; Volume 11211, pp. 3–19.
44. Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P.; Kennedy, R.; Bachrach, A.; Bry, A. End-to-End Learning of Geometry and Context for Deep Stereo Regression. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: New York, NY, USA, 2017; pp. 66–75.
45. Honauer, K.; Johannsen, O.; Kondermann, D.; Goldluecke, B. A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields. In Proceedings of the Computer Vision – ACCV 2016, Part III, Taipei, Taiwan, 20–24 November 2016; Springer International Publishing AG: Cham, Switzerland, 2017; Volume 10113, pp. 19–34.
46. Jeon, H.-G.; Park, J.; Choe, G.; Park, J.; Bok, Y.; Tai, Y.-W.; Kweon, I.S. Depth from a Light Field Image with Learning-Based Matching Costs. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 297–310.
47. Schilling, H.; Diebold, M.; Rother, C.; Jaehne, B. Trust Your Model: Light Field Depth Estimation with Inline Occlusion Handling. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018; pp. 4530–4538.
48. Huang, Z.; Hu, X.; Xue, Z.; Xu, W.; Yue, T. Fast Light-Field Disparity Estimation with Multi-Disparity-Scale Cost Aggregation. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 10–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 6300–6309.
49. Luo, Y.; Zhou, W.; Fang, J.; Liang, L.; Zhang, H.; Dai, G. EPI-Patch Based Convolutional Neural Network for Depth Estimation on 4D Light Field. In Proceedings of the Neural Information Processing (ICONIP 2017), Part III, Guangzhou, China, 14–18 November 2017; Springer International Publishing AG: Cham, Switzerland, 2017; Volume 10636, pp. 642–652.
50. Sheng, H.; Cong, R.; Yang, D.; Chen, R.; Wang, S.; Cui, Z. UrbanLF: A Comprehensive Light Field Dataset for Semantic Segmentation of Urban Scenes. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7880–7893.
Layers | Kernel Size | Input Size | Output Size |
Feature Extraction | | | |
Conv2D_1 | 3 × 3 | N × H × W × 1 | N × H × W × 4 |
Conv2D_2 | 3 × 3 | N × H × W × 4 | N × H × W × 4 |
ResBlock2D × 13 | 3 × 3 | N × H × W × 4 | N × (H/16) × (W/16) × 64 |
FPN Module | 1 × 1 | N × (H/16) × (W/16) × 64 | N × (H/8) × (W/8) × 8 |
FPN Module | 1 × 1 | N × (H/8) × (W/8) × 8 | N × (H/4) × (W/4) × 16 |
FPN Module | 1 × 1 | N × (H/4) × (W/4) × 16 | N × (H/2) × (W/2) × 24 |
FPN Module | 1 × 1 | N × (H/2) × (W/2) × 24 | N × H × W × 32 |
Cost Volume Construction | | | |
Shift&Concat | - | N × H × W × 4 | D × H × W × (4 × N) |
Attention Module | 1 × 1 × 1 | D × H × W × (4 × N) | D × H × W × (4 × N) |
Cost Aggregation | | | |
Conv3D_1 | 3 × 3 × 3 | D × H × W × (4 × N) | D × H × W × 150 |
Conv3D_2 | 3 × 3 × 3 | D × H × W × 150 | D × H × W × 150 |
ResBlock3D × 2 | 3 × 3 × 3, 3 × 3 × 3 | D × H × W × 150 | D × H × W × 150 |
Conv3D_3 | 3 × 3 × 3 | D × H × W × 150 | D × H × W × 150 |
Cost | 3 × 3 × 3 | D × H × W × 150 | D × H × W × 1 |
Squeeze&Transpose | - | D × H × W × 1 | H × W × D |
Disparity Regression | | | |
Softmax | - | H × W × D | H × W × D |
Regress | - | H × W × D | H × W × 1 |
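The Softmax and Regress rows at the end of the architecture table, which map an H × W × D volume to an H × W × 1 disparity map, can be sketched as a softmax-weighted expectation over candidate disparities (often called soft argmin in the stereo literature). This is a minimal sketch under stated assumptions: the function name `regress_disparity` is hypothetical, and treating larger volume values as more likely is an assumption, since some networks negate the cost before the softmax.

```python
import numpy as np


def regress_disparity(cost, disparities):
    """Softmax over the disparity axis followed by an expectation.

    cost:        array of shape (H, W, D); larger values are assumed to
                 mean a more likely disparity (sign convention assumed).
    disparities: array of shape (D,) with the candidate disparity values.
    Returns an array of shape (H, W, 1).
    """
    # Subtract the per-pixel maximum for numerical stability.
    shifted = cost - cost.max(axis=-1, keepdims=True)
    prob = np.exp(shifted)
    prob /= prob.sum(axis=-1, keepdims=True)
    # Expected disparity under the per-pixel probability distribution.
    return (prob * disparities).sum(axis=-1, keepdims=True)


# Toy usage with a hypothetical range of nine candidate disparities.
candidates = np.linspace(-4.0, 4.0, 9)
volume = np.zeros((2, 3, 9))
volume[..., 6] = 50.0  # strongly favor the candidate at index 6
disparity_map = regress_disparity(volume, candidates)
```

Because the output is a weighted average rather than a hard argmax, it is differentiable and can take sub-sample values between the candidate disparities.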
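Methods | Backgammon BadPix 0.01 | Backgammon BadPix 0.03 | Backgammon BadPix 0.07 | Backgammon MSE | Dots BadPix 0.01 | Dots BadPix 0.03 | Dots BadPix 0.07 | Dots MSE | Pyramids BadPix 0.01 | Pyramids BadPix 0.03 | Pyramids BadPix 0.07 | Pyramids MSE | Stripes BadPix 0.01 | Stripes BadPix 0.03 | Stripes BadPix 0.07 | Stripes MSE |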
cae [24] | 17.32 | 4.313 | 3.924 | 6.074 | 83.70 | 42.50 | 12.40 | 5.082 | 27.54 | 7.162 | 1.681 | 0.048 | 39.95 | 16.90 | 2.405 | 3.556 |
spo_lf4cv [14] | 49.94 | 8.639 | 3.781 | 4.587 | 58.08 | 35.07 | 16.27 | 5.238 | 79.21 | 6.263 | 0.861 | 0.043 | 21.88 | 15.46 | 14.99 | 6.955 |
ps_rf25 [46] | 74.66 | 13.94 | 7.142 | 6.892 | 78.80 | 17.54 | 7.975 | 8.338 | 83.23 | 6.235 | 0.107 | 0.043 | 41.65 | 5.790 | 2.964 | 1.382 |
epnosgc [49] | 55.98 | 10.56 | 3.328 | 3.699 | 84.91 | 82.74 | 39.25 | 22.37 | 28.56 | 3.169 | 0.242 | 0.018 | 28.17 | 19.59 | 18.54 | 8.731 |
FastLFNet [48] | 39.84 | 11.41 | 5.138 | 3.986 | 68.15 | 41.11 | 21.17 | 3.407 | 22.19 | 2.193 | 0.620 | 0.018 | 63.40 | 32.59 | 9.442 | 0.892 |
obercrossanp [47] | 13.66 | 4.952 | 3.413 | 4.799 | 73.13 | 37.66 | 0.974 | 1.757 | 8.171 | 1.130 | 0.364 | 0.008 | 44.72 | 9.352 | 3.065 | 1.435 |
distgdisp [36] | 26.42 | 10.72 | 5.594 | 4.542 | 32.05 | 8.375 | 2.994 | 1.525 | 3.393 | 0.609 | 0.188 | 0.005 | 17.49 | 6.836 | 3.974 | 0.924 |
Epinet-fcn [16] | 20.90 | 6.289 | 3.580 | 3.629 | 41.05 | 12.74 | 3.183 | 1.635 | 11.88 | 0.913 | 0.192 | 0.008 | 15.67 | 3.115 | 2.462 | 0.950 |
Epinet-fcn 9 × 9 [16] | 15.40 | 4.482 | 3.287 | 3.909 | 44.65 | 18.71 | 4.030 | 1.980 | 8.913 | 0.604 | 0.147 | 0.007 | 14.76 | 2.876 | 2.413 | 0.915 |
Epinet-fcnm [16] | 19.44 | 5.563 | 3.501 | 3.705 | 35.62 | 9.117 | 2.490 | 1.475 | 11.43 | 0.874 | 0.159 | 0.007 | 11.77 | 2.711 | 2.457 | 0.932 |
AttMLFNet [34] | 13.73 | 4.625 | 3.228 | 3.863 | 10.61 | 2.021 | 1.606 | 1.035 | 1.767 | 0.429 | 0.174 | 0.003 | 15.44 | 4.743 | 2.932 | 0.814 |
LFattNet [33] | 11.58 | 3.984 | 3.126 | 3.648 | 15.06 | 3.012 | 1.432 | 1.425 | 2.063 | 0.489 | 0.195 | 0.004 | 18.21 | 5.417 | 2.933 | 0.892 |
FPattNet (Ours) | 11.05 | 4.289 | 3.296 | 3.808 | 11.47 | 1.540 | 0.960 | 1.352 | 1.615 | 0.436 | 0.215 | 0.004 | 14.07 | 5.839 | 3.587 | 0.838 |
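Methods | boxes BadPix 0.01 | boxes BadPix 0.03 | boxes BadPix 0.07 | boxes MSE | cotton BadPix 0.01 | cotton BadPix 0.03 | cotton BadPix 0.07 | cotton MSE | dino BadPix 0.01 | dino BadPix 0.03 | dino BadPix 0.07 | dino MSE | sideboard BadPix 0.01 | sideboard BadPix 0.03 | sideboard BadPix 0.07 | sideboard MSE |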
cae [24] | 72.69 | 40.40 | 17.88 | 8.424 | 59.22 | 15.50 | 3.369 | 1.506 | 61.06 | 21.30 | 4.968 | 0.382 | 56.92 | 26.85 | 9.845 | 0.876 |
spo_lf4cv [14] | 73.23 | 29.53 | 15.89 | 9.107 | 69.06 | 13.71 | 2.594 | 1.313 | 69.88 | 16.36 | 2.184 | 0.310 | 73.37 | 28.81 | 9.297 | 1.024 |
ps_rf25 [46] | 76.39 | 35.23 | 18.95 | 9.043 | 70.41 | 14.98 | 2.426 | 1.161 | 75.97 | 16.44 | 4.379 | 0.751 | 79.98 | 36.28 | 11.75 | 1.945 |
epnosgc [49] | 67.35 | 29.01 | 15.30 | 9.314 | 54.85 | 9.767 | 2.060 | 1.406 | 58.79 | 12.79 | 2.877 | 0.565 | 66.35 | 23.87 | 7.997 | 1.744 |
FastLFNet [48] | 71.82 | 37.45 | 18.70 | 4.395 | 49.34 | 6.785 | 0.714 | 0.322 | 56.24 | 13.27 | 2.407 | 0.189 | 61.96 | 21.62 | 7.032 | 0.747 |
obercrossanp [47] | 44.96 | 17.92 | 10.76 | 4.750 | 36.79 | 7.722 | 1.018 | 0.555 | 22.76 | 6.161 | 2.070 | 0.366 | 32.79 | 12.48 | 5.671 | 0.941 |
distgdisp [36] | 41.37 | 20.64 | 12.83 | 3.743 | 6.343 | 1.216 | 0.451 | 0.225 | 21.68 | 4.031 | 1.545 | 0.132 | 26.98 | 8.983 | 3.768 | 0.578 |
Epinet-fcn [16] | 49.04 | 19.76 | 12.84 | 6.240 | 28.07 | 2.310 | 0.508 | 0.191 | 22.40 | 3.452 | 1.286 | 0.167 | 41.88 | 12.08 | 4.801 | 0.827 |
Epinet-fcn 9 × 9 [16] | 45.74 | 18.66 | 12.25 | 6.036 | 25.78 | 2.217 | 0.464 | 0.223 | 23.45 | 3.221 | 1.263 | 0.151 | 40.50 | 11.82 | 4.783 | 0.806 |
Epinet-fcnm [16] | 46.09 | 18.11 | 12.34 | 5.968 | 25.72 | 2.076 | 0.447 | 0.197 | 19.40 | 3.105 | 1.207 | 0.157 | 36.50 | 10.87 | 4.462 | 0.798 |
AttMLFNet [34] | 37.66 | 18.65 | 11.14 | 3.842 | 1.522 | 0.374 | 0.195 | 0.059 | 4.559 | 1.193 | 0.440 | 0.045 | 21.56 | 6.951 | 2.691 | 0.398 |
LFattNet [33] | 37.05 | 18.97 | 11.04 | 3.996 | 3.644 | 0.697 | 0.272 | 0.209 | 12.22 | 2.340 | 0.848 | 0.093 | 20.74 | 7.243 | 2.870 | 0.531 |
FPattNet (Ours) | 33.98 | 16.85 | 9.576 | 3.672 | 3.574 | 0.627 | 0.197 | 0.214 | 9.808 | 1.975 | 0.699 | 0.088 | 20.03 | 6.491 | 2.688 | 0.466 |
Methods | BadPix 0.01 | BadPix 0.03 | BadPix 0.07 | MSE |
cae | 52.30 | 21.86 | 7.743 | 3.243 |
ps_rf25 | 72.63 | 18.30 | 6.961 | 3.694 |
spo_lf4cv | 61.83 | 19.23 | 8.233 | 3.572 |
epnosgc | 55.62 | 23.94 | 11.20 | 5.981 |
FastLFNet | 54.12 | 20.80 | 8.153 | 1.756 |
obercrossanp | 34.62 | 12.17 | 3.417 | 1.823 |
distgdisp | 21.97 | 7.677 | 3.918 | 1.459 |
Epinet-fcn | 28.86 | 7.582 | 3.606 | 1.705 |
Epinet-fcn 9 × 9 | 27.33 | 7.824 | 3.579 | 1.753 |
Epinet-fcnm | 25.75 | 6.552 | 3.383 | 1.655 |
AttMLFNet | 13.35 | 4.874 | 2.801 | 1.257 |
LFattNet | 15.07 | 5.269 | 2.840 | 1.350 |
FPattNet (Ours) | 13.09 | 4.661 | 2.598 | 1.304 |
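For readers unfamiliar with the BadPix and MSE columns above, the following minimal sketch shows how such metrics are typically computed from a predicted and a ground-truth disparity map: BadPix(t) is the percentage of pixels whose absolute error exceeds the threshold t, and MSE is the mean squared error. The function names and the `scale` parameter are illustrative assumptions; whether the tabulated MSE values include a scaling factor is not stated in this section, so the default leaves the value unscaled.

```python
import numpy as np


def bad_pix(pred, gt, threshold):
    """Percentage of pixels with absolute disparity error above threshold."""
    err = np.abs(np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64))
    return 100.0 * float(np.mean(err > threshold))


def mse(pred, gt, scale=1.0):
    """Mean squared disparity error, optionally multiplied by `scale`.

    `scale` is a hypothetical convenience parameter for benchmarks that
    report a scaled MSE; it defaults to no scaling.
    """
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64)
    return scale * float(np.mean(diff ** 2))


# Toy usage: four pixels with absolute errors 0, 0.02, 0.05 and 0.1.
gt = np.zeros((2, 2))
pred = np.array([[0.0, 0.02], [0.05, 0.1]])
scores = {t: bad_pix(pred, gt, t) for t in (0.01, 0.03, 0.07)}
error = mse(pred, gt)
```

Lower values are better for both metrics; the stricter the threshold, the more pixels count as bad.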
Methods | Backgammon BadPix 0.03 | Backgammon MSE | Dots BadPix 0.03 | Dots MSE | Pyramids BadPix 0.03 | Pyramids MSE | Stripes BadPix 0.03 | Stripes MSE |
w/o-FPN | 9.363 | 4.791 | 6.641 | 1.926 | 0.772 | 0.007 | 11.79 | 0.902 |
FPattNet | 4.289 | 3.808 | 1.540 | 1.352 | 0.436 | 0.004 | 5.073 | 0.841 |
Methods | boxes BadPix 0.03 | boxes MSE | cotton BadPix 0.03 | cotton MSE | dino BadPix 0.03 | dino MSE | sideboard BadPix 0.03 | sideboard MSE |
w/o-FPN | 21.87 | 5.789 | 1.882 | 0.401 | 4.686 | 0.145 | 11.32 | 0.639 |
FPattNet | 16.85 | 3.672 | 0.627 | 0.197 | 1.975 | 0.088 | 6.491 | 0.466 |
Methods | Backgammon BadPix 0.03 | Backgammon MSE | Dots BadPix 0.03 | Dots MSE | Pyramids BadPix 0.03 | Pyramids MSE | Stripes BadPix 0.03 | Stripes MSE |
w/o-ATT | 11.53 | 5.019 | 8.891 | 2.161 | 0.855 | 0.007 | 8.696 | 1.109 |
FPattNet | 4.289 | 3.808 | 1.540 | 1.352 | 0.436 | 0.004 | 5.073 | 0.841 |
Methods | boxes BadPix 0.03 | boxes MSE | cotton BadPix 0.03 | cotton MSE | dino BadPix 0.03 | dino MSE | sideboard BadPix 0.03 | sideboard MSE |
w/o-ATT | 23.04 | 4.599 | 1.789 | 0.255 | 5.292 | 0.230 | 11.53 | 0.646 |
FPattNet | 16.85 | 3.672 | 0.627 | 0.197 | 1.975 | 0.088 | 6.491 | 0.466 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xiao, M.; Lv, C.; Liu, X. FPattNet: A Multi-Scale Feature Fusion Network with Occlusion Awareness for Depth Estimation of Light Field Images. Sensors 2023, 23, 7480. https://doi.org/10.3390/s23177480