On Alpha-Expansion-Based Graph-Cut Optimization for Decoder-Side Depth Estimation
Abstract
1. Introduction
- The second-cycle label offset shifts the starting point of the second iteration cycle, narrowing the search space for depth values. It uses the result of the first graph-cut cycle to guide the selection of depth values in the second cycle.
- Neighboring segments label examination includes the depth values of adjacent segments in the optimization, which counteracts the boundary-blurring effects of video compression.
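The first contribution listed above can be illustrated with a minimal Python sketch of the label schedule in a two-cycle hierarchical alpha-expansion. It assumes that each cycle examines every `step`-th depth label and that the second cycle is shifted by half the step. The function names and both of these choices are hypothetical and are not taken from the reference software. The graph-cut itself is omitted; only the order of examined labels is shown.

```python
def cycle_labels(num_labels: int, step: int, offset: int = 0) -> list[int]:
    """Labels examined in one cycle: offset, offset + step, ... below num_labels."""
    if step < 1 or not 0 <= offset < step:
        raise ValueError("need step >= 1 and 0 <= offset < step")
    return list(range(offset, num_labels, step))


def schedule(num_labels: int, step: int, second_cycle_offset: int) -> list[list[int]]:
    """Two-cycle schedule in which only the second cycle is shifted (assumption)."""
    return [
        cycle_labels(num_labels, step, 0),
        cycle_labels(num_labels, step, second_cycle_offset),
    ]


def worst_gap(num_labels: int, cycles: list[list[int]]) -> int:
    """Largest distance from any label to the nearest label examined in any cycle."""
    examined = sorted({label for cycle in cycles for label in cycle})
    return max(
        min(abs(label - e) for e in examined) for label in range(num_labels)
    )


if __name__ == "__main__":
    plain = schedule(64, 8, 0)    # second cycle repeats the labels of the first
    shifted = schedule(64, 8, 4)  # second cycle starts half a step later
    print("worst gap without offset:", worst_gap(64, plain))
    print("worst gap with offset:   ", worst_gap(64, shifted))
```

In this toy setting, the shifted second cycle visits labels that the first cycle skipped, so every depth level ends up closer to an examined label at no extra cost in the number of expansion moves.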
2. State of the Art in Alpha-Expansion-Based Graph-Cut Optimization
3. The Proposed Method
3.1. Second-Cycle Label Offset
3.2. Neighboring Segments Label Examination
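As summarized in the Introduction, this step adds the depth values of adjacent segments to the optimization. The sketch below shows one possible reading of that idea. It assumes a segment adjacency map and a per-segment cost function, and it replaces the graph-cut expansion move with a greedy per-segment choice to stay short. All names (`candidate_labels`, `examine_neighbors`, `cost`) are hypothetical.

```python
from typing import Callable, Dict, List, Set


def candidate_labels(
    segment: str, labels: Dict[str, int], adjacency: Dict[str, Set[str]]
) -> List[int]:
    """Current label of `segment` followed by the labels of its adjacent segments."""
    candidates = [labels[segment]]
    for neighbor in sorted(adjacency[segment]):
        if labels[neighbor] not in candidates:
            candidates.append(labels[neighbor])
    return candidates


def examine_neighbors(
    labels: Dict[str, int],
    adjacency: Dict[str, Set[str]],
    cost: Callable[[str, int], float],
) -> Dict[str, int]:
    """One greedy pass (not a graph-cut): each segment takes its cheapest candidate.

    Ties keep the current label because it is listed first among the candidates.
    """
    updated = dict(labels)
    for segment in labels:
        options = candidate_labels(segment, updated, adjacency)
        updated[segment] = min(options, key=lambda depth: cost(segment, depth))
    return updated


if __name__ == "__main__":
    # Toy example: segment "b" was assigned a background depth by mistake.
    adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    labels = {"a": 10, "b": 40, "c": 42}
    target = {"a": 10, "b": 12, "c": 42}  # hypothetical matching-cost minimum
    print(examine_neighbors(labels, adjacency, lambda s, d: abs(d - target[s])))
```

The design point is that a neighbor's label is a cheap, plausible candidate: a segment whose depth was pulled across an object boundary can recover the depth of the object it belongs to without scanning the full label range.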
4. Experimental Results
- (1) basic alpha-expansion,
- (2) hierarchical alpha-expansion with optimization step (current state of the art and reference method in MIV experiments),
- (3) as in (2), but with ,
- (4) proposed hierarchical alpha-expansion with second-cycle label offset with optimization step ,
- (5) as in (4), but with ,
- (6) proposed hierarchical alpha-expansion with second-cycle label offset and neighboring segments label examination (with ).
4.1. Methodology
Sequence | Source | Type | Content | Resolution | Views |
---|---|---|---|---|---|
ClassroomVideo | [25] | Equirectangular projection | Computer generated | 4096 × 2048 | 15 |
Museum | [26] | Equirectangular projection | Computer generated | 2048 × 2048 | 24 |
Chess | [27] | Equirectangular projection | Computer generated | 2048 × 2048 | 10 |
Guitarist | [28] | Equirectangular projection | Computer generated | 2048 × 2048 | 23 |
Hijack | [29] | Equirectangular projection | Mixed | 4096 × 2048 | 10 |
Cyberpunk | [29] | Equirectangular projection | Mixed | 2048 × 2048 | 10 |
Kitchen | [30] | Perspective, planar | Computer generated | 1920 × 1080 | 25 |
Cadillac | [31] | Perspective, planar | Computer generated | 1920 × 1080 | 15 |
Mirror | [32] | Perspective, planar | Computer generated | 1920 × 1080 | 15 |
Fan | [33] | Perspective, planar | Computer generated | 1920 × 1080 | 15 |
Group | [34] | Perspective, convergent | Computer generated | 1920 × 1080 | 21 |
Dancing | [35] | Perspective, convergent | Computer generated | 1920 × 1080 | 24 |
Painter | [36] | Perspective, planar | Natural content | 2048 × 1088 | 16 |
Breakfast | [37] | Perspective, planar | Natural content | 1920 × 1080 | 15 |
Barn | [38] | Perspective, planar | Natural content | 1920 × 1080 | 15 |
Frog | [39] | Perspective, planar | Natural content | 1920 × 1080 | 13 |
Carpark | [40] | Perspective, planar | Natural content | 1920 × 1088 | 9 |
Street | [40] | Perspective, planar | Natural content | 1920 × 1088 | 9 |
Fencing | [41] | Perspective, convergent | Natural content | 1920 × 1088 | 9 |
CBA Basketball | [42] | Perspective, convergent | Natural content | 1920 × 1080 | 34 |
MartialArts | [43] | Perspective, convergent | Natural content | 1920 × 1080 | 15 |
4.2. Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Debarba, H.G.; Montagud, M.; Chagué, S.; Herrero, J.G.-L.; Lacosta, I.; Langa, S.F.; Charbonnier, C. Content format and quality of experience in virtual reality. Multimed. Tools Appl. 2022, 81, 14269–14303.
- Boyce, J.M.; Dore, R.; Dziembowski, A.; Fleureau, J.; Jung, J.; Kroon, B.; Salahieh, B.; Vadakital, V.K.M.; Yu, L. MPEG Immersive Video Coding Standard. Proc. IEEE 2021, 109, 1654–1676.
- Zhang, Y.; Yang, J.; Liu, Z.; Wang, R.; Chen, G.; Tong, X.; Guo, B. VirtualCube: An Immersive 3D Video Communication System. IEEE Trans. Vis. Comput. Graph. 2022, 28, 1681–1690.
- Mieloch, D.; Garus, P.; Milovanovic, M.; Jung, J.; Jeong, J.Y.; Ravi, S.L.; Salahieh, B. Overview and Efficiency of Decoder-Side Depth Estimation in MPEG Immersive Video. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 4282–4296.
- Chan, Y.L.; Fu, C.H.; Chen, H.; Tsang, S.H. Overview of current development in depth map coding of 3D video and its future. IET Signal Process. 2020, 14, 1–14.
- Garus, P.; Jung, J.; Maugey, T.; Guillemot, C. Bypassing Depth Maps Transmission for Immersive Video Coding. In Proceedings of the 2019 Picture Coding Symposium (PCS), Ningbo, China, 8–10 May 2019.
- Dziembowski, A.; Domanski, M.; Grzelka, A.; Mieloch, D.; Stankowski, J.; Wegner, K. The influence of a lossy compression on the quality of estimated depth maps. In Proceedings of the 2016 International Workshop on Systems, Signals and Image Processing (IWSSIP), Bratislava, Slovakia, 18–20 May 2016.
- Vadakital, V.K.M.; Dziembowski, A.; Lafruit, G.; Thudor, F.; Lee, G.; Alface, P.R. The MPEG Immersive Video Standard—Current Status and Future Outlook. IEEE MultiMedia 2022, 29, 101–111.
- Mieloch, D.; Dziembowski, A.; Klóska, D.; Szydełko, B.; Jeong, J.Y.; Lee, G. A New Approach to Decoder-Side Depth Estimation in Immersive Video Transmission. IEEE Trans. Broadcast. 2023, 69, 611–624.
- Ravi, S.L.; Milovanovic, M.; Morin, L.; Henry, F. A Study of Conventional and Learning-Based Depth Estimators for Immersive Video Transmission. In Proceedings of the 2022 IEEE 29th International Conference on Image, Video and Signal Processing (MMSP), Shanghai, China, 25–27 August 2022.
- Kolmogorov, V.; Zabih, R. Multi-camera Scene Reconstruction via Graph Cuts. In Proceedings of the 7th European Conference on Computer Vision (ECCV), London, UK, 28 May–2 June 2002.
- Rogge, S.; Bonatto, D.; Sancho, J.; Salvador, R.; Juarez, E.; Munteanu, A.; Lafruit, G. MPEG-I Depth Estimation Reference Software. In Proceedings of the 2019 International Conference on 3D Immersion (IC3D), Brussels, Belgium, 18–20 September 2019.
- Papadakis, N.; Caselles, V. Multi-label Depth Estimation for Graph Cuts Stereo Problems. J. Math. Imaging Vis. 2010, 38, 70–82.
- Mieloch, D.; Grzelka, A. Segmentation-based Method of Increasing the Depth Maps Temporal Consistency. Int. J. Electron. Telecommun. 2018, 64, 283–289.
- Mieloch, D.; Stankiewicz, O.; Domanski, M. Depth Map Estimation for Free-Viewpoint Television and Virtual Navigation. IEEE Access 2020, 8, 5760–5776.
- Dziembowski, A.; Mieloch, D.; Stankowski, J.; Grzelka, A. IV-PSNR—The Objective Quality Metric for Immersive Video Applications. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7575–7591.
- Mieloch, D.; Dziembowski, A.; Grzelka, A.; Stankiewicz, O.; Domański, M. Graph-based multiview depth estimation using segmentation. In Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, China, 10–14 July 2017.
- Lei, J.; Liu, J.; Zhang, H.; Gu, Z.; Ling, N.; Hou, C. Motion and Structure Information Based Adaptive Weighted Depth Video Estimation. IEEE Trans. Broadcast. 2015, 61, 351–362.
- Boykov, Y.; Veksler, O.; Zabih, R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 1222–1239.
- Tippetts, B.; Lee, D.J.; Lillywhite, K.; Archibald, J. Review of stereo vision algorithms and their suitability for resource-limited systems. J. Real-Time Image Process. 2016, 11, 5–27.
- Xue, T.; Owens, A.; Scharstein, D.; Goesele, M.; Szeliski, R. Multiframe stereo matching with edges, planes, and superpixels. Image Vis. Comput. 2019, 91, 103806.
- Adrian, D.; Adam, G.; Dawid, M.; Olgierd, S. Depth map upsampling and refinement for FTV systems. In Proceedings of the 2016 International Conference on Signals and Electronic Systems (ICSES), Kraków, Poland, 11–13 September 2016.
- ISO/IEC JTC1/SC29/WG4 MPEG2023/N0406; Common Test Conditions for MPEG Immersive Video. ISO: Geneva, Switzerland, 2023.
- Bjøntegaard, G. Calculation of average PSNR differences between RD-Curves. In Proceedings of the ITU-T VCEG Meeting, Austin, TX, USA, 2–4 April 2001.
- ISO/IEC JTC1/SC29/WG11 MPEG2018/M42415; 3DoF+ Test Sequence ClassroomVideo. ISO: Geneva, Switzerland, 2018.
- ISO/IEC JTC1/SC29/WG11 MPEG2018/M42349; Technicolor 3DoF+ Test Materials. ISO: Geneva, Switzerland, 2018.
- ISO/IEC JTC1/SC29/WG11 MPEG2019/M50787; New Test Content for Immersive Video—Nokia Chess. ISO: Geneva, Switzerland, 2020.
- ISO/IEC JTC1/SC29/WG04 MPEG2021/M58080; A New Computer Graphics Scene, Guitarist, Suitable for MIV Edition-2. ISO: Geneva, Switzerland, 2021.
- ISO/IEC JTC1/SC29/WG11 MPEG2021/M58433; [MIV] ERP Content Proposal for MIV ver.1 Verification Test. ISO: Geneva, Switzerland, 2021.
- ISO/IEC JTC1/SC29/WG11 MPEG2018/M43318; Proposition of New Sequences for Windowed-6DoF Experiments on Compression, Synthesis and Depth Estimation. ISO: Geneva, Switzerland, 2018.
- ISO/IEC JTC1/SC29/WG4 MPEG2021/M57186; [MIV] New Cadillac Content Proposal for Advanced MIV v2 Investigations. ISO: Geneva, Switzerland, 2021.
- ISO/IEC JTC1/SC29/WG11 MPEG2020/M55710; Interdigital Mirror Content Proposal for Advanced MIV Investigations on Reflection. ISO: Geneva, Switzerland, 2021.
- ISO/IEC JTC1/SC29/WG11 MPEG/M54732; InterdigitalFan Content Proposal for MIV. ISO: Geneva, Switzerland, 2020.
- ISO/IEC JTC1/SC29/WG11 MPEG2020/M54731; InterdigitalGroup Content Proposal. ISO: Geneva, Switzerland, 2020.
- ISO/IEC JTC1/SC29/WG4 MPEG2021/M57751; [MIV] Dancing Sequence for Verification Tests. ISO: Geneva, Switzerland, 2021.
- ISO/IEC JTC1/SC29/WG11 MPEG2017/M40010; Light Field Content from 16-Camera Rig. ISO: Geneva, Switzerland, 2017.
- ISO/IEC JTC1/SC29/WG4 MPEG2021/M56730; [MIV] Breakfast New Natural Content Proposal for MIV. ISO: Geneva, Switzerland, 2021.
- ISO/IEC JTC1/SC29/WG4 MPEG2021/M56632; Barn New Natural Content Proposal for MIV. ISO: Geneva, Switzerland, 2021.
- ISO/IEC JTC1/SC29/WG11 MPEG2018/M43748; Kermit Test Sequence for Windowed 6DoF Activities. ISO: Geneva, Switzerland, 2018.
- ISO/IEC JTC1/SC29/WG11 MPEG2019/M51598; Natural Outdoor Test Sequences. ISO: Geneva, Switzerland, 2020.
- ISO/IEC JTC1/SC29/WG11 MPEG2018/M38247; Multiview Test Video Sequences for Free Navigation Exploration Obtained Using Pairs of Cameras. ISO: Geneva, Switzerland, 2016.
- ISO/IEC JTC1/SC29/WG11 MPEG2021/M58500; [MIV] Undistorted CBA Basketball Test Sequence for MPEG-I Visual. ISO: Geneva, Switzerland, 2021.
- ISO/IEC JTC1/SC29/WG04 MPEG2023/M61949; [MIV] New Natural Content—MartialArts. ISO: Geneva, Switzerland, 2023.
Sequence | BD-Rate Y-PSNR | BD-Rate IV-PSNR | Decoding and Rendering Runtime | BD-Rate Y-PSNR | BD-Rate IV-PSNR | Decoding and Rendering Runtime |
---|---|---|---|---|---|---|
ClassroomVideo | −7.1% | −2.8% | 75.5% | 2.8% | −0.6% | 61.8% |
Museum | −2.8% | −0.9% | 108.5% | −4.8% | −1.3% | 93.2% |
Chess | −99.2% | −17.8% | 99.8% | 19.9% | −12.0% | 81.3% |
Guitarist | 13.6% | 2.4% | 109.8% | 76.1% | 34.3% | 87.6% |
Hijack | 50.9% | 110.6% | 95.8% | −11.4% | 2.0% | 86.2% |
Cyberpunk | −2.4% | −0.7% | 90.7% | 1.6% | −3.3% | 77.5% |
Kitchen | 2.5% | 2.7% | 75.0% | 3.8% | 2.6% | 52.6% |
Cadillac | −1.2% | −0.1% | 76.3% | 0.1% | 1.5% | 57.8% |
Mirror | 1.9% | 2.7% | 77.1% | 2.4% | 2.3% | 57.4% |
Fan | −1.5% | −0.8% | 77.8% | −2.0% | −1.3% | 59.9% |
Group | 39.7% | 31.2% | 98.5% | 44.3% | 21.6% | 84.6% |
Dancing | 3.8% | 2.4% | 75.9% | 7.4% | 5.0% | 58.6% |
Painter | 0.6% | 1.0% | 70.4% | 1.7% | 1.5% | 51.0% |
Breakfast | −20.4% | −12.9% | 83.4% | −11.2% | −6.1% | 66.5% |
Barn | 1.4% | −0.8% | 77.4% | 2.8% | −1.4% | 54.9% |
Frog | 0.1% | −0.0% | 57.6% | −0.2% | −0.3% | 37.7% |
Carpark | 13.9% | 7.9% | 78.0% | 8.3% | 5.7% | 55.6% |
Street | 2.3% | 0.4% | 68.6% | 17.0% | 11.3% | 53.0% |
Fencing | 49.9% | 20.8% | 66.9% | 147.7% | 71.3% | 43.8% |
CBA Basketball | −5.3% | −60.1% | 79.9% | 2.9% | −0.2% | 62.7% |
MartialArts | 67.1% | 19.7% | 89.8% | 56.7% | 81.8% | 68.4% |
Average | 5.1% | 5.0% | 82.5% | 17.4% | 10.2% | 64.4% |
Sequence | BD-Rate Y-PSNR | BD-Rate IV-PSNR | Decoding and Rendering Runtime | BD-Rate Y-PSNR | BD-Rate IV-PSNR | Decoding and Rendering Runtime |
---|---|---|---|---|---|---|
ClassroomVideo | −7.3% | −4.6% | 66.5% | −5.7% | −3.5% | 60.1% |
Museum | −6.5% | −2.1% | 57.6% | −3.5% | −1.1% | 91.5% |
Chess | −9.3% | −13.0% | 59.0% | −7.9% | −4.2% | 77.8% |
Guitarist | −0.4% | 1.3% | 53.2% | 26.5% | 34.0% | 86.3% |
Hijack | 45.3% | 40.2% | 65.3% | 26.4% | −43.0% | 78.7% |
Cyberpunk | −7.3% | −5.0% | 57.7% | −6.1% | −2.8% | 71.2% |
Kitchen | 1.1% | 0.3% | 56.5% | 1.9% | 1.4% | 51.8% |
Cadillac | 0.3% | 1.1% | 54.8% | −0.4% | 1.1% | 54.4% |
Mirror | 1.2% | 1.8% | 56.7% | 2.6% | 2.6% | 55.8% |
Fan | 0.1% | 0.7% | 53.8% | −2.2% | −0.9% | 54.9% |
Group | 40.4% | 31.0% | 58.2% | 31.1% | 124.6% | 77.6% |
Dancing | 3.7% | 2.4% | 58.1% | 5.5% | 3.5% | 55.1% |
Painter | 0.5% | 0.6% | 57.9% | 0.7% | 0.0% | 45.4% |
Breakfast | −8.2% | −5.2% | 57.9% | −12.9% | −7.1% | 60.7% |
Barn | 3.3% | 0.3% | 59.6% | 1.4% | −1.1% | 55.5% |
Frog | 0.2% | 0.2% | 50.3% | 0.2% | 0.1% | 33.7% |
Carpark | 12.0% | 7.8% | 56.7% | 15.7% | 10.4% | 53.8% |
Street | −1.3% | −1.3% | 57.2% | 3.3% | 0.7% | 49.9% |
Fencing | −1.1% | −3.3% | 56.9% | 45.9% | 18.3% | 41.0% |
CBA Basketball | −25.4% | −27.0% | 57.2% | −4.3% | −9.3% | 59.7% |
MartialArts | −7.6% | −10.8% | 54.3% | 32.9% | 4.4% | 64.2% |
Average | 1.6% | 0.7% | 57.4% | 7.2% | 6.1% | 60.9% |
Sequence | BD-Rate Y-PSNR | BD-Rate IV-PSNR | Decoding and Rendering Runtime |
---|---|---|---|
ClassroomVideo | −5.6% | −2.9% | 60.2% |
Museum | −3.7% | −1.0% | 89.9% |
Chess | −14.5% | −5.2% | 80.1% |
Guitarist | −8.9% | −89.1% | 88.8% |
Hijack | 6.6% | 25.4% | 78.8% |
Cyberpunk | −9.7% | −6.6% | 69.8% |
Kitchen | 2.3% | 1.3% | 51.4% |
Cadillac | 0.8% | 1.8% | 57.8% |
Mirror | 2.3% | 2.3% | 57.1% |
Fan | −2.9% | −1.3% | 57.3% |
Group | −1.3% | −27.4% | 78.6% |
Dancing | 1.3% | 1.5% | 56.8% |
Painter | 2.4% | 2.3% | 45.6% |
Breakfast | −16.3% | −8.5% | 63.5% |
Barn | 2.0% | 0.2% | 58.0% |
Frog | 0.2% | 0.3% | 36.6% |
Carpark | 10.8% | 6.5% | 55.1% |
Street | 3.4% | 0.5% | 51.4% |
Fencing | 15.8% | 8.2% | 41.3% |
CBA Basketball | −19.4% | −21.9% | 61.3% |
MartialArts | −28.7% | −25.0% | 65.9% |
Average | −3.0% | −6.6% | 62.2% |
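The Average row of the table above is consistent with an unweighted arithmetic mean over the 21 test sequences. The short Python check below reproduces it from the per-sequence values copied from the table. The names `ROWS` and `column_means` are illustrative only, and the assumption of an unweighted mean is inferred from the numbers rather than stated in the text.

```python
from statistics import mean

# Per sequence: (BD-rate Y-PSNR %, BD-rate IV-PSNR %, runtime %), copied from the table.
ROWS = {
    "ClassroomVideo": (-5.6, -2.9, 60.2),
    "Museum": (-3.7, -1.0, 89.9),
    "Chess": (-14.5, -5.2, 80.1),
    "Guitarist": (-8.9, -89.1, 88.8),
    "Hijack": (6.6, 25.4, 78.8),
    "Cyberpunk": (-9.7, -6.6, 69.8),
    "Kitchen": (2.3, 1.3, 51.4),
    "Cadillac": (0.8, 1.8, 57.8),
    "Mirror": (2.3, 2.3, 57.1),
    "Fan": (-2.9, -1.3, 57.3),
    "Group": (-1.3, -27.4, 78.6),
    "Dancing": (1.3, 1.5, 56.8),
    "Painter": (2.4, 2.3, 45.6),
    "Breakfast": (-16.3, -8.5, 63.5),
    "Barn": (2.0, 0.2, 58.0),
    "Frog": (0.2, 0.3, 36.6),
    "Carpark": (10.8, 6.5, 55.1),
    "Street": (3.4, 0.5, 51.4),
    "Fencing": (15.8, 8.2, 41.3),
    "CBA Basketball": (-19.4, -21.9, 61.3),
    "MartialArts": (-28.7, -25.0, 65.9),
}


def column_means(rows: dict) -> tuple:
    """Unweighted mean of each column, rounded to one decimal like the table."""
    return tuple(round(mean(column), 1) for column in zip(*rows.values()))


if __name__ == "__main__":
    print(column_means(ROWS))  # compare with the Average row
```

Because the mean is unweighted, single sequences with large BD-rate magnitudes (for example Guitarist in the IV-PSNR column) contribute strongly to the average, which is worth keeping in mind when reading the summary row.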
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mieloch, D.; Klóska, D.; Stankiewicz, O. On Alpha-Expansion-Based Graph-Cut Optimization for Decoder-Side Depth Estimation. Appl. Sci. 2024, 14, 5768. https://doi.org/10.3390/app14135768