Attention Networks for the Quality Enhancement of Light Field Images
Abstract
1. Introduction
- (1) A novel CNN-based filtering method is proposed for enhancing the quality of LF images encoded using HEVC [16].
- (2) A novel neural network architecture for the quality enhancement of LF images is proposed, built on an efficient Processing Block (PB) and a novel Attention-based Residual Block (ARB).
- (3) The proposed CNN-based filtering method follows a macro-pixel-wise (MP-wise) filtering approach to take advantage of the specific LF structure.
- (4) The input patch is designed as a tensor of four MP volumes corresponding to four epipolar plane images (EPIs) at four different angles.
- (5) The extensive experimental validation carried out on the EPFL LF dataset [35] demonstrates the potential of attention networks for the quality enhancement of LF images.
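The Attention-based Residual Block named in contribution (2) is not detailed in this outline, but its general pattern, channel attention applied inside a residual connection in the spirit of CBAM-style modules [59], can be sketched as follows. This is a minimal numpy sketch; the function and weight names are hypothetical and the paper's actual ARB design may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_residual_block(x, w_down, w_up):
    """Hypothetical channel-attention residual block: global-average-pool
    each channel, gate the channels through a small bottleneck MLP, rescale
    the features channel-wise, and add the skip connection.
    Shapes: x is (C, H, W), w_down is (C, C//r), w_up is (C//r, C)."""
    pooled = x.mean(axis=(1, 2))              # (C,) per-channel descriptor
    hidden = np.maximum(pooled @ w_down, 0)   # bottleneck + ReLU
    gate = sigmoid(hidden @ w_up)             # (C,) attention weights in (0, 1)
    return x + x * gate[:, None, None]        # attention-rescaled residual

x = np.ones((8, 4, 4))
y = attention_residual_block(x, np.zeros((8, 2)), np.zeros((2, 8)))
```

With all-zero gating weights the gate is sigmoid(0) = 0.5, so the block reduces to x + 0.5x; trained weights would learn which channels to emphasize before the residual addition.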
2. Related Work
3. Proposed Method
3.1. Input Patch
- (1) The EPI of the first MP volume:
- (2) The EPI of the second MP volume:
- (3) The EPI of the third MP volume:
- (4) The EPI of the fourth MP volume:
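Assuming the LF is stored as a 4D array lf[u, v, y, x] with angular coordinates (u, v) and spatial coordinates (y, x), extracting EPI stacks along the horizontal, vertical, and two diagonal angular directions can be sketched as follows. The helper name and the exact tensor layout are assumptions for illustration, not the paper's definitions:

```python
import numpy as np

def epi_slices(lf):
    """Slice a 4D light field lf[u, v, y, x] into four EPI stacks, one per
    angular direction: horizontal, vertical, and the two diagonals.
    Layout and name are illustrative assumptions."""
    U, V, H, W = lf.shape
    horiz = lf[U // 2, :, :, :]                                    # fix u; vary v
    vert = lf[:, V // 2, :, :]                                     # fix v; vary u
    diag = np.stack([lf[i, i] for i in range(min(U, V))])          # 45-degree direction
    anti = np.stack([lf[i, V - 1 - i] for i in range(min(U, V))])  # 135-degree direction
    return horiz, vert, diag, anti

h, v, d45, d135 = epi_slices(np.random.rand(9, 9, 32, 32))
```

Each returned stack keeps one angular axis and both spatial axes, so the four stacks can be concatenated into the multi-volume input tensor described above.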
3.2. Network Design
3.3. Training Details
4. Experimental Validation
4.1. Experimental Setup
4.2. Experimental Results
4.3. Visual Results
4.4. Ablation Study
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Jeon, H.G.; Park, J.; Choe, G.; Park, J.; Bok, Y.; Tai, Y.W.; Kweon, I.S. Accurate depth map estimation from a lenslet light field camera. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1547–1555. [Google Scholar] [CrossRef] [Green Version]
- Wang, T.C.; Efros, A.A.; Ramamoorthi, R. Depth Estimation with Occlusion Modeling Using Light-Field Cameras. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2170–2181. [Google Scholar] [CrossRef]
- Schiopu, I.; Munteanu, A. Deep-learning-based depth estimation from light field images. Electron. Lett. 2019, 55, 1086–1088. [Google Scholar] [CrossRef]
- Rogge, S.; Schiopu, I.; Munteanu, A. Depth Estimation for Light-Field Images Using Stereo Matching and Convolutional Neural Networks. Sensors 2020, 20, 6188. [Google Scholar] [CrossRef]
- Flynn, J.; Broxton, M.; Debevec, P.; DuVall, M.; Fyffe, G.; Overbeck, R.; Snavely, N.; Tucker, R. DeepView: View Synthesis With Learned Gradient Descent. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2362–2371. [Google Scholar] [CrossRef] [Green Version]
- Peng, J.; Xiong, Z.; Zhang, Y.; Liu, D.; Wu, F. LF-fusion: Dense and accurate 3D reconstruction from light field images. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4. [Google Scholar] [CrossRef]
- Chen, M.; Tang, Y.; Zou, X.; Huang, K.; Li, L.; He, Y. High-accuracy multi-camera reconstruction enhanced by adaptive point cloud correction algorithm. Opt. Lasers Eng. 2019, 122, 170–183. [Google Scholar] [CrossRef]
- Forman, M.C.; Aggoun, A.; McCormick, M. A novel coding scheme for full parallax 3D-TV pictures. In Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, 21–24 April 1997; Volume 4, pp. 2945–2947. [Google Scholar] [CrossRef] [Green Version]
- de Carvalho, M.B.; Pereira, M.P.; Alves, G.; da Silva, E.A.B.; Pagliari, C.L.; Pereira, F.; Testoni, V. A 4D DCT-Based Lenslet Light Field Codec. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 435–439. [Google Scholar] [CrossRef]
- Chang, C.-L.; Zhu, X.; Ramanathan, P.; Girod, B. Light field compression using disparity-compensated lifting and shape adaptation. IEEE Trans. Image Process. 2006, 15, 793–806. [Google Scholar] [CrossRef]
- Rüefenacht, D.; Naman, A.T.; Mathew, R.; Taubman, D. Base-Anchored Model for Highly Scalable and Accessible Compression of Multiview Imagery. IEEE Trans. Image Process. 2019, 28, 3205–3218. [Google Scholar] [CrossRef]
- Jang, J.S.; Yeom, S.; Javidi, B. Compression of ray information in three-dimensional integral imaging. Opt. Eng. 2005, 44, 1–10. [Google Scholar] [CrossRef]
- Kang, H.H.; Shin, D.H.; Kim, E.S. Compression scheme of sub-images using Karhunen-Loeve transform in three-dimensional integral imaging. Opt. Commun. 2008, 281, 3640–3647. [Google Scholar] [CrossRef]
- Elias, V.; Martins, W. On the Use of Graph Fourier Transform for Light-Field Compression. J. Commun. Inf. Syst. 2018, 33. [Google Scholar] [CrossRef] [Green Version]
- Hog, M.; Sabater, N.; Guillemot, C. Superrays for Efficient Light Field Processing. IEEE J. Sel. Top. Signal Process. 2017, 11, 1187–1199. [Google Scholar] [CrossRef] [Green Version]
- Sullivan, G.; Ohm, J.; Han, W.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
- Ramanathan, P.; Flierl, M.; Girod, B. Multi-hypothesis prediction for disparity compensated light field compression. In Proceedings of the 2001 International Conference on Image Processing (Cat. No.01CH37205), Thessaloniki, Greece, 7–10 October 2001; Volume 2, pp. 101–104. [Google Scholar] [CrossRef] [Green Version]
- Wang, G.; Xiang, W.; Pickering, M.; Chen, C.W. Light Field Multi-View Video Coding With Two-Directional Parallel Inter-View Prediction. IEEE Trans. Image Process. 2016, 25, 5104–5117. [Google Scholar] [CrossRef] [PubMed]
- Conti, C.; Nunes, P.; Soares, L.D. New HEVC prediction modes for 3D holoscopic video coding. In Proceedings of the 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 1325–1328. [Google Scholar] [CrossRef]
- Zhong, R.; Schiopu, I.; Cornelis, B.; Lu, S.P.; Yuan, J.; Munteanu, A. Dictionary Learning-Based, Directional, and Optimized Prediction for Lenslet Image Coding. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 1116–1129. [Google Scholar] [CrossRef]
- Dricot, A.; Jung, J.; Cagnazzo, M.; Pesquet, B.; Dufaux, F. Improved integral images compression based on multi-view extraction. In Applications of Digital Image Processing XXXIX; Tescher, A.G., Ed.; International Society for Optics and Photonics, SPIE: San Diego, CA, USA, 2016; Volume 9971, pp. 170–177. [Google Scholar] [CrossRef] [Green Version]
- Astola, P.; Tabus, I. Coding of Light Fields Using Disparity-Based Sparse Prediction. IEEE Access 2019, 7, 176820–176837. [Google Scholar] [CrossRef]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2016, arXiv:1409.0473. [Google Scholar]
- Zhu, M.; Jiao, L.; Liu, F.; Yang, S.; Wang, J. Residual Spectral–Spatial Attention Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote. Sens. 2021, 59, 449–462. [Google Scholar] [CrossRef]
- Wan, S.; Tang, S.; Xie, X.; Gu, J.; Huang, R.; Ma, B.; Luo, L. Deep Convolutional-Neural-Network-based Channel Attention for Single Image Dynamic Scene Blind Deblurring. IEEE Trans. Circuits Syst. Video Technol. 2020. [Google Scholar] [CrossRef]
- Fu, C.; Yin, Y. Edge-Enhanced with Feedback Attention Network for Image Super-Resolution. Sensors 2021, 21, 2064. [Google Scholar] [CrossRef]
- Zhou, K.; Zhan, Y.; Fu, D. Learning Region-Based Attention Network for Traffic Sign Recognition. Sensors 2021, 21, 686. [Google Scholar] [CrossRef]
- Lian, J.; Yin, Y.; Li, L.; Wang, Z.; Zhou, Y. Small Object Detection in Traffic Scenes Based on Attention Feature Fusion. Sensors 2021, 21, 3031. [Google Scholar] [CrossRef]
- Schiopu, I.; Gabbouj, M.; Gotchev, A.; Hannuksela, M.M. Lossless compression of subaperture images using context modeling. In Proceedings of the 2017 3DTV Conf.: The True Vision—Capture, Transmission and Display of 3D Video (3DTV-CON), Copenhagen, Denmark, 7–9 June 2017; pp. 1–4. [Google Scholar] [CrossRef]
- Schiopu, I.; Munteanu, A. Macro-Pixel Prediction Based on Convolutional Neural Networks for Lossless Compression of Light Field Images. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 445–449. [Google Scholar] [CrossRef]
- Schiopu, I.; Munteanu, A. Deep-learning-based macro-pixel synthesis and lossless coding of light field images. Apsipa Trans. Signal Inf. Process. 2019, 8, e20. [Google Scholar] [CrossRef] [Green Version]
- Schiopu, I.; Munteanu, A. Deep-Learning-Based Lossless Image Coding. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 1829–1842. [Google Scholar] [CrossRef]
- Huang, H.; Schiopu, I.; Munteanu, A. Frame-wise CNN-based Filtering for Intra-Frame Quality Enhancement of HEVC Videos. IEEE Trans. Circuits Syst. Video Technol. 2020. [Google Scholar] [CrossRef]
- Huang, H.; Schiopu, I.; Munteanu, A. Macro-pixel-wise CNN-based filtering for quality enhancement of light field images. Electron. Lett. 2020, 56, 1413–1416. [Google Scholar] [CrossRef]
- Rerabek, M.; Ebrahimi, T. New Light Field Image Dataset. In Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2016; pp. 1–2. Available online: https://infoscience.epfl.ch/record/218363/files/Qomex2016_shortpaper.pdf?version=1 (accessed on 1 July 2017). [Google Scholar]
- Dong, C.; Deng, Y.; Loy, C.C.; Tang, X. Compression Artifacts Reduction by a Deep Convolutional Network. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 576–584. [Google Scholar] [CrossRef] [Green Version]
- Cavigelli, L.; Hager, P.; Benini, L. CAS-CNN: A deep convolutional neural network for image compression artifact suppression. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 752–759. [Google Scholar] [CrossRef] [Green Version]
- Wang, Z.; Liu, D.; Chang, S.; Ling, Q.; Yang, Y.; Huang, T.S. D3: Deep Dual-Domain Based Fast Restoration of JPEG-Compressed Images. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2764–2772. [Google Scholar] [CrossRef]
- Galteri, L.; Seidenari, L.; Bertini, M.; Bimbo, A.D. Deep Generative Adversarial Compression Artifact Removal. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4836–4845. [Google Scholar] [CrossRef] [Green Version]
- Ororbia, A.G.; Mali, A.; Wu, J.; O’Connell, S.; Dreese, W.; Miller, D.; Giles, C.L. Learned Neural Iterative Decoding for Lossy Image Compression Systems. In Proceedings of the 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 26–29 March 2019; pp. 3–12. [Google Scholar] [CrossRef] [Green Version]
- Dai, Y.; Liu, D.; Wu, F. A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding. Lect. Notes Comput. Sci. 2016, 28–39. [Google Scholar] [CrossRef] [Green Version]
- Yang, R.; Xu, M.; Wang, Z.; Li, T. Multi-frame Quality Enhancement for Compressed Video. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar] [CrossRef] [Green Version]
- He, X.; Hu, Q.; Zhang, X.; Zhang, C.; Lin, W.; Han, X. Enhancing HEVC Compressed Videos with a Partition-Masked Convolutional Neural Network. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 216–220. [Google Scholar] [CrossRef] [Green Version]
- Ma, C.; Liu, D.; Peng, X.; Wu, F. Convolutional Neural Network-Based Arithmetic Coding of DC Coefficients for HEVC Intra Coding. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 1772–1776. [Google Scholar] [CrossRef]
- Song, X.; Yao, J.; Zhou, L.; Wang, L.; Wu, X.; Xie, D.; Pu, S. A Practical Convolutional Neural Network as Loop Filter for Intra Frame. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 1133–1137. [Google Scholar] [CrossRef] [Green Version]
- Wan, S. CE13-Related: Integrated in-Loop Filter Based on CNN. JVET Document, JVET-N0133-v2. 2019. Available online: https://www.itu.int/wftp3/av-arch/jvet-site/2019_03_N_Geneva/JVET-N_Notes_d2.docx (accessed on 1 July 2020).
- Norkin, A.; Bjontegaard, G.; Fuldseth, A.; Narroschke, M.; Ikeda, M.; Andersson, K.; Zhou, M.; Van der Auwera, G. HEVC Deblocking Filter. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1746–1754. [Google Scholar] [CrossRef]
- Fu, C.; Alshina, E.; Alshin, A.; Huang, Y.; Chen, C.; Tsai, C.; Hsu, C.; Lei, S.; Park, J.; Han, W. Sample Adaptive Offset in the HEVC Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1755–1764. [Google Scholar] [CrossRef]
- Park, W.; Kim, M. CNN-based in-loop filtering for coding efficiency improvement. In Proceedings of the 2016 IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), Bordeaux, France, 11–12 July 2016; pp. 1–5. [Google Scholar] [CrossRef]
- Zhang, Z.; Chen, Z.; Lin, J.; Li, W. Learned Scalable Image Compression with Bidirectional Context Disentanglement Network. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; pp. 1438–1443. [Google Scholar] [CrossRef] [Green Version]
- Li, F.; Tan, W.; Yan, B. Deep Residual Network for Enhancing Quality of the Decoded Intra Frames of HEVC. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 3918–3922. [Google Scholar] [CrossRef]
- Lai, P.; Wang, J. Multi-stage Attention Convolutional Neural Networks for HEVC In-Loop Filtering. In Proceedings of the 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Genova, Italy, 31 August–2 September 2020; pp. 173–177. [Google Scholar] [CrossRef]
- Zhang, X.; Xiong, R.; Lin, W.; Zhang, J.; Wang, S.; Ma, S.; Gao, W. Low-Rank-Based Nonlocal Adaptive Loop Filter for High-Efficiency Video Compression. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2177–2188. [Google Scholar] [CrossRef]
- Zhang, Y.; Shen, T.; Ji, X.; Zhang, Y.; Xiong, R.; Dai, Q. Residual Highway Convolutional Neural Networks for in-loop Filtering in HEVC. IEEE Trans. Image Process. 2018, 27, 3827–3841. [Google Scholar] [CrossRef] [PubMed]
- Jia, C.; Wang, S.; Zhang, X.; Wang, S.; Liu, J.; Pu, S.; Ma, S. Content-Aware Convolutional Neural Network for In-Loop Filtering in High Efficiency Video Coding. IEEE Trans. Image Process. 2019, 28, 3343–3356. [Google Scholar] [CrossRef]
- Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute (HHI). HEVC Reference Software. Available online: hevc.hhi.fraunhofer.de (accessed on 1 July 2019).
- Bossen, F. Common HM Test Conditions and Software Reference Configurations. JCT-VC Document, JCTVC-G1100. 2012. Available online: https://www.itu.int/wftp3/av-arch/jctvc-site/2012_02_H_SanJose/JCTVC-H_Notes_dI.doc (accessed on 1 July 2017).
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- FFmpeg. Libx265 Implementation of HEVC. Available online: http://ffmpeg.org (accessed on 1 April 2021).
- Viitanen, M.; Koivula, A.; Lemmetti, A.; Ylä-Outinen, A.; Vanne, J.; Hämäläinen, T.D. Kvazaar: Open-Source HEVC/H.265 Encoder. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 1179–1182. [Google Scholar] [CrossRef]
- Hamidouche, W.; Raulet, M.; Déforges, O. 4K Real-Time and Parallel Software Video Decoder for Multilayer HEVC Extensions. IEEE Trans. Circuits Syst. Video Technol. 2016, 26, 169–180. [Google Scholar] [CrossRef]
- Pescador, F.; Chavarrías, M.; Garrido, M.; Malagón, J.; Sanz, C. Real-time HEVC decoding with OpenHEVC and OpenMP. In Proceedings of the 2017 IEEE International Conference on Consumer Electronics (ICCE), Berlin, Germany, 3–6 September 2017; pp. 370–371. [Google Scholar] [CrossRef]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Bjøntegaard, G. Calculation of average PSNR differences between RD-curves. In Proceedings of the ITU-T Video Coding Experts Group (VCEG) 13th Meeting, Austin, TX, USA, 2–4 April 2001; pp. 2–4. [Google Scholar]
Method | Y-BD-PSNR (dB) | Y-BD-Rate (%)
---|---|---
FQE-CNN [33] | 0.4515 | −9.1921
MPQE-CNN [34] | 1.5478 | −25.5285
AEQE-CNN + DBF&SAO | 2.2044 | −35.3142
AEQE-CNN | 2.3006 | −36.5713
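The Y-BD-PSNR values reported above follow Bjøntegaard's method: each rate-distortion curve is fitted with a cubic polynomial over the log-rate axis, and the average vertical gap between the two fits is taken over the overlapping rate range. A minimal sketch with illustrative RD points (not the paper's data):

```python
import numpy as np

def bd_psnr(rates_a, psnr_a, rates_b, psnr_b):
    """Bjontegaard-delta PSNR: average PSNR gap between two RD curves,
    each fitted with a cubic polynomial over log10(bitrate)."""
    la, lb = np.log10(rates_a), np.log10(rates_b)
    fit_a = np.polyfit(la, psnr_a, 3)
    fit_b = np.polyfit(lb, psnr_b, 3)
    lo, hi = max(la.min(), lb.min()), min(la.max(), lb.max())
    int_a, int_b = np.polyint(fit_a), np.polyint(fit_b)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_b = (np.polyval(int_b, hi) - np.polyval(int_b, lo)) / (hi - lo)
    return avg_b - avg_a  # positive: method B gains PSNR over method A

# illustrative RD points (not from the paper)
rates = np.array([100.0, 200.0, 400.0, 800.0])  # kbps
d = bd_psnr(rates, np.array([30.0, 32.0, 34.0, 36.0]),
            rates, np.array([31.0, 33.0, 35.0, 37.0]))  # B is A shifted by +1 dB
```

Y-BD-Rate is computed analogously by swapping the roles of the axes, i.e., fitting log-rate as a function of PSNR and averaging the horizontal gap, which yields the percentage bitrate saving at equal quality.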
Method | Y-BD-PSNR (dB) | Y-BD-Rate (%) | Nr. of Trained Parameters | Inference Time per Img.
---|---|---|---|---
AEQE-CNN [N=16] | 2.0954 | −35.2581 | 197,661 (−74.7%) | 98 s (−44.6%)
AEQE-CNN [3×3] | 2.0799 | −35.0914 | 782,661 | 105 s (−40.7%)
AEQE-CNN | 2.3006 | −36.5713 | 782,661 | 177 s
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Schiopu, I.; Munteanu, A. Attention Networks for the Quality Enhancement of Light Field Images. Sensors 2021, 21, 3246. https://doi.org/10.3390/s21093246