Multi-Frame Content-Aware Mapping Network for Standard-Dynamic-Range to High-Dynamic-Range Television Artifact Removal
Abstract
1. Introduction
- We propose a multi-frame content-aware mapping network (MCMN) that exploits the temporal continuity and spatial features of video frames in a structured manner, improving the mapping from low-quality SDRTV to high-quality HDRTV.
- An innovative content-aware temporal spatial alignment module (CTAM) is introduced, employing dynamic deformable convolution to improve the alignment accuracy of features across different frames and scales. Temporal spatial dynamic convolution (TSDC) adapts its convolution kernels to the evolving temporal spatial patterns in the video, which is crucial for accurately capturing inter-frame relationships.
- The hybrid prior extraction module (HPEM) is designed to capture multi-scale information in the video content, which is crucial for the subsequent temporal spatial content-adaptive dynamic modulation.
- The temporal spatial transformation module (TSTM) employs a sequence of temporal spatial dynamic convolutions and mapping modules to perform content-adaptive dynamic modulation. Specifically, a cross-temporal mapping module (CTMM), a local spatial mapping module (LSMM), and a global spatial mapping module (GSMM) are introduced to refine both local and global details within images, leading to improved inverse tone mapping results and better correction of encoding artifacts.
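The core idea behind TSDC in the contributions above, a convolution whose kernel is not fixed but predicted from the content of the temporal window, can be sketched in plain NumPy. Everything here is illustrative: the paper's actual module is a learned prediction network operating on deep features, whereas `predict_kernels` below is a hand-written stand-in that merely derives a normalized kernel from per-frame statistics.

```python
import numpy as np

def predict_kernels(window, k=3):
    """Toy kernel-prediction head: derive a k x k kernel from the
    statistics of a temporal window of frames. Illustrative stand-in
    for a learned prediction network."""
    stats = np.abs(window).mean(axis=(1, 2))           # one scalar per frame
    base = np.outer(stats, stats[::-1]).ravel()[: k * k]
    base = np.resize(base, k * k)                      # pad/trim to k*k entries
    kernel = base / (base.sum() + 1e-8)                # normalize to sum ~1
    return kernel.reshape(k, k)

def tsdc_apply(frames, k=3):
    """Apply a content-adaptive (dynamically predicted) convolution
    to the centre frame of a temporal window of shape (T, H, W)."""
    t, h, w = frames.shape
    kernel = predict_kernels(frames, k)                # kernel depends on content
    pad = k // 2
    centre = np.pad(frames[t // 2], pad, mode="edge")  # pad the centre frame
    out = np.zeros((h, w))
    for i in range(h):                                 # naive sliding window
        for j in range(w):
            out[i, j] = (centre[i:i + k, j:j + k] * kernel).sum()
    return out
```

Because the kernel is recomputed per window, two windows with different temporal statistics are filtered differently, which is the property the paper relies on for capturing inter-frame relationships; a learned version would replace `predict_kernels` with a small CNN.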
2. Related Work
2.1. SDRTV-to-HDRTV
2.2. Artifact Removal
3. Methodology
3.1. Overall
3.2. Content-Aware Temporal Spatial Alignment Module
3.3. Hybrid Prior Extraction Module
3.4. Temporal Spatial Transformation Module
3.5. Quality Enhancement Module
4. Results
4.1. Experimental Settings
4.1.1. Dataset
4.1.2. Implementation Details
4.2. Quantitative Results
4.3. Qualitative Results
4.4. Ablation Study
4.4.1. Ablation Study of the Content-Aware Temporal Spatial Alignment Module (CTAM)
4.4.2. Ablation Study of the Temporal Spatial Transformation Module (TSTM)
4.5. Processing Time
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mantiuk, R.; Daly, S.; Kerofsky, L. Display adaptive tone mapping. In Proceedings of the SIGGRAPH ’08: ACM SIGGRAPH 2008 Papers, Los Angeles, CA, USA, 11–15 August 2008; pp. 1–10.
- SMPTE ST 2084; High Dynamic Range Electro-Optical Transfer Function of Mastering Reference Displays. SMPTE: White Plains, NY, USA, 2014; p. 11.
- Nagata, Y.; Ichikawa, K.; Yamashita, T.; Mitsuhashi, S.; Masuda, H. Content Production Technology on Hybrid Log-Gamma. In Proceedings of the SMPTE 2017 Annual Technical Conference and Exhibition, Los Angeles, CA, USA, 23–26 October 2017; pp. 1–12.
- Rissanen, J.; Langdon, G.G. Arithmetic coding. IBM J. Res. Dev. 1979, 23, 149–162.
- Chen, Y.; Mukherjee, D.; Han, J.; Grange, A.; Xu, Y.; Liu, Z.; Parker, S.; Chen, C.; Su, H.; Joshi, U.; et al. An overview of core coding tools in the AV1 video codec. In Proceedings of the 2018 Picture Coding Symposium (PCS), San Francisco, CA, USA, 24–27 June 2018; pp. 41–45.
- Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668.
- Chen, X.; Zhang, Z.; Ren, J.S.; Tian, L.; Qiao, Y.; Dong, C. A New Journey From SDRTV to HDRTV. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 4500–4509.
- Kim, S.Y.; Oh, J.; Kim, M. Deep SR-ITM: Joint Learning of Super-Resolution and Inverse Tone-Mapping for 4K UHD HDR Applications. In Proceedings of the International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019.
- Kim, S.Y.; Oh, J.; Kim, M. JSI-GAN: GAN-based joint super-resolution and inverse tone-mapping with pixel-wise task-specific filters for UHD HDR video. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11287–11295.
- He, G.; Long, S.; Xu, L.; Wu, C.; Yu, W.; Zhou, J. Global priors guided modulation network for joint super-resolution and SDRTV-to-HDRTV. Neurocomputing 2023, 554, 126590.
- He, G.; Xu, K.; Xu, L.; Wu, C.; Sun, M.; Wen, X.; Tai, Y.W. SDRTV-to-HDRTV via Hierarchical Dynamic Context Feature Mapping. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22), New York, NY, USA, 10–14 October 2022; pp. 2890–2898.
- Xu, N.; Chen, T.; Crenshaw, J.E.; Kunkel, T.; Lee, B. Methods and Systems for Inverse Tone Mapping. U.S. Patent US9607364B2, 28 March 2017.
- Ballestad, A.; Ward, K.J. Method and Apparatus for Image Data Transformation. U.S. Patent US9224363B2, 29 December 2015.
- Akyüz, A.O.; Fleming, R.; Riecke, B.E.; Reinhard, E.; Bülthoff, H.H. Do HDR displays support LDR content? A psychophysical evaluation. ACM Trans. Graph. (TOG) 2007, 26, 38-es.
- Banterle, F.; Ledda, P.; Debattista, K.; Chalmers, A. Expanding low dynamic range videos for high dynamic range applications. In Proceedings of the 24th Spring Conference on Computer Graphics, Budmerice Castle, Slovakia, 21–23 April 2008; pp. 33–41.
- Banterle, F.; Debattista, K.; Artusi, A.; Pattanaik, S.; Myszkowski, K.; Ledda, P.; Chalmers, A. High dynamic range imaging and low dynamic range expansion for generating HDR content. In Computer Graphics Forum; Wiley Online Library: Oxford, UK, 2009; Volume 28, pp. 2343–2367.
- Marnerides, D.; Bashford-Rogers, T.; Debattista, K. Deep HDR hallucination for inverse tone mapping. Sensors 2021, 21, 4032.
- Liu, Y.L.; Lai, W.S.; Chen, Y.S.; Kao, Y.L.; Yang, M.H.; Chuang, Y.Y.; Huang, J.B. Single-image HDR reconstruction by learning to reverse the camera pipeline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1651–1660.
- Eilertsen, G.; Kronander, J.; Denes, G.; Mantiuk, R.K.; Unger, J. HDR image reconstruction from a single exposure using deep CNNs. ACM Trans. Graph. (TOG) 2017, 36, 178.
- Santos, M.S.; Ren, T.I.; Kalantari, N.K. Single image HDR reconstruction using a CNN with masked features and perceptual loss. ACM Trans. Graph. (TOG) 2020, 39, 1–10.
- Debevec, P.E.; Malik, J. Recovering high dynamic range radiance maps from photographs. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, 3–8 August 1997; pp. 369–378.
- Lee, S.; An, G.H.; Kang, S.J. Deep recursive HDRI: Inverse tone mapping using generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 596–611.
- Niu, Y.; Wu, J.; Liu, W.; Guo, W.; Lau, R.W. HDR-GAN: HDR image reconstruction from multi-exposed LDR images with large motions. IEEE Trans. Image Process. 2021, 30, 3885–3896.
- Yan, Q.; Zhang, L.; Liu, Y.; Zhu, Y.; Sun, J.; Shi, Q.; Zhang, Y. Deep HDR imaging via a non-local network. IEEE Trans. Image Process. 2020, 29, 4308–4322.
- Dong, C.; Deng, Y.; Change Loy, C.; Tang, X. Compression artifacts reduction by a deep convolutional network. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 576–584.
- Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155.
- Dai, Y.; Liu, D.; Wu, F. A convolutional neural network approach for post-processing in HEVC intra coding. In Proceedings of the International Conference on Multimedia Modeling, Reykjavik, Iceland, 4–6 January 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 28–39.
- Zhang, Y.; Shen, T.; Ji, X.; Zhang, Y.; Xiong, R.; Dai, Q. Residual highway convolutional neural networks for in-loop filtering in HEVC. IEEE Trans. Image Process. 2018, 27, 3827–3841.
- Yang, R.; Xu, M.; Liu, T.; Wang, Z.; Guan, Z. Enhancing quality for HEVC compressed videos. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 2039–2054.
- He, X.; Hu, Q.; Zhang, X.; Zhang, C.; Lin, W.; Han, X. Enhancing HEVC compressed videos with a partition-masked convolutional neural network. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 216–220.
- Ding, D.; Kong, L.; Chen, G.; Liu, Z.; Fang, Y. A Switchable Deep Learning Approach for In-loop Filtering in Video Coding. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 1871–1887.
- Xue, T.; Chen, B.; Wu, J.; Wei, D.; Freeman, W.T. Video enhancement with task-oriented flow. Int. J. Comput. Vis. 2019, 127, 1106–1125.
- Yang, R.; Xu, M.; Wang, Z.; Li, T. Multi-frame quality enhancement for compressed video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6664–6673.
- Guan, Z.; Xing, Q.; Xu, M.; Yang, R.; Liu, T.; Wang, Z. MFQE 2.0: A new approach for multi-frame quality enhancement on compressed video. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 949–963.
- Deng, J.; Wang, L.; Pu, S.; Zhuo, C. Spatio-temporal deformable convolution for compressed video quality enhancement. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 10696–10703.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Wang, X.; Yu, K.; Dong, C.; Loy, C.C. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 606–615.
- He, J.; Liu, Y.; Qiao, Y.; Dong, C. Conditional sequential modulation for efficient global image retouching. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 679–695.
- Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Ho, M.M.; Zhou, J.; He, G. RR-DnCNN v2.0: Enhanced Restoration-Reconstruction Deep Neural Network for Down-Sampling-Based Video Coding. IEEE Trans. Image Process. 2021, 30, 1702–1715.
- Xu, G.; Hou, Q.; Zhang, L.; Cheng, M.M. FMNet: Frequency-Aware Modulation Network for SDR-to-HDR Translation. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22), New York, NY, USA, 10–14 October 2022; pp. 6425–6435.
- Chen, X.; Liu, Y.; Zhang, Z.; Qiao, Y.; Dong, C. HDRUnet: Single image HDR reconstruction with denoising and dequantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 354–363.
- Shao, T.; Zhai, D.; Jiang, J.; Liu, X. Hybrid Conditional Deep Inverse Tone Mapping. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22), New York, NY, USA, 10–14 October 2022; pp. 1016–1024.
- Yang, C.; Jin, M.; Jia, X.; Xu, Y.; Chen, Y. AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-time Image Enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17522–17531.
| Methods | Time | PSNR (QP = 27) | PSNR (QP = 32) | PSNR (QP = 37) | PSNR (QP = 42) | Mean PSNR | Mean SSIM |
|---|---|---|---|---|---|---|---|
| CSRNET [39] | 0.3 | 33.598 | 32.472 | 31.288 | 29.946 | 31.826 | 0.9518 |
| STDF [35] | 3.7 | 33.978 | 32.810 | 31.591 | 30.163 | 32.135 | 0.9455 |
| AGCM [7] | 0.4 | 34.260 | 33.123 | 31.878 | 30.395 | 32.414 | 0.9526 |
| AILUT [46] | 3.3 | 34.265 | 33.058 | 31.789 | 30.350 | 32.366 | 0.9498 |
| DeepSRITM [8] | 7.4 | 34.688 | 33.332 | 31.998 | 30.483 | 32.625 | 0.9515 |
| FMNet [43] | 1.4 | 34.462 | 33.474 | 32.146 | 30.584 | 32.666 | 0.9523 |
| HDRUNET [44] | 2.9 | 34.586 | 33.591 | 32.262 | 30.706 | 32.786 | 0.9514 |
| HDCFM [11] | 2.4 | 34.897 | 33.784 | 32.440 | 30.929 | 33.012 | 0.9538 |
| HyCondITM [45] | 8.3 | 34.860 | 33.862 | 32.573 | 31.103 | 33.100 | 0.9554 |
| Ours | 3.8 | 35.072 | 34.026 | 32.651 | 31.083 | 33.208 | 0.9555 |
| Exp. | Baseline | DCN | CTAM | Time | PSNR (QP = 27) | PSNR (QP = 32) | PSNR (QP = 37) | PSNR (QP = 42) | Mean |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ✓ | ✗ | ✗ | 2.88 | 34.097 | 33.133 | 31.894 | 30.437 | 32.390 |
| 2 | ✓ | ✓ | ✗ | 2.95 | 34.595 | 33.409 | 32.137 | 30.617 | 32.689 |
| MCMN | ✓ | ✓ | ✓ | 3.83 | 35.072 | 34.026 | 32.651 | 31.083 | 33.208 |
| Exp. | GSMM | LSMM | CTMM | TSDC | Time | PSNR (QP = 27) | PSNR (QP = 32) | PSNR (QP = 37) | PSNR (QP = 42) | Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | ✓ | ✗ | ✗ | ✗ | 3.09 | 34.097 | 33.133 | 31.894 | 30.437 | 32.390 |
| 4 | ✓ | ✓ | ✗ | ✗ | 3.49 | 34.628 | 33.498 | 32.195 | 30.673 | 32.748 |
| 5 | ✓ | ✗ | ✓ | ✗ | 3.43 | 34.865 | 33.567 | 32.252 | 30.704 | 32.847 |
| 6 | ✓ | ✓ | ✓ | ✗ | 3.56 | 34.977 | 33.923 | 32.569 | 31.012 | 33.120 |
| MCMN | ✓ | ✓ | ✓ | ✓ | 3.83 | 35.072 | 34.026 | 32.651 | 31.083 | 33.208 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Z.; He, G. Multi-Frame Content-Aware Mapping Network for Standard-Dynamic-Range to High-Dynamic-Range Television Artifact Removal. Sensors 2024, 24, 299. https://doi.org/10.3390/s24010299