Enhanced LDR Detail Rendering for HDR Fusion by TransU-Fusion Network
Abstract
1. Introduction
- This work proposes TransU-Fusion, a novel method for HDR reconstruction from multi-exposure LDR images that combines a U-Net with Transformer blocks. The method reconstructs high-quality, ghost-free HDR images even when the LDR inputs contain large foreground motions and oversaturated regions.
- The proposed Transformer structure, the Feature Distillation Transformer Block (FDTB), distills image features extracted by the encoder and separates ghosting artifacts from useful image information, outperforming previous designs.
- Experimental results on three benchmarks demonstrate that the proposed model outperforms state-of-the-art HDR models.
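The Transformer blocks at the heart of this design are built on multiheaded self-attention (MSA). As a minimal illustration of the mechanism (not the paper's implementation), a single-head scaled dot-product self-attention over a flattened feature sequence can be sketched in NumPy; the projection weights here are random stand-ins:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.
    x: (n_tokens, d) sequence of flattened image patches."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # pairwise token affinities
    return softmax(scores) @ v               # attention-weighted mix of values

rng = np.random.default_rng(0)
d = 16
tokens = rng.standard_normal((64, d))        # e.g. an 8x8 feature map, flattened
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(tokens, wq, wk, wv)
```

Each output token is a weighted combination of all value vectors, which is what lets such blocks aggregate information across misaligned exposures rather than only within a local convolutional window.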
2. Related Work
2.1. HDR Reconstruction Algorithms without Deep Learning Methods
2.2. Deep Learning Based Algorithms
3. Proposed Algorithm
3.1. FDTB
3.2. Overall Architecture
3.3. Training
4. Experiments
4.1. Dataset and Metrics
4.2. Ablation Studies
4.2.1. Model Architecture
- w/o Transformer: remove the FDTB blocks from the original TransU model to serve as the baseline.
- w/ Transformer: replace the FDTB blocks with the same number of vanilla Transformer blocks to analyze the Transformer's contribution to image fusion.
- Swin-Transformer: replace the vanilla Transformer blocks with the same number of Swin-Transformer blocks to compare deghosting performance.
- FDTB: replace the Swin-Transformer blocks with FDTB to test its deghosting and information-fusion capabilities.
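The Swin-Transformer variant above restricts self-attention to local windows that are cyclically shifted between layers. A schematic NumPy sketch of that partitioning step (an illustration of the general Swin mechanism, not this paper's code):

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) feature map into non-overlapping w x w windows;
    Swin computes self-attention inside each window independently."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w, w, C)

def shifted(x, s):
    """Cyclic shift applied before partitioning, so successive layers see
    different window boundaries and information can cross windows."""
    return np.roll(x, shift=(-s, -s), axis=(0, 1))

feat = np.arange(8 * 8 * 1, dtype=float).reshape(8, 8, 1)
plain = window_partition(feat, 4)             # 4 windows of 4x4 tokens
moved = window_partition(shifted(feat, 2), 4) # same map, shifted windows
```

Windowed attention keeps the cost linear in image size, which matters for the full-resolution feature maps used in HDR fusion.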
4.2.2. Study on Loss Function
4.3. Comparison with SOTA Methods
4.3.1. Test on Kalantari et al.’s Dataset [3]
4.3.2. Tests on Other Datasets
4.4. Timing Performance
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Full Form |
---|---|
HDR | High Dynamic Range |
LDR | Low Dynamic Range |
AI | Artificial Intelligence |
DRDB | Dilated Residual Dense Block |
VR | Virtual Reality |
CNN | Convolutional Neural Network |
GAN | Generative Adversarial Network |
MSA | Multiheaded Self-Attention |
FDTB | Feature Distillation Transformer Block |
RSTB | Residual Swin Transformer Block |
ViT | Vision Transformer |
References
1. Nayar, S.K.; Mitsunaga, T. High dynamic range imaging: Spatially varying pixel exposures. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hilton Head, SC, USA, 15 June 2000; pp. 472–479.
2. Tumblin, J.; Agrawal, A.; Raskar, R. Why I want a gradient camera. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005; pp. 103–110.
3. Kalantari, N.K.; Ramamoorthi, R. Deep high dynamic range imaging of dynamic scenes. ACM Trans. Graph. 2017, 36, 144.
4. Wu, S.; Xu, J.; Tai, Y.W.; Tang, C.K. Deep high dynamic range imaging with large foreground motions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 117–132.
5. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III 18; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
6. Yan, Q.; Gong, D.; Shi, Q.; Hengel, A.V.D.; Shen, C.; Reid, I.; Zhang, Y. Attention-guided network for ghost-free high dynamic range imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 1751–1760.
7. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481.
8. Niu, Y.; Wu, J.; Liu, W.; Guo, W.; Lau, R.W.H. HDR-GAN: HDR image reconstruction from multi-exposed LDR images with large motions. IEEE Trans. Image Process. 2021, 30, 3885–3896.
9. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
10. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
11. Park, N.; Kim, S. How do vision transformers work? arXiv 2022, arXiv:2202.06709.
12. Yan, Q.; Zhang, L.; Liu, Y.; Zhu, Y.; Sun, J.; Shi, Q.; Zhang, Y. Deep HDR imaging via a non-local network. IEEE Trans. Image Process. 2020, 29, 4308–4322.
13. Grosch, T. Fast and robust high dynamic range image generation with camera and object movement. In Vision, Modeling and Visualization; RWTH Aachen: Aachen, Germany, 2006; Volume 3.
14. Jacobs, K.; Loscos, C.; Ward, G. Automatic high-dynamic range image generation for dynamic scenes. IEEE Comput. Graph. Appl. 2008, 28, 84–93.
15. Pece, F.; Kautz, J. Bitmap movement detection: HDR for dynamic scenes. In Proceedings of the Conference on Visual Media Production, London, UK, 17–18 November 2010.
16. Zhang, W.; Cham, W.K. Gradient-directed multiexposure composition. IEEE Trans. Image Process. 2011, 21, 2318–2323.
17. Heo, Y.S.; Lee, K.M.; Lee, S.U.; Moon, Y.; Cha, J. Ghost-free high dynamic range imaging. In Proceedings of the Computer Vision—ACCV 2010: 10th Asian Conference on Computer Vision, Queenstown, New Zealand, 8–12 November 2010; Volume 10, pp. 486–500.
18. Bogoni, L. Extending dynamic range of monochrome and color images through fusion. In Proceedings of the 15th International Conference on Pattern Recognition, ICPR-2000, Barcelona, Spain, 3–7 September 2000; pp. 7–12.
19. Kang, S.B.; Uyttendaele, M.; Winder, S.; Szeliski, R. High dynamic range video. ACM Trans. Graph. (TOG) 2003, 22, 319–325.
20. Zimmer, H.; Bruhn, A.; Weickert, J. Freehand HDR imaging of moving scenes with simultaneous resolution enhancement. Comput. Graph. Forum 2011, 30, 405–414.
21. Gallo, O.; Troccoli, A.; Hu, J.; Pulli, K.; Kautz, J. Locally non-rigid registration for mobile HDR photography. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 49–56.
22. Sen, P.; Kalantari, N.K.; Yaesoubi, M.; Darabi, S.; Goldman, D.B.; Shechtman, E. Robust patch-based HDR reconstruction of dynamic scenes. ACM Trans. Graph. 2012, 31, 1–11.
23. Hu, J.; Gallo, O.; Pulli, K.; Sun, X. HDR deghosting: How to deal with saturation? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1163–1170.
24. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 11–17 October 2021; pp. 10012–10022.
25. Liu, J.; Tang, J.; Wu, G. Residual feature distillation network for lightweight image super-resolution. In Proceedings of the Computer Vision—ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; pp. 41–55.
26. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using swin transformer. arXiv 2021, arXiv:2108.10257.
27. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2024–2032.
28. Xu, H.; Ma, J.; Jiang, J.; Guo, X.; Ling, H. U2Fusion: A unified unsupervised image fusion network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 502–518.
29. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711.
30. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
31. Tursun, O.T.; Akyüz, A.O.; Erdem, A.; Erdem, E. An objective deghosting quality metric for HDR images. Comput. Graph. Forum 2016, 35, 139–152.
32. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018, Proceedings 4; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–11.
Structure | PSNR-μ | PSNR-L | HDR-VDP-2 |
---|---|---|---|
w/o ViT | 42.29 | 40.54 | 62.32 |
Vanilla ViT | 43.34 | 41.15 | 64.52 |
RSTB | 43.68 | 41.56 | 64.97 |
FDTB | 43.87 | 41.83 | 65.83 |
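PSNR-μ in these tables is the PSNR computed after μ-law tonemapping (the standard protocol from Kalantari et al., with μ = 5000), while PSNR-L is computed in the linear HDR domain. A sketch of the two metrics, assuming images normalized to [0, 1]:

```python
import numpy as np

MU = 5000.0  # compression constant used in the standard protocol

def mu_tonemap(h):
    """mu-law tonemapping of a linear HDR image with values in [0, 1]."""
    return np.log1p(MU * h) / np.log1p(MU)

def psnr(a, b, peak=1.0):
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
gt = rng.random((32, 32, 3))
pred = np.clip(gt + 0.01 * rng.standard_normal(gt.shape), 0.0, 1.0)
psnr_l = psnr(pred, gt)                           # linear-domain PSNR
psnr_mu = psnr(mu_tonemap(pred), mu_tonemap(gt))  # tonemapped PSNR
```

Because μ-law compression expands dark regions, PSNR-μ weights errors closer to how a tonemapped HDR result is actually viewed.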
Layers | PSNR-μ | Time (s) | HDR-VDP-2 |
---|---|---|---|
1 | 43.12 | 0.11 | 64.59 |
2 | 43.29 | 0.16 | 65.21 |
3 (Proposed Method) | 43.87 | 0.23 | 65.83 |
4 | 43.56 | 0.45 | 64.92 |
Loss Function | PSNR-μ | PSNR-L | HDR-VDP-2 |
---|---|---|---|
L1 | 43.54 | 41.03 | 63.40 |
L2 | 43.68 | 41.11 | 64.69 |
L2 + Perceptual Loss (Proposed) | 43.87 | 41.83 | 65.83 |
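The best-performing objective combines an L2 reconstruction term with a perceptual term computed on deep features, in the spirit of Johnson et al.'s perceptual loss. A hedged NumPy sketch, with a simple gradient extractor `phi` standing in for the pretrained VGG network (both `phi` and the weight `lam` are illustrative stand-ins, not the paper's exact components):

```python
import numpy as np

def phi(x):
    """Hypothetical stand-in for pretrained VGG features: horizontal and
    vertical image gradients, flattened into one feature vector."""
    gx = np.diff(x, axis=1)
    gy = np.diff(x, axis=0)
    return np.concatenate([gx.ravel(), gy.ravel()])

def fusion_loss(pred, gt, lam=0.01):
    """L2 reconstruction loss plus a feature-space (perceptual) penalty."""
    l2 = np.mean((pred - gt) ** 2)
    perc = np.mean((phi(pred) - phi(gt)) ** 2)
    return l2 + lam * perc

rng = np.random.default_rng(0)
gt = rng.random((16, 16, 3))
pred = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0.0, 1.0)
```

The perceptual term penalizes structural differences that per-pixel L2 under-weights, which is consistent with the HDR-VDP-2 gain in the table above.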
Method | PSNR-μ | PSNR-L | SSIM-μ | SSIM-L | HDR-VDP-2 |
---|---|---|---|---|---|
Sen’s Algorithm [22] | 40.80 | 38.11 | 0.9808 | 0.9721 | 59.38 |
Hu’s Algorithm [23] | 35.79 | 30.76 | 0.9717 | 0.9503 | 57.05 |
Kalantari [3] | 42.67 | 41.23 | 0.9888 | 0.9846 | 65.05 |
AHDRNet [6] | 43.63 | 41.14 | 0.9900 | 0.9702 | 64.61 |
NHDRRNet [12] | 42.41 | 41.43 | 0.9877 | 0.9857 | 61.21 |
HDR-GAN [8] | 43.92 | 41.57 | 0.9905 | 0.9865 | 65.45 |
Proposed (TransU-Fusion) | 43.87 | 41.83 | 0.9904 | 0.9876 | 65.83 |
Method | GPU Time (s) |
---|---|
Kalantari | 0.19 |
AHDRNet | 0.23 |
NHDRRNet | 0.25 |
HDR-GAN | 0.23 |
Proposed Method | 0.23 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Song, B.; Gao, R.; Wang, Y.; Yu, Q. Enhanced LDR Detail Rendering for HDR Fusion by TransU-Fusion Network. Symmetry 2023, 15, 1463. https://doi.org/10.3390/sym15071463