Lightweight Reference-Based Video Super-Resolution Using Deformable Convolution
Abstract
1. Introduction
2. Related Work
2.1. Single-Image Super-Resolution
2.2. Video Super-Resolution
2.3. Reference-Based Super-Resolution
3. Proposed Method
3.1. Feature Extraction
3.2. Feature Alignment
3.3. Reconstruction
3.4. Loss Function
4. Experimental Results
4.1. Datasets
4.2. Baseline Models
- Single-Image Super-Resolution Method
- We used the single-image super-resolution architecture RCAN [16] as one of our baselines. Because its pre-trained model was not trained on the REDS dataset, we trained RCAN on the REDS dataset for 50 epochs.
- Reference-Based Super-Resolution Method
4.3. Implementation Details
4.4. Results
4.4.1. Quantitative Evaluation
4.4.2. Analysis on Running Time and Memory Usage
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Gupta, R.; Sharma, A.; Kumar, A. Super-resolution using GANs for medical imaging. Procedia Comput. Sci. 2020, 173, 28–35.
- Zhang, L.; Zhang, H.; Shen, H.; Li, P. A super-resolution reconstruction algorithm for surveillance images. Signal Process. 2010, 90, 848–859.
- Wang, P.; Bayram, B.; Sertel, E. A comprehensive review on deep learning based remote sensing image super-resolution methods. Earth-Sci. Rev. 2022, 232, 104110.
- Pan, L.; Chen, L.; Zhu, S.; Tong, W.; Guo, L. Research on small sample data-driven inspection technology of UAV for transmission line insulator defect detection. Information 2022, 13, 276.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K., Eds.; Curran Associates, Inc.: New York, NY, USA, 2014; Volume 27.
- Zheng, H.; Ji, M.; Wang, H.; Liu, Y.; Fang, L. CrossNet: An end-to-end reference-based super resolution network using cross-scale warping. In Proceedings of the Computer Vision – ECCV 2018, Munich, Germany, 8–14 September 2018; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 87–104.
- Zhang, Z.; Wang, Z.; Lin, Z.; Qi, H. Image super-resolution by neural texture transfer. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7974–7983.
- Dosovitskiy, A.; Fischer, P.; Ilg, E.; Häusser, P.; Hazirbas, C.; Golkov, V.; Smagt, P.V.D.; Cremers, D.; Brox, T. FlowNet: Learning optical flow with convolutional networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2758–2766.
- Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning texture Transformer network for image super-resolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 5790–5799.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 764–773.
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable ConvNets V2: More deformable, better results. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9300–9308.
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision – ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 184–199.
- Agustsson, E.; Timofte, R. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1122–1131.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced deep residual networks for single image super-resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1132–1140.
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the Computer Vision – ECCV 2018, Munich, Germany, 8–14 September 2018; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 294–310.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the Computer Vision – ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 694–711.
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 105–114.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30.
- Kolesnikov, A.; Dosovitskiy, A.; Weissenborn, D.; Heigold, G.; Uszkoreit, J.; Beyer, L.; Minderer, M.; Dehghani, M.; Houlsby, N.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR) 2021, Virtual, 3–7 May 2021.
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, QC, Canada, 11–17 October 2021; pp. 1833–1844.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision Transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
- Yao, X.; Pan, Y.; Wang, J. An omnidirectional image super-resolution method based on enhanced SwinIR. Information 2024, 15, 248.
- Zheng, L.; Zhu, J.; Shi, J.; Weng, S. Efficient mixed transformer for single image super-resolution. Eng. Appl. Artif. Intell. 2024, 133, 108035.
- Haris, M.; Shakhnarovich, G.; Ukita, N. Recurrent back-projection network for video super-resolution. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3892–3901.
- Chan, K.C.; Wang, X.; Yu, K.; Dong, C.; Loy, C.C. BasicVSR: The search for essential components in video super-resolution and beyond. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021.
- De Brabandere, B.; Jia, X.; Tuytelaars, T.; Van Gool, L. Dynamic filter networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 667–675.
- Jo, Y.; Oh, S.W.; Kang, J.; Kim, S.J. Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3224–3232.
- Tian, Y.; Zhang, Y.; Fu, Y.; Xu, C. TDAN: Temporally-deformable alignment network for video super-resolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 3357–3366.
- Wang, X.; Chan, K.C.; Yu, K.; Dong, C.; Loy, C.C. EDVR: Video restoration with enhanced deformable convolutional networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 1954–1963.
- Chan, K.C.; Wang, X.; Yu, K.; Dong, C.; Loy, C.C. Understanding deformable alignment in video super-resolution. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021.
- Zhang, Y.; Wang, H.; Zhu, H.; Chen, Z. Optical flow reusing for high-efficiency space-time video super resolution. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 2116–2128.
- Feng, Z.; Zhang, W.; Liang, S.; Yu, Q. Deep video super-resolution using hybrid imaging system. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4855–4867.
- Liu, X.; Li, J.; Duan, T.; Li, J.; Wang, Y. DSMA: Reference-based image super-resolution method based on dual-view supervised learning and multi-attention mechanism. IEEE Access 2022, 10, 54649–54659.
- Shim, G.; Park, J.; Kweon, I.S. Robust reference-based super-resolution with similarity-aware deformable convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8425–8434.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR) 2015, San Diego, CA, USA, 7–9 May 2015.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
- Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep Laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
- Nah, S.; Baik, S.; Hong, S.; Moon, G.; Son, S.; Timofte, R.; Lee, K.M. NTIRE 2019 challenge on video deblurring and super-resolution: Dataset and study. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 1996–2005.
- Liu, C.; Sun, D. On Bayesian adaptive video super resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 346–360.
Method | Vid4 | REDS4 | CUFED5 Level 1 | CUFED5 Level 2 | CUFED5 Level 3 | CUFED5 Level 4 | CUFED5 Level 5 |
---|---|---|---|---|---|---|---|
Bicubic | 20.41 | 25.93 | 22.92 | 22.92 | 22.92 | 22.92 | 22.92 |
RCAN | 20.00 | 28.15 | 24.21 | 24.21 | 24.21 | 24.21 | 24.21 |
CrossNet | 18.63 | 22.28 | 20.18 | 20.06 | 20.09 | 20.04 | 20.07 |
SRNTT | 19.04 | 27.61 | 24.12 | 24.09 | 24.09 | 24.06 | 24.09 |
RSRDCN (ours) | 20.00 | 28.27 | 24.24 | 24.24 | 24.23 | 24.23 | 24.24 |
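
The table above does not name its metric in this extract. Its values fall in the range typical of peak signal-to-noise ratio (PSNR) in dB, which is commonly reported in super-resolution evaluations. Assuming that reading (an assumption, not something stated in the text), the following is a minimal pure-Python sketch of how PSNR is usually computed between a ground-truth frame and a super-resolved frame. The function name `psnr`, the 8-bit peak value of 255, and the sample pixel values are illustrative choices, not taken from the paper.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `peak` is the maximum possible pixel value (255 for 8-bit images, assumed here).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals have unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical flattened 8-bit pixel values, for illustration only.
    ground_truth = [52, 55, 61, 59, 79, 61, 76, 61]
    super_resolved = [54, 55, 60, 58, 80, 63, 75, 60]
    print(f"PSNR: {psnr(ground_truth, super_resolved):.2f} dB")
```

Higher values indicate a smaller mean squared error against the ground truth. Published evaluations differ in details such as colour space and border cropping, which this sketch does not model.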
Method | Vid4 | REDS4 | CUFED5 Level 1 | CUFED5 Level 2 | CUFED5 Level 3 | CUFED5 Level 4 | CUFED5 Level 5 |
---|---|---|---|---|---|---|---|
Bicubic | 0.520 | 0.724 | 0.632 | 0.632 | 0.632 | 0.632 | 0.632 |
RCAN | 0.513 | 0.802 | 0.712 | 0.712 | 0.712 | 0.712 | 0.712 |
CrossNet | 0.372 | 0.504 | 0.465 | 0.451 | 0.454 | 0.448 | 0.448 |
SRNTT | 0.488 | 0.779 | 0.717 | 0.715 | 0.715 | 0.714 | 0.714 |
RSRDCN (ours) | 0.530 | 0.815 | 0.727 | 0.727 | 0.727 | 0.727 | 0.727 |
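
As a reading aid for the two tables above, the following sketch transcribes their rows and reports, for each column, the best-scoring method other than RSRDCN and how far RSRDCN is above or below it. Only the numbers come from the tables. The names `COLUMNS`, `TABLE_1`, `TABLE_2`, and `margin_over_best_baseline` are illustrative, and the sketch assumes that higher is better in both tables.

```python
# Column order shared by both tables; "CUFED5-k" stands for CUFED5 level k.
COLUMNS = ["Vid4", "REDS4", "CUFED5-1", "CUFED5-2", "CUFED5-3", "CUFED5-4", "CUFED5-5"]

# Values transcribed from the first table above.
TABLE_1 = {
    "Bicubic": [20.41, 25.93, 22.92, 22.92, 22.92, 22.92, 22.92],
    "RCAN": [20.00, 28.15, 24.21, 24.21, 24.21, 24.21, 24.21],
    "CrossNet": [18.63, 22.28, 20.18, 20.06, 20.09, 20.04, 20.07],
    "SRNTT": [19.04, 27.61, 24.12, 24.09, 24.09, 24.06, 24.09],
    "RSRDCN": [20.00, 28.27, 24.24, 24.24, 24.23, 24.23, 24.24],
}

# Values transcribed from the second table above.
TABLE_2 = {
    "Bicubic": [0.520, 0.724, 0.632, 0.632, 0.632, 0.632, 0.632],
    "RCAN": [0.513, 0.802, 0.712, 0.712, 0.712, 0.712, 0.712],
    "CrossNet": [0.372, 0.504, 0.465, 0.451, 0.454, 0.448, 0.448],
    "SRNTT": [0.488, 0.779, 0.717, 0.715, 0.715, 0.714, 0.714],
    "RSRDCN": [0.530, 0.815, 0.727, 0.727, 0.727, 0.727, 0.727],
}


def margin_over_best_baseline(table, ours="RSRDCN"):
    """Map each column to (best other method, score of `ours` minus that method's score)."""
    result = {}
    for i, column in enumerate(COLUMNS):
        # Scores of every method except ours in this column.
        baselines = {name: row[i] for name, row in table.items() if name != ours}
        best = max(baselines, key=baselines.get)
        result[column] = (best, round(table[ours][i] - baselines[best], 3))
    return result


if __name__ == "__main__":
    for label, table in (("First table", TABLE_1), ("Second table", TABLE_2)):
        print(label)
        for column, (best, diff) in margin_over_best_baseline(table).items():
            print(f"  {column}: best other method {best}, RSRDCN margin {diff:+.3f}")
```

On these numbers, RSRDCN is above every other method in each column of the second table and in every column of the first except Vid4, where bicubic interpolation is higher (20.41 versus 20.00).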
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Miyazaki, T.; Guo, Z.; Omachi, S. Lightweight Reference-Based Video Super-Resolution Using Deformable Convolution. Information 2024, 15, 718. https://doi.org/10.3390/info15110718