Lightweight Single Image Super-Resolution via Efficient Mixture of Transformers and Convolutional Networks
Abstract
1. Introduction
- (1) We propose LGUN, a hybrid architecture designed for resource-constrained devices. It combines the strengths of convolutional networks and Vision Transformers, allowing the proposed LGU to encode both local processing and global interaction throughout the network (a structural sketch follows this list).
- (2) In the shallow layers, we employ MLHA to encode local spatial information; its STF strategy promotes the learning of diverse patterns while saving computation. In the deep layers, we use DGSA, built on the MTS strategy, to model global context dependencies, strengthening the network's ability to represent complex image patterns with high adaptability (a sketch of the underlying top-k attention also follows this list).
- (3) Experimental results on popular benchmark datasets show that our method outperforms other recent Transformer-based approaches in both quantitative and qualitative evaluations, supporting the effectiveness of MLHA with STF and of DGSA with MTS.
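To make contribution (1) concrete, here is a minimal PyTorch sketch of the hybrid local–global idea: convolutional mixing in shallow stages, self-attention mixing in deep stages. The module names (LocalMixer, GlobalMixer, HybridStage) and their exact composition are illustrative assumptions, not the authors' exact LGU definition.

```python
import torch
import torch.nn as nn

class LocalMixer(nn.Module):
    """Convolutional token mixer for shallow layers (local spatial encoding)."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # depthwise: local context
        self.pwconv = nn.Conv2d(dim, dim, 1)                         # pointwise: channel mixing
        self.act = nn.GELU()

    def forward(self, x):                    # x: (B, C, H, W)
        return x + self.pwconv(self.act(self.dwconv(x)))

class GlobalMixer(nn.Module):
    """Self-attention token mixer for deep layers (global interaction)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)     # (B, H*W, C) token sequence
        n = self.norm(t)
        t = t + self.attn(n, n, n)[0]        # residual global attention
        return t.transpose(1, 2).reshape(b, c, h, w)

class HybridStage(nn.Module):
    """Stack of mixers: convs for shallow stages, attention for deep stages."""
    def __init__(self, dim: int, depth: int, use_attention: bool):
        super().__init__()
        mixer = GlobalMixer if use_attention else LocalMixer
        self.blocks = nn.Sequential(*[mixer(dim) for _ in range(depth)])

    def forward(self, x):
        return self.blocks(x)

# Shallow stage mixes locally; deep stage attends globally.
x = torch.randn(1, 32, 24, 24)
y = HybridStage(32, depth=2, use_attention=False)(x)   # shallow
z = HybridStage(32, depth=2, use_attention=True)(y)    # deep
print(z.shape)                                          # torch.Size([1, 32, 24, 24])
```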
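Contribution (2) rests on DGSA's sparse attention, and the ablation in Section 4.3 isolates its top-k selection. Below is a generic sketch of top-k sparse attention, assuming the standard formulation in which all but the k largest scores per query are masked before the softmax; the dynamic, multi-scale (MTS) part of DGSA is not reproduced here.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk: int):
    """Keep only the top-k most similar keys per query.

    q, k, v: (batch, heads, tokens, dim). Non-selected scores are set to
    -inf so the softmax concentrates probability on the retained keys.
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5        # (B, h, N, N)
    kth_largest = scores.topk(topk, dim=-1).values[..., -1, None]
    scores = scores.masked_fill(scores < kth_largest, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage: 2 heads, 64 tokens, 16 dims per head, retain 8 keys per query.
q, k, v = (torch.randn(1, 2, 64, 16) for _ in range(3))
out = topk_sparse_attention(q, k, v, topk=8)
print(out.shape)   # torch.Size([1, 2, 64, 16])
```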
2. Related Work
2.1. Convolutional Networks
2.2. Transformers
2.3. Combination of Transformers and Convolutional Networks
3. Methods
3.1. Overall Architecture
3.2. LGU
3.3. Multi-Order Local Hierarchical Aggregation (MLHA)
3.4. Dynamic Global Sparse Attention (DGSA)
3.5. Feed-Forward Network (FFN)
3.6. Discussion
4. Experiments
4.1. Implementation Details
4.2. Comparison with State-of-the-Art (SOTA) Methods
4.2.1. Quantitative and Qualitative Results
4.2.2. Visualization Analysis
4.2.3. Remote Sensing Image Super-Resolution
4.3. Ablation Study
4.4. Application
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Silva, N.P.; Amin, B.; Dunne, E.; Hynes, N.; O’Halloran, M.; Elahi, A. Implantable Pressure-Sensing Devices for Monitoring Abdominal Aortic Aneurysms in Post-Endovascular Aneurysm Repair. Sensors 2024, 24, 3526.
2. Silva, N.P.; Elahi, A.; Dunne, E.; O’Halloran, M.; Amin, B. Design and Characterisation of a Read-Out System for Wireless Monitoring of a Novel Implantable Sensor for Abdominal Aortic Aneurysm Monitoring. Sensors 2024, 24, 3195.
3. Negre, P.; Alonso, R.S.; González-Briones, A.; Prieto, J.; Rodríguez-González, S. Literature Review of Deep-Learning-Based Detection of Violence in Video. Sensors 2024, 24, 4016.
4. Liu, H.; Yang, L.; Zhang, L.; Shang, F.; Liu, Y.; Wang, L. Accelerated Stochastic Variance Reduction Gradient Algorithms for Robust Subspace Clustering. Sensors 2024, 24, 3659.
5. Chakraborty, D.; Boni, R.; Mills, B.N.; Cheng, J.; Komissarov, I.; Gerber, S.A.; Sobolewski, R. High-Density Polyethylene Custom Focusing Lenses for High-Resolution Transient Terahertz Biomedical Imaging Sensors. Sensors 2024, 24, 2066.
6. Wang, W.; He, J.; Liu, H.; Yuan, W. MDC-RHT: Multi-Modal Medical Image Fusion via Multi-Dimensional Dynamic Convolution and Residual Hybrid Transformer. Sensors 2024, 24, 4056.
7. Chang, H.K.; Chen, W.W.; Jhang, J.S.; Liou, J.C. Siamese Unet Network for Waterline Detection and Barrier Shape Change Analysis from Long-Term and Large Numbers of Satellite Imagery. Sensors 2023, 23, 9337.
8. Njimi, H.; Chehata, N.; Revers, F. Fusion of Dense Airborne LiDAR and Multispectral Sentinel-2 and Pleiades Satellite Imagery for Mapping Riparian Forest Species Biodiversity at Tree Level. Sensors 2024, 24, 1753.
9. Wan, S.; Guan, S.; Tang, Y. Advancing bridge structural health monitoring: Insights into knowledge-driven and data-driven approaches. J. Data Sci. Intell. Syst. 2023, 2, 129–140.
10. Wu, Z.; Tang, Y.; Hong, B.; Liang, B.; Liu, Y. Enhanced Precision in Dam Crack Width Measurement: Leveraging Advanced Lightweight Network Identification for Pixel-Level Accuracy. Int. J. Intell. Syst. 2023, 2023, 9940881.
11. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654.
12. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144.
13. Liu, X.; Liao, X.; Shi, X.; Qing, L.; Ren, C. Efficient Information Modulation Network for Image Super-Resolution. In ECAI 2023; IOS Press: Amsterdam, The Netherlands, 2023; pp. 1544–1551.
14. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739.
15. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307.
16. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
17. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
18. Chu, X.; Zhang, B.; Ma, H.; Xu, R.; Li, Q. Fast, accurate and lightweight super-resolution with neural architecture search. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 59–64.
19. Gao, Q.; Zhao, Y.; Li, G.; Tong, T. Image super-resolution using knowledge distillation. In Proceedings of the Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Revised Selected Papers, Part II. Springer: Berlin/Heidelberg, Germany, 2019; pp. 527–541.
20. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301.
21. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
22. Zhang, Y.; Li, K.; Li, K.; Zhong, B.; Fu, Y. Residual non-local attention networks for image restoration. arXiv 2019, arXiv:1903.10082.
23. Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 12299–12310.
24. Cheng, G.; Matsune, A.; Du, H.; Liu, X.; Zhan, S. Exploring more diverse network architectures for single image super-resolution. Knowl. Based Syst. 2022, 235, 107648.
25. Wang, X.; Dong, C.; Shan, Y. Repsr: Training efficient vgg-style super-resolution networks with structural re-parameterization and batch normalization. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10–14 October 2022; pp. 2556–2564.
26. Wan, J.; Yin, H.; Liu, Z.; Chong, A.; Liu, Y. Lightweight image super-resolution by multi-scale aggregation. IEEE Trans. Broadcast. 2020, 67, 372–382.
27. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2024–2032.
28. Fan, Q.; Huang, H.; Zhou, X.; He, R. Lightweight vision transformer with bidirectional interaction. Adv. Neural Inf. Process. Syst. 2024, 36.
29. Zhou, X.; Huang, H.; Wang, Z.; He, R. Ristra: Recursive image super-resolution transformer with relativistic assessment. IEEE Trans. Multimed. 2024, 26, 6475–6487.
30. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
31. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022.
32. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 1833–1844.
33. Huang, Z.; Ben, Y.; Luo, G.; Cheng, P.; Yu, G.; Fu, B. Shuffle transformer: Rethinking spatial shuffle for vision transformer. arXiv 2021, arXiv:2106.03650.
34. Vaswani, A.; Ramachandran, P.; Srinivas, A.; Parmar, N.; Hechtman, B.; Shlens, J. Scaling local self-attention for parameter efficient visual backbones. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 12894–12904.
35. Mehta, S.; Rastegari, M. Separable self-attention for mobile vision transformers. arXiv 2022, arXiv:2206.02680.
36. Wang, S.; Li, B.Z.; Khabsa, M.; Fang, H.; Ma, H. Linformer: Self-attention with linear complexity. arXiv 2020, arXiv:2006.04768.
37. Ho, J.; Kalchbrenner, N.; Weissenborn, D.; Salimans, T. Axial attention in multidimensional transformers. arXiv 2019, arXiv:1912.12180.
38. Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Chen, D.; Guo, B. Cswin transformer: A general vision transformer backbone with cross-shaped windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12124–12134.
39. Wu, S.; Wu, T.; Tan, H.; Guo, G. Pale transformer: A general vision transformer backbone with pale-shaped attention. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 2731–2739.
40. Child, R.; Gray, S.; Radford, A.; Sutskever, I. Generating long sequences with sparse transformers. arXiv 2019, arXiv:1904.10509.
41. Zhao, G.; Lin, J.; Zhang, Z.; Ren, X.; Su, Q.; Sun, X. Explicit sparse transformer: Concentrated attention through explicit selection. arXiv 2019, arXiv:1912.11637.
42. Cai, H.; Gan, C.; Han, S. Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition. arXiv 2022, arXiv:2205.14756.
43. Yuan, K.; Guo, S.; Liu, Z.; Zhou, A.; Yu, F.; Wu, W. Incorporating convolution designs into visual transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 579–588.
44. Guo, J.; Han, K.; Wu, H.; Tang, Y.; Chen, X.; Wang, Y.; Xu, C. Cmt: Convolutional neural networks meet vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12175–12185.
45. Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. Cvt: Introducing convolutions to vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 22–31.
46. Graham, B.; El-Nouby, A.; Touvron, H.; Stock, P.; Joulin, A.; Jégou, H.; Douze, M. Levit: A vision transformer in convnet’s clothing for faster inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 12259–12269.
47. Li, Y.; Zhang, K.; Cao, J.; Timofte, R.; Van Gool, L. Localvit: Bringing locality to vision transformers. arXiv 2021, arXiv:2104.05707.
48. Xiao, T.; Singh, M.; Mintun, E.; Darrell, T.; Dollár, P.; Girshick, R. Early convolutions help transformers see better. Adv. Neural Inf. Process. Syst. 2021, 34, 30392–30400.
49. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
50. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the Computer Vision–ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part III. Springer: Berlin/Heidelberg, Germany, 2023; pp. 205–218.
51. Song, Y.; He, Z.; Qian, H.; Du, X. Vision transformers for single image dehazing. arXiv 2022, arXiv:2204.03883.
52. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 568–578.
53. Pan, Z.; Zhuang, B.; Liu, J.; He, H.; Cai, J. Scalable vision transformers with hierarchical pooling. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 377–386.
54. Heo, B.; Yun, S.; Han, D.; Chun, S.; Choe, J.; Oh, S.J. Rethinking spatial dimensions of vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 11936–11945.
55. Chen, C.F.R.; Fan, Q.; Panda, R. Crossvit: Cross-attention multi-scale vision transformer for image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 357–366.
56. Chen, Y.; Dai, X.; Chen, D.; Liu, M.; Dong, X.; Yuan, L.; Liu, Z. Mobile-former: Bridging mobilenet and transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5270–5279.
57. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
58. Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 22367–22377.
59. Yoo, J.; Kim, T.; Lee, S.; Kim, S.H.; Lee, H.; Kim, T.H. Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution. arXiv 2022, arXiv:2203.07682.
60. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the 23rd British Machine Vision Conference (BMVC), Surrey, UK, 3–7 September 2012.
61. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the Curves and Surfaces: 7th International Conference, Avignon, France, 24–30 June 2010; Revised Selected Papers 7. Springer: Berlin/Heidelberg, Germany, 2012; pp. 711–730.
62. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; IEEE: Piscataway, NJ, USA, 2001; Volume 2, pp. 416–423.
63. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206.
64. Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 2017, 76, 21811–21838.
65. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645.
66. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
67. Tai, Y.; Yang, J.; Liu, X.; Xu, C. Memnet: A persistent memory network for image restoration. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4539–4547.
68. Li, Z.; Yang, J.; Liu, Z.; Yang, X.; Jeon, G.; Wu, W. Feedback network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3867–3876.
69. Hui, Z.; Wang, X.; Gao, X. Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 723–731.
70. Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 252–268.
71. Wang, L.; Dong, X.; Wang, Y.; Ying, X.; Lin, Z.; An, W.; Guo, Y. Exploring sparsity in image super-resolution for efficient inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 4917–4926.
72. Chen, H.; Gu, J.; Zhang, Z. Attention in attention network for image super-resolution. arXiv 2021, arXiv:2104.09497.
73. Choi, H.; Lee, J.; Yang, J. N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution. arXiv 2022, arXiv:2211.11436.
74. Liu, C.; Lei, P. An efficient group skip-connecting network for image super-resolution. Knowl. Based Syst. 2021, 222, 107017.
75. Esmaeilzehi, A.; Ahmad, M.O.; Swamy, M. FPNet: A Deep Light-Weight Interpretable Neural Network Using Forward Prediction Filtering for Efficient Single Image Super Resolution. IEEE Trans. Circuits Syst. II Express Briefs 2021, 69, 1937–1941.
76. Gu, J.; Dong, C. Interpreting super-resolution networks with local attribution maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 9199–9208.
77. Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135.
Training Config | Settings |
---|---|
Random rotation | (90°, 180°, 270°) |
Random flipping | Horizontal |
Patch size | 64 × 64 |
Batch size | 16 |
Base learning rate | 5 × 10⁻⁴ |
Optimizer momentum | β₁ = 0.9, β₂ = 0.999 |
Weight decay | 1 × 10⁻⁴ |
Learning rate schedule | Cosine decay |
Learning rate lower bound | 1 × 10⁻⁷ |
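The table above maps directly onto a standard PyTorch optimizer setup. A minimal sketch, assuming Adam (the momentum terms β₁/β₂ suggest it) and a placeholder model; the total iteration count is an assumption, as the training length is not given in the table.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder standing in for LGUN

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,                # base learning rate
    betas=(0.9, 0.999),     # optimizer momentum terms from the table
    weight_decay=1e-4,      # weight decay
)

# Cosine decay from the base rate down to the 1e-7 lower bound.
total_iters = 500_000       # assumed; not stated in the table
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_iters, eta_min=1e-7
)

for step in range(3):       # one optimization step per training batch
    optimizer.zero_grad()
    loss = model(torch.randn(16, 3, 64, 64)).abs().mean()  # 64×64 patches, batch 16
    loss.backward()
    optimizer.step()
    scheduler.step()
```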
Method | Scale | #Params (K) | Multi-Adds (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | BSDS100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
---|---|---|---|---|---|---|---|---|
Bicubic | ×2 | ∖ | ∖ | 33.66/0.9299 | 30.24/0.8688 | 29.56/0.8431 | 26.88/0.8403 | 30.80/0.9339 |
SRCNN (TPAMI’14) [15] | ×2 | 57 | 52.7 | 36.66/0.9542 | 32.45/0.9067 | 31.36/0.8879 | 29.50/0.8946 | 35.60/0.9663 |
VDSR (CVPR’16) [11] | ×2 | 665 | 612.6 | 37.53/0.9590 | 33.05/0.9130 | 31.90/0.8960 | 30.77/0.9140 | 37.22/0.9750 |
DRCN (CVPR’16) [65] | ×2 | 1774 | 9788.7 | 37.63/0.9588 | 33.04/0.9118 | 31.85/0.8942 | 30.75/0.9133 | 37.55/0.9732 |
LapSRN (CVPR’17) [66] | ×2 | 813 | 29.9 | 37.52/0.9591 | 33.08/0.9130 | 31.80/0.8950 | 30.41/0.9101 | 37.27/0.9740 |
MemNet (ICCV’17) [67] | ×2 | 677 | 623.9 | 37.78/0.9597 | 33.28/0.9142 | 32.08/0.8978 | 31.31/0.9195 | 37.72/0.9740 |
IDN (CVPR’18) [69] | ×2 | 553 | 127.7 | 37.83/0.9600 | 33.30/0.9148 | 32.08/0.8985 | 31.27/0.9196 | 38.01/0.9749 |
CARN (ECCV’18) [70] | ×2 | 1592 | 222.8 | 37.76/0.9590 | 33.52/0.9166 | 32.09/0.8978 | 31.92/0.9256 | 38.36/0.9765 |
EDSR-baseline (CVPR’19) [12] | ×2 | 1370 | 316 | 37.99/0.9604 | 33.57/0.9175 | 32.16/0.8994 | 31.98/0.9272 | 38.54/0.9769 |
SRFBN-S (CVPR’19) [68] | ×2 | 282 | 574.4 | 37.78/0.9597 | 33.35/0.9156 | 32.00/0.8970 | 31.41/0.9207 | 38.06/0.9757 |
FALSR-A (ICPR’21) [18] | ×2 | 1021 | 234.7 | 37.82/0.9595 | 33.55/0.9168 | 32.12/0.8987 | 31.93/0.9256 | - |
SMSR (CVPR’21) [71] | ×2 | 985 | 131.6 | 38.00/0.9601 | 33.64/0.9179 | 32.17/0.8990 | 32.19/0.9284 | 38.76/0.9771 |
A2N (arXiv’21) [72] | ×2 | 1036 | 247.5 | 38.06/0.9608 | 33.75/0.9194 | 32.22/0.9002 | 32.43/0.9311 | 38.87/0.9769 |
LMAN (TBC’21) [26] | ×2 | 1531 | 347.1 | 38.08/0.9608 | 33.80/0.9023 | 32.22/0.9001 | 32.42/0.9302 | 38.92/0.9772 |
SwinIR (ICCV’21) [32] | ×2 | 878 | 243.7 | 38.14/0.9611 | 33.86/0.9206 | 32.31/0.9012 | 32.76/0.9340 | 39.12/0.9783 |
B-GSCN 10 (KBS’21) [74] | ×2 | 1490 | 343 | 38.04/0.9606 | 33.64/0.9182 | 32.19/0.8999 | 32.19/0.9293 | 38.64/0.9771 |
DRSDN (KBS’21) [24] | ×2 | 1055 | 243.1 | 38.06/0.9607 | 33.65/0.9189 | 32.23/0.9003 | 32.40/0.9308 | - |
FPNet (TCSVT’22) [75] | ×2 | 1615 | - | 38.13/0.9619 | 33.83/0.9198 | 32.29/0.9018 | 32.04/0.9278 | - |
NGswin (CVPR’23) [73] | ×2 | 998 | 140.4 | 38.05/0.9610 | 33.79/0.9199 | 32.27/0.9008 | 32.53/0.9324 | 38.97/0.9777 |
LGUN (Ours) | ×2 | 675 | 141.1 | 38.24/0.9618 | 33.93/0.9208 | 32.34/0.9027 | 32.65/0.9322 | 39.38/0.9786 |
Bicubic | ×3 | ∖ | ∖ | 30.39/0.8682 | 27.55/0.7742 | 27.21/0.7385 | 24.46/0.7349 | 26.95/0.8556 |
SRCNN (TPAMI’14) [15] | ×3 | 57 | 52.7 | 32.75/0.9090 | 29.30/0.8215 | 28.41/0.7863 | 26.24/0.7989 | 30.48/0.9117 |
VDSR (CVPR’16) [11] | ×3 | 665 | 612.6 | 33.67/0.9210 | 29.78/0.8320 | 28.83/0.7990 | 27.14/0.8290 | 32.01/0.9340 |
DRCN (CVPR’16) [65] | ×3 | 1774 | 9788.7 | 33.82/0.9226 | 29.76/0.8311 | 28.80/0.7963 | 27.14/0.8279 | 32.24/0.9343 |
MemNet (ICCV’17) [67] | ×3 | 677 | 623.9 | 34.09/0.9248 | 30.01/0.8350 | 28.96/0.8001 | 27.56/0.8376 | 32.51/0.9369 |
IDN (CVPR’18) [69] | ×3 | 553 | 57 | 34.11/0.9253 | 29.99/0.8354 | 28.95/0.8013 | 27.42/0.8359 | 32.71/0.9381 |
CARN (ECCV’18) [70] | ×3 | 1592 | 118.8 | 34.29/0.9255 | 30.29/0.8407 | 29.06/0.8034 | 28.06/0.8493 | 33.50/0.9440 |
EDSR-baseline (CVPR’19) [12] | ×3 | 1555 | 160 | 34.37/0.9270 | 30.28/0.8417 | 29.09/0.8052 | 28.15/0.8527 | 33.45/0.9439 |
SRFBN-S (CVPR’19) [68] | ×3 | 375 | 686.4 | 34.20/0.9255 | 30.10/0.8372 | 28.96/0.8010 | 27.66/0.8415 | 33.02/0.9404 |
SMSR (CVPR’21) [71] | ×3 | 993 | 67.8 | 34.40/0.9270 | 30.33/0.8412 | 29.10/0.8050 | 28.25/0.8536 | 33.68/0.9445 |
A2N (arXiv’21) [72] | ×3 | 1036 | 117.5 | 34.47/0.9279 | 30.44/0.8437 | 29.14/0.8059 | 28.41/0.8570 | 33.78/0.9458 |
LMAN (TBC’21) [26] | ×3 | 1718 | 173.8 | 34.56/0.9286 | 30.46/0.8439 | 29.17/0.8067 | 28.47/0.8576 | 34.00/0.9470 |
SwinIR (ICCV’21) [32] | ×3 | 886 | 109.5 | 34.60/0.9289 | 30.54/0.8463 | 29.20/0.8082 | 28.66/0.8624 | 33.98/0.9478 |
B-GSCN 10 (KBS’21) [74] | ×3 | 1510 | 154 | 34.30/0.9271 | 30.35/0.8425 | 29.11/0.8035 | 28.20/0.8535 | 33.54/0.9445 |
DRSDN (KBS’21) [24] | ×3 | 1071 | 109.8 | 34.48/0.9282 | 30.41/0.8445 | 29.17/0.8072 | 28.45/0.8589 | - |
FPNet (TCSVT’22) [75] | ×3 | 1615 | - | 34.48/0.9285 | 30.53/0.8454 | 29.20/0.8086 | 28.19/0.8534 | - |
NGswin (CVPR’23) [73] | ×3 | 1007 | 66.6 | 34.52/0.9282 | 30.53/0.8456 | 29.19/0.8078 | 28.52/0.8603 | 33.89/0.9470 |
LGUN (Ours) | ×3 | 684 | 63.5 | 34.60/0.9292 | 30.54/0.8458 | 29.25/0.8102 | 28.53/0.8586 | 34.26/0.9480 |
Bicubic | ×4 | ∖ | ∖ | 28.42/0.8104 | 26.00/0.7027 | 25.96/0.6675 | 23.14/0.6577 | 24.89/0.7866 |
SRCNN (TPAMI’14) [15] | ×4 | 57 | 52.7 | 30.48/0.8628 | 27.50/0.7513 | 26.90/0.7101 | 24.52/0.7221 | 27.58/0.8555 |
VDSR (CVPR’16) [11] | ×4 | 665 | 612.6 | 31.35/0.8830 | 28.02/0.7680 | 27.29/0.7260 | 25.18/0.7540 | 28.83/0.8870 |
DRCN (CVPR’16) [65] | ×4 | 1774 | 9788.7 | 31.53/0.8854 | 28.02/0.7670 | 27.23/0.7233 | 25.18/0.7524 | 28.93/0.8854 |
LapSRN (CVPR’17) [66] | ×4 | 813 | 149.4 | 31.54/0.8850 | 28.19/0.7720 | 27.32/0.7270 | 25.21/0.7560 | 29.09/0.8900 |
MemNet (ICCV’17) [67] | ×4 | 677 | 623.9 | 31.74/0.8893 | 28.26/0.7723 | 27.40/0.7281 | 25.50/0.7630 | 29.42/0.8942 |
IDN (CVPR’18) [69] | ×4 | 553 | 32.3 | 31.82/0.8903 | 28.25/0.7730 | 27.41/0.7297 | 25.41/0.7632 | 29.41/0.8942 |
CARN (ECCV’18) [70] | ×4 | 1592 | 90.9 | 32.13/0.8937 | 28.60/0.7806 | 27.58/0.7349 | 26.07/0.7837 | 30.47/0.9084 |
EDSR-baseline (CVPR’19) [12] | ×4 | 1518 | 114 | 32.09/0.8938 | 28.58/0.7813 | 27.57/0.7357 | 26.04/0.7849 | 30.35/0.9067 |
SRFBN-S (CVPR’19) [68] | ×4 | 483 | 852.9 | 31.98/0.8923 | 28.45/0.7779 | 27.44/0.7313 | 25.71/0.7719 | 29.91/0.9008 |
SMSR (CVPR’21) [71] | ×4 | 1006 | 41.6 | 32.12/0.8932 | 28.55/0.7808 | 27.55/0.7351 | 26.11/0.7868 | 30.54/0.9085 |
A2N (arXiv’21) [72] | ×4 | 1047 | 72.4 | 32.30/0.8966 | 28.71/0.7842 | 27.61/0.7374 | 26.27/0.7920 | 30.67/0.9110 |
LMAN (TBC’21) [26] | ×4 | 1673 | 122.0 | 32.40/0.8974 | 28.72/0.7842 | 27.66/0.7388 | 26.36/0.7934 | 30.84/0.9129 |
SwinIR (ICCV’21) [32] | ×4 | 897 | 61.7 | 32.44/0.8976 | 28.77/0.7858 | 27.69/0.7406 | 26.47/0.7980 | 30.92/0.9151 |
B-GSCN 10 (KBS’21) [74] | ×4 | 1530 | 88 | 32.18/0.8950 | 28.60/0.7821 | 27.59/0.7364 | 26.12/0.7872 | 30.50/0.9080 |
DRSDN (KBS’21) [24] | ×4 | 1095 | 63.1 | 32.28/0.8962 | 28.64/0.7836 | 27.64/0.7388 | 26.30/0.7933 | - |
FPNet (TCSVT’22) [75] | ×4 | 1615 | - | 32.32/0.8962 | 28.78/0.7856 | 27.66/0.7394 | 26.09/0.7850 | - |
NGswin (CVPR’23) [73] | ×4 | 1019 | 36.4 | 32.33/0.8963 | 28.78/0.7859 | 27.66/0.7396 | 26.45/0.7963 | 30.80/0.9128 |
LGUN (Ours) | ×4 | 696 | 36.4 | 32.63/0.9008 | 28.94/0.7897 | 27.82/0.7458 | 26.88/0.8084 | 31.52/0.9183 |
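All entries above report PSNR/SSIM. As a reference for how the PSNR column is typically computed, here is a short sketch; SR benchmarks usually convert to the luminance (Y) channel and crop `scale` border pixels first, which is noted but left out for brevity, so treat this as an illustrative helper rather than the authors' exact evaluation script.

```python
import torch

def psnr(sr: torch.Tensor, hr: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between SR output and ground truth.

    Both tensors are expected in [0, max_val]; higher is better.
    """
    mse = torch.mean((sr - hr) ** 2)
    return (10 * torch.log10(max_val ** 2 / mse)).item()

hr = torch.rand(1, 1, 64, 64)                          # toy ground-truth patch
sr = (hr + 0.01 * torch.randn_like(hr)).clamp(0, 1)    # toy reconstruction
print(f"PSNR: {psnr(sr, hr):.2f} dB")                  # roughly 40 dB here
```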
(a) Results for the MLHA and DGSA modules.

LGU | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | BSDS100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
---|---|---|---|---|---|
w/o MLHA | 38.19/0.9616 | 33.84/0.9199 | 32.28/0.9018 | 32.49/0.9307 | 39.31/0.9784 |
w/o DGSA | 38.15/0.9612 | 33.65/0.9180 | 32.25/0.9014 | 32.18/0.9284 | 39.11/0.9780 |
w MLHA + DGSA (Ours) | 38.24/0.9618 | 33.93/0.9208 | 32.34/0.9027 | 32.65/0.9322 | 39.38/0.9786 |
(b) Results for the STF strategy in MLHA.

MLHA | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | BSDS100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
---|---|---|---|---|---|
w/o STF | 38.20/0.9616 | 33.89/0.9200 | 32.30/0.9020 | 32.48/0.9309 | 39.28/0.9781 |
w STF (Ours) | 38.24/0.9618 | 33.93/0.9208 | 32.34/0.9027 | 32.65/0.9322 | 39.38/0.9786 |
(c) Results for the MTS strategy in DGSA.

DGSA | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | BSDS100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
---|---|---|---|---|---|
w/o top-k | 38.21/0.9615 | 33.87/0.9201 | 32.32/0.9024 | 32.56/0.9316 | 39.30/0.9785 |
w top-k | 38.22/0.9616 | 33.90/0.9203 | 32.32/0.9024 | 32.57/0.9317 | 39.34/0.9786 |
top-k with MTS (Ours) | 38.24/0.9618 | 33.93/0.9208 | 32.34/0.9027 | 32.65/0.9322 | 39.38/0.9786 |
(d) Results for the effectiveness of LKCS modules in MLHA.

MLHA | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | BSDS100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
---|---|---|---|---|---|
Identical LKCS | 38.11/0.9609 | 33.62/0.9175 | 32.19/0.9008 | 32.13/0.9277 | 39.05/0.9772 |
Different LKCS (Ours) | 38.24/0.9618 | 33.93/0.9208 | 32.34/0.9027 | 32.65/0.9322 | 39.38/0.9786 |
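Table (d) contrasts identical versus different LKCS configurations inside MLHA. The paper's LKCS definition is not reproduced in this excerpt; assuming it denotes parallel large-kernel convolution branches, the sketch below illustrates the "different" setting by giving each branch its own kernel size, which is the factor the ablation varies.

```python
import torch
import torch.nn as nn

class MultiKernelBranches(nn.Module):
    """Parallel depthwise branches fused by a 1x1 conv; 'different' gives each
    branch its own kernel size, 'identical' repeats the same size."""
    def __init__(self, dim: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim) for k in kernel_sizes
        )
        self.fuse = nn.Conv2d(dim * len(kernel_sizes), dim, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

different = MultiKernelBranches(32, kernel_sizes=(3, 5, 7))   # 'Different LKCS'
identical = MultiKernelBranches(32, kernel_sizes=(3, 3, 3))   # 'Identical LKCS'
x = torch.randn(1, 32, 16, 16)
print(different(x).shape, identical(x).shape)
```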