Pixel-Level Degradation for Text Image Super-Resolution and Recognition
Abstract
1. Introduction
2. Theoretical Foundations
2.1. Noise
2.1.1. Gaussian Noise
2.1.2. JPEG Compression Noise
2.2. Gaussian Blur
2.3. Transfer Learning
3. Methods
3.1. Super-Resolution Network Module
3.2. Pixel-Level High-Order Degradation Module
4. Results
4.1. Datasets
4.1.1. TextZoom Dataset
4.1.2. MJSynth Dataset
4.2. Implementation Details
4.3. Experimental Results
4.4. Limitations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Long, S.; He, X.; Yao, C. Scene text detection and recognition: The deep learning era. Int. J. Comput. Vis. 2021, 129, 161–184.
- Naiemi, F.; Ghods, V.; Khalesi, H. Scene text detection and recognition: A survey. Multimed. Tools Appl. 2022, 81, 20255–20290.
- Dong, C.; Zhu, X.; Deng, Y.; Loy, C.C.; Qiao, Y. Boosting optical character recognition: A super-resolution approach. arXiv 2015, arXiv:1506.02211.
- Wang, W.; Xie, E.; Liu, X.; Wang, W.; Liang, D.; Shen, C.; Bai, X. Scene text image super-resolution in the wild. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part X; pp. 650–666.
- Chen, J.; Li, B.; Xue, X. Scene text telescope: Text-focused scene image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12026–12035.
- Fang, C.; Zhu, Y.; Liao, L.; Ling, X. TSRGAN: Real-world text image super-resolution based on adversarial learning and triplet attention. Neurocomputing 2021, 455, 88–96.
- Ma, J.; Liang, Z.; Zhang, L. A text attention network for spatial deformation robust scene text image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5911–5920.
- Wang, K.; Belongie, S. Word spotting in the wild. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Part I; pp. 591–604.
- Yuan, T.-L.; Zhu, Z.; Xu, K.; Li, C.-J.; Mu, T.-J.; Hu, S.-M. A large Chinese text dataset in the wild. J. Comput. Sci. Technol. 2019, 34, 509–521.
- Risnumawan, A.; Shivakumara, P.; Chan, C.S.; Tan, C.L. A robust arbitrary text detection system for natural scene images. Expert Syst. Appl. 2014, 41, 8027–8048.
- Chen, H.; Gu, J.; Liu, Y.; Magid, S.A.; Dong, C.; Wang, Q.; Pfister, H.; Zhu, L. Masked image training for generalizable deep image denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1692–1703.
- Lan, R.; Sun, L.; Liu, Z.; Lu, H.; Pang, C.; Luo, X. MADNet: A fast and lightweight network for single-image super resolution. IEEE Trans. Cybern. 2020, 51, 1443–1453.
- Ates, H.F.; Yildirim, S.; Gunturk, B.K. Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation. Comput. Vis. Image Underst. 2023, 233, 103718.
- Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 1905–1914.
- Niu, A.; Zhu, Y.; Zhang, C.; Sun, J.; Wang, P.; Kweon, I.S.; Zhang, Y. MS2Net: Multi-scale and multi-stage feature fusion for blurred image super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5137–5150.
- Xuhong, L.; Grandvalet, Y.; Davoine, F. Explicit inductive bias for transfer learning with convolutional networks. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 2825–2834.
- Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 22367–22377.
- Cai, J.; Zeng, H.; Yong, H.; Cao, Z.; Zhang, L. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3086–3095.
- Zhang, X.; Chen, Q.; Ng, R.; Koltun, V. Zoom to learn, learn to zoom. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3762–3770.
- Jaderberg, M.; Simonyan, K.; Vedaldi, A.; Zisserman, A. Synthetic data and artificial neural networks for natural scene text recognition. arXiv 2014, arXiv:1406.2227.
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144.
- Zhao, C.; Feng, S.; Zhao, B.N.; Ding, Z.; Wu, J.; Shen, F.; Shen, H.T. Scene text image super-resolution via parallelly contextual attention network. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 2908–2917.
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690.
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.

Recognition accuracy (%) on the TextZoom Easy/Medium/Hard subsets, evaluated with the ASTER recognizer:

Backbone | Easy (%) | Medium (%) | Hard (%) |
---|---|---|---|
BICUBIC | 64.7 | 42.4 | 31.2 |
SRCNN | 69.4 | 43.4 | 32.2 |
SRResNet | 69.6 | 47.6 | 34.3 |
EDSR | 72.3 | 48.3 | 34.3 |
RDN | 70.0 | 47.0 | 34.0 |
ESRGAN | 68.4 | 49.5 | 35.6 |
TSRN | 75.1 | 56.3 | 40.1 |
TSRGAN | 75.7 | 57.3 | 40.9 |
TBSRN | 75.7 | 59.9 | 41.6 |
Ours | 78.7 | 63.3 | 45.5 |

Recognition accuracy (%) on the TextZoom Easy/Medium/Hard subsets, evaluated with the MORAN recognizer:

Backbone | Easy (%) | Medium (%) | Hard (%) |
---|---|---|---|
BICUBIC | 60.6 | 37.9 | 30.8 |
SRCNN | 63.2 | 39.0 | 30.2 |
SRResNet | 60.7 | 42.9 | 32.6 |
EDSR | 63.6 | 45.4 | 32.2 |
RDN | 61.7 | 42.0 | 31.6 |
ESRGAN | 63.4 | 43.2 | 34.3 |
TSRN | 70.1 | 53.3 | 37.9 |
TSRGAN | 72.0 | 54.6 | 39.3 |
TBSRN | 74.1 | 57.0 | 40.8 |
Ours | 75.7 | 61.5 | 45.0 |

Recognition accuracy (%) on the TextZoom Easy/Medium/Hard subsets, evaluated with the CRNN recognizer:

Backbone | Easy (%) | Medium (%) | Hard (%) |
---|---|---|---|
BICUBIC | 36.4 | 21.1 | 21.1 |
SRCNN | 38.7 | 21.6 | 20.9 |
SRResNet | 39.7 | 27.6 | 22.7 |
EDSR | 42.7 | 29.3 | 24.1 |
RDN | 41.6 | 24.4 | 23.5 |
ESRGAN | 50.2 | 33.0 | 28.9 |
TSRN | 52.5 | 38.2 | 31.4 |
TSRGAN | 56.2 | 42.5 | 32.8 |
TBSRN | 59.6 | 47.1 | 35.3 |
Ours | 62.7 | 55.0 | 41.1 |

PSNR (dB) on the TextZoom Easy/Medium/Hard subsets:

Backbone | Easy | Medium | Hard |
---|---|---|---|
EDSR | 24.26 | 18.63 | 19.14 |
RDN | 22.27 | 18.95 | 19.70 |
LapSRN | 24.58 | 18.85 | 19.77 |
ESRGAN | 24.01 | 19.62 | 20.30 |
TSRN | 25.07 | 18.86 | 19.74 |
TSRGAN | 24.22 | 19.17 | 19.99 |
TBSRN | 23.82 | 19.17 | 19.68 |
Ours | 20.64 | 18.88 | 19.20 |

SSIM on the TextZoom Easy/Medium/Hard subsets:

Backbone | Easy | Medium | Hard |
---|---|---|---|
EDSR | 0.8633 | 0.6440 | 0.7108 |
RDN | 0.8249 | 0.6427 | 0.7113 |
LapSRN | 0.8556 | 0.6480 | 0.7087 |
ESRGAN | 0.8489 | 0.6569 | 0.7290 |
TSRN | 0.8897 | 0.6676 | 0.7302 |
TSRGAN | 0.8791 | 0.6770 | 0.7420 |
TBSRN | 0.8660 | 0.6533 | 0.7490 |
Ours | 0.8029 | 0.6292 | 0.6655 |

Number of parameters of the compared models:

Backbone | SRCNN | EDSR | ESRGAN | TSRN | TBSRN | Ours |
---|---|---|---|---|---|---|
Parameters | 1.8 M | 43 M | 16 M | 2.6 M | 3.2 M | 5.2 M |
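
The headings in Section 2 list the pixel-level degradations used to synthesize low-resolution training data: Gaussian blur, Gaussian noise, and JPEG compression noise. As a purely illustrative aid, the sketch below shows how a single pass of these operations could be applied to a text crop with OpenCV; the kernel size, blur sigma, noise level, JPEG quality factor, and file names are assumed example values and do not reproduce the paper's actual degradation module.

```python
# Illustrative single-pass degradation sketch (assumed parameters, not the
# paper's settings): Gaussian blur -> bicubic downsampling -> additive
# Gaussian noise -> JPEG compression noise.
import cv2
import numpy as np


def degrade(img: np.ndarray, scale: int = 2) -> np.ndarray:
    """Degrade a BGR uint8 image into a synthetic low-resolution version."""
    # Isotropic Gaussian blur (example kernel size and sigma)
    blurred = cv2.GaussianBlur(img, (7, 7), 1.5)

    # Bicubic downsampling by the target super-resolution scale factor
    h, w = blurred.shape[:2]
    lr = cv2.resize(blurred, (w // scale, h // scale), interpolation=cv2.INTER_CUBIC)

    # Additive Gaussian noise (sigma given in 0-255 intensity units)
    noise = np.random.normal(0.0, 8.0, lr.shape)
    noisy = np.clip(lr.astype(np.float64) + noise, 0, 255).astype(np.uint8)

    # JPEG compression noise via an encode/decode round trip at reduced quality
    _, buf = cv2.imencode(".jpg", noisy, [int(cv2.IMWRITE_JPEG_QUALITY), 60])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)


if __name__ == "__main__":
    hr = cv2.imread("text_crop.png")  # hypothetical input file
    if hr is not None:
        cv2.imwrite("text_crop_lr.png", degrade(hr, scale=2))
```

A high-order pipeline, as the title of Section 3.2 suggests, would typically chain randomized combinations of such operations more than once (in the spirit of Real-ESRGAN); the single pass above is only meant to make the individual components concrete.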
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Qian, X.; Xie, L.; Ye, N.; Le, R.; Yang, S. Pixel-Level Degradation for Text Image Super-Resolution and Recognition. Electronics 2023, 12, 4546. https://doi.org/10.3390/electronics12214546