Historical Text Image Enhancement Using Image Scaling and Generative Adversarial Networks
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Data Collection
Database | Task | Number of Images | Real/Synthetic | Image Size (Resolution) |
---|---|---|---|---|
Noisy Office [30] | Image de-noising | 288 | Mixed | Variable |
S-MS [27] | Mixed (de-noising, de-blurring, de-fading) | 3.4309 | Synthetic | 1001 × 330 |
Bishop Bickley diary [29] | Image binarization | 7 | Real | 1050 × 1350 |
Tobacco 800 [31] | Image de-noising | 1290 | Real | 1200 × 1600 |
SmartDoc-QA [32] | Image de-blurring | 4260 | | 351 × 292 |
Blurry document images [5] | Image de-blurring | 35,000 | Synthetic | 300 × 300 |
3.2. Methodology
3.2.1. Pixels Resolution Enhancement
LWT Bi-Cubic Interpolation Method
Stationary Wavelet Transform (SWT)
Addition of Sub-Bands and Interpolation
Inverse Lifting Wavelet Transform
3.2.2. Generative Adversarial Network (GAN)
4. Results
5. Conclusions and Future Direction
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Megyei, B.; Blomqvist, N.; Pettersson, E. The DECODE database: Collection of historical ciphers and keys. In Proceedings of the 2nd International Conference on Historical Cryptology, HistoCrypt 2019, Mons, Belgium, 23–26 June 2019; pp. 69–78.
- Kang, S.; Iwana, B.K.; Uchida, S. Complex image processing with less data: Document image binarization by integrating multiple pre-trained U-Net modules. Pattern Recognit. 2021, 109, 107577.
- Jemni, S.K.; Souibgui, M.A.; Kessentini, Y.; Fornes, A. Enhance to read better: A multi-task adversarial network for handwritten document image enhancement. Pattern Recognit. 2022, 123, 108370.
- Wang, B.; Chen, C.L.P. An effective background estimation method for shadows removal of document images. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 3611–3615.
- Hradis, M.; Kotera, J.; Zemčík, P.; Šroubek, F. Convolutional neural networks for direct text deblurring. In Proceedings of the BMVC, Swansea, UK, 7–10 September 2015.
- Gu, S.; Zuo, W.; Guo, S.; Chen, Y.; Chen, C.; Zhang, L. Learning dynamic guidance for depth image enhancement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3769–3778.
- Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S.; Cong, R. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1780–1789.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
- Sauvola, J.; Pietikäinen, M. Adaptive document image binarization. Pattern Recognit. 2000, 33, 225–236.
- Xiong, W.; Xu, J.; Xiong, Z.; Wang, J.; Liu, M. Degraded historical document image binarization using local features and support vector machine (SVM). Optik 2018, 164, 218–223.
- Hedjam, R.; Cheriet, M.; Kalacska, M. Constrained energy maximization and self-referencing method for invisible ink detection from multispectral historical document images. In Proceedings of the 2014 22nd International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 3026–3031.
- Pratikakis, I.; Zagoris, K.; Barlas, G.; Gatos, B. ICDAR 2017 competition on document image binarization (DIBCO 2017). In Proceedings of the 2017 International Conference on Document Analysis and Recognition, Kyoto, Japan, 9–15 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1395–1403.
- Zhao, W.; Jun, W.; Lidong, H.; Zebin, S. Combination of contrast limited adaptive histogram equalization and discrete wavelet transform for image enhancement. IET Image Process. 2015, 9, 908–915.
- Afzal, M.Z.; Pastor-Pellicer, J.; Shafait, F.; Breuel, T.M.; Dengel, A.; Liwicki, M. Document image binarization using LSTM: A sequence learning approach. In Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing, Nancy, France, 22 August 2015; pp. 79–84.
- Mao, X.-J.; Shen, C.; Yang, Y.-B. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 2810–2818.
- Akbari, Y.; Al-Maadeed, S.; Adam, K. Binarization of degraded document images using convolutional neural networks and wavelet-based multichannel images. IEEE Access 2020, 8, 153517–153534.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Souibgui, M.A.; Kessentini, Y. DE-GAN: A conditional generative adversarial network for document enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1180–1191.
- Zhao, J.; Shi, C.; Jia, F.; Wang, Y.; Xiao, B. Document image binarization with cascaded generators of conditional generative adversarial networks. Pattern Recognit. 2019, 96, 106968.
- Bhunia, A.K.; Bhunia, A.K.; Sain, A.; Roy, P.P. Improving document binarization via adversarial noise-texture augmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2721–2725.
- Tamrin, M.O.; Ech-Cherif, M.E.-A.; Cheriet, M. A two stage unsupervised deep learning framework for degradation removal in ancient documents. In Proceedings of the ICPR International Workshops and Challenges, Virtual Event, 10–15 January 2021; pp. 292–303.
- Souibgui, M.A.; Kessentini, Y.; Fornes, A. A Conditional GAN based Approach for Distorted Camera Captured Documents Recovery. In Proceedings of the Mediterranean Conference on Pattern Recognition and Artificial Intelligence, Hammamet, Tunisia, 20–22 December 2020; Springer: Berlin/Heidelberg, Germany, 2020.
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 1833–1844.
- Feng, H.; Wang, Y.; Zhou, W.; Deng, J.; Li, H. DocTr: Document image transformer for geometric unwarping and illumination correction. arXiv 2021, arXiv:2110.12942.
- Hedjam, R.; Nafchi, H.Z.; Moghaddam, R.F.; Kalacska, M.; Cheriet, M. ICDAR 2015 contest on multispectral text extraction (MS-TEX 2015). In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 1181–1185.
- Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. EnlightenGAN: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349.
- Deng, F.; Wu, Z.; Lu, Z.; Brown, M.S. Binarization shop: A user-assisted software suite for converting old documents to black-and-white. In Proceedings of the 10th Annual Joint Conference on Digital Libraries, Gold Coast, Australia, 21–25 June 2010; pp. 255–258.
- Dua, D.; Graff, C. UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/ml (accessed on 31 January 2023).
- Lai, W.-S.; Huang, J.-B.; Ahuja, N.; Yang, M.-H. Deep Laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
- Nayef, N.; Luqman, M.M.; Prum, S.; Eskenazi, S.; Chazalon, J.; Ogier, J.M. SmartDoc-QA: A dataset for quality assessment of smartphone captured document images: single and multiple distortions. In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 1231–1235.
Method | PSNR | FM | WER | DRD |
---|---|---|---|---|
S. Kang [2] (CNN) | 14.42 | 96.54 | 85.45 | 1.23 |
S. K. Jemni [3] (CGAN) | 14.09 | 94.21 | 84.95 | 1.18 |
LWT and SWT | 13.95 | 94.18 | 83.12 | 1.01 |
Final Proposed | 15.02 | 96.51 | 86.17 | 1.31 |
Method | PSNR | FM | WER | DRD |
---|---|---|---|---|
S. Kang [2] (CNN) | 17.31 | 85.61 | 74.21 | 2.02 |
S. K. Jemni [3] (CGAN) | 16.99 | 85.21 | 74.19 | 1.99 |
LWT and SWT | 16.91 | 84.02 | 73.91 | 1.88 |
Final Proposed | 18.01 | 85.71 | 73.32 | 2.44 |
Method | PSNR | FM | WER | DRD |
---|---|---|---|---|
S. Kang [2] (CNN) | 19.42 | 98.12 | 81.28 | 2.11 |
S. K. Jemni [3] (CGAN) | 19.02 | 97.84 | 81.31 | 2.10 |
LWT and SWT | 18.21 | 94.02 | 80.21 | 2.01 |
Final Proposed | 19.49 | 97.13 | 82.12 | 2.17 |
Method | PSNR | FM | WER | DRD |
---|---|---|---|---|
S. Kang [2] (CNN) | 21.02 | 93.21 | 83.21 | 2.18 |
S. K. Jemni [3] (CGAN) | 20.51 | 93.01 | 83.02 | 2.07 |
LWT and SWT | 19.79 | 92.81 | 82.88 | 1.98 |
Final Proposed | 21.74 | 93.24 | 83.84 | 2.57 |
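For readers who want to reproduce the first two columns of the tables above, the following is a minimal sketch of how PSNR and an F-measure are commonly computed. It assumes that FM denotes the pixel-level F-measure used in binarization benchmarks, that grayscale images are nested lists with values in 0 to 255, and that binary images mark text pixels with 1. The function names `psnr` and `f_measure` are illustrative and are not taken from the article, whose exact evaluation protocol may differ.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-sized grayscale images.

    Images are lists of rows; max_value is the assumed peak pixel value.
    """
    if len(reference) != len(restored) or any(
        len(a) != len(b) for a, b in zip(reference, restored)
    ):
        raise ValueError("images must have the same shape")
    n_pixels = sum(len(row) for row in reference)
    # Mean squared error over all pixels.
    mse = sum(
        (a - b) ** 2
        for row_a, row_b in zip(reference, restored)
        for a, b in zip(row_a, row_b)
    ) / n_pixels
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def f_measure(reference, predicted):
    """Pixel-level F-measure in percent; 1 marks a text pixel (assumed convention)."""
    tp = fp = fn = 0
    for row_a, row_b in zip(reference, predicted):
        for truth, guess in zip(row_a, row_b):
            if guess == 1 and truth == 1:
                tp += 1
            elif guess == 1 and truth == 0:
                fp += 1
            elif guess == 0 and truth == 1:
                fn += 1
    if tp == 0:
        return 0.0  # no correctly detected text pixels
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)
```

Higher values are better for both quantities under these definitions; PSNR is unbounded for identical images, while the F-measure is capped at 100.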
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Khan, S.U.; Ullah, I.; Khan, F.; Lee, Y.; Ullah, S. Historical Text Image Enhancement Using Image Scaling and Generative Adversarial Networks. Sensors 2023, 23, 4003. https://doi.org/10.3390/s23084003