Gated Convolution and Stacked Self-Attention Encoder–Decoder-Based Model for Offline Handwritten Ethiopic Text Recognition
Abstract
1. Introduction
- We prepare an offline handwritten database that can be used for applications such as text recognition, writer identification, and signature verification. We also prepare a synthetic dataset for pre-training the proposed model and tuning its weights and hyperparameters. Both datasets are released for use by other researchers.
- We compare the proposed gated CNN-Transformer model with gated CNN-LSTM, CNN-LSTM, and HTR-Flor++ (gated CNN-BGRU), which report state-of-the-art results on the public IAM dataset.
- The proposed model achieves promising recognition results on the collected HEWD and HETD datasets and can be used effectively in practice.
2. Materials and Methods
2.1. Ethiopic Script
2.2. Overview of Handwritten Text Recognition Techniques
- Assabie and Bigun [24] presented a writer-independent offline HCR system based on the characteristics and spatial relationships of primitive strokes. Recognition accuracy depended on these spatial relationships: when the relationships between primitives were poorly captured, recognition failed; otherwise, the characters were recognized correctly. The authors used three datasets collected from different sources, on which the approach achieved recognition rates of 87%, 76%, and 81%, respectively.
- In [25], an HMM-based writer-independent offline handwritten Amharic word recognition system was designed, using the direction field tensor to detect text lines and extract features from them. For each Ethiopic character, primitive structural features were stored as a feature list for training and testing the model. This work focused only on Amharic, one of the languages that use half of the Ethiopic script as their writing system.
- Assabie and Bigun [18] presented an online handwritten Ethiopic script recognizer that generates a unique set of primitive stroke sequences for each character using a special tree structure. For recognition, each stroke sequence was matched against a stored knowledge base. To improve processing time and recognition efficiency, structural similarity was used to select a plausible set of candidates for an unknown input. This approach has difficulty recognizing a new input whose characteristics differ from the stored ones.
- An unconstrained handwritten Amharic word recognizer was presented using the concatenated features of constituent characters and an HMM [26]. Word features were formed by concatenating the features extracted from the constituent characters. To build the HMM, features were extracted from isolated handwritten characters. The models were trained and tested on good-quality and poor-quality data with 10, 100, and 10,932 training words. On both the good- and poor-quality data, HMM recognition performed better than a feature-level concatenation method. On the poor-quality data, the HMM achieved recognition rates of 78%, 73%, and 41% with 10, 100, and 10,932 training words, respectively. On the good-quality data, the corresponding rates were 92%, 93%, and 66%.
- In [27], a deep convolutional neural network was introduced to recognize ancient Ethiopian Ge’ez characters. This method considers only twenty-six characters.
2.3. Proposed Methodology
2.3.1. Feature Extraction Layer
2.3.2. Encoder Layer
2.3.3. Decoder Layer
3. Results
3.1. Data Preparation
Algorithm 1: Word Detection and Semi-Automatic Labeling.
3.2. Experiment Setup
3.3. Experimental Results
4. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liu, C.-L.; Yin, F.; Wang, D.-H.; Wang, Q.-F. Online and Offline Handwritten Chinese Character Recognition: Benchmarking on New Databases. Pattern Recognit. 2013, 46, 155–162.
- Natarajan, P.; Saleem, S.; Prasad, R.; MacRostie, E.; Subramanian, K. Multi-Lingual Offline Handwriting Recognition Using Hidden Markov Models: A Script-Independent Approach. In Arabic and Chinese Handwriting Recognition; Springer: Berlin/Heidelberg, Germany, 2008; pp. 231–250.
- España-Boquera, S.; Castro-Bleda, M.J.; Gorbe-Moya, J.; Zamora-Martinez, F. Improving Offline Handwritten Text Recognition with Hybrid HMM/ANN Models. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 767–779.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process Syst. 2012, 25, 1–9.
- Zhao, Y.; Zhang, X.; Fu, B.; Zhan, Z.; Sun, H.; Li, L.; Zhang, G. Evaluation and Recognition of Handwritten Chinese Characters Based on Similarities. Appl. Sci. 2022, 12, 8521.
- Hu, M.; Qu, X.; Huang, J.; Wu, X. An End-to-End Classifier Based on CNN for In-Air Handwritten-Chinese-Character Recognition. Appl. Sci. 2022, 12, 6862.
- Graves, A.; Schmidhuber, J.J. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Available online: https://proceedings.neurips.cc/paper_files/paper/2008/hash/66368270ffd51418ec58bd793f2d9b1b-Abstract.html (accessed on 1 December 2023).
- Puigcerver, J. Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition? In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, Kyoto, Japan, 2 July 2017; IEEE Computer Society: Washington, DC, USA; Volume 1, pp. 67–72.
- Bluche, T.; Messina, R. Gated Convolutional Recurrent Neural Networks for Multilingual Handwriting Recognition. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, Kyoto, Japan, 2 July 2017; IEEE Computer Society: Washington, DC, USA; Volume 1, pp. 646–651.
- Flor, A.; Neto, D.S.; Leite, B.; Bezerra, D.; Toselli, A.H. HTR-Flor++: A Handwritten Text Recognition System Based on a Pipeline of Optical and Language Models; Association for Computing Machinery: New York, NY, USA, 2020.
- Marti, U.V.; Bunke, H. The IAM-Database: An English Sentence Database for Offline Handwriting Recognition. Int. J. Doc. Anal. Recognit. 2003, 5, 39–46.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Transformer: Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Mendisu, B.S.; Efforts, C.B. The Ethiopic Script: Linguistic Features and Socio-Cultural Connotations. Oslo. Stud. Lang. 2017, 8, 137–172.
- Huang, T.S.; Yang, G.J.; Tang, G.Y. A Fast Two-Dimensional Median Filtering Algorithm. IEEE Trans. Acoust. 1979, 27, 13–18.
- Praveen, K.S.; Babu, K.P.; Sreenivasulu, M. Implementation of Image Sharpening and. Int. Sci. Eng. Appl. Sci. 2016, 2, 7–14.
- Xu, S.; Wu, Q.; Zhang, S. Application of Neural Network in Handwriting Recognition; IEEE Transactions on International Conference of Stanford University: Stanford, CA, USA, 2020.
- Sadri, J.; Suen, C.Y.; Bui, T.D. Application of Support Vector Machines for Recognition of Handwritten Arabic/Persian Digits. In Proceedings of the Second Conference on Machine Vision and Image Processing & Applications (MVIP 2003), Tehran, Iran, 13–14 February 2003; Volume 1, pp. 300–307.
- Assabie, Y.; Bigun, J. Online Handwriting Recognition of Ethiopic Script. In Proceedings of the Eleventh International Conference on Frontiers in Handwriting Recognition (ICFHR2008), Montreal, QC, Canada, 19–21 August 2008; pp. 153–158.
- Bluche, T.; Louradour, J.; Messina, R. Scan, Attend and Read: End-To-End Handwritten Paragraph Recognition with MDLSTM Attention. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, Kyoto, Japan, 9–15 November 2017; Volume 1, pp. 1050–1055.
- Graves, A. Offline Arabic Handwriting Recognition with Multidimensional Recurrent Neural Networks. In Guide to OCR for Arabic Scripts; Springer: London, UK, 2012; pp. 297–313.
- Moysset, B.; Messina, R. Are 2D-LSTM Really Dead for Offline Text Recognition? Int. J. Doc. Anal. Recognit. 2019, 22, 193–208.
- Stuner, B.; Chatelain, C.; Paquet, T. Handwriting Recognition Using Cohort of LSTM and Lexicon Verification with Extremely Large Lexicon. Multimed. Tools Appl. 2020, 79, 34407–34427.
- Soomro, M.; Farooq, M.A.; Raza, R.H. Performance Evaluation of Advanced Deep Learning Architectures for Offline Handwritten Character Recognition. In Proceedings of the 2017 International Conference on Frontiers of Information Technology, FIT, Islamabad, Pakistan, 18–20 December 2017; pp. 362–367.
- Assabie, Y.; Bigun, J. Writer-Independent Offline Recognition of Handwritten Ethiopic Characters. In Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition (ICFHR), Montréal, QC, Canada, 19–21 August 2008; pp. 652–657.
- Assabie, Y.; Bigun, J. HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; IEEE: Toulouse, France; pp. 961–965.
- Assabie, Y.; Bigun, J. Offline Handwritten Amharic Word Recognition. Pattern Recognit. Lett. 2011, 32, 1089–1099.
- Demilew, F.A.; Sekeroglu, B. Ancient Geez Script Recognition Using Deep Learning. SN Appl. Sci. 2019, 1, 1315.
- Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 2012, 4, 26–31.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv 2015, arXiv:1502.01852.
- Ioffe, S. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 1946–1954.
- Cohen, G.; Afshar, S.; Tapson, J.; van Schaik, A. EMNIST: An Extension of MNIST to Handwritten Letters. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017.
- Kiessling, B.; Ezra, D.S.B.; Miller, M.T. BadAM: A Public Dataset for Baseline Detection in Arabic-Script Manuscripts. In ACM International Conference Proceeding Series; Association for Computing Machinery: New York, NY, USA, 2019; pp. 13–18.
- Yavariabdi, A.; Kusetogullari, H.; Celik, T.; Thummanapally, S.; Rijwan, S.; Hall, J. CArDIS: A Swedish Historical Handwritten Character and Word Dataset. IEEE Access 2022, 10, 55338–55349.
- Cheddad, A.; Kusetogullari, H.; Hilmkil, A.; Sundin, L.; Yavariabdi, A.; Aouache, M.; Hall, J. SHIBR—The Swedish Historical Birth Records: A Semi-Annotated Dataset. Neural Comput. Appl. 2021, 33, 15863–15875.
- Dutta, A.; Zisserman, A. The {VIA} Annotation Software for Images, Audio and Video. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 1 January 2021; ACM: New York, NY, USA, 2019.
- Breuel, T.M. The OCRopus Open Source OCR System. In Proceedings of the Document Recognition and Retrieval XV, SPIE, San Jose, CA, USA, 27 January 2008.
Database | Total | Training | Testing | Validation |
---|---|---|---|---|
HETD | 2800 | 2240 | 560 | - |
HEWD | 10,540 | 8432 | 2108 | - |
Synthetic text line | 290,000 | 174,000 | 58,000 | 58,000 |
Synthetic word | 500,000 | 300,000 | 100,000 | 100,000 |
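
The split sizes in the table above imply fixed ratios, which can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the counts are exactly as listed and treating the "-" validation entries as zero; the names `splits` and `split_fractions` are illustrative and do not come from the article.

```python
# Split sizes copied from the table above: (total, training, testing, validation).
# A "-" in the validation column is treated as 0 (an assumption of this sketch).
splits = {
    "HETD": (2800, 2240, 560, 0),
    "HEWD": (10540, 8432, 2108, 0),
    "Synthetic text line": (290000, 174000, 58000, 58000),
    "Synthetic word": (500000, 300000, 100000, 100000),
}


def split_fractions(total, train, test, val):
    """Return (train, test, val) fractions after checking the parts sum to the total."""
    if train + test + val != total:
        raise ValueError("split sizes do not add up to the total")
    return tuple(round(part / total, 4) for part in (train, test, val))


fractions = {name: split_fractions(*row) for name, row in splits.items()}

for name, frac in fractions.items():
    print(f"{name}: train={frac[0]:.0%}, test={frac[1]:.0%}, val={frac[2]:.0%}")
```

Under these assumptions, the handwritten sets (HETD, HEWD) come out as 80/20 train/test splits with no validation portion, and the synthetic sets as 60/20/20 train/test/validation splits.
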
Network | WER (%) | CER (%) |
---|---|---|
Puigcerver | 15.5 | 10.15 |
Bluche | 14.8 | 9.12 |
Flor | 14.51 | 8.92 |
Proposed | 13.11 | 8.72 |
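
WER and CER, as reported in the results tables, are conventionally defined as the edit distance between the recognized text and the reference text, divided by the reference length and expressed as a percentage. The sketch below illustrates that standard definition; it is not the authors' evaluation script. The function names are hypothetical, and whether the original evaluation counts spaces as characters or normalizes the text beforehand is an assumption left open here (spaces are counted).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference token
                curr[j - 1] + 1,         # insert a hypothesis token
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edit distance over total reference length, as a percentage."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    length = sum(len(tokenize(r)) for r in refs)
    return 100.0 * errors / length


def cer(refs, hyps):
    """Character error rate: tokens are single characters (spaces included)."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(refs, hyps, str.split)


# Hypothetical example: one substituted character in a two-word line.
references = ["ሰላም ለዓለም"]
hypotheses = ["ሰላም ለአለም"]
print(cer(references, hypotheses), wer(references, hypotheses))
```

Because a single wrong character invalidates the whole word, WER is normally higher than CER for the same output, which matches the pattern in the tables.
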
Network | WER (%) | CER (%) |
---|---|---|
Puigcerver | 13.75 | 9.55 |
Bluche | 11.24 | 8.46 |
Flor | 11.08 | 8.41 |
Proposed | 9.17 | 8.22 |
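
To make the margins in the two results tables easier to read, the gap between the proposed model and the strongest baseline (Flor) can be computed directly from the listed values. This is a small sketch using only the numbers in the tables; the keys "first table" and "second table" and the helper `reductions` are illustrative labels introduced here, since the table captions are not available.

```python
# (WER, CER) values copied from the two results tables above.
results = {
    "first table": {
        "Puigcerver": (15.5, 10.15),
        "Bluche": (14.8, 9.12),
        "Flor": (14.51, 8.92),
        "Proposed": (13.11, 8.72),
    },
    "second table": {
        "Puigcerver": (13.75, 9.55),
        "Bluche": (11.24, 8.46),
        "Flor": (11.08, 8.41),
        "Proposed": (9.17, 8.22),
    },
}


def reductions(table, baseline="Flor", model="Proposed"):
    """Absolute and relative (%) error reduction of `model` over `baseline`."""
    out = {}
    for idx, metric in enumerate(("WER", "CER")):
        base, new = table[baseline][idx], table[model][idx]
        out[metric] = (round(base - new, 2), round(100 * (base - new) / base, 2))
    return out


summary = {name: reductions(table) for name, table in results.items()}

for name, row in summary.items():
    for metric, (absolute, relative) in row.items():
        print(f"{name} {metric}: -{absolute} points ({relative}% relative)")
```

Read this way, the WER gap over Flor is 1.40 points in the first table and 1.91 points in the second, while the CER gap is about 0.2 points in both.
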
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tadesse, D.A.; Liu, C.-M.; Ta, V.-D. Gated Convolution and Stacked Self-Attention Encoder–Decoder-Based Model for Offline Handwritten Ethiopic Text Recognition. Information 2023, 14, 654. https://doi.org/10.3390/info14120654