A Fine-Tuned Hybrid Stacked CNN to Improve Bengali Handwritten Digit Recognition
Abstract
1. Introduction
- We produced a new Bengali handwritten digit dataset containing 5440 images.
- The collection strategy distinguishes this dataset from existing Bengali handwritten digit datasets.
- We applied robust handcrafted image feature extraction methods to this dataset and used machine learning classifiers to recognize the images.
- To recognize the handwritten digits effectively and efficiently, we developed three customized convolutional neural network (CNN) models using a fine-tuning approach and then stacked them, feeding their outputs into additional layers to form a stacked CNN model.
- On our new dataset, the proposed CNN model three achieved the highest testing accuracy of 97.43%, with a training accuracy of 99.66%. On another dataset, the proposed stacked model achieved a testing accuracy of 96.14%, with a training accuracy of 99.26%.
- To validate our study, we compared our results with state-of-the-art work on several research datasets of Bengali handwritten digits.
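
The stacking step named in the list above can be illustrated with a minimal sketch. It assumes that each base CNN emits a 10-way class-probability vector and that the stacked layer simply concatenates the three vectors into 30 features, as the (None, 10) and (None, 30) shapes in the architecture table suggest. The function name `stack_outputs` and the toy numbers are hypothetical, and the dense layers trained on top of the concatenation are omitted.

```python
from typing import List, Sequence


def stack_outputs(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the class-probability vectors of several base models."""
    stacked: List[float] = []
    for probs in per_model_probs:
        stacked.extend(probs)
    return stacked


# Toy outputs of three base CNNs for one image (10 digit classes each).
# The values are illustrative only, not results from the paper.
model_1 = [0.05] * 9 + [0.55]
model_2 = [0.02] * 9 + [0.82]
model_3 = [0.01] * 9 + [0.91]

features = stack_outputs([model_1, model_2, model_3])
print(len(features))  # number of inputs seen by the layers above the stack
```

In a full implementation, the concatenated vector would be the input to the dense layers of the stacked model, so the meta-learner can weigh the three base predictions against each other.
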
2. Related Work
3. Materials and Methods
3.1. Dataset Description
3.2. Feature Extractor and Learning Algorithm
3.3. Handcrafted Feature Extractors
3.3.1. Local Binary Pattern (LBP)
3.3.2. Complete Local Binary Pattern (CLBP)
3.3.3. Histogram of Oriented Gradients (HOG)
3.4. Convolution Neural Network (CNN)
3.5. Pretrained CNN Features
4. Bengali Handwritten Digit Recognition Pipeline
4.1. Data Collection Approach
4.2. Image Preprocessing
5. Research Methodology
5.1. First Proposed Approach
Handcrafted Model Details
5.2. Second Proposed Approach
5.2.1. CNN Model Details
5.2.2. Stacked Convolution Neural Network
6. Experimental Setup and System Specification
7. Experimental Results and Discussion
7.1. State-of-the-Art Comparison
7.2. Strengths and Weaknesses
8. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Jana, R.; Bhattacharyya, S.; Das, S. Handwritten digit recognition using convolutional neural networks. Deep. Learn. Res. Appl. 2020, 4, 51–68.
- Ivanov, A.S.; Nikolaev, K.G.; Novikov, A.S.; Yurchenko, S.O.; Novoselov, K.S.; Andreeva, D.V.; Skorb, E.V. Programmable soft-matter electronics. J. Phys. Chem. Lett. 2021, 12, 2017–2022.
- Vadyala, S.R.; Betgeri, S.N.; Matthews, J.C.; Matthews, E. A review of physics-based machine learning in civil engineering. Results Eng. 2022, 13, 100316.
- Amin, R.; Yasmin, R.; Ruhi, S.; Rahman, M.H.; Reza, M.S. Prediction of chronic liver disease patients using integrated projection-based statistical feature extraction with machine learning algorithms. Inform. Med. Unlocked 2023, 36, 101155.
- Chai, J.; Zeng, H.; Li, A.; Ngai, E.W. Deep learning in computer vision: A critical review of emerging techniques and application scenarios. Mach. Learn. Appl. 2021, 6, 100134.
- Shin, J.P. Optimal stroke-correspondence search method for on-line character recognition. Pattern Recognit. Lett. 2002, 23, 601–608.
- Shin, J. On-line cursive hangul recognition that uses DP matching to detect key segmentation points. Pattern Recognit. 2004, 37, 2101–2112.
- Gopalakrishan, V.; Arun, R.; Sasikumar, L. Handwritten Digit Recognition for Banking System. Int. J. Eng. Res. Technol. 2021, 9, 313–314.
- Karakaya, R.; Kazan, S. Handwritten Digit Recognition Using Machine Learning. Sak. Univ. J. Sci. 2021, 25, 65–71.
- Shin, J.; Maniruzzaman, M.; Uchida, Y.; Hasan, M.A.M.; Megumi, A.; Suzuki, A.; Yasumura, A. Important features selection and classification of adult and child from handwriting using machine learning methods. Appl. Sci. 2022, 12, 5256.
- Liu, C.L.; Suen, C.Y. A new benchmark on the recognition of handwritten Bangla and Farsi numeral characters. Pattern Recognit. 2009, 42, 3287–3295.
- Sufian, A.; Ghosh, A.; Naskar, A.; Sultana, F.; Sil, J.; Rahman, M.M.H. BDNet: Bengali Handwritten Numeral Digit Recognition based on Densely connected Convolutional Neural Networks. J. King Saud Univ. Comput. Inf. Sci. 2020, 4, 2610–2620.
- Sen, O.; Fuad, M.; Islam, M.D.N.; Rabbi, J.; Masud, M.; Hasan, K.; Awal, M.D.A.; Fime, A.A.; Fuad, M.D.T.H.; Sikder, D.; et al. Bangla natural language processing: A comprehensive analysis of classical, machine learning, and deep learning-based methods. IEEE Access 2022, 10, 38999–39044.
- Alam, S.; Reasat, T.; Doha, R.M.; Humayun, A.I. NumtaDB—Assembled Bengali Handwritten Digits. arXiv 2018, arXiv:1806.02452.
- Islam, M.; Shuvo, S.A.; Nipun, M.S.; Sulaiman, R.B.; Nayeem, J.; Haque, Z.; Sourav, M.S.U. Efficient approach of using CNN based pretrained model in Bangla handwritten digit recognition. arXiv 2022, arXiv:2209.13005.
- Basri, R.; Haque, M.R.; Akter, M.; Uddin, M.S. Bangla handwritten digit recognition using deep convolutional neural network. In Proceedings of the International Conference on Computing Advancements, Dhaka, Bangladesh, 10–12 January 2020; pp. 1–7.
- Basu, S.; Sarkar, R.; Das, N.; Kundu, M.; Nasipuri, M.; Basu, D.K. Handwritten Bangla digit recognition using classifier combination through DS technique. In Pattern Recognition and Machine Intelligence, Proceedings of the First International Conference, PReMI 2005, Kolkata, India, 20–22 December 2005; Proceedings 1; Springer: Berlin/Heidelberg, Germany, 2005; pp. 236–241.
- Shopon, M.; Mohammed, N.; Abedin, M.A. Bangla handwritten digit recognition using autoencoder and deep convolutional neural network. In Proceedings of the 2016 International Workshop on Computational Intelligence (IWCI), Dhaka, Bangladesh, 12–13 December 2016; IEEE: Piscataway Township, NJ, USA, 2016; pp. 64–68.
- Nasir, M.K.; Das, T.R.; Hasan, S.; Jani, M.R.; Tabassum, F.; Islam, M.I. Hand Written Bangla Numerals Recognition for Automated Postal System. IOSR J. Comput. Eng. 2013, 09, 158–171.
- Scarmana, G. Lossless data compression of grid-based digital elevation models: A PNG image format evaluation. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2014, 2, 313–319.
- Mubarak, A.S.; Serte, S.; Al-Turjman, F.; Ameen, Z.S.; Ozsoz, M. Local binary pattern and deep learning feature extraction fusion for COVID-19 detection on computed tomography images. Expert Syst. 2022, 39, 1–13.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
- Alanne, K.; Sierla, S. An overview of machine learning applications for smart buildings. Sustain. Cities Soc. 2022, 76, 103445.
- Zamzami, I.F. Deep Learning Models Applied to Prediction of 5G Technology Adoption. Appl. Sci. 2023, 13, 119.
- Razzak, M.I.; Naz, S.; Zaib, A. Deep learning for medical image processing: Overview, challenges and the future. Lect. Notes Comput. Vis. Biomech. 2018, 26, 323–350.
- Sliti, O.; Hamam, H.; Amiri, H. CLBP for scale and orientation adaptive mean shift tracking. J. King Saud Univ. Comput. Inf. Sci. 2018, 30, 416–429.
- Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. A Dataset for Breast Cancer Histopathological Image Classification. IEEE Trans. Biomed. Eng. 2016, 63, 1455–1462.
- Guo, Z.; Zhang, L.; Zhang, D. A completed modeling of local binary pattern operator for texture classification. IEEE Trans. Image Process. 2010, 19, 1657–1663.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; IEEE: Piscataway Township, NJ, USA, 2005; Volume 1, pp. 886–893.
- Ameh Joseph, A.; Abdullahi, M.; Junaidu, S.B.; Hassan Ibrahim, H.; Chiroma, H. Improved multi-classification of breast cancer histopathological images using handcrafted features and deep neural network (dense layer). Intell. Syst. Appl. 2022, 14, 200066.
- Ahmed, M.Z.I.; Sinha, N.; Phadikar, S.; Ghaderpour, E. Automated Feature Extraction on AsMap for Emotion Classification Using EEG. Sensors 2022, 22, 2346.
- Sadik, R.; Majumder, A.; Biswas, A.A.; Ahammad, B.; Rahman, M.M. An in-depth analysis of Convolutional Neural Network architectures with transfer learning for skin disease diagnosis. Healthc. Anal. 2023, 3, 100143.
- Srinivas, C.; Nandini, N.P.; Zakariah, M.; Alothaibi, Y.A.; Shaukat, K.; Partibane, B.; Awal, H. Deep Transfer Learning Approaches in Performance Analysis of Brain Tumor Classification Using MRI Images. J. Healthc. Eng. 2022, 2022, 3264367.
- Stančić, A.; Vyroubal, V.; Slijepčević, V. Classification Efficiency of Pre-Trained Deep CNN Models on Camera Trap Images. J. Imaging 2022, 8, 20.
- Yadav, S.S.; Jadhav, S.M. Deep convolutional neural network based medical image classification for disease diagnosis. J. Big Data 2019, 6, 1–18.
- Sann, S.S.; Win, S.S.; Thant, Z.M. An analysis of various image pre-processing techniques in butterfly image. Int. J. Adv. Res. Dev. 2021, 6, 1–4.
- Paul, O. Image Pre-processing on NumtaDB for Bengali Handwritten Digit Recognition. In Proceedings of the 2018 International Conference on Bangla Speech and Language Processing, ICBSLP 2018, Sylhet, Bangladesh, 21–22 September 2018; pp. 1–6.
- Renjith, S.; Abraham, A.; Jyothi, S.B.; Chandran, L.; Thomson, J. An ensemble deep learning technique for detecting suicidal ideation from posts in social media platforms. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 9564–9575.
- Kabir, M.H.; Ahmad, F.; Hasan, M.A.M.; Shin, J. Gender Recognition of Bangla Names Using Deep Learning Approaches. Appl. Sci. 2022, 13, 522.
- Caruana, R.; Lawrence, S.; Giles, L. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. Adv. Neural Inf. Process. Syst. 2000, 13, 402–408.
- Wolpert, D. Stacked Generalization (Stacking). Neural Netw. 1992, 5, 241–259.
- Gour, M.; Jain, S. Automated COVID-19 detection from X-ray and CT images with stacked ensemble convolutional neural network. Biocybern. Biomed. Eng. 2022, 42, 27–41.
- Adnan, M.; Rahman, F.; Imrul, M.; Al, N.; Shabnam, S. Handwritten Bangla character recognition using inception convolutional neural network. Int. J. Comput. Appl. 2018, 181, 48–59.
- Pal, U.; Chaudhuri, B.B. Automatic recognition of unconstrained off-line Bangla handwritten numerals. In Proceedings of the Advances in Multimodal Interfaces—ICMI 2000: Third International Conference, Beijing, China, 14–16 October 2000; Springer: Berlin/Heidelberg, Germany, 2001; pp. 371–378.
- Wen, Y.; Lu, Y.; Shi, P. Handwritten Bangla numeral recognition system and its application to postal automation. Pattern Recognit. 2007, 40, 99–107.
- Hassan, T.; Khan, H.A. Handwritten bangla numeral recognition using local binary pattern. In Proceedings of the 2015 International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), Dhaka, Bangladesh, 21–23 May 2015; IEEE: Piscataway Township, NJ, USA, 2015; pp. 1–4.
- Wen, Y.; He, L. A classifier for Bangla handwritten numeral recognition. Expert Syst. Appl. 2012, 39, 948–953.
- Basu, S.; Das, N.; Sarkar, R.; Kundu, M.; Nasipuri, M.; Basu, D.K. A novel framework for automatic sorting of postal documents with multi-script address blocks. Pattern Recognit. 2010, 43, 3507–3521.
- Saha, C.; Masuma, F.; Ahammad, K.; Muzammel, C.S.; Mohibullah, M. Real time Bangla Digit Recognition through Hand Gestures on Air Using Deep Learning and OpenCV. Int. J. Curr. Sci. Res. Rev. 2022, 5.
- Bhattacharya, U.; Chaudhuri, B.B. Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 444–457.

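Model Name | Layer | Shape or Setting | Parameters |
---|---|---|---|
CNN model one | Conv2D | (28 × 28 × 32) | 832 |
CNN model one | Kernel size | (5 × 5) | 0 |
CNN model one | MaxPooling2D | (14 × 14 × 32) | 0 |
CNN model one | Flatten | (None, 6272) | 0 |
CNN model one | Dense | 124 | 777,852 |
CNN model one | Dropout | 0.30 | 0 |
CNN model one | Dense | 10 | 1250 |
CNN model one | Total | | 779,934 |
CNN model two | Conv2D | (28 × 28 × 64) | 640 |
CNN model two | Kernel size | (3 × 3) | 0 |
CNN model two | MaxPooling2D | (9 × 9 × 64) | 0 |
CNN model two | Conv2D | (9 × 9 × 32) | 18,464 |
CNN model two | Kernel size | (3 × 3) | 0 |
CNN model two | MaxPooling2D | (4 × 4 × 32) | 0 |
CNN model two | Flatten | (None, 512) | 0 |
CNN model two | Dense | 128 | 65,664 |
CNN model two | Dropout | 0.25 | 0 |
CNN model two | Dense | 10 | 1290 |
CNN model two | Total | | 86,058 |
CNN model three | Conv2D | (28 × 28 × 64) | 640 |
CNN model three | Kernel size | (3 × 3) | 0 |
CNN model three | MaxPooling2D | (14 × 14 × 64) | 0 |
CNN model three | Conv2D | (14 × 14 × 32) | 18,464 |
CNN model three | Kernel size | (3 × 3) | 0 |
CNN model three | MaxPooling2D | (7 × 7 × 32) | 0 |
CNN model three | Flatten | (None, 1568) | 0 |
CNN model three | Dense | 256 | 401,664 |
CNN model three | Dropout | 0.4 | 0 |
CNN model three | Dense | 10 | 2570 |
CNN model three | Total | | 423,338 |
Stacked model | CNN model_1 | (None, 10) | 779,934 |
Stacked model | CNN model_2 | (None, 10) | 86,058 |
Stacked model | CNN model_3 | (None, 10) | 423,338 |
Stacked model | Stacked layer | (None, 30) | 0 |
Stacked model | Dense | 256 | 12,400 |
Stacked model | Dropout | 0.4 | 0 |
Stacked model | Dense | 150 | 120,300 |
Stacked model | Dropout | 0.3 | 0 |
Stacked model | Dense | 10 | 3010 |
Stacked model | Total | | 1,425,050 |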
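
The totals for the three base CNNs can be recomputed from the layer shapes with the usual counting rules: a convolution has one weight per kernel position, input channel and filter plus one bias per filter, and a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below is only a consistency check under the assumption of a single-channel (grayscale) input, which the first-layer counts imply; the helper names are hypothetical.

```python
def conv2d_params(kernel: int, in_channels: int, filters: int) -> int:
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs: int, units: int) -> int:
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


# Assumption: a single-channel (grayscale) input image.
model_one = (conv2d_params(5, 1, 32)
             + dense_params(6272, 124) + dense_params(124, 10))
model_two = (conv2d_params(3, 1, 64) + conv2d_params(3, 64, 32)
             + dense_params(512, 128) + dense_params(128, 10))
model_three = (conv2d_params(3, 1, 64) + conv2d_params(3, 64, 32)
               + dense_params(1568, 256) + dense_params(256, 10))

print(model_one, model_two, model_three)  # 779934 86058 423338
```

Pooling, flatten and dropout layers contribute no trainable parameters, which is why they are left out of the sums.
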
Resources | Usable System Specification |
---|---|
CPU | Intel Core(TM) i3 11 Gen-7100U CPU @ 2.40 GHz |
System Type | 64-bit operating system, x64-based processor |
Environment | Python 3 Google Colab |
RAM | 12.68 GB |
Disk | 1 TB Hard disk & 128 GB SSD |
Packages | Keras with TensorFlow, scikit-learn, Matplotlib, Pandas, NumPy |

Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | ROC AUC Score (%) |
---|---|---|---|---|---|
SVM | 58.08 | 58.61 | 58.05 | 58.91 | 87.86 |
XGBoost | 85.29 | 85.11 | 85.18 | 85.08 | 98.67 |
RF | 74.39 | 75.33 | 74.38 | 74.43 | 95.27 |
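
For reference, the accuracy, precision, recall and F1 columns can be computed from true and predicted labels as sketched below. The averaging scheme behind the reported figures is not stated here, so macro averaging over classes is an assumption, as are the function name `macro_metrics` and the toy labels. ROC AUC is omitted because it needs class scores rather than hard predictions.

```python
from typing import Dict, List


def macro_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1, in percent."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * sum(precisions) / n,
        "recall": 100 * sum(recalls) / n,
        "f1": 100 * sum(f1s) / n,
    }


# Hypothetical labels for three digit classes, not data from the paper.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
scores = macro_metrics(y_true, y_pred)
print({name: round(value, 2) for name, value in scores.items()})
```
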
Data Source | Model Name | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | ROC AUC Score (%) |
---|---|---|---|---|---|---|
PUST | CNN model_1 | 93.87 | 93.91 | 93.87 | 93.86 | 99.61 |
PUST | CNN model_2 | 97.12 | 97.15 | 97.12 | 97.12 | 99.94 |
PUST | CNN model_3 | 97.43 | 97.46 | 97.43 | 97.43 | 99.91 |
PUST | Stack model | 97.06 | 97.09 | 97.06 | 97.06 | 99.92 |
Kaggle | CNN model_1 | 95.35 | 95.37 | 95.35 | 95.35 | 99.88 |
Kaggle | CNN model_2 | 97.19 | 97.21 | 97.19 | 97.19 | 99.92 |
Kaggle | CNN model_3 | 97.22 | 97.21 | 97.19 | 97.19 | 99.92 |
Kaggle | Stack model | 96.14 | 96.18 | 96.14 | 96.14 | 99.90 |

Data Source | Model Name | Training Accuracy (%) | Validation Accuracy (%) | Testing Accuracy (%) | Epoch |
---|---|---|---|---|---|
PUST | CNN model_1 | 99.19 | 93.87 | 93.87 | 29/150 |
PUST | CNN model_2 | 99.53 | 97.12 | 97.12 | 48/150 |
PUST | CNN model_3 | 99.66 | 97.43 | 97.43 | 32/150 |
PUST | Stack model | 99.37 | 97.58 | 97.57 | 32/150 |
Kaggle | CNN model_1 | 99.07 | 95.35 | 93.35 | 26/150 |
Kaggle | CNN model_2 | 99.12 | 97.19 | 97.18 | 32/150 |
Kaggle | CNN model_3 | 99.24 | 97.19 | 99.18 | 23/150 |
Kaggle | Stack model | 99.26 | 96.14 | 96.14 | 15/150 |
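
The Epoch column reads as the epoch reached out of a 150-epoch budget, which suggests that training was halted early. The exact stopping rule is not stated here, so the following is only a sketch of one common choice, patience-based early stopping on validation accuracy. The function name, the `patience` value and the accuracy history are hypothetical.

```python
from typing import List, Tuple


def early_stopping_epoch(val_acc: List[float], patience: int) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` epochs have passed without a new best
    validation accuracy; if that never happens, it runs to the last epoch.
    """
    best_acc = float("-inf")
    best_epoch = 0
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_acc), best_epoch


# Hypothetical validation-accuracy curve, not taken from the paper.
history = [0.90, 0.93, 0.95, 0.94, 0.95, 0.949, 0.948]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(stop_epoch, best_epoch)
```

With a rule of this kind, the weights from `best_epoch` are typically the ones kept for testing.
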
PUST Ranking | PUST Model Name | PUST Accuracy (%) | Kaggle Ranking | Kaggle Model Name | Kaggle Accuracy (%) |
---|---|---|---|---|---|
First | CNN model_3 | 97.43 | First | CNN model_3 | 97.22 |
Second | CNN model_2 | 97.12 | Second | CNN model_2 | 97.19 |
Third | Stack model | 97.06 | Third | Stack model | 96.14 |
Fourth | CNN model_1 | 93.87 | Fourth | CNN model_1 | 95.35 |
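
The ranking follows directly from sorting the reported accuracies in descending order. A short sketch, using the accuracy values from the results table in this section (the variable names are illustrative):

```python
# Accuracies (%) copied from the per-model results table.
accuracy = {
    "PUST": {"CNN model_1": 93.87, "CNN model_2": 97.12,
             "CNN model_3": 97.43, "Stack model": 97.06},
    "Kaggle": {"CNN model_1": 95.35, "CNN model_2": 97.19,
               "CNN model_3": 97.22, "Stack model": 96.14},
}

# Sort model names by accuracy, best first, separately for each data source.
ranking = {
    source: sorted(scores, key=scores.get, reverse=True)
    for source, scores in accuracy.items()
}

for source, models in ranking.items():
    print(source, models)
```
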
Authors | Dataset | Digit Data | Algorithm | Highest Result |
---|---|---|---|---|
M. Adnan et al. [43] | CMATERdb | Bengali | VGG Net, ResNet, FractalNet, DenseNet, Nin, All-Conv | They compared the results of all algorithms based on the testing results. The DenseNet had the highest accuracy of 99.13%. |
Basu et al. [17] | CVPR unit, ISI, Kolkata | Bengali | Multilayer perceptron one, Multilayer perceptron two, Dempster–Shafer (DS) | On average, the DS algorithm achieved the highest recognition rate of 95.1%. |
U. Pal et al. [44] | Own dataset | Bengali | Thinning- and normalization-free automatic recognition | The overall recognition rate was 91.98%. |
Y. Wen et al. [45] | Supported by Bangladesh Post | Bengali | Original + SVM, PCA + SVM, KPCA+SVM, IRPCA, KPS, PCA + SVM + IRPCA + KPS | The average recognition rate attained by combining PCA + SVM + IRPCA + KPS was 95.05%. |
T. Hassan et al. [46] | CMATERdb | Bengali | KNN, ANN, SVM | The highest accuracy of 96.7% was achieved by ANN. |
O. Paul et al. [37] | NumtaDB | Bengali | CNN, CapsNet, KNN, KNN with PCA, SVM, SVM with PCA, LR, LR with PCA, DT | The CNN model achieved the highest accuracy of 91.30%. |
Y. Wen et al. [47] | Dataset was acquired from live letters by the automatic letters sorting machine in the Dhaka mail processing center of Bangladesh Post Office | Bengali | Euclidean Distance, BP, SVM, RIPCA, KPS, BD, KBD-P, KBD-G | Among all classifiers, the KBD-G had the best accuracy of 96.91%. |
S. Basu et al. [48] | CVPR unit, Indian Statistical Institute, Kolkata and CMATER, Jadavpur University | Bengali | MLP, SVM | The MLP and SVM algorithms on a variety of handcrafted feature extractor methods reached 97.15% accuracy. |
C. Saha et al. [49] | BanglaLekha-Isolated dataset | Bengali | Convolution Neural Network | The recognition performance for the training, testing, and validation accuracy was 93.29%, 98.37%, and 96%, respectively. |
U. Bhattacharya [50] | Devanagri numeral Bengali dataset | Bengali | Three-stage multilayer perceptron | The proposed system had the highest training accuracy of 99.26% and the highest testing accuracy of 98.01%. |
Our proposed approach | PUST dataset | Bengali | Proposed CNN models and their stacked model | On this dataset, our proposed customized CNN model three gave the best training accuracy of 99.66%, testing accuracy of 97.43%, and validation accuracy of 97.43%. |
Our proposed approach | Kaggle dataset | Bengali | Proposed CNN models and their stacked model | On this dataset, the stacked model achieved a training accuracy of 99.26%, a testing accuracy of 96.14%, and a validation accuracy of 96.14%. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Amin, R.; Reza, M.S.; Okuyama, Y.; Tomioka, Y.; Shin, J. A Fine-Tuned Hybrid Stacked CNN to Improve Bengali Handwritten Digit Recognition. Electronics 2023, 12, 3337. https://doi.org/10.3390/electronics12153337