Handwritten Multi-Scale Chinese Character Detector with Blended Region Attention Features and Light-Weighted Learning
Abstract
1. Introduction
- A light-weighted feature-stacking network is presented to predict multi-scale Chinese characters in old documents simply, rapidly, and precisely; by stacking feature maps gradually, we achieve accurate predictions. Furthermore, the detector is anchor-free: it locates the center of each character and then regresses to the four corners of its bounding box, which removes the time cost of anchor candidates and simplifies training (a minimal sketch of this idea follows this list).
- Owing to the lack of datasets of Chinese characters in historical documents, we collaborated with a team from Kyungpook National University to collect and analyze a new dataset containing challenging cases, which was then used for training and testing our model.
- The proposed algorithm achieves strong results, outperforming state-of-the-art methods in both accuracy and efficiency.
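To make the anchor-free idea above concrete, the following is a minimal PyTorch sketch of a head that predicts a character-center heatmap and regresses offsets from each center to the four corners of the bounding box. It is only an illustration under our own assumptions: the layer names, channel counts, stride, and score threshold are hypothetical and do not reproduce the implementation detailed in Section 3.

```python
import torch
import torch.nn as nn


class CenterCornerHead(nn.Module):
    """Illustrative anchor-free head: a 1-channel center heatmap plus an
    8-channel map that regresses (dx, dy) offsets to the four box corners."""

    def __init__(self, in_channels: int = 64):
        super().__init__()
        self.center = nn.Conv2d(in_channels, 1, kernel_size=1)   # character-center heatmap
        self.corners = nn.Conv2d(in_channels, 8, kernel_size=1)  # 4 corners x (dx, dy)

    def forward(self, feats: torch.Tensor):
        heatmap = torch.sigmoid(self.center(feats))
        offsets = self.corners(feats)
        return heatmap, offsets


def decode(heatmap, offsets, score_thresh=0.5, stride=4):
    """Turn peak responses in the center heatmap into quadrilateral boxes."""
    ys, xs = torch.nonzero(heatmap[0, 0] > score_thresh, as_tuple=True)
    boxes = []
    for y, x in zip(ys.tolist(), xs.tolist()):
        off = offsets[0, :, y, x].view(4, 2)           # four (dx, dy) pairs
        center = torch.tensor([x, y], dtype=off.dtype)
        boxes.append((center + off) * stride)          # back to image coordinates
    return boxes


# Toy usage on a random feature map (batch 1, 64 channels, 128 x 128 cells).
head = CenterCornerHead(in_channels=64)
heatmap, offsets = head(torch.randn(1, 64, 128, 128))
boxes = decode(heatmap, offsets)
```

In practice, peak selection would use local-maximum filtering over the heatmap rather than a plain threshold; the sketch keeps only the center-then-corners decoding that makes anchor candidates unnecessary.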
2. Related Work
- Text detection at different levels in old documents. Peng et al. [6] studied a fully convolutional network for end-to-end segmentation and recognition of Chinese text. In a further end-to-end approach, Peng et al. [7] performed page-level recognition of handwritten Chinese text using weakly supervised learning. Ma et al. [8] analyzed joint layout detection and recognition for the digitization of historical documents. However, all of the above methods ignore the challenging problem of character-level detection, which we address in this work.
- Chinese character-level detection in old documents. Wang et al. [3] proposed a weakly supervised learning method for the character classifier in over-segmentation-based handwritten Chinese text recognition. Using a two-stage convolutional network, each text line is over-segmented into a sequence of parts that are merged to produce character candidates; a recognition score is then assigned to each character class to produce the recognized string. Aleskerova and Zhuravlev [1] used a two-stage hierarchical classifier to handle the very large number of Chinese character classes: classes with similar features are first grouped into clusters and the first-stage network is trained to determine the group, after which the second-stage classifier corrects the remaining errors by assigning the correct labels to the corresponding classes. Zhu et al. [2] combined over-segmentation with an attention-based CRNN to investigate one-to-many attention problems in character recognition output.
- Liu et al. [12] improved the detection of Oracle characters by embedding feature fusion at different levels, with ResNet-101 as the backbone feature extractor. Zheng et al. [13] take strings as the input sequence and concatenate character-level and word-level features to extract local features of various sizes; a deep pyramid structure then captures global features. Yuan et al. [14] add a so-called ‘Gate’ after each feature map before merging it, extracting strong features while removing noise; features passed through such a gate layer are more effective (a rough sketch of this gating idea is shown below).
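As a rough illustration of this gating idea, the sketch below implements a gate as a learned sigmoid mask that re-weights a coarse feature map before it is merged with a finer one. It is written under our own assumptions and is not the exact module of Yuan et al. [14].

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Illustrative gate: a feature map is re-weighted by a learned sigmoid
    mask, suppressing noisy activations before two scales are merged."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
        # Match spatial sizes, gate the coarse map, then fuse by addition.
        coarse = nn.functional.interpolate(coarse, size=fine.shape[-2:], mode="nearest")
        return fine + self.gate(coarse) * coarse


# Toy usage: fuse a 32x32 coarse map into a 64x64 fine map.
fuse = GatedFusion(channels=64)
out = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 64, 64))
```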
3. Methodology
3.1. Network Architecture
3.2. Bounding Box Generation
3.3. Loss Function Head
4. Experiments
4.1. Datasets
4.2. Training Details
4.3. Ablation Study
4.3.1. Reliability of the Center Heatmap
4.3.2. Significance of the Regression to Four Corners
4.3.3. Reliability of Locations and Scale Variations
4.3.4. Reliability of the Feature Map Concatenation
4.4. Comparisons with SOTAs and Investigations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mubarok, A.; Nugroho, H. Handwritten character recognition using hierarchical graph matching. In Proceedings of the 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Malang, Indonesia, 15–16 October 2016; pp. 454–459.
- Zhu, Z.Y.; Yin, F.; Wang, D.H. Attention Combination of Sequence Models for Handwritten Chinese Text Recognition. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 288–294.
- Wang, Z.X.; Wang, Q.F.; Yin, F.; Liu, C.L. Weakly Supervised Learning for Over-Segmentation Based Handwritten Chinese Text Recognition. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 157–162.
- Droby, A.; Barakat, B.K.; Madi, B.; Alaasam, R.; El-Sana, J. Unsupervised Deep Learning for Handwritten Page Segmentation. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 240–245.
- Ryu, J.; Kim, S. Chinese Character Detection Using Modified Single Shot Multibox Detector. In Proceedings of the 2018 18th International Conference on Control, Automation and Systems (ICCAS), PyeongChang, Korea, 17–20 October 2018.
- Peng, D.; Jin, L.; Wu, Y.; Wang, Z.; Cai, M. A fast and accurate fully convolutional network for end-to-end handwritten Chinese text segmentation and recognition. In Proceedings of the ICDAR, Sydney, Australia, 20–25 September 2019.
- Peng, D.; Jin, L.; Liu, Y.; Luo, C.; Lai, S. PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition. Int. J. Comput. Vis. 2022, 130, 2623–2645.
- Ma, W.; Zhang, H.; Jin, L.; Wu, S.; Wang, J.; Wang, Y. Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 31–36.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351.
- Feng, S.; Fan, Y.; Tang, Y.; Cheng, H.; Zhao, C.; Zhu, Y.; Cheng, C. A Change Detection Method Based on Multi-Scale Adaptive Convolution Kernel Network and Multimodal Conditional Random Field for Multi-Temporal Multispectral Images. Remote Sens. 2022, 14, 5368.
- Wang, T.; Xu, X.; Xiong, J.; Jia, Q.; Yuan, H.; Huang, M.; Zhuang, J.; Shi, Y. ICA-UNet: ICA Inspired Statistical UNet for Real-Time 3D Cardiac Cine MRI Segmentation. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2020; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12266.
- Liu, Z.; Wang, X.; Yang, C.; Liu, J.; Yao, X.; Xu, Z.; Guan, Y. Oracle character detection based on improved Faster R-CNN. In Proceedings of the 2021 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Xi’an, China, 27–28 March 2021; pp. 697–700.
- Zheng, F.; Yan, Q.; Leung, V.C.M.; Yu, F.R.; Ming, Z. HDP-CNN: Highway deep pyramid convolution neural network combining word-level and character-level representations for phishing website detection. Comput. Secur. 2022, 114, 102584.
- Yuan, J.; Xiong, H.C.; Xiao, Y.; Guan, W.; Wang, M.; Hong, R.; Li, Z.Y. Gated CNN: Integrating multi-scale feature layers for object detection. Pattern Recognit. 2020, 105, 107131.
- Zhou, X.; Yao, C.; Wen, H.; Wang, Y.; Zhou, S.; He, W.; Liang, J. EAST: An efficient and accurate scene text detector. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5551–5560.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Zhang, F.; Zhu, X.; Dai, H.; Ye, M.; Zhu, C. Distribution-Aware Coordinate Representation for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7093–7102.
- Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. arXiv 2019, arXiv:1904.08189. Available online: http://arxiv.org/abs/1904.08189 (accessed on 26 June 2020).
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. arXiv 2017, arXiv:1708.02002.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in PyTorch. 2017. Available online: https://openreview.net/forum?id=BJJsrmfCZ (accessed on 26 June 2020).
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Shrivastava, A.; Gupta, A.; Girshick, R. Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 761–769.
- Law, H.; Deng, J. CornerNet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Le, A.D.; Clanuwat, T.; Kitamoto, A. A Human-Inspired Recognition System for Pre-Modern Japanese Historical Documents. IEEE Access 2019, 7, 84163–84169.
- Xiao, X.; Jin, L.; Yang, Y.; Yang, W.; Sun, J.; Chang, T. Building fast and compact convolutional neural networks for offline handwritten Chinese character recognition. Pattern Recognit. 2017, 72, 72–81.
- Melnyk, P.; You, Z.; Li, K. A high-performance CNN method for offline handwritten Chinese character recognition and visualization. Soft Comput. 2019, 24, 7977–7987.
- Alnaasan, M.; Kim, S. FAN-MCCD: Fast and Accurate Network for Multi-Scale Chinese Character Detection. Sensors 2021, 21.
- Aleskerova, N.; Zhuravlev, A. Handwritten Chinese Characters Recognition Using Two-Stage Hierarchical Convolutional Neural Network. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 343–348.
- Ryu, J.; Kim, S. Chinese Character Boxes: Single Shot Detector Network for Chinese Character Detection. Appl. Sci. 2019, 9.
- Ueki, K.; Kojima, T.; Mutou, R.; Nezhad, R.S.; Hagiwara, Y. Recognition of Japanese Connected Cursive Characters Using Multiple Softmax Outputs. In Proceedings of the 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), Shenzhen, China, 6–8 August 2020; pp. 127–130.
| Position | MR/FPPC (%), IOU = 0.5 | MR/FPPC (%), IOU = 0.7 |
|---|---|---|
| Center | 4.56 | 35.44 |
| Pair of corners | 6.86 | 29.61 |
| Left-top corner | 7.90 | 42.52 |
| Right-bottom corner | 8.23 | 45.22 |
| Regression Technique | MR/FPPC (%), IOU = 0.5 | MR/FPPC (%), IOU = 0.7 |
|---|---|---|
| Four corners | 4.56 | 35.44 |
| Two corners | 5.98 | 42.61 |
| One corner | 6.99 | 46.73 |
| Feature Maps for Prediction | Blend Connections | MR/FPPC (%), IOU = 0.5 | MR/FPPC (%), IOU = 0.7 |
|---|---|---|---|
| S1-1 | | 5.65 | 32.09 |
| S2 | | 5.02 | 37.71 |
| S2 | ◯ | 4.82 | 29.13 |
| S3 | | 7.44 | 56.53 |
| S3 | ◯ | 6.48 | 34.91 |
| S4 | | 21.34 | 77.56 |
| S4 | ◯ | 8.34 | 35.74 |
| Feature Maps | No. of Parameters (MB) | MR/FPPC (%) | Test Time (ms/Image) |
|---|---|---|---|
| | 10.7 | 11.12 | 40.2 |
| | 22.8 | 6.02 | 45.3 |
| | 40.4 | 6.01 | 52.5 |
| | 23.1 | 7.43 | 50.0 |
| | 46.0 | 4.93 | 60.1 |
| | 45.6 | 4.82 | 62.2 |
Algorithm | Backbone | Small-Scale MR/FPPC (%) | Medium-Scale MR/FPPC (%) | Large-Scale MR/FPPC (%) | Multi-Scale MR/FPPC (%) |
---|---|---|---|---|---|
A human-inspired recognition system [28] | DenseNet | 6.31 | 6.23 | 5.35 | - |
HCCR-CNN12layer [29] | LeNet | - | 5.62 | 5.35 | - |
GWOAP [30] | CNN | 9.05 | 6.20 | 6.09 | - |
FAN-MCCD [31] | ResNet-52 | 4.90 | 4.71 | 4.98 | 4.95 |
Two-stage hierarchical deep CNN [32] | CNN | - | 4.99 | 5.01 | - |
CCB-SSD [33] | ResNet-34 | 7.38 | 6.94 | 6.82 | 6.70 |
Recognition of Japanese Connected Cursive Characters [34] | CNN | 7.75 | 7.56 | 6.91 | - |
FC-MSCCD (ours) | ResNet-52 | 4.51 | 4.63 | 4.82 | 4.82 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).