1. Introduction
Nüshu [1,2] is a writing system unique to Jiangyong County, Hunan Province, China, and is also known as “women’s script” or “Nüshu characters”. It originates from folk literature and women’s traditions, expresses women’s emotions and psychology, and has unique artistic and cultural value. Nüshu characters are concise, beautiful, and highly artistic, and are used for communication and inheritance among women. The style and content of Nüshu are also regarded as symbols and expressions of women’s culture [3]. In 2006, Nüshu culture was included in the first batch of the National Intangible Cultural Heritage List approved by the State Council. As society develops and changes, and under the influence of foreign cultures, Nüshu faces the threat of gradual marginalization and even extinction [4]. Therefore, it is necessary to use information and digital technology to protect the cultural heritage of Nüshu.
Figure 1 shows Nüshu-related works, including calligraphy, fans, and cloth patches.
Because Nüshu is usually learned and passed down only by women, very few people are able to recognize Nüshu characters, which puts enormous pressure on efforts to protect and pass on Nüshu. In recent years, Nüshu researchers have studied and organized the basic Nüshu characters, which number fewer than 500, through the application of character position theory. Currently, there are few results on the recognition of handwritten Nüshu characters and no publicly available Nüshu dataset for use in digitization. Meanwhile, with the rapid development of deep learning, deep learning models have achieved outstanding results in image classification and recognition, and researchers have increasingly attempted to apply convolutional neural networks (CNNs) to text recognition tasks. Jeow [5] proposed a text classification algorithm that uses dual-modal information extraction and long short-term memory recurrent neural networks. Liu Min et al. [6] applied deep learning models to the recognition of oracle bone inscriptions and achieved good results. Aneja et al. [7] proposed the use of deep convolutional neural networks in transfer learning for the recognition of handwritten Sanskrit characters, achieving an accuracy rate of 98%. Deep learning algorithms have thus been widely used in the field of text recognition, and in order to better preserve Nüshu culture, this paper explores the use of deep learning for the recognition of handwritten Nüshu characters.
In this paper, we explore the application of CNNs in the recognition of handwritten Nüshu characters. Our main contributions are summarized as follows:
- (1)
To address the scarcity of Nüshu data resources, we first established a reliable, large-scale handwritten Nüshu dataset with clear images, called HWNS2023 (handwritten Nüshu 2023). This dataset contains ten characters, corresponding to code points 1B180 to 1B189 of the Nüshu international coding standard character set, with a total of 2364 samples. To account for the diversity of handwriting styles and the different lighting conditions under which handwritten Nüshu characters are captured, we also apply rotation, brightness adjustment, and noise addition to expand the dataset to approximately 6800 images. The HWNS2023 dataset is available at https://www.kaggle.com/datasets/yanz0409/hwns2023 (accessed on 1 June 2023).
- (2)
A scheme for handwritten Nüshu recognition (SGooTY) is proposed in order to aid in the preservation and organization of cultural heritage. The scheme uses two deep learning models to achieve better Nüshu character detection and recognition. In the first stage, an improved GoogLeNet-tiny model is applied to identify and classify handwritten Nüshu characters. However, some Nüshu images may still not be recognized. In the second stage, YOLOv5-CBAM is applied to the handwritten Nüshu character images that the first stage failed to recognize, and the recognition results are then output. The proposed system achieves a recognition accuracy of 99.9% on the HWNS2023 dataset and effectively addresses the problem of handwritten Nüshu character recognition.
- (3)
Five classic recognition models, namely AlexNet, VGGNet16, GoogLeNet, MobileNetV3, and ResNet, were compared in experiments, and the two models with the best results, AlexNet and GoogLeNet, were selected. The new models AlexNet-BN, AlexNet-SC, AlexNet-LR, and GoogLeNet-tiny were generated by adjusting network parameters and modifying network layers. Various optimization strategies were used to train and test these new models, which were then compared and analyzed to determine the first-stage recognition model. For the second stage, YOLOv3 [8], YOLOv5, and YOLOv7 [9] were trained and tested, and a comparison of the results led to the selection of YOLOv5 as the second-stage network model. An attention mechanism was introduced into the YOLOv5 backbone network to improve the recognition accuracy of the model.
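The augmentation step described in contribution (1) can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the function names, the grayscale nested-list image representation, and all parameter ranges (rotation angle, brightness factor, noise level) are assumptions chosen only for illustration. In practice an image library would normally be used instead.

```python
import math
import random
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 pixel values


def _clamp(value: float) -> int:
    """Round to the nearest integer and keep the result in [0, 255]."""
    return min(255, max(0, round(value)))


def rotate(img: Image, degrees: float, fill: int = 255) -> Image:
    """Rotate about the image centre using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_t = math.cos(math.radians(degrees))
    sin_t = math.sin(math.radians(degrees))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each output pixel back to its source location.
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every pixel by `factor` (>1 brightens, <1 darkens)."""
    return [[_clamp(p * factor) for p in row] for row in img]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[_clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def augment(img: Image, rng: random.Random) -> List[Image]:
    """Return one rotated, one brightness-shifted, and one noisy variant.

    The ranges below are hypothetical, not values from the paper.
    """
    return [
        rotate(img, rng.uniform(-10.0, 10.0)),
        adjust_brightness(img, rng.uniform(0.7, 1.3)),
        add_gaussian_noise(img, 8.0, rng),
    ]
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which makes it easier to regenerate the same expanded dataset.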
This paper proposes a two-stage recognition scheme based on convolutional neural networks for recognizing handwritten Nüshu characters, and improves the recognition models in both stages. The remainder of this paper is organized as follows. The first part introduces the research background. The second part discusses related work on character recognition based on convolutional neural networks. The third part introduces the activation function, batch normalization (BN), and the CBAM attention mechanism used in the model improvements. The fourth part introduces the dataset used in the experiments, the model improvements, and the model training optimization methods. The fifth part describes the experimental design, experimental environment, and model training process. The sixth part presents the experimental results. The seventh part concludes and summarizes the research.
2. Related Work
Recognition technology for Nüshu characters is traditionally divided into holistic text recognition and single character recognition. For scripts such as the Roman alphabet, holistic techniques refer to word recognition, whereas for Nüshu and similar scripts they refer to the recognition of partial characters or ligatures, because character boundaries are difficult to determine in Nüshu text. Single character recognition, on the other hand, refers to character-level recognition in text, and this classification is commonly used for both printed and handwritten text.
Handwriting recognition has always been one of the most researched topics in computer vision. The problem has been studied extensively for several decades, and many handwriting recognition systems have gradually matured. Handwriting recognition is usually limited by the accuracy of the preceding text detection and segmentation steps. Motivated by this, Wigington et al. [10] proposed a deep learning model that jointly learns text detection, segmentation, and recognition using images without detection or segmentation annotations. Ptucha et al. [11] proposed a fully convolutional network architecture that outputs symbol streams of arbitrary length from handwritten text. Although these results are promising, current approaches have several limitations, including computational complexity, dependence on the classifier used, and difficulty in assessing interactions between features. Cilia et al. [12] attempted to overcome some of these drawbacks by adopting a feature-ranking-based technique, considering different univariate measures to generate feature rankings, and proposing a greedy search method to select a subset of features that maximizes classification results. In addition, Choudhury et al. [13] proposed an online handwriting representation method that uses a multicomponent sine wave model as a model-based representation of online pen strokes. Pashine et al. [14] performed handwritten digit recognition on the MNIST dataset using support vector machine (SVM), multilayer perceptron (MLP), and convolutional neural network (CNN) models.
With the continued development of deep learning, researchers have applied it to various handwriting recognition tasks. Ly et al. [15] proposed an end-to-end deep convolutional recurrent network (DCRN) model for offline handwritten Japanese text line recognition. Majid et al. [16] proposed an offline Bengali handwriting recognition system using Faster R-CNN for sequential recognition of characters and diacritics. Ali et al. [17] studied intelligent handwriting recognition using a hybrid SVM classifier based on a CNN architecture and dropout, and modeled deep learning architectures that can effectively recognize Arabic handwriting. In addition, Carbune et al. [18] described an online handwriting system that supports 102 languages using a deep neural network architecture.
Despite the development of many sophisticated handwriting recognition systems, Nüshu character recognition remains a challenging problem because humans can write the same information in almost infinite ways. Currently, researchers mainly rely on traditional machine learning methods that learn the features of Nüshu characters from manually extracted features. For example, Hei et al. [19] proposed a multi-directional text line extraction method for offline handwritten Nüshu character images, which extracts text lines in different directions from Nüshu character images on different media such as fans and paper. However, this method cannot recognize individual Nüshu characters. To obtain individual Nüshu character images, Sun et al. [20] proposed a Nüshu character segmentation algorithm based on a local adaptive thresholding technique. This method can automatically obtain local thresholds, avoid the loss of character image information, and improve the accuracy of Nüshu character image segmentation. In practical image segmentation, this method can effectively reduce the effect of background noise. Wang et al. [21] proposed a statistical–structural character learning algorithm based on hidden Markov models, which considers the stroke relationships of handwritten Nüshu characters to recognize their features, and achieved impressive performance in handwritten Nüshu character recognition.
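To make the idea of local adaptive thresholding concrete, the following generic sketch binarizes a grayscale image by comparing each pixel with the mean of its own neighbourhood. It is a textbook local-mean variant and is not the algorithm of Sun et al. [20]; the function name, the window `radius`, and the `offset` value are assumptions for illustration only.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 pixel values


def local_mean_threshold(img: Image, radius: int = 1, offset: float = 10.0) -> Image:
    """Mark a pixel as ink (1) if it is darker than its local mean minus `offset`.

    The threshold is recomputed for every pixel from a (2*radius+1)-wide
    square window clipped at the image border, so it adapts to the local
    background level rather than relying on one global value.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] < local_mean - offset else 0
    return out


# A single dark pixel on a bright background is marked as ink.
page = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
mask = local_mean_threshold(page)
```

The `offset` term suppresses small fluctuations in flat background regions. A real implementation would choose the window size relative to the stroke width.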
In recent years, artificial intelligence technologies such as machine learning and deep learning have made breakthroughs, benefiting from their powerful feature extraction capabilities. Deep learning has been widely applied to text recognition, resulting in the development of many outstanding algorithms. However, to the best of our knowledge, no researchers have attempted to apply deep learning to Nüshu character recognition tasks. Therefore, this paper explores the application of convolutional neural networks to the recognition of Nüshu characters.
7. Conclusions
The complex shapes and high stroke counts of handwritten Nüshu characters make recognition difficult. In this paper, we first established the HWNS2023 dataset of handwritten Nüshu characters and explored several data augmentation methods. We then proposed a two-stage handwritten Nüshu recognition scheme that uses two deep learning models. In the first stage, to ensure recognition accuracy, we evaluated five different models to determine which were best suited for Nüshu recognition, then improved and assessed the selected models, ultimately determining that GoogLeNet-tiny was the best model for first-stage recognition. However, due to the limitations of classification models, not all characters can be correctly detected and recognized. Therefore, in the second stage, we used YOLOv5-CBAM to perform secondary recognition on the handwritten Nüshu characters that were misidentified in the first stage. The small size of our dataset currently limits the effectiveness of the proposed method, and expanding its scale will be a key focus of future work. The recognition method proposed in this paper lays the foundation for the construction of a mobile Nüshu recognition system and provides new ideas for the protection and inheritance of Nüshu culture, which is significant for solving practical problems of Nüshu character recognition in real-world scenarios.
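The control flow of the two-stage scheme summarized above can be sketched as a simple cascade. This is only an illustration under stated assumptions: the text does not specify here how an image is judged "not recognized" by the first stage, so the confidence threshold below is hypothetical, as are the function name `two_stage_recognize`, the `(label, confidence)` return convention, and the stub models used in the example.

```python
from typing import Any, Callable, Optional, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])


def two_stage_recognize(
    image: Any,
    stage_one: Callable[[Any], Prediction],
    stage_two: Callable[[Any], Optional[Prediction]],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> Tuple[Optional[str], str]:
    """Return (label, source), where source names the stage that decided."""
    label, confidence = stage_one(image)
    if confidence >= threshold:
        # The first-stage classifier is confident enough; accept its answer.
        return label, "stage1"
    # Otherwise hand the image to the second-stage model.
    fallback = stage_two(image)
    if fallback is None:
        return None, "unrecognized"
    return fallback[0], "stage2"


# Stub models standing in for GoogLeNet-tiny and YOLOv5-CBAM.
def stub_classifier(image: str) -> Prediction:
    return ("U+1B180", 0.97) if image == "clear" else ("U+1B181", 0.40)


def stub_detector(image: str) -> Optional[Prediction]:
    return ("U+1B182", 0.85) if image == "blurry" else None


result_clear = two_stage_recognize("clear", stub_classifier, stub_detector)
result_blurry = two_stage_recognize("blurry", stub_classifier, stub_detector)
```

Keeping the two models behind plain callables means either stage can be swapped or re-tuned without touching the cascade logic.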