Mining and Utilizing Knowledge Correlation and Learners’ Similarity Can Greatly Improve Learning Efficiency and Effect: A Case Study on Chinese Writing Stroke Correction

Lang, Qing; Zhang, Caifeng; Qi, Hengnian; Du, Yaqin; Zhu, Xiaorong; Zhang, Chu; Li, Mizhen

doi:10.3390/su15032393


by Qing Lang ¹, Caifeng Zhang ², Hengnian Qi ²,*, Yaqin Du ², Xiaorong Zhu ², Chu Zhang ² and Mizhen Li ³

¹ Huzhou University Library, Huzhou University, Huzhou 313000, China
² School of Information Engineering, Huzhou University, Huzhou 313000, China
³ Department of Training, China Language & Culture Press, Beijing 100010, China
* Author to whom correspondence should be addressed.

Sustainability 2023, 15(3), 2393; https://doi.org/10.3390/su15032393

Submission received: 20 December 2022 / Revised: 24 January 2023 / Accepted: 26 January 2023 / Published: 28 January 2023

(This article belongs to the Special Issue E-learning Personalization Systems and Sustainable Education)


Abstract: Using AI technology to improve teaching and learning is an important goal of educational sustainability. Mining the correlations between knowledge points allows discrete knowledge points to be integrated, which raises knowledge density and reduces the learning load. In addition, the successful experiences of similar learners can be shared, shortening the learning path of new learners. To address the widespread problem of irregular stroke order and to teach and correct stroke order effectively, this study applies association rules to a large body of learners' writing records to uncover latent correlations between error-prone Chinese characters, and on that basis compiles a set of error-prone Chinese characters. All characters in an error-prone category share a common error-prone feature, so correcting that one error carries over to every character in the category; because each category contains dozens of characters, the efficiency of learning stroke order can improve by as much as tens of times. Within a training and testing system built on this error-prone character set, a learner-based personalized recommendation model for error-prone Chinese characters is proposed using an improved collaborative filtering algorithm. Experimental results show that the Apriori algorithm with the lift measure mines effective strong association rules and provides an important reference for the character set table. The improved collaborative filtering algorithm exploits the similarity between learners to share successful learning experiences and to provide personalized recommendations of error-prone Chinese characters, and its recommendation performance is higher than that of the traditional collaborative filtering model. In tests on different types of learner groups, scores differ significantly between the independent pre-test and the post-training test, showing that the training effectively corrects irregular writing habits and further indicating that mining knowledge correlations and exploiting learners' similarity can effectively improve the efficiency and effect of teaching and learning.

Keywords: educational sustainability; Chinese character stroke order; association rules; collaborative filtering; personalized learning

1. Introduction

Chinese characters are the carriers of Chinese history and culture and have a rich traditional cultural heritage. Standard stroke order is the most basic part of the writing process, but it is also the part that people most easily overlook. Mastering the standard stroke order not only helps to correct wrong writing habits, but also suits the physiological function of the wrist, making the written characters more proportional, balanced and beautiful, and improving writing speed to a certain extent [1]. According to statistics, among the more than 260,000 contestants in the competition Brush and Ink in China, sponsored by the Ministry of Education of China in 2020 and 2021, 79.24% had problems with stroke order. However, Chinese characters are typical morpheme characters [2] and their total number is very large, so it is very difficult for learners to learn and correct stroke order character by character. Therefore, how to learn the standard stroke order and correct wrong stroke order effectively is an important research topic in sustainable education.

Research has shown that computer technology used in the teaching process can provide students with new learning experiences and more effective learning, which can lead to sustainable education. For example, Li N et al. [3] investigated how MOOC study groups watch videos together under different configurations. The results show that watching MOOCs in groups provides a highly satisfying learning experience, because learners feel connected and can interact with one another, which suggests that collaborative learning supported by computer technology can increase students' sense of participation and improve learning efficiency. Boroujeni and Dillenbourg [4] captured students' behavioral patterns by analyzing sequential interaction logs, which enabled more effective and personalized support during the learning process. Troussas C et al. [5] presented a fully operating and evaluated adaptive and intelligent e-learning system for second language acquisition, which provided each student with a unique educational experience. Furthermore, the inference system used the knowledge inference relationships between learning objects to create a personalized learning environment for each student.

It can be inferred from these studies that the successful experiences of similar learners can be shared to shorten the learning path of new learners. In addition, by mining the correlations between knowledge points, discrete knowledge points can be merged into integrated ones, which raises knowledge density and reduces the learning load. These two points help to establish an effective and personalized learning system. Following this idea, we take the learning of Chinese character stroke order as a case study of an e-learning personalization system and use artificial intelligence technology to improve the efficiency of learning the standard stroke order and correcting wrong stroke order.

Chinese character forms consist of a limited set of basic strokes and components, so the writing of different characters is necessarily related. Such relationships can be obtained through association rule mining. The experience of Chinese character writers can be shared and learned from, and group writing experience can be propagated through collaborative filtering. Association rule mining is a typical data mining technique that has been widely used in computer-assisted education. Ding Jihong et al. [6] achieved accurate recommendation of learning resources based on association rules in a big data environment, enhancing the online learning experience. Zhang L et al. [7] applied association rule mining to teaching information management in universities, showing that data mining is helpful for teaching management. However, there is little research on incorporating association rules into the teaching of Chinese character writing. Collaborative filtering is one of the core techniques used in recommendation systems; it works mainly by calculating preference information between similar users and then predicting what other users might be interested in [8].

As a typical case study, this paper addresses the need for efficient correction of Chinese writing stroke order. It forms an error-prone character set by mining the correlations between error-prone Chinese characters and recommends exercises through an optimized collaborative filtering algorithm, yielding a systematic stroke order correction method and system. To verify the effectiveness of the personalized stroke correction algorithm, we conducted dedicated tests and effect evaluations on two different groups of learners. The experiments show that our method achieves personalized correction of Chinese characters' stroke order and is effective in teaching standard stroke order to different groups, which can advance sustainable education.

2. Personalized Chinese Character Stroke Order Correction Algorithm

Based on the massive writing records of learners and on data mining technology, this paper constructs a personalized Chinese character stroke order correction algorithm. As shown in Figure 1, the algorithm is divided into two stages. The first stage constructs an error-prone Chinese character set library using the Apriori algorithm with the lift measure, which provides the basis for the second stage. The second stage applies learner-based collaborative filtering with inverse item frequency to the error-prone Chinese character set library and recommends effective experiences to learners by calculating their similarity. In short, association rules are used to explore the latent relationships between Chinese characters, and the error-prone Chinese character set library is compiled from them. A user-based personalized recommendation model for error-prone Chinese characters is then built on the improved collaborative filtering algorithm, with learners and the error-prone Chinese character set at its core, which completes the personalized Chinese character stroke correction algorithm.

2.1. Apriori Algorithm with Lift Measure

As a data mining algorithm based on association rules [9], the Apriori algorithm can analyze valuable information from the writing records of different learners, and reflect the writing situation of most learners. It is an important means to summarize the types of error-prone Chinese characters and then generate the error-prone Chinese character set table.

Assume that the error-prone Chinese character data set D contains all the incorrectly written characters in the database. A non-empty item set Q represents one learner's writing record, that is, an item set composed of several Chinese characters [10]. Let X and Y be two sets of error-prone Chinese characters in the record Q, with $X \subseteq Q$ and $Y \subseteq Q$. If $X \neq \emptyset$, $Y \neq \emptyset$, and $X \cap Y = \emptyset$, then $X \Rightarrow Y$ constitutes an error-prone Chinese character association rule in D. The effectiveness of association rules in the Apriori algorithm is usually measured by support and confidence [11]. Support is the proportion of the pre-processed writing-record dataset C in which the characters in X and Y appear together; it is denoted $support(X \Rightarrow Y)$ and given in Equation (1). Confidence is the ratio of the number of occurrences of X and Y together to the number of occurrences of X in C; it is denoted $confidence(X \Rightarrow Y)$ and given in Equation (2). Here $count(X \cup Y)$ is the number of times the characters in X and Y occur together, and $support(X)$ is the proportion of C in which X occurs.

$$support(X \Rightarrow Y) = P(X \cup Y) = \frac{count(X \cup Y)}{count(C)} \tag{1}$$

$$confidence(X \Rightarrow Y) = P(Y \mid X) = \frac{support(X \Rightarrow Y)}{support(X)} \tag{2}$$

The traditional Apriori algorithm filters rules with two evaluation indexes, support and confidence, and many of the association rules it mines are invalid. To address this shortcoming of the support–confidence framework, we introduce lift [12,13] to further filter the mined association rules. Lift is the ratio of the probability that character Y occurs given that character X occurs to the overall probability that Y occurs, and it reflects the correlation between X and Y, as shown in Equation (3).

$$lift(X \Rightarrow Y) = \frac{P(Y \mid X)}{P(Y)} = \frac{support(X \Rightarrow Y)}{support(X) \cdot support(Y)} \tag{3}$$

Here $support(Y)$ is the proportion of dataset C in which Y occurs. The value range of lift is $[0, +\infty)$. When the lift is greater than 1, the appearance of character X promotes the appearance of character Y, and the rule is called a positive correlation rule. When the lift is equal to 1, the occurrences of X and Y are independent random events, and the rule is called an irrelevant rule. When the lift is less than 1, the occurrence of character X reduces the probability that character Y occurs, and the rule is called a negative correlation rule.

Therefore, the Apriori algorithm with the lift measure can extract meaningful association rules from the massive Chinese writing records and summarize them into the error-prone Chinese character set table, providing support for the subsequent recommendation of error-prone Chinese characters.
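The three measures in Equations (1)–(3) can be illustrated with a minimal Python sketch. The toy records, the threshold values, and the function names below are assumptions made for illustration and are not taken from the study's data or code. For brevity the sketch enumerates single-character rules by brute force instead of performing Apriori's level-wise candidate generation and pruning.

```python
from itertools import combinations

# Hypothetical toy data: each set holds the characters one learner wrote incorrectly.
records = [
    {"怀", "情", "龙"},
    {"怀", "情"},
    {"龙", "为"},
    {"怀", "情", "为"},
    {"龙", "为", "之"},
]

def support(itemset, records):
    """Fraction of records that contain every character in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)

def confidence(x, y, records):
    """Equation (2): support(X => Y) / support(X)."""
    return support(set(x) | set(y), records) / support(x, records)

def lift(x, y, records):
    """Equation (3): confidence(X => Y) / support(Y)."""
    return confidence(x, y, records) / support(y, records)

def strong_rules(records, min_support=0.4, min_confidence=0.7, min_lift=1.0):
    """Brute-force search for single-character rules X => Y passing all thresholds.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    items = sorted(set().union(*records))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s = support({x, y}, records)
            if s < min_support:
                continue
            c = confidence({x}, {y}, records)
            l = lift({x}, {y}, records)
            # Keep only positively correlated rules (lift > 1).
            if c >= min_confidence and l > min_lift:
                rules.append((x, y, round(s, 3), round(c, 3), round(l, 3)))
    return rules

for rule in strong_rules(records):
    print(rule)
```

On this toy data only the rules between "怀" and "情" survive: the pair "龙" and "为" reaches the support threshold but its confidence falls below 0.7, which shows how the thresholds interact.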

2.2. Learner-Based Collaborative Filtering-Inverse Item Frequency

By calculating the similarity between learners, the characters that similar learners have written incorrectly can be recommended to the current learner; in this way the experience of other learners is used to enhance learning efficiency.

First, the number of errors each learner makes on each Chinese character is counted. In general, the more often a learner miswrites a character, the more error-prone that character is for the learner. A learner-by-character scoring matrix is built from these counts. The similarity between learners is then calculated to establish the nearest-neighbor set of the current learner, and the error-proneness of different characters is ranked according to the learners in that set to obtain the recommended error-prone characters for the current learner. Jaccard similarity [14] and cosine similarity [15] can be used to calculate the similarity between learners, as shown in Equations (4) and (5):

$$w_{u, v, J} = \frac{| N(u) \cap N(v) |}{| N(u) \cup N(v) |} \tag{4}$$

$$w_{u, v, C} = \frac{| N(u) \cap N(v) |}{\sqrt{| N(u) | \, | N(v) |}} \tag{5}$$

$N(u)$ is the set of Chinese characters written incorrectly by learner u, and $N(v)$ is the set of Chinese characters written incorrectly by learner v.
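Equations (4) and (5) translate directly into set operations. The following Python sketch is illustrative only; the function names and the two example error sets are hypothetical.

```python
import math

def jaccard_similarity(n_u, n_v):
    """Equation (4): shared errors divided by all errors made by either learner."""
    union = n_u | n_v
    return len(n_u & n_v) / len(union) if union else 0.0

def cosine_similarity(n_u, n_v):
    """Equation (5): shared errors divided by the geometric mean of the set sizes."""
    if not n_u or not n_v:
        return 0.0
    return len(n_u & n_v) / math.sqrt(len(n_u) * len(n_v))

# Hypothetical error sets for two learners.
errors_u = {"龙", "拢", "为"}
errors_v = {"龙", "为", "之", "情"}

print(jaccard_similarity(errors_u, errors_v))  # 2 shared / 5 in the union
print(cosine_similarity(errors_u, errors_v))   # 2 shared / sqrt(3 * 4)
```

Both functions return 0.0 for empty sets to avoid division by zero, which is a design choice of this sketch and not something the formulas specify.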

However, neither of these similarity measures avoids the influence of high-frequency handwriting errors: a character that many learners write incorrectly appears in the error sets of many learner pairs and inflates their apparent similarity. The similarity measure therefore needs to be improved. When two learners make the same errors on low-frequency characters, this is stronger evidence that the two learners are similar. We therefore introduce user-based Collaborative Filtering-Inverse Item Frequency (UserCF-IIF) into the similarity calculation [14] to penalize the contribution of commonly miswritten characters to the similarity. The improved Jaccard similarity and cosine similarity are given in Equations (6) and (7):

$$w_{u, v} = \frac{\sum_{i \in N(u) \cap N(v)} \frac{1}{\log (1 + N(i))}}{| N(u) \cup N(v) |} \tag{6}$$

$$w_{u, v} = \frac{\sum_{i \in N(u) \cap N(v)} \frac{1}{\log (1 + N(i))}}{\sqrt{| N(u) | \, | N(v) |}} \tag{7}$$

Here $i$ ranges over the error-prone Chinese characters that learners u and v both wrote incorrectly, and $N(i)$ is understood, as in the usual inverse-item-frequency weighting, as the number of learners who wrote character $i$ incorrectly. Finally, error-prone Chinese characters are recommended by ranking learners according to their similarity.
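A minimal Python sketch of the IIF-weighted similarities in Equations (6) and (7), followed by a simple neighbor-based recommendation step, is given here for illustration. It assumes that N(i) is the number of learners who miswrote character i. The data, the function names, the parameters `k` (number of neighbors) and `n` (list length), and the scoring rule that sums neighbor similarities are all hypothetical choices, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def iif_similarity(errors, u, v, variant="jaccard"):
    """Equations (6)/(7), assuming N(i) = number of learners who miswrote i."""
    learners_per_char = Counter(ch for chars in errors.values() for ch in chars)
    common = errors[u] & errors[v]
    # Rarely miswritten characters contribute more than commonly miswritten ones.
    weight = sum(1.0 / math.log(1 + learners_per_char[ch]) for ch in common)
    if variant == "jaccard":
        denom = len(errors[u] | errors[v])
    else:
        denom = math.sqrt(len(errors[u]) * len(errors[v]))
    return weight / denom if denom else 0.0

def recommend(errors, u, k=2, n=3, variant="jaccard"):
    """Recommend up to n characters miswritten by the k learners most similar to u."""
    sims = {v: iif_similarity(errors, u, v, variant) for v in errors if v != u}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    scores = defaultdict(float)
    for v in neighbours:
        # Only suggest characters that u has not already been seen to miswrite.
        for ch in errors[v] - errors[u]:
            scores[ch] += sims[v]
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical learner -> miswritten characters mapping.
errors = {
    "A": {"龙", "拢", "之"},
    "B": {"龙", "拢", "为"},
    "C": {"之", "情"},
    "D": {"之", "怀", "情"},
}

print(recommend(errors, "A"))
```

In this toy data learner A is closest to learner B, because they share two characters that few learners miswrite, so B's remaining error "为" is ranked first.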

3. Experimental Results and Analysis of Association Rules and Collaborative Filtering

This section uses the writing records of the competition Brush and Ink in China as the data source. For example, one participant's record of incorrectly written characters is “连, 迈, 莲, 房, and 剪”. Section 3.1 describes the data source and the pre-processing methods used to extract more effective data. In Section 3.2, the improved association rule algorithm is used to mine and analyze typical Chinese characters, the correlation strength between different Chinese characters and error types is calculated, and the error-prone Chinese character set library is compiled. Section 3.3 compares the improved collaborative filtering algorithm with several traditional collaborative filtering algorithms and demonstrates its effectiveness in recommending the characters that learners are likely to write incorrectly.

3.1. Data Pre-Processing

The data came from the standardized writing questions of the 2020 and 2021 competition Brush and Ink in China, which was open to teachers, students in schools and colleges, and members of the community across the country. One hundred and thirty-seven error-prone Chinese characters were selected as the question bank for standard writing. We randomly sampled the writing records of 20,000 participants from 2020 and 2021 as the research data. The specific pre-processing steps are as follows.

Data Cleaning: Records with missing data were deleted. Some participants left several questions unanswered, which produced empty answers and incorrect judging results, so these records were removed. In addition, because the purpose of the research is to mine information about Chinese characters, redundant fields such as participants' cell phone numbers and titles were deleted.
Data Integration: Multiple data files were integrated into one, and duplicate records were removed. A participant may have submitted several times, so that the same record was saved more than once; these duplicates were removed to avoid data redundancy.
Data Conversion: The form of the data depends on the requirements of the algorithm, so the data used for mining must be converted. Because character data generally cannot be used directly as input to the algorithm, the characters were encoded as numeric data.
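The three pre-processing steps above can be sketched in a few lines of Python. The record layout (`participant_id`, `phone`, `answers`), the rule of keeping the first submission per participant, and the integer encoding scheme are assumptions made for this illustration; the source does not describe the actual file format.

```python
def preprocess(raw_records):
    """Clean, de-duplicate and encode writing records (illustrative sketch)."""
    cleaned, seen = [], set()
    for rec in raw_records:
        answers = rec.get("answers") or {}
        # Data cleaning: skip records with any unanswered question.
        if not answers or any(v is None for v in answers.values()):
            continue
        # Data integration: keep only the first submission per participant (assumption).
        if rec["participant_id"] in seen:
            continue
        seen.add(rec["participant_id"])
        # Keep only the miswritten characters; phone numbers and titles are not carried over.
        cleaned.append({ch for ch, correct in answers.items() if not correct})
    # Data conversion: map every character to an integer code.
    vocab = {ch: idx for idx, ch in enumerate(sorted(set().union(*cleaned)))}
    encoded = [sorted(vocab[ch] for ch in wrong) for wrong in cleaned]
    return encoded, vocab

# Hypothetical raw records; True means the character was written correctly.
raw = [
    {"participant_id": 1, "phone": "hidden", "answers": {"连": False, "迈": True, "房": False}},
    {"participant_id": 1, "phone": "hidden", "answers": {"连": False, "迈": True, "房": False}},
    {"participant_id": 2, "phone": "hidden", "answers": {"连": True, "迈": None, "房": False}},
    {"participant_id": 3, "phone": "hidden", "answers": {"连": False, "迈": False, "房": True}},
]

encoded, vocab = preprocess(raw)
print(encoded, vocab)
```

The encoded item sets produced this way are the kind of input an association rule miner expects: one list of integer item codes per learner.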

3.2. The Error-Prone Character Set Table of Stroke Order Based on Association Rules

The Apriori algorithm with the lift measure was applied to the pre-processed contest data to mine the relationships between error-prone Chinese characters and error types (incorrect stroke sequence or incorrect number of strokes), as well as the relationships among error-prone Chinese characters themselves. Some of the mined results are shown in Table 1.

Table 1 shows that error-prone Chinese characters are often strongly associated with specific error types. Taking “之” as an example, the confidence of the rule linking “之” to the error type “Wrong number of strokes” is 0.99. The rule is therefore highly reliable: learners are very likely to make this type of error when practicing the character, which matches the observation that the two strokes “horizontal-break” and “right-falling” are easily written as a single folded stroke. Correction should therefore focus on the strokes of this character, and error-prone characters of this kind can be grouped into the set of characters with incorrect hyphenated strokes. In addition, the character “怀” (no. 6 in Table 1) is highly correlated with the character “情”: learners who miswrite “怀” may also miswrite “情”. Both characters contain the “忄” component, whose stroke order is easily written incorrectly, which indicates, fairly intuitively, that characters sharing a component are correlated. Error-prone characters of this kind can be grouped into the set of characters with the same error-prone components. Other correlations are less intuitive. Through the Apriori algorithm, we found that “龙” and “为” are correlated. Structurally, what they have in common is an independent dot stroke, which suggests summarizing the stroke order rules for characters with independent dot strokes and grouping them into sets of characters with the same error-prone features. Still other characters have none of the above features but are easily miswritten because of their complex structure; these can be grouped into the set of characters with complex structures that are difficult to write correctly.

By mining the correlations between error-prone Chinese characters, we obtained a number of error-prone character categories. We then constructed the basic error-prone Chinese character set table by expanding the set of characters within each category (Table 2). Each category contains dozens of Chinese characters that share a common error-prone feature. Correcting that one error carries over to every character in the category, so the efficiency of learning stroke order can improve by as much as tens of times.

Thus far, we have described a method for generating the error-prone Chinese character set library based on the improved Apriori algorithm. By drawing on the characters in this library, we can make personalized recommendations according to the user information and the character library. To verify the effectiveness of the error-prone Chinese character set library, we imported it into an applet that we developed for internet users to practice with. The writing records of each user were extracted and analyzed for the types of stroke and stroke order errors they contain; some of the exercise data are shown in Table 3.

The characters that a learner writes incorrectly are related to one another. Some share an explicit common component: for example, the characters “龙” and “拢” written by learner no. 1 both contain the component “龙”. Others share an implicit common part, such as the characters “丑” and “再” written by learner no. 5, which both contain a “土” structure. We can conclude that when “土” is embedded in a character, the stroke order rule is “vertical first and then two horizontal” [16]. Following this idea, 38 different error types and their character sets were summarized through data mining and analysis to form the error-prone Chinese character set table, in which the correlations between characters were confirmed in the learners' writing records.

3.3. Recommendation of Error-Prone Chinese Characters Based on Collaborative Filtering

The learner-based collaborative filtering algorithm for recommending error-prone Chinese characters takes as its experimental samples the learners and the Chinese character writing records of the test system built on the error-prone Chinese character set table. The experimental analysis is carried out through the intelligent recommendation of error-prone Chinese characters based on UserCF-IIF, and it focuses on the quality of recommendations obtained with the improved similarity measures.

The purpose of the recommendation algorithm is to recommend to each learner the characters he or she is most likely to write incorrectly, so a top-N recommendation strategy was used [17]. To evaluate the recommendation results objectively, we adopted evaluation indexes commonly used for recommendation systems, namely precision (accuracy rate), recall and coverage. Precision is the proportion of the characters recommended to learners that are true error characters in the test set. Recall is the proportion of learners' true error characters in the test set that appear in the recommended list. Coverage is the ratio of the recommended error-prone Chinese characters to the whole error-prone Chinese character set table. The formulas are shown in Equations (8)–(10):

$$Precision = \frac{\sum_{u} | R(u) \cap T(u) |}{\sum_{u} | R(u) |} \tag{8}$$

$$Recall = \frac{\sum_{u} | R(u) \cap T(u) |}{\sum_{u} | T(u) |} \tag{9}$$

$$Coverage = \frac{\sum_{u} | R(u) |}{| I |} \tag{10}$$

$R(u)$ represents the error-prone Chinese character set recommended for learner $u$, $T(u)$ represents the true error Chinese characters of learner $u$ in the test set, $I$ represents the whole error-prone Chinese character set table, $P(i)$ represents the prevalence of Chinese character $i$, and $N$ represents the length of the recommended error-prone Chinese character list $R(u)$.
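The evaluation indexes can be computed as in the Python sketch below. Precision and recall follow Equations (8) and (9) directly. For coverage, the sketch counts distinct recommended characters (the union of all R(u)), which is the common definition and keeps the value within [0, 1]; this is an assumption that differs from a literal sum of |R(u)| as printed in Equation (10). All names and data are hypothetical.

```python
def evaluate(recommended, truth, catalogue):
    """Precision, recall and coverage for top-N recommendation lists."""
    hits = sum(len(recommended[u] & truth[u]) for u in recommended)
    n_recommended = sum(len(r) for r in recommended.values())
    n_true = sum(len(truth[u]) for u in recommended)
    # Assumption: coverage uses distinct recommended characters, not a raw sum.
    covered = set().union(*recommended.values())
    return {
        "precision": hits / n_recommended if n_recommended else 0.0,
        "recall": hits / n_true if n_true else 0.0,
        "coverage": len(covered) / len(catalogue),
    }

# Hypothetical recommendations, held-out errors, and character set table.
recommended = {"u1": {"龙", "为"}, "u2": {"为", "之"}}
truth = {"u1": {"龙", "情"}, "u2": {"之"}}
catalogue = {"龙", "为", "之", "情", "怀"}

metrics = evaluate(recommended, truth, catalogue)
print(metrics)
```

Here two of the four recommended characters are true errors, two of the three true errors are recovered, and three of the five characters in the table are recommended at least once.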

The proposed model was compared with three baseline algorithms: UserCF (learner-based collaborative filtering), MostPopularCF (popularity-based collaborative filtering) and RandomCF (random recommendation) [18]. All algorithms were tested separately with Jaccard similarity and cosine similarity for comparison. The length of the recommendation list was 10, and the number of similar learners (neighbors) ranged from 5 to 30.

As can be seen from Table 4, the collaborative filtering algorithm based on improved similarity increases the precision of error-prone Chinese character recommendation. Among the models compared, Jaccard-UserCF-IIF has the best precision; RandomCF is the best in coverage, with Jaccard-UserCF-IIF second. RandomCF covers more characters because it recommends at random, but its precision and recall are poor. Overall, Jaccard-UserCF-IIF shows the best comprehensive performance among all models.

We further analyze the model with the best comprehensive performance, namely Jaccard-UserCF-IIF. Figure 2 reports the influence of different neighbor numbers on the recommendation performance of Jaccard-UserCF-IIF. With the increase in neighbor numbers, the recommendation performance increases gradually. When the number of neighbors is 25, the accuracy and recall rate reach the maximum value. The coverage rate decreases with the increase of neighbor number, while the prevalence rate increases steadily.

4. Experimental Test and Effect Evaluation of Stroke Order Correction Algorithm

To verify the effectiveness of the personalized stroke correction algorithm, we developed an error-prone Chinese character writing stroke correction training WeChat applet and conducted special tests and effect evaluations on two different groups of learners. Among them, Test Experiment 1 reported the results of a pre-and post-training test of 65 junior normal students in a college of education in Zhejiang Province, and Test Experiment 2 reported the results of 593 pupils in a district of Hangzhou.

4.1. Test Experiment 1

The first experiment recruited junior normal students from a university in Zhejiang Province as subjects. There were 65 valid participants in total, including 13 male and 56 female students, with an average age of about 21 years. The pre-test and post-test scores are shown in Figure 3.

The abscissa of Figure 3 represents the index of each experimental subject and the ordinate represents the test score. The bars show the test scores of the subjects before training, which objectively reflect their baseline level of stroke standardization. The black line shows the test scores of the subjects after training. The following basic conclusions can be drawn: 37 of the 65 learners (57%) improved their scores to varying degrees; 19 learners' scores remained the same, 17 of whom scored perfectly on both the pre- and post-tests; and the remaining 9 learners' scores did not improve.

The paired-samples t-test (Table 5) yielded a correlation coefficient of 0.655 (p < 0.001) between the learners' pre- and post-training scores. The students' scores were 85.46 ± 14.60 on the pre-test and 91.92 ± 9.66 on the post-test, so the mean score improved by 6.46 points, an increase of 7.56%. This indicates that the personalized stroke correction algorithm is effective in improving the stroke regulation training of normal students (t = −4.718, p < 0.001; the difference is statistically significant).

4.2. Test Experiment 2

Experiment 2 involved students from several elementary schools in a district of Hangzhou and obtained data from 593 valid participants: 120 in grade 1, 98 in grade 2, 94 in grade 3, 122 in grade 4, 138 in grade 5, and 21 in grade 6, as shown in Figure 4.

The grid portion of Figure 4 represents the pre-test scores before training, reflecting the initial level of the learners. The black slash portion represents the post-test scores after training. The average scores of all grades improved by more than 10 points after the training. Table 6 counts the number of learners whose scores changed, by grade. In total, 460 of the 593 learners who participated in the test improved their scores, accounting for 77.6%; 40 learners' scores were unchanged, 6 of whom had perfect scores on both the pre-test and post-test; and the remaining 93 learners' scores did not improve.

Further analysis by paired-samples t-test (Table 7) yielded a correlation coefficient of 0.524 (p < 0.001) between student scores before and after training. The students' scores were 66.47 ± 17.42 on the pre-test and 79.84 ± 15.46 on the post-test, so the mean score improved by 13.37 points, an increase of 20.11%. This indicates that the personalized stroke correction algorithm is effective in improving the stroke regulation training of elementary school pupils (t = −20.172, p < 0.001; the difference is statistically significant).

In summary, both Test Experiment 1 and Test Experiment 2 show that our method achieves personalized correction of Chinese characters' stroke order and is effective in teaching standard stroke order to different groups.
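As a consistency check on the statistics reported for the two experiments, the paired t statistic can be approximated from the summary values alone, using the identity that the variance of paired differences equals s_pre² + s_post² − 2·r·s_pre·s_post. The Python sketch below is an illustration under that identity, not the authors' analysis script; the function name is hypothetical, and small deviations from the reported t values are expected because the inputs are rounded.

```python
import math

def paired_t_from_summary(mean_pre, sd_pre, mean_post, sd_post, r, n):
    """Paired t statistic (pre minus post) from means, SDs, correlation and sample size."""
    var_diff = sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post
    standard_error = math.sqrt(var_diff / n)
    return (mean_pre - mean_post) / standard_error

# Summary statistics reported for Test Experiment 1 and Test Experiment 2.
t1 = paired_t_from_summary(85.46, 14.60, 91.92, 9.66, 0.655, 65)
t2 = paired_t_from_summary(66.47, 17.42, 79.84, 15.46, 0.524, 593)

# Relative improvement of the mean score, in percent.
gain1 = (91.92 - 85.46) / 85.46 * 100
gain2 = (79.84 - 66.47) / 66.47 * 100

print(round(t1, 2), round(t2, 2), round(gain1, 2), round(gain2, 2))
```

The recomputed values agree with the reported t = −4.718 and t = −20.172 to within rounding, and with the reported relative gains of 7.56% and 20.11%.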

5. Conclusions

As an educational sustainability case study, a personalized Chinese stroke order correction algorithm was developed to correct irregular writing habits. In this algorithm, the Apriori algorithm improved with the lift measure was first used to construct the error-prone Chinese character set table, and the improved collaborative filtering algorithm was then used to develop a learner-based personalized recommendation model for error-prone Chinese characters. Empirical testing of the personalized stroke correction algorithm in two experiments showed that participants' performance improved significantly after training, and the overall results illustrate the effectiveness of the proposed algorithms. However, the association rules were not as strong as desired because the massive competition data are sparse, which deserves further in-depth investigation. In future studies, we will further optimize the error-prone Chinese character set table and introduce more learner information to improve the performance of the proposed algorithms. Because knowledge is correlated and groups of similar learners exist in every domain, the methods in this study can also be extended to relevance mining in other subjects and to the design of teaching strategies.

Author Contributions

Conceptualization, Q.L. and H.Q.; methodology, Q.L., C.Z. (Caifeng Zhang) and H.Q.; software, C.Z. (Caifeng Zhang) and Y.D.; validation, Q.L. and Y.D.; formal analysis, C.Z. (Caifeng Zhang) and Q.L.; investigation, C.Z. (Caifeng Zhang) and Q.L.; resources, Q.L., C.Z. (Caifeng Zhang) and H.Q.; data curation, Q.L., C.Z. (Caifeng Zhang) and Y.D.; writing—original draft preparation, C.Z. (Caifeng Zhang), Q.L. and X.Z.; writing—review and editing, C.Z. (Chu Zhang), Q.L. and X.Z.; supervision, H.Q. and C.Z. (Chu Zhang); project administration, H.Q., C.Z. (Chu Zhang) and M.L.; funding acquisition, Q.L. and H.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Chaomi S&T Company Cooperation Project (Grant number: HK16003) and the Scientific Research Fund of Zhejiang Provincial Education Department (Grant number: Y202248424).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

Wu, D.L.; Chen, Y. The Current Situation and Reflections on the Study of Strokes and Errors of Chinese Characters in the Last Decade. Pop. Lit. Arts 2020, 2020, 155–157.
Xia, Y.; Xie, R.B.; Wang, Z.L.; Ruan, S.F.; Wu, X.C. Developmental relationships among morpheme awareness, Chinese character recognition, and vocabulary knowledge in lower elementary Chinese children: A cross-lagged study. J. Psychol. 2022, 54, 905–916.
Li, N.; Verma, H.; Skevi, A.; Zufferey, G.; Blom, J.; Dillenbourg, P. Watching MOOCs together: Investigating co-located MOOC study groups. Distance Educ. 2014, 35, 217–233.
Boroujeni, M.S.; Dillenbourg, P. Discovery and temporal analysis of latent study patterns in MOOC interaction sequences. In Proceedings of the 8th International Conference on Learning Analytics and Knowledge, Virtual Event, 5–9 March 2018; pp. 206–215.
Troussas, C.; Chrysafiadi, K.; Virvou, M. An intelligent adaptive fuzzy-based inference system for computer-assisted language learning. Expert Syst. Appl. 2019, 127, 85–96.
Ding, J.H.; Liu, H.Z. Accurate recommendation of learning resources based on multi-dimensional correlation analysis in age of big data. E-educ. Res. 2018, 39, 53–59+66.
Zhang, L.; Lu, Z. Applications of association rule mining in Teaching Evaluation. In Proceedings of the 2018 3rd International Conference on Humanities Science, Management and Education Technology (HSMET 2018), Nanjing, China, 8–10 June 2018; pp. 331–334.
Zhang, Y.J.; Dong, Z.; Meng, X.W. Research on Personalized Advertising Recommendation Systems and Their Applications. Chin. J. Comput. 2021, 44, 531–563.
Chen, H.J. Design of Information Recommendation Book Management System based on Apriori Data Mining Algorithm. Mod. Electron. Tech. 2019, 42, 115–119+124.
Guo, P.; Cai, C. Data Mining and Analysis of Students’ Score Based on Clustering and Association Algorithm. Comput. Eng. Appl. 2019, 55, 169–179.
Wang, Y.Z.; Shen, Y.J.; Wang, L.J. The Causes Analysis of Traffic Accident Black Spots based on Improved Interest Measurement and Apriori Algorithm. J. Zhejiang Univ. (Sci. Ed.) 2021, 48, 349–355.
Harahap, M.; Husein, A.M.; Aisyah, S.; Lubis, F.R.; Wijaya, B.A. Mining association rule based on the diseases population for recommendation of medicine need. J. Phys. Conf. Ser. 2018, 1007, 012017.
Das, S.; Dutta, A.; Jalayer, M.; Bibeka, A.; Wu, L. Factors influencing the patterns of wrong-way driving crashes on freeway exit ramps and median crossovers: Exploration using ‘Eclat’ association rules to promote safety. Int. J. Transp. Sci. Technol. 2018, 7, 114–123.
Kosub, S. A note on the triangle inequality for the Jaccard distance. Pattern Recogn. Lett. 2018, 120, 36–38.
Li, Y.Y.; Deng, H.J. Collaborative filtering recommendation algorithm based on improved cosine similarity. Comput. Mod. 2020, 2020, 69–74.
Lang, Q.; Ma, J.; Qi, H.N. Studies on Teaching of Stroke Order and Re-summarizing of Stroke Order Norms. J. Huzhou Univ. 2020, 42, 102–107.
Xue, F.; He, X.; Wang, X.; Xu, J.; Liu, K.; Hong, R. Deep Item-based Collaborative Filtering for Top-N Recommendation. ACM T. Inform. Syst. 2019, 37, 1–25.
Bedi, P.; Gautam, A.; Sharma, C. Using Novelty Score of Unseen Items to Handle Popularity Bias in Recommender Systems. In Proceedings of the International Conference on Contemporary Computing and Informatics, Noida, India, 8–9 May 2015; pp. 934–939.

Figure 1. Personalized Chinese stroke order correction algorithm.

Figure 2. The effect of the number of nearest neighbors on the recommendation performance of Jaccard-UserCF-IIF. Subfigure (a) reports the influence of the number of neighbors on Precision, subfigure (b) on Recall, and subfigure (c) on Coverage.

Figure 3. Comparison of experimental test scores of junior normal students from a university in Zhejiang Province.

Figure 4. Comparison of mean test scores of participating elementary school grades in a city district.

Table 1. Partial association rules.

Order	Association Rule	Support	Confidence	Lift
1	之→Wrong number of strokes	0.06	0.99	2.21
2	山→Wrong stroke order	0.02	0.83	1.50
3	家→Wrong number of strokes	0.02	0.94	2.08
4	义→Wrong stroke order	0.03	0.97	1.75
5	为→Wrong stroke order	0.07	0.94	1.70
6	怀→情	0.01	0.43	2.10
7	存→好	0.01	0.42	3.28
8	龙→为	0.36	0.47	1.19
9	鹿→花	0.02	0.50	4.81
10	乘→老	0.01	0.41	4.31

Note: Some entries in the table are Chinese characters.
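To make the three columns of Table 1 concrete, the following sketch computes support, confidence, and lift for one rule under their standard definitions, with lift taken as confidence divided by the support of the consequent. The practice records are invented for illustration and the helper `rule_metrics` is hypothetical; neither comes from the study's data or code.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Hypothetical helper: each transaction is the set of items (characters or
    error types) recorded for one learner.
    """
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # contain antecedent
    n_c = sum(1 for t in transactions if c <= t)          # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # contain both
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("rule cannot be evaluated on these transactions")
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)   # > 1 means positive association
    return support, confidence, lift


# Invented practice records (assumption, not the paper's data).
toy_records = [
    {"怀", "情"},
    {"怀", "情", "好"},
    {"怀"},
    {"情"},
    {"好"},
]

support, confidence, lift = rule_metrics(toy_records, {"怀"}, {"情"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Under this definition, dividing the confidence by the lift in Table 1 recovers the support of the consequent: rules 1 and 3 both give roughly 0.45 for "Wrong number of strokes", and rules 2, 4, and 5 all give roughly 0.55 for wrong stroke order, so the tabulated values are internally consistent.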

Table 2. Stroke order error-prone type character set table.

Associated Character Set	Rule	Explanation	Label
军, 挥, 辉, 连, 轰…	The “车” component ends with the vertical stroke	When “车” is not a side component, its last stroke is the vertical	军
怕, 忙, 快, 怜, 怪…	In the “忄” component, write the two dots on the left and right first	For “忄”, write the two dots first and then the vertical, in line with the writing method	惊
刀, 刃, 分, 初, 剪…	The “刀” component ends with the left-falling stroke	The last stroke of “万”, “刀”, “力”, and “乃” is the left-falling stroke	刀
灯, 灾, 灼, 灵, 烂…	In the “火” component, write the two points on the right and left first	The “人” part is written together; the dot and the left-falling stroke sit above “人”, so write them first and then write “人”	火
义, 仪, 斗, 门, 闪…	Any part at the top or top left should be written first	Follow the most basic structural rule of writing from top to bottom	义
…	…	…

Note: Some entries in the table are Chinese characters.

Table 3. Part of learners’ practice records.

Learner No.	Characters with Wrong Stroke Order	Characters with Wrong Number of Strokes
1	搜, 军, 龙, 丹, 拢	连, 初, 莲, 扔, 字
2	浑, 挥, 辆, 防, 房	迈, 转, 轮, 初, 莲
3	奶, 圾, 船, 母, 丑	区, 医, 能, 北, 笼
4	连, 迈, 莲, 房, 剪	轰, 软, 防, 浑, 连
5	丑, 再, 垂, 每, 悔	极, 扔, 里, 重, 秀

Note: Some entries in the table are Chinese characters.
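Records such as those in Table 3 are the input to the learner-based collaborative filtering step. The sketch below shows plain Jaccard similarity between learners and a simple neighbor-weighted recommendation. It is an illustration under stated assumptions, not the paper's Jaccard-UserCF-IIF model: the two error columns are merged into one set per learner, no inverse item frequency weighting is applied, and the names `jaccard` and `recommend` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets of characters."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, records, k=2, top_n=3):
    """Rank characters the k most similar learners got wrong but the target did not.

    Illustrative scoring only: each candidate character accumulates the
    similarity of every neighbor who made an error on it.
    """
    neighbors = sorted(
        ((jaccard(records[target], errs), other)
         for other, errs in records.items() if other != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbors:
        for ch in records[other] - records[target]:
            scores[ch] = scores.get(ch, 0.0) + sim
    return sorted(scores, key=lambda ch: (-scores[ch], ch))[:top_n]


# Table 3 records, with both error columns merged per learner (assumption).
records = {
    1: set("搜军龙丹拢连初莲扔字"),
    2: set("浑挥辆防房迈转轮初莲"),
    3: set("奶圾船母丑区医能北笼"),
    4: set("连迈莲房剪轰软防浑"),
    5: set("丑再垂每悔极扔里重秀"),
}

print(f"sim(2, 4) = {jaccard(records[2], records[4]):.3f}")
print("recommended for learner 2:", recommend(2, records))
```

In this small sample, learners 2 and 4 share five characters out of fourteen distinct ones, so learner 4 is the nearest neighbor of learner 2 and contributes most to the ranking.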

Table 4. Average recommendation performance of different models for error-prone Chinese characters.

Model	Precision	Recall	Coverage
cos-UserCF	0.4611	0.5338	0.7902
Jaccard-UserCF	0.4569	0.521	0.7894
cos-MostPopularCF	0.2759	0.31	0.2304
Jaccard-MostPopularCF	0.2736	0.3065	0.2206
cos-RandomCF	0.0692	0.0793	1
Jaccard-RandomCF	0.0747	0.0835	1
cos-UserCF-IIF	0.4596	0.5166	0.8259
Jaccard-UserCF-IIF	0.4709	0.5296	0.8061
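For readers who want to reproduce the kind of comparison in Table 4, the sketch below evaluates a recommender with Precision, Recall, and Coverage as they are commonly defined for top-N recommendation. These definitions are an assumption here and may differ in detail from the ones used in the study; the function `evaluate` and the toy data are hypothetical.

```python
def evaluate(recommended, relevant, all_items):
    """Precision, Recall and Coverage of top-N recommendations.

    recommended: dict learner -> set of recommended characters
    relevant:    dict learner -> set of characters the learner actually got wrong
    all_items:   set of all characters that could be recommended
    """
    hits = sum(len(recommended[u] & relevant.get(u, set())) for u in recommended)
    n_recommended = sum(len(items) for items in recommended.values())
    n_relevant = sum(len(relevant.get(u, set())) for u in recommended)
    covered = set().union(*recommended.values())  # characters recommended at least once

    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    coverage = len(covered) / len(all_items) if all_items else 0.0
    return precision, recall, coverage


# Toy data for illustration only (not from the study).
recommended = {"u1": {"军", "连"}, "u2": {"刀", "火"}}
relevant = {"u1": {"军", "辉", "挥"}, "u2": {"火"}}
all_items = {"军", "连", "刀", "火", "辉", "挥", "义", "仪"}

precision, recall, coverage = evaluate(recommended, relevant, all_items)
print(f"precision={precision:.2f} recall={recall:.2f} coverage={coverage:.2f}")
```

Under these definitions a random recommender can reach a Coverage of 1 while its Precision and Recall stay low, which is the pattern the RandomCF rows of Table 4 show.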

Table 5. Paired sample statistics and test (Dataset 1).

						Difference 95% Confidence Interval
		Average	Number of Cases	Standard Deviation	Standard Error Mean	Lower Limit	Upper Limit	Degree of Freedom	Sig. (2-Tailed)
Pair 1	First test	85.46153846	65	14.59567244	1.810370537
	Effect test	91.91794872	65	9.655530721	1.197621190
	First test − Effect test	−6.45641026		11.03302637	1.368478498	−9.19026033	−3.72256018	64	<0.001

Note: Correlation coefficient = 0.655, p < 0.001, t = −4.718 (t-Test, Dataset 1).

Table 6. The number of learners whose scores changed by grade.

Grade	Rising	No change	Decline	Sum
1	93	10	17	120
2	70	11	17	98
3	74	5	15	94
4	100	4	18	122
5	113	7	18	138
6	10	3	8	21
Sum	460	40	93	593

Table 7. Paired sample statistics and test (Dataset 2).

						Difference 95% Confidence Interval
		Average	Number of Cases	Standard Deviation	Standard Error Mean	Lower Limit	Upper Limit	Degree of Freedom	Sig. (2-Tailed)
Pair 1	Pre-test	66.47	593	17.419	0.715
	Post-test	79.84	593	15.461	0.635
	Pre-test − Post-test	−13.363		16.131	0.662	−14.664	−12.062	592	<0.001

Note: Correlation coefficient = 0.524, p < 0.001, t = −20.172 (t-Test, Dataset 2).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lang, Q.; Zhang, C.; Qi, H.; Du, Y.; Zhu, X.; Zhang, C.; Li, M. Mining and Utilizing Knowledge Correlation and Learners’ Similarity Can Greatly Improve Learning Efficiency and Effect: A Case Study on Chinese Writing Stroke Correction. Sustainability 2023, 15, 2393. https://doi.org/10.3390/su15032393