1. Introduction
With the rapid development of artificial intelligence (AI), research on intelligent tutoring algorithms (ITAs) has also made great progress [1,2,3]. The most important step in developing an ITA for mechanics problems is problem understanding [4]. The purpose of problem understanding is to convert input problems into structured given conditions through natural language processing (NLP). As mechanics problems usually involve a large number of complex relations (nested and overlapping relations), existing methods utilize named entity recognition (NER) to accomplish the problem understanding task [4,5].
NER is the basis for developing NLP applications [6], such as information extraction (IE) [7,8,9], knowledge extraction (KE) [10,11], question answering (QA) [12,13], and intelligent tutoring systems (ITSs) [3,4]. The main task of NER is to extract entities from unstructured text and classify them into predefined categories such as name, organization, location, and string value. Recognizing named entities in mechanics problems yields the basic information and given conditions of the input problem, which support automatically finding the answer. Therefore, accurately extracting named entities from mechanics problems is of great significance.
From the perspective of application scenarios, our purpose was to identify the key entities of input mechanics problems. For instance, the green characters in Figure 1 are the key entities of the example. However, developing an efficient model for this task faces some challenges. In contrast to English, Chinese sentences have no explicit word boundaries [14]. Therefore, early Chinese named entity recognition (CNER) methods took only characters as input [15]. Since lexical boundaries usually coincide with entity boundaries, scholars have paid increasing attention to developing algorithms that utilize lexical information [16]. With continuous exploration, many innovative methods have been proposed, which can be divided into three categories. The first approach is the pipeline model, which performs Chinese word segmentation (CWS) first and then takes the segmented sequences as input. However, these methods ignore the influence of segmentation errors, which easily causes error propagation and reduces recognition performance. The second approach is the multi-task model. Inspired by multi-task learning, this type of method usually trains CWS and CNER jointly. However, these methods make it difficult to fully exploit the learning ability of the model, and their generalization ability is usually poor. The last approach is the character representation enhancing model, which enriches the character representation with lexical information before feeding it into the main model. Compared with the first two approaches, the third can obtain better performance; the current state-of-the-art CNER algorithm is based on it, and the model proposed in this article also belongs to this approach.
In addition to the word segmentation problem, there are three challenges in identifying entities from mechanics problems. Firstly, as shown in Figure 1, the target text contains many technical terms, and only some of them are target entities. For instance, “速度 (speed)”, “匀减速直线运动 (uniform deceleration linear motion)”, and “运动 (motion)” are all terms, but only “匀减速直线运动 (uniform deceleration linear motion)” is a target entity. Although large-scale language models, such as bidirectional encoder representations from transformers (BERT), can enhance the character representation, they tend to ignore the fact that directly learning character terminology information can bring further improvement. The boundary, semantic, and contextual information of lexicons is beneficial for distinguishing target entities accurately. For example, since the character “运 (transport)” is contained in both “运动 (motion)” and “匀减速直线运动 (uniform deceleration linear motion)”, we call both terms matched words of “运 (transport)” in this article. Learning the abovementioned information can help the first “运 (transport)” predict the tag “O” and help the second “运 (transport)” predict the tag “I-MOT” (an inside character of a motion entity) rather than “B-MOT” (the beginning character of a motion entity) or “O”.
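The following toy sketch illustrates the matched-word notion used above: for each character position in a sentence, it collects the lexicon terms covering that position. The three-term lexicon here is only an illustration built from the terms quoted above; a real system would use a full mechanics terminology dictionary.

```python
# Illustrative only: collect the "matched words" of each character,
# i.e., the lexicon terms in the sentence that contain that character.
LEXICON = {"速度", "运动", "匀减速直线运动"}
MAX_WORD_LEN = 7  # length of the longest lexicon entry

def matched_words(sentence: str):
    matches = {i: [] for i in range(len(sentence))}
    for start in range(len(sentence)):
        for end in range(start + 1, min(start + MAX_WORD_LEN, len(sentence)) + 1):
            word = sentence[start:end]
            if word in LEXICON:
                for i in range(start, end):  # every character the word covers
                    matches[i].append(word)
    return matches

sentence = "物体做匀减速直线运动"
for i, words in matched_words(sentence).items():
    print(sentence[i], words)
# The character "运" is covered by both "运动" and "匀减速直线运动",
# matching the example discussed in the text.
```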
Secondly, there are many composite entities composed of numbers and characters, such as “20 米每秒 (20 meters per second)” and “10 米每秒 (10 meters per second)”. In this situation, the input character needs to fully learn the contextual information of adjacent characters or terms to predict correct labels. For instance, the target entity “20 米每秒 (20 meters per second)” consists of the number “20” and the term “米每秒 (meters per second)”. By learning the nearest contextual character information and contextual lexical information, an “I-VEL” (an inside character of a velocity entity) tag can be predicted for “米 (meter)” rather than “B-VEL” (the beginning character of a velocity entity).
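As a concrete illustration of this character-level labeling (a constructed toy example using the tag scheme quoted in the text, not data from our corpus):

```python
# Illustrative only: character-level BIO tagging of the composite
# entity "20米每秒 (20 meters per second)". The tags B-VEL/I-VEL/O
# follow the scheme quoted in the text.
chars = list("速度为20米每秒")
gold  = ["O", "O", "O", "B-VEL", "I-VEL", "I-VEL", "I-VEL", "I-VEL"]

for ch, tag in zip(chars, gold):
    print(f"{ch}\t{tag}")
# "米" is tagged I-VEL (inside the velocity entity), not B-VEL,
# because the preceding number "20" already opened the entity.
```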
Thirdly, identifying entities from mechanics problems is a typical vertical-domain problem. Chinese characters are pictographs that evolved from pictures; thus, their structure can reflect relevant information about the character. In Chinese, characters are usually formed from radicals, called the glyph structure. As characters consisting of the same radicals usually have similar meanings, the glyph structure can further enrich the character representation. For instance, characters formed with “鸟 (bird)” usually carry meanings related to birds, such as “鸡 (chicken)”, “鸭 (duck)”, and “鹰 (eagle)”, while characters formed with “月 (moon)” usually carry meanings related to the body, such as “腰 (waist)”, “肾 (kidney)”, and “肩 (shoulder)”. It has been demonstrated that the cosine distance between Chinese characters with the same radical or a similar structure is smaller [17]. Therefore, radical information can enhance the semantic information of Chinese characters. Additionally, in the medical field, the glyph structure has been used as an important means of improving model performance. However, current research on mechanics tends to ignore this beneficial approach.
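As a toy illustration of how glyph information can be attached to characters, the sketch below maps each character to its radical and looks up a shared radical embedding. The radical table and embedding size are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch of enriching character representations with
# radical (glyph) information, using the examples quoted above.
import torch
import torch.nn as nn

RADICAL_OF = {"鸡": "鸟", "鸭": "鸟", "鹰": "鸟",   # bird-related characters
              "腰": "月", "肾": "月", "肩": "月"}   # body-related characters

radicals = sorted(set(RADICAL_OF.values()))
radical_ids = {r: i for i, r in enumerate(radicals)}
radical_emb = nn.Embedding(len(radicals), 16)  # 16-dim embedding (assumed)

def radical_vector(char: str) -> torch.Tensor:
    """Look up the radical of a character and return its embedding."""
    idx = radical_ids[RADICAL_OF[char]]
    return radical_emb(torch.tensor(idx))

# Characters sharing a radical ("鸡", "鸭") map to the same radical
# vector, so their enriched representations move closer together.
```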
From the perspective of implementation, recent research has demonstrated that the benefits external information and pre-trained language models bring to CNER cannot be ignored. Many scholars have noticed this and have combined the two to develop new methods. Nevertheless, two defects remain: the inability to use multivariate information and insufficient information learning. Firstly, most current works concatenate the external information with the output of the pre-trained language model, which leads to inadequate learning of the external information. In this article, we call this type of approach the model-level fusion method. Secondly, although a few methods enable the pre-trained language model to learn external information, they tend to use only external lexical information while ignoring radical information.
For Chinese, the main difference between vertical and generic domains is that vertical domains contain a large number of technical terms, along with the radicals of these terms. Therefore, we propose a novel CNER method for the mechanics domain based on these characteristics of Chinese. Firstly, to improve the overall performance of the proposed model, we use the pre-trained language model BERT. Secondly, we use Multi-Meta Information Embedding (MMIE) and fully integrate it into BERT, so that BERT can learn the multi-meta information while leveraging it. Specifically, we propose an information adapter layer (IAL), through which the multi-meta embedding is integrated directly into the underlying transformer of BERT. In this article, we call this approach the sub-model-level fusion method (an illustrative sketch follows the contribution list below). In summary, the specific contributions of this article are listed as follows:
A Multi-Meta Information Embedding method was proposed to enrich the representation of Chinese sentences.
A sub-model-level fusion method was proposed to fuse the external information embedding into the bottom transformer of BERT, which could promote the pre-trained model to learn the external information embedding.
An information adapter layer was proposed to fine-tune the model while leveraging the external information embedding.
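To make the sub-model-level fusion idea concrete, the following is a minimal sketch in PyTorch with the HuggingFace transformers library. The adapter design, the projection size, and the injection point shown here are our illustrative assumptions; the actual information adapter layer is specified in Section 3.

```python
# A minimal sketch of sub-model-level fusion. A precomputed multi-meta
# embedding `mmie` (lexical + radical features per character) is
# injected after the bottom transformer layer, so that all remaining
# layers can learn from it. Shapes and the adapter design are
# illustrative assumptions, not the paper's exact IAL.
import torch.nn as nn
from transformers import BertModel

class InformationAdapter(nn.Module):
    def __init__(self, hidden_size: int, mmie_size: int):
        super().__init__()
        self.proj = nn.Linear(mmie_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, hidden, mmie):
        # Project the external embedding and fuse it into the
        # character hidden states with a residual connection.
        return self.norm(hidden + self.proj(mmie))

bert = BertModel.from_pretrained("bert-base-chinese")
adapter = InformationAdapter(bert.config.hidden_size, mmie_size=128)

def encode(input_ids, attention_mask, mmie):
    # Embedding layer plus the bottom transformer layer.
    hidden = bert.embeddings(input_ids)
    mask = bert.get_extended_attention_mask(attention_mask, input_ids.shape)
    hidden = bert.encoder.layer[0](hidden, attention_mask=mask)[0]
    # Inject the multi-meta information, then run the top layers.
    hidden = adapter(hidden, mmie)
    for layer in bert.encoder.layer[1:]:
        hidden = layer(hidden, attention_mask=mask)[0]
    return hidden  # enhanced character representations for decoding
```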
The rest of this article is organized as follows. Section 2 introduces the related works. Section 3 introduces the proposed method, including the Multi-Meta Information Embedding, the information adapter layer, the encoding layer, and the decoding layer. Section 4 presents extensive experiments. Section 5 discusses three case studies. Finally, Section 6 concludes this article.
2. Related Works
Recent research has demonstrated that external information and a pre-trained language model can improve the performance of a CNER algorithm. These algorithms can be divided into three categories: external information-based algorithms, pre-trained language model-based algorithms, and hybrid algorithms.
External information-based algorithms aim to enhance character representation by adding external information (such as lexical information). Zhang et al. [18] were the first to develop this type of method. They proposed Lattice LSTM, which encodes and matches words in the input sentence. However, in Lattice LSTM each character can only obtain the lexical information ending with that character, and there is no continuous memory of previous lexical information. Moreover, Tang et al. [19] proposed the word-character graph convolution network (WCGCN), which enhances the lexical information using a graph convolution network (GCN). Additionally, Song et al. [20] proposed a multi-information-based algorithm that uses a bidirectional gated recurrent unit (Bi-GRU) with a conditional random field (CRF) to fuse character, lexical, and radical embeddings. Further research was reported in [21,22,23,24,25]. However, these methods do not leverage a pre-trained language model, which could significantly improve recognition performance.
Pre-trained language model-based algorithms aim to enhance character representation using a pre-trained model (such as BERT). Research [26,27] has shown that using the character features from BERT outperforms static embedding-based CNER algorithms. These methods usually use fine-tuning to connect the pre-trained language model with traditional CNER models, such as BERT-BiLSTM-CRF and BERT-BiGRU-CRF. A defect of these methods is that they do not consider the effect of external information on recognition performance, which is more pronounced in vertical fields.
Hybrid algorithms aim to integrate external information and a pre-trained language model [17,28,29,30]. Ma et al. [30] proposed a model-level fusion method named SoftLexicon. This algorithm takes the output of BERT as an enhanced character representation, directly concatenates it with the lexical embedding, and then inputs the result into a fusion layer for CNER. Additionally, Li et al. [28] proposed the flat-lattice transformer (FLAT) to integrate lexical information and character embedding; FLAT likewise uses model-level fusion when leveraging BERT. Although these methods utilize external information and a pre-trained language model simultaneously, they only integrate the two at the model level, which does not allow the pre-trained language model to fully learn the external information and easily leads to overfitting.
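For contrast with the sub-model-level method proposed later, the following is a minimal sketch of the model-level fusion pattern described above (our simplification, not the authors' code from [28,30]); the lexical embedding meets the BERT output only after every transformer layer has already run. All dimensions are illustrative assumptions.

```python
# A minimal sketch of model-level fusion: the lexical embedding is
# concatenated with the finished BERT output, so no transformer layer
# inside BERT ever sees the external information.
import torch
import torch.nn as nn

hidden_size, lex_size, num_tags = 768, 128, 13  # assumed dimensions
fusion = nn.Linear(hidden_size + lex_size, num_tags)

def model_level_fusion(bert_output: torch.Tensor, lex_emb: torch.Tensor):
    # bert_output: (batch, seq_len, hidden_size), from the top of BERT
    # lex_emb:     (batch, seq_len, lex_size), matched-lexicon features
    fused = torch.cat([bert_output, lex_emb], dim=-1)
    return fusion(fused)  # per-character scores for a CRF or softmax
```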
With the development of deep learning, CNER algorithms have become increasingly mature in the open domain. In vertical domains, however, the data sources are diverse and the data structures differ; compared with the open domain, vertical-domain entities are complex and difficult to identify. Guo et al. [31] proposed a CNER method combining a convolutional neural network (CNN) [32], BiLSTM, and an attention mechanism (AM) to recognize agricultural pest- and disease-related named entities. Chen et al. [33] proposed a hybrid neural network model based on MC-BERT, namely MC-BERT + BiLSTM + CNN + MHA + CRF, to recognize entities in Chinese electronic medical records. He et al. [34] proposed a CNER method that combines knowledge graph embedding with a self-attention mechanism to recognize entities in Chinese marine texts. Liu et al. [26] applied BERT-BiLSTM-CRF to the recognition of named entities in the field of history and culture. However, these works simply applied open-domain algorithms to vertical domains, and few have taken into account the characteristics of professional entities.
In summary, there are two challenges in developing an efficient CNER algorithm for the mechanics domain: how to use the domain information, and how to integrate the external information with the pre-trained model. Therefore, this article proposes a novel CNER algorithm for recognizing mechanics entities, namely Multi-Meta Information Embedding Enhanced BERT (MMIEE-BERT). In contrast to the above methods, MMIEE-BERT uses terminology from the mechanics domain and the corresponding radicals as the external information. Then, to fully integrate the external information with the pre-trained model, a sub-model-level fusion method is proposed that integrates the external information into the underlying transformer of BERT, so that the top transformers of BERT can fully learn this information.
5. Case Studies
To illustrate the performance of the proposed method, in this section we present three case studies on the tagged sequences generated by our model, LE-BERT, BERT-FLAT, and BERT-MECT. The three cases represent different categories of long-text data: the first is a mechanics problem that contains neither force entities nor power entities, the second contains force entities but no power entities, and the third contains power entities.
5.1. Case 1: An Example of Newton’s Law of Motion
The first case is a problem concerning Newton’s law of motion. The description of this problem is: “甲, 乙两汽车沿同一平直公路同向全速行驶中, 甲车在前, 乙车在后, 行驶速度均为 m/s. 当两车快要到十字路口时, 甲车司机看到红灯亮起, 于是紧急刹车, 乙车司机看到甲车刹车后也紧急刹车 (乙车司机的反应时间忽略不计). 已知甲车, 乙车紧急刹车时的加速度大小分别为 m/s², m/s², 求: (1) 乙车刹车后经多长时间速度与甲车相等; (2) 为保证两车在紧急刹车过程中不相撞, 甲乙两车刹车前的距离 x 至少是多少”. (“Cars A and B are driving at full speed in the same direction along the same straight road, with car A in front and car B behind, both traveling at m/s. As the two cars approach an intersection, the driver of car A sees a red light and brakes hard, and the driver of car B brakes hard after seeing car A brake (the reaction time of car B’s driver is negligible). Given that the magnitudes of the braking accelerations of cars A and B are m/s² and m/s², respectively, find: (1) how long after car B brakes its speed becomes equal to that of car A; (2) the minimum distance x between the two cars before braking that ensures they do not collide during the braking process.”)
Table 14 reports the recognition results of the four models for case 1. It is worth noting that the target entities in case 1 include two instances of “刹车 (brake)”, where the first corresponds to research object 1, “甲车 (car A)”, and the second corresponds to research object 2, “乙车 (car B)”. The three models other than MMIEE-BERT produced errors in recognizing these two entities. Additionally, the three baselines LE-BERT, BERT-MECT, and BERT-FLAT failed to identify “6 m/s²” as an acceleration entity.
5.2. Case 2: An Example of Force Analysis
The second case is a problem concerning force analysis. The description of this problem is: “据报载, 我国自行设计生产运行速度可达 m/s 的磁悬浮飞机. 假设“飞机”的总质量 , 沿水平轨道以 m/s² 的加速度从静止做匀加速起动至最大速度, 忽略一切阻力的影响 ( m/s²) 求: (1) “飞机”所需的动力 F; (2) “飞机”起动至最大速度所需的时间 t”. (“It is reported that our country has independently designed and produced a magnetic levitation aircraft whose operating speed can reach m/s. Assuming that the total mass of the “aircraft” is , and that it starts from rest and accelerates uniformly to its maximum speed along a horizontal track with an acceleration of m/s², ignoring all resistance ( m/s²), find: (1) the driving force F required by the “aircraft”; (2) the time t required for the “aircraft” to accelerate from rest to its maximum speed.”)
Table 15 reports the recognition results of the four models for case 2. It can be observed that LE-BERT produced deviations in recognizing “1 m/s²” and “匀加速起动 (uniform acceleration start)”. Specifically, the model failed to identify “1 m/s²” as an acceleration entity and failed to identify “匀加速起动 (uniform acceleration start)” as a state entity. BERT-MECT failed to identify “水平 (horizontal)” and “匀加速起动 (uniform acceleration start)”. Finally, BERT-FLAT incorrectly identified the category of “10 m/s²” and also failed to identify “匀加速起动 (uniform acceleration start)” as an entity.
5.3. Case 3: An Example of Power and Energy
The third case is a problem concerning power and energy. The description of this problem is: “哈尔滨第 24 届世界大学生冬运会某滑雪道为曲线轨道, 滑雪道长 m, 竖直高度 m. 运动员从该滑道顶端由静止开始滑下, 经 s 到达滑雪道底端时速度 m/s, 人和滑雪板的总质量 kg, 取 m/s², 求人和滑雪板 (1) 到达底端时的动能; (2) 在滑动过程中重力做功的功率; (3) 在滑动过程中克服阻力做的功”. (“A ski run at the 24th World University Winter Games in Harbin is a curved track m long with a vertical height of m. An athlete starts from rest at the top of the run and, after s, reaches the bottom at a speed of m/s. The total mass of the athlete and the skis is kg; take m/s². For the athlete and the skis, find: (1) the kinetic energy on reaching the bottom; (2) the power of the work done by gravity during the slide; (3) the work done against resistance during the slide.”)
Table 16 reports the recognition results of the four models for case 3. It can be seen that the three models other than MMIEE-BERT failed to recognize the entity “ m”, which we attribute to the complexity of the entity. Furthermore, LE-BERT correctly recognized all other entities in this case, while BERT-MECT and BERT-FLAT failed to recognize the entity “人和滑雪板 (the person and the skis)”. Finally, BERT-FLAT incorrectly recognized “40 m/s” as an acceleration.
Through the three case studies, it was found that for mechanics problems containing multiple research objects, the models usually displayed recognition bias regarding the object information of the target entities. In addition, some key entities that are not technical terms were also easily misrecognized, such as “平直 (flat)” and “上 (up)”. Finally, the models were also prone to mistakes in recognizing complex entities containing operators and in distinguishing the categories of velocity and acceleration entities.