4.4. Comparative Experiment
4.4.1. Baseline Models
To prove the effectiveness of the model, comparative experiments were conducted to compare the proposed model with baseline models and with classic entity recognition neural network models proposed in recent years. The comparison results are shown in Table 3 and Table 4. Bold data indicates the highest value of the corresponding indicator in each table.
BERT-CRF encodes the semantic features of the text with the BERT model and applies a CRF layer for sequence decoding to obtain the predictions. BERT-GlobalPointer applies rotation position encoding (RoPE) to the output features of the BERT model to add position information and extracts entities by computing scores for candidate entity boundaries (start-end pairs) in a span score matrix. AT-CBGP [34] improves the robustness and generalization of the model by applying adversarial training to the GlobalPointer module. LLPA [35] adds relative position encoding to the bidirectional Lattice-LSTM module. MFT [36] incorporates Chinese word-root (radical) information and improves the transformer structure. BERT + para-lattice + CRF [37] adds Chinese character vectors as features to enhance performance.
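To make the GlobalPointer scoring scheme concrete, a minimal PyTorch sketch is given below (RoPE omitted for brevity); the layer names and sizes (hidden_size, head_size) are illustrative assumptions rather than the configuration used in the compared models.

```python
import torch
import torch.nn as nn

class GlobalPointerSketch(nn.Module):
    """Minimal sketch of GlobalPointer span scoring (RoPE omitted for brevity)."""
    def __init__(self, hidden_size: int, num_types: int, head_size: int = 64):
        super().__init__()
        # One start projection and one end projection per entity type.
        self.proj = nn.Linear(hidden_size, num_types * head_size * 2)
        self.num_types = num_types
        self.head_size = head_size

    def forward(self, token_repr: torch.Tensor) -> torch.Tensor:
        # token_repr: (batch, seq_len, hidden_size), e.g. the BERT output.
        b, n, _ = token_repr.shape
        qk = self.proj(token_repr).view(b, n, self.num_types, 2, self.head_size)
        q, k = qk[..., 0, :], qk[..., 1, :]          # start / end representations
        # Score every (start, end) pair for every entity type.
        scores = torch.einsum("bmth,bnth->btmn", q, k) / self.head_size ** 0.5
        # Mask the lower triangle so only spans with start <= end are kept.
        mask = torch.tril(torch.ones(n, n, device=scores.device), diagonal=-1).bool()
        return scores.masked_fill(mask, float("-inf"))  # (batch, types, seq, seq)
```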
The effectiveness of the proposed model is further demonstrated on the CLUENER2020 dataset by comparing its performance with that of classic models. The following models are classic entity recognition network structures that were re-implemented in this paper. BERT-CRF extracts the semantic information of the text with the BERT layer and learns the constraints between entity tags with a CRF layer. On the basis of BERT-CRF, three more complex entity recognition models are obtained by adding an IDCNN module, a BiLSTM module, or an IDCNN-BiLSTM module as the feature extraction layer, giving four classic entity recognition models in total.
The results for AT-CBGP, LLPA, MFT, and BERT + para-lattice + CRF are taken from the corresponding papers; the results for the remaining six baseline models were obtained by re-implementing those models. The Weibo dataset is divided into training, validation, and test sets. On the Weibo dataset, the model is evaluated on the validation set after each training epoch; after 50 epochs, the parameters with the best validation performance are selected, the model with these parameters is evaluated on the test set, and the test-set results are reported as the final results. The CLUENER2020 dataset is divided into training and validation sets. On CLUENER2020, the model is evaluated on the validation set after each training epoch, and after 20 epochs, the best validation performance is taken as the final result.
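The checkpoint-selection procedure described above can be summarized with the following minimal PyTorch sketch; the helpers train_one_epoch and evaluate_f1 are hypothetical placeholders for the training and evaluation routines, not functions from this paper's code.

```python
import copy

def train_with_best_checkpoint(model, train_loader, val_loader, optimizer, epochs):
    """Train for a fixed number of epochs and keep the parameters with the
    highest validation F1, as described above."""
    best_f1, best_state = -1.0, None
    for _ in range(epochs):
        train_one_epoch(model, train_loader, optimizer)   # hypothetical helper
        f1 = evaluate_f1(model, val_loader)                # hypothetical helper
        if f1 > best_f1:
            best_f1, best_state = f1, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)                      # restore the best checkpoint
    return model, best_f1

# Weibo: epochs = 50, then the best checkpoint is evaluated once on the test set.
# CLUENER2020: epochs = 20, and the best validation F1 is reported directly.
```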
4.4.2. Comparative Experiment
Table 3 and Table 4 show the comparative experimental results of the proposed model on the CLUENER2020 dataset and the Weibo dataset, respectively. The analysis of the comparative experiments is as follows.
According to the comparison between BERT-ATT-BILTAR-GlobalPointer and BERT-GlobalPointer, the F1 value of the model increased by 1.151% on the CLUENER2020 dataset and by 6.107% on the Weibo dataset. BERT-ATT-BILTAR-GlobalPointer adds the BILTAR module and the attention mechanism to BERT-GlobalPointer. The improvement shows that the BILTAR module can extract deeper semantic information and that the attention mechanism can highlight important information, thereby improving the performance of the model.
In the comparative experiments on the Weibo dataset, the F1 value of the proposed model is 13.89% higher than that of the LLPA model, 11.37% higher than that of the MFT model, 5.18% higher than that of BERT + para-lattice + CRF, and 4.56% higher than that of AT-CBGP. These results show that combining the GlobalPointer module with the deep semantic feature extraction layer BILTAR achieves good model performance.
Comparative experiments on the two datasets show that the entity boundary score calculation of GlobalPointer is significantly better than the sequence decoding of CRF, and that the BILTAR feature extraction layer is significantly better than the BiLSTM and BiLSTM-IDCNN modules. The BILTAR module is also superior to the IDCNN module in terms of both parameter count and performance. Processing the forward and backward outputs of the BiLSTM module separately extracts more sequence information, adding attention mechanisms at multiple positions denoises the features, and adding the TIME module to the BiLSTM output extracts more semantic information for each word.
4.5. Ablation Experiments
The ablation study covers six modules: the attention mechanism module ATT, the positive–negative direction module PN, the time-step module TIME, the rotation position encoding RoPE, the entity boundary score calculation of GlobalPointer, and the overall BILTAR module. The experiments follow a controlled-variable design: to prove the effectiveness of the six modules, this study used the same data processing, the same operating environment, and the same training parameter settings, so the only difference between the models lies in their structural composition, and the shared components are identical. Models trained on the same dataset use the same hyperparameters. A total of 18 module combinations were tested in the ablation experiments.
Table 5 shows the ablation experiment results of the modules. Bold data indicates the highest value of the corresponding indicator in the table.
This study conducted five ablation experiments on the TIME module by changing the number of fully connected layers and found that the configuration with a two-layer fully connected encoder and a two-layer fully connected decoder performs best. The individual ablation experiments are introduced and analyzed below:
4.5.1. Ablation Experiments of Rotation Position Encoding RoPE
To prove the effectiveness of the rotation position encoding RoPE, this study removed the RoPE from the GlobalPointer module and conducted two comparative experiments.
In the comparison between BERT-ATT-BILTAR-GlobalPointer and BERT-ATT-BILTAR-GlobalPointer (without RoPE), the F1 value of the model decreased by 0.841% on the CLUENER2020 dataset, and the F1 value of the model decreased by 4.972% on the Weibo dataset.
In the comparison between BERT-GlobalPointer and BERT-GlobalPointer (without RoPE), the F1 value of the model decreased by 0.460% on the CLUENER2020 dataset, and the F1 value of the model decreased by 2.338% on the Weibo dataset.
These results show that the rotation position encoding RoPE can capture the position information of the sequence, thereby improving the predictive ability of the model. Adding rotation position encoding to the GlobalPointer module yields a particularly large improvement on the Weibo dataset.
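For reference, the following is a minimal sketch of the standard RoPE formulation, in which pairs of feature dimensions are rotated by a position-dependent angle; the even/odd pairing and the base of 10000 follow the common RoPE convention and are assumptions about this implementation.

```python
import torch

def apply_rope(x: torch.Tensor) -> torch.Tensor:
    """Rotate feature pairs by a position-dependent angle (standard RoPE).

    x: (batch, seq_len, dim), with dim even.
    """
    _, n, d = x.shape
    pos = torch.arange(n, dtype=torch.float32, device=x.device)              # positions 0..n-1
    freq = 10000 ** (-torch.arange(0, d, 2, dtype=torch.float32, device=x.device) / d)
    angle = pos[:, None] * freq[None, :]                                     # (n, d/2)
    sin, cos = angle.sin(), angle.cos()
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    # Because the rotation is position-dependent, the inner product between a rotated
    # start vector and a rotated end vector depends only on their relative offset,
    # which injects relative position information into the boundary scores.
    return out
```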
4.5.2. Ablation Experiments of Entity Boundary Score Calculation of GlobalPointer
To prove that the entity boundary score calculation method of GlobalPointer was better than the conditional random field (CRF), this study replaced the CRF module with the GlobalPointer module and conducted four comparative experiments.
In the comparison between BERT-GlobalPointer and BERT-CRF, the F1 value of the model increased by 3.597% on the CLUENER2020 dataset, and the F1 value of the model increased by 1.952% on the Weibo dataset.
In the comparison between BERT-ATT-BILTAR-GlobalPointer and BERT-ATT-BILTAR-CRF, the F1 value of the model increased by 3.965% on the CLUENER2020 dataset, and the F1 value of the model increased by 10.990% on the Weibo dataset.
In the comparison between BERT-BILTAR-GlobalPointer and BERT-BILTAR-CRF, the F1 value of the model increased by 3.390% on the CLUENER2020 dataset, and the F1 value of the model increased by 0.397% on the Weibo dataset.
In the comparison between BERT-ATT-GlobalPointer and BERT-ATT-CRF, the F1 value of the model increased by 6.862% on the CLUENER2020 dataset, and the F1 value of the model increased by 5.069% on the Weibo dataset.
These four experiments show that replacing the CRF layer with the GlobalPointer module significantly increases the F1 value across various module combinations. It is therefore concluded that the GlobalPointer module outperforms the CRF layer. The span-based GlobalPointer module also avoids the invalid label sequences that can be produced by CRF-based sequence tagging.
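To make the contrast concrete, the sketch below shows how entities can be read directly from a GlobalPointer score matrix; the positive-score threshold is the usual GlobalPointer convention and is assumed here rather than quoted from this paper's code.

```python
import torch

def decode_spans(scores, id2type):
    """Read entities from a (num_types, seq_len, seq_len) span score matrix.

    Every cell (t, i, j) with i <= j and a positive score is emitted directly
    as an entity of type t spanning tokens i..j. No label-transition rules are
    involved, so invalid tag sequences (a risk with CRF decoding) cannot occur.
    """
    entities = []
    for t, i, j in torch.nonzero(scores > 0):
        if i <= j:  # keep only upper-triangular (valid) spans
            entities.append((id2type[int(t)], int(i), int(j)))
    return entities
```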
4.5.3. Ablation Experiments of Attention Mechanism
To demonstrate the effectiveness of the attention mechanism, this study conducted five comparative experiments by adding attention mechanisms to multiple locations in the model.
In the comparison between BERT-ATT-BILTAR-GlobalPointer and BERT-BILTAR-GlobalPointer, the F1 value of the model increased by 0.944% on the CLUENER2020 dataset and by 10.732% on the Weibo dataset.
In the comparison between BERT-ATT-GlobalPointer and BERT-GlobalPointer, the F1 value of the model did not improve on the CLUENER2020 dataset but increased by 1.471% on the Weibo dataset.
In the comparison between BERT-ATT-BiLSTM-ATT-GlobalPointer and BERT-BiLSTM-GlobalPointer, the F1 value of the model increased by 1.792% on the CLUENER2020 dataset but did not improve on the Weibo dataset.
In the comparison between BERT-ATT-BiLSTM-PNATT-GlobalPointer and BERT-BiLSTM-PN-GlobalPointer, the F1 value of the model increased by 1.257% on the CLUENER2020 dataset and by 2.438% on the Weibo dataset.
In the comparison between BERT-ATT-BILTAR-GlobalPointer and BERT-BiLSTM-PNTIMER-GlobalPointer, the F1 value of the model increased by 2.451% on the CLUENER2020 dataset, and the F1 value of the model increased by 4.445% on the Weibo dataset.
The experiments show that adding the attention mechanism to most module combinations can improve the F1 value of the models. It can be concluded that processing features with an attention mechanism in most positions can achieve a denoising effect and improve the performance of the model.
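The exact attention variant used in the ATT modules is not restated here; as an illustrative assumption, the sketch below uses a plain scaled dot-product self-attention over token features, one common way to re-weight each position using the whole sequence and thereby suppress noisy features.

```python
import torch
import torch.nn as nn

class TokenSelfAttention(nn.Module):
    """Scaled dot-product self-attention over a sequence of token features."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        # Each output position is a weighted mixture of all positions; features
        # receiving low attention weight contribute little, giving a denoising effect.
        return attn @ v
```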
4.5.4. Ablation Experiments of the Time-Step Function
In the comparison between BERT-ATT-BILTAR-GlobalPointer and BERT-ATT-BiLSTM-PNATT-GlobalPointer, the time-step module TIME is added after the PN module. The F1 value of the model increased by 1.016% on the CLUENER2020 dataset and by 4.300% on the Weibo dataset, indicating that the time-step module TIME can mine deep semantic information of words and improve the performance of the model. It should be noted that the model generally does not converge if the output of the time-step function is not combined through a residual connection with the output of the attention module or of the BERT layer.
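A minimal sketch of such a time-step module is given below, using the configuration reported as best (two fully connected layers in the encoder and two in the decoder) and the residual connection noted above; the bottleneck width and the ReLU activations are assumptions rather than values taken from the paper.

```python
import torch
import torch.nn as nn

class TimeStepModule(nn.Module):
    """Per-token encoder-decoder (2 + 2 fully connected layers) with a residual add."""
    def __init__(self, dim: int, bottleneck: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(                    # dimensionality reduction
            nn.Linear(dim, dim // 2), nn.ReLU(),
            nn.Linear(dim // 2, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                    # dimensionality expansion
            nn.Linear(bottleneck, dim // 2), nn.ReLU(),
            nn.Linear(dim // 2, dim),
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # x: features entering the TIME module; skip: e.g. the attention or BERT output.
        # The residual addition is essential: without it, convergence is reported to fail.
        return self.decoder(self.encoder(x)) + skip
```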
4.5.5. Ablation Experiments of the Positive–Negative Module PN
To demonstrate the effectiveness of the positive–negative module PN, this study conducted four comparative experiments by adding PN behind the BiLSTM module.
In the comparison between BERT-BILTAR-GlobalPointer and BERT-BiLSTM-ATT-TIMER-GlobalPointer, the F1 value of the model increased by 0.114% on the CLUENER2020 dataset but did not improve on the Weibo dataset.
In the comparison between BERT-BiLSTM-PN-GlobalPointer and BERT-BiLSTM-GlobalPointer, the F1 value of the model increased by 0.145% on the CLUENER2020 dataset but did not improve on the Weibo dataset.
In the comparison between BERT-BiLSTM-PNTIMER-GlobalPointer and BERT-BiLSTM-TIMER-GlobalPointer, the F1 value of the model increased by 0.063% on the CLUENER2020 dataset and by 3.954% on the Weibo dataset.
In the comparison between BERT-ATT-BILTAR-GlobalPointer and BERT-ATT-BiLSTM-ATT-TIMER-GlobalPointer, the F1 value of the model increased by 0.106% on the CLUENER2020 dataset and by 3.812% on the Weibo dataset.
This study conducted four comparative experiments on the PN module and found that adding the PN module to most module combinations can improve the F1 value of the models. It can be concluded that separately processing the forward and backward outputs of the BILSTM module can extract more sequence information.
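As a sketch of this idea, the BiLSTM output of width 2 × hidden can be split into its forward and backward halves and each half processed by its own branch before fusion; the per-branch linear layers below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PNSplit(nn.Module):
    """Process the forward and backward halves of a BiLSTM output separately."""
    def __init__(self, hidden: int):
        super().__init__()
        self.fwd_branch = nn.Linear(hidden, hidden)   # forward-direction branch
        self.bwd_branch = nn.Linear(hidden, hidden)   # backward-direction branch

    def forward(self, bilstm_out: torch.Tensor):
        # bilstm_out: (batch, seq_len, 2 * hidden) from nn.LSTM(..., bidirectional=True)
        fwd, bwd = bilstm_out.chunk(2, dim=-1)        # split the two directions
        return self.fwd_branch(fwd), self.bwd_branch(bwd)
```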
4.5.6. Ablation Experiments of the BILTAR Module
To demonstrate the effectiveness of the BILTAR module, this study conducted four comparative experiments by adding the BILTAR module to the experimental models.
In the comparison between BERT-ATT-BILTAR-GlobalPointer and BERT-ATT-GlobalPointer, the F1 value of the model increased by 1.987% on the CLUENER2020 dataset and by 4.635% on the Weibo dataset.
In the comparison between BERT-BILTAR-GlobalPointer and BERT-GlobalPointer, the F1 value of the model increased by 0.148% on the CLUENER2020 dataset but did not improve on the Weibo dataset.
In the comparison between BERT-BILTAR-CRF and BERT-CRF, the F1 value of the model increased by 0.678% on the CLUENER2020 dataset but did not improve on the Weibo dataset.
In the comparison between BERT-ATT-BILTAR-CRF and BERT-ATT-CRF, the F1 value of the model increased by 0.530% on the CLUENER2020 dataset but did not improve on the Weibo dataset.
Adding the BILTAR module to the combinations that use CRF did not significantly affect performance on the Weibo dataset, whereas adding it to the BERT-ATT-GlobalPointer model improved the F1 value. This indicates that the BILTAR module can extract deeper semantic information from text sequences and that, under the same conditions, it pairs better with the GlobalPointer module than with the CRF layer.
These four module-combination experiments show that the BILTAR module can extract the deep features of the text sequence. Within BILTAR, the BiLSTM extracts the connections between words, the PN module processes the forward and backward information of the sequence separately, the attention mechanism denoises the features, the encoder of the TIME module extracts deep semantic features for each word, and the residual summation efficiently fuses the forward and backward information; the fused representation carries the deep semantic information of the sequence. The experimental results confirm that each of these submodules contributes to the improvement of the model.
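Combining the sketches from the preceding subsections, one plausible, self-contained composition of these submodules is shown below; the ordering inside each branch, the shared weights between the two branches, and the single-head attention are assumptions made for brevity rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

class BILTARSketch(nn.Module):
    """Compact, assumed composition of the BILTAR submodules described above."""
    def __init__(self, dim: int, hidden: int, bottleneck: int = 128):
        super().__init__()
        self.bilstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.time = nn.Sequential(                       # two-layer encoder + two-layer decoder
            nn.Linear(hidden, bottleneck), nn.ReLU(),
            nn.Linear(bottleneck, bottleneck), nn.ReLU(),
            nn.Linear(bottleneck, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def _branch(self, h: torch.Tensor) -> torch.Tensor:
        a, _ = self.attn(h, h, h)          # attention denoises the branch features
        return self.time(a) + h            # TIME deepens them; residual add aids convergence

    def forward(self, bert_out: torch.Tensor) -> torch.Tensor:
        seq, _ = self.bilstm(bert_out)     # BiLSTM captures word-to-word connections
        fwd, bwd = seq.chunk(2, dim=-1)    # PN: split the forward and backward directions
        return self._branch(fwd) + self._branch(bwd)   # residual summation fusion
```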
4.5.7. Ablation Experiment on the Number of Fully Connected Layers in the TIME Module
Five groups of ablation experiments were performed by varying the number of fully connected layers in the TIME module and comparing the results. The model performs best when the encoder has two fully connected layers and the decoder also has two fully connected layers, which suggests that the TIME module extracts deeper semantic information through feature dimensionality reduction followed by expansion. In Table 6, (A, B) indicates that the encoder of the module has A fully connected layers and the decoder has B fully connected layers, and (A) indicates that the module consists of A fully connected layers. The experimental results are shown in Table 6. Bold data indicates the highest value of the corresponding indicator in the table.
4.5.8. Ablation Experiment of Information Fusion Method of Summation and Concatenation
For the fusion of forward and backward information, two sets of ablation experiments were performed using summation and concatenation, respectively. The results show that the residual summation method is superior to the concatenation method: the F1 value of the model increased by 0.531% on the CLUENER2020 dataset and by 3.582% on the Weibo dataset. It is concluded that summation retains more effective information than concatenation when fusing the forward and backward information. The experimental results are shown in Table 7. Bold data indicates the highest value of the corresponding indicator in the table.
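For clarity, the two fusion options compared here are sketched below for a pair of forward/backward feature tensors; the projection that maps the concatenated features back to the original width is an assumption needed to keep the downstream layer sizes unchanged.

```python
import torch
import torch.nn as nn

def fuse_by_summation(fwd: torch.Tensor, bwd: torch.Tensor) -> torch.Tensor:
    # Residual-style summation: the output keeps the original feature width.
    return fwd + bwd

class FuseByConcatenation(nn.Module):
    """Concatenation doubles the width, so a projection restores it (assumed here)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, fwd: torch.Tensor, bwd: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([fwd, bwd], dim=-1))
```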
4.5.9. Summary and Analysis of Ablation Experiment
In summary, the performance of the model improves when its submodules are added to the various combined models, and the eighteen sets of ablation experiments demonstrate the validity of each submodule. The CLUENER2020 dataset contains ten categories, each representing a domain. The Weibo dataset contains four categories, and each category is annotated with two mention types: named mentions (NAM) and nominal (generic) mentions (NOM). CLUENER2020 therefore emphasizes distinguishing among many categories, whereas the Weibo dataset places a lower demand on category discrimination but requires the model to recognize whether an entity is mentioned by name or nominally. Because the difficulties of the two datasets differ, the same ablation produces different degrees of improvement on the two datasets.