In this section, we analyze the experimental results and model evaluation for the emotion research task, covering the selection and division of the datasets, the choice of hyperparameters, and the choice of evaluation metrics, and we judge the method’s performance based on these parameters. We compare the model used in our research with other deep learning models and set up an ablation experiment based on our method for a performance comparison.
5.3. Experimental Data Processing and Its Progress
To enhance the model’s performance, this research employed jieba word segmentation for the text and removed stop words and illegal characters after segmentation. During data reading, the dataset’s files were read and stored in three arrays, the labels were merged into one label array, and the text was tokenized using Tokenizer; that is, the text was converted into a sequence of numbers. The experiments were run on a Windows 10 operating system, using the CPU for training; the programming language was Python, the development tool was PyCharm, the deep learning framework was TensorFlow, and the hardware was an Intel(R) Core(TM) i5-5200U CPU with 12.0 GB of RAM.
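A minimal sketch of this preprocessing pipeline, with simple stand-ins for jieba’s segmentation output and Keras’s Tokenizer (the stop-word list and helper names here are illustrative assumptions, not the paper’s code):

```python
# Illustrative preprocessing sketch: stop-word removal and text-to-sequence
# conversion, mirroring what jieba + Keras's Tokenizer do in this pipeline.

STOP_WORDS = {"的", "了", "是"}  # tiny example stop-word list (assumption)

def remove_stop_words(tokens):
    """Drop stop words and empty tokens after word segmentation."""
    return [t for t in tokens if t and t not in STOP_WORDS]

def build_vocab(corpus):
    """Assign each distinct token an integer id (0 is reserved for padding)."""
    vocab = {}
    for tokens in corpus:
        for t in tokens:
            vocab.setdefault(t, len(vocab) + 1)
    return vocab

def texts_to_sequences(corpus, vocab):
    """Convert each token list into a sequence of integer ids."""
    return [[vocab[t] for t in tokens if t in vocab] for tokens in corpus]
```

In the actual toolchain, `jieba.lcut` would produce the token lists fed into `remove_stop_words`, and `Tokenizer.fit_on_texts` / `Tokenizer.texts_to_sequences` would perform the vocabulary and id-mapping steps.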
5.4. Determination of Experimental Parameters and Experimental Results
In this section, we select the hyperparameters: each hyperparameter is trained and tuned individually, a combination of model-related hyperparameters is chosen, and comparative training against other combinations determines the optimal hyperparameter setting for the model in this paper.
The number of iterations can significantly influence the model’s performance. Too few iterations leave the model unable to learn features adequately, yielding poor results; too many lead to overfitting, which does not make the model perform better. The outcomes of the tests are displayed in
Table 2 and
Figure 6. Once the number of iterations surpassed 20, the model’s results began to deteriorate, which can be attributed to overfitting. With fewer than 20 iterations, the model could not learn features sufficiently, and the results were suboptimal. Based on this analysis, 20 was chosen as the optimal number of iterations for the model in this paper.
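The procedure of watching the validation loss to find the point where extra iterations start to hurt can be sketched as follows (the `patience` parameter and function name are illustrative, not the paper’s implementation; frameworks provide this as an early-stopping callback):

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the lowest validation loss,
    stopping the scan once the loss has failed to improve for
    `patience` consecutive epochs (the overfitting onset)."""
    best, best_ep, wait = float("inf"), 0, 0
    for ep, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_ep, wait = loss, ep, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_ep
```

In TensorFlow, `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=...)` implements the same idea during training.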
The performance of the model was also affected by the text input length. If the chosen length is too long, many zero values must be padded in, which reduces the quality and accuracy of the representation. If it is too short, longer reviews must be truncated, losing some of the emotional characteristics of the text. Statistical analysis showed that the length of most sentences in our dataset was concentrated around 180 words, so we selected comparison parameters in an appropriate value range for the experiments. The outcomes of these experiments are displayed in
Table 3 and
Figure 7. The best results were achieved when the text input length was 210, roughly consistent with the statistics above. Therefore, this paper chose 210 as the text input length, which is close to the sentence length of most of the input data.
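The fixed input length works as described above: shorter sequences are zero-padded and longer ones are truncated. A minimal sketch of this behavior (a post-padding variant for clarity; Keras’s `pad_sequences` pads and truncates at the front by default):

```python
def pad_sequence(seq, maxlen=210, value=0):
    """Truncate sequences longer than maxlen and pad shorter ones
    with zeros at the end, so every input has the same length."""
    if len(seq) >= maxlen:
        return seq[:maxlen]
    return seq + [value] * (maxlen - len(seq))
```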
The model can overfit during training, and to avoid this, we introduced a dropout value in the research. Overfitting typically shows as a relatively small loss and high accuracy on the training data but a low prediction accuracy and relatively large loss on the test data. Introducing dropout reduces the complex co-adaptive relationships among neurons so that the model does not rely too heavily on particular local features, which enhances its generalization capacity and the performance of the neural network. As shown in
Table 4 and
Figure 8 below, overfitting is effectively avoided when dropout = 0.5, and the model performs best at this setting.
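Inverted dropout, the variant used by most frameworks, can be sketched in plain Python as follows (an illustration of the mechanism, not the TensorFlow implementation):

```python
import random

def dropout(activations, rate=0.5, training=True, seed=None):
    """Inverted dropout: zero each unit with probability `rate` during
    training and rescale the survivors by 1/(1-rate) so the expected
    activation is unchanged. At inference time the input passes through."""
    if not training or rate == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]
```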
The learning rate is an important hyperparameter in deep learning. It controls how much the model’s parameters are updated during training, determining the speed and step size of weight updates in each iteration. Our goal is to optimize the weight parameters by minimizing the loss function, and the learning rate controls the magnitude of each update. A larger learning rate can speed up convergence, but it may also cause oscillation around the best solution or prevent convergence to the optimal solution. If the learning rate is too small, the loss will show no change over an extended duration, the optimization process will be very slow, and the optimal solution may never be reached.
Using an appropriate learning rate is a key step in training. We dynamically adjusted the learning rate to balance convergence speed and model performance, for example through learning rate scheduling or adaptive learning rate algorithms. This paper dynamically adjusted the learning rate and compared the results and the trend of the loss function under various learning rates. Experiments and data analysis showed that the model in this paper works best with a learning rate of 0.0001. The experiment is shown in
Table 5 and
Figure 9.
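A simple step-decay schedule illustrates the kind of dynamic learning rate adjustment described above (the decay factor and interval are illustrative assumptions; only the initial rate of 0.0001 comes from the experiments):

```python
def step_decay(initial_lr=1e-4, drop=0.5, epochs_per_drop=10):
    """Return a schedule function that multiplies the learning rate
    by `drop` every `epochs_per_drop` epochs."""
    def schedule(epoch):
        return initial_lr * (drop ** (epoch // epochs_per_drop))
    return schedule
```

In TensorFlow, such a function can be attached to training via `tf.keras.callbacks.LearningRateScheduler`.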
The batch size refers to the number of text samples provided as input to the model in one iteration. If it is too large, the model may fall into a poor local optimum and fail to reach a better performance; although a large batch processes many samples at once and can learn more detailed features, it may cause the model to overfit the training data. In addition, a larger batch size requires more memory to store the input data and gradient information, and training may fail if memory is insufficient. Conversely, too small a batch size increases the training time, can make the training process unstable, and may lead to underfitting, worsening the model’s generalization capacity, because the model sees only a small amount of data at a time and struggles to learn global features. Therefore, selecting an appropriate batch size is crucial to attaining the desired research outcome. As shown in
Table 6 and
Figure 10, the experiments show that the model achieves its best effect with a batch size of 22.
The above experiments determine the optimal hyperparameter combination for the model in this paper, which is presented in the following
Table 7.
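For reference, the selected combination can be gathered into a single configuration (the values are taken from the experiments above; the dictionary itself is just an illustrative way to record them):

```python
# Hyperparameters selected by the tuning experiments above (Table 7).
HYPERPARAMS = {
    "epochs": 20,            # number of iterations (Table 2 / Figure 6)
    "maxlen": 210,           # text input length (Table 3 / Figure 7)
    "dropout": 0.5,          # dropout rate (Table 4 / Figure 8)
    "learning_rate": 1e-4,   # initial learning rate (Table 5 / Figure 9)
    "batch_size": 22,        # batch size (Table 6 / Figure 10)
}
```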
5.5. Model Comparison
To validate the efficacy of our model for text sentiment analysis, the following comparative experiments were designed. We selected seven representative deep learning models to conduct comparative experiments on the datasets used in this paper [15,34,35,36,37,38,39]. The comparison models are divided into three groups: the first consists of BERT-based models, the second of conventional deep learning methods, and the third of combination models related to the model in this paper.
By analyzing
Table 8, it is evident that the traditional deep learning models do not perform satisfactorily, while the model presented in our research has significant advantages in accuracy and F1-score. This is because this article introduces the BERT language model to convert text into dynamic word vectors and then combines MultiHeadAttention, CNN, and BiGRU.
Among these, BiGRU has lower model complexity and simpler parameters, which reduces cost. It extracts richer contextual information from both directions and makes full use of context to enhance the text’s semantic mining. Therefore, the model combining BiGRU and CNN outperforms the CNN model alone, with a 1.5% increase in accuracy and a 1.9% increase in the F1-score. In addition, incorporating the attention idea into a traditional deep learning model can further improve its performance, as it captures the text content related to emotional color. Our research used the MultiHeadAttention method, obtained by optimizing and improving the attention mechanism. It can obtain attention information at multiple levels and capture the correlations among data in different information spaces from different subspaces. By subjecting the text to multiple linear transformations, it acquires a comprehensive understanding of emotional information by learning the attention representation. After adding the BERT pretraining method, the emotion prediction performance improved substantially; this is because the traditional language models Word2vec and GloVe have not solved the problem that the word vectors obtained after training are still static. BERT, by contrast, is an advanced pretrained language model that dynamically generates word vectors according to the context, making word representations more accurate, providing stronger context fusion capabilities, and better supporting downstream tasks.
The experimental outcomes indicate that the classification accuracy of our proposed method surpassed that of all of the compared models, which fully reflects the advantage of our method. This manifests primarily in the following aspects: We first used BERT to convert each input word into a dynamic word vector, obtaining a more comprehensive and accurate word embedding. Afterwards, a CNN was employed to extract the crucial features of the text through convolution and pooling operations, and then we used BiGRU to extract richer contextual information representing emotional features, obtaining a more comprehensive and deep text feature representation. Finally, we used the MultiHeadAttention mechanism to obtain correlations among the data in different information spaces from different subspaces, selectively extracting the emotion-related features in the context features and obtaining more comprehensive emotional information.
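The MultiHeadAttention step builds on scaled dot-product attention; a minimal single-head sketch in plain Python is shown below (a multi-head version runs several such heads on differently projected inputs and concatenates the results; this is an illustration of the mechanism, not the paper’s implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_attention(Q, K, V):
    """Scaled dot-product attention for one head: each query attends
    over all keys and returns a weighted mix of the value vectors."""
    d = len(K[0])  # key dimension, used for the 1/sqrt(d) scaling
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```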
5.6. Ablation Experiment
With the aim of evaluating the effect of employing the BiGRU and MultiHeadAttention mechanisms in our research, we set up comparative experiments and conducted ablation experiments on the basis of our proposed model.
BERT-CNN: BiGRU and MultiHeadAttention were removed from our structure, while BERT and CNN were retained. BERT transformed the input text into dynamic word vectors, and the CNN model then performed the convolution and pooling operations. Finally, the outcome entered the output layer.
BERT-CNN-BiGRU: MultiHeadAttention was removed from our model, and BiGRU was retained; the remaining model was BERT-CNN-BiGRU. BERT transformed the input text into dynamic word vectors, the CNN performed the convolution and pooling operations, and BiGRU then extracted richer contextual information representing emotional features. Finally, the outcome entered the output layer. This group of experiments studied the impact of introducing BiGRU on the model’s performance.
BERT-CNN-MultiHeadAttention: BiGRU was removed from the model, and MultiHeadAttention was retained; the remaining model was BERT-CNN-MultiHeadAttention. The model converted the input text into dynamic word vectors with BERT, used the CNN to perform the convolution and pooling operations, and then used MultiHeadAttention to extract the emotion-related features in the context features. Finally, the outcome entered the output layer. This group of experiments studied the impact of introducing MultiHeadAttention on the model’s performance.
Our model: BERT-CNN-BiGRU-MultiHeadAttention.
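The ablation configurations amount to toggling the two components under study while always keeping the embedding and convolution stages; a tiny helper makes the naming explicit (the component names follow the descriptions above; the helper itself is illustrative, not the paper’s code):

```python
def build_variant(use_bigru=True, use_attention=True):
    """Name the ablation variant produced by toggling the two
    components under study; BERT and CNN are always retained."""
    parts = ["BERT", "CNN"]
    if use_bigru:
        parts.append("BiGRU")
    if use_attention:
        parts.append("MultiHeadAttention")
    return "-".join(parts)
```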
Table 9 and
Figure 11 display the test outcomes of the ablation experiments. Retaining all of the methods in our model is better than removing BiGRU and MultiHeadAttention at the same time; the accuracy rate exhibited a 2% increase. After adding BiGRU, the model’s accuracy increased by 1.3%. This is because BiGRU can more effectively capture distant dependency relationships in the context, thereby extracting richer contextual information representing emotional features and obtaining a more comprehensive and deep text feature representation, which improves the model’s effect. Furthermore, it is evident that adding the MultiHeadAttention mechanism also improves the model’s performance, with a 1.1% increase in accuracy. Sentiment words carrying emotional color have a strong effect on predicting the text’s polarity. We introduced the MultiHeadAttention mechanism to obtain, from different subspaces, the correlations among the data in different information spaces, selectively extracting emotion-related features from the context features and obtaining more comprehensive and important emotional information, thereby enhancing the model’s performance.