Deep Learning Short Text Sentiment Analysis Based on Improved Particle Swarm Optimization
Abstract
1. Introduction
2. Related Works
2.1. IPSO
2.1.1. Improvement of the Inertia Weight ‘w’
2.1.2. Improvement of Learning Factors
1. Randomly generate an initial batch of particles (together with initialized velocities and positions) to form the current population.
2. Calculate the fitness of each particle.
3. Update each particle's individual best and the global best solution according to the fitness values, as given in Equation (8), which involves the fitness value of each particle at the current iteration, the global best solution at that iteration, and the maximum number of iterations.
4. Update the particle velocities and positions using Equations (3) and (4).
5. Recalculate the fitness of each particle.
6. Check whether the iterations are complete. If so, output the current best solution; otherwise, return to step (3). (A minimal code sketch of this loop follows the list.)
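Below is a minimal Python sketch of this loop. It assumes a linearly decreasing inertia weight and asynchronously varying learning factors c1 and c2, which are common "improved PSO" choices; the paper's exact update rules in Equations (3), (4), and (8) may differ, so treat the coefficients and schedules here as placeholders.

```python
import numpy as np

def ipso(fitness, dim, n_particles=30, n_iter=100, bounds=(-5.0, 5.0),
         w_max=0.9, w_min=0.4, c1_init=2.5, c1_final=0.5,
         c2_init=0.5, c2_final=2.5):
    """PSO loop with linearly varying inertia weight and learning factors
    (a common improved-PSO scheme; the paper's exact formulas may differ)."""
    lo, hi = bounds
    pos = np.random.uniform(lo, hi, (n_particles, dim))      # step 1: initial positions
    vel = np.random.uniform(-1.0, 1.0, (n_particles, dim))   # step 1: initial velocities
    fit = np.array([fitness(p) for p in pos])                # step 2: initial fitness
    pbest, pbest_fit = pos.copy(), fit.copy()                # individual bests
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]         # global best

    for t in range(n_iter):
        # improvements: inertia weight and learning factors vary with iteration t
        w = w_max - (w_max - w_min) * t / n_iter
        c1 = c1_init + (c1_final - c1_init) * t / n_iter
        c2 = c2_init + (c2_final - c2_init) * t / n_iter

        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)  # step 4 (Eq. 3)
        pos = np.clip(pos + vel, lo, hi)                                   # step 4 (Eq. 4)

        fit = np.array([fitness(p) for p in pos])            # step 5: re-evaluate fitness
        improved = fit < pbest_fit                            # step 3: update bests (Eq. 8)
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmin(pbest_fit))
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, gbest_fit                                   # step 6: output best solution

# Example usage: minimize the sphere function in 5 dimensions
best_x, best_f = ipso(lambda x: float(np.sum(x ** 2)), dim=5)
print(best_x, best_f)
```

Calling `ipso` with any fitness function (the sphere function in the example) returns the best position found and its fitness value.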
2.1.3. Performance Analysis of the IPSO
2.2. BiLSTM
2.3. TextCNN
2.4. Self-Attention
3. Methodology
3.1. Noise Injection and Adversarial Training
- Utilizing dropout: randomly deactivating a fraction of the network's units during training.
- Leveraging adversarial networks for noise injection: an adversarial network generates noisy data, introducing the necessary perturbations (a sketch of this idea follows below).
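As a rough illustration of the second point, the sketch below builds a small Keras generator that maps a latent vector to an additive perturbation of an embedded sentence, with a dropout layer as an extra noise source, and a discriminator that judges clean versus perturbed samples. The layer sizes, sequence length, and the way the noise is added are assumptions for illustration only; the paper's actual GAN design and training procedure are not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

seq_len, embed_dim, latent_dim = 60, 300, 64   # assumed sizes for illustration

def build_noise_generator(dropout_rate=0.4):
    """Maps a latent vector to a per-token perturbation of an embedded sentence."""
    z = layers.Input(shape=(latent_dim,))
    h = layers.Dense(256, activation="relu")(z)
    h = layers.Dropout(dropout_rate)(h)                 # dropout as an extra noise source
    h = layers.Dense(seq_len * embed_dim, activation="tanh")(h)
    noise = layers.Reshape((seq_len, embed_dim))(h)
    return Model(z, noise, name="noise_generator")

def build_discriminator():
    """Judges whether an embedded sample is clean or perturbed."""
    x = layers.Input(shape=(seq_len, embed_dim))
    h = layers.GlobalAveragePooling1D()(x)
    h = layers.Dense(128, activation="relu")(h)
    out = layers.Dense(1, activation="sigmoid")(h)
    return Model(x, out, name="discriminator")

generator = build_noise_generator()
discriminator = build_discriminator()

# Usage sketch: add a small generated perturbation to a batch of embedded sentences.
clean = tf.random.normal((8, seq_len, embed_dim))       # stand-in for embedded text
z = tf.random.normal((8, latent_dim))
perturbed = clean + 0.1 * generator(z)
```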
3.2. Model Architecture
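The preceding sections introduce BiLSTM, TextCNN, and self-attention as the building blocks of the proposed BiLSTM-TCSA model. The Keras sketch below shows one plausible way to stack them (embedding, then BiLSTM, then multi-scale convolutions, then dot-product self-attention and a softmax classifier); the vocabulary size, sequence length, and fusion order are assumptions, and the authoritative layout is the architecture described in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Assumed sizes; the real values come from the dataset and the IPSO search in Section 4.4.
vocab_size, seq_len, embed_dim = 20000, 60, 300
hidden_units, num_classes = 128, 2

inputs = layers.Input(shape=(seq_len,))
emb = layers.Embedding(vocab_size, embed_dim)(inputs)   # pre-trained GloVe weights could be loaded here
emb = layers.Dropout(0.4)(emb)

# BiLSTM: contextual (global) semantics
context = layers.Bidirectional(layers.LSTM(hidden_units, return_sequences=True))(emb)

# TextCNN over the BiLSTM outputs: local n-gram features at several scales
convs = [layers.Conv1D(100, k, strides=1, padding="same", activation="relu")(context)
         for k in (3, 4, 5)]
features = layers.Concatenate()(convs)

# Dot-product self-attention, then pooling and softmax classification
attended = layers.Attention()([features, features])
pooled = layers.GlobalMaxPooling1D()(attended)
outputs = layers.Dense(num_classes, activation="softmax")(pooled)

model = Model(inputs, outputs, name="bilstm_tcsa_sketch")
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```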
4. Experiments and Analysis
4.1. Experimental Environment
4.2. Datasets
1. IMDB: a movie review dataset consisting of user reviews from the IMDB website; all 50,000 records were used for the experiments.
2. MR: a binary-class English movie review dataset containing an equal number of positive and negative samples, 10,662 instances in total.
3. SST-2: derived from the Stanford Sentiment Treebank and designed for binary sentiment classification, comprising 70,044 sentiment-labeled comments.
4. Hotel Reviews (hereinafter the HR dataset): user-generated polarity sentiment reviews of well-known hotels worldwide, collected by the Whale Community; after removing irrelevant comments, the dataset contains 877,640 valid entries.
4.3. Metrics
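The comparative tables below report several scores per model and dataset. Assuming the standard definitions over the confusion-matrix counts TP, FP, TN, and FN, the classification metrics commonly used for this task are:

```latex
% Standard binary-classification metrics (assumed definitions),
% with TP, FP, TN, FN the confusion-matrix counts.
\[
\mathrm{Accuracy}  = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall}    = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]
```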
4.4. Hyperparameter Optimization Based on IPSO
1. Dropout rate: the random deactivation rate of the dropout layer in the GAN generators;
2. BiLSTM hidden layer dimension: the dimensionality of the output vectors of the BiLSTM layer;
3. Learning rate: controls the magnitude of the updates applied to the model's internal parameters during training;
4. Batch size: the number of samples used in each training iteration;
5. Convolutional kernel stride: the step length of each movement of the convolutional kernel in the TextCNN layer. (A sketch of how these hyperparameters can be encoded as an IPSO particle is given after this list.)
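A minimal sketch of how these five hyperparameters might be encoded as a single IPSO particle and scored is shown below. The search ranges in `BOUNDS` are placeholder assumptions (the actual ranges are specified in this section of the paper), and `build_and_evaluate` stands in for a short training run of the BiLSTM-TCSA model that returns validation accuracy.

```python
import numpy as np

# Placeholder search ranges; the actual ranges used by IPSO are listed in this section.
BOUNDS = {
    "dropout":       (0.1, 0.6),
    "hidden_dim":    (32, 256),
    "learning_rate": (1e-4, 1e-2),
    "batch_size":    (16, 128),
    "strides":       (1, 3),
}

def decode(position):
    """Map a particle position in [0, 1]^5 onto concrete hyperparameter values."""
    params = {}
    for x, key in zip(position, BOUNDS):
        lo, hi = BOUNDS[key]
        val = lo + float(np.clip(x, 0.0, 1.0)) * (hi - lo)
        params[key] = val if key in ("dropout", "learning_rate") else int(round(val))
    return params

def fitness(position, build_and_evaluate):
    """IPSO minimizes the fitness, so return 1 - validation accuracy of the decoded settings.
    `build_and_evaluate` is a stand-in for a short training run of the sentiment model."""
    return 1.0 - build_and_evaluate(**decode(position))

# Usage sketch with a dummy evaluator (replace with real model training and validation).
dummy_eval = lambda **params: 0.85
particle = np.random.rand(5)
print(decode(particle), fitness(particle, dummy_eval))
```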
4.5. Comparative Model Experiments and Results Analysis
1. FastText [38]: an open-source rapid text classification model from Facebook. Its core idea is to sum the N-gram vectors of a text and average them to obtain the text's vector representation, which is then fed into a softmax classifier to produce the classification result.
2. TextCNN-MultiChannel [5]: the multi-channel convolutional neural network proposed by Kim. It uses convolutional kernels with window sizes of 3, 4, and 5 to extract local text features at different scales; the max-pooled features are concatenated and classified with softmax.
3. DCNN [39]: employs wide convolutions for feature extraction, allowing it to make effective use of the edge information in the text.
4. RCNN [40]: uses a recurrent structure to learn word representations while capturing as much contextual information as possible, and applies max pooling to automatically identify the words that play a pivotal role in classification, thereby capturing the critical semantics of the text.
5. BiLSTM+EMB_Att [41]: applies an attention mechanism directly to the underlying word vectors to learn a weight distribution over each word's sentiment tendency, then uses a BiLSTM to capture the semantic information of the text before fusing the two for classification.
6. BiLSTM-CNN [17]: first uses a BiLSTM to extract contextual semantic information, then applies multiple convolutional layers to extract and fuse local semantics, and finally classifies with the softmax function.
7. MLACNN [42]: uses a CNN to extract local semantic information and LSTM-Attention to obtain global semantic information; the outputs of the two components are fused for classification.
8. MC-AttCNN-AttBiGRU [21]: applies an attention mechanism over the word embeddings and uses TextCNN and BiGRU to capture local and global features, respectively; the extracted features are fused and used for classification.
4.6. Impact Analysis of GAN
4.7. Ablation and Validation Experiments
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Vries, A.; Mamoulis, N.; Nes, N. Efficient k-NN search on vertically decomposed data. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, WI, USA, 3–6 June 2002; pp. 322–333. [Google Scholar]
- Chen, Z.; Shi, G.; Wang, X. Text Classification Based on Naive Bayes Algorithm with Feature Selection. Int. J. Inf. 2012, 15, 4255–4260. [Google Scholar]
- Wang, J.H.; Yu, H.; Chan, W. Automatic text Classification based on KNN+ Hierarchical SVM. Comput. Appl. Softw. 2016, 33, 38–41. [Google Scholar]
- Rojas-Barahona, L.M. Deep learning for sentiment analysis. Lang. Linguist. Compass 2016, 10, 205–212. [Google Scholar]
- Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Stroudsburg, PA, USA, 2014; pp. 1746–1751. [Google Scholar]
- Conneau, A.; Schwenk, H.; Barrault, L. Very deep convolutional networks for text classification. arXiv 2016, arXiv:1606.01781v1. [Google Scholar]
- Le, H.T.; Cerisara, C.; Denis, A. Do convolutional networks need to be deep for text classification? arXiv 2017, arXiv:1707.04108. [Google Scholar]
- Johnson, R.; Zhang, T. Deep pyramid convolutional neural networks for text categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 562–570. [Google Scholar]
- Guo, J.; Yue, B.; Xu, G. An enhanced convolutional neural network model for answer selection. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017; pp. 789–790. [Google Scholar]
- Wang, H.; He, J.; Zhang, X. A short text classification method based on n-gram and CNN. Chin. J. Electron. 2020, 29, 248–254. [Google Scholar] [CrossRef]
- Irsoy, O.; Cardie, C. Opinion Mining with Deep Recurrent Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 720–728. [Google Scholar]
- Xin, W.; Liu, Y.; Sun, C. Predicting Polarities of Tweets by Composing Word Embeddings with Long Short Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; pp. 1343–1353. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471. [Google Scholar] [CrossRef]
- Cho, K.; Van Merrienboer, B.; Gulcehre, C. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
- Tai, K.S.; Socher, R.; Manning, C.D. Improved semantic representations from tree-structured long short-term memory networks. arXiv 2015, arXiv:1503.00075. [Google Scholar]
- Zhao, H.; Wang, L.; Wang, W.J. Text sentiment analysis based on serial hybrid model of bi-directional long short-term memory and convolutional neural network. J. Comput. Appl. 2020, 40, 16–22. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Yang, Z.; Yang, D.; Dyer, C. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489. [Google Scholar]
- Zhou, Y.; Xu, J.; Cao, J. Hybrid attention networks for Chinese short text classification. Comput. Syst. 2017, 21, 759–769. [Google Scholar] [CrossRef]
- Cheng, Y.; Yao, L.B.; Zhang, G.H.; Tang, T.W.; Xiang, G.X.; Chen, H.M.; Feng, Y.; Cai, Z. Text Sentiment Orientation Analysis of Multi-Channels CNN and BiGRU Based on Attention Mechanism. J. Comput. Res. Dev. 2020, 57, 2583–2595. [Google Scholar]
- Feurer, M.; Eggensperger, K.; Falkner, S. Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning. arXiv 2021, arXiv:2007.04074v2. [Google Scholar]
- Wang, Y.; Yan, Y.; Li, Z. Efficient and Robust Auto-tuning of Deep Learning Hyperparameters with Gaussian Processes. JMLR 2023, 24, 1–99. [Google Scholar]
- Bernstein, J.; Mingard, C. Automatic Gradient Descent: Deep Learning without Hyperparameters. arXiv 2023, arXiv:2304.05187. [Google Scholar]
- Yapici, H.; Cetinkaya, N. A new meta-heuristic optimizer: Pathfinder algorithm. Appl. Soft Comput. 2019, 78, 545–568. [Google Scholar] [CrossRef]
- Mirjalili, S.; Hatamlou, A. Multi-verse optimizer: A nature-inspired algorithm for global optimization. Neural Comput. Appl. 2016, 27, 495–513. [Google Scholar] [CrossRef]
- Mirjalili, S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015, 89, 228–249. [Google Scholar] [CrossRef]
- Abualigah, L.; Diabat, A.; Mirjalili, S. The arithmetic optimization algorithm. Comput. Methods Appl. Mech. Eng. 2021, 376, 113609. [Google Scholar] [CrossRef]
- Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the ICNN'95 International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; pp. 1942–1948. [Google Scholar]
- Wang, J.J.; Kumbasar, T. Parameter Optimization of Interval Type-2 Fuzzy Neural Network Based on PSO and BBBC Methods. IEEE/CAA J. Autom. Sin. 2019, 6, 247–257. [Google Scholar] [CrossRef]
- Shafipour, M.; Rashno, A.; Fadaei, S. Particle distance rank feature selection by particle swarm optimization. Expert Syst. Appl. 2021, 185, 115620. [Google Scholar] [CrossRef]
- Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; pp. 2047–2052. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M. Generative Adversarial Nets. In Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
- Liang, J.J.; Wei, J.J.; Jiang, Z.F. Generative Adversarial Networks GAN Overview. J. Front. Comput. Sci. Technol. 2020, 14, 1–17. [Google Scholar]
- Miyato, T.; Dai, A.M.; Goodfellow, I. Adversarial training methods for semi-supervised text classification. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017; pp. 1–11. [Google Scholar]
- Chen, R.L. Attention-based adversarial multi-task review text classification. Master’s Thesis, Dalian University of Technology, Dalian, China, 10 May 2019. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C. Glove: Global Vectors for Word Representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
- Joulin, A.; Grave, E.; Bojanowski, P. Bag of tricks for efficient text classification. arXiv 2016, arXiv:1607.01759v3. [Google Scholar]
- Kalchbrenner, N.; Grefenstette, E.; Blunsom, P. A convolutional neural network for modelling sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, 23–25 June 2014; pp. 655–665. [Google Scholar]
- Lai, S.; Xu, L.; Liu, K. Recurrent convolutional neural networks for text classification. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2267–2273. [Google Scholar]
- Guan, P.F.; Li, B.A.; Lv, X.Q.; Zhou, J.S. Attention Enhanced Bi-directional LSTM for Sentiment Analysis. JCIP 2019, 33, 105–111. [Google Scholar]
- Teng, J.B.; Kong, W.W.; Tian, Q.X.; Wang, Z.Q. Text Classification Method Based on LSTM-Attention and CNN Hybrid Model. Comput. Eng. Appl. 2021, 57, 126–133. [Google Scholar]
Function Name | Mathematical Expression |
---|---|
Axis parallel hyper-ellipsoid | $f(x)=\sum_{i=1}^{n} i x_i^{2}$ |
Rosenbrock's valley | $f(x)=\sum_{i=1}^{n-1}\left[100\left(x_{i+1}-x_i^{2}\right)^{2}+\left(1-x_i\right)^{2}\right]$ |
Schwefel | $f(x)=418.9829\,n-\sum_{i=1}^{n} x_i \sin\left(\sqrt{\lvert x_i\rvert}\right)$ |
Ackley | $f(x)=-20\exp\left(-0.2\sqrt{\tfrac{1}{n}\sum_{i=1}^{n} x_i^{2}}\right)-\exp\left(\tfrac{1}{n}\sum_{i=1}^{n}\cos\left(2\pi x_i\right)\right)+20+e$ |
Griewank | $f(x)=1+\tfrac{1}{4000}\sum_{i=1}^{n} x_i^{2}-\prod_{i=1}^{n}\cos\left(\tfrac{x_i}{\sqrt{i}}\right)$ |
Algorithm | Axis Parallel Hyper-Ellipsoid | Rosenbrock's Valley | Schwefel | Ackley | Griewank |
---|---|---|---|---|---|
PA | 830.983 | | | | |
MVO | 830.983 | | | | |
MFA | 830.075 | | | | |
AOA | 830.075 | | | | |
PSO | 830.069 | | | | |
Item | Configuration |
---|---|
Operating System | Windows 10, 64-bit |
CPU | AMD Ryzen 7 5800H |
RAM | 16 GB |
GPU | NVIDIA GTX 1650 |
Programming Language | Python 3.8 |
Integrated Development Environment | PyCharm Community Edition |
Deep Learning Framework | Keras 2.4.3 |
Datasets | Number of Samples | Average Length | Number of Train Data | Number of Test Data |
---|---|---|---|---|
IMDB | 50,000 + 5000 | 282 | 38,500 | 16,500 |
MR | 10,662 + 1066 | 20 | 8210 | 3518 |
SST-2 | 70,042 + 7004 | 13 | 53,932 | 23,114 |
HR | 877,640 + 87,764 | 24 | 668,083 | 286,321 |
Hyperparameter | Value |
---|---|
Dropout rate | 0.38/0.42/0.37/0.44 |
BiLSTM hidden layer dimension | 141/97/94/101 |
Learning rate | 0.0079/0.0072/0.0068/0.0078 |
Batch size | 32/32/32/64 |
Convolutional kernel stride | 1/1/1/1 |
Models | IMDB | SST-2 | MR | HR |
---|---|---|---|---|
FastText | 81.25/80.95/82.71 | 85.03/84.82/84.91 | 76.81/77.03/76.75 | 84.32/84.26/85.71 |
TextCNN-MultiChannel | 82.33/81.27/80.56 | 87.12/86.91/87.24 | 77.94/76.02/78.13 | 84.79/83.09/83.26 |
DCNN | 81.99/83.26/82.03 | 86.83/86.03/86.80 | 77.52/76.59/77.46 | 86.76/84.23/83.79 |
RCNN | 82.41/81.71/80.26 | 87.82/88.41/87.74 | 80.28/81.99/80.25 | 87.91/85.99/85.41 |
BiLSTM+EMB_Att | 84.49/83.02/85.13 | 88.84/87.56/88.65 | 81.32/82.31/81.37 | 89.03/87.23/88.41 |
BiLSTM-CNN | 85.75/84.26/84.51 | 88.57/89.21/88.41 | 80.60/79.21/80.47 | 89.69/87.29/87.03 |
MLACNN | 86.47/86.74/85.02 | 89.83/89.95/89.91 | 82.51/81.03/81.89 | 90.24/89.46/88.73 |
MC-AttCNN-AttBiGRU | 88.39/87.09/87.38 | 90.35/90.04/90.32 | 83.38/82.71/83.30 | 92.35/91.23/91.45 |
BiLSTM-TCSA(Ours) | 91.38/92.15/91.52 | 91.74/91.04/92.20 | 85.49/84.76/84.21 | 94.59/92.71/92.17 |
Models | IMDB | SST-2 | MR | HR |
---|---|---|---|---|
FastText | 81.25/82.03 | 85.03/85.99 | 76.81/77.79 | 84.32/85.72 |
TextCNN-MultiChannel | 82.33/83.16 | 87.12/87.81 | 77.94/79.03 | 84.79/86.23 |
DCNN | 81.99/83.26 | 86.83/87.71 | 77.52/78.49 | 86.76/88.01 |
RCNN | 82.41/83.44 | 87.82/88.52 | 80.28/81.31 | 87.91/89.02 |
BiLSTM-TCSA(Ours) | 90.24/91.38 | 90.88/91.74 | 84.36/85.49 | 93.24/94.59 |
Model | Hidden Layer Dimension | Learning Rate | Batch Size | Strides | Dropout |
---|---|---|---|---|---|
LSTM | ☑ | ☑ | ☑ | ||
BiLSTM | ☑ | ☑ | ☑ | ||
LSTM-TextCNN | ☑ | ☑ | ☑ | ☑ | |
BiLSTM-TextCNN | ☑ | ☑ | ☑ | ☑ | |
LSTM-Attention | ☑ | ☑ | ☑ | ||
BiLSTM-Attention | ☑ | ☑ | ☑ | ||
TextCNN-Attention | ☑ | ☑ | ☑ | ||
BiLSTM-TCSA (Ours) | ☑ | ☑ | ☑ | ☑ | ☑ |