On the Utilization of Emoji Encoding and Data Preprocessing with a Combined CNN-LSTM Framework for Arabic Sentiment Analysis

Alawneh, Hussam; Hasasneh, Ahmad; Maree, Mohammed

doi:10.3390/modelling5040076


by Hussam Alawneh ¹, Ahmad Hasasneh ¹,* and Mohammed Maree ²,*

¹ Department of Natural, Engineering and Technology Sciences, Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine

² Department of Information Technology, Arab American University, Ramallah P.O. Box 240, Palestine

* Authors to whom correspondence should be addressed.

Modelling 2024, 5(4), 1469-1489; https://doi.org/10.3390/modelling5040076

Submission received: 2 July 2024 / Revised: 20 September 2024 / Accepted: 30 September 2024 / Published: 7 October 2024


Abstract:

Social media users often express their emotions through text in posts and tweets, which can be used for sentiment analysis, that is, classifying text as positive or negative. Sentiment analysis is critical in fields such as politics, tourism, e-commerce, education, and health. However, sentiment analysis approaches that perform well on English text encounter challenges with Arabic text due to its morphological complexity. Effective data preprocessing and machine learning techniques are essential to overcome these challenges and provide insightful sentiment predictions for Arabic text. This paper evaluates a combined CNN-LSTM framework with emoji encoding for Arabic Sentiment Analysis, using the Arabic Sentiment Twitter Corpus (ASTC) dataset. Three experiments were conducted, each with eight combinations of preprocessing parameters, to evaluate the effect of data preprocessing, in particular the effect of encoding emojis with their real and emotional meanings. Emoji meanings were collected from four websites specialized in finding the meaning of emojis in social media. Furthermore, the Keras tuner optimized the CNN-LSTM parameters during the 5-fold cross-validation process. The highest accuracy (91.85%) was achieved by keeping non-Arabic words and removing punctuation, using the Snowball stemmer after encoding emojis into Arabic text, and applying Keras embedding. This approach is competitive with other state-of-the-art approaches. The results show that emoji encoding enriches the text by accurately reflecting emotions, and that studying the effect of data preprocessing allows the hybrid model to achieve results comparable to a previous study on the same ASTC dataset, thereby improving sentiment analysis accuracy.

Keywords:

sentiment analysis; emoji encoding; CNN-LSTM; hyperparameters optimization; NLP; data preprocessing

1. Introduction

Due to the growth and proliferation of social media platforms, the huge amount of textual data available on the Internet is prompting more attention to be given to sentiment analysis [1]. Sentiment analysis (SA), often referred to as opinion mining, is a type of Natural Language Processing (NLP) that aims to extract sentiments by analyzing textual data and classifying it based on text polarity [2]. It plays an important role in analyzing thoughts, opinions, and emotions in texts written about healthcare systems, e-commerce, and social networks [3]. Although Arabic is one of the most widely used languages in the world, research in Arabic sentiment analysis is still growing slowly compared to other languages such as English [4]. Therefore, extending the same success in SA to the Arabic language is still a challenge.

In the field of SA, most research is focused on the English language, with little attention paid to the Arabic language [5]. This is because Arabic Sentiment Analysis (ASA) is still challenging due to Arabic varieties, orthography, morphology, lack of corpora, lack of sentiment lexicons, and the use of dialectal Arabic [6]. Arabic is a global language with more than 500 million speakers worldwide [7], and about 185 million Arabic speakers use the Web [6]. Thus, ASA has recently emerged as an active research area, particularly in the field of Machine Learning (ML) applications [8]. One way to strengthen ASA is to exploit emojis, which are becoming increasingly popular on social media and provide helpful features that enrich the textual features used for sentiment analysis [9].

They provide a rich source of semantic dimensions that can assist in conveying users’ opinions. Here, we did not just consider emoticons that reflect facial expressions, but also those that are used to enrich the text with concepts and ideas, such as celebrations, weather status, vehicles and buildings, food and drink, animals and plants, and the intended feelings and emotions from their use [10]. For example, the “❤” emoji means “يحب شخص و الرومنسية و المودة”, and in English means “loves someone, romance, and affection”, and “😀” means “السعادة والإثارة بشكل عام”, and means “happiness and excitement in general” in English, while “⛰” is rich in meanings and intentions such as “الجبال المادية أو فكرة المشي لمسافات طويلة والمغامرة. الإعجاب بالطبيعة أو القوة أو السفر. او التغلب على التحديات، أو إحساسًا بالسلام والتأمل”, which means “Physical mountains or the idea of hiking and adventure. Admiration for nature, strength, or travel. Or overcoming challenges, or a sense of peace and contemplation”. Thus, eliminating such emojis could omit valuable information and feelings that they reflect, and change the overall meaning of the user’s tweet and its emotional tone. On the other hand, including the intended meaning and emotion of the emoji will help ML extract the right insights and support decision-makers and managers in their decision-making.

This research work presents an approach to emoji encoding introduced by replacing each emoji with its emotional and real social media meaning. Furthermore, a hybrid deep learning model is proposed to evaluate the impact of this preprocessing step on the quality of ASA and to build robust prediction models. These techniques address specific challenges in Arabic sentiment analysis, such as the complexity of Arabic dialects, the lack of sentiment lexicons, and the intricacies of Arabic morphology. Our approach advances the state of the art by offering a more nuanced understanding of how these techniques can be effectively employed to overcome these challenges. To the best of our knowledge, this is the first work that utilizes a combined deep learning approach with emoji encoding for Arabic Sentiment Analysis, which deserves to be considered.

Accordingly, we can summarize the main contribution of our proposed approach as follows:

Combination of emoji encoding with the hybrid CNN-LSTM model: Our method integrated emoji encoding that captures all the emotional and real meanings, specifically tailored to enhance the understanding of sentiment in Arabic text.
Impact of preprocessing steps: We explored the effects of various preprocessing techniques, such as keeping non-Arabic words, retaining punctuations, and using different stemmers and embedding transformers, on the performance of our sentiment analyzer. This exploration provides deeper insights into how specific transformation or stemming strategies can effectively leverage punctuation and non-Arabic words to enhance sentiment extraction in Arabic text.

The rest of this paper is organized as follows. The literature review on sentiment analysis and text data preprocessing is discussed in Section 2. Then, the proposed methodology, including data collection, preprocessing, and hybrid model prediction and tuning processes, is presented in Section 3. Section 4 shows the results obtained from the different experiments, which are discussed in Section 5. Finally, Section 6 presents the conclusions and suggests future work in this area.

2. Literature Review

Sentiment analysis is the understanding of people’s opinions, emotions, and attitudes toward any topic or person expressed in textual data [11]. In the field of Natural Language Processing, ASA has recently received increasing attention [12]. A review of the ASA literature shows research on hybrid models, deep learning models, and classical machine learning models for classifying Arabic sentiments.

Hybrid models play a role in our understanding of the complexity of Arabic sentiment as these models are trained on different datasets to build predictive models. The study in [13] applied a combination of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) on three datasets, the Arabic Health Services Dataset (Main-AHS and Sub-AHS) [14], Ar-Twitter [15], and the Arabic Sentiment Tweets Dataset (ASTD) [16] datasets. The max-pooling layer was excluded from the CNN to maintain the same feature vector length after convolving the filters on the input data. In addition, several dataset-preparation techniques such as MADAMIRA, Farasa, and Stanford for Arabic text preprocessing and several pre-trained word-embedding techniques for providing vector representation for the text features, such as Word2Vec, Glove, and fastText, were investigated to improve the accuracy of Arabic sentiment classification. The best accuracy, of 94.83%, was achieved for the Main-AHS dataset using Farasa Lemmatization normalization, and 88.86% for the Ar-Twitter dataset using Madamira Stem normalization and 81.62% for the ASTD using Word2VecSG word embedding. Subsequently, a more complex approach was proposed in [17] by implementing a hybrid model to combine contextualized sentence representations generated by the AraBERT model with static word embedding using pre-trained Mazajak. In addition, CNN-Bidirectional Long Short-Term Memory (CNN-BiLSTM) was used to obtain sentence representations from the static word vectors in order to be able to concatenate the two types of embeddings. The hybrid model outperforms the standalone AraBERT model tested on the ArSarcasm-v2 dataset for both sarcasm and sentiment classification tasks. The best results are a 0.62 F1 score and 0.715 F-PN score (macro average of positive and negative class F scores) for sarcasm and sentiment classification, respectively. 
Another hybrid model of CNN-BiLSTM was used in [18] for different tasks, including a topic classifier, a sentiment analyzer, a sarcasm detector, and an emotion classifier. This model was trained on different datasets for each task, with four of them for sentiment analysis tasks; SS2030, ArSAS, Twitter dataset for Arabic Sentiment Analysis, and ArSarcasm-v2 datasets, consisting of 4214, 21,000, 348,797, and 15,548 tweets, respectively. The proposed model achieves an accuracy of 97.58%, 86%, 97%, and 81.6% for topic, sentiment, sarcasm, and emotion classification, respectively.

On the other hand, deep learning has also been used for ASA. For example, the study in [19] used deep learning to evaluate GloVe, Word2Vec, and FastText as classical word embedding techniques and ARBERT as contextualized word embedding for sentiment analysis with a comparative analysis. The word embedding techniques were evaluated in trained and pre-trained versions by applying two deep learning models of BiLSTM and CNN on five datasets, including HARD, Khooli, Arabic Jordanian General Tweets (AJGT), ArSAS, and ASTD for sentiment classification. The BiLSTM model outperforms CNN on three datasets, while CNN performs better on smaller datasets. In addition, the generated embeddings outperform their pre-trained versions by about 0.28% to 1.8% accuracy. The contextualized transformer-based embedding BERT model achieves the highest performance in both trained and pre-trained versions. Another study in [20] employed Deep Neural Networks along with investigating Support Vector Machines (SVM), Naive Bayes (NB), and Random Forest (RF) as classical ML models that were tuned using Differential Evolution (DE) algorithms for classifying the sentiment of Arabic texts related to monkeypox. The dataset used was collected from Twitter over eight months, resulting in 4763 tweets. The best result was obtained using the DNN based on Leaky ReLU with an accuracy of 92%.

Classical ML has also been used for ASA. Thus, several supervised ML models have been applied in [21], including SVM, Linear Regression, NB, Complementary Naive Bayes (CNB), and Stochastic Gradient Descent (SGD) for both sentiment and sarcasm classification. These models were trained and tested with 5-fold cross-validation on the ArSarcasm-v2 dataset. The best accuracy was achieved using SVM with 59.8% and 74.6% for sentiment and sarcasm, respectively. Based on the same dataset, an improvement was presented in [22] by applying different versions of two transformer-based models, AraELECTRA and AraBERT, for sarcasm and sentiment detection. The best results for sarcasm were achieved by the AraBERTv2-base model with an accuracy of 78.3%, while AraBERTv0.2-large was the best for the sentiment task, with an accuracy of 65.15%. It is important to note that the pre-trained model in [3] was not used to generate the embeddings. Instead, it presents a fine-tuning approach of three stages for a pre-trained model called Arabic BERT, which was developed for Arabic sentiment analysis. These stages consist of text pre-processing and data cleaning, transfer learning of weights of pre-trained models, and a classification layer. Model evaluation was performed by testing this model on five different Arabic review datasets and comparing its results with 11 state-of-the-art models. This model outperforms the prediction accuracy of the proposed models.

Researchers in SA follow different strategies to deal with emojis; some researchers just eliminate the emojis, while others have considered the significance of emojis in their work [23]. Including the emojis can help in expressing writers’ feelings, which helps in improving the classification performance [24].

One strategy exploits the emojis in SA by replacing the emojis with textual data, such as the study in [25], which is directed towards translating emojis by conducting emoji Unicode translation. Also, it investigates the effect of combining Recurrent Neural Network (RNN), LSTM, and Gated Recurrent Unit (GRU) in conjunction with Logistic Regression (LR), RF, and SVM and grid search to improve the prediction performance for Arabic sentiment analysis. The model performance is compared with three deep learning models, which are RNN, LSTM, and GRU, implemented with CBOW word embedding and tuned using Keras-tuner, and with five ML models, which are Decision Tree (DT), LR, K-Nearest Neighbor (KNN), RF, and NB, implemented with the Term Frequency–Inverse Document Frequency (TF-IDF) feature extraction model and grid-search cross-validation for model tuning. Different datasets are used for training and testing the models: ASTC, ArTwitter, and AJGT. Stacking LR achieved the highest testing accuracy of 92.22% compared to ML models and DL models when using the ASTC dataset. Also, the study in [26] used a Russian dataset of 6957 posts and each post has at least one emotional indicator (emojis, emoticons, punctuation marks that express emotions); each emotional indicator was replaced with its meaning to improve the model. The best model was an ensemble model of word2vector model and a model of emotional indicator embedding tested on a dataset of 524 posts with an accuracy of 91%.

Another strategy to improve SA is to use emojis as non-verbal features. The study in [23] adapted non-verbal features for the task of Arabic sentiment analysis. Thus, several ML models including NB, multinomial naive Bayes (MNB), SGD, sequential minimal optimization-based support vector machines (SMO-SVM), DT, and RF were evaluated on emoji-based features with a feature vector of length 429 and for 2091 instances. The MNB achieved the best Area Under the Curve (AUC) of 87.30% when applied to the top 250 most relevant emojis selected using ReliefF and Correlation-Attribute Evaluator feature selection techniques. In [27], several ML models were also investigated, including SGD, SVM, Gaussian NB, KNN, DT, LSTM, GRU, Bi-LSTM, and bidirectional-GRU, to evaluate non-verbal features. A dataset of 2091 microblogs after excluding tweets without emojis was collected from ASTD, ArTwitter, QCRI, Syria, Semeval-2017 Task4 Subtask#A, and 843 Arabic microblogs with emojis from Twitter and YouTube. Then, the Emoji Sentiment Ranking (ESR) lexicon, which is an emoji lexicon containing 969 used emojis after excluding the unused emojis, and Principal Component Analysis (PCA) were applied to reduce the dimensionality of the features from 430 to 100 features. The best accuracy of 71.71% was achieved by the bidirectional-GRU model. In addition to non-verbal features, textual features were also used in the study of [9]. Thus, five datasets were used after removing instances that did not contain emojis, including Syria, ASTD, ArTwitter, QCRI, and Semeval-2017. After merging all the datasets, each tweet was divided into textual and emoji features, and then for the feature extraction step, the TF-IDF, Latent Semantic Analysis (LSA), and two methods of word embedding were used to extract textual features, while a set of 120 emojis was used to calculate the occurrence of each emoji to obtain nonverbal features.
The SVM achieved the best results by merging skip-gram features with emojis and using correlation-based feature selection with an accuracy of 83.02%.

In [28], another approach was applied by training an attention-based long short-term memory network on the embeddings generated by bi-sense emojis and inspired by word sense embedding. To obtain sentiment-aware embeddings of emojis, the bi-sense emojis were learned under positive and negative sentimental tweets. The best accuracy of 90% was achieved on the AA-sentiment dataset using Multi-Level Attention-based LSTM with bi-sense emoji embedding (MATTBiE-LSTM) and 83.4% on the HA-sentiment dataset using word-guide attention-based LSTM with bi-sense emoji embedding.

The previous studies used different hybrid models, transformers, and emoji-handling strategies for ASA. However, the morphological complexity of the Arabic language and the effect of several factors that change the meaning of the text, such as punctuation, non-Arabic words, and emojis, mean that the ASA field needs further investigation. The studies in [13,17,18,19] applied hybrid and deep learning models to prepared datasets without exploring the effect of emoji meaning, punctuation, or sentences in other languages on the final classification results. Other studies, such as [21,22], applied classical machine learning models, but these models could not overcome the complexity of Arabic and so did not reach high accuracy scores. On the other hand, the studies in [20,25,26] treated the emojis by replacing them with textual data. In contrast, the studies in [23,27] treated the emojis as non-verbal features and removed the text, which may be rich in sentiment that could improve the model results, whereas [9] used both the non-verbal features and the original text. Although these studies examined emojis, they did not investigate the effect of keeping non-Arabic words or punctuation, the most suitable transformers when non-Arabic words or punctuation are kept in the Arabic text, or the effect of encoding emojis with their emotional and real meanings. In this study, we propose a combination of CNN and LSTM models trained and tested on the ASTC dataset to improve ASA. The study also investigates the behavior of the proposed hybrid model under different experiments and conditions to understand the importance of each data preprocessing step, including examining Keras and AraVec transformers and their suitability when keeping punctuation and non-Arabic words, emoji handling, and the effect of keeping non-Arabic words and punctuation on the model results.

3. Materials and Methods

Data preparation is an important part of developing accurate and realistic predictive models. It reduces the dimensionality of the data by removing unnecessary text and characters. At the same time, the text can be enriched with the intended feelings of the users by replacing the emoji with its meaning, which leads to improved accuracy. Therefore, the main goal of this research is to investigate the effect of data preprocessing on ASA tasks. Figure 1 shows the workflow used to achieve this goal.

3.1. Dataset Descriptions

In this research, the performance of the proposed model and the importance of data preprocessing were evaluated using two different datasets, namely Arabic Sentiment Twitter Corpus (ASTC) and Emoji Meaning. The ASTC dataset contains Arabic tweets labeled with their corresponding sentiment polarity for training the model, and the Emoji Meaning dataset forms a dictionary of emoji meanings, which is used to give each emoji in the ASTC dataset its meaning to find the effect of emoji encoding on ASA.

The ASTC [29] is a publicly available dataset on Kaggle, collected in April 2019 using a positive and negative emoji lexicon. It is a balanced dataset consisting of 56 K labeled Arabic tweets, as shown in Figure 2, which represent the number of tweets in each class. The dataset is divided into 45 K for model training, with 22,760 positive and 22,513 negative tweets, and 11 K for model testing, with 5751 positive and 5767 negative tweets. The target variable is also labeled as positive or negative to describe the emotions of the tweets. In addition, the Emoji Meaning dataset consists of 912 emojis collected from the ASTC dataset, and then each emoji was mapped to its meaning. Each emoji has an emotional meaning based on Twitter users’ use in their tweets, such as “🙂”, which means “السعادة بشكل عام أو الود او عندما يكون الشخص ساخر أو عدواني بشكل سلبي” and in English means “general happiness or friendliness or when someone is being sarcastic or passive-aggressive”. Other emojis can have an emotional meaning and can be used to indicate their real meaning, such as “🐕”, which means “حيوانك الأليف او الكلب او الولاء والصداقة والرفقة و الثقة” and in English “your pet or dog or loyalty, friendship, companionship, trust”. So, to obtain all the emoji meanings, four websites [30,31,32,33] specialized in collecting emoji meanings were used to map each emoji to its meaning. These websites were also used to validate the consistency of the emoji meanings, providing cross-referencing with several emoji interpretation databases or dictionaries. Also, most emojis can have multiple meanings, so the strategy that was used for handling ambiguity and validating the emojis’ meaning included the addition of all the commonly shared emotional and real word meanings between Arabic-speaking users provided by [30,31] and English-speaking users provided by [32,33], supported by human judgment. 
In addition to all of this, the performance comparison with and without emoji encoding demonstrates that the inclusion of emoji meaning improves the robustness of the model as a final step in validating the emoji encoding process. The Emoji Meaning dataset provides a rich source of emotion and context that improves model performance by replacing each emoji in the ASTC dataset with its emotional and real meaning from the Emoji Meaning dataset. It is important to point to the fact that emoji interpretation may differ across different cultures; however, to address this issue, we focused in this study on utilizing and interpreting emojis according to their common norms globally among cultures. Nevertheless, we acknowledge that further investigation is still required to identify both common and uncommon emojis that may have various meanings depending on culture.
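The replacement of each emoji with its meaning can be sketched as a dictionary lookup. The snippet below is only an illustration: the function name `handle_emojis` and the two-entry dictionary are hypothetical (the meanings are the ones quoted in the Introduction), and it only handles single-code-point emojis, not multi-character sequences.

```python
# Hypothetical two-entry excerpt of an emoji-meaning dictionary; the Arabic
# meanings are the ones quoted in the Introduction.
EMOJI_MEANINGS = {
    "\u2764": "يحب شخص و الرومنسية و المودة",   # red heart
    "\U0001F600": "السعادة والإثارة بشكل عام",   # grinning face
}

VARIATION_SELECTOR = "\ufe0f"  # optional suffix that requests emoji rendering


def handle_emojis(text, meanings, encode=True):
    """Replace each known emoji with its meaning (encode=True) or drop it."""
    pieces = []
    for ch in text.replace(VARIATION_SELECTOR, ""):
        if ch in meanings:
            if encode:
                pieces.append(" " + meanings[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse the extra spaces introduced around the inserted meanings.
    return " ".join("".join(pieces).split())
```

Calling the function with `encode=False` corresponds to the emoji-removal setting, so the same helper can serve both sides of a with/without comparison.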

3.2. Data Pre-Processing

The ASTC dataset contains many duplicate rows, hashtags, and diacritics. Therefore, several preprocessing steps were required to clean the text and remove all tokens that do not contribute to the actual meaning of the text. In particular, the following steps were used in all the experiments we conducted to evaluate our proposed model:

Drop duplication: The ASTC consists of 56 K rows divided into training and testing parts. The training dataset contains 15,721 duplicate rows, while in the testing part there are 2678 duplicate rows, resulting in 18,399 rows being dropped.
Remove hashtags: Twitter users widely adopt the hashtag character “#” to bookmark their tweet content or join a topic or trend community [34,35]. Therefore, the hashtags were removed during this phase.
Remove diacritics: All diacritics were removed from the data because they did not affect the SA measurements [36]. For example, in “ٓ💔سمعت بكاه نهار العزا من بعد قفوا معزينه ليا”, which means in English “I heard him crying on the day of the funeral, then they stood up to offer their condolences to me 💔 ٓ”, diacritics such as Fatha in “ٓ” are used to give an aesthetic shape to the sentence without following the rules of the Arabic language, such as using the Fatha after the broken heart emoji: “💔 ٓ”.
Remove numbers: All numbers are removed from tweets because they do not reflect the sentiments contained in the text and are useless [37].
Removing stop words: stop words are frequently used in Arabic and English languages [36] and they have little semantic value [38]. Therefore, it is necessary to remove the Arabic stop words if only Arabic words are kept in the dataset and both Arabic and English stop words if both languages are used.
Tokenization: The TweetTokenizer from the NLTK library was used to break the text into tokens [39]. It is a simple and fast tokenizer that focuses on data from Twitter and works based on regular expressions [40]. Also, TweetTokenizer preserves the emojis and emoticons as tokens, which allows them to be handled appropriately, and it deals with the repeated characters by reducing them to a length of three [41,42]. This makes it suitable for the dataset and preprocessing experiments used.
Preprocessing is divided into three phases, based on removing non-Arabic words and punctuation in Experiment 1, keeping the non-Arabic words and removing the punctuation in Experiment 2, and keeping the non-Arabic words and the punctuation in Experiment 3. Then, each experiment is tested over eight preprocessing conditions, denoted by R1–R8 as shown in Figure 1, which are based on the options described in the following three steps (emoji handling, stemming, and embedding).
Handling Emojis: To study the effect of emojis, two approaches were followed; the first one involved removing the emojis from each tweet, while the second approach treated emojis by collecting the emotional and real meaning of emojis based on their usage on social media platforms and replacing each emoji with its textual meaning.
Stemming: This is a common morphological analysis that aims to reduce inflectional forms and achieve a common base form for words in sentences [43]. For the Arabic natural language, different stemmers can indicate the lexical root of the words. Therefore, the effect of using different stemmers in Arabic sentiment analysis was investigated by applying both the Information Science Research Institute’s (ISRI) and Snowball stemmers.
Embedding: Embedding provides a numerical representation for words and sentences by transforming each word into a numerical vector representation that captures the syntactic and semantic meaning based on its contextual usage in the dataset [44]. The effect of the transformation model was investigated by evaluating two transformation methods, Keras embedding and AraVec 3.0 embedding. Keras embedding is trainable and not a pre-trained model. This means that the embedding vector for each word was initialized randomly with small weights, and during back-propagation, the embedding vectors were updated to minimize the loss function [45]. On the other hand, AraVec 3.0 [46] is an open-source project that provides a powerful pre-trained model for Arabic word embedding transformation. The latest version of AraVec 3.0 has been trained on two Arabic content domains, namely tweets and Arabic Wikipedia articles, resulting in the provision of 16 different word-embedding models. This version also provides two types of models, unigrams and n-grams, and the most commonly used n-gram models are trained with a total of more than 1,169,075,128 tokens. In this research, the n-gram model was used to generate embedding with a vector size of 100.
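The cleaning steps above can be sketched with the standard library alone. This is an illustrative approximation rather than the authors' code: the function names are hypothetical, whitespace splitting stands in for the NLTK TweetTokenizer, removing the whole hashtag token (rather than only the "#" character) is an assumption, and stemming and embedding are left out. The last lines enumerate the eight conditions as two options for each of the three varied steps; their order is arbitrary and is not claimed to match R1–R8 in Figure 1.

```python
import re
from itertools import product

HASHTAG = re.compile(r"#\w+")                      # assumption: drop the whole hashtag token
DIACRITICS = re.compile(r"[\u064B-\u0653\u0670]")  # common Arabic diacritic marks
DIGITS = re.compile(r"\d+")                        # \d also matches Arabic-Indic digits


def drop_duplicates(tweets):
    """Keep the first occurrence of each tweet, preserving order."""
    return list(dict.fromkeys(tweets))


def clean_tweet(text, stop_words=frozenset()):
    """Remove hashtags, diacritics, numbers, and stop words from one tweet."""
    text = HASHTAG.sub(" ", text)
    text = DIACRITICS.sub("", text)
    text = DIGITS.sub(" ", text)
    # Whitespace splitting stands in for a tweet-aware tokenizer here.
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return " ".join(tokens)


# Two options for each of the three varied steps give 2 x 2 x 2 = 8 conditions.
# The order below is arbitrary and is not claimed to match R1-R8 in Figure 1.
CONDITIONS = list(product(
    ("remove emojis", "encode emojis"),
    ("ISRI stemmer", "Snowball stemmer"),
    ("Keras embedding", "AraVec 3.0 embedding"),
))
```

The stop-word set is passed in as a parameter so that the same function covers both the Arabic-only setting and the setting where Arabic and English stop words are removed together.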

3.3. A Combined Deep Learning Model

3.3.1. Convolutional Neural Network (CNN)

CNNs are a type of neural network designed to process and analyze data with a spatial structure [47]. CNNs are excellent at capturing spatial dependencies of targets and their environment, which makes them well-suited for tasks such as time-series prediction, image recognition, natural language processing, and audio signal pattern recognition [47,48]. In the proposed model, the CNN part is used to extract informative features from the input textual data, such as word combinations and patterns, since the convolutional layer uses learnable filters to extract features from the input data at different spatial locations [47].

3.3.2. Long Short-Term Memory (LSTM)

LSTMs are a type of RNN designed to deal with temporal dependencies, including text sequences and time series [47]. Standard RNNs face a problem during back-propagation, where the gradient of the error function can vanish or explode when there are many time steps [47]. To address this, a memory cell was added to the LSTM design, which addresses the vanishing or exploding gradient problem faced in RNNs by regulating the flow of information through the network [47]. Thus, LSTMs effectively handle sequential data with long-term dependencies, making them suitable for problems strongly related to time series analysis or natural language processing [48].

The basic structure for each LSTM unit consists of a memory cell and three gates, which are the input gate, which updates the memory cell with the fresh data, the forget gate, which takes the role of determining whether to keep the data or discard the data from the memory, and the output gate, which generates the next hidden state from the current memory cell [47,49]. Thus, these gates play a role in updating the current memory cell and the current hidden state [49].
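The gate description above corresponds to the widely used LSTM cell formulation, written out here as a sketch. The paper does not state which LSTM variant was used, so this notation is an assumption. Here $x_t$ is the input at time step $t$, $h_{t-1}$ the previous hidden state, $c_t$ the memory cell, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product. The matrices $W_*$, $U_*$ and biases $b_*$ are learned parameters, and the symbols are local to this block (they are unrelated to the $W$, $b$, $f$, and $h$ of the convolution equations).

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

The additive form of the memory cell update is what lets information and gradients pass across many time steps without being repeatedly squashed.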

3.3.3. CNN-LSTM

In this study, we developed a combined deep learning architecture specifically for Arabic sentiment analysis and classification. The workflow of the proposed combined CNN-LSTM model includes five stages which can be summarized as follows: the embedding layer, CNN layer, max pooling layer, LSTM layer, and output layer (as shown in Figure 3). The CNN-LSTM model is a hybrid model that combines the advantages of both CNNs and RNNs, specifically LSTM networks. This combination results in an effective model to capture both local and global dependencies in the text data, making it well-suited for sentiment analysis tasks.

The embedding layer takes the preprocessed text and transforms it into a vector representation so that the CNN-LSTM model can understand and process it effectively. The transformer depends on three parameters to generate the embedding, and these are input-dim to understand the vocabulary size of the dataset, output-dim to describe how the words will be embedded in a certain vector space, and input-length to show the input sequence length [50]. These determine the shape of the generated output from the embedding layer as (batch-size, input-length, output-dim), where the batch-size value of “None” is used for the dynamic batch size, which is common in the Keras implementation [47], while the value of the input-length varies depending on the preprocessing steps that affect the sequence length of the input data, and the output dimension is explored between 100 and 400, with a step of 50 units for Keras embedding and 100 for the AraVec, since it is a pre-trained model with a static vector size of 100. Thus, the shape of the embedding layer output will be different for different experiments and runs, depending on the input sequence length, which can vary depending on whether non-Arabic words, punctuation, or emojis are retained or removed, and the output dimension, which is determined during the hyperparameter tuning phase. Also, the final shape of the generated embeddings for all experiments is summarized in Table 1. In this research work, two types of embedding transformation methods were used to study their effect on the model performance: AraVec 3.0 and Keras embedding.
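To make the (batch-size, input-length, output-dim) shape concrete, the following dependency-free sketch mimics a trainable embedding table as a plain lookup. It is not the Keras implementation: the helper names are hypothetical, the initialization range is an illustrative choice, and the training updates are omitted. The last line lists the output-dimension values implied by the stated search range for Keras embedding.

```python
import random


def make_embedding(input_dim, output_dim, seed=0):
    """Create an input_dim x output_dim table of small random weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
            for _ in range(input_dim)]


def embed(batch, table):
    """Look up the vector of every token id in a batch of padded sequences."""
    return [[table[token_id] for token_id in sequence] for sequence in batch]


def shape(nested):
    """Return the shape of a regular nested list, e.g. (batch, length, dim)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Output dimensions explored for Keras embedding: 100 to 400 in steps of 50.
KERAS_OUTPUT_DIMS = list(range(100, 401, 50))
```

For example, a batch of two sequences padded to length 4 and embedded with `output_dim=100` has shape `(2, 4, 100)`; with AraVec the last dimension stays fixed at 100.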

Then, the CNN layer extracts local features [51] from the output of the embedding layer and feeds them to the max pooling layer. The features are extracted by sliding a convolutional filter (kernel) over the input matrix [50]. This results in an output shape of (None, Conv-input-length, Conv-filters), where None indicates the dynamic batch size and the values of Conv-input-length and the number of convolutional filters are summarized in Table 1 for all experiments. The number of filters in the Conv1D layer varied between 100 and 400, with increments of 100. Here, a convolutional filter is applied to a window of words X_{i:i+h-1}, where h is the window size, X_{i} is a K-dimensional vector, and X_{i:i+j} represents the input feature matrix that extends from the ith to the (i + j)th word of the sentence [52]. The window size, also called the kernel size, was tested with values of 2, 3, and 6 to capture different n-gram features. This results in a feature C_{i}, as given in Equation (1).

C_{i} = f(W \cdot X_{i:i+h-1} + b)

(1)

where W represents the convolutional filter, b is the bias, which is a real number, and f is the activation function [52]. Here, f is the ReLU activation function, which is applied to the output of each filter in the CNN layer and allows the model to learn complex patterns. A feature map is then generated by convolving a single filter over all windows of words, as in Equation (2), where n is the number of words in the sentence; the m filters in the convolutional layer therefore generate m(n − h + 1) features [52]. An L2 activity regularization of 0.01 is applied to prevent overfitting.

C = [C_{1}, C_{2}, \ldots, C_{n-h+1}]

(2)
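Equations (1) and (2) can be sketched for a single filter in plain Python. This is an illustrative implementation, assuming no padding (so that n − h + 1 windows are produced) and using hypothetical toy values for the sentence matrix X, the filter W, and the bias b.

```python
def relu(x):
    """ReLU activation, used here as f in Equation (1)."""
    return max(0.0, x)


def conv_feature_map(X, W, b):
    """Apply one filter W (h x K) to a sentence matrix X (n x K).

    Each feature is relu(sum(W * X[i:i+h]) + b), as in Equation (1);
    the returned list is the feature map of Equation (2), of length n - h + 1.
    """
    n, h, K = len(X), len(W), len(W[0])
    features = []
    for i in range(n - h + 1):
        window = X[i:i + h]
        total = sum(W[r][k] * window[r][k] for r in range(h) for k in range(K))
        features.append(relu(total + b))
    return features


# Toy sentence of n = 5 words with K = 2 dimensions and a window size h = 3.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, -1.0]]
W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
feature_map = conv_feature_map(X, W, b=0.1)
print(len(feature_map))  # 3, i.e., n - h + 1
```

With m such filters, each producing its own feature map, the layer yields m(n − h + 1) features in total.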

The max pooling layer, with a pool size of 2, selects the maximum feature value from each window of every CNN filter output while iterating across the matrix [53]. This reduces the output complexity while preserving the important features [51] that are fed to the LSTM layer. The output shape generated by the max pooling layer is (None, Conv-input-length/2, Conv-filters); Table 1 shows the output shapes of this layer for all experiments.
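A minimal sketch of the pooling step for one filter output is given below. It assumes non-overlapping windows (a stride equal to the pool size) and drops a trailing partial window, which is why the sequence length is roughly halved; the input values are hypothetical.

```python
def max_pool_1d(feature_map, pool_size=2):
    """Keep the maximum of each non-overlapping window of pool_size values.

    A trailing partial window is dropped, so the output length is
    len(feature_map) // pool_size.
    """
    return [max(feature_map[i:i + pool_size])
            for i in range(0, len(feature_map) - pool_size + 1, pool_size)]


pooled = max_pool_1d([0.2, 1.6, 0.0, 0.7, 0.9])
print(pooled)  # [1.6, 0.7]
```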

Then, the LSTM layer is used to handle long-term dependencies and capture the context that reflects the emotions in the text. The dropout rate of the LSTM layer was selected by the Keras tuner from values between 0.2 and 0.5, with a step of 0.1, to prevent overfitting. The number of LSTM units was searched between 30 and 300, with a step of 10; this tuning range provides a trade-off between model performance and computational efficiency. This layer was then followed by the Flatten layer to shape the features that were larger than the threshold. The output shapes of both the LSTM and Flatten layers are presented in Table 1.

The last layer, also called the fully connected layer [49], is a dense layer with one neuron and a sigmoid activation function; it generates a (None, 1) output shape and classifies the output of the LSTM layer as positive or negative sentiment. The Adam optimizer was used to train the hybrid model, with a learning rate sampled logarithmically between 1 × 10⁻⁵ and 1 × 10⁻³ to support training convergence.
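The way the output shapes propagate through the five stages can be traced with a short sketch. It rests on several assumptions that the text does not state explicitly: a convolution without padding, non-overlapping pooling, and an LSTM that returns only its last hidden state (so that the Flatten layer leaves the shape unchanged). The function name and the example values are hypothetical.

```python
def trace_shapes(input_length, output_dim, filters, kernel_size,
                 pool_size, lstm_units):
    """Return per-layer output shapes, with None standing for the batch axis."""
    conv_len = input_length - kernel_size + 1   # convolution without padding
    pool_len = conv_len // pool_size            # non-overlapping pooling
    return {
        "embedding": (None, input_length, output_dim),
        "conv1d": (None, conv_len, filters),
        "max_pooling": (None, pool_len, filters),
        "lstm": (None, lstm_units),             # last hidden state only (assumed)
        "dense": (None, 1),
    }


# Hypothetical configuration drawn from within the reported search ranges.
shapes = trace_shapes(input_length=40, output_dim=100, filters=200,
                      kernel_size=3, pool_size=2, lstm_units=100)
for layer, shape in shapes.items():
    print(layer, shape)
```

The actual shapes for each experiment and run are those reported in Table 1; the sketch only shows how they depend on the tuned values.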

Moreover, each layer in the CNN-LSTM model contributes a number of trainable parameters that are updated during model training. Table 2 presents the number of trainable parameters for each layer of the CNN-LSTM model for all experiments and runs, providing a clear picture of the model complexity and its capacity to learn from the data. The number of trainable parameters can be obtained automatically for each layer using the summary() function from Keras.
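For reference, the per-layer counts can also be derived by hand from the standard formulas for embedding, one-dimensional convolution, LSTM, and dense layers with bias terms. The sketch below uses hypothetical sizes (a vocabulary of 10,000 tokens and mid-range hyperparameters), so its totals are not the values of Table 2.

```python
def embedding_params(input_dim, output_dim):
    """One trainable vector of size output_dim per vocabulary entry."""
    return input_dim * output_dim


def conv1d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def lstm_params(input_size, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_size + units) + units)


def dense_params(input_size, outputs=1):
    """Weights plus one bias per output neuron."""
    return input_size * outputs + outputs


# Hypothetical configuration: vocab 10,000, embedding 100, 200 filters of size 3,
# 100 LSTM units, one sigmoid output neuron.
counts = {
    "embedding": embedding_params(10_000, 100),
    "conv1d": conv1d_params(3, 100, 200),
    "lstm": lstm_params(200, 100),
    "dense": dense_params(100),
}
total = sum(counts.values())
print(counts, total)
```

A pre-trained embedding that is kept frozen would contribute no trainable parameters, which is one reason the counts can differ between the Keras and AraVec runs.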

The final phase is model tuning using the Keras tuner [54], an easy-to-use framework that provides scalable hyperparameter optimization for deep learning models. The Keras tuner solves the pain points of hyperparameter search by using one of the built-in search algorithms (Random Search, Bayesian Optimization, and Hyperband) and configuring its search space using define-by-run syntax to find the best set of hyperparameter values for the model. In this research, the Keras Tuner was utilized with Bayesian Optimization to explore the hyperparameter space by focusing on promising regions, thereby reducing the number of trials required. The final selection of the hyperparameters is based on their improvement of the model performance in a 5-fold cross-validation. Thus, the best model was selected not only for its validation accuracy but also for its consistency across different folds.
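To give a sense of why a guided search is preferable to an exhaustive one, the size of the discrete part of the search space described in this section can be computed directly. The sketch below is an illustration only: the dictionary keys are hypothetical names, the embedding dimension range applies to Keras embedding (AraVec is fixed at 100), and the log-uniform sampler is a generic stand-in for logarithmic sampling rather than the Keras tuner's internal routine.

```python
import math
import random

# Discrete ranges as reported in the text; key names are illustrative.
search_space = {
    "embedding_dim": list(range(100, 401, 50)),   # Keras embedding only
    "conv_filters": list(range(100, 401, 100)),
    "kernel_size": [2, 3, 6],
    "lstm_dropout": [0.2, 0.3, 0.4, 0.5],
    "lstm_units": list(range(30, 301, 10)),
}
grid_size = math.prod(len(values) for values in search_space.values())
print(grid_size)  # 9408 discrete combinations, before the learning rate


def sample_learning_rate(rng, low=1e-5, high=1e-3):
    """Draw a learning rate uniformly on a log10 scale between low and high."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


rng = random.Random(0)
learning_rates = [sample_learning_rate(rng) for _ in range(5)]
```

Since the learning rate is continuous, the full space cannot be enumerated, and every trial additionally requires training under cross-validation.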

The five-fold cross-validation with the Keras tuner was used to validate the model and optimize the hyperparameters using five different validation folds, which allowed the tuner to explore different hyperparameter combinations and select those that make the model perform best for ASA. Moreover, during model training and validation, all dynamic batch sizes that were previously defined as None were set to 50, since this value achieved the best result after different values were tried manually. Then, the model with the best validation performance was evaluated on previously unseen test data using the same number of epochs (10).
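The fold construction behind a 5-fold cross-validation can be sketched as follows. The text does not specify whether the folds were shuffled or stratified, so the shuffling and the seed used here are assumptions, and the function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Toy example with 23 samples; every sample is used for validation exactly once.
splits = list(k_fold_indices(n_samples=23, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each candidate hyperparameter set would be trained on the training indices of a fold and scored on its validation indices, with the scores compared across the five folds.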

3.4. Model Evaluation

Several evaluation measures were used to assess the model performance in the Arabic sentiment classification task. The consistency of the CNN-LSTM model was checked by evaluating the results on the test data after training the hybrid model. Although accuracy is the most popular performance measure, it may not give the full picture [55]. Therefore, the precision, recall, and F1-score were also used to ensure a comprehensive evaluation.

Accuracy is a metric that provides an overall measure of how often the model correctly classifies sentiment. It represents the ratio of correctly predicted observations to the total number of observations and is calculated using Equation (3):

\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}}

(3)

Precision is an evaluation metric to determine the model performance by finding the ratio of observations correctly predicted as positive to the total predicted positives, as shown in Equation (4).

\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}

(4)

Recall is a popular metric that measures the consistency of the model’s performance by finding the ratio of observations correctly predicted as positive to all actual positives, as shown in Equation (5).

\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}

(5)

The F1-score is the harmonic mean of precision and recall, and is also an important metric to verify the test accuracy [56]. The calculation of the F1-score is shown in Equation (6).

\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(6)
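Equations (3)–(6) translate directly into code. The following sketch computes the four measures from confusion-matrix counts; the counts in the example are hypothetical, and the guards against empty denominators are an added convention rather than part of the definitions.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a test set of 100 samples.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```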

4. Results

The results demonstrate the performance when using a combined deep learning model in this context and the effect of each preprocessing step on the model performance, including the effect of removing and keeping the non-Arabic words and punctuation over eight combinations of preprocessing conditions. These groups of conditions expand the scope of the research to find the role of replacing the emojis with their meanings and the appropriate stemmer and embedding transformer for each group of preprocessing steps. The preprocessing conditions were grouped into eight groups from R1 to R8, as follows.

R1: ISRI stemmer, emoji removal, Keras embedding.
R2: ISRI stemmer, emoji encoding, Keras embedding.
R3: Snowball stemmer, emoji encoding, Keras embedding.
R4: ISRI stemmer, emoji encoding, AraVec 3.0 embedding.
R5: Snowball stemmer, emoji removal, Keras embedding.
R6: Snowball stemmer, emoji encoding, AraVec 3.0 embedding.
R7: Snowball stemmer, emoji removal, AraVec 3.0 embedding.
R8: ISRI stemmer, emoji removal, AraVec 3.0 embedding.
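The eight conditions correspond to all combinations of two stemmers, two emoji-handling strategies, and two embedding methods. A short sketch of this enumeration is given below; the variable names are hypothetical, and the order produced by the enumeration does not follow the R1–R8 labelling above.

```python
from itertools import product

stemmers = ["ISRI", "Snowball"]
emoji_handling = ["removal", "encoding"]
embeddings = ["Keras", "AraVec 3.0"]

# Every combination of the three binary choices: 2 x 2 x 2 = 8 conditions.
conditions = [
    {"stemmer": s, "emoji": e, "embedding": m}
    for s, e, m in product(stemmers, emoji_handling, embeddings)
]

# Each of the three experiments evaluates all eight conditions.
total_runs = 3 * len(conditions)
print(len(conditions), total_runs)  # 8 24
```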

After preprocessing the data and during model training and cross-validation, the Keras tuner used Bayesian Optimization to sample a set of hyperparameters and then trained the model based on these parameters. After that, the model performance was evaluated using the validation data. These operations were repeated for a predefined number of iterations. This procedure was carried out for each of the five cross-validation folds, and the model with the best accuracy was selected. Table 3 summarizes the best set of hyperparameters.

The best set of hyperparameters differs from experiment to experiment, with no single set of values recurring more often than the others. Therefore, we could not generalize one set of values to all experiments. Instead, for each experiment we used the best set of values found by the Keras tuner. The results of Experiments 1–3 are summarized in Tables 4, 5, and 6, respectively. All experiments were conducted on the Google Colab-L4 platform using Python version 3.10.12. The deep learning models were implemented using Keras version 3.4.1, running on TensorFlow version 2.17.0.

4.1. Experiment 1

In this experiment, the effect of removing non-Arabic words and punctuation was tested over eight conditions, resulting in eight experimental runs.

The results in Table 4 show that removing the emojis had a negative impact on the model performance, giving the lowest accuracies of 70.15%, 69.92%, 53.87%, and 53.87% in R1, R5, R7, and R8, respectively, while translating the emojis to their textual meaning improved the classification performance in R2, R3, R4, and R6, with accuracies of 90.23%, 91.69%, 87.32%, and 76.09%, respectively. Also, the results in R2 and R3 indicate that Keras embedding performs better than AraVec (R4 and R6) for both stemmers. Moreover, Keras embedding gives a better representation when removing the emojis (R1 and R5 compared with R8 and R7), which can be explained by the fact that the Keras embedding is trained on the same dataset.

4.2. Experiment 2

In this experiment, the impact of keeping the non-Arabic tokens and removing the punctuation was tested using eight combinations of parameters, resulting in eight experimental runs.

The results in Table 5 show that removing the emojis had a negative impact on the model performance, giving the lowest accuracies of 70.11%, 69.94%, 56.48%, and 56.07% in R1, R5, R7, and R8, respectively, while translating the emojis to their textual meaning improved the classification performance in R2, R3, R4, and R6, with accuracies of 89.81%, 91.85%, 78.08%, and 75.59%, respectively. The results in R2 and R3 again indicate that Keras embedding performs better than AraVec (R4 and R6) for both stemmers, and Keras embedding also gives a better representation when removing the emojis in R1 and R5. In addition, keeping the non-Arabic words in this experiment showed that the Snowball stemmer and Keras embedding deal with other languages better than the ISRI stemmer and AraVec embedding, as shown by the results in R2, R3, and R4.

4.3. Experiment 3

In this experiment, the effect of keeping the non-Arabic words and punctuation was tested over eight conditions, resulting in eight experimental runs.

Table 6 also shows the importance of translating the emojis, which provides a real improvement to the model results, and shows that Keras embedding outperforms the AraVec embedding. However, keeping the punctuation did not improve the results; instead, the results in Table 6 decreased relative to those in Table 5 for all runs except R2 and R6. These results suggest that Twitter users use punctuation to decorate text and do not follow the rules of the Arabic language, and they direct attention to the importance of removing the punctuation from tweets to obtain reliable results for SA. In this experiment, the best result of 90.43% was achieved in R3 when using Keras embedding, the Snowball stemmer, and emoji encoding, which reflects their ability to deal with punctuation and extract the emotions expressed by the emojis.

Figure 4 shows the confusion matrix of Experiment 2 R3, which achieved the best accuracy of all experiments. True positives (TP) and true negatives (TN) together account for 91.85% of the test samples, while false positives and false negatives account for 8.15%, indicating a strong performance with high TP and TN rates. Specifically, the model correctly classified 4199 of the 4454 negative samples and 3921 of the 4386 positive samples. These results suggest that our hybrid CNN-LSTM model effectively captures the nuances of the data, resulting in fewer misclassifications.
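As a check on these figures, the arithmetic can be written out from the counts stated above. The false positive (FP) and false negative (FN) counts are derived here by subtraction and are not quoted from the confusion matrix itself.

```latex
\begin{aligned}
TN &= 4199, & FP &= 4454 - 4199 = 255,\\
TP &= 3921, & FN &= 4386 - 3921 = 465,\\
\text{Accuracy} &= \frac{3921 + 4199}{4454 + 4386} = \frac{8120}{8840} \approx 0.91855,\\
\text{Error rate} &= \frac{255 + 465}{8840} = \frac{720}{8840} \approx 0.08145.
\end{aligned}
```

These values agree with the reported 91.85% and 8.15% up to rounding in the last digit.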

Figure 5 shows the Receiver Operating Characteristic curve (ROC curve) for the same experiment. The ROC curve is consistent with the confusion matrix in Figure 4: it rises steeply toward the upper left corner near the Y-axis, indicating a high true positive rate at a low false positive rate. This highlights the robustness of the model in distinguishing between positive and negative sentiments, with an area under the ROC curve of 91.84%.
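The area under the ROC curve can be interpreted as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it in that pairwise form; the labels and scores are hypothetical toy values, not model outputs from this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Each (positive, negative) pair counts 1 if the positive has the higher
    score, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: one positive sample is scored below one negative sample.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))  # 0.8889
```

The pairwise form is quadratic in the number of samples, so it is suited to illustration rather than to a test set of several thousand tweets.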

5. Discussion

In this research study, we present a combined deep-learning approach for the analysis and classification of Arabic tweets. Also, the role of preprocessing in improving the field of Arabic sentiment analysis was investigated by checking the model performance with different preprocessing groups to find the most suitable set of preprocessing steps for the tweet dataset. We also investigated the translation of emojis into their meanings to understand their importance in data preparation.

In the first experiment, all non-Arabic words and punctuation were removed. Then, the model was used to evaluate the different techniques for handling emojis, stemming, and embedding. The results in Table 4 show that removing emojis from the data resulted in poor classification accuracy in R1, R5, R7, and R8, whereas translating emojis into real and emotional meanings improved the model accuracy in R2, R3, R4, and R6, reaching 91.69% in R3 when using Snowball stemmer and Keras embedding. Also, using the ISRI stemmer in R2 gave a close result of 90.23%. In R4 and R6, the pre-trained AraVec 3.0 had less effect on improving the model results, with an accuracy of 87.32% and 76.09%, respectively.

In the second experiment, the results in Table 5 suggest that, compared with Table 4, keeping the non-Arabic words had no positive effect when emoji encoding was combined with the ISRI stemmer or with AraVec 3.0 embedding (R2, R4, and R6), whereas it improved the results in R3 and R5, which use the Snowball stemmer and Keras embedding. This indicates that the combination of the Snowball stemmer and Keras embedding can handle both the emotions conveyed by the emojis and the words written in other languages, and can use them to build a fuller vector representation, while the ISRI stemmer and AraVec embedding could not exploit the non-Arabic words to improve the classification results, especially when using emoji encoding. This is because AraVec is a pre-trained model trained on Arabic tweets and texts from Wikipedia, so the presence of non-Arabic words affects its transformation performance, whereas the Keras embedding is trained on the same dataset, which helps it to better represent the emotions from the emojis and the non-Arabic words.

The best result over all experiments, 91.85%, was therefore achieved in Exp. 2 R3 by keeping the non-Arabic words. Such words often carry significant sentiment information that contributes to the overall meaning of a post and can act as strong sentiment indicators, and retaining them improved the classification accuracy in R3 from 91.69% to 91.85%. For example, a tweet containing the phrase “I love” would likely indicate a positive sentiment. Removing these tokens would remove important context from the post, potentially leading to misclassification. Also, AraVec 3.0 embedding was slightly positively affected by keeping the non-Arabic words and removing the emojis compared to Experiment 1 R7 and R8. This can be explained by the fact that removing the emojis helps AraVec to provide a vector representation for the tweets with an output dimension of 100, while Keras embedding uses the output dimension selected by the Keras tuner and generates more meaningful embeddings.

In the third experiment, non-Arabic words and punctuation were retained, and the effect of emoji removal and emoji encoding on the model was the same as in Experiments 1 and 2: the accuracy obtained when the emojis were removed improved when each emoji was replaced with its meaning. The effect of punctuation was also tested in this experiment by comparing the model performance when keeping both the punctuation and the non-Arabic words with the performance when keeping the non-Arabic words and removing the punctuation. The results show that keeping the punctuation had a negative effect on the model accuracy for all runs except R2 and R6, whose results improved; the decrease was most notable in R3, which provided the best accuracy in Exp. 2. These results suggest that Twitter users use punctuation indiscriminately. Thus, removing the punctuation provides a more reliable and consistent model.

The results obtained by the proposed approach on the ASTC dataset were compared with the results of a different approach applied to the same dataset. The comparison was made with the study in [25] and covers the differences in the study aim, preprocessing steps, and classification model, as shown in Table 7.

The proposed model shows results comparable with those obtained in “Heterogeneous Ensemble Deep Learning Model for Enhanced Arabic Sentiment Analysis” [25], which used emoji Unicode translation and CBOW word embedding to generate a numerical representation of the text, and RNN, LSTM, and GRU models combined with three meta-learners (LR, RF, and SVM) to classify the tweets. The model investigated in [25] aimed to improve the performance of Arabic sentiment prediction. Its data preprocessing started with cleaning the data by removing non-Arabic letters, digits, single Arabic letters, symbols, URLs, emails, and hashtags. Then, tokenization was carried out by splitting the text on spaces, followed by the removal of stop words, stemming with the ISRI stemmer, and emoji Unicode translation. In contrast, the proposed work investigated the hybrid CNN-LSTM model with different data preprocessing steps to achieve comparable accuracy and highlighted the effect of encoding emojis into their emotional and real meaning, as well as the effects of non-Arabic words, punctuation, Arabic stemmers, and trainable and pre-trained embeddings. This research shows the compatibility between the Snowball stemmer, Keras embedding, and the CNN-LSTM model, and shows how keeping the non-Arabic words improved the model, while keeping the punctuation had a negative effect on it. Moreover, both the study in [25] and our approach showed the role of emoji meaning in enriching the sentiment of the text, achieving accuracies of 92.22% and 91.85%, respectively. In addition, our approach compared the results obtained when removing the emojis with those obtained when encoding them, which supports the validity of the meanings in the generated emoji meaning dataset.

6. Conclusions and Future Work

The most important step in applying any machine learning or deep learning model is data preprocessing, as it plays a role in building real and accurate models. Therefore, the main goal of this research was to investigate the importance of providing the real and emotional meaning of emojis in sentiment analysis and of finding the best combination of preprocessing steps to enhance the model. It also aimed to show the effect of the presence of punctuation and non-Arabic words in ASA.

This research contributes to the improvement of ASA and shows that emoji encoding had the largest effect on the results, since social media users enrich their tweets and posts with emotional cues by using emojis. Also, including the non-Arabic words together with Keras embedding and the Snowball stemmer gave the best preprocessing combination, achieving the highest accuracy of 91.85% with the CNN-LSTM model.

Given the promising improvements in Arabic Sentiment Analysis (ASA) achieved through advanced data preprocessing and emoji encoding techniques, future research will focus on integrating these steps with other pre-trained models such as AraBERT, GloVe, and MARBERT. This investigation aims to evaluate and compare their performance against the embedding methods used in this study, potentially uncovering more efficient models for enhanced sentiment analysis outcomes. We would also like to explore the effect of integrating insights from speech synthesis, as in [57], into text-based sentiment analysis models, which could lead to the development of hybrid models capable of understanding both text and speech data.

Author Contributions

Conceptualization, H.A.; methodology, H.A.; software, H.A.; validation, H.A., A.H. and M.M.; formal analysis, H.A.; investigation, H.A.; data curation, H.A.; writing—original draft preparation, H.A.; writing—review and editing, A.H. and M.M.; visualization, H.A.; supervision, A.H. and M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The ASTC dataset supporting the findings in this research is available from the link in the dataset citation. On the other hand, the Emoji Meaning dataset is available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Diwali, A.; Saeedi, K.; Dashtipour, K.; Gogate, M.; Cambria, E.; Hussain, A. Sentiment Analysis Meets Explainable Artificial Intelligence: A Survey on Explainable Sentiment Analysis. IEEE Trans. Affect. Comput. 2023, 15, 837–846.
2. Saberi, B.; Saad, S. Sentiment analysis or opinion mining: A review. Int. J. Adv. Sci. Eng. Inf. Technol. 2017, 7, 1660–1666.
3. Abdelfattah, M.F.; Fakhr, M.W.; Rizka, M.A. ArSentBERT: Fine-tuned bidirectional encoder representations from transformers model for Arabic sentiment classification. Bull. Electr. Eng. Inform. 2023, 12, 1196–1202.
4. Mohammed, A.; Kora, R. Deep learning approaches for Arabic sentiment analysis. Soc. Netw. Anal. Min. 2019, 9, 52.
5. Abdelwahab, Y.; Kholief, M.; Sedky, A.A.H. Justifying Arabic Text Sentiment Analysis Using Explainable AI (XAI): LASIK Surgeries Case Study. Information 2022, 13, 536.
6. Oueslati, O.; Cambria, E.; Ben HajHmida, M.; Ounelli, H. A review of sentiment analysis research in Arabic language. Future Gener. Comput. Syst. 2020, 112, 408–430.
7. Al Shamsi, A.A.; Abdallah, S. Ensemble Stacking Model for Sentiment Analysis of Emirati and Arabic Dialects. J. King Saud. Univ.-Comput. Inf. Sci. 2023, 35, 101691.
8. Elnagar, A.; Einea, O.; Lulu, L. Comparative study of sentiment classification for automated translated Latin reviews into Arabic. In Proceedings of the IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), Hammamet, Tunisia, 30 October–3 November 2017; pp. 443–448.
9. Al-Azani, S.; El-Alfy, E.S.M. Combining emojis with Arabic textual features for sentiment classification. In Proceedings of the 2018 9th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 3–5 April 2018; pp. 139–144.
10. Novak, P.K.; Smailović, J.; Sluban, B.; Mozetič, I. Sentiment of Emojis. PLoS ONE 2015, 10, e0144296.
11. Soleymani, M.; Garcia, D.; Jou, B.; Schuller, B.; Chang, S.F.; Pantic, M. A survey of multimodal sentiment analysis. Image Vis. Comput. 2017, 65, 3–14.
12. Li, W.; Zhu, L.; Shi, Y.; Guo, K.; Cambria, E. User reviews: Sentiment analysis using lexicon integrated two-channel CNN–LSTM family models. Appl. Soft Comput. 2020, 94, 106435.
13. Alayba, A.M.; Palade, V. Leveraging Arabic sentiment classification using an enhanced CNN-LSTM approach and effective Arabic text preparation. J. King Saud. Univ.-Comput. Inf. Sci. 2022, 34, 9710–9722.
14. Alayba, A.M.; Palade, V.; England, M.; Iqbal, R. Arabic language sentiment analysis on health services. In Proceedings of the 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; pp. 114–118.
15. Abdulla, N.A.; Ahmed, N.A.; Shehab, M.A.; Al-Ayyoub, M. Arabic sentiment analysis: Lexicon-based and corpus-based. In Proceedings of the 2013 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), Amman, Jordan, 3–5 December 2013; pp. 1–6.
16. Nabil, M.; Aly, M.; Atiya, A. Astd: Arabic sentiment tweets dataset. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 2515–2519. Available online: https://aclanthology.org/D15-1299.pdf (accessed on 29 April 2024).
17. Hengle, A.; Kshirsagar, A.; Desai, S.; Marathe, M. Combining Context-Free and Contextualized Representations for Arabic Sarcasm Detection and Sentiment Identification. arXiv 2021, arXiv:2103.05683. Available online: https://arxiv.org/abs/2103.05683v1 (accessed on 7 September 2023).
18. Jalil, A.A.; Aliwy, A.H. Classification of Arabic Social Media Texts Based on a Deep Learning Multi-Tasks Model. Al-Bahir J. Eng. Pure Sci. 2023, 2, 12.
19. Sabbeh, S.F.; Fasihuddin, H.A. A Comparative Analysis of Word Embedding and Deep Learning for Arabic Sentiment Classification. Electronics 2023, 12, 1425.
20. Gharaibeh, H.; Al Mamlook, R.E.; Samara, G.; Nasayreh, A.; Smadi, S.; Nahar, K.M.; Aljaidi, M.; Al-Daoud, E.; Gharaibeh, M.; Abualigah, L. Arabic sentiment analysis of Monkeypox using deep neural network and optimized hyperparameters of machine learning algorithms. Soc. Netw. Anal. Min. 2024, 14, 30.
21. Nayel, H.; Amer, E.; Allam, A.; Abdallah, H. Machine Learning-Based Model for Sentiment and Sarcasm Detection. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kiev, Ukraine, 19 April 2021; pp. 386–389. Available online: https://aclanthology.org/2021.wanlp-1.51 (accessed on 7 September 2023).
22. Wadhawan, A. AraBERT and Farasa Segmentation Based Approach for Sarcasm and Sentiment Detection in Arabic Tweets. arXiv 2021, arXiv:2103.01679. Available online: https://arxiv.org/abs/2103.01679v1 (accessed on 7 September 2023).
23. Al-Azani, S.; El-Alfy, E.S.M. Emoji-Based Sentiment Analysis of Arabic Microblogs Using Machine Learning. In Proceedings of the 21st Saudi Computer Society National Computer Conference (NCC), Riyadh, Saudi Arabia, 25–26 April 2018; pp. 1–6.
24. Arifiyanti, A.A.; Wahyuni, E.D. Emoji and emoticon in tweet sentiment classification. In Proceedings of the 6th Information Technology International Seminar (IT IS), Surabaya, Indonesia, 14–16 October 2020; pp. 145–150.
25. Saleh, H.; Mostafa, S.; Alharbi, A.; El-Sappagh, S.; Alkhalifah, T. Heterogeneous Ensemble Deep Learning Model for Enhanced Arabic Sentiment Analysis. Sensors 2022, 22, 3707.
26. Surikov, A.; Egorova, E. Alternative method sentiment analysis using emojis and emoticons. Procedia Comput. Sci. 2020, 178, 182–193.
27. Al-Azani, S.; El-Alfy, E.S. Emojis-based sentiment classification of Arabic microblogs using deep recurrent neural networks. In Proceedings of the 2018 International Conference on Computing Sciences and Engineering (ICCSE), Kuwait City, Kuwait, 11–13 March 2018; pp. 1–6.
28. Chen, Y.; You, Q.; Yuan, J.; Luo, J. Twitter sentiment analysis via bi-sense emoji embedding and attention-based LSTM. In Proceedings of the 2018 ACM Multimedia Conference (MM 2018), Seoul, Republic of Korea, 22–26 October 2018; pp. 117–125.
29. Arabic Sentiment Twitter Corpus. Available online: https://www.kaggle.com/datasets/mksaad/arabic-sentiment-twitter-corpus/data?select=arabic_tweets (accessed on 31 March 2024).
30. EmojiGuide. Available online: https://ar.emojiguide.com/ (accessed on 9 April 2024).
31. EmojiAll. Available online: https://www.emojiall.com/ar (accessed on 9 April 2024).
32. Symbol Planet. Available online: https://symbolplanet.com/smileys-emotion-emoji-meanings/ (accessed on 9 April 2024).
33. wikiHow. Available online: https://www.wikihow.com/Category:Emoticons-and-Emojis (accessed on 9 April 2024).
34. Ma, Z.; Sun, A.; Yuan, Q.; Cong, G. Tagging your tweets: A probabilistic modeling of hashtag annotation in twitter. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 3 November 2014; pp. 999–1008.
35. Yang, L.; Sun, T.; Zhang, M.; Mei, Q. We know what @you #tag: Does the dual role affect hashtag adoption? In Proceedings of the 21st Annual Conference on World Wide Web (WWW), Lyon, France, 16–20 April 2012; pp. 261–270.
36. Khalid Bolbol, N.; Maghari, A.Y. Sentiment analysis of arabic tweets using supervised machine learning. In Proceedings of the 2020 International Conference on Promising Electronic Technologies (ICPET), Jerusalem, Palestine, 16–17 December 2020; pp. 89–93.
37. Khamphakdee, N.; Seresangtakul, P. An Efficient Deep Learning for Thai Sentiment Analysis. Data 2023, 8, 90.
38. Al-Helalat, M. Enhanced arabic information retrieval for informed decision-making: Empowering political search. Int. J. Progress. Res. Eng. Manag. Sci. (IJPREMS) 2023, 3, 232–240. Available online: https://www.ijprems.com/uploadedfiles/paper/issue_7_july_2023/31816/final/fin_ijprems1689480149.pdf (accessed on 10 May 2024).
39. Gurusamy, V.; Professor, A. Preprocessing Techniques for Text Mining. Int. J. Comput. Sci. Commun. Netw. 2014, 5, 7–16.
40. Van Der Goot, R. Where are we Still Split on Tokenization? In Findings of the Association for Computational Linguistics: EACL; Association for Computational Linguistics: St. Julian’s, Malta, 2024; pp. 118–137. Available online: https://aclanthology.org/2024.findings-eacl.9 (accessed on 27 April 2024).
41. Bird, S. NLTK: The natural language toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions; Association for Computational Linguistics: Sydney, Australia, 2006; pp. 69–72. Available online: https://aclanthology.org/P06-4018.pdf (accessed on 27 April 2024).
42. Islam, J.; Mercer, R.E.; Xiao, L. Multi-Channel Convolutional Neural Network for Twitter Emotion and Sentiment Recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 1355–1365.
43. Maree, M.; Eleyat, M.; Rabayah, S.; Belkhatir, M. A hybrid composite features based sentence level sentiment analyzer. IAES Int. J. Artif. Intell. 2023, 12, 284–294.
44. Radwan, A.; Amarneh, M.; Alawneh, H.; Ashqar, H.I.; AlSobeh, A.; Magableh, A.A.A.R. Predictive Analytics in Mental Health Leveraging LLM Embeddings and Machine Learning Models for Social Media Analysis. Int. J. Web Serv. Res. 2024, 21, 1–22.
45. Gulli, A.; Pal, S. Deep Learning with Keras; Packt Publishing Ltd.: Birmingham, UK, 2017; Available online: https://scholar.google.com/scholar_lookup?title=Deep+Learning+with+KERAS&author=Gulli,+A.&author=Pal,+S.&publication_year=2017 (accessed on 9 May 2024).
46. Soliman, A.B.; Eissa, K.; El-Beltagy, S.R. AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP. Procedia Comput. Sci. 2017, 117, 256–265.
47. Bin Syed, M.A.; Ahmed, I. A CNN-LSTM Architecture for Marine Vessel Track Association Using Automatic Identification System (AIS) Data. Sensors 2023, 23, 6400.
48. Hu, F.; Yang, Q.; Yang, J.; Luo, Z.; Shao, J.; Wang, G. Incorporating multiple grid-based data in CNN-LSTM hybrid model for daily runoff prediction in the source region of the Yellow River Basin. J. Hydrol. Reg. Stud. 2024, 51, 101652.
49. Ghourabi, A.; Mahmood, M.A.; Alzubi, Q.M. A Hybrid CNN-LSTM Model for SMS Spam Detection in Arabic and English Messages. Future Internet 2020, 12, 156.
50. Saleh, H.; Mostafa, S.; Gabralla, L.A.; Aseeri, A.O.; El-Sappagh, S. Enhanced Arabic Sentiment Analysis Using a Novel Stacking Ensemble of Hybrid and Deep Learning Models. Appl. Sci. 2022, 12, 8967.
51. Rehman, A.U.; Malik, A.K.; Raza, B.; Ali, W. A Hybrid CNN-LSTM Model for Improving Accuracy of Movie Reviews Sentiment Analysis. Multimed. Tools Appl. 2019, 78, 26597–26613.
52. Khan, L.; Amjad, A.; Afaq, K.M.; Chang, H.T. Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman Urdu Text Shared in Social Media. Appl. Sci. 2022, 12, 2694.
53. Behera, R.K.; Jena, M.; Rath, S.K.; Misra, S. Co-LSTM: Convolutional LSTM model for sentiment analysis in social big data. Inf. Process Manag. 2021, 58, 102435.
KerasTuner. Available online: https://keras.io/keras_tuner/ (accessed on 12 April 2024).
Alawneh, H.; Hasasneh, A. Survival Prediction of Children after Bone Marrow Transplant Using Machine Learning Algorithms. Int. Arab. J. Inf. Technol. 2024, 21, 394–407. [Google Scholar] [CrossRef]
Islam, M.A.; Iacob, I.E. Manuscripts Character Recognition Using Machine Learning and Deep Learning. Modelling 2023, 4, 168–188. [Google Scholar] [CrossRef]
Al-Radhi, M.S.; Abdo, O.; Csapó, T.G.; Abdou, S.; Németh, G.; Fashal, M. A continuous vocoder for statistical parametric speech synthesis and its evaluation using an audio-visual phonetically annotated Arabic corpus. Comput. Speech Lang. 2020, 60, 101025. [Google Scholar] [CrossRef]

Figure 1. The workflow of the proposed model for Arabic sentiment analysis.

Figure 2. The number of positive and negative tweets.

Figure 3. The proposed CNN-LSTM model architecture for Arabic sentiment analysis.

Figure 4. Confusion matrix for Experiment 2, run R3.

Figure 5. ROC curve for Experiment 2, run R3.

Table 1. The output shape of each layer for the 1D CNN-LSTM model.

Exp Num	Run Num	Embedding Shape	Convolutional Layer	Max Pooling	LSTM Layer	Flatten
Exp 1	R1	(None, 1189, 150)	(None, 1187, 400)	(None, 593, 400)	(None, 593, 250)	(None, 148,250)
	R2	(None, 1956, 300)	(None, 1954, 100)	(None, 977, 100)	(None, 977, 80)	(None, 78,160)
	R3	(None, 1955, 150)	(None, 1953, 400)	(None, 976, 400)	(None, 976, 170)	(None, 165,920)
	R4	(None, 1956, 100)	(None, 1955, 200)	(None, 977, 200)	(None, 977, 130)	(None, 127,010)
	R5	(None, 968, 200)	(None, 967, 300)	(None, 483, 300)	(None, 483, 100)	(None, 48,300)
	R6	(None, 1955, 100)	(None, 1953, 400)	(None, 976, 400)	(None, 976, 200)	(None, 195,200)
	R7	(None, 968, 100)	(None, 963, 300)	(None, 481, 300)	(None, 481, 190)	(None, 91,390)
	R8	(None, 969, 100)	(None, 964, 100)	(None, 482, 100)	(None, 482, 120)	(None, 57,840)
Exp 2	R1	(None, 1328, 400)	(None, 1327, 200)	(None, 663, 200)	(None, 663, 300)	(None, 198,900)
	R2	(None, 2129, 150)	(None, 2128, 400)	(None, 1064, 400)	(None, 1064, 270)	(None, 287,280)
	R3	(None, 2128, 350)	(None, 2126, 400)	(None, 1063, 400)	(None, 1063, 230)	(None, 244,490)
	R4	(None, 2129, 100)	(None, 2128, 100)	(None, 1064, 100)	(None, 1064, 110)	(None, 117,040)
	R5	(None, 1141, 150)	(None, 1139, 400)	(None, 569, 400)	(None, 569, 160)	(None, 91,040)
	R6	(None, 2128, 100)	(None, 2127, 200)	(None, 1063, 200)	(None, 1063, 150)	(None, 159,450)
	R7	(None, 1141, 100)	(None, 1140, 400)	(None, 570, 400)	(None, 570, 190)	(None, 108,300)
	R8	(None, 1142, 100)	(None, 1141, 200)	(None, 570, 200)	(None, 570, 280)	(None, 159,600)
Exp 3	R1	(None, 1167, 300)	(None, 1166, 100)	(None, 583, 100)	(None, 583, 220)	(None, 128,260)
	R2	(None, 2048, 350)	(None, 2046, 400)	(None, 1023, 400)	(None, 1023, 70)	(None, 71,610)
	R3	(None, 2128, 400)	(None, 2127, 300)	(None, 1063, 300)	(None, 1063, 70)	(None, 74,410)
	R4	(None, 2221, 100)	(None, 2219, 400)	(None, 1109, 400)	(None, 1109, 130)	(None, 144,170)
	R5	(None, 1141, 400)	(None, 1139, 100)	(None, 569, 100)	(None, 569, 90)	(None, 51,210)
	R6	(None, 2128, 100)	(None, 2126, 200)	(None, 1063, 200)	(None, 1063, 230)	(None, 244,490)
	R7	(None, 1141, 100)	(None, 1136, 100)	(None, 568, 100)	(None, 568, 80)	(None, 45,440)
	R8	(None, 1167, 100)	(None, 1166, 200)	(None, 583, 200)	(None, 583, 230)	(None, 134,090)
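
The shapes in Table 1 can be reproduced from the tuned hyperparameters in Table 3 with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: it assumes a stride-1 convolution without padding, a max-pooling window of 2, and an LSTM that returns its full output sequence, all of which are inferred from the tabulated numbers rather than stated alongside the table. The function name `layer_shapes` is hypothetical.

```python
def layer_shapes(seq_len, embed_dim, filters, kernel_size, lstm_units, pool_size=2):
    """Return per-layer output shapes (batch dimension omitted).

    Assumptions (inferred from Table 1, not confirmed by the text):
    - Conv1D uses stride 1 and no padding, so length shrinks by kernel_size - 1.
    - MaxPooling1D uses a window of `pool_size` and floors the result.
    - The LSTM returns the whole sequence, which is then flattened.
    """
    conv_len = seq_len - kernel_size + 1
    pool_len = conv_len // pool_size
    return {
        "embedding": (seq_len, embed_dim),
        "conv": (conv_len, filters),
        "max_pool": (pool_len, filters),
        "lstm": (pool_len, lstm_units),
        "flatten": (pool_len * lstm_units,),
    }


# Experiment 2, run R3: sequence length 2128, embedding dim 350,
# 400 filters with kernel size 3, 230 LSTM units.
shapes = layer_shapes(seq_len=2128, embed_dim=350, filters=400,
                      kernel_size=3, lstm_units=230)
for name, shape in shapes.items():
    print(f"{name:>9}: {shape}")
```

For this run the computed flatten size is 1063 × 230 = 244,490, which matches the last column of the Exp 2, R3 row.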

Table 2. The number of trainable parameters for each CNN-LSTM layer.

Exp Num	Run Num	Embedding Param	Conv Layer Param	LSTM Layer Param	Dense Layer Param
Exp 1	R1	10,774,200	180,400	651,000	148,251
	R2	5,438,400	90,100	57,920	78,161
	R3	4,934,700	180,400	388,280	165,921
	R4	147,671,600	40,200	172,120	127,011
	R5	6,217,000	120,300	160,400	48,301
	R6	147,671,600	120,400	480,800	195,201
	R7	147,671,600	180,300	373,160	91,391
	R8	147,671,600	60,100	106,080	57,841
Exp 2	R1	30,356,400	160,200	601,200	198,901
	R2	2,938,350	120,400	724,680	287,281
	R3	12,019,700	420,400	580,520	244,491
	R4	147,671,600	20,100	92,840	117,041
	R5	4,897,950	180,400	359,040	91,041
	R6	147,671,600	40,200	210,600	159,451
	R7	147,671,600	80,400	449,160	108,301
	R8	147,671,600	40,200	538,720	159,601
Exp 3	R1	5,450,400	60,100	282,480	128,261
	R2	6,902,000	420,400	131,880	71,611
	R3	13,765,200	240,300	103,880	74,411
	R4	147,671,600	120,400	276,120	144,171
	R5	13,086,800	120,100	68,760	51,211
	R6	147,671,600	60,200	396,520	244,491
	R7	147,671,600	60,100	57,920	45,441
	R8	147,671,600	40,200	396,520	134,091
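
The parameter counts in Table 2 are consistent with the standard formulas for embedding, 1D convolution, LSTM, and dense layers. The following sketch checks them for Experiment 2, run R3. It is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the dense layer is assumed to have a single output unit, and the vocabulary size of 34,342 is obtained by dividing the tabulated embedding parameter count by the embedding dimension rather than taken from the text.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry.
    return vocab_size * embed_dim


def conv1d_params(in_channels, filters, kernel_size):
    # One kernel of size (kernel_size x in_channels) per filter, plus one bias each.
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, outputs=1):
    # Fully connected weights plus one bias per output unit.
    return input_dim * outputs + outputs


# Experiment 2, run R3 (hyperparameters from Table 3, flatten size from Table 1).
# The vocabulary size is an inferred value: 12,019,700 / 350.
counts = {
    "embedding": embedding_params(vocab_size=34342, embed_dim=350),
    "conv": conv1d_params(in_channels=350, filters=400, kernel_size=3),
    "lstm": lstm_params(input_dim=400, units=230),
    "dense": dense_params(input_dim=244490),
}
for layer, n in counts.items():
    print(f"{layer:>9}: {n:,}")
```

The four values agree with the Exp 2, R3 row of Table 2 (12,019,700; 420,400; 580,520; 244,491).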

Table 3. Best hyperparameters values determined by the Keras tuner.

Exp Num	Run	Embedding Output Dim	Convolutional Filters	Convolutional Kernel Size	LSTM Units	LSTM Dropout	Learning Rate
Exp 1	R1	150	400	3	250	0.3	0.00049751
	R2	300	100	3	80	0.3	0.00017099
	R3	150	400	3	170	0.2	0.00022555
	R4	100	200	2	130	0.4	0.00023515
	R5	200	300	2	100	0.4	0.00014636
	R6	100	400	3	200	0.3	0.00001149
	R7	100	300	6	190	0.3	0.00002469
	R8	100	100	6	120	0.4	0.00002816
Exp 2	R1	400	200	2	300	0.2	0.00016625
	R2	150	400	2	270	0.2	0.00007771
	R3	350	400	3	230	0.4	0.00053589
	R4	100	100	2	110	0.2	0.00007517
	R5	150	400	3	160	0.3	0.00019196
	R6	100	200	2	150	0.3	0.00043968
	R7	100	400	2	190	0.2	0.00007383
	R8	100	200	2	280	0.2	0.000144206
Exp 3	R1	300	100	2	220	0.2	0.00028067
	R2	350	400	3	70	0.4	0.00027495
	R3	400	300	2	70	0.2	0.00022570
	R4	100	400	3	130	0.3	0.00003997
	R5	400	100	3	90	0.2	0.00009749
	R6	100	200	3	230	0.2	0.00001491
	R7	100	100	6	80	0.4	0.00007121
	R8	100	200	2	230	0.2	0.00011200

Table 4. Experiment 1 results.

Run	Stemmer	Emoji	Embedding	Precision	Recall	F1-Score	Accuracy
R1	ISRI	Remove Emoji	Keras embedding	72%	70%	70%	70.15%
R2	ISRI	Encoding to Arabic	Keras embedding	90%	90%	90%	90.23%
R3	Snowball	Encoding to Arabic	Keras embedding	91%	91%	91%	91.69%
R4	ISRI	Encoding to Arabic	AraVec 3.0	87%	87%	87%	87.32%
R5	Snowball	Remove Emoji	Keras embedding	70%	70%	70%	69.92%
R6	Snowball	Encoding to Arabic	AraVec 3.0	76%	76%	76%	76.09%
R7	Snowball	Remove Emoji	AraVec 3.0	54%	54%	53%	53.87%
R8	ISRI	Remove Emoji	AraVec 3.0	54%	53%	53%	53.55%

Table 5. Experiment 2 results.

Run	Stemmer	Emoji	Embedding	Precision	Recall	F1-Score	Accuracy
R1	ISRI	Remove Emoji	Keras embedding	70%	70%	70%	70.11%
R2	ISRI	Encoding to Arabic	Keras embedding	90%	90%	90%	89.81%
R3	Snowball	Encoding to Arabic	Keras embedding	92%	92%	92%	91.85%
R4	ISRI	Encoding to Arabic	AraVec 3.0	78%	78%	78%	78.08%
R5	Snowball	Remove Emoji	Keras embedding	70%	70%	70%	69.94%
R6	Snowball	Encoding to Arabic	AraVec 3.0	76%	75%	75%	75.59%
R7	Snowball	Remove Emoji	AraVec 3.0	57%	57%	56%	56.48%
R8	ISRI	Remove Emoji	AraVec 3.0	57%	56%	56%	56.07%
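
To make the effect of emoji handling in Table 5 easier to see, the accuracies can be averaged by emoji strategy. The sketch below does this with the Experiment 2 values copied from the table; the grouping itself is an illustrative summary added here, not an analysis reported by the authors, and the function name `mean_accuracy_by_emoji` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (run, emoji handling, accuracy in %) copied from Table 5.
TABLE_5 = [
    ("R1", "Remove Emoji", 70.11),
    ("R2", "Encoding to Arabic", 89.81),
    ("R3", "Encoding to Arabic", 91.85),
    ("R4", "Encoding to Arabic", 78.08),
    ("R5", "Remove Emoji", 69.94),
    ("R6", "Encoding to Arabic", 75.59),
    ("R7", "Remove Emoji", 56.48),
    ("R8", "Remove Emoji", 56.07),
]


def mean_accuracy_by_emoji(rows):
    """Average the accuracy of all runs that share an emoji strategy."""
    groups = defaultdict(list)
    for _run, emoji, accuracy in rows:
        groups[emoji].append(accuracy)
    return {emoji: mean(values) for emoji, values in groups.items()}


summary = mean_accuracy_by_emoji(TABLE_5)
gap = summary["Encoding to Arabic"] - summary["Remove Emoji"]
for emoji, accuracy in summary.items():
    print(f"{emoji}: {accuracy:.2f}%")
print(f"Gap: {gap:.2f} percentage points")
```

Across the eight runs of Experiment 2, the runs that encode emojis into Arabic text average roughly 84% accuracy, compared with roughly 63% for the runs that remove emojis. This simple average mixes stemmers and embedding types, so it should be read as a rough summary only.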

Table 6. Experiment 3 results.

Run	Stemmer	Emoji	Embedding	Precision	Recall	F1-Score	Accuracy
R1	ISRI	Remove Emoji	Keras embedding	71%	70%	69%	69.79%
R2	ISRI	Encoding to Arabic	Keras embedding	91%	91%	91%	90.28%
R3	Snowball	Encoding to Arabic	Keras embedding	90%	90%	90%	90.43%
R4	ISRI	Encoding to Arabic	AraVec 3.0	77%	77%	77%	77.01%
R5	Snowball	Remove Emoji	Keras embedding	70%	70%	70%	70.03%
R6	Snowball	Encoding to Arabic	AraVec 3.0	78%	77%	77%	77.46%
R7	Snowball	Remove Emoji	AraVec 3.0	54%	54%	53%	54.14%
R8	ISRI	Remove Emoji	AraVec 3.0	55%	55%	55%	55.10%

Table 7. Comparison of the proposed approach with a previous study on the same ASTC dataset.

Article	Dataset	Model	Accuracy
Our approach	ASTC	CNN-LSTM	91.85%
Heterogeneous Ensemble Deep Learning Model for Enhanced Arabic Sentiment Analysis	ASTC	Stacking LR	92.22%

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
