1. Introduction
The Arabic language is among the five most commonly used languages online. It is used for conversation among large communities around the world. Technological innovation has a significant impact on people's lives, and a great deal has been accomplished in computer science in recent years. Artificial intelligence (AI) has produced remarkable and important achievements [1]. As AI interacts with various scientific fields, its importance has increased in recent years. AI covers many areas, including natural language processing (NLP). A natural language is a human language, such as English, Arabic, or Spanish. In AI, machine learning and chatbots, intent classification is the task of identifying a person's intent by analyzing their language. It is a branch of NLP that categorizes text into classes so that the text can be interpreted more accurately.
Intents are generic characteristics that link a user's text to a bot action (prediction workflow). For example, the whole sentence "What is the weather today?", not just a portion of it, maps to the 'weather inquiry' intent [2]. In the last decade, intent classification and entity extraction have been widely used to develop chatbots for various languages. Since Arabic is the fourth-largest language used on the Internet, intent classification for Arabic is necessary. However, very little work has been performed on Arabic intent classification and entity extraction, and only a few articles address the development of chatbots for the Arabic language.
Machine learning and natural language processing are used in intent classification to automatically match specific sentences and words with a suitable intent [3]. For example, a machine learning model could discover that words such as "buy" and "acquire" are often linked with the intent to purchase. Intent classifiers must be trained on text samples, often referred to as training data. Tags such as Interested, Need Information, Unsubscribe, Wrong Person, Email Bounce and Autoreply may be useful when going through customer emails.
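As a rough illustration of this idea (not the system proposed in this paper), the sketch below trains a tiny intent classifier on tagged text samples with scikit-learn; the tags and training sentences are hypothetical.

```python
# Minimal intent-classification sketch (illustrative only; the tags and
# training sentences below are hypothetical, not the paper's data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_texts = [
    "I would like to hear more about your offer",
    "Please remove me from this mailing list",
    "I think you have reached the wrong person",
    "I am out of office until Monday",
]
train_intents = ["Interested", "Unsubscribe", "Wrong Person", "Autoreply"]

clf = Pipeline([
    ("features", TfidfVectorizer(ngram_range=(1, 2))),  # word + bigram features
    ("model", LogisticRegression(max_iter=1000)),       # linear intent classifier
])
clf.fit(train_texts, train_intents)

# With enough data, words such as "buy" would become associated with a
# purchase-related intent.
print(clf.predict(["tell me more, I might buy this"]))
```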
NLP is a subfield of AI that enables human–computer interaction and communication in natural language. NLP has spawned a slew of new applications, and the chatbot is one of the most intriguing among them [4]. A chatbot is software that conducts a human–computer conversation in natural language through auditory or textual means. As such, it functions as a virtual assistant that uses artificial intelligence to mimic conversational abilities and human-like behavior [5]. It also contains embedded information that aids in identifying and comprehending a phrase and in generating the right answer. Many research articles have been published in this area due to the importance of sentiment analysis. However, this research has concentrated on English and other Indo-European languages; in morphologically rich languages such as Arabic, there has been very little research on sentiment analysis [6]. Despite this, many academics have turned to sentiment analysis in Arabic due to the growing number of Arabic internet users and the exponential growth of Arabic online content in the past decade. Pipelines can be used to streamline a machine learning workflow. Pre-processing, feature extraction, classification and post-processing are all possible steps in the pipeline, and further phases may be added according to the complexity of the application. By optimization, we mean tuning the model for optimum performance. Any learning model's effectiveness depends on choosing the parameters that produce the best outcomes, so optimization can be viewed as a search algorithm that explores a range of parameter settings and picks out the best among them.
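To make this search view of optimization concrete, the following sketch wires a featurization and classification pipeline into a grid search; the stages and parameter values are illustrative assumptions, not the configuration tuned in this paper.

```python
# Pipeline optimization as a parameter search (illustrative; the stages and
# grid values are arbitrary examples, not the paper's tuned settings).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("features", TfidfVectorizer()),  # feature-extraction stage
    ("clf", LinearSVC()),             # classification stage
])

param_grid = {
    "features__ngram_range": [(1, 1), (1, 2)],  # candidate featurizer settings
    "clf__C": [0.1, 1.0, 10.0],                 # candidate classifier settings
}

# The search explores every parameter combination and keeps the best one,
# mirroring the search-algorithm view of optimization described above.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_macro")
# search.fit(train_texts, train_intents)
# print(search.best_params_, search.best_score_)
```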
Because Arabic is such a complicated language, developing Arabic chatbots has posed a significant challenge to the academic community. Only a few works have tried to create Arabic chatbots so far. ArabChat [
7] is one such project: a rule-based chatbot capable of pattern matching and delivering appropriate responses to user inquiries. BOTTA [
8], another project, is a retrieval-based model that supports the Egyptian dialect. Ollobot is a rule-based chatbot that provides health monitoring and assistance in the medical sector [
9]. However, because of its restricted functional scalability, an open-domain chatbot cannot be successfully used in every business. Furthermore, since most chatbot frameworks are built for English, building an efficient and multi-objective chatbot for Arabic is necessary. To address this issue, this research proposes ArRASA, a pipeline optimization approach based on a deep learning-based open-source chatbot system that understands Arabic.
The proposed approach is a closed-domain chatbot with four phases: tokenization, feature extraction, intent classification and entity extraction. A closed-domain chatbot, also known as a domain-specific chatbot, focuses on a certain range of issues and provides limited replies tailored to the business problem. For instance, a food delivery chatbot can only let users place, monitor, or cancel an order. Such a constrained conversation is like bumping into an acquaintance: you expect them to ask about your work and perhaps comment on your surroundings, you have answers prepared for each topic, and the aim is simply to satisfy the enquiries. An open-domain chatbot, by contrast, is required to grasp any matter and provide appropriate answers. The proposed model can be scaled by adding more intents and entities. Open-domain chatbots are less effective in industry, so the proposed study focuses on developing a closed-domain chatbot. Moreover, a considerable amount of prior work uses traditional machine and deep learning approaches, while the proposed study uses transformer-based techniques to develop a more effective and reliable chatbot.
This paper discusses ArRASA, a pipeline optimization approach based on a deep learning-based open-source chatbot system that understands Arabic. To address this topic, we first discuss the Arabic language with its different perspectives and challenges, since it has special characteristics and rules that must be dealt with. We then discuss related approaches in the literature review section. The proposed methodology section covers the complete operation of ArRASA, and an optimization experiment is carried out at each step in terms of Arabic language understanding. The prime contributions of the proposed solution can be summarized as follows:
ArRASA is a channel optimization strategy based on a deep-learning platform to create a chatbot that understands Arabic;
ArRASA is a novel approach for a closed-domain chatbot using RASA (an open-source conversational AI platform) that can be used in any Arabic industry;
Tokenization, feature extraction, specific intent classification and suitable entity extraction are the four phases of the proposed approach;
The performance of ArRASA is evaluated using traditional assessment metrics, i.e., accuracy and F1 score for the intent classification and entity extraction tasks in the Arabic language;
The performance is also compared with existing approaches in terms of accuracy and F1 score.
The remainder of the paper is organized as follows: Section 2 discusses the related work, and the Arabic language and its challenges are elaborated in Section 3. The proposed solution is developed in Section 4, while the system structure is discussed in Section 5. The performance evaluation is presented in Section 6, and Section 7 concludes the paper.
2. Related Work
BOTTA [
8] is the first Arabic dialect chatbot, developed for Egyptian Arabic to work as a conversational agent that mimics user-friendly chats. The study defines the various components of the BOTTA chatbot and presents several solutions. Researchers working on Arabic chatbot technology can access the BOTTA database files freely and publicly.
Shawar et al. [
10] show how machine-learning techniques were used to create an Arabic chatbot that accepts user input in Arabic and responds with Qur'anic quotes. A program that learned conversational patterns from a corpus of transcribed conversations had previously been used to create chatbots that spoke English, French and Afrikaans. Because the Qur'an is not a transcript of a dialogue, the learning method was adapted to accommodate the Qur'an's structure in terms of suras and ayas (chapters and verses).
Bashir et al. [
11] propose a method for named entity recognition and text categorization in the Arabic home-automation domain using deep learning. To this end, they provide an NLU module that can be combined with an ASR module, a conversation manager and a natural language generator to create a fully functional dialogue system. The study covers the process of gathering and annotating the data, constructing the intent classifier and entity extractor models, and ultimately assessing these techniques against various benchmarks.
AlHumoud et al. [
12] summarize published Arabic chatbot studies to identify information gaps and illustrate areas that need further investigation and research. This survey found a scarcity of Arabic chatbots and that all available works are retrieval-based. The surveyed works are divided into two classes, depending on the chatbot's mode of interaction: text and voice conversational chatbots. The studies were presented and assessed according to the deployment method, the duration and breadth of the work, and the dataset model used for the chatbot. According to the survey, all the assessed chatbots used a retrieval-based dataset model.
Nabiha [
13] is a chatbot that uses the Saudi Arabic dialect to converse with King Saud University Information Technology (IT) students. Nabiha is thus the first Saudi chatbot to communicate in the Saudi dialect. To make it easier to use, Nabiha is accessible on several platforms, including Android, Twitter and the Web: students may contact Nabiha by downloading an app, tweeting her, or visiting her website. According to the IT department students who tested Nabiha, the results were acceptable, given the general challenges of the Arabic language and the Saudi dialect.
The study in [
14] presents the first Arabic end-to-end generative model for task-oriented dialogue systems (AraConv), built on various settings of the multilingual transformer model mT5. The authors also provide the Arabic-TOD dialogue dataset, used to train and evaluate the AraConv model. Compared with studies of other languages under identical monolingual conditions, the findings obtained are fair. To mitigate the problems of a small training dataset and improve AraConv's results, the authors suggest joint training, in which the model is trained on Arabic dialogue data together with data from one or two high-resource languages such as English and Chinese.
Many authors have worked on the development of Arabic chatbots, but most of them developed open-domain chatbots. Open-domain chatbots are less effective in industry, so the proposed study focuses on developing a closed-domain chatbot. Moreover, a considerable amount of existing work uses traditional machine and deep learning approaches, while the proposed study uses transformer-based techniques to develop a more effective and reliable chatbot. Previous studies focused on the development of chatbots, but few worked on optimizing their techniques. The proposed study therefore also optimizes the proposed architecture to enhance the accuracy and efficiency of the Arabic chatbot.
Table 1 presents the comparison of previous techniques.
Rasa is a platform for building high-quality, AI-powered chatbots in industry. It is used by developers all around the world to build chatbots and contextual assistants. ArRASA is a channel optimization strategy based on a deep-learning platform to create a chatbot that understands Arabic. ArRASA is a closed-domain chatbot that can be used in any Arabic industry. Since we propose an optimized Arabic-language chatbot using RASA, we named it ArRASA.
4. Proposed Solution
As the number of services and products grows rapidly around the globe, the number of queries to producers is also increasing. To handle them, companies hire individuals to serve as customer support for their products and services. However, this procedure of responding to consumers' questions is expensive for the company and quite slow for the users, so an effective and accessible approach to addressing this issue is needed. Various researchers have presented automated chatbots that answer customer queries instead of humans in various languages, including several Arabic chatbots; however, these chatbots have their own limitations. We propose a framework for optimizing Arabic chatbots using the RASA framework, one of the current leading open-source platforms for chatbot development. The reason for choosing RASA is that it has not been used for Arabic chatbots in the past. ArRASA is a channel optimization method that uses a deep-learning model to develop an Arabic chatbot; it is a closed-domain chatbot that may be utilized in any Arabic industry.
4.1. Natural Language Understanding
Natural language processing (NLP) is concerned with how machines interpret language and promote "natural" back-and-forth contact between humans and computers, while natural language understanding is concerned with a machine's capacity to comprehend human language [17]. NLU is the process of rearranging unstructured data so that computers can "understand" and interpret it. For example, machine translation, automatic ticket routing and question answering are built on the concept of NLU. Natural language understanding (NLU), which translates natural language utterances into a machine-readable format, is the primary technology used by chatbots.
Figure 1 depicts the NLU's design; its preprocessing phase is split into two parts. Tokenization is the initial step, in which a corpus is divided into tokens, grammatically indivisible language units. The second stage is featurization, which involves extracting the properties of each token. Following the preprocessing step, the intent is classified to properly comprehend the user's request, and an entity is extracted to give a suitable answer [18]. As a result, a user-friendly chatbot framework can be created. Intent classification is the technique of determining a user's intent from the user's statement. Entities are predefined collections of items that make sense, such as names, organizations, time expressions, numbers and other groups of objects; a separate set of entities needs to be collected for each chatbot. Intents and entities are used to understand what a user wants and how to generate the correct answer to the user's query. RASA NLU can be used in chatbots and AI assistants for language understanding, with an emphasis on intent classification and entity extraction; it interprets data using these two features. We formed a training dataset to recognize intents and extract entities using the RASA NLU pipeline. As described in Section 5, the training set contains a variety of intents and entities, and a sentence may contain no entity or multiple entities.
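For illustration, the sketch below shows what such training data looks like in RASA's YAML format, embedded in a Python string; the intents, the dish entity and the Arabic sentences (e.g., "أريد طلب بيتزا", "I want to order pizza") mirror the food-delivery example from the Introduction and are hypothetical, not the paper's dataset.

```python
# Hypothetical RASA NLU training data in RASA's YAML format. Square brackets
# mark entity values and parentheses give the entity type; a sentence may
# contain no entity or several.
nlu_data = """
version: "3.1"
nlu:
- intent: place_order
  examples: |
    - أريد طلب [بيتزا](dish)
    - أرغب في شراء [قهوة](dish)
- intent: track_order
  examples: |
    - أين وصل طلبي؟
- intent: cancel_order
  examples: |
    - أريد إلغاء طلبي
"""

# RASA conventionally loads such files from the project's data/ directory.
with open("nlu.yml", "w", encoding="utf-8") as f:
    f.write(nlu_data)
```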
4.2. Pre-Training Setup
We utilize the masked language modeling (MLM) task to accomplish the first pre-training objective, which involves whole-word masking and selects 15% of the N input tokens for replacement. Of the selected tokens, 80% are replaced by the [MASK] token, 10% by a random token, and 10% are left unchanged. Whole-word masking raises the difficulty of pre-training by forcing the model to predict the whole word rather than only parts of it. We also employ the next sentence prediction (NSP) task, which lets the model identify the relationship between two sentences and is helpful for a range of language comprehension tasks such as question answering and machine translation. In MLM, we mask a certain percentage of words or phrases, and the system is trained to guess the masked words based on the other words in the text. Since the representation of a masked term is learned from the words that appear on both sides of it, MLM systems are bidirectional in structure: the model populates the mask with a suitable token by attending to both the right and left contexts. Using MLM therefore enables the proposed model to understand the context of terms both at the beginning and at the end of a phrase.
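A minimal sketch of this 80/10/10 masking rule is given below; for brevity it operates on individual token ids (whole-word masking would apply the same decision to every sub-word piece of a selected word), and the token ids and vocabulary size are assumptions.

```python
# Sketch of the MLM masking rule: select 15% of tokens, then replace 80% of
# the selected ones with [MASK], 10% with a random token, and keep 10%.
import random

MASK_ID = 4          # hypothetical id of the [MASK] token
VOCAB_SIZE = 30000   # hypothetical vocabulary size

def mask_tokens(token_ids, mask_prob=0.15):
    inputs, labels = [], []
    for tok in token_ids:
        if random.random() < mask_prob:      # token selected for masking
            labels.append(tok)               # model must predict the original
            r = random.random()
            if r < 0.8:
                inputs.append(MASK_ID)                       # 80%: [MASK]
            elif r < 0.9:
                inputs.append(random.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                inputs.append(tok)                           # 10%: unchanged
        else:
            inputs.append(tok)
            labels.append(-100)  # common convention: ignored by the MLM loss
    return inputs, labels

inputs, labels = mask_tokens([12, 345, 6789, 1011, 1213, 1415])
```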
4.3. DIET (Dual Intent and Entity Transformer)
DIET is a multi-task transformer that conducts intent classification and entity recognition at the same time. It allows plug-and-play use of pre-trained embeddings such as BERT, GloVe, ConveRT and others [19]. In experiments, no single set of embeddings consistently performs best across datasets; as a result, a modular architecture is particularly essential.
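This modularity is visible in how a RASA pipeline is configured: featurizers can be added or swapped without touching the classifier. The sketch below, embedded as a Python string, uses RASA's documented component names, but the specific settings are assumptions rather than the configuration tuned in this paper.

```python
# Illustrative RASA pipeline configuration (config.yml) showing DIET's
# plug-and-play featurization; the settings are examples, not the paper's.
config_yml = """
language: ar
pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer          # sparse word-level features
- name: CountVectorsFeaturizer          # sparse character n-gram features
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 5
- name: LanguageModelFeaturizer         # dense pre-trained embeddings
  model_name: bert                      # could be swapped for another model
- name: DIETClassifier                  # joint intent + entity model
  epochs: 100
"""

with open("config.yml", "w", encoding="utf-8") as f:
    f.write(config_yml)
```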
DIET is a transformer architecture that incorporates a modular design. In terms of accuracy and performance, it is comparable to large-scale, pre-trained language models, while being six times faster to train than the previous state of the art. On language comprehension benchmarks such as GLUE [20] and SuperGLUE [21], large-scale pre-trained language models have demonstrated promising results, with significant gains over earlier pre-training techniques such as GloVe [22] and supervised approaches. These embeddings generalize well across tasks since they were trained on large-scale natural language text corpora.
Input phrases are interpreted as a sequence of tokens, either words or sub-words, depending on the featurization pipeline. We add a special classification token for the Arabic language at the end of each phrase. Each input token is characterized using sparse and/or dense features. At the token level, the sparse features are one-hot encodings and multi-hot encodings of character n-grams (n ≤ 5). Because character n-grams contain much redundant information, we apply dropout to these sparse features to prevent overfitting.
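As a rough illustration of the sparse features, scikit-learn can produce multi-hot character n-gram encodings with n ≤ 5 (DIET's internal featurizer differs in detail):

```python
# Multi-hot character n-gram features (n <= 5), sketched with scikit-learn;
# binary=True yields multi-hot rather than count features.
from sklearn.feature_extraction.text import CountVectorizer

featurizer = CountVectorizer(analyzer="char_wb", ngram_range=(1, 5), binary=True)
X = featurizer.fit_transform(["ما هو الطقس اليوم؟"])  # "What is the weather today?"
print(X.shape)  # one sparse multi-hot row for the input sentence
```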
Figure 2 presents the proposed optimized pipeline architecture for the Arabic chatbot.
A two-layer transformer with relative position attention is utilized to encode context throughout the whole phrase. The input dimension must match the dimension of the transformer layers; therefore, the concatenated features are passed through another fully connected layer, with weights shared across all sequence steps, whose output dimension equals that of the transformer layers, set to 256 for the proposed model.
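The sketch below restates this encoding step in PyTorch under stated assumptions: the concatenated feature size is hypothetical, and PyTorch's built-in encoder (absolute positions) stands in for the relative position attention used by DIET.

```python
# Projection of concatenated token features to the transformer dimension (256)
# followed by a two-layer transformer encoder. FEATURE_DIM is a hypothetical
# concatenated sparse + dense feature size.
import torch
from torch import nn

FEATURE_DIM = 1500   # assumed size of the concatenated per-token features
MODEL_DIM = 256      # transformer input dimension used in the proposed model

projection = nn.Linear(FEATURE_DIM, MODEL_DIM)  # shared across sequence steps
encoder_layer = nn.TransformerEncoderLayer(d_model=MODEL_DIM, nhead=4,
                                           batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)  # two layers

tokens = torch.randn(1, 12, FEATURE_DIM)  # one sentence of 12 featurized tokens
encoded = encoder(projection(tokens))     # -> shape (1, 12, 256)
```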