Article

A Real Time Arabic Sign Language Alphabets (ArSLA) Recognition Model Using Deep Learning Architecture

by Zaran Alsaadi 1, Easa Alshamani 1, Mohammed Alrehaili 1, Abdulmajeed Ayesh D. Alrashdi 1, Saleh Albelwi 1,2 and Abdelrahman Osman Elfaki 1,*
1 Faculty of Computing and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
2 Industrial Innovation and Robotics Center, University of Tabuk, Tabuk 71491, Saudi Arabia
* Author to whom correspondence should be addressed.
Computers 2022, 11(5), 78; https://doi.org/10.3390/computers11050078
Submission received: 27 March 2022 / Revised: 27 April 2022 / Accepted: 5 May 2022 / Published: 10 May 2022
(This article belongs to the Special Issue Human Understandable Artificial Intelligence)

Abstract: Currently, treating sign language issues and producing high quality solutions has attracted the attention of researchers and practitioners due to the considerable prevalence of hearing disabilities around the world. The literature shows that Arabic Sign Language (ArSL) is one of the most widely used sign languages. ArSL is categorized into two groups: the first group is ArSL, where whole words are represented by signs, i.e., pictures; the second group is ArSL alphabets (ArSLA), where each Arabic letter is represented by a sign. This paper introduces a real time ArSLA recognition model using a deep learning architecture. As a methodology, the following steps were taken. First, a trusted scientific ArSLA dataset was located. Second, candidate deep learning architectures were chosen by investigating related works. Third, an experiment was conducted to test the selected deep learning architectures. Fourth, the best deep learning architecture was selected based on the extracted results. Finally, a real time recognition system was developed. The results of the experiment show that the AlexNet architecture is the best due to its high accuracy rate. The model was developed based on the AlexNet architecture and successfully tested in real time with a 94.81% accuracy rate.

1. Introduction and Motivation

Sign language is a method used by people with hearing impairments to interact with others. Therefore, research and development in the field of sign language recognition has attracted scientists, researchers, and engineers to develop software that facilitates communication with people with hearing disabilities [1]. Sign language is a form of communication that uses well-known signs or body motions to convey meaning. Many hearing-impaired people are also unable to read or write a spoken language. Hence, building a sign language translation or, in other words, a sign language recognition (SLR) system can be extremely beneficial to their lives. SLR systems are in high demand due to their capacity to bridge the gap between the hearing-impaired community and the rest of the world, and SLR is one of the most important areas of computational research dealing with real life issues. According to the World Health Organization Fact Sheet (2022), over 5% of the world's population suffer from hearing impairments, and it is estimated that this number will increase to above 700 million people by 2050.
Arabic alphabets are used by the population of Arab countries, which amounts to about 1 billion people, or 14% of the world’s population (World Health Organization, 2015). In addition to the population of Arab countries, many Asian and African populations use Arabic alphabets in their languages or dialects, such as Persian, Malay (Jawi), Uyghur, Kurdish, Punjabi, Sindhi, Balti, Balochi, Pashto, Lurish, Urdu, Kashmiri, Rohingya, Somali and Mandinka, among others [2]. Therefore, Arabic alphabets are used by almost a quarter of the world’s population, clearly illustrating their significance.
Applying the same proportion, about 5% of the populations that use Arabic alphabets face hearing impairments, which is a significant number. This justifies the importance of Arabic Sign Language Alphabets (ArSLA). ArSLA are an illustration of Arabic letters as sign language shapes. Hence, ArSLA is used by the hearing-impaired community to overcome the obstacle of dealing with traditional Arabic letters, which enables them to be involved in the traditional educational and pedagogic process [3].
Arabic sign language is divided into two parts. The first part is a complete language where each word is represented by a sign (for instance, the word father is represented by a sign); different Arab countries have their own variants, such as Egyptian sign language or Saudi sign language. This first part is known as Arabic Sign Language (ArSL). In the second part, each letter of the Arabic alphabet is represented by a special sign, known as Arabic Sign Language Alphabets (ArSLA). Due to the previously explained importance of Arabic sign language, the challenge of developing Arabic sign language recognition systems has garnered the attention of researchers and practitioners. As a result, the literature suggests many solutions for both ArSL and ArSLA.
Transfer learning [4] has been proposed as a solution to the challenge of training deep models on limited data: the model is first trained on a large training set, and the results of this training are then treated as the starting point for the target task. Transfer learning has proved successful in fields such as language processing and computer vision. Data augmentation [5] is another technique found to be effective in alleviating overfitting and improving overall performance. This method increases the training set's size by performing geometric and color transformations such as rotation, resizing, cropping, and adding noise to or blurring the image. In this work, we did not use either transfer learning or data augmentation when training our CNN model. Instead, we utilized the ArSLA dataset, which is suitable for testing and training and provides around 1000 images for each letter to train the CNN model.
This paper presents a real time Arabic Sign Language Alphabets (ArSLA) recognition model using deep learning techniques. As previously discussed, we focused on ArSLA due to its popularity. The main target was to develop a solution that is accessible to everyone.

2. Related Works

This section explores related works that were investigated and analyzed with the aim of discovering and highlighting any research gaps. The methodology used for collecting related works was to select recent research on sign language recognition methods published over the previous ten years. As selection criteria, research papers that addressed ArSL or ArSLA issues with applicable solutions were chosen; papers that presented inapplicable or duplicated solutions were excluded.
According to [6], sign language recognition systems can be categorized into two groups: glove-based systems and vision-based systems. The first group is based on hardware devices consisting of special sensors that can be packaged in different shapes suitable for wearing on the hand (since sign languages rely primarily on hand gestures). The second group is based on image processing techniques and algorithms, which means only a camera is needed. Despite the promising achievements of the first group, the second group can still be considered the better choice, as the only hardware device it requires is a camera, which is available in almost any modern computer. In the literature, the first group is referred to as sensor-based solutions, while the second group is referred to as image-based solutions. In this paper, we follow the concepts of the second group. In the following, selected papers from related works are discussed and analyzed.
Halawani and Zaitun [7] developed a system for converting common Arabic spoken words to ArSL using a speech recognition engine. This work suggests using data gloves for measuring sign language motion; it is not clear how the common words used in the system were collected and validated. Mohandes et al. [8] developed a multilevel system for ArSL recognition. In the first level, a leap motion controller is used for tracking and detecting hand motion, with the aim of acquiring images to create a dataset. The second level preprocesses the collected images, the third level performs feature extraction, and the final level is a classification model. The performance of such systems depends on the degree of accuracy in defining image features. ElBadawy et al. [9] used a 3D Convolutional Neural Network (CNN) to develop an ArSL recognition system based on 25 sign pictures; this research achieved 85% accuracy. Alzohairi et al. [10] developed an ArSLA recognition system using a Support Vector Machine (SVM). The SVM was implemented as a one-versus-all classifier over Histograms of Oriented Gradients (HOG) descriptors. The accuracy of this system did not exceed 63%.
Ibrahim et al. [11] proposed an automatic visual Sign Language Recognition System (SLRS) that converts solitary Arabic words into text. This proposal is limited to the 30 isolated words used in the daily school life of hearing-challenged children. A proposed skin-blob tracking technique is used to identify and track the hands.
Deriche et al. [12] designed an ArSL recognition system based on dual Leap Motion Controllers (LMC). The optimum geometric features were collected from both front and side LMCs, and the classification was developed based on a Bayesian approach. The system was validated on one hundred developed signs; the accuracy of this system was not mentioned. Hassan et al. [13] conducted experiments comparing two methods of ArSL recognition: the k-Nearest Neighbor and Hidden Markov Models. The experiments were conducted on a dataset collected using sensor gloves and another dataset collected by a motion tracker. The results showed similar classification accuracy, which suggests that the acquisition method does not affect classification accuracy provided the dataset is collected correctly. Gangrade and Bharti [14] used a multi-layered random forest (MLRF) for recognizing static gestures from depth data provided by Microsoft's Kinect sensor. The method in [14] was validated on synthetic data and a publicly available dataset of 24 American Sign Language (ASL) signs.
Kamruzzaman [15] proposed an ArSLA recognition system aimed at translating signs to Arabic speech. The classifier was developed based on a Convolutional Neural Network (CNN), but the CNN architecture used in this model is not clear. The work in [16] developed a comparison survey to study the performance of ArSLA classifiers; the selected classifiers were CNN, RNN, MLP, LDA, HMM, ANN, SVM, and KNN. The works in [14,1] used deep convolutional networks to recognize letters and digits in American Sign Language. Missing signs (special letters, such as the space sign or gap between words) in the corpus are a source of errors in these models, and the CNN architectures used are not clearly specified.
The discussion of related works reflects the following issues which represent the research gap:
(1)
There are many works suggesting the use of a special hardware device to read hand signs which is considered an extra burden and an added expense. These solutions will not be available to everyone.
(2)
Related works that have been developed based on machine learning, or image processing algorithms, are dependent on implementing feature extraction techniques. Hence, the quality of extracted results is completely subject to the selected features which could possibly be an imperfect selection.
(3)
Related works that have been developed based on CNN neglect well-known CNN architectures. Developing a solution based on an ad hoc CNN architecture raises doubts about its credibility, as a new CNN architecture should be tested and validated in different environments and situations.
(4)
The works that dealt with ArSL suffer from scalability issues. Not all words can be covered due to the vast number of words used in sign language.
The following sections explain in detail how the proposed model overcomes the above shortcomings.

3. Methodology

In this section, the steps that were followed to achieve the proposed model are presented. As mentioned in the related works, ArSLA recognition can be achieved using machine learning or deep learning, i.e., CNN techniques. In this paper, we have chosen CNN to develop the recognition model due to its advantage over classical machine learning: contrary to machine learning, the CNN model handles feature extraction by itself. The basic design principle of a CNN is to construct an architecture and a learning algorithm in such a way that the number of parameters is reduced without compromising the compression and prediction capacity of the learning algorithm. CNN layers commonly apply nonlinear activations after the linear operation of convolution, and the architecture frequently exploits local connections between pixels. The introduction of a local receptive field allows for the extraction of various feature elements, and hidden layers placed between fully connected layers can detect a higher degree of complexity. CNNs are more effective than classical machine learning for recognition, reconstruction, and classification because of the sparse connectivity between subsequent layers, the sharing of weights between neighboring pixels, and equivariant representations.
Figure 1 shows the framework of the methodology for developing the proposed model. This framework consists of five steps: searching for a suitable standard ArSLA dataset, searching for suitable CNN architectures, selecting the best CNN architecture, developing a real time recognition system, and validating the developed real time recognition system. In the following, each step is presented and discussed.

3.1. Search for a Suitable Standard ArSLA Dataset

In this step, we searched for a suitable standard ArSLA dataset. If no suitable dataset was found, there were no other options but to create our own dataset. Figure 2 shows the dataset selection strategy.
The targeted dataset should contain three features which are:
(1)
Dataset for ArSLA. This means the dataset should contain all Arabic alphabet letters.
(2)
The dataset should consist of static images.
(3)
The dataset should be a standard dataset which means the selected dataset has been involved in public research with published results.
We have chosen the ArSLA dataset that was published in [6]. This dataset has the advantages of being fully tagged, made publicly available, consisting of 54,049 images in greyscale jpg formats with a resolution of 64 × 64, and representing the 32 Arabic letters. Figure 3 shows the selected ArSLA dataset.
Figure 3 shows the 32 basic Arabic sign alphabets. The dataset contains 54,049 photos of ArSLA demonstrated by more than 40 people, and the number of photos per class varies depending on the class. A sample graphic of all Arabic Language Signs is also included. A CSV file maps each image file name to the label of the corresponding Arabic Sign Language image.
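As an illustration of how such a label file can be consumed, the following is a minimal sketch; the file and column names (file_name, label) are assumptions for illustration and may differ from the actual dataset layout.

import cv2
import pandas as pd

# Minimal sketch (assumed file layout): map each image file to its Arabic letter label.
labels = pd.read_csv('ArSLA_labels.csv')                      # assumed columns: file_name, label
sample = labels.iloc[0]
img = cv2.imread('ArSLA_images/' + sample['file_name'], cv2.IMREAD_GRAYSCALE)
print(sample['label'], img.shape)                             # e.g., a letter name and (64, 64)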

3.2. Search for Suitable CNN Architectures

Deep learning, and CNNs in particular, is currently the leading technique for image recognition systems. According to [1], the most widely used pretrained deep learning architectures include AlexNet [17], VGG-16 [18], ResNet50 [19], and EfficientNet [20], among others.
CNN can be fed raw image pixel values rather than pre-processed feature vectors, unlike standard machine learning applications. Figure 4 shows the general architecture of CNN [21]. CNN’s usual architecture is made up of layers of computing units (gates), which are:
  • Convolutional Layers: Each gate receives its input from a grid of the previous layer, and the gates' weights are shared so that each gate detects the same feature. There are several such sets of gates, organized in multiple channels (layers), to learn different features.
  • Pooling Layers: These work as down-sampling layers by reducing the number of gates. Each k × k grid of input gates is usually reduced to a single cell/gate by choosing the maximum input value or the average of all inputs. The layer is scanned with a small k × k grid, and a stride is chosen so that the grid covers the layer without overlapping.
  • Fully Connected Layers: Each gate's output is connected to the input of the next layer's gates (also referred to as auto-encoder levels). These transform a vectorized version of the input into a normalized vectorized output. The output vector is a set of probabilities that serve as the classification signature.
  • Convolution Layers: Consider a 1D convolution, and suppose the input vector is f and the kernel is g, whose length is m. The following equation shows the kernel center being shifted and multiplied:
$(f * g)(i) = \sum_{j=1}^{m} g(j)\, f(i - j + m/2)$
Similarly, one can define the 2D convolution. If the input of 2D convolution is an image I (or, equivalently, a w by h matrix), and the m × m kernel matrix is denoted as W, then this could be noted by the following equation:
$(I * W)(x, y) = \sum_{j=1}^{m} \sum_{k=1}^{m} W_{j,k}\, I(x - j + m/2,\; y - k + m/2)$
Consider next the Convolutional layer in a typical CNN. Suppose the input of the convolutional layer has the dimension H × W × C, then the convolutional layers can be taken as a set of C parallel, or stacked matrix feature maps, formed by convolving different sized matrix kernels (feature detectors) over the input, and projecting element wise the accumulated dot products. If the chosen convolutional kernel is k1 × k2 × C and with a stride Zs (representing the kernel sliding interval), and together with a zero-padding parameter Zp, representing the extent of the zero-border surrounding the image, one controls the size of the resulting feature maps. Then, the dimension of the output of such a convolution layer will be H1 × W1 × D1, where:
$(H_1, W_1, D_1) = \left(\dfrac{H + 2Z_p - k_1}{Z_s} + 1,\; \dfrac{W + 2Z_p - k_2}{Z_s} + 1,\; K_D\right)$
where $K_D$ denotes the depth, i.e., the number of stacked convolutional feature maps (equal to C in this case).
Activation functions:
Activation functions define the output of a neuron based on a given set of inputs. Some commonly used activation functions σ(·) are as follows (Zanna and Bolton, 2020) [22]:
ReLU: $\sigma(x) = \begin{cases} 0 & x < 0 \\ x & x \geq 0 \end{cases}$
Softmax: $\sigma(x)_j = \dfrac{e^{x_j}}{\sum_{k=1}^{d} e^{x_k}}$
  • Pooling Layers: Pooling layers are down-sampling layers that combine the output of a layer into single neurons. If we denote k as the kernel size (assuming a square kernel), Dn as the number of kernel windows, and Zs as the stride used to develop the pooling layers, then for an H1 × W1 × D1 input the output dimension of the pooling layer is given by the following equation:
$(H_2, W_2, D_2) = \left(\dfrac{H_1 - k}{Z_s} + 1,\; \dfrac{W_1 - k}{Z_s} + 1,\; D_n\right)$
With a pooling type that can be:
  • Max-pooling
  • Average Pooling
  • L2 norm Pooling
  • Fully Connected Dense Layers: After the pooling layers, the pooled pixels are stretched into a single column vector. These vectorized and concatenated data points are fed into dense layers, known as fully connected layers, for classification. In some cases, the output layer of a deep network uses a softmax procedure, similar to logistic regression: given n vectors {x1, x2, …, xn} with labels {l1, l2, …, ln}, where li ∈ {0, 1} (a binary classification task), and a weight vector w, one can define the following equation:
$\mathrm{Prob}(l = 1 \mid x) = \sigma(w^{T} x) := \dfrac{1}{1 + e^{-w^{T} x}}$
where σ represents the sigmoid function.
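To make these building blocks concrete, the following is a minimal, illustrative Keras sketch (not the architecture used in this paper) that stacks one convolutional layer, one max-pooling layer, and a fully connected softmax head for 64 × 64 greyscale inputs with 32 classes; all layer sizes are arbitrary choices for illustration.

from tensorflow.keras import layers, models

# Illustrative sketch only: one conv block, one pooling block, one dense softmax head.
model = models.Sequential([
    layers.Conv2D(16, kernel_size=(3, 3), strides=1, padding='same',
                  activation='relu', input_shape=(64, 64, 1)),   # H1 = (64 + 2*1 - 3)/1 + 1 = 64
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),             # (64 - 2)/2 + 1 = 32
    layers.Flatten(),                                              # stretch to a single column vector
    layers.Dense(32, activation='softmax'),                        # class-membership probabilities
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()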
In the following, the four selected CNN architectures are presented.

3.2.1. AlexNet

AlexNet was the first convolutional network to employ the graphics processing unit (GPU) to improve performance. AlexNet has five convolutional layers, three max-pooling layers, two normalization layers, two fully connected layers, and one softmax layer in its design. Convolutional filters and the nonlinear activation function ReLU are used in each convolutional layer, and max pooling is conducted using the pooling layers. Due to the presence of fully connected layers, the input size is fixed. The input size is usually stated as 224 × 224 × 3; however, due to padding, it can add up to 227 × 227 × 3. The neural network has 60 million parameters in total. Max pooling is used to down-sample an image or a representation. Overlapping max pooling layers are similar to max pooling layers, except that the adjacent windows over which the maximum is computed overlap one another, as shown in Figure 5. AlexNet employed pooling windows with a size of 3 × 3 and a stride of 2 between adjacent windows [23]. Figure 5 shows the AlexNet architecture.
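For reference, the following is a Keras-style sketch of an AlexNet-like network following the layer counts described above (five convolutional layers, three max-pooling layers, fully connected layers, and a softmax output). The filter counts are the standard published AlexNet values, the normalization layers are omitted for brevity, and the sketch is illustrative rather than a copy of the exact implementation used in this work.

from tensorflow.keras import layers, models

def alexnet_like(num_classes=32, input_shape=(227, 227, 3)):
    # Illustrative AlexNet-style stack: 5 conv layers, 3 overlapping max-pooling layers, dense head.
    return models.Sequential([
        layers.Conv2D(96, 11, strides=4, activation='relu', input_shape=input_shape),
        layers.MaxPooling2D(3, strides=2),                 # overlapping 3 x 3 pooling, stride 2
        layers.Conv2D(256, 5, padding='same', activation='relu'),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, padding='same', activation='relu'),
        layers.Conv2D(384, 3, padding='same', activation='relu'),
        layers.Conv2D(256, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(3, strides=2),
        layers.Flatten(),
        layers.Dense(4096, activation='relu'),
        layers.Dense(4096, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),   # 32 ArSLA classes
    ])

model = alexnet_like()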

3.2.2. VGG16

This model proposed the use of very small 3 × 3 receptive fields (filters) over the whole network with a stride of 1 pixel. In AlexNet, by contrast, the receptive field in the first layer was 11 × 11 with stride 4, and the receptive field in the second layer was 5 × 5. The notion of utilizing 3 × 3 filters in a uniform manner increased VGG's performance: two 3 × 3 filters in succession produce a 5 × 5 effective receptive field, while three 3 × 3 filters produce a 7 × 7 receptive field. A combination of multiple 3 × 3 filters can thus stand in for a larger receptive area. In addition, with three stacked 3 × 3 convolution layers there are three non-linear activation layers, rather than the single one a 7 × 7 layer would have; the decision functions therefore become more discriminative and give the network the ability to converge quickly. Secondly, stacking greatly reduces the number of weight parameters in the model. This can also be viewed as a regularization of the 7 × 7 convolutional filters, forcing them to decompose through the 3 × 3 filters with non-linearity introduced in the middle using ReLU activations; the network's inclination to over-fit during training is reduced as a result, as shown in Figure 6. Furthermore, 3 × 3 is the smallest size that can capture the concepts of left to right, top to bottom, and so on, so lowering the filter size even further may affect the model's capacity to recognize the spatial aspects of the image. The network's persistent use of 3 × 3 convolutions made it incredibly simple, elegant, and easy to deal with [24]. Figure 6 shows the VGG16 architecture.
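The parameter saving from stacking 3 × 3 filters can be checked directly. The short sketch below is purely illustrative, assuming an arbitrary channel count C = 64 for both input and output; it compares the weight count of a single 7 × 7 convolution with that of three stacked 3 × 3 convolutions covering the same 7 × 7 receptive field.

C = 64                                     # assumed number of input and output channels (illustrative)
params_7x7 = 7 * 7 * C * C                 # one 7 x 7 convolutional layer (biases ignored)
params_3x3_stack = 3 * (3 * 3 * C * C)     # three stacked 3 x 3 layers, same 7 x 7 receptive field
print(params_7x7, params_3x3_stack)        # 200704 vs. 110592, i.e., roughly 45% fewer weights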

3.2.3. ResNet

We usually stack additional layers in deep neural networks to address a complex problem, which improves accuracy and performance. The idea behind adding more layers is that these layers can learn increasingly complicated features. However, it has been discovered that the classic convolutional neural network model has a maximum depth threshold. The emergence of ResNet, or residual networks, which are made up of residual blocks, has relieved the challenge of training very deep networks. ResNet's skip connections alleviate the problem of vanishing gradients in deep neural networks by allowing the gradient to flow through an additional shortcut channel, as shown in Figure 7. These connections also aid the model by allowing it to learn the identity function, ensuring that a higher layer performs at least as well as a lower layer, if not better [25]. Figure 7 shows the ResNet architecture.
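A minimal functional-API sketch of one residual block as described above is given below. It is illustrative only: it omits the batch normalization and projection shortcuts used in the full ResNet, and it assumes the input tensor already has the same number of channels as the block's filters so that the identity skip connection can be added directly.

from tensorflow.keras import layers

def residual_block(x, filters=64):
    # Illustrative residual block: two 3 x 3 convolutions plus an identity skip connection.
    shortcut = x                                                     # assumes x already has `filters` channels
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([shortcut, y])                                  # gradient can bypass the block via the shortcut
    return layers.Activation('relu')(y)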

3.2.4. EfficientNet

EfficientNet is a convolutional neural network design and scaling method that uses a compound coefficient to scale all depth/width/resolution dimensions evenly. The EfficientNet scaling method consistently scales network width, depth, and resolution with a set of predefined scaling coefficients, unlike traditional methods, which scale these elements arbitrarily. Before EfficientNet, the most popular technique to scale up ConvNets was to increase the depth (number of layers), the width (number of channels), or the image resolution. As shown in Figure 8, EfficientNet started by creating a baseline network using a technique called neural architecture search, which automates the building of neural networks. On a floating-point operations per second (FLOPS) basis, it optimizes both accuracy and efficiency. The mobile inverted bottleneck convolution (MBConv) is used in this architecture. The researchers then scaled up this baseline network to create the EfficientNet family of deep learning models [20]. Figure 8 shows the EfficientNet architecture.
In the previous step, the most popular CNN architectures, according to the literature, were discussed. The next step illustrates how the best of the four explained CNN architectures was nominated. In the following, the third step of our methodology is presented.

3.3. Select the Best CNN Architecture

According to [26], accuracy is the best benchmark for comparing CNN architectures. Accordingly, experiments were conducted to test the accuracy of the four selected CNN architectures. The experiments were conducted according to the following steps:
(1)
Preparing and resizing the dataset images to be ready for insertion into the selected four CNN architectures.
(2)
Labeling the dataset images to be ready for the classification process. Appendix A shows a snapshot of the software code for labeling the dataset images and preparing the dataset for the experiments.
(3)
Splitting the dataset images into two sets: a training set and a testing set.
The first step in building the real-time object detection model was to split the dataset into train and test sets. The Python library split-folders was used to split the dataset into an 80% training set (43,240 images) and a 20% test set (10,809 images). The training set was used for training each CNN architecture to achieve the best weights, and the testing set was used for checking the correctness of the achieved weights and adjusting them accordingly.
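A minimal sketch of this split using the split-folders library is shown below; it assumes the images are organized into one sub-folder per letter class, and the directory names are illustrative.

import splitfolders   # pip install split-folders

# Assumed layout: one sub-folder per Arabic letter class under 'ArSLA_dataset/'.
splitfolders.ratio('ArSLA_dataset', output='ArSLA_split',
                   seed=42, ratio=(0.8, 0.2))   # 80% train (43,240 images), 20% test (10,809 images)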
Define the optimizer: the Adam optimizer [27] was chosen to control the learning rate, configured as follows:
optimizer = Adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999)
(4)
Define the epochs and batch size, where an epoch specifies the number of times the CNN processes the whole training data, i.e., one epoch is equal to one forward pass and one backward pass over all training samples. The batch size, on the other hand, specifies the number of training samples used in one forward pass and one backward pass [28]. The following parameters were defined in this step: epochs = 30 and batch_size = 32. Due to space limitations, only the results of the last five epochs are presented. In this step, image augmentation was also implemented to avoid overfitting.
(5)
Run the four CNN architectures, where categorical cross-entropy loss [29] is used, as is typical in a multiclass classification setting in which the outputs are interpreted as predictions of class-membership probabilities. The output of this step is the classification accuracy of each CNN architecture. Table 1 shows the hyperparameters for the experiments, and Table 2 shows the accuracy results generated from the conducted experiments.
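The training configuration described in steps (4) and (5) can be sketched as follows; this is an illustrative outline that assumes a model from Section 3.2 has been instantiated and that the split folders from step (3) exist (directory names and augmentation parameters are assumptions).

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Light augmentation and batching of the training images (parameters are illustrative).
datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=10,
                             zoom_range=0.1, width_shift_range=0.1)
trainx = datagen.flow_from_directory('ArSLA_split/train', target_size=(64, 64),
                                     color_mode='grayscale', batch_size=32,
                                     class_mode='categorical')

# Adam optimizer and categorical cross-entropy loss, as described in the steps above.
model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(trainx, epochs=30)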
It is obvious that the best CNN architecture is the AlexNet architecture. In the following section, the implementation of the real time recognition model based on the AlexNet architecture is presented. Figure 9 shows the accuracy results of the conducted experiment, and Figure 10 shows the loss results. Table 3 shows how many images representing each sign were employed in network training and validation.

4. Developing Real Time ArSLA Recognition Model Using AlexNet Deep Learning Architecture

In this section, the technical details of developing a real time model for ArSLA recognition are explained. We selected the AlexNet architecture due to its high performance, as examined in the previous section. As previously presented, the recognition model based on the AlexNet architecture has been developed and is ready to be tested and used. Figure 11 and Figure 12 show two Arabic letters, Nuon (ن) and Thaa (ث), that were captured in real time. A gesture technique was used for identifying and capturing the hand shape; in image processing, a gesture is defined as a technique in which part of the human body is recognized by using a camera [30].
In the following, the steps for sign recognition are described:
Image capturing: OpenCV was used to develop the software that controls the camera and implements real-time detection. The model saved from training was loaded into the system and applied as a real-time detector; the gesture technique is then used to detect the convexity of the hand.
Extracting the ROI (region of interest): The ROI is extracted from the captured frames using background subtraction, after which the contour is determined and the convex hull drawn. The contour is defined as the boundary of the object (the hand) that can be seen in the image; it can also be seen as a curve connecting points of similar color value and is important in shape analysis and object identification.
Finding the convexity defects: Depending on the number of defects, the gesture is determined. This process takes a few milliseconds, which means the recognition runs in real time.
Algorithm 1 shows the gesture technique that was implemented in this model, and Algorithm 2 shows the procedure for testing the model with a single image. The model recognized these two letters successfully. This process can later be used to write a full article, and writing a full article using our proposed model will enhance the lives of people with hearing disabilities.
Algorithm 1. The gesture technique implemented in this model.
import cv2

# video capture
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    # Simulating mirror image
    frame = cv2.flip(frame, 1)
    # Coordinates of the ROI
    x1 = int(0.5*frame.shape[1])
    y1 = 10
    x2 = frame.shape[1] - 10
    y2 = int(0.5*frame.shape[1])
    # Drawing the ROI
    # The increment/decrement by 1 is to compensate for the bounding box
    cv2.rectangle(frame, (x1 - 1, y1 - 1), (x2 + 1, y2 + 1), (255, 0, 0), 1)
    # Extracting the ROI
    roi = frame[y1:y2, x1:x2]
    # Resizing the ROI so it can be fed to the model for prediction
    roi = cv2.resize(roi, (64, 64))
    roi = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    _, image = cv2.threshold(roi, 120, 255, cv2.THRESH_BINARY)
    cv2.imshow('ROI', image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
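Algorithm 1 stops at thresholding the ROI. The contour, convex hull, and convexity-defect step described above can be sketched with standard OpenCV calls as follows; this is an illustrative fragment operating on the thresholded image produced by Algorithm 1.

# Find the hand contour, its convex hull, and the convexity defects (illustrative sketch).
contours, _ = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    hand = max(contours, key=cv2.contourArea)          # largest contour is taken as the hand boundary
    hull = cv2.convexHull(hand, returnPoints=False)    # convex hull as point indices
    defects = cv2.convexityDefects(hand, hull)
    n_defects = 0 if defects is None else defects.shape[0]
    print('convexity defects:', n_defects)             # the defect count helps characterize the gesture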
Algorithm 2. Testing the model with a single image.
path = ...   # path to a test image
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # convert to gray
print(path)
letter = path.split('/')[-1].split('_')[0]     # extract the letter label from the file name (parsing depends on the naming convention)
print(letter)
num = transform(letter)                        # map the letter label to its class index (as in the labeling step)
print(num)
img = cv2.resize(img, (227, 227))
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR).reshape(1, 227, 227, 3)
pred_y = model1.predict(img)
print(np.argmax(pred_y))
# calculate test loss and accuracy
test_loss, test_accuracy = model1.evaluate(img, num)
print(test_accuracy)

5. Discussion and Conclusions

Regarding the confusion matrices, all the models predicted the 32 classes of the 32 standard Arabic alphabetic signs. The VGG model predicted 45 correct images belonging to class 7 and 30 correct images belonging to class 19. The ResNet model predicted 20 correct images from class 7 and about 10 images from class 19. The EfficientNet model predicted about 10 correct images for each of class 7 and class 19.
Deep learning challenges were stated in [31]. In the following, we explain how the proposed model dealt with these challenges.
Challenge 1: Convolutional neural networks (CNNs) require sufficient training samples to achieve high performance as using small datasets can result in overfitting.
To overcome this challenge, transfer learning has been proposed as a solution, a technique in which the model is trained on a large training set. The results of this training are then treated as the starting point for the target task. Transfer learning has been successful in fields such as language processing and computer vision. Frequently used pretrained deep learning models include common models such as AlexNet, EfficientNet, VGG16, and ResNet, which are all typically utilized for image classification. Data augmentation is another technique that has proved to be effective in alleviating overfitting and improving overall performance. This method increases the training set’s size by performing geometric and color transformations such as rotation, resizing, cropping, and either adding noise or blurring the image, etc. In this work, we did not use either transfer learning or data augmentation in training our CNN model. Instead, we utilized the ArSLA dataset which is suitable for testing and training. It uses around 1000 images for each letter to train the CNN model.
Challenge 2: Selecting the proper CNN model. This is because model selection will be different from one dataset to another, meaning the process is dependent on trial and error.
To address this issue, we trained and tested four state-of-the-art CNN models, AlexNet, VGG16, EfficientNet, and ResNet, in order to select the best model to classify the sign language. Our results found that AlexNet had the highest accuracy at 94.81%.
Challenge 3: In manufacturing, data acquisition can be difficult. This is due to one of two issues: The first is that sensor placement is ineffective. The second is that vibrations or noise render the collected data useless.
To address this challenge, the ArSLA dataset [6] utilized data from 40 contributors across varying age groups. We then scaled the image pixels to zero mean and unit variance as a pre-processing technique. We want our model to learn from noisy images, because most applications use a camera to capture the letters, and some cameras produce lower-quality images.
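The pixel scaling mentioned above can be written, for example, as a simple per-image standardization; the sketch below is illustrative and assumes the image is already loaded as a NumPy array.

import numpy as np

# Scale pixels to zero mean and unit variance (per image).
img = img.astype('float32')
img = (img - img.mean()) / (img.std() + 1e-7)   # small epsilon guards against constant images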
In Section 2 (Related Works), the problem descriptions, research gaps, and enhancement possibilities were discussed in four points. In the following, the contributions of the proposed recognition model are discussed in light of these four points.
(1)
The proposed model is accessible to everyone as no additional or special equipment was used and only a PC or laptop camera is required.
(2)
The process of feature extraction is built into the AlexNet architecture, which produces high accuracy. This feature extraction is completely independent of human intervention, which in turn makes it free from human error.
(3)
Recognition systems developed based on deep learning architectures have a good reputation and produce results with high accuracy. In light of this, the best deep learning (CNN) architectures according to the literature were tested on a real and trusted ArSLA dataset, and based on the test results the best CNN architecture was chosen for developing the real-time recognition system. The proposed model was not built from scratch, but rather from the latest findings of researchers in this field, which enabled it to benefit from accumulated experience. Table 4 summarizes related works by focusing on training accuracy. In our proposed model, the training accuracy for AlexNet is 99.75% (see Table 2), which exceeds the highest value reported in the related works (see Table 4). For the sake of transparency, we have additionally presented the validation accuracy (testing accuracy), which is 94.81%.
(4)
The proposed model was developed with the complete Arabic alphabet which allows users to write full articles using ArSLA in real time.
This model is limited to detecting only one object (a hand) without taking the background into consideration. The background of the hand plays a prominent role in object recognition, so the performance might not be the same if the background is changed; the background should be the same as in the training set. In addition, the detection process in our proposed model is highly sensitive to variations in the hand's pose.
Future work will develop a mobile application based on the proposed model. Another direction of future work is to utilize transfer learning by pre-training a model on other sign language datasets, such as the American Sign Language dataset MS-ASL, and then fine-tuning it on Arabic sign language data to validate the effectiveness of transfer learning in sign language recognition. Data augmentation will also be applied to generate additional training samples.

Author Contributions

Conceptualization, A.O.E.; Data curation, Z.A.; Formal analysis, S.A.; Investigation, Z.A. and S.A.; Methodology, Z.A. and A.O.E.; Project administration, A.O.E.; Resources, E.A., M.A. and A.A.D.A.; Software, Z.A., E.A., M.A. and A.A.D.A.; Supervision, A.O.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data has been presented in main text.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

# importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow
from tensorflow.keras.layers import Input, Lambda, Dense, Flatten, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications import Xception
from tensorflow.keras.models import Sequential
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from glob import glob
from tensorflow.keras.applications import EfficientNetB0
# initiate vgg16
model = VGG16(include_top = False, weights = 'imagenet')
x = model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation = 'relu')(x)
pred = Dense(trainx.num_classes, activation = 'softmax')(x)
mdl2 = Model(inputs = model.input, outputs = pred)
for layer in model.layers:
  layer.trainable = False
mdl2.compile(loss = 'categorical_crossentropy',
   optimizer = 'adam',
   metrics = ['accuracy'])
mdl2.fit(trainx, epochs = 30)
# vgg model evaluation
test_loss, test_acc = mdl2.evaluate(testx)
# generating predictions
true_classes = testx.classes
class_indices = trainx.class_indices
class_indices = dict((v, k) for k, v in class_indices.items())
ypred2 = mdl2.predict(testx)
ypred2_classes = np.argmax(ypred2, axis = 1)
# accuracy
from sklearn.metrics import accuracy_score
acc = accuracy_score(true_classes, ypred2_classes)
print(acc)
import seaborn as sns
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(true_classes, ypred2_classes)
print(cm)
plt.figure(figsize = (12, 9))
sns.heatmap(cm)
plt.show()
# ResNet50
res = ResNet50(include_top = False, weights = 'imagenet')
x1 = res.output
x1 = GlobalAveragePooling2D()(x1)
x1 = Dense(1024, activation = 'relu')(x1)
pred1 = Dense(trainx.num_classes, activation = 'softmax')(x1)
mdl3 = Model(inputs = res.input, outputs = pred1)
for layer in res.layers:   # freeze the ResNet base (the original loop froze the VGG base by mistake)
  layer.trainable = False
mdl3.compile(loss = 'categorical_crossentropy',
   optimizer = 'adam',
   metrics = ['accuracy'])
mdl3.fit(trainx, epochs = 30)
# resnet model evaluation
testr_loss, testr_acc = mdl3.evaluate(testx)
# generating predictions for resnet
trueclas = testx.classes
classindices = trainx.class_indices
classindices = dict((v, k) for k, v in classindices.items())
ypred3 = mdl3.predict(testx)
ypred3_classes = np.argmax(ypred3, axis = 1)
# accuracy
acc1 = accuracy_score(trueclas, ypred3_classes)
print(acc1)
# resnet confusion matrix
cm1 = confusion_matrix(trueclas, ypred3_classes)
# heatmap
plt.figure(figsize = (12, 9))
sns.heatmap(cm1)
plt.show()
# efficient net model
eff = EfficientNetB0(include_top = False, weights = 'imagenet')
x2 = eff.output
x2 = GlobalAveragePooling2D()(x2)
x2 = Dense(1024, activation = 'relu')(x2)
pred2 = Dense(trainx.num_classes, activation = 'softmax')(x2)
mdl4 = Model(inputs = eff.input, outputs = pred2)
for layer in eff.layers:   # freeze the EfficientNet base
  layer.trainable = False
mdl4.compile(loss = 'categorical_crossentropy',
   optimizer = 'adam',
   metrics = ['accuracy'])
mdl4.fit(trainx, epochs = 30)
# efficientnet model evaluation
testf_loss, testf_acc = mdl4.evaluate(testx)
# generating predictions for efficientnet
ypred4 = mdl4.predict(testx)
ypred4_classes = np.argmax(ypred4, axis = 1)
# accuracy
acc2 = accuracy_score(trueclas, ypred4_classes)
print(acc2)
# efficientnet confusion matrix
cm2 = confusion_matrix(trueclas, ypred4_classes)
# heatmap
plt.figure(figsize = (12, 9))
sns.heatmap(cm2)
plt.show()
# Alexnet model
# defining a model (the pretrained backbone instantiated here is Xception from keras.applications)
alx = Xception(include_top = False, weights = 'imagenet')
x4 = alx.output
x4 = GlobalAveragePooling2D()(x4)
x4 = Dense(1024, activation = 'relu')(x4)
pred5 = Dense(trainx.num_classes, activation = 'softmax')(x4)
mdl5 = Model(inputs = alx.input, outputs = pred5)

for layer in alx.layers:   # freeze the backbone
  layer.trainable = False
mdl5.compile(loss = 'categorical_crossentropy',
   optimizer = 'adam',
   metrics = ['accuracy'])
mdl5.fit(trainx, epochs = 30)
# alexnet model evaluation
testa_loss, testa_acc = mdl5.evaluate(testx)
# generating predictions for the alexnet model
trueclazz1 = testx.classes
clazzindices1 = trainx.class_indices
clazzindices1 = dict((v, k) for k, v in clazzindices1.items())
ypred5 = mdl5.predict(testx)
ypred5_classes = np.argmax(ypred5, axis = 1)
# accuracy
acc3 = accuracy_score(trueclazz1, ypred5_classes)
print(acc3)

# Alexnet confusion matrix
cm3 = confusion_matrix(trueclazz1, ypred5_classes)
# heatmap
plt.figure(figsize = (12, 9))
sns.heatmap(cm3)
plt.show()

References

  1. Rastgoo, R.; Kiani, K.; Escalera, S. Sign Language Recognition: A Deep Survey. Expert Syst. Appl. 2020, 164, 113794. [Google Scholar] [CrossRef]
  2. Mirdehghan, M. Persian, Urdu, and Pashto: A comparative orthographic analysis. Writ. Syst. Res. 2010, 2, 9–23. [Google Scholar] [CrossRef]
  3. El-Bendary, N.; Zawbaa, H.M.; Daoud, M.S.; Hassanien, A.E.; Nakamatsu, K. ArSLAT: Arabic Sign Language Alphabets Translator. In Proceedings of the 2010 International Conference on Computer Information Systems and Industrial Management Applications (CISIM), Krakow, Poland, 8–10 October 2010; pp. 590–595. [Google Scholar] [CrossRef]
  4. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
  5. Perez, L.; Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv 2017, arXiv:1712.04621. [Google Scholar]
  6. Tharwat, A.; Gaber, T.; Hassanien, A.E.; Shahin, M.K.; Refaat, B. Sift-based arabic sign language recognition system. In Afro-European Conference for Industrial Advancement, Proceedings of the First International Afro-European Conference for Industrial Advancement AECIA 2014, Addis Ababa, Ethiopia, 17–19 November 2015; Springer International Publishing: Cham, Switzerland, 2015; pp. 359–370. [Google Scholar]
  7. Halawani, S.M.; Zaitun, A.B. An avatar based translation system from Arabic speech to Arabic sign language for deaf people. Inter. J. Inf. Sci. Educ. 2012, 2, 13–20. [Google Scholar]
  8. Mohandes, M.; Aliyu, S.; Deriche, M. Arabic sign language recognition using the leap motion controller. In Proceedings of the 2014 IEEE 23rd International Symposium on Industrial Electronics (ISIE), Istanbul, Turkey, 1–4 June 2014; pp. 960–965. [Google Scholar]
  9. ElBadawy, M.; Elons, A.S.; Howida, A.; Shedeed, H.A.; Tolba, M.F. Arabic sign language recognition with 3d convolutional neural networks. In Proceedings of the 2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 5–7 December 2017; pp. 66–71. [Google Scholar]
  10. Alzohairi, R.; Alghonaim, R.; Alshehri, W.; Aloqeely, S.; Alzaidan, M.; Bchir, O. Image based Arabic sign language recognition system. Int. J. Adv. Comput. Sci. Appl. 2018, 9. [Google Scholar] [CrossRef] [Green Version]
  11. Ibrahim, N.B.; Selim, M.M.; Zayed, H.H. An Automatic Arabic Sign Language Recognition System (ArSLRS). J. King Saud Univ. Comput. Inf. Sci. 2018, 30, 470–477. [Google Scholar] [CrossRef]
  12. Deriche, M.; Aliyu, S.O.; Mohandes, M. An Intelligent Arabic Sign Language Recognition System Using a Pair of LMCs With GMM Based Classification. IEEE Sens. J. 2019, 19, 8067–8078. [Google Scholar] [CrossRef]
  13. Hassan, M.; Assaleh, K.; Shanableh, T. Multiple Proposals for Continuous Arabic Sign Language Recognition. Sens. Imaging 2019, 20. [Google Scholar] [CrossRef] [Green Version]
  14. Gangrade, J.; Bharti, J. Real time sign language recognition using depth sensor. Int. J. Comput. Vis. Robot. 2019, 9, 329. [Google Scholar] [CrossRef]
  15. Kamruzzaman, M.M. Arabic Sign Language Recognition and Generating Arabic Speech Using Convolutional Neural Network. Wirel. Commun. Mob. Comput. 2020, 2020, 3685614. [Google Scholar] [CrossRef]
  16. Mustafa, M. A study on Arabic sign language recognition for differently abled using advanced machine learning classifiers. J. Ambient. Intell. Human Comput. 2020, 12, 4101–4115. [Google Scholar] [CrossRef]
  17. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2012, 60, 84–90. [Google Scholar] [CrossRef]
  18. Piekarski, M.; Jaworek-Korjakowska, J.; Wawrzyniak, A.I.; Gorgon, M. Convolutional neural network architecture for beam instabilities identification in Synchrotron Radiation Systems as an anomaly detection problem. Measurement 2020, 165, 108116. [Google Scholar] [CrossRef]
  19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  20. Koonce, B. EfficientNet. In Convolutional Neural Networks with Swift for Tensorflow; Apress: Berkeley, CA, USA, 2021; pp. 109–123. [Google Scholar]
  21. Valueva, M.; Nagornov, N.; Lyakhov, P.; Valuev, G.; Chervyakov, N. Application of the residue number system to reduce hardware costs of the convolutional neural network implementation. Math. Comput. Simul. 2020, 177, 232–243. [Google Scholar] [CrossRef]
  22. Zanna, L.; Bolton, T. Data-Driven Equation Discovery of Ocean Mesoscale Closures. Geophys. Res. Lett. 2020, 47, e2020GL088376. [Google Scholar] [CrossRef]
  23. Schmidhuber, J. Deep Learning. Scholarpedia 2015, 10, 32832. [Google Scholar] [CrossRef] [Green Version]
  24. Rezende, E.; Ruppert, G.; Carvalho, T.; Theophilo, A.; Ramos, F.; Geus, P.D. Malicious software classification using VGG16 deep neural network’s bottleneck features. In Information Technology-New Generations; Springer International Publishing: Cham, Switzerland, 2018; pp. 51–59. [Google Scholar]
  25. Lin, H.; Jegelka, S. Resnet with one-neuron hidden layers is a universal approximator. Advances in neural information processing systems. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 3–8 December 2018; Volume 31. [Google Scholar]
  26. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
  27. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2015, arXiv:1412.6980. [Google Scholar]
  28. Parisi, G.I.; Kemker, R.; Part, J.L.; Kanan, C.; Wermter, S. Continual lifelong learning with neural networks: A review. Neural Netw. 2019, 113, 54–71. [Google Scholar] [CrossRef]
  29. Gordon-Rodriguez, E.; Loaiza-Ganem, G.; Pleiss, G.; Cunningham, J.P. Uses and abuses of the cross-entropy loss: Case studies in modern deep learning. In Proceedings of the “I Can’t Believe It’s Not Better!” at NeurIPS Workshops, PMLR, Virtual, 12 December 2020. [Google Scholar]
  30. Kaura, H.K.; Honrao, V.; Patil, S.; Shetty, P. Gesture controlled robot using image processing. Int. J. Adv. Res. Artific. Intell. 2013, 2. [Google Scholar] [CrossRef] [Green Version]
  31. Nasir, V.; Sassani, F. A review on deep learning in machining and tool monitoring: Methods, opportunities, and challenges. Int. J. Adv. Manuf. Technol. 2021, 115, 2683–2709. [Google Scholar] [CrossRef]
  32. Abiyev, R.H.; Arslan, M.; Idoko, J.B. Sign Language Translation Using Deep Convolutional Neural Networks. KSII Trans. Internet Inf. Syst. 2020, 14, 631–653. [Google Scholar] [CrossRef]
  33. Youssif, A.A.; Aboutabl, A.E.; Ali, H.H. Arabic sign language (ArSL) recognition system using hmm. Int. J. Adv. Comput. Sci. Appl. 2011, 2. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Frame of methodology for developing the proposed model.
Figure 2. The dataset selection strategy.
Figure 3. The selected ArSLA dataset.
Figure 4. General architecture of CNN.
Figure 5. AlexNet architecture.
Figure 6. VGG16 architecture.
Figure 7. ResNet architecture.
Figure 8. EfficientNet architecture.
Figure 9. The accuracy results of the conducted experiment.
Figure 10. The loss results of the conducted experiment.
Figure 11. Arabic letter Nuon (ن).
Figure 12. Arabic letter Thaa (ث).
Table 1. The hyperparameters for the experiments (identical for VGG16, ResNet50, EfficientNet, and AlexNet).
Initial learning rate: 0.001
Loss function: Categorical cross-entropy
Number of epochs: 30
Batch size: 32
Optimizer: Adam
Weight initialization: Xavier initialization
Learning rate decay (λ): 0.0002
Momentum: 0.9
Table 2. The accuracy results generated from the conducted experiments.
Model | Epoch | Train Loss | Train Accuracy (%) | Valid Loss | Valid Accuracy (%)
VGG16 | 26 | 0.2187 | 93.27 | 0.5936 | 82.30
VGG16 | 27 | 0.2069 | 93.70
VGG16 | 28 | 0.2077 | 93.68
VGG16 | 29 | 0.1996 | 93.77
VGG16 | 30 | 0.1957 | 94.05
ResNet50 | 26 | 0.0235 | 99.29 | 0.5688 | 89.83
ResNet50 | 27 | 0.0157 | 99.55
ResNet50 | 28 | 0.0212 | 99.35
ResNet50 | 29 | 0.0175 | 99.52
ResNet50 | 30 | 0.1021 | 96.76
EfficientNet | 26 | 0.0290 | 99.24 | 0.5481 | 86.56
EfficientNet | 27 | 0.0269 | 99.24
EfficientNet | 28 | 0.0225 | 99.32
EfficientNet | 29 | 0.0265 | 99.29
EfficientNet | 30 | 0.0239 | 99.37
AlexNet | 26 | 0.0269 | 99.10 | 0.2004 | 94.81
AlexNet | 27 | 0.0242 | 99.28
AlexNet | 28 | 0.0265 | 99.19
AlexNet | 29 | 0.0216 | 99.32
AlexNet | 30 | 0.0085 | 99.75
Table 3. Number of images representing each sign employed in network training and validation.
# | Letter name (English) | Letter name (Arabic) | # of images
1 | Alif | أَلِف (أ) | 1672
2 | Bā | بَاء (ب) | 1791
3 | Tā | تَاء (ت) | 1838
4 | Thā | ثَاء (ث) | 1766
5 | Jīm | جِيمْ (ج) | 1552
6 | Ḥā | حَاء (ح) | 1526
7 | Khā | خَاء (خ) | 1607
8 | Dāl | دَالْ (د) | 1634
9 | Dhāl | ذَال (ذ) | 1582
10 | Rā | رَاء (ر) | 1659
11 | Zāy | زَاي (ز) | 1374
12 | Sīn | سِينْ (س) | 1638
13 | Shīn | شِينْ (ش) | 1507
14 | Sād | صَادْ (ص) | 1895
15 | Dād | ضَاد (ض) | 1670
16 | Ṭā | طَاء (ط) | 1816
17 | Ẓā | ظَاء (ظ) | 1723
18 | Ayn | عَين (ع) | 2114
19 | Ghayn | غَين (غ) | 1977
20 | Fā | فَاء (ف) | 1955
21 | Qāf | قَاف (ق) | 1705
22 | Kāf | كَاف (ك) | 1774
23 | Lām | لاَمْ (ل) | 1832
24 | Mīm | مِيمْ (م) | 1765
25 | Nūn | نُون (ن) | 1819
26 | Hā | هَاء (ه) | 1592
27 | Wāw | وَاو (و) | 1371
28 | Yā | يَا (ئ) | 1722
29 | Tāa | ة (ة) | 1791
30 | Al | ال (ال) | 1343
31 | Laa | ﻻ (ﻻ) | 1746
32 | Yāa | يَاء (يَاء) | 1293
Table 4. Summary of related works, focusing on training accuracy.
# | Ref | Year | Device | Language | Features | Technique | Training Accuracy
1 | [9] | 2017 | Camera | 25 Arabic words | Image pixels | CNN | 90%
2 | [12] | 2019 | Dual Leap Motion Controllers | 100 Arabic words | N geometric parameters | LDA | 88%
3 | [14] | 2019 | Kinect sensor | 35 Indian signs | Distances, angles, and velocity involving upper body joints | Multi-class support vector machine classifier | 87.6%
4 | [11] | 2018 | Single camera | 30 Arabic words | Segmented image | Euclidean distance classifier | 83%
5 | [32] | 2020 | Single camera | 24 English letters | Image pixels | Inception v3 plus Support Vector Machine (SVM) | 92.21%
6 | [16] | 2020 | Single camera | 28 Arabic letters | Image pixels | CNN | 97.82%
7 | [6] | 2015 | Glove | 30 Arabic letters | Invariant features | ResNet-18 | 93.4%
8 | [33] | 2011 | Single camera | 20 Arabic words | Edge detection and contours tracking | HMM | 82.22%
9 | [13] | 2019 | Camera | 40 Arabic sign language words | Thresholded image differences | HMM | 94.5%
10 | [10] | 2018 | Camera | 30 Arabic letters | FFT | HOG and SVM | 63.5%

