1. Introduction
Emotions are a vital and fundamental part of our existence. To understand a human’s most fundamental behavior, we must examine these feelings through emotional data such as text, voice, and facial expressions [
1,
2]. Emotion analysis builds on computer vision and image processing techniques to detect affective cues in images [
3]. Emotion analysis aims to detect emotion in an image and track a person’s emotional state over time. Human faces are thought to contain much information that humans use to make engagement decisions, and facial expressions are closely linked to perceived engagement [
4].
Facial expressions reveal much about a person’s feelings, intentions, and internal states [
5]. Automatic facial expression recognition systems aim to detect and classify these expressions from images or video [
6]. Such systems can perform reasonably well on both spontaneous and posed facial expressions [
7].
Emotional analysis employs a complicated framework for understanding customer responses [
8,
9]. Using emoji and text analysis, this method assesses the variances in the feelings expressed by different viewers or purchasers [
10]. It is a detailed examination of the feelings and intensities felt as emotions develop and change [
11,
12,
13].
Unlike sentiment analysis, emotional analysis considers the nuances of human emotions. It also investigates the viewer’s, buyer’s, or reader’s intentions and impulses [
14,
15]. These discoveries can be quite enlightening and are simple to act on: if boredom is the predominant emotion, spice things up with humor, creativity, or a cliffhanger moment. A confused reaction can indicate that the material is too complicated and must be conveyed differently [
16].
Emotions are essential in our lives, with an innate ability to influence our behaviors and judgments [
17]. Faces are often the best sign of this because they convey emotions without words and may be observed by others [
18]. Facial emotions are created by muscle movements beneath the skin of the face [
19,
20]. Emotions, according to scholars, are a significant component of nonverbal communication and a rich source of social signals, and they are crucial in understanding human behavior [
21]. By examining facial expressions and mood data, researchers can gain a deeper comprehension of complicated human actions [
22].
A clear guide to students’ internal states is needed to develop a successful strategy for teaching effectively and improving academic achievement [
23]. Such a guide can reveal complex emotions and impulses, which are never simply black and white, positive or negative, for any student [
24].
Teachers could get real-time data on their pupils’ engagement with educational videos. Such data could indicate whether a video causes high anxiety or low involvement. The technology could aid in determining when and why pupils become disengaged, allowing intervention before it is too late [
25]. This study looks at methods for automatically detecting student engagement based on facial expressions, and we investigated whether human observers can reliably judge engagement from the face [
26]. Automated Instructional Systems (AITS) are a type of computer-assisted instruction that records the video playback speed students choose as well as their perceived degree of difficulty.
In-depth emotional analysis is designed to help understand student behaviors and motivations [
27]. This is the only way to properly understand how to adapt website or social media content during online education. It encompasses the full range of human emotions. Emotions are strong feelings associated with each circumstance and play an important role in communication between students and teachers. Emotions can be recognized from various modalities, including the face, speech, and even text. Facial expressions are one of the most direct ways of communicating feelings and are necessary to appreciate a student’s internal state [
28,
29]. The facial expressions of students and/or teachers can be used to understand their emotions in the learning environment.
This paper aims to map the relationship between teaching practices and student learning based on the emotional impact on students and teachers. Facial recognition algorithms extract helpful information from online platforms, and image classification techniques are applied to detect the emotions on student and/or teacher faces. To validate the proposed system, an online course with students is used; the findings suggest that this technique performs well. Based on emotional analysis, several deep learning techniques are applied to train and test the emotion classification process. Transfer learning on a pre-trained deep neural network is also used to increase the accuracy of the emotion classification stage. The obtained results show that the performance of the proposed method is promising with both techniques, as presented in the Experimental Results Section.
Application of the proposed system to a sample of online courses and in-class students led to an unexpected conclusion. In-class students mainly displayed positive and interactive emotions such as happiness, surprise, sadness, and anger, whereas online students’ emotions were mainly negative, such as contempt, disgust, fear, and neutrality. It is worth mentioning that the ahegao emotion appeared a few times with online course students only and never appeared with in-class students. Furthermore, according to the grades achieved, it was expected that online course students would have lower total grades based on their emotions during the class. However, the system proved the opposite: online course students achieved higher grades than in-class students.
The benefit of applying the proposed system in real life lies in grouping students into two groups based on their in-class emotions: those who are more engaged with face-to-face education, and those who can achieve better results by attending online courses. Moreover, according to the tested sample, 67% of the students who were not engaged during face-to-face education, based on their emotions, would achieve better results if they attended the same course virtually.
The remainder of this paper is structured as follows:
Section 2 reviews the related literature. The proposed system is introduced in
Section 3.
Section 4 includes the experimental results. Section 5 discusses the findings. Finally, the work is concluded and future work is outlined in
Section 6.
2. Literature Review
Teachers must comprehend student efficiency in a scientific atmosphere [
30]. This problem does not occur in an offline setting since the teacher can see the pupils’ emotions and expressions. The concept described in this study assists teachers in adjusting their teaching approaches to match the interests, progress, and learning of their pupils [
31,
32].
The teacher is the most crucial element in the school, and his/her personality greatly influences student behavior; no good educational situation can be achieved without teachers. Teachers’ values and behavior are shaped by their personal characteristics and by their manner of interacting inside and outside the classroom [
33].
A teacher who has desirable personal characteristics (based on student perception) is more able to bring about changes in students’ behavior, arouse their interest, and direct them in the desired direction. A positive relationship between a teacher and the students allows them to learn how to lead and direct themselves [
33]. Professors and teachers are the most important people affecting students, especially in the primary stages, which are considered the most important stages of study because students absorb whatever information comes their way, whether negative or positive, at this stage.
Many types of research have been conducted since the sudden move to online education; many problems have been addressed in these research articles [
34,
35,
36]. One of the important things that teachers should pay attention to is body language and facial expressions. Students accurately track the teacher’s kinetic language and can even describe it verbally; they constantly monitor the teacher’s movements, so this observational acuity must be taken into account to achieve the required communication. Because attention is focused on the teacher, he or she should deliberately direct physical movements, eyebrow expressions, tone of voice, use of the shoulders, and many other nonverbal cues [
37].
Many previous researchers have worked on the present technique for developing emotion recognition hardware on image/pattern recognition systems [
38,
39]. Further, an important point that must be emphasized, according to experts, is that communication through body language must be reciprocal. Students express themselves through their movements more than through speaking; they may not dare to speak, but they express themselves through their behavior, and the teacher must understand these movements and nonverbal signals to support student engagement.
Figure 1 illustrates a sample of teachers’ emotions in the classroom.
Many earlier studies have focused on emotion analysis (EA) for various purposes. The developers of [
40] have given an emotion care scheme and web-based platform to recognize people’s emotional status during the continuing COVID-19 issue. They looked at eight emotions in various situations (i.e., anger, anticipation, disgust, fear, joy, sadness, surprise, and trust).
In [
41], the input to a two-dimensional convolutional neural network (CNN-2D) is a spectrogram built from speech sounds. Convolution layers, pooling layers, and fully connected layers are the three CNN layer types that extract particular properties from the spectrogram representations. When this model is used, accuracy improves by 6.5 percent.
In [
42], the authors’ proposed paradigm has much promise for use in mental health care. It could identify, monitor, and diagnose a patient’s mental state in a low-cost, user-friendly way. Their suggested approach employed the CK+ and FER2013 datasets and extracted features from AlexNet’s fully connected layer 6.
In [
43], a group of scientists created a system that employs sensors to log and disseminate real-time mood data. The platform is intended to make it simple to prototype new computer interfaces that can detect, respond to, and adapt to human emotion. They expect it will contribute to the advancement of effective computing technology.
In [
44], the proposed integrated deep neural network (DNN), after being trained on a large dataset of about 8K animations of the characters Tom and Jerry gathered by downloading videos from a popular YouTube channel, correctly identifies the character, segments the face mask, and recognizes the resulting emotions. With 96 percent accuracy and an F1 score of 0.85, VGG16 outperformed the competing models. The study’s primary goal was to integrate DNNs and validate them on vast data to better comprehend and analyze emotions. The suggested integrated DNN includes Mask R-CNN for cartoon figure detection and well-known learning architectures/models such as VGG16, InceptionV3, ResNet-50, and MobileNetV2.
In [
45], a sophisticated Lie-Sensor is developed for detecting fraud or malicious intent and authorizing its validation. Face emotions are labeled as ’Happiness,’ ’Sadness,’ ’Surprise,’ and ’Hate’ in a suggested live emotional intelligence detector. It also uses text classification to forecast a message’s label separately. Finally, it compares the two labels and determines whether the message is genuine. The authors of [
46] present a method for recognizing facial expressions in photographs of people of various ages, genders, and nationalities. “Emotion sketches”, simplified depictions of facial expressions, are proposed. The study explains how to get emotion sketches and confirms the method by using them to train and test neural networks. Emotion sketches were used to train three neural networks of different types in order to classify facial expressions as ’Positive’, ’Negative’, ’Awe’ and ’Neutral’. The prediction results were encouraging, with over 70% accuracy given by each network on a query dataset.
In [
47], for facial sentiment analysis, the researchers suggest a real-time streaming image-based PingPong (PP2) method, line-segment feature analysis (LFA), and a convolutional recurrent neural network (CRNN) model. The accuracy of face recognition using the suggested method is compared to the loss rate of other models in a performance evaluation. This study was carried out to address security issues that may arise with driver convenience services that use video in smart automobiles. The authors built an encoding–decrypting procedure on videos to improve the security of real-time streamed video. Using two variable clock functions and memory, the PP2 algorithm generated random numbers. PP2LFA-CRNN models were compared to AlexNet and CRNN models. The recognition accuracy was 96.8% in the experiment using the test dataset, which was higher than expected.
In [
48], the PP2LFA-CRNN model was compared to AlexNet and CRNN models in terms of performance. The test dataset experiment’s accuracy was 96.8%, higher than that of previous techniques (CRNN: 94.2 percent and AlexNet: 91.3 percent). The experiment revealed that a person’s visual attention matches their purchasing and consumption habits. More consumer cognition research will help researchers better understand human behavior for various applications, including marketing, health care, personal qualities, wellness, and many more. In [
49], the authors suggested using RIEA (relationship identification using emotion analysis) to find relationships between intelligent agents. Their study extracted emotions and mapped them onto a set of human interactions using cognitive psychology and natural language processing theories.
In [
50], the authors offered a current assessment of computational analytic tools for assessing emotional facial expression in patients with Parkinson’s disease (PWP). An NVIDIA GeForce 920M GPU was used to develop a deep-learning-based model. Techniques for computational facial expression analysis in Parkinson’s disease have many applications, although many of the proposed approaches to improving clinical assessment contain flaws. The authors consider hypomimia a significant biomarker for Parkinson’s disease. In [
51], a new software application designed as a serious game to teach children with autism how to understand and express their emotions was released. Children naturally grab objects and engage with the system with their faces. The system was assessed based on its relevance for children with autism spectrum disorder (ASD). ASD is a neurodevelopmental disease that affects a person’s social skills, particularly those related to emotional awareness and recognition. These skills can be learned, especially early in life. The researchers designed a game with no distracting elements so that children’s attention is focused on learning to understand emotions.
The research in [
52] investigated the effects of the proportion of non-competitive individuals and the length of emotional and cognitive time on the evolution of cooperation. Emotion arises through people’s relationships, and studies have shown that emotion greatly impacts people’s decision-making. Among non-competitive individuals, the fraction of cooperators increases with the minimum, whereas among competitive individuals, the proportion of cooperators peaks at M = 5. Introducing individual emotions into strategy evolution is congruent with real-world phenomena, and the findings help researchers better understand how strategies and emotions co-evolve. In [
53], electroencephalogram (EEG) signals were used to detect a patient’s mental state. EEG-based e-healthcare systems can be deployed and used in various smart contexts. They can assist disabled people in moving or controlling various devices, computers, and artificial limbs. Participants looked at images on a 15-inch display from a distance of roughly 70 cm. The display had gaze sensors mounted, and participants wore a head cap to measure functional near-infrared spectroscopy (fNIRS) signals. The proposed approach was compared via two different types of comparison methodologies.
In [
54], a team of researchers at the University of British Columbia (UBC) in Canada developed a machine-learning model that achieves state-of-the-art single-network accuracy on FER2013 without using extra training data. They adopted the VGGNet architecture, rigorously fine-tuned its hyperparameters, and built a series of experiments to test various optimization methods and learning rate schedulers for better prediction accuracy. They also conducted extra tuning of their model using cosine annealing and combined the training and validation datasets to improve the classification accuracy further.
In [
55], the authors presented emotion analysis (EA), which determines whether or not a text has any emotion. EA has grown in popularity recently, particularly for social media applications such as tweets and Facebook posts. The authors considered several instances of public posts and focused on several emotions in a single post. In [
56], the authors presented a headline emotion classification. The content words were extracted to form different word pairs with joy, disgust, fear, anger, sadness, and surprise emotions. In [
57], Vasileios Hatzivassiloglou and Kathleen R. McKeown identified and validated constraints from conjunctions on the positive or negative semantic orientation of conjoined adjectives in a large corpus. A log-linear regression model uses these constraints to predict whether conjoined adjectives have the same orientation.
In [
58], the authors distinguished six basic emotions using supervised machine learning. The support vector machine (SVM) classifier outperformed all other classifiers. On previously unseen examples, it generalized well. In [
59], the authors proposed a system that automatically recognizes facial expressions from an image and classifies emotions. The system uses a simplified Viola–Jones face detection method for face localization. Different feature vectors are combined to improve recognition and classification performance. In [
60], the authors explored a couple of machine learning algorithms and feature-extraction techniques to help accurately identify human emotion. In [
61], the authors reviewed the recent literature on speech emotion recognition. Thirty-two representative speech databases were reviewed from the point of view of their language, number of speakers, and emotions. The importance of choosing different classification models has been discussed.
EA has also been employed in the educational field; student and/or teacher emotions could be detected using smart systems. Many researchers have studied the effects of people’s emotions on others. In [
62], the authors analyzed online learning behaviors based on image emotion recognition. Key frames were extracted from facial expression images using an improved local binary pattern (LBP) and wavelet transform, and the mean expression feature was then computed from the extracted key frames. The authors designed the structure of an online learning behavior analysis system and proposed a strategy for recognizing emotions through facial expressions. Data and histograms show that the suggested technique can improve the effectiveness of image emotion recognition in experiments.
The authors of [
63] established the SELCSI (Student Emotional Learning in Cultural Safety Instrument) and reported preliminary validity and reliability for its student emotional learning scales. The tool could help researchers understand how nursing and midwifery students learn to engage with First Peoples and communities in a culturally appropriate manner, and its use has significant theoretical, pedagogical, and methodological implications for nursing and midwifery education. In [
64], the researchers’ goal was to look at the impact of mindfulness techniques on stress perception and psychological well-being. The study included 45 graduate students split into two groups: intervention and control. Analysis of variance (ANOVA) for repeated measures was used to evaluate quantitative data, while thematic content analysis was used to analyze the interviews. The interviews revealed the presence of mixed feelings about graduate school and the development of new coping methods to deal with this work environment. In both groups, the results showed an increase in mindfulness and psychological well-being, as well as a decrease in perceived stress.
The research in [
65] presents an EEG-based emotion detection method for identifying a patient’s emotional state. The overall classification accuracy was found to be 83.87 percent. The technique, which uses four electrodes, was tested on the EEG database “DEAP” and performs well compared to existing algorithms. In [
66], the authors presented a system to detect the engagement level of the students. The system correctly identified when students were “very engaged”, “nominally engaged”, and “not engaged at all”. The students with the best scores also had higher concentration indexes and were more attentive to details of their work.
Concluding the literature review, artificial intelligence (AI) and recent deep learning techniques could be applied to many areas that facilitate human lives [
67,
68]. Furthermore, it could be applied in medical applications [
69,
70], recommender systems [
71], job-seeking [
72], smart cities and localization [
73], hospitals [
74,
75], object tracking [
76,
77,
78], software engineering [
79,
80], E-commerce [
81], emotional analysis [
82], agriculture applications [
83,
84], and many others [
85].
3. Methodology
Face Reader is a reliable automated method for recognizing a variety of specific attributes in facial images, including the nine expressions considered in this work: happiness, sadness, anger, surprise, neutrality, contempt, ahegao, fear, and disgust. Most of these categories correspond to what Paul Ekman identified as basic or universal emotions. Action units, valence, arousal, gaze direction, head orientation, and personal factors such as gender and age are also calculated.
Online students exhibit varied levels of involvement while participating in instructional activities, including boredom, annoyance, delight, neutrality, bewilderment, and learning gain. Online educators must assess online learners’ engagement status accurately and efficiently to give individualized pedagogical support. Automatic categorization methods extract features from traits such as eye movement, facial expressions, gestures, and postures, or from physiological and neurological sensors. Vision-based methods do not interfere with learners in their learning environments, enabling engagement detection across different subject areas.
In the neuroscience literature, engagement is commonly associated with the level of arousal or alertness. The detected emotion of either the student or the teacher is the indicator used to assess engagement and attentiveness. Neuroscience-based methods, however, require specialized sensors that are not practical in real-world educational settings.
Computer-vision-based approaches can assess whether a learner is engaged in an activity. The assessment procedure is unobtrusive and simple to use, comparable to how a teacher monitors whether a pupil is motivated without disrupting his or her activity in the classroom.
3.1. Proposed System
Several datasets are used in the proposed system; some were collected from the web, and others were previously compiled by other researchers (the datasets are described in detail in the next section). The proposed system employs deep learning techniques to classify the emotions in the dataset after preprocessing stages are applied to reduce the features and remove noise.
Figure 2 illustrates the proposed system implemented to analyze teacher and student emotions.
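The classification stage relies on deep learning with transfer learning on a pre-trained network, as noted in the Introduction. The following toy sketch illustrates the core idea only — a frozen feature extractor plus a small retrained classification head. The random projection stands in for a real pre-trained backbone such as VGG, and all data are synthetic; this is not the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen stand-in for a pre-trained backbone (hypothetical): in the real
# system this would be a deep network such as VGG with its weights fixed.
W_frozen = rng.normal(size=(4, 8))

def backbone(x):
    # Frozen feature extractor: these weights are never updated.
    return np.maximum(x @ W_frozen, 0.0)

# Synthetic two-class "emotion" data (illustrative only, not real images).
X = np.vstack([rng.normal(loc=1.0, size=(50, 4)),
               rng.normal(loc=-1.0, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)

# Extract frozen features once, then standardize them.
feats = backbone(X)
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

# Transfer learning: gradient descent trains only the small softmax head.
W_head = np.zeros((8, 2))
for _ in range(500):
    logits = feats @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = feats.T @ (p - np.eye(2)[y]) / len(X)
    W_head -= 0.1 * grad  # update the head only; the backbone stays frozen

acc = float((np.argmax(feats @ W_head, axis=1) == y).mean())
print(f"training accuracy of the retrained head: {acc:.2f}")
```

Because only the small head is trained, far less labeled data is needed than for training a full network from scratch, which is the practical appeal of transfer learning in this setting.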
3.2. Dataset
Many available datasets can be used for EA, and some of them are employed in this research. The following describes the well-known datasets in the field.
Among the most comprehensive databases for face affect in still images, AffectNet includes category and dimensional models; 1250 emotion-related tags in English, German, Spanish, Portuguese, Arabic, and Farsi were used to collect the data.
The CK+ (Extended Cohn–Kanade Dataset) is a publicly available benchmark dataset for action unit and emotion recognition. There are 5876 images in the collection from 123 persons, with expression sequences ranging from neutral to peak. The images in the CK+ collection all share the same backdrop, are mostly grayscale, and are 640 × 490 pixels in size.
The FER-2013 (Facial Expression Recognition 2013) dataset comprises a training set of 28,000 labeled images, a development set of 3500 annotated images, and a test set of 3500 images. The dataset was created by combining the results of a Google image search for each emotion with synonyms of the emotion names. Each image in FER-2013 is tagged with one of seven emotions: happiness, sadness, anger, fear, surprise, disgust, and neutrality, with happiness being the most common, resulting in a 24.4 percent baseline for random guessing.
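Such a baseline is simply the relative frequency of the modal class: a guesser that always predicts the most common label is correct that often. With illustrative class counts (hypothetical, not the actual FER-2013 tallies), the calculation might look like:

```python
# Hypothetical per-emotion image counts (illustrative only; not the
# actual FER-2013 class distribution).
counts = {"happiness": 8800, "neutrality": 6200, "sadness": 6100,
          "fear": 5100, "anger": 5000, "surprise": 4000, "disgust": 800}
total = sum(counts.values())             # 36,000 images in this toy case
baseline = max(counts.values()) / total  # always guess the modal class
print(f"majority-class baseline: {baseline:.1%}")  # 24.4%
```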
EMOTIC (Emotion Recognition in Context) is a database of photographs of people in real-life scenarios labeled with their apparent emotions. The EMOTIC dataset includes two types of emotion representation: discrete categories (a set of 26) and continuous dimensions (24) (e.g., valence, arousal, and dominance). There are 23,571 images and 34,320 people tagged in the collection. In reality, some images were hand-picked from Google’s search engine.
The Google Facial Expression Comparison dataset is a popular emotion dataset with many people using it. There are labeled triplet images in the collection. The top six raters assign each triplet a label. The dataset aids in establishing which of the two faces has similar feelings in each scenario. The data are mainly used to summarize albums, determine emotions, and other similar tasks.
Ascertain is a multi-modal database for implicit personality and affects recognition that may be used to track physiological responses to assess personality traits and emotional states. The data contain 58 individuals’ Big Five personality scores and emotional self-ratings, synchronously recorded EEG, ECG, galvanic skin response (GSR), and facial activity data acquired while watching affective movie clips with off-the-shelf sensors.
Dreamer is a multi-modal database that contains electroencephalogram (EEG) and electrocardiogram (ECG) information collected during affect elicitation with audio–visual stimuli. Signals from 23 people were captured in this dataset, together with the subjects’ self-assessment of their affective state regarding valence, arousal, and dominance following each stimulus.
K-EmoCon is a multi-modal dataset compiled from 32 persons who participated in 16 paired social discussions. The data were gathered using three off-the-shelf wearable sensors, audio–visual video of debate participants, and continual emotional annotations.
In this paper, we collected data from the Kaggle website, obtaining image emotion datasets under different names: FER-2013, CK+48, jaffedbase, the OAHEGA EMOTION RECOGNITION DATASET, and Natural Human Face Images for Emotion Recognition, as described in detail in
Table 1. We consolidated the examples from the various datasets into a separate file for each emotion: ahegao, anger, contempt, happiness, fear, disgust, neutrality, surprise, and sadness.
Then, we divided each emotion file into two groups: 80% for training and 20% for testing. We used cross-validation to optimize the training and testing percentages, as illustrated in
Table 1.
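A minimal sketch of this per-emotion 80/20 split, assuming each emotion’s images are listed in a dictionary (the file names below are hypothetical, not the real dataset paths):

```python
import random

def split_per_emotion(files_by_emotion, train_frac=0.8, seed=42):
    """Split each emotion's file list into train/test subsets.

    Splitting per class keeps the 80/20 ratio within every emotion,
    mirroring the per-file split described above.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for emotion, files in files_by_emotion.items():
        shuffled = files[:]
        rng.shuffle(shuffled)          # shuffle before cutting the 80/20 line
        cut = int(len(shuffled) * train_frac)
        train[emotion] = shuffled[:cut]
        test[emotion] = shuffled[cut:]
    return train, test

# Illustrative file names (hypothetical).
data = {"happiness": [f"happy_{i}.png" for i in range(10)],
        "anger": [f"anger_{i}.png" for i in range(5)]}
train, test = split_per_emotion(data)
print(len(train["happiness"]), len(test["happiness"]))  # 8 2
```

Cross-validation, as used in the paper, would repeat such a split over several folds and average the results.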
In this paper, we apply several algorithms to each of our datasets separately and then to the aggregate dataset, aiming to reveal the extent to which teachers influence students in the educational process by focusing on emotion analysis, which is the main objective of this paper. A sample of the dataset used in this research is illustrated in
Figure 3.
The second step was to standardize the dataset, so we converted all images to the same extension and size and converted them to grayscale. Our purpose is to build a standardized dataset ready for use in many experiments across several papers for educational purposes. This paper’s main contribution is linking the emotional impact of both students and teachers in online education. The class imbalance evident in
Table 1 will be resolved in future work: both oversampling and undersampling techniques will be applied to the gathered dataset to improve the accuracy of the emotion classification process.
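The image normalization step described above (uniform size plus grayscale) can be sketched in pure Python as follows; a real pipeline would use a library such as OpenCV or PIL, and the 2 × 2 input image and BT.601 luma weights here are illustrative:

```python
def to_grayscale(rgb_image):
    """Convert an RGB image (nested lists of (r, g, b) tuples) to
    grayscale using the ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize so every image ends up the same size."""
    h, w = len(image), len(image[0])
    return [[image[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]

# A hypothetical 2x2 RGB image, normalized to a 4x4 grayscale image.
img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(img)          # [[76, 150], [29, 255]]
fixed = resize_nearest(gray, 4, 4)
print(fixed[0])  # [76, 76, 150, 150]
```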
5. Discussion
This study aims to analyze students’ acceptance of technology and how it affects achievement, based on emotions, after participating in an online course. A survey of 271 students (119 registered in face-to-face instruction, 252 registered in an online course) revealed a higher level of positive emotions than negative emotions.
Online course students showed higher levels of ahegao, anger, contempt, and disgust but lower levels of happiness and surprise. Furthermore, the results show that students in online courses achieved significantly higher grades, as technological achievement correlated significantly with class enjoyment.
Based on a five-point survey scale (1 = not related; 5 = very related), the following nine emotions were measured: ahegao, anger, contempt, happiness, fear, disgust, neutrality, surprise, and sadness. The emotions varied in terms of valence and activation: positive activating, negative activating, positive deactivating (not measured), and negative deactivating.
To examine connections between the assessment scales and student emotions, control and value scales as well as technology-related beliefs and age were included as covariates in further analyses of variance. An in-depth analysis was conducted to assess differences in students’ grades between online and face-to-face education.
Based on the conducted survey, covariate effects can explain significant differences in students’ negative emotions between online and face-to-face education. Surprisingly, for domain-specific achievement, neither the main effect nor the covariates showed significant effects, although the t-test confirmed a significant difference between the groups.
The purpose of this research was to analyze the extent to which the students of specific teachers differ in how their achievement relates to individual emotions; this was evaluated by comparing students who attended the course physically with those who attended it virtually. Our results indicate that students with positive emotions benefit more than those with negative emotions in university education courses.
Further results regarding emotions and their appraisals showed that achievement task value was rated higher in the on-campus group, while technological control was higher in the online group. Consequently, domain-specific achievement was higher in the on-campus group. This supports the previous assumption that some students find it difficult to learn to self-regulate.
In sum, this study has shown that the learning environment might affect students’ achievement task value and technological control. On the other hand, the results indicate that the learning environment (i.e., online vs. face-to-face) seemed to have only weak effects on student achievement in this study.