Evaluating Ensemble Learning Methods for Multi-Modal Emotion Recognition Using Sensor Data Fusion
Abstract
1. Introduction
- How can we combine physiological body data with environmental data for emotion recognition?
- Is it possible to create a subject-independent emotion prediction model with high accuracy?
- Which ensemble learning algorithm is best suited for integrating multi-modal data and generating subject-independent models using sensor data fusion?
- Avoid over-fitting: When only a small quantity of data is available, a learning algorithm is prone to discovering several diverse hypotheses that perfectly predict all of the training data while producing poor predictions for unknown instances. Averaging different hypotheses reduces the risk of choosing an incorrect hypothesis and also improves the overall predictive performance.
- Provide a computational advantage: Local searches conducted by single learners may become stuck in local optima. Ensemble approaches reduce the risk of settling in a local minimum by combining numerous learners.
2. Related Work
2.1. Discussions about Emotion Recognition Using On-Body and Environmental Factors
Sensor | Signals and Features |
---|---|
Motion | Because modern accelerometers incorporate tri-axial micro-electro-mechanical systems (MEMS) to record three-dimensional acceleration, motion can be summarized as $\text{Motion} = \sqrt{a_x^2 + a_y^2 + a_z^2}$, i.e., the square root of the sum of the squared acceleration components. In recent years, authors have used the accelerometer to identify emotions [19]. |
Body Temperature | Despite its simplicity, we can use body temperature to gauge a person’s emotions and mood shifts [18,19,20]. Wan-Young Chung demonstrated that variations in skin temperature, known as Temperature Variability (TV), may be used to identify nervous system activity [21]. |
Heart Rate | The RR interval refers to the period between two successive pulse peaks, and the signal produced by this sensor consists of heartbeats. According to several studies, HR is often used to measure happiness and emotions [20,22,23]. |
EDA | It is sometimes called Galvanic Skin Response (GSR) and is associated with emotional and stress sensitivity [20,24,25,26]. |
- During data analysis, intermediate-level data fusion, also known as “Feature Level” fusion, is used to determine the optimal collection of features for classification. For example, the best combination of features, such as EMG, Respiration, Skin Conductance, and ECG, has been retrieved using feature-level fusion [18].
- Finally, high-level data fusion, often known as “Decision Level” fusion, seeks to enhance decision-making by combining the outcomes of many approaches. Ensemble learning can be considered a form of decision-level fusion (the sketch below contrasts the two fusion levels).
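To make the distinction between fusion levels concrete, the following sketch contrasts feature-level fusion (concatenating on-body and environmental feature vectors before classification) with decision-level fusion (majority-voting the outputs of per-modality classifiers). It is a minimal illustration with synthetic data and hypothetical variable names, not the authors' implementation.

```python
# Minimal sketch: feature-level vs. decision-level fusion (synthetic data; illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
body = rng.normal(size=(200, 4))   # hypothetical on-body features (HR, EDA, bTemp, Motion)
env = rng.normal(size=(200, 3))    # hypothetical environmental features (UV, EnvNoise, AirPressure)
y = rng.integers(1, 6, size=200)   # self-reported emotion labels (1-5)

# Feature-level ("intermediate") fusion: concatenate modalities into one feature matrix.
X_fused = np.hstack([body, env])
fused_clf = DecisionTreeClassifier().fit(X_fused, y)

# Decision-level ("high") fusion: train one model per modality, then combine their decisions.
body_clf = LogisticRegression(max_iter=1000).fit(body, y)
env_clf = DecisionTreeClassifier().fit(env, y)
preds = np.vstack([body_clf.predict(body), env_clf.predict(env)])

def majority_vote(column):
    # Pick the most frequent predicted label across the per-modality models.
    values, counts = np.unique(column, return_counts=True)
    return values[np.argmax(counts)]

decision_fused = np.apply_along_axis(majority_vote, 0, preds)
```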
2.2. Discussions about Emotion Recognition Using Physiological Signals and Facial Expressions
3. Methodology
3.1. System Architecture
3.2. Data Collection
- Optical heart rate monitor.
- Three-axis accelerometer.
- GPS sensor.
- Galvanic skin response (GSR/EDA) sensors.
- UV sensor.
- Skin temperature sensor.
3.3. Data Pre-Processing
- incomplete
- incorrect
- irrelevant data
- outliers: outliers in the data can be identified using a boxplot or a histogram. Denoting the first quartile of the data as $Q_1$ and the third quartile as $Q_3$, we can compute the interquartile range $IQR = Q_3 - Q_1$; values outside $[Q_1 - 1.5\,IQR,\ Q_3 + 1.5\,IQR]$ are then treated as outliers. We also remove columns with low variance, eliminating columns with few unique values by filtering on the variance statistic of each column with threshold values ranging from 0.0 to 0.5. Based on these methods, the features were reduced from 22 to 18. Table 5 lists the extracted and removed features. Figure 2 depicts the relationship between the threshold (x-axis) and the number of filtered features (y-axis) when applying the variance threshold to the modified data set. Finally, we split the cleaned data into training and testing sets, with the features scaled using normalization or standardization; a minimal sketch of these steps follows this list.
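The sketch below shows how these cleaning steps could be reproduced with pandas and scikit-learn. The input file name, column names, and the chosen variance threshold are assumptions for illustration; the paper only states that thresholds between 0.0 and 0.5 were explored.

```python
# Minimal pre-processing sketch (file name, column names, and threshold are illustrative).
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("sensor_data.csv")   # hypothetical file with sensor columns and a Label column
df = df.dropna()                      # drop incomplete rows

# IQR-based outlier filtering, applied column by column.
numeric = df.drop(columns=["Label"])
q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
iqr = q3 - q1
mask = ~((numeric < (q1 - 1.5 * iqr)) | (numeric > (q3 + 1.5 * iqr))).any(axis=1)
df = df[mask]

# Remove low-variance columns (the paper sweeps thresholds between 0.0 and 0.5).
selector = VarianceThreshold(threshold=0.1)
X = selector.fit_transform(df.drop(columns=["Label"]))
y = df["Label"]

# Train/test split and feature scaling (standardization).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```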
3.4. Ensemble Learning Methods
3.4.1. Bagging (Bootstrap Aggregation)
3.4.2. Boosting
3.4.3. Stacking
3.5. Implementation
3.5.1. Feature Extraction
- HR, EDA, and bTemp, represented by their mean, median, max, min, std, and quartiles [24].
- For HR, we used a Poincaré plot to extract the distributions of HRV features and to test normality, with features defined in the time and frequency domains [18]. We computed the SD1 parameter from the time series as $SD1 = \sqrt{\tfrac{1}{2}\,SDSD^{2}}$, where SD1 is the standard deviation along the minor axis and SDSD is the standard deviation of successive differences (a time-domain parameter). The SD2 parameter may be calculated as $SD2 = \sqrt{2\,SDNN^{2} - \tfrac{1}{2}\,SDSD^{2}}$, where SD2 is the standard deviation along the major axis and SDNN is the standard deviation of the NN-interval (NNI) series. Thus, a higher or lower HRV is reflected by the ratio SD1/SD2 [53]. We also derived frequency-domain features, representing the power spectrum of order 12 by integrating low-frequency (LF) heartbeats (0.04 to 0.15 Hz) and high-frequency (HF) heartbeats (0.15 to 0.4 Hz) [54].
- For Motion: we combined the X, Y, and Z components into a single feature called Motion (see the sketch below).
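A minimal sketch of the HRV and Motion feature computations described above, using the standard Poincaré definitions of SD1 and SD2; the variable names and the sample RR values are illustrative, not taken from the study's data.

```python
# Sketch of SD1/SD2 and Motion feature extraction (illustrative values, not study data).
import numpy as np

def poincare_sd1_sd2(rr):
    diff = np.diff(rr)                  # successive differences of NN/RR intervals
    sdsd = np.std(diff, ddof=1)         # SDSD: std of successive differences (time domain)
    sdnn = np.std(rr, ddof=1)           # SDNN: std of the NN-interval series
    sd1 = np.sqrt(0.5 * sdsd ** 2)      # minor axis: SD1 = SDSD / sqrt(2)
    sd2 = np.sqrt(2 * sdnn ** 2 - 0.5 * sdsd ** 2)  # major axis
    return sd1, sd2

def motion_magnitude(x, y, z):
    # Combine the three accelerometer axes into a single Motion feature.
    return np.sqrt(np.asarray(x) ** 2 + np.asarray(y) ** 2 + np.asarray(z) ** 2)

rr = np.array([812, 790, 805, 821, 798, 803], dtype=float)  # example RR intervals in ms
sd1, sd2 = poincare_sd1_sd2(rr)
print(sd1, sd2, sd1 / sd2)              # the SD1/SD2 ratio used to characterize HRV
```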
3.5.2. Feature Fusion Level
3.5.3. Feature Selection
3.5.4. Building and Optimizing Classification Models
- Separate the training data into three folds.
- Select L weak learners and fit them to the first-fold data.
- Evaluate each of the weak learners on the second fold.
- Make predictions for the third-fold observations with each of the L weak learners.
- Fit the meta-model on the third fold, using the weak learners’ predictions as inputs (a minimal sketch of this procedure follows).
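The listed procedure can be sketched as follows. The fold assignment, the choice of weak learners, and the use of logistic regression as the meta-model are assumptions for illustration; `X_train` and `y_train` are assumed to come from the pre-processing sketch.

```python
# Manual sketch of the three-fold stacking procedure listed above (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = np.asarray(X_train), np.asarray(y_train)

# 1. Separate the training data into three folds.
idx = np.array_split(np.random.permutation(len(X)), 3)
(X1, y1), (X2, y2), (X3, y3) = [(X[i], y[i]) for i in idx]

# 2. Select L weak learners and fit them to the first fold.
weak = [KNeighborsClassifier(n_neighbors=4), DecisionTreeClassifier(max_depth=300)]
for model in weak:
    model.fit(X1, y1)

# 3. Evaluate each weak learner on the second fold.
for model in weak:
    print(type(model).__name__, model.score(X2, y2))

# 4. Make predictions for the third-fold observations with each weak learner.
meta_features = np.column_stack([model.predict(X3) for model in weak])

# 5. Fit the meta-model on the third fold, using the weak learners' predictions as inputs.
meta_model = LogisticRegression(max_iter=1000).fit(meta_features, y3)
```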
4. Results
4.1. Descriptive Statistics
- Standard statistical procedures include the following:
- Descriptive Statistics: Descriptive statistics describe the basic and essential properties of the data, such as the mean, standard deviation (std), median (Med), minimum (min), 1st quartile (25%), 2nd quartile (50%), 3rd quartile (75%), maximum, skewness (skw), and kurtosis (kur). Table 7 depicts these statistics of the various body and environmental sensor signals for all participants.
- Correlation Matrix: Correlation is a statistical approach for determining whether and how strongly two variables are related. The correlation coefficient r measures the strength of the linear relationship between two variables and is the principal outcome of a correlation analysis. It ranges from −1.0 to +1.0 and is calculated as $r = \dfrac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^{2}\sum_i (y_i-\bar{y})^{2}}}$. The closer r is to +1 or −1, the more strongly the two variables are linked. If r is near 0, the variables have no linear connection. If r is positive, one variable tends to grow as the other grows; if r is negative, one tends to shrink as the other grows (an “inverse” correlation). Figure 11 depicts a graphical representation of the correlation matrix.
- Covariance Matrix: the covariance matrix, on the other hand, is a square matrix that contains the covariance between each pair of variables in a random vector. A positive covariance implies that the two variables tend to increase together, a negative covariance indicates that one tends to fall as the other rises, and a covariance of 0 indicates no linear relationship between the two variables. As shown in Table 8, EDA has a negative covariance with (HR, UV, bTemp) and a positive covariance with (EnvNoise, Air Pressure), while Motion has a negative covariance with (bTemp) and a positive covariance with (EDA, HR, UV, EnvNoise, Air Pressure), and so on for the remaining signals (HR, UV, EnvNoise, Air-Pressure, bTemp).
- PCA: Principal component analysis (PCA) was also used to identify the relationships between the features included in the multiple regression analyses. Figure 12 shows that the first PCA component of the on-body features (b-Temp, HR) exhibits positive correlations with the environmental variable (UV), since they are all oriented towards the same right-hand side of the plot. The on-body features (EDA) and Motion, on the other hand, have positive coefficients with the environmental variables (AirPressure, Env-Noise), because they are oriented towards the top-left side of the plot, and have a negative relationship with b-Temp. Because the (Motion, Air-Pressure) features are oriented along the top of the y-axis and negatively linked to bTemp, they have negative coefficients with the (UV, HR, bTemp) features, whereas the (EDA, EnvNoise) features have positive coefficients with (UV, HR). Consequently, we must understand the link between environmental and on-body factors and their influence on human emotions.
- Analysis of the Poincaré plot: this is a non-linear scattergram technique. A Poincaré plot graphs NN(i) on the x-axis against NN(i + 1) (the next NN interval) on the y-axis, where the NN intervals are the distances between successive heartbeats. We utilized Poincaré plots to display and evaluate heart rate variability (HRV) normality, excluding participants with noisy heart rate patterns, and to assess heart health [55]. The analysis of this plot yields the standard deviation of the instantaneous beat-to-beat NN-interval variability (the minor axis, SD1), the standard deviation of the continuous long-term RR-interval variability (the major axis, SD2), and the axis ratio (SD2/SD1) [54]. A higher or lower heart rate variability (HRV) is determined by the ratio (SD1/SD2), with a higher ratio indicating good health and a lower ratio indicating poor health. Given a time series $x_1, x_2, \ldots, x_N$, the visualization is constructed by plotting $(x_1, x_2)$, then $(x_2, x_3)$, and so on, i.e., the points $(x_i, x_{i+1})$. We used Poincaré plots to verify the common examples of noisy and regular HR data patterns of participants, as shown in Figure 13 (a computational sketch of these statistics follows this list).
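For concreteness, the statistics discussed in this subsection can be computed roughly as follows, assuming the cleaned DataFrame `df` from the pre-processing sketch; the `HR_rr` column of NN intervals is hypothetical and only illustrates how the Poincaré scatter is drawn.

```python
# Sketch of the descriptive statistics, correlation/covariance matrices, PCA, and Poincare plot.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

signals = df[["EDA", "HR", "UV", "Motion", "EnvNoise", "AirPressure", "bTemp"]]

summary = signals.describe()            # min, quartiles, mean, max, std (cf. Table 7)
skewness, kurtosis = signals.skew(), signals.kurt()
corr = signals.corr()                   # Pearson correlation matrix, r in [-1, 1] (cf. Figure 11)
cov = signals.cov()                     # covariance matrix (cf. Table 8)

# First two principal components of the standardized signals (cf. Figure 12).
pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(signals))

# Poincare plot: NN(i) on the x-axis against NN(i + 1) on the y-axis (cf. Figure 13).
nn = df["HR_rr"].to_numpy()             # hypothetical column of NN/RR intervals
plt.scatter(nn[:-1], nn[1:], s=4)
plt.xlabel("NN(i)"); plt.ylabel("NN(i+1)")
plt.show()
```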
4.2. Emotion Ensemble Predictive Models
4.3. Performance Evaluation
4.4. Hyper-Parameter Optimization
5. Discussion
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dias, W.; Andaló, F.; Padilha, R.; Bertocco, G.; Almeida, W.; Costa, P.; Rocha, A. Cross-dataset emotion recognition from facial expressions through convolutional neural networks. J. Vis. Commun. Image Represent. 2022, 82, 103395. [Google Scholar] [CrossRef]
- Yang, Y.; Xu, F. Review of Research on Speech Emotion Recognition. In International Conference on Machine Learning and Intelligent Communications; Springer: Cham, Switzerland, 2022; pp. 315–326. [Google Scholar]
- Balamurali, R.; Lall, P.B.; Taneja, K.; Krishna, G. Detecting human emotions through physiological signals using machine learning. In Artificial Intelligence and Technologies; Springer: Singapore, 2022; pp. 587–602. [Google Scholar]
- Zhang, Y.; Cheng, C.; YiDie, Z. Multimodal emotion recognition based on manifold learning and convolution neural network. Multimed. Tools Appl. 2022, 1–16. [Google Scholar] [CrossRef]
- Reis, S.; Seto, E.; Northcross, A.; Quinn, N.W.T.; Convertino, M.; Jones, R.L.; Maier, H.R.; Schlink, U.; Steinle, S.; MassimoVieno; et al. Integrating modelling and smart sensors for environmental and human health. Environ. Model. Softw. 2015, 74, 238–246. [Google Scholar] [CrossRef] [Green Version]
- Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdisciplinary Reviews. Data Min. Knowl. Discov. 2018, 8, e1249. [Google Scholar]
- Bradley, M.M.; Lang, P.J. Measuring emotion: The self-assessment manikin and the semantic differential. J. Behav. Ther. Exp. Psychiatry 1994, 25, 49–59. [Google Scholar] [CrossRef]
- Kanjo, E.; Younis, E.M.G.; Ang, C.S. Deep learning analysis of mobile physiological, environmental and location sensor data for emotion detection. Inf. Fusion 2019, 49, 46–56. [Google Scholar] [CrossRef]
- Kanjo, E.; Bacon, J.; Roberts, D.; Landshoff, P. MobSens: Making smart phones smarter. IEEE Pervasive Comput. 2009, 8, 50–57. [Google Scholar] [CrossRef]
- Kööts, L.; Realo, A.; Allik, J. The influence of the weather on affective experience. J. Individ. Differ. 2011, 32, 74. [Google Scholar] [CrossRef] [Green Version]
- Park, N.-K.; Farr, C.A. The effects of lighting on consumers’ emotions and behavioral intentions in a retail environment: A cross-cultural comparison. J. Inter. Des. 2007, 33, 17–32. [Google Scholar] [CrossRef]
- Kanjo, E.; Younis, E.M.G.; Sherkat, N. Towards unravelling the relationship between on-body, environmental and emotion data using sensor information fusion approach. Inf. Fusion 2018, 40, 18–31. [Google Scholar] [CrossRef]
- Gravina, R.; Alinia, P.; Ghasemzadeh, H.; Fortino, G. Multi-sensor fusion in body sensor networks: State-of-the-art and research challenges. Inf. Fusion 2017, 35, 68–80. [Google Scholar] [CrossRef]
- Steinle, S.; Reis, S.; Sabel, C.E. Quantifying human exposure to air pollution—Moving from static monitoring to spatio-temporally resolved personal exposure assessment. Sci. Total. Environ. 2013, 443, 184–193. [Google Scholar] [CrossRef] [Green Version]
- Kanjo, E.; Benford, S.; Paxton, M.; Chamberlain, A.; Fraser, D.S.; Woodgate, D.; Crellin, D.; Woolard, A. MobGeoSen: Facilitating personal geosensor data collection and visualization using mobile phones. Pers. Ubiquitous Comput. 2008, 12, 599–607. [Google Scholar] [CrossRef]
- Hall, D.L.; Llinas, J. An introduction to multisensor data fusion. Proc. IEEE 1997, 85, 6–23. [Google Scholar] [CrossRef] [Green Version]
- Castanedo, F. A review of data fusion techniques. Sci. World J. 2013, 2013, 704504. [Google Scholar] [CrossRef] [PubMed]
- Guendil, Z.; Lachiri, Z.; Maaoui, C.; Pruski, A. Multiresolution framework for emotion sensing in physiological signals. In Proceedings of the 2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Monastir, Tunisia, 21–23 March 2016. [Google Scholar]
- Irrgang, M.; Egermann, H. From motion to emotion: Accelerometer data predict subjective experience of music. PLoS ONE 2016, 11, e0154360. [Google Scholar]
- Adibuzzaman, M.; Jain, N.; Steinhafel, N.; Haque, M.; Ahmed, F.; Ahamed, S.; Love, R. In situ affect detection in mobile devices: A multimodal approach for advertisement using social network. ACM SIGAPP Appl. Comput. Rev. 2013, 13, 67–77. [Google Scholar] [CrossRef]
- Chung, W.Y.; Bhardwaj, S.; Punvar, A.; Lee, D.S.; Myllylae, R. A fusion health monitoring using ECG and accelerometer sensors for elderly persons at home. In Proceedings of the 2007 29th Annual International Conference of the IEEE Engineering in Medicine And Biology Society, Lyon, France, 22–26 August 2007. [Google Scholar]
- Wan-Hui, W.; Yu-Hui, Q.; Guang-Yuan, L. Electrocardiography recording, feature extraction and classification for emotion recognition. In Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering, Los Angeles, CA, USA, 31 March–2 April 2009; Volume 4. [Google Scholar]
- Colomer Granero, A.; Fuentes-Hurtado, F.; Naranjo Ornedo, V.; Guixeres Provinciale, J.; Ausín, J.M.; Alcaniz Raya, M. A comparison of physiological signal analysis techniques and classifiers for automatic emotional evaluation of audiovisual contents. Front. Comput. Neurosci. 2016, 10, 74. [Google Scholar] [CrossRef] [PubMed]
- Lisetti, C.L.; Nasoz, F. Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP J. Adv. Signal Process. 2004, 2004, 929414. [Google Scholar] [CrossRef] [Green Version]
- Takahashi, K. Remarks on SVM-based emotion recognition from multi-modal bio-potential signals. In Proceedings of the RO-MAN 2004, 13th IEEE International Workshop on Robot and Human Interactive Communication (IEEE Catalog No. 04TH8759), Kurashiki, Japan, 22 September 2004. [Google Scholar]
- Kim, J.; André, E. Emotion recognition based on physiological changes in music listening. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 2067–2083. [Google Scholar] [CrossRef]
- Li, C.; Bao, Z.; Li, L.; Zhao, Z. Exploring temporal representations by leveraging attention-based bidirectional LSTM-RNNs for multi-modal emotion recognition. Inf. Process. Manag. 2020, 55, 102185. [Google Scholar] [CrossRef]
- Zhang, J.; Yin, Z.; Chen, P.; Nichele, S. Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review. Inf. Fusion 2020, 59, 103–126. [Google Scholar] [CrossRef]
- Pandey, P.; Seeja, K.R. Subject independent emotion recognition from EEG using VMD and deep learning. J. King Saud-Univ.-Comput. Inf. Sci. 2022, 34, 1730–1738. [Google Scholar] [CrossRef]
- Gupta, V.; Chopda, M.D.; Pachori, R.B. Cross-subject emotion recognition using flexible analytic wavelet transform from EEG signals. IEEE Sens. J. 2018, 19, 2266–2274. [Google Scholar] [CrossRef]
- Albraikan, A.; Tobón, D.P.; El Saddik, A. Toward user-independent emotion recognition using physiological signals. IEEE Sens. J. 2018, 19, 8402–8412. [Google Scholar] [CrossRef]
- Ali, M.; Al Machot, F.; Haj Mosa, A.; Jdeed, M.; Al Machot, E.; Kyamakya, K. A globally generalized emotion recognition system involving different physiological signals. Sensors 2018, 18, 1905. [Google Scholar] [CrossRef] [Green Version]
- Kim, K.H.; Bang, S.W.; Kim, S.R. Emotion recognition system using short-term monitoring of physiological signals. Med. Biol. Eng. Comput. 2004, 42, 419–427. [Google Scholar] [CrossRef]
- Healey, A.J.; Picard, R.W. Detecting stress during real-world driving tasks using physiological sensors. IEEE Trans. Intell. Transp. Syst. 2005, 6, 156–166. [Google Scholar] [CrossRef] [Green Version]
- Dzedzickis, A.; Kaklauskas, A.; Bucinskas, V. Human emotion recognition: Review of sensors and methods. Sensors 2020, 20, 592. [Google Scholar] [CrossRef] [Green Version]
- Katsis, C.D.; Katertsidis, N.; Ganiatsas, G.; Fotiadis, D.I. Toward emotion recognition in car-racing drivers: A biosignal processing approach. IEEE Trans. Syst. Man -Cybern.-Part Syst. Humans 2008, 38, 502–512. [Google Scholar] [CrossRef]
- Patel, M.; Lal, S.K.L.; Kavanagh, D.; Rossiter, P. Applying neural network analysis on heart rate variability data to assess driver fatigue. Expert Syst. Appl. 2011, 38, 7235–7242. [Google Scholar] [CrossRef]
- Jang, E.H.; Park, B.J.; Kim, S.H.; Chung, M.A.; Sohn, J.H. Classification of three emotions by machine learning algorithms using psychophysiological signals. Int. J. Psychophysiol. 2012, 3, 402–403. [Google Scholar] [CrossRef]
- Soleymani, M.; Pantic, M.; Pun, T. Multimodal emotion recognition in response to videos. IEEE Trans. Affect. Comput. 2011, 3, 211–223. [Google Scholar] [CrossRef] [Green Version]
- Chang, C.-Y.; Chang, C.; Zheng, J.; Chung, P. Physiological emotion analysis using support vector regression. Neurocomputing 2013, 122, 79–87. [Google Scholar] [CrossRef]
- Verma, G.K.; Tiwary, U.S. Multimodal fusion framework: A multiresolution approach for emotion classification and recognition from physiological signals. NeuroImage 2014, 102, 162–172. [Google Scholar] [CrossRef]
- Pollreisz, D.; TaheriNejad, N. A simple algorithm for emotion recognition, using physiological signals of a smart watch. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju, Korea, 11–15 July 2017; pp. 2353–2356. [Google Scholar]
- Houssein, E.H.; Asmaa, H.; Abdelmgeid, A.A. Human emotion recognition from EEG-based brain–computer interface using machine learning: A comprehensive review. Neural Comput. Appl. 2022, 1–31. [Google Scholar] [CrossRef]
- Aguiñaga, A.R.; LDelgado, U.M.; López-López, V.R.; Téllez, A.C. EEG-Based Emotion Recognition Using Deep Learning and M3GP. Appl. Sci. 2022, 12, 2527. [Google Scholar] [CrossRef]
- Khan, A.N.; Ihalage, A.A.; Ma, Y.; Liu, B.; Liu, Y.; Hao, Y. Deep learning framework for subject-independent emotion detection using wireless signals. PLoS ONE 2021, 16, e0242946. [Google Scholar] [CrossRef]
- Cosoli, G.; Poli, A.; Scalise, L.; Spinsante, S. Measurement of multimodal physiological signals for stimulation detection by wearable devices. Measurement 2021, 184, 109966. [Google Scholar] [CrossRef]
- Banzhaf, E.; de la Barrera, F.; Kindler, A.; Reyes-Paecke, S.; Schlink, U.; Welz, J.; Kabisch, S. A conceptual framework for integrated analysis of environmental quality and quality of life. Ecol. Indic. 2014, 45, 664–668. [Google Scholar] [CrossRef]
- Sewell, M. Ensemble learning. RN 2008, 11, 1–34. [Google Scholar]
- Sarkar, D.; Natarajan, V. Ensemble Machine Learning Cookbook: Over 35 Practical Recipes to Explore Ensemble Machine Learning Techniques Using Python; Packt Publishing Ltd.: Birmingham, UK, 2019. [Google Scholar]
- Li, Y.; Wei, J.; Wang, D.; Li, B.; Huang, H.; Xu, B.; Xu, Y. A medium and Long-Term runoff forecast method based on massive meteorological data and machine learning algorithms. Water 2021, 13, 1308. [Google Scholar] [CrossRef]
- Sodhi, A. American Put Option pricing using Least squares Monte Carlo method under Bakshi, Cao and Chen Model Framework (1997) and comparison to alternative regression techniques in Monte Carlo. arXiv 2018, arXiv:1808.02791. [Google Scholar]
- Kiyak, E.O. Data Mining and Machine Learning for Software Engineering. In Data Mining-Methods, Applications and Systems; IntechOpen: London, UK, 2020. [Google Scholar]
- Nguyen Phuc Thu, T.; Hernández, A.I.; Costet, N.; Patural, H.; Pichot, V.; Carrault, G.; Beuchée, A. Improving methodology in heart rate variability analysis for the premature infants: Impact of the time length. PLoS ONE 2019, 14, e0220692. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Joo, Y.; Lee, S.; Kim, H.; Kim, P.; Hwang, S.; Choi, C. Efficient healthcare service based on Stacking Ensemble. In Proceedings of the 2020 ACM International Conference on Intelligent Computing and Its Emerging Applications, GangWon, Korea, 12–15 December 2020. [Google Scholar]
- Fishman, M.; Jacono, F.J.; Park, S.; Jamasebi, R.; Thungtong, A.; Loparo, K.A.; Dick, T.E. A method for analyzing temporal patterns of variability of a time series from Poincare plots. J. Appl. Physiol. 2012, 113, 297–306. [Google Scholar] [CrossRef] [Green Version]
- Polikar, R. Ensemble learning. In Ensemble Machine Learning; Springer: Boston, MA, USA, 2012; pp. 1–34. [Google Scholar]
- Poucke, S.V.; Zhang, Z.; Schmitz, M.; Vukicevic, M.; Laenen, M.V.; Celi, L.A.; Deyne, C.D. Scalable predictive analysis in critically ill patients using a visual open data analysis platform. PLoS ONE 2016, 11, e0145791. [Google Scholar] [CrossRef] [Green Version]
- Freund, Y.; Schapire, R.; Abe, N. A short introduction to boosting. J.-Jpn. Soc. Artif. Intell. 1999, 14, 1612. [Google Scholar]
- Song, T.; Zheng, W.; Lu, C.; Zong, Y.; Zhang, X.; Cui, Z. MPED: A multi-modal physiological emotion database for discrete emotion recognition. IEEE Access 2019, 7, 12177–12191. [Google Scholar] [CrossRef]
- Li, S.; Cui, L.; Zhu, C.; Li, B.; Zhao, N.; Zhu, T. Emotion recognition using Kinect motion capture data of human gaits. PeerJ 2016, 4, e2364. [Google Scholar] [CrossRef] [Green Version]
- Wen, W.; Liu, G.; Cheng, N.; Wei, J.; Shangguan, P.; Huang, W. Emotion recognition based on multi-variant correlation of physiological signals. IEEE Trans. Affect. Comput. 2014, 5, 126–140. [Google Scholar] [CrossRef]
- Zhang, Z.; Song, Y.; Cui, L.; Liu, X.; Zhu, T. Emotion recognition based on customized smart bracelet with built-in accelerometer. PeerJ 2016, 4, e2258. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Noroozi, F.; Sapiński, T.; Kamińska, D.; Anbarjafari, G. Vocal-based emotion recognition using random forests and decision tree. Int. J. Speech Technol. 2017, 20, 239–246. [Google Scholar] [CrossRef]
- Shu, L.; Yu, Y.; Chen, W.; Hua, H.; Li, Q.; Jin, J.; Xu, X. Wearable emotion recognition using heart rate data from a smart bracelet. Sensors 2020, 20, 718. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Sultana, M.; Al-Jefri, M.; Lee, J. Using machine learning and smartphone and smartwatch data to detect emotional states and transitions: Exploratory study. JMIR mHealth uHealth 2020, 8, e17818. [Google Scholar] [CrossRef]
Emotions | Measurement Methods | Data Analysis Methods | Accuracy | Ref. |
---|---|---|---|---|
Sadness, anger, stress, surprise | ECG, SKT, GSR | SVM | For recognizing three and four categories, the correct classification rates were 78.4% and 61.8%, respectively. | [33] |
Sadness, anger, fear, surprise, frustration, and amusement | GSR, HRV, SKT | KNN, DFA, MBP | KNN, DFA, and MBP could classify emotions with 72.3%, 75.0%, and 84.1%, respectively | [24] |
Three levels of driver stress | ECG, EOG, GSR and respiration | Fisher projection matrix and a linear discriminant | Three levels of driver stress with an accuracy of over 97% | [34] |
Fear, neutral, joy | ECG, SKT, GSR, respiration | Canonical correlation analysis | The rate of correct categorization is 85.3%. Fear, neutral, and happy categorization percentages were 76%, 94%, and 84%, respectively | [35] |
The emotional classes identified are high stress, low stress, disappointment, and euphoria | Facial EOG, ECG, GSR, respiration, | SVM and adaptive neuro-fuzzy inference system (ANFIS) | The total classification rates for the SVM and the ANFIS using ten fold cross-validation are 79.3% and 76.7%, respectively. | [36] |
Fatigue caused by driving for extended hours | HRV | Neural network | The accuracy of the neural network is 90% | [37] |
Boredom, pain, surprise | GSR, ECG, HRV, SKT | Machine learning algorithms such as linear discriminant analysis (LDA), classification and regression tree (CART), self-organizing map (SOM), and SVM | SVM produced an accuracy rate of 100.0% | [38] |
The arousal classes were calm, medium aroused, and activated and the valence classes were unpleasant, neutral, and pleasant | ECG, pupillary response, gaze distance | Support vector machine | The optimal classification accuracies of 68.5% for three labels of valence and 76.4% for three labels of arousal | [39] |
Sadness, fear, pleasure | ECG, GSR, blood volume, pulse | Support vector regression | Recognition rate up to 89.2% | [40] |
Terrible, love, hate, sentimental, lovely, happy, fun, shock, cheerful, depressing, exciting, melancholy, mellow | EEG, GSR, blood volume pressure, respiration pattern, SKT, EMG, EOG | Support Vector Machine, Multilayer Perceptron (MLP), K-Nearest Neighbor (KNN) and Meta-multiclass (MMC), | The average accuracies are 81.45%, 74.37%, 57.74% and 75.94% for SVM, MLP, KNN and MMC classifiers respectively. The best result is for ‘Depressing’ with 85.46% using SVM. | [41] |
Happiness, sadness, surprise, stress | SKT, EDA, and HR | SVM, RSVM, SVM+GA, NN, DFA | The average accuracies are 66.95% (SVM), 75.9% (RSVM), 90% (SVM+GA), 80.2% (NN), and 84.7% (DFA); data were collected from participants using an Empatica E4 smartwatch | [42] |
Theoretical emotions | EEG signal | KNN, NB, SVM, RF, feature extraction (e.g., wavelet transform and non-linear dynamics), feature reduction (e.g., PCA, LDA) | This study achieved an average classification accuracy of over 80%, using wearable sensors to collect EEG signals | [28] |
Emotions | Measurement Methods | Data Analysis Methods | Accuracy | Ref. |
---|---|---|---|---|
Arousal and valence emotions. Arousal represents inactive and active emotions (Annoying, Angry, Nervous, Excited, Happy, Pleased). Valence represents negative and positive emotions (Sad, Bored, Sleepy, Relaxed, Calm, Peaceful) | EEG, Facial expressions | ANN, SVM, RF, K-NN, DT, RNN, CNN, DNN, DBN, LSTM | ML classification accuracy ranges from 61.17% to 93% (SVM: 41%, ANN: 18%, RF: 14%, KNN: 9%, DT: 9%) and deep learning classification accuracy ranges from 61.25% to 97.56% (LSTM: 50%, DNN: 7%, DBN: 7%, CNN: 36%) | [43] |
Arousal and valence (low and high) emotion levels. | EEG Signal | ML classifiers (KNN, SVM, LDA), deep learning and M3GP (NN, MLP, ELM), Gaussian process, k-means | This study achieved an overall recognition rate of 82.9% [NN: 85.80%, SVM: 77.80%, KNN: 88.94%, MLP: 78.16%, 87.10%, 78.06%, 71.30%, 71.30%] | [44] |
Negative and positive emotions | EEG signal and facial expressions | ML classifiers: RF, KNN, SVM, DT, LDA and deep learning classifiers: CNN+LSTM | This study achieves the following accuracy levels: 63.33% RF, 63.33% SVM, 61.7% KNN, 55% DT, 51.7% LDA, 71.67% CNN+LSTM. | [45] |
Negative emotions (annoyed, stressed, angry) | EEG physiological signals | ML classifiers: LR, SVM | It achieves accuracy levels of 75.00% (LR) and 72.62% (SVM). | [46] |
Microsoft Wrist-Band 2 | Android Phone 7 |
---|---|
Heart Rate (HR) | Self-Report of Emotion (1–5) |
Body Temperature (Body-Temp) | Environmental Noise (Env-Noise) |
Electro Dermal Activities (EDA) | GPS Location (lat, lon) |
Hand Acceleration (Motion as three-axis accelerometer) | |
Air Pressure | |
Light (UV) | |
Features Extracted | Meaning | Removed Features |
---|---|---|
EDA | Electro-dermal activity, also known as skin conductance or galvanic skin response (GSR). | FLightofStairsAscended, FLightofStairsDescended, Lat, Lng (Location) |
HR | Heart Rate (Also called pulse) is the number of times the heart beats. | |
Air-Pressure | The pressure of the air. | |
bTemp | Body temperature. | |
Env-Noise | Represents Environmental Noise. | |
UV | UV means Ultra-violet radiation. | |
Motion | An accelerometer with three axes. It’s a combination of (X, Y, Z) axes. | |
X | Participant’s Motion in X-axis. | |
Y | Participant’s Motion in Y-axis | |
Z | Participant’s Motion in Z-axis | |
Total-Gain | The overall gain achieved by the participant. | |
Total-Loss | The amount of calories lost. | |
Stepping-Gain | Steps achieved or gained during travel. | |
Stepping-Loss | The steps in which a loss of calories occurred. | |
Steps-Ascended | Number of steps ascended. | |
Steps-Descended | Number of steps descended. | |
Rate | The rate of movement in X, Y, and Z directions. | |
Label | The target emotion label (1–5). |
Bagging | Boosting | Stacking | |
---|---|---|---|
Differences | Bagging often considers homogeneous weak learners, learns them independently from each other in parallel, and combines them following some kind of deterministic averaging process. | Boosting frequently takes into account homogeneous weak learners, trains them sequentially in a highly adaptive way (a base model depends on the preceding ones), and combines them by a deterministic method. | Stacking frequently takes into account diverse weak learners, trains them concurrently, and then combines them by training a meta-model to produce a prediction based on the output of the many weak models. |
Characteristics | Bagging enables a group of weak learners to work together to outperform a single strong learner. It also helps reduce variance, preventing over-fitting during the modelling process. | Boosting models can be improved by tuning several hyper-parameters. Boosting algorithms iteratively combine several weak learners, putting more weight on the observations that previous learners handled poorly. This can lessen the high bias that frequently appears in models such as decision trees and logistic regression. Boosting algorithms also tend to select only the characteristics that have a large impact on the target, potentially reducing dimensionality and improving computational efficiency. | Stacking can harness the capabilities of a range of well-performing models on a classification or regression task and make predictions that outperform any single model in the ensemble (see the sketch after this table). |
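A minimal scikit-learn sketch contrasting the three strategies follows; the base estimators and their settings are illustrative defaults rather than the tuned models reported in the results, and `X_train`/`X_test`/`y_train`/`y_test` are assumed from the earlier pre-processing sketch.

```python
# Sketch: bagging vs. boosting vs. stacking with scikit-learn (illustrative settings).
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: homogeneous trees trained in parallel on bootstrap samples, then averaged.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)

# Boosting: homogeneous weak trees trained sequentially, each focusing on previous errors.
boosting = AdaBoostClassifier(n_estimators=50)

# Stacking: heterogeneous base learners whose predictions feed a meta-model.
stacking = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()), ("dt", DecisionTreeClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)

for name, model in [("bagging", bagging), ("boosting", boosting), ("stacking", stacking)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```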
Min | Q1 | Med | Q2 | Mean | Q3 | Max | skw | kur | std | |
---|---|---|---|---|---|---|---|---|---|---|
EDA | 0.0 | 0.0 | 340,330 | 340,330 | 221,954 | 340,330 | 340,330 | −0.6 | −1.7 | 165,736 |
HR | 0.0 | 0.0 | 70.0 | 70.0 | 45.7 | 70.0 | 70.0 | −0.6 | −1.7 | 34.1 |
UV | 78.0 | 80.0 | 82.0 | 82.0 | 82.5 | 85.0 | 89.0 | 0.6 | −1.0 | 3.7 |
X | −0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | −0.6 | −0.5 | 0.0 |
Y | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 | 0.2 | 0.3 | 0.2 | −1.7 | 0.1 |
Z | 0.9 | 0.9 | 1.0 | 1.0 | 1.0 | 1.0 | 1.1 | −0.1 | 1.5 | 0.0 |
EnvNoise | 49.0 | 52.0 | 52.0 | 52.0 | 52.6 | 53.0 | 56.0 | 0.2 | −0.2 | 1.6 |
AirPressure | 0.0 | 0.0 | 1010.6 | 1010.6 | 703.1 | 1010.7 | 1010.7 | −0.8 | −1.4 | 475.5 |
bTemp | 0.0 | 0.0 | 22.8 | 22.8 | 15.8 | 22.8 | 22.8 | −0.8 | −1.4 | 10.7 |
EDA | HR | UV | Motion | EnvNoise | AirPressure | bTemp | |
---|---|---|---|---|---|---|---|
EDA | 26,655,597,441 | −1,012,733 | −202,362,467 | 3430.42 | 95,587 | 33,144.6 | −648,616 |
HR | −1,012,733 | 293.4 | 32,554.2 | 0.67 | −7.4 | 2.104 | 51.51 |
UV | −202,362,467 | 32,554.2 | 11,783,951.32 | 40.46 | −1174.20 | 516.51 | 9288.6 |
Motion | 3430.4 | 0.57 | 40.46 | 0.13 | 0.016 | 0.032 | −0.046 |
EnvNoise | 95,587.03 | −7.40 | −1174.20 | 0.016 | 4.82 | 0.15 | −3.40 |
AirPressure | 33,144.58 | 2.10 | 516.51 | 0.032 | 0.15 | 0.31 | −0.58 |
bTemp | −648,616 | 51.51 | 9288.58 | −0.046 | −3.39 | −0.58 | 22.79 |
Precision | Recall | F1-Score | Support | |
---|---|---|---|---|
1 | 0.94 | 0.98 | 0.96 | 666 |
2 | 0.98 | 0.97 | 0.98 | 1661 |
3 | 0.99 | 0.99 | 0.99 | 2532 |
4 | 0.99 | 0.98 | 0.99 | 1813 |
5 | 0.98 | 0.99 | 0.99 | 1415 |
Accuracy | | | 0.98 | 8087 |
macro avg | 0.98 | 0.98 | 0.98 | 8087 |
weighted avg | 0.98 | 0.98 | 0.98 | 8087 |
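The per-class precision/recall/F1 layout above matches scikit-learn's classification report; a report of this form can be generated as sketched below, assuming a fitted ensemble model (for example, the `stacking` model from the earlier sketch) and the held-out test split.

```python
# Sketch: producing a per-class report like the table above (fitted model assumed).
from sklearn.metrics import classification_report, confusion_matrix

y_pred = stacking.predict(X_test)
print(classification_report(y_test, y_pred, digits=2))   # precision/recall/F1 per label 1-5
print(confusion_matrix(y_test, y_pred))                   # per-class error breakdown
```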
Classifier | Body + Environmental | Body | Environmental |
---|---|---|---|
KNN | 93% | 89% | 87% |
SVM | 94% | 91% | 88% |
DT | 97% | 91% | 89.50% |
RF | 97.50% | 91.40% | 88% |
Stacking | 98.20% | 93% | 91% |
Classifier | Parameter | Parameter Explanation |
---|---|---|
KNN | KNN parameters (weights = distance, p = 2, n-neighbors = 4, leaf-size = 30, algorithm = ’auto’) | Explanation:- Weights: weight function used in prediction; p: power parameter for the Minkowski metric; n-neighbors: number of neighbors to use; leaf-size: leaf size passed to algorithm; algorithm: used to compute the nearest neighbors |
SVM | SVM Parameters (C = 100, gamma = 0.01) | C: regularization parameter; the strength of the regularization is inversely proportional to C, which must be strictly positive (the penalty is a squared l2 penalty). C was searched over the values [1, 10, 100, 1000]. Gamma: a parameter of the non-linear kernel; the higher the gamma value, the more closely the classifier tries to fit the training set, so very large gamma values lead to over-fitting. Gamma values were searched in the range [0.0001, 100]. Kernel function: the kernel maps the input data into the form required by the SVM; it can be sigmoid, polynomial (poly), or RBF, and the RBF (radial basis function) kernel is the usual choice. |
RF | RF Parameters (n-estimators = 700, max-depth = 100) | n-estimators: the number of trees in the forest; max-depth: the maximum depth of the tree. |
DT | DT Parameters (max-depth = 300, max-leaf-nodes = 900, splitter = ‘best’) | max-depth: the maximum depth of the tree max-leaf-nodes: maximum number of leaf nodes Grow a tree with max-leaf-nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes. splitter: the strategy used to choose the split at each node. |
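A grid search over the parameter ranges listed in the table could be sketched as follows; the use of GridSearchCV, the cross-validation setting, and any grid values beyond those named in the table are assumptions for illustration.

```python
# Sketch: hyper-parameter search over the ranges described in the table (illustrative).
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

svm_grid = GridSearchCV(
    SVC(),
    {"C": [1, 10, 100, 1000],
     "gamma": [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
     "kernel": ["rbf", "poly", "sigmoid"]},
    cv=5,
)
svm_grid.fit(X_train, y_train)
print(svm_grid.best_params_)    # e.g., C=100, gamma=0.01 as reported in the table

knn_grid = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": range(1, 11), "weights": ["uniform", "distance"], "p": [1, 2]},
    cv=5,
)
knn_grid.fit(X_train, y_train)
print(knn_grid.best_params_)
```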
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).