A Comparative Analysis of Modeling and Predicting Perceived and Induced Emotions in Sonification
Abstract
1. Introduction
1.1. Sonification Applications in the IoT
1.2. Research Problem: Modeling and Predicting Perceived and Induced Emotion
1.3. Research Questions
- RQ1. How well do machine learning models perform when predicting arousal and valence?
- RQ2. How different are the models that are built for predicting perceived and induced emotions?
- RQ3. What are the significant acoustic features for predicting arousal and valence?
- RQ4. How do the significant features vary for predicting perceived and induced emotions?
1.4. Contributions of This Work
- We present a small-scale survey of the literature related to emotion recognition, along with the features and datasets used.
- We build machine learning models to predict perceived and induced emotions.
- We compare and contrast the features used to build the best prediction models for different emotional dimensions (i.e., arousal, valence, and dominance).
- We report the significant acoustic features identified when building the best prediction models for both perceived and induced emotions.
2. Related Work
- Input
  - Dataset (number of samples; types of samples, e.g., sound events, music, songs)
  - Features (number of features; types of features, e.g., psychoacoustic features; dimensions, e.g., 1D/2D)
- Output
  - Output model (categorical, e.g., sad/happy/angry; dimensional, e.g., arousal/valence/dominance)
  - Perceived or induced emotion
- Problem
  - Classification, clustering, or prediction problem
  - Evaluation metrics (e.g., RMSE, MSE, accuracy, explained variance)
  - Feature selection/reduction
  - Feature analysis (significant features, smallest number of features, and so on)
2.1. Music Emotion Recognition
2.2. Sound Emotion Recognition
3. Experimental Setup
3.1. Datasets and Psychoacoustic Features
- Dynamics—intensity of the signal, such as the root mean square (RMS) of the amplitude;
- Rhythm—articulation, density, and temporal periodicity of events, such as the number of events per second (event density);
- Timbre/Spectrum—brightness, noisiness, dissonance, and shape of the frequency spectrum, such as the spectral center of mass (centroid);
- Pitch—presence of harmonic sounds, such as the proportion of frequencies that are not multiples of the fundamental frequency (inharmonicity);
- Tonality—presence of harmonic sounds that collectively imply a major or minor key, such as the strength of a tonal center (key clarity). A minimal feature-extraction sketch in Python follows this list.
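The sketch below is illustrative only: it extracts one representative feature from the Dynamics, Rhythm, and Timbre/Spectrum families with librosa, rather than the MIR toolbox feature set referenced in this work. The file name example.wav and the function name extract_basic_features are placeholders.

```python
# Illustrative only: one representative feature from three of the families above
# (Dynamics, Rhythm, Timbre/Spectrum), computed with librosa.
# "example.wav" is a placeholder file name.
import numpy as np
import librosa

def extract_basic_features(path):
    y, sr = librosa.load(path, sr=None)              # keep the native sample rate
    duration = librosa.get_duration(y=y, sr=sr)

    # Dynamics: frame-wise RMS of the amplitude, averaged over the clip
    rms_mean = float(np.mean(librosa.feature.rms(y=y)))

    # Rhythm: event density = detected onsets per second
    onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    event_density = len(onset_times) / duration if duration > 0 else 0.0

    # Timbre/Spectrum: spectral centroid (center of mass of the spectrum)
    centroid_mean = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))

    # Pitch (inharmonicity) and Tonality (key clarity) have no one-line librosa
    # equivalent and are omitted from this sketch.
    return {"rms_mean": rms_mean,
            "event_density": event_density,
            "spectral_centroid_mean": centroid_mean}

print(extract_basic_features("example.wav"))
```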
3.1.1. EmoSoundscape Dataset: A Dataset for “Perceived” Emotion
3.1.2. IADSE Dataset: A Dataset for “Induced” Emotion
3.2. Evaluation Metrics for Analysis
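The results in Section 5 are reported as root mean squared error (RMSE), mean squared error (MSE), and the coefficient of determination (R²). As a minimal sketch (not the paper's evaluation code), these metrics can be computed with scikit-learn; y_true and y_pred below are placeholder vectors, not ratings from either dataset.

```python
# Hedged sketch: computing the metrics used in the results tables (MSE, RMSE, R^2)
# with scikit-learn. y_true/y_pred are placeholders.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([0.2, 0.5, 0.8, 0.1])   # placeholder ground-truth ratings
y_pred = np.array([0.25, 0.4, 0.7, 0.2])  # placeholder model predictions

mse = mean_squared_error(y_true, y_pred)
rmse = float(np.sqrt(mse))
r2 = r2_score(y_true, y_pred)
print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}")
```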
4. Methodology
4.1. Feature Selection
4.2. Hyper-Parameter Tuning
- the number of trees in the forest;
- the maximum number of levels in each decision tree;
- the minimum number of data points placed in a node before the node is split;
- the minimum number of data points allowed in a leaf node;
- the number of features selected using RFE with the RF estimator. A tuning sketch follows this list.
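The sketch below puts these pieces together under stated assumptions: RFE with a random forest estimator for feature selection, followed by a grid search over the hyper-parameters listed above, all in scikit-learn. The arrays X and y, the grid values, and the 80/20 split are illustrative placeholders rather than the exact settings used in this work.

```python
# Hedged sketch of feature selection (RFE with an RF estimator) plus
# hyper-parameter tuning via grid search. X, y and the grid values are
# placeholders, not the paper's exact data or search ranges.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X = np.random.rand(600, 68)   # placeholder: 600 clips x 68 MIR features
y = np.random.rand(600)       # placeholder: e.g., mean arousal ratings

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

pipe = Pipeline([
    ("rfe", RFE(estimator=RandomForestRegressor(random_state=0))),
    ("rf", RandomForestRegressor(random_state=0)),
])

param_grid = {
    "rfe__n_features_to_select": [15, 25],   # number of retained features
    "rf__n_estimators": [50, 150, 300],      # number of trees
    "rf__max_depth": [5, 20],                # maximum tree depth
    "rf__min_samples_split": [2, 5],         # min points before splitting a node
    "rf__min_samples_leaf": [1, 2],          # min points allowed in a leaf
}

search = GridSearchCV(pipe, param_grid,
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test RMSE:", -search.score(X_test, y_test))
```

Wrapping RFE inside the pipeline keeps feature selection within each cross-validation fold, so the selected subset does not leak information from the held-out data.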
5. Results and Analysis
5.1. Performance of Prediction Models
5.2. Significant Features
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Real-Time IoT Monitoring—Visualize Device Performance. Available online: https://www.datadoghq.com/ (accessed on 1 July 2021).
- Khan, W.; Ansell, D.; Kuru, K.; Bilal, M. Flight guardian: Autonomous flight safety improvement by monitoring aircraft cockpit instruments. J. Aerosp. Inf. Syst. 2018, 15, 203–214. [Google Scholar] [CrossRef]
- Saraubon, K.; Anurugsa, K.; Kongsakpaibul, A. A Smart System for Elderly Care Using IoT and Mobile Technologies. In Proceedings of the ICSEB ’18—2018 2nd International Conference on Software and E-Business, Zhuhai, China, 18–20 December 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 59–63. [Google Scholar] [CrossRef]
- Sainadh, A.V.M.S.; Mohanty, J.S.; Teja, G.V.; Bhogal, R.K. IoT Enabled Real-Time Remote Health Monitoring System. In Proceedings of the 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 6–8 May 2021; pp. 428–433. [Google Scholar] [CrossRef]
- Shahada, S.A.A.; Hreiji, S.M.; Atudu, S.I.; Shamsudheen, S. Multilayer Neural Network Based Fall Alert System Using IOT. Int. J. MC Sq. Sci. Res. 2019, 11, 1–15. [Google Scholar]
- Mwangi, A.; Ndashimye, E.; Karikumutima, B.; Ray, S.K. An IoT-alert System for Chronic Asthma Patients. In Proceedings of the 2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 4–7 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 12–19. [Google Scholar]
- Danna, J.; Velay, J.L. Handwriting Movement Sonification: Why and How? IEEE Trans. Hum.-Mach. Syst. 2017, 47, 299–303. [Google Scholar] [CrossRef]
- Turchet, L. Interactive sonification and the IoT: The case of smart sonic shoes for clinical applications. In Proceedings of the 14th International Audio Mostly Conference: A Journey in Sound, Nottingham, UK, 18–20 September 2019; pp. 252–255. [Google Scholar]
- Rutkowski, T.M. Multichannel EEG sonification with ambisonics spatial sound environment. In Proceedings of the Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific, Siem Reap, Cambodia, 9–12 December 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–4. [Google Scholar]
- Quasim, M.T.; Alkhammash, E.H.; Khan, M.A.; Hadjouni, M. Emotion-based music recommendation and classification using machine learning with IoT Framework. Soft Comput. 2021, 25, 12249–12260. [Google Scholar] [CrossRef]
- Timoney, J.; Yaseen, A.; Mcevoy, D. The Potential Role of Internet of Musical Things in Therapeutic Applications. In Proceedings of the 10th Workshop on Ubiquitous Music (UbiMus 2020), g-ubimus, Porto Seguro, BA, Brazil, 5–7 August 2020. [Google Scholar]
- Roja, P.; Srihari, D. Iot based smart helmet for air quality used for the mining industry. Int. J. Res. Sci. Eng. Technol. 2018, 4, 514–521. [Google Scholar]
- Meshram, P.; Shukla, N.; Mendhekar, S.; Gadge, R.; Kanaskar, S. IoT Based LPG Gas Leakage Detector. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2019, 5, 531–534. [Google Scholar] [CrossRef]
- Santiputri, M.; Tio, M. IoT-based Gas Leak Detection Device. In Proceedings of the 2018 International Conference on Applied Engineering (ICAE), Batam, Indonesia, 3–4 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–4. [Google Scholar]
- ALshukri, D.; Sumesh, E.; Krishnan, P. Intelligent border security intrusion detection using iot and embedded systems. In Proceedings of the 2019 4th MEC International Conference on Big Data and Smart City (ICBDSC), Muscat, Oman, 15–16 January 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–3. [Google Scholar]
- Saquib, Z.; Murari, V.; Bhargav, S.N. BlinDar: An invisible eye for the blind people making life easy for the blind with Internet of Things (IoT). In Proceedings of the 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 19–20 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 71–75. [Google Scholar]
- Soh, Z.H.C.; Husa, M.A.A.H.; Abdullah, S.A.C.; Shafie, M.A. Smart waste collection monitoring and alert system via IoT. In Proceedings of the 2019 IEEE 9th Symposium on Computer Applications & Industrial Electronics (ISCAIE), Sabah, Malaysia, 27–28 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 50–54. [Google Scholar]
- Paul, S.; Banerjee, S.; Biswas, S. Smart Garbage Monitoring Using IoT. In Proceedings of the 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 1–3 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1181–1185. [Google Scholar]
- American Psychological Association. Emotion—APA Dictionary of Psychology. Available online: https://dictionary.apa.org/emotion (accessed on 1 July 2021).
- Tao, J.; Tan, T. Affective Computing: A Review. In International Conference on Affective Computing and Intelligent Interaction; Springer: Berlin/Heidelberg, Germany, 2005; pp. 981–995. [Google Scholar] [CrossRef]
- Picard, R.W. Affective Computing; MIT Press: Cambridge, MA, USA, 1997. [Google Scholar]
- Song, Y.; Dixon, S.; Pearce, M.T.; Halpern, A.R. Perceived and Induced Emotion Responses to Popular Music: Categorical and Dimensional Models. Music Percept. Interdiscip. J. 2016, 33, 472–492. [Google Scholar] [CrossRef]
- Ekman, P. An argument for basic emotions. Cogn. Emot. 1992, 6, 169–200. [Google Scholar] [CrossRef]
- Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161–1178. [Google Scholar] [CrossRef]
- Zentner, M.; Grandjean, D.; Scherer, K. Emotions Evoked by the Sound of Music: Characterization, Classification, and Measurement. Emotion 2008, 8, 494–521. [Google Scholar] [CrossRef] [Green Version]
- Gomez, P.; Danuser, B. Affective and physiological responses to environmental noises and music. Int. J. Psychophysiol. 2004, 53, 91–103. [Google Scholar] [CrossRef]
- Gingras, B.; Marin, M.M.; Fitch, W.T. Beyond Intensity: Spectral Features Effectively Predict Music-Induced Subjective Arousal. Q. J. Exp. Psychol. 2014, 67, 1428–1446. [Google Scholar] [CrossRef]
- Egermann, H.; Fernando, N.; Chuen, L.; McAdams, S. Music induces universal emotion-related psychophysiological responses: Comparing Canadian listeners to Congolese Pygmies. Front. Psychol. 2015, 5, 1341. [Google Scholar] [CrossRef] [Green Version]
- Wanlu, Y.; Makita, K.; Nakao, T.; Kanayama, N.; Machizawa, M.; Sasaoka, T.; Sugata, A.; Kobayashi, R.; Ryosuke, H.; Yamawaki, S.; et al. Affective auditory stimulus database: An expanded version of the International Affective Digitized Sounds (IADS-E). Behav. Res. Methods 2018, 50, 1415–1429. [Google Scholar] [CrossRef]
- Fan, J.; Thorogood, M.; Pasquier, P. Emo-soundscapes: A dataset for soundscape emotion recognition. In Proceedings of the 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), San Antonio, TX, USA, 23–26 October 2017; pp. 196–201. [Google Scholar]
- Griffiths, D.; Cunningham, S.; Weinel, J. A self-report study that gauges perceived and induced emotion with music. In Proceedings of the 2015 Internet Technologies and Applications (ITA), Wrexham, UK, 8–11 September 2015; pp. 239–244. [Google Scholar]
- Constantin, F.A.; Drăgulin, S. Few Perspectives and Applications of Music Induced Emotion. In Proceedings of the 2019 5th Experiment International Conference (exp.at’19), Funchal, Portugal, 12–14 June 2019; pp. 481–485. [Google Scholar]
- Liu, M.; Chen, H.; Li, Y.; Zhang, F. Emotional Tone-Based Audio Continuous Emotion Recognition. In MultiMedia Modeling; He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 470–480. [Google Scholar]
- Ooi, C.S.; Seng, K.P.; Ang, L.M.; Chew, L.W. A new approach of audio emotion recognition. Expert Syst. Appl. 2014, 41, 5858–5869. [Google Scholar] [CrossRef]
- Sezgin, M.C.; Günsel, B.; Kurt, G.K. A novel perceptual feature set for audio emotion recognition. In Proceedings of the Face and Gesture 2011, Santa Barbara, CA, USA, 21–25 March 2011; pp. 780–785. [Google Scholar]
- Yang, Y.H.; Lin, Y.C.; Su, Y.F.; Chen, H. A Regression Approach to Music Emotion Recognition. IEEE Trans. Audio Speech Lang. Process. 2008, 16, 448–457. [Google Scholar] [CrossRef]
- Yang, Y.H.; Chen, H. Ranking-Based Emotion Recognition for Music Organization and Retrieval. IEEE Trans. Audio Speech Lang. Process. 2011, 19, 762–774. [Google Scholar] [CrossRef]
- Eerola, T.; Lartillot, O.; Toiviainen, P. Prediction of Multidimensional Emotional Ratings in Music from Audio Using Multivariate Regression Models. In Proceedings of the 10th International Society for Music Information Retrieval Conference, Kobe, Japan, 26–30 October 2009; pp. 621–626. [Google Scholar]
- Seo, Y.S.; Huh, J.H. Automatic Emotion-Based Music Classification for Supporting Intelligent IoT Applications. Electronics 2019, 8, 164. [Google Scholar] [CrossRef] [Green Version]
- Liu, T.; Han, L.; Ma, L.; Guo, D. Audio-based deep music emotion recognition. In AIP Conference Proceedings; AIP Publishing LLC: Melville, NY, USA, 2018; Volume 1967, p. 040021. [Google Scholar] [CrossRef]
- Soleymani, M.; Caro, M.N.; Schmidt, E.M.; Sha, C.Y.; Yang, Y.H. 1000 Songs for Emotional Analysis of Music. In Proceedings of the ACM International Workshop on Crowdsourcing for Multimedia, Association for Computing Machinery, Barcelona, Spain, 22 October 2013; pp. 1–6. [Google Scholar]
- Fan, J.; Tatar, K.; Thorogood, M.; Pasquier, P. Ranking-Based Emotion Recognition for Experimental Music. In Proceedings of the International Society for Music Information Retrieval Conference, Suzhou, China, 23–27 October 2017. [Google Scholar]
- Schafer, R. The Soundscape: Our Sonic Environment and the Tuning of the World; Inner Traditions/Bear: Rochester, VT, USA, 1993. [Google Scholar]
- Schuller, B.; Hantke, S.; Weninger, F.; Han, W.; Zhang, Z.; Narayanan, S. Automatic recognition of emotion evoked by general sound events. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 341–344. [Google Scholar]
- Drossos, K.; Kotsakis, R.; Kalliris, G.; Floros, A. Sound events and emotions: Investigating the relation of rhythmic characteristics and arousal. In Proceedings of the IISA 2013, Piraeus, Greece, 10–12 July 2013; pp. 1–6. [Google Scholar]
- Bradley, M.M.; Lang, P.J. The International Affective Digitized Sounds (2nd Edition; IADS-2): Affective Ratings of Sounds and Instruction Manual; Technical report B-3; University of Florida: Gainesville, FL, USA, 2007. [Google Scholar]
- Mathieu, B.; Essid, S.; Fillon, T.; Prado, J.; Richard, G. YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software. In Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010), Utrecht, The Netherlands, 9–13 August 2010; pp. 441–446. [Google Scholar]
- Sundaram, S.; Schleicher, R. Towards evaluation of example-based audio retrieval system using affective dimensions. In Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, Singapore, 19–23 July 2010; pp. 573–577. [Google Scholar]
- Fan, J.; Tung, F.; Li, W.; Pasquier, P. Soundscape emotion recognition via deep learning. In Proceedings of the Sound and Music Computing, Limassol, Cyprus, 4–7 July 2018. [Google Scholar]
- Hershey, S.; Chaudhuri, S.; Ellis, D.P.W.; Gemmeke, J.F.; Jansen, A.; Moore, R.C.; Plakal, M.; Platt, D.; Saurous, R.A.; Seybold, B.; et al. CNN architectures for large-scale audio classification. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 131–135. [Google Scholar] [CrossRef] [Green Version]
- Ntalampiras, S.; Potamitis, I. Emotion Prediction of Sound Events Based on Transfer Learning. In Engineering Applications of Neural Networks; Boracchi, G., Iliadis, L., Jayne, C., Likas, A., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 303–313. [Google Scholar]
- Ntalampiras, S. Emotional quantification of soundscapes by learning between samples. Multimed. Tools Appl. 2020, 79, 30387–30395. [Google Scholar] [CrossRef]
- Cunningham, S.; Ridley, H.; Weinel, J.; Picking, R. Audio Emotion Recognition Using Machine Learning to Support Sound Design. In Proceedings of the AM’19: 14th International Audio Mostly Conference: A Journey in Sound, Nottingham, UK, 18–20 September 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 116–123. [Google Scholar] [CrossRef]
- Cunningham, S.; Ridley, H.; Weinel, J.; Picking, R. Supervised machine learning for audio emotion recognition. Pers. Ubiquitous Comput. 2020, 25, 637–650. [Google Scholar] [CrossRef] [Green Version]
- Drossos, K.; Floros, A.; Giannakoulopoulos, A. BEADS: A dataset of Binaural Emotionally Annotated Digital Sounds. In Proceedings of the IISA 2014, the 5th International Conference on Information, Intelligence, Systems and Applications, Chania, Greece, 7–9 July 2014; pp. 158–163. [Google Scholar]
- Drossos, K.; Floros, A.; Giannakoulopoulos, A.; Kanellopoulos, N. Investigating the Impact of Sound Angular Position on the Listener Affective State. IEEE Trans. Affect. Comput. 2015, 6, 27–42. [Google Scholar] [CrossRef]
- Asutay, E.; Västfjäll, D.; Tajadura-Jiménez, A.; Genell, A.; Bergman, P.; Kleiner, M. Emoacoustics: A Study of the Psychoacoustical and Psychological Dimensions of Emotional Sound Design. J. Audio Eng. Soc. 2012, 60, 21–28. [Google Scholar]
- Bradley, M.M.; Lang, P.J. Measuring emotion: The self-assessment manikin and the semantic differential. J. Behav. Ther. Exp. Psychiatry 1994, 25, 49–59. [Google Scholar] [CrossRef]
- Lartillot, O.; Toiviainen, P.; Eerola, T. A Matlab Toolbox for Music Information Retrieval. In Data Analysis, Machine Learning and Applications; Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 261–268. [Google Scholar]
- Lange, E.; Frieler, K. Challenges and Opportunities of Predicting Musical Emotions with Perceptual and Automatized Features. Music Percept. 2018, 36, 217–242. [Google Scholar] [CrossRef]
- Spiess, A.; Neumeyer, N. An evaluation of R2 as an inadequate measure for nonlinear models in pharmacological and biochemical research: A Monte Carlo approach. BMC Pharmacol. 2010, 10, 6. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Abri, F.; Gutiérrez, L.F.; Siami Namin, A.; Sears, D.R.W.; Jones, K.S. Predicting Emotions Perceived from Sounds. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 2057–2064. [Google Scholar] [CrossRef]
- Altman, N.; Krzywinski, M. Points of Significance: Ensemble methods: Bagging and random forests. Nat. Methods 2017, 14, 933–934. [Google Scholar] [CrossRef]
- Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46, 389–422. [Google Scholar] [CrossRef]
- Siami Namin, A.; Hewett, R.; Jones, K.S.; Pogrund, R. Sonifying Internet Security Threats. In Proceedings of the CHI EA ’16: CHI Conference Extended Abstracts on Human Factors in Computing Systems, San Jose, CA, USA, 7–12 May 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 2306–2313. [Google Scholar] [CrossRef] [Green Version]
- Datta, P.; Siami Namin, A.; Jones, K.; Hewett, R. Warning users about cyber threats through sounds. SN Appl. Sci. 2021, 3, 714. [Google Scholar] [CrossRef]
| Ref. | Problem Formulation | Emotion (Cat./Dim.) | I/P | Results |
|---|---|---|---|---|
| [36] | Regression analysis | Dim. (Ar.–Val.) | P | R² stats.: 58.3% Ar., 28.1% Val. |
| [37] | Ranking | Dim. (Ar.–Val.) | P | Gamma stats.: 0.326 Val. |
| [38] | – | Dim. and Cat. (5 emot.) | P | R² stats.: 77% Ar., 70% Val. |
| [39] | – | Dim. and Cat. (4 quadrants) | I | Acc.: 73.96% (SVM) |
| [40] | Classification | Dim. (Ar.–Val.) | P | Acc.: 72.4% (CNN) |
| [42] | Ranking | Dim. (Ar.–Val.) | P | Gamma stats.: 0.801 Ar., 0.795 Val. |
| [44] | Regression | Dim. (Ar.–Val.) | P | Corr. coeff.: 0.61 Ar., 0.49 Val. |
| [45] | Classification and Ranking | Dim. (Ar.) | P | Acc.: 81.44% Ar. (Log. regr.) |
| [55,56] | Annotator ratings | Dim. (Ar.–Val.) | P | Diff. in mean Ar.: 0.18, mean Val.: 0.38 (IADS vs. BEADS) |
| [30] | Regression | Dim. (Ar.–Val.) | P | R²: 0.853 Ar., 0.623 Val. |
| [49] | Classification and Regression | Dim. (Ar.–Val.) and Cat. (Schafer’s) | P | R²: 0.892 Ar., 0.759 Val. |
| [51] | Regression, Clustering | Dim. (Ar.–Val.) | P | MSE: 3.13 Ar., 3.10 Val. |
| [52] | Prediction | Dim. (Ar.–Val.) and Cat. (Schafer’s) | P | MSE: 0.0107 Ar., 0.0168 Val. |
| [53,54] | Regression | Dim. (Ar.–Val.) | P | R²: 0.345 Ar., 0.269 Val. |
| [57] | Annotator ratings | Dim. (Ar., Val., Annoyance, and Loudness) | I | – |
| [48] | Latent Analysis and Retrieval | Dim. (Ar.–Val.–Domn.) | P | RMSE for top-5 clips between 1.6 and 2.6 |
| Ref. | Dataset | No. of Features | Feature Types |
|---|---|---|---|
| [36] | 195 pop songs | 114 | PsySound, spectral contrast, Daubechies wavelet coefficient histogram (DWCH) |
| [37] | 60 K-pop and 1240 Chinese pop music samples | 157 | 10 melody, 142 timbre, 5 rhythm |
| [38] | SoundTrack110 (110 song samples) | 29 | Dynamics, timbre, harmony, register, rhythm, articulation |
| [39] | 100 K-pop songs | – | Avg. height, peak avg., HfW, avg. width, BPM |
| [40] | 1000 songs dataset [41] | 30,498 | Spectrograms |
| [42] | 100 Emusic clips | 56 | Features of the MIR Toolbox [59] |
| [44] | Emotional Sound Database (390 sounds) | 73 | 31 low-level descriptors (energy, spectral, and voicing) and 42 functionals (statistical, regression, and local minima/maxima) |
| [45] | IADS dataset (167 sounds) [46] | 26 | Beat spectrum, onsets, tempo, fluctuation, event density, and pulse clarity |
| [55,56] | Binaural sound corpus BEADS (167 sounds) | 5 | Angular adjustments (45°, 90°, 135°, and 180°) |
| [30] | EmoSoundscapes (1213 soundscape files) | 39 | Features of the MIRtoolbox and YAAFE |
| [49] | EmoSoundscapes [30] | 54 | Loudness, MFCC, energy, spectral |
| [51] | IADS [46] and 1000 songs [41] | – | MFCC and Perceptual Wavelet Packets (PWP) |
| [52] | EmoSoundscapes [30] | 23 | MFCC-like |
| [53,54] | IADS [46] | 76 | Features of the MIRtoolbox |
| [57] | 18 environmental sounds from IADS [46] | – | Fourier-time transformation (FTT) |
| [48] | 2491 audio clips from the BBC Sound Effects Library | 12 | MFCC |
Selected MIR Features:

| Feature Category | Count |
|---|---|
| Dynamics | 2 |
| Pitch | 1 |
| Rhythm | 6 |
| Spectral | 23 |
| Spectral MFCC | 26 |
| Timbre | 4 |
| Tonal | 6 |
| Total | 68 |
| Feature 1 | Feature 2 | Correlation |
|---|---|---|
| timbre spectralflux (std) | dynamics rms (std) | 0.954 |
| spectral spread (mean) | spectral rolloff95 (mean) | 0.949 |
| timbre spectralflux (mean) | dynamics rms (mean) | 0.946 |
| spectral rolloff85 (mean) | spectral centroid (mean) | 0.94 |
| spectral skewness (mean) | spectral kurtosis (mean) | 0.93 |
| spectral mfcc 12 (std) | spectral mfcc 11 (std) | 0.905 |
| spectral rolloff85 (mean) | spectral flatness (mean) | 0.902 |

| Feature 1 | Feature 2 | Correlation |
|---|---|---|
| spectral kurtosis (mean) | spectral skewness (mean) | 0.968 |
| spectral rolloff85 (mean) | spectral centroid (mean) | 0.951 |
| spectral rolloff95 (mean) | spectral spread (mean) | 0.927 |
| spectral rolloff85 (mean) | spectral rolloff95 (mean) | 0.923 |
Our Previous Work:

| | | Arousal: EmoSoundscape | Arousal: IADS | Arousal: IADSE | Valence: EmoSoundscape | Valence: IADS | Valence: IADSE |
|---|---|---|---|---|---|---|---|
| DS | No. of Samples | 600 | 167 | 927 | 600 | 167 | 927 |
| KBest | No. of Features | 26/68 | 28/313 | 27/68 | 29/68 | 27/313 | 27/68 |
| Random Forest Hyper-parameters | No. of Estimators | 200 | 200 | 30 | 100 | 150 | 150 |
| | Max Depth | 20 | 10 | 20 | 30 | 5 | 20 |
| | Min Samples Split | 3 | 5 | 4 | 2 | 2 | 2 |
| | Min Samples Leaf | 1 | 2 | 1 | 1 | 2 | 1 |
| Evaluation Metrics | Train RMSE | 0.09 | 0.36 | 0.30 | 0.013 | 0.41 | 0.41 |
| | Test RMSE | 0.25 | 0.88 | 0.78 | 0.37 | 0.98 | 1.13 |

Our Current Work:

| | | Arousal: EmoSoundscape | Arousal: IADSE | Valence: EmoSoundscape | Valence: IADSE | Dominance: IADSE |
|---|---|---|---|---|---|---|
| DS | No. of Samples | 600 | 927 | 600 | 927 | 927 |
| RFE | No. of Features | 15/68 | 25/68 | 14/68 | 9/68 | 7/68 |
| Random Forest Hyper-parameters | No. of Estimators | 50 | 300 | 50 | 250 | 150 |
| | Max Depth | 20 | 20 | 10 | 30 | 5 |
| | Min Samples Split | 5 | 3 | 5 | 2 | 2 |
| | Min Samples Leaf | 2 | 1 | 2 | 1 | 5 |
| Evaluation Metrics | Train RMSE | 0.1032 | 0.2818 | 0.1681 | 0.3970 | 0.70 |
| | Test RMSE | 0.2351 | 0.7782 | 0.3698 | 1.1577 | 0.83 |
| | Train MSE | 0.0106 | 0.0794 | 0.0283 | 0.1576 | 0.49 |
| | Test MSE | 0.0552 | 0.6055 | 0.1367 | 1.3402 | 0.70 |
| | Train R² | 0.9718 | 0.9422 | 0.9137 | 0.9200 | 0.48 |
| | Test R² | 0.8639 | 0.5631 | 0.5860 | 0.3700 | 0.26 |
RMSE for the IADSE and EmoSoundscape datasets:

| Model | Features | IADSE Arousal (Train) | IADSE Arousal (Test) | IADSE Valence (Train) | IADSE Valence (Test) | IADSE Dominance (Train) | IADSE Dominance (Test) | EmoSoundscape Arousal (Train) | EmoSoundscape Arousal (Test) | EmoSoundscape Valence (Train) | EmoSoundscape Valence (Test) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tuned RF | all | – | – | – | – | – | – | – | – | – | – |
| Tuned RF | selected | 0.28 | 0.77 | 0.39 | 1.15 | 0.70 | 0.83 | 0.10 | 0.23 | 0.16 | 0.36 |
| 4L MLP | all | 0.22 | 0.83 | 0.22 | 1.23 | 0.20 | 0.89 | 0.06 | 0.26 | 0.06 | 0.39 |
| 4L MLP | selected | 0.30 | 0.81 | 0.60 | 1.19 | 0.55 | 0.89 | 0.09 | 0.25 | 0.13 | 0.39 |
| 1D CNN | all | 0.85 | 0.89 | 1.41 | 1.44 | 0.97 | 0.97 | 0.29 | 0.31 | 0.41 | 0.42 |
| 1D CNN | selected | 0.83 | 0.87 | 1.27 | 1.37 | 0.97 | 0.97 | 0.24 | 0.26 | 0.38 | 0.39 |
| Average | all | 0.645 | 0.860 | 0.925 | 1.335 | 0.585 | 0.93 | 0.175 | 0.285 | 0.235 | 0.405 |
| Average | selected | 0.476 | 0.816 | 0.753 | 1.236 | 0.74 | 0.916 | 0.143 | 0.246 | 0.223 | 0.380 |
| Arousal (15/68) | Sign. |
|---|---|
| spectral roughness (mean) | 0.704 |
| timbre spectralflux (mean) | 0.085 |
| dynamics rms (mean) | 0.029 |
| spectral brightness (mean) | 0.027 |
| spectral spectentropy (mean) | 0.023 |
| spectral rolloff85 (mean) | 0.018 |
| spectral rolloff95 (mean) | 0.015 |
| rhythm fluctuationmax peakposmean | 0.015 |
| spectral mfcc 13 (std) | 0.014 |
| spectral mfcc 12 (std) | 0.012 |
| spectral irregularity (mean) | 0.010 |
| spectral mfcc 5 (mean) | 0.010 |
| timbre lowenergy (std) | 0.010 |
| tonal hcdf (mean) | 0.010 |
| rhythm attacktime (mean) | 0.009 |

| Valence (14/68) | Sign. |
|---|---|
| spectral roughness (mean) | 0.374 |
| dynamics rms (mean) | 0.155 |
| timbre lowenergy (std) | 0.105 |
| spectral mfcc 6 (std) | 0.054 |
| spectral centroid (std) | 0.043 |
| spectral mfcc 4 (std) | 0.042 |
| rhythm pulseclarity (mean) | 0.033 |
| spectral skewness (mean) | 0.032 |
| spectral mfcc 9 (std) | 0.031 |
| spectral mfcc 8 (mean) | 0.029 |
| spectral rolloff95 (mean) | 0.026 |
| spectral flatness (std) | 0.024 |
| spectral rolloff95 (std) | 0.024 |
| timbre spectralflux (mean) | 0.020 |
| Arousal | Sign. |
|---|---|
| timbre spectralflux (mean) | 0.289 |
| dynamics rms (mean) | 0.091 |
| dynamics rms (std) | 0.064 |
| spectral flatness (std) | 0.044 |
| spectral roughness (mean) | 0.041 |
| spectral spectentropy (mean) | 0.036 |
| spectral rolloff85 (mean) | 0.035 |
| spectral brightness (mean) | 0.032 |
| rhythm pulseclarity (mean) | 0.031 |
| timbre spectralflux (std) | 0.029 |
| tonal keyclarity (std) | 0.024 |
| spectral skewness (mean) | 0.024 |
| rhythm tempo (std) | 0.024 |
| pitch pitch (mean) | 0.022 |
| spectral flatness (mean) | 0.021 |
| timbre lowenergy (mean) | 0.020 |
| tonal keyclarity (mean) | 0.019 |
| spectral mfcc 6 (mean) | 0.018 |
| spectral centroid (mean) | 0.018 |
| spectral spectentropy (std) | 0.018 |
| spectral brightness (std) | 0.018 |
| spectral spread (mean) | 0.018 |
| spectral irregularity (mean) | 0.018 |
| spectral mfcc 8 (mean) | 0.017 |
| spectral mfcc 2 (mean) | 0.016 |

| Valence | Sign. |
|---|---|
| tonal keyclarity (mean) | 0.248 |
| spectral roughness (mean) | 0.136 |
| spectral rolloff85 (std) | 0.101 |
| dynamics rms (std) | 0.092 |
| rhythm fluctuationmax peakposmean | 0.092 |
| spectral brightness (mean) | 0.092 |
| spectral spread (mean) | 0.083 |
| spectral skewness (std) | 0.082 |
| spectral mfcc 11 (std) | 0.070 |

| Dominance | Sign. |
|---|---|
| spectral roughness (mean) | 0.367 |
| dynamics rms (mean) | 0.158 |
| tonal keyclarity (mean) | 0.115 |
| tonal hcdf (std) | 0.111 |
| timbre spectralflux (mean) | 0.091 |
| spectral brightness (mean) | 0.089 |
| spectral mfcc 3 (mean) | 0.065 |
| Feature | IADSE Arousal | IADSE Valence | IADSE Dominance | EmoSoundscape Arousal | EmoSoundscape Valence |
|---|---|---|---|---|---|
| Dynamics | | | | | |
| dynamics rms (mean) | * | | * | + | + |
| dynamics rms (std) | * | * | | | |
| Pitch | | | | | |
| pitch pitch (mean) | * | | | | |
| Rhythm | | | | | |
| rhythm pulseclarity (mean) | * | | | | + |
| rhythm tempo (std) | * | | | | |
| rhythm fluctuationmax peakposmean | | * | | + | |
| rhythm attacktime (mean) | | | | + | |
| Timbre | | | | | |
| timbre spectralflux (mean) | * | | * | + | + |
| timbre spectralflux (std) | * | | | | |
| timbre lowenergy (mean) | * | | | | |
| timbre lowenergy (std) | | | | + | + |
| Tonal | | | | | |
| tonal keyclarity (mean) | * | * | * | | |
| tonal keyclarity (std) | * | | | | |
| tonal hcdf (mean) | | | | + | |
| tonal hcdf (std) | | | * | | |
| Spectral | | | | | |
| spectral flatness (std) | * | | | | + |
| spectral roughness (mean) | * | * | * | + | + |
| spectral spectentropy (mean) | * | | | + | |
| spectral rolloff85 (mean) | * | | | + | |
| spectral rolloff85 (std) | | * | | | |
| spectral brightness (mean) | * | * | * | + | |
| spectral brightness (std) | * | | | | |
| spectral skewness (mean) | * | | | | + |
| spectral skewness (std) | | * | | | |
| spectral flatness (mean) | * | | | (+) | |
| spectral centroid (mean) | * | | | (+) | |
| spectral centroid (std) | | | | | + |
| spectral spectentropy (std) | * | | | | |
| spectral spread (mean) | * | * | | (+) | (+) |
| spectral irregularity (mean) | * | | | + | |
| spectral rolloff95 (mean) | (*) | (*) | | + | + |
| spectral rolloff95 (std) | | | | | + |
| spectral kurtosis (mean) | | | | | (+) |
| Spectral MFCC | | | | | |
| spectral mfcc 5 (mean) | | | | + | |
| spectral mfcc 6 (mean) | * | | | | |
| spectral mfcc 8 (mean) | * | | | | + |
| spectral mfcc 2 (mean) | * | | | | |
| spectral mfcc 3 (mean) | | | * | | |
| spectral mfcc 4 (std) | | | | | + |
| spectral mfcc 6 (std) | | | | | + |
| spectral mfcc 9 (std) | | | | | + |
| spectral mfcc 11 (std) | | * | | (+) | |
| spectral mfcc 12 (std) | | | | + | |
| spectral mfcc 13 (std) | | | | + | |