Deep Learning Methods for Human Activity Recognition and Emotion Detection

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensor Networks".

Deadline for manuscript submissions: closed (30 June 2024) | Viewed by 207913

Special Issue Editor


Prof. Dr. Mario Munoz-Organero
Guest Editor
Department of Telematic Engineering, Universidad Carlos III de Madrid, 28911 Madrid, Spain
Interests: wearable technologies for health and wellbeing applications; mobile and pervasive computing for assistive living; Internet of Things and assistive technologies; machine learning algorithms for physiological, inertial and location sensors; personal assistants and coaching for health self-management; activity detection and prediction methods

Special Issue Information

Dear Colleagues,

Detecting and characterizing human movements and activities is the basis for providing contextual information when solving more complex challenges such as health self-management, personal recommender systems, object detection and manipulation, behavioral pattern recognition, and professional sport training. Human activities describe what the user does. Combining human activity recognition with emotion recognition extends this contextual information to how the user feels while doing it, providing rich contextual knowledge that characterizes both the physical and psychological wellbeing of a person.

A wide range of machine learning methods have been applied over the last 20 years to automatically characterize human activities and emotions, based either on visual information from environment cameras, on sensors embedded in different tools and appliances, or on non-intrusive wearable sensor devices. The proliferation of data, together with recent deep-learning-based methods, has allowed the research community to achieve high-accuracy algorithms for detecting human movements and emotions. This Special Issue focuses on papers that provide up-to-date information on human activity detection, emotion detection, or the combination of both using machine learning methods applied to different types of sensors. Both research and survey papers are welcome.

Prof. Dr. Mario Munoz-Organero
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Human activity recognition
  • Emotion recognition
  • Machine learning
  • Deep learning
  • Wearable sensors

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (38 papers)


Research


23 pages, 7559 KiB  
Article
FMCW Radar Human Action Recognition Based on Asymmetric Convolutional Residual Blocks
by Yuan Zhang, Haotian Tang, Ye Wu, Bolun Wang and Dalin Yang
Sensors 2024, 24(14), 4570; https://doi.org/10.3390/s24144570 - 15 Jul 2024
Cited by 1 | Viewed by 1089
Abstract
Human action recognition based on optical and infrared video data is greatly affected by the environment, and feature extraction in traditional machine learning classification methods is complex; therefore, this paper proposes a method for human action recognition using Frequency Modulated Continuous Wave (FMCW) radar based on an asymmetric convolutional residual network. First, the radar echo data are analyzed and processed to extract the micro-Doppler time domain spectrograms of different actions. Second, a strategy combining asymmetric convolution and the Mish activation function is adopted in the residual block of the ResNet18 network to address the limitations of linear and nonlinear transformations in the residual block for micro-Doppler spectrum recognition. This approach aims to enhance the network’s ability to learn features effectively. Finally, the Improved Convolutional Block Attention Module (ICBAM) is integrated into the residual block to enhance the model’s attention and comprehension of input data. The experimental results demonstrate that the proposed method achieves a high accuracy of 98.28% in action recognition and classification within complex scenes, surpassing classic deep learning approaches. Moreover, this method significantly improves the recognition accuracy for actions with similar micro-Doppler features and demonstrates excellent anti-noise recognition performance. Full article
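
For readers who want a concrete starting point, the sketch below (not the authors' code; channel counts and layer sizes are illustrative assumptions) shows how an asymmetric convolutional residual block with the Mish activation can be expressed in PyTorch; the ICBAM attention module is omitted.

```python
# Minimal sketch of a residual block that replaces the standard 3x3 convolution
# with parallel 3x3, 1x3 and 3x1 branches ("asymmetric convolution") and uses
# the Mish activation. Sizes are assumptions for illustration.
import torch
import torch.nn as nn

class AsymmetricConvResBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Parallel square, horizontal and vertical kernels whose outputs are summed.
        self.square = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.horizontal = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.vertical = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.Mish()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.square(x) + self.horizontal(x) + self.vertical(x)
        out = self.act(self.bn(out))
        return self.act(out + x)   # residual connection

# Example: a batch of 8 single-channel micro-Doppler spectrograms, 64x64,
# projected to 32 feature maps before entering the block.
x = torch.randn(8, 1, 64, 64)
stem = nn.Conv2d(1, 32, kernel_size=3, padding=1)
block = AsymmetricConvResBlock(32)
print(block(stem(x)).shape)   # torch.Size([8, 32, 64, 64])
```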

16 pages, 641 KiB  
Article
A Multi-Modal Egocentric Activity Recognition Approach towards Video Domain Generalization
by Antonios Papadakis and Evaggelos Spyrou
Sensors 2024, 24(8), 2491; https://doi.org/10.3390/s24082491 - 12 Apr 2024
Cited by 1 | Viewed by 1307
Abstract
Egocentric activity recognition is a prominent computer vision task that is based on the use of wearable cameras. Since egocentric videos are captured through the perspective of the person wearing the camera, her/his body motions severely complicate the video content, imposing several challenges. In this work we propose a novel approach for domain-generalized egocentric human activity recognition. Typical approaches use a large amount of training data, aiming to cover all possible variants of each action. Moreover, several recent approaches have attempted to handle discrepancies between domains with a variety of costly and mostly unsupervised domain adaptation methods. In our approach we show that through simple manipulation of available source domain data and with minor involvement from the target domain, we are able to produce robust models, able to adequately predict human activity in egocentric video sequences. To this end, we introduce a novel three-stream deep neural network architecture combining elements of vision transformers and residual neural networks which are trained using multi-modal data. We evaluate the proposed approach using a challenging, egocentric video dataset and demonstrate its superiority over recent, state-of-the-art research works. Full article

29 pages, 16436 KiB  
Article
Efficient Human Violence Recognition for Surveillance in Real Time
by Herwin Alayn Huillcen Baca, Flor de Luz Palomino Valdivia and Juan Carlos Gutierrez Caceres
Sensors 2024, 24(2), 668; https://doi.org/10.3390/s24020668 - 20 Jan 2024
Cited by 3 | Viewed by 2303
Abstract
Human violence recognition is an area of great interest in the scientific community due to its broad spectrum of applications, especially in video surveillance systems, because detecting violence in real time can prevent criminal acts and save lives. The majority of existing proposals and studies focus on result precision, neglecting efficiency and practical implementations. Thus, in this work, we propose a model that is effective and efficient in recognizing human violence in real time. The proposed model consists of three modules: the Spatial Motion Extractor (SME) module, which extracts regions of interest from a frame; the Short Temporal Extractor (STE) module, which extracts temporal characteristics of rapid movements; and the Global Temporal Extractor (GTE) module, which is responsible for identifying long-lasting temporal features and fine-tuning the model. The proposal was evaluated for its efficiency, effectiveness, and ability to operate in real time. The results obtained on the Hockey, Movies, and RWF-2000 datasets demonstrated that this approach is highly efficient compared to various alternatives. In addition, the VioPeru dataset was created, which contains violent and non-violent videos captured by real video surveillance cameras in Peru, to validate the real-time applicability of the model. When tested on this dataset, the effectiveness of our model was superior to the best existing models. Full article

16 pages, 4264 KiB  
Article
Design and Development of an Imitation Detection System for Human Action Recognition Using Deep Learning
by Noura Alhakbani, Maha Alghamdi and Abeer Al-Nafjan
Sensors 2023, 23(24), 9889; https://doi.org/10.3390/s23249889 - 18 Dec 2023
Viewed by 1319
Abstract
Human action recognition (HAR) is a rapidly growing field with numerous applications in various domains. HAR involves the development of algorithms and techniques to automatically identify and classify human actions from video data. Accurate recognition of human actions has significant implications in fields such as surveillance and sports analysis and in the health care domain. This paper presents a study on the design and development of an imitation detection system using an HAR algorithm based on deep learning. This study explores the use of deep learning models, such as a single-frame convolutional neural network (CNN) and pretrained VGG-16, for the accurate classification of human actions. The proposed models were evaluated using a benchmark dataset, KTH. The performance of these models was compared with that of classical classifiers, including K-Nearest Neighbors, Support Vector Machine, and Random Forest. The results showed that the VGG-16 model achieved higher accuracy than the single-frame CNN, with a 98% accuracy rate. Full article
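
As a rough illustration of the transfer-learning setup described above, the following sketch adapts a pretrained VGG-16 to the six KTH action classes; the frozen backbone, frame size, and torchvision version are assumptions, not the authors' exact configuration.

```python
# Sketch: pretrained VGG-16 backbone with its classifier head replaced for KTH.
# Assumes torchvision >= 0.13 (weights enum) and internet access for the weights.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 6  # KTH: walking, jogging, running, boxing, hand waving, hand clapping

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in vgg.features.parameters():   # freeze convolutional features
    p.requires_grad = False
vgg.classifier[6] = nn.Linear(4096, NUM_CLASSES)  # new output layer

frames = torch.randn(4, 3, 224, 224)  # a batch of video frames
logits = vgg(frames)
print(logits.shape)  # torch.Size([4, 6])
```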

19 pages, 6477 KiB  
Article
A Hybrid Deep Learning Emotion Classification System Using Multimodal Data
by Dong-Hwi Kim, Woo-Hyeok Son, Sung-Shin Kwak, Tae-Hyeon Yun, Ji-Hyeok Park and Jae-Dong Lee
Sensors 2023, 23(23), 9333; https://doi.org/10.3390/s23239333 - 22 Nov 2023
Cited by 1 | Viewed by 2469
Abstract
This paper proposes a hybrid deep learning emotion classification system (HDECS), a hybrid multimodal deep learning system designed for emotion classification in a specific national language. Emotion classification is important in diverse fields, including tailored corporate services, AI advancement, and more. Additionally, most sentiment classification techniques in speaking situations are based on a single modality: voice, conversational text, vital signs, etc. However, analyzing these data presents challenges because of the variations in vocal intonation, text structures, and the impact of external stimuli on physiological signals. Korean poses challenges in natural language processing, including subject omission and spacing issues. To overcome these challenges and enhance emotion classification performance, this paper presents a case study using Korean multimodal data. The case study model involves retraining two pretrained models, LSTM and CNN, until their predictions on the entire dataset reach an agreement rate exceeding 0.75. The predictions are used to generate emotional sentences appended to the script data, which are further processed using BERT for final emotion prediction. The model is evaluated using categorical cross-entropy (CCE), which measures the difference between the model's predictions and the actual labels, together with the F1 score and accuracy. According to the evaluation, the case model outperforms the existing KLUE/roBERTa model with improvements of 0.5 in CCE, 0.09 in accuracy, and 0.11 in F1 score. As a result, the HDECS is expected to perform well not only on Korean multimodal datasets but also in sentiment classification considering the speech characteristics of various languages and regions. Full article

16 pages, 857 KiB  
Article
Multi-View Human Action Recognition Using Skeleton Based-FineKNN with Extraneous Frame Scrapping Technique
by Najeeb ur Rehman Malik, Usman Ullah Sheikh, Syed Abdul Rahman Abu-Bakar and Asma Channa
Sensors 2023, 23(5), 2745; https://doi.org/10.3390/s23052745 - 2 Mar 2023
Cited by 10 | Viewed by 3218
Abstract
Human action recognition (HAR) is one of the most active research topics in the field of computer vision. Even though this area is well researched, HAR algorithms such as 3D Convolutional Neural Networks (CNN), Two-Stream Networks, and CNN-LSTM (Long Short-Term Memory) suffer from highly complex models. These algorithms involve a huge number of weight adjustments during the training phase and, as a consequence, require high-end machines for real-time HAR applications. Therefore, this paper presents an extraneous frame scrapping technique that employs 2D skeleton features with a Fine-KNN classifier-based HAR system to overcome the dimensionality problems. To illustrate the efficacy of the proposed method, two contemporary datasets, i.e., the Multi-Camera Action Dataset (MCAD) and the INRIA Xmas Motion Acquisition Sequences (IXMAS) dataset, were used in the experiments. The OpenPose technique was used to extract the 2D information, and the proposed method was compared with CNN-LSTM and other state-of-the-art methods. The results obtained confirm the potential of the technique: the proposed OpenPose-FineKNN with extraneous frame scrapping achieved an accuracy of 89.75% on the MCAD dataset and 90.97% on the IXMAS dataset, outperforming existing techniques. Full article
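
The classification stage can be approximated with a nearest-neighbour classifier on flattened 2D keypoints, as in the hedged sketch below; reading "Fine-KNN" as a 1-nearest-neighbour classifier (following MATLAB's naming) and the feature dimensionality are assumptions, and the data are synthetic placeholders.

```python
# Illustrative sketch: 25 OpenPose-style body keypoints flattened to a
# 50-dimensional vector per frame, classified with 1-nearest-neighbour.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 50))          # per-frame 2D skeleton features (x, y per joint)
y = rng.integers(0, 18, size=600)       # placeholder action labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = KNeighborsClassifier(n_neighbors=1)   # "fine" = single nearest neighbour
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```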

15 pages, 2891 KiB  
Article
Imbalanced Text Sentiment Classification Based on Multi-Channel BLTCN-BLSTM Self-Attention
by Tiantian Cai and Xinsheng Zhang
Sensors 2023, 23(4), 2257; https://doi.org/10.3390/s23042257 - 17 Feb 2023
Cited by 7 | Viewed by 2566
Abstract
With the continuous expansion of the field of natural language processing, researchers have found that there is a phenomenon of imbalanced data distribution in some practical problems, and the excellent performance of most methods is based on the assumption that the samples in the dataset are data balanced. Therefore, the imbalanced data classification problem has gradually become a problem that needs to be studied. Aiming at the sentiment information mining of an imbalanced short text review dataset, this paper proposed a fusion multi-channel BLTCN-BLSTM self-attention sentiment classification method. By building a multi-channel BLTCN-BLSTM self-attention network model, the sample after word embedding processing is used as the input of the multi-channel, and after fully extracting features, the self-attention mechanism is fused to strengthen the sentiment to further fully extract text features. At the same time, focus loss rebalancing and classifier enhancement are combined to realize text sentiment predictions. The experimental results show that the optimal F1 value is up to 0.893 on the Chnsenticorp-HPL-10,000 corpus. The comparison and ablation of experimental results, including accuracy, recall, and F1-measure, show that the proposed model can fully integrate the weight of emotional feature words. It effectively improves the sentiment classification performance of imbalanced short-text review data. Full article
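
Loss-level rebalancing of this kind is often implemented as a focal-style loss; the sketch below is a generic example of such a loss (reading the abstract's "focus loss rebalancing" as a focal-style rebalancing loss is an assumption), with illustrative gamma and class weights.

```python
# Hedged sketch of loss-level rebalancing for imbalanced sentiment data:
# a multi-class focal loss that down-weights easy, well-classified examples.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, weight=None):
    """Focal loss; `weight` optionally re-weights classes (e.g. the minority class)."""
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, weight=weight, reduction="none")
    pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()  # prob of true class
    return ((1.0 - pt) ** gamma * ce).mean()

logits = torch.randn(16, 2)                 # positive / negative review logits
targets = torch.randint(0, 2, (16,))
class_weights = torch.tensor([0.3, 0.7])    # up-weight the minority class (illustrative)
print(focal_loss(logits, targets, weight=class_weights))
```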

26 pages, 3557 KiB  
Article
Fake News Detection Model on Social Media by Leveraging Sentiment Analysis of News Content and Emotion Analysis of Users’ Comments
by Suhaib Kh. Hamed, Mohd Juzaiddin Ab Aziz and Mohd Ridzwan Yaakub
Sensors 2023, 23(4), 1748; https://doi.org/10.3390/s23041748 - 4 Feb 2023
Cited by 30 | Viewed by 9629
Abstract
Nowadays, social media has become the main source of news around the world. The spread of fake news on social networks has become a serious global issue, damaging many areas, such as the political, economic, and social spheres, and negatively affecting the lives of citizens. Fake news often carries negative sentiments, and the public's response to it carries the emotions of surprise, fear, and disgust. In this article, we extracted features based on sentiment analysis of news articles and emotion analysis of users' comments regarding this news. These features were fed, along with the content feature of the news, to the proposed bidirectional long short-term memory model to detect fake news. We used the standard Fakeddit dataset, which contains news titles and the comments posted regarding them, to train and test the proposed model. The suggested model, using the extracted features, achieved a high detection performance of 96.77% on the Area Under the ROC Curve (AUC) measure, which is higher than what other state-of-the-art studies offer. The results prove that the features extracted based on sentiment analysis of news, which represents the publisher's stance, and emotion analysis of comments, which represents the crowd's stance, contribute to raising the efficiency of the detection model. Full article
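
A minimal sketch of the fusion idea might look as follows (dimensions, feature counts, and the label set are assumptions; this is not the authors' model): a bidirectional LSTM encodes the news text, and its final states are concatenated with handcrafted sentiment and emotion features before classification.

```python
# Sketch: BiLSTM over token ids plus auxiliary sentiment/emotion features.
import torch
import torch.nn as nn

class FakeNewsBiLSTM(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, hidden=64, aux_dim=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden + aux_dim, 2)   # fake / real

    def forward(self, token_ids, aux_features):
        _, (h, _) = self.bilstm(self.embed(token_ids))
        text_vec = torch.cat([h[-2], h[-1]], dim=-1)      # forward + backward final states
        return self.head(torch.cat([text_vec, aux_features], dim=-1))

model = FakeNewsBiLSTM()
tokens = torch.randint(0, 20000, (4, 60))   # 4 titles, 60 tokens each
aux = torch.randn(4, 9)                     # e.g. news sentiment + comment-emotion scores
print(model(tokens, aux).shape)             # torch.Size([4, 2])
```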

23 pages, 1288 KiB  
Article
Federated Meta-Learning with Attention for Diversity-Aware Human Activity Recognition
by Qiang Shen, Haotian Feng, Rui Song, Donglei Song and Hao Xu
Sensors 2023, 23(3), 1083; https://doi.org/10.3390/s23031083 - 17 Jan 2023
Cited by 9 | Viewed by 4311
Abstract
The ubiquity of smartphones equipped with multiple sensors has provided the possibility of automatically recognizing human activity, which can benefit intelligent applications such as smart homes, health monitoring, and aging care. However, there are two major barriers to deploying an activity recognition model in real-world scenarios. Firstly, deep learning models for activity recognition use a large amount of sensor data, which are privacy-sensitive and hence cannot be shared or uploaded to a centralized server. Secondly, divergence in the distribution of sensory data exists among multiple individuals due to their diverse behavioral patterns and lifestyles, which contributes to difficulty in recognizing activity for large-scale users or 'cold starts' for new users. To address these problems, we propose DivAR, a diversity-aware activity recognition framework based on a federated meta-learning architecture, which can extract general sensory features shared among individuals by a centralized embedding network and individual-specific features by an attention module in each decentralized network. Specifically, we first classify individuals into multiple clusters according to their behavioral patterns and social factors. We then apply meta-learning in the architecture of federated learning, where a centralized meta-model learns common feature representations that can be transferred across all clusters of individuals, and multiple decentralized cluster-specific models are utilized to learn cluster-specific features. For each cluster-specific model, a CNN-based attention module learns cluster-specific features from the global model. In this way, by training with sensory data locally, privacy-sensitive information existing in sensory data can be preserved. To evaluate the model, we conduct two data collection experiments by collecting sensor readings from naturally used smartphones annotated with activity information in real-life environments and constructing two multi-individual heterogeneous datasets. In addition, social characteristics including personality, mental health state, and behavior patterns are surveyed using questionnaires. Finally, extensive empirical results demonstrate that the proposed diversity-aware activity recognition model has a relatively better generalization ability and achieves competitive performance on multi-individual activity recognition tasks. Full article
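
The federated ingredient, stripped of the clustering, attention, and meta-learning components, reduces to averaging locally trained weights on a server so that raw sensor data never leave the device; the sketch below illustrates only that step, with a placeholder model.

```python
# Minimal federated-averaging sketch (not the DivAR implementation).
import copy
import torch
import torch.nn as nn

def federated_average(global_model: nn.Module, client_states: list) -> None:
    """Overwrite the global model with the element-wise mean of client weights."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        for state in client_states[1:]:
            avg[key] = avg[key] + state[key]
        avg[key] = avg[key] / len(client_states)
    global_model.load_state_dict(avg)

global_model = nn.Linear(6, 4)                     # placeholder HAR model
clients = [copy.deepcopy(global_model) for _ in range(3)]
# ... each client would train locally on its own sensor data here ...
federated_average(global_model, [c.state_dict() for c in clients])
```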

24 pages, 5225 KiB  
Article
Human Gait Activity Recognition Machine Learning Methods
by Jan Slemenšek, Iztok Fister, Jelka Geršak, Božidar Bratina, Vesna Marija van Midden, Zvezdan Pirtošek and Riko Šafarič
Sensors 2023, 23(2), 745; https://doi.org/10.3390/s23020745 - 9 Jan 2023
Cited by 21 | Viewed by 7336
Abstract
Human gait activity recognition is an emerging field of motion analysis that can be applied in various application domains. One of the most attractive applications includes monitoring of gait disorder patients, tracking their disease progression and the modification/evaluation of drugs. This paper proposes a robust, wearable gait motion data acquisition system that allows either the classification of recorded gait data into desirable activities or the identification of common risk factors, thus enhancing the subject's quality of life. Gait motion information was acquired using accelerometers and gyroscopes mounted on the lower limbs, where the sensors were exposed to inertial forces during gait. Additionally, leg muscle activity was measured using strain gauge sensors. Our goal was to identify different gait activities within each gait recording by utilizing Machine Learning algorithms. In line with this, various Machine Learning methods were tested and compared to establish the best-performing algorithm for the classification of the recorded gait information. The combination of attention-based convolutional and recurrent neural network algorithms outperformed the other tested algorithms, was tested further on the datasets of five individual subjects, and delivered the following averaged classification results: 98.9% accuracy, 96.8% precision, 97.8% sensitivity, 99.1% specificity and 97.3% F1-score. Moreover, the algorithm's robustness was also verified with the successful detection of freezing gait episodes in a Parkinson's disease patient. The results of this study indicate a feasible gait event classification method capable of complete algorithm personalization. Full article

21 pages, 4341 KiB  
Article
STC-NLSTMNet: An Improved Human Activity Recognition Method Using Convolutional Neural Network with NLSTM from WiFi CSI
by Md Shafiqul Islam, Mir Kanon Ara Jannat, Mohammad Nahid Hossain, Woo-Su Kim, Soo-Wook Lee and Sung-Hyun Yang
Sensors 2023, 23(1), 356; https://doi.org/10.3390/s23010356 - 29 Dec 2022
Cited by 17 | Viewed by 3479
Abstract
Human activity recognition (HAR) has emerged as a significant area of research due to its numerous possible applications, including ambient assisted living, healthcare, abnormal behaviour detection, etc. Recently, HAR using WiFi channel state information (CSI) has become a predominant and unique approach in indoor environments compared to other approaches (i.e., sensor- and vision-based) due to its privacy-preserving qualities, the elimination of the need to carry additional devices, and its flexibility to capture motions in both line-of-sight (LOS) and non-line-of-sight (NLOS) settings. Existing deep learning (DL)-based HAR approaches usually extract either temporal or spatial features and lack adequate means to integrate and utilize the two simultaneously, making it challenging to recognize different activities accurately. Motivated by this, we propose a novel DL-based model named spatio-temporal convolution with nested long short-term memory (STC-NLSTMNet), with the ability to extract spatial and temporal features concurrently and automatically recognize human activity with very high accuracy. The proposed STC-NLSTMNet model is mainly comprised of depthwise separable convolution (DS-Conv) blocks, feature attention modules (FAM) and NLSTM. The DS-Conv blocks extract the spatial features from the CSI signal, and the feature attention modules (FAM) draw attention to the most essential features. These robust features are fed into the NLSTM as inputs to explore the hidden intrinsic temporal features in CSI signals. The proposed STC-NLSTMNet model is evaluated using two publicly available datasets: Multi-environment and StanWiFi. The experimental results revealed that the STC-NLSTMNet model achieved activity recognition accuracies of 98.20% and 99.88% on the Multi-environment and StanWiFi datasets, respectively. Its activity recognition performance was also compared with other existing approaches, and the proposed STC-NLSTMNet model improves the activity recognition accuracies by 4% and 1.88%, respectively, compared to the best existing method. Full article
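
A depthwise separable convolution block of the kind named above can be sketched as follows; the kernel sizes, channel counts, and CSI tensor layout are assumptions, and the feature attention module and NLSTM are omitted.

```python
# Sketch of one DS-Conv block: a depthwise convolution followed by a 1x1
# pointwise convolution, batch norm and ReLU.
import torch
import torch.nn as nn

class DSConvBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# CSI input sketched as (batch, subcarriers treated as channels, time, antennas).
csi = torch.randn(2, 30, 128, 3)
print(DSConvBlock(30, 64)(csi).shape)   # torch.Size([2, 64, 128, 3])
```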

12 pages, 4026 KiB  
Article
Real-Time Human Activity Recognition with IMU and Encoder Sensors in Wearable Exoskeleton Robot via Deep Learning Networks
by Ismael Espinoza Jaramillo, Jin Gyun Jeong, Patricio Rivera Lopez, Choong-Ho Lee, Do-Yeon Kang, Tae-Jun Ha, Ji-Heon Oh, Hwanseok Jung, Jin Hyuk Lee, Won Hee Lee and Tae-Seong Kim
Sensors 2022, 22(24), 9690; https://doi.org/10.3390/s22249690 - 10 Dec 2022
Cited by 18 | Viewed by 6382
Abstract
Wearable exoskeleton robots have become a promising technology for supporting human motions in multiple tasks. Activity recognition in real-time provides useful information to enhance the robot’s control assistance for daily tasks. This work implements a real-time activity recognition system based on the activity signals of an inertial measurement unit (IMU) and a pair of rotary encoders integrated into the exoskeleton robot. Five deep learning models have been trained and evaluated for activity recognition. As a result, a subset of optimized deep learning models was transferred to an edge device for real-time evaluation in a continuous action environment using eight common human tasks: stand, bend, crouch, walk, sit-down, sit-up, and ascend and descend stairs. These eight robot wearer’s activities are recognized with an average accuracy of 97.35% in real-time tests, with an inference time under 10 ms and an overall latency of 0.506 s per recognition using the selected edge device. Full article

24 pages, 1020 KiB  
Article
Applying Self-Supervised Representation Learning for Emotion Recognition Using Physiological Signals
by Kevin G. Montero Quispe, Daniel M. S. Utyiama, Eulanda M. dos Santos, Horácio A. B. F. Oliveira and Eduardo J. P. Souto
Sensors 2022, 22(23), 9102; https://doi.org/10.3390/s22239102 - 23 Nov 2022
Cited by 13 | Viewed by 4386
Abstract
The use of machine learning (ML) techniques in affective computing applications focuses on improving the user experience in emotion recognition. The collection of input data (e.g., physiological signals), together with expert annotations are part of the established standard supervised learning methodology used to train human emotion recognition models. However, these models generally require large amounts of labeled data, which is expensive and impractical in the healthcare context, in which data annotation requires even more expert knowledge. To address this problem, this paper explores the use of the self-supervised learning (SSL) paradigm in the development of emotion recognition methods. This approach makes it possible to learn representations directly from unlabeled signals and subsequently use them to classify affective states. This paper presents the key concepts of emotions and how SSL methods can be applied to recognize affective states. We experimentally analyze and compare self-supervised and fully supervised training of a convolutional neural network designed to recognize emotions. The experimental results using three emotion datasets demonstrate that self-supervised representations can learn widely useful features that improve data efficiency, are widely transferable, are competitive when compared to their fully supervised counterparts, and do not require the data to be labeled for learning. Full article
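
One common self-supervised recipe for signals, sketched below with assumed transformations and sizes (not necessarily this paper's pretext tasks), pretrains an encoder to recognise which augmentation was applied to an unlabeled window and then reuses that encoder for emotion classification.

```python
# Hedged sketch of transformation-prediction pretraining for physiological signals.
import torch
import torch.nn as nn

def augment(x: torch.Tensor, kind: int) -> torch.Tensor:
    if kind == 0:                       # identity
        return x
    if kind == 1:                       # additive noise
        return x + 0.05 * torch.randn_like(x)
    return torch.flip(x, dims=[-1])     # kind == 2: time reversal

encoder = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)
pretext_head = nn.Linear(16, 3)         # predict which transformation was used

signals = torch.randn(32, 1, 512)       # unlabeled physiological windows
kinds = torch.randint(0, 3, (32,))
views = torch.stack([augment(s, int(k)) for s, k in zip(signals, kinds)])
loss = nn.functional.cross_entropy(pretext_head(encoder(views)), kinds)
loss.backward()                          # one pretraining step; afterwards reuse `encoder`
```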

27 pages, 9950 KiB  
Article
SDHAR-HOME: A Sensor Dataset for Human Activity Recognition at Home
by Raúl Gómez Ramos, Jaime Duque Domingo, Eduardo Zalama, Jaime Gómez-García-Bermejo and Joaquín López
Sensors 2022, 22(21), 8109; https://doi.org/10.3390/s22218109 - 23 Oct 2022
Cited by 13 | Viewed by 5769
Abstract
Nowadays, one of the most important objectives in health research is the improvement of the living conditions and well-being of the elderly, especially those who live alone. These people may experience undesired or dangerous situations in their daily life at home due to physical, sensorial or cognitive limitations, such as forgetting their medication or wrong eating habits. This work focuses on the development of a database in a home, through non-intrusive technology, where several users are residing, by combining a set of non-intrusive sensors that capture events occurring in the house, a positioning system based on triangulation using beacons, and a system for monitoring the user's state through activity wristbands. Two months of uninterrupted measurements were obtained on the daily habits of two people who live with a pet and receive sporadic visits, in which 18 different types of activities were labelled. In order to validate the data, a system for the real-time recognition of the activities carried out by these residents was developed using different current Deep Learning (DL) techniques based on neural networks, such as Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM) or Gated Recurrent Unit networks (GRU). A personalised prediction model was developed for each user, resulting in hit rates ranging from 88.29% to 90.91%. Finally, a data sharing algorithm has been developed to improve the generalisability of the model and to avoid overtraining the neural network. Full article

21 pages, 5598 KiB  
Article
Kids’ Emotion Recognition Using Various Deep-Learning Models with Explainable AI
by Manish Rathod, Chirag Dalvi, Kulveen Kaur, Shruti Patil, Shilpa Gite, Pooja Kamat, Ketan Kotecha, Ajith Abraham and Lubna Abdelkareim Gabralla
Sensors 2022, 22(20), 8066; https://doi.org/10.3390/s22208066 - 21 Oct 2022
Cited by 12 | Viewed by 6587
Abstract
Human ideas and sentiments are mirrored in facial expressions. They give the spectator a plethora of social cues, such as the viewer's focus of attention, intention, motivation, and mood, which can help develop better interactive solutions in online platforms. This could be helpful for children while teaching them, cultivating a better interactive connection between teachers and students, since there is an increasing trend toward online education platforms due to the COVID-19 pandemic. To address this, the authors propose kids' emotion recognition based on visual cues in this research with a justified reasoning model of explainable AI. The authors used two datasets to work on this problem; the first is the LIRIS Children Spontaneous Facial Expression Video Database, and the second is an author-created novel dataset of emotions displayed by children aged 7 to 10. The authors identified that previous work on the LIRIS dataset achieved only 75% accuracy and that no study has worked further on this dataset; in this work, the authors achieved the highest accuracy of 89.31% on LIRIS and an accuracy of 90.98% on their own dataset. The authors also realized that the face construction of children and adults is different, and the way children show emotions is very different and does not always follow the same facial expression for a specific emotion as adults do. Hence, the authors used 3D 468 landmark points and created two separate versions of the dataset from the original selected datasets, which are LIRIS-Mesh and Authors-Mesh. In total, four types of datasets were used, namely LIRIS, the authors' dataset, LIRIS-Mesh, and Authors-Mesh, and a comparative analysis was performed by using seven different CNN models. The authors not only compared all dataset types on different CNN models but also explained, for every type of CNN used on every specific dataset type, how test images are perceived by the deep-learning models by using explainable artificial intelligence (XAI), which helps in localizing features contributing to particular emotions. The authors used three methods of XAI, namely Grad-CAM, Grad-CAM++, and SoftGrad, which help users further establish the appropriate reason for emotion detection by knowing the contribution of its features. Full article

19 pages, 8965 KiB  
Article
Interpretable Passive Multi-Modal Sensor Fusion for Human Identification and Activity Recognition
by Liangqi Yuan, Jack Andrews, Huaizheng Mu, Asad Vakil, Robert Ewing, Erik Blasch and Jia Li
Sensors 2022, 22(15), 5787; https://doi.org/10.3390/s22155787 - 3 Aug 2022
Cited by 24 | Viewed by 3603
Abstract
Human monitoring applications in indoor environments depend on accurate human identification and activity recognition (HIAR). Single-modality sensor systems have been shown to be accurate for HIAR, but there are some shortcomings to these systems, such as privacy, intrusion, and costs. To combat these shortcomings for a long-term monitoring solution, an interpretable, passive, multi-modal sensor fusion system, PRF-PIR, is proposed in this work. PRF-PIR is composed of one software-defined radio (SDR) device and one novel passive infrared (PIR) sensor system. A recurrent neural network (RNN) is built as the HIAR model for this proposed solution to handle the temporal dependence of passive information captured by both modalities. We validate our proposed PRF-PIR system for a potential human monitoring system through the data collection of eleven activities from twelve human subjects in an academic office environment. From our data collection, the efficacy of the sensor fusion system is proven via an accuracy of 0.9866 for human identification and an accuracy of 0.9623 for activity recognition. The results of the system are supported with explainable artificial intelligence (XAI) methodologies to serve as a validation for sensor fusion over the deployment of single-sensor solutions. PRF-PIR provides a passive, non-intrusive, and highly accurate system that allows for robustness in uncertain, highly similar, and complex at-home activities performed by a variety of human subjects. Full article

19 pages, 6800 KiB  
Article
Color Design Decisions for Ceramic Products Based on Quantification of Perceptual Characteristics
by Yi Wang, Qinxin Zhao, Jian Chen, Weiwei Wang, Suihuai Yu and Xiaoyan Yang
Sensors 2022, 22(14), 5415; https://doi.org/10.3390/s22145415 - 20 Jul 2022
Cited by 8 | Viewed by 3303
Abstract
The appearance characteristics of ceramic color are an important factor in determining the user’s aesthetic perception of the product. Given the problem that ceramic color varies and the user’s visual sensory evaluation of color is highly subjective and uncertain, a method of quantifying ceramic color characteristics based on the Back Propagation (BP) neural network algorithm is proposed. The semantic difference method and statistical method were used to obtain quantified data from ceramic color perceptual semantic features and were combined with a neural network to study the association between ceramic color features and user perceptual-cognitive evaluation. A BP neural network was used to build a ceramic color perceptual semantic mapping model, using color semantic quantified values as the input layer, color L, A, and B component values as the output layer, and model training to predict the sample. The output color L, A, and B components are used as the input layer and the color scheme was designed. The above method can effectively solve the mapping problem between the appearance characteristics of ceramic color and perceptual semantics and provide a decision basis for ceramic product color design. The case application of color design of daily-use ceramic products was conducted to verify the effectiveness and feasibility of the quantitative research method of ceramic color imagery. Full article
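
The mapping itself can be approximated by a small fully connected (back-propagation) network regressing L, A and B values from semantic scores, as in the illustrative sketch below; the number of semantic dimensions, the network width, and the placeholder data are assumptions.

```python
# Sketch: a small BP (fully connected) network mapping semantic-differential
# scores to CIELAB-style L, A, B components.
import torch
import torch.nn as nn

semantic_dim = 5                         # e.g. five perceptual adjective pairs (assumed)
model = nn.Sequential(
    nn.Linear(semantic_dim, 16), nn.Sigmoid(),
    nn.Linear(16, 3),                    # outputs: L, A, B components
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

scores = torch.rand(100, semantic_dim)   # placeholder survey data
lab = torch.rand(100, 3)                 # placeholder colour targets
for _ in range(200):                     # standard back-propagation training loop
    optimizer.zero_grad()
    loss = loss_fn(model(scores), lab)
    loss.backward()
    optimizer.step()
print(model(scores[:1]))                 # predicted L, A, B for one sample
```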

18 pages, 3177 KiB  
Article
Machine Learning Algorithms for Detection and Classifications of Emotions in Contact Center Applications
by Mirosław Płaza, Sławomir Trusz, Justyna Kęczkowska, Ewa Boksa, Sebastian Sadowski and Zbigniew Koruba
Sensors 2022, 22(14), 5311; https://doi.org/10.3390/s22145311 - 15 Jul 2022
Cited by 18 | Viewed by 4733
Abstract
Over the past few years, virtual assistant solutions used in Contact Center systems have been gaining popularity. One of the main tasks of the virtual assistant is to recognize the intentions of the customer. It is important to note that quite often the actual intention expressed in a conversation is also directly influenced by the emotions that accompany that conversation. Unfortunately, the scientific literature has not identified what specific types of emotions in Contact Center applications are relevant to the activities they perform. Therefore, the main objective of this work was to develop an Emotion Classification for Machine Detection of Affect-Tinged Conversational Contents dedicated directly to the Contact Center industry. In the conducted study, Contact Center voice and text channels were considered, taking into account the following families of emotions: anger, fear, happiness, and sadness vs. affective neutrality of the statements. The obtained results confirmed the usefulness of the proposed classification: for the voice channel, the highest efficiency was obtained using the Convolutional Neural Network (accuracy, 67.5%; precision, 80.3%; F1-score, 74.5%), while for the text channel, the Support Vector Machine algorithm proved to be the most efficient (accuracy, 65.9%; precision, 58.5%; F1-score, 61.7%). Full article
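
For the text channel, a pipeline along the lines reported as most effective might be sketched as follows; the toy utterances, label set, and TF-IDF features are placeholders, not the study's data or exact configuration.

```python
# Sketch: TF-IDF features + linear SVM for text-channel emotion classification.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = [
    "I am very unhappy with this service",
    "Thank you, that solved my problem perfectly",
    "I'm worried my account has been blocked",
    "This is outrageous, I want a refund now",
]
labels = ["sadness", "happiness", "fear", "anger"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["why has nobody answered my complaint"]))
```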

29 pages, 97559 KiB  
Article
Weakly Supervised Violence Detection in Surveillance Video
by David Choqueluque-Roman and Guillermo Camara-Chavez
Sensors 2022, 22(12), 4502; https://doi.org/10.3390/s22124502 - 14 Jun 2022
Cited by 15 | Viewed by 5230
Abstract
Automatic violence detection in video surveillance is essential for social and personal security. Monitoring the large number of surveillance cameras used in public and private areas is challenging for human operators. The manual nature of this task significantly increases the possibility of ignoring important events due to human limitations when paying attention to multiple targets at a time. Researchers have proposed several methods to detect violent events automatically to overcome this problem. So far, most previous studies have focused only on classifying short clips without performing spatial localization. In this work, we tackle this problem by proposing a weakly supervised method to detect spatially and temporally violent actions in surveillance videos using only video-level labels. The proposed method follows a Fast-RCNN-style architecture that has been temporally extended. First, we generate spatiotemporal proposals (action tubes) leveraging pre-trained person detectors, motion appearance (dynamic images), and tracking algorithms. Then, given an input video and the action proposals, we extract spatiotemporal features using deep neural networks. Finally, a classifier based on multiple-instance learning is trained to label each action tube as violent or non-violent. We obtain results similar to the state of the art on three public databases (Hockey Fight, RLVSD, and RWF-2000), achieving accuracies of 97.3%, 92.88%, and 88.7%, respectively. Full article
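
The multiple-instance learning step can be sketched as scoring each action tube and taking the maximum score as the video-level prediction; the feature size and scoring network below are assumptions, not the authors' classifier.

```python
# Sketch: each video is a bag of action-tube features with only a video-level
# label; the bag score is the score of the most "violent" tube.
import torch
import torch.nn as nn

class MILTubeClassifier(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, tubes: torch.Tensor) -> torch.Tensor:
        # tubes: (num_tubes, feat_dim) for one video; bag logit = max over instances.
        return self.scorer(tubes).max()

model = MILTubeClassifier()
video_tubes = torch.randn(7, 256)                 # 7 spatiotemporal proposals
label = torch.tensor(1.0)                         # video-level label: violent
loss = nn.functional.binary_cross_entropy_with_logits(model(video_tubes), label)
loss.backward()
```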

19 pages, 2959 KiB  
Article
Neural Networks for Automatic Posture Recognition in Ambient-Assisted Living
by Bruna Maria Vittoria Guerra, Micaela Schmid, Giorgio Beltrami and Stefano Ramat
Sensors 2022, 22(7), 2609; https://doi.org/10.3390/s22072609 - 29 Mar 2022
Cited by 7 | Viewed by 2247
Abstract
Human Action Recognition (HAR) is a rapidly evolving field impacting numerous domains, among which is Ambient Assisted Living (AAL). In such a context, the aim of HAR is meeting the needs of frail individuals, whether elderly and/or disabled, and promoting autonomous, safe and secure living. To this goal, we propose a monitoring system detecting dangerous situations by classifying human postures through Artificial Intelligence (AI) solutions. The developed algorithm works on a set of features computed from the skeleton data provided by four Kinect One systems simultaneously recording the scene from different angles and identifying the posture of the subject in an ecological context within each recorded frame. Here, we compare the recognition abilities of Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) sequence networks. Starting from the set of previously selected features, we performed a further feature selection based on an SVM algorithm for the optimization of the MLP network and used a genetic algorithm for selecting the features for the LSTM sequence model. We then optimized the architecture and hyperparameters of both models before comparing their performances. The best MLP model (three hidden layers and a Softmax output layer) achieved 78.4%, while the best LSTM (two bidirectional LSTM layers, two dropout layers and a fully connected layer) reached 85.7%. The analysis of the performances on individual classes highlights the better suitability of the LSTM approach. Full article

18 pages, 9845 KiB  
Article
Wearable Sensor-Based Human Activity Recognition with Transformer Model
by Iveta Dirgová Luptáková, Martin Kubovčík and Jiří Pospíchal
Sensors 2022, 22(5), 1911; https://doi.org/10.3390/s22051911 - 1 Mar 2022
Cited by 107 | Viewed by 11624
Abstract
Computing devices that can recognize various human activities or movements can be used to assist people in healthcare, sports, or human–robot interaction. Readily available data for this purpose can be obtained from the accelerometer and the gyroscope built into everyday smartphones. Effective classification of real-time activity data is, therefore, actively pursued using various machine learning methods. In this study, the transformer model, a deep learning neural network model developed primarily for natural language processing and vision tasks, was adapted for time-series analysis of motion signals. The self-attention mechanism inherent in the transformer, which expresses individual dependencies between signal values within a time series, can match the performance of state-of-the-art convolutional neural networks with long short-term memory. The performance of the proposed adapted transformer method was tested on the largest available public dataset of smartphone motion sensor data covering a wide range of activities, and it obtained an average identification accuracy of 99.2%, as compared with 89.67% achieved on the same data by a conventional machine learning method. The results suggest the expected future relevance of the transformer model for human activity recognition. Full article
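
Adapting a transformer encoder to inertial windows can be sketched as below; the projection of the six sensor axes, the hyperparameters, the class count, and the omission of positional encoding are simplifying assumptions rather than the paper's exact design.

```python
# Sketch: each time step (3-axis accelerometer + 3-axis gyroscope) is projected
# to a model dimension, passed through self-attention layers, and mean-pooled.
import torch
import torch.nn as nn

class HARTransformer(nn.Module):
    def __init__(self, n_features=6, d_model=64, n_heads=4, n_layers=2, n_classes=6):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                 # x: (batch, time, features)
        z = self.encoder(self.proj(x))
        return self.head(z.mean(dim=1))   # average over time steps

windows = torch.randn(8, 128, 6)          # 8 windows of 128 samples, 6 sensor axes
print(HARTransformer()(windows).shape)    # torch.Size([8, 6])
```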

26 pages, 8223 KiB  
Article
New Sensor Data Structuring for Deeper Feature Extraction in Human Activity Recognition
by Tsige Tadesse Alemayoh, Jae Hoon Lee and Shingo Okamoto
Sensors 2021, 21(8), 2814; https://doi.org/10.3390/s21082814 - 16 Apr 2021
Cited by 30 | Viewed by 7713
Abstract
For the effective application of thriving human-assistive technologies in healthcare services and human–robot collaborative tasks, computing devices must be aware of human movements. Developing a reliable real-time activity recognition method for the continuous and smooth operation of such smart devices is imperative. To achieve this, light and intelligent methods that use ubiquitous sensors are pivotal. In this study, with the correlation of time series data in mind, a new method of data structuring for deeper feature extraction is introduced herein. The activity data were collected using a smartphone with the help of an exclusively developed iOS application. Data from eight activities were shaped into single and double-channels to extract deep temporal and spatial features of the signals. In addition to the time domain, raw data were represented via the Fourier and wavelet domains. Among the several neural network models used to fit the deep-learning classification of the activities, a convolutional neural network with a double-channeled time-domain input performed well. This method was further evaluated using other public datasets, and better performance was obtained. The practicability of the trained model was finally tested on a computer and a smartphone in real-time, where it demonstrated promising results. Full article

29 pages, 5303 KiB  
Article
DRER: Deep Learning–Based Driver’s Real Emotion Recognizer
by Geesung Oh, Junghwan Ryu, Euiseok Jeong, Ji Hyun Yang, Sungwook Hwang, Sangho Lee and Sejoon Lim
Sensors 2021, 21(6), 2166; https://doi.org/10.3390/s21062166 - 19 Mar 2021
Cited by 34 | Viewed by 7193
Abstract
In intelligent vehicles, it is essential to monitor the driver's condition; however, recognizing the driver's emotional state is one of the most challenging and important tasks. Most previous studies focused on facial expression recognition to monitor the driver's emotional state. However, while driving, many factors prevent drivers from revealing the emotions on their faces. To address this problem, we propose a deep learning-based driver's real emotion recognizer (DRER), a deep learning-based algorithm to recognize the drivers' real emotions that cannot be completely identified based on their facial expressions. The proposed algorithm comprises two models: (i) a facial expression recognition model, which refers to the state-of-the-art convolutional neural network structure; and (ii) a sensor fusion emotion recognition model, which fuses the recognized state of facial expressions with electrodermal activity, a bio-physiological signal representing electrical characteristics of the skin, to recognize the driver's real emotional state. Hence, we categorized the driver's emotions and conducted human-in-the-loop experiments to acquire the data. Experimental results show that the proposed fusion approach achieves a 114% increase in accuracy compared to using only facial expressions and a 146% increase in accuracy compared to using only electrodermal activity. In conclusion, our proposed method achieves 86.8% recognition accuracy in recognizing the driver's induced emotion in driving situations. Full article

16 pages, 1935 KiB  
Article
Skeleton-Based Emotion Recognition Based on Two-Stream Self-Attention Enhanced Spatial-Temporal Graph Convolutional Network
by Jiaqi Shi, Chaoran Liu, Carlos Toshinori Ishi and Hiroshi Ishiguro
Sensors 2021, 21(1), 205; https://doi.org/10.3390/s21010205 - 30 Dec 2020
Cited by 24 | Viewed by 5440
Abstract
Emotion recognition has drawn consistent attention from researchers recently. Although gesture modality plays an important role in expressing emotion, it is seldom considered in the field of emotion recognition. A key reason is the scarcity of labeled data containing 3D skeleton data. Some studies in action recognition have applied graph-based neural networks to explicitly model the spatial connections between joints. However, this method has not been considered in the field of gesture-based emotion recognition so far. In this work, we applied a pose-estimation-based method to extract 3D skeleton coordinates for the IEMOCAP database. We propose a self-attention enhanced spatial-temporal graph convolutional network for skeleton-based emotion recognition, in which the spatial convolutional part models the skeletal structure of the body as a static graph, and the self-attention part dynamically constructs more connections between the joints and provides supplementary information. Our experiment demonstrates that the proposed model significantly outperforms other models and that the features of the extracted skeleton data improve the performance of multimodal emotion recognition. Full article

20 pages, 2803 KiB  
Article
Optimizing Sensor Deployment for Multi-Sensor-Based HAR System with Improved Glowworm Swarm Optimization Algorithm
by Yiming Tian and Jie Zhang
Sensors 2020, 20(24), 7161; https://doi.org/10.3390/s20247161 - 14 Dec 2020
Cited by 7 | Viewed by 2841
Abstract
Human activity recognition (HAR) technology that analyzes and fuses data acquired from various homogeneous or heterogeneous sensor sources has motivated the development of numerous human-centered applications such as healthcare, fitness, ambient assisted living and rehabilitation. The concurrent use of multiple sensor sources for HAR is attractive because the rich user information provided by the various sensor sources may be useful. However, a multi-sensor system with too many sensors incurs high power consumption, and some sensor sources may contribute little to performance. Therefore, research on multi-sensor deployment that achieves a tradeoff between computational complexity and performance is imperative. In this paper, we propose a multi-sensor-based HAR system whose sensor deployment can be optimized by selective ensemble approaches. For the optimization of the sensor deployment, an improved binary glowworm swarm optimization (IBGSO) algorithm is proposed, and the sensor sources that have a significant effect on the performance of HAR are selected. Furthermore, an ensemble learning system based on the optimized sensor deployment is constructed for HAR. Experimental results on two datasets show that the proposed IBGSO-based multi-sensor deployment approach can select a smaller number of sensor sources while achieving better performance than the ensemble of all sensors and other optimization-based selective ensemble approaches. Full article
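The core of the sensor-deployment problem can be expressed as a search over binary sensor masks scored by accuracy minus a sensor-count penalty. The sketch below uses plain random search as a stand-in for the IBGSO metaheuristic, whose update rules are not reproduced; the `evaluate_accuracy` callback and the penalty weight are hypothetical.

```python
# Sketch of fitness-driven sensor subset selection; random search stands in
# for the glowworm swarm optimizer described in the paper.
import numpy as np

def fitness(mask, evaluate_accuracy, alpha=0.01):
    """Trade off validation accuracy against the number of selected sensors."""
    if mask.sum() == 0:
        return -np.inf
    return evaluate_accuracy(mask) - alpha * mask.sum()

def search_deployment(n_sensors, evaluate_accuracy, n_iter=200, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    best_mask, best_fit = None, -np.inf
    for _ in range(n_iter):
        mask = rng.integers(0, 2, size=n_sensors)   # candidate deployment
        f = fitness(mask, evaluate_accuracy)
        if f > best_fit:
            best_mask, best_fit = mask, f
    return best_mask
```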

18 pages, 3816 KiB  
Article
Gait-Based Identification Using Deep Recurrent Neural Networks and Acceleration Patterns
by Angel Peinado-Contreras and Mario Munoz-Organero
Sensors 2020, 20(23), 6900; https://doi.org/10.3390/s20236900 - 3 Dec 2020
Cited by 11 | Viewed by 3459
Abstract
This manuscript presents an approach to the challenge of biometric identification based on the acceleration patterns generated by a user while walking. The proposed approach uses the data captured by a smartphone’s accelerometer and gyroscope sensors while the users perform the gait activity and optimizes the design of a recurrent neural network (RNN) to learn the features that best characterize each individual. The database is composed of 15 users, and the acceleration data are provided in a tri-axial format (X, Y and Z axes). Data are pre-processed to estimate the vertical acceleration (in the direction of the gravity force). A deep recurrent neural network model consisting of LSTM cells arranged in several layers and dense output layers is used for user recognition. The precision results obtained by the final architecture are above 97% in most executions. The proposed deep neural network-based architecture is tested in different scenarios to check its efficiency and robustness. Full article
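The pre-processing step mentioned in the abstract, estimating the vertical (gravity-aligned) acceleration, could look roughly like the following; approximating the gravity direction by the windowed mean acceleration is an assumption, not necessarily the authors’ exact estimator.

```python
# Sketch: project raw tri-axial acceleration onto an estimated gravity axis.
import numpy as np

def vertical_acceleration(acc_xyz):
    """acc_xyz: (n_samples, 3) raw accelerometer readings over one window."""
    gravity = acc_xyz.mean(axis=0)          # approximate gravity vector (assumption)
    gravity /= np.linalg.norm(gravity)      # unit direction of gravity
    return acc_xyz @ gravity                # per-sample vertical component
```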

28 pages, 3876 KiB  
Article
Smartphone Motion Sensor-Based Complex Human Activity Identification Using Deep Stacked Autoencoder Algorithm for Enhanced Smart Healthcare System
by Uzoma Rita Alo, Henry Friday Nweke, Ying Wah Teh and Ghulam Murtaza
Sensors 2020, 20(21), 6300; https://doi.org/10.3390/s20216300 - 5 Nov 2020
Cited by 23 | Viewed by 4139
Abstract
Human motion analysis using a smartphone-embedded accelerometer sensor provides important context for the identification of static, dynamic, and complex sequences of activities. Research in smartphone-based motion analysis has been applied to tasks such as health status monitoring, fall detection and prevention, energy expenditure estimation, and emotion detection. However, current methods assume that the device is tightly attached at a pre-determined position and orientation, and changes in orientation can degrade the quality of the accelerometer data. It is therefore challenging to accurately and automatically identify activity details because of the complexity and orientation inconsistencies of the smartphone. Furthermore, current activity identification methods utilize conventional machine learning algorithms that are application dependent, and it is difficult to model the hierarchical and temporal dynamic nature of complex activity identification. This paper proposes a deep stacked autoencoder algorithm and orientation-invariant features for complex human activity identification. The proposed approach consists of several stages. First, we compute the magnitude norm vector and rotation features (pitch and roll angles) to augment the three axes (3-D) of the accelerometer sensor. Second, we propose a deep stacked autoencoder algorithm to automatically extract compact feature representations from the motion sensor data. The results show that the proposed integration of the deep learning algorithm and orientation-invariant features can accurately recognize complex activity details using only smartphone accelerometer data. The proposed deep stacked autoencoder method achieved 97.13% identification accuracy, outperforming conventional machine learning methods and the deep belief network algorithm. The results suggest that the proposed method can improve smartphone-based complex human activity identification frameworks. Full article
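A minimal sketch of the orientation-invariant augmentation described above (magnitude norm plus pitch and roll angles) is given below; the exact angle conventions are assumptions for illustration.

```python
# Sketch: augment the tri-axial accelerometer signal with magnitude, pitch, roll.
import numpy as np

def augment_orientation_features(acc_xyz):
    """acc_xyz: (n_samples, 3) array of x, y, z accelerations."""
    x, y, z = acc_xyz[:, 0], acc_xyz[:, 1], acc_xyz[:, 2]
    magnitude = np.sqrt(x**2 + y**2 + z**2)
    pitch = np.arctan2(-x, np.sqrt(y**2 + z**2))   # rotation about the lateral axis
    roll = np.arctan2(y, z)                        # rotation about the forward axis
    return np.column_stack([acc_xyz, magnitude, pitch, roll])  # 3 -> 6 channels
```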

20 pages, 5688 KiB  
Article
A Deep Machine Learning Method for Concurrent and Interleaved Human Activity Recognition
by Keshav Thapa, Zubaer Md. Abdullah Al, Barsha Lamichhane and Sung-Hyun Yang
Sensors 2020, 20(20), 5770; https://doi.org/10.3390/s20205770 - 12 Oct 2020
Cited by 29 | Viewed by 4303
Abstract
Human activity recognition has become an important research topic within the fields of pervasive computing, ambient assisted living (AAL), robotics, health-care monitoring, and many more. Techniques for recognizing simple, single activities are now well established, but recognizing complex activities such as concurrent and interleaved activities remains a major challenge. In this paper, we propose a two-phase hybrid deep machine learning approach using bi-directional Long Short-Term Memory (BiLSTM) and Skip-Chain Conditional Random Fields (SCCRF) to recognize complex activities. BiLSTM is a sequential deep learning model derived from the Recurrent Neural Network (RNN). SCCRF is a variant of the conditional random field (CRF) that can represent long-term dependencies. In the first phase of the proposed approach, the BiLSTM recognizes concurrent activities, and in the second phase, the SCCRF identifies interleaved activities. We analyze the accuracy of the proposed framework against state-of-the-art methods using publicly available datasets from a smart home environment. Our experimental results surpass previously proposed approaches, with an average accuracy of more than 93%. Full article
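The first phase of the described pipeline, concurrent activity recognition with a BiLSTM, might look roughly as follows; the skip-chain CRF phase is omitted, and the hidden size and label count are illustrative assumptions.

```python
# Sketch: BiLSTM emitting per-time-step multi-label scores for concurrent activities.
import torch
import torch.nn as nn

class ConcurrentActivityBiLSTM(nn.Module):
    def __init__(self, n_sensor_features, n_activities, hidden=64):
        super().__init__()
        self.bilstm = nn.LSTM(n_sensor_features, hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_activities)

    def forward(self, x):
        # x: (batch, time, n_sensor_features) -> (batch, time, n_activities)
        out, _ = self.bilstm(x)
        return torch.sigmoid(self.head(out))  # independent score per activity label
```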

15 pages, 1174 KiB  
Article
Multi-Modality Emotion Recognition Model with GAT-Based Multi-Head Inter-Modality Attention
by Changzeng Fu, Chaoran Liu, Carlos Toshinori Ishi and Hiroshi Ishiguro
Sensors 2020, 20(17), 4894; https://doi.org/10.3390/s20174894 - 29 Aug 2020
Cited by 10 | Viewed by 4235
Abstract
Emotion recognition has been gaining attention in recent years due to its applications in artificial agents. To achieve good performance on this task, much research has been conducted on multi-modality emotion recognition models that leverage the different strengths of each modality. However, a research question remains: what exactly is the most appropriate way to fuse the information from different modalities? In this paper, we propose audio sample augmentation and an emotion-oriented encoder-decoder to improve the performance of emotion recognition, and we discuss an inter-modality, decision-level fusion method based on a graph attention network (GAT). Compared to the baseline, our model improved the weighted average F1-score from 64.18% to 68.31% and the weighted average accuracy from 65.25% to 69.88%. Full article
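The inter-modality, decision-level fusion idea can be sketched as attention computed across modality embeddings; the single-head layer below is a generic graph-attention-style illustration, not the authors’ exact GAT configuration.

```python
# Sketch: each modality embedding is a node; attention weights decide how much
# each modality borrows from the others before a pooled classification.
import torch
import torch.nn as nn

class InterModalityAttentionFusion(nn.Module):
    def __init__(self, dim, n_emotions):
        super().__init__()
        self.attn = nn.Linear(2 * dim, 1)
        self.classifier = nn.Linear(dim, n_emotions)

    def forward(self, modality_embs):
        # modality_embs: (batch, n_modalities, dim), e.g. audio, text, motion
        b, m, d = modality_embs.shape
        pairs = torch.cat([modality_embs.unsqueeze(2).expand(b, m, m, d),
                           modality_embs.unsqueeze(1).expand(b, m, m, d)], dim=-1)
        weights = torch.softmax(self.attn(pairs).squeeze(-1), dim=-1)  # (b, m, m)
        fused = weights @ modality_embs                                # attend across modalities
        return self.classifier(fused.mean(dim=1))                      # pooled decision
```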

20 pages, 1980 KiB  
Article
Feature Selection on 2D and 3D Geometric Features to Improve Facial Expression Recognition
by Vianney Perez-Gomez, Homero V. Rios-Figueroa, Ericka Janet Rechy-Ramirez, Efrén Mezura-Montes and Antonio Marin-Hernandez
Sensors 2020, 20(17), 4847; https://doi.org/10.3390/s20174847 - 27 Aug 2020
Cited by 12 | Viewed by 4139
Abstract
An essential aspect in the interaction between people and computers is the recognition of facial expressions. A key issue in this process is to select relevant features to classify facial expressions accurately. This study examines the selection of optimal geometric features to classify six basic facial expressions: happiness, sadness, surprise, fear, anger, and disgust. Inspired by the Facial Action Coding System (FACS) and the Moving Picture Experts Group 4th standard (MPEG-4), an initial set of 89 features was proposed. These features are normalized distances and angles in 2D and 3D computed from 22 facial landmarks. To select a minimum set of features with the maximum classification accuracy, two selection methods and four classifiers were tested. The first selection method, principal component analysis (PCA), obtained 39 features. The second selection method, a genetic algorithm (GA), obtained 47 features. The experiments ran on the Bosphorus and UIVBFED datasets with 86.62% and 93.92% median accuracy, respectively. Our main finding is that the reduced feature set obtained by the GA is the smallest among methods of comparable accuracy, which has implications for reducing recognition time. Full article
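The kind of geometric features discussed here, normalised distances and angles between facial landmarks, can be computed as in the sketch below; the landmark indices and the normalisation reference are placeholders, not the paper’s feature definitions.

```python
# Sketch: geometric features from landmark coordinates (2D or 3D points).
import numpy as np

def normalised_distance(landmarks, i, j, ref_i, ref_j):
    """Distance between landmarks i and j, divided by a reference distance."""
    d = np.linalg.norm(landmarks[i] - landmarks[j])
    ref = np.linalg.norm(landmarks[ref_i] - landmarks[ref_j])
    return d / ref

def angle(landmarks, i, j, k):
    """Angle at landmark j formed by segments j->i and j->k (radians)."""
    u = landmarks[i] - landmarks[j]
    v = landmarks[k] - landmarks[j]
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))
```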

17 pages, 2556 KiB  
Article
Energy-Guided Temporal Segmentation Network for Multimodal Human Action Recognition
by Qiang Liu, Enqing Chen, Lei Gao, Chengwu Liang and Hao Liu
Sensors 2020, 20(17), 4673; https://doi.org/10.3390/s20174673 - 19 Aug 2020
Cited by 4 | Viewed by 2700
Abstract
To achieve satisfactory performance in human action recognition, a central task is to address the sub-action sharing problem, especially among similar action classes. Nevertheless, most existing convolutional neural network (CNN)-based action recognition algorithms uniformly divide the video into frames and then randomly select frames as inputs, ignoring the distinct characteristics of different frames. In recent years, depth videos have been increasingly used for action recognition, but most methods focus merely on the spatial information of the different actions without utilizing temporal information. To address these issues, a novel energy-guided temporal segmentation method is proposed here, and a multimodal fusion strategy is employed with the proposed segmentation method to construct an energy-guided temporal segmentation network (EGTSN). Specifically, the EGTSN has two parts: energy-guided video segmentation and a multimodal fusion heterogeneous CNN. The proposed solution was evaluated on the public large-scale NTU RGB+D dataset. Comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed network. Full article
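Energy-guided segmentation can be approximated by assigning each frame a motion energy and splitting the video so that segments carry roughly equal cumulative energy, as sketched below; frame differencing as the energy measure is an assumption, not the paper’s definition.

```python
# Sketch: split a video into segments of roughly equal cumulative motion energy
# instead of uniform-length slices.
import numpy as np

def energy_guided_segments(frames, n_segments=3):
    """frames: (n_frames, H, W) array; returns the frame indices of split points."""
    energy = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))
    energy = np.concatenate([[energy[0]], energy])      # pad so lengths match
    cumulative = np.cumsum(energy) / energy.sum()
    targets = np.linspace(0, 1, n_segments + 1)[1:-1]   # interior energy quantiles
    return np.searchsorted(cumulative, targets)
```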

21 pages, 536 KiB  
Article
An Innovative Multi-Model Neural Network Approach for Feature Selection in Emotion Recognition Using Deep Feature Clustering
by Muhammad Adeel Asghar, Muhammad Jamil Khan, Muhammad Rizwan, Raja Majid Mehmood and Sun-Hee Kim
Sensors 2020, 20(13), 3765; https://doi.org/10.3390/s20133765 - 5 Jul 2020
Cited by 32 | Viewed by 4419
Abstract
Emotional awareness perception is a rapidly growing field that allows for more natural interactions between people and machines. Electroencephalography (EEG) has emerged as a convenient way to measure and track a user’s emotional state. The non-linear characteristics of the EEG signal produce a high-dimensional feature vector, resulting in high computational cost. In this paper, characteristics of multiple neural networks are combined using Deep Feature Clustering (DFC) to select high-quality attributes, as opposed to traditional feature selection methods. The DFC method shortens the training time of the network by omitting unusable attributes. First, Empirical Mode Decomposition (EMD) is applied to decompose the raw EEG signal into a series of frequency components. The spatiotemporal component of the decomposed EEG signal is expressed as a two-dimensional spectrogram before feature extraction using the Analytic Wavelet Transform (AWT). Four pre-trained Deep Neural Networks (DNNs) are used to extract deep features. Dimensionality reduction and feature selection are achieved using differential entropy-based EEG channel selection and the DFC technique, which builds a vocabulary using k-means clustering. A histogram feature is then computed from the visual vocabulary items. The classification performance on the SEED, DEAP and MAHNOB datasets, combined with the capabilities of DFC, shows that the proposed method improves emotion recognition performance with short processing time and is competitive with the latest emotion recognition methods. Full article
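A condensed sketch of the clustering-and-histogram encoding at the heart of DFC follows; the vocabulary size and the use of scikit-learn’s k-means are illustrative assumptions rather than the authors’ implementation.

```python
# Sketch: cluster deep features into a vocabulary, then encode each trial as a
# normalised histogram over its nearest vocabulary items.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(deep_features, n_words=64, seed=0):
    """deep_features: (n_samples, dim) stacked features from the pre-trained DNNs."""
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(deep_features)

def histogram_encoding(trial_features, vocabulary):
    """Encode one trial's feature vectors as a normalised word histogram."""
    words = vocabulary.predict(trial_features)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()
```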

24 pages, 4271 KiB  
Article
Keys for Action: An Efficient Keyframe-Based Approach for 3D Action Recognition Using a Deep Neural Network
by Hashim Yasin, Mazhar Hussain and Andreas Weber
Sensors 2020, 20(8), 2226; https://doi.org/10.3390/s20082226 - 15 Apr 2020
Cited by 33 | Viewed by 4746
Abstract
In this paper, we propose a novel and efficient framework for 3D action recognition using a deep learning architecture. First, we develop a 3D normalized pose space that consists of only 3D normalized poses, which are generated by discarding translation and orientation information. From these poses, we extract joint features and employ them in a Deep Neural Network (DNN) to learn the action model. The architecture of our DNN consists of two hidden layers with the sigmoid activation function and an output layer with the softmax function. Furthermore, we propose a keyframe extraction methodology through which, from a motion sequence of 3D frames, we efficiently extract the keyframes that contribute substantially to the performance of the action. In this way, we eliminate redundant frames and reduce the length of the motion. More precisely, we ultimately summarize the motion sequence while preserving the original motion semantics. We only consider the remaining essential informative frames in the process of action recognition, and the proposed pipeline is sufficiently fast and robust as a result. Finally, we evaluate our proposed framework extensively on publicly available benchmark Motion Capture (MoCap) datasets, namely HDM05 and CMU. Our experiments show that the proposed scheme significantly outperforms other state-of-the-art approaches. Full article
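The classifier shape described (two sigmoid hidden layers and a softmax output) is easy to sketch; the layer widths below are assumptions, and the keyframe extraction step is omitted.

```python
# Sketch of the stated DNN shape: two sigmoid hidden layers, softmax output.
import torch
import torch.nn as nn

class ActionDNN(nn.Module):
    def __init__(self, n_joint_features, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_joint_features, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        # x: (batch, n_joint_features) features from normalized 3D poses
        return torch.softmax(self.net(x), dim=-1)  # class probabilities
```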

23 pages, 4894 KiB  
Article
Deep Joint Spatiotemporal Network (DJSTN) for Efficient Facial Expression Recognition
by Dami Jeong, Byung-Gyu Kim and Suh-Yeon Dong
Sensors 2020, 20(7), 1936; https://doi.org/10.3390/s20071936 - 30 Mar 2020
Cited by 80 | Viewed by 6070
Abstract
Understanding a person’s feelings is a very important process for affective computing. People express their emotions in various ways; among them, facial expression is the most effective way to convey human emotional status. We propose efficient deep joint spatiotemporal features for facial expression recognition based on deep appearance and geometric neural networks. We apply three-dimensional (3D) convolution to extract spatial and temporal features at the same time. For the geometric network, 23 dominant facial landmarks are selected to express the movement of the facial muscles through an analysis of the energy distribution of the whole set of facial landmarks. We combine these features with the designed joint fusion classifier so that they complement each other. From the experimental results, we verify recognition accuracies of 99.21%, 87.88%, and 91.83% for the CK+, MMI, and FERA datasets, respectively. Through comparative analysis, we show that the proposed scheme is able to improve the recognition accuracy by at least 4%. Full article
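The appearance branch based on 3D convolution can be sketched as below; channel counts and kernel sizes are assumptions, and the geometric landmark branch and the joint fusion classifier are omitted.

```python
# Sketch: 3D convolution over a short sequence of face frames, producing a
# spatiotemporal feature vector per clip.
import torch.nn as nn

appearance_branch = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=(3, 3, 3), padding=1),   # input: (batch, 3, time, H, W)
    nn.ReLU(),
    nn.MaxPool3d((1, 2, 2)),
    nn.Conv3d(32, 64, kernel_size=(3, 3, 3), padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),                               # global spatiotemporal pooling
    nn.Flatten(),                                          # -> (batch, 64) feature vector
)
```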

17 pages, 54474 KiB  
Article
eXnet: An Efficient Approach for Emotion Recognition in the Wild
by Muhammad Naveed Riaz, Yao Shen, Muhammad Sohail and Minyi Guo
Sensors 2020, 20(4), 1087; https://doi.org/10.3390/s20041087 - 17 Feb 2020
Cited by 55 | Viewed by 6200
Abstract
Facial expression recognition has been well studied for its great importance in the areas of human–computer interaction and the social sciences. With the evolution of deep learning, there have been significant advances in this area that also surpass human-level accuracy. Although these methods have achieved good accuracy, they still suffer from two constraints (high computational power and memory requirements), which are critical for small hardware-constrained devices. To alleviate this issue, we propose a new Convolutional Neural Network (CNN) architecture, eXnet (Expression Net), based on parallel feature extraction, which surpasses current methods in accuracy and contains a much smaller number of parameters (eXnet: 4.57 million, VGG19: 14.72 million), making it more efficient and lightweight for real-time systems. Several modern data augmentation techniques are applied to improve the generalization of eXnet; these techniques improve the accuracy of the network by overcoming overfitting while keeping the network the same size. We provide an extensive evaluation of our network against key methods on the Facial Expression Recognition 2013 (FER-2013), Extended Cohn-Kanade (CK+), and Real-world Affective Faces Database (RAF-DB) benchmark datasets. We also perform an ablation evaluation to show the importance of the different components of our architecture. To evaluate the efficiency of eXnet on embedded systems, we deploy it on a Raspberry Pi 4B. All these evaluations show the superiority of eXnet for emotion recognition in the wild in terms of accuracy, number of parameters, and size on disk. Full article
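The parallel feature extraction pattern that eXnet relies on, several convolution branches over the same input concatenated together, is illustrated generically below; this is not the published eXnet architecture, and the branch widths are assumptions.

```python
# Generic sketch of a parallel feature extraction block with mixed kernel sizes.
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    def __init__(self, in_ch, branch_ch=16):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)

    def forward(self, x):
        # Each branch sees the same feature map; outputs are concatenated.
        return torch.relu(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))
```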

Review


22 pages, 1497 KiB  
Review
Taxonomy of Anomaly Detection Techniques in Crowd Scenes
by Amnah Aldayri and Waleed Albattah
Sensors 2022, 22(16), 6080; https://doi.org/10.3390/s22166080 - 14 Aug 2022
Cited by 18 | Viewed by 4926
Abstract
With the widespread use of closed-circuit television (CCTV) surveillance systems in public areas, crowd anomaly detection has become an increasingly critical aspect of intelligent video surveillance. Deciding on captured events requires manpower and continuous attention, which is hard for individuals to sustain. The available literature on human action detection includes various approaches to detect abnormal crowd behavior, which is formulated as an outlier detection problem. This paper presents a detailed review of recent developments in anomaly detection methods from a computer vision perspective on the different available datasets. A new taxonomic organization of existing work in crowd analysis and anomaly detection is introduced, and existing reviews and datasets related to anomaly detection are summarized. The paper covers an overview of different crowd concepts, including mass gathering event analysis and challenges, types of anomalies, and surveillance systems. Additionally, research trends and future work prospects are analyzed. Full article

43 pages, 2146 KiB  
Review
Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances
by Shibo Zhang, Yaxuan Li, Shen Zhang, Farzad Shahabi, Stephen Xia, Yu Deng and Nabil Alshurafa
Sensors 2022, 22(4), 1476; https://doi.org/10.3390/s22041476 - 14 Feb 2022
Cited by 226 | Viewed by 32332
Abstract
Mobile and wearable devices have enabled numerous applications, including activity tracking, wellness monitoring, and human–computer interaction, that measure and improve our daily lives. Many of these applications are made possible by leveraging the rich collection of low-power sensors found in many mobile and wearable devices to perform human activity recognition (HAR). Recently, deep learning has greatly pushed the boundaries of HAR on mobile and wearable devices. This paper systematically categorizes and summarizes existing work that introduces deep learning methods for wearables-based HAR and provides a comprehensive analysis of the current advancements, developing trends, and major challenges. We also present cutting-edge frontiers and future directions for deep learning-based HAR. Full article

Other


13 pages, 1544 KiB  
Letter
Accelerometer-Based Human Activity Recognition for Patient Monitoring Using a Deep Neural Network
by Esther Fridriksdottir and Alberto G. Bonomi
Sensors 2020, 20(22), 6424; https://doi.org/10.3390/s20226424 - 10 Nov 2020
Cited by 51 | Viewed by 5461
Abstract
The objective of this study was to investigate the accuracy of a Deep Neural Network (DNN) in recognizing activities typical for hospitalized patients. A data collection study was conducted with 20 healthy volunteers (10 males and 10 females, age = 43 ± 13 years) in a simulated hospital environment. A single tri-axial accelerometer mounted on the trunk was used to measure body movement and recognize six activity types: lying in bed, upright posture, walking, wheelchair transport, stair ascent and stair descent. A DNN consisting of a three-layer convolutional neural network followed by a long short-term memory layer was developed for this classification problem. Additionally, features were extracted from the accelerometer data to train a support vector machine (SVM) classifier for comparison. The DNN reached 94.52% overall accuracy on the holdout dataset, compared to 83.35% for the SVM classifier. In conclusion, a DNN is capable of recognizing types of physical activity in simulated hospital conditions using data captured by a single tri-axial accelerometer. The method described may be used for continuous monitoring of patient activities during hospitalization to provide additional insights into the recovery process. Full article
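The described architecture shape, three convolution layers followed by an LSTM over accelerometer windows, might be sketched as follows; filter counts and the hidden size are illustrative assumptions rather than the study’s values.

```python
# Sketch: 1-D CNN feature extractor over tri-axial accelerometer windows,
# followed by an LSTM and a classifier over the six activity types.
import torch
import torch.nn as nn

class AccelerometerCNNLSTM(nn.Module):
    def __init__(self, n_classes=6, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, time, 3) accelerometer window
        feats = self.conv(x.transpose(1, 2)).transpose(1, 2)  # back to (batch, time, 64)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])                               # logits per activity class
```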
