2. An Overview of Published Articles
Ojiako and Farrahi (contribution 1) experimented with an innovative predictive model for human activity recognition (HAR). They demonstrated that the MLP mixer architecture, which achieves competitive performance in vision-based tasks at a lower computational cost than other deep learning techniques, can also be applied to sensor-based HAR. The MLP mixer, recently created by Google Brain [12], does not use convolutions or self-attention mechanisms and instead consists entirely of MLPs. The authors compared the performance of the MLP mixer with the existing state-of-the-art literature:
* Ensemble LSTM;
* CNN-BiGRU;
* AttenSense;
* Multi-agent attention;
* DeepConvLSTM;
* Triple attention;
* Self-Attention CNN;
* b-LSTM-S.
The MLP mixer performed 10.1% better on the Daphnet Gait dataset, 1% better on the PAMAP2 dataset, and 0.5% better on the Opportunity dataset.
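As an illustration of the architecture discussed above, the following is a minimal sketch of an MLP-Mixer-style block adapted to windowed inertial data in Keras. It is not the authors' implementation; the window size, embedding dimension, layer widths, and number of blocks are assumptions chosen for readability.

```python
# Minimal sketch of an MLP-Mixer block adapted to windowed inertial data.
# All sizes are illustrative assumptions, not the settings of contribution 1.
import tensorflow as tf
from tensorflow.keras import layers

def mixer_block(x, tokens_mlp_dim, channels_mlp_dim):
    # Token mixing: MLP applied across the time (token) axis.
    y = layers.LayerNormalization()(x)
    y = layers.Permute((2, 1))(y)                      # (channels, timesteps)
    y = layers.Dense(tokens_mlp_dim, activation="gelu")(y)
    y = layers.Dense(x.shape[1])(y)                    # back to timesteps
    y = layers.Permute((2, 1))(y)                      # (timesteps, channels)
    x = layers.Add()([x, y])
    # Channel mixing: MLP applied across the sensor-channel axis.
    y = layers.LayerNormalization()(x)
    y = layers.Dense(channels_mlp_dim, activation="gelu")(y)
    y = layers.Dense(x.shape[-1])(y)
    return layers.Add()([x, y])

def build_mixer_har(timesteps=128, channels=9, num_classes=12, blocks=4):
    inputs = layers.Input(shape=(timesteps, channels))
    x = layers.Dense(64)(inputs)                       # per-timestep embedding
    for _ in range(blocks):
        x = mixer_block(x, tokens_mlp_dim=128, channels_mlp_dim=128)
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_mixer_har()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```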
Velasco et al. (contribution 2) used the HAR approach to understand human behavior by analyzing data representative of domestic routines. Their study is oriented towards establishing a connection between the activities of daily living, the spaces in which they take place, and the times at which they are performed in a given place. Research has shown that this information is helpful for healthcare professionals to assess the health status of patients, for family members to keep track of the habits of relatives, and for home designers to assess the architectural characteristics of home interiors for accessibility and movement of residents. The authors used the knowledge discovery in databases (KDD) approach, in the variant with the data analyst as a key player in the knowledge discovery process [13]. The KDD approach is an interactive and iterative knowledge discovery process that identifies relationships between data that must be valid, new, potentially useful, and understandable. The analyst gains a greater understanding of the domestic routine with each iteration of the process. The parameters used for the evaluation are the sequence of places visited, the times of day at which they are visited, and the average duration of the visits; the signals are acquired using PIR sensors connected to a Raspberry Pi 4 placed inside each room of the house. Transitions between positions are detected by measuring the received signal strength indicator (RSSI) of the Bluetooth signal emitted by a BLE device worn by the monitored subject. The method was evaluated through workshops with seventeen multidisciplinary participants: architects, engineers, health professionals, and caregivers. The feedback obtained was positive, confirming the validity of the adopted method as a source of significant information on the status of the monitored subjects.
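A minimal sketch of how room visits and durations could be derived from per-room BLE RSSI readings follows. It is illustrative only: the scan structure, room names, and 1 Hz sampling rate are assumptions, not details taken from the paper.

```python
# Illustrative sketch (not the authors' code): inferring the occupied room
# from the strongest BLE RSSI reading and accumulating visit durations.
from collections import defaultdict

def strongest_room(rssi_by_room):
    """Return the room whose receiver reports the highest RSSI (dBm)."""
    return max(rssi_by_room, key=rssi_by_room.get)

def visit_durations(scans, sample_period_s=1.0):
    """Accumulate time spent per room from a sequence of RSSI scans."""
    durations = defaultdict(float)
    for rssi_by_room in scans:
        durations[strongest_room(rssi_by_room)] += sample_period_s
    return dict(durations)

scans = [
    {"kitchen": -52, "bedroom": -78, "bathroom": -85},
    {"kitchen": -55, "bedroom": -74, "bathroom": -88},
    {"kitchen": -80, "bedroom": -49, "bathroom": -83},
]
print(visit_durations(scans))   # e.g. {'kitchen': 2.0, 'bedroom': 1.0}
```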
In the third manuscript, Huang et al. (contribution 3) proposed a new multiscale hierarchical adaptive network structure for HAR called HMA Conv-LSTM. In this model, there are:
a multi-scale hierarchical convolution module (HMC) that performs finer-grained feature extraction on the spatial information of the feature vectors;
an adaptive channel feature fusion module that can fuse features at different scales, improving model efficiency and removing redundant information;
a dynamic channel-selection LSTM module based on the attention mechanism to extract temporal context information.
The multi-scale convolution module uses convolutional kernels of different scales to extract and splice multi-scale features in both the sensor and temporal dimensions. This strengthens the network’s ability to recognize features at different scales, improves its adaptability, and enhances its ability to characterize features.
The diversity and duration of the actions detected by sensors placed on different body positions dictate longer sliding window sizes for segmentation. This sizing can result in some fine-grained subtle action processes being overlooked, thus affecting action recognition. In contrast, the proposed hierarchical architecture can split the action window and extract features from the sensor sequence data with finer granularity to recognize the finer action processes effectively. To validate the efficacy of the proposed model, the authors carried out experiments on several public HAR datasets: Opportunity, PAMAP2, USC-HAD, and Skoda. Their model was built using Google’s open-source TensorFlow 2.9.0 deep learning framework. The proposed model achieves competitive performance compared to several state-of-the-art approaches. The evaluation results also show that the proposed HMA Conv-LSTM can effectively obtain the temporal context and spatial information from sensor sequence data.
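The following sketch illustrates the general multi-scale convolution plus LSTM idea in TensorFlow/Keras (the framework mentioned above). It omits the hierarchical windowing, channel-fusion, and attention modules of HMA Conv-LSTM, and all filter counts, kernel sizes, and dataset dimensions are assumptions.

```python
# Rough sketch of the multi-scale convolution + LSTM idea: parallel Conv1D
# branches with different kernel sizes are concatenated and fed to an LSTM.
# Sizes are assumptions; the hierarchical/attention modules are omitted.
import tensorflow as tf
from tensorflow.keras import layers

def build_multiscale_conv_lstm(timesteps=128, channels=113, num_classes=17):
    inputs = layers.Input(shape=(timesteps, channels))
    # Multi-scale feature extraction with different kernel sizes.
    branches = [
        layers.Conv1D(64, kernel_size=k, padding="same", activation="relu")(inputs)
        for k in (3, 5, 7)
    ]
    x = layers.Concatenate()(branches)          # splice multi-scale features
    x = layers.LSTM(128)(x)                     # temporal context modelling
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_multiscale_conv_lstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```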
Again, Mekruksavanich et al. (contribution 4) used an innovative approach based on a DL network and the nature of the data. Exploiting the potential offered by WiFi-based detection techniques, they used channel state information (CSI) [14] rather than the received signal strength indicator (RSSI). The authors proposed a hybrid deep learning network called CNN-GRU-AttNet that leverages the strengths of CNNs and GRUs to automatically extract informative spatio-temporal features from raw CSI data and to classify tasks efficiently. They also integrated an attention mechanism into the network that prioritizes important features and time steps, thereby improving recognition performance. The network consists of an input layer, two CNN layers, a GRU layer, an attention layer, a fully connected layer, and an output layer. To assess the effectiveness of the proposed model, the authors used two publicly accessible datasets, CSI-HAR and StanWiFi, which cover seven activities: walking, running, sitting, lying down, standing up, bending, and falling. Because these datasets did not have predefined training and test sets, they adopted five-fold cross-validation to evaluate the model’s performance. They also performed a comparative evaluation of the performance of five core deep learning models: CNN, LSTM, BiLSTM, GRU, and BiGRU.
The results show exceptional efficacy in the classification of HAR activities, superior to the five baseline DL models, with an average accuracy of 99.62%, a precision of 99.61%, and an F1-score of 99.61% across all movements.
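A hedged sketch of a CNN-GRU network with a simple additive attention pooling step, in the spirit of the architecture described above, is shown below. The exact layer sizes, attention formulation, and CSI window shape are assumptions and do not reproduce CNN-GRU-AttNet.

```python
# Sketch of a CNN-GRU network with attention-weighted pooling over time steps.
# Layer sizes and the CSI window shape are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn_gru_att(timesteps=500, subcarriers=90, num_classes=7):
    inputs = layers.Input(shape=(timesteps, subcarriers))
    x = layers.Conv1D(64, 5, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(128, 5, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.GRU(128, return_sequences=True)(x)       # temporal features
    # Attention: score each time step, then pool the weighted sequence.
    scores = layers.Dense(1, activation="tanh")(x)
    weights = layers.Softmax(axis=1)(scores)
    weighted = layers.Multiply()([x, weights])           # broadcast over channels
    context = layers.GlobalAveragePooling1D()(weighted)  # attention-weighted pooling
    x = layers.Dense(64, activation="relu")(context)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_cnn_gru_att()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```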
Kim and Lee (contribution 5), aware that some physical activities may share similar features that lead the automatic classification phase to incorrect evaluations, proposed a new approach to improve recognition accuracy. Their method uses a smartphone’s three-axis acceleration and gyroscope data to define activity patterns visually. In particular, the method expands the sensor data into 2D and 3D images. This generates new characteristics of human activities that cannot be detected in one-dimensional data. These new features allow, on the one hand, the recognition of more diverse types of human physical activity and, on the other hand, the identification of unique characteristics among similar types of activity. The raw values from the accelerometer and gyroscope, which correspond to the amplitude of the continuous data of the activities performed, are used to build 2D image models. Each time-series value is transformed into a luminosity value, obtaining the Brightness Intensity Distribution Model (BIDP) for each physical activity record. Each point is expressed as a distinct brightness value based on the measured value. This type of representation includes areas of intense and low brightness, depending on the location of the data waveform, that can degrade the model’s performance. To overcome this problem, the authors carried out a processing step to generate a standardized visual image.
The image data were used in the training phase along with the raw 1D data to increase the precision and accuracy of the HAR. The sensor data from the triaxial accelerometer and gyroscope used in this study came from the “WISDM Activity and Biometrics for Smartphones and Smartwatches” dataset published by Weiss [14]. The neural network used was a multidimensional convolutional neural network. The model achieved a performance of 90% or higher for all 18 classes of physical activity examined, exceeding the corresponding performance reported in previous studies.
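The following sketch illustrates the general idea of mapping a 1D sensor window onto a 2D brightness image. The window length, image size, and min-max normalization are assumptions for illustration, not the authors' BIDP procedure.

```python
# Illustrative sketch: map a 1-D accelerometer window to a 2-D uint8 image
# of brightness values. Sizes and normalisation are assumptions.
import numpy as np

def window_to_brightness_image(window, height=32):
    """Map a 1-D signal window to a 2-D image of 0-255 brightness values."""
    w = np.asarray(window, dtype=np.float32)
    # Scale the samples to the 0-255 brightness range.
    scaled = (w - w.min()) / (w.max() - w.min() + 1e-8) * 255.0
    # Pad so the window reshapes cleanly into `height` rows.
    width = int(np.ceil(scaled.size / height))
    padded = np.zeros(height * width, dtype=np.float32)
    padded[:scaled.size] = scaled
    return padded.reshape(height, width).astype(np.uint8)

acc_x = np.sin(np.linspace(0, 20 * np.pi, 1024))        # synthetic signal
image = window_to_brightness_image(acc_x)
print(image.shape, image.dtype)                          # (32, 32) uint8
```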
Caramaschi et al. (contribution 6) experimented with a model for human activity recognition that is independent of the orientation of the worn device and classifies five predefined activities within a range of actions that could occur in a clinical setting. Their proposal stems from the study of how changes in sensor orientation affect deep learning (DL) human activity recognition (HAR), targeting activities such as slow and assisted walking and wheelchair use. The HAR model is orientation-agnostic, uses data augmentation, and is trained with acceleration measurements recorded from five sensor positions on the participant’s trunk. The wearable sensor data augmentation approach, first used by Ohashi et al. [15], positively affects time-series computing and potentially improves data-driven tasks such as HAR. They used two datasets. The first is the Wearing Position Study (WPS), acquired at Philips Research Laboratories (2022). It contains three-axis acceleration measurements from nineteen healthy volunteers, comprising ten males and nine females. The second is the Simulated Hospital Study (SHS), acquired at Philips Research Laboratories (2019). It includes ten healthy male and ten female volunteers. Five GENEActiv (GA) sensors were used for monitoring: two in contact with the skin, two dangling from the neck, and one in the pocket of the clinical gown. The implemented HAR model is a modified version of the DNN proposed by Fridriksdottir et al. [16]. The main difference is the replacement of the long short-term memory layer with a convolutional layer. This change in architecture was introduced to simplify the model and did not generate significantly different results from the previous DNN. The performance achieved with two candidate rotation sets was evaluated to choose the range of augmented rotations to be applied to the training data. The first set consisted of seven rotations between 0 and 90 degrees, while the second set consisted of seven rotations between 0 and 180 degrees. In light of this preliminary analysis, the final augmentation settings for the augmented model’s training set consisted of ten rotations from 0 to 180 degrees, in 20-degree steps, applied separately about the frontal, longitudinal, and sagittal axes. Five-fold cross-validation was used to train both the baseline and the augmented model. The cross-validation performance was used to evaluate the augmentation approach (i.e., the range of rotations) and the effect of rotation on the baseline model. The control data results confirmed the good performance of the augmented model obtained during cross-validation. Testing showed that as the data increased, the model could learn additional configurations not provided by the initial dataset.
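A minimal sketch of rotation-based augmentation for tri-axial acceleration windows is given below. The 0 to 180 degree range in 20-degree steps follows the description above; the axis convention and everything else are illustrative assumptions rather than the authors' exact settings.

```python
# Sketch of rotation augmentation for tri-axial acceleration windows.
# The axis convention and rotation schedule are illustrative assumptions.
import numpy as np

def rotation_matrix(axis, angle_deg):
    """Rotation matrix about the 'x', 'y', or 'z' axis by angle_deg degrees."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    if axis == "x":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def augment_with_rotations(window, axis="x", angles_deg=range(0, 181, 20)):
    """Return rotated copies of a (timesteps, 3) acceleration window."""
    return [window @ rotation_matrix(axis, a).T for a in angles_deg]

window = np.random.randn(128, 3)              # one tri-axial acceleration window
augmented = augment_with_rotations(window)    # rotations from 0 to 180 deg
print(len(augmented), augmented[0].shape)     # 10 (128, 3)
```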
Adherence to cardiac rehabilitation does not currently produce the expected results, negatively affecting the health status of patients and the use of available resources. To improve this trend, Filos et al. (contribution 7) set up a study based on machine learning techniques to predict the adherence of patients with cardiovascular disorders to a six-month home cardiac telerehabilitation program. Their approach is based on the use of clinical information available before the start of the program and behavioral and cardiovascular fitness characteristics acquired during the preliminary phase of familiarization with the program. As a first step, the methodology classifies patients into different clusters. Hierarchical clustering, an algorithm that groups objects with similar characteristics in a tree hierarchy, was used for this classification. The baseline data led to the formation of three groups of patients: an active, low-risk group; a sedentary, high-risk group; and a group of patients at high cardiovascular risk but who are fit and motivated. The familiarization exercises revealed three adherence behaviors (high adherence, low adherence, and transient adherence), while the exercise sessions after the familiarization phase resulted in adherent and non-adherent clusters. Two model types, namely decision trees (DT) and random forests (RF), were used to predict long-term adherence. The data used to develop the DT model were the patient clusters created from the baseline characteristics and the clusters related to adherence to the exercise program. Since the DT model is unstable, a slight variation in the training dataset can lead to changes in the tree; a random forest (RF) technique, which is more stable, was therefore also applied. The first model showed both high precision and high recall, at 80.2 ± 19.5% and 94.4 ± 14.5%, respectively, which were better than the performance of the second model, which displayed a precision of 71.8 ± 25.8% and a recall of 87.7 ± 24%. Network analysis was applied to discover correlations between patient characteristics and adherence. This study highlighted how important the combination of baseline clinical data with the characteristics acquired during a brief familiarization phase is for the high-accuracy prediction of adherence to a long-term cardiac rehabilitation program. The proposed methodology can be generalized to facilitate the identification of patients who are more likely to adhere to telerehabilitation programs.
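The two-stage idea described above (hierarchical clustering of baseline features followed by a random-forest adherence classifier) can be sketched with scikit-learn as below. The synthetic data, feature count, and cross-validation setup are placeholders, not the authors' pipeline.

```python
# Minimal two-stage sketch: agglomerative clustering of baseline features,
# then a random forest predicting adherence. Data are synthetic placeholders.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
baseline = rng.normal(size=(60, 5))          # e.g. age, BMI, fitness scores, ...
adherent = rng.integers(0, 2, size=60)       # 1 = adherent, 0 = non-adherent

# Stage 1: group patients with similar baseline profiles.
clusters = AgglomerativeClustering(n_clusters=3).fit_predict(baseline)

# Stage 2: predict adherence from baseline features plus cluster membership.
X = np.column_stack([baseline, clusters])
rf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(rf, X, adherent, cv=5)
print("mean CV accuracy:", scores.mean())
```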
Obesity increases the risk of many chronic diseases, especially cardiovascular disease, and is a cause of death. Faced with the rapid increase in obesity in the population, Vidal et al. (contribution 8) developed a cross-sectional analytical study of residents of the United States of America (USA) with an Instagram account to determine whether using a meal tracking platform to record food consumption correlated with an improvement in body mass index (BMI). The survey was conducted on a sample of current students or graduates of Mary Hardin Baylor University, Oakland University, the University of Kentucky, and Queens University in Charlotte. Eight hundred and ninety-six subjects with an Instagram account signed up to participate in an anonymous online survey, of whom 78.7% were women, 20.6% were men, and 0.7% were classified as other. As for generations, 11.5% belonged to Generation Z, 75.6% to the Millennials, 11.4% to Generation X, and 1.6% to the Baby Boomers. Overall, 93.5% of the sample did not smoke, 2.3% smoked, and 4.1% smoked occasionally. Concerning academic qualifications, 3.7% were high school graduates, 6.1% had some university credits, 0.6% had technical training, 3.2% had an associate degree, 43.2% had a bachelor’s degree, 15.1% had a master’s degree, and 28.1% had a doctorate. The information acquired through the questionnaire included the number of hours per week dedicated to Instagram or physical activity and the intensity of the physical activity performed. In order to test the influence of using a meal tracking platform to record food intake on BMI, participants were asked whether they had used any such digital platform in the past month. The chi-square test was used to study the relationships between the use of a digital platform in the last month and gender, generation, smoking habits, highest academic degree earned, and time spent on Instagram. The Mann–Whitney U test was adopted to compare BMI, weekly hours spent on Instagram looking at nutrition- or physical-activity-related content, vigorous physical activity, moderate physical activity, time spent walking, and time spent sitting between participants who did and did not track their meals. The survey showed that a meal tracking platform was used by 34.2% of the sample. Participants who used a meal tracking platform also had a higher BMI, invested more hours per week on Instagram looking at nutrition- or physical-activity-related content, and performed more minutes per week of vigorous physical activity. The survey suggested that participants rely on new technologies to reach an optimal weight without obtaining practical results. The authors believe that combining digital app-based tools with support from healthcare professionals can help individuals to effectively achieve a healthy weight.
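A short sketch of the two statistical tests mentioned above, using SciPy, is given below. The contingency table and BMI samples are fabricated placeholders for illustration only and do not reflect the survey data.

```python
# Sketch of a chi-square test of independence and a Mann-Whitney U test.
# All numbers are illustrative placeholders, not the study's data.
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

# Contingency table: rows = platform use (no/yes), columns = gender categories.
table = np.array([[400, 150, 4],
                  [305,  35, 2]])
chi2, p_chi2, dof, _ = chi2_contingency(table)
print(f"chi-square p-value: {p_chi2:.3f}")

# BMI samples for non-users vs users of a meal tracking platform (synthetic).
bmi_non_users = np.random.default_rng(1).normal(24.5, 3.5, size=200)
bmi_users = np.random.default_rng(2).normal(26.0, 4.0, size=100)
u_stat, p_u = mannwhitneyu(bmi_non_users, bmi_users)
print(f"Mann-Whitney U p-value: {p_u:.3f}")
```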
In the ninth paper, Alemayol et al. (contribution 9) proposed a gait and pose analysis study based on estimating lower-limb joint angles from a single inertial sensor. Gait analysis is critical in healthcare; it is mainly adopted for precise patient monitoring, the identification of movement abnormalities, the evaluation of surgical outcomes, the detection of osteoarthritis of the knee and hip, and the diagnosis of Parkinson’s disease. Gaits are interpreted through three types of parameters: spatiotemporal (e.g., stride speed and stride length), kinematic (e.g., hip extension/flexion), and kinetic (e.g., ground reaction moments and forces). The authors used kinematic parameters, namely the joint angles of the lower limbs, and preferred wearable sensors for data collection. These sensors are preferred to non-wearable ones, which generally consist of optical motion capture systems with high positional accuracy, because the latter are expensive and require longer installation times and specific skills. Motion analysis in a real-world environment requires precise and reliable sensors, and the authors’ investigations identified the Xsens inertial sensors as the most suitable for this purpose. The literature offers various accounts of the number of sensors, their positioning, the estimation methods, and the analysis of movement. The authors employed various neural network algorithms to determine the number and placement of sensors for estimating the joint angles of both legs. To obtain the ground-truth values of the lower-limb joint angles, seven individual Awinda sensors were mounted on the lower half of the body of each of the sixteen subjects: one on the pelvis at the height of the anterior superior iliac spine, one on each of the lateral thighs, two more on the upper parts of the tibiae, and finally two more on the upper anterior parts of the feet. The goal was the estimation of leg kinematics (joint angles) from any one of the sensors attached to the body. The authors used four different neural network models for the estimation: bidirectional long short-term memory (BLSTM), convolutional neural network, wavelet neural network, and unidirectional LSTM. Two groups of target leg joint angles were examined. The first set contained only four leg joint angles in the sagittal plane, while the second included six leg joint angles in the sagittal plane and two in the coronal plane. By evaluating different combinations of networks and datasets, it was found that the BLSTM network was the best performer with both datasets, with a mean absolute error (MAE) of between 3.02° and 4.33° for the four dominant leg joint angles in the sagittal plane. The results improved with an increased number of sensors and the introduction of biometric information. From the investigation of single-sensor placement, it was found that the shin or thigh is the optimal position for estimating leg joint angles. Actual leg movement was compared to a computer-generated simulation of the leg joints, which demonstrated the possibility of estimating leg joint angles during walking with a single inertial sensor.
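A rough sketch of a bidirectional LSTM regressor mapping a single-IMU window to lower-limb joint angles follows. The window length, six-channel input (accelerometer plus gyroscope), layer sizes, and four-angle output are assumptions rather than the authors' configuration.

```python
# Sketch of a BLSTM regressor from a single-IMU window to joint angles.
# Window length, channel count, and output size are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def build_blstm_angle_estimator(timesteps=200, channels=6, num_angles=4):
    inputs = layers.Input(shape=(timesteps, channels))
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inputs)
    x = layers.Bidirectional(layers.LSTM(64))(x)
    outputs = layers.Dense(num_angles)(x)       # joint angles in degrees
    return tf.keras.Model(inputs, outputs)

model = build_blstm_angle_estimator()
model.compile(optimizer="adam", loss="mae")     # MAE matches the reported metric
```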
Bibbò et al. (contribution 10) developed an innovative model to detect subjects’ emotional health using self-normalizing neural networks (SNNs) combined with an ensemble layer. In the context of HAR, computer vision technology can be applied to recognize emotional states through facial expressions, using facial landmarks such as the nose, eyes, and lips. The recognition of facial emotions is important because, from the analysis of the face, it is possible to detect aspects of the subject’s health status, such as anxiety, depression, stress, malaise, and neurodegenerative disorders, making facial diagnosis possible. This is a beneficial technique in the care of older adults; through the information provided, medical staff can evaluate the type of intervention required to reduce the subjects’ discomfort. Some facial manifestations can be associated with the first pathological symptoms, allowing the prevention of diseases that could otherwise degenerate. The innovation produced by the authors is the development of an AI classifier based on a set of classifier neural networks whose outputs are directed to an ensemble layer. In particular, the networks are self-normalizing neural networks (SNNs). The model comprises six SNNs, each trained to identify one of six emotions (anger, disgust, fear, happiness, sadness, and surprise). The networks operate in cascade, and each is dedicated to detecting the presence or absence in the input image of the single specific emotion (among the six present in this study) assigned to and associated with it. Each neural network is trained with its own images for a specific emotion. Each network produces two outputs: the first, identified as EM, confirms through a numerical value (from 0 to 1) the correspondence of the detected emotion with that assigned to the network; the second, identified as AM, similarly signals through a numerical value (from 0 to 1) the presence of an emotion different from that assigned to the specific network. These outputs are then transferred to the ensemble layer, which provides an accurate result by analyzing the outputs of the individual networks according to statistical logic. The dataset was sourced from Kaggle. The authors validated the results through a control network in their experiments. The results showed a success rate of around 80% for almost all emotions, with a peak of 95% for the emotion “fear”.
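The per-emotion self-normalizing networks and a simple ensemble decision can be sketched as below. The layer sizes, input resolution, and the rule of selecting the emotion with the highest EM score are assumptions, not the authors' exact design.

```python
# Sketch of per-emotion self-normalising networks (SELU activations) with a
# simple ensemble decision. Sizes and the decision rule are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def build_snn(input_dim=48 * 48):
    # Self-normalising MLP: SELU activations with lecun_normal initialisation.
    return tf.keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(256, activation="selu", kernel_initializer="lecun_normal"),
        layers.Dense(128, activation="selu", kernel_initializer="lecun_normal"),
        layers.Dense(2, activation="softmax"),   # [EM: match, AM: other emotion]
    ])

networks = {emotion: build_snn() for emotion in EMOTIONS}

def ensemble_predict(image_vector):
    """Return the emotion whose dedicated network gives the highest EM score."""
    em_scores = {
        emotion: float(net(image_vector[np.newaxis, :])[0, 0])
        for emotion, net in networks.items()
    }
    return max(em_scores, key=em_scores.get)

print(ensemble_predict(np.random.rand(48 * 48).astype("float32")))
```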
The exciting topic of the metaverse is addressed in the eleventh article of this collection. One of the areas in which the metaverse is applied is digital games. Virtual reality and animation allow virtual characters to take on natural roles and generate new immersive ways of living. Oliveira et al. (contribution 11) aimed their research at understanding the impact of the concept of the metaverse on ordinary people’s lives. The concept of the metaverse was first postulated by Neal Stephenson in his 1992 book Snow Crash, where it was defined as a virtual world capable of reaching, interacting with, and influencing human existence [17]. There is currently no single agreed-upon definition.
The metaverse can be understood as a network of interconnected 3D virtual worlds rendered in real time that can be experienced synchronously and persistently by an unlimited number of users. This study is part of the research on the metaverse, virtual reality, and gaming. It was conducted through three focus groups with Portuguese adults who are regular video game players. The focus group method originated in the work of the Bureau of Applied Social Research at Columbia University in 1940 and is used in research in several disciplines. It is a qualitative method of collecting data on a particular topic in an informal discussion between selected people. During the discussion, information is gained about what people think or feel and how they act. The investigation has the following aims:
To verify how the metaverse is represented and characterized;
To identify which technologies stimulate the immersion experience;
To identify the main dimensions that influence the acceptance of the metaverse concept;
To understand perceptions of the metaverse and VR regarding socialization and well-being;
To test perceptions of a player’s daily life regarding the concepts of the metaverse, virtual reality, and gaming;
To understand the impact of social representations on the concept of play;
To understand the perceived role of animation in relation to the metaverse, virtual reality, and gaming concepts.
The data collected during the focus groups are the answers provided by the thirteen participants to the twenty-eight questions distributed across the three themes: games, animation, and the metaverse. The results obtained from the players’ responses produced detailed information on how the metaverse is represented and characterized and how it relates to virtual reality and gaming. In conclusion, the metaverse is considered a game that allows immersive experiences through virtual reality technology and the style and esthetics of animation. It is also seen as a means of socialization and communication, and a promoter of well-being.
In the future, its expansion into the world of social networks as a means of communication is foreseeable.