1. Introduction
Movement recognition, usually treated as a typical pattern recognition problem and, more specifically, a classification problem [1], has been applied in sports by using machine learning techniques to generate neural network models for multiple purposes. These purposes include: (i) identifying critical conditions (e.g., falls, heart attacks); (ii) classifying activities (e.g., swimming, running, cycling); and (iii) counting repetitions (e.g., steps, jumps, squats) [2,3,4]. The importance of analyzing physical activities with the help of neural network models and wearable devices lies in the ability to derive information from these models that is relevant to the user’s performance. One example is predicting sweat loss to prevent dehydration based on factors such as heart rate, temperature, anthropometric parameters of users, and steps per minute, as presented in [5].
In the context of repetitive physical activities, to the best of our knowledge, there is still no neural network model that provides qualitative feedback to help users improve, or adjust, their movements for proper execution. Commonly performed in gyms or physiotherapy sessions, repetitive physical activities focus on strengthening a set of muscles by activating them over multiple sessions for several months or years. However, overstressing these muscles through incorrect movement execution may cause ruptures and lesions that demand rest or, in the worst case, surgical intervention to reconstruct the muscle fiber. By developing a technology that assists users with their movement execution, we can change the role of wearable devices and neural networks from a simple monitoring technology to a disruptive approach that improves the users’ performance and prevents injuries [6].
At this point, we can say that our research question is about data reconstruction. It involves fixing incorrect input data, which represents a wrong execution of physical activity, to create accurate output data that shows the right way of doing it. In simpler terms, we are exploring whether it is possible to correct input data and provide feedback that helps users improve their future input data.
To address the research question effectively, we must also answer the following motivating questions: (i) is it possible to provide a neural network model that is capable of generating a qualitative feedback to the users during repetitive physical activities (e.g., squats, chest fly, row, curl)? (ii) is it possible to improve the users’ performance on repetitive physical activities by adjusting the movements from the model’s feedback? and (iii) is it possible to generate a model that adapts to the user’s limitations avoiding misleading generalizations that are not suitable for the user’s physical traits?
This paper proposes an alternative solution to these challenges, advocating that it is possible, at runtime, to generate both the appropriate pattern and the model for a specific activity by using inertial sensors placed along the various body segments. Each generated model can provide detailed adjustment suggestions that guide the users to properly perform a physical activity. This approach avoids misleading generalizations by generating a new model at the beginning of each different activity, identifying the movement pattern most suitable for the user’s physical traits at that time, respecting the limitations of the users according to their progress, preventing overstress of the muscles, and preserving the movement quality throughout the physical activity sets.
Despite the various approaches to convert, or map, the movements made by individuals in the real world into data that can be analyzed, such as cameras [7,8,9], electromyography [10,11,12,13], and resistive sensors [14,15,16,17], this paper considers inertial sensors the best approach due to their cost/benefit ratio. These sensors have a small size, good accuracy, and low cost, in addition to being able to measure the orientation of the monitored segment based on linear acceleration (accelerometer) and angular velocity (gyroscope) in three different axes (x, y and z) [18,19].
The proposed method was evaluated using the PHYTMO (Physical Therapy Monitoring) dataset [20], which contains inertial sensor data from body segments labeled as correct or incorrect execution, making it possible to assess the efficiency of the proposed method. These data were analyzed by two main algorithms: (i) Dynamic Time Warping (DTW) [21,22] and (ii) the Restricted Boltzmann Machine (RBM) [23]. The RBM is used to extract the patterns from the training data and to generate the adjustment suggestions for new input data, and the DTW is used to evaluate the efficiency of the proposed method by measuring the gain obtained when the adjustments made by the RBM model are applied to the same input data.
The remainder of this paper is organized as follows. The next Section presents the Related Works, followed by Theoretical Background which presents the base concepts needed to understand this paper. The Proposed Method Section explains, in multiple subsections, an alternative to address the problems mentioned before. The Evaluation Method describes the steps to validate the proposed method and describes the obtained results and its discussions. Finally, the Conclusion Section presents an overview, achievements, limitations of the proposed method and perspectives for future work.
2. Related Works
The most similar study found in the literature proposes the use of a Recurrent Restricted Boltzmann Machine to predict chaotic time series [24]. Although the application differs, that method would be an interesting candidate to solve the research problem in this manuscript. Their proposal includes a recurrent structure at the hidden nodes that stores historical information, generating more accurate predictions of future values for the input time series.
The method proposed in [25] uses a support vector machine (SVM) classifier to recognize postural patterns through wearable sensors to avoid postures that increase spinal stress. This approach trained a model to recognize lifting and releasing movements with correct and incorrect postures using kinematic data from 8 sensors distributed on the lower and upper legs and on the trunk. The experiments involved 26 healthy subjects, and the SVM model achieved an accuracy of 99.4% in identifying, in real time, correct and incorrect postures, considering all sensors. Despite the good results, this model only works for those movements (lifting and releasing loads). Furthermore, an SVM only generates outputs that have already been evaluated during the training process, unlike an RBM, which can adapt and generate outputs based on a prior distribution of the training dataset.
The study conducted in [26] provides a deep learning framework capable of evaluating up to 10 rehabilitation exercises. It uses autoencoders as sub-networks to process the displacement of individual body parts, which are monitored by a visual motion sensor. Its conclusions demonstrate that probabilistic models outperform approaches that use distance functions for movement assessment. A second outcome of this research is a dataset called UI-PRMD with data collected from 10 healthy subjects, composed of 10 repetitions of 10 rehabilitation exercises. These data were gathered by an optical tracking system and consist of 117-dimensional sequences of angular joint displacements.
A different time series reconstruction approach, proposed in [27], uses a denoising autoencoder to reconstruct time series with missing values. By converting the raw time series into a 2D matrix that establishes correlations between the time intervals, it is possible to use a denoising autoencoder to reconstruct the missing values in the 2D matrix. The results show that using 2D representations of a time series improves the imputation and classification performance.
The study in [28] evaluates four machine learning techniques for the binary classification of an exercise. It reached a misclassification error of up to 0.5%, using a support vector machine with a polynomial kernel, and up to 99% accuracy in detecting wrong movements considering 7 different exercises in physical therapy routines. As with the SVM approach discussed previously, this one can recognize and classify only a limited number of exercises, which reduces its applicability due to the difficulty of adding and removing routines. Also, the authors state that wearable devices cannot detect a variety of fitness movements and may hinder the exercises of fitness users.
An interesting approach in [29] proposes the use of transfer learning based on deep neural networks to increase the number of classes (physical activities) that can be evaluated by the model. The performance experiments reached up to 98.56% accuracy and 97.9% precision in identifying the movements. For movement completeness, it reached up to 92.84% accuracy and 92.85% precision. Despite the good results, transfer learning carries the upfront cost of generating a generic model. Furthermore, scalability and the necessary model updates would be difficult to maintain over time.
In the literature, we found no paper that answers the main question that motivates our study: how can we generate specific models, at runtime, that assist each user, based on their physical characteristics, in performing a physical activity correctly by giving them adjustment suggestions?
3. Theoretical Background
This Section introduces the basic concepts needed to understand the proposed method, starting with the Inertial Sensor Data Section, which explains how to combine the raw sensor data into a single metric that reduces the dimensionality and simplifies the data analysis. It then explains the RBM algorithm, how it works, and why it was chosen to integrate the proposed method. Finally, it presents the Dynamic Time Warping algorithm, which is used to evaluate the gain from applying the RBM suggestions to the original input series.
3.1. Inertial Sensor Data
Despite the various approaches to convert, or map, the movements made by individuals in the real world into data that can be analyzed, such as cameras [7,8,9], electromyography [13,30], and resistive sensors [14,31], this paper considers inertial sensors suitable candidates for this task. These sensors have a small size, good accuracy, and low cost, in addition to being able to measure the orientation of the monitored segment based on linear acceleration (accelerometer) and angular velocity (gyroscope) in three different axes (x, y and z) [18,19,32].
The mapping process consists of registering multiple readings during the movement execution. This set of readings, called a time series, contains raw data from three axes for each sensor and helps to understand the behavior of a monitored segment regarding acceleration and angular velocity. The raw time series data are combined to reduce the data dimensionality by using the magnitude metric [33,34]. This metric is $M_s(t) = \sqrt{x_s(t)^2 + y_s(t)^2 + z_s(t)^2}$, where $x_s(t)$, $y_s(t)$, and $z_s(t)$ are the measurements on each axis by the sensor ($s$) at a specific instant ($t$) during the movement execution.
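As a minimal sketch of the magnitude metric, assuming the usual Euclidean norm over the three axes and readings stored as an array with one row per instant, the computation can be written as follows. The function name `magnitude` and the sample readings are hypothetical and serve only as an illustration.

```python
import numpy as np

def magnitude(xyz):
    """Euclidean norm of each (x, y, z) reading; xyz has shape (T, 3)."""
    xyz = np.asarray(xyz, dtype=float)
    return np.sqrt((xyz ** 2).sum(axis=1))

# Hypothetical accelerometer readings: three instants, three axes.
acc = np.array([[3.0, 4.0, 0.0],
                [0.0, 0.0, -2.0],
                [1.0, 2.0, 2.0]])
acc_mag = magnitude(acc)
print(acc_mag)  # [5. 2. 3.]
```

Note that the negative reading on the z axis contributes a positive magnitude, which is the sign-removal property discussed in this Section.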
Finally, the accelerometer and the gyroscope magnitude values are combined with a balanced fusion operation that takes, at each instant t, the magnitude value from the accelerometer and the magnitude value from the gyroscope.
Figure 1 presents a sample of raw data from the accelerometer and gyroscope, and their conversion to the magnitude metric and the data fusion.
The advantage of using the magnitude metric and the fusion of magnitudes relies on the conversion of negative values into positive values, and the amplification of the wave, as depicted in
Figure 1: Magnitude–Fusion (orange line). These characteristics help to highlight the cyclical pattern, implicit in the raw data of each sensor’s axis and, at the same time, they reduce the data dimensionality, simplifying the data analysis by using the absolute values. Also, problems with sensor displacement and axis orientations are mitigated by this metric because it considers only the absolute values of the forces in all of the axes of the sensors [
33].
3.2. Restricted Boltzmann Machines
The Restricted Boltzmann Machine (RBM) [23] is a two-layer (visible and hidden) artificial neural network, with no connections between units within the same layer, that makes stochastic decisions about whether units should be on (activated) or off (deactivated). This network receives a set of binary arrays and learns the pattern of the set (that is, it adjusts the weights between the units) by finding a prior distribution in the observed data that generates those arrays with high probability, as depicted in Figure 2. It uses Gibbs sampling during this process.
Unlike feed-forward neural networks [36], the RBM iterates over the layers, activating and deactivating the units until the visible layer represents a set that satisfies the prior distribution of the data. This characteristic demands high-quality input data during the training process; otherwise, the weights would "learn" bad patterns and generate undesired activations of the visible units.
This neural network was chosen for three reasons: it is commonly applied in recommendation systems [37,38]; its stochastic nature makes it possible to identify a pattern from multiple subsets of a probabilistic distribution that represents the training dataset; and it contains only two layers, which makes it suitable for embedding in resource-constrained devices.
3.3. Dynamic Time Warping
Dynamic Time Warping (DTW) [
21,
22] is applied to compute the distance between two time series by finding the best alignment between them. It uses dynamic programming to combine the points from both series to extract the minimal combination cost that expresses the total distance between both series. The smaller the total distance is, the more similar the compared time series are. This approach outperforms the traditional Euclidean Distance since it compensates for eventual differences in the frequency and duration of both series.
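The dynamic programming formulation described above can be sketched as follows. This is a textbook version that assumes the absolute difference as the local cost and no warping-window constraint; the implementation used in the experiments may differ in both respects, and the function name `dtw_distance` is hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Total cost of the best alignment between two 1-D series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j],      # repeat b[j-1]
                                    cost[i, j - 1],      # repeat a[i-1]
                                    cost[i - 1, j - 1])  # advance both
    return float(cost[n, m])

# The second series holds the value 2 for one extra sample; the warping
# absorbs the difference in duration, so the distance is zero.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

The first call illustrates why DTW is preferred over a point-by-point Euclidean comparison here: the two series have different lengths but the same shape.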
Figure 3 illustrates the best alignment path (connected dots on the matrix) between two time series (
x and
y) by associating the closest elements to each other (arrows connecting the elements’ arrays) on both series.
This algorithm was originally created to perform voice recognition, where the same word could be spoken with a different tone, frequency, and speed. By compensating for these variables, it is possible to adjust both series to identify relevant patterns between them.
Applying this technique to compare two magnitude series helps to identify the similarity between them. Using a reference series, it is possible to calculate the gain of applying the RBM suggestions to the input series by comparing both series (input and changed) to the reference series and relating the two resulting distances. DTW is also used to filter the training samples, which helps to generate a more accurate RBM model.
4. Proposed Method
Building on the theoretical background, this Section explains the pre-processing step (Series Binarization Section) that converts the magnitude time series into a binary array, which is used both to train the RBM model and to analyze subsequent series. This Section also describes the RBM architecture, specifying how the numbers of visible and hidden nodes are defined. After that, the pipeline of the proposed method is described for both the training and the evaluation of new movements.
4.1. Series Binarization
The proposed method states that the difference between the sample values at times $t-1$ and $t$ is represented as a 3-tuple of discrete states that indicates whether the value is decreasing (1,0,0), sustaining (0,1,0), or increasing (0,0,1). This means that a sample (a single execution of the physical activity) containing $n$ magnitude readings is represented by a binary array of $3(n-1)$ elements (or $n-1$ 3-tuples).
For example, assuming a time series of magnitude values $x$ = [0.45, 0.53, 0.69, 0.72, 0.7, 0.7, 0.68, 0.65…], the binarization considers the variation between $x_{t-1}$ and $x_t$ to generate the 3-tuple. In this case, for the first ($x_1$) and the second ($x_2$) elements of $x$, we have $x_2 - x_1 = 0.08$, which is a positive number ($> 0$) that indicates that between $x_1$ and $x_2$ the magnitude is increasing; then, the 3-tuple representation is (0,0,1). Following these steps for the remaining elements in $x$, the resultant array of 3-tuples is [(0,0,1), (0,0,1), (0,0,1), (1,0,0), (0,1,0), (1,0,0), (1,0,0)…]. Although each element corresponds to a 3-tuple, for the RBM, all tuples are concatenated sequentially, forming an array of binary values that preserves the position of each element, as follows: [0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,1,0,0,1,0,0…]. This binary array translates the tendencies of the values in the magnitude time series and is used as the input of the RBM in both steps, training and evaluation.
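The binarization can be sketched as below, using the eight magnitude values listed in the example. The function name `binarize` and the optional tolerance `eps` are hypothetical additions for illustration; with `eps=0.0`, only an exactly null variation is treated as sustaining, which is an assumption about how equality is decided.

```python
def binarize(series, eps=0.0):
    """Flatten the tendency of each consecutive pair into one-hot 3-tuples."""
    bits = []
    for prev, curr in zip(series, series[1:]):
        diff = curr - prev
        if diff < -eps:
            bits.extend((1, 0, 0))   # decreasing
        elif diff > eps:
            bits.extend((0, 0, 1))   # increasing
        else:
            bits.extend((0, 1, 0))   # sustaining
    return bits

x = [0.45, 0.53, 0.69, 0.72, 0.7, 0.7, 0.68, 0.65]
binary = binarize(x)
print(binary)
# [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
```

Eight readings produce seven 3-tuples, that is, 21 binary values. In practice a small positive `eps` may be useful to absorb sensor noise, but that choice is not specified by the method.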
4.2. The RBM Architecture
The visible layer of the RBM must handle all elements of the binary array generated by the Series Binarization; this means that the RBM architecture has a visible layer composed of $3(n-1)$ units, each one representing a single element of the binary array.
The hidden layer is composed of $n-1$ hidden units, matching the number of 3-tuples in the visible layer. This architecture assumes that a trained model activates/deactivates the units of the 3-tuples to satisfy the most probable states according to the model weights and its hidden units. The changes made by the model are the adjustment recommendations that transform the “incorrections on the input series” (original visible units) into a “corrected output series” (changed visible units).
The output 3-tuples represent the expected tendency of the sensor’s readings along the movement. For example, if the model "learns" that between the readings at times $t-1$ and $t$ the value must increase, then it changes the input data to represent this tendency.
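To make the layer sizes concrete, the following sketch builds a small Bernoulli RBM whose visible layer has one unit per binary element and whose hidden layer has one unit per 3-tuple, as described in this Section. It is only an illustration under stated assumptions: training uses one step of contrastive divergence (CD-1), the reconstruction uses probabilities instead of samples, and the strongest unit of each 3-tuple is kept so that the output is a valid tendency. None of these choices, nor the class name `TinyRBM` and its hyperparameters, is taken from the method itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyRBM:
    """Illustrative Bernoulli RBM (hypothetical helper, not the authors' code)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def sample_hidden(self, v):
        p = sigmoid(v @ self.W + self.b_h)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def sample_visible(self, h):
        p = sigmoid(h @ self.W.T + self.b_v)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def fit(self, data, epochs=200, lr=0.1):
        data = np.asarray(data, dtype=float)
        for _ in range(epochs):
            # One Gibbs step: visible -> hidden -> visible -> hidden (CD-1).
            ph0, h0 = self.sample_hidden(data)
            pv1, _ = self.sample_visible(h0)
            ph1, _ = self.sample_hidden(pv1)
            self.W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
            self.b_v += lr * (data - pv1).mean(axis=0)
            self.b_h += lr * (ph0 - ph1).mean(axis=0)

    def reconstruct(self, v):
        ph, _ = self.sample_hidden(np.asarray(v, dtype=float))
        pv, _ = self.sample_visible(ph)
        # Assumption: keep the most probable state of each 3-tuple.
        tuples = pv.reshape(-1, 3)
        out = np.zeros_like(tuples)
        out[np.arange(len(tuples)), tuples.argmax(axis=1)] = 1.0
        return out.reshape(-1).astype(int)

n = 8  # magnitude readings in one sample
sample = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0,
                   0, 1, 0, 1, 0, 0, 1, 0, 0])
rbm = TinyRBM(n_visible=3 * (n - 1), n_hidden=n - 1)
rbm.fit(np.tile(sample, (10, 1)))   # ten copies stand in for training samples
output = rbm.reconstruct(sample)
print(output.reshape(-1, 3))
```

With eight readings, the sketch uses 21 visible units and 7 hidden units. Real training data would consist of distinct repetitions rather than copies of a single sample.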
With the pre-processing step and the RBM architecture defined, the Pipeline Overview Section combines all the base concepts presented so far in this manuscript.
4.3. Pipeline Overview
The proposed method states that it is possible to provide feedback by generating specific models, in execution time, for each individual body segment, based on fusion magnitude time series (presented in the Inertial Sensor Data Section) from inertial sensors.
Figure 4 depicts the training process pipeline, which starts by collecting a set of high-quality data during the correct movement execution. In these data, it is possible to highlight the cyclic patterns, or periods (see Figure 1), each of which represents an exercise repetition (or sample). These samples are converted into binary arrays (as presented in Section 4.1) and placed as the visible units of an RBM model (see Figure 2) to train the model.
For each different physical activity, the model must be retrained to replace the outdated model and to adapt to the user’s rhythm along their evolution on the physical activities’ practices, avoiding misleading generalizations from massive datasets. Also, the evaluation of independent segments by specific models would provide accurate feedback, preventing interference from the other sensors on the pattern’s extractions and analysis.
Once the model is generated, it is possible to evaluate the following sets of repetitive physical activities by obtaining the samples from the input data, converting them into binary arrays, and feeding them to the model, which, based on a prior distribution from the training dataset, reconstructs/modifies the input data to satisfy the patterns of the correct movement execution, as illustrated in Figure 5.
The differences between the binarized input sample and the RBM model output indicate the points where the evaluated execution must be changed to satisfy the correct execution pattern. By comparing the 3-tuples associated with each sequential pair of the magnitude series, it is possible to check whether the tendency in the input binary array is equal to the expected one, that is, whether the same units are activated. If both 3-tuples have the same units activated, then the series satisfies the model pattern. Otherwise, the output 3-tuple units indicate whether the value is expected to increase, sustain, or decrease.
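The tuple-by-tuple comparison can be sketched as follows. The function name `suggestions`, the label strings, and the sample arrays are hypothetical; the sketch also assumes that every output 3-tuple is a valid one-hot state.

```python
LABELS = ("decrease", "sustain", "increase")  # order of the 3-tuple units

def suggestions(input_bits, output_bits):
    """List (pair index, expected tendency) where input and output differ."""
    if len(input_bits) != len(output_bits) or len(input_bits) % 3 != 0:
        raise ValueError("both arrays must hold the same number of 3-tuples")
    result = []
    for k in range(0, len(input_bits), 3):
        observed = tuple(input_bits[k:k + 3])
        expected = tuple(output_bits[k:k + 3])
        if observed != expected:
            # Assumes `expected` is one-hot, so exactly one unit is active.
            result.append((k // 3, LABELS[expected.index(1)]))
    return result

input_bits = [0, 0, 1, 1, 0, 0, 0, 1, 0]    # increase, decrease, sustain
output_bits = [0, 0, 1, 0, 0, 1, 0, 1, 0]   # increase, increase, sustain
feedback = suggestions(input_bits, output_bits)
print(feedback)  # [(1, 'increase')]
```

In this example, only the second pair of readings differs from the expected pattern, and the feedback is that the value should increase there instead of decreasing.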
It is important to stress that the training process demands a high-quality dataset to generate an efficient RBM model. The first set of each physical activity must be executed under the supervision of a qualified professional that guarantees the proper movement execution. This requirement is essential to provide reliable data to use as a reference for correcting the following executions.
5. Evaluation Method
Our performance evaluation consisted of applying the proposed method to a public inertial database, called PHYTMO [20], that contains raw data from NGIMU [40] inertial sensors attached to the subjects’ arms, forearms, thighs, and shins, on both sides (L for left, R for right), as depicted in Figure 6.
This dataset provides data labeled by the subjects’ age group, exercise, sensor position, and whether the movement was performed correctly or incorrectly. It recorded 30 subjects grouped in 5 age ranges, namely 22 to 26 (A), 30 to 39 (B), 42 to 49 (C), 50 to 55 (D), and 60 to 68 (E), performing, at least 8 times, the following exercises: knee flex–extension (KFE); squats (SQT); hip abduction (HAA); elbow flex–extension (EFE); extension of arms over head (EAH); and squeezing (SQZ).
The raw data from the PHYTMO dataset were converted into magnitude series, following the process explained in the Inertial Sensor Data Section, and these files were used to run the experiments to validate the proposed method.
Figure 7 depicts a data sample, after the conversion of raw data into magnitude series, for a single sensor, during the 4 sets of execution. The time series in Figure 7a,b represent correct movement execution, and the time series in Figure 7c,d represent incorrect movement execution.
By using the correct execution data to train (
Figure 7a) and validate (
Figure 7b) the RBM models, and the incorrect execution data (
Figure 7c,d) to test them, it is possible to measure the results of applying the recommendations on the incorrect execution samples for each specific body’s segment.
In this paper, this measurement is called the gain metric. It uses the Dynamic Time Warping algorithm to compare a reference series (extracted from the correct movement data) to both the original input series and the output series (generated by the RBM suggestions after evaluating the incorrect movement data). The gain metric is given by $g = d_{i,r} / d_{o,r}$, where $d_{i,r}$ is the distance between $i$ (original input series) and $r$ (reference series), and $d_{o,r}$ is the distance between $o$ (output series) and $r$. This metric expresses how close to the reference series the input series gets after applying the modifications made by the RBM model; in other words, how close to the correct movement the execution would be if the users follow the recommendations.
The gain value reflects the following behavior: If the value tends towards 0, the output series will diverge further from the reference series than the original input. When the gain is closer to 1, the output series closely resembles the input series, while values greater than 1 indicate substantial improvements in the output series, reducing the distance from the reference series. In this case, the proposed method aims to maximize the gain value.
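The behavior described above is consistent with a ratio between the two DTW distances. The sketch below assumes that form, with the input-to-reference distance in the numerator and the output-to-reference distance in the denominator; the function name `gain`, the handling of a zero denominator, and the distance values are hypothetical.

```python
def gain(d_input_ref, d_output_ref):
    """Ratio of the input-reference distance to the output-reference distance."""
    if d_output_ref == 0:
        # Assumption: an output identical to the reference is an unbounded
        # improvement, unless the input was already identical as well.
        return float("inf") if d_input_ref > 0 else 1.0
    return d_input_ref / d_output_ref

print(gain(10.0, 5.0))  # 2.0 -> output is closer to the reference
print(gain(3.0, 3.0))   # 1.0 -> output is as far from the reference as the input
print(gain(4.0, 8.0))   # 0.5 -> output moved away from the reference
```

Under this assumption, halving the distance to the reference doubles the gain, and any value below 1 signals that the suggestions made the series less similar to the reference.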
The reference series is the sample that provides the smallest average distance value to the other samples from the training data. This means that, for this evaluation, the samples are compared to each other and, then, the one that has the smallest average distance is defined as the reference series.
The average distance value of the reference series is also used to train a secondary RBM model (RBM+DTW) by excluding the samples that had a distance value greater than the average distance from the reference series. The regular RBM model (RBM) uses unfiltered samples to train the network. Both models are considered in this analysis to highlight the impact of assuming a full set of exercises as a training dataset without filtering the samples. This helps to understand whether the high computational cost of identifying the reference series generates a better model.
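The selection of the reference series and the filtering used for the RBM+DTW model can be sketched as follows. The sketch assumes a plain DTW with absolute-difference cost as the distance, averages each sample's distances over the other samples only, and keeps the reference itself in the filtered set; the function names and the five toy samples are hypothetical.

```python
def dtw(a, b):
    """Plain DTW with absolute-difference local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[-1][-1]

def select_reference(samples, distance=dtw):
    """Return (reference index, its average distance, indices kept for training)."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = distance(samples[i], samples[j])
    # Average distance from each sample to the other n - 1 samples.
    avg = [sum(row) / (n - 1) for row in dist]
    ref = min(range(n), key=avg.__getitem__)
    # Keep samples whose distance to the reference does not exceed its average.
    keep = [k for k in range(n) if dist[k][ref] <= avg[ref]]
    return ref, avg[ref], keep

samples = [[0, 1.0, 0], [0, 0.75, 0], [0, 0.5, 0], [0, 3.0, 0], [0, 1.25, 0]]
ref, ref_avg, keep = select_reference(samples)
print(ref, ref_avg, keep)  # 0 0.75 [0, 1, 2, 4]
```

The nested loop makes the quadratic number of DTW computations explicit, which is the computational cost mentioned in this Section. In the toy data, the sample with the outlying peak (index 3) is the only one excluded from training.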
Finally, the process described in this Section is repeated for all 280 sets (similar to Figure 7). For each set, the following are extracted: (i) a reference sample from the first series of correct exercises; (ii) an RBM model trained with all samples in the first series of correct exercises; and (iii) an RBM+DTW model trained with the samples whose distances are equal to or smaller than the average distance of the reference series. The other three series were used to validate the models (second series of correct execution data) and to test the models (both series of incorrect execution data). This evaluation method intends to identify the gain (how similar the output series is to the reference series) after applying the movement adjustment suggestions generated by a trained RBM model.
6. Results
This Section is divided into three parts: (i) obtaining the reference series; (ii) validating the models by analyzing a correct movement execution series; and (iii) testing the model on evaluating both incorrect movement execution series.
6.1. Obtaining the Reference Series
Obtaining the reference series has a high computational cost, since we need to compute the distance from all samples to each other.
Figure 8 shows the samples extracted from the training series (
Figure 7a).
Each one of the samples depicted in
Figure 8 represents a correct movement execution of a physical activity. The reference series must be the sample that generically represents the others, i.e., the most similar sample to the other samples. This similarity is obtained by computing the distance (DTW) of all sample combinations.
Table 1 presents the distance of each sample to one another, and the average distance for each one of them. It is possible to see that Sample #5 has the lowest average distance while Sample #3 has the highest average distance value.
It is worth noting that in column #5, some samples (#1, #2, #10, #14, #16, and #17) have distances greater than the average. These samples are not used for training the RBM+DTW model. This filtering aims at a more accurate model by training it only with the samples that are most similar to the reference series.
6.2. Validation Results
In the validation process, the validation series (Figure 7b) were used to evaluate whether the models perform the adjustment recommendations properly. Figure 9 depicts the average distance for each segment.
As expected, the distances from the input series to the reference series are greater than the distances from the output series to the reference series, for both models. This means that both the RBM and the RBM+DTW models suggested changes that make the input more similar to the reference series.
Figure 10 is a sample that compares the input series (Figure 10a), the RBM output series (Figure 10b), and the RBM+DTW output series (Figure 10c) to the reference series (blue line in all of them). It is worth noting that the RBM model generates an output series with a gain of 48.15% and the RBM+DTW model generates one with a gain of 32.38%, compared to the input series, which means that the output series are, respectively, 48.15% and 32.38% more similar to the reference series than the original input.
An interesting result can be observed in Figure 10, in which the suggestions for the sensor placed on the left shin are further from the reference series than the original input data. As the validation process uses correct execution data, this result can be interpreted as an input series that already satisfies the prior distribution learned by the RBM. Figure 11 depicts that behavior, where the output series is 13.73% and 12.94% further from the reference series than the original input, for the RBM and the RBM+DTW models, respectively.
Despite the observed behavior, the adjustment suggestions did not generate output series that could be considered an incorrect movement, preserving the patterns of the reference series.
Finally, Figure 12 shows the average gain per body segment when applying the models’ suggestions. The results show that RBM+DTW outperformed the RBM model, which makes it the more accurate model and the one that provides the more efficient suggestions.
The previous conclusion does not disqualify the RBM model. Despite the inferior performance compared to the RBM+DTW models, the RBM model still generates an average gain of 85% in relation to the input series for the thigh sensor, which means the RBM output series are 1.7× more similar to the reference series than the input series. The RBM+DTW models generate outputs up to 184% better than the input (left thigh segment).
The validation process satisfied our purpose by presenting the expected results when analyzing correct movement data.
6.3. Testing Results
Unlike the validation process, the testing experiments used incorrect execution data to evaluate the models. The expected outcome was to achieve better results than in the validation process, as the models would adjust the incorrect executions, leading to even greater gains.
In
Figure 13, it is possible to observe the tendency, previously shown in the validation process, that the output series were closer to the reference series than the original input, even on the left shin sensor.
The average gain for each model, presented in Figure 14, ranges from 106% for the RBM model up to 232% for the RBM+DTW model. Those results express the efficiency of following the suggestions made by the models. This means that both models generate output series at least 2× better than the original input.
An example of how both models, RBM and RBM+DTW, correct the input series could be observed in
Figure 15. The input series contains a decreasing tendency at the beginning and an increasing tendency at the end (
Figure 15a). These tendencies do not exist in the reference series (see the reference series in the
Figure 8,
Figure 10 and
Figure 11).
The RBM model suggests adjustments at the extremities of the input series, but it creates a bigger gap between the series (Figure 15b). The RBM+DTW model also changes the extremities of the input but obtains better results by preserving a smaller gap between the reference series and the RBM+DTW output series (Figure 15c). This example shows that the RBM model generates an output series 27.22% better (more similar to the reference series) than the input series, against 84.89% for the RBM+DTW output series.
In
Figure 16, it is possible to compare the gain of both models by exercise and body segment. As explained before, the RBM+DTW outperforms the RBM with a higher gain value, sustaining the previous analysis.
An interesting result can be observed for elbow flex–extension (Figure 16b), where the gains for the arms are greater than those for the forearms, on both sides. This can be explained by the difference between the movements of each segment. During the EFE, the arms move less than the forearms, which means that any suggestion (modification) on the arms has a large impact on the gain. This is different from a long movement, which accepts some small variation based on the training process. The same behavior can be observed for knee flex–extension (Figure 16e), where the shins move more than the thighs during the exercise execution.
The RBM model has its smallest average gain of all exercises during the squat exercise (Figure 16f), while the RBM+DTW model produced its highest average gain during the knee flex–extension exercise (Figure 16e).
Figure 14 highlights the gain from using these models during the squeeze (Figure 16a) and knee flex–extension (Figure 16b) exercises, where both models produced suggestions with gains higher than 200%, while the squat (Figure 16f) produced gains smaller than 90% but greater than 50%. This means that, even in the worst case of this approach (Figure 16f), the proposed method produces an output series at least 65% better than the input series.
6.4. Discussion
Through the results presented in Section 6, it was possible to address the motivating questions of this article: (i) By generating a machine learning model using a Restricted Boltzmann Machine from a binarized time series that represents the inertial movements of each body segment, it is possible to indicate which points in the input series do not satisfy the data patterns from the training process and to correct them so that they approach the pattern expected by the neural network. (ii) By applying the proposed model to a database containing inertial data from subjects performing various physical exercises correctly and incorrectly, it was possible to train a model with the correct data and recommend adjustments to the time series of the incorrect data, which allowed an increase of up to 3.6× in similarity to the reference series from the input series to the output series generated by the machine learning model. (iii) The method supports the creation of highly specific models that identify the user’s movement patterns at the beginning of each physical exercise. This feature ensures that the model that analyzes the subsequent repetitions is as up to date as possible. The method is flexible enough to adapt to any repetitive physical activity.
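As an illustration of point (i), the following minimal sketch shows how a single reconstruction pass through an RBM can flag the points of a binarized input series that deviate from the learned pattern. It is a sketch under stated assumptions, not the implementation used in the experiments: the weights and biases are hand-set, hypothetical values standing in for a trained model, the binarized series is a toy example, and the pass is deterministic (activation probabilities thresholded at 0.5) instead of the stochastic sampling usually applied in RBMs. The function names `reconstruct` and `flagged_points` are illustrative.

```python
import math


def sigmoid(x):
    """Logistic activation used by the RBM units."""
    return 1.0 / (1.0 + math.exp(-x))


def reconstruct(visible, weights, visible_bias, hidden_bias):
    """One deterministic visible -> hidden -> visible pass.

    weights[i][j] connects visible unit i to hidden unit j.
    Probabilities are thresholded at 0.5 instead of being sampled.
    """
    hidden = [
        sigmoid(hidden_bias[j] + sum(v * weights[i][j] for i, v in enumerate(visible)))
        for j in range(len(hidden_bias))
    ]
    probs = [
        sigmoid(visible_bias[i] + sum(h * weights[i][j] for j, h in enumerate(hidden)))
        for i in range(len(visible))
    ]
    return [1 if p >= 0.5 else 0 for p in probs]


def flagged_points(visible, reconstruction):
    """Indices where the reconstruction disagrees with the input."""
    return [i for i, (a, b) in enumerate(zip(visible, reconstruction)) if a != b]


# Hypothetical hand-set parameters (not trained): one hidden unit that
# favors the pattern [1, 1, 0, 0].
weights = [[4.0], [4.0], [-4.0], [-4.0]]
visible_bias = [0.0, 0.0, 0.0, 0.0]
hidden_bias = [0.0]

input_series = [1, 1, 1, 0]  # toy binarized series; the third point deviates
output_series = reconstruct(input_series, weights, visible_bias, hidden_bias)
suggestions = flagged_points(input_series, output_series)
```

The indices in `suggestions` mark where the input and output series differ, which corresponds to the points where the movement would need to change.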
The results also show that, although using DTW to find a reference series among the training samples has a high computational cost, this mechanism offered relevant advantages when used as a filter to select the samples for training the models. As presented in Figure 14, the RBM+DTW model outperformed the gains of the RBM model for each segment.
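The role of DTW as a reference selector and training filter can be sketched as follows. This is only one plausible reading, written under assumptions: the reference series is taken to be the sample with the smallest total DTW distance to all other samples (a medoid), and the filter keeps the samples within a hypothetical distance threshold `max_distance` of that reference. Neither the selection rule nor the threshold value comes from the experiments. The nested loops over all sample pairs also make the computational cost mentioned above visible.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def select_reference(samples):
    """Index of the sample with the smallest total DTW distance to the others (assumed rule)."""
    totals = [sum(dtw_distance(s, other) for other in samples) for s in samples]
    return min(range(len(samples)), key=totals.__getitem__)


def filter_samples(samples, reference, max_distance):
    """Keep only the samples within max_distance (hypothetical threshold) of the reference."""
    return [s for s in samples if dtw_distance(s, reference) <= max_distance]


samples = [
    [0, 1, 2, 1, 0],
    [0, 1, 1, 2, 1, 0],  # same shape, slightly slower
    [0, 3, 6, 3, 0],     # much larger amplitude
]
ref_index = select_reference(samples)
kept = filter_samples(samples, samples[ref_index], max_distance=1.0)
```

In this toy data set, the two samples with the same shape are kept for training, while the sample with the much larger amplitude is discarded.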
Another interesting insight from the experimental results is that, occasionally, the input series already satisfies the training pattern. In this case, both the RBM and RBM+DTW models generate output series that are slightly different from the input series, which may increase the distance to the reference series, so it is highly recommended to ignore the suggestions. In other words, if the input series already satisfies the pattern distribution of the training data, it is not necessary to submit it to the models.
A third observation from the experiment concerns the magnitude metric, which proved to be a powerful mechanism for reducing the model dimensionality. Assuming the IMU readings occur at a fixed frequency (e.g., 10 Hz, or 10 readings per second), it is possible to estimate the movement duration and its velocity by observing the variation between readings, where larger differences mean faster movements and smaller differences mean slower movements.
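A minimal sketch of this idea is given below. It assumes that the magnitude of a three-axis reading is its Euclidean norm, which reduces three values to one; this definition, the function names, and the sample readings are illustrative assumptions. The 10 Hz rate is the example value from the text.

```python
import math

SAMPLE_RATE_HZ = 10  # example rate from the text: 10 readings per second


def magnitude(sample):
    """Euclidean norm of one (x, y, z) reading (assumed definition of magnitude)."""
    return math.sqrt(sum(axis * axis for axis in sample))


def magnitude_series(samples):
    """Reduce a list of three-axis readings to a one-dimensional series."""
    return [magnitude(s) for s in samples]


def duration_seconds(series, rate_hz=SAMPLE_RATE_HZ):
    """Movement duration implied by the number of readings."""
    return len(series) / rate_hz


def mean_abs_change(series):
    """Average absolute difference between consecutive magnitudes.

    Larger values indicate faster movements, smaller values slower ones.
    """
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


readings = [(3.0, 4.0, 0.0), (0.0, 6.0, 8.0), (0.0, 0.0, 5.0)]  # toy data
series = magnitude_series(readings)
speed_indicator = mean_abs_change(series)
duration = duration_seconds(series)
```

Because the norm ignores the sign and the axis of each component, the resulting series does not depend on how the sensor axes are oriented on the body segment.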
Finally, a fourth insight from these experiments is that, by taking the number of elements in the reference series as the default sliding window size, it is possible to treat an input series that does not complete the movement inside this window as a wrong movement execution; this assumption relies on the fact that, if the movement does not finish within the window, then it was performed too slowly. On the other hand, multiple executions inside the same sliding window indicate executions that are too fast, which also do not satisfy the training pattern.
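The window-based check can be sketched as below. The sketch assumes that a completed execution appears as one excursion of the magnitude above a hypothetical `threshold` followed by a return to or below it; this criterion, the threshold value, and the labels are illustrative and not part of the proposed method.

```python
def count_repetitions(window, threshold):
    """Count complete excursions: the signal rises above the threshold
    and later falls back to or below it inside the window."""
    count = 0
    above = False
    for value in window:
        if value > threshold:
            above = True
        elif above:
            count += 1
            above = False
    return count


def classify_window(window, threshold):
    """Label a window whose size equals the length of the reference series."""
    reps = count_repetitions(window, threshold)
    if reps == 0:
        return "too slow"   # no execution finished inside the window
    if reps == 1:
        return "complete"   # exactly one execution fits the window
    return "too fast"       # several executions inside one window


# Toy windows of eight readings each (hypothetical magnitudes).
ok_window = [1, 2, 4, 5, 4, 2, 1, 1]
slow_window = [1, 1, 2, 3, 3, 4, 4, 5]
fast_window = [1, 4, 1, 4, 1, 4, 1, 1]
labels = [classify_window(w, threshold=2.0) for w in (ok_window, slow_window, fast_window)]
```

Only windows labeled as complete would be candidates for evaluation by the model; the other two cases can be reported directly as executions that do not match the training rhythm.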
7. Conclusions
The proposed method presents an alternative for evaluating repetitive physical activities and providing suggestions to improve the user’s performance by creating highly specialized models. The main goal is to provide a novel method for movement recognition and evaluation.
By employing a metric known as ‘magnitude’, which represents the absolute values of readings from the inertial sensors, we were able to use a Restricted Boltzmann Machine to assess movement tendencies and provide suggestions for new inputs that deviated from the patterns learned from the training dataset. Unlike the usual use of neural networks for classification, this approach aims to adjust the input data to an output that satisfies the RBM model. The differences between the input and the output data indicate the points where the movement must change to be performed correctly. The proposed method mitigates the sensor displacement problem by retraining the model with the updated sensor position and by using absolute values regardless of the axis position on the body segment.
Although the results have shown that using unfiltered samples to generate the RBM model leads to less accurate suggestions compared to RBM+DTW, this approach still produced output series that, on average, were 1.7× closer (more similar) to the reference series than the input series. In contrast, the RBM+DTW model generated output series that were up to 3.68× closer.
It is important to highlight the computational cost of RBM+DTW: identifying the reference series is a computationally expensive operation. Such a method could hardly be used on embedded devices like smartwatches due to resource constraints. On the other hand, the unfiltered model has the potential to run on such devices. The possible applications of this method span several areas, such as gym activity monitoring, physiotherapy, robotic arm calibration, and remote activity monitoring. The studies presented in [41] provide a preliminary approach to adapting the proposed model for integration into wearable devices.
As a major limitation, applying this methodology in a real scenario requires a qualified professional responsible for guaranteeing the quality of the training data; this means that the user must understand the physical activity and try to repeat it as well as possible, following the professional’s guidance, during the training session. Also, it is not recommended to reuse a trained model to monitor executions in different training sessions, due to the risk of the sensor being attached to the wrong segment or of the user’s limitations or rhythm changing between sessions, which may generate incorrect feedback. Another challenge of this approach is to define when an input series (from streaming data) must be analyzed by the RBM model. It is neither suitable nor practical to submit a series to the model at each new sensor reading. The criterion used in this paper to evaluate the stream buffer is the position of the maximum value in the sliding window; when it reaches the same index as the maximum value in the reference series, the input series is binarized and submitted to the RBM model for evaluation. Finally, the magnitude metric removes the ability to identify the movement orientation and angles. However, it also eliminates the requirement to position the sensors in a specific orientation.
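The buffer evaluation criterion described above can be sketched as follows. The sketch covers only the alignment test between the position of the maximum in the sliding window and in the reference series; the binarization and the RBM evaluation are left out. The function names and the sample data are illustrative assumptions.

```python
from collections import deque


def argmax(values):
    """Index of the first maximum in a sequence."""
    return max(range(len(values)), key=values.__getitem__)


def aligned_windows(stream, reference):
    """Yield each sliding window whose maximum sits at the same index
    as the maximum of the reference series."""
    size = len(reference)
    ref_peak = argmax(reference)
    buffer = deque(maxlen=size)  # the oldest reading drops out automatically
    for reading in stream:
        buffer.append(reading)
        window = list(buffer)
        if len(window) == size and argmax(window) == ref_peak:
            yield window  # this window would be binarized and sent to the model


reference = [1.0, 2.0, 5.0, 2.0, 1.0]                    # toy reference series
stream = [1.0, 1.0, 1.2, 2.1, 4.8, 2.2, 1.1, 1.0, 1.0]   # toy magnitude stream
windows = list(aligned_windows(stream, reference))
```

With this criterion, the model is evaluated once per aligned window rather than at every new reading, which keeps the number of evaluations close to the number of executed repetitions.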
To the best of our knowledge, the literature does not offer suitable solutions to the problem of providing qualitative feedback that helps users improve their physical movements. Pushing the limits of the state of the art in human activity recognition, some relevant studies, such as [42,43], use deep neural networks to recognize multiple daily human activities. Other works, such as [44], use hybrid techniques to combine multiple data sources to recognize human activity. Finally, some studies, such as [45], explore deep learning techniques to identify human walking activities. Nonetheless, the method presented in this paper is distinguished because most existing approaches in the literature concentrate solely on recognizing particular actions or movements, without offering feedback to help users correct a wrong movement. Furthermore, the proposed method offers several advantages, such as: (i) enhanced technique, in the sense that the system helps users perform exercises correctly; (ii) real-time feedback, where users receive immediate feedback while performing exercises, helping them correct errors as they occur rather than afterward; (iii) injury prevention, since by providing guidance on improper movements the system can help prevent injuries caused by poorly executed exercises; and (iv) accessibility, as the system can be used by a wide range of people in different locations, making correct exercise more accessible.
Finally, further studies could combine the multiple suggestions from each segment sensor into a single one. By combining these multiple outputs into unified feedback, it is possible to evaluate a complex system using smaller evaluations, as in a divide-and-conquer approach. Also, as a next step, a deeper comparison of the proposed method with the state of the art in data reconstruction for time series should be carried out.