1. Introduction
In recent years, home weight training has received considerable attention, and wearable products have become important tools for improving the quality of exercise. In terms of exercise data quantification, with the rapid development of Wearable Fitness Trackers (WFTs) and Smartphone Pedometer Apps (SPAs), people are monitoring their health through heart rate, fitness, and sleep tracking [1].
In 2004, Kong [2] developed an inertial navigation system algorithm based on a low-cost inertial measurement unit, implementing a nonlinear data fusion algorithm using a Distribution Approximation Filter (DAF). Gyroscopes are often used for system positioning in many applications, but they exhibit error drift, a low-frequency drift phenomenon over time [3], which accumulates into very large orientation errors, so an attitude positioning system cannot rely on a gyroscope alone to achieve accurate orientation. The causes of integration drift fall into two categories: the linear growth of drift due to the integration of the original offset in the signal, and the integration of noise in the signal [4].
In the long term, neither the accelerometer nor the magnetometer drifts. Feeding accelerometer and magnetometer data into a Kalman Filter (KF) can compensate for the gyroscope drift, so the combination can be used to determine orientation; however, both sensors exhibit more pronounced short-term noise [5].
Many studies have used data from inertial measurement units (IMUs) for human posture recognition using machine learning models. Tahir [6] used a one-dimensional Hadamard wavelet transformation and a feature extraction method based on one-dimensional LBP (local binary patterns) to compute valuable eigenvalues from acceleration and angular velocity signals, and used a sequential minimization algorithm and a random forest method to classify activities in the USC-HAD (USC human activity dataset) and IMSB (IM-Sporting Behaviors) datasets. No IMU-based device was presented in this work. Ezio Preatoni [7] trained and tested different supervised machine learning models on 14 participants, including the k-NN (k-nearest neighbors) algorithm and SVM (Support Vector Machine), using accelerations and angular velocities to classify four fitness exercises, with a highest accuracy of 96.4% for one sensor and 97.6% for two sensors. No exercise prescription variables, such as exercise sets, training capacity, workout capacity, training period, or explosive power, were considered. Many research papers have combined data from cell phones and wearable sensors to automatically extract features using deep learning algorithms, effectively saving manual feature processing [8]. Sui [9] noted that many papers feed IMU data into a CNN (Convolutional Neural Network) to achieve posture recognition, and he proposed another CNN algorithm to calculate the pace trajectory; in his work, the developed IMU device was worn on a shoe for foot motion tracking. The literature [10] proposes a real-time segmentation and classification algorithm that can identify physical exercise and works well in both indoor and outdoor environments, achieving 95% classification accuracy for five indoor and outdoor exercises. No exercise prescription variables were considered in [10]. Koskimäki and Siirtola [11] trained a machine learning model with accelerometer data to classify 36 types of fitness exercises. The data were collected using a GeneActiv 3D accelerometer, which can be worn on the wrist or attached with straps to other body locations.
In many cases, IMUs have been used to develop multimodal algorithms. Hausberger et al. [12] designed a smart fitness device that uses acceleration and angular velocity signals read by IMUs to perform automatic motion tracking and analyze weight training exercises. Their device was mounted on the dumbbell, and the repetitions of the dumbbell exercise were detected by a peak/valley detector. Other exercise prescription variables were not considered in this work. Seong and Choi [13] used an IMU to implement a new type of computer interface operating system that performs hand motion tracking and click gesture recognition using Euler angles and acceleration data, respectively.
Chen et al. [14] proposed a deep learning framework based on a graph convolutional network (GCN) to predict the position of each joint of the human lower limbs during motion in 3D space using signals collected from five IMUs (above both ankles, above both knees, and at the waist). Abdelhady et al. [15] proposed a system of IMUs on the lower legs, IMUs on the thighs, and pressure sensors in both feet to measure the kinetics and kinematics of the lower limbs. Wang et al. [16] used an IMU on each leg and a motion camera on each calf to estimate the asymmetry of stride length and gait: the root mean square error of stride length was 5.4 cm and the asymmetry was 3.0% in healthy subjects, versus 9.6 cm and 14.7%, respectively, in subjects with abnormal gait.
Zou et al. [17] proposed a low-cost smart glove equipped with an inertial unit. Their system could recognize 15 sets of training programs and detect three common nonstandard behaviors: case 1, the overall speed of a repetition is too fast or too slow; case 2, the speeds of the outward and backward processes are not balanced; and case 3, the repetitions are not stable and show noticeable shakes. A segmentation process detected the start and end points of each repetition with the help of a double-adaptive-thresholds-based method. Then the half dynamic time warping (Half-DTW), which is not a deep learning approach, was proposed to measure the similarity between the unrecognized activity segmentation data (accelerometer and gyroscope readings) and the activity templates. The Half-DTW requires two steps: first, an outward-backward segmentation operation divides a repetition into outward and backward parts by locating the turning point (TP, stage changing point); second, the corresponding matching operation outputs the identified activity type. Several exercise quality indices were calculated as well: standard degree and execution duration for the assessment of a single repetition, and smoothness and continuity for the assessment of group repetitions. The average accuracy of Half-DTW over 15 exercise types is 96.07%, with an average time delay of 103.76 ms. Rodriguez et al. [18] presented a wearable device using three inertial units with triaxial accelerometers, gyroscopes, and magnetometers to measure the orientation of three sections of the spine for low back pain therapy. By integrating the absolute and relative orientations from the sensors based on the so-called direction cosine matrix (DCM), it estimates the posture of the back in real time and uses a fuzzy system to control a vibration unit that alerts the user to correct his/her back posture.
From the literature reviewed above, most papers focus on recognition algorithms to improve prediction accuracy, and the number of sensors used for collecting data and identifying exercise types ranges from one to five or more. Depending on the application, some sensors are attached to the lower limbs and some to the arms. However, to the best of the authors’ knowledge, very few papers extend IMU motion tracking results to produce helpful exercise prescription variables for fitness lovers. This paper integrates hardware and software to develop an attitude and heading reference system (AHRS) device using only two IMUs (one on the wrist band and the other in the sleeve near the elbow) and an intelligent algorithm that performs motion tracking and posture recognition and calculates exercise prescription variables for weight training. These exercise prescription variables, including exercise items, repetitions, sets, training capacity, workout capacity, and training period, provide meaningful information for weight trainers/trainees. A fitness App with the function of recording exercise performance is also developed for the smartphone, so that users can check their exercise performance and review their historical exercise records in the user interface in real time.
2. Materials and Methods
2.1. Overall Study Design
As shown in Figure 1, a wearable device was developed in which a microcontroller unit (MCU) collects six-axis data and quaternion data from two IMUs (one on the wrist band and the other on the upper arm near the elbow) as well as heart rate data. In addition to the IMU and heart rate module, the MCU in the wrist band is equipped with a real-time clock (RTC) and Bluetooth Low Energy (BLE). The RTC provides the current time for the MCU, and BLE transmits the IMU six-axis (6 degrees of freedom, 6-DoF) data, quaternion, heart rate, and time as data packets to the smartphone. By integrating the existing techniques of motion tracking [19] and posture recognition [12], a weight training quantization algorithm is devised for the smartphone to convert motion data into prescription variables. The smartphone App communicates with the MCU through BLE using pre-defined commands. The App then parses the data received through Bluetooth (BT data packet) into IMU 6-DoF data, quaternion, and heart rate. The IMU 6-DoF data are used for posture recognition based on a pre-trained machine learning model (ML model) installed on the phone. This ML model is trained offline on a PC by feeding it recorded IMU 6-DoF data prepared in csv files. Once trained, the ML model is installed on the smartphone for posture recognition to create posture labels. The parsed quaternion is used for motion tracking to calculate arm orientation vectors and to find the elbow and wrist positions, which are then fed to Unity’s Animation System (avatar, virtual humanoid) [20]. The parsed heart rate and the motion of the 3D virtual humanoid are displayed on the screen of the developed smartphone App. The posture label with the positions of elbow and wrist is input to the exercise state analysis, and the exercise event stream (including exercise state, rest state, repetitions, rest time between sets, and exercise event) is obtained for the exercise event process. The exercise prescription variables are then computed by the exercise event process, based on the input dumbbell weight and the exercise event stream, shown on the phone screen of the App, and stored in an SQLite database. Besides being transmitted over BLE, the IMU 6-DoF, quaternion, and heart rate data collected by the MCU can also be transmitted by wire and stored as csv files in the cloud or on a PC.
The proposed artificial intelligence weight training quantization algorithm converts the raw motion data into exercise prescription variables step by step based on mathematical and physical theories, as shown in Figure 2. This part has two procedures, one for posture recognition and one for motion tracking, which use the IMU six-axis data and quaternion data of the wrist and arm to calculate the posture label type and the joint positions of the arm. The outputs of the two procedures are fed first to the exercise state analysis for calculating repetitions, state, and event, and then to the exercise event process for calculating the exercise prescription variables.
2.2. Motion Tracking Algorithm
In this paper, we use two nine-axis sensors (Adafruit BNO055 Absolute Orientation Sensors), as shown in the top photo of Figure 3, one on the wrist and the other on the upper arm near the elbow (marked by red circles). The bottom left photo shows the sensor coordinates transformed to world coordinates, and the bottom right photo shows the world coordinates transformed to Unity coordinates. Once aligned with the user’s orientation, the Unity avatar shows the same movement as the real person. In the built-in microcontroller of the sensor, a world coordinate system is defined with the Y-axis facing the north pole of the geomagnetic field and the Z-axis perpendicular to the earth plane, and the corresponding quaternion $q$ is generated based on this world coordinate system. $q$ represents the action of rotating the sensor from the base state to the current attitude, which can be further expressed as Equation (1):

$q = \left(\cos\frac{\theta}{2},\; u_x \sin\frac{\theta}{2},\; u_y \sin\frac{\theta}{2},\; u_z \sin\frac{\theta}{2}\right)$  (1)
The physical meaning of $q$ is to rotate a three-dimensional vector $\vec{v}$ around a unit rotation axis $\vec{u}$ by $\theta$ degrees into a new vector $\vec{v}\,'$, where $\vec{v}$ is represented by the quaternion $p = (0, \vec{v})$ and $\vec{v}\,'$ is represented by $p' = q\,p\,q^{-1}$. When the quaternion at the initial state, $q_0$, is $(1, 0, 0, 0)$, the object’s attitude coincides with the world coordinates; this is the base state. In Figure 3, the coordinates of the nine-axis sensor in the base state coincide with the world coordinates, and the attitude vectors of the upper arm and forearm, $\vec{v}_{ua}$ and $\vec{v}_{fa}$, align with the X-axis and $-Y$-axis of the sensor coordinates, respectively, and can be expressed as quaternions as in Equations (2) and (3):

$p_{ua} = (0, 1, 0, 0)$  (2)
$p_{fa} = (0, 0, -1, 0)$  (3)
The quaternions $q_{ua}$ and $q_{fa}$ of the upper arm and forearm sensors are multiplied with the arm orientation vectors as in Equations (4) and (5), respectively, to obtain the quaternions $p'_{ua}$ and $p'_{fa}$ of the arm after rotation:

$p'_{ua} = q_{ua}\, p_{ua}\, q_{ua}^{-1}$  (4)
$p'_{fa} = q_{fa}\, p_{fa}\, q_{fa}^{-1}$  (5)

where $q^{-1}$ denotes the quaternion conjugate (the inverse of a unit quaternion), and the orientation vectors of the arms in the world coordinates, Equations (6) and (7), are the vector parts of the rotated quaternions:

$\vec{v}^{\,w}_{ua} = \mathrm{vec}(p'_{ua})$  (6)
$\vec{v}^{\,w}_{fa} = \mathrm{vec}(p'_{fa})$  (7)
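The rotations of Equations (4) and (5) amount to a few lines of quaternion arithmetic. The following Python sketch (function and variable names are ours) implements the Hamilton product and the rotation $p' = q\,p\,q^{-1}$, together with the base-state arm vectors of Equations (2) and (3):

import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    """Conjugate of q, which equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def rotate(v, q):
    """Rotate the 3-vector v by unit quaternion q via q * (0, v) * q^-1."""
    p = np.concatenate(([0.0], v))
    return quat_mul(quat_mul(q, p), quat_conj(q))[1:]  # return the vector part

# Base-state arm vectors in the sensor frame (Equations (2) and (3)):
v_upper = np.array([1.0, 0.0, 0.0])   # upper arm along +X
v_fore = np.array([0.0, -1.0, 0.0])   # forearm along -Y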
The world coordinates are then transformed to the virtual Unity coordinates. In this paper, the Y and Z axes of the world coordinates are aligned to the Unity coordinates, and Equations (8) and (9) give the arm pose vectors $\vec{v}^{\,U}_{ua}$ and $\vec{v}^{\,U}_{fa}$ in the virtual three-dimensional space (virtual Unity).
There is no wearable sensor on the torso, so it is not possible to determine the orientation of the model in the Unity coordinates. Therefore, in this paper the three-dimensional arm movement trajectory is simulated under the assumption that the user’s torso does not turn. To achieve this, the orientation of the subject has to be corrected to the same direction as the front of the 3D avatar (virtual humanoid, Unity) before the movement. First, the user straightens the arm towards the front of the body, and the horizontal angle $\gamma$ between $\vec{v}^{\,U}_{ua}$ and the Unity Z-axis (in front of the avatar) is calculated; this is the angle between the user’s and the avatar’s facing directions. Simultaneously, $\vec{v}^{\,U}_{ua}$ and $\vec{v}^{\,U}_{fa}$ are rotated in the ground plane (orthogonal to the Unity Y-axis) by $\gamma$ degrees to obtain the Unity arm orientation vectors after aligning the real and virtual sides. Finally, based on the user’s arm lengths $L_{ua}$ and $L_{fa}$, the positions of the elbow and wrist joints relative to the origin of the Unity coordinates, $P_{elbow}$ and $P_{wrist}$, are calculated using Equations (10) and (11):

$P_{elbow} = L_{ua}\,\vec{v}^{\,U}_{ua}$  (10)
$P_{wrist} = P_{elbow} + L_{fa}\,\vec{v}^{\,U}_{fa}$  (11)

The wrist position $P_{wrist}$ is then used in the inverse kinematic model to update the 3D avatar’s movement.
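A minimal Python sketch of this alignment and joint-position step is given below, assuming the heading correction is a rotation by $\gamma$ about the vertical (Unity Y) axis as described above; the function and variable names are ours:

import numpy as np

def align_and_locate(v_upper_u, v_fore_u, gamma, l_upper, l_fore):
    """Rotate the Unity-frame arm vectors by gamma about the Y-axis, then
    compute elbow and wrist positions relative to the shoulder
    (a sketch of Equations (10) and (11))."""
    c, s = np.cos(gamma), np.sin(gamma)
    r_y = np.array([[c, 0, s],
                    [0, 1, 0],
                    [-s, 0, c]])          # rotation about the vertical Y-axis
    v_upper_aligned = r_y @ v_upper_u
    v_fore_aligned = r_y @ v_fore_u
    p_elbow = l_upper * v_upper_aligned    # Equation (10)
    p_wrist = p_elbow + l_fore * v_fore_aligned  # Equation (11)
    return p_elbow, p_wrist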
2.3. Posture Recognition Algorithm
The algorithm is based on a supervised deep neural network model (including an LSTM layer) trained to classify four weight training posture types. Each sample contains six features (three-axis accelerations and angular velocities measured by the wrist IMU), and the sampling rate is 100 Hz. The output layer is a dense layer with softmax as the activation function; its output size is (1, 4), so the model can classify four labels. The number of epochs is set to 30 and the batch size to 64. The Python code for this model, based on Keras + TensorFlow, is as follows:
import tensorflow as tf

NUM_GESTURES = 4  # rest, bicep curl, lateral raise, shoulder press

input_layer = tf.keras.layers.Input(shape=(128, 6))  # 128 samples x 6 IMU features
hidden1 = tf.keras.layers.LSTM(128)(input_layer)
hidden2 = tf.keras.layers.Dropout(0.5)(hidden1)
hidden3 = tf.keras.layers.Dense(6, activation='relu')(hidden2)
output = tf.keras.layers.Dense(NUM_GESTURES, activation='softmax')(hidden3)
model = tf.keras.Model(inputs=input_layer, outputs=output)
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
history = model.fit(inputs_train, outputs_train, epochs=30, batch_size=64,
                    verbose=0, validation_data=(inputs_validate, outputs_validate))
After training, the pre-trained model is converted to a TensorFlow Lite model file (.tflite) with a file size of 412 KB. TensorFlow Lite is a machine learning solution for mobile and embedded devices, with APIs available for the Android and iOS mobile development platforms. Developers can convert TF (TensorFlow) models into TF Lite files and import them into their mobile projects to realize edge computing locally. In this paper, we use Flutter to develop the Android mobile application and use the TensorFlow Lite package tflite_flutter (version: 0.8.0) from the Flutter Package repository to import the model file into the development project. The six-axis data from the wrist IMU are used as input to the pre-trained model in the local App to classify the user’s motion posture (three pre-defined dumbbell movements and the rest state). The flowchart of the algorithm running in the App is shown in Figure 4.
When the LSTM model completes an identification, the oldest 32 samples are removed and the latest 32 samples are appended to the input data (128 samples in total) for the next LSTM inference. The output of the LSTM model is a probability vector r of dimension (1, p), where p is the number of postures to be identified. The element with the largest probability (>0.5) in this vector indicates the most likely posture label at that moment. If the same label is identified n consecutive times, it is used as the output of the posture recognition algorithm, as sketched below.
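A minimal sketch of this decision logic follows; since the paper leaves n unspecified, N_CONSECUTIVE = 3 is an assumed placeholder, and the class name is ours:

import numpy as np

N_CONSECUTIVE = 3  # assumed value of n; the paper does not state it

class LabelSmoother:
    """Emit a posture label only after it wins n consecutive windows."""
    def __init__(self, n=N_CONSECUTIVE, threshold=0.5):
        self.n = n
        self.threshold = threshold
        self.last_label = None
        self.count = 0

    def update(self, probs):
        """probs: probability vector r from the LSTM softmax output."""
        label = int(np.argmax(probs))
        if probs[label] <= self.threshold:   # no confident winner this window
            self.last_label, self.count = None, 0
            return None
        if label == self.last_label:
            self.count += 1
        else:
            self.last_label, self.count = label, 1
        return label if self.count >= self.n else None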
In order to collect motion data for training the neural network, the participants were divided into a training group (control group) and a test group. The datasets for training and testing were collected by an experiment instructor, who asked the participants to put on the wearable device developed in this paper and perform four postures (rest, bicep curl, lateral raise, and shoulder press) at their own pace. The instructor used the developed smartphone App to record the participants’ exercise data (at a sampling rate of 100 Hz). The data were stored in the local database, converted to a csv file, and uploaded to the storage space of the cloud database at the end of sampling. The local computer then downloaded the cloud data for training.
The dimension of the input data is (1, 128, 6), and the IMU six-axis data are packaged in steps of 32 samples, as shown in Figure 5. Each set of data has 128 consecutive samples. Once a set has been processed, the first (oldest) 32 samples are discarded and 32 new samples are appended to the end to create the next data set, until the samples are exhausted.
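This packaging step can be expressed compactly. The following sketch (our code, not the paper’s) converts a recorded (T, 6) IMU stream into overlapping 128-sample windows with a 32-sample hop, mirroring Figure 5:

import numpy as np

def make_windows(samples, window=128, hop=32):
    """Split a (T, 6) IMU stream into overlapping (window, 6) segments
    advancing by hop samples."""
    samples = np.asarray(samples)
    if len(samples) < window:
        return np.empty((0, window, samples.shape[1]))
    starts = range(0, len(samples) - window + 1, hop)
    return np.stack([samples[s:s + window] for s in starts])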
2.4. Exercise Prescription Variables Algorithm
The algorithm integrates the wrist position and posture label to calculate the exercise state, repetitions, exercise period, and rest intervals between sets, and triggers the corresponding exercise events to calculate the exercise prescription variables. Beyond weight training, it can also be used for the assessment of neuromuscular qualities and for targeted resistance training. Picerno [21] provided the exercise operator with a list of good practice rules concerning the instrumented assessment of muscle strength during isoinertial resistance exercises, such as movement execution, load assignment, number of lifts within a set, and rest intervals between sets. Weakley et al. [22] listed feedback variables and their effects on acute training performance, such as frequency, quantitative vs. qualitative feedback, conscientiousness, motivation and competitiveness, intrinsically vs. extrinsically motivated athletes, and encouragement. Good exercise prescription variables for weight training can provide correct feedback to the exercise operator and motivate him/her to keep up good exercise habits. As shown in Figure 6, the current state flag is determined by the state of the previous round and the current output of the posture label, where a flag of True means the user is in the exercise state; otherwise, the user is in the rest state. In the exercise state, the movement event is triggered, and the peak/valley detection algorithm is used to count the repetitions of the exercise. If a new repetition is detected, the repetition event is triggered; otherwise, the algorithm returns to the state flag identification. In the rest state, if a new set of exercise begins, the data for the previous set are updated, the rest time between sets is computed, and the start event is triggered.
In the exercise state, the movement event is triggered, and the wrist position P relative to the shoulder is input to the peak/valley detection algorithm to calculate the repetitions and the displacement in the vertical direction, as shown in Figure 7. The shoulder is set as the origin, and the lengths of the upper arm and forearm are assumed to be 1, so the current value of the wrist position P in the Z-axis coordinate perpendicular to the ground plane lies between −2 and +2. In state 0, P is stored into the array, and the algorithm checks whether P is greater than the lower threshold B1. If so, the minimum value $P_{min}$ in the array is found, and the algorithm enters state 1. In state 1, if P is greater than the upper threshold B2, the algorithm enters state 2. In state 2, the value P is recorded into the array until P falls below the threshold B2; the maximum value $P_{max}$ in the array is then found, and the algorithm enters state 3. In state 3, if P is less than B1, a repetitive motion has been completed. The displacement of the current repetition along the Z-axis, $h = 2 \times (P_{max} - P_{min})$, and the time difference $D_t$ from the last completed repetition to the current one (i.e., the time spent on the current repetition) are calculated, and the work is then computed (h multiplied by the acceleration of gravity g; see Equation (13)). After that, the repetition event is triggered and the algorithm returns from state 3 to state 0. Different exercise modes should use different B1 and B2 values, and different users have different thresholds for the same exercise item; therefore, B1 and B2 should be adjusted according to the user’s exercise mode and habits. Table 1 shows the thresholds of three exercise modes for one subject.
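The four-state detector of Figure 7 can be sketched as follows; this is our reading of the state transitions, with class and variable names chosen for illustration:

class RepCounter:
    """Four-state peak/valley repetition detector (sketch of Figure 7).
    b1/b2 are the lower/upper thresholds on the vertical wrist coordinate P."""
    def __init__(self, b1, b2):
        self.b1, self.b2 = b1, b2
        self.state = 0
        self.buf = []
        self.p_min = self.p_max = 0.0
        self.reps = 0

    def update(self, p):
        """Feed one wrist Z-coordinate; return the displacement h when a
        repetition completes, otherwise None."""
        self.buf.append(p)
        if self.state == 0 and p > self.b1:
            self.p_min = min(self.buf)       # valley of the current repetition
            self.buf, self.state = [], 1
        elif self.state == 1 and p > self.b2:
            self.buf, self.state = [], 2
        elif self.state == 2 and p < self.b2:
            self.p_max = max(self.buf)       # peak of the current repetition
            self.buf, self.state = [], 3
        elif self.state == 3 and p < self.b1:
            self.reps += 1
            h = 2 * (self.p_max - self.p_min)  # vertical displacement of this rep
            self.buf, self.state = [], 0
            return h
        return None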
As shown in Figure 8, each exercise event enters the application programming interface (API) of the event handler in a prescribed format. Depending on the type of event and the parameters it carries, the program executes the corresponding computation of the exercise prescription variables and refreshes the user interface.
When the start event occurs, the number of sets (unit: times), the rest time between sets (unit: seconds), and the load weight of the set (mass of the dumbbell input by the user, unit: kg) are updated.
When a movement event occurs, the current wrist position (in centimeters) is updated and the user interface is refreshed.
When a repetition event occurs, the training capacity, work, and power are calculated; the number of repetitions, cumulative repetition period (in seconds), cumulative training capacity, cumulative work, and maximum power (explosive power) are updated; and the user interface is refreshed.
Table 2 lists the sources and formulas of the exercise prescription variables. The load weight is manually entered into the App by the user; the number of repetitions is calculated by the peak/valley detection algorithm and brought in by the repetition event; the training capacity is the load weight multiplied by the number of repetitions in the current set, as in Equation (12); the work is the load weight multiplied by the acceleration of gravity and the repetition displacement, as in Equation (13); the average power is the work in the current repetition divided by the time $D_t$ spent on the repetition, as in Equation (14); the burst (explosive) power is the maximum average power in the current set; and the rest time between sets is the time interval from the previous set to the current set:

$TC = W_{load} \times reps$  (12)
$W = W_{load} \times g \times h$  (13)
$P_{avg} = W / D_t$  (14)
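A short sketch of Equations (12)-(14) in Python, with variable names of our choosing:

GRAVITY = 9.81  # acceleration of gravity, m/s^2

def repetition_metrics(load_kg, h_m, dt_s, reps_in_set):
    """Per-repetition exercise prescription variables (Equations (12)-(14))."""
    training_capacity = load_kg * reps_in_set   # Equation (12), kg x reps
    work = load_kg * GRAVITY * h_m              # Equation (13), joules
    avg_power = work / dt_s                     # Equation (14), watts
    return training_capacity, work, avg_power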
2.5. Verification Experiments
The designed system was validated by two types of verification tests. The first type controls a stepper motor to rotate the designed IMU and compares the rotation angle obtained from the IMU with the rotation angle of the controlled stepper motor. As shown in Figure 9, the nine-axis sensor was fixed above the center axle of a stepper motor before the experiment was conducted. The experiment used a microcontroller to output the control signal to the drive circuit of the motor, which drove the nine-axis sensor to rotate forwards from 0 to 270 degrees in increments of 18 degrees and then backwards from 270 to 0 degrees in decrements of 18 degrees (0° ≤ θ_motor ≤ 270°). For every 18 degrees of motor rotation (∆θ_motor = 18°), the microcontroller records the quaternion q of the nine-axis sensor. Equations (15) and (16) are used to derive the angle θ of the current nine-axis sensor rotation about the central axle u of the motor (the z-axis of the nine-axis sensor is parallel to u). The rotation angle of the stepper motor, θ_motor, was used as the control group, and the angle measured by the nine-axis sensor, θ_imu, was used as the experimental group. The mean absolute error between the two sets of samples was calculated.
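Since the sensor’s z-axis is parallel to the rotation axis, the rotation angle can be recovered from the scalar part of the unit quaternion, θ = 2·arccos(w). The following sketch (our helper functions, not the paper’s code) illustrates this computation and the mean absolute error:

import numpy as np

def rotation_angle_deg(q):
    """Rotation angle encoded by a unit quaternion q = (w, x, y, z)."""
    w = np.clip(q[0], -1.0, 1.0)        # guard against rounding outside [-1, 1]
    return np.degrees(2.0 * np.arccos(w))

def mae(theta_motor, theta_imu):
    """Mean absolute error between motor angles and IMU-derived angles."""
    return np.mean(np.abs(np.asarray(theta_motor) - np.asarray(theta_imu)))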
The second type of verification test uses a camera with an open-source optical human body pose estimation system, MediaPipe Pose [23], to calculate the position of the wrist and the angles of the upper arm and forearm with respect to the Z-axis; these calculated data are compared with those of the designed system to verify the reliability of the motion tracking algorithm. MediaPipe Pose is a machine learning-based human pose tracking algorithm that can identify and mark 33 human body feature points from RGB images of the whole body, as shown in Figure 10. An experimental environment was set up in which a camera was placed facing a wall with a white board. We connected the camera to the computer, ran the Python script in an Anaconda environment, captured the RGB image of the human body through the camera, input the image to the Python API provided by MediaPipe Pose to find the left shoulder, left elbow, and left wrist coordinates of the human body ($P^M_{shoulder}$, $P^M_{elbow}$, and $P^M_{wrist}$ in Figure 10, at landmark locations 11, 13, and 15), and marked these coordinates on the original image.
In the M system (MediaPipe Pose estimation system), a rectangular coordinate system is defined, and the position of each feature point in the image is denoted as $P^M_i$; the left shoulder, elbow, and wrist are denoted as $P^M_{shoulder}$, $P^M_{elbow}$, and $P^M_{wrist}$, respectively. The two-dimensional unit vectors $\vec{v}^{\,M}_{fa}$ and $\vec{v}^{\,M}_{ua}$ of the forearm and upper arm are calculated using Equation (17):

$\vec{v}^{\,M}_{ua} = \dfrac{P^M_{elbow} - P^M_{shoulder}}{\lVert P^M_{elbow} - P^M_{shoulder} \rVert}, \qquad \vec{v}^{\,M}_{fa} = \dfrac{P^M_{wrist} - P^M_{elbow}}{\lVert P^M_{wrist} - P^M_{elbow} \rVert}$  (17)

The timestamp and arm orientation vectors are saved in a list and exported as a csv file at the end of each sampling session.
In this paper, a new coordinate system is constructed as the common coordinate system between the M system and the presented system. The two original coordinate systems are aligned with this new coordinate system, and the M system is used as the reference for observing the differences in the motion tracking algorithm of the presented system.
The left shoulder is set as the origin of the new coordinate system $O^N$, the arm orientation vectors of the two original coordinate systems are transferred into the new coordinate system (the upper arm is denoted as $\vec{v}^{\,N}_{ua}$ and the forearm as $\vec{v}^{\,N}_{fa}$), and the relative positions of the elbow and wrist joints in the new coordinate system, $P^N_{elbow}$ and $P^N_{wrist}$, are derived by Equation (18):

$P^N_{elbow} = L_{ua}\,\vec{v}^{\,N}_{ua}, \qquad P^N_{wrist} = P^N_{elbow} + L_{fa}\,\vec{v}^{\,N}_{fa}$  (18)
The experiment uses the two-dimensional plane (Y-Z plane) as the reference plane. The camera lens surface is parallel to the Y-Z plane, and the subject’s back is against the wall (Y-Z plane). During the experiment, the subject moves the arm in the frontal plane (Y-Z plane), and samples are taken simultaneously by both systems (sampling rate: 30 Hz for the M system and 100 Hz for this system). The time (in ms) is recorded, and the elbow positions, wrist positions, and the angles α and β are calculated for both systems, where α is the angle between the left upper arm vector and the Z-axis, and β is the angle between the forearm vector and the Z-axis. The direction from the user’s back to the user’s front is the −X-axis direction in the common coordinate system, with vector (−1, 0, 0).
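The angles α and β can be computed as the angle between each arm vector and the Z-axis; a small sketch follows (our helper, not from the paper):

import numpy as np

def angle_to_z_deg(v):
    """Angle (in degrees) between a 3D arm vector and the Z-axis, used for the
    alpha/beta comparison between the two systems."""
    z = np.array([0.0, 0.0, 1.0])
    cos_t = np.dot(v, z) / np.linalg.norm(v)
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))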