1. Introduction
Modern robots, such as rehabilitation robots, try to actively interact with human partners towards achieving a common task. This gives rise to the emerging field of research named human robot interaction (HRI), aiming at maximizing the performance, efficiency, and applicability of coupled human–robot systems [
1]. It can be defined as “a field of study dedicated to understanding, designing, and evaluating robotic systems for use by or with humans” [
2]. The system performance, however, is largely limited by the output of human intent learning and prediction [
3]. In many human-robot systems, continuous human motion intents instead of a limited number of classified motion patterns have to be identified and further reconstructed as control commands to the robot so that the robot could match human motion intents and perform efficient assistances [
4]. Thus, how to determine the human motion intent in terms of continuous joint angle has become the key issue. Two main related problems remain: one is the choice of the learning model for continuous joint angle prediction, the other is the selection of input signals that could reflect both the human motion dynamics and the human-robot interaction state well.
As for the modelling methods for the forward dynamics of human neuromusculoskeletal system, one choice is the application of phenomenological or semi-phenomenological models. Clear biomechanical interpretations can be found in these kinds of models. Although most of the existing models, such as the model proposed by Zajac [
5] and Zahalak [
6,
7], are based on the phenomenological models derived from Hill’s work [
8] or the more complex biophysical model of Huxley [
9], the transformation from muscle activation to muscle force and human kinematics has not been fully understood yet. In addition, the determination of muscle geometry, such as the muscle-tendon moment arm, is especially difficult for a living person and thus makes the estimation prone to error [
10,
11]. Problems also exist associated with the application of phenomenological models when the workload of the model parameter identification for each subject is extremely heavy [
12,
13].
The other approach that bypasses the above problems and limitations is to use machine learning methods. Moreover, with the fast development of artificial intelligence, artificial neural networks (ANNs) are now widely applied in the control of human-exoskeleton systems [
1]. In addition, the Gaussian process (GP) is also receiving more attention in the field [
14,
15,
16]. The GP model has superior flexibility when the parameters that directly determine the model performance can be adjusted through learning [
17,
18]. As for the mapping structure, one widely applied approach is to map muscle activation or neural signals to the joint angle [
19]. When the history of the movements reflects the inherent dynamics of human motion [
20], joint angle could also be predicted using autoregressive models. To combine the advantages of both and achieve continuous, intuitive and naturalistic understanding of human motion intent, nonlinear autoregressive with exogenous inputs (NARX) would be a better choice and has begun to be applied in the field of joint angle prediction [
4,
21,
22]. Its application combines the non-linear spatio-temporal correlation structure of natural human movements with muscle-driven control signals to exploit the best of both worlds [
23].
However, such approaches treat the entire central nervous system as a black box with no biomechanical interpretations. Thus, the application is still a matter of great debate.
The ideal solution to such a contradiction should be one model that not only integrates the state-of-art machine learning techniques that provide good adaptivity for practical applications, but also has corresponding biomechanical interpretations. When the state model and measurement model of state space model (SSM) could be used to interpret the related biomechanical mechanisms, the SSM, which also integrates GP and ANN, may be the promising choice.
For the input signal selection, electromyography (EMG) signals which reflect the muscle activation have been widely utilized in the fields of joint angle learning and prediction [
19,
24]. The EMG signal provides easy access to physiological processes that cause the muscle to generate force and produce movement. Different from intramuscular EMG which uses a fine needle electrode inserted into the muscle, surface EMG (sEMG) is more commonly used due to its outstanding characters including noninvasiveness and safety [
25,
26]. As the quality of sEMG signals is very sensitive to muscle fatigue, sweat, and changing electrode characteristics, a proper processing method is needed and the application of on-line learning approaches, such as the GP and ANN as just introduced, would be preferred. In addition, sEMG signals are also used as the measurement output in SSMs for human joint angle prediction, such as the redundant subsectors of the sEMG-vector [
4] and other extracted EMG features, including waveform length (WL) and zero crossing (ZC) [
27]. Meanwhile, interactive force, which is a direct end-effector reflection in human-exoskeleton systems, can be regarded as an important indicator of transparency between a human and robot [
28,
29]. Although the fusion of EMG and force would improve the performance of joint angle prediction, interactive force has rarely been used in combination with EMG signals in joint angle prediction [
30]. More importantly, how the interactive force works in the human-exoskeleton system model, i.e., its biomechanical generation mechanism, is often ignored. Thus, the influence of the interactive force on the human-exoskeleton system is always overlooked.
The motivation of this paper is to develop a GP-integrated SSM for continuous joint prediction and to provide biomechanical interpretations for its state model and measurement model in human-exoskeleton integrated applications. To the best of our knowledge, the interactive force has rarely been used in combination with EMG signals in the continuous joint prediction of human-exoskeleton systems. In the proposed model, interactive force is applied as the measurement model output and the EMG signal is used as the state model input to indicate muscle activation. Hill’s muscle model and semi-phenomenological (SP) model [
31] are adopted to acquire the forward dynamics of joint motion for the state model and the human-machine interaction mechanism interpretation for the measurement model, respectively. In practical applications, the GP-based NARX model and back propagation neural network (BPNN) are applied to provide better flexibility and adaptivity. Compared with the existing SSMs for continuous joint angle prediction [
4,
27], our main contributions lie in three aspects. First, interactive force is fused with the EMG signal in the SSM instead of using extracted EMG signal features alone. When the impacts of interactive force on the human-exoskeleton system are considered, the performance of joint angle prediction would improve. Second, in addition to the use of machine learning techniques, Hill’s muscle model and SP model are used to derive the biomechanical mechanisms for the state model and measurement model of the proposed SSM. Third, as a non-parametric model, GP is integrated with traditional BPNN to combine the relative strengths of both the parametric model and non-parametric model. Corresponding experimental results demonstrate the validity and superiority of the method.
3. GP-NRAX- and BPNN-Integrated SSM for Joint Angle Prediction
Equations (10) and (24) give the concrete biomechanical models for the forward dynamics of the knee joint and the interactive force generation. However, the corresponding workload and computation complexity, including the acquisition of kinematic and kinetic data [
11] and the model calibration process using an inverse dynamic model [
32], are considerable. Moreover, characteristics of the human-exoskeleton system always change with time, which makes the model that includes the time-varying interactive force more difficult to be utilized in practical applications [
34,
37]. When fast prediction, real-time control and self-learning of parameters are often required in the development of biomechanical models, the use of machine learning method would be the best solution so far to meet the adaptivity requirements of the human-exoskeleton models [
16].
To find out the relationship among the variables more clearly, we first change the time step in (24) from
to
, which does not affect the correctness of the equation. Then, put aside the constant parameters and turn the concrete expressions in (10) and (24) into a more general form:
where
is the muscle activation,
is the vector containing the joint angle and its velocity,
is the interactive force between the exoskeleton and human and
is their angle difference. Given that the angle different
is a time-varying variable which cannot be acquired directly from muscle activation and joint angle as stated in
Section 2.2.3, its variation can be merged into the model adaptivity and learning process through the machine learning approaches. The aims are to reduce the impact of this uninterpretable variable on the model development and find out the relationships among the variables under study more intuitively. Thus, a clear formulation of state space model can then be acquired:
where
and
are the process noise and the measurement noise, respectively.
Based on (26), GP and BPNN are adopted for the state model and measurement model, respectively, to combine the relative strengths of both the non-parametric model and parametric model. The reason to apply GP and BPNN here is that BPNN has already been widely applied in the fields of human motion intent learning [
1], and GP offers superior flexibility compared with traditional modelling methods in learning [
16]. There have been successful applications of GP in human motion intent learning and prediction [
22,
23,
44]. Also, unscented Kalman filter (UKF) is used to develop the close loop of the state space model [
45].
In the proposed SSM, sEMG signals, which are thought to be caused by the motor unit action potentials (MUAPs) spreading on muscle fiber, are selected as the vector
which is the input to the state model to indicate muscle activation. SEMG signals are selected due to their outstanding characters including noninvasiveness and safety [
25,
26]. The interactive force
is the measurement model output, and joint vector
whose elements include the joint angle and the angular velocity is treated as the state vector. Specifically, the GP-NARX with one-step time-delay feedback defined as
is applied in the state model, and the BPNN defined as
is applied as the measurement model in (26). The state space model framework is shown in
Figure 3.
The state space model defined in (26) is first trained offline by a set of data, including the sEMG, joint angle, angular velocity and interactive force. Then, based on the trained GP-NARX and BPNN, the UKF algorithm is utilized to estimate the joint angle from sEMG signals and interactive force through the proposed SSM, as illustrated in
Figure 4. The closed-loop nature of UKF makes the prediction more accurate. Then, the trained model is used for joint angle prediction.
4. Experimental Study and Discussion
4.1. Experimental Setup and Data Processing
Experiments were conducted on the self-developed lower limb exoskeleton [
46], as illustrated in
Figure 5a. The applied robot was the combination of an exoskeleton robot and a bed which can turn from horizontal to 50°. As shown in
Figure 5b, the shank part of the exoskeleton is composed of a shank base and a sliding sleeve. This mechanical structure is adopted so that the contact point between the user and the exoskeleton can be adjusted according to personal requirements before experiments. Once decided, the position will be fixed during the running process when the change of contact point in the process may give rise to reduced prediction performance. A force sensor is planted at the contact point and a gas bag is used to ensure the degree of comfort of the subject.
Six healthy men aged from 23 to 30 participated in the experiments. This work was approved by the Ethics Committee of Shanghai Jiao Tong University. All the subjects gave written informed consent prior to the participation in the study. Each subject was required to make voluntary and random knee flexions and extensions to drive the exoskeleton when the subject’s thigh was kept in parallel with the ground and the position of the knee joint was fixed. About seven sessions, in each of which the subject randomly flexed and extended his leg for about 40 s, were conducted for each participant. Every two sessions were separated by a rest of 3 min to avoid muscle fatigue. More sessions would be added when poor data collection quality or obvious errors could be observed and the corresponding results were excluded from the final analysis. The data of the first four sessions of random movements were used for training the motion prediction models, and the remaining three sessions were used as the test data for performance evaluation. During the whole process, the exoskeleton worked under the interactive force sensing and control mode. In this mode, the exoskeleton was driven by the motors to follow human motion in real-time. Through detecting the interactive force between the human and exoskeleton, the trajectory prediction and compensation were implemented. The detailed force control strategy can be found in [
47].
Disposable bi-polar Ag-AgCl surface EMG electrodes were used to collect the EMG signals from the RF. In our experiments, only the EMG signals from RF instead of those from both RF and BF were used to feature the muscle activation. The corresponding muscles are illustrated in
Figure 6. There were two main reasons that the EMG signals from BF were not included in our model. The first was that the activation of BF stayed low and did not change much during the experiment process when the subject’s thigh was kept parallel to the ground [
46]. This meant that the BF contributed little to the joint variation. The second reason is that reducing the number of inputs would largely increase the computational efficiency, especially when GP was included in our model [
16,
17].
The force signal was processed by a 3rd-order digital Butterworth filter of 0.01-1 Hz. The EMG signals were processed under the sampling frequency of 2000 Hz. A 4th-order digital Butterworth filter of 10-500 Hz and a notch-rectifier of 50 Hz (to remove the power noise) were applied to the EMG signals. The robust and stable energy kernel method was applied to process the EMG signals and extract the muscle activation when it integrated both the merits of RMS and MPF and reflected muscle recruitment level and firing rate at the same time [
42]. The extracted feature of EMG signals was indicated as EK in the following. As only the EMG signal from RF was included, the dimension of the feature vector was 1.
4.2. Model Learning and Input Choice
Motion models were trained for each subject. The GP-NARX model was first trained using
as the inputs and
as the output. The model structure is illustrated in
Figure 7. SE was chosen as the kernel function when the stationary nature and the smoothness assumption of this radial basis function regression model coincide with the joint angle variation. Specifically, the automatic relevance determination squared exponential (SEard) kernel was used to provide different hyperparameters for the three different inputs, including the extracted feature of EMG, the joint angle and the angular velocity. The model hyperparameters were optimized through minimizing the differentiable multivariate function using the conjugate gradients.
The BPNN included three layers (i.e., one input layer, one hidden layer, and one output layer). The input to the BPNN in the measurement model was the joint vector
including the joint angle and angular velocity, and the interactive force
was the model output. Thus, the dimension of the input vector was 2 and that of the output vector was 1. Sigmoid function (logsig) and linear function (purelin) were selected as the transfer functions of hidden and output layers, respectively, as illustrated in
Figure 8. In order to decide the numbers of hidden layers and neurons, the correlation coefficient was used to evaluate the network performance. After many attempts, the number of hidden layers was chosen as 1 and the number of hidden neurons was chosen as 5 to achieve a tradeoff between computational efficiency and accuracy. The Levenberg-Marquardt method was used to optimize the net.
In fact, the joint angle instead of the joint vector containing both the joint angle and the angular velocity was also tested as the input to the measurement model. However, the performance was far from satisfactory even when more hidden layers and hidden neurons were used. It proved, from one aspect, that the biomechanical interpretation of human-exoskeleton interaction did provide a good foundation for the development of the measurement model. With the trained GP-NARX model and BPNN, the state space model formulated in (26) can be obtained and developed using the UKF algorithm.
The way in which system models learn the mapping relations from neural signals to human joint angle varies, but ANN is one of the most widely used approaches [
1]. Thus, different from the BPNN for the measurement model of SSM, a traditional BPNN, which used the extracted EMG features and the interactive force as the inputs and the joint vector containing the joint angle and angular velocity as the output, was also built for performance comparison. In consequence, the dimensions of the input vector and output vector were both 2. It was also found through numerous attempts that the increase of the numbers of hidden neurons and hidden layers did not improve the performance much. Thus, one hidden layer with 5 hidden neurons was also adopted here. Its architecture is illustrated in
Figure 9.
4.3. Experimental Resuls
To better evaluate the performance of the proposed SSM, different evaluation criteria including the correlation coefficient (CC), root mean square error (RMSE), and the goodness of fit in terms of normalized mean square error (NMSE) are utilized. The goodness of fit in terms of NMSE is abbreviated as FIT in the following.
The Pearson CC of predicted values
and actual values
in terms of the covariance of
and
is defined as:
where
and
are the standard deviation of
and
, respectively.
The RMSE of predicted values
with actual values
for
different predictions is given by:
The cost function of the goodness of fit is chosen as NMSE in our application:
One typical segment of experimental results of Subject 1 (abbreviated as S1) using the proposed SSM is illustrated in
Figure 10.
Referring to
Figure 10a, the muscle activation could be well represented by the extracted EMG feature acquired from the energy kernel method. In addition, clear mapping relations could be observed between the interactive force and the joint angle. It also proved the rationality and necessity to include interactive force in the proposed model for the improvement of joint prediction performance.
The joint angle and angular velocity prediction performance comparisons between the SSM and their actual values are illustrated in
Figure 10b,c. A Wilcoxon signed rank test was used to calculate the corresponding
p-value. As can be seen, although there existed local biases, the predicted values of both the angle and angular velocity could approximate their actual values well. Judging from different evaluation criteria, the SSM performances of the angle prediction were as follows: the CC was 0.9944, the RMSE was 0.0433 and the FIT was 0.9881 (
). The SSM prediction performances of the angular velocity were as follows: the CC was 0.9890, the RMSE was 0.0215 and the FIT was 0.9750 (
). It could be found that the prediction performances were satisfactory, which reflected the feasibility and effectiveness of the proposed method.
The same group of data from S1 was also processed with the traditional BPNN whose structure is illustrated in
Figure 9. The joint prediction performance comparison is shown in
Figure 11a. It could be found that the angle prediction of BPNN could still approximate the actual joint angle. However, compared with the results acquired from the proposed SSM, performances acquired from BPNN were worse when the CC was 0.9727 (decreased by 2.18%), the FIT was 0.9264 (decreased by 6.24%), and the RMSE increased to 0.1077 from 0.0433 (
). Although the overall prediction trend of BPNN was consistent with the actual angle variation when both the FIT and CC were acceptable, visible fluctuations could not be overlooked in
Figure 11a. The reason that led to the fluctuations may be the nonstationarity of the EMG signal when it was one of the inputs to the model, and different from the proposed SSM, the BPNN was not able to make the EMG signal sufficiently smoothed. Such fluctuations might be fatal to the potential applications in the prediction control of exoskeleton systems when the position prediction error would give rise to huge interactive forces between human and exoskeleton. These extra loads would not only bring about extra burdens to the users, but may also lead to secondary damage when used in rehabilitation applications. The fluctuations in the BPNN prediction results also demonstrated the superiority of the proposed SSM over traditional BPNN when no visible fluctuation could be found in the SSM prediction results.
The joint prediction performance comparison between SSM and BPNN of S2 is illustrated in
Figure 11b. Although the SSM performance of S2 was worse than that of S1 when bigger derivations could be observed, the results were still acceptable when the FIT was 0.9309, the CC was 0.9759, and the RMSE was 0.0941 (
). In addition, it could be found that performance of S2 using SSM was also better than that using BPNN. Compared to SSM, the corresponding values of BPNN were as follows: the FIT was 0. 8229 (decreased by 11.61%), the CC was 0.9372 (decreased by 3.97%), and the RMSE increased to 0.1525 from 0.0941 (
). Same as the results of S1 displayed in
Figure 11a, there were visible fluctuations which seemed to be even more intense. That would be unacceptable when applied in human-exoskeleton systems.
Although not illustrated in the figure, the angular velocity prediction performances of S2 using the proposed SSM were also satisfactory when the corresponding CC was 0.9879, the FIT was 0.9687 and the RMSE was 0.0252 (). Compared to SSM, the performances of BPNN were worse when the CC was 0.8751 (decreased by 11.42%), the FIT was 0.7415 (decreased by 23.46%) and the RMSE increased to 0.0738 from 0.0252 (). These results also demonstrated the advantage of the proposed SSM over traditional BPNN in the angular velocity prediction. Given that our emphasis was put on the joint angle prediction, the angle velocity comparison was not further discussed.
In addition to the comparisons in
Figure 11, the prediction performance comparisons of all the participated subjects including S3 to S6 were also conducted. The results of RMSE, CC and FIT were calculated using (27)–(29) and are displayed in
Table 1. It could be clearly observed that obvious improvements were achieved through the application of the proposed SSM instead of BPNN when all the values of CC and FIT were increased and values of RMSE were decreased for the 6 participated subjects. Individual differences could also be observed. In addition to the nonstationarity of EMG signals as the input, the differences were probably caused by the fact that the acquired signal qualities from different subjects differed greatly. Especially for S3, the performances using BPNN were poor. Various attempts, including increasing the numbers of hidden layers and neurons in each layer, have been made to improve the performance and acquire the best-fit network. However, the prediction results using BPNN were still unacceptable. The reasons behind still needed further investigations. In contrary, the corresponding SSM performances were much more acceptable, which demonstrated the robustness of the proposed SSM. Except for S3, the BPNN performances of other participated subjects were all acceptable. When we ignore the results of S3, the improvements when applying the proposed SSM instead of BPNN were achieved regardless of individual differences. The results demonstrated the advantages of the proposed SSM over traditional BPNN when applied in joint angle prediction, and the effectiveness of the SSM was also proven.
4.4. Further Discussions
Another issue that has to be made clear is whether the loads on the knee joint have impacts on the effectiveness of the proposed model. There are always loads on the knee joint during most of the cases, such as the situation when a standing position is adopted. Considering that the applied exoskeleton robot was the combination of a bed and an exoskeleton robot, there would be loads on the knee joint when there was an angle between the bed and the ground during the experiments. Although the bed could not turn to the vertical position because of safety considerations and equipment restrictions, the load impacts on the proposed model could be verified.
Thus, to demonstrate that the method did work, not only in the sitting down position, but was also suitable for other applications, the upper part of the bed was risen to about 30 degrees to the ground to find out the load impacts on the effectiveness of the proposed model. A small angle was adopted for safety considerations. Additional experiments were conducted. The resulting CC values were all above 0.95, which were similar to the main parts of the experiments when the sitting down position was adopted. The satisfactory results demonstrated that based on the inputs of extracted EMG features and interactive force, the proposed SSM model was able to predict the joint angle continuously when there were loads on the knee joint.
In fact, joint angle learning and prediction has long been a research hotspot. Instead of using GP-based models, traditional neural networks were usually applied [
1]. For instance, Ding et al. developed a general state-space model, in which a traditional neural network-based NARX model was integrated with BPNN [
4]. They have done a very good job in continuously and accurately estimating the multi-joint movements through the use of a nonlinear filtering algorithm. The CC values of the four predicted angles were 0.95, 0.96, 0.96 and 0.95, which were all close to 1. Compared with their work, we concentrated on the learning and prediction of one joint angle instead of four, and improved the performance, as illustrated in
Table 1. It should be recognized that the decrease of the joint angle number needed to be predicted might be one reason for the improvements. There might also be other possible reasons that contributed to the improvements. One reason was that the addition of interactive force in our model would improve the performance of joint angle prediction [
30]. The use of nonredundant and redundant sub-vectors of EMG signals were innovative in their study, but the information acquired only from the EMG signals was still limited. The other reason might be the use of GP when the integration of parametric model and non-parametric model of GP would improve the performance [
48]. In addition, they also made the comparison between the state space model and the use of NARX model alone. A conclusion was reached that the estimation using the integration of NARX and BPNN over the prediction with the NARX model alone had two improvements: (1) the estimation was closer to the “true” angles, and the larger errors have been corrected by the measurement equation; (2) the UKF made the estimation smoother, which would be beneficial for the stable control of the robots. The second conclusion could also be used to explain the differences observed in our experimental results shown in
Figure 11 when the fluctuations were only found using BPNN instead of the proposed SSM integrating GP-NARX with BPNN.
To take the advantages of GP-based models, Michele et al. also used a GP-NARX model that combined the context of pervious hand movement with instantaneous muscle activity to predict future hand movements [
23]. In addition to EMG signals, mechanomyographic (MMG) signals were also used to indicate hand muscle activation in the forearm. Referring to their results, the improvement of fusing MMG with EMG in the GP-NARX model was relatively small. Thus, it was reasonable to only include EMG in the angle prediction model when taking the computational cost into consideration. GP was also used to estimate the ankle angle and torque based on shank angular velocity and angle without information from neural signals [
16]. In their study, they showed that it was not necessary to acquire speed determination, gait percent identification, lookup tables, or switching rules to continuously mapping the input to the output.
However, all the above studies treated the whole neural control process of human movement as a black-box problem without providing any biomechanical interpretations. In contrast, there are also studies which used Hill’s muscle model to develop the forward dynamics of human neuromusculoskeletal system for angle prediction [
27]. Similar to all the model-based methods, parameter identification, which is very time-consuming, must be done in advance. Thus, the proposed method which is a trade-off between engineering application using machine learning techniques and theoretical analysis providing the biomechanical interpretations might be the optimal choice.
As stated, one major difference of our method is to bring GP into the state space model. When GP is generally considered equivalent in terms of performance to conventional artificial neural network approaches, it is in fact formally equivalent to a neural network with an infinite number of units in the hidden layer [
49]. However, these unique benefits of GP over a standard neural network come at a cost. GP regression is numerically complex and the heavy computational cost is always its major drawback. This leads to the fact that the computational efficiency of the current method seriously limits its adoption in real-time applications. To overcome this limitation, numerous researchers have suggested different sparse approximations, including the subset of data (SoD) approximation, fully independent training conditional (FITC) approximation, and partially independent training conditional (PITC) approximation. One successful application in joint angle prediction can be found in [
44] where an online sparse GP algorithm is proposed to learn the human motion intent. In addition, the integration with an evolving system would be another choice [
22]. Also, constant advances in embedded processing have resulted in GP being successfully deployed in real-time activity recognition in smartphone-based settings using state-of-the-art low-power embedded processors [
50,
51]. How to integrate these techniques into the proposed method and increase the computation efficiency to meet the real-time requirements in the prediction control of exoskeleton system would be the focus of our future research.