Antagonistic Feedback Control of Muscle Length Changes for Efficient Involuntary Posture Stabilization

Iwamoto, Masami; Atsumi, Noritoshi; Kato, Daichi

doi:10.3390/biomimetics9100618


by Masami Iwamoto *, Noritoshi Atsumi and Daichi Kato

Human Science Research-Domain, Toyota Central R&D Labs., Inc., 41-1, Yokomichi, Nagakute, Aichi 480-1192, Japan

* Author to whom correspondence should be addressed.

Biomimetics 2024, 9(10), 618; https://doi.org/10.3390/biomimetics9100618

Submission received: 9 September 2024 / Revised: 5 October 2024 / Accepted: 9 October 2024 / Published: 11 October 2024

(This article belongs to the Special Issue Computer-Aided Biomimetics: 2nd Edition)


Abstract:

Simultaneous and cooperative muscle activation results in involuntary posture stabilization in vertebrates. However, the mechanism through which more muscles than joints contribute to this stabilization remains unclear. We developed a computational human body model with 949 muscle action lines and 22 joints and examined muscle activation patterns for stabilizing right upper or lower extremity motions at a neutral body posture (NBP) under gravity using actor–critic reinforcement learning (ACRL). Two feedback control models (FCM), muscle length change (FCM–ML) and joint angle differences, were applied to ACRL with a normalized Gaussian network (ACRL–NGN) or deep deterministic policy gradient. Our findings indicate that among the six control methods, ACRL–NGN with FCM–ML, utilizing solely antagonistic feedback control of muscle length change without relying on synergy pattern control or categorizing muscles as flexors, extensors, agonists, or synergists, achieved the most efficient involuntary NBP stabilization. This finding suggests that vertebrate muscles are fundamentally controlled without categorization of muscles for targeted joint motion and are involuntarily controlled to achieve the NBP, which is the most comfortable posture under gravity. Thus, ACRL–NGN with FCM–ML is suitable for controlling humanoid muscles and enables the development of a comfortable seat design.

Keywords:

involuntary posture stabilization; muscle length feedback control; actor-critic reinforcement learning; neutral body posture; human musculoskeletal model

1. Introduction

Vertebrates possess numerous muscles despite having relatively few joints. The coordinated activation of these muscles enables involuntary posture stabilization. Muscles exhibit various motion types, such as flexors and extensors, adductors and abductors, and invertors and extortors. For instance, the tibialis anterior muscle facilitates both flexion and adduction, which complicates the understanding of posture control under gravity.

Neurology and robotics research focuses on how humans control muscles for posture stabilization and intentional movements. Many studies have explored the control of multiple muscles to achieve specific postures or motions. Linear feedback gain control methods, such as proportional–integral–derivative control [1,2] and optimal control algorithms with cost functions [3], estimate muscle activation levels using human musculoskeletal models. These methods predetermine each muscle’s percentage contribution or activation delay for joint motions to realize target postures and movements. However, they struggle to achieve robust motion control in unexpected static or dynamic environments, as muscle contributions and delays can vary with different postures and motions.

Reinforcement learning (RL) has recently emerged as an effective approach for action selection in unknown environments. The actor–critic method, believed to model RL in the basal ganglia [4], has been utilized to estimate muscle activation levels for desired postures or motions in various static and dynamic environments [5,6,7,8,9]. This method simulates involuntary posture stabilization using human musculoskeletal models with anatomical muscle lines [5,8,9] and repeated practice with a mechanistic two–link arm model comprising two joints and six muscles [6,7]. However, learning muscle control policies for posture stabilization under gravity is cost–intensive. Additionally, estimating activation levels of muscles with multiple motion types is challenging, as the method requires the precategorization of muscles into motion types like flexors or extensors.

In our previous study, we developed a musculoskeletal model of an adult male’s right upper limb using MATLAB (MathWorks, Natick, MA, USA) and an actor–critic reinforcement learning (ACRL) system. The actor and critic networks were implemented using a normalized Gaussian network (ACRL–NGN) or deep deterministic policy gradient (DDPG), a widely–used efficient RL algorithm with a deterministic policy, alongside a feedback control model (FCM) of muscle length change (FCM–ML) or joint angle differences (FCM–JA) [10]. Unlike FCM–JA, FCM–ML does not need any precategorization of muscles for motion types. The system efficiently learns involuntary posture stabilization of the right upper extremity under gravity. However, the efficiency and effectiveness of FCM–ML, FCM–JA, ACRL–NGN, and DDPG in achieving desired postures under unexpected static or dynamic environments remain unclear. The feasibility of applying the system to other body parts and involuntary control of multiple muscles for posture under gravity remains obscure as well.

Thus, in the current study, parametric investigations using ACRL–NGN, DDPG, FCM–ML, and FCM–JA were performed for involuntary posture stabilization of the right upper and lower extremities to investigate the learning efficiency and effectiveness of ACRL algorithms and FCMs and discuss how human muscles involuntarily control posture under gravity. Further, we discuss the contribution of the efficient and effective muscle control method for involuntary posture stabilization to the field of clinical or sports science and robotics.

2. Materials and Methods

2.1. Muscle Control System of the Whole–Body Musculoskeletal Model

Figure 1 illustrates the muscle control system of a whole–body musculoskeletal model employing the ACRL–NGN algorithm with FCM–ML. The model, developed in MATLAB, included 949 muscle lines (excluding the face) and 22 articular joints. In the muscle controller depicted in Figure 1, the state variable $s(t)$ was calculated using the joint angle $θ(t)$ and joint angular velocity $\dot{θ}(t)$ at time $t$. Consequently, 112 state variables were defined for 56 joint motions across the entire body. Anthropometric data for the model were sourced from the 3D male anatomy model (Zygote, American Fork, UT, USA). This study examined posture stabilization of five joint motions in the right upper extremity: elbow flexion–extension (ELV) and inversion–eversion (ELW), shoulder internal–external rotation (SHU), flexion–extension (SHV), and inversion–eversion (SHW). It also analyzed five joint motions in the right lower extremity: knee flexion–extension (KNV) and inversion–eversion (KNW), hip internal–external rotation (HPU), flexion–extension (HPV), and inversion–eversion (HPW). According to anatomical texts (e.g., [11]), the initial angle ranges were as follows: ELV (−135° to 17°), ELW (0° to 180°), SHU (−120° to 40°), SHV (−170° to 50°), SHW (−90° to 70°), KNV (−10° to 150°), KNW (−40° to 10°), HPU (−80° to 30°), HPV (−140° to 20°), and HPW (−50° to 40°). The muscle moment arm, determined by the muscle’s line of action relative to joint position, is indicated by muscular force and joint motion. The musculoskeletal model’s biofidelity was validated by comparing predicted muscle moment arms with experimental data from human subjects. Previous studies validated 17 major right upper extremity muscles (Table 1) against experimental data from the literature [10,12,13,14,15,16] and 13 major right lower extremity muscles (Table 1) against data from earlier research [17,18,19,20]. Figure A9 and Figure A10 depict the validation results for the 13 right lower extremity muscles.

In ACRL–NGN, the critic and actor networks were implemented using the NGN and continuous–time formulation of RL [4,21,22]. The state value function $V(s(t))$ in the critic and the action value function $a_{m}(s(t))$ for the $m$th muscle in the actor are represented by Equations (1) and (2):

$$V(s(t)) = \sum_{i=1}^{K} v_{i}\, b_{i}(s(t)), \tag{1}$$

$$a_{m}(s(t)) = \sum_{i=1}^{K} w_{i}^{m}\, b_{i}(s(t)), \tag{2}$$

where $v_{i}$ and $w_{i}^{m}$ are the weights of the critic and actor, respectively; $b_{i}(s(t))$ denotes the base function and is represented by Equation (3):

$$b_{i}(s(t)) = \frac{B_{i}(s(t))}{\sum_{l=1}^{K} B_{l}(s(t))}, \quad B_{i}(s(t)) = \exp\left[-\sum_{k=1}^{n} \left(\frac{s_{k}(t) - c_{k}}{σ_{b}^{k}}\right)^{2}\right], \tag{3}$$

where $c_{k}$ denotes the coordinates ($dθ$, $d\dot{θ}$) of the center of the activation function, and $σ_{b}^{k}$, $K$, and $n$ represent the constant, the number of base functions, and the number of states $s(t)$, respectively. In this study, $K = 144$, $n = 112$, and $σ_{b}^{k} = 26.5$ and $163.6$ for joint angle and joint angular velocity, respectively. The angle difference $dθ$ between the current angle and target angle ranged from $-70^{\circ}$ to $70^{\circ}$, and the angular velocity difference $d\dot{θ}$ between the current angular velocity and target angular velocity ranged from $-300^{\circ}$/s to $300^{\circ}$/s. The target angles of five joint motions were determined using the neutral body posture (NBP) in spaceflight [23], where $θ_{ELV_{trg}} = -88.0^{\circ}$, $θ_{ELW_{trg}} = 0^{\circ}$, $θ_{SHU_{trg}} = -39.0^{\circ}$, $θ_{SHV_{trg}} = -36.0^{\circ}$, and $θ_{SHW_{trg}} = 36.0^{\circ}$ in the right upper extremity and $θ_{KNV_{trg}} = 43.0^{\circ}$, $θ_{KNW_{trg}} = 20.0^{\circ}$, $θ_{HPU_{trg}} = -12.0^{\circ}$, $θ_{HPV_{trg}} = -52.0^{\circ}$, and $θ_{HPW_{trg}} = -9.0^{\circ}$ in the right lower extremity. Since the musculoskeletal model had $-30^{\circ}$ of the ELV angle initially, the target angle of ELV was modified from $-58^{\circ}$ (the original angle) to $-88^{\circ}$ to achieve the NBP in this study. The target angular velocities of the five joint motions were set to zero for postural stabilization of the right upper and lower extremities.
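The normalized Gaussian network of Equations (1)–(3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the `(K, n)` layout of the basis centers, and the toy 3 × 3 grid over a single joint motion are assumptions made here for readability (the study itself uses $K = 144$ and $n = 112$, and does not state how the centers are placed).

```python
import numpy as np

def ngn_basis(s, centers, sigma):
    """Normalized Gaussian basis b_i(s) of Equation (3).

    s       : state vector, shape (n,)
    centers : basis centers, shape (K, n); one row per basis function (assumed layout)
    sigma   : widths sigma_b^k, shape (n,)
    """
    z = (s[None, :] - centers) / sigma[None, :]
    log_B = -np.sum(z ** 2, axis=1)      # log of the unnormalized B_i(s)
    log_B -= log_B.max()                 # numerical safety; cancels after normalization
    B = np.exp(log_B)
    return B / B.sum()

def critic_value(s, v, centers, sigma):
    """Equation (1): V(s) = sum_i v_i b_i(s)."""
    return float(v @ ngn_basis(s, centers, sigma))

def actor_output(s, W, centers, sigma):
    """Equation (2): a_m(s) = sum_i w_i^m b_i(s), with W of shape (M, K)."""
    return W @ ngn_basis(s, centers, sigma)

# Toy setup (hypothetical): one joint motion, so n = 2 (d_theta, d_theta_dot), K = 9.
grid_angle = np.linspace(-70.0, 70.0, 3)     # deg, range quoted in the text
grid_vel = np.linspace(-300.0, 300.0, 3)     # deg/s, range quoted in the text
centers = np.array([[a, w] for a in grid_angle for w in grid_vel])
sigma = np.array([26.5, 163.6])              # widths quoted in the text

s = np.array([10.0, -50.0])
b = ngn_basis(s, centers, sigma)
V = critic_value(s, np.ones(len(centers)), centers, sigma)
a = actor_output(s, np.zeros((4, len(centers))), centers, sigma)
```

Because the basis functions are normalized to sum to one, a critic with all weights equal to one returns $V = 1$ for any state, which is a convenient sanity check.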

The weights of the critic and actors, $v_{i}$ and $w_{i}^{m}$, were updated using Equations (4) and (5):

$$Δv_{i} = α_{V}\, δ(t)\, e_{i}(t), \tag{4}$$

$$Δw_{i}^{m} = α_{a}\, δ(t)\, n_{m}(t) \exp(-0.5\, V(s(t)))\, \frac{\partial a_{m}(s(t))}{\partial w_{i}^{m}}, \tag{5}$$

where $α_{V}$ and $α_{a}$ denote the learning rates of the critic and the actor, respectively. $n_{m}(t)$ is the white noise function that was randomly determined for each muscle $m$ from 0 to 1 at each time step to explore the control output. The symbol $δ(t)$ represents the temporal difference error and is described by Equation (6):

$$δ(t) = r(s(t)) + γ V(s(t+1)) - V(s(t)) = r(s(t)) + \left(1 - \frac{Δt}{τ}\right) V(s(t+1)) - V(s(t)), \tag{6}$$

where $γ$ denotes the discount factor ranging from 0 to 1, and $τ$ denotes a time constant of evaluation; $e_{i}(t)$ represents the eligibility trace of the $i$th critic weight, which is updated using Equation (7):

$$\dot{e}_{i}(t) = -\frac{1}{κ}\, e_{i}(t) + \frac{\partial V(s(t))}{\partial v_{i}}, \tag{7}$$

where the symbol $κ$ denotes the time constant of the eligibility trace. In this study, $α_{V} = 0.3$, $α_{a} = 0.11$, $τ = 0.05$, and $κ = 0.05$.
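As a rough illustration of Equations (4), (6), and (7), the critic update can be written as a single discrete step. The sketch below makes three assumptions that the text does not spell out: the eligibility trace is integrated with a forward Euler step of size $Δt = 0.01$ s, the trace is updated before the weights, and the gradient of the linear critic of Equation (1) with respect to its $i$th weight is simply $b_{i}(s(t))$. The function names are hypothetical.

```python
import numpy as np

DT, TAU, KAPPA = 0.01, 0.05, 0.05    # time step and time constants quoted in the text
ALPHA_V = 0.3                         # critic learning rate quoted in the text
GAMMA = 1.0 - DT / TAU                # discount factor implied by Equation (6)

def td_error(r, v_now, v_next):
    """Equation (6): temporal difference error."""
    return r + GAMMA * v_next - v_now

def critic_step(v, e, b_now, b_next, r):
    """One critic update for weights v and traces e, both of shape (K,).

    b_now, b_next are the basis activations b_i(s(t)) and b_i(s(t+1)).
    """
    delta = td_error(r, v @ b_now, v @ b_next)
    e = e + DT * (-e / KAPPA + b_now)    # Euler step of Equation (7) (assumed scheme)
    v = v + ALPHA_V * delta * e          # Equation (4)
    return v, e, delta

# Toy call with two basis functions and a unit reward.
v1, e1, d1 = critic_step(np.zeros(2), np.zeros(2),
                         np.array([1.0, 0.0]), np.array([0.0, 1.0]), 1.0)
```

With $Δt = 0.01$ s and $τ = 0.05$ s, the implied discount factor is $γ = 1 - 0.01/0.05 = 0.8$.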

The reward function $r(s(t))$ is defined by Equation (8):

$$r(s(t)) = \sum_{i=1}^{NJM} \left( \exp\left(-\left(\frac{dθ_{i}}{σ_{r}}\right)^{2}\right) + \exp\left(-\left(\frac{d\dot{θ}_{i}}{σ_{r}}\right)^{2}\right) \right) - c \sum_{m=1}^{N} u_{m}(t)^{2}, \tag{8}$$

where $NJM$ $(= 5)$, $c$, $σ_{r}$, $u_{m}(t)$, and $N$ denote the total number of joint motions, the weight, a constant, the muscle activation level of the $m$-th muscle, and the total number of muscles, respectively. In this study, $c = 0.01$ and $σ_{r} = 100.0$. The activation level of the $m$-th muscle $u_{m}(t)$ is obtained using Equation (9):

$$u_{m}(t) = u_{m}^{max}\, \mathrm{sig}\left(-A \left(\sum_{i=1}^{K} w_{i}^{m}\, b_{i}(s(t)) + \exp(-0.5\, V(s(t)))\, n_{m}(t)\right) - B\right), \tag{9}$$

where $u_{m}^{max}$ is the maximum activation level of the $m$-th muscle, $\mathrm{sig}()$ denotes the sigmoid function, $A$ and $B$ are constants of the sigmoid function, and $n_{m}(t)$ is the white noise function. In this study, $u_{m}^{max} = 1.0$, $A = 1.0$, and $B = -4.0$.
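The reward of Equation (8) and the activation of Equation (9) can be sketched as follows. One point is an explicit assumption: the text does not define $\mathrm{sig}()$, so the sketch takes $\mathrm{sig}(x) = 1/(1 + e^{x})$, the convention under which a larger actor output yields a larger activation given the minus sign in front of $A$. If the authors use the standard logistic function instead, the sign of the argument would need to be flipped. Function names are hypothetical.

```python
import numpy as np

SIGMA_R, C = 100.0, 0.01     # reward constants quoted in the text
A, B = 1.0, -4.0             # sigmoid constants quoted in the text

def sig(x):
    # ASSUMPTION: sig(x) = 1 / (1 + exp(x)); written with tanh to avoid overflow.
    return 0.5 * (1.0 - np.tanh(0.5 * x))

def reward(d_theta, d_theta_dot, u):
    """Equation (8): tracking terms over joint motions minus an activation penalty."""
    tracking = np.sum(np.exp(-(d_theta / SIGMA_R) ** 2)
                      + np.exp(-(d_theta_dot / SIGMA_R) ** 2))
    effort = C * np.sum(u ** 2)
    return float(tracking - effort)

def activation(a, V, noise, u_max=1.0):
    """Equation (9): actor output a plus value-scaled exploration noise, squashed by sig."""
    return u_max * sig(-A * (a + np.exp(-0.5 * V) * noise) - B)

# Five joint motions exactly on target, all muscles silent: reward is 2 * NJM = 10.
r_best = reward(np.zeros(5), np.zeros(5), np.zeros(8))
# Same posture with four fully active muscles: penalty of 0.01 * 4.
r_effort = reward(np.zeros(5), np.zeros(5), np.ones(4))
u_rest = activation(0.0, 0.0, 0.0)
```

The factor $\exp(-0.5\, V(s(t)))$ shrinks the exploration noise as the estimated value of the state grows, so exploration fades where the controller already performs well.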

2.2. Simulation Conditions for Learning to Stabilize a Target Posture

We implemented the ACRL–NGN algorithm using Python 3.7 to simulate the posture stabilization of five joint motions of the right upper or lower extremities under gravity. The degrees of freedom, except for these five joint motions, were constrained, and the joint angles calculated using MATLAB (R2013b) were the output, with a time step of 0.01 s. For robust RL in a model–free manner, initial joint angles were randomly selected from specified ranges. In each trial, the motion of the extremity was computed using MATLAB under gravity, with the musculoskeletal model set to the initial angles. The muscle activation level $u_{m}(t)$ from the actor at time $t$ was input to the corresponding muscle, and the joint motions and muscle lengths were calculated using MATLAB. The state $s(t)$, comprising $dθ$ and $d\dot{θ}$ for each joint motion, was used to calculate the value function $V(s(t))$ and reward function $r(s(t))$ to determine the muscle activation level for $t + 1$. Each trial ended at 2.0 s, the simulation’s termination time, and the process was repeated until 300 initial angles were reached.
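The trial setup described above can be sketched for the right upper extremity as follows. The angle ranges and NBP targets are those quoted in Section 2.1; uniform sampling, the sign convention $dθ$ = current minus target, and the function names are assumptions, since the text only says that initial angles were "randomly selected from specified ranges". The MATLAB dynamics step is not reproduced here.

```python
import numpy as np

# Initial angle ranges and NBP targets (degrees) quoted in Section 2.1.
RANGES_DEG = {"ELV": (-135.0, 17.0), "ELW": (0.0, 180.0), "SHU": (-120.0, 40.0),
              "SHV": (-170.0, 50.0), "SHW": (-90.0, 70.0)}
TARGET_DEG = {"ELV": -88.0, "ELW": 0.0, "SHU": -39.0, "SHV": -36.0, "SHW": 36.0}

DT, T_END = 0.01, 2.0
N_STEPS = int(round(T_END / DT))     # control steps per trial

def sample_initial_angles(rng):
    """Draw one initial posture; uniform sampling is an assumption."""
    return {j: rng.uniform(lo, hi) for j, (lo, hi) in RANGES_DEG.items()}

def make_state(theta, theta_dot):
    """State s(t): angle and angular-velocity differences to the target (velocity target is zero)."""
    d_theta = np.array([theta[j] - TARGET_DEG[j] for j in TARGET_DEG])
    d_theta_dot = np.array([theta_dot[j] for j in TARGET_DEG])
    return np.concatenate([d_theta, d_theta_dot])

rng = np.random.default_rng(0)
theta0 = sample_initial_angles(rng)
s0 = make_state(theta0, {j: 0.0 for j in TARGET_DEG})
```

With a 0.01 s step and a 2.0 s termination time, each trial spans 200 control steps, and the five controlled joint motions contribute ten entries to the state.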

We used ACRL–NGN or DDPG as the learning algorithm and FCM–ML or FCM–JA as the feedback control model to determine the maximum value of each muscle activation level. We used a DDPG algorithm with the actor–critic method [24], implemented by modifying the Python code of Morvanzhou (https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/9_Deep_Deterministic_Policy_Gradient_DDPG/DDPG.py, accessed on 8 October 2024). The learning rates of the actor and critic were set to 0.0001, while $τ$ was set to 0.01. The FCM–ML is described based on the length rate $Δl_{m} = (l_{m} - l_{m0}) / l_{m0}$ of each muscle $m$ using Equation (10):

$$u_{m}^{max} = \mathrm{sig}(-500.0 \cdot Δl_{m} + 5.0), \tag{10}$$

where $l_{m}$ and $l_{m0}$ are the current and equilibrium lengths of each muscle $m$, respectively. In this study, the equilibrium length of each muscle was determined as the length of each muscle when the entire body had an NBP. The FCM–JA is described based on the angle differences $dθ_{jm}$ in each joint motion $jm$ and the percentage contribution $pc_{m}^{jm}$ of each muscle $m$ to each joint motion $jm$, determined by Equation (11):

$$u_{m}^{max} = \mathrm{sig}\left(\sum_{jm=1}^{NJM} pc_{m}^{jm} \cdot dθ_{jm}\right), \tag{11}$$

where $pc_{m}^{jm}$ takes negative values for flexions of ELV, SHV, and HPV, the extension of KNV, the inversion of ELW, eversions of SHW, HPW, and KNW, and external rotations of SHU and HPU, while positive values were found for the flexion of KNV, extensions of ELV, SHV, and HPV, inversions of SHW, HPW, and KNW, the eversion of ELW, and internal rotations of SHU and HPU. For example, the FCM–JA is described using Equations (A1)–(A20) in Appendix A for the right upper extremity and Equations (A21)–(A49) in Appendix A for the right lower extremity by referring to anatomical texts (e.g., [11]). The absolute values of percentage contribution $pc_{m}^{jm}$ were set to 0.5 and 0.2 for the agonist muscles and synergist muscles, respectively. The value 0.5 was determined based on volunteer test data on muscle strength and muscle activations of the flexors and extensors of the elbow joint during isometric exercise performance, as reported by [25]. The value 0.2 was determined by considering the ratios of activation levels of synergist muscles to those of agonist muscles obtained from experimental test data using electromyography (EMG) [5].
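The two feedback control models of Equations (10) and (11) differ mainly in what they need as input, which a short sketch makes concrete. As before, $\mathrm{sig}(x) = 1/(1 + e^{x})$ is an assumption (the text does not define the sigmoid), as are the use of degrees for $dθ$, the convention $dθ$ = current minus target, the function names, and the two-muscle contribution matrix used in the example.

```python
import numpy as np

def sig(x):
    # ASSUMPTION: sig(x) = 1 / (1 + exp(x)); written with tanh to avoid overflow.
    return 0.5 * (1.0 - np.tanh(0.5 * x))

def u_max_fcm_ml(l, l0):
    """Equation (10): activation cap from the relative length change of one muscle."""
    dl = (l - l0) / l0
    return sig(-500.0 * dl + 5.0)

def u_max_fcm_ja(pc, d_theta):
    """Equation (11): activation caps from joint angle differences.

    pc      : contributions pc_m^jm, shape (M, NJM); must be chosen per muscle beforehand
    d_theta : angle differences, shape (NJM,)
    """
    return sig(pc @ d_theta)

# FCM-ML: a muscle at its equilibrium length, and the same muscle stretched by 1 %.
cap_rest = u_max_fcm_ml(0.300, 0.300)
cap_stretched = u_max_fcm_ml(0.303, 0.300)

# FCM-JA: hypothetical flexor (pc = -0.5) and extensor (pc = +0.5) of one joint motion,
# with the joint 12 degrees more flexed than its target (flexion negative, as for ELV).
pc = np.array([[-0.5, 0.0], [0.5, 0.0]])
caps_ja = u_max_fcm_ja(pc, np.array([-12.0, 0.0]))
```

Under these assumptions, FCM–ML needs only the current and equilibrium length of each muscle, whereas FCM–JA needs a signed contribution for every muscle and joint motion. In the example, the 1 % stretch raises the FCM–ML cap to 0.5, and the FCM–JA caps favor the extensor over the flexor, which is the direction needed to return the joint to its target.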

In this study, 12 learning conditions were adopted: two patterns of ACRL–NGN and DDPG for the learning algorithm; three patterns of FCM–ML, FCM–JA, and without any feedback control for the FCM; two patterns of the right upper extremity and right lower extremity for the body part. In each case, the learning calculation was performed for 300 trials to stabilize the target posture, and we compared the simulation results of 12 cases at the 1st, 2nd, 3rd, 20th, and 300th trials (see Table 2). In addition, time–history curves of the joint angles of the ELV, ELW, SHU, SHV, and SHW for the right upper extremity and those of the KNV, KNW, HPU, HPV, and HPW for the right lower extremity were generated.

2.3. Simulation Conditions for Arm Motion Predictions under Different Initial Postures

The learning results for the right upper extremity, after 300 trials in the case with ACRL–NGN and FCM–ML, were the most accurate and efficient among all 12 cases. Thus, we performed simulations of postural change from three different initial postures using the muscle activation function obtained after the 300th trial. The muscle activation function corresponds to $w_{i}^{m}$ in Equation (2). We substituted the muscle activation function $w_{i}^{m}$ into Equation (9) and obtained the activation level of the $m$-th muscle $u_{m}(t)$ using Equations (12)–(14):

$$u_{m}^{max} = \mathrm{sig}(-500.0 \cdot Δl_{m} + 5.0), \tag{12}$$

$$Δl_{m} = (l_{m} - l_{m0}) / l_{m0}, \tag{13}$$

$$σ(s(t)) = \exp(-0.5\, V(s(t))). \tag{14}$$

Equation (12) represents FCM–ML. We performed the posture change simulations from three different initial postures as follows: Case A: the initial posture consisted of $θ_{ELV} = 0^{\circ}$, $θ_{ELW} = 0^{\circ}$, $θ_{SHU} = 0^{\circ}$, $θ_{SHV} = -90^{\circ}$, and $θ_{SHW} = 0^{\circ}$; Case B: the initial posture consisted of $θ_{ELV} = 0^{\circ}$, $θ_{ELW} = 60^{\circ}$, $θ_{SHU} = 0^{\circ}$, $θ_{SHV} = 40^{\circ}$, and $θ_{SHW} = 0^{\circ}$; Case C: the initial posture consisted of $θ_{ELV} = -90^{\circ}$, $θ_{ELW} = 0^{\circ}$, $θ_{SHU} = 90^{\circ}$, $θ_{SHV} = 0^{\circ}$, and $θ_{SHW} = 0^{\circ}$. We determined the posture changes in the whole-body musculoskeletal model, time trajectories of five joint motions in the right upper extremity, and activation levels of major muscles from the simulation results. All the simulations were performed using a Dell OptiPlex 5050 computer with an Intel Core i7-6700 processor and 16 GB of DDR4 memory.

3. Results

3.1. RL for Stabilization to a Targeted Posture

Table 2 presents the simulation results for posture stabilization of the right upper or lower extremities under gravity. Twelve learning conditions were tested: two algorithms (ACRL–NGN and DDPG), three feedback control models (FCM–ML, FCM–JA, and no FCM), and two body parts (right upper and lower extremities). Learning calculations were performed until the 300th trial for target posture stabilization, with results compared at the 1st, 2nd, 3rd, 20th, and 300th trials. The numbers in the table indicate how many joint motions stabilized within a $-5^{\circ}$ to $5^{\circ}$ range of the target angle. For the right upper extremity, FCM–ML with ACRL–NGN stabilized all five joint motions by the 2nd trial and in all subsequent trials, while with DDPG, four joint motions were stabilized by the 3rd and 300th trials. For the right lower extremity, FCM–ML with ACRL–NGN stabilized four joint motions by the 2nd and 300th trials, whereas with DDPG, four joint motions were stabilized by the 300th trial. Cases with FCM–JA and no FCMs achieved stabilization for only one or two joint motions.
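The count reported in Table 2 can be illustrated with a small helper. The text does not say at which instant stabilization is judged, so this sketch assumes the joint angle at the end of the 2 s trial is compared with the target; the function name and the example end-of-trial angles are hypothetical and are not results from the study.

```python
# NBP targets for the right upper extremity (degrees), as quoted in Section 2.1.
TARGET_DEG = {"ELV": -88.0, "ELW": 0.0, "SHU": -39.0, "SHV": -36.0, "SHW": 36.0}

def count_stabilized(final_deg, target_deg, tol_deg=5.0):
    """Number of joint motions whose final angle lies within +/- tol_deg of its target."""
    return sum(1 for j, trg in target_deg.items()
               if abs(final_deg[j] - trg) <= tol_deg)

# Hypothetical end-of-trial angles, for illustration only.
final = {"ELV": -86.5, "ELW": 3.0, "SHU": -60.0, "SHV": -36.0, "SHW": 41.5}
n_stable = count_stabilized(final, TARGET_DEG)
```

In this made-up example, ELV, ELW, and SHV fall inside the ±5° band while SHU and SHW do not, so the count is three.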

Figure 2 presents the time history of each joint angle in the right upper extremity using the ACRL–NGN and FCM–ML algorithms at the 1st, 2nd, 3rd, 20th, and 300th trials. Despite the random initial angles, each joint stabilized at the target angle by 2 s. This combination achieved the target posture (NBP) for all five joint motions by the 2nd trial, remaining stable until the 300th trial. Figure 3 shows the time history using the ACRL–NGN and FCM–JA algorithms over the same trials. Here, the target posture was reached only for the ELV and SHU in the 1st trial and solely for the ELV in the 3rd and 20th trials. Figure 4 illustrates the time histories using the ACRL–NGN algorithm without any FCMs, showing that the target posture was achieved only for SHW in the 2nd, 20th, and 300th trials. Figure 5 compares the time history of each joint angle in the right lower extremity using the ACRL–NGN and FCM–ML algorithms across the trials, showing that for all four joint motions except KNW, the right lower extremity reached the target posture by the 2nd and 300th trials. Figure A1, Figure A2, Figure A3, Figure A4, Figure A5, Figure A6, Figure A7 and Figure A8 show the results of comparisons for additional cases.

3.2. Arm Motion Predictions from Different Initial Postures

Figure 6 illustrates the right upper extremity posture changes over 2 s, predicted from three initial postures using muscle control functions at the 300th trial with ACRL–NGN and FCM–ML algorithms. Figure 6a presents a side view for Case A, starting with shoulder forward flexion at $-90^{\circ}$ and elbow extension at $0^{\circ}$, achieving the target posture (NBP) at 2 s. Figure 6b shows a side view for Case B, with an initial backward shoulder extension of $40^{\circ}$ and elbow inversion of $60^{\circ}$, also reaching the target posture. Figure 6c offers a front view for Case C, beginning with shoulder external rotation at $90^{\circ}$ and elbow flexion at $-90^{\circ}$, successfully attaining the target posture.

Figure 7 presents the joint angle time histories obtained from three simulations. In Cases A and C, the five joint angles closely match the target angles. However, in Case B, the SHU and SHW shoulder joint angles deviate by $20^{\circ}$ from their targets, while the ELV, ELW, and SHV angles remain aligned with the targets. Figure 8 illustrates the activation levels of major muscles during posture change simulations. In Case A, the latissimus dorsi and deltoid spinal activities, responsible for shoulder extension, initially increase, while the biceps brachii long head and brachialis, responsible for elbow flexion, are only slightly activated to achieve the target posture. In Case B, the deltoid clavicular head and biceps brachii long head, which flex the shoulder and elbow joints, respectively, initially show increased activity; in contrast, the triceps brachii long head, which extends the elbow joint, is slightly activated. The deltoid clavicular head and biceps brachii long head sustain their activation for over 1.5 s. Additionally, the supinator and pronator teres, responsible for supinating and pronating the elbow joint, initially show increased activities, which are sustained for over 2 s. The target posture is achieved with the aid of these muscle activities. In Case C, the pectoralis major clavicular head and subscapularis, which internally rotate the shoulder joint, initially show increased activity, with the subscapularis remaining active until 0.3 s after the simulation onset. The activity of the biceps brachii long head, which flexes the elbow joint, increases until 0.3 s after simulation onset and then decreases, whereas the brachialis maintains its activation level between 0.1 and 0.2 until 2 s, thus achieving the target posture.

4. Discussion

4.1. Comparisons between FCM–ML and FCM–JA

The results obtained in this study indicate that the ACRL–NGN algorithm combined with FCM–ML is the most effective for achieving desired postures in both the right upper and lower extremities, followed by the DDPG algorithm with FCM–ML (see Table 2 and Figure 2). This suggests that muscle control, informed by feedback on muscle length changes, is crucial for determining maximum muscle activation levels and ensuring efficient postural control. Muscle spindles, which are sensory receptors in nearly all muscles, relay information about muscle length changes and stretching speed to the central nervous system (CNS). The CNS uses these data to calculate the position and movement of the extremities, which are essential for motor control, posture maintenance, and stable gait [26,27]. In FCM–ML, the maximum activation level of the $m$-th muscle is uniquely determined by the muscle length change $Δl_{m}$ in Equation (13). An increase in $Δl_{m}$ enhances the muscle activation level and the muscle force, promptly achieving the target posture. Moreover, if the current joint angle is more flexed than the target angle, the $Δl_{m}$ and muscular forces of extensors become larger than those of flexors, thereby extending the joint and vice versa. FCM–ML retains antagonistic feedback control of muscle length change $Δl_{m}$.
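The antagonistic behavior described above can be made concrete with a toy single-joint example. Everything geometric here is hypothetical and chosen only for illustration: one flexor and one extensor with the same constant moment arm (0.03 m) and equilibrium length (0.30 m), so that muscle length varies linearly with joint angle. The sigmoid is again assumed to be $\mathrm{sig}(x) = 1/(1 + e^{x})$, since the text does not define it, and flexion is taken as negative, as for ELV.

```python
import numpy as np

def sig(x):
    # ASSUMPTION: sig(x) = 1 / (1 + exp(x)); written with tanh to avoid overflow.
    return 0.5 * (1.0 - np.tanh(0.5 * x))

def u_max(l, l0):
    """FCM-ML cap of Equations (12) and (13)."""
    return sig(-500.0 * (l - l0) / l0 + 5.0)

R_ARM, L0 = 0.03, 0.30    # hypothetical moment arm [m] and equilibrium length [m]

def muscle_lengths(theta_deg, target_deg):
    """Lengths of a toy flexor/extensor pair; both are at L0 when the joint is on target."""
    d = np.deg2rad(theta_deg - target_deg)   # negative: more flexed than the target
    l_flexor = L0 + R_ARM * d                # flexor shortens as the joint flexes
    l_extensor = L0 - R_ARM * d              # extensor lengthens as the joint flexes
    return l_flexor, l_extensor

# Elbow 12 degrees more flexed than the NBP target of -88 degrees.
l_f, l_e = muscle_lengths(-100.0, -88.0)
cap_flexor, cap_extensor = u_max(l_f, L0), u_max(l_e, L0)

# On target, both muscles receive the same (small) cap.
cap_f0, cap_e0 = (u_max(l, L0) for l in muscle_lengths(-88.0, -88.0))
```

In this toy geometry the stretched extensor is allowed nearly full activation while the shortened flexor is almost switched off, so the net effect extends the joint back toward the target. No labeling of either muscle as flexor or extensor enters the control law; only the length change does.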

Muscle forces are highly dependent on fiber length. When muscle–tendon compliance is low (i.e., a small ratio of tendon slack length to optimal muscle fiber length), muscle fiber operating length (reflected by muscle length change $Δl_{m}$) is primarily influenced by joint angles and muscle moment arms. Conversely, when compliance is high, it depends more on activation level and force–length–velocity effects [28]. Since the current model excluded tendon elements and did not simulate high muscle–tendon compliance in upper or lower extremity motions, joint angles and muscle moment arms predominantly determine the muscle length change $Δl_{m}$. Each muscle in our musculoskeletal model has a moment arm comparable to the human body (refer to [10] for upper extremity muscles and Figure A9 and Figure A10 for lower extremity muscles). Thus, joint angles are assumed to determine the muscle length change $Δl_{m}$ and muscle force. However, FCM–JA cases stabilized posture for only one or two joint motions (see Table 2). In FCM–JA, the maximum activation level of the $m$-th muscle is derived from joint angles (see Equation (11)) but not uniquely determined, as the percentage contribution $pc_{m}^{jm}$ in Equation (11) varies with joint motion types (flexion–extension, inversion–eversion, and internal–external rotations), muscle roles (agonist or synergist for each joint motion) (see Equations (A1)–(A20) for the right upper extremity and Equations (A21)–(A49) for the right lower extremity in Appendix A), and muscle activation delay. The percentage contribution must be predetermined using the anatomical references (e.g., [11]) and EMG experimental data. While it can be predetermined for posture stabilization in expected static or dynamic environments and intentional, voluntary motion, as joint motion type, muscle role, and activation delay are known (see [1,2,5,8,9]), it cannot be predetermined for involuntary posture stabilization in unexpected static or dynamic environments. This explains why the cases employing FCM–JA demonstrated minimal posture stabilization for the right upper and lower extremities.

4.2. Comparisons between ACRL–NGN and DDPG

DDPG, a common RL algorithm for continuous control, learns a deterministic policy via the actor–critic method. However, it suffers from severe overestimation bias due to using a single critic for function approximation, as the actor–network is trained to execute actions with the highest value estimate [29,30]. To address this, the twin–delayed deep deterministic policy gradient (TD3) was proposed, using the minimum value from double critic networks for value estimation with exploration noise [29]. We employed the DDPG and ACRL–NGN algorithms to learn to move extremities from various initial postures to target postures. In our simulations with FCM–ML, while DDPG achieved the target posture in four joint angle motions, the ACRL–NGN algorithm outperformed DDPG in efficiency and accuracy (see Table 2). This likely stems from DDPG’s overestimation issue. Conversely, ACRL–NGN incorporates a continuous state space using a Gaussian softmax network and a noise–induced muscle activation function [10], enhancing performance akin to exploration noise in TD3.
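The difference between the DDPG and TD3 value targets mentioned above can be shown in a few lines. This sketch is not part of the authors' pipeline, which used DDPG and ACRL–NGN only; it covers just the clipped double-Q part of TD3 (the full algorithm also adds target policy smoothing and delayed policy updates), and the reward, discount factor, and critic estimates are made-up numbers.

```python
import numpy as np

def ddpg_target(r, gamma, q_next):
    """DDPG bootstraps from a single target critic."""
    return r + gamma * q_next

def td3_target(r, gamma, q1_next, q2_next):
    """TD3 bootstraps from the element-wise minimum of two target critics."""
    return r + gamma * np.minimum(q1_next, q2_next)

# Hypothetical next-state value estimates from two critics for two transitions.
r, gamma = 1.0, 0.99
q1_next = np.array([5.0, 2.0])
q2_next = np.array([4.0, 3.0])

y_ddpg = ddpg_target(r, gamma, q1_next)
y_td3 = td3_target(r, gamma, q1_next, q2_next)
```

Because the minimum never exceeds either critic's estimate, the TD3 target is at most the single-critic target, which is how it counteracts the overestimation bias attributed to DDPG.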

4.3. Comparisons with Previous Iterative Learning Methods

Previous studies identified muscle synergies as groups of co-activated muscles to reduce redundancy in the human musculoskeletal system and developed an iterative learning controller to coordinate these synergies for complex tasks [31,32,33]. However, about 40 iterations are required to achieve the target motion, and synergy patterns must be adjusted for specific tasks. In contrast, our method does not necessitate muscle categorization for tuning synergy patterns and achieves target postures or motions with significantly fewer iterations (see Table 2). FCM–JA is a muscle synergy control method. For the upper extremity, the long, lateral, and medial heads of triceps brachii form one group, while the brachialis and brachioradialis form another, as indicated in Equations (A1)–(A20). In each reinforcement learning trial, the initial angles of SHV, SHW, ELV, and ELW vary randomly within the ranges of −170° to 50°, −90° to 70°, −135° to 17°, and 0° to 180°, respectively. Consequently, the upper extremity simulated in this study exhibits various joint motions except ELV. The $pc_{m}^{jm}$ contributions of 0.5 and 0.2 for the agonist and synergist muscles in Equation (11) are valid for ELV but not for the other joint motions. Similar to the results obtained in previous synergy control studies, our findings indicate that FCM–JA requires tuning of synergy patterns for specific joint motions, unlike FCM–ML.

4.4. Versatility of ACRL–NGN with FCM–ML to Other Body Parts

In stabilizing the target posture for the right lower extremity, the ACRL–NGN and FCM–ML algorithms achieved target angles for only four joint motions, compared to five joint motions in the right upper extremity. This discrepancy likely arises because the lower extremity contains more biarticular muscles than the upper extremity: nine in the lower extremity (tensor fasciae latae, rectus femoris, gracilis, sartorius, biceps femoris long head, semitendinosus, semimembranosus, gastrocnemius, and plantaris) and three in the upper extremity (triceps brachii long head and biceps brachii long and short heads). Muscle length, influenced by two joints, is severely affected by the tendons in biarticular muscles [28]. The current model omits tendon elements, rendering muscle length feedback less effective for the lower extremities. Future studies should incorporate a muscle–tendon complex model.

Non-human vertebrates, such as dogs and horses, possess different joint motion ranges. Understanding their muscle control mechanisms requires specific data on muscular motion types (e.g., flexors or extensors), muscle roles (e.g., agonist or synergist), and activation patterns from experimental EMG data alongside anthropometric data. However, the feedback method of muscle length change (FCM–ML), combined with musculoskeletal anthropometric data, aids in comprehending their muscle control mechanisms because it does not require specific data on such categorization of muscles. This result implies that vertebrate muscles are fundamentally controlled without the classification into flexors or extensors, or the distinction between agonists and synergists, that is necessary to explore muscular synergy patterns for targeted joint motion.

4.5. Application Prospects of ACRL–NGN with FCM–ML

ACRL–NGN with FCM–ML does not require muscle categorization for tuning synergy patterns and thus achieves targeted postures or motions with fewer iterations compared to the existing methods (Section 4.3). Therefore, this method is suitable for activating human–like robots with multiple muscle lines, such as a humanoid McKibben arm with multifilament artificial muscles [34]. Additionally, the simulation results presented in Figure 6, Figure 7 and Figure 8 highlight the muscles that should be trained to efficiently achieve targeted motions and thus benefit athletes and patients undergoing rehabilitation. According to the force–length curve of the Hill–type muscle model, each muscle generates maximal force at its natural length [35]. The results of this study indicate that the maximum muscle activation level depends on the difference Δl between the current and target posture muscle lengths (Equation (12)). The NBP is used as the target posture, as each muscle length in the NBP is considered to be the natural length. A large Δl promotes activation via the sigmoid function described in Equation (12), whereas a small Δl inhibits activation but results in a stronger contractile force, allowing the muscle’s current length to approach the target length, and thus, the NBP. Another target posture with an ELV angle of −30° was set while maintaining the other joint motions in their original states, and learning calculations using ACRL–NGN and FCM–ML were performed until the 300th trial. The ELV stabilized at −88°, similar to that observed in the NBP, by the 300th trial, demonstrating that feedback control of muscle–length changes efficiently stabilizes the NBP under gravity without voluntary muscle controls. This result suggests that the NBP is the most comfortable posture for humans [36,37,38], underscoring the potential of ACRL–NGN with FCM–ML in comfortable seat and chair design.
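
The antagonistic character of this length feedback can be illustrated with a minimal sketch. Equation (12) is not reproduced in this section, so the logistic form of the sigmoid, the gain `k`, the sign convention (Δl taken as current length minus target length), and the muscle lengths below are all assumptions chosen only for illustration.

```python
import math

def sig(x):
    """Logistic sigmoid (assumed form of the sigmoid in the text)."""
    return 1.0 / (1.0 + math.exp(-x))

def max_activation(l_current, l_target, k=50.0):
    """Assumed ceiling on muscle activation from length feedback.

    delta_l = l_current - l_target (metres); k is a hypothetical gain (1/m).
    """
    return sig(k * (l_current - l_target))

# A hypothetical antagonistic pair around one joint, lengths in metres.
stretched = max_activation(l_current=0.32, l_target=0.30)  # longer than target
shortened = max_activation(l_current=0.28, l_target=0.30)  # shorter than target
at_target = max_activation(l_current=0.30, l_target=0.30)  # already at target

print(f"stretched: {stretched:.3f}, shortened: {shortened:.3f}, at target: {at_target:.3f}")
```

Under these assumptions, the muscle that is longer than its target length receives the higher activation ceiling and its shortened antagonist the lower one, so the pair is driven back toward the target posture without any label such as flexor or extensor.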

4.6. Study Limitations

The results obtained in this study are subject to several limitations. First, we did not incorporate tendon elements into the analysis. Second, the time history curves of the muscle activation levels predicted during the three posture change simulations were not validated against EMG data because experimental data on arm motions of adult male participants were unavailable. Further investigations are necessary to replace the current muscle model with a muscle–tendon complex model that incorporates a Hill–type muscle model with serial damping and an eccentric force–velocity relation, similar to the one proposed by Haeufle et al. [39]. Such a modified model is expected to reveal the effects of tendon elements and biarticular muscles on posture stabilization and various motions (e.g., gait and jumping) of the extremities. Additionally, volunteer tests should be conducted to acquire EMG data of surface muscles in the extremities, as described in our previous studies [5,8], and to examine the validity of the predicted muscle activation levels, thereby enhancing the biofidelity of the proposed human musculoskeletal model and muscle controller.

5. Conclusions

ACRL–NGN, which reflects basal ganglia activity, combined with FCM–ML, which simulates muscle spindle feedback, achieved the most efficient involuntary posture stabilization among all control methods tested. This approach does not require muscular synergy pattern control that classifies muscles as flexors or extensors or distinguishes agonists from synergists. Instead, it relies on antagonistic feedback control based on changes in muscle length. Because the method also indicates which muscles should be trained to achieve targeted motions efficiently, it can be applied to all vertebrate muscles, to humanoids with multiple muscles, and to the rehabilitation of athletes or patients. Additionally, this method effectively achieves involuntary NBP stabilization, i.e., stabilization at the most comfortable posture for humans under gravity. This result implies that human body muscles may involuntarily control posture to attain the NBP under gravity, suggesting the potential applicability of the proposed method to comfortable seat design. In future studies, we will replace the current muscle model with a muscle–tendon complex model and apply the ACRL–NGN and FCM–ML algorithms to intentional movements.

Author Contributions

Conceptualization, M.I.; methodology, M.I., N.A. and D.K.; software, M.I., N.A. and D.K.; validation, M.I. and N.A.; formal analysis, M.I. and N.A.; investigation, M.I. and N.A.; resources, M.I.; data curation, M.I. and N.A.; writing—original draft preparation, M.I.; writing—review and editing, M.I.; visualization, M.I. and N.A.; supervision, M.I.; project administration, M.I.; funding acquisition, M.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original data and code presented in the study are openly available for anonymous peer review at https://osf.io/sc3zt/?view_only=6c9b73083bcc42a380f8383f1978d21f, accessed on 8 October 2024. AI or AI-assisted tools were not used in drafting any aspect of this manuscript.

Conflicts of Interest

Masami Iwamoto, Noritoshi Atsumi, and Daichi Kato are employed by Toyota Central R&D Labs., Inc. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

For the right upper extremity, the FCM–JA is described based on the angle differences of the five joint motions using Equations (A1)–(A20), where sig denotes the sigmoid function and dθ denotes the angle difference of the subscripted joint motion.

u_{1:\text{Deltoid anterior}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{SHV} + 0.5\,d\theta_{SHW}), (A1)

u_{2:\text{Deltoid middle}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{SHU}), (A2)

u_{3:\text{Deltoid posterior}}^{\max} = \mathrm{sig}(0.2\,d\theta_{SHV} - 0.2\,d\theta_{SHW}), (A3)

u_{4:\text{Teres major}}^{\max} = \mathrm{sig}(0.5\,d\theta_{SHU} + 0.5\,d\theta_{SHW}), (A4)

u_{5:\text{Teres minor}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{SHW}), (A5)

u_{6:\text{Supraspinatus}}^{\max} = \mathrm{sig}(-0.2\,d\theta_{SHU}), (A6)

u_{7:\text{Infraspinatus}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{SHW}), (A7)

u_{8:\text{Subscapularis}}^{\max} = \mathrm{sig}(0.5\,d\theta_{SHU} + 0.5\,d\theta_{SHW}), (A8)

u_{9:\text{Coracobrachialis}}^{\max} = \mathrm{sig}(-0.2\,d\theta_{SHV} + 0.2\,d\theta_{SHW}), (A9)

u_{10:\text{Biceps brachii long}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{SHV} - 0.2\,d\theta_{SHW} - 0.5\,d\theta_{ELV} + 0.2\,d\theta_{ELW}), (A10)

u_{11:\text{Biceps brachii short}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{SHV} - 0.2\,d\theta_{SHW} - 0.5\,d\theta_{ELV} + 0.2\,d\theta_{ELW}), (A11)

u_{12:\text{Triceps brachii long}}^{\max} = \mathrm{sig}(0.5\,d\theta_{ELV}), (A12)

u_{13:\text{Triceps brachii lateral}}^{\max} = \mathrm{sig}(0.5\,d\theta_{ELV}), (A13)

u_{14:\text{Triceps brachii medial}}^{\max} = \mathrm{sig}(0.5\,d\theta_{ELV}), (A14)

u_{15:\text{Brachialis}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{ELV}), (A15)

u_{16:\text{Brachioradialis}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{ELV}), (A16)

u_{17:\text{Pronator teres}}^{\max} = \mathrm{sig}(-0.2\,d\theta_{ELV} - 0.5\,d\theta_{ELW}), (A17)

u_{18:\text{Anconeus}}^{\max} = \mathrm{sig}(0.2\,d\theta_{ELV} - 0.2\,d\theta_{ELW}), (A18)

u_{19:\text{Supinator}}^{\max} = \mathrm{sig}(0.5\,d\theta_{ELW}), (A19)

u_{20:\text{Pronator quadratus}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{ELW}). (A20)
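
Equations (A1)–(A20) share one structure: a weighted sum of joint angle differences passed through sig. The following minimal sketch encodes four of them, (A1), (A2), (A12), and (A15), as a weight table. The logistic form of sig, the unit of the angle differences, and the names `WEIGHTS` and `fcm_ja` are assumptions made for illustration.

```python
import math

# Weights copied from Equations (A1), (A2), (A12), and (A15).
WEIGHTS = {
    "Deltoid anterior": {"SHV": -0.5, "SHW": 0.5},
    "Deltoid middle": {"SHU": -0.5},
    "Triceps brachii long": {"ELV": 0.5},
    "Brachialis": {"ELV": -0.5},
}

def sig(x):
    """Logistic sigmoid (assumed form of sig)."""
    return 1.0 / (1.0 + math.exp(-x))

def fcm_ja(d_theta):
    """Return the maximum activation per muscle.

    d_theta maps joint motion names (e.g., "ELV") to angle differences;
    motions that are not listed are treated as zero difference.
    """
    return {
        muscle: sig(sum(w * d_theta.get(joint, 0.0) for joint, w in weights.items()))
        for muscle, weights in WEIGHTS.items()
    }

# Hypothetical case: only the elbow flexion-extension angle differs from its target.
u_max = fcm_ja({"ELV": 1.0})
for muscle, value in u_max.items():
    print(f"{muscle}: {value:.3f}")
```

With opposite-signed weights on the same joint motion, the triceps brachii long head and the brachialis receive complementary values, which is how this formulation encodes the extensor and flexor roles explicitly.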

For the right lower extremity, the FCM–JA is described based on the angle differences of the five joint motions using Equations (A21)–(A49).

u_{1:\text{Psoas major}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPV}), (A21)

u_{2:\text{Iliacus}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPV}), (A22)

u_{3:\text{Gluteus maximus}}^{\max} = \mathrm{sig}(0.5\,d\theta_{HPV}), (A23)

u_{4:\text{Gluteus medius}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPU}), (A24)

u_{5:\text{Gluteus minimus}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPU}), (A25)

u_{6:\text{Inferior gemellus}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPW}), (A26)

u_{7:\text{Obturator externus}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPW}), (A27)

u_{8:\text{Obturator internus}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPW}), (A28)

u_{9:\text{Superior gemellus}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPW}), (A29)

u_{10:\text{Quadratus femoris}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPW}), (A30)

u_{11:\text{Piriformis}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPU} - 0.5\,d\theta_{HPW}), (A31)

u_{12:\text{Adductor brevis}}^{\max} = \mathrm{sig}(-0.2\,d\theta_{HPV} + 0.5\,d\theta_{HPU} - 0.2\,d\theta_{HPW}), (A32)

u_{13:\text{Adductor longus}}^{\max} = \mathrm{sig}(-0.2\,d\theta_{HPV} + 0.5\,d\theta_{HPU} - 0.2\,d\theta_{HPW}), (A33)

u_{14:\text{Adductor magnus}}^{\max} = \mathrm{sig}(-0.2\,d\theta_{HPV} + 0.5\,d\theta_{HPU}), (A34)

u_{15:\text{Biceps femoris long head}}^{\max} = \mathrm{sig}(0.5\,d\theta_{KNV} + 0.5\,d\theta_{HPV} - 0.5\,d\theta_{KNW}), (A35)

u_{16:\text{Biceps femoris short head}}^{\max} = \mathrm{sig}(0.5\,d\theta_{KNV} - 0.5\,d\theta_{KNW}), (A36)

u_{17:\text{Gracilis}}^{\max} = \mathrm{sig}(0.2\,d\theta_{KNV} - 0.2\,d\theta_{HPV} + 0.5\,d\theta_{HPU} + 0.2\,d\theta_{KNW}), (A37)

u_{18:\text{Pectineus}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPV} + 0.5\,d\theta_{HPU}), (A38)

u_{19:\text{Rectus femoris}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{KNV} - 0.5\,d\theta_{HPV}), (A39)

u_{20:\text{Sartorius}}^{\max} = \mathrm{sig}(0.2\,d\theta_{KNV} - 0.2\,d\theta_{HPV} - 0.2\,d\theta_{HPU} - 0.2\,d\theta_{HPW} + 0.2\,d\theta_{KNW}), (A40)

u_{21:\text{Semimembranosus}}^{\max} = \mathrm{sig}(0.5\,d\theta_{KNV} + 0.5\,d\theta_{HPV} + 0.2\,d\theta_{KNW}), (A41)

u_{22:\text{Semitendinosus}}^{\max} = \mathrm{sig}(0.5\,d\theta_{KNV} + 0.5\,d\theta_{HPV} + 0.2\,d\theta_{HPU} + 0.5\,d\theta_{KNW}), (A42)

u_{23:\text{Tensor fasciae latae}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{HPV} - 0.5\,d\theta_{HPU} + 0.5\,d\theta_{HPW} - 0.2\,d\theta_{KNW}), (A43)

u_{24:\text{Vastus intermedius}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{KNV}), (A44)

u_{25:\text{Vastus medialis}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{KNV}), (A45)

u_{26:\text{Vastus lateralis}}^{\max} = \mathrm{sig}(-0.5\,d\theta_{KNV}), (A46)

u_{27:\text{Gastrocnemius}}^{\max} = \mathrm{sig}(0.5\,d\theta_{KNV}), (A47)

u_{28:\text{Plantaris}}^{\max} = \mathrm{sig}(0.2\,d\theta_{KNV}), (A48)

u_{29:\text{Popliteus}}^{\max} = \mathrm{sig}(0.2\,d\theta_{KNV}). (A49)
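
The lower-extremity equations also show which muscles the FCM–JA treats as acting on both the hip and the knee. The minimal sketch below copies the weights of a subset of the equations and lists the muscles whose feedback terms involve both joints. The dictionary layout and the helper `joints_spanned` are illustrative assumptions; the sketch relies only on the naming of the joint motions (HP for hip, KN for knee).

```python
# Weights copied from Equations (A32), (A35)-(A37), (A39)-(A43), and (A47).
# Keys are joint motions: names starting with HP belong to the hip, KN to the knee.
LOWER_WEIGHTS = {
    "Adductor brevis": {"HPV": -0.2, "HPU": 0.5, "HPW": -0.2},
    "Biceps femoris long head": {"KNV": 0.5, "HPV": 0.5, "KNW": -0.5},
    "Biceps femoris short head": {"KNV": 0.5, "KNW": -0.5},
    "Gracilis": {"KNV": 0.2, "HPV": -0.2, "HPU": 0.5, "KNW": 0.2},
    "Rectus femoris": {"KNV": -0.5, "HPV": -0.5},
    "Sartorius": {"KNV": 0.2, "HPV": -0.2, "HPU": -0.2, "HPW": -0.2, "KNW": 0.2},
    "Semimembranosus": {"KNV": 0.5, "HPV": 0.5, "KNW": 0.2},
    "Semitendinosus": {"KNV": 0.5, "HPV": 0.5, "HPU": 0.2, "KNW": 0.5},
    "Tensor fasciae latae": {"HPV": -0.5, "HPU": -0.5, "HPW": 0.5, "KNW": -0.2},
    "Gastrocnemius": {"KNV": 0.5},
}

def joints_spanned(weights):
    """Return the set of joints (two-letter prefixes) that appear in a weight row."""
    return {motion[:2] for motion in weights}

# Muscles whose feedback depends on both hip and knee angle differences.
two_joint = sorted(
    muscle for muscle, weights in LOWER_WEIGHTS.items()
    if joints_spanned(weights) == {"HP", "KN"}
)
print(two_joint)
```

The seven muscles selected this way are among the nine biarticular lower-extremity muscles named in the discussion. The gastrocnemius and plantaris carry only a knee term here, because their second joint, the ankle, is not among the five joint motions considered.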

Figure A1. Comparison of the time history of each joint angle in the right upper extremity using the deep deterministic policy gradient (DDPG) and feedback control model (FCM)–muscle length (ML) algorithms at the 1st, 2nd, 3rd, 20th, and 300th trials. (a) ELV: flexion–extension of the elbow joint, (b) ELW: inversion–eversion of the elbow joint, (c) SHU: internal–external rotation of the shoulder joint, (d) SHV: flexion–extension of the shoulder joint, and (e) SHW: inversion–eversion of the shoulder joint. The black dashed lines denote the target angles.

Figure A2. Comparison of the time history of each joint angle in the right upper extremity using the DDPG and FCM–joint angle (JA) algorithms at the 1st, 2nd, 3rd, 20th, and 300th trials. The black dashed lines denote the target angles.

Figure A3. Comparison of the time history of each joint angle in the right upper extremity using the DDPG algorithm without any FCMs at the 1st, 2nd, 3rd, 20th, and 300th trials. The black dashed lines denote the target angles.

Figure A4. Comparison of the time history of each joint angle in the right lower extremity using the actor–critic reinforcement learning (ACRL)–normalized Gaussian network (NGN) and FCM–JA algorithms at the 1st, 2nd, 3rd, 20th, and 300th trials. (a) KNV: flexion–extension of the knee joint, (b) KNW: inversion–eversion of the knee joint, (c) HPU: internal–external rotation of the hip joint, (d) HPV: flexion–extension of the hip joint, and (e) HPW: inversion–eversion of the hip joint. The black dashed lines denote the target angles.

Figure A5. Comparison of the time history of each joint angle in the right lower extremity using the ACRL–NGN algorithm without any FCMs at the 1st, 2nd, 3rd, 20th, and 300th trials. The black dashed lines denote the target angles.

Figure A6. Comparison of the time history of each joint angle in the right lower extremity using the DDPG and FCM–ML algorithms at the 1st, 2nd, 3rd, 20th, and 300th trials. The black dashed lines denote the target angles.

Figure A7. Comparison of the time history of each joint angle in the right lower extremity using the DDPG and FCM–JA algorithms at the 1st, 2nd, 3rd, 20th, and 300th trials. The black dashed lines denote the target angles.

Figure A8. Comparison of the time history of each joint angle in the right lower extremity using the DDPG algorithm without any FCMs at the 1st, 2nd, 3rd, 20th, and 300th trials. The black dashed lines denote the target angles.

Figure A9. Comparison of moment arm versus knee or hip flexion angles between model prediction and test data.

Figure A10. Comparison of moment arm versus knee or hip flexion angles between model prediction and test data.

References

1. Kato, D.; Nakahira, Y.; Atsumi, N.; Iwamoto, M. Development of human-body model THUMS Version 6 containing muscle controllers and application to injury analysis in frontal collision after brake deceleration. In Proceedings of the 2018 IRCOBI Conference (International Research Council on the Biomechanics of Injury), Athens, Greece, 12–14 September 2018.
2. Rooij, L. Effect of various pre-crash braking strategies on simulated human kinematic response with varying levels of driver attention. In Proceedings of the 22nd Enhanced Safety of Vehicles Conference, Washington, DC, USA, 13–16 June 2011.
3. Thelen, D.; Anderson, F.; Delp, S. Generating dynamics simulations of movement using computed muscle control. J. Biomech. 2003, 36, 321–328.
4. Doya, K. Reinforcement learning in continuous time and space. Neural Comput. 2000, 12, 219–245.
5. Iwamoto, M.; Nakahira, Y.; Kimpara, H.; Sugiyama, T.; Min, K. Development of a human body finite element model with multiple muscles and their controller for estimating occupant motions and impact responses in frontal crash situations. Stapp Car Crash J. 2012, 56, 231–268.
6. Kambara, H.; Kim, K.; Sato, M.; Koike, Y. Learning arm’s posture control using reinforcement learning and feedback-error-learning. In Proceedings of the 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Francisco, CA, USA, 1–5 September 2004; pp. 486–489.
7. Kambara, H.; Kim, K.; Shin, D.; Sato, M.; Koike, Y. Learning and generation of goal-directed arm reaching from scratch. Neural Netw. 2009, 22, 348–361.
8. Min, K.; Iwamoto, M.; Kakei, S.; Kimpara, H. Muscle Synergy-Driven Robust Motion Control. Neural Comput. 2018, 30, 1104–1131.
9. Min, K.; Lee, J.; Kakei, S. Dynamic modulation of a learned motor skill for its recruitment. Front. Comput. Neurosci. 2020, 14, 457682.
10. Iwamoto, M.; Kato, D. Efficient actor-critic reinforcement learning with embodiment of muscle tone for posture stabilization of the human arm. Neural Comput. 2021, 33, 129–156.
11. Neumann, D. Kinesiology of the Musculoskeletal System: Foundations for Rehabilitation, 2nd ed.; Elsevier: Amsterdam, The Netherlands, 2010.
12. Kuechle, D.; Newman, S.; Itoi, E.; Morrey, B.; An, K. Shoulder muscle moment arms during horizontal flexion and elevation. J. Shoulder Elb. Surg. 1997, 6, 429–439.
13. Kuechle, D.; Newman, S.; Itoi, E.; Morrey, B.; An, K. The relevance of the moment arm of shoulder muscles with respect to axial rotation of the glenohumeral joint in four positions. Clin. Biomech. 2000, 15, 322–329.
14. Murray, W.; Delp, S.; Buchanan, T. Variation of muscle moment arms with elbow and forearm position. J. Biomech. 1995, 28, 513–525.
15. Murray, W.; Buchanan, T.; Delp, S. The isometric functional capacity of muscles that cross the elbow. J. Biomech. 2000, 33, 943–952.
16. Murray, W.; Buchanan, T.; Delp, S. Scaling of peak moment arms of elbow muscles with upper extremity bone dimensions. J. Biomech. 2002, 35, 19–26.
17. Arnold, E.M.; Salinas, S.; Asakawa, D.J.; Delp, S.L. Accuracy of muscle moment arms estimated from MRI-based musculoskeletal models of the lower extremity. Comput. Aided Surg. 2000, 5, 108–119.
18. Buford, W.L.; Ivey, F.M.; Malone, J.D.; Patterson, R.M.; Pearce, G.; Nguyen, D.K.; Stewart, A.A. Muscle balance at the knee-moment arms for the normal knee and the ACL-minus knee. IEEE Trans. Rehabil. Eng. 1997, 5, 367–379.
19. Hawkins, D. Software for determining lower extremity muscle-tendon kinematics and moment arm lengths during flexion/extension movements. Comput. Biol. Med. 1992, 22, 59–71.
20. Nemeth, G.; Ohlsen, H. In vivo moment arm lengths for hip extensor muscles at different angles of hip flexion. J. Biomech. 1985, 18, 129–140.
21. Doya, K. Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr. Opin. Neurobiol. 2000, 10, 732–739.
22. Morimoto, J.; Doya, K. Robust reinforcement learning. Neural Comput. 2005, 17, 335–359.
23. Tengwall, R.; Jackson, J.; Kimura, T.; Komenda, S.; Okada, S.; Preuschoft, H. Human posture in zero gravity. Curr. Anthr. 1982, 23, 657–666.
24. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic Policy Gradient Algorithms. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014.
25. Yang, J.; Lee, J.; Lee, B.; Kim, S.; Shin, D.; Lee, Y.; Lee, J.; Han, D.; Choi, S. The Effects of Elbow Joint Angle Changes on Elbow Flexor and Extensor Muscle Strength and Activation. J. Phys. Ther. Sci. 2014, 26, 1079–1082.
26. Kröger, S.; Watkins, B. Muscle spindle function in healthy and diseased muscle. Skelet. Muscle 2021, 11, 3.
27. Ramadan, R.; Geyer, H.; Jeka, J.; Schoner, G.; Reimann, H. A neuromuscular model of human locomotion combines spinal reflex circuits with voluntary movements. Sci. Rep. 2022, 12, 8189.
28. Arnold, A.S.; Delp, S.L. Fibre operating lengths of human lower limb muscles during walking. Phil. Trans. R. Soc. B 2011, 366, 1530–1539.
29. Fujimoto, S.; van Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; PMLR 80. pp. 1587–1596.
30. Lyu, J.; Ma, X.; Yan, J.; Li, X. Efficient continuous control with double actors and regularized critics. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, Virtual venue, 22 February–1 March 2022; pp. 7655–7663.
31. Angelini, F.; Santina, C.D.; Garabini, M.; Bianchi, M.; Bicchi, A. Control architecture for human-like motion with applications to articulated soft robots. Front. Robot. AI 2020, 7, 117.
32. Chen, J.; Zhong, S.; Kang, E.; Qiao, H. Realizing human-like manipulation with a musculoskeletal system and biologically inspired control scheme. Neurocomputing 2019, 339, 116–129.
33. Fan, J.; Jin, J.; Wang, Q. Humanoid muscle-skeleton robot arm design and control based on reinforcement learning. In Proceedings of the 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, 9–13 November 2020; pp. 541–546.
34. Kurumaya, S.; Suzumori, K.; Nabae, H.; Wakimoto, S. Musculoskeletal lower-limb robot driven by multifilament muscles. ROBOMECH J. 2016, 3, 18.
35. Thelen, D. Adjustment of muscle mechanics model parameters to simulate dynamic contractions in older adults. J. Biomech. Eng. 2003, 125, 70–77.
36. Fiorillo, I.; Piro, S.; Anjani, S.; Smulders, M.; Song, Y.; Naddeo, A.; Vink, P. Future vehicles: The effect of seat configuration on posture and quality of conversation. Ergonomics 2019, 62, 1400–1414.
37. Gunev, D.; Ilievy, S. The basic geometric parameters of the driving position of a battery electric, prototype class vehicle for the shell eco-marathon competition. AIP Conf. Proc. 2021, 2439, 020002.
38. Kim, K.H.; Young, K.S.; Rajulu, S.L. Neutral body posture in spaceflight. In Proceedings of the Human Factors and Ergonomics Society 2019 Annual Meeting, Washington, DC, USA, 28 October–1 November 2019; pp. 992–996.
39. Haeufle, D.F.B.; Gunther, M.; Bayer, A.; Schmitt, S. Hill-type muscle model with serial damping and eccentric force-velocity relation. J. Biomech. 2014, 47, 1531–1536.

Figure 1. Muscle control system of the whole–body musculoskeletal model based on the actor–critic reinforcement learning (ACRL)–normalized Gaussian network (NGN) and feedback control model (FCM)–muscle length (ML) change algorithms.

Figure 2. Comparison of the time history of each joint angle in the right upper extremity using the ACRL–NGN and FCM–ML algorithms at the 1st, 2nd, 3rd, 20th, and 300th trials. (a) ELV: flexion–extension of the elbow joint, (b) ELW: inversion–eversion of the elbow joint, (c) SHU: internal–external rotation of the shoulder joint, (d) SHV: flexion–extension of the shoulder joint, and (e) SHW: inversion–eversion of the shoulder joint. The black dashed lines denote the target angles.

Figure 3. Comparison of the time history of each joint angle in the right upper extremity using the ACRL–NGN and FCM–JA algorithms at the 1st, 2nd, 3rd, 20th, and 300th trials. The black dashed lines denote the target angles.

Figure 4. Comparison of the time history of each joint angle in the right upper extremity using the ACRL–NGN algorithm without any FCMs at the 1st, 2nd, 3rd, 20th, and 300th trials. The black dashed lines denote the target angles.

Figure 5. Comparison of the time history of each joint angle in the right lower extremity using the ACRL–NGN and FCM–ML algorithms at the 1st, 2nd, 3rd, 20th, and 300th trials. (a) KNV: flexion–extension of the knee joint, (b) KNW: inversion–eversion of the knee joint, (c) HPU: internal–external rotation of the hip joint, (d) HPV: flexion–extension of the hip joint, and (e) HPW: inversion–eversion of the hip joint. The black dashed lines denote the target angles.

Figure 6. Simulation result of posture change during 2 s of the right upper extremity, obtained using muscle control functions obtained at the 300th trial with the ACRL–NGN and FCM–ML algorithms. (a) Case A: an initial posture consisting of ELV: 0°, ELW: 0°, SHU: 0°, SHV: −90°, and SHW: 0°; (b) Case B: an initial posture consisting of ELV: 0°, ELW: 60°, SHU: 0°, SHV: 40°, and SHW: 0°; (c) Case C: an initial posture consisting of ELV: −90°, ELW: 0°, SHU: 90°, SHV: 0°, and SHW: 0°.

Figure 7. Comparison of each joint angle time history from posture change simulation results during 2 s of the right upper extremity using muscle control functions obtained at the 300th trial with the ACRL–NGN and FCM–ML algorithms. Case A: an initial posture consisting of ELV: 0°, ELW: 0°, SHU: 0°, SHV: −90°, and SHW: 0°; Case B: an initial posture consisting of ELV: 0°, ELW: 60°, SHU: 0°, SHV: 40°, and SHW: 0°; Case C: an initial posture consisting of ELV: −90°, ELW: 0°, SHU: 90°, SHV: 0°, and SHW: 0°. The black dashed lines denote the target angles.

Figure 8. Comparison of each muscle activation time history from posture change simulation results during 2 s of the right upper extremity using muscle control functions obtained at the 300th trial with the ACRL–NGN and FCM–ML algorithms. (a) Case A: an initial posture consisting of ELV: 0°, ELW: 0°, SHU: 0°, SHV: −90°, and SHW: 0°; (b) Case B: an initial posture consisting of ELV: 0°, ELW: 60°, SHU: 0°, SHV: 40°, and SHW: 0°; (c) Case C: an initial posture consisting of ELV: −90°, ELW: 0°, SHU: 90°, SHV: 0°, and SHW: 0°.

Table 1. Muscles used for validation of moment arms.

Upper Extremity	Lower Extremity
Deltoid anterior	Rectus femoris
Deltoid middle	Semitendinosus
Deltoid posterior	Semimembranosus
Teres major	Biceps femoris (long head)
Teres minor	Biceps femoris (short head)
Supraspinatus	Gluteus maximus
Infraspinatus	Gastrocnemius (medial head)
Subscapularis	Gastrocnemius (lateral head)
Biceps brachii (long head)	Vastus lateralis
Biceps brachii (short head)	Vastus intermedius
Triceps brachii (long head)	Vastus medialis
Triceps brachii (lateral head)	Sartorius
Triceps brachii (medial head)	Gracilis
Brachialis
Brachioradialis
Pronator teres
Anconeus

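Table 2. Simulation results of posture stabilization of the upper extremity or lower extremity under gravity. Entries give the number of joint motions (out of five) that reached their target angles in each trial.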

	Learning Algorithm	FCM	1st Trial	2nd Trial	3rd Trial	20th Trial	300th Trial
Right upper extremity	ACRL	Muscle length (ML)	3	5	5	5	5
		Joint angle (JA)	1	0	1	1	0
		None	0	1	0	1	1
	DDPG	Muscle length (ML)	1	1	4	2	4
		Joint angle (JA)	1	0	1	0	1
		None	1	0	0	1	0
Right lower extremity	ACRL	Muscle length (ML)	3	4	3	0	4
		Joint angle (JA)	2	2	1	1	2
		None	0	0	0	0	0
	DDPG	Muscle length (ML)	3	1	1	0	4
		Joint angle (JA)	1	1	2	1	2
		None	1	0	0	0	0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
