1. Introduction
In control systems, the main task of the controller is to obtain an admissible control law that satisfies given conditions according to the dynamic characteristics of the plant, and then to optimize a performance index (maximizing or minimizing it); this constitutes the optimal control problem [1]. Solving the optimal control problem is equivalent to solving the Hamilton–Jacobi–Bellman (HJB) equation. Because the HJB equation is a nonlinear partial differential equation, it is difficult to obtain its analytical solution. In recent years, the adaptive dynamic programming (ADP) method has received widespread attention for obtaining approximate solutions of the HJB equation [2,3,4,5]. The ADP method combines the ideas of dynamic programming (DP) and reinforcement learning (RL). An RL-based agent interacts with the environment and takes actions to obtain cumulative rewards, which overcomes the curse of dimensionality encountered in DP. In its implementation, the ADP method approximates the solution of the HJB equation using the function-approximation capability of neural networks and obtains the optimal value function and optimal control law through RL. In [6], the structure of ADP was proposed for the first time. Subsequently, [7] used an actor–critic neural network (NN) structure to deal with nonlinear systems, an approach that depends on full dynamic information. In [8,9,10], policy iteration (PI), consisting of iterative policy evaluation and policy improvement steps, is used for continuous-time and discrete-time dynamic systems to update the control policy online with state and input information. However, the PI learning algorithm relies on an exact dynamic model: both the drift dynamics and the input dynamics are needed in the iterative process. Improvements on the PI algorithm therefore retain its iterative approximation of the optimal solution while relaxing the model requirements. An integral reinforcement learning (IRL) algorithm is proposed in [11] to remove the dependence on the drift dynamics by introducing integral operations. In [12], for a locally unknown continuous-time nonlinear system with actuator saturation, an IRL algorithm with an actor–critic network is proposed to solve the HJB equation online; the experience-replay technique is then used to update the critic weights when solving the IRL Bellman equation. The IRL method is applied to the unknown H∞ tracking control problem of a linear system with actuator saturation in [13]. The designed controller not only ensures the convergence of the system state trajectory but also makes the tracking error asymptotically stable.
The complexity and uncertainty of real systems increase the difficulty of solving the HJB equation, especially for nonlinear systems in practical problems, and the H∞ control problem provides an alternative formulation of the robust optimal control problem. The purpose of H∞ control is to design a controller that effectively suppresses the impact of external disturbances on system performance. Therefore, many studies propose robust optimal control methods that transform the H∞ optimal control problem into a zero-sum game problem, which is essentially a max–min optimization problem [14,15]. The Nash saddle point of the two-player zero-sum game then corresponds to the solution of the Hamilton–Jacobi–Isaacs (HJI) equation. In [16,17], an online policy iteration (PI) algorithm is presented to solve the two-player zero-sum game, in which an ADP-based critic–actor network is used to approximate the solution of the HJI equation. For systems containing external disturbances, a new ADP-based online IRL algorithm is proposed to approximate the HJI equation; current and historical data are used together to update the network weights, which improves data-utilization efficiency when solving the HJI equation [18]. In [19], an H∞ tracking controller is designed to solve the zero-sum game with completely unknown dynamics and constrained input, and the tracking HJI equation is solved by off-policy reinforcement learning. In [20,21], adaptive critic designs (ACDs) are used to solve the zero-sum problem in which external disturbances affect system performance.
However, in engineering applications the controller is subject to physical and safety constraints, and the control input is usually limited by a threshold. In addition, higher requirements are placed on the design and analysis of the control system: it is necessary not only to achieve the control design goal of guaranteeing the stability of the dynamic system, but also to consider the control performance of the system so as to save energy and reduce consumption. The iterative methods above mostly use a periodically sampled, time-triggered mechanism to solve the nonlinear zero-sum game problem with actuator saturation.
To reduce unnecessary data transmission and update frequency between components, the event-triggered mechanism was introduced into adaptive dynamic programming for the first time in [22]. A controller driven by sampled states and governed by a triggering condition is designed, which not only ensures the stability and optimality of the system but also reduces the number of controller updates. For systems with constrained input, [23,24] propose an approximately optimal control structure based on an event-triggered strategy, which updates the control law aperiodically to reduce computation and transmission costs and guarantees the uniform ultimate boundedness of the event-triggered system. The designed triggering condition is the key to an event-based controller: it must keep the triggering threshold non-negative and must exclude Zeno behavior. In [25], an event-driven controller is designed whose triggering condition has a non-negative triggering threshold and is free of Zeno behavior. In [26], an IRL-based event-triggered algorithm is used for partially unknown nonlinear systems; the critic network updates periodically and the actor network updates aperiodically to approximately acquire the performance index function and the control law. The convergence of the NN weights is proved theoretically, and the Zeno phenomenon is effectively avoided. For systems with unknown disturbances, a robust controller is designed using the H∞ control method. In [15], an event-triggered H∞ controller is constructed, which introduces the disturbance attenuation level into the triggering condition and ensures that the triggering threshold is non-negative by selecting appropriate parameters. The control law is updated under the event-triggered strategy, while the perturbation law is adjusted under the time-triggered mechanism.
At present, the event-triggered ADP algorithm has been applied to the optimal regulation problem [27,28], optimal tracking control [29], zero-sum games [30,31,32], non-zero-sum games [33,34], and robust control problems [35,36]. However, most studies rely on identifier–critic NNs or critic–actor–perturbation NN structures to approximately obtain the solution of the HJI equation for the zero-sum game problem, which often increases the communication load between components such as actuators and controllers and increases resource consumption and cost [37,38]. Therefore, an event-triggered ADP method is proposed to solve the H∞ optimal control problem for partially unknown continuous-time nonlinear systems with constrained input, and a single-critic network structure is constructed to approximately acquire the solution of the HJI equation. This paper aims to achieve the following aspects:
(1) Based on event-triggered control, a triggering condition that incorporates the level of disturbance attenuation is developed to limit the sampling of the system states, and an appropriate level of disturbance attenuation is selected to ensure that the triggering threshold remains non-negative.
(2) An event-triggered controller is designed for the input-constrained nonlinear system. The event-based control law and disturbance law are updated only at the triggering instants, which effectively reduces the computation in the control process.
(3) For the zero-sum game problem, a single-critic network structure based on event-triggered ADP is proposed to approximate the solution of the HJI equation. It not only greatly reduces the update frequency of the controller and the computational cost, but also relaxes the reliance on known dynamic information.
The rest of this article is organized as follows.
Section 2 gives the description and transformation of the problem.
Section 3 introduces the event-triggered HJI equation.
Section 4 describes the implementation of the event-based ADP algorithm, and gives an analysis of system stability.
Section 5 demonstrates the simulation of the continuous-time linear system and the continuous-time (CT) nonlinear system, and
Section 6 presents the conclusion of this paper.
Notation. The following notation is used throughout this article. R^n, R^(n×m), and R^m denote the sets of real vectors and matrices of the corresponding dimensions n and m, and R^+ is the set of positive real numbers. tanh^{-1}(·) denotes the inverse hyperbolic tangent function. λ_min(·) and λ_max(·) denote the minimum and maximum eigenvalues of a matrix, and ‖·‖ denotes the 2-norm. ∇V = ∂V/∂x denotes the partial derivative of the function V with respect to the variable x, and → denotes that a quantity approaches a limiting value.
2. Problem Description
Consider the continuous-time nonlinear system with external disturbance
\[
\dot{x}(t) = f(x(t)) + g(x(t))u(t) + k(x(t))d(t),
\tag{1}
\]
where x(t) ∈ R^n denotes the state vector of the system, u(t) ∈ R^m is the constrained control input whose elements have a positive upper bound, f(x) is the drift dynamics with f(0) = 0, g(x) is the input coupling dynamics, and k(x) and d(t) are the disturbance dynamics and the bounded external disturbance, respectively, the disturbance bound being a positive constant.
Since the system is affected by the constrained input, the nonquadratic input cost is defined as in [39], where the integrand is a nonquadratic function, R is assumed to be a diagonal matrix, and the hyperbolic tangent function is used to treat the constrained input.
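As a point of reference, the nonquadratic input cost commonly used for actuator saturation in this line of work, following, e.g., [39], takes the form below; the symbol W(u) is introduced here only for illustration, and the paper's exact Equation (2) may differ in notation:
\[
W(u) = 2\int_{0}^{u} \lambda \tanh^{-1}(v/\lambda)^{\top} R \, dv ,
\]
where λ is the saturation bound and R is the diagonal weighting matrix.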
For the H∞ control problem with constrained input, we need to reduce the impact of external disturbances on system performance. For any disturbance in L2[0, ∞), the closed-loop system (1) should satisfy the disturbance attenuation condition (3), that is, an L2-gain not larger than γ, where Q is a symmetric positive definite matrix and γ denotes the level of disturbance attenuation.
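For concreteness, a standard L2-gain condition of this kind is sketched below; it is consistent with the description above, though not necessarily the paper's exact Equation (3):
\[
\int_{0}^{\infty}\bigl(x^{\top} Q x + W(u)\bigr)\,dt \;\le\; \gamma^{2}\int_{0}^{\infty} d^{\top} d \, dt .
\]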
The design intention of the H∞ optimal control problem is to find a control law that not only guarantees the asymptotic stability of system (1) but also ensures that the disturbance attenuation condition (3) holds. Then, we need to define the value function (4), whose utility consists of the state cost, the nonquadratic input cost, and the attenuated disturbance term.
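A typical value function for this problem, stated here only as an orienting sketch of what Equation (4) usually looks like in this literature:
\[
V(x(t)) = \int_{t}^{\infty}\bigl(x^{\top} Q x + W(u) - \gamma^{2} d^{\top} d\bigr)\, d\tau ,
\]
with W(u) the nonquadratic input cost sketched above.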
To achieve the above goals, the H∞ control problem is treated as a zero-sum game problem. By the minimax optimization principle, the perturbation policy acts as one decision-maker and maximizes the value function, while the control policy acts as the other decision-maker and minimizes it, which yields the optimal value function (5). Assuming that the optimal value function is continuous and differentiable, the Bellman equation (6) can be obtained, and the Hamiltonian function is then defined as in (7).
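In standard zero-sum ADP formulations, the optimal value function and the Hamiltonian take roughly the following form (again a sketch; the paper's Equations (5)–(7) may differ in detail):
\[
V^{*}(x) = \min_{u}\max_{d}\int_{t}^{\infty}\bigl(x^{\top} Q x + W(u) - \gamma^{2} d^{\top} d\bigr)\, d\tau ,
\]
\[
H(x, u, d, \nabla V) = \nabla V^{\top}\bigl(f(x) + g(x)u + k(x)d\bigr) + x^{\top} Q x + W(u) - \gamma^{2} d^{\top} d .
\]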
Definition 1 ([12]). A control law is said to be admissible with respect to the value function (4) on a compact set Λ if it is continuous, vanishes at the origin, ensures the stability of system (1) on Λ, and keeps the value function finite for any state in Λ.
By the stationarity conditions ∂H/∂u = 0 and ∂H/∂d = 0, the optimal control law and the disturbance law are expressed as in (8) and (9).
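The saturated optimal control and worst-case disturbance in such problems usually take the following form; this is a sketch consistent with the cost above, not necessarily the paper's exact Equations (8) and (9):
\[
u^{*}(x) = -\lambda \tanh\!\Bigl(\frac{1}{2\lambda} R^{-1} g^{\top}(x)\nabla V^{*}(x)\Bigr), \qquad
d^{*}(x) = \frac{1}{2\gamma^{2}}\, k^{\top}(x)\nabla V^{*}(x) .
\]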
Substituting (8) into (2), expression (10) can be obtained, where 1 denotes the column vector whose elements are all equal to 1 and the row vector composed of the diagonal elements of R also appears.
Based on (7)–(9), the optimal time-triggered HJI equation (11) is obtained.
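For reference, the time-triggered HJI equation in this setting typically reads as follows (a sketch under the standard cost above, not necessarily the paper's exact Equation (11)):
\[
x^{\top} Q x + W(u^{*}) + \nabla V^{*\top}\bigl(f(x) + g(x)u^{*} + k(x)d^{*}\bigr) - \gamma^{2} d^{*\top} d^{*} = 0, \qquad V^{*}(0) = 0 .
\]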
It is difficult to solve (11) directly. In addition, since the drift dynamics of the system are unknown, solving the HJI equation becomes even more difficult. Thus, we prefer to introduce the IRL technique to obtain the solution of the HJI equation without requiring knowledge of the drift dynamics.
The learning process of IRL mainly includes the following steps in Algorithm 1.
Algorithm 1: IRL algorithm.
1: Choose initial admissible control and disturbance laws.
2: Policy evaluation: over each integration interval, obtain the value function by solving the IRL Bellman equation (12).
3: Policy improvement: update the control law and the disturbance law by (13) and (14).
Remark 1. The IRL algorithm is an improvement over the policy iteration (PI) technique. Compared with the standard Bellman Equation (4), the IRL-based Bellman Equation (12) does not involve the drift dynamics of the system. However, in the IRL algorithm, the control policy is updated at time t and applied to the system over the next interval. This is so-called periodic sampling, or time-triggered control (TTC). Under TTC, the control law and disturbance law are updated periodically, which may cause considerable resource consumption and computational cost for networked systems with limited bandwidth. Therefore, an event-triggered ADP method is applied to solve the HJI equation for the H∞ control problem.
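For reference, the IRL Bellman equation used by this type of algorithm over an integration interval of length T is usually written as follows; the paper's exact Equation (12) may differ in notation:
\[
V\bigl(x(t-T)\bigr) = \int_{t-T}^{t}\bigl(x^{\top} Q x + W(u) - \gamma^{2} d^{\top} d\bigr)\, d\tau + V\bigl(x(t)\bigr),
\]
where T > 0 is the integration (reinforcement) interval.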
3. Design of Event-Triggered H∞-Constrained Optimal Control
Event-triggered control is introduced into ADP to solve the zero-sum game problem, and an event-triggered HJI equation is then developed. A triggering condition is designed to limit the number of sampled states while avoiding Zeno behavior.
In the ETC, the event-triggered instant sequence is defined as {t_j}, j = 0, 1, 2, …, with t_j < t_{j+1}. A new sampled state is obtained at the triggering instant when the triggering condition is violated. The event error determines when a new sampled state appears, and it can be expressed as
\[
e_j(t) = \hat{x}_j - x(t), \qquad t \in [t_j, t_{j+1}),
\tag{15}
\]
where x̂_j = x(t_j) is the sampled state at the triggering instant t_j and x(t) represents the current state.
Remark 2. Whether an event is triggered depends on two basic quantities: the event-triggered error and the triggering threshold. An event occurs and the controller updates when the error reaches the triggering threshold. The holding of the sampled state over the non-triggered interval is realized by a zero-order holder (ZOH). The ZOH stores the control law of the previous triggering instant and converts the sampled signal into a continuous signal; this law is then held until the next triggering instant.
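The following minimal Python sketch illustrates the trigger/ZOH mechanism described above. The threshold function, gains, and toy plant are hypothetical placeholders for illustration only; they are not the paper's triggering condition (23) or system.

import numpy as np

def threshold(x, sigma=0.5, q_min=1.0, gamma=5.0):
    # Hypothetical state-dependent triggering threshold. The paper's condition (23)
    # also involves the disturbance attenuation level and a design parameter sigma;
    # this particular expression is only a placeholder.
    return (1.0 - sigma**2) * q_min * float(np.dot(x, x)) / (2.0 * gamma**2)

def simulate(f, x0, dt=1e-3, steps=1000):
    x = np.array(x0, dtype=float)
    x_hat = x.copy()                    # last sampled state, held by the ZOH
    trigger_times = []
    for k in range(steps):
        e = x_hat - x                   # event error e_j(t) = x_hat_j - x(t), cf. (15)
        if float(np.dot(e, e)) >= threshold(x):
            x_hat = x.copy()            # event: sample the state and update the controller
            trigger_times.append(k * dt)
        u = -np.tanh(x_hat)             # ZOH: the control uses only the held state
        x = x + dt * f(x, u)            # one explicit Euler step of the plant
    return x, trigger_times

# Hypothetical usage with a toy plant x_dot = -x + u
x_final, events = simulate(lambda x, u: -x + u, x0=[1.0, -0.5])

Between events the control is literally a constant held by the ZOH, which is what reduces the controller update frequency compared with periodic sampling.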
It is assumed that there is no delay between the sensor and the controller. According to (15), the closed-loop system (1) is rewritten as (16), in which the control and disturbance are evaluated at the sampled state. Then, the event-based value function is given by (17). Therefore, over each inter-event interval, the optimal event-triggered control law and the disturbance law can be transformed into (18) and (19). Similar to (11), the event-triggered HJI equation is then obtained.
To sum up, in order to reduce communication load and computational cost, we introduce the event-triggered ADP in Algorithm 2 as follows
Algorithm 2: Event-triggered ADP algorithm.
1: Choose initial admissible control and disturbance laws.
2: During each inter-event interval, obtain the value function by solving the event-based Bellman equation (17).
3: Obtain the optimal control policy and the optimal disturbance policy from (18) and (19), respectively.
Assumption 1 ([15]). The optimal value function is continuously differentiable, and the solution and its partial derivative are bounded on Λ by a positive constant.
Assumption 2 ([26]). For states in Λ, there exist positive constants that bound the input coupling dynamics and the disturbance dynamics.
Assumption 3. The optimal control law and the disturbance law are Lipschitz continuous on a compact set, so that positive Lipschitz constants exist satisfying the corresponding Lipschitz inequalities.
Theorem 1. Let V* be the optimal solution of (11). The optimal control law in (18) and the disturbance law in (19) ensure that the closed-loop system is uniformly ultimately bounded (UUB) only when the triggering condition (23) is satisfied, where γ is the level of disturbance attenuation from (3), σ is a positive design parameter, and λ_min(Q) is the minimum eigenvalue of the matrix Q. The controller obtains a new control law when the event error exceeds the triggering threshold.
Proof. The optimal value function V* is obtained from (11) and is positive definite for any nonzero state. Take V* as the Lyapunov function and consider its time derivative, given in (24).
Then, (8) and (9) can be transformed into (25)–(27). According to (25)–(27), (24) can be rewritten in a form suitable for bounding. By the weighted form of Young's inequality, the cross terms can be bounded. Let Assumptions 1–3 hold and note the relation obtained from (10). If the triggering condition is given by (23), the time derivative of the Lyapunov function can be shown to be negative. Thus, it can be concluded that the derivative of V* is negative for all nonzero states, and therefore the optimal control law and disturbance law ensure that the closed-loop system (1) is asymptotically stable by Lyapunov theory. The proof is completed. □
From (23), the triggering interval at each event can be obtained. The difference between two consecutive triggering instants is recorded as the triggering interval, expressed as t_{j+1} − t_j. However, Zeno behavior occurs if these triggering intervals approach zero.
Remark 3. Zeno behavior is a special dynamic behavior in hybrid systems in which an infinite number of discrete transitions occur in a finite amount of time [40]. In the presence of Zeno behavior, the asymptotic stability of the system cannot be guaranteed; therefore, Zeno behavior must be avoided.
Theorem 2. Under the triggering condition (23) given in Theorem 1, the inter-execution time exists and admits a positive lower bound, so that the minimum interval time is not less than a positive constant.
Proof. Based on [41,42], consider the evolution of the event error for t ∈ [t_j, t_{j+1}).
According to (15), the time derivative of the event error equals the negative of the state derivative, since the sampled state is constant for t ∈ [t_j, t_{j+1}). Because the drift dynamics are Lipschitz continuous, there exists a positive constant such that the following inequality holds [36]. From (33), inequality (36) can be rewritten as a differential inequality in the norm of the event error. Let an auxiliary function be the solution of the corresponding differential equation with zero value at the triggering instant; its solution can then be expressed in closed form, and an upper bound on the event error is obtained by the comparison principle. Assuming that the next triggering instant occurs when the event error reaches the triggering threshold, a lower bound on the inter-execution time follows. Since the closed-loop system is asymptotically stable, the state remains bounded for any time. Therefore, the inter-execution time is guaranteed to have a positive lower bound, which excludes Zeno behavior. The proof is accomplished. □
4. Approximate Solution of Event-Triggered HJI Equation
The ADP-based event-triggered algorithm has been presented above. In this section, an event-based single neural network with an approximation function, namely the critic NN, is constructed to approximately solve the HJI equation.
Based on the universal approximation property of neural networks, the value function and its gradient can be expressed by a critic neural network as in (37) and (38), where W_c denotes the ideal weight of the critic NN, N is the number of neurons, φ(x) is a suitable activation function, and ε(x) indicates the reconstruction error.
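The critic representation typically used in this setting is sketched below; the symbols are ours and may differ from the paper's Equations (37), (38), and (42):
\[
V(x) = W_c^{\top}\phi(x) + \varepsilon(x), \qquad
\nabla V(x) = \nabla\phi(x)^{\top} W_c + \nabla\varepsilon(x), \qquad
\hat{V}(x) = \hat{W}_c^{\top}\phi(x),
\]
where \hat{W}_c is the current estimate of the ideal weight W_c.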
Substituting the approximate value function obtained from (37) into (12) yields the corresponding Bellman equation. Due to the existence of the NN approximation error, a Bellman equation error arises. For (39), the optimal value is obtained only when this error tends to zero. However, the ideal weights are unknown, so they are estimated by the current weights, and the actual value function of the approximate network can be expressed as in (42).
To solve the optimal control problem, the value function is obtained by solving the Bellman equation in the policy evaluation step, and the new control laws are then obtained from the current value function in the policy improvement step. Combining (20), (21), and (42), the event-triggered control law and disturbance law become (43) and (44).
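A sketch of how such event-triggered approximate laws usually look, in our notation (not necessarily the paper's exact Equations (43) and (44)):
\[
\hat{u}(\hat{x}_j) = -\lambda \tanh\!\Bigl(\frac{1}{2\lambda} R^{-1} g^{\top}(\hat{x}_j)\,\nabla\phi(\hat{x}_j)^{\top}\hat{W}_c\Bigr), \qquad
\hat{d}(\hat{x}_j) = \frac{1}{2\gamma^{2}}\, k^{\top}(\hat{x}_j)\,\nabla\phi(\hat{x}_j)^{\top}\hat{W}_c .
\]
Both laws depend only on the sampled state x̂_j and the current critic weights, so they are recomputed only at the triggering instants.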
Combining (12) with (41), the approximate Bellman equation is described by (45), where the reinforcement signal over the integration interval is defined as in (46). In the iterative process of the NN, the aim is to minimize the Bellman residual. The gradient-descent method is used in this paper: the critic network weights are adjusted online by minimizing a quadratic objective function of the Bellman error. Thus, the tuning law of the critic NN weights can be expressed as in (47), where an adaptive learning rate and a normalization term appear; the square of the denominator in (47) is used to guarantee boundedness. Defining the critic weight approximation error as the difference between the ideal and estimated weights, the critic weight error dynamics are presented in (48).
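A minimal sketch of a normalized gradient-descent critic update of the kind described around (45)–(47); the variable names (alpha, sigma_b, e_b) and the exact normalization are illustrative assumptions, not the paper's exact tuning law.

import numpy as np

def critic_update(W_hat, sigma_b, e_b, alpha=0.5, dt=1e-3):
    # One Euler step of a normalized gradient-descent tuning law: the weights move
    # along the negative gradient of (1/2) * e_b**2, and the squared denominator
    # (1 + sigma_b^T sigma_b)**2 keeps the update bounded.
    norm = (1.0 + float(sigma_b @ sigma_b)) ** 2
    W_dot = -alpha * sigma_b * e_b / norm
    return W_hat + dt * W_dot

# Hypothetical usage: 3 critic weights, a regressor built from the change of the
# activation vector over one integration interval, and a scalar Bellman residual.
W_hat = np.zeros(3)
sigma_b = np.array([0.2, -0.1, 0.05])   # e.g. phi(x(t)) - phi(x(t - T))
e_b = 0.3                                # Bellman residual for the current data window
W_hat = critic_update(W_hat, sigma_b, e_b)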
Remark 4. This paper designs an online event-triggered ADP learning algorithm, so the activation function must be persistently excited to ensure that the NN weights converge, after which the value function and control laws are obtained. Therefore, a probing noise signal composed of components with different amplitudes and diverse frequencies is constructed.
Remark 5. Under the triggering condition (23), the adaptive learning law of the critic network (47) updates the weights using the reinforcement signal from (46); the control law (43) and the perturbation law (44) are then updated at the triggering instants. To clarify the main idea of the algorithm intuitively, Figure 1 shows the block diagram of the event-triggered ADP H∞-constrained control.
Assumption 4. The input coupling dynamics and the disturbance dynamics are Lipschitz continuous and satisfy the corresponding Lipschitz conditions; the associated nonlinearity is also Lipschitz continuous with a positive Lipschitz constant.
Assumption 5. The NN activation function and its gradient are bounded by positive constants. The critic NN approximation error and its gradient are also bounded by positive constants.
The control law is admissible. The approximate weight can be guaranteed to converge to the ideal weight under the persistent excitation (PE) condition. Assume that the activation function is persistently exciting over the integration interval, and let the residual error due to the approximation error be bounded; this ensures that the critic network weight is UUB. The boundedness condition of the critic NN weight is expressed in (50).
Theorem 3. Let Assumptions 1–5 be valid, and let the control laws and the tuning law of the critic NN be implemented by (43), (44), and (47). The closed-loop system (1) is asymptotically stable and the weight approximation error (48) is UUB only when the triggering condition in (23) is used and the inequality condition (49) holds, where the constants involved are given in (60) and (63).
Proof. The initial control for the dynamical system (1) is admissible. For the value function (4) of the system, a composite Lyapunov function is constructed, with its components defined accordingly.
The NN learning process under event-triggered control is divided into two cases: (1) during the flow dynamics over the inter-event intervals; (2) at the triggering instants.
Case 1. When the triggering condition is not satisfied, the derivative of the Lyapunov function is evaluated along the flow dynamics, as given in (51). According to (25)–(27), (51) can be transformed accordingly. Similarly, using Young's inequality, the cross terms are bounded. Combining (8) with (38), a bound on the control-related term is obtained. Noting the bounds in Assumptions 4 and 5, further estimates follow, and a similar bound is obtained for the remaining term. According to (55)–(57), the corresponding part of the derivative is rewritten. Likewise, combining (9) with (37), the disturbance-related term can be expressed accordingly. According to (53), (58), and (59), the derivative of the Lyapunov function is bounded by an expression whose constants are defined therein. If the event is not triggered, the event error satisfies the triggering condition, and the terms arising from (48) are bounded; the remaining part of the derivative can then be bounded as in (61). Combining (60) and (61), an overall bound is obtained. Letting the triggering condition (23) hold, the derivative of the Lyapunov function becomes negative outside a bounded region. Let the triggering condition (23) and the inequality condition (49) hold; we can then conclude that the derivative is negative whenever the state and the weight approximation error exceed the corresponding bounds. Therefore, by Lyapunov theory, the state x and the NN weight error are uniformly ultimately bounded.
Case 2. The event is triggered at a triggering instant. A new Lyapunov function is constructed, with its components defined analogously to Case 1. From Case 1, the state x and the weight approximation error are UUB during the flow dynamics, so that the closed-loop system (1) is asymptotically stable. Since the value function and the activation function are continuous, the corresponding limits exist as the triggering instant is approached. At the triggering instant, the difference of the Lyapunov function can be bounded in terms of a class-κ function [43]. Therefore, the Lyapunov function does not increase at the triggering instants. The above two cases ensure that the closed-loop system under the event-triggered strategy is asymptotically stable and the NN weight error is UUB if the triggering condition (23) and the inequality (49) hold. The proof is completed. □
Remark 6. The theoretical analysis has been presented above. Algorithm 2 can be implemented using a single critic NN, which updates the critic weights using the tuning law (47); the new control law and disturbance law are then obtained from (43) and (44) when an event occurs. Compared with [12,26,35,44], the proposed event-triggered ADP method solves the zero-sum game problem with actuator saturation and partially unknown dynamics. Based on event-triggered control, the control laws of the two players are held piecewise between the instants determined by the designed triggering condition (23). Thus, the event-triggered control avoids unnecessary data transmission and calculation costs.
5. Simulation Results
In this section, two simulation examples, a linear system and a nonlinear system, are provided to verify the effectiveness of the online event-triggered adaptive dynamic programming algorithm for continuous-time systems.
5.1. Linear System
Consider the F-16 aircraft continuous-time linear system with external perturbation as in [12], with the corresponding system, input, and disturbance matrices. The three state variables and the initial state are specified, matrices Q and R of appropriate dimensions are chosen, and the constrained control satisfies the given saturation bound. The activation function of the single critic NN is chosen so that the ideal weight vector corresponds to the solution of the algebraic Riccati equation (ARE). The NN learning and triggering parameters used in this paper, as well as the external disturbance signal, are selected accordingly.
Figure 2 shows the convergence of the three state variables; it is obvious that the states tend to be asymptotically stable. The iterative process of the critic NN is illustrated in Figure 3. The critic weight vector converges to [0.0349, −0.0128, −0.1575, 1.4738, 0.6577, 0.2075] under the PE condition after 60 s; the PE condition is effectively guaranteed by adding probing noise. Under the event-triggered strategy, the control law and the perturbation law are updated in a nonperiodic manner, and the evolution of the ETC laws is described in Figure 4. Under the constrained-input condition, the magnitude of the control law remains no larger than 1. The trajectories of the event error (15) and the triggering threshold (23) are shown in Figure 5, which verifies the effect of the triggering condition. Figure 6 presents the event-triggered sampling intervals under the triggering condition. Compared with 1000 sampled states under the time-triggered strategy, only 85 state samples are taken under the event-triggered strategy. From Table 1, the minimum inter-event interval is strictly positive, and the Zeno phenomenon is avoided. As a result, the update frequency and the computational burden of the controller are greatly reduced.
5.2. Nonlinear System
Consider a continuous-time nonlinear system with external disturbance, whose drift, input coupling, and disturbance dynamics are given accordingly. The state vector and the initial state are specified, the constrained input satisfies the given saturation bound, and the weighting matrices Q and R are chosen. The activation function of the critic NN and the corresponding approximate weight vector are selected, and the NN learning and triggering parameters are specified. The external disturbance is selected similarly to the previous example.
The two state trajectories of the closed-loop system are described in Figure 7; it is obvious that both states gradually converge and the system becomes asymptotically stable. The ideal critic weight is specified, and the NN can be ensured to converge under the PE condition. Figure 8 demonstrates that the approximate critic weight vector converges after learning. From Figure 9, the evolution of the ETC policies can be observed; under the constrained-input condition, the magnitude of the control law remains no larger than 1. The trajectories of the event error and the triggering threshold are shown in Figure 10, where it can be seen intuitively that the event-triggered error always remains within the triggering threshold. Figure 11 indicates the event-triggered sampling intervals under the event-triggered mechanism. From Table 2, only 157 state samples are taken under the event-triggered strategy, whereas the state would be sampled at every time step under a time-triggered strategy.