1. Introduction
In control systems, the main task of the controller is to obtain an admissible control law that satisfies given conditions according to the dynamic characteristics of the plant, and then to optimize a performance index (maximizing or minimizing it); this constitutes the optimal control problem [1]. Solving the optimal control problem is equivalent to solving the Hamilton–Jacobi–Bellman (HJB) equation. Because the HJB equation is a nonlinear partial differential equation, it is difficult to obtain its analytical solution. In recent years, the adaptive dynamic programming (ADP) method has received widespread attention for obtaining approximate solutions of the HJB equation [2,3,4,5]. The ADP method combines the ideas of dynamic programming (DP) and reinforcement learning (RL). An RL-based agent interacts with the environment and takes actions to obtain cumulative rewards, which overcomes the curse of dimensionality encountered in DP. In its implementation, the ADP method approximates the solution of the HJB equation using the function-approximation capability of neural networks and obtains the optimal value function and optimal control law through RL. In [6], the structure of ADP was proposed for the first time. Subsequently, [7] used an actor–critic neural network (NN) structure to deal with nonlinear systems, an approach that depends on full dynamic information. In [8,9,10], policy iteration (PI), consisting of iterative policy evaluation and policy improvement steps, is used for continuous-time and discrete-time dynamic systems to update the control policy online with state and input information. However, the PI learning algorithm relies on an exact dynamic model: both the drift dynamics and the input dynamics are needed in the iterative process. Improvements on the PI algorithm therefore retain its iterative approximation of the optimal solution while relaxing the model requirements. An integral reinforcement learning (IRL) algorithm is proposed in [11] to remove the dependence on the drift dynamics by introducing integral operations. In [12], for a locally unknown continuous-time nonlinear system with actuator saturation, an IRL algorithm with an actor–critic network is proposed to solve the HJB equation online; the experience-replay technique is then used to update the critic weights when solving the IRL Bellman equation. The IRL method is applied to the unknown H∞ tracking control problem of a linear system with actuator saturation in [13]. The designed controller not only ensures the convergence of the system state trajectory but also makes the tracking error asymptotically stable.
The complexity and uncertainty of real systems increase the difficulty of solving the HJB equation, especially for nonlinear systems in practical problems, and the H∞ control problem provides an alternative formulation of the robust optimal control problem. The purpose of H∞ control is to design a controller that effectively suppresses the impact of external disturbances on system performance. Therefore, many studies propose robust optimal control methods that transform the H∞ optimal control problem into a zero-sum game problem, which is essentially a max–min optimization problem [14,15]. The Nash saddle point of the two-player zero-sum game then corresponds to the solution of the Hamilton–Jacobi–Isaacs (HJI) equation. In [16,17], an online policy iteration (PI) algorithm is presented to solve the two-player zero-sum game, in which an ADP-based critic–actor network is used to approximate the solution of the HJI equation. For systems containing external disturbances, a new ADP-based online IRL algorithm is proposed to approximate the HJI equation; current and historical data are used together to update the network weights, which improves data-utilization efficiency when solving the HJI equation [18]. In [19], an H∞ tracking controller is designed to solve the zero-sum game with completely unknown dynamics and constrained input, and the tracking HJI equation is solved by off-policy reinforcement learning. In [20,21], adaptive critic designs (ACDs) are used to solve the zero-sum problem in which external disturbances affect system performance.
However, in engineering applications the controller is subject to physical and safety constraints, and the control input is usually limited by a threshold. In addition, higher requirements are placed on the design and analysis of the control system: it is necessary not only to achieve the control design goal of guaranteeing the stability of the dynamic system, but also to consider the control performance of the system so as to save energy and reduce consumption. The iterative methods above mostly use a periodically sampled, time-triggered mechanism to solve the nonlinear zero-sum game problem with actuator saturation.
To reduce unnecessary data transmission and update frequency between components, the event-triggered mechanism was introduced into adaptive dynamic programming for the first time in [22]. A controller driven by sampled states and governed by a triggering condition is designed, which not only ensures the stability and optimality of the system but also reduces the number of controller updates. For systems with constrained input, [23,24] propose an approximately optimal control structure based on an event-triggered strategy, which updates the control law aperiodically to reduce computation and transmission costs and guarantees the uniform ultimate boundedness of the event-triggered system. The designed triggering condition is the key to an event-based controller: it must keep the triggering threshold non-negative and must exclude Zeno behavior. In [25], an event-driven controller is designed whose triggering condition has a non-negative triggering threshold and is free of Zeno behavior. In [26], an IRL-based event-triggered algorithm is used for partially unknown nonlinear systems; the critic network updates periodically and the actor network updates aperiodically to approximately acquire the performance index function and the control law. The convergence of the NN weights is proved theoretically, and the Zeno phenomenon is effectively avoided. For systems with unknown disturbances, a robust controller is designed using the H∞ control method. In [15], an event-triggered H∞ controller is constructed, which introduces the disturbance attenuation level into the triggering condition and ensures that the triggering threshold is non-negative by selecting appropriate parameters. The control law is updated under the event-triggered strategy, while the perturbation law is adjusted under the time-triggered mechanism.
At present, the event-triggered ADP algorithm has been applied to the optimal regulation problem [27,28], optimal tracking control [29], zero-sum games [30,31,32], non-zero-sum games [33,34], and robust control problems [35,36]. However, most studies rely on identifier–critic NNs or critic–actor–perturbation NN structures to approximately obtain the solution of the HJI equation for the zero-sum game problem, which often increases the communication load between components such as actuators and controllers and increases resource consumption and cost [37,38]. Therefore, an event-triggered ADP method is proposed to solve the H∞ optimal control problem for partially unknown continuous-time nonlinear systems with constrained input, and a single-critic network structure is constructed to approximately acquire the solution of the HJI equation. This paper aims to achieve the following aspects:
(1) Based on event-triggered control, a triggering condition that incorporates the level of disturbance attenuation is developed to limit the sampling of the system states, and an appropriate level of disturbance attenuation is selected to ensure that the triggering threshold remains non-negative.
(2) An event-triggered controller is designed for the input-constrained nonlinear system. The event-based control law and disturbance law are updated only at the triggering instants, which effectively reduces the computation in the control process.
(3) For the zero-sum game problem, a single-critic network structure based on event-triggered ADP is proposed to approximate the solution of the HJI equation. It not only greatly reduces the update frequency of the controller and the computational cost, but also relaxes the reliance on known dynamic information.
The rest of this article is organized as follows.
Section 2 gives the description and transformation of the problem.
Section 3 introduces the event-triggered HJI equation.
Section 4 describes the implementation of the event-based ADP algorithm, and gives an analysis of system stability.
Section 5 demonstrates the simulation of the continuous-time linear system and the continuous-time (CT) nonlinear system, and
Section 6 presents the conclusion of this paper.
Notation. The following notation is used throughout this article. R^n, R^(n×m), and R^m denote the sets of real vectors and matrices of the corresponding dimensions n and m, and R^+ is the set of positive real numbers. tanh^{-1}(·) denotes the inverse hyperbolic tangent function. λ_min(·) and λ_max(·) denote the minimum and maximum eigenvalues of a matrix, and ‖·‖ denotes the 2-norm. ∇V = ∂V/∂x denotes the partial derivative of the function V with respect to the variable x, and → denotes that a quantity approaches a limiting value.
2. Problem Description
Consider the continuous-time nonlinear system with external disturbance
\[
\dot{x}(t) = f(x(t)) + g(x(t))u(t) + k(x(t))d(t),
\tag{1}
\]
where x(t) ∈ R^n denotes the state vector of the system, u(t) ∈ R^m is the constrained control input whose elements have a positive upper bound, f(x) is the drift dynamics with f(0) = 0, g(x) is the input coupling dynamics, and k(x) and d(t) are the disturbance dynamics and the bounded external disturbance, respectively, the disturbance bound being a positive constant.
Since the system is affected by the constrained input, the nonquadratic input cost is defined as in [39], where the integrand is a nonquadratic function, R is assumed to be a diagonal matrix, and the hyperbolic tangent function is used to treat the constrained input.
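As a point of reference, the nonquadratic input cost commonly used for actuator saturation in this line of work, following, e.g., [39], takes the form below; the symbol W(u) is introduced here only for illustration, and the paper's exact Equation (2) may differ in notation:
\[
W(u) = 2\int_{0}^{u} \lambda \tanh^{-1}(v/\lambda)^{\top} R \, dv ,
\]
where λ is the saturation bound and R is the diagonal weighting matrix.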
For the H∞ control problem with constrained input, we need to reduce the impact of external disturbances on system performance. For any disturbance in L2[0, ∞), the closed-loop system (1) should satisfy the disturbance attenuation condition (3), that is, an L2-gain not larger than γ, where Q is a symmetric positive definite matrix and γ denotes the level of disturbance attenuation.
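For concreteness, a standard L2-gain condition of this kind is sketched below; it is consistent with the description above, though not necessarily the paper's exact Equation (3):
\[
\int_{0}^{\infty}\bigl(x^{\top} Q x + W(u)\bigr)\,dt \;\le\; \gamma^{2}\int_{0}^{\infty} d^{\top} d \, dt .
\]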
The design intention of the H∞ optimal control problem is to find a control law that not only guarantees the asymptotic stability of system (1) but also ensures that the disturbance attenuation condition (3) holds. Then, we need to define the value function (4), whose utility consists of the state cost, the nonquadratic input cost, and the attenuated disturbance term.
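A typical value function for this problem, stated here only as an orienting sketch of what Equation (4) usually looks like in this literature:
\[
V(x(t)) = \int_{t}^{\infty}\bigl(x^{\top} Q x + W(u) - \gamma^{2} d^{\top} d\bigr)\, d\tau ,
\]
with W(u) the nonquadratic input cost sketched above.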
To achieve the above goals, the H∞ control problem is treated as a zero-sum game problem. By the minimax optimization principle, the perturbation policy acts as one decision-maker and maximizes the value function, while the control policy acts as the other decision-maker and minimizes it, which yields the optimal value function (5). Assuming that the optimal value function is continuous and differentiable, the Bellman equation (6) can be obtained, and the Hamiltonian function is then defined as in (7).
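In standard zero-sum ADP formulations, the optimal value function and the Hamiltonian take roughly the following form (again a sketch; the paper's Equations (5)–(7) may differ in detail):
\[
V^{*}(x) = \min_{u}\max_{d}\int_{t}^{\infty}\bigl(x^{\top} Q x + W(u) - \gamma^{2} d^{\top} d\bigr)\, d\tau ,
\]
\[
H(x, u, d, \nabla V) = \nabla V^{\top}\bigl(f(x) + g(x)u + k(x)d\bigr) + x^{\top} Q x + W(u) - \gamma^{2} d^{\top} d .
\]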
Definition 1 ([12]). A control law is said to be admissible with respect to the value function (4) on a compact set Λ if it is continuous, vanishes at the origin, ensures the stability of system (1) on Λ, and keeps the value function finite for any state in Λ.
By the stationarity conditions ∂H/∂u = 0 and ∂H/∂d = 0, the optimal control law and the disturbance law are expressed as in (8) and (9).
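The saturated optimal control and worst-case disturbance in such problems usually take the following form; this is a sketch consistent with the cost above, not necessarily the paper's exact Equations (8) and (9):
\[
u^{*}(x) = -\lambda \tanh\!\Bigl(\frac{1}{2\lambda} R^{-1} g^{\top}(x)\nabla V^{*}(x)\Bigr), \qquad
d^{*}(x) = \frac{1}{2\gamma^{2}}\, k^{\top}(x)\nabla V^{*}(x) .
\]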
Substituting (8) into (2), expression (10) can be obtained, where 1 denotes the column vector whose elements are all equal to 1 and the row vector composed of the diagonal elements of R also appears.
Based on (7)–(9), the optimal time-triggered HJI equation (11) is obtained.
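For reference, the time-triggered HJI equation in this setting typically reads as follows (a sketch under the standard cost above, not necessarily the paper's exact Equation (11)):
\[
x^{\top} Q x + W(u^{*}) + \nabla V^{*\top}\bigl(f(x) + g(x)u^{*} + k(x)d^{*}\bigr) - \gamma^{2} d^{*\top} d^{*} = 0, \qquad V^{*}(0) = 0 .
\]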
It is difficult to solve (11) directly. In addition, since the drift dynamics of the system are unknown, solving the HJI equation becomes even more difficult. Thus, we prefer to introduce the IRL technique to obtain the solution of the HJI equation without requiring knowledge of the drift dynamics.
The learning process of IRL mainly includes the following steps in Algorithm 1.
Algorithm 1: IRL algorithm.
1: Choose initial admissible control and disturbance laws.
2: Policy evaluation: over each integration interval, obtain the value function by solving the IRL Bellman equation (12).
3: Policy improvement: update the control law and the disturbance law by (13) and (14).
Remark 1. The IRL algorithm is an improvement over the policy iteration (PI) technique. Compared with the standard Bellman Equation (4), the IRL-based Bellman Equation (12) does not involve the drift dynamics of the system. However, in the IRL algorithm, the control policy is updated at time t and applied to the system over the next interval. This is so-called periodic sampling, or time-triggered control (TTC). Under TTC, the control law and disturbance law are updated periodically, which may cause considerable resource consumption and computational cost for networked systems with limited bandwidth. Therefore, an event-triggered ADP method is applied to solve the HJI equation for the H∞ control problem.
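For reference, the IRL Bellman equation used by this type of algorithm over an integration interval of length T is usually written as follows; the paper's exact Equation (12) may differ in notation:
\[
V\bigl(x(t-T)\bigr) = \int_{t-T}^{t}\bigl(x^{\top} Q x + W(u) - \gamma^{2} d^{\top} d\bigr)\, d\tau + V\bigl(x(t)\bigr),
\]
where T > 0 is the integration (reinforcement) interval.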
3. Design of Event-Triggered H∞-Constrained Optimal Control
Event-triggered control is introduced into ADP to solve the zero-sum game problem, and an event-triggered HJI equation is then developed. A triggering condition is designed to limit the number of sampled states while avoiding Zeno behavior.
In the ETC, the event-triggered instant sequence is defined as {t_j}, j = 0, 1, 2, …, with t_j < t_{j+1}. A new sampled state is obtained at the triggering instant when the triggering condition is violated. The event error determines when a new sampled state appears, and it can be expressed as
\[
e_j(t) = \hat{x}_j - x(t), \qquad t \in [t_j, t_{j+1}),
\tag{15}
\]
where x̂_j = x(t_j) is the sampled state at the triggering instant t_j and x(t) represents the current state.
Remark 2. Whether an event is triggered depends on two basic quantities: the event-triggered error and the triggering threshold. An event occurs and the controller updates when the error reaches the triggering threshold. The holding of the sampled state over the non-triggered interval is realized by a zero-order holder (ZOH). The ZOH stores the control law of the previous triggering instant and converts the sampled signal into a continuous signal; this law is then held until the next triggering instant.
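The following minimal Python sketch illustrates the trigger/ZOH mechanism described above. The threshold function, gains, and toy plant are hypothetical placeholders for illustration only; they are not the paper's triggering condition (23) or system.

import numpy as np

def threshold(x, sigma=0.5, q_min=1.0, gamma=5.0):
    # Hypothetical state-dependent triggering threshold. The paper's condition (23)
    # also involves the disturbance attenuation level and a design parameter sigma;
    # this particular expression is only a placeholder.
    return (1.0 - sigma**2) * q_min * float(np.dot(x, x)) / (2.0 * gamma**2)

def simulate(f, x0, dt=1e-3, steps=1000):
    x = np.array(x0, dtype=float)
    x_hat = x.copy()                    # last sampled state, held by the ZOH
    trigger_times = []
    for k in range(steps):
        e = x_hat - x                   # event error e_j(t) = x_hat_j - x(t), cf. (15)
        if float(np.dot(e, e)) >= threshold(x):
            x_hat = x.copy()            # event: sample the state and update the controller
            trigger_times.append(k * dt)
        u = -np.tanh(x_hat)             # ZOH: the control uses only the held state
        x = x + dt * f(x, u)            # one explicit Euler step of the plant
    return x, trigger_times

# Hypothetical usage with a toy plant x_dot = -x + u
x_final, events = simulate(lambda x, u: -x + u, x0=[1.0, -0.5])

Between events the control is literally a constant held by the ZOH, which is what reduces the controller update frequency compared with periodic sampling.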
It is assumed that there is no delay between the sensor and the controller. According to (15), the closed-loop system (1) is rewritten as (16), in which the control and disturbance are evaluated at the sampled state. Then, the event-based value function is given by (17). Therefore, over each inter-event interval, the optimal event-triggered control law and the disturbance law can be transformed into (18) and (19). Similar to (11), the event-triggered HJI equation is then obtained.
To sum up, in order to reduce communication load and computational cost, we introduce the event-triggered ADP in Algorithm 2 as follows
Algorithm 2: Event-triggered ADP algorithm.
1: Choose initial admissible control and disturbance laws.
2: During each inter-event interval, obtain the value function by solving the event-based Bellman equation (17).
3: Obtain the optimal control policy and the optimal disturbance policy from (18) and (19), respectively.
Assumption 1 ([15]). The optimal value function is continuously differentiable, and the solution and its partial derivative are bounded on Λ by a positive constant.
Assumption 2 ([26]). For states in Λ, there exist positive constants that bound the input coupling dynamics and the disturbance dynamics.
Assumption 3. The optimal control law and the disturbance law are Lipschitz continuous on a compact set, so that positive Lipschitz constants exist satisfying the corresponding Lipschitz inequalities.
Theorem 1. Let V* be the optimal solution of (11). The optimal control law in (18) and the disturbance law in (19) ensure that the closed-loop system is uniformly ultimately bounded (UUB) only when the triggering condition (23) is satisfied, where γ is the level of disturbance attenuation from (3), σ is a positive design parameter, and λ_min(Q) is the minimum eigenvalue of the matrix Q. The controller obtains a new control law when the event error exceeds the triggering threshold.
Proof. The optimal value function V* is obtained from (11) and is positive definite for any nonzero state. Take V* as the Lyapunov function and consider its time derivative, given in (24).
Then, (8) and (9) can be transformed into (25)–(27). According to (25)–(27), (24) can be rewritten in a form suitable for bounding. By the weighted form of Young's inequality, the cross terms can be bounded. Let Assumptions 1–3 hold and note the relation obtained from (10). If the triggering condition is given by (23), the time derivative of the Lyapunov function can be shown to be negative. Thus, it can be concluded that the derivative of V* is negative for all nonzero states, and therefore the optimal control law and disturbance law ensure that the closed-loop system (1) is asymptotically stable by Lyapunov theory. The proof is completed. □
From (23), the triggering interval at each event can be obtained. The difference between two consecutive triggering instants is recorded as the triggering interval, expressed as t_{j+1} − t_j. However, Zeno behavior occurs if these triggering intervals approach zero.
Remark 3. Zeno behavior is a special dynamic behavior in hybrid systems in which an infinite number of discrete transitions occur in a finite amount of time [40]. In the presence of Zeno behavior, the asymptotic stability of the system cannot be guaranteed; therefore, Zeno behavior must be avoided.
Theorem 2. Under the triggering condition (23) given in Theorem 1, the inter-execution time exists and admits a positive lower bound, so that the minimum interval time is not less than a positive constant.
Proof. Based on [41,42], consider the evolution of the event error for t ∈ [t_j, t_{j+1}).
According to (15), the time derivative of the event error equals the negative of the state derivative, since the sampled state is constant for t ∈ [t_j, t_{j+1}). Because the drift dynamics are Lipschitz continuous, there exists a positive constant such that the following inequality holds [36]. From (33), inequality (36) can be rewritten as a differential inequality in the norm of the event error. Let an auxiliary function be the solution of the corresponding differential equation with zero value at the triggering instant; its solution can then be expressed in closed form, and an upper bound on the event error is obtained by the comparison principle. Assuming that the next triggering instant occurs when the event error reaches the triggering threshold, a lower bound on the inter-execution time follows. Since the closed-loop system is asymptotically stable, the state remains bounded for any time. Therefore, the inter-execution time is guaranteed to have a positive lower bound, which excludes Zeno behavior. The proof is accomplished. □
4. Approximate Solution of Event-Triggered HJI Equation
The ADP-based event-triggered algorithm has been presented above. In this section, an event-based single neural network with an approximation function, namely the critic NN, is constructed to approximately solve the HJI equation.
Based on the universal approximation property of neural networks, the value function and its gradient can be expressed by a critic neural network as in (37) and (38), where W_c denotes the ideal weight of the critic NN, N is the number of neurons, φ(x) is a suitable activation function, and ε(x) indicates the reconstruction error.
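The critic representation typically used in this setting is sketched below; the symbols are ours and may differ from the paper's Equations (37), (38), and (42):
\[
V(x) = W_c^{\top}\phi(x) + \varepsilon(x), \qquad
\nabla V(x) = \nabla\phi(x)^{\top} W_c + \nabla\varepsilon(x), \qquad
\hat{V}(x) = \hat{W}_c^{\top}\phi(x),
\]
where \hat{W}_c is the current estimate of the ideal weight W_c.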
Substituting the approximate value function obtained from (37) into (12) yields the corresponding Bellman equation. Due to the existence of the NN approximation error, a Bellman equation error arises. For (39), the optimal value is obtained only when this error tends to zero. However, the ideal weights are unknown, so they are estimated by the current weights, and the actual value function of the approximate network can be expressed as in (42).
To solve the optimal control problem, the value function is obtained by solving the Bellman equation in the policy evaluation step, and the new control laws are then obtained from the current value function in the policy improvement step. Combining (20), (21), and (42), the event-triggered control law and disturbance law become (43) and (44).
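A sketch of how such event-triggered approximate laws usually look, in our notation (not necessarily the paper's exact Equations (43) and (44)):
\[
\hat{u}(\hat{x}_j) = -\lambda \tanh\!\Bigl(\frac{1}{2\lambda} R^{-1} g^{\top}(\hat{x}_j)\,\nabla\phi(\hat{x}_j)^{\top}\hat{W}_c\Bigr), \qquad
\hat{d}(\hat{x}_j) = \frac{1}{2\gamma^{2}}\, k^{\top}(\hat{x}_j)\,\nabla\phi(\hat{x}_j)^{\top}\hat{W}_c .
\]
Both laws depend only on the sampled state x̂_j and the current critic weights, so they are recomputed only at the triggering instants.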
Combining (12) with (41), the approximate Bellman equation is described by (45), where the reinforcement signal over the integration interval is defined as in (46). In the iterative process of the NN, the aim is to minimize the Bellman residual. The gradient-descent method is used in this paper: the critic network weights are adjusted online by minimizing a quadratic objective function of the Bellman error. Thus, the tuning law of the critic NN weights can be expressed as in (47), where an adaptive learning rate and a normalization term appear; the square of the denominator in (47) is used to guarantee boundedness. Defining the critic weight approximation error as the difference between the ideal and estimated weights, the critic weight error dynamics are presented in (48).
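A minimal sketch of a normalized gradient-descent critic update of the kind described around (45)–(47); the variable names (alpha, sigma_b, e_b) and the exact normalization are illustrative assumptions, not the paper's exact tuning law.

import numpy as np

def critic_update(W_hat, sigma_b, e_b, alpha=0.5, dt=1e-3):
    # One Euler step of a normalized gradient-descent tuning law: the weights move
    # along the negative gradient of (1/2) * e_b**2, and the squared denominator
    # (1 + sigma_b^T sigma_b)**2 keeps the update bounded.
    norm = (1.0 + float(sigma_b @ sigma_b)) ** 2
    W_dot = -alpha * sigma_b * e_b / norm
    return W_hat + dt * W_dot

# Hypothetical usage: 3 critic weights, a regressor built from the change of the
# activation vector over one integration interval, and a scalar Bellman residual.
W_hat = np.zeros(3)
sigma_b = np.array([0.2, -0.1, 0.05])   # e.g. phi(x(t)) - phi(x(t - T))
e_b = 0.3                                # Bellman residual for the current data window
W_hat = critic_update(W_hat, sigma_b, e_b)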
Remark 4. This paper designs an online event-triggered ADP learning algorithm, so the activation function must be persistently excited to ensure that the NN weights converge, after which the value function and control laws are obtained. Therefore, a probing noise signal composed of components with different amplitudes and diverse frequencies is constructed.
Remark 5. Under the triggering condition (23), the adaptive learning law of the critic network (47) updates the weights using the reinforcement signal from (46); the control law (43) and the perturbation law (44) are then updated at the triggering instants. To clarify the main idea of the algorithm intuitively, Figure 1 shows the block diagram of the event-triggered ADP H∞-constrained control.
Assumption 4. The input coupling dynamics and the disturbance dynamics are Lipschitz continuous and satisfy the corresponding Lipschitz conditions; the associated nonlinearity is also Lipschitz continuous with a positive Lipschitz constant.
Assumption 5. The NN activation function and its gradient are bounded by positive constants. The critic NN approximation error and its gradient are also bounded by positive constants.
The control law is admissible. The approximate weight can be guaranteed to converge to the ideal weight under the persistent excitation (PE) condition. Assume that the activation function is persistently exciting over the integration interval, and let the residual error due to the approximation error be bounded; this ensures that the critic network weight is UUB. The boundedness condition of the critic NN weight is expressed in (50).
Theorem 3. Let Assumptions 1–5 be valid, and let the control laws and the tuning law of the critic NN be implemented by (43), (44), and (47). The closed-loop system (1) is asymptotically stable and the weight approximation error (48) is UUB only when the triggering condition in (23) is used and the inequality condition (49) holds, where the constants involved are given in (60) and (63).
Proof. The initial control for the dynamical system (1) is admissible. For the value function (4) of the system, a composite Lyapunov function is constructed, with its components defined accordingly.
The NN learning process under event-triggered control is divided into two cases: (1) during the flow dynamics over the inter-event intervals; (2) at the triggering instants.
Case 1. When the triggering condition is not satisfied, the derivative of the Lyapunov function is evaluated along the flow dynamics, as given in (51). According to (25)–(27), (51) can be transformed accordingly. Similarly, using Young's inequality, the cross terms are bounded. Combining (8) with (38), a bound on the control-related term is obtained. Noting the bounds in Assumptions 4 and 5, further estimates follow, and a similar bound is obtained for the remaining term. According to (55)–(57), the corresponding part of the derivative is rewritten. Likewise, combining (9) with (37), the disturbance-related term can be expressed accordingly. According to (53), (58), and (59), the derivative of the Lyapunov function is bounded by an expression whose constants are defined therein. If the event is not triggered, the event error satisfies the triggering condition, and the terms arising from (48) are bounded; the remaining part of the derivative can then be bounded as in (61). Combining (60) and (61), an overall bound is obtained. Letting the triggering condition (23) hold, the derivative of the Lyapunov function becomes negative outside a bounded region. Let the triggering condition (23) and the inequality condition (49) hold; we can then conclude that the derivative is negative whenever the state and the weight approximation error exceed the corresponding bounds. Therefore, by Lyapunov theory, the state x and the NN weight error are uniformly ultimately bounded.
Case 2. The event is triggered at a triggering instant. A new Lyapunov function is constructed, with its components defined analogously to Case 1. From Case 1, the state x and the weight approximation error are UUB during the flow dynamics, so that the closed-loop system (1) is asymptotically stable. Since the value function and the activation function are continuous, the corresponding limits exist as the triggering instant is approached. At the triggering instant, the difference of the Lyapunov function can be bounded in terms of a class-κ function [43]. Therefore, the Lyapunov function does not increase at the triggering instants. The above two cases ensure that the closed-loop system under the event-triggered strategy is asymptotically stable and the NN weight error is UUB if the triggering condition (23) and the inequality (49) hold. The proof is completed. □
Remark 6. The theoretical analysis has been presented above. Algorithm 2 can be implemented using a single critic NN, which updates the critic weights using the tuning law (47); the new control law and disturbance law are then obtained from (43) and (44) when an event occurs. Compared with [12,26,35,44], the proposed event-triggered ADP method solves the zero-sum game problem with actuator saturation and partially unknown dynamics. Based on event-triggered control, the control laws of the two players are held piecewise between the instants determined by the designed triggering condition (23). Thus, the event-triggered control avoids unnecessary data transmission and calculation costs.
5. Simulation Results
In this section, two simulation examples, a linear system and a nonlinear system, are provided to verify the effectiveness of the online event-triggered adaptive dynamic programming algorithm for continuous-time systems.
5.1. Linear System
Consider the F-16 aircraft continuous-time linear system with external perturbation as in [12], with the corresponding system, input, and disturbance matrices. The three state variables and the initial state are specified, matrices Q and R of appropriate dimensions are chosen, and the constrained control satisfies the given saturation bound. The activation function of the single critic NN is chosen so that the ideal weight vector corresponds to the solution of the algebraic Riccati equation (ARE). The NN learning and triggering parameters used in this paper, as well as the external disturbance signal, are selected accordingly.
Figure 2 shows the convergence of the three state variables; it is obvious that the states tend to be asymptotically stable. The iterative process of the critic NN is illustrated in Figure 3. The critic weight vector converges to [0.0349, −0.0128, −0.1575, 1.4738, 0.6577, 0.2075] under the PE condition after 60 s; the PE condition is effectively guaranteed by adding probing noise. Under the event-triggered strategy, the control law and the perturbation law are updated in a nonperiodic manner, and the evolution of the ETC laws is described in Figure 4. Under the constrained-input condition, the magnitude of the control law remains no larger than 1. The trajectories of the event error (15) and the triggering threshold (23) are shown in Figure 5, which verifies the effect of the triggering condition. Figure 6 presents the event-triggered sampling intervals under the triggering condition. Compared with 1000 sampled states under the time-triggered strategy, only 85 state samples are taken under the event-triggered strategy. From Table 1, the minimum inter-event interval is strictly positive, and the Zeno phenomenon is avoided. As a result, the update frequency and the computational burden of the controller are greatly reduced.
5.2. Nonlinear System
Consider a continuous-time nonlinear system with external disturbance, whose drift, input coupling, and disturbance dynamics are given accordingly. The state vector and the initial state are specified, the constrained input satisfies the given saturation bound, and the weighting matrices Q and R are chosen. The activation function of the critic NN and the corresponding approximate weight vector are selected, and the NN learning and triggering parameters are specified. The external disturbance is selected similarly to the previous example.
The two state trajectories of the closed-loop system are described in Figure 7; it is obvious that both states gradually converge and the system becomes asymptotically stable. The ideal critic weight is specified, and the NN can be ensured to converge under the PE condition. Figure 8 demonstrates that the approximate critic weight vector converges after learning. From Figure 9, the evolution of the ETC policies can be observed; under the constrained-input condition, the magnitude of the control law remains no larger than 1. The trajectories of the event error and the triggering threshold are shown in Figure 10, where it can be seen intuitively that the event-triggered error always remains within the triggering threshold. Figure 11 indicates the event-triggered sampling intervals under the event-triggered mechanism. From Table 2, only 157 state samples are taken under the event-triggered strategy, whereas the state would be sampled at every time step under a time-triggered strategy.