Reinforcement Learning-Based Decentralized Safety Control for Constrained Interconnected Nonlinear Safety-Critical Systems

Qin, Chunbin; Wu, Yinliang; Zhang, Jishi; Zhu, Tianzeng

doi:10.3390/e25081158

¹ School of Artificial Intelligence, Henan University, Zhengzhou 450046, China

² School of Software, Henan University, Kaifeng 475000, China

* Author to whom correspondence should be addressed.

Entropy 2023, 25(8), 1158; https://doi.org/10.3390/e25081158

Submission received: 30 May 2023 / Revised: 21 June 2023 / Accepted: 1 July 2023 / Published: 2 August 2023

(This article belongs to the Special Issue Information Theory for Interpretable Machine Learning)

Abstract:

This paper addresses the problem of decentralized safety control (DSC) of constrained interconnected nonlinear safety-critical systems under reinforcement learning strategies, where asymmetric input constraints and security constraints are considered. First, improved performance functions associated with the actuator estimates for each auxiliary subsystem are constructed. Then, using the barrier function, the decentralized control problem with security constraints and asymmetric input constraints is transformed into an equivalent decentralized control problem with asymmetric input constraints only. This approach ensures that safety-critical systems operate and learn optimal DSC policies within their safe global domains. Next, the optimal control strategy is shown to ensure that the entire system is uniformly ultimately bounded (UUB). In addition, Lyapunov theory is used to show that all signals in the closed-loop auxiliary subsystem are uniformly ultimately bounded, and the effectiveness of the designed method is verified by a practical simulation.

Keywords:

interconnected nonlinear safety-critical systems; barrier function; asymmetric input constraints; safety constraints; decentralized control

1. Introduction

Over the past few decades, safety has received increasing attention in autonomous driving [1], intelligent robots [2], robotic arms [3], adaptive cruise control [4], etc. The design of these systems and their controllers requires that the system state trajectories evolve within a set called the safe set, reflecting the inherent properties of the system [5]. In practice, many engineering systems must operate within a specific safety range, beyond which the controlled system may be at risk [6]. Safety-critical systems are systems whose control behavior prioritizes safety. Control schemes for such systems aim to reduce the potential for severe consequences, such as personal injury and environmental pollution, which may arise from system shutdown or operational errors [7]. To ensure the safety and reliability of the system, scholars have developed many safety control schemes. The classical approach focused on extending and applying Nagumo’s theorem to safe sets defined by continuously differentiable functions [8]. In particular, barrier functions (BFs) have become an effective tool for verifying safety and have been widely used in [9,10,11]. They were used to convert a system with security constraints into an equivalent system that satisfies the security requirements, after which a safety controller was designed to protect the system. In [9,10], penalty functions and BF-based state transitions were employed to merge states into a reinforcement learning framework to solve optimal control problems with full-state constraints. In [11], a safe non-strategic reinforcement learning method to solve secure nonlinear systems with dynamic uncertainty was proposed. In [12,13], a new secure reinforcement learning method was proposed to solve secure nonlinear systems with symmetric input constraints. However, the results in [9,10,11,12,13] mainly studied optimal safety control for a single continuous-time or discrete-time nonlinear system. The security control of interconnected systems has not been fully resolved.

On the other hand, interconnected systems consist of multiple subsystems with interconnected characteristics, and designing controllers for them in the same way as for a single system is difficult [14]. To solve this problem, the decentralized control approach, based on local subsystem information, was proposed [15,16,17]. This approach uses multiple controllers to control the interconnected system. In [18,19], the entire system control problem was first decomposed into a series of subproblems that could be solved independently. The solutions to the subproblems (i.e., independent controllers) were then combined to form a decentralized controller that stabilizes the entire system. In addition, implementing the decentralized control algorithm requires only the local subsystem’s knowledge, not the complete system’s information. Recently, scholars have proposed many schemes or techniques for designing decentralized controllers, including quantization techniques [20], fuzzy techniques [21], and optimal control methods [22]. This paper develops decentralized control strategies from the optimal control perspective. Optimal control problems are usually solved via the solution of the Hamilton–Jacobi–Bellman (HJB) partial differential equation [23,24]. However, the HJB equation is generally not solvable analytically due to its inherent nonlinearity [25,26]. Therefore, adaptive dynamic programming (ADP) and reinforcement learning (RL) algorithms were proposed to obtain numerical solutions to the HJB equation and were widely applied to nonlinear interconnected systems [27,28,29,30]. As noted in [31,32], ADP and RL can be deemed closely related, as they exhibit similar characteristics in addressing optimal control problems. 
For example, in [27,28], the distributed optimal controller was designed using robust ADP for nonlinear interconnected systems with unknown dynamics and parameters. In [29], the optimal decentralized control problem for interconnected nonlinear systems subject to stochastic dynamics was solved by enhancing the performance function of the auxiliary subsystem and transforming the original control problem into a set of optimal control strategies sampled in periodic patterns. Furthermore, in [30], the identifier–critic network framework was used to solve the problem of decentralized event-triggered control based on sliding-mode surfaces, avoiding the need for knowledge of the system’s internal dynamics. It is worth noting that the control results provided in [27,28,29,30] did not consider input constraints.

Control constraints are commonly encountered in industrial processes, where they are widespread and have a detrimental impact on the performance of systems [33,34]. Therefore, the study of constrained nonlinear systems is of practical importance. In [35,36], the RL-based decentralized algorithm was developed for tracking control of constrained interconnected nonlinear systems. In [37], the problem of decentralized optimal control of a constrained interconnected nonlinear system was solved by introducing a nonquadratic performance function to overcome the symmetric input constraint. The results in [35,36,37], mentioned above, mainly addressed the symmetric input constraint. However, the problem of asymmetric input constraints was identified in several project cases [38,39]. In [40], the optimal decentralized control problem with asymmetric input constraints was solved by designing a new non-quadratic performance function. In [41], a new performance function was proposed for interconnected nonlinear systems to successfully overcome the asymmetric input constraint and to solve the decentralized fault-tolerant control problem. However, none of the above studies considered the safety of the system. The optimal decentralized safety control (DSC) for constrained interconnected nonlinear safety-critical systems has not been thoroughly investigated thus far, which inspired our current study.

Motivated by the above discussion, this paper proposes an RL-based DSC strategy for constrained interconnected nonlinear safety-critical systems. The main contributions are summarized below:

The reinforcement learning algorithm is used to solve the optimal DSC problem for constrained interconnected nonlinear safety-critical systems, and the asymmetric input constraint is successfully handled. The method optimizes the control strategy by minimizing the performance function, ensuring the safety of the system’s state while respecting the asymmetric input constraints.
Nonlinear interconnected safety-critical systems with asymmetric input constraints and safety constraints are converted to equivalent systems that satisfy user-defined safety constraints using barrier functions. Unlike the work on nonlinear safety-critical systems in [3,9,10,13], this paper handles the security constraint on the interconnection term through the barrier function, which ensures that the interconnected nonlinear safety-critical system satisfies the security constraint.
The asymmetric input constraints are handled by utilizing a single critic neural network (CNN) architecture for online approximation of the performance function. Theoretical analysis shows that the optimal DSC method achieves uniformly ultimately bounded (UUB) system states and neural network weight estimation errors. In addition, a simulation example verifies the feasibility and effectiveness of the developed DSC method.

The remainder of this article is structured as follows. In Section 2, the problem formulation and transformation are presented. In Section 3, the decentralized optimal DSC design scheme is presented. The design scheme for the critic neural network is presented in Section 4. In Section 5, the system stability analysis is presented. In Section 6, a simulation example demonstrates the effectiveness of the presented approach. Lastly, conclusions are given in Section 7.

2. Preliminaries

2.1. Problem Descriptions

Consider a constrained interconnected nonlinear safety-critical system composed of n subsystems described by:

\begin{cases} \dot{x}_{i} (t) = f_{i} (x_{i} (t)) + g_{i} (x_{i} (t)) u_{i} (t) + \Delta h_{i} (x (t)), \\ x_{i} (0) = x_{i 0}, \quad i = 1, 2, \dots, n, \end{cases}

(1)

where

x_{i} (t) \in R^{n_{i}}

is the ith subsystem’s state vector and

x_{i} (0)

represents the initial state,

x = {[x_{1}^{T}, x_{2}^{T}, \dots, x_{n}^{T}]}^{T} \in R^{\sum_{i = 1}^{n} n_{i}}

represents the overall state vector of the constrained interconnected nonlinear safety-critical system,

u_{i} = {[u_{i, 1}, u_{i, 2}, \dots, u_{i, m_{i}}]}^{T} \in k_{i}

represents the control input, and the set of asymmetric constraints is represented as

k_{i} = \{u_{i} \in R^{m_{i}} : h_{i m i n} \leq u_{i, j} \leq h_{i m a x}, j = 1, 2, \dots, m_{i}\}

with

h_{i m i n}

and

h_{i m a x}

being the asymmetric saturating minimum and maximum bounds,

f_{i} (\cdot) \in R^{n_{i}}

and

g_{i} (\cdot) \in R^{n_{i} \times m_{i}}

represent the drift system dynamics and input dynamics of the ith subsystem, respectively, and are Lipschitz continuous, and

\Delta h_{i} \in R^{n_{i}}

represents the unknown interconnected term.

To simplify the design of the controller, let us introduce some assumptions. For

i = 1, 2, \dots, n

, we suppose the equilibrium of the ith subsystem’s state is

x_{i} = 0

.

Assumption 1.

For

i = 1, 2, \dots, n

, the

\Delta h_{i} (x)

satisfies the following mismatched condition:

\Delta h_{i} = η_{i} (x_{i}) P_{i} (x),

where

η_{i} (x_{i})

is a known function with

η_{i} (x_{i}) \in R^{n_{i} \times q_{i}}, \quad η_{i} (x_{i}) \neq g_{i} (x_{i})

, and

P_{i} (x)

is a bounded vector function that satisfies

∥P_{i} (x)∥ \leq \sum_{j = 1}^{n} b_{i, j} β_{i, j} (x_{j}),

(2)

where

b_{i, j} > 0

is a constant, and

β_{i, j} (x_{j})

are positive definite functions. Furthermore,

β_{i, j} (0) = 0

and

P_{i} (0) = 0

. Then, assuming

β_{j} (x_{j}) = \max_{1 \leq i \leq n} \{β_{i, j} (x_{j})\}

, inequality (2) can be rewritten as:

∥P_{i} (x)∥ \leq \sum_{j = 1}^{n} C_{i, j} β_{j} (x_{j}),

(3)

where

C_{i, j} \geq (b_{i, j} β_{i, j} (x_{j})) / β_{j} (x_{j})

is a positive constant, and

j = 1, 2, \dots, n

.

Remark 1.

It is noted that constraints (2) and (3) specified by Assumption 1 are strict restrictions on specific interconnected nonlinear systems. Nevertheless, when the function

P_{i} (x)

does not satisfy constraints (2) and (3), the computational cost of analyzing the stability of the closed-loop system is high. In fact, in real-world applications, constraints such as inequalities (2) and (3) are commonly imposed on the mismatched interconnection terms of system (1) [40,42].

Assumption 2.

For

i = 1, 2, \dots, n

, the known function

g_{i} (x_{i})

is bounded as

∥g_{i} (x_{i})∥ \leq g_{i, m}

, where

g_{i, m}

is a known constant. Furthermore,

\operatorname{rank} (g_{i} (x_{i})) = m_{i}

and

g_{i}^{T} (x_{i}) η_{i} (x_{i}) = 0

.

Based on the ith subsystem (1) described, the ith auxiliary subsystem is designed as:

\dot{x_{i}} = f_{i} (x_{i}) + g_{i} (x_{i}) u_{i} + (I_{n_{i}} - g_{i} (x_{i}) g_{i}^{+} (x_{i})) η_{i} (x_{i}) v_{i},

(4)

where

v_{i} \in R^{q_{i}}

is used to compensate for mismatched interconnections and stands for auxiliary control,

g_{i}^{+} (x_{i}) \in R^{m_{i} \times n_{i}}

is the Moore–Penrose pseudoinverse. According to Assumption 2, it can be found that

g_{i}^{+} (x_{i}) = {(g_{i}^{T} (x_{i}) g_{i} (x_{i}))}^{- 1} g_{i}^{T} (x_{i})

and

g_{i}^{+} (x_{i}) η_{i} (x_{i}) = {(g_{i}^{T} (x_{i}) g_{i} (x_{i}))}^{- 1} g_{i}^{T} (x_{i}) η_{i} (x_{i}) = 0

. Then, we rewrite the auxiliary subsystem (4) as:

\dot{x_{i}} = f_{i} (x_{i}) + g_{i} (x_{i}) u_{i} + η_{i} (x_{i}) v_{i} .

(5)
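The step from (4) to (5) can be checked numerically. The following is a minimal sketch, assuming a hypothetical subsystem with n_i = 3 and a single input (m_i = 1), so that the pseudoinverse reduces to g^T / (g^T g). The vectors `g` and `eta` are illustrative values chosen to satisfy g^T η = 0; they are not taken from the paper.

```python
def pinv_column(g):
    """Moore-Penrose pseudoinverse of a non-zero column vector g (n x 1).

    For a single column, (g^T g)^{-1} g^T reduces to g^T / (g^T g),
    returned here as a row (list of n floats).
    """
    gtg = sum(gi * gi for gi in g)
    if gtg == 0.0:
        raise ValueError("g must have full column rank (be non-zero)")
    return [gi / gtg for gi in g]


def projected_channel(g, eta):
    """Return (I - g g^+) eta for column vectors g and eta."""
    g_plus = pinv_column(g)
    g_plus_eta = sum(p * e for p, e in zip(g_plus, eta))  # scalar g^+ eta
    return [e - gi * g_plus_eta for gi, e in zip(g, eta)]


# Hypothetical values chosen so that g^T eta = 0 (as in Assumption 2).
g = [0.0, 0.0, 2.0]
eta = [1.0, -3.0, 0.0]
result = projected_channel(g, eta)
print(result)  # the projection leaves eta unchanged when g^T eta = 0
```

When g^T η ≠ 0, the same function removes the component of η lying in the range of g, which is why the orthogonality condition in Assumption 2 is what allows (4) to be written as (5).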

2.2. Security Conversion Issues

For the ith subsystem in the system (1), its state

x_{i} = {[x_{i, 1}, x_{i, 2}, \dots, x_{i, k}]}^{T}

satisfies the following security constraints:

\begin{cases} x_{i, 1} \in (a_{i, 1}, A_{i, 1}), \\ x_{i, 2} \in (a_{i, 2}, A_{i, 2}), \\ \quad \vdots \\ x_{i, k} \in (a_{i, k}, A_{i, k}) . \end{cases}

(6)

For interconnected nonlinear safety-critical systems with asymmetric input constraints and security constraints, we define the performance function as:

J_{i} (x_{i}) = \int_{t}^{\infty} e^{- α_{i} (τ - t)} (ι_{i} + Θ (x_{i}, u_{i}, v_{i})) d τ,

(7)

where

α_{i}

is the discount factor,

ι_{i} (x_{i}) = h_{i} β_{i}^{2} (x_{i})

and

Θ (x_{i}, u_{i}, v_{i}) = x_{i}^{T} H_{i} x_{i} + W_{i} (u_{i}) + ξ_{i} v_{i}^{T} v_{i}

, where

H_{i}

and

W_{i} (u_{i})

are positive definite, and

h_{i}

and

ξ_{i}

are positive design parameters.

Remark 2.

Because safety constraints and asymmetric input constraints are accounted for in (7), the optimal control law does not converge to zero when the system state reaches the steady state [43]. If the discount factor

α_{i} = 0

, then

J_{i} (x_{i})

may be unbounded, so it is necessary to include a positive discount factor.

Problem 1.

(Decentralized control problem with security constraints and asymmetric input constraints) Consider the safety-critical system (1). Find the control policy

u_{i} (\cdot)

and the auxiliary control strategy

v_{i} (\cdot) : R^{n_{i}} \to R^{q_{i}}

for the ith subsystem that minimize the performance function (7), with the ith subsystem state

x_{i} = {[x_{i, 1}, \dots, x_{i, k}]}^{T}

and the control input

u_{i}

satisfying the following conditions:

u_{i, m i n} \leq u_{i, j} \leq u_{i, m a x}, \quad |u_{i, m i n}| \neq |u_{i, m a x}|,

(8)

x_{i, k} \in (a_{i, k}, A_{i, k}), \forall k = 1, \dots, n_{i} .

(9)

The goal is to ensure that the state of the safety-critical system stays within the security constraints at all times. To this end, the barrier function is defined as follows.

Definition 1

(Barrier function [9,10]). The function

B (\cdot) : R \to R

defined on interval (a, A) is referred to as the barrier function if

B (z; a, A) = \log \frac{A (a - z)}{a (A - z)}, \forall z \in (a, A),

(10)

where a and A are two constants satisfying

a < A

. Moreover, the barrier function is invertible on the interval

(a, A)

, i.e.,

B^{- 1} (y; a, A) = \frac{a A (e^{\frac{y}{2}} - e^{- \frac{y}{2}})}{a e^{\frac{y}{2}} - A e^{- \frac{y}{2}}}, \forall y \in R .

(11)

Furthermore, the derivative of (11) is

\frac{d B^{- 1} (y; a, A)}{d y} = \frac{A a^{2} - a A^{2}}{a^{2} e^{y} - 2 a A + A^{2} e^{- y}} .

(12)
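Equations (10) to (12) can be written down directly in code. The sketch below is illustrative only: the function names are hypothetical, and it additionally assumes a < 0 < A, which is what makes the logarithm in (10) well defined on (a, A) and gives B(0; a, A) = 0.

```python
import math


def barrier(z, a, A):
    """Barrier function (10): log(A (a - z) / (a (A - z))) for a < z < A."""
    if not (a < 0.0 < A):
        raise ValueError("this sketch assumes a < 0 < A")
    if not (a < z < A):
        raise ValueError("z must lie strictly inside (a, A)")
    return math.log(A * (a - z) / (a * (A - z)))


def barrier_inv(y, a, A):
    """Inverse map (11): returns a point of (a, A) for any real y."""
    e_pos, e_neg = math.exp(y / 2.0), math.exp(-y / 2.0)
    return a * A * (e_pos - e_neg) / (a * e_pos - A * e_neg)


def barrier_inv_grad(y, a, A):
    """Derivative (12) of the inverse map with respect to y."""
    return (A * a**2 - a * A**2) / (
        a**2 * math.exp(y) - 2.0 * a * A + A**2 * math.exp(-y)
    )


# Illustrative bounds (not from the paper).
a, A = -2.0, 3.0
for z in (-1.9, 0.0, 1.2, 2.9):
    s = barrier(z, a, A)
    print(z, s, barrier_inv(s, a, A))
```

The round trip `barrier_inv(barrier(z))` recovers `z`, and the transformed value grows without bound as `z` approaches either end of the interval.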

Based on Definition 1, we consider the state transition based on the potential barrier function as follows:

\begin{matrix} s_{i, k} = B (x_{i, k}; a_{i, k}, A_{i, k}), \end{matrix}

(13)

x_{i, k} = B^{- 1} (s_{i, k}; a_{i, k}, A_{i, k}),

(14)

where

k = 1, 2, \dots, n_{i}

. Thus, the derivative of

x_{i, k}

with respect to t is

\frac{d x_{i, k}}{d t} = \frac{d x_{i, k}}{d s_{i, k}} \frac{d s_{i, k}}{d t}

, and after using Definition 1, we obtain:

\begin{aligned} {\dot{s}}_{i, k} & = \frac{a_{i, k + 1} A_{i, k + 1} (e^{\frac{s_{i, k + 1}}{2}} - e^{- \frac{s_{i, k + 1}}{2}})}{a_{i, k + 1} e^{\frac{s_{i, k + 1}}{2}} - A_{i, k + 1} e^{- \frac{s_{i, k + 1}}{2}}} \times \frac{A_{i, k}^{2} e^{- s_{i, k}} - 2 a_{i, k} A_{i, k} + a_{i, k}^{2} e^{s_{i, k}}}{A_{i, k} a_{i, k}^{2} - a_{i, k} A_{i, k}^{2}} \\ & = F_{i k} (s_{i, k}, s_{i, k + 1}), \quad k = 1, \dots, n_{i} - 1, \\ {\dot{s}}_{i, n_{i}} & = {\dot{x}}_{i, n_{i}} \times \frac{A_{i, n_{i}}^{2} e^{- s_{i, n_{i}}} - 2 a_{i, n_{i}} A_{i, n_{i}} + a_{i, n_{i}}^{2} e^{s_{i, n_{i}}}}{A_{i, n_{i}} a_{i, n_{i}}^{2} - a_{i, n_{i}} A_{i, n_{i}}^{2}} \\ & = F_{i, n_{i}} (s_{i, n_{i}}) + G_{i, n_{i}} (s_{i, n_{i}}) u_{i, n_{i}} + Y_{i, n_{i}} (s_{i, n_{i}}), \end{aligned}

where

\begin{aligned} F_{i, n_{i}} (s_{i}) & = \frac{a_{i, n_{i}}^{2} e^{s_{i, n_{i}}} - 2 a_{i, n_{i}} A_{i, n_{i}} + A_{i, n_{i}}^{2} e^{- s_{i, n_{i}}}}{A_{i, n_{i}} a_{i, n_{i}}^{2} - a_{i, n_{i}} A_{i, n_{i}}^{2}} \times f_{i} ([B_{i, 1}^{- 1} (s_{i, 1}) \dots B_{i, n_{i}}^{- 1} (s_{i, n_{i}})]), \\ G_{i, n_{i}} (s_{i}) & = \frac{a_{i, n_{i}}^{2} e^{s_{i, n_{i}}} - 2 a_{i, n_{i}} A_{i, n_{i}} + A_{i, n_{i}}^{2} e^{- s_{i, n_{i}}}}{A_{i, n_{i}} a_{i, n_{i}}^{2} - a_{i, n_{i}} A_{i, n_{i}}^{2}} \times g_{i} ([B_{i, 1}^{- 1} (s_{i, 1}) \dots B_{i, n_{i}}^{- 1} (s_{i, n_{i}})]), \end{aligned}

and

Y_{i, n_{i}} (s_{i, n_{i}})

is the interconnection term of the

n_{i}

th term in the ith subsystem.

Then, the interconnected nonlinear safety-critical system (1) can be rewritten as:

\dot{s_{i}} = F_{i} (s_{i}) + G_{i} (s_{i}) u_{i} (t) + Y_{i} (s_{i}),

(15)

where

F_{i} (s_{i}) = {[F_{i 1} (s_{i, 1}, s_{i, 2}), \dots, F_{i, n_{i}} (s_{i})]}^{T}

,

G_{i} (s_{i}) = {[0, \dots, G_{i, n_{i}} (s_{i})]}^{T}

and

Y_{i} (s_{i})

is the unknown interconnected term.
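To make the transformation concrete, the following sketch applies it to a hypothetical scalar subsystem (n_i = 1) with drift f(x) = -x, input gain g(x) = 1, and no interconnection term. The dynamics, the feedback u = -gain * s, and the bounds are assumptions made for illustration and are not the paper's simulation example. The rollout is carried out in the s-coordinates and mapped back through (11).

```python
import math


def barrier(z, a, A):
    """Barrier function (10); assumes a < 0 < A and a < z < A."""
    return math.log(A * (a - z) / (a * (A - z)))


def barrier_inv(y, a, A):
    """Inverse barrier map (11)."""
    e_pos, e_neg = math.exp(y / 2.0), math.exp(-y / 2.0)
    return a * A * (e_pos - e_neg) / (a * e_pos - A * e_neg)


def barrier_inv_grad(y, a, A):
    """Derivative (12) of the inverse map; positive when a < 0 < A."""
    return (A * a**2 - a * A**2) / (
        a**2 * math.exp(y) - 2.0 * a * A + A**2 * math.exp(-y)
    )


def s_dot(s, u, a, A):
    """Transformed dynamics: ds/dt = (dx/dt) / (dB^{-1}/ds)."""
    x = barrier_inv(s, a, A)
    x_dot = -x + u  # hypothetical f(x) = -x and g(x) = 1
    return x_dot / barrier_inv_grad(s, a, A)


def simulate(x0, a, A, gain=1.0, dt=1e-3, steps=5000):
    """Forward-Euler rollout in s-coordinates under the feedback u = -gain * s."""
    s = barrier(x0, a, A)
    xs = [x0]
    for _ in range(steps):
        u = -gain * s
        s += dt * s_dot(s, u, a, A)
        xs.append(barrier_inv(s, a, A))
    return xs


trajectory = simulate(x0=2.9, a=-2.0, A=3.0)
print(min(trajectory), max(trajectory), trajectory[-1])
```

Because the inverse map (11) sends every finite s into (a, A), any controller that keeps s bounded keeps x inside the safe interval. This is the property that the transformed system (15) is designed to exploit.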

Based on Assumption 1, we define the unknown interconnection term after the system transformation as:

Y_{i} (s_{i}) = ℘_{i} (s_{i}) U_{i} (s),

(16)

where

℘_{i} (s_{i}) = {[℘_{1, n_{1}} (s_{1}), 0, \dots, 0]}^{T}

, and

℘_{1, n_{1}} (s_{1}) = \frac{a_{i, 2}^{2} e^{s_{i, 2}} - 2 a_{i, 2} A_{i, 2} + A_{i, 2}^{2} e^{- s_{i, 2}}}{A_{i, 2} a_{i, 2}^{2} - a_{i, 2} A_{i, 2}^{2}} \times η_{1, n_{1}} (x_{1}),

and

U_{i} (s_{i})

is a bounded vector function that satisfies

∥U_{i} (s)∥ \leq \sum_{j = 1}^{n} b_{i, j} ϑ_{i, j} (s_{j}),

(17)

where

ϑ_{i, j} (s_{j})

is a positive definite function. Then, assuming

ϑ_{j} (s_{j}) = \max_{1 \leq i \leq n} \{ϑ_{i, j} (s_{j})\}

and

ϑ_{j} (s_{j}) = {[ϑ_{j, 1} (s_{j, 1}, s_{j, 2}), \dots, ϑ_{j, n_{i}} (s_{j})]}^{T}

, where

ϑ_{j, n_{i}} (s_{j}) = \frac{a_{j, n_{i}}^{2} e^{s_{j, n_{i}}} - 2 a_{j, n_{i}} A_{j, n_{i}} + A_{j, n_{i}}^{2} e^{- s_{j, n_{i}}}}{A_{j, n_{i}} a_{j, n_{i}}^{2} - a_{j, n_{i}} A_{j, n_{i}}^{2}} \times β_{j} ([B_{j, 1}^{- 1} (s_{j, 1}) \dots B_{j, n_{i}}^{- 1} (s_{j, n_{i}})]) .

(18)

According to (3) and (18), the inequality (17) is expressed as:

∥U_{i} (s)∥ \leq \sum_{j = 1}^{n} S_{i, j} ϑ_{j} (s_{j}),

(19)

where

S_{i, j} \geq (b_{i, j} ϑ_{i, j} (s_{j})) / ϑ_{j} (s_{j})

is a positive constant, and

i, j = 1, 2, \dots, n

.

Assumption 3.

F_{i} (s_{i})

is Lipschitz continuous with

F_{i} (0) = 0

,

P_{i} (0) = 0

,

G_{i} (s_{i})

and

℘_{i} (s_{i})

are upper-bounded, then

∥F_{i} (s_{i})∥ \leq f_{i, m_{i}} ∥s_{i}∥

,

∥G_{i} (s_{i})∥ \leq g_{i, m_{i}}

, and

∥℘_{i} (s_{i})∥ \leq η_{i, m_{i}}

,

∥U_{i} (s_{i})∥ \leq P_{i, m_{i}} ∥s_{i}∥

, where

f_{i, m_{i}}

,

g_{i, m_{i}}

,

η_{i, m_{i}}

,

P_{i, m_{i}}

are positive constants.

r a n k (G_{i} (s_{i})) = m_{i}

and

G_{i}^{T} (s_{i}) ℘_{i} (s_{i}) = 0

. Moreover, the transformed system (15) is controllable, and

s_{i} = 0

is the equilibrium point of (15).

Lemma 1

([32]).

\forall (s_{1}, s_{2}) \in R^{2}

, we have the following condition,

s_{1} s_{2} \leq \frac{ε_{1}^{p_{1}}}{p_{1}} {| s_{1} |}^{p_{1}} + \frac{1}{p_{2} ε_{1}^{p_{2}}} {| s_{2} |}^{p_{2}},

where

ε_{1} > 0, (p_{1} - 1) (p_{2} - 1) = 1

and

p_{1}, p_{2} > 1

.
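Lemma 1 is a form of Young's inequality, and it can be spot-checked numerically. The sketch below is an illustration only: the helper names, sampling ranges, and tolerance are arbitrary choices, and p_2 is recovered from the condition (p_1 - 1)(p_2 - 1) = 1 as p_2 = p_1 / (p_1 - 1).

```python
import random


def young_rhs(s1, s2, eps, p1):
    """Right-hand side of Lemma 1; p2 follows from (p1 - 1)(p2 - 1) = 1."""
    if eps <= 0.0 or p1 <= 1.0:
        raise ValueError("requires eps > 0 and p1 > 1")
    p2 = p1 / (p1 - 1.0)
    return (eps**p1 / p1) * abs(s1) ** p1 + abs(s2) ** p2 / (p2 * eps**p2)


def check_lemma(trials=10000, seed=0):
    """Return True if s1 * s2 <= rhs holds on all random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        s1, s2 = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        eps, p1 = rng.uniform(0.2, 3.0), rng.uniform(1.2, 4.0)
        rhs = young_rhs(s1, s2, eps, p1)
        # Small relative tolerance to absorb floating-point rounding.
        if s1 * s2 > rhs + 1e-9 * (1.0 + rhs):
            return False
    return True


print(check_lemma())
```

With p_1 = p_2 = 2 and ε_1 = 1, the bound reduces to the familiar s_1 s_2 ≤ (s_1² + s_2²) / 2.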

Remark 3.

The barrier function in Definition 1, which has the following characteristics, ensures that the safety-critical system (15) always satisfies the safety constraints [9,10].

1.: The state $s_{i}$ of the system is restricted to be bounded, so the system state $x_{i}$ satisfies constraints (8) and (9), i.e.,

$\begin{matrix} |B (z_{i}; a_{i}, A_{i})| < + \infty, \forall z_{i} \in (a_{i}, A_{i}) . \end{matrix}$
2.: When the system’s state approaches the boundary of the safety area, the barrier function changes as follows:

$\begin{matrix} \lim_{z_{i} \to a_{i}^{+}} B (z_{i}; a_{i}, A_{i}) = - \infty, \\ \lim_{z_{i} \to A_{i}^{-}} B (z_{i}; a_{i}, A_{i}) = + \infty . \end{matrix}$
3.: The barrier function vanishes when the system state reaches equilibrium, i.e.,

$\begin{matrix} B (0; a_{i}, A_{i}) = 0, \forall a_{i} < A_{i} . \end{matrix}$

3. Decentralized Optimal DSC Design

This section consists of two main subsections to establish the decentralized optimal DSC method. First, the security constraint problem is dealt with through the systematic transformation of the barrier function and the HJB equation for the ith auxiliary subsystem without security constraints is developed by introducing the improved performance function. Finally, the decentralized safety controller is constructed by solving the HJB equation for the auxiliary subsystem.

3.1. Barrier Function Conversion

According to the ith subsystem (15) described, the ith auxiliary subsystem is designed as:

\dot{s_{i}} = F_{i} (s_{i}) + G_{i} (s_{i}) u_{i} + (I_{n_{i}} - G_{i} (s_{i}) G_{i}^{+} (s_{i})) ℘_{i} (s_{i}) v_{i},

(20)

where

G_{i}^{+} (s_{i}) \in R^{m_{i} \times n_{i}}

is the Moore–Penrose pseudoinverse. According to Assumptions 2 and 3, the matrix is found to be

G_{i}^{+} (s_{i}) = {(G_{i}^{T} (s_{i}) G_{i} (s_{i}))}^{- 1} G_{i}^{T} (s_{i})

and

G_{i}^{+} (s_{i}) ℘_{i} (s_{i}) = {(G_{i}^{T} (s_{i}) G_{i} (s_{i}))}^{- 1} G_{i}^{T} (s_{i}) ℘_{i} (s_{i}) = 0

. Then, the auxiliary subsystem (20) is rewritten as:

\dot{s_{i}} = F_{i} (s_{i}) + G_{i} (s_{i}) u_{i} + ℘_{i} (s_{i}) v_{i} .

(21)

Regarding the converted system (15), analogous to (7), the performance function below is introduced:

V_{i} (s_{i}) = \int_{t}^{\infty} e^{- α_{i} (τ - t)} (π_{i} + γ (s_{i}, u_{i}, v_{i})) d τ,

(22)

where

π_{i} (s_{i}) = h_{i} ϑ_{i}^{2} (s_{i})

and

γ (s_{i}, u_{i}, v_{i}) = s_{i}^{T} Q_{i} s_{i} + W_{i} (u_{i}) + ξ_{i} v_{i}^{T} v_{i}

,

Q_{i}

is a positive definite matrix. Furthermore,

s_{i 0} = s_{i} (0)

denotes the initial state, and

W_{i} (u_{i})

is a non-quadratic utility function that handles the asymmetric input constraint. Then,

W_{i} (u_{i})

is defined in the following form:

W_{i} (u_{i}) = \sum_{j = 1}^{m_{i}} 2 λ_{i} \int_{c_{i}}^{u_{i, j}} Ψ^{- 1} ((v_{i} - c_{i}) / λ_{i}) d v_{i},

(23)

where

λ_{i} = (h_{i m a x} - h_{i m i n}) / 2

and

c_{i} = (h_{i m a x} + h_{i m i n}) / 2

, and

Ψ_{i} (.)

represents a monotonic odd function with

Ψ_{i} (0) = 0

. In this paper, without loss of generality, we choose the hyperbolic tangent

Ψ_{i} (s_{i}) = (e^{s_{i}} - e^{- s_{i}}) / (e^{s_{i}} + e^{- s_{i}})

.

Remark 4.

Unlike the traditional form of symmetric input constraints [35], this article considers asymmetric constraints on the control inputs [44]. The modified hyperbolic tangent function used in (23) effectively transforms the asymmetric constrained control problem into an unconstrained control problem by allowing different maximum and minimum bounds.
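For a single input channel with Ψ = tanh, the integral in (23) has the closed form 2λ(u - c) tanh⁻¹((u - c)/λ) + λ² ln(1 - ((u - c)/λ)²). The sketch below evaluates this expression and cross-checks it against a midpoint-rule quadrature. The function names and the bounds h_min = -1, h_max = 3 are illustrative assumptions.

```python
import math


def utility(u, h_min, h_max):
    """Closed form of W(u) = 2*lam * int_c^u atanh((nu - c)/lam) d nu (scalar u)."""
    lam = (h_max - h_min) / 2.0
    c = (h_max + h_min) / 2.0
    x = (u - c) / lam
    if not -1.0 < x < 1.0:
        raise ValueError("u must lie strictly inside (h_min, h_max)")
    return 2.0 * lam * (u - c) * math.atanh(x) + lam**2 * math.log(1.0 - x * x)


def utility_quadrature(u, h_min, h_max, n=20000):
    """Midpoint-rule evaluation of the same integral, used as a cross-check."""
    lam = (h_max - h_min) / 2.0
    c = (h_max + h_min) / 2.0
    step = (u - c) / n
    total = 0.0
    for k in range(n):
        nu = c + (k + 0.5) * step
        total += math.atanh((nu - c) / lam)
    return 2.0 * lam * total * step


for u in (-0.5, 1.0, 2.5):
    print(u, utility(u, -1.0, 3.0), utility_quadrature(u, -1.0, 3.0))
```

The utility is zero at u = c_i rather than at u = 0 and is positive elsewhere in the admissible interval, which is consistent with the observation in Remark 2 that the optimal control need not converge to zero.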

Problem 2.

(Optimal decentralized control problem with asymmetric input constraints) Find the control policy

u_{i}

and the auxiliary control strategy

v_{i}

for the ith auxiliary subsystem (21) such that the performance function (22) is minimized.

Based on the subsystem (21), as well as the performance function (22), the corresponding Hamiltonian is given by:

\begin{aligned} H (s_{i}, u_{i}, v_{i}, \nabla V_{i} (s_{i})) = {} & {(\nabla V_{i} (s_{i}))}^{T} (F_{i} (s_{i}) + G_{i} (s_{i}) u_{i} (t) + ℘_{i} (s_{i}) v_{i}) \\ & + π_{i} + γ (s_{i}, u_{i}, v_{i}) - α_{i} V_{i}, \end{aligned}

(24)

with

\nabla V_{i} (s_{i}) = \frac{\partial V_{i} (s_{i})}{\partial s_{i}}

.

The optimal performance function is

V_{i}^{*} (s_{i}) = \min_{u_{i}, v_{i} \in Ψ (Ω_{i})} V_{i} (s_{i}),

(25)

where

Ψ (Ω_{i})

is a collection of all acceptable control policies and auxiliary control strategies for

Ω_{i}

.

Based on Bellman’s optimality principle [31],

V_{i}^{*} (s_{i})

in (25) satisfies the HJB equation

\min_{u_{i}, v_{i} \in Ψ (Ω_{i})} H (s_{i}, u_{i}, v_{i}, \nabla V_{i}^{*} (s_{i})) = 0,

(26)

where

\nabla V_{i}^{*} (s_{i}) = \frac{\partial V_{i}^{*} (s_{i})}{\partial s_{i}}

. Then, the optimal control policy and the auxiliary control policy can be derived as follows:

u_{i}^{*} (s_{i}) = - λ_{i} \tanh (\frac{1}{2 λ_{i}} G_{i}^{T} (s_{i}) \nabla V_{i}^{*} (s_{i})) + c_{i},

(27)

v_{i}^{*} (s_{i}) = - \frac{1}{2 ξ_{i}} ℘_{i}^{T} \nabla V_{i}^{*} (s_{i}),

(28)

where

c_{i} = [c_{1}, \dots, c_{m_{i}}]

.
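The control laws (27) and (28) can be evaluated once a gradient of the value function is available. The sketch below is a minimal illustration using plain Python lists; the matrices `G` and `P` (standing in for G_i and ℘_i), the gradient `grad_V`, and the bounds are made-up values, and in practice the gradient would come from the critic approximation rather than being given.

```python
import math


def optimal_control(G, grad_V, h_min, h_max):
    """Control law (27): u* = -lam * tanh(G^T grad_V / (2 lam)) + c.

    G is an n x m matrix stored as a list of n rows; grad_V has n entries.
    """
    lam = (h_max - h_min) / 2.0
    c = (h_max + h_min) / 2.0
    n, m = len(G), len(G[0])
    u = []
    for j in range(m):
        gtv = sum(G[r][j] * grad_V[r] for r in range(n))  # j-th entry of G^T grad_V
        u.append(-lam * math.tanh(gtv / (2.0 * lam)) + c)
    return u


def auxiliary_control(P, grad_V, xi):
    """Auxiliary law (28): v* = -(1 / (2 xi)) P^T grad_V, with P an n x q matrix."""
    n, q = len(P), len(P[0])
    return [-sum(P[r][j] * grad_V[r] for r in range(n)) / (2.0 * xi) for j in range(q)]


# Hypothetical data: G and P have orthogonal columns, as Assumption 3 requires.
G = [[0.0], [2.0]]
P = [[1.0], [0.0]]
grad_V = [1.0, 3.0]
u_star = optimal_control(G, grad_V, h_min=-1.0, h_max=3.0)
v_star = auxiliary_control(P, grad_V, xi=0.5)
print(u_star, v_star)
```

Because tanh is bounded by one in magnitude, the control returned by `optimal_control` stays between `h_min` and `h_max` for any gradient, which is how the non-quadratic utility enforces the asymmetric bounds.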

Substituting

u_{i}^{*} (s_{i})

and

v_{i}^{*} (s_{i})

into (26), the HJB equation is rewritten as:

\begin{aligned} & {(\nabla V_{i}^{*} (s_{i}))}^{T} F_{i} (s_{i}) + {(\nabla V_{i}^{*} (s_{i}))}^{T} G_{i} (s_{i}) u_{i}^{*} (s_{i}) - ξ_{i} {∥v_{i}^{*} (s_{i})∥}^{2} \\ & - α_{i} V_{i}^{*} + π_{i} (s_{i}) + s_{i}^{T} Q_{i} s_{i} + W_{i} (u_{i}^{*} (s_{i})) = 0, \end{aligned}

(29)

with

V_{i}^{*} (0) = 0

.

Through the BF-based system transformation, the decentralized control Problem 1 with asymmetric input constraints and security constraints is transformed into an optimization problem without security constraints, i.e., the decentralized control Problem 2. Next, the following lemma establishes the equivalence between the decentralized control Problems 1 and 2.

Lemma 2.

Assume that Assumptions 1 to 3 are met and that control policy

u_{i} (\cdot)

and auxiliary control strategy

v_{i} (\cdot)

solve the decentralized control problem 2 of (21). Then, the following statements hold:

1.: If the initial state $x_{0}$ of the interconnected nonlinear safety-critical system (1) is in the range ( $a_{i, k}$ , $A_{i, k}$ ), $\forall k = 1, 2, \dots, n_{i}$ , then the closed-loop system satisfies (6).
2.: If the functions $H_{i} (x)$ and $Q_{i} (x)$ satisfy the condition $H_{i} (x_{i}) = Q_{i} (B_{i} (x_{i})) = Q_{i} (s_{i})$ , the performance function described in (22) is equivalent to the one in (7).

Proof.

The performance function (22) and Assumption 3 ensure zero-state observability, which guarantees the existence of the safe optimal performance function

V_{i}^{*} (s_{i})

. From (24), we obtain

{\dot{V}}_{i}^{*} (t) \leq 0

, which allows us to obtain

V_{i}^{*} (s_{i} (t)) \leq V_{i}^{*} (s_{i} (0))

for all

t \geq 0

. Consequently, as stated in Remark 3, if the initial state

x_{i} (0)

of the system (21) satisfies the security constraint (6), and

V_{i}^{*} (s_{i} (0))

is bounded, then the

V_{i}^{*} (s_{i} (t))

is also bounded. Finally, we obtain

x_{i, k} (t) \in (a_{i, k}, A_{i, k}), k = 1, 2, \dots, n_{i} .

(30)

Therefore, the given

u_{i}^{*}

and

v_{i}^{*}

satisfy the constraints of the decentralized control problem 1.

Now, consider the state transition based on the barrier function described in (13) and (14). Since

x_{i}

satisfies the constraints given in (8), each element of the state

s_{i} = {[B_{i, 1} (x_{i, 1}), \dots, B_{i, k} (x_{i, k})]}^{T}

is finite. By comparing the performance functions (7) and (22), the equivalence relation

J_{i} (x_{i} (0)) = V_{i} (s_{i} (0))

is obtained, provided that

H_{i} (x_{i}) = Q_{i} (s_{i})

. This completes the proof. □

3.2. Designing the Optimal DSC Strategy by Solving n HJB Equations

Throughout this section, we show that the optimal DSC strategies for interconnected nonlinear systems can be constructed by solving the n HJB equations.

Theorem 1.

Consider n subsystems under Assumptions 1 to 3 with DSC policies

u_{i}^{*} (s_{i})

and auxiliary control strategies

v_{i}^{*} (s_{i})

, satisfying the following condition:

{∥v_{i}^{*} (s_{i})∥}^{2} < s_{i}^{T} Q_{i} s_{i}, t \geq t_{0} .

(31)

Then, there exist n positive constants

h_{i}^{*}, i = 1, 2, \dots, n

, such that for any

h_{i} \geq h_{i}^{*}

, the optimal DSC policies

u_{1}^{*} (s_{1})

,

u_{2}^{*} (s_{2})

, …,

u_{n}^{*} (s_{n})

guarantee that the interconnected nonlinear system (15) with security constraints is UUB.

Proof.

The Lyapunov function candidate

L_{i, 1} (s)

is selected as:

L_{i, 1} (s) = \sum_{i = 1}^{n} V_{i}^{*} (s_{i}),

(32)

where the

V_{i}^{*} (s_{i})

is defined in the same way as (22), and we denote the time derivative along the trajectory

\dot{s_{i}} = F_{i} (s_{i}) + G_{i} (s_{i}) u_{i} (t) + Y_{i} (s_{i})

as:

{\dot{L}}_{i, 1} (s) = \sum_{i = 1}^{n} {(\nabla V_{i}^{*})}^{T} (G_{i} (s_{i}) u_{i}^{*} + F_{i} (s_{i}) + Y_{i} (s)) .

(33)

By using (27) and (28), we obtain:

{(\nabla V_{i}^{*} (s_{i}))}^{T} G_{i} (s_{i}) = - 2 λ_{i} \tanh^{- T} (\frac{u_{i}^{*} - c_{i}}{λ_{i}}),

(34)

{(\nabla V_{i}^{*} (s_{i}))}^{T} ℘_{i} (s_{i}) = - 2 ξ_{i} {(v_{i}^{*} (s_{i}))}^{T} .

(35)

Inserting (29), (34) and (35) into (33), we have

{\dot{L}}_{i, 1} (s) = \sum_{i = 1}^{n} [α_{i} V_{i}^{*} - π_{i} (s_{i}) - s_{i}^{T} Q_{i} s_{i} - W_{i} (u_{i}^{*}) + ξ_{i} {∥v_{i}^{*} (s_{i})∥}^{2} - 2 ξ_{i} {(v_{i}^{*} (s_{i}))}^{T} U_{i} (s)] .

(36)

According to the optimal DSC policy (27), the term

W_{i} (u_{i}^{*})

becomes

W_{i} (u_{i}^{*} (s_{i})) = 2 λ_{i} \sum_{j = 1}^{m_{i}} \int_{0}^{u_{i, j}^{*} - c_{i}} \tanh^{- 1} (\frac{u_{i} - c_{i}}{λ_{i}}) d (u_{i} - c_{i}) .

(37)
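One way to evaluate the integral in (37) is a change of variables. The sketch below is an illustration, not the derivation from [44]: the symbols τ, θ, and Θ_{i,j} are introduced here only for this purpose, with τ = u_i - c_i and τ = λ_i tanh θ, so that dτ = λ_i (1 - tanh² θ) dθ.

```latex
\begin{aligned}
\int_{0}^{u_{i,j}^{*} - c_{i}} \tanh^{-1}\!\left(\frac{\tau}{\lambda_{i}}\right) d\tau
  &= \lambda_{i} \int_{0}^{\Theta_{i,j}} \theta \left(1 - \tanh^{2}\theta\right) d\theta,
  \qquad \Theta_{i,j} = \tanh^{-1}\!\left(\frac{u_{i,j}^{*} - c_{i}}{\lambda_{i}}\right), \\
  &= \frac{\lambda_{i}}{2}\,\Theta_{i,j}^{2}
     - \lambda_{i} \int_{0}^{\Theta_{i,j}} \theta \tanh^{2}\theta \, d\theta, \\
W_{i}(u_{i}^{*}(s_{i}))
  &= \lambda_{i}^{2} \sum_{j=1}^{m_{i}} \Theta_{i,j}^{2}
     - 2 \lambda_{i}^{2} \sum_{j=1}^{m_{i}} \int_{0}^{\Theta_{i,j}} \theta \tanh^{2}\theta \, d\theta .
\end{aligned}
```

Under this substitution, the utility splits into a quadratic term in tanh⁻¹((u*_{i,j} - c_i)/λ_i) minus a non-negative integral term.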

By appealing to the proof in [44], Equation (37) can be further reduced to

\begin{aligned} W_{i} (u_{i}^{*} (s_{i})) = {} & \underbrace{λ_{i}^{2} \sum_{j = 1}^{m_{i}} {(\tanh^{- 1} (\frac{u_{i, j}^{*} - c_{i}}{λ_{i}}))}^{2}}_{β_{1}} \\ & - \underbrace{2 λ_{i}^{2} \sum_{j = 1}^{m_{i}} \int_{0}^{\tanh^{- 1} (\frac{u_{i, j}^{*} - c_{i}}{λ_{i}})} (u_{i} - c_{i}) \tanh^{2} (u_{i} - c_{i}) d (u_{i} - c_{i})}_{β_{2}}, \end{aligned}

(38)

substituting (38) into (36), one has

\begin{aligned} {\dot{L}}_{i, 1} (s) \leq {} & - \sum_{i = 1}^{n} (2 ξ_{i} (s_{i}^{T} Q_{i} s_{i} - {∥v_{i}^{*} (s_{i})∥}^{2})) - \sum_{i = 1}^{n} (1 - 2 ξ_{i}) (s_{i}^{T} Q_{i} s_{i}) \\ & - \sum_{i = 1}^{n} (π_{i} (s_{i}) - 2 ξ_{i} \sum_{j = 1}^{m_{i}} ∥v_{i}^{*} (s_{i})∥ b_{i, j} ϑ_{i, j} (s_{j}) + ξ^{2} {∥v_{i}^{*} (s_{i})∥}^{2}) + α_{i} V_{i}^{*} - β_{1} + β_{2} . \end{aligned}

(39)

It is known from [45] that there is a positive constant

δ_{i, M}

such that

0 \leq ∥\nabla V_{i}^{*} (s_{i})∥ \leq δ_{i, M}

. Therefore, using Lemma 1, Assumption 1, (17), (19), and (27), we obtain

\begin{matrix} 2 β_{1} & \leq 2 λ_{i}^{2} {tanh}^{- T} (\frac{u_{i, j}^{*} - c_{i}}{λ_{i}}) {tanh}^{- 1} (\frac{u_{i, j}^{*} - c_{i}}{λ_{i}}) \\ = \frac{1}{2} {(\nabla V_{i}^{*} (s_{i}))}^{T} G_{i} (s_{i}) G_{i}^{T} (s_{i}) (\nabla V_{i}^{*} (s_{i})) \\ \leq \frac{1}{2} G_{i, M}^{2} δ_{i, M}^{2} . \end{matrix}

(40)

Utilizing the integral mean value theorem [46] and inequality (40), the term

β_{2}

in (38) can be bounded as:

\begin{matrix} β_{2} & = 2 λ_{i}^{2} \sum_{j = 1}^{m_{i}} {tanh}^{- 1} (\frac{u_{i, j}^{*} - c_{i}}{λ_{i}}) ϖ_{i} {tanh}^{2} ϖ_{i} \\ \leq 2 λ_{i}^{2} \sum_{j = 1}^{m_{i}} {tanh}^{- 1} (\frac{u_{i, j}^{*} - c_{i}}{λ_{i}}) ϖ_{i} \\ \leq 2 λ_{i}^{2} {tanh}^{- T} (\frac{u_{i, j}^{*} - c_{i}}{λ_{i}}) {tanh}^{- 1} (\frac{u_{i, j}^{*} - c_{i}}{λ_{i}}) \\ \leq \frac{1}{2} G_{i, M}^{2} δ_{i, M}^{2}, \end{matrix}

(41)

where

ϖ_{i} \in (0, {tanh}^{- 1} (\frac{u_{i, j}^{*} - c_{i}}{λ_{i}}))

.

From [27], we conclude that

∥α_{i} V_{i}^{*} (s_{i})∥ \leq ϱ_{i, m}

, where

ϱ_{i, m}

is a positive constant. Then, substituting (40) and (41) into (39) and using this bound, inequality (39) can be rewritten as follows:

\begin{matrix} {\dot{L}}_{i, 1} (s) \leq - \sum_{i = 1}^{n} (2 ξ_{i} (s_{i}^{T} Q_{i} s_{i} - {∥v_{i}^{*} (s_{i})∥}^{2})) - \sum_{i = 1}^{n} (1 - 2 ξ_{i}) (s_{i}^{T} Q_{i} s_{i}) \\ - \sum_{i = 1}^{n} (h_{i} ϑ_{i} {(s_{j})}^{2} - 2 ξ_{i} \sum_{j = 1}^{m_{i}} ∥v_{i}^{*} (s_{i})∥ b_{i, j} ϑ_{i, j} (s_{j}) + ξ^{2} {∥v_{i}^{*} (s_{i})∥}^{2}) + ϱ_{i} + \frac{1}{4} \sum_{i = 1}^{n} G_{i, M}^{2} δ_{i, M}^{2}, \end{matrix}

(42)

by denoting

Λ = \mathrm{diag} \{h_{1}, h_{2}, \dots, h_{n}\}

and

Z = [ϑ_{1} (s_{1}), \dots, ϑ_{n} (s_{n}), ξ_{1} ∥v_{1}^{*} (s_{1})∥, \dots, ξ_{n} ∥v_{n}^{*} (s_{n})∥]

. If condition (31) is satisfied, we have

\begin{matrix} {\dot{L}}_{i, 1} (s) \leq - \sum_{i = 1}^{n} (1 - 2 ξ_{i}) (s_{i}^{T} Q_{i} s_{i}) - Z^{T} X Z + ϱ_{i} + \frac{1}{4} \sum_{i = 1}^{n} G_{i, M}^{2} δ_{i, M}^{2}, \end{matrix}

(43)

with

X = [\begin{matrix} Λ & A^{T} \\ A & I_{n} \end{matrix}]

and

A = [\begin{matrix} b_{11} & \dots & b_{1 n} \\ ⋮ & ⋱ & ⋮ \\ b_{n 1} & \dots & b_{n n} \end{matrix}]

.

From the expression of the matrix X, its positive definiteness can be ensured by choosing a sufficiently large

Λ

. In other words, there exists

h_{i}^{*} > 0

such that any

h_{i} > h_{i}^{*}

ensures

Z^{T} X Z > 0

. Thus, the inequality (43) is further reduced to:

\begin{matrix} {\dot{L}}_{i, 1} (s) \leq - \sum_{i = 1}^{n} (1 - 2 ξ_{i}) λ_{\min} (Q_{i}) {∥s_{i}∥}^{2} + ϱ_{i} + \frac{1}{4} \sum_{i = 1}^{n} G_{i, M}^{2} δ_{i, M}^{2} . \end{matrix}

(44)

The inequality (44) means that

{\dot{L}}_{i, 1} (s) < 0

whenever

s_{i} (t)

lies outside the following set

N_{s_{i}}

:

N_{s_{i}} = \{s_{i} : ∥s_{i}∥ \leq \sqrt{\frac{\frac{1}{4} G_{i, M}^{2} δ_{i, M}^{2} + ϱ_{i}}{λ_{\min} (Q_{i}) (1 - 2 ξ_{i})}}\} .

(45)

Based on Lyapunov’s extension theorem [47], it is shown that the optimal performance functions

V_{i}^{*} (s_{i})

guarantee that the interconnected nonlinear system (15) with asymmetric input constraints is UUB. Since the performance functions (7) and (22) yield the same results, it can be shown that the optimal performance function

J_{i}^{*} (x_{i})

guarantees that the interconnected nonlinear safety-critical system (1) with security constraints and asymmetric input constraints is UUB. □
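As a numerical illustration of the positive-definiteness argument used for X in the proof above, the following minimal Python sketch applies the Schur complement: because the lower-right block of X is the identity, X is positive definite exactly when Λ − AᵀA is positive definite. The function names and the numerical values chosen for b_{ij} and h_{i} are hypothetical and are not taken from the paper.

```python
import math


def is_positive_definite(S):
    """Return True if the symmetric matrix S admits a Cholesky factorization."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return False  # non-positive pivot: S is not positive definite
                L[i][j] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return True


def x_is_positive_definite(h, A):
    """Check X = [[diag(h), A^T], [A, I]] > 0 via diag(h) - A^T A > 0."""
    n = len(h)
    S = [
        [
            (h[r] if r == c else 0.0) - sum(A[k][r] * A[k][c] for k in range(n))
            for c in range(n)
        ]
        for r in range(n)
    ]
    return is_positive_definite(S)


# Hypothetical interconnection bounds b_ij (illustrative values only).
A = [[0.5, 0.2], [0.1, 0.4]]
print(x_is_positive_definite([1.0, 1.0], A))  # True: h_i large enough
print(x_is_positive_definite([0.1, 0.1], A))  # False: h_i too small
```

In this sketch the threshold h_{i}^{*} corresponds to the smallest diagonal entries for which the Cholesky factorization of Λ − AᵀA still succeeds.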

4. Critic Network for Approximation

The critic neural network is introduced in this section, with the aim of approximating the optimal performance function. Then, the evaluation network of the auxiliary subsystem (21) is used to construct the estimated optimal control strategy. According to [48],

V_{i}^{*} (s_{i})

is expressed as:

V_{i}^{*} (s_{i}) = W_{c_{i}}^{T} σ_{c_{i}} (s_{i}) + ε_{c_{i}} (s_{i}),

(46)

where

σ_{c_{i}} (s_{i}) = [σ_{c_{i}, 1} (s_{i}), σ_{c_{i}, 2} (s_{i}), \dots, σ_{c_{i}, N_{i}} (s_{i})] \in R^{N_{i}}

denotes the activation function,

W_{c_{i}} \in R^{N_{i}}

denotes the ideal weight vector,

N_{i}

denotes the number of neurons, and

ε_{c_{i}} (s_{i}) \in R

is the reconstruction error of the NN. Each component

σ_{c_{i}, p} (s_{i})

of the activation vector is a continuously differentiable function, where

p = 1, 2, \dots, N_{i}

. For

s_{i} \neq 0

,

{\{σ_{c_{i}, p} (s_{i})\}}_{p = 1}^{N_{i}}

are linearly independent. Then, the gradient of

V_{i}^{*} (s_{i})

can be expressed as:

\nabla V_{i}^{*} (s_{i}) = \nabla σ_{c_{i}}^{T} (s_{i}) W_{c_{i}} + \nabla ε_{c_{i}} (s_{i}),

(47)

where

\nabla σ_{c_{i}} (s_{i}) = \frac{\partial σ_{c_{i}} (s_{i})}{\partial s_{i}}

and

\nabla ε_{c_{i}} (s_{i}) = \frac{\partial ε_{c_{i}} (s_{i})}{\partial s_{i}}

.
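To make (46) and (47) concrete, the following minimal Python sketch evaluates the critic approximation with the reconstruction error dropped. It assumes a two-dimensional state and the quadratic activation vector [s_1^2, s_1 s_2, s_2^2] that is used later in the simulation section; the weight values and function names are hypothetical.

```python
def sigma(s):
    """Quadratic activation vector sigma_c(s) = [s1^2, s1*s2, s2^2]."""
    s1, s2 = s
    return [s1 * s1, s1 * s2, s2 * s2]


def grad_sigma(s):
    """Jacobian of sigma_c with respect to s, one row per neuron (3 x 2)."""
    s1, s2 = s
    return [[2.0 * s1, 0.0], [s2, s1], [0.0, 2.0 * s2]]


def critic_value(W, s):
    """Approximate value W^T sigma_c(s); the reconstruction error is neglected."""
    return sum(w * phi for w, phi in zip(W, sigma(s)))


def critic_gradient(W, s):
    """Approximate gradient (grad sigma_c)^T W, a vector with the size of s."""
    J = grad_sigma(s)
    return [sum(J[p][k] * W[p] for p in range(len(W))) for k in range(len(s))]


W = [1.0, 0.5, 2.0]  # hypothetical weight vector
s = [0.3, -0.2]
print(critic_value(W, s), critic_gradient(W, s))
```

Only the gradient enters the control laws, so the Jacobian of the activation vector is the quantity that must be available in closed form.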

From Equations (27), (28) and (47), the optimal safety control policy

u_{i}^{*} (s_{i})

and the auxiliary control strategy

v_{i}^{*} (s_{i})

are rephrased as:

u_{i}^{*} (s_{i}) = - λ_{i} tanh (\frac{1}{2 λ_{i}} G_{i}^{T} (s_{i}) \nabla σ_{c_{i}}^{T} (s_{i}) W_{c_{i}}) + c_{d_{i}} + ε_{u_{i}} (s_{i}),

(48)

v_{i}^{*} (s_{i}) = - \frac{1}{2 ξ_{i}} ℘_{i}^{T} (s_{i}) \nabla σ_{c_{i}}^{T} (s_{i}) W_{c_{i}} + ε_{v_{i}} (s_{i}),

(49)

where

\begin{matrix} ε_{u_{i}} (s_{i}) & = - \frac{1}{2} (I_{m_{i}} - {tanh}^{2} (ζ)) G_{i}^{T} (s_{i}) \nabla ε_{c_{i}} (s_{i}), \\ ε_{v_{i}} (s_{i}) & = - \frac{1}{2 ξ_{i}} ℘_{i}^{T} (s_{i}) \nabla ε_{c_{i}} (s_{i}), \end{matrix}

with

I_{m_{i}} = {[1, 1, \dots, 1]}^{T} \in R^{m_{i}}

. The selected value of

ζ

is between

\frac{1}{2 λ_{i}} G_{i}^{T} (s_{i}) \nabla σ_{c_{i}}^{T} (s_{i}) W_{c_{i}}

and

\frac{1}{2 λ_{i}} G_{i}^{T} (s_{i}) (\nabla σ_{c_{i}}^{T} (s_{i}) W_{c_{i}} + \nabla ε_{c_{i}} (s_{i}))

.

The ideal weight vector

W_{c_{i}}

is not available and the optimal control strategy

u_{i}^{*} (s_{i})

is not directly applicable. Therefore, the estimated weight vector

{\hat{W}}_{c_{i}}

is constructed to replace

W_{c_{i}}

as:

{\hat{V}}_{i}^{*} (s_{i}) = {\hat{W}}_{c_{i}}^{T} σ_{c_{i}} (s_{i}) .

(50)

The estimation error

{\tilde{W}}_{c_{i}} = W_{c_{i}} - {\hat{W}}_{c_{i}}

is defined. Similarly, according to (50), the policies (48) and (49) are approximated as:

{\hat{u}}_{i} (s_{i}) = - λ_{i} tanh (\frac{1}{2 λ_{i}} G_{i}^{T} (s_{i}) \nabla σ_{c_{i}}^{T} (s_{i}) {\hat{W}}_{c_{i}}) + c_{d_{i}},

(51)

{\hat{v}}_{i} (s_{i}) = - \frac{1}{2 ξ_{i}} ℘_{i}^{T} (s_{i}) \nabla σ_{c_{i}}^{T} (s_{i}) {\hat{W}}_{c_{i}} .

(52)
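The approximate policies (51) and (52) can be evaluated as in the following Python sketch for a single-input subsystem. The relations λ = (h_max − h_min)/2 and c = (h_max + h_min)/2 between the saturation parameters and the input limits are an assumption here (they are the usual construction for asymmetric constraints and are not restated in this section), and all numerical values are hypothetical.

```python
import math


def approx_controls(G, P, J, W_hat, lam, c, xi):
    """Evaluate (51) and (52) for one subsystem with scalar u and v.

    G, P : input and interconnection gain vectors (length n)
    J    : Jacobian of the activation vector (N x n)
    """
    n = len(G)
    # Approximate value gradient (grad sigma)^T W_hat.
    grad_v = [sum(J[p][k] * W_hat[p] for p in range(len(W_hat))) for k in range(n)]
    g_term = sum(G[k] * grad_v[k] for k in range(n))
    p_term = sum(P[k] * grad_v[k] for k in range(n))
    u_hat = -lam * math.tanh(g_term / (2.0 * lam)) + c
    v_hat = -p_term / (2.0 * xi)
    return u_hat, v_hat


# Assumed mapping from input limits to (lam, c); limits taken from the simulation.
h_max, h_min = 0.75, -0.25
lam, c = (h_max - h_min) / 2.0, (h_max + h_min) / 2.0

J = [[0.6, 0.0], [-0.2, 0.3], [0.0, -0.4]]  # Jacobian of [s1^2, s1*s2, s2^2] at s = (0.3, -0.2)
u_hat, v_hat = approx_controls([0.0, 1.0], [1.0, 0.0], J, [1.0, 0.5, 2.0], lam, c, 8.0)
print(u_hat, v_hat)
```

Because tanh is bounded by one in magnitude, the returned u_hat lies in [c − λ, c + λ] for any weight estimate, which is how the asymmetric input limits are enforced by construction.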

Combining (50), (51) and (52), the Hamiltonian is re-expressed as:

\begin{matrix} H (s_{i}, {\hat{u}}_{i}, {\hat{v}}_{i}, \nabla {\hat{V}}_{i} (s_{i})) = & {(\nabla {\hat{V}}_{i} (s_{i}))}^{T} (G_{i} (s_{i}) {\hat{u}}_{i} + F_{i} (s_{i}) + ℘_{i} (s_{i}) {\hat{v}}_{i}) \\ + π_{i} (s_{i}) + γ_{i} (s_{i}, {\hat{u}}_{i}, {\hat{v}}_{i}) - α_{i} {\hat{V}}_{i} . \end{matrix}

(53)

According to (53), the error of the Hamiltonian is given by:

\begin{matrix} e_{i} & = H (s_{i}, {\hat{u}}_{i}, {\hat{v}}_{i}, \nabla {\hat{V}}_{i} (s_{i})) - H (s_{i}, u_{i}^{*}, v_{i}^{*}, \nabla V_{i}^{*} (s_{i})) \\ = π_{i} (s_{i}) + s_{i}^{T} Q_{i} s_{i} + W_{i} ({\hat{u}}_{i}) + ξ_{i} {\hat{v}}_{i}^{T} {\hat{v}}_{i} + {\hat{W}}_{c_{i}}^{T} ϱ_{i}, \end{matrix}

(54)

with

ϱ_{i} = \nabla σ_{c_{i}} (s_{i}) (G_{i} (s_{i}) {\hat{u}}_{i} + F_{i} (s_{i}) + ℘_{i} (s_{i}) {\hat{v}}_{i}) - α_{i} σ_{c_{i}} (s_{i})

. In order to make

{\hat{u}}_{i} (s_{i}) \to u_{i}^{*} (s_{i})

, the error

e_{i}

should be guaranteed to be sufficiently small. To this end, an adjustment law for the critic weights

{\hat{W}}_{c_{i}}

is proposed to minimize the objective function

ϕ_{i} = \frac{1}{2} e_{i}^{T} e_{i}

. Next, the critic updating law is developed as:

{\dot{\hat{W}}}_{c_{i}} = - \frac{α_{c_{i}} ϱ_{i} e_{i}}{{(1 + ϱ_{i}^{T} ϱ_{i})}^{2}},

(55)

where the constant

α_{c_{i}}

is the positive learning rate.
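The normalized gradient law (55) can be sketched in discrete time as follows. The forward-Euler step, the step size dt, and the decision to hold ϱ_{i} and the weight-independent part of e_{i} fixed are simplifying assumptions made only for illustration; in the actual scheme both quantities change with the state and with the weight estimate.

```python
def critic_step(W_hat, rho, residual, alpha_c, dt):
    """One forward-Euler step of the normalized gradient law (55).

    residual collects the terms of e_i that do not depend on W_hat,
    so that e_i = residual + W_hat^T rho as in (54).
    """
    e = residual + sum(w * r for w, r in zip(W_hat, rho))
    norm = (1.0 + sum(r * r for r in rho)) ** 2
    W_new = [w - dt * alpha_c * r * e / norm for w, r in zip(W_hat, rho)]
    return W_new, e


# Hypothetical regressor and residual, held constant for illustration.
rho = [1.0, 2.0, -1.0]
residual = 1.5
W_hat = [0.0, 0.0, 0.0]
for _ in range(5):
    W_hat, e = critic_step(W_hat, rho, residual, alpha_c=2.0, dt=0.1)
    print(abs(e))  # the Hamiltonian error magnitude shrinks at every step
```

With a fixed regressor, each step multiplies the error by 1 − dt·α_c·ϱᵀϱ/(1 + ϱᵀϱ)², and the normalization keeps this factor inside (0, 1) whenever dt·α_c ≤ 4.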

Remark 5.

To minimize the Hamiltonian error

e_{i}

, it is necessary to maintain the derivative of

ϕ_{i}

as

{\dot{ϕ}}_{i} < 0

. Therefore, the critic weight adjustment law is derived by employing the normalization term

{(1 + ϱ_{i}^{T} ϱ_{i})}^{- 2}

and applying the gradient descent method with respect to

{\hat{W}}_{c_{i}}

[49].

By considering the definition of

{\tilde{W}}_{c_{i}}

, we obtain

{\dot{\tilde{W}}}_{c_{i}} = - α_{c_{i}} ℓ_{i} ℓ_{i}^{T} {\tilde{W}}_{c_{i}} + \frac{α_{c_{i}} ℓ_{i} e_{H_{i}}}{𝚤_{i}},

(56)

where

ℓ_{i} = \frac{ϱ_{i}}{1 + ϱ_{i}^{T} ϱ_{i}}

and

𝚤_{i} = 1 + ϱ_{i}^{T} ϱ_{i}

.

e_{H_{i}}

denotes the residual error, defined as

e_{H_{i}} = \nabla σ_{c_{i}} (s_{i}) (G_{i} (s_{i}) {\hat{u}}_{i} + F_{i} (s_{i}) + ℘_{i} (s_{i}) {\hat{v}}_{i})

.

The proposed DSC strategy for the ith subsystem with a single critic NN is illustrated in Figure 1.

5. Stability Analysis

This section analyzes the stability of the n auxiliary subsystems under the proposed control scheme. The following assumptions are required for the main theorem.

Assumption 4.

For

s_{i} \in Ω_{i}, i = 1, \dots, n

, there exist some positive constants

D_{ε_{u_{i}}}, ℘_{i, M}, D_{σ_{c_{i}}}, D_{ε_{v_{i}}}

and

D_{e_{H_{i}}}

satisfying

∥ε_{u_{i}} (s_{i})∥ \leq D_{ε_{u_{i}}}

,

∥℘_{i} (s_{i})∥ \leq ℘_{i, M}

,

∥\nabla σ_{c_{i}} (s_{i})∥ \leq D_{σ_{c_{i}}}, ∥ε_{v_{i}} (s_{i})∥ \leq D_{ε_{v_{i}}}

and

∥e_{H_{i}}∥ \leq D_{e_{H_{i}}}

.

Assumption 5.

Consider the time period

[t, t + t_{k}]

with

t_{k} > 0

. Then, the term

ℓ_{i} ℓ_{i}^{T}

fulfills the following condition:

ϵ_{i} I_{N_{i}} \leq ℓ_{i} ℓ_{i}^{T} \leq s_{i} I_{N_{i}},

(57)

where

ϵ_{i}

and

s_{i}

are positive constants.

Theorem 2.

For the nonlinear interconnected safety-critical system (15), we design the estimated optimal safety policies and auxiliary control strategies as (51) and (52), respectively. Assume that Assumptions 1–5 hold. If

{\hat{W}}_{c_{i}}

is updated by (55), then

s_{i}

and

{\hat{W}}_{c_{i}}

are UUB if

α_{c_{i}}

in (55) satisfies

α_{c_{i}} > \frac{℘_{i, M}^{2} D_{σ_{c_{i}}}^{2}}{ξ_{i} λ_{\min} (ℓ_{i} ℓ_{i}^{T})} .

(58)

Proof.

The candidate Lyapunov function is considered to be:

L_{i} (t) = \sum_{i = 1}^{n} (V_{i}^{*} (s_{i}) + \frac{1}{2} {\tilde{W}}_{c_{i}}^{T} {\tilde{W}}_{c_{i}}) .

(59)

Then, defining

L_{i, 1} (t) = V_{i}^{*} (s_{i})

and

L_{i, 2} (t) = \frac{1}{2} {\tilde{W}}_{c_{i}}^{T} {\tilde{W}}_{c_{i}}

, the time derivative of

L_{i, 1} (t)

is

\begin{matrix} {\dot{L}}_{i, 1} (t) = & {(\nabla V_{i}^{*} (s_{i}))}^{T} (G_{i} (s_{i}) {\hat{u}}_{i} + F_{i} (s_{i}) + ℘_{i} (s_{i}) {\hat{v}}_{i}) \\ = & {(\nabla V_{i}^{*} (s_{i}))}^{T} (G_{i} (s_{i}) u_{i}^{*} + F_{i} (s_{i}) + ℘_{i} (s_{i}) v_{i}^{*}) \\ + \underset{β_{3}}{\underset{⏟}{{(\nabla V_{i}^{*} (s_{i}))}^{T} G_{i} (s_{i}) ({\hat{u}}_{i} - u_{i}^{*})}} + \underset{β_{4}}{\underset{⏟}{{(\nabla V_{i}^{*} (s_{i}))}^{T} ℘_{i} (s_{i}) ({\hat{v}}_{i} - v_{i}^{*})}} . \end{matrix}

(60)

Combining (29), (34) and (35), Equation (60) is further reduced to:

\begin{matrix} {\dot{L}}_{i, 1} (t) = & α_{i} V_{i}^{*} - π_{i} (s_{i}) - s_{i}^{T} Q_{i} s_{i} - W_{i} (u_{i}^{*}) + ξ_{i} {∥v_{i}^{*} (s_{i})∥}^{2} + β_{3} + β_{4} . \end{matrix}

(61)

According to Lemma 1, and taking into account (40), (48), (51), we observe that the

β_{3}

term in (61) satisfies

\begin{matrix} β_{3} & \leq λ_{i}^{2} ∥{tanh}^{- 1} (\frac{u_{i}^{*} (s_{i}) - c_{d_{i}}}{λ_{i}})∥ + {∥{\hat{u}}_{i} - u_{i}^{*}∥}^{2} \\ \leq β_{1} + \underset{β_{5}}{\underset{⏟}{{∥λ_{i} (tanh (Y_{i, 1} (s_{i})) - tanh (Y_{i, 2} (s_{i}))) - ε_{u_{i}} (s_{i})∥}^{2}}} \\ \leq \frac{1}{4} G_{i, M}^{2} δ_{i, M}^{2} + β_{5}, \end{matrix}

(62)

where

Y_{i} (s_{i}) = \frac{1}{2 λ_{i}} G_{i}^{T} (s_{i}) \nabla V_{i}^{*} (s_{i})

. Then, based on the fact

∥tanh (Y_{i, k} (s_{i}))∥ \leq \sqrt{m_{i}},

k = 1, 2

in [44] and on Assumption 4,

β_{5}

is derived as:

\begin{matrix} β_{5} & \leq 2 λ_{i}^{2} {∥tanh (Y_{i, 1} (s_{i})) - tanh (Y_{i, 2} (s_{i}))∥}^{2} + 2 {∥ε_{u_{i}} (s_{i})∥}^{2} \\ \leq 4 λ_{i}^{2} ({∥tanh (Y_{i, 1} (s_{i}))∥}^{2} + {∥tanh (Y_{i, 2} (s_{i}))∥}^{2}) + 2 {∥ε_{u_{i}} (s_{i})∥}^{2} \\ \leq 8 λ_{i}^{2} m_{i} + 2 D_{ε_{u_{i}}}^{2} . \end{matrix}

(63)

Similarly, the last term of (61) is deduced from (35), (49) and (52) as:

\begin{matrix} β_{4} & \leq - ξ_{i} {∥v_{i}^{*}∥}^{2} - ξ_{i} {∥{\hat{v}}_{i} - v_{i}^{*}∥}^{2} \\ \leq - ξ_{i} {∥v_{i}^{*}∥}^{2} + 2 ξ_{i} ({∥{\hat{v}}_{i}∥}^{2} - {∥v_{i}^{*}∥}^{2}) + 2 ξ_{i} {∥ε_{v_{i}}∥}^{2} \\ \leq - ξ_{i} {∥v_{i}^{*}∥}^{2} + \frac{1}{2 ξ_{i}} ℘_{i, M}^{2} D_{σ_{c_{i}}}^{2} {∥{\tilde{W}}_{c_{i}}∥}^{2} + 2 ξ_{i} D_{ε_{v_{i}}}^{2} . \end{matrix}

(64)

By using (38), (62)–(64) and the fact that

∥α_{i} V_{i}^{*} (s_{i})∥ \leq ϱ_{i, M}

, the following is derived:

\begin{matrix} {\dot{L}}_{i, 1} (t) \leq - λ_{\min} (Q_{i}) {∥s_{i}∥}^{2} + \frac{1}{2 ξ_{i}} ℘_{i, M}^{2} D_{σ_{c_{i}}}^{2} {∥{\tilde{W}}_{c_{i}}∥}^{2} + Θ_{i}, \end{matrix}

(65)

with

Θ_{i} = ϱ_{i, M} + \frac{1}{2} G_{i, M}^{2} δ_{i, M}^{2} + 8 λ_{i}^{2} m_{i} + 2 D_{ε_{u_{i}}}^{2} + 2 ξ_{i} D_{ε_{v_{i}}}^{2}

.

Next, using the weight estimation error dynamics (56), the time derivative of

L_{i, 2} (t)

is

{\dot{L}}_{i, 2} (t) = - α_{c_{i}} {\tilde{W}}_{c_{i}}^{T} ℓ_{i} ℓ_{i}^{T} {\tilde{W}}_{c_{i}} + α_{c_{i}} \frac{{\tilde{W}}_{c_{i}}^{T} ℓ_{i}}{𝚤_{i}} e_{H_{i}} .

(66)

Combining Lemma 1 and Assumption 4, the following conclusion is drawn:

α_{c_{i}} \frac{{\tilde{W}}_{c_{i}}^{T} ℓ_{i}}{𝚤_{i}} e_{H_{i}} \leq \frac{α_{c_{i}}}{2} {\tilde{W}}_{c_{i}}^{T} ℓ_{i} ℓ_{i}^{T} {\tilde{W}}_{c_{i}} + \frac{α_{c_{i}}}{2} D_{e_{H_{i}}}^{2} .

(67)

Combining (66) and (67), we derive the following inequality:

{\dot{L}}_{i, 2} (t) \leq - \frac{α_{c_{i}}}{2} λ_{\min} (ℓ_{i} ℓ_{i}^{T}) {∥{\tilde{W}}_{c_{i}}∥}^{2} + \frac{α_{c_{i}}}{2} D_{e_{H_{i}}}^{2} .

(68)

Substituting (65) and (68) into the time derivative of (59), the following inequality is obtained:

{\dot{L}}_{i} (t) \leq \sum_{i = 1}^{n} (- λ_{\min} (Q_{i}) {∥s_{i}∥}^{2} - x_{i} {∥{\tilde{W}}_{c_{i}}∥}^{2} + Θ_{i} + \frac{α_{c_{i}}}{2} D_{e_{H_{i}}}^{2}),

(69)

where

x_{i} = \frac{α_{c_{i}}}{2} λ_{\min} (ℓ_{i} ℓ_{i}^{T}) - \frac{1}{2 ξ_{i}} ℘_{i, M}^{2} D_{σ_{c_{i}}}^{2}

and

λ_{\min} (ℓ_{i} ℓ_{i}^{T})

denotes the minimum eigenvalue of

ℓ_{i} ℓ_{i}^{T}

.

Therefore, (58) and (69) imply

{\dot{L}}_{i} (t) < 0

, provided that

s_{i}

and

{\tilde{W}}_{c_{i}}

lie outside the sets

N_{s_{i}} = \{s_{i} : ∥s_{i}∥ \leq \sqrt{\frac{2 Θ_{i} + D_{e_{H_{i}}}^{2}}{2 λ_{\min} (Q_{i})}}\},

(70)

N_{{\tilde{W}}_{c_{i}}} = \{{\tilde{W}}_{c_{i}} : ∥{\tilde{W}}_{c_{i}}∥ \leq \sqrt{\frac{2 Θ_{i} + D_{e_{H_{i}}}^{2}}{x_{i}}}\} .

(71)

By Lyapunov’s extension theorem [47], the closed-loop system is stable and the weight estimation error

{\tilde{W}}_{c_{i}}

is UUB. This completes the proof. □
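The role of the learning-rate condition (58) in the proof can be checked numerically with the short Python sketch below. It evaluates the coefficient x_{i} and one summand of the right-hand side of (69); every constant is a hypothetical placeholder, since in practice only bounds on these quantities are available.

```python
def gain_margin(alpha_c, lam_min_ll, xi, p_max, d_sigma):
    """Coefficient x_i in (69); it is positive exactly when (58) holds."""
    return 0.5 * alpha_c * lam_min_ll - p_max ** 2 * d_sigma ** 2 / (2.0 * xi)


def rate_bound(s_norm, w_norm, lam_min_q, x_i, theta, alpha_c, d_eh):
    """One summand of the right-hand side of (69)."""
    return (-lam_min_q * s_norm ** 2 - x_i * w_norm ** 2
            + theta + 0.5 * alpha_c * d_eh ** 2)


# Hypothetical bounds: threshold from (58) is p_max^2 * d_sigma^2 / (xi * lam_min_ll) = 1.0.
x_ok = gain_margin(alpha_c=2.0, lam_min_ll=0.5, xi=8.0, p_max=1.0, d_sigma=2.0)
x_bad = gain_margin(alpha_c=0.5, lam_min_ll=0.5, xi=8.0, p_max=1.0, d_sigma=2.0)
print(x_ok, x_bad)

# The bound becomes negative once the state norm is large enough.
print(rate_bound(3.0, 0.0, 1.0, x_ok, theta=2.0, alpha_c=2.0, d_eh=1.0))
print(rate_bound(0.1, 0.0, 1.0, x_ok, theta=2.0, alpha_c=2.0, d_eh=1.0))
```

A negative x_{i} would make the weight-error term in (69) destabilizing, which is why the lower bound on α_{c_{i}} appears as a hypothesis of Theorem 2.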

Remark 6.

In contrast to techniques that aim to achieve input saturation [10,13], this article proposes an RL technique to solve the optimal DSC problem with safety constraints and asymmetric input constraints. This approach not only ensures the safety of the system but also keeps the control inputs within their asymmetric limits. Therefore, the developed RL technique is better suited to engineering applications in which the system state must remain within the prescribed safe region at all times.

6. Simulation Example

In this section, we provide a simulation example to verify the effectiveness of the proposed approach. The simulation involved a dual-linked robotic arm system [42]. The state space model of the system is defined by

\begin{matrix} {\dot{x}}_{1, 1} & = x_{1, 2}, \\ {\dot{x}}_{1, 2} & = - \frac{M_{1}}{{\tilde{G}}_{1}} x_{1, 2} - \frac{m_{1} \tilde{g} {\tilde{l}}_{1}}{{\tilde{G}}_{1}} sin (x_{1, 1}) + \frac{1}{{\tilde{G}}_{1}} u_{1} + △ h_{1}, \\ {\dot{x}}_{2, 1} & = x_{2, 2}, \\ {\dot{x}}_{2, 2} & = - \frac{M_{2}}{{\tilde{G}}_{2}} x_{2, 2} - \frac{m_{2} \tilde{g} {\tilde{l}}_{2}}{{\tilde{G}}_{2}} sin (x_{2, 1}) + \frac{1}{{\tilde{G}}_{2}} u_{2} + △ h_{2}, \end{matrix}

(72)

where

x_{i, 1}

and

x_{i, 2}

(i = 1, 2)

denote the angular position and angular velocity of the robot arm, respectively,

u_{i}

stands for the control input, and

△ h_{i} = η_{i} P_{i}

represents the interconnection term. The other parameters of the robotic arm system (72) are listed in Table 1. The initial system state was selected as

x_{0} = {[2, 2, 2, 2]}^{T}

. We first defined the state variable

x_{i} = {[x_{i, 1}, x_{i, 2}]}^{T}

and rewrote the subsystem dynamics in terms of the internal dynamics, input gain matrix, and interconnection term as follows:

{\dot{x}}_{i} = [\begin{matrix} x_{i, 2} \\ - \frac{M_{i}}{{\tilde{G}}_{i}} x_{i, 2} - \frac{m_{i} \tilde{g} {\tilde{l}}_{i}}{{\tilde{G}}_{i}} sin (x_{i, 1}) \end{matrix}] + [\begin{matrix} 0 \\ \frac{1}{{\tilde{G}}_{i}} \end{matrix}] u_{i} + [\begin{matrix} 1 \\ 0 \end{matrix}] P_{i} (x_{i}),

where

P_{1} (x_{1})

,

P_{2} (x_{2})

denote the uncertain interconnection terms of subsystems 1 and 2, i.e.,

\begin{matrix} P_{1} (x_{1}) & = 0.1 x_{1, 1} sin (x_{2, 2}), \\ P_{2} (x_{2}) & = (x_{1, 2} - 3 sin (0.1 x_{2, 1})) . \end{matrix}
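The open-loop model (72) with the interconnection terms above can be coded as in the following Python sketch. The physical parameters are hypothetical placeholders because Table 1 is not reproduced here, and the scaling η_{i} = 1 is likewise an assumption.

```python
import math

# Hypothetical parameter values (the paper's Table 1 is not reproduced here).
PARAMS = {
    1: {"M": 2.0, "G": 10.0, "m": 10.0, "l": 0.5, "eta": 1.0},
    2: {"M": 2.0, "G": 10.0, "m": 10.0, "l": 0.5, "eta": 1.0},
}
GRAVITY = 9.81


def interconnections(x1, x2):
    """Interconnection terms P_1 and P_2 as given in the text."""
    p1 = 0.1 * x1[0] * math.sin(x2[1])
    p2 = x1[1] - 3.0 * math.sin(0.1 * x2[0])
    return p1, p2


def arm_rhs(x1, x2, u1, u2):
    """Right-hand side of (72) for both subsystems."""
    p = interconnections(x1, x2)
    out = []
    for idx, (x, u) in enumerate(((x1, u1), (x2, u2)), start=1):
        q = PARAMS[idx]
        accel = (-q["M"] / q["G"] * x[1]
                 - q["m"] * GRAVITY * q["l"] / q["G"] * math.sin(x[0])
                 + u / q["G"]
                 + q["eta"] * p[idx - 1])
        out.append([x[1], accel])
    return out


# Derivatives at the initial state x_0 = [2, 2, 2, 2]^T with zero input.
print(arm_rhs([2.0, 2.0], [2.0, 2.0], 0.0, 0.0))
```

The origin is an equilibrium of this model for zero input, since both interconnection terms vanish there.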

Furthermore, the states of the two robotic arm subsystems were required to satisfy the following safety constraints:

\begin{matrix} x_{1, 1} \in (- 0.5, 2.9), x_{1, 2} \in (- 1.5, 2.5), \\ x_{2, 1} \in (- 1, 2.5), x_{2, 2} \in (- 3.5, 3) . \end{matrix}

(73)

Therefore, to handle the safety constraints, the following transformed system without state constraints was obtained using the BF-based system transformation (13):

{\dot{s}}_{i} = F_{i} (s_{i}) + G_{i} (s_{i}) u_{i} + ℘_{i} (s_{i}) U_{i},

(74)

where

\begin{matrix} F_{i} (s_{i}) & = [\begin{matrix} \frac{a_{i, 2} A_{i, 2} (e^{\frac{s_{i, 2}}{2}} - e^{- \frac{s_{i, 2}}{2}})}{a_{i, 2} e^{\frac{s_{i, 2}}{2}} - A_{i, 2} e^{- \frac{s_{i, 2}}{2}}} \frac{a_{i, 1}^{2} e^{s_{i, 1}} - 2 a_{i, 1} A_{i, 1} + A_{i, 1}^{2} e^{- s_{i, 1}}}{A_{i, 1} a_{i, 1}^{2} - a_{i, 1} A_{i, 1}^{2}} \\ f_{i} (B^{- 1} (s_{i})) \frac{a_{i, 2}^{2} e^{s_{i, 2}} - 2 a_{i, 2} A_{i, 2} + A_{i, 2}^{2} e^{- s_{i, 2}}}{A_{i, 2} a_{i, 2}^{2} - a_{i, 2} A_{i, 2}^{2}} \end{matrix}], \\ G_{i} (s_{i}) & = [\begin{matrix} 0 \\ \frac{1}{{\tilde{G}}_{i}} \frac{a_{i, 2}^{2} e^{s_{i, 2}} - 2 a_{i, 2} A_{i, 2} + A_{i, 2}^{2} e^{- s_{i, 2}}}{A_{i, 2} a_{i, 2}^{2} - a_{i, 2} A_{i, 2}^{2}} \end{matrix}], \\ ℘_{i} (s_{i}) & = [\begin{matrix} \frac{a_{i, 2}^{2} e^{s_{i, 2}} - 2 a_{i, 2} A_{i, 2} + A_{i, 2}^{2} e^{- s_{i, 2}}}{A_{i, 2} a_{i, 2}^{2} - a_{i, 2} A_{i, 2}^{2}} \\ 0 \end{matrix}] . \end{matrix}

(75)
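The barrier-function change of variables behind (74) and (75) can be sketched in Python as follows. The logarithmic form of B(x; a, A) is an assumption inferred from the inverse map that appears in the first row of F_{i}; the bounds a = −0.5 and A = 2.9 are those of x_{1,1} in (73), and the function names are hypothetical.

```python
import math


def barrier(x, a, A):
    """Assumed logarithmic barrier B(x; a, A) for a < x < A with a < 0 < A."""
    return math.log(A * (a - x) / (a * (A - x)))


def barrier_inv(s, a, A):
    """Inverse map B^{-1}(s), matching the first factor of F_i in (75)."""
    e_pos, e_neg = math.exp(s / 2.0), math.exp(-s / 2.0)
    return a * A * (e_pos - e_neg) / (a * e_pos - A * e_neg)


def barrier_slope(s, a, A):
    """dB/dx at x = B^{-1}(s): the reciprocal of the derivative of B^{-1}."""
    num = a * a * math.exp(s) - 2.0 * a * A + A * A * math.exp(-s)
    return num / (A * a * a - a * A * A)


# Bounds of x_{1,1} from (73) and its initial value x_{1,1}(0) = 2.
a, A = -0.5, 2.9
s0 = barrier(2.0, a, A)
print(s0, barrier_inv(s0, a, A), barrier_slope(s0, a, A))
```

The transformed state grows without bound as x approaches either limit, so any trajectory that keeps s finite keeps x strictly inside (a, A); this is the mechanism by which the state constraint is removed from the transformed problem.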

For the transformed dual-linked robotic arm system (74), the initial state was chosen as

s_{i, 0} = {[s_{i, 0} (1), s_{i, 0} (2)]}^{T} = {[B (x_{i, 0} (1); a_{i, 1}, A_{i, 1}), B (x_{i, 0} (2); a_{i, 2}, A_{i, 2})]}^{T}

. The discount factors were chosen as

α_{1} = 1

and

α_{2} = 0.1

. The matrices were designed as

Q_{1} = 0.5 I_{2}

and

Q_{2} = I_{2}

,

R_{1} = 1

and

R_{2} = 1

. The upper and lower input limits were set as follows:

h_{1 m a x} = 0.75

,

h_{1 m i n} = - 0.25

and

h_{2 m a x} = 1.5

,

h_{2 m i n} = - 0.5

. Let

ϑ_{1} = ∥s_{1}∥

and

ϑ_{2} = ∥s_{2}∥

. Additional design parameters were set as follows:

ξ_{1} = 8, ξ_{2} = 4, α_{c_{1}} = 2, α_{c_{2}} = 2

. The activation functions were chosen as

σ_{c_{1}} (s_{1}) = {[s_{1, 1}^{2}, s_{1, 1} s_{1, 2}, s_{1, 2}^{2}]}^{T}

and

σ_{c_{2}} (s_{2}) = {[s_{2, 1}^{2}, s_{2, 1} s_{2, 2}, s_{2, 2}^{2}]}^{T}

.

The simulation outcomes are presented in Figures 2–13. The system states obtained without the DSC method are depicted in Figure 2 and Figure 8, and it can be observed that the closed-loop system stabilized after 20 s and 35 s, respectively. However, the system failed to meet the specified safety constraints. In contrast, Figure 3 and Figure 9 show that, with the DSC method, the system states not only converged to zero but also satisfied the given safety constraints. The evolving states

s_{1} (t)

and

s_{2} (t)

are presented in Figure 4 and Figure 10, based on the safe control method with asymmetric input constraints. The optimal DSC policies are shown in Figure 5 and Figure 11. We found that the optimal DSC policies were restricted to the asymmetric sets

[- 0.25, 0.75]

and

[- 0.5, 1.5]

. Figure 6 and Figure 12 represent the optimal auxiliary control strategies for subsystems 1 and 2, respectively. Figure 7 and Figure 13 show the evolution of the critic weights, which converged after 15 s. According to Theorem 2, we concluded that the proposed optimal safety control policy and the auxiliary control policy could stabilize the closed-loop nonlinear system and satisfy the safety constraints on the system state. Moreover, the optimal control policy remained within the predefined constraint set. Finally, the simulation results showed that the presented optimal DSC solution for constrained interconnected nonlinear safety-critical systems subject to state constraints is effective.

7. Conclusions

This article presents an RL-based DSC scheme for interconnected nonlinear safety-critical systems with security constraints and asymmetric input constraints. Using the barrier function, the proposed method transforms an interconnected nonlinear safety-critical system with security and asymmetric input constraints into one with only asymmetric input constraints. A non-quadratic utility function is added to the performance function to address the asymmetric input constraints. A critic network is used to approximate the optimal performance function and to construct the estimated optimal safety control policy. The control scheme stabilizes the closed-loop system and minimizes the improved performance function. In addition, the simulation results demonstrate the efficacy of the proposed DSC solution. Future work will explore the optimal safety control of stochastic interconnected nonlinear systems with event triggering.

Author Contributions

C.Q. and Y.W. provided methodology, validation, and writing—original draft preparation; T.Z. provided conceptualization, writing—review; J.Z. provided supervision; C.Q. provided funding support. All authors read and agreed to the published version of the manuscript.

Funding

This work was supported by the science and technology research project of the Henan province 222102240014.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The authors can confirm that all relevant data are included in the article.

Conflicts of Interest

The authors declare that they have no conflict of interest. All authors have approved the manuscript and agreed with submission to this journal.

References

1. Son, T.D.; Nguyen, Q. Safety-critical control for non-affine nonlinear systems with application on autonomous vehicle. In Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019; pp. 7623–7628.
2. Manjunath, A.; Nguyen, Q. Safe and robust motion planning for dynamic robotics via control barrier functions. In Proceedings of the 2021 60th IEEE Conference on Decision and Control (CDC), Austin, TX, USA, 14–17 December 2021; pp. 2122–2128.
3. Wang, J.; Qin, C.; Qiao, X.; Zhang, D.; Zhang, Z.; Shang, Z.; Zhu, H. Constrained optimal control for nonlinear multi-input safety-critical systems with time-varying safety constraints. Mathematics 2022, 10, 2744.
4. Liu, Z.; Yuan, Q.; Nie, G.; Tian, Y. A multi-objective model predictive control for vehicle adaptive cruise control system based on a new safe distance model. Int. J. Automot. Technol. 2021, 22, 475–487.
5. Ames, A.D.; Xu, X.; Grizzle, J.W.; Tabuada, P. Control barrier function based quadratic programs for safety critical systems. IEEE Trans. Autom. Control 2016, 62, 3861–3876.
6. Qin, C.; Wang, J.; Zhu, H.; Zhang, J.; Hu, S.; Zhang, D. Neural network-based safe optimal robust control for affine nonlinear systems with unmatched disturbances. Neurocomputing 2022, 506, 228–239.
7. Qin, C.; Wang, J.; Zhu, H.; Xiao, Q.; Zhang, D. Safe adaptive learning algorithm with neural network implementation for H_∞ control of nonlinear safety-critical system. Int. J. Robust Nonlinear Control 2023, 33, 372–391.
8. Srinivasan, M.; Abate, M.; Nilsson, G.; Coogan, S. Extent-compatible control barrier functions. Syst. Control Lett. 2021, 150, 104895.
9. Yang, Y.; Yin, Y.; He, W.; Vamvoudakis, K.G.; Modares, H. Safety-aware reinforcement learning framework with an actor-critic-barrier structure. In Proceedings of the 2019 American Control Conference (ACC), Philadelphia, PA, USA, 10–12 July 2019; pp. 2352–2358.
10. Yang, Y.; Vamvoudakis, K.G.; Modares, H.; Yin, Y.; Wunsch, D.C. Safe intermittent reinforcement learning with static and dynamic event generators. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 5441–5455.
11. Xu, J.; Wang, J.; Rao, J.; Zhong, Y.; Wang, H. Adaptive dynamic programming for optimal control of discrete-time nonlinear system with state constraints based on control barrier function. Int. J. Robust Nonlinear Control 2022, 32, 3408–3424.
12. Brunke, L.; Greeff, M.; Hall, A.W.; Yuan, Z.; Zhou, S.; Panerati, J.; Schoellig, A.P. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annu. Rev. Control Robot. Auton. Syst. 2022, 5, 411–444.
13. Qin, C.; Zhu, H.; Wang, J.; Xiao, Q.; Zhang, D. Event-triggered safe control for the zero-sum game of nonlinear safety-critical systems with input saturation. IEEE Access 2022, 10, 40324–40337.
14. Bakule, L. Decentralized control: An overview. Annu. Rev. Control. 2008, 32, 87–98.
15. Xu, L.X.; Wang, Y.L.; Wang, X.; Peng, C. Decentralized Event-Triggered Adaptive Control for Interconnected Nonlinear Systems With Actuator Failures. IEEE Trans. Fuzzy Syst. 2022, 31, 148–159.
16. Guo, B.; Dian, S.; Zhao, T. Robust NN-based decentralized optimal tracking control for interconnected nonlinear systems via adaptive dynamic programming. Nonlinear Dyn. 2022, 110, 3429–3446.
17. Feng, Z.; Li, R.B.; Wu, L. Adaptive decentralized control for constrained strong interconnected nonlinear systems and its application to inverted pendulum. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–11.
18. Zouhri, A.; Boumhidi, I. Stability analysis of interconnected complex nonlinear systems using the Lyapunov and Finsler property. Multimed. Tools Appl. 2021, 80, 19971–19988.
19. Li, X.; Zhan, Y.; Tong, S. Adaptive neural network decentralized fault-tolerant control for nonlinear interconnected fractional-order systems. Neurocomputing 2022, 488, 14–22.
20. Tan, Y.; Yuan, Y.; Xie, X.; Tian, E.; Liu, J. Observer-based event-triggered control for interval type-2 fuzzy networked system with network attacks. IEEE Trans. Fuzzy Syst. 2023, 1–10.
21. Zhang, J.; Li, S.; Ahn, C.K.; Xiang, Z. Adaptive fuzzy decentralized dynamic surface control for switched large-scale nonlinear systems with full-state constraints. IEEE Trans. Cybern. 2021, 52, 10761–10772.
22. Huo, X.; Karimi, H.R.; Zhao, X.; Wang, B.; Zong, G. Adaptive-critic design for decentralized event-triggered control of constrained nonlinear interconnected systems within an identifier-critic framework. IEEE Trans. Cybern. 2021, 52, 7478–7491.
23. Bao, C.; Wang, P.; Tang, G. Data-Driven Based Model-Free Adaptive Optimal Control Method for Hypersonic Morphing Vehicle. IEEE Trans. Aerosp. Electron. Syst. 2022, 1–15.
24. Farzanegan, B.; Suratgar, A.A.; Menhaj, M.B.; Zamani, M. Distributed optimal control for continuous-time nonaffine nonlinear interconnected systems. Int. J. Control 2022, 95, 3462–3476.
25. Heydari, M.H.; Razzaghi, M. A numerical approach for a class of nonlinear optimal control problems with piecewise fractional derivative. Chaos Solitons Fractals 2021, 152, 111465.
26. Liu, S.; Niu, B.; Zong, G.; Zhao, X.; Xu, N. Data-driven-based event-triggered optimal control of unknown nonlinear systems with input constraints. Nonlinear Dyn. 2022, 109, 891–909.
27. Niu, B.; Liu, J.; Wang, D.; Zhao, X.; Wang, H. Adaptive decentralized asymptotic tracking control for large-scale nonlinear systems with unknown strong interconnections. IEEE/CAA J. Autom. Sin. 2021, 9, 173–186.
28. Zhao, B.; Luo, F.; Lin, H.; Liu, D. Particle swarm optimized neural networks based local tracking control scheme of unknown nonlinear interconnected systems. Neural Netw. 2021, 134, 54–63.
29. Zhao, Y.; Niu, B.; Zong, G.; Xu, N.; Ahmad, A.M. Event-triggered optimal decentralized control for stochastic interconnected nonlinear systems via adaptive dynamic programming. Neurocomputing 2023, 539, 126163.
30. Wang, T.; Wang, H.; Xu, N.; Zhang, L.; Alharbi, K.H. Sliding-mode surface-based decentralized event-triggered control of partially unknown interconnected nonlinear systems via reinforcement learning. Inf. Sci. 2023, 641, 119070.
31. Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50.
32. Tang, F.; Niu, B.; Zong, G.; Zhao, X.; Xu, N. Periodic event-triggered adaptive tracking control design for nonlinear discrete-time systems via reinforcement learning. Neural Netw. 2022, 154, 43–55.
33. Sun, J.; Liu, C. Backstepping-based adaptive dynamic programming for missile-target guidance systems with state and input constraints. J. Frankl. Inst. 2018, 355, 8412–8440.
34. Zhao, S.; Wang, J.; Wang, H.; Xu, H. Goal representation adaptive critic design for discrete-time uncertain systems subjected to input constraints: The event-triggered case. Neurocomputing 2022, 492, 676–688.
35. Liu, C.; Zhang, H.; Xiao, G.; Sun, S. Integral reinforcement learning based decentralized optimal tracking control of unknown nonlinear large-scale interconnected systems with constrained-input. Neurocomputing 2019, 323, 1–11.
36. Sun, H.; Hou, L. Adaptive decentralized finite-time tracking control for uncertain interconnected nonlinear systems with input quantization. Int. J. Robust Nonlinear Control 2021, 31, 4491–4510.
37. Duan, D.; Liu, C. Finite-horizon optimal tracking control for constrained-input nonlinear interconnected system using aperiodic distributed nonzero-sum games. IET Control Theory Appl. 2021, 15, 1199–1213.
38. Li, Y.; Li, Y.-X.; Tong, S. Event-based finite-time control for nonlinear multi-agent systems with asymptotic tracking. IEEE Trans. Autom. Control 2023, 68, 3790–3797.
39. Zhang, H.; Zhao, X.; Zong, G.; Xu, N. Fully distributed consensus of switched heterogeneous nonlinear multi-agent systems with bouc-wen hysteresis input. IEEE Trans. Netw. Sci. Eng. 2022, 9, 4198–4208.
40. Yang, X.; Zhou, Y.; Dong, N.; Wei, Q. Adaptive critics for decentralized stabilization of constrained-input nonlinear interconnected systems. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 4187–4199.
41. Zhao, Y.; Wang, H.; Xu, N.; Zong, G.; Zhao, X. Reinforcement learning-based decentralized fault tolerant control for constrained interconnected nonlinear systems. Chaos Solitons Fractals 2023, 167, 113034.
42. Cui, L.; Zhang, Y.; Wang, X.; Xie, X. Event-triggered distributed self-learning robust tracking control for uncertain nonlinear interconnected systems. Appl. Math. Comput. 2021, 395, 125871.
43. Tang, Y.; Yang, X. Robust tracking control with reinforcement learning for nonlinear-constrained systems. Int. J. Robust Nonlinear Control 2022, 32, 9902–9919.
44. Yang, X.; Zhao, B. Optimal neuro-control strategy for nonlinear systems with asymmetric input constraints. IEEE/CAA J. Autom. Sin. 2020, 7, 575–583.
45. Beard, R.W.; Saridis, G.N.; Wen, J.T. Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation. Automatica 1997, 33, 2159–2177.
46. Liu, D.; Yang, X.; Wang, D.; Wei, Q. Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Trans. Cybern. 2015, 45, 1372–1385.
47. Pishro, A.; Shahrokhi, M.; Sadeghi, H. Fault-tolerant adaptive fractional controller design for incommensurate fractional-order nonlinear dynamic systems subject to input and output restrictions. Chaos Solitons Fractals 2022, 157, 111930.
48. Zhang, L.; Zhao, X.; Zhao, N. Real-time reachable set control for neutral singular Markov jump systems with mixed delays. IEEE Trans. Circuits Syst. II Express Briefs 2021, 69, 1367–1371.
49. Lakmesari, S.H.; Mahmoodabadi, M.J.; Ibrahim, M.Y. Fuzzy logic and gradient descent-based optimal adaptive robust controller with inverted pendulum verification. Chaos Solitons Fractals 2021, 151, 111257.

Figure 1. The block diagram of the developed optimal DSC scheme.

Figure 2. Evolution of state $x_{1}(t)$ without using the DSC method.

Figure 3. Evolution of state $x_{1}(t)$ using the DSC method.

Figure 4. Evolution of state $s_{1}(t)$ using the DSC method.

Figure 5. Evolution of the control input $u_{1}$.

Figure 6. Evolution of the auxiliary control input $v_{1}$ using the DSC method.

Figure 7. Evolution of the critic weight vector $W_{c_{1}}$ using the DSC method.

Figure 8. Evolution of state $x_{2}(t)$ without using the DSC method.

Figure 9. Evolution of state $x_{2}(t)$ using the DSC method.

Figure 10. Evolution of state $s_{2}(t)$ using the DSC method.

Figure 11. Evolution of the control input $u_{2}$.

Figure 12. Evolution of the auxiliary control input $v_{2}$ using the DSC method.

Figure 13. Evolution of the critic weight vector $W_{c_{2}}$ using the DSC method.

Table 1. Meanings and values of symbols used in robotic arm systems.

Subsystem	Parameter	Meaning	Value
The first subsystem	$m_{1}$	Mass of payload	5 kg
The first subsystem	$M_{1}$	Viscous friction	2 N
The first subsystem	${\tilde{l}}_{1}$	Length of the arm	0.5 m
The first subsystem	${\tilde{G}}_{1}$	Moment of inertia	10 kg·m²
The first subsystem	${\tilde{g}}_{1}$	Acceleration of gravity	9.81 m/s²
The second subsystem	$m_{2}$	Mass of payload	10 kg
The second subsystem	$M_{2}$	Viscous friction	2 N
The second subsystem	${\tilde{l}}_{2}$	Length of the arm	1 m
The second subsystem	${\tilde{G}}_{2}$	Moment of inertia	10 kg·m²
The second subsystem	${\tilde{g}}_{2}$	Acceleration of gravity	9.81 m/s²

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
