Article

Reinforcement Learning-Based Decentralized Safety Control for Constrained Interconnected Nonlinear Safety-Critical Systems

1 School of Artificial Intelligence, Henan University, Zhengzhou 450046, China
2 School of Software, Henan University, Kaifeng 475000, China
* Author to whom correspondence should be addressed.
Entropy 2023, 25(8), 1158; https://doi.org/10.3390/e25081158
Submission received: 30 May 2023 / Revised: 21 June 2023 / Accepted: 1 July 2023 / Published: 2 August 2023
(This article belongs to the Special Issue Information Theory for Interpretable Machine Learning)

Abstract

This paper addresses the problem of decentralized safety control (DSC) of constrained interconnected nonlinear safety-critical systems under reinforcement learning strategies, where asymmetric input constraints and safety constraints are considered. To begin with, improved performance functions associated with each auxiliary subsystem are constructed. Then, the decentralized control problem with safety constraints and asymmetric input constraints is transformed, using the barrier function, into an equivalent decentralized control problem with only asymmetric input constraints. This approach ensures that the safety-critical systems operate and learn optimal DSC policies within their safe regions. Furthermore, the optimal control strategy is shown to render the entire system uniformly ultimately bounded (UUB), and all signals in the closed-loop auxiliary subsystems are shown, based on Lyapunov theory, to be uniformly ultimately bounded. Finally, the effectiveness of the designed method is verified by a simulation example.

1. Introduction

Over the past few decades, safety has received increasing attention in autonomous driving [1], intelligent robots [2], robotic arms [3], adaptive cruise control [4], etc. The design of these systems and their controllers requires that the system state trajectories evolve within a set called the safe set, which reflects the inherent properties of the system [5]. In practice, many engineering systems must operate within a specific safety range, beyond which the controlled system may be at risk [6]. Safety-critical systems primarily refer to systems whose control behaviors prioritize safety. The corresponding control schemes aim to reduce the potential for severe consequences, such as personal injury and environmental pollution, which may arise from system shutdown or operational errors [7]. To ensure the safety and reliability of such systems, scholars have developed many safety control schemes. The classical approach focused on extending and applying Nagumo's theorem to safe sets defined by continuously differentiable functions [8]. In particular, barrier functions have become an effective tool for verifying safety and have been widely used in [9,10,11]. They were used to convert a system with safety constraints into an equivalent system that satisfies the safety requirements, after which a safety controller was designed to protect the system. In [9,10], penalty functions and BF-based state transformations were employed to incorporate the states into a reinforcement learning framework and solve optimal control problems with full-state constraints. In [11], a safe off-policy reinforcement learning method was proposed for nonlinear systems with dynamic uncertainty. In [12,13], new safe reinforcement learning methods were proposed for nonlinear systems with symmetric input constraints. However, the results in [9,10,11,12,13] mainly studied optimal safety control for a single continuous-time/discrete-time nonlinear system; the safety control of interconnected systems has not been fully resolved.
On the other hand, interconnected systems consist of multiple subsystems with interconnection characteristics, and it is difficult to design controllers for them with a single-system approach [14]. To solve this problem, the decentralized control approach, based on local subsystem information, was proposed in [15,16,17]; it uses multiple controllers to control the interconnected system. In [18,19], the decentralized control approach instead first decomposed the overall control problem into a series of subproblems that could be solved independently; the solutions of the subproblems (i.e., the independent controllers) were then combined to form a decentralized controller stabilizing the entire system. In addition, implementing a decentralized control algorithm requires only knowledge of the local subsystem, not information about the complete system. Recently, scholars have proposed many schemes and techniques for designing decentralized controllers, including quantization techniques [20], fuzzy techniques [21], and optimal control methods [22]. This paper develops decentralized control strategies from the optimal control perspective. Optimal control problems are usually solved via the solution of the Hamilton–Jacobi–Bellman (HJB) partial differential equation [23,24]. However, the HJB equation is generally not solvable analytically due to its inherent nonlinearity [25,26]. Therefore, adaptive dynamic programming (ADP) and reinforcement learning (RL) algorithms were proposed to obtain numerical solutions of the HJB equation and have been widely applied to nonlinear interconnected systems [27,28,29,30]. As noted in [31,32], these two algorithms can be deemed closely related, as they exhibit similar characteristics in addressing optimal control problems. For example, in [27,28], distributed optimal controllers were designed using robust ADP for nonlinear interconnected systems with unknown dynamics and parameters. In [29], the optimal decentralized control problem for interconnected nonlinear systems subject to stochastic dynamics was solved by enhancing the performance function of the auxiliary subsystem and transforming the original control problem into a set of optimal control strategies sampled in periodic patterns. Furthermore, in [30], an identifier–critic network framework was used to solve the decentralized event-triggered control problem based on sliding-mode surfaces, avoiding the need for knowledge of the system's internal dynamics. It is worth noting that the control results in [27,28,29,30] did not consider input constraints.
Control constraints are widespread in industrial processes and have a detrimental impact on system performance [33,34]. Therefore, the study of constrained nonlinear systems is of practical importance. In [35,36], RL-based decentralized algorithms were developed for the tracking control of constrained interconnected nonlinear systems. In [37], the decentralized optimal control problem of a constrained interconnected nonlinear system was solved by introducing a nonquadratic performance function to handle the symmetric input constraint. The results in [35,36,37] mainly addressed symmetric input constraints. However, asymmetric input constraints arise in several practical applications [38,39]. In [40], the optimal decentralized control problem with asymmetric input constraints was solved by designing a new non-quadratic performance function. In [41], a new performance function was proposed for interconnected nonlinear systems that successfully handles the asymmetric input constraint and solves the decentralized fault-tolerant control problem. However, none of the above studies considered the safety of the system. The optimal decentralized safety control (DSC) of constrained interconnected nonlinear safety-critical systems has not been thoroughly investigated thus far, which inspired the current study.
Motivated by the previous discussions, this paper proposes an RL-based DSC strategy for constrained interconnected nonlinear safety-critical systems. The primary contributions are summarized below:
  • A reinforcement learning algorithm is used to solve the optimal DSC problem for constrained interconnected nonlinear safety-critical systems, and the asymmetric input constraint is handled successfully. The method optimizes the control strategy by minimizing the performance function, ensuring the safety of the system state while accounting for the asymmetric input constraints.
  • Interconnected nonlinear safety-critical systems with asymmetric input constraints and safety constraints are converted, using barrier functions, into equivalent systems that satisfy user-defined safety constraints. Unlike the results for single nonlinear safety-critical systems [3,9,10,13], this paper handles the safety constraint associated with the interconnection term through the barrier function, which ensures that the interconnected nonlinear safety-critical system satisfies the safety constraints.
  • A single critic neural network architecture is utilized to approximate the performance function online under asymmetric input constraints. It is shown theoretically that the optimal DSC method renders the system states and the neural network weight estimation errors uniformly ultimately bounded (UUB). In addition, a simulation example verifies the feasibility and effectiveness of the developed DSC method.
The remainder of this article is structured as follows. In Section 2, the problem formulation and transformation are presented. In Section 3, the decentralized optimal DSC design scheme is presented. The design of the critic neural network is presented in Section 4. In Section 5, the stability analysis is presented. In Section 6, a simulation example demonstrates the effectiveness of the presented approach. Lastly, conclusions are given in Section 7.

2. Preliminaries

2.1. Problem Descriptions

Consider a constrained interconnected nonlinear safety-critical system composed of $n$ subsystems, described by:
$\dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t)) u_i(t) + h_i(x(t)), \quad x_i(0) = x_{i0}, \quad i = 1, 2, \ldots, n,$    (1)
where $x_i(t) \in \mathbb{R}^{n_i}$ is the $i$th subsystem's state vector and $x_i(0)$ represents its initial state, $x = [x_1^T, x_2^T, \ldots, x_n^T]^T \in \mathbb{R}^{\sum_{i=1}^{n} n_i}$ represents the overall state vector of the constrained interconnected nonlinear safety-critical system, $u_i = [u_{i,1}, u_{i,2}, \ldots, u_{i,m_i}]^T \in k_i$ represents the control input, and the set of asymmetric constraints is $k_i = \{u_i \in \mathbb{R}^{m_i} : h_i^{min} \le u_{i,j} \le h_i^{max},\ j = 1, 2, \ldots, m_i\}$, with $h_i^{min}$ and $h_i^{max}$ being the asymmetric saturating minimum and maximum bounds. $f_i(\cdot) \in \mathbb{R}^{n_i}$ and $g_i(\cdot) \in \mathbb{R}^{n_i \times m_i}$ represent the drift dynamics and input dynamics of the $i$th subsystem, respectively, and are Lipschitz continuous, and $h_i(x) \in \mathbb{R}^{n_i}$ represents the unknown interconnection term.
To simplify the design of the controller, we introduce some assumptions. For $i = 1, 2, \ldots, n$, we suppose that the equilibrium of the $i$th subsystem is $x_i = 0$.
Assumption 1.
For $i = 1, 2, \ldots, n$, the interconnection term $h_i(x)$ satisfies the following mismatched condition:
$h_i(x) = \eta_i(x_i) P_i(x),$
where $\eta_i(x_i) \in \mathbb{R}^{n_i \times q_i}$ is a known function, and $P_i(x)$ is a bounded vector function that satisfies
$\|P_i(x)\| \le \sum_{j=1}^{n} b_{i,j}\,\beta_{i,j}(x_j),$    (2)
where $b_{i,j} > 0$ is a constant and the $\beta_{i,j}(x_j)$ are positive definite functions. Furthermore, $\beta_{i,j}(0) = 0$ and $P_i(0) = 0$. Then, defining $\beta_j(x_j) = \max_{1 \le i \le n} \beta_{i,j}(x_j)$, inequality (2) can be written as:
$\|P_i(x)\| \le \sum_{j=1}^{n} C_{i,j}\,\beta_j(x_j),$    (3)
where $C_{i,j} \ge b_{i,j}\beta_{i,j}(x_j)/\beta_j(x_j)$ is a positive constant and $j = 1, 2, \ldots, n$.
Remark 1.
It should be noted that the constraints (2) and (3) specified in Assumption 1 are somewhat restrictive for certain interconnected nonlinear systems. Nevertheless, if the function $P_i(x)$ does not satisfy constraints (2) and (3), the computational cost of establishing the stability of the closed-loop system becomes high. In fact, in real-world applications, constraints like inequalities (2) and (3) are commonly imposed on the mismatched interconnection terms of system (1) [40,42].
Assumption 2.
For $i = 1, 2, \ldots, n$, the known function $g_i(x_i)$ is bounded as $\|g_i(x_i)\| \le g_{i,m}$, where $g_{i,m}$ is a known constant. Furthermore, $\mathrm{rank}(g_i(x_i)) = m_i$ and $g_i^T(x_i)\eta_i(x_i) = 0$.
Based on the $i$th subsystem (1), the $i$th auxiliary subsystem is designed as:
$\dot{x}_i = f_i(x_i) + g_i(x_i) u_i + \left(I_{n_i} - g_i(x_i) g_i^{+}(x_i)\right)\eta_i(x_i) v_i,$    (4)
where $v_i \in \mathbb{R}^{q_i}$ is the auxiliary control used to compensate for the mismatched interconnection, and $g_i^{+}(x_i) \in \mathbb{R}^{m_i \times n_i}$ is the Moore–Penrose pseudo-inverse of $g_i(x_i)$. According to Assumption 2, $g_i^{+}(x_i) = (g_i^T(x_i) g_i(x_i))^{-1} g_i^T(x_i)$ and $g_i^{+}(x_i)\eta_i(x_i) = (g_i^T(x_i) g_i(x_i))^{-1} g_i^T(x_i)\eta_i(x_i) = 0$. Then, the auxiliary subsystem (4) can be rewritten as:
$\dot{x}_i = f_i(x_i) + g_i(x_i) u_i + \eta_i(x_i) v_i.$    (5)
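The annihilation property $g_i^{+}(x_i)\eta_i(x_i) = 0$ that turns (4) into (5) can be checked numerically. Below is a minimal sketch in Python/NumPy with hypothetical matrices chosen to satisfy Assumption 2; they are illustrative values, not quantities from the paper:

```python
import numpy as np

# Hypothetical example with n_i = 2, m_i = 1, q_i = 1: the columns of g_i and eta_i
# are orthogonal, so Assumption 2 (g_i^T eta_i = 0) holds.
g_i = np.array([[0.0], [1.0]])
eta_i = np.array([[1.0], [0.0]])

# Moore-Penrose pseudo-inverse; for full column rank, pinv(g) = (g^T g)^{-1} g^T.
g_pinv = np.linalg.pinv(g_i)
g_pinv_explicit = np.linalg.inv(g_i.T @ g_i) @ g_i.T
print(np.allclose(g_pinv, g_pinv_explicit))                      # True

# The compensation channel is annihilated: g^+ eta = 0 and (I - g g^+) eta = eta.
print(g_pinv @ eta_i)                                            # [[0.]]
print(np.allclose((np.eye(2) - g_i @ g_pinv) @ eta_i, eta_i))    # True
```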

2.2. Safety Constraint Transformation

For the $i$th subsystem of system (1), the state $x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}]^T$ is required to satisfy the following safety constraints:
$x_{i,1} \in (a_{i,1}, A_{i,1}), \quad x_{i,2} \in (a_{i,2}, A_{i,2}), \quad \ldots, \quad x_{i,n_i} \in (a_{i,n_i}, A_{i,n_i}).$    (6)
For interconnected nonlinear safety-critical systems with asymmetric input constraints and safety constraints, the performance function is defined as:
$J_i(x_i) = \int_{t}^{\infty} e^{-\alpha_i(\tau - t)}\left[\iota_i(x_i) + \Theta(x_i, u_i, v_i)\right] d\tau,$    (7)
where $\alpha_i$ is the discount factor, $\iota_i(x_i) = h_i\,\beta_i^2(x_i)$, and $\Theta(x_i, u_i, v_i) = x_i^T H_i x_i + W_i(u_i) + \xi_i v_i^T v_i$, with $H_i$ a positive definite matrix and $W_i(u_i)$ a positive definite non-quadratic utility function, where $h_i$ and $\xi_i$ are positive design parameters.
Remark 2.
Since safety constraints and asymmetric input constraints are accounted for in (7), the optimal control law does not converge to zero as the system state reaches the steady state [43]. If the discount factor $\alpha_i = 0$, $J_i(x_i)$ may be unbounded, so it is necessary to include the discount factor.
Problem 1.
(Decentralized control problem with safety constraints and asymmetric input constraints). Consider the safety-critical system (1). Find the control policy $u_i(\cdot): \mathbb{R}^{n_i} \to \mathbb{R}^{m_i}$ and the auxiliary control strategy $v_i(\cdot): \mathbb{R}^{n_i} \to \mathbb{R}^{q_i}$ for the $i$th subsystem that minimize the performance function (7), while the subsystem state $x_i = [x_{i,1}, \ldots, x_{i,n_i}]^T$ and the control input $u_i$ satisfy the following conditions:
$u_{i,\min} \le u_{i,j} \le u_{i,\max}, \quad u_{i,\min} \ne -u_{i,\max},$    (8)
$x_{i,k} \in (a_{i,k}, A_{i,k}), \quad k = 1, \ldots, n_i,$    (9)
so that the state of the safety-critical system always remains within the safety constraints. Next, the definition of the barrier function is given.
Definition 1
(Barrier function [9,10]). A function $B(\cdot): \mathbb{R} \to \mathbb{R}$ defined on the interval $(a, A)$ is referred to as a barrier function if
$B(z; a, A) = \ln\!\left(\frac{A}{a}\cdot\frac{a - z}{A - z}\right), \quad z \in (a, A),$    (10)
where $a$ and $A$ are two constants satisfying $a < A$. Moreover, the barrier function is invertible on the interval $(a, A)$, i.e.,
$B^{-1}(y; a, A) = \frac{aA\left(e^{y/2} - e^{-y/2}\right)}{a\,e^{y/2} - A\,e^{-y/2}}, \quad y \in \mathbb{R}.$    (11)
Furthermore, the derivative of (11) is
$\frac{dB^{-1}(y; a, A)}{dy} = \frac{A a^2 - a A^2}{a^2 e^{y} - 2aA + A^2 e^{-y}}.$    (12)
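To make the transformation concrete, the following is a minimal Python/NumPy sketch of the barrier function (10) and its inverse (11). It is our own illustrative implementation of the formulas above, with example bounds borrowed from the simulation example in Section 6:

```python
import numpy as np

def barrier(z, a, A):
    """Barrier function B(z; a, A) = ln((A/a) * (a - z)/(A - z)), defined on (a, A)."""
    return np.log((A / a) * (a - z) / (A - z))

def barrier_inv(y, a, A):
    """Inverse barrier B^{-1}(y; a, A), mapping the real line back onto (a, A)."""
    return a * A * (np.exp(y / 2) - np.exp(-y / 2)) / (a * np.exp(y / 2) - A * np.exp(-y / 2))

a, A = -0.5, 2.9                                   # example bounds (Section 6, state x_{1,1})
z = np.linspace(a + 1e-3, A - 1e-3, 7)
print(np.allclose(barrier_inv(barrier(z, a, A), a, A), z))   # True: B^{-1}(B(z)) = z
print(barrier(0.0, a, A))                          # 0.0: the barrier vanishes at the origin
print(barrier(A - 1e-9, a, A))                     # large positive: blows up near the upper bound
```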
Based on Definition 1, we consider the barrier-function-based state transformation:
$s_{i,k} = B(x_{i,k}; a_{i,k}, A_{i,k}),$    (13)
$x_{i,k} = B^{-1}(s_{i,k}; a_{i,k}, A_{i,k}),$    (14)
where $k = 1, 2, \ldots, n_i$. The derivative of $x_{i,k}$ with respect to $t$ is $\frac{dx_{i,k}}{dt} = \frac{dx_{i,k}}{ds_{i,k}}\frac{ds_{i,k}}{dt}$, and using Definition 1, we obtain:
$\dot{s}_{i,k} = \frac{a_{i,k+1}A_{i,k+1}\left(e^{s_{i,k+1}/2} - e^{-s_{i,k+1}/2}\right)}{a_{i,k+1}e^{s_{i,k+1}/2} - A_{i,k+1}e^{-s_{i,k+1}/2}} \times \frac{a_{i,k}^2 e^{s_{i,k}} - 2a_{i,k}A_{i,k} + A_{i,k}^2 e^{-s_{i,k}}}{A_{i,k}a_{i,k}^2 - a_{i,k}A_{i,k}^2} = F_{i,k}(s_{i,k}, s_{i,k+1}), \quad k = 1, \ldots, n_i - 1,$
$\dot{s}_{i,n_i} = \dot{x}_{i,n_i} \times \frac{a_{i,n_i}^2 e^{s_{i,n_i}} - 2a_{i,n_i}A_{i,n_i} + A_{i,n_i}^2 e^{-s_{i,n_i}}}{A_{i,n_i}a_{i,n_i}^2 - a_{i,n_i}A_{i,n_i}^2} = F_{i,n_i}(s_i) + G_{i,n_i}(s_i)\,u_i + Y_{i,n_i}(s_{i,n_i}),$
where
$F_{i,n_i}(s_i) = \frac{a_{i,n_i}^2 e^{s_{i,n_i}} - 2a_{i,n_i}A_{i,n_i} + A_{i,n_i}^2 e^{-s_{i,n_i}}}{A_{i,n_i}a_{i,n_i}^2 - a_{i,n_i}A_{i,n_i}^2}\, f_i\!\left(\left[B_{i,1}^{-1}(s_{i,1}), \ldots, B_{i,n_i}^{-1}(s_{i,n_i})\right]^T\right),$
$G_{i,n_i}(s_i) = \frac{a_{i,n_i}^2 e^{s_{i,n_i}} - 2a_{i,n_i}A_{i,n_i} + A_{i,n_i}^2 e^{-s_{i,n_i}}}{A_{i,n_i}a_{i,n_i}^2 - a_{i,n_i}A_{i,n_i}^2}\, g_i\!\left(\left[B_{i,1}^{-1}(s_{i,1}), \ldots, B_{i,n_i}^{-1}(s_{i,n_i})\right]^T\right),$
and $Y_{i,n_i}(s_{i,n_i})$ is the interconnection term of the $n_i$th component of the $i$th subsystem.
Then, the interconnected nonlinear safety-critical system (1) can be rewritten as:
$\dot{s}_i = F_i(s_i) + G_i(s_i) u_i(t) + Y_i(s_i),$    (15)
where $F_i(s_i) = [F_{i,1}(s_{i,1}, s_{i,2}), \ldots, F_{i,n_i}(s_i)]^T$, $G_i(s_i) = [0, \ldots, G_{i,n_i}(s_i)]^T$, and $Y_i(s_i)$ is the unknown interconnection term.
Based on Assumption 1, the unknown interconnection term after the system transformation is written as:
$Y_i(s_i) = \bar{\eta}_i(s_i)\, U_i(s),$    (16)
where $\bar{\eta}_i(s_i) = \left[\bar{\eta}_{i,n_i}(s_i), 0, \ldots, 0\right]^T$, with
$\bar{\eta}_{i,n_i}(s_i) = \frac{a_{i,n_i}^2 e^{s_{i,n_i}} - 2a_{i,n_i}A_{i,n_i} + A_{i,n_i}^2 e^{-s_{i,n_i}}}{A_{i,n_i}a_{i,n_i}^2 - a_{i,n_i}A_{i,n_i}^2}\, \eta_{i,n_i}(x_i),$
and $U_i(s)$ is a bounded vector function that satisfies
$\|U_i(s)\| \le \sum_{j=1}^{n} b_{i,j}\,\vartheta_{i,j}(s_j),$    (17)
where $\vartheta_{i,j}(s_j)$ is a positive definite function. Then, defining $\vartheta_j(s_j) = \max_{1 \le i \le n}\vartheta_{i,j}(s_j)$ and $\vartheta_j(s_j) = [\vartheta_{j,1}(s_{j,1}, s_{j,2}), \ldots, \vartheta_{j,n_i}(s_j)]^T$, where
$\vartheta_{j,n_i}(s_j) = \frac{a_{j,n_i}^2 e^{s_{j,n_i}} - 2a_{j,n_i}A_{j,n_i} + A_{j,n_i}^2 e^{-s_{j,n_i}}}{A_{j,n_i}a_{j,n_i}^2 - a_{j,n_i}A_{j,n_i}^2}\, \beta_j\!\left(\left[B_{j,1}^{-1}(s_{j,1}), \ldots, B_{j,n_i}^{-1}(s_{j,n_i})\right]^T\right).$    (18)
According to (3) and (18), inequality (17) can be expressed as:
$\|U_i(s)\| \le \sum_{j=1}^{n} S_{i,j}\,\vartheta_j(s_j),$    (19)
where $S_{i,j} \ge b_{i,j}\vartheta_{i,j}(s_j)/\vartheta_j(s_j)$ is a positive constant, and $i, j = 1, 2, \ldots, n$. Here, $\bar{\eta}_i(s_i)$ plays the role of $\eta_i(x_i)$ for the transformed system.
Assumption 3.
$F_i(s_i)$ is Lipschitz continuous with $F_i(0) = 0$ and $U_i(0) = 0$; $G_i(s_i)$ and $\bar{\eta}_i(s_i)$ are upper bounded. In particular, $\|F_i(s_i)\| \le f_{i,m}\|s_i\|$, $\|G_i(s_i)\| \le g_{i,m}$, $\|\bar{\eta}_i(s_i)\| \le \eta_{i,m}$, and $\|U_i(s_i)\| \le P_{i,m}\|s_i\|$, where $f_{i,m}$, $g_{i,m}$, $\eta_{i,m}$, $P_{i,m}$ are positive constants. Moreover, $\mathrm{rank}(G_i(s_i)) = m_i$ and $G_i^T(s_i)\bar{\eta}_i(s_i) = 0$. The transformed system (15) is controllable, and $s_i = 0$ is its equilibrium point.
Lemma 1
([32]). For any $s_1, s_2 \in \mathbb{R}$, the following inequality holds:
$s_1 s_2 \le \frac{\varepsilon_1^{p_1}}{p_1}|s_1|^{p_1} + \frac{1}{p_2\varepsilon_1^{p_2}}|s_2|^{p_2},$
where $\varepsilon_1 > 0$, $(p_1 - 1)(p_2 - 1) = 1$, and $p_1, p_2 > 1$.
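Lemma 1 is Young's inequality with a tuning parameter $\varepsilon_1$. The short randomized check below (Python/NumPy, an illustrative script of our own) verifies the inequality on sampled points:

```python
import numpy as np

rng = np.random.default_rng(0)

def young_rhs(s1, s2, eps, p1):
    # p2 is the conjugate exponent of p1: (p1 - 1)(p2 - 1) = 1, i.e. 1/p1 + 1/p2 = 1.
    p2 = p1 / (p1 - 1.0)
    return (eps ** p1 / p1) * abs(s1) ** p1 + (1.0 / (p2 * eps ** p2)) * abs(s2) ** p2

# Randomized check of Lemma 1: s1*s2 <= (eps^p1/p1)|s1|^p1 + (1/(p2*eps^p2))|s2|^p2.
for _ in range(10000):
    s1, s2 = rng.normal(size=2)
    eps = rng.uniform(0.1, 5.0)
    p1 = rng.uniform(1.1, 4.0)
    assert s1 * s2 <= young_rhs(s1, s2, eps, p1) + 1e-9
print("Lemma 1 holds on all sampled points.")
```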
Remark 3.
The barrier function in Definition 1 has the following properties, which ensure that the safety-critical system (15) always satisfies the safety constraints [9,10].
1. 
The barrier function is finite on the open interval; hence, if the transformed state $s_i$ remains bounded, the system state $x_i$ satisfies constraints (8) and (9), i.e.,
$B(z_i; a_i, A_i) < +\infty, \quad \forall z_i \in (a_i, A_i).$
2. 
When the system state approaches the boundary of the safe region, the barrier function behaves as follows:
$\lim_{z_i \to a_i^{+}} B(z_i; a_i, A_i) = -\infty, \quad \lim_{z_i \to A_i^{-}} B(z_i; a_i, A_i) = +\infty.$
3. 
The barrier function vanishes when the system state is at the equilibrium, i.e.,
$B(0; a_i, A_i) = 0, \quad a_i < 0 < A_i.$

3. Decentralized Optimal DSC Design

This section establishes the decentralized optimal DSC method in two main steps. First, the safety constraint problem is handled through the barrier-function-based system transformation, and the HJB equation of the ith auxiliary subsystem without safety constraints is developed by introducing an improved performance function. Then, the decentralized safety controller is constructed by solving the HJB equation of each auxiliary subsystem.

3.1. Barrier Function Conversion

According to the $i$th transformed subsystem (15), the $i$th auxiliary subsystem is designed as:
$\dot{s}_i = F_i(s_i) + G_i(s_i) u_i + \left(I_{n_i} - G_i(s_i) G_i^{+}(s_i)\right)\bar{\eta}_i(s_i) v_i,$    (20)
where $G_i^{+}(s_i) \in \mathbb{R}^{m_i \times n_i}$ is the Moore–Penrose pseudo-inverse of $G_i(s_i)$. According to Assumptions 2 and 3, $G_i^{+}(s_i) = (G_i^T(s_i) G_i(s_i))^{-1} G_i^T(s_i)$ and $G_i^{+}(s_i)\bar{\eta}_i(s_i) = (G_i^T(s_i) G_i(s_i))^{-1} G_i^T(s_i)\bar{\eta}_i(s_i) = 0$. Then, the auxiliary subsystem (20) is rewritten as:
$\dot{s}_i = F_i(s_i) + G_i(s_i) u_i + \bar{\eta}_i(s_i) v_i.$    (21)
For the transformed system (15), analogously to (7), the following performance function is introduced:
$V_i(s_i) = \int_{t}^{\infty} e^{-\alpha_i(\tau - t)}\left[\pi_i(s_i) + \gamma(s_i, u_i, v_i)\right] d\tau,$    (22)
where $\pi_i(s_i) = h_i\,\vartheta_i^2(s_i)$ and $\gamma(s_i, u_i, v_i) = s_i^T Q_i s_i + W_i(u_i) + \xi_i v_i^T v_i$, with $Q_i$ a positive definite matrix. Furthermore, $s_{i0} = s_i(0)$ denotes the initial state, and $W_i(u_i)$ is a non-quadratic utility function that handles the asymmetric input constraint, defined in the following form:
$W_i(u_i) = \sum_{j=1}^{m_i} 2\lambda_i \int_{c_i}^{u_{i,j}} \Psi_i^{-1}\!\left(\frac{\nu - c_i}{\lambda_i}\right) d\nu,$    (23)
where $\lambda_i = (h_i^{max} - h_i^{min})/2$ and $c_i = (h_i^{max} + h_i^{min})/2$, and $\Psi_i(\cdot)$ is a monotonic odd function with $\Psi_i(0) = 0$. In this paper, without loss of generality, $\Psi_i(\cdot) = \tanh(\cdot)$, i.e., $\Psi_i(z) = (e^{z} - e^{-z})/(e^{z} + e^{-z})$.
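As a rough numerical illustration of the non-quadratic utility (23) with $\Psi_i = \tanh$ (so that $\Psi_i^{-1} = \tanh^{-1}$), the sketch below (Python/NumPy, our own implementation) evaluates the integral with a trapezoidal rule; the asymmetric bounds of subsystem 1 from Section 6 are used purely as example values:

```python
import numpy as np

def utility_W(u, h_min, h_max, n_grid=20001):
    """W(u) = sum_j 2*lam * integral_c^{u_j} arctanh((nu - c)/lam) d(nu), evaluated numerically."""
    lam = (h_max - h_min) / 2.0
    c = (h_max + h_min) / 2.0
    total = 0.0
    for u_j in np.atleast_1d(np.asarray(u, dtype=float)):
        nu = np.linspace(c, u_j, n_grid)
        f = np.arctanh((nu - c) / lam)
        total += 2.0 * lam * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(nu))   # trapezoid rule
    return total

h_min, h_max = -0.25, 0.75             # asymmetric bounds of subsystem 1 (Section 6)
print(utility_W(0.25, h_min, h_max))   # 0.0 at the midpoint c = (h_max + h_min)/2
print(utility_W(0.6, h_min, h_max))    # positive cost for inputs away from c
print(utility_W(-0.1, h_min, h_max))   # also positive on the other side of c
```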
Remark 4.
Unlike the traditional form of symmetric input constraints [35], this article considers asymmetric constraints on the control inputs [44]. The modified hyperbolic tangent function in (23) effectively transforms the asymmetric constrained control problem into an unconstrained control problem by allowing different maximum and minimum bounds.
Problem 2.
(Optimal decentralized control problem with asymmetric input constraints). Find the control policy $u_i$ and the auxiliary control strategy $v_i$ for the $i$th auxiliary subsystem (21) that minimize the performance function (22).
Based on the auxiliary subsystem (21) and the performance function (22), the corresponding Hamiltonian is given by:
$H(s_i, u_i, v_i, \nabla V_i(s_i)) = (\nabla V_i(s_i))^T\left(F_i(s_i) + G_i(s_i) u_i(t) + \bar{\eta}_i(s_i) v_i\right) + \pi_i(s_i) + \gamma(s_i, u_i, v_i) - \alpha_i V_i,$    (24)
with $\nabla V_i(s_i) = \partial V_i(s_i)/\partial s_i$.
The optimal performance function is
$V_i^*(s_i) = \min_{u_i, v_i \in \Psi(\Omega_i)} V_i(s_i),$    (25)
where $\Psi(\Omega_i)$ is the set of all admissible control policies and auxiliary control strategies on $\Omega_i$.
Based on Bellman's optimality principle [31], $V_i^*(s_i)$ in (25) satisfies the HJB equation
$\min_{u_i, v_i \in \Psi(\Omega_i)} H(s_i, u_i, v_i, \nabla V_i^*(s_i)) = 0,$    (26)
where $\nabla V_i^*(s_i) = \partial V_i^*(s_i)/\partial s_i$. Then, the optimal control policy and the optimal auxiliary control strategy can be derived as follows:
$u_i^*(s_i) = -\lambda_i \tanh\!\left(\frac{1}{2\lambda_i} G_i^T(s_i)\nabla V_i^*(s_i)\right) + \bar{c}_i,$    (27)
$v_i^*(s_i) = -\frac{1}{2\xi_i}\bar{\eta}_i^T(s_i)\nabla V_i^*(s_i),$    (28)
where $\bar{c}_i = [c_i, c_i, \ldots, c_i]^T \in \mathbb{R}^{m_i}$.
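The structure of (27) keeps the control inside the asymmetric bounds by construction: the tanh term lies in $(-\lambda_i, \lambda_i)$, and adding $c_i$ shifts this range to $(h_i^{min}, h_i^{max})$. A small Python/NumPy sketch (with example bounds only, not a claim about the learned policy) makes this explicit:

```python
import numpy as np

def constrained_policy(grad_term, h_min, h_max):
    """u* = -lam * tanh(grad_term / (2*lam)) + c, where grad_term stands for G^T dV*/ds."""
    lam = (h_max - h_min) / 2.0
    c = (h_max + h_min) / 2.0
    return -lam * np.tanh(grad_term / (2.0 * lam)) + c

h_min, h_max = -0.25, 0.75                               # example asymmetric bounds
grad_vals = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])    # arbitrary values of G^T dV*/ds
u = constrained_policy(grad_vals, h_min, h_max)
print(u)                                                  # all values lie in (-0.25, 0.75)
print(np.all((u > h_min) & (u < h_max)))                  # True
```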
Substituting $u_i^*(s_i)$ and $v_i^*(s_i)$ into (26), the HJB equation is rewritten as:
$(\nabla V_i^*(s_i))^T F_i(s_i) + (\nabla V_i^*(s_i))^T G_i(s_i) u_i^*(s_i) - \xi_i\|v_i^*(s_i)\|^2 - \alpha_i V_i^* + \pi_i(s_i) + s_i^T Q_i s_i + W_i(u_i^*(s_i)) = 0,$    (29)
with $V_i^*(0) = 0$.
Through the BF-based system transformation, the decentralized control Problem 1 with asymmetric input constraints and safety constraints is transformed into an unconstrained optimization problem, i.e., the decentralized control Problem 2. The following lemma establishes the equivalence between the decentralized control Problems 1 and 2.
Lemma 2.
Assume that Assumptions 1–3 hold and that the control policy $u_i(\cdot)$ and the auxiliary control strategy $v_i(\cdot)$ solve the decentralized control Problem 2 for (21). Then, the following statements hold:
1. 
If the initial state $x_i(0)$ of the interconnected nonlinear safety-critical system (1) lies in the range $(a_{i,k}, A_{i,k})$, $k = 1, 2, \ldots, n_i$, then the closed-loop system satisfies (6).
2. 
If the weighting functions $H_i(\cdot)$ and $Q_i(\cdot)$ satisfy $H_i(x_i) = Q_i(B_i(x_i)) = Q_i(s_i)$, the performance function in (22) is equivalent to the one in (7).
Proof. 
The performance function and Assumption 3 satisfy the zero-state observability condition, which guarantees the existence of the safety-optimal performance function $V_i^*(s_i)$. From (24), we obtain $\dot{V}_i^*(t) \le 0$, which yields $V_i^*(s_i(t)) \le V_i^*(s_i(0))$ for all $t \ge 0$. Consequently, as stated in Remark 3, if the initial state $x_i(0)$ of system (21) satisfies the safety constraint (6) and $V_i^*(s_i(0))$ is bounded, then $V_i^*(s_i(t))$ is also bounded. Finally, we obtain
$x_{i,k}(t) \in (a_{i,k}, A_{i,k}), \quad k = 1, 2, \ldots, n_i.$    (30)
Therefore, the obtained $u_i^*$ and $v_i^*$ satisfy the constraints of the decentralized control Problem 1.
Now, consider the state transformation based on the barrier function described in (13) and (14). Since $x_i$ satisfies the constraints given in (8), each element of the state $s_i = [B_{i,1}(x_{i,1}), \ldots, B_{i,n_i}(x_{i,n_i})]^T$ is finite. By comparing the performance functions (7) and (22), the equivalence relation $J_i(x_i(0)) = V_i(s_i(0))$ is obtained, provided that $H_i(x_i) = Q_i(s_i)$. This completes the proof. □

3.2. Designing the Optimal DSC Strategy by Solving n HJB Equations

Throughout this section, we show that the optimal DSC strategies for interconnected nonlinear systems can be constructed by solving the n HJB equations.
Theorem 1.
Consider the $n$ auxiliary subsystems under Assumptions 1–3 with the optimal DSC policies $u_i^*(s_i)$ and the auxiliary control strategies $v_i^*(s_i)$ satisfying the condition
$\|v_i^*(s_i)\|^2 < s_i^T Q_i s_i, \quad \forall t \ge t_0.$    (31)
Then, there exist $n$ positive constants $h_i^*$, $i = 1, 2, \ldots, n$, such that, for any $h_i \ge h_i^*$, the optimal DSC policies $u_1^*(s_1), u_2^*(s_2), \ldots, u_n^*(s_n)$ guarantee that the interconnected nonlinear system (15) with safety constraints is UUB.
Proof. 
Select the following Lyapunov function candidate:
$L_{i,1}(s) = \sum_{i=1}^{n} V_i^*(s_i),$    (32)
where $V_i^*(s_i)$ is defined as in (22). Taking the time derivative along the trajectory $\dot{s}_i = F_i(s_i) + G_i(s_i)u_i^* + Y_i(s_i)$ gives:
$\dot{L}_{i,1}(s) = \sum_{i=1}^{n} (\nabla V_i^*)^T\left(F_i(s_i) + G_i(s_i)u_i^* + Y_i(s)\right).$    (33)
By using (27) and (28), we obtain:
$(\nabla V_i^*(s_i))^T G_i(s_i) = -2\lambda_i\left[\tanh^{-1}\!\left(\frac{u_i^* - \bar{c}_i}{\lambda_i}\right)\right]^T,$    (34)
$(\nabla V_i^*(s_i))^T \bar{\eta}_i(s_i) = -2\xi_i (v_i^*(s_i))^T.$    (35)
Inserting (29), (34) and (35) into (33), we have
$\dot{L}_{i,1}(s) = \sum_{i=1}^{n}\left[\alpha_i V_i^* - \pi_i(s_i) - s_i^T Q_i s_i - W_i(u_i^*) + \xi_i\|v_i^*(s_i)\|^2 - 2\xi_i (v_i^*(s_i))^T U_i(s)\right].$    (36)
According to the optimal DSC policy (27), the term $W_i(u_i^*)$ becomes
$W_i(u_i^*(s_i)) = 2\lambda_i \sum_{j=1}^{m_i}\int_{0}^{u_{i,j}^* - c_i} \tanh^{-1}\!\left(\frac{\nu}{\lambda_i}\right) d\nu.$    (37)
By appealing to the proof in [44], Equation (37) can be further written as
$W_i(u_i^*(s_i)) = \underbrace{\lambda_i^2\sum_{j=1}^{m_i}\left(\tanh^{-1}\!\left(\frac{u_{i,j}^* - c_i}{\lambda_i}\right)\right)^{2}}_{\beta_1} - \underbrace{2\lambda_i^2\sum_{j=1}^{m_i}\int_{0}^{\tanh^{-1}\left(\frac{u_{i,j}^* - c_i}{\lambda_i}\right)} \nu\tanh^{2}(\nu)\,d\nu}_{\beta_2}.$    (38)
Substituting (38) into (36), one has
$\dot{L}_{i,1}(s) \le -\sum_{i=1}^{n} 2\xi_i\left(s_i^T Q_i s_i - \|v_i^*(s_i)\|^2\right) - \sum_{i=1}^{n}(1 - 2\xi_i)\, s_i^T Q_i s_i - \sum_{i=1}^{n}\left(\pi_i(s_i) - 2\xi_i\sum_{j=1}^{n}\|v_i^*(s_i)\|\, b_{i,j}\vartheta_{i,j}(s_j) + \xi_i^2\|v_i^*(s_i)\|^2\right) + \alpha_i V_i^* - \beta_1 + \beta_2.$    (39)
It is known from [45] that there exists a positive constant $\delta_{i,M}$ such that $0 \le \|\nabla V_i^*(s_i)\| \le \delta_{i,M}$. Therefore, using Lemma 1, Assumption 1, (17), (19), and (27), we obtain
$2\beta_1 \le 2\lambda_i^2 \tanh^{-T}\!\left(\frac{u_i^* - \bar{c}_i}{\lambda_i}\right)\tanh^{-1}\!\left(\frac{u_i^* - \bar{c}_i}{\lambda_i}\right) = \frac{1}{2}(\nabla V_i^*(s_i))^T G_i(s_i) G_i^T(s_i)\nabla V_i^*(s_i) \le \frac{1}{2} G_{i,M}^2\delta_{i,M}^2.$    (40)
Utilizing the integral mean value theorem [46] and the inequality (40), the term $\beta_2$ in (38) can be bounded as:
$\beta_2 = 2\lambda_i^2\sum_{j=1}^{m_i}\tanh^{-1}\!\left(\frac{u_{i,j}^* - c_i}{\lambda_i}\right)\varpi_i\tanh^{2}(\varpi_i) \le 2\lambda_i^2\sum_{j=1}^{m_i}\tanh^{-1}\!\left(\frac{u_{i,j}^* - c_i}{\lambda_i}\right)\varpi_i \le 2\lambda_i^2\tanh^{-T}\!\left(\frac{u_i^* - \bar{c}_i}{\lambda_i}\right)\tanh^{-1}\!\left(\frac{u_i^* - \bar{c}_i}{\lambda_i}\right) \le \frac{1}{2} G_{i,M}^2\delta_{i,M}^2,$    (41)
where $\varpi_i \in \left(0, \tanh^{-1}\!\left(\frac{u_{i,j}^* - c_i}{\lambda_i}\right)\right)$.
From [27], we conclude that $\alpha_i V_i^*(s_i) \le \varrho_{i,M}$, where $\varrho_{i,M}$ is a positive constant. Then, plugging (40) and (41) into (39) and taking the above conclusion into account, inequality (39) can be rewritten as:
$\dot{L}_{i,1}(s) \le -\sum_{i=1}^{n} 2\xi_i\left(s_i^T Q_i s_i - \|v_i^*(s_i)\|^2\right) - \sum_{i=1}^{n}(1 - 2\xi_i)\, s_i^T Q_i s_i - \sum_{i=1}^{n}\left(h_i\vartheta_i^2(s_i) - 2\xi_i\sum_{j=1}^{n}\|v_i^*(s_i)\|\, b_{i,j}\vartheta_{i,j}(s_j) + \xi_i^2\|v_i^*(s_i)\|^2\right) + \varrho_{i,M} + \frac{1}{4}\sum_{i=1}^{n} G_{i,M}^2\delta_{i,M}^2.$    (42)
Denote $\Lambda = \mathrm{diag}\{h_1, h_2, \ldots, h_n\}$ and $Z = [\vartheta_1(s_1), \ldots, \vartheta_n(s_n), \xi_1\|v_1^*(s_1)\|, \ldots, \xi_n\|v_n^*(s_n)\|]^T$. Letting condition (31) be satisfied, we have
$\dot{L}_{i,1}(s) \le -\sum_{i=1}^{n}(1 - 2\xi_i)\, s_i^T Q_i s_i - Z^T X Z + \varrho_{i,M} + \frac{1}{4}\sum_{i=1}^{n} G_{i,M}^2\delta_{i,M}^2,$    (43)
with $X = \begin{bmatrix}\Lambda & -A^T \\ -A & I_n\end{bmatrix}$ and $A = \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nn}\end{bmatrix}$.
From the expression of $X$, positive definiteness can be guaranteed by choosing $\Lambda$ sufficiently large. In other words, there exist $h_i^* > 0$ such that, for $h_i > h_i^*$, $Z^T X Z > 0$. Thus, inequality (43) further gives:
$\dot{L}_{i,1}(s) \le -\sum_{i=1}^{n}(1 - 2\xi_i)\lambda_{\min}(Q_i)\|s_i\|^2 + \varrho_{i,M} + \frac{1}{4}\sum_{i=1}^{n} G_{i,M}^2\delta_{i,M}^2.$    (44)
Inequality (44) implies that $\dot{L}_{i,1}(s) < 0$ whenever $s_i(t)$ lies outside the following set $\mathcal{N}_{s_i}$:
$\mathcal{N}_{s_i} = \left\{ s_i : \|s_i\| \le \sqrt{\frac{\frac{1}{4} G_{i,M}^2\delta_{i,M}^2 + \varrho_{i,M}}{\lambda_{\min}(Q_i)(1 - 2\xi_i)}} \right\}.$    (45)
Based on Lyapunov's extension theorem [47], the optimal performance functions $V_i^*(s_i)$ guarantee that the interconnected nonlinear system (15) with asymmetric input constraints is UUB. Since the performance functions (7) and (22) are equivalent, the optimal performance function $J_i^*(x_i)$ likewise guarantees that the interconnected nonlinear safety-critical system (1) with safety constraints and asymmetric input constraints is UUB. □

4. Critic Network for Approximation

The critic neural network is introduced in this section to approximate the optimal performance function; the estimated optimal control strategy is then constructed from the critic network of the auxiliary subsystem (21). According to [48], $V_i^*(s_i)$ can be expressed as:
$V_i^*(s_i) = W_{ci}^T\sigma_{ci}(s_i) + \varepsilon_{ci}(s_i),$    (46)
where $\sigma_{ci}(s_i) = [\sigma_{ci,1}(s_i), \sigma_{ci,2}(s_i), \ldots, \sigma_{ci,N_i}(s_i)]^T \in \mathbb{R}^{N_i}$ denotes the activation function vector, $W_{ci} \in \mathbb{R}^{N_i}$ denotes the ideal weight vector, $N_i$ denotes the number of neurons, and $\varepsilon_{ci}(s_i)$ is the NN reconstruction error. Each activation function $\sigma_{ci,p}(s_i)$, $p = 1, 2, \ldots, N_i$, is continuously differentiable, and for $s_i \ne 0$, $\{\sigma_{ci,p}(s_i)\}_{p=1}^{N_i}$ is linearly independent. Then, the gradient of $V_i^*(s_i)$ can be expressed as:
$\nabla V_i^*(s_i) = \nabla\sigma_{ci}^T(s_i) W_{ci} + \nabla\varepsilon_{ci}(s_i),$    (47)
where $\nabla\sigma_{ci}(s_i) = \partial\sigma_{ci}(s_i)/\partial s_i$ and $\nabla\varepsilon_{ci}(s_i) = \partial\varepsilon_{ci}(s_i)/\partial s_i$.
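For the quadratic activation vector used later in the simulation example (Section 6), $\sigma_{ci}(s_i) = [s_{i,1}^2,\ s_{i,1}s_{i,2},\ s_{i,2}^2]^T$, the gradient $\nabla\sigma_{ci}(s_i)$ has a simple closed form. The following Python/NumPy sketch (with placeholder weight values, not learned weights) shows how $\hat{V}_i$ and $\nabla\hat{V}_i$ are evaluated from a weight vector:

```python
import numpy as np

def sigma(s):
    """Quadratic critic activations: [s1^2, s1*s2, s2^2]."""
    s1, s2 = s
    return np.array([s1 ** 2, s1 * s2, s2 ** 2])

def grad_sigma(s):
    """Jacobian d(sigma)/d(s), shape (3, 2)."""
    s1, s2 = s
    return np.array([[2 * s1, 0.0],
                     [s2,     s1],
                     [0.0, 2 * s2]])

W_hat = np.array([0.3, -0.1, 0.5])         # placeholder critic weights
s = np.array([0.4, -0.2])
V_hat = W_hat @ sigma(s)                   # V_hat(s) = W_hat^T sigma(s)
grad_V_hat = grad_sigma(s).T @ W_hat       # grad V_hat(s) = (d sigma/d s)^T W_hat
print(V_hat, grad_V_hat)
```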
From Equations (27), (28) and (47), the optimal safety control policy $u_i^*(s_i)$ and the auxiliary control strategy $v_i^*(s_i)$ are rewritten as:
$u_i^*(s_i) = -\lambda_i \tanh\!\left(\frac{1}{2\lambda_i} G_i^T(s_i)\nabla\sigma_{ci}^T(s_i) W_{ci}\right) + \bar{c}_i + \varepsilon_{ui}(s_i),$    (48)
$v_i^*(s_i) = -\frac{1}{2\xi_i}\bar{\eta}_i^T(s_i)\nabla\sigma_{ci}^T(s_i) W_{ci} + \varepsilon_{vi}(s_i),$    (49)
where
$\varepsilon_{ui}(s_i) = -\frac{1}{2}\left(I_{m_i} - \tanh^2(\zeta)\right) G_i^T(s_i)\nabla\varepsilon_{ci}(s_i), \quad \varepsilon_{vi}(s_i) = -\frac{1}{2\xi_i}\bar{\eta}_i^T(s_i)\nabla\varepsilon_{ci}(s_i),$
with $I_{m_i} = [1, 1, \ldots, 1]^T \in \mathbb{R}^{m_i}$. The value of $\zeta$ lies between $\frac{1}{2\lambda_i} G_i^T(s_i)\nabla\sigma_{ci}^T(s_i) W_{ci}$ and $\frac{1}{2\lambda_i} G_i^T(s_i)\left(\nabla\sigma_{ci}^T(s_i) W_{ci} + \nabla\varepsilon_{ci}(s_i)\right)$.
Since the ideal weight vector $W_{ci}$ is unavailable, the optimal control strategy $u_i^*(s_i)$ cannot be applied directly. Therefore, the estimated weight vector $\hat{W}_{ci}$ is constructed to replace $W_{ci}$, giving:
$\hat{V}_i^*(s_i) = \hat{W}_{ci}^T\sigma_{ci}(s_i).$    (50)
The weight estimation error is defined as $\tilde{W}_{ci} = W_{ci} - \hat{W}_{ci}$. Similarly, according to (50), the approximate forms of (48) and (49) are:
$\hat{u}_i(s_i) = -\lambda_i \tanh\!\left(\frac{1}{2\lambda_i} G_i^T(s_i)\nabla\sigma_{ci}^T(s_i)\hat{W}_{ci}\right) + \bar{c}_i,$    (51)
$\hat{v}_i(s_i) = -\frac{1}{2\xi_i}\bar{\eta}_i^T(s_i)\nabla\sigma_{ci}^T(s_i)\hat{W}_{ci}.$    (52)
Combining (50), (51) and (52), the Hamiltonian is re-expressed as:
$H(s_i, \hat{u}_i, \hat{v}_i, \nabla\hat{V}_i(s_i)) = (\nabla\hat{V}_i(s_i))^T\left(F_i(s_i) + G_i(s_i)\hat{u}_i + \bar{\eta}_i(s_i)\hat{v}_i\right) + \pi_i(s_i) + \gamma_i(s_i, \hat{u}_i, \hat{v}_i) - \alpha_i\hat{V}_i.$    (53)
According to (53), the Hamiltonian error is given by:
$e_i = H(s_i, \hat{u}_i, \hat{v}_i, \nabla\hat{V}_i(s_i)) - H(s_i, u_i^*, v_i^*, \nabla V_i^*(s_i)) = \pi_i(s_i) + s_i^T Q_i s_i + W_i(\hat{u}_i) + \xi_i\hat{v}_i^T\hat{v}_i + \hat{W}_{ci}^T\varrho_i,$    (54)
with $\varrho_i = \nabla\sigma_{ci}(s_i)\left(F_i(s_i) + G_i(s_i)\hat{u}_i + \bar{\eta}_i(s_i)\hat{v}_i\right) - \alpha_i\sigma_{ci}(s_i)$. In order to make $\hat{u}_i(s_i)$ approach $u_i^*(s_i)$, the error $e_i$ should be kept sufficiently small. To this end, the critic weights $\hat{W}_{ci}$ are tuned to minimize the objective function $\phi_i = \frac{1}{2}e_i^T e_i$, and the critic updating law is developed as:
$\dot{\hat{W}}_{ci} = -\alpha_{ci}\frac{\varrho_i}{\left(1 + \varrho_i^T\varrho_i\right)^2}\, e_i,$    (55)
where the constant $\alpha_{ci}$ is the positive learning rate.
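A minimal sketch of one Euler step of the critic tuning law (55) is given below (Python/NumPy). The regressor $\varrho_i$ and the Hamiltonian residual $e_i$ follow (54); the dynamics, cost terms, and numerical values in the toy usage are placeholders chosen only to make the snippet runnable, not the paper's simulation settings:

```python
import numpy as np

def critic_update(W_hat, s, u_hat, v_hat, F, G, D, sigma, grad_sigma,
                  pi, gamma, alpha_i, alpha_c, dt):
    """One Euler step of the normalized gradient-descent critic update (55)."""
    s_dot = F(s) + G(s) @ u_hat + D(s) @ v_hat            # approximate closed-loop dynamics
    rho = grad_sigma(s) @ s_dot - alpha_i * sigma(s)      # regressor rho_i from (54)
    e = pi(s) + gamma(s, u_hat, v_hat) + W_hat @ rho      # Hamiltonian residual e_i
    dW = -alpha_c * rho * e / (1.0 + rho @ rho) ** 2      # gradient descent on 0.5*e^2, normalized
    return W_hat + dt * dW

# Toy usage with placeholder dynamics and cost terms (illustrative only).
F = lambda s: -s
G = lambda s: np.array([[0.0], [1.0]])
D = lambda s: np.array([[1.0], [0.0]])
sigma = lambda s: np.array([s[0] ** 2, s[0] * s[1], s[1] ** 2])
grad_sigma = lambda s: np.array([[2 * s[0], 0.0], [s[1], s[0]], [0.0, 2 * s[1]]])
pi = lambda s: s @ s
gamma = lambda s, u, v: s @ s + u @ u + v @ v
W_hat = np.zeros(3)
W_hat = critic_update(W_hat, np.array([0.5, -0.3]), np.array([0.1]), np.array([0.05]),
                      F, G, D, sigma, grad_sigma, pi, gamma, alpha_i=0.1, alpha_c=2.0, dt=1e-3)
print(W_hat)
```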
Remark 5.
To minimize the Hamiltonian error $e_i$, the derivative of $\phi_i$ should satisfy $\dot{\phi}_i < 0$. Therefore, the critic weight tuning law (55) is obtained by applying the gradient descent method with respect to $\hat{W}_{ci}$, together with the normalization term $(1 + \varrho_i^T\varrho_i)^2$ [49].
Considering the definition of $\tilde{W}_{ci}$, the weight estimation error dynamics are obtained as
$\dot{\tilde{W}}_{ci} = -\alpha_{ci}\,\ell_i\ell_i^T\tilde{W}_{ci} + \alpha_{ci}\,\ell_i\frac{e_{Hi}}{\imath_i},$    (56)
where $\ell_i = \frac{\varrho_i}{1 + \varrho_i^T\varrho_i}$ and $\imath_i = 1 + \varrho_i^T\varrho_i$, and $e_{Hi}$ denotes the residual error arising from the neural network reconstruction error $\varepsilon_{ci}(s_i)$.
The proposed decentralized DSC strategy for the ith subsystem with a single critic-NN is illustrated in Figure 1.

5. Stability Analysis

This section analyzes the stability of the n auxiliary closed-loop subsystems under the proposed control scheme. The following assumptions are needed for the theorem.
Assumption 4.
For $s_i \in \Omega_i$, $i = 1, \ldots, n$, there exist positive constants $D_{\varepsilon ui}$, $\bar{\eta}_{i,M}$, $D_{\sigma ci}$, $D_{\varepsilon vi}$, and $D_{eHi}$ satisfying $\|\varepsilon_{ui}(s_i)\| \le D_{\varepsilon ui}$, $\|\bar{\eta}_i(s_i)\| \le \bar{\eta}_{i,M}$, $\|\nabla\sigma_{ci}(s_i)\| \le D_{\sigma ci}$, $\|\varepsilon_{vi}(s_i)\| \le D_{\varepsilon vi}$, and $|e_{Hi}| \le D_{eHi}$.
Assumption 5.
Consider the time interval $[t, t + t_k]$ with $t_k > 0$. The term $\ell_i\ell_i^T$ fulfills the following condition:
$\epsilon_i I_{N_i} \le \ell_i\ell_i^T \le \varsigma_i I_{N_i},$    (57)
where $\epsilon_i$ and $\varsigma_i$ are positive constants.
Theorem 2.
For the interconnected nonlinear safety-critical system (15), let the estimated optimal safety control policies and auxiliary control strategies be given by (51) and (52), respectively, and let Assumptions 1–5 hold. If $\hat{W}_{ci}$ is updated by (55), then $s_i$ and $\tilde{W}_{ci}$ are UUB, provided that $\alpha_{ci}$ in (55) satisfies
$\alpha_{ci} > \frac{\bar{\eta}_{i,M}^2 D_{\sigma ci}^2}{\xi_i\,\lambda_{\min}\!\left(\ell_i\ell_i^T\right)}.$    (58)
Proof. 
Consider the candidate Lyapunov function:
$L_i(t) = \sum_{i=1}^{n}\left(V_i^*(s_i) + \frac{1}{2}\tilde{W}_{ci}^T\tilde{W}_{ci}\right).$    (59)
Define $L_{i,1}(t) = V_i^*(s_i)$ and $L_{i,2}(t) = \frac{1}{2}\tilde{W}_{ci}^T\tilde{W}_{ci}$. The time derivative of $L_{i,1}(t)$ is
$\dot{L}_{i,1}(t) = (\nabla V_i^*(s_i))^T\left(F_i(s_i) + G_i(s_i)\hat{u}_i + \bar{\eta}_i(s_i)\hat{v}_i\right) = (\nabla V_i^*(s_i))^T\left(F_i(s_i) + G_i(s_i)u_i^* + \bar{\eta}_i(s_i)v_i^*\right) + \underbrace{(\nabla V_i^*(s_i))^T G_i(s_i)(\hat{u}_i - u_i^*)}_{\beta_3} + \underbrace{(\nabla V_i^*(s_i))^T\bar{\eta}_i(s_i)(\hat{v}_i - v_i^*)}_{\beta_4}.$    (60)
Combining (29), (34) and (35), Equation (60) is further expressed as:
$\dot{L}_{i,1}(t) = \alpha_i V_i^* - \pi_i(s_i) - s_i^T Q_i s_i - W_i(u_i^*) - \xi_i\|v_i^*(s_i)\|^2 + \beta_3 + \beta_4.$    (61)
According to Lemma 1, and taking into account (40), (48) and (51), the term $\beta_3$ in (61) satisfies
$\beta_3 \le \underbrace{\lambda_i^2\left\|\tanh^{-1}\!\left(\frac{u_i^*(s_i) - \bar{c}_i}{\lambda_i}\right)\right\|^2}_{\beta_1} + \underbrace{\|\hat{u}_i - u_i^*\|^2}_{\beta_5} \le \frac{1}{4} G_{i,M}^2\delta_{i,M}^2 + \beta_5,$    (62)
where $Y_{i,1}(s_i) = \frac{1}{2\lambda_i}G_i^T(s_i)\nabla\sigma_{ci}^T(s_i)\hat{W}_{ci}$ and $Y_{i,2}(s_i) = \frac{1}{2\lambda_i}G_i^T(s_i)\nabla V_i^*(s_i)$. Then, based on the fact that $\|\tanh(Y_{i,k}(s_i))\| \le \sqrt{m_i}$, $k = 1, 2$, in [44], and according to Assumption 4, $\beta_5$ is bounded as:
$\beta_5 \le 2\lambda_i^2\left\|\tanh(Y_{i,1}(s_i)) - \tanh(Y_{i,2}(s_i))\right\|^2 + 2\|\varepsilon_{ui}(s_i)\|^2 \le 4\lambda_i^2\left(\|\tanh(Y_{i,1}(s_i))\|^2 + \|\tanh(Y_{i,2}(s_i))\|^2\right) + 2\|\varepsilon_{ui}(s_i)\|^2 \le 8\lambda_i^2 m_i + 2D_{\varepsilon ui}^2.$    (63)
Similarly, the last term of (61), $\beta_4$, can be bounded using (35), (49) and (52) as:
$\beta_4 \le \xi_i\|v_i^*\|^2 + \xi_i\|\hat{v}_i - v_i^*\|^2 \le \xi_i\|v_i^*\|^2 + 2\xi_i\left\|\hat{v}_i - v_i^* + \varepsilon_{vi}\right\|^2 + 2\xi_i\|\varepsilon_{vi}\|^2 \le \xi_i\|v_i^*\|^2 + \frac{1}{2\xi_i}\bar{\eta}_{i,M}^2 D_{\sigma ci}^2\|\tilde{W}_{ci}\|^2 + 2\xi_i D_{\varepsilon vi}^2.$    (64)
By using (38), (62)–(64) and the fact that $\alpha_i V_i^*(s_i) \le \varrho_{i,M}$, the following is derived:
$\dot{L}_{i,1}(t) \le -\lambda_{\min}(Q_i)\|s_i\|^2 + \frac{1}{2\xi_i}\bar{\eta}_{i,M}^2 D_{\sigma ci}^2\|\tilde{W}_{ci}\|^2 + \Theta_i,$    (65)
with $\Theta_i = \varrho_{i,M} + \frac{1}{2}G_{i,M}^2\delta_{i,M}^2 + 8\lambda_i^2 m_i + 2D_{\varepsilon ui}^2 + 2\xi_i D_{\varepsilon vi}^2$.
Next, consider $L_{i,2}(t)$. Along the weight estimation error dynamics (56), its time derivative is
$\dot{L}_{i,2}(t) = -\alpha_{ci}\tilde{W}_{ci}^T\ell_i\ell_i^T\tilde{W}_{ci} + \alpha_{ci}\tilde{W}_{ci}^T\ell_i\frac{e_{Hi}}{\imath_i}.$    (66)
Combining Lemma 1 and Assumption 4, the following bound is obtained:
$\alpha_{ci}\tilde{W}_{ci}^T\ell_i\frac{e_{Hi}}{\imath_i} \le \frac{\alpha_{ci}}{2}\tilde{W}_{ci}^T\ell_i\ell_i^T\tilde{W}_{ci} + \frac{\alpha_{ci}}{2}D_{eHi}^2.$    (67)
Combining inequalities (66) and (67), we derive:
$\dot{L}_{i,2}(t) \le -\frac{\alpha_{ci}}{2}\lambda_{\min}\!\left(\ell_i\ell_i^T\right)\|\tilde{W}_{ci}\|^2 + \frac{\alpha_{ci}}{2}D_{eHi}^2.$    (68)
Substituting (65) and (68) into (59), the following inequality is obtained:
$\dot{L}_i(t) \le \sum_{i=1}^{n}\left(-\lambda_{\min}(Q_i)\|s_i\|^2 - \chi_i\|\tilde{W}_{ci}\|^2 + \Theta_i + \frac{\alpha_{ci}}{2}D_{eHi}^2\right),$    (69)
where $\chi_i = \frac{\alpha_{ci}}{2}\lambda_{\min}\!\left(\ell_i\ell_i^T\right) - \frac{1}{2\xi_i}\bar{\eta}_{i,M}^2 D_{\sigma ci}^2$, and $\lambda_{\min}(\ell_i\ell_i^T)$ denotes the minimum eigenvalue of $\ell_i\ell_i^T$.
Therefore, (58) and (69) imply that $\dot{L}_i(t) < 0$ provided that $s_i$ and $\tilde{W}_{ci}$ lie outside the sets
$\mathcal{N}_{s_i} = \left\{ s_i : \|s_i\| \ge \sqrt{\frac{2\Theta_i + \alpha_{ci}D_{eHi}^2}{2\lambda_{\min}(Q_i)}} \right\},$    (70)
$\mathcal{N}_{\tilde{W}_{ci}} = \left\{ \tilde{W}_{ci} : \|\tilde{W}_{ci}\| \ge \sqrt{\frac{2\Theta_i + \alpha_{ci}D_{eHi}^2}{2\chi_i}} \right\}.$    (71)
Invoking Lyapunov's extension theorem [47], the closed-loop system states and the weight estimation errors $\tilde{W}_{ci}$ are UUB. This completes the proof. □
Remark 6.
In contrast to techniques that only deal with input saturation [10,13], this article proposes an RL technique to solve the optimal DSC problem with safety constraints and asymmetric input constraints. The approach not only ensures the safety of the system but also handles the asymmetric input constraints. Therefore, the developed reinforcement learning technique, which accounts for safety constraints and asymmetric input constraints, is better suited for engineering applications, particularly for systems whose states must always remain within the prescribed safety bounds.

6. Simulation Example

In this section, we provide a simulation example to verify the effectiveness of the proposed approach. The simulation involves a two-link robotic arm system [42]. The state space model of the system is given by
$\dot{x}_{1,1} = x_{1,2}, \quad \dot{x}_{1,2} = -\frac{M_1}{\tilde{G}_1}x_{1,2} - \frac{m_1\tilde{g}_1\tilde{l}_1}{\tilde{G}_1}\sin(x_{1,1}) + \frac{1}{\tilde{G}_1}u_1 + h_1,$
$\dot{x}_{2,1} = x_{2,2}, \quad \dot{x}_{2,2} = -\frac{M_2}{\tilde{G}_2}x_{2,2} - \frac{m_2\tilde{g}_2\tilde{l}_2}{\tilde{G}_2}\sin(x_{2,1}) + \frac{1}{\tilde{G}_2}u_2 + h_2,$    (72)
where $x_{i,1}$ and $x_{i,2}$ ($i = 1, 2$) denote the angular position and angular velocity of the $i$th robot arm, $u_i$ is the control input, and $h_i = \eta_i P_i$ represents the interconnection term. The other parameters of the robotic arm system (72) are listed in Table 1. The initial state was selected as $x_0 = [2, 2, 2, 2]^T$. Defining the state vector $x_i = [x_{i,1}, x_{i,2}]^T$, each subsystem can be written in the form of (1) with
$\dot{x}_i = \begin{bmatrix} x_{i,2} \\ -\frac{M_i}{\tilde{G}_i}x_{i,2} - \frac{m_i\tilde{g}_i\tilde{l}_i}{\tilde{G}_i}\sin(x_{i,1}) \end{bmatrix} + \begin{bmatrix} 0 \\ \frac{1}{\tilde{G}_i} \end{bmatrix} u_i + \begin{bmatrix} 1 \\ 0 \end{bmatrix} P_i(x),$
where $P_1(x)$ and $P_2(x)$ denote the uncertain interconnection terms of subsystems 1 and 2, i.e.,
$P_1(x) = 0.1\, x_{1,1}\sin(x_{2,2}), \quad P_2(x) = x_{1,2}^3\sin(0.1\, x_{2,1}).$
Furthermore, the states of the two robotic arm subsystems were required to satisfy the following safety constraints:
$x_{1,1} \in (-0.5, 2.9), \quad x_{1,2} \in (-1.5, 2.5), \quad x_{2,1} \in (-1, 2.5), \quad x_{2,2} \in (-3.5, 3).$
Therefore, to handle the safety constraints, the following transformed system without safety constraints was obtained using the BF-based system transformation (13):
$\dot{s}_i = F_i(s_i) + G_i(s_i) u_i + \bar{\eta}_i(s_i) U_i,$    (74)
where
$F_i(s_i) = \begin{bmatrix} \dfrac{a_{i,2}A_{i,2}\left(e^{s_{i,2}/2} - e^{-s_{i,2}/2}\right)}{a_{i,2}e^{s_{i,2}/2} - A_{i,2}e^{-s_{i,2}/2}}\cdot\dfrac{a_{i,1}^2 e^{s_{i,1}} - 2a_{i,1}A_{i,1} + A_{i,1}^2 e^{-s_{i,1}}}{A_{i,1}a_{i,1}^2 - a_{i,1}A_{i,1}^2} \\ f_i\!\left(B^{-1}(s_i)\right)\cdot\dfrac{a_{i,2}^2 e^{s_{i,2}} - 2a_{i,2}A_{i,2} + A_{i,2}^2 e^{-s_{i,2}}}{A_{i,2}a_{i,2}^2 - a_{i,2}A_{i,2}^2} \end{bmatrix}, \quad G_i(s_i) = \begin{bmatrix} 0 \\ \dfrac{1}{\tilde{G}_i}\cdot\dfrac{a_{i,2}^2 e^{s_{i,2}} - 2a_{i,2}A_{i,2} + A_{i,2}^2 e^{-s_{i,2}}}{A_{i,2}a_{i,2}^2 - a_{i,2}A_{i,2}^2} \end{bmatrix}, \quad \bar{\eta}_i(s_i) = \begin{bmatrix} \dfrac{a_{i,2}^2 e^{s_{i,2}} - 2a_{i,2}A_{i,2} + A_{i,2}^2 e^{-s_{i,2}}}{A_{i,2}a_{i,2}^2 - a_{i,2}A_{i,2}^2} \\ 0 \end{bmatrix}.$
For the transformed two-link robotic arm system (74), the initial state was chosen as $s_{i,0} = [s_{i,0}(1), s_{i,0}(2)]^T = [B(x_{i,0}(1); a_{i,1}, A_{i,1}), B(x_{i,0}(2); a_{i,2}, A_{i,2})]^T$. The discount factors were chosen as $\alpha_1 = 1$ and $\alpha_2 = 0.1$. The weighting matrices were set to $Q_1 = 0.5 I_2$, $Q_2 = I_2$, $R_1 = 1$, and $R_2 = 1$. The asymmetric input bounds were allocated as $h_1^{max} = 0.75$, $h_1^{min} = -0.25$, $h_2^{max} = 1.5$, and $h_2^{min} = -0.5$. Let $\vartheta_1 = \|s_1\|$ and $\vartheta_2 = \|s_2\|$. The remaining design parameters were set as $\xi_1 = 8$, $\xi_2 = 4$, $\alpha_{c1} = 2$, $\alpha_{c2} = 2$. The activation functions were chosen as $\sigma_{c1}(s_1) = [s_{1,1}^2, s_{1,1}s_{1,2}, s_{1,2}^2]^T$ and $\sigma_{c2}(s_2) = [s_{2,1}^2, s_{2,1}s_{2,2}, s_{2,2}^2]^T$.
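For reference, the sketch below (Python/NumPy) encodes the subsystem dynamics (72), the interconnection terms $P_1$, $P_2$, and the barrier transformation of the initial state described above. It reflects our reading of the setup rather than the authors' simulation code, and the single Euler step uses a zero input instead of the learned DSC policy:

```python
import numpy as np

# Physical parameters from Table 1 (subsystems i = 1, 2).
m, M, l, G, g = [5.0, 10.0], [2.0, 2.0], [0.5, 1.0], [10.0, 10.0], 9.81

def f(i, x, u, h):
    """Subsystem dynamics (72): x1_dot = x2, x2_dot = -(M/G)x2 - (m g l/G)sin(x1) + u/G + h."""
    return np.array([x[1],
                     -(M[i] / G[i]) * x[1] - (m[i] * g * l[i] / G[i]) * np.sin(x[0])
                     + u / G[i] + h])

def P(i, x1, x2):
    """Uncertain interconnections: P1 = 0.1*x_{1,1}*sin(x_{2,2}), P2 = x_{1,2}^3*sin(0.1*x_{2,1})."""
    return 0.1 * x1[0] * np.sin(x2[1]) if i == 0 else x1[1] ** 3 * np.sin(0.1 * x2[0])

def B(z, a, A):
    """Element-wise barrier transform s = B(x; a, A) from (10)."""
    return np.log((A / a) * (a - z) / (A - z))

# Safety bounds and initial state x0 = [2, 2, 2, 2]^T.
a = [np.array([-0.5, -1.5]), np.array([-1.0, -3.5])]
A = [np.array([2.9, 2.5]), np.array([2.5, 3.0])]
x0 = [np.array([2.0, 2.0]), np.array([2.0, 2.0])]

# Transformed initial states s_{i,0} used by the learning scheme.
for i in range(2):
    print(f"s_{i + 1},0 =", B(x0[i], a[i], A[i]))

# One illustrative Euler step of subsystem 1 under zero input (not the learned policy).
x_next = x0[0] + 1e-3 * f(0, x0[0], u=0.0, h=P(0, x0[0], x0[1]))
print("x_1 after one Euler step:", x_next)
```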
The simulation outcomes are presented in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13. The states of the system without the DSC method are depicted in Figure 2 and Figure 8; it can be observed that the closed-loop system stabilized after 20 s and 35 s, respectively, but failed to meet the specified safety constraints. In comparison, Figure 3 and Figure 9 show that, with the DSC method, the system states not only converged to zero but also satisfied the given safety constraints. The evolutions of the transformed states $s_1(t)$ and $s_2(t)$ under the safe control method with asymmetric input constraints are presented in Figure 4 and Figure 10. The optimal DSC policies are shown in Figure 5 and Figure 11; they were restricted to the asymmetric sets $[-0.25, 0.75]$ and $[-0.5, 1.5]$, respectively. Figure 6 and Figure 12 show the optimal auxiliary control strategies for subsystems 1 and 2, respectively. Figure 7 and Figure 13 show the evolution of the critic weights under the updating law; it can be observed that the weights converged after 15 s. According to Theorems 1 and 2, the proposed optimal safety control policy and auxiliary control policy stabilize the closed-loop nonlinear system and satisfy the safety constraints on the system state, while the optimal control policy remains within the predefined constraint set. Finally, the simulation results show that the presented optimal DSC solution for constrained interconnected nonlinear safety-critical systems subject to state constraints is effective.

7. Conclusions

This article presented an RL-based DSC scheme for interconnected nonlinear safety-critical systems with safety constraints and asymmetric input constraints. The proposed method transforms an interconnected nonlinear safety-critical system with safety and asymmetric input constraints into an equivalent system with only asymmetric input constraints by using the barrier function. A non-quadratic utility function is incorporated into the performance function to address the asymmetric input constraint, and a critic network is used to approximate the optimal performance function and to establish the optimal safety policy. The resulting control scheme stabilizes the closed-loop system and minimizes the improved performance function. In addition, the simulation results demonstrated the efficacy of the proposed decentralized safety control scheme. Future work will explore the optimal safety control of stochastic interconnected nonlinear systems with event triggering.

Author Contributions

C.Q. and Y.W. provided methodology, validation, and writing—original draft preparation; T.Z. provided conceptualization, writing—review; J.Z. provided supervision; C.Q. provided funding support. All authors read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Research Project of Henan Province (222102240014).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The authors can confirm that all relevant data are included in the article.

Conflicts of Interest

The authors declare that they have no conflict of interest. All authors have approved the manuscript and agreed with submission to this journal.

References

  1. Son, T.D.; Nguyen, Q. Safety-critical control for non-affine nonlinear systems with application on autonomous vehicle. In Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019; pp. 7623–7628. [Google Scholar]
  2. Manjunath, A.; Nguyen, Q. Safe and robust motion planning for dynamic robotics via control barrier functions. In Proceedings of the 2021 60th IEEE Conference on Decision and Control (CDC), Austin, TX, USA, 14–17 December 2021; pp. 2122–2128. [Google Scholar]
  3. Wang, J.; Qin, C.; Qiao, X.; Zhang, D.; Zhang, Z.; Shang, Z.; Zhu, H. Constrained optimal control for nonlinear multi-input safety-critical systems with time-varying safety constraints. Mathematics 2022, 10, 2744. [Google Scholar] [CrossRef]
  4. Liu, Z.; Yuan, Q.; Nie, G.; Tian, Y. A multi-objective model predictive control for vehicle adaptive cruise control system based on a new safe distance model. Int. J. Automot. Technol. 2021, 22, 475–487. [Google Scholar] [CrossRef]
  5. Ames, A.D.; Xu, X.; Grizzle, J.W.; Tabuada, P. Control barrier function based quadratic programs for safety critical systems. IEEE Trans. Autom. Control 2016, 62, 3861–3876. [Google Scholar] [CrossRef]
  6. Qin, C.; Wang, J.; Zhu, H.; Zhang, J.; Hu, S.; Zhang, D. Neural network-based safe optimal robust control for affine nonlinear systems with unmatched disturbances. Neurocomputing 2022, 506, 228–239. [Google Scholar] [CrossRef]
  7. Qin, C.; Wang, J.; Zhu, H.; Xiao, Q.; Zhang, D. Safe adaptive learning algorithm with neural network implementation for H control of nonlinear safety-critical system. Int. J. Robust Nonlinear Control 2023, 33, 372–391. [Google Scholar] [CrossRef]
  8. Srinivasan, M.; Abate, M.; Nilsson, G.; Coogan, S. Extent-compatible control barrier functions. Syst. Control Lett. 2021, 150, 104895. [Google Scholar] [CrossRef]
  9. Yang, Y.; Yin, Y.; He, W.; Vamvoudakis, K.G.; Modares, H. Safety-aware reinforcement learning framework with an actor-critic-barrier structure. In Proceedings of the 2019 American Control Conference (ACC), Philadelphia, PA, USA, 10–12 July 2019; pp. 2352–2358. [Google Scholar]
  10. Yang, Y.; Vamvoudakis, K.G.; Modares, H.; Yin, Y.; Wunsch, D.C. Safe intermittent reinforcement learning with static and dynamic event generators. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 5441–5455. [Google Scholar] [CrossRef]
  11. Xu, J.; Wang, J.; Rao, J.; Zhong, Y.; Wang, H. Adaptive dynamic programming for optimal control of discrete-time nonlinear system with state constraints based on control barrier function. Int. J. Robust Nonlinear Control 2022, 32, 3408–3424. [Google Scholar] [CrossRef]
  12. Brunke, L.; Greeff, M.; Hall, A.W.; Yuan, Z.; Zhou, S.; Panerati, J.; Schoellig, A.P. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annu. Rev. Control Robot. Auton. Syst. 2022, 5, 411–444. [Google Scholar] [CrossRef]
  13. Qin, C.; Zhu, H.; Wang, J.; Xiao, Q.; Zhang, D. Event-triggered safe control for the zero-sum game of nonlinear safety-critical systems with input saturation. IEEE Access 2022, 10, 40324–40337. [Google Scholar] [CrossRef]
  14. Bakule, L. Decentralized control: An overview. Annu. Rev. Control. 2008, 32, 87–98. [Google Scholar] [CrossRef]
  15. Xu, L.X.; Wang, Y.L.; Wang, X.; Peng, C. Decentralized Event-Triggered Adaptive Control for Interconnected Nonlinear Systems With Actuator Failures. IEEE Trans. Fuzzy Syst. 2022, 31, 148–159. [Google Scholar] [CrossRef]
  16. Guo, B.; Dian, S.; Zhao, T. Robust NN-based decentralized optimal tracking control for interconnected nonlinear systems via adaptive dynamic programming. Nonlinear Dyn. 2022, 110, 3429–3446. [Google Scholar] [CrossRef]
  17. Feng, Z.; Li, R.B.; Wu, L. Adaptive decentralized control for constrained strong interconnected nonlinear systems and its application to inverted pendulum. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–11. [Google Scholar] [CrossRef]
  18. Zouhri, A.; Boumhidi, I. Stability analysis of interconnected complex nonlinear systems using the Lyapunov and Finsler property. Multimed. Tools Appl. 2021, 80, 19971–19988. [Google Scholar] [CrossRef]
  19. Li, X.; Zhan, Y.; Tong, S. Adaptive neural network decentralized fault-tolerant control for nonlinear interconnected fractional-order systems. Neurocomputing 2022, 488, 14–22. [Google Scholar] [CrossRef]
  20. Tan, Y.; Yuan, Y.; Xie, X.; Tian, E.; Liu, J. Observer-based event-triggered control for interval type-2 fuzzy networked system with network attacks. IEEE Trans. Fuzzy Syst. 2023, 1–10. [Google Scholar] [CrossRef]
  21. Zhang, J.; Li, S.; Ahn, C.K.; Xiang, Z. Adaptive fuzzy decentralized dynamic surface control for switched large-scale nonlinear systems with full-state constraints. IEEE Trans. Cybern. 2021, 52, 10761–10772. [Google Scholar] [CrossRef]
  22. Huo, X.; Karimi, H.R.; Zhao, X.; Wang, B.; Zong, G. Adaptive-critic design for decentralized event-triggered control of constrained nonlinear interconnected systems within an identifier-critic framework. IEEE Trans. Cybern. 2021, 52, 7478–7491. [Google Scholar] [CrossRef]
  23. Bao, C.; Wang, P.; Tang, G. Data-Driven Based Model-Free Adaptive Optimal Control Method for Hypersonic Morphing Vehicle. IEEE Trans. Aerosp. Electron. Syst. 2022, 1–15. [Google Scholar] [CrossRef]
  24. Farzanegan, B.; Suratgar, A.A.; Menhaj, M.B.; Zamani, M. Distributed optimal control for continuous-time nonaffine nonlinear interconnected systems. Int. J. Control 2022, 95, 3462–3476. [Google Scholar] [CrossRef]
  25. Heydari, M.H.; Razzaghi, M. A numerical approach for a class of nonlinear optimal control problems with piecewise fractional derivative. Chaos Solitons Fractals 2021, 152, 111465. [Google Scholar] [CrossRef]
  26. Liu, S.; Niu, B.; Zong, G.; Zhao, X.; Xu, N. Data-driven-based event-triggered optimal control of unknown nonlinear systems with input constraints. Nonlinear Dyn. 2022, 109, 891–909. [Google Scholar] [CrossRef]
  27. Niu, B.; Liu, J.; Wang, D.; Zhao, X.; Wang, H. Adaptive decentralized asymptotic tracking control for large-scale nonlinear systems with unknown strong interconnections. IEEE/CAA J. Autom. Sin. 2021, 9, 173–186. [Google Scholar] [CrossRef]
  28. Zhao, B.; Luo, F.; Lin, H.; Liu, D. Particle swarm optimized neural networks based local tracking control scheme of unknown nonlinear interconnected systems. Neural Netw. 2021, 134, 54–63. [Google Scholar] [CrossRef] [PubMed]
  29. Zhao, Y.; Niu, B.; Zong, G.; Xu, N.; Ahmad, A.M. Event-triggered optimal decentralized control for stochastic interconnected nonlinear systems via adaptive dynamic programming. Neurocomputing 2023, 539, 126163. [Google Scholar] [CrossRef]
  30. Wang, T.; Wang, H.; Xu, N.; Zhang, L.; Alharbi, K.H. Sliding-mode surface-based decentralized event-triggered control of partially unknown interconnected nonlinear systems via reinforcement learning. Inf. Sci. 2023, 641, 119070. [Google Scholar] [CrossRef]
  31. Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50. [Google Scholar] [CrossRef]
  32. Tang, F.; Niu, B.; Zong, G.; Zhao, X.; Xu, N. Periodic event-triggered adaptive tracking control design for nonlinear discrete-time systems via reinforcement learning. Neural Netw. 2022, 154, 43–55. [Google Scholar] [CrossRef]
  33. Sun, J.; Liu, C. Backstepping-based adaptive dynamic programming for missile-target guidance systems with state and input constraints. J. Frankl. Inst. 2018, 355, 8412–8440. [Google Scholar] [CrossRef]
  34. Zhao, S.; Wang, J.; Wang, H.; Xu, H. Goal representation adaptive critic design for discrete-time uncertain systems subjected to input constraints: The event-triggered case. Neurocomputing 2022, 492, 676–688. [Google Scholar] [CrossRef]
  35. Liu, C.; Zhang, H.; Xiao, G.; Sun, S. Integral reinforcement learning based decentralized optimal tracking control of unknown nonlinear large-scale interconnected systems with constrained-input. Neurocomputing 2019, 323, 1–11. [Google Scholar] [CrossRef]
  36. Sun, H.; Hou, L. Adaptive decentralized finite-time tracking control for uncertain interconnected nonlinear systems with input quantization. Int. J. Robust Nonlinear Control 2021, 31, 4491–4510. [Google Scholar] [CrossRef]
  37. Duan, D.; Liu, C. Finite-horizon optimal tracking control for constrained-input nonlinear interconnected system using aperiodic distributed nonzero-sum games. IET Control Theory Appl. 2021, 15, 1199–1213. [Google Scholar] [CrossRef]
  38. Li, Y.; Li, Y.-X.; Tong, S. Event-based finite-time control for nonlinear multi-agent systems with asymptotic tracking. IEEE Trans. Autom. Control 2023, 68, 3790–3797. [Google Scholar] [CrossRef]
  39. Zhang, H.; Zhao, X.; Zong, G.; Xu, N. Fully distributed consensus of switched heterogeneous nonlinear multi-agent systems with bouc-wen hysteresis input. IEEE Trans. Netw. Sci. Eng. 2022, 9, 4198–4208. [Google Scholar] [CrossRef]
  40. Yang, X.; Zhou, Y.; Dong, N.; Wei, Q. Adaptive critics for decentralized stabilization of constrained-input nonlinear interconnected systems. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 4187–4199. [Google Scholar] [CrossRef]
  41. Zhao, Y.; Wang, H.; Xu, N.; Zong, G.; Zhao, X. Reinforcement learning-based decentralized fault tolerant control for constrained interconnected nonlinear systems. Chaos Solitons Fractals 2023, 167, 113034. [Google Scholar] [CrossRef]
  42. Cui, L.; Zhang, Y.; Wang, X.; Xie, X. Event-triggered distributed self-learning robust tracking control for uncertain nonlinear interconnected systems. Appl. Math. Comput. 2021, 395, 125871. [Google Scholar] [CrossRef]
  43. Tang, Y.; Yang, X. Robust tracking control with reinforcement learning for nonlinear-constrained systems. Int. J. Robust Nonlinear Control 2022, 32, 9902–9919. [Google Scholar] [CrossRef]
  44. Yang, X.; Zhao, B. Optimal neuro-control strategy for nonlinear systems with asymmetric input constraints. IEEE/CAA J. Autom. Sin. 2020, 7, 575–583. [Google Scholar] [CrossRef]
  45. Beard, R.W.; Saridis, G.N.; Wen, J.T. Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation. Automatica 1997, 33, 2159–2177. [Google Scholar] [CrossRef]
  46. Liu, D.; Yang, X.; Wang, D.; Wei, Q. Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Trans. Cybern. 2015, 45, 1372–1385. [Google Scholar] [CrossRef]
  47. Pishro, A.; Shahrokhi, M.; Sadeghi, H. Fault-tolerant adaptive fractional controller design for incommensurate fractional-order nonlinear dynamic systems subject to input and output restrictions. Chaos Solitons Fractals 2022, 157, 111930. [Google Scholar] [CrossRef]
  48. Zhang, L.; Zhao, X.; Zhao, N. Real-time reachable set control for neutral singular Markov jump systems with mixed delays. IEEE Trans. Circuits Syst. II Express Briefs 2021, 69, 1367–1371. [Google Scholar] [CrossRef]
  49. Lakmesari, S.H.; Mahmoodabadi, M.J.; Ibrahim, M.Y. Fuzzy logic and gradient descent-based optimal adaptive robust controller with inverted pendulum verification. Chaos Solitons Fractals 2021, 151, 111257. [Google Scholar] [CrossRef]
Figure 1. The block diagram of the developed optimal DSC scheme.
Figure 2. Evolution of state $x_1(t)$ without using the DSC method.
Figure 3. Evolution of state $x_1(t)$ using the DSC method.
Figure 4. Evolution of state $s_1(t)$ using the DSC method.
Figure 5. Control evolution of input $u_1$.
Figure 6. Evolution of the auxiliary control input $v_1$ using the DSC method.
Figure 7. Evolution of the critic weight vector $W_{c1}$ using the DSC method.
Figure 8. Evolution of state $x_2(t)$ without using the DSC method.
Figure 9. Evolution of state $x_2(t)$ using the DSC method.
Figure 10. Evolution of state $s_2(t)$ using the DSC method.
Figure 11. Control evolution of input $u_2$.
Figure 12. Evolution of the auxiliary control input $v_2$ using the DSC method.
Figure 13. Evolution of the critic weight vector $W_{c2}$ using the DSC method.
Table 1. Meanings and values of symbols used in the robotic arm system.

Subsystem               Parameter        Meaning                   Value
The first subsystem     $m_1$            Mass of payload           5 kg
                        $M_1$            Viscous friction          2 N
                        $\tilde{l}_1$    Length of the arm         0.5 m
                        $\tilde{G}_1$    Moment of inertia         10 kg·m²
                        $\tilde{g}_1$    Acceleration of gravity   9.81 m/s²
The second subsystem    $m_2$            Mass of payload           10 kg
                        $M_2$            Viscous friction          2 N
                        $\tilde{l}_2$    Length of the arm         1 m
                        $\tilde{G}_2$    Moment of inertia         10 kg·m²
                        $\tilde{g}_2$    Acceleration of gravity   9.81 m/s²

Share and Cite

Qin, C.; Wu, Y.; Zhang, J.; Zhu, T. Reinforcement Learning-Based Decentralized Safety Control for Constrained Interconnected Nonlinear Safety-Critical Systems. Entropy 2023, 25, 1158. https://doi.org/10.3390/e25081158