1. Introduction
Although the theory of nonlinear $\mathcal{H}_\infty$ control [1,2] is well developed and can be considered standardized, developing algorithms for solving this problem that enable practical implementation is a very active area of research. Furthermore, it is well known that the $\mathcal{H}_\infty$ control problem can be formulated as a two-player zero-sum differential game [3,4] whose objective function includes a parameter $\gamma$ in such a way that the control vector represents the player that minimizes the objective function, while the vector of uncertainty represents the player that maximizes the same objective function. The available scientific research is mainly based on two approaches: the formulation of the problem in the form of linear matrix inequalities (LMI) [5,6,7,8], or the determination of an approximate solution of the associated Hamilton–Jacobi–Isaacs (HJI) equation [9,10,11,12], which in the linear case is equivalent to the generalized Riccati equation [13].
Furthermore, methods for solving the nonlinear $\mathcal{H}_\infty$ control problem of singular or descriptor systems have also been developed. In [14], the state-feedback scheme, impulse controllability, and the well-known implicit function theorem for stability analysis are adopted for the finite-time $\mathcal{H}_\infty$ control problem of descriptor systems subject to actuator saturation, while in [15] impulse and hybrid controllers are combined, resulting in less conservative stability conditions than in the state-feedback control strategy.
In the nonlinear programming-based algorithms that have been proposed in the literature, the system dynamics is treated as an equality constraint and included in the optimization process using the method of Lagrange multipliers. This results in an HJI equation that is very difficult or almost impossible to solve. For this reason, many approximation methods have been developed, whose computational complexity increases with the number of system states that need to be estimated. In [10,12], reviews of reinforcement adaptive learning and adaptive dynamic programming techniques for solving multiplayer games related to $\mathcal{H}_\infty$ control are given. In [16], improved iterative or successive approximation methods for solving the HJI equation are developed. A game-theoretic differential dynamic programming algorithm for calculating both the minimizing and maximizing inputs, based on a Taylor series expansion of the associated HJI equation around a nominal trajectory, is proposed in [11]. Using a critic neural network with time-varying activation functions, the HJI equation is approximately solved in [9]. An event-triggering function for two-player zero-sum games in the presence of control constraints is designed in [17]. Furthermore, in [13], an iterative procedure to find the stabilizing solution of a set of Riccati equations related to discrete-time $\mathcal{H}_\infty$ control problems for periodic systems is addressed. A randomized algorithm based on the Ray-Shooting Method for $\mathcal{H}_\infty$ optimal synthesis of proportional–integral–derivative (PID) controllers for linear systems is proposed in [18].
In the case of the methods and algorithms mentioned above, the application of LMIs and the solution of Riccati equations require linearization, in which case the optimality of the solution cannot be guaranteed in all operating states of the nonlinear system dynamics. On the other hand, solving the HJI equation can be very complex and therefore difficult to apply in real control tasks.
In this paper, the nonlinear $\mathcal{H}_\infty$ state-feedback control problem is formulated as a zero-sum differential game, and we approach its solution without the LMI formalism or the need to approximate the solution of the HJI equation. The main idea of the presented approach lies in the application of the conjugate gradient method, where the first-order derivatives are calculated by matrix relations backwards in time, which gives a direct numerical calculation of the control and uncertainty variables that explicitly depend on the system states.
In contrast to the approaches proposed in [11,16], which result in an HJI equation so complex that it is practically impossible to solve once a more complex nonlinearity of the dynamic system is included in the objective function, in our approach the nonlinear system dynamics is not included in the objective function as an equality constraint; instead, the state variables, the control law and the uncertainty are coupled by recursive matrix relations and the chain rule for ordered derivatives, which are used to calculate the objective-function gradients that appear in the conjugate gradient method. Furthermore, in contrast to the approaches presented in [10,12], which can be computationally expensive since the tuning of the neural network weights is based on a method of weighted residuals that includes the calculation of Lebesgue integrals and the estimation of state variables, in our approach the computational complexity does not depend on the dimension of the state space, since the procedure proposed for gradient calculation has a backward-in-time structure similar to the backpropagation-through-time algorithm.
Besides the conjugate gradient method, which is used for the computation of the saddle point of the zero-sum differential game, the quasi-Newton method for $\mathcal{L}_2$-gain minimization, the line search method with Wolfe conditions, the Adams approximation method for time discretization and the complex-step method for the calculation of derivatives are also systematically integrated into an efficient mathematical tool in order to achieve numerical robustness and stability.
The rest of the paper is organized as follows. In Section 2, the preliminaries of nonlinear state-feedback $\mathcal{H}_\infty$ control are presented, both from the point of view of the concept of dissipativity and from the point of view of zero-sum differential games. Although the theories behind these concepts are well known, providing the basic terms is necessary to understand the contributions contained in the following sections. A complete framework for the derivation of the algorithmic procedure that optimizes the $\mathcal{L}_2$-gain of nonlinear dynamic systems without solving the HJI partial differential equation is given in Section 3. In Section 4, the proposed algorithmic procedure is evaluated on examples of nonlinear systems for which the input variables can be determined exactly by solving the HJI equation. Finally, Section 5 concludes the paper.
Notation
The notation used is fairly standard. Matrices and vectors are represented in bold upper and bold lower case, respectively. Scalars are represented in italic lower case. $\mathbf{I}$ is an identity matrix and $\mathbf{0}$ is a null matrix. The dimensions of the matrices and vectors can generally be determined trivially from the context. The symbol ∇ stands for the gradient and is considered to be a row vector. The superscript $\mathrm{T}$ denotes transposition.
The $\operatorname{vec}(\cdot)$ operator stacks the columns of a matrix one underneath the other. The Kronecker product of two matrices $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{B} \in \mathbb{R}^{p \times q}$, denoted by $\mathbf{A} \otimes \mathbf{B}$, is the $mp \times nq$ matrix defined by $[a_{ij}\mathbf{B}]$. The definitions of matrix differential calculus and the algebras related to Kronecker products can be found in [19,20].
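As a quick illustration of these definitions, the following minimal NumPy sketch (the names here are ours, not taken from the paper) verifies the well-known identity $\operatorname{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^{\mathrm{T}} \otimes \mathbf{A})\operatorname{vec}(\mathbf{X})$, which is the kind of Kronecker-algebra property used later in the gradient derivations:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
X = rng.normal(size=(4, 5))
B = rng.normal(size=(5, 2))

# vec(.) stacks the columns of a matrix one underneath the other;
# NumPy flattens row-wise by default, so we flatten the transpose.
def vec(M):
    return M.T.reshape(-1, 1)

lhs = vec(A @ X @ B)             # vec(A X B)
rhs = np.kron(B.T, A) @ vec(X)   # (B^T kron A) vec(X)
assert np.allclose(lhs, rhs)     # identity holds to machine precision
```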
The Euclidean norm of a vector $\mathbf{x}$ is defined as $\|\mathbf{x}\| = (\mathbf{x}^{\mathrm{T}}\mathbf{x})^{1/2}$. $\mathcal{L}_2$ and $\mathcal{L}_\infty$ stand for the standard Lebesgue spaces of vector-valued square-integrable and essentially bounded functions, respectively, mapping a time interval to $\mathbb{R}^n$. The former is equipped with the norm $\|\mathbf{x}\|_{\mathcal{L}_2} = \big(\int \|\mathbf{x}(t)\|^2\,dt\big)^{1/2}$. We avoid explicitly showing the dependence of the variables on time when not needed.
2. Preliminaries and Problem Formulation
In this section, we give a review of the basic terms of nonlinear state-feedback $\mathcal{H}_\infty$ control and the related differential game theory, mostly based on classic references from this field [1,2,3,4,21,22].
Consider the causal input-affine nonlinear system Σ in the state space, defined on some manifold $\mathcal{X}$, in the form of Equation (1), where $\mathbf{x}$ is the state vector, $\mathbf{u}$ is the control vector, and $\mathbf{w}$ is the vector representing internal/external uncertainty belonging to the set $\mathcal{W}$. The output vector $\mathbf{y}$ contains all directly measured states of the system Σ. The vector $\mathbf{z}$ is the performance variable. Furthermore, the functions defining Σ are real $C^k$-functions of $\mathbf{x}$.
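For concreteness, a standard input-affine structure consistent with this description reads as follows; this is a sketch of the usual form found in [1,21], and the particular output maps shown here are our assumption rather than a transcription of the original Equation (1):

$$
\Sigma:\quad
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}) + \mathbf{g}_1(\mathbf{x})\,\mathbf{w} + \mathbf{g}_2(\mathbf{x})\,\mathbf{u},
\qquad
\mathbf{y} = \mathbf{x},
\qquad
\mathbf{z} = \begin{bmatrix}\mathbf{h}_1(\mathbf{x}) \\ \mathbf{u}\end{bmatrix}.
$$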
The following assumptions are employed:
Assumption 1. The origin is a unique equilibrium point of system Σ, at which the drift and output functions vanish; for simplicity, the equilibrium and the initial time are taken to be zero.
Assumption 2. The vector and matrix that define the performance variable $\mathbf{z}$ satisfy a regularity condition for all $\mathbf{x}$, which implies that the penalty on the control input is nonsingular. By introducing Assumption 2, the so-called singular $\mathcal{H}_\infty$ problem is avoided. More details on solving the singular problem can be found in [23,24].
Assumption 3. The initial state vector $\mathbf{x}_0$ is a priori known.
Furthermore, the following definition is introduced:
Definition 1 ($\mathcal{L}_2$-gain). The $\mathcal{L}_2$-gain of system Σ from $\mathbf{w}$ to $\mathbf{z}$ is the supremum of the ratio $\|\mathbf{z}\|_{\mathcal{L}_2} / \|\mathbf{w}\|_{\mathcal{L}_2}$ over all nonzero $\mathbf{w} \in \mathcal{L}_2$; equivalently, it is the smallest $\gamma \geq 0$ that satisfies
$$\int_0^T \|\mathbf{z}(t)\|^2\,dt \;\leq\; \gamma^2 \int_0^T \|\mathbf{w}(t)\|^2\,dt + \beta(\mathbf{x}_0)$$
for some bounded $C^0$-function $\beta$ such that $\beta(\mathbf{0}) = 0$.
In general, the problem of nonlinear $\mathcal{H}_\infty$ control of a system where all state variables are available (measurable or estimable) can be formulated as follows:
Problem 1. The problem of optimal nonlinear $\mathcal{H}_\infty$ state-feedback control of system Σ is to determine the control law $\mathbf{u}$ and the “worst case” uncertainty $\mathbf{w}$ such that the $\mathcal{L}_2$-gain bound $\gamma$ is minimized.
Assumption 4. The functions $\mathbf{u}(\mathbf{x})$ and $\mathbf{w}(\mathbf{x})$ are smooth functions of the state.
If for the system Σ there exists some $\gamma$ that satisfies (3), then a zero-sum differential game can be defined. The optimal value of this game is given by the expression (4), with the dynamic equality constraints (1) on a finite time horizon. The necessary conditions for the saddle point of the zero-sum differential game (4) follow from the minimum and maximum principles and are of the form (5) and (6), where $V$ is a smooth positive semidefinite solution of the HJI partial differential Equation (7).
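For the input-affine structure sketched above, these conditions take the textbook form (see, e.g., [1,21]; the notation here is ours): the saddle-point feedbacks are

$$
\mathbf{u}^{*} = -\tfrac{1}{2}\,\mathbf{g}_2^{\mathrm{T}}(\mathbf{x})\left(\frac{\partial V}{\partial \mathbf{x}}\right)^{\!\mathrm{T}},
\qquad
\mathbf{w}^{*} = \frac{1}{2\gamma^{2}}\,\mathbf{g}_1^{\mathrm{T}}(\mathbf{x})\left(\frac{\partial V}{\partial \mathbf{x}}\right)^{\!\mathrm{T}},
$$

with $V$ satisfying the HJI equation

$$
\frac{\partial V}{\partial \mathbf{x}}\,\mathbf{f}(\mathbf{x})
+ \frac{1}{4}\,\frac{\partial V}{\partial \mathbf{x}}
\left(\frac{1}{\gamma^{2}}\,\mathbf{g}_1\mathbf{g}_1^{\mathrm{T}} - \mathbf{g}_2\mathbf{g}_2^{\mathrm{T}}\right)
\left(\frac{\partial V}{\partial \mathbf{x}}\right)^{\!\mathrm{T}}
+ \mathbf{h}_1^{\mathrm{T}}(\mathbf{x})\,\mathbf{h}_1(\mathbf{x}) = 0.
$$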
3. Synthesis of the Algorithm for the Solution of the Zero-Sum Differential Game
In this section, an approach for determining the control and uncertainty variables for optimal $\mathcal{H}_\infty$ control (Problem 1) of the nonlinear dynamic system Σ is proposed. The proposed approach does not require solving the HJI Equation (7); instead, the solution is reduced to the direct numerical determination of the saddle point of the related zero-sum differential game.
Control and uncertainty variables are approximated by functions with a linear dependence on a finite number of constant parameters. To calculate these parameters, an approach based on the integration of the quasi-Newton method, the conjugate gradient method, the Adams method and the complex-step derivative approximation into one algorithm is proposed. The goal is to obtain a control variable that explicitly depends on the state variables and is, in that form, simple to implement in practice. Additionally, the aim is to achieve a numerical solution that converges uniformly towards the optimal solution as the order of complexity of the approximation, i.e., the number of parameters, is increased.
Since we have introduced Assumption 4, based on Weierstraß’s theorem ([25], p. 65) (which refers to polynomial approximation functions) and its generalizations [26,27,28,29] (which refer to non-polynomial forms of nonlinear approximation functions) on the uniform approximation of smooth functions, there are constant parameters such that the i-th component of the control and uncertainty vector can be written in the form (8) and (9), with suitably chosen basis functions. The linear subspaces generated by the sets of basis functions are dense in the Sobolev norm [30].
For well-chosen basis functions, the control and uncertainty components are reproduced up to approximation errors. It follows that these errors tend to zero as the number of basis functions increases, while for a fixed number of basis functions they are bounded by constants on the compact set. It is well known from approximation theory (see for example [31,32]) that it is often possible to determine in advance how many terms of the basis function expansion should be taken for the required accuracy.
Equations (8) and (9) can be written in the matrix form of (12) and (13), where the parameter matrices stack the constant coefficients and the vector functions stack the chosen basis functions.
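As a minimal sketch of this linear-in-parameters structure (the basis choice and all names here are our illustration, echoing the polynomial bases used in the examples of Section 4), the control and uncertainty vectors reduce to matrix–vector products:

```python
import numpy as np

def basis(x):
    """Example basis vector; the concrete choice is problem-dependent."""
    x1, x2 = x
    return np.array([x1, x2, x1**3, x2**3, x1 * x2])

# Parameter matrices: one row per input component, one column per basis function.
theta_u = np.zeros((1, 5))   # control parameters, to be optimized
theta_w = np.zeros((1, 5))   # uncertainty ("worst case") parameters

def u(x):
    """Control law, linear in theta_u and explicit in the state x."""
    return theta_u @ basis(x)

def w(x):
    """Uncertainty, linear in theta_w and explicit in the state x."""
    return theta_w @ basis(x)
```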
In this paper, based on the previous considerations, the problem that needs to be solved can be formulated as follows:
Problem 2. Determine the parameters of the control and uncertainty approximations such that the $\mathcal{L}_2$-gain of the closed-loop system is minimized. In other words, according to Definition 1, solve the minimax optimization problem (16), which represents a zero-sum differential game whose minimal $\mathcal{L}_2$-gain is the sought value of $\gamma$.
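In symbols, for a fixed $\gamma$ the inner game of Problem 2 is the min–max of the finite-horizon objective implied by Definition 1 (our transcription of the standard formulation):

$$
\min_{\boldsymbol{\theta}_u}\;\max_{\boldsymbol{\theta}_w}\;
J(\boldsymbol{\theta}_u, \boldsymbol{\theta}_w)
= \min_{\boldsymbol{\theta}_u}\;\max_{\boldsymbol{\theta}_w}
\int_{0}^{T}\left(\|\mathbf{z}(t)\|^{2} - \gamma^{2}\,\|\mathbf{w}(t)\|^{2}\right)dt,
$$

with $\mathbf{u}$ and $\mathbf{w}$ restricted to the parameterized forms (12) and (13); the outer loop described next then minimizes $\gamma$ over the resulting saddle-point values.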
First, to propose an algorithm for the minimization of the parameter $\gamma$ in (16), the first-order derivative of the value of the differential game with respect to $\gamma$ is needed. Since this value is non-differentiable, being defined through the min–max operator, the sub-differential is employed; it can easily be shown that the sub-differential of the game value with respect to $\gamma$ is given by (17).
Next, due to the inaccurate calculation of the previously mentioned derivative, the quasi-Newton method is considered, for which superlinear convergence and numerical robustness have been proven (see for example [33,34,35,36]). The k-th quasi-Newton iteration is defined by (18).
To minimize the $\mathcal{L}_2$-gain of the system (15), we propose the following quasi-Newton algorithm.
It is important to note that, in the second step of Algorithm 1, unlike in other known methods of nonlinear optimization, the dynamics of the system (15) is not included in the objective function.
Algorithm 1 Quasi-Newton method for $\mathcal{L}_2$-gain minimization.
- Require: the initial value of $\gamma$ and the stopping tolerance.
- Ensure: the minimizing parameter $\gamma$.
- 1: Set $k = 0$.
- 2: Solve the zero-sum differential game for the current $\gamma$, where the state, control and uncertainty variables are coupled by (15).
- 3: Update $\gamma$ by the quasi-Newton iteration (18).
- 4: If the stopping criterion is satisfied, then terminate. Otherwise, set $k = k + 1$ and return to step 2.
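A compact sketch of this outer loop might look as follows; it is a secant-type quasi-Newton update on the scalar $\gamma$, and `solve_game` is our placeholder for Algorithm 2, assumed to return the saddle-point value of the game and its sub-derivative with respect to $\gamma$:

```python
def minimize_gain(solve_game, gamma0, tol=1e-6, max_iter=50):
    """Sketch of Algorithm 1: quasi-Newton (secant) minimization of gamma.

    solve_game(gamma) -> (value, dvalue_dgamma): saddle-point value of the
    zero-sum game for fixed gamma (Algorithm 2) and its sub-derivative.
    """
    gamma = gamma0
    _, grad = solve_game(gamma)
    h_inv = 1.0                                      # scalar inverse-Hessian estimate
    for _ in range(max_iter):
        gamma_new = max(gamma - h_inv * grad, 1e-12)  # quasi-Newton step, gamma > 0
        _, grad_new = solve_game(gamma_new)
        s, y = gamma_new - gamma, grad_new - grad
        if abs(y) > 1e-15:
            h_inv = s / y                            # secant (1-D BFGS) update
        gamma, grad = gamma_new, grad_new
        if abs(grad) < tol:                          # stopping criterion (step 4)
            break
    return gamma
```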
To solve the subproblem in the second step of Algorithm 1, we use the conjugate gradient method as described in the following algorithm.
Note that, in Algorithm 2, the maximization of the objective function is obtained by changing the sign in front of the gradient with respect to the uncertainty parameters in (23). Additionally, it should be noted that the strategy for solving Problem 2 proposed by Algorithms 1 and 2 requires appropriate initialization, i.e., an appropriate selection of the approximation functions and the initial parameters. If they are inadequate, it cannot be guaranteed that the control law corresponds to the “worst case” of uncertainty. In general, it is very difficult to guarantee that a solution for the “worst case” of uncertainty is obtained. However, conditions that would give recommendations for the initialization of the parameters of the proposed algorithms could be derived by applying the concept of inverse min–max optimal control. Solving the problem of inverse min–max optimal control goes beyond the scope of the research presented in this paper and will be part of future research.
Algorithm 2 Conjugate gradient method for saddle point computation.
- Require: the current value of $\gamma$, the initial parameters of the approximation functions, the stopping tolerance and the line search coefficients.
- Ensure: the saddle-point parameters of the control and uncertainty approximations.
- 1: Set the iteration index to zero.
- 2: Perform a conjugate gradient descent/ascent step of the form (22) and (23), where the search direction vectors are updated accordingly.
- 3: Determine the step length by applying the line search strategy that satisfies Wolfe’s conditions (see Algorithm 3).
- 4: Determine the conjugate gradient coefficient by applying the Dai–Yuan method [37].
- 5: If the stopping criterion is satisfied, then terminate. Otherwise, increment the iteration index and return to step 2.
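The following sketch shows the alternating descent/ascent structure of Algorithm 2 with a Dai–Yuan update; the gradient routines and the line search are our placeholders for the quantities derived in the next subsection:

```python
import numpy as np

def saddle_point_cg(grad_u, grad_w, theta_u, theta_w, line_search,
                    tol=1e-6, max_iter=200):
    """Sketch of Algorithm 2: CG descent in theta_u, ascent in theta_w.

    grad_u, grad_w: callables returning the objective gradients w.r.t. the
    control and uncertainty parameters; line_search implements Algorithm 3.
    """
    n_u = theta_u.size
    g = np.concatenate([grad_u(theta_u, theta_w),    # descent part
                        -grad_w(theta_u, theta_w)])  # ascent: flipped sign
    d = -g                                           # initial search direction
    for _ in range(max_iter):
        alpha = line_search(theta_u, theta_w, d)     # Wolfe step length
        theta = np.concatenate([theta_u, theta_w]) + alpha * d
        theta_u, theta_w = theta[:n_u], theta[n_u:]
        g_new = np.concatenate([grad_u(theta_u, theta_w),
                                -grad_w(theta_u, theta_w)])
        if np.linalg.norm(g_new) < tol:              # stopping criterion
            break
        beta = (g_new @ g_new) / (d @ (g_new - g))   # Dai-Yuan coefficient
        d = -g_new + beta * d
        g = g_new
    return theta_u, theta_w
```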
Algorithm 3 Line search method with Wolfe conditions.
- Require: the current parameters, the search direction, the objective function and its directional derivative, and the Wolfe coefficients.
- Ensure: the step length.
- 1: Initialize the bracketing values.
- 2: Choose an initial step length.
- 3: While the current step length does not satisfy Wolfe’s conditions, calculate: (i) the sufficient decrease (Armijo) condition; (ii) the curvature condition; and update the step length accordingly.
- 4: Set the accepted step length.
Furthermore, the initialization of the parameters depends on the specific optimization problem. Through a series of numerical experiments in simulation, we determined that the Dai–Yuan conjugate gradient method reaches a similar level of numerical robustness and accuracy for various initial values, whereas for the standard gradient algorithm the choice of initial parameters largely affects the convergence and can cause numerical instabilities.
To determine the convergence step, we use a line search method that satisfies the Wolfe conditions, as described in Algorithm 3.
Details related to the line search method and Wolfe conditions can be found in [38].
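A minimal bisection-style sketch that enforces the (weak) Wolfe conditions could read as follows; here `phi` is the one-dimensional restriction of the objective along the search direction, a descent direction is assumed, and the bisection strategy is one common choice rather than necessarily the exact one used in the paper:

```python
def wolfe_line_search(phi, dphi, c1=1e-4, c2=0.9, alpha0=1.0, max_iter=50):
    """Find a step alpha satisfying the weak Wolfe conditions by bisection.

    phi(alpha)  : objective along the search direction,
    dphi(alpha) : its directional derivative (dphi(0) < 0 assumed).
    """
    lo, hi = 0.0, float("inf")
    alpha = alpha0
    phi0, dphi0 = phi(0.0), dphi(0.0)
    for _ in range(max_iter):
        if phi(alpha) > phi0 + c1 * alpha * dphi0:   # sufficient decrease fails
            hi = alpha
            alpha = 0.5 * (lo + hi)
        elif dphi(alpha) < c2 * dphi0:               # curvature condition fails
            lo = alpha
            alpha = 2.0 * alpha if hi == float("inf") else 0.5 * (lo + hi)
        else:
            return alpha                             # both Wolfe conditions hold
    return alpha
```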
Gradient Calculation
In this subsection, the relations for calculating the gradients that appear in (22) and (23) are derived. In order to apply the chain rule for derivatives backward in time, the system (15) first needs to be rewritten in discrete-time state-space form. For this purpose, we use the multistep Adams method. Compared to the most popular four-stage Runge–Kutta method, which requires solving the approximation problem four times at each step, the Adams method requires only one calculation.
The explicit Adams approximation of the system (15) can be formulated as follows. Let the time interval of interest consist of discrete time points $t_i$, $i = 0, 1, \ldots, N$, with time step length $\Delta t$, and let the extended $4n$-dimensional state vector collect the last four states ($n$ is the dimension of the state vector and 4 is the order of the Adams method). Then, the system can be written in the discrete-time state-space form (27), where the vector of the non-zero coefficients of the Adams method is used (see ([39], page 358)) and where, for simplicity, shorthand notation is introduced. General considerations regarding the Adams method are given in [39], while several novel schemes for the multistep Adams method are proposed in [40].
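As an illustration, an explicit fourth-order Adams–Bashforth step combines the last four derivative evaluations with the classical coefficients $55/24$, $-59/24$, $37/24$, $-9/24$ (see [39]); a minimal sketch:

```python
import numpy as np

AB4 = np.array([55.0, -59.0, 37.0, -9.0]) / 24.0   # Adams-Bashforth coefficients

def ab4_step(x, f_hist, dt):
    """One explicit 4th-order Adams-Bashforth step.

    x      : current state x_i,
    f_hist : last four derivative evaluations [f_i, f_{i-1}, f_{i-2}, f_{i-3}],
    dt     : time step length.
    """
    return x + dt * sum(c * f for c, f in zip(AB4, f_hist))
```

Only one new evaluation of the closed-loop dynamics is needed per step, which is the property exploited above; the first three steps must be generated by a single-step method (e.g., Runge–Kutta).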
The discrete-time form of the objective function (24) is given by (29), where for simplicity additional shorthand notation is introduced.
The following is a derivation of the recursive matrix relations for calculating the gradient with respect to the control parameters, using basic chain rule arithmetic and matrix differential calculus, as well as certain properties of matrix vectorization and Kronecker algebra. The relations for calculating the gradient with respect to the uncertainty parameters are obtained in an analogous way.
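The structure of the resulting computation resembles backpropagation through time: the sensitivities of the discrete objective are accumulated backwards over the stored trajectory. A schematic NumPy sketch follows; all names are ours, and the Jacobians of the discrete dynamics and the stage-cost gradients are assumed precomputed (e.g., by the complex-step method described below):

```python
import numpy as np

def objective_gradient(A, B, dl_dx, dl_dtheta):
    """Backward-in-time (BPTT-like) accumulation of dJ/dtheta.

    Discrete model: x_{i+1} = F(x_i, theta), cost J = sum_i l_i(x_i, theta).
    A[i]         : Jacobian dF/dx at step i (i = 0..N-1),
    B[i]         : Jacobian dF/dtheta at step i,
    dl_dx[i]     : stage-cost gradient w.r.t. the state (row vector, i = 0..N),
    dl_dtheta[i] : stage-cost gradient w.r.t. the parameters (i = 0..N).
    """
    N = len(A)                               # number of transitions
    lam = np.asarray(dl_dx[N], float)        # terminal adjoint lambda_N
    grad = np.asarray(dl_dtheta[N], float)   # accumulates dJ/dtheta
    for i in range(N - 1, -1, -1):           # sweep backwards in time
        grad = grad + dl_dtheta[i] + lam @ B[i]   # lam equals lambda_{i+1} here
        lam = dl_dx[i] + lam @ A[i]               # adjoint recursion
    return grad
```

Note that the work per step is independent of the number of parameters being differentiated, which is what makes the backward sweep attractive here.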
The gradient of the objective function (29) with respect to the parameter vector is given by (31), where the inner term is defined by (32). Then, the operation of partial differentiation with respect to the state vector is performed over the expression (32), which gives (33).
In (33), it is necessary to calculate the sensitivities of the extended state vector with respect to the parameters, which are obtained based on the expression (27): the expressions (34)–(36) represent recursive matrix relations for these sensitivities, together with the corresponding initial conditions.
Note that the functions which appear in the previous expressions are known from the system dynamics and the predetermined approximation function basis, and their derivatives can easily be calculated to machine precision by applying the complex-step method.
The first-order derivative of a function with respect to the state vector using the complex-step approximation is accomplished by perturbing one of its arguments (let us say the k-th component) with a complex variable, using a Taylor series expansion [41,42] in which $\mathbf{e}_k$ denotes the corresponding unit vector. Taking only the imaginary parts gives the expression (39).
In the above expressions, $\mathrm{i}$ represents the imaginary unit $\sqrt{-1}$, while previously $i$ denoted the i-th time point; for this reason, the time index is omitted here, but it is implied. It can be seen that no subtraction appears in the numerator of the expression (39), i.e., the subtractive cancellation problem has been removed. This means that the step $h$ can be extremely small, so the derivative can be calculated to machine precision.
The implementation of the complex-step method is very simple in a high-level programming language that does not require a prior definition of variable types but can define complex variables automatically by applying built-in functions.
The complex-step method for calculating the required derivatives is described by the pseudocode given in Algorithm 4. The calculation of the derivatives of the other known functions can be obtained in an analogous manner.
Algorithm 4 Calculation of derivatives at the i-th time point using the complex-step method.
- Require: the dimension n, the step h, the function to be differentiated and the point at which the derivatives are evaluated.
- Ensure: the matrix of first-order derivatives.
- 1: Initialize the derivative matrix.
- 2: for k = 1 to n do
- 3: perturb the k-th component of the argument by the imaginary step $\mathrm{i}h$
- 4: evaluate the function at the perturbed argument
- 5: for j = 1 to n do
- 6: set the (j, k) entry of the derivative matrix to the j-th imaginary part divided by h
- 7: end for
- 8: restore the k-th component of the argument
- 9: end for
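In a language with built-in complex arithmetic, Algorithm 4 reduces to a few lines. A NumPy version (ours, mirroring the pseudocode) that computes the Jacobian of a vector function column by column:

```python
import numpy as np

def complex_step_jacobian(f, x, h=1e-20):
    """Jacobian of f at x via the complex-step method (cf. Algorithm 4).

    No subtraction occurs, so h can be taken extremely small and each
    derivative is obtained to machine precision.
    """
    x = np.asarray(x, dtype=complex)
    n = x.size
    m = np.atleast_1d(f(x)).size
    J = np.empty((m, n))
    for k in range(n):                  # perturb one component at a time
        xp = x.copy()
        xp[k] += 1j * h                 # complex perturbation along e_k
        J[:, k] = np.imag(f(xp)) / h    # Im f(x + i h e_k) / h
    return J

# Example: f(x) = [x0 * sin(x1), x0**2]
f = lambda x: np.array([x[0] * np.sin(x[1]), x[0] ** 2])
print(complex_step_jacobian(f, [1.0, 0.5]))
```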
4. Benchmark Examples with Analytic Solution
In this section, the algorithmic procedure described in the previous sections is tested on examples of nonlinear systems where the HJI equation can be solved analytically, thereby exactly determining the control and uncertainty vectors. In this way, it can be directly assessed whether the proposed algorithmic procedure calculates the required solutions with sufficient numerical efficiency in terms of convergence and accuracy. The algorithm is written and implemented in the MATLAB software package. When solving a particular problem, it is necessary to initialize the algorithm in a suitable way.
4.1. Example—First-Order System
Consider a scalar nonlinear system described by Equation (40) [21]. Under the choice of weights made in [21], solving the corresponding HJI equation (for details see [21]) gives the analytical feedbacks (41).
The approximation functions of the control and uncertainty variables are chosen in the form (42), i.e., they can be written in the matrix form of (12) and (13).
In this example, the algorithm parameters are set as follows: the time interval is divided into 100,000 equal subintervals, which fixes the discretization step in [s], and the Adams method of the 4th order is applied; the vector of initial conditions, the initial value of the parameter $\gamma$ and the vectors of initial parameters of the approximation functions are prescribed; the stopping criteria of the quasi-Newton method and of the conjugate gradient method are set; the Dai–Yuan method is used by default; and the initial numerical values of the conjugate gradient algorithm parameters and the line search coefficients are chosen.
The algorithm yields the parameter $\gamma$ and the parameters of the approximation functions given in (44), i.e., the minimum $\mathcal{L}_2$-gain achieved.
Figure 1 shows the time dependence of the state variable, i.e., the response of the system (40) from the given initial condition, where the control and uncertainty variables are of the form (42) with the parameters (44). The control and uncertainty variables from (42) obtained by the proposed algorithm with the parameters from (44), in comparison with the analytical solutions from the expression (41) as functions of the state variable, are shown in Figure 2.
Figure 3 shows the solutions obtained by the derived algorithm in comparison with the analytical solutions (41) as functions of time. Based on all of the above, it can be concluded that the numerically obtained solutions approximate the analytical ones well, with negligible error.
4.2. Example—Second-Order System
Consider the second-order nonlinear system (45) [43]. Under the choice of weights made in [43], solving the corresponding HJI equation (for details see [43]) gives the analytical feedbacks (46).
The approximation functions of the control and uncertainty variables are chosen in the form (47), i.e., they can be written in the matrix form of (12) and (13).
In this example, the algorithm parameters are set analogously to the previous example: the time interval is divided into equal subintervals with the corresponding discretization step in [s], and the Adams method of the 4th order is applied by default; the vector of initial conditions, the initial value of the parameter $\gamma$ and the vectors of initial parameters of the approximation functions are prescribed; the stopping criteria of the quasi-Newton and conjugate gradient methods are set; the Dai–Yuan method is used by default; and the initial numerical values of the conjugate gradient algorithm parameters and the line search coefficients are chosen.
The algorithm yields the parameter $\gamma$ and the parameters of the approximation functions given in (49), i.e., the minimum $\mathcal{L}_2$-gain achieved.
Figure 4 shows the time dependence of the state variables, i.e., the response of the system (45) from the given initial conditions, where the control variables and the uncertainty variable are of the form (47) with the parameters (49).
Figure 5 shows the solutions obtained by the derived algorithm in comparison with the analytical solutions (46). In this example, as in the previous one, it is evident that the numerical solution approximates the analytical solution well.
5. Conclusions
In this paper, an approach for the solution of the nonlinear $\mathcal{H}_\infty$ control problem is presented. Instead of using approximation methods for solving the corresponding HJI equation, the solution is obtained by direct numerical calculation of the control and uncertainty variables that explicitly depend on the system states. In order to achieve numerical efficiency, the proposed algorithmic procedure uses the quasi-Newton method, the conjugate gradient method, the line search method with Wolfe conditions, the Adams approximation method for time discretization, and the complex-step calculation of derivatives. Despite the fact that the methods used in this paper are known from the cited references, in our approach they are integrated together to provide a suitable mathematical tool for the numerical solution of the zero-sum differential game related to the nonlinear $\mathcal{H}_\infty$ control problem.
The extension of this approach can be continued in two research directions: (i) the output measurement-feedback $\mathcal{H}_\infty$ optimal control problem, and (ii) the case where the initial state vector is unknown and treated as an uncertainty, i.e., as the maximizing “player”. It can be assumed that the proposed strategy can be extended to these two cases without a significant increase in its complexity.