
Cheap Control in a Non-Scalarizable Linear-Quadratic Pursuit-Evasion Game: Asymptotic Analysis

by Vladimir Turetsky 1,*,† and Valery Y. Glizer 2,†
1 Department of Mathematics, Ort Braude College of Engineering, 51 Snunit Str., P.O. Box 78, Karmiel 2161002, Israel
2 The Galilee Research Center for Applied Mathematics, Ort Braude College of Engineering, 51 Snunit Str., P.O. Box 78, Karmiel 2161002, Israel
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Axioms 2022, 11(5), 214; https://doi.org/10.3390/axioms11050214
Submission received: 27 March 2022 / Revised: 25 April 2022 / Accepted: 27 April 2022 / Published: 5 May 2022

Abstract:
In this work, a finite-horizon zero-sum linear-quadratic differential game, modeling a pursuit-evasion problem, was considered. In the game’s cost function, the cost of the control of the minimizing player (the minimizer/the pursuer) was much smaller than the cost of the control of the maximizing player (the maximizer/the evader) and the cost of the state variable. This smallness was expressed by a positive small multiplier (a small parameter) of the square of the L 2 -norm of the minimizer’s control in the cost function. Parameter-free sufficient conditions for the existence of the game’s solution (the players’ optimal state-feedback controls and the game value), valid for all sufficiently small values of the parameter, were presented. The boundedness (with respect to the small parameter) of the time realizations of the optimal state-feedback controls along the corresponding game’s trajectory was established. The best achievable game value from the minimizer’s viewpoint was derived. A relation between solutions of the original cheap control game and the game that was obtained from the original one by replacing the small minimizer’s control cost with zero, was established. An illustrative real-life example is presented.

1. Introduction

A cheap control problem is an extremal control problem where a control cost of at least one of the decision makers is much smaller than a state cost in at least one cost function of the problem. Cheap control problems appear in many topics of optimal control and differential game theories. For example, such problems appear in the following topics: (1) regularization of singular optimal controls (see, e.g., [1,2,3,4]); (2) limitation analysis for optimal regulators and filters (see, e.g., [5,6,7]); (3) extremal control problems with high gain control in dynamics (see, e.g., [8,9]); (4) inverse optimal control problems (see, e.g., [10]); (5) robust optimal control of systems with uncertainties/disturbances (see, e.g., [11,12]); (6) guidance problems (see, e.g., [13,14]).
The Hamiltonian boundary-value problem and the Hamilton–Jacobi–Bellman–Isaacs equation, which arise from the solvability (control optimality) conditions of a cheap control problem, are singularly perturbed due to the smallness of the control cost.
In the present paper, we considered one class of cheap control pursuit-evasion differential games. Cheap control differential games have been studied in a number of works in the literature (see, e.g., [4,11,12,15,16] and references therein). In most of these studies, the case where a state cost appeared in the integral part of the cost function was treated. This feature allowed (subject to some additional condition on the state cost) the use of the boundary function method [17] for an asymptotic analysis of the corresponding singularly perturbed Hamilton–Jacobi–Bellman–Isaacs equation. Moreover, the time realization of the optimal state-feedback control with the small cost had an impulse-like behaviour, meaning it was unbounded as the control cost tended to zero. To the best of our knowledge, cheap control games, where the time realization of the state-feedback optimal control with the small cost remains bounded as this cost tends to zero, were considered only in a few works and only for specific problem settings. Thus in [13], a pursuit-evasion problem, modeled by a linear-quadratic zero-sum differential game with time-invariant four-dimensional dynamics and scalar controls of the players, was considered. In this game, the control cost of the pursuer was assumed to be small. Moreover, the integral part of the game’s cost function did not contain the state cost. By a linear state transformation, this cheap control game was converted to a scalar linear-quadratic cheap control game. In this scalar game, the time realization of the optimal state-feedback pursuer’s control against a bang–bang evader’s control was analyzed. Sufficient conditions for the boundedness of this time realization for all sufficiently small values of the pursuer’s control cost were derived. In [14], a similar problem was solved in the case where the control costs of both the pursuer and evader were small and had the same order of smallness. In [11], a more general pursuit-evasion problem was studied. This problem was modeled by a linear-quadratic zero-sum differential game with time-dependent six-dimensional dynamics. The controls of both the pursuer and evader were scalar. The costs of these controls were small and had the same order of smallness. The state cost was absent in the integral part of the game’s cost function. This game also allowed a transformation to a scalar linear-quadratic cheap control game. In this scalar game, the time realization of the optimal state-feedback pursuer’s control against an open-loop bounded evader’s control was analyzed. Sufficient conditions, guaranteeing that the time realization satisfied given constraints for all sufficiently small values of the controls’ costs, were obtained. In [12], a robust tracking problem, modeled by a linear-quadratic zero-sum differential game with time-dependent n-dimensional ( n 1 ) dynamics, was analyzed. The controls of both minimizing and maximizing players were vector-valued. The costs of these controls were small and had the same order of smallness. For this game, the limit behaviour of the state-dependent part of the cost function, generated by the optimal state-feedback control of the minimizing player (the minimizer) and any L 2 -bounded open-loop control of the maximizing player (the maximizer), was studied. Sufficient conditions, providing the tendency to zero of this part of the cost function as the small controls’ costs approached zero (the exact tracking), were derived. 
Subject to these conditions, necessary conditions for the boundedness of the time realization of the optimal state-feedback minimizer’s control for all sufficiently small values of the controls’ costs were obtained.
In the present work, we studied a much more general cheap control linear-quadratic zero-sum differential game than those in [11,13,14]. For this game, an asymptotic analysis of its solution was carried out in the case where the minimizer's small control cost tended to zero. In particular, the asymptotic behavior of the time realizations of both players' optimal state-feedback controls along the corresponding (optimal) trajectory of the game was analyzed. The boundedness of these time realizations was established for all sufficiently small values of the minimizer's control cost. Moreover, in contrast to the results of [12], the conditions for this boundedness were sufficient and were not restricted by any other specific requirements, such as the exact tracking in [12].
Also in the present work, we considered one more linear-quadratic zero-sum differential game. This game was obtained from the original cheap control game by replacing the small control cost of the minimizer with zero. This new game was called a degenerate game and was similar to the continuous/discrete time system obtained from a singularly perturbed system by replacing a small parameter of singular perturbation with zero. The relation between the original cheap control game and the degenerate game was established.
This paper is organised as follows. In Section 2, the problems of the paper (the cheap control differential game and the degenerate differential game) are rigorously formulated, main definitions and some preliminary results are presented and the objectives of the paper are stated. In Section 3, the solution of the cheap control differential game is obtained and the asymptotic analysis of this solution is carried out. Section 4 is devoted to deriving the solution of the degenerate differential game. In addition, some relations between the solutions of the cheap control differential game and the degenerate differential game are established in this section. In Section 5, based on the theoretical results of the paper, an interception problem in 3D space is studied. Conclusions of the paper are presented in Section 6.

2. Preliminaries and Problem Statement

Consider the controlled system
x ˙ = A ( t ) x + B ( t ) u + C ( t ) v , x ( t 0 ) = x 0 , t [ t 0 , t f ] ,
where x R n , u R r and v R s are the state, the pursuer’s control and the evader’s control, respectively; t 0 is an initial time moment; t f is a final time moment; the matrix-valued functions A ( t ) , B ( t ) and C ( t ) of appropriate dimensions are continuous for t [ t 0 , t f ] . The controls u ( t ) and v ( t ) are assumed to be measurable bounded functions for t [ t 0 , t f ] .
The target set is a linear manifold
T_x = \{ x \in R^n : \; D x + d = 0 \},
where D is a prescribed m × n -matrix ( m < n ) and d ∈ R^m is a prescribed vector. The objective of the pursuer is to steer the system onto the target set at t = t_f , whereas the evader desires to avoid hitting the target set; to this end, the players exploit feedback strategies u ( t , x ) and v ( t , x ) , respectively.
Let us consider the set U x of all functions u = u ( t , x ) : [ 0 , t f ] × R n R r , which are measurable w.r.t. t [ 0 , t f ] for any fixed x R n and satisfy the local Lipschitz condition w.r.t. x R n uniformly in t [ 0 , t f ] . Similarly, we consider the set V x of all functions v = v ( t , x ) : [ 0 , t f ] × R n R s , which are measurable w.r.t. t [ 0 , t f ] for any fixed x R n and satisfy the local Lipschitz condition w.r.t. x R n uniformly in t [ 0 , t f ] .
Definition 1.
Let us denote by U x the set of all functions u ( t , x ) U x satisfying the following conditions: ( 1 u x ) the initial-value problem (1) for u ( t ) = u ( t , x ) and any fixed v ( t ) L 2 [ 0 , t f ] , R s has the unique absolutely continuous solution x u ( t ) , t [ 0 , t f ] ; ( 2 u x ) u t , x u ( t ) L 2 [ 0 , t f ] , R r .
Also, let us denote by V x the set of all functions v ( t , x ) V x satisfying the following conditions: ( 1 v x ) the initial-value problem (1) for v ( t ) = v ( t , x ) and any fixed u ( t ) L 2 [ 0 , t f ] , R r has the unique absolutely continuous solution x v ( t ) , t [ 0 , t f ] ; ( 2 v x ) v t , x v ( t ) L 2 [ 0 , t f ] , R s .
In what follows, the set U x is called the set of all admissible state-feedback controls (strategies) of the pursuer, while the set V x is called the set of all admissible state-feedback controls (strategies) of the evader.
Below, two differential games modeling this conflict situation are formulated.

2.1. Cheap Control Differential Game

The first is the Cheap Control Differential Game (CCDG) with the dynamics (1) and the cost function
\tilde{J}_{\alpha\beta}(u,v) = |D x(t_f) + d|^2 + \alpha \int_{t_0}^{t_f} |u(t)|^2 dt - \beta \int_{t_0}^{t_f} |v(t)|^2 dt,
where | x | denotes the Euclidean norm of the vector x; α , β > 0 are the penalty coefficients for the players’ control expenditure, and α is assumed to be small. The objectives of the pursuer and the evader were to minimize and to maximize the cost function (3) by u ( · ) U x and v ( · ) V x , respectively.
The CCDG (1), (3) is a zero-sum linear-quadratic differential game (see, e.g., [18,19,20,21,22]).
Definition 2.
Let u ( t , x ) , ( t , x ) [ t 0 , t f ] × R n , be any given admissible pursuer strategy, i.e., u ( · ) U x . Then, the value
J ˜ α β u ( u ( · ) ; t 0 , x 0 ) = sup v ( t ) L 2 [ t 0 , t f ] , R s J ˜ α β u ( · ) , v ( t ) ,
calculated along the corresponding trajectories of the system (1), is called the guaranteed result of the strategy u ( · ) in the CCDG.
The value
J ˜ α β u * ( t 0 , x 0 ) = inf u ( · ) U x J ˜ α β u ( u ( · ) ; t 0 , x 0 )
is called the upper value of the CCDG.
If the infimum value (5) is attained for u ˜ α β 0 ( t , x ) U x , i.e.,
inf u ( · ) U x J ˜ α β u ( u ( · ) ; t 0 , x 0 ) = min u ( · ) U x J ˜ α β u ( u ( · ) ; t 0 , x 0 )
and
u ˜ α β 0 ( t , x ) = arg min u ( · ) U x J ˜ α β u ( u ( · ) ; t 0 , x 0 ) ,
the strategy u ˜ α β 0 ( t , x ) is called the optimal strategy of the pursuer in the CCDG.
Definition 3.
Let v ( t , x ) , ( t , x ) [ t 0 , t f ] × R n , be any given admissible evader strategy, i.e., v ( · ) V x . Then, the value
J ˜ α β v ( v ( · ) ; t 0 , x 0 ) = inf u ( t ) L 2 [ t 0 , t f ] , R r J ˜ α β u ( t ) , v ( · ) ,
calculated along the corresponding trajectories of the system (1), is called the guaranteed result of the strategy v ( · ) in the CCDG.
The value
J ˜ α β v * ( t 0 , x 0 ) = sup v ( · ) V x J ˜ α β v ( v ( · ) ; t 0 , x 0 )
is called the lower value of the CCDG.
If the supremum value (8) is attained for v ˜ α β 0 ( t , x ) V x , i.e.,
sup v ( · ) V x J ˜ α β v ( v ( · ) ; t 0 , x 0 ) = max v ( · ) V x J ˜ α β v ( v ( · ) ; t 0 , x 0 )
and
v ˜ α β 0 ( t , x ) = arg max v ( · ) V x J ˜ α β v ( v ( · ) ; t 0 , x 0 ) ,
the strategy v ˜ α β 0 ( t , x ) is called the optimal strategy of the evader in the CCDG.
Definition 4.
If
\tilde{J}_{\alpha\beta}^{u*}(t_0, x_0) = \tilde{J}_{\alpha\beta}^{v*}(t_0, x_0) \triangleq \tilde{J}_{\alpha\beta}^{0}(t_0, x_0),
then it is said that the CCDG has the game value J ˜ α β 0 .

2.2. Singular (Degenerate) Differential Game

In this game the dynamics were the same as in the CCDG, i.e., (1), while the cost function of this game was obtained from (3) by replacing α with zero:
\tilde{J}_{\beta}(u,v) = |D x(t_f) + d|^2 - \beta \int_{t_0}^{t_f} |v(t)|^2 dt.
The differential game (1), (11) is called the Singular Differential Game (SDG).
Remark 1.
The sets of all admissible state-feedback controls (strategies) of the pursuer and the evader in the SDG are the same as in the CCDG, i.e., U x and V x , respectively. The guaranteed results J ˜ β u ( u ( · ) ; t 0 , x 0 ) and J ˜ β v ( v ( · ) ; t 0 , x 0 ) of any given strategies u ( · ) U x and v ( · ) V x in the SDG are defined similarly to (4) and (7), respectively. Namely,
J ˜ β u ( u ( · ) ; t 0 , x 0 ) = sup v ( t ) L 2 [ t 0 , t f ] , R s J ˜ β u ( · ) , v ( t ) ,
J ˜ β v ( v ( · ) ; t 0 , x 0 ) = inf u ( t ) L 2 [ t 0 , t f ] , R r J ˜ β u ( t ) , v ( · ) .
The upper J ˜ β u * ( t 0 , x 0 ) and lower J ˜ β v * ( t 0 , x 0 ) values of the SDG are defined similarly to (5) and (8), respectively. Namely,
J ˜ β u * ( t 0 , x 0 ) = inf u ( · ) U x J ˜ β u ( u ( · ) ; t 0 , x 0 ) ,
J ˜ β v * ( t 0 , x 0 ) = sup v ( · ) V x J ˜ β v ( v ( · ) ; t 0 , x 0 ) .
If
\tilde{J}_{\beta}^{u*}(t_0, x_0) = \tilde{J}_{\beta}^{v*}(t_0, x_0) \triangleq \tilde{J}_{\beta}^{*}(t_0, x_0),
then J ˜ β * ( t 0 , x 0 ) is called the value of the SDG.
Definition 5.
The sequence of state-feedback controls { u ˜ β , k ( · ) } , u ˜ β , k ( · ) U x , ( k = 1 , 2 , . . . ) , is called minimizing in the SDG if
lim k J ˜ β u ( u ˜ β , k ( · ) ; t 0 , x 0 ) = J ˜ β u * ( t 0 , x 0 ) .
If there exists u ˜ β * ( t , x ) U x , for which the upper value of the SDG is attained, this state-feedback control is called an optimal state-feedback control of the pursuer in the SDG:
u ˜ β * ( t , x ) = arg min u ( · ) U x J ˜ β u ( u ( · ) ; t 0 , x 0 ) .
Definition 6.
The sequence of state-feedback controls { v ˜ β , k ( · ) } , v ˜ β , k ( · ) V x , ( k = 1 , 2 , . . . ) , is called maximizing in the SDG if
lim k J ˜ β v ( v ˜ β , k ( · ) ; t 0 , x 0 ) = J ˜ β v * ( t 0 , x 0 ) .
If there exists v ˜ β * ( t , x ) V x , for which the lower value of the SDG is attained, this state-feedback control is called an optimal state-feedback control of the evader in the SDG:
v ˜ β * ( t , x ) = arg max v ( · ) V x J ˜ β v ( v ( · ) ; t 0 , x 0 ) .
Remark 2.
Since the cost function (11) of the SDG does not contain a quadratic control cost of u, its solution (if it exists) cannot be obtained either by the Isaacs’s MinMax principle or by the Bellman–Isaacs equation method (see [23]). This justified calling this game singular. The CCDG could be considered as a singularly perturbed SDG, whereas the SDG was a degenerate CCDG.

2.3. Reduction of the Games

Let Φ ( t , τ ) be the transition matrix of the homogeneous system x ˙ = A ( t ) x . By applying the state transformation
z = D Φ ( t f , t ) x + d ,
the system (1) is reduced to
z ˙ = H 1 ( t ) u + H 2 ( t ) v , z ( t 0 ) = z 0 , t [ t 0 , t f ] ,
where m × r and m × s matrices H 1 ( t ) and H 2 ( t ) are
H 1 ( t ) = D Φ ( t f , t ) B ( t ) , H 2 ( t ) = D Φ ( t f , t ) C ( t ) ,
z 0 = D Φ ( t f , t 0 ) x 0 + d .
Due to (21), for the reduced system (22), the cost functions (3) and (11) of the CCDG and SDG become
J_{\alpha\beta} = |z(t_f)|^2 + \alpha \int_{t_0}^{t_f} |u(t)|^2 dt - \beta \int_{t_0}^{t_f} |v(t)|^2 dt,
and
J_{\beta} = |z(t_f)|^2 - \beta \int_{t_0}^{t_f} |v(t)|^2 dt,
respectively.
The games (22), (25) and (22), (26) are called the Reduced Cheap Control Differential Game (RCCDG) and the Reduced Singular Differential Game (RSDG), respectively.
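The reduction (21)–(24) is purely computational once A(t), B(t), C(t), D and d are given. The following minimal numerical sketch (assuming NumPy/SciPy; all function names are illustrative and not from the paper) evaluates the transition matrix Φ(t_f, t), the reduced matrices H_1(t), H_2(t) and the reduced initial state z_0:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the reduction (21)-(24).  A, B, C are assumed to be callables
# returning the matrices A(t), B(t), C(t); D, d, x0 are NumPy arrays.

def transition_to_tf(A, t, t_f, n):
    """Phi(t_f, t): integrate d(Psi)/ds = -Psi A(s) backwards from s = t_f
    (where Psi = I) down to s = t."""
    rhs = lambda s, p: (-p.reshape(n, n) @ A(s)).ravel()
    sol = solve_ivp(rhs, (t_f, t), np.eye(n).ravel(), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(n, n)

def reduced_game_data(A, B, C, D, d, x0, t0, t_f):
    n = x0.size
    H1 = lambda t: D @ transition_to_tf(A, t, t_f, n) @ B(t)   # (23)
    H2 = lambda t: D @ transition_to_tf(A, t, t_f, n) @ C(t)   # (23)
    z0 = D @ transition_to_tf(A, t0, t_f, n) @ x0 + d          # (24)
    return H1, H2, z0
```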
Let us consider the set U z of all functions u = u ( t , z ) : [ 0 , t f ] × R m R r , which are measurable w.r.t. t [ 0 , t f ] for any fixed z R m and satisfy the local Lipschitz condition w.r.t. z R m uniformly in t [ 0 , t f ] . Similarly, we consider the set V z of all functions v = v ( t , z ) : [ 0 , t f ] × R m R s , which are measurable w.r.t. t [ 0 , t f ] for any fixed z R m and satisfy the local Lipschitz condition w.r.t. z R m uniformly in t [ 0 , t f ] .
Definition 7.
Let us denote by U z the set of all functions u ( t , z ) U z satisfying the following conditions: ( 1 u z ) the initial-value problem (22) for u ( t ) = u ( t , z ) and any fixed v ( t ) L 2 [ 0 , t f ] , R s has the unique absolutely continuous solution z u ( t ) , t [ 0 , t f ] ; ( 2 u z ) u t , z u ( t ) L 2 [ 0 , t f ] , R r .
In addition, let us denote by V z the set of all functions v ( t , z ) V z satisfying the following conditions: ( 1 v z ) the initial-value problem (22) for v ( t ) = v ( t , z ) and any fixed u ( t ) L 2 [ 0 , t f ] , R r has the unique absolutely continuous solution z v ( t ) , t [ 0 , t f ] ; ( 2 v x ) v t , z v ( t ) L 2 [ 0 , t f ] , R s .
In what follows, the set U z is called the set of all admissible state-feedback controls (strategies) of the pursuer in both games RCCDG and RSDG, while the set V z is called the set of all admissible state-feedback controls (strategies) of the evader in both games RCCDG and RSDG.
Remark 3.
Based on Definition 7, the guaranteed results J α β u ( u ( · ) ; t 0 , z 0 ) and J α β v ( v ( · ) ; t 0 , z 0 ) of any given strategies u ( · ) U z and v ( · ) V z in the RCCDG are defined similarly to (4) and (7), respectively. The upper J α β u * ( t 0 , z 0 ) and lower J α β v * ( t 0 , z 0 ) values of the RCCDG are defined similarly to (5) and (8), respectively. The optimal state-feedback controls of the pursuer u α β 0 ( t , z ) and the evader v α β 0 ( t , z ) , ( t , z ) [ 0 , t f ] × R m , are defined similarly to (6) and (9), respectively. The value of the RCCDG J α β 0 ( t 0 , z 0 ) is defined similarly to (10).
Remark 4.
Based on Definition 7, the guaranteed results J β u ( u ( · ) ; t 0 , z 0 ) and J β v ( v ( · ) ; t 0 , z 0 ) of any given strategies u ( · ) U z and v ( · ) V z in the RSDG are defined similarly to (12) and (13), respectively. The upper J β u * ( t 0 , z 0 ) and lower J β v * ( t 0 , z 0 ) values of the RSDG are defined similarly to (14) and (15), respectively. The minimizing sequence { u β , k ( · ) } , u β , k ( · ) U z , ( k = 1 , 2 , . . . ) , and the optimal state-feedback control u β * ( t , z ) of the pursuer in the RSDG are defined similarly to (17) and (18), respectively. The maximizing sequence { v β , k ( · ) } , v β , k ( · ) V z , ( k = 1 , 2 , . . . ) , and the optimal state-feedback control v β * ( t , z ) of the evader in the RSDG are defined similarly to (19) and (20), respectively. The value of the RSDG J β * ( t 0 , z 0 ) is defined similarly to (16).
Remark 5.
If u α β 0 ( t , z ) and v α β 0 ( t , z ) are the optimal strategies of the pursuer and the evader in the RCCDG, then the strategies
u α β 0 t , D Φ ( t f , t ) x + d and v α β 0 t , D Φ ( t f , t ) x + d ,
are optimal strategies of the pursuer and the evader in the CCDG.
If { u β , k ( t , z ) } k = 1 + and { v β , k ( t , z ) } k = 1 + are the minimizing sequence and the maximizing sequence in the RSDG, then the sequences
u β , k t , D Φ ( t f , t ) x + d k = 1 + and v β , k t , D Φ ( t f , t ) x + d k = 1 +
are minimizing and maximizing sequences in the SDG. Moreover, if u β * ( t , z ) and v β * ( t , z ) are the optimal strategies of the pursuer and the evader in the RSDG, then the strategies
u β * t , D Φ ( t f , t ) x + d and v β * t , D Φ ( t f , t ) x + d ,
are optimal strategies of the pursuer and the evader in the SDG.

2.4. Objectives of the Paper

In this paper, we investigated the asymptotic behaviour of the solution to the RCCDG and the relation between the RCCDG and the RSDG solutions. In particular, the objectives of the paper were:
(1)
to establish the boundedness of the time realizations u α β 0 ( t ) = u α β 0 t , z α β 0 ( t ) , v α β 0 ( t ) = v α β 0 t , z α β 0 ( t ) of the RCCDG optimal strategies along the corresponding trajectory z α β 0 ( t ) of (22) for α 0 ;
(2)
to establish the best achievable RCCDG value from the pursuer’s point of view:
J best 0 ( t 0 , z 0 ) = inf α ( 0 , α 0 ] J α β 0 ( t 0 , z 0 ) ,
where α 0 > 0 is some sufficiently small number;
(3)
to obtain the RSDG value, and establish the limiting relation between the values of the RCCDG and the RSDG:
lim α 0 J α β 0 ( t 0 , z 0 ) = J β * ( t 0 , z 0 ) ;
(4)
to construct the RSDG pursuer’s minimizing sequence u β , k ( · ) k = 1 + and the evader’s optimal state-feedback control v β * ( · ) based on the RCCDG solution.

3. The RCCDG Solution and Its Asymptotic Properties

By virtue of [19,20,21,22], we obtained the RCCDG solution:
J α β 0 ( t 0 , z 0 ) = z 0 T R α β ( t 0 ) z 0 ,
u_{\alpha\beta}^{0}(t,z) = -\frac{1}{\alpha} H_1^T(t) R_{\alpha\beta}(t) z,
v α β 0 ( t , z ) = 1 β H 2 T ( t ) R α β ( t ) z ,
where the matrix-valued function R α β ( t ) is the solution of the Riccati matrix differential equation
R ˙ = R Q α β ( t ) R , R ( t f ) = I m , t [ t 0 , t f ] ,
Q_{\alpha\beta}(t) = \frac{1}{\alpha} H_1(t) H_1^T(t) - \frac{1}{\beta} H_2(t) H_2^T(t),
H T denotes a transposed matrix and I m is the unit m × m -matrix.
The solution of (35) is readily obtained:
R α β ( t ) = S α β 1 ( t ) , t [ t 0 , t f ] ,
if and only if the matrix
S_{\alpha\beta}(t) = I_m + \int_{t}^{t_f} Q_{\alpha\beta}(\tau) d\tau
is invertible for all t [ t 0 , t f ] .
Thus, the RCCDG is solvable if and only if
\det S_{\alpha\beta}(t) \neq 0, \quad t \in [t_0, t_f].
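Since R_{αβ}(t) is given in the closed form (37) and (38), the game value (32) and the feedback strategies (33)–(34) can be evaluated by quadrature and a matrix inversion. A minimal numerical sketch (assuming SciPy; the entrywise quadrature and all names are illustrative):

```python
import numpy as np
from scipy.integrate import quad

# Sketch of the closed-form RCCDG solution (32)-(38).  H1, H2 are callables
# returning m x r and m x s matrices; z0 is the reduced initial state.
# The invertibility condition (39) is assumed to hold on [t0, t_f].

def S_ab(H1, H2, t, t_f, alpha, beta, m):
    Q = lambda tau: H1(tau) @ H1(tau).T / alpha - H2(tau) @ H2(tau).T / beta  # (36)
    integral = np.array([[quad(lambda tau: Q(tau)[i, j], t, t_f)[0]
                          for j in range(m)] for i in range(m)])
    return np.eye(m) + integral                                               # (38)

def rccdg_solution(H1, H2, t0, t_f, alpha, beta, z0):
    m = z0.size
    R = lambda t: np.linalg.inv(S_ab(H1, H2, t, t_f, alpha, beta, m))         # (37)
    value = z0 @ R(t0) @ z0                                                   # (32)
    u_opt = lambda t, z: -(1.0 / alpha) * H1(t).T @ R(t) @ z                  # (33)
    v_opt = lambda t, z:  (1.0 / beta)  * H2(t).T @ R(t) @ z                  # (34)
    return value, u_opt, v_opt
```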
Condition S. The system (22) is controllable with respect to u ( t ) at any interval [ t , t f ] , t [ t 0 , t f ) .
Remark 6.
By using the t-dependent controllability gramians
G_1(t) = \int_{t}^{t_f} H_1(\tau) H_1^T(\tau) d\tau, \quad t \in [t_0, t_f),
Condition S can be rewritten [18] as
det G 1 ( t ) > 0 , t [ t 0 , t f ) .
The following statement is a direct consequence of (Theorem 3.1 [24]).
Proposition 1.
Let Condition S hold. Then, for any β > 0 there exists α ˜ = α ˜ ( β ) such that the condition (39) holds for all α > 0 satisfying
\alpha \leq \tilde{\alpha}.
Let z α β 0 ( t ) denote the optimal motion of (22) for u = u α β 0 ( t , z ) , v = v α β 0 ( t , z ) .
Proposition 2.
Let Condition S hold. Then, there exists the bounded limit function
z ˜ ( t ) = lim α 0 z α β 0 ( t ) , t [ t 0 , t f ] ,
which is independent of β. Moreover
lim α 0 z α β 0 ( t f ) = z ˜ ( t f ) = 0 .
Proof. 
Let α > 0 satisfy (42). By substituting the optimal strategies (33) and (34) into the system (22), due to (36), (37) and (38), the dynamics become
\dot{z} = -Q_{\alpha\beta}(t) R_{\alpha\beta}(t) z.
Define
y \triangleq R_{\alpha\beta}(t) z = \Big[ I_m + \int_{t}^{t_f} Q_{\alpha\beta}(\tau) d\tau \Big]^{-1} z.
Then,
\dot{y} = \Big[ I_m + \int_{t}^{t_f} Q_{\alpha\beta}(\tau) d\tau \Big]^{-1} Q_{\alpha\beta}(t) \Big[ I_m + \int_{t}^{t_f} Q_{\alpha\beta}(\tau) d\tau \Big]^{-1} z -
\Big[ I_m + \int_{t}^{t_f} Q_{\alpha\beta}(\tau) d\tau \Big]^{-1} Q_{\alpha\beta}(t) \Big[ I_m + \int_{t}^{t_f} Q_{\alpha\beta}(\tau) d\tau \Big]^{-1} z = 0,
yielding
y ( t ) c = const , t [ t 0 , t f ] .
For t = t 0 ,
y(t_0) = c = \Big[ I_m + \int_{t_0}^{t_f} Q_{\alpha\beta}(\tau) d\tau \Big]^{-1} z_0.
Thus, due to (46) and (48), the solution z α β 0 ( t ) of (45) is
z_{\alpha\beta}^{0}(t) = \Big[ I_m + \int_{t}^{t_f} Q_{\alpha\beta}(\tau) d\tau \Big] \Big[ I_m + \int_{t_0}^{t_f} Q_{\alpha\beta}(\tau) d\tau \Big]^{-1} z_0.
Due to (36) and (40),
z_{\alpha\beta}^{0}(t) = \Big[ I_m + \frac{1}{\alpha} G_1(t) - \frac{1}{\beta} \int_{t}^{t_f} H_2(\tau) H_2^T(\tau) d\tau \Big] \Big[ I_m + \frac{1}{\alpha} G_1(t_0) - \frac{1}{\beta} \int_{t_0}^{t_f} H_2(\tau) H_2^T(\tau) d\tau \Big]^{-1} z_0.
By factoring 1 α out of both matrices, (51) becomes
z_{\alpha\beta}^{0}(t) = \Big[ \alpha I_m - \frac{\alpha}{\beta} \int_{t}^{t_f} H_2(\tau) H_2^T(\tau) d\tau + G_1(t) \Big] \Big[ \alpha I_m - \frac{\alpha}{\beta} \int_{t_0}^{t_f} H_2(\tau) H_2^T(\tau) d\tau + G_1(t_0) \Big]^{-1} z_0.
Since the gramian G 1 ( t 0 ) is non-singular, the limit (43) is readily calculated for t [ t 0 , t f ] :
\lim_{\alpha \to 0} z_{\alpha\beta}^{0}(t) = G_1(t) G_1^{-1}(t_0) z_0 \triangleq \tilde{z}(t).
For t = t f , (51) is
z_{\alpha\beta}^{0}(t_f) = \alpha \Big[ \alpha I_m - \frac{\alpha}{\beta} \int_{t_0}^{t_f} H_2(\tau) H_2^T(\tau) d\tau + G_1(t_0) \Big]^{-1} z_0,
and
lim α 0 z α β 0 ( t f ) = 0 .
Since G 1 ( t f ) = 0 , (53) yields
z ˜ ( t f ) = 0 .
Equations (55) and (56) prove (44). This completes the proof of the proposition. □
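The closed-form expressions (52) and (53) can be checked numerically: evaluating z^0_{αβ}(t) for a decreasing sequence of α makes the convergence to z̃(t) visible. A minimal sketch (assuming SciPy; names are illustrative):

```python
import numpy as np
from scipy.integrate import quad

# Numerical check of (52)-(53): as alpha -> 0 the optimal trajectory
# z^0_ab(t) approaches z~(t) = G1(t) G1^{-1}(t0) z0.  H1, H2 are callables.

def mat_int(F, a, b, m):
    """Entrywise quadrature of the m x m matrix-valued function F over [a, b]."""
    return np.array([[quad(lambda s: F(s)[i, j], a, b)[0]
                      for j in range(m)] for i in range(m)])

def z_opt(t, t0, t_f, alpha, beta, H1, H2, z0):
    m = z0.size
    G1 = lambda s: mat_int(lambda tau: H1(tau) @ H1(tau).T, s, t_f, m)
    W  = lambda s: mat_int(lambda tau: H2(tau) @ H2(tau).T, s, t_f, m)
    M  = lambda s: alpha * np.eye(m) - (alpha / beta) * W(s) + G1(s)   # bracket in (52)
    return M(t) @ np.linalg.solve(M(t0), z0)                           # (52)

def z_lim(t, t0, t_f, H1, z0):
    m = z0.size
    G1 = lambda s: mat_int(lambda tau: H1(tau) @ H1(tau).T, s, t_f, m)
    return G1(t) @ np.linalg.solve(G1(t0), z0)                         # (53)
```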
Proposition 3.
Let Condition S hold. Then the time realizations u α β 0 ( t ) = u α β 0 ( t , z α β 0 ( t ) ) , v α β 0 ( t ) = v α β 0 ( t , z α β 0 ( t ) ) of the optimal strategies (33)–(34) are bounded for α 0 .
Proof. 
By substituting (50) into (33), by using (36) and (40), and by factoring 1 α out of the matrix, the time realization of the RCCDG optimal minimizer’s strategy is
u_{\alpha\beta}^{0}(t) = -H_1^T(t) \Big[ \alpha I_m + G_1(t_0) - \frac{\alpha}{\beta} \int_{t_0}^{t_f} H_2(\tau) H_2^T(\tau) d\tau \Big]^{-1} z_0.
Thus, for any β > 0 , there exists the bounded limit function
\lim_{\alpha \to 0} u_{\alpha\beta}^{0}(t) = -H_1^T(t) G_1^{-1}(t_0) z_0 \triangleq \tilde{u}(t), \quad t \in [t_0, t_f].
Similarly, the time realization of the RCCDG optimal maximizer's strategy is
v_{\alpha\beta}^{0}(t) = \frac{\alpha}{\beta} H_2^T(t) \Big[ \alpha I_m + G_1(t_0) - \frac{\alpha}{\beta} \int_{t_0}^{t_f} H_2(\tau) H_2^T(\tau) d\tau \Big]^{-1} z_0,
yielding
\lim_{\alpha \to 0} v_{\alpha\beta}^{0}(t) = 0 \triangleq \tilde{v}(t), \quad t \in [t_0, t_f].
Proposition 4.
Let Condition S hold. Then the feedback strategies (33) and (34) are well defined for α = 0 for all ( t , z ) [ t 0 , t f ) × R m .
Proof. 
Similarly to (57), by factoring 1 α from the gain of the strategy (33),
u_{\alpha\beta}^{0}(t,z) = -H_1^T(t) \Big[ \alpha I_m + G_1(t) - \frac{\alpha}{\beta} \int_{t}^{t_f} H_2(\tau) H_2^T(\tau) d\tau \Big]^{-1} z,
which is well defined for α = 0 , ( t , z ) [ t 0 , t f ) × R m :
\lim_{\alpha \to 0} u_{\alpha\beta}^{0}(t,z) = \tilde{K}(t) z \triangleq \tilde{u}(t,z),
where
\tilde{K}(t) = -H_1^T(t) G_1^{-1}(t).
Similarly to (59),
v_{\alpha\beta}^{0}(t,z) = \frac{\alpha}{\beta} H_2^T(t) \Big[ \alpha I_m + G_1(t) - \frac{\alpha}{\beta} \int_{t}^{t_f} H_2(\tau) H_2^T(\tau) d\tau \Big]^{-1} z,
yielding
\lim_{\alpha \to 0} v_{\alpha\beta}^{0}(t,z) = 0 \triangleq \tilde{v}(t,z),
for all ( t , z ) [ t 0 , t f ) × R m . □
Remark 7.
Due to (40), the gain (63) of the limit feedback u ˜ ( t , z ) is unbounded as t → t_f :
\lim_{t \to t_f} \| \tilde{K}(t) \| = +\infty,
where | | · | | is the Euclidean norm of a matrix.
Remark 8.
The limit motion z ˜ ( t ) given in (53) is generated by the limit feedback strategies u ˜ ( t , z ) and v ˜ ( t , z ) given in (62) and (65), respectively. Moreover, their time realizations along z ˜ ( t ) are equal to u ˜ ( t ) and v ˜ ( t ) given in (58) and (60), respectively:
u ˜ ( t , z ˜ ( t ) ) = u ˜ ( t ) , v ˜ ( t , z ˜ ( t ) ) = v ˜ ( t ) .
Proposition 5.
Let Condition S hold. Then for any β > 0 , the RCCDG game value satisfies
lim α 0 J α β 0 ( t 0 , z 0 ) = 0 .
Moreover, all the terms of the optimal cost function (25) tend to zero for α 0 :
lim α 0 | z α β 0 ( t f ) | 2 = 0 ,
lim α 0 α t 0 t f | u α β 0 ( t ) | 2 d t = 0 ,
lim α 0 β t 0 t f | v α β 0 ( t ) | 2 d t = 0 .
Proof. 
By factoring 1 α from the matrix R α β ( t ) ,
J_{\alpha\beta}^{0}(t_0, z_0) = \alpha z_0^T \Big[ \alpha I_m + G_1(t_0) - \frac{\alpha}{\beta} \int_{t_0}^{t_f} H_2(\tau) H_2^T(\tau) d\tau \Big]^{-1} z_0.
Since the matrix G 1 ( t 0 ) is non-singular, (72) directly leads to (68).
The limiting Equation (69) is the consequence of (55); (70) holds, because, due to Proposition 3, the limit time realization of the minimizer’s optimal strategy is bounded; (71) follows from (60). □
Corollary 1.
Let Condition S hold. Then,
J best 0 ( t 0 , z 0 ) = 0 .
Proof. 
First of all, let us note that, due to Remark 6, the matrix G 1 ( t 0 ) is positive definite. Therefore, using (72), we can conclude the following. There exists a positive number α 0 ≤ α ˜ such that, for all α ∈ ( 0 , α 0 ] ,
J_{\alpha\beta}^{0}(t_0, z_0) \geq 0.
This inequality, along with the equality (68), directly yields the statement of the corollary. □

4. RSDG Solution

Lemma 1.
Let Condition S hold. Then, there exists a positive number α 0 < α ˜ , such that for all α ( 0 , α 0 ] the guaranteed result J β u ( u α β 0 ( · ) ; t 0 , z 0 ) of the pursuer’s state-feedback control u α β 0 ( t , z ) in the RSDG satisfies the inequality
0 \leq J_{\beta}^{u}(u_{\alpha\beta}^{0}(\cdot); t_0, z_0) \leq a \alpha,
where a > 0 is some value independent of α.
Proof. 
First of all, let us remember that u α β 0 ( t , z ) is the optimal pursuer’s control in the RCCDG, and this control is given by Equation (33). Taking into account Remark 4 and Equation (26), the guaranteed result of this control in the RSDG is calculated as follows:
J_{\beta}^{u}(u_{\alpha\beta}^{0}(\cdot); t_0, z_0) = \sup_{v(t) \in L_2([t_0,t_f], R^s)} J_{\beta}\big(u_{\alpha\beta}^{0}(\cdot), v(\cdot)\big) = \sup_{v(t) \in L_2([t_0,t_f], R^s)} \Big\{ |z(t_f)|^2 - \beta \int_{t_0}^{t_f} |v(t)|^2 dt \Big\}
along trajectories of the system
z ˙ = H 1 ( t ) u α β 0 ( t , z ) + H 2 ( t ) v ( t ) , t [ t 0 , t f ] , z ( 0 ) = z 0 .
For any v ( t ) L 2 [ t 0 , t f ] , R s , we have the inequality
|z(t_f)|^2 - \beta \int_{t_0}^{t_f} |v(t)|^2 dt \leq |z(t_f)|^2 + \alpha \int_{t_0}^{t_f} |u_{\alpha\beta}^{0}(t,z)|^2 dt - \beta \int_{t_0}^{t_f} |v(t)|^2 dt
along trajectories of the system (77). Therefore,
0 \leq \sup_{v(t) \in L_2([t_0,t_f], R^s)} \Big\{ |z(t_f)|^2 - \beta \int_{t_0}^{t_f} |v(t)|^2 dt \Big\} \leq \sup_{v(t) \in L_2([t_0,t_f], R^s)} \Big\{ |z(t_f)|^2 + \alpha \int_{t_0}^{t_f} |u_{\alpha\beta}^{0}(t,z)|^2 dt - \beta \int_{t_0}^{t_f} |v(t)|^2 dt \Big\}.
Since u α β 0 ( t , z ) is the optimal state-feedback control in the RCCDG, then using the form of the cost function in this game (see Equation (25)) and the definition of the value in this game (see Remark 3), we directly have
\sup_{v(t) \in L_2([t_0,t_f], R^s)} \Big\{ |z(t_f)|^2 + \alpha \int_{t_0}^{t_f} |u_{\alpha\beta}^{0}(t,z)|^2 dt - \beta \int_{t_0}^{t_f} |v(t)|^2 dt \Big\} = J_{\alpha\beta}^{0}(t_0, z_0).
Remember that J α β 0 ( t 0 , z 0 ) is the RCCDG value given by Equation (32).
Further, using Equations (76), (80) and the inequality (79), we obtain immediately
0 \leq J_{\beta}^{u}(u_{\alpha\beta}^{0}(\cdot); t_0, z_0) \leq J_{\alpha\beta}^{0}(t_0, z_0).
Now, the statement of the lemma directly follows from Equation (72) and the inequality (81). □
Consider the following admissible state-feedback control of the maximizing player (the evader) in the RSDG:
\bar{v}^{0}(t,z) \equiv 0, \quad (t,z) \in [t_0, t_f] \times R^m.
Lemma 2.
Let Condition S hold. Then, the guaranteed result J β v ( v ¯ 0 ( · ) ; t 0 , z 0 ) of v ¯ 0 ( t , z ) in the RSDG is
J v ( v ¯ 0 ( · ) ; t 0 , z 0 ) = 0 .
Proof. 
Substituting v ( t ) = v ¯ 0 ( t , z ) into the system (22) and the cost function (26) yields the following system and cost function:
z ˙ = H 1 ( t ) u , z ( t 0 ) = z 0 , t [ t 0 , t f ] ,
J ¯ ( u ( · ) ) = J β ( u ( · ) , v ¯ 0 ( · ) ) = | z ( t f ) | 2 .
Therefore, J v ( v ¯ 0 ( · ) ; t 0 , z 0 ) is the infimum value with respect to u ( t ) L 2 [ t 0 , t f ] , R r of the cost function (85) along trajectories of the system (84), i.e.,
J v ( v ¯ 0 ( · ) ; t 0 , z 0 ) = inf u ( · ) L 2 [ t 0 , t f ] , R r J ¯ ( u ( · ) ) .
The optimal control problem (84) and (85) is singular (see, e.g., [3]), and the value (86) can be derived similarly to [3]. To do this, we first approximately replaced the singular problem (84) and (85) with the regular optimal control problem consisting of the system (84) and the new cost function
\bar{J}_{\alpha}(u(\cdot)) = |z(t_f)|^2 + \alpha \int_{t_0}^{t_f} |u(t)|^2 dt
to be minimized by u ( · ) L 2 [ t 0 , t f ] , R r along trajectories of the system (84). In (87), α > 0 is a small parameter of the regularization.
For any given α > 0 , the problem in (84), (87) is a linear-quadratic optimal control problem. By virtue of the results of [25], we directly have that the solution (the optimal control) of this problem is u ¯ α 0 ( t ) = − ( 1 / α ) H 1 T ( t ) R ¯ α ( t ) z ¯ α ( t ) , and the optimal value of its cost function has the form
J ¯ α 0 = J ¯ α ( u ¯ α 0 ( · ) ) = z 0 T R ¯ α ( t 0 ) z 0 ,
where the m × m -matrix-valued function R ¯ α ( t ) is the solution of the terminal-value problem
R ¯ ˙ α = 1 α R ¯ α H 1 ( t ) H 1 T ( t ) R ¯ α , t [ t 0 , t f ] , R ¯ α ( t f ) = I m ,
the vector-valued function z ¯ α ( t ) is the solution of the initial-value problem
\dot{\bar{z}}_{\alpha} = -\frac{1}{\alpha} H_1(t) H_1^T(t) \bar{R}_{\alpha}(t) \bar{z}_{\alpha}, \quad t \in [t_0, t_f], \quad \bar{z}_{\alpha}(t_0) = z_0.
Using Remark 6, we obtain the unique solution of the problem (89) as follows:
\bar{R}_{\alpha}(t) = \Big[ I_m + \frac{1}{\alpha} G_1(t) \Big]^{-1}, \quad t \in [t_0, t_f],
where the m × m -matrix-valued function G 1 ( t ) is given in Remark 6 (see (40) for t [ t 0 , t f ] ).
Substituting (91) into (88), we obtain after some rearrangement
\bar{J}_{\alpha}^{0} = \alpha z_0^T \big[ \alpha I_m + G_1(t_0) \big]^{-1} z_0,
yielding the following inequality for all sufficiently small α > 0 :
0 \leq \bar{J}_{\alpha}^{0} \leq c \alpha,
where c > 0 is some value independent of α .
Using Equation (88) and inequality (93), we obtain for all sufficiently small α > 0 :
0 \leq \inf_{u(\cdot) \in L_2([t_0,t_f], R^r)} \bar{J}(u(\cdot)) \leq \bar{J}(\bar{u}_{\alpha}^{0}(\cdot)) \leq \bar{J}_{\alpha}(\bar{u}_{\alpha}^{0}(\cdot)) = \bar{J}_{\alpha}^{0} \leq c \alpha,
yielding
0 \leq \inf_{u(\cdot) \in L_2([t_0,t_f], R^r)} \bar{J}(u(\cdot)) \leq c \alpha.
The latter implies immediately
inf u ( · ) L 2 [ t 0 , t f ] , R r J ¯ ( u ( · ) ) = 0
which, along with Equation (86), proves the statement of the lemma. □
Theorem 1.
Let Condition S hold. Then, the RSDG value J β * ( t 0 , z 0 ) exists and
J β * ( t 0 , z 0 ) = 0 .
Proof. 
Let J β u * ( t 0 , z 0 ) and J β v * ( t 0 , z 0 ) be the upper and lower values of the RSDG, respectively. Then, due to the definitions of these values (see Remark 4), we have
J_{\beta}^{u*}(t_0, z_0) \leq J_{\beta}^{u}(u_{\alpha\beta}^{0}(\cdot); t_0, z_0), \quad \alpha \in (0, \alpha_0],
J_{\beta}^{v}(\bar{v}^{0}(\cdot); t_0, z_0) \leq J_{\beta}^{v*}(t_0, z_0),
J_{\beta}^{v*}(t_0, z_0) \leq J_{\beta}^{u*}(t_0, z_0).
Now, using the equality (83) and the inequalities (75), (95)–(97) yield
0 = J_{\beta}^{v}(\bar{v}^{0}(\cdot); t_0, z_0) \leq J_{\beta}^{v*}(t_0, z_0) \leq J_{\beta}^{u*}(t_0, z_0) \leq J_{\beta}^{u}(u_{\alpha\beta}^{0}(\cdot); t_0, z_0) \leq a \alpha, \quad \alpha \in (0, \alpha_0].
The latter implies
0 \leq J_{\beta}^{v*}(t_0, z_0) \leq J_{\beta}^{u*}(t_0, z_0) \leq a \alpha, \quad \alpha \in (0, \alpha_0].
From (99), for α 0 , we directly have J β v * ( t 0 , z 0 ) = J β u * ( t 0 , z 0 ) = 0 , which proves the theorem. □
Corollary 2.
Let Condition S hold. Then,
J best 0 ( t 0 , z 0 ) = J β * ( t 0 , z 0 ) .
Proof. 
The statement of the corollary directly follows from Theorem 1 and Equation (73). □
Corollary 3.
Let Condition S hold. Then, the limit Equality (31) is valid.
Proof. 
The statement of the corollary is a direct consequence of Equations (68) and (94). □
By { α k } k = 1 + , we denote a sequence of numbers, satisfying the following conditions: (I) α k ( 0 , α 0 ] , ( k = 1 , 2 , . . . ) ; (II) lim k + α k = 0 .
Theorem 2.
Let Condition S hold. Then, the sequence of the pursuer’s state-feedback controls u α k β 0 ( t , z ) k = 1 + is the minimizing sequence in the RSDG. The state-feedback control v ¯ 0 ( t , z ) , given by (82), is the optimal evader’s strategy in the RSDG.
Proof. 
From the chain of the equality and the inequalities (98) we obtain
lim k + J β u ( u α k β 0 ( · ) ; t 0 , z 0 ) = J β u * ( t 0 , z 0 ) ,
meaning the validity of the first statement of the theorem.
Similarly, we have
J β v ( v ¯ 0 ( · ) ; t 0 , z 0 ) = J β v * ( t 0 , z 0 ) ,
which implies the validity of the second statement of the theorem. □
Remark 9.
It should be noted that the optimal evader’s strategy v ¯ 0 ( t , z ) in the RSDG coincides with the limit (as α 0 ) of the optimal evader’s strategy in the RCCDG for all ( t , z ) [ t 0 , t f ) × R m (see Proposition 4 and Equation (65)). Also, it should be noted that the limit (as k + ) of the minimizing sequence u α k β 0 ( t , z ) k = 1 + in the RSDG is u ¯ ( t , z ) for all ( t , z ) [ t 0 , t f ) × R m (see Proposition 4 and Equations (62) and (63)). However, the function u ¯ ( t , z ) does not belong to the set U z . Therefore, this function does not belong to the set U z , i.e., it is not an admissible pursuer’s state-feedback control in the RSDG.

5. Example: Interception Problem in Three-Dimensional Space

5.1. Engagement Model and Its Reduction

Consider the engagement in 3D space of two flying vehicles (the interceptor or the pursuer and the target or the evader), which has similar geometry to that considered in [26,27]. In contrast to [26,27], we assumed that both the pursuer and the evader have first-order dynamics controllers. Two mutually perpendicular control channels could have different time constants: τ p 1 , τ p 2 for the pursuer’s controller and τ e 1 , τ e 2 for the evader’s one.
The equations of motion were written down in the line-of-sight coordinate system where the axis X was the initial line-of-sight, the plane X Y was the collision plane determined by the initial line-of-sight and the target’s velocity vectors and the plane X Z was normal to X Y .
Let ( X p , Y p , Z p ) and ( X e , Y e , Z e ) be the coordinates of the interceptor (the pursuer) and the target (the evader), respectively. The relative separations in the Y and Z-directions were Y = Y p Y e and Z = Z p Z e . By linearization along the initial line-of-sight, the equations of motion were written down in the form (1) where the state vector was
x = ( Y , Y ˙ , Y ¨ p , Y ¨ e , Z , Z ˙ , Z ¨ p , Z ¨ e ) T ,
the players’ control vectors (lateral acceleration commands) were u = ( u 1 , u 2 ) T (for the pursuer) and v = ( v 1 , v 2 ) T (for the evader); the final time t f was the time of achieving the zero distance between the players along the axis X. The matrices in (1) were
A(t) \equiv \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1/\tau_{p1} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1/\tau_{e1} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1/\tau_{p2} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1/\tau_{e2} \end{pmatrix},
B(t) \equiv \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1/\tau_{p1} & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1/\tau_{p2} \\ 0 & 0 \end{pmatrix}, \quad C(t) \equiv \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1/\tau_{e1} & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1/\tau_{e2} \end{pmatrix}.
In the pursuit problem, the target set was x 1 = Y = 0 , x 5 = Z = 0 , meaning that in (2),
D = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}, \quad d = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
Thus, in this example, n = 8 , r = s = m = 2 .
The transition matrix of the homogeneous system was readily obtained as
\Phi(t_f, t) = \begin{pmatrix} \Phi_1(t_f, t, \tau_{p1}, \tau_{e1}) & O_4 \\ O_4 & \Phi_1(t_f, t, \tau_{p2}, \tau_{e2}) \end{pmatrix},
where O 4 is the zero 4 × 4 matrix,
\Phi_1(t_f, t, \tau_p, \tau_e) = \begin{pmatrix} 1 & t_f - t & h(t, \tau_p) & -h(t, \tau_e) \\ 0 & 1 & \tau_p \big(1 - e^{-\vartheta(t, \tau_p)}\big) & -\tau_e \big(1 - e^{-\vartheta(t, \tau_e)}\big) \\ 0 & 0 & e^{-\vartheta(t, \tau_p)} & 0 \\ 0 & 0 & 0 & e^{-\vartheta(t, \tau_e)} \end{pmatrix},
\vartheta(t, \tau) \triangleq \frac{t_f - t}{\tau},
h(t, \tau) \triangleq \tau^2 \big( e^{-\vartheta(t, \tau)} + \vartheta(t, \tau) - 1 \big).
Then, by applying the transformation (21) with D and d as in (106), the original system was reduced to the two-dimensional system of the form (22), where
H 1 ( t ) = h ( t , τ p 1 ) 0 0 h ( t , τ p 2 ) , H 2 ( t ) = h ( t , τ e 1 ) 0 0 h ( t , τ e 2 ) .
Explicitly, the system (22) became
\dot{z}_1 = h(t, \tau_{p1}) u_1 + h(t, \tau_{e1}) v_1, \quad z_1(t_0) = z_{01}, \quad t \in [t_0, t_f],
\dot{z}_2 = h(t, \tau_{p2}) u_2 + h(t, \tau_{e2}) v_2, \quad z_2(t_0) = z_{02}, \quad t \in [t_0, t_f].
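For reference, the following minimal sketch (assuming NumPy; the helper names are illustrative) evaluates the building blocks (109)–(111) of this reduced model:

```python
import numpy as np

# Building blocks of the example's reduced dynamics: theta (109), h (110) and
# the diagonal channel matrices H1, H2 (111).  t_f and the time constants are
# parameters of the engagement.

def theta(t, tau, t_f):                      # (109)
    return (t_f - t) / tau

def h(t, tau, t_f):                          # (110)
    th = theta(t, tau, t_f)
    return tau**2 * (np.exp(-th) + th - 1.0)

def H1(t, t_f, tau_p1, tau_p2):              # pursuer part of (111)
    return np.diag([h(t, tau_p1, t_f), h(t, tau_p2, t_f)])

def H2(t, t_f, tau_e1, tau_e2):              # evader part of (111)
    return np.diag([h(t, tau_e1, t_f), h(t, tau_e2, t_f)])
```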

5.2. Reduced Cheap Control Game

In this example, the RCCDG cost function (25) is
J_{\alpha\beta} = z_1^2(t_f) + z_2^2(t_f) + \alpha \int_{t_0}^{t_f} \big( u_1^2(t) + u_2^2(t) \big) dt - \beta \int_{t_0}^{t_f} \big( v_1^2(t) + v_2^2(t) \big) dt.
Due to (111), the gramian (40) is calculated as
G_1(t) = \begin{pmatrix} \int_{t}^{t_f} h^2(\eta, \tau_{p1}) d\eta & 0 \\ 0 & \int_{t}^{t_f} h^2(\eta, \tau_{p2}) d\eta \end{pmatrix},
and
\det G_1(t) = \int_{t}^{t_f} h^2(\eta, \tau_{p1}) d\eta \cdot \int_{t}^{t_f} h^2(\eta, \tau_{p2}) d\eta.
For all τ > 0 , we have that h ( t , τ ) > 0 , t [ t 0 , t f ) , and h ( t f , τ ) = 0 . Therefore, the condition (41), and, consequently, Condition S hold.
Due to the symmetry of the matrices (111), the matrix (37) is also symmetric:
R_{\alpha\beta}(t) = \begin{pmatrix} r_{\alpha\beta 1}(t) & 0 \\ 0 & r_{\alpha\beta 2}(t) \end{pmatrix},
where
r_{\alpha\beta i}(t) = \frac{1}{1 + \frac{1}{\alpha} \int_{t}^{t_f} h^2(\eta, \tau_{pi}) d\eta - \frac{1}{\beta} \int_{t}^{t_f} h^2(\eta, \tau_{ei}) d\eta}, \quad i = 1, 2.
Thus, the RCCDG is solvable if
1 + \frac{1}{\alpha} \int_{t}^{t_f} h^2(\eta, \tau_{pi}) d\eta - \frac{1}{\beta} \int_{t}^{t_f} h^2(\eta, \tau_{ei}) d\eta > 0, \quad t \in [t_0, t_f], \quad i = 1, 2.
Similarly to [24], it is proved that the solvability condition (118) yields the value α ˜ in (42) as
α ˜ = min { α ˜ 1 , α ˜ 2 } ,
where
\tilde{\alpha}_i = \tilde{\alpha}_i(\beta) = \begin{cases} \mu_i(\beta) \beta, & \beta < \int_{t_0}^{t_f} h^2(\eta, \tau_{ei}) d\eta, \\ +\infty, & \beta \geq \int_{t_0}^{t_f} h^2(\eta, \tau_{ei}) d\eta, \end{cases} \quad i = 1, 2,
\mu_i(\beta) = \frac{1}{\max_{t \in [t_0, \bar{t}_i]} F_i(t, \beta)}, \quad i = 1, 2,
F_i(t, \beta) = \frac{\int_{t}^{\bar{t}_i(\beta)} h^2(\eta, \tau_{ei}) d\eta}{\int_{t}^{t_f} h^2(\eta, \tau_{pi}) d\eta}, \quad i = 1, 2,
the moments t ¯ i ( β ) ( t 0 , t f ) , i = 1 , 2 , satisfy
\int_{\bar{t}_i(\beta)}^{t_f} h^2(\eta, \tau_{ei}) d\eta = \beta, \quad i = 1, 2.
By using (32)–(34) and (116), the solution of the game (112) and (113) is
J_{\alpha\beta}^{0}(t_0, z_0) = r_1(t_0) z_{01}^2 + r_2(t_0) z_{02}^2,
u_{\alpha\beta}^{0}(t,z) = -\frac{1}{\alpha} \big( h(t, \tau_{p1}) r_1(t) z_1, \; h(t, \tau_{p2}) r_2(t) z_2 \big)^T,
v_{\alpha\beta}^{0}(t,z) = \frac{1}{\beta} \big( h(t, \tau_{e1}) r_1(t) z_1, \; h(t, \tau_{e2}) r_2(t) z_2 \big)^T.
Let us consider the numerical example for t 0 = 0 s, t f = 3 s, β = 0.1 , τ p 1 = τ p 2 = 0.1 s, τ e 1 = 0.15 s, τ e 2 = 0.2 s. For these parameters,
\beta = 0.1 < \int_{t_0}^{t_f} h^2(\eta, \tau_{e1}) d\eta = \int_{0}^{3} h^2(\eta, 0.15) d\eta = 0.1737,
\beta = 0.1 < \int_{t_0}^{t_f} h^2(\eta, \tau_{e2}) d\eta = \int_{0}^{3} h^2(\eta, 0.2) d\eta = 0.293.
In this example, the moments, defined by (123), are t ¯ 1 = 0.4792 s, t ¯ 2 = 0.8443 s (see Figure 1).
In Figure 2, the functions F i ( t , β ) , given by (122), are shown for t [ t 0 , t ¯ i ] , i = 1 , 2 . It is seen that these functions were decreasing. Therefore,
\mu_1 = \frac{1}{F_1(0, \beta)} = 1.1035, \quad \mu_2 = \frac{1}{F_2(0, \beta)} = 0.4214.
Due to (119) and (120), α ˜ = β min { μ 1 , μ 2 } = 0.04214 .
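The quantities t̄_i, μ_i and α̃ above follow from (119)–(123) by one-dimensional quadrature and root finding. A minimal sketch for this parameter set (assuming SciPy; it should reproduce values close to those reported above):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# Computation of t_bar_i (123), F_i (122), mu_i (121) and alpha~ (119)-(120)
# for t0 = 0, tf = 3, beta = 0.1, tau_p1 = tau_p2 = 0.1, tau_e1 = 0.15, tau_e2 = 0.2.

t0, t_f, beta = 0.0, 3.0, 0.1
tau_p, tau_e = [0.1, 0.1], [0.15, 0.2]

def h(t, tau):                               # (110)
    th = (t_f - t) / tau
    return tau**2 * (np.exp(-th) + th - 1.0)

def tail(tau, a, b):                         # int_a^b h^2(eta, tau) d eta
    return quad(lambda eta: h(eta, tau)**2, a, b)[0]

# (123): t_bar_i solves int_{t_bar}^{tf} h^2(eta, tau_ei) d eta = beta
t_bar = [brentq(lambda t: tail(tau_e[i], t, t_f) - beta, t0, t_f - 1e-9)
         for i in range(2)]

def F(t, i):                                 # (122)
    return tail(tau_e[i], t, t_bar[i]) / tail(tau_p[i], t, t_f)

mu = [1.0 / max(F(t, i) for t in np.linspace(t0, t_bar[i], 300))   # (121)
      for i in range(2)]
alpha_tilde = beta * min(mu)                 # (119)-(120): beta is below both evader integrals here

print(t_bar, mu, alpha_tilde)
```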
In Figure 3 and Figure 4, the components of the optimal trajectories z α β 0 ( t ) are shown for decreasing values of α < α ˜ , along with the components of the corresponding limiting function z ˜ ( t ) . It is clearly seen that the optimal trajectories tended to z ˜ ( t ) for α 0 , and z α β 0 ( t f ) tended to zero.
The respective components of time realizations of the optimal strategies u α β 0 ( · ) and v α β 0 ( · ) , along with the components of the corresponding limiting functions u ˜ ( t ) and v ˜ ( t ) , are depicted in Figure 5, Figure 6, Figure 7 and Figure 8, respectively. It is seen that the time realizations of the optimal strategies tended to the corresponding limiting functions for α 0 , remaining bounded.
The game value J α β 0 ( t 0 , z 0 ) is depicted in Figure 9 as a function of α . It is seen that it tended to zero for α 0 .
The respective terminal and integral terms of the cost function are shown in Figure 10 and Figure 11, respectively. It is seen that all components of the optimal cost tended to zero for α 0 .
Remark 10.
From Equation (125), it was seen that the small control cost of the interceptor yielded a high gain in its optimal state-feedback control. This important feature considerably increased the ability of the interceptor to capture the target. One more important feature of the interceptor's optimal state-feedback control was that the time realization of this control along the optimal interception trajectory and, especially, the trajectory itself, remained bounded as the small parameter α tended to zero. Both aforementioned features of the interceptor's state-feedback control, obtained by solving the cheap control game, were extremely important in various real-life situations of the capture of a maneuverable flying target by a maneuverable flying interceptor. It should be noted that as the small control cost of the interceptor tended to zero, the ability of the interceptor to capture the target increased, tending to the best achievable result, which was the zero-miss distance at the end of the interception.

6. Conclusions

In this paper, a pursuit-evasion problem, modeled by a finite-horizon linear-quadratic zero-sum differential game, was considered. In the game's cost function, the penalty coefficient for the minimizing player's control expenditure was a small value α > 0 . Thus, the considered game was a zero-sum differential game with a cheap control of the minimizing player. By a proper state transformation, the initially formulated game was converted to a differential game of smaller Euclidean dimension, called the reduced game. This game was also a cheap control game, and it was treated in the sequel of the paper. Due to the game's solvability conditions, the solution of the reduced cheap control game was converted to the solution of the terminal-value problem for a matrix Riccati differential equation. A sufficient condition for the existence of the solution to this terminal-value problem on the entire interval of the game's duration was presented, and the solution of this terminal-value problem was obtained. Using this solution, the value of the reduced cheap control game, as well as the optimal state-feedback controls of the minimizing player (the pursuer) and the maximizing player (the evader), were derived. The trajectory of the game generated by the players' optimal state-feedback controls (the optimal trajectory) was obtained. The limits of the optimal trajectory, as well as of the time realizations of the players' optimal state-feedback controls along the optimal trajectory, for α → 0 were calculated. By this calculation, the boundedness of the optimal trajectory and the corresponding time realizations of the players' optimal state-feedback controls for α → 0 was shown. The limit of the game value for α → 0 was also calculated, yielding the best achievable game value from the pursuer's viewpoint.

Along with the cheap control game, its degenerate version was considered. This version was obtained from the cheap control game by formally setting α = 0 , yielding a new zero-sum linear-quadratic pursuit-evasion game. This new game was singular, because it could not be solved either by Isaacs's MinMax principle or by the Bellman–Isaacs equation method. For this singular game, the notion of the pursuer's minimizing sequence of state-feedback controls (instead of the pursuer's optimal state-feedback control) was proposed. It was established that the α -dependent pursuer's optimal state-feedback control in the cheap control game constituted the pursuer's minimizing sequence of state-feedback controls (as α → 0 ) in the singular game. It was shown that the limit of this minimizing sequence was not an admissible pursuer's state-feedback control in the singular game. However, the evader's optimal state-feedback control and the value of the singular game coincided with the limits (for α → 0 ) of the evader's optimal state-feedback control and the value, respectively, of the cheap control game.

Based on the theoretical results of the paper, the interception problem in 3D space, modeled by a zero-sum linear-quadratic game with eight-dimensional dynamics, was studied. Similarly to the theoretical part of the paper, the case of a small penalty coefficient α > 0 for the pursuer's (interceptor's) control expenditure in the cost function was considered. By a proper linear state transformation, the original cheap control game was reduced to a new cheap control game with two-dimensional dynamics. The asymptotic behaviour of the solution to this new game for α → 0 was analyzed.

Author Contributions

The authors contributed equally to this article. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bell, D.J.; Jacobson, D.H. Singular Optimal Control Problems; Academic Press: Cambridge, MA, USA, 1975.
2. Kurina, G.A. A degenerate optimal control problem and singular perturbations. Soviet Math. Dokl. 1977, 18, 1452–1456.
3. Glizer, V.Y. Stochastic singular optimal control problem with state delays: Regularization, singular perturbation, and minimizing sequence. SIAM J. Control Optim. 2012, 50, 2862–2888.
4. Shinar, J.; Glizer, V.Y.; Turetsky, V. Solution of a singular zero-sum linear-quadratic differential game by regularization. Int. Game Theory Rev. 2014, 16, 1–32.
5. Kwakernaak, H.; Sivan, R. The maximally achievable accuracy of linear optimal regulators and linear optimal filters. IEEE Trans. Autom. Control 1972, 17, 79–86.
6. Braslavsky, J.H.; Seron, M.M.; Mayne, D.Q.; Kokotović, P.V. Limiting performance of optimal linear filters. Automatica 1999, 35, 189–199.
7. Seron, M.M.; Braslavsky, J.H.; Kokotović, P.V.; Mayne, D.Q. Feedback limitations in nonlinear systems: From Bode integrals to cheap control. IEEE Trans. Autom. Control 1999, 44, 829–833.
8. Kokotović, P.V.; Khalil, H.K.; O'Reilly, J. Singular Perturbation Methods in Control: Analysis and Design; Academic Press: London, UK, 1986.
9. Young, K.D.; Kokotović, P.V.; Utkin, V.I. A singular perturbation analysis of high-gain feedback systems. IEEE Trans. Autom. Control 1977, 22, 931–938.
10. Moylan, P.J.; Anderson, B.D.O. Nonlinear regulator theory and an inverse optimal control problem. IEEE Trans. Autom. Control 1973, 18, 460–465.
11. Turetsky, V.; Glizer, V.Y. Robust solution of a time-variable interception problem: A cheap control approach. Int. Game Theory Rev. 2007, 9, 637–655.
12. Turetsky, V.; Glizer, V.Y.; Shinar, J. Robust trajectory tracking: Differential game/cheap control approach. Int. J. Systems Sci. 2014, 45, 2260–2274.
13. Turetsky, V.; Shinar, J. Missile guidance laws based on pursuit-evasion game formulations. Automatica 2003, 39, 607–618.
14. Turetsky, V. Upper bounds of the pursuer control based on a linear-quadratic differential game. J. Optim. Theory Appl. 2004, 121, 163–191.
15. Petersen, I.R. Linear-quadratic differential games with cheap control. Syst. Control Lett. 1986, 8, 181–188.
16. Glizer, V.Y. Asymptotic solution of zero-sum linear-quadratic differential game with cheap control for the minimizer. NoDEA Nonlinear Diff. Equ. Appl. 2000, 7, 231–258.
17. Vasil'eva, A.B.; Butuzov, V.F.; Kalachev, L.V. The Boundary Function Method for Singular Perturbation Problems; SIAM Books: Philadelphia, PA, USA, 1995.
18. Bryson, A.; Ho, Y. Applied Optimal Control; Hemisphere: New York, NY, USA, 1975.
19. Zhukovskii, V.I. Analytic design of optimum strategies in certain differential games. I. Autom. Remote Control 1970, 4, 533–536.
20. Krasovskii, N.N.; Subbotin, A.I. Game-Theoretical Control Problems; Springer: New York, NY, USA, 1988.
21. Basar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory; Academic Press: London, UK, 1992.
22. Petrosyan, L.A.; Zenkevich, N.A. Game Theory; World Scientific Publishing Company: Singapore, 2016.
23. Isaacs, R. Differential Games; John Wiley: New York, NY, USA, 1965.
24. Turetsky, V. Robust route realization by linear-quadratic tracking. J. Optim. Theory Appl. 2016, 170, 977–992.
25. Kalman, R.E. Contributions to the Theory of Optimal Control. Bol. Soc. Mat. Mex. 1960, 5, 102–119.
26. Shinar, J.; Gutman, S. Three-Dimensional Optimal Pursuit and Evasion with Bounded Controls. IEEE Trans. Autom. Control 1980, 25, 492–496.
27. Shinar, J.; Medinah, M.; Biton, M. Singular surfaces in a linear pursuit-evasion game with elliptical vectograms. J. Optim. Theory Appl. 1984, 43, 431–458.
Figure 1. Moments t ¯ i ( β ) .
Figure 2. Functions F i ( t , β ) .
Figure 3. Trajectories z α β 1 0 ( t ) and limiting function z ˜ 1 ( t ) .
Figure 4. Trajectories z α β 2 0 ( t ) and limiting function z ˜ 2 ( t ) .
Figure 5. Time realizations u α β 1 0 ( t ) and limiting function u ˜ 1 ( t ) .
Figure 6. Time realizations u α β 2 0 ( t ) and limiting function u ˜ 2 ( t ) .
Figure 7. Time realizations v α β 1 0 ( t ) and limiting function v ˜ 1 ( t ) .
Figure 8. Time realizations v α β 2 0 ( t ) and limiting function v ˜ 2 ( t ) .
Figure 9. The game value.
Figure 10. The terminal term of the cost function.
Figure 11. Integral terms of the cost function.