1. Introduction
Stochastic control problems governed by Itô’s differential equations have been the subject of intensive research over the last decades. This effort has generated a rich literature and fundamental results, such as the LQ robust sampled-data control problems studied under a unified framework in [1,2], the classes of uncertain sampled-data systems with random jumping parameters characterized by a finite-state semi-Markov process analysed in [3], and the stochastic differential games investigated in [4,5,6,7].
Dynamical games have been used to solve many real-life problems (see, e.g., [8]). For example, the concept of Nash equilibrium is very important for dynamical games, where, for controlled systems, the closed-loop and open-loop equilibrium strategies are of special interest. Various aspects of open-loop Nash equilibria are studied for an LQ differential game in [9], with other results reported in [10,11,12]. In addition, in [13], applications to gas network optimisation are studied via an open-loop sampled-data Nash equilibrium strategy. The framework in which state vector measurements for a class of differential games are available only at discrete times was first studied in [14]. There, a two-player differential game was considered, and necessary conditions for the sampled-data controls were obtained using a backward translation method starting at the last time interval and following the previous state measurements. This case was extended to a stochastic framework in [15], where the players have access to sampled-data state information with a given sampling interval. For other results dealing with closed-loop systems, see, e.g., [16]. Stochastic dynamical games are an important, but more challenging, framework. First introduced in [17], stochastic LQ problems have been studied extensively (see [18,19]).
In the present paper, we consider stochastic differential games governed by Itô’s differential equation, with state-multiplicative and control-multiplicative white noise perturbations. The original contributions of this work are the following. First, we analyze the design of a Nash equilibrium strategy in state feedback form within the class of piecewise constant admissible strategies. It is assumed that the state measurements are available only at some discrete times. The original problem is transformed into an equivalent one, which requires finding existence conditions for a Nash equilibrium strategy in state feedback form for an LQ stochastic differential game described by a system of Itô differential equations controlled by impulses. Necessary and sufficient conditions for the existence of a Nash equilibrium strategy for the new LQ differential game are obtained based on methods from [20,21]. The feedback matrices of the equilibrium strategies for the original dynamical game are then derived from the general result using the structure of the matrix coefficients of the system controlled by impulses. Another major contribution of this paper consists of the numerical methods for computing the feedback matrices of the Nash equilibrium strategy.
To our knowledge, in the stochastic framework there are few papers dealing with the problem of a sampled-data Nash equilibrium strategy in both open-loop and closed-loop forms ([22,23]); the papers [13,14] mentioned before consider only the deterministic framework. In that case, the problem of a sampled-data Nash equilibrium strategy can be transformed in a natural way into a problem stated in the discrete-time framework. Such a transformation is not possible when the dynamical system contains state-multiplicative and control-multiplicative white noise perturbations. In [15], the stochastic character is due only to the presence of additive white noise perturbations; in that case, the approach is not essentially different from the one used in the deterministic case.
The paper is organized as follows. In Section 2, we formulate the problem and introduce the L-player Nash equilibrium concept. In Section 2.2, we state an equivalent form of the original problem and introduce a system of matrix linear differential equations with jumps and algebraic constraints, which is involved in the derivation of the feedback matrices of the equilibrium strategies. Then, in Section 2.3, we provide necessary and sufficient conditions which guarantee the existence of a piecewise constant Nash equilibrium strategy. An algorithm implementing these developments is given in Section 3. The efficiency of the proposed algorithm is demonstrated by two numerical examples illustrating the behavior of the optimal trajectories generated by the equilibrium strategy. Section 4 is dedicated to conclusions.
2. Problem Formulation
2.1. Model Description and Problem Setting
Consider the controlled system having the state space representation described by the Itô differential equation (1), where x(t) denotes the state vector, L is a positive integer, u_1(t), ..., u_L(t) are the control parameters of the players, and w(t) is a 1-dimensional standard Wiener process defined on a given probability space.
In the controlled system there are L players, who change their behavior through their control functions. The matrices of the system and the matrices of the players are assumed known. In the field of game theory, the controls are called admissible strategies (or policies) of the players. Different classes of admissible strategies can be defined in various ways, depending on the available information.
Each player aims to minimize its own cost function (performance criterion); for the k-th player, the cost is given by (2). We make a standing assumption, denoted H, regarding the weighting matrices in (2). Here we generalize Definition 2.1 given in [23].
Definition 1. The L-tuple of admissible strategies is said to achieve a Nash equilibrium for the differential game described by the controlled system (1), the cost function (2), and the class of admissible strategies if no player can decrease its own cost by unilaterally deviating from its equilibrium strategy within this class.
In this paper, we consider a special class of closed-loop admissible strategies in which the states of the dynamical system are available for measurement only at given discrete times, and the set of admissible strategies consists of piecewise constant stochastic processes of the form (4), whose feedback gains are arbitrary matrices.
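For concreteness, Definition 1 and the strategy class (4) can be written in generic notation as in the hedged sketch below; the symbols J_k, u_k, u_k^*, F_k(i), x(t_i), t_i and N are placeholders of our own choosing and need not match the paper's original notation.
\[
J_k\big(u_1^*,\dots,u_{k-1}^*,u_k^*,u_{k+1}^*,\dots,u_L^*\big)\;\le\;J_k\big(u_1^*,\dots,u_{k-1}^*,u_k,u_{k+1}^*,\dots,u_L^*\big),\qquad k=1,\dots,L,
\]
for every admissible deviation u_k of the k-th player, while the piecewise constant (sampled-data) strategies of type (4) read
\[
u_k(t)=F_k(i)\,x(t_i),\qquad t\in[t_i,t_{i+1}),\quad i=0,\dots,N-1,
\]
where F_k(i) are arbitrary matrices and x(t_i) is the state measured at the sampling instant t_i.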
Our aim is to investigate the problem of designing a Nash equilibrium strategy in the class of piecewise constant admissible strategies of type (4) (the closed-loop admissible strategies) for an LQ differential game described by a dynamical system of type (1), under the performance criteria (2). Moreover, we also present a method for the numerical computation of the feedback gains of the equilibrium strategy. In the sequel, the class of admissible strategies is the set of piecewise constant strategies of type (4).
2.2. The Equivalent Problem
We define an augmented process by adjoining to the state of (1) the values of the piecewise constant inputs of the players; the adjoined components are initialized with arbitrary random vectors of suitable dimensions having finite second moments. If the state trajectory is the solution of system (1) determined by the piecewise constant inputs, the augmented process is set accordingly. Direct calculations show that this augmented process is the solution of the initial value problem (IVP) associated with a linear stochastic system with finite jumps, often called a system controlled by impulses (see (5)), under the notations (6), where the zero blocks denote zero matrices of appropriate sizes.
The performance criterion (2) becomes (7) for all admissible inputs, where the initial random vectors are of suitable dimensions, measurable with respect to the relevant σ-algebra, and have finite second moments. Throughout the paper, the σ-algebras used for measurability are those generated by the random variables of the underlying Wiener process. The matrices in (7) can be written in block form accordingly.
Let us introduce the set of inputs having the form of a sampled-data linear state feedback; an input belongs to this set if and only if it is of the form (9), with (10), where the feedback gains are arbitrary matrices and the sampled values are the values, at the measurement instants, of the solution of the following IVP:
Let us also consider matrix-valued sequences of the form (11), where the components are arbitrary matrices, and the associated set (12).
Remark 1. By (9) and (10), there is a one-to-one correspondence between the two sets introduced above: each sampled-data state feedback input can be identified with the sequence of its feedback matrices. Based on this remark, we can rewrite the performance criterion (7) as (13), for all admissible sequences of feedback matrices.
Similarly to Definition 1, one can define a Nash equilibrium strategy for the LQ differential game described by the controlled system (5), the performance criterion (13) and the class of admissible strategies described by (12).
Definition 2. The L-tuple of admissible strategies is said to achieve a Nash equilibrium for the differential game described by the controlled system (5), the cost function (13), and the class of admissible strategies (12) if no player can decrease its own cost by unilaterally deviating from its equilibrium strategy within this class.
Remark 2.
- (a) Based on Remark 1, we may infer that if a strategy is an equilibrium strategy in the sense of Definition 2, then the input given by (9), built using its matrix components, provides an equilibrium strategy for the LQ differential game described by (5), (7) and the family of admissible strategies of sampled-data state feedback form.
- (b) Among the feedback matrices from (9), some have the special form (15). Hence, some admissible strategies of type (9) are of type (4). Consequently, if the feedback matrices of the Nash equilibrium strategy have the structure given in (15), then the strategy of type (9) with these feedback matrices provides a Nash equilibrium strategy for the LQ differential game described by (1), (2) and (4).
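To visualize the structure referred to in (15), the following hedged sketch uses placeholder symbols of our own (F_k(i) for the nonzero block, ξ for the augmented state collecting the measured state x together with the held inputs); a feedback matrix of this type acts only on the state component,
\[
K_k(i)=\begin{bmatrix} F_k(i) & 0 & \cdots & 0 \end{bmatrix},
\qquad
K_k(i)\,\xi(t_i)=F_k(i)\,x(t_i),
\]
which is precisely a strategy of type (4).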
To obtain explicit formulae for the feedback matrices of a Nash equilibrium strategy of type (9) (or, equivalently, of type (11), (12)), we use the following system of matrix linear differential equations (MLDEs) with jumps and algebraic constraints, namely (16), together with the notations (17) and (18), where the superscript † denotes the generalized inverse of a matrix.
Remark 3. A solution of the terminal value problem (TVP) with algebraic constraints (16) is a 2L-tuple in which, for each player, the corresponding component is a solution of the TVP (16a), (16b), (16d). On the last interval, this component is the solution of the TVP described by the perturbed Lyapunov-type equation from (16a) and the terminal value given in (16d). On each earlier interval, its terminal value is computed via (16b) together with (17) and (18), provided that the feedback matrices are obtained as a solution of (16c). Thus, the TVPs solved by the components are interconnected via (16c).
To facilitate the statement of the main result of this section, we rewrite (16c) in a compact form as a linear equation whose coefficient matrices are obtained using the block components of (16c).
2.3. Sampled-Data Nash Equilibrium Strategy
First, we derive a necessary and sufficient condition for the existence of an equilibrium strategy of type (9) for the LQ differential game given by the controlled system (5), the performance criterion (7) and the set of admissible strategies (12). To this end, we adapt the argument used in the proof of ([22], Theorem 4). We prove:
Theorem 1. Under the assumption H, the following statements are equivalent:
- (i) the LQ differential game defined by the dynamical system controlled by impulses (5), the performance criterion (7) and the class of admissible strategies of type (9) has a Nash equilibrium strategy;
- (ii) the TVP with constraints (16) has a solution defined on the whole interval and satisfying, at every sampling instant, the conditions (21) below.
If condition (21) holds, then the feedback matrices of a Nash equilibrium strategy of type (9) are the matrix components of the solution of the TVP (16) and are given by (22). The minimal value of the cost of the k-th player is expressed in terms of the corresponding component of this solution.
Proof. From (14) and Remarks 1 and 2(a), one can see that a strategy of type (9) defines a Nash equilibrium strategy for the linear differential game described by the controlled system (5) and the performance criterion (7) (or, equivalently, (13)) if and only if, for each player, the optimal control problem described by the controlled system (23) and the quadratic functional (24) has an optimal control in state feedback form. The controlled system (23) and the performance criterion (24) are obtained by substituting the equilibrium strategies of the other players in (5) and (7), respectively; the corresponding weights are computed as in (17) and (18), respectively, but with the equilibrium feedback matrices replacing the generic ones.
To obtain necessary and sufficient conditions for the existence of the optimal control in linear state feedback form, we employ the results proved in [20]. First, notice that, in the case of the optimal control problem (23)–(24), the TVP (16a), (16b), (16d) plays the role of the TVP (19)–(23) from [20]. Using Theorem 3 in [20] for the optimal control problem described by (23) and (24), we deduce that the existence of a Nash equilibrium strategy of the form (9) for the differential game described by the controlled system (5) and the performance criterion (7) (or its equivalent form (13)) is equivalent to the solvability of the TVP described by (16). The feedback matrix of the optimal control solves Equation (25). Substituting the corresponding formulae in (25), we deduce that the feedback matrices of the Nash equilibrium strategy solve an equation of the form (16c), written for the equilibrium feedback matrices. This equation may be written in the compact form (26). By Lemma 2.7 in [21], Equation (26) has a solution if and only if condition (21) holds, and a solution of Equation (26) is given by (22). The minimal value of the cost for the k-th player is obtained from Theorem 1 in [20], applied to the optimal control problem described by (23) and (24). Thus the proof is complete. □
Remark 4. When the coefficient matrices involved are invertible, the conditions (21) are satisfied automatically. In this case, the feedback matrices of a Nash equilibrium strategy of type (20) are obtained as the unique solution of Equation (22), because the generalized inverse of each such matrix is then the usual inverse.
Combining (6) and (16c), we deduce that the matrices provided by (22) have the special structure required in Remark 2(b) (cf. (15)). Hence, the Nash equilibrium strategy of the differential game described by the dynamical system (5), the performance criterion (7) and the admissible strategies of type (9) has the form of a sampled-data state feedback depending only on the measured states.
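As a purely illustrative sketch of the role played by the generalized inverse in (21) and (22), the following Python fragment checks the solvability of a linear equation of the compact form Lam K = Gam via the Moore–Penrose pseudo-inverse and, when it is solvable, returns one solution; the names Lam, Gam and K are our own placeholders, not the paper's notation. When Lam is invertible, the pseudo-inverse reduces to the ordinary inverse, which mirrors Remark 4.

import numpy as np

def solve_compact_equation(Lam, Gam, tol=1e-9):
    """Solve Lam @ K = Gam in the generalized-inverse sense.

    Returns (K, solvable): K = pinv(Lam) @ Gam is a solution whenever the
    consistency condition Lam @ pinv(Lam) @ Gam = Gam holds (an analogue of
    condition (21)); otherwise 'solvable' is False and K is only a
    least-squares fit.
    """
    Lam_pinv = np.linalg.pinv(Lam)                   # Moore-Penrose generalized inverse
    K = Lam_pinv @ Gam                               # candidate solution, cf. (22)
    solvable = np.allclose(Lam @ K, Gam, atol=tol)   # consistency check, cf. (21)
    return K, solvable

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Lam = rng.standard_normal((6, 6))
    Gam = rng.standard_normal((6, 4))
    K, ok = solve_compact_equation(Lam, Gam)
    print("solvable:", ok, " residual:", np.linalg.norm(Lam @ K - Gam))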
Now we can state the Nash equilibrium strategy of the original differential game.
Theorem 2. Assume that the assumption H and condition (ii) in Theorem 1 are satisfied. Then, a Nash equilibrium strategy in state feedback form with sampled measurements of type (4) for the differential game described by the dynamical system (1) and the performance criteria (2) is given by (27). The feedback matrices from (27) are given by the first n columns of the matrices obtained as solutions of Equation (26). In (27), the sampled values are those measured, at the sampling times, along the solution of the closed-loop system obtained when (27) is plugged into (1). The minimal value of the cost (2) associated to the k-th player is given by the corresponding expression obtained from Theorem 1.
In the next section, we present an algorithm which allows the numerical computation of the matrices arising in (27) for an LQ differential game with two players.
3. Numerical Computations and the Algorithm
In what follows, we assume that the game has two players (L = 2). We propose a numerical approach to compute the optimal strategies of the two players. The algorithm consists of two steps. We first compute the feedback matrices of the Nash equilibrium strategy, based on the solution of the TVP (16):
STEP 1.A. We choose the initialization and compute the iterates described below on the last sampling interval, taking the number of iterations sufficiently large.
For the operator involved in these iterations, the identity displayed below holds for all arguments of interest. The iterations are computed from the recursion displayed next, starting from the indicated initialization, where the coefficient matrices correspond to the first or to the second player, respectively. We compute the feedback matrices on the last interval as solutions of the associated linear equation. Next, we compute the quantities needed on the preceding interval.
STEP 2.A. Fix a sampling index j. Assuming that the quantities associated with the subsequent sampling instant have already been computed, we compute the corresponding iterations on the current interval, where the initialization is computed as in (31). We then compute the feedback gains as the solution of the corresponding linear equation.
STEP 2.B. Using the feedback gains obtained at STEP 2.A, we compute the quantities required for the preceding sampling interval as in the formulae below.
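The backward sweep in STEPS 1.A–2.B can be organized as in the schematic Python skeleton below. It is only a structural sketch under assumed generic forms: riccati_like_update stands for the iterative computation of the solution of (16a) on one sampling interval, build_linear_system for assembling the coefficient matrices of the linear equation of type (16c)/(26) at a sampling instant, and jump_update for the update of the terminal values via (16b), (17), (18); none of these helper names, nor the data layout, comes from the paper.

import numpy as np

def backward_sweep(terminal_values, intervals, riccati_like_update,
                   build_linear_system, jump_update):
    """Schematic backward sweep producing feedback gains per sampling instant.

    terminal_values : list of terminal matrices, one per player, cf. (16d)
    intervals       : sampling intervals, processed backwards in time
    The three callables encapsulate the problem-specific formulae (16a)-(16c).
    Returns a dict {j: gains_j} with one stacked gain matrix per instant.
    """
    gains = {}
    P = list(terminal_values)                       # current terminal data, cf. (16d)
    for j in reversed(range(len(intervals))):
        # STEP 1.A / 2.A: integrate the Lyapunov-type equations (16a) backwards
        P = [riccati_like_update(P_k, intervals[j]) for P_k in P]
        # STEP 1.A / 2.A: assemble and solve the linear equation (16c)/(26)
        Lam, Gam = build_linear_system(P, j)
        K = np.linalg.pinv(Lam) @ Gam               # generalized-inverse solution, cf. (22)
        gains[j] = K
        # STEP 2.B: update the terminal values for the preceding interval, cf. (16b)
        if j > 0:
            P = [jump_update(P_k, K, j) for P_k in P]
    return gains

if __name__ == "__main__":
    # Toy placeholders standing in for the problem-specific formulae.
    n, m = 2, 4
    toy_riccati = lambda P, itv: P + 0.1 * np.eye(n)
    toy_build = lambda P, j: (np.eye(m), np.ones((m, n)))
    toy_jump = lambda P, K, j: P
    gains = backward_sweep([np.eye(n), np.eye(n)], [(0.0, 1.0), (1.0, 2.0)],
                           toy_riccati, toy_build, toy_jump)
    print({j: K.shape for j, K in gains.items()})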
In the second step, the computation of the optimal trajectory involves the initial vector and the values of the equilibrium strategy. We then illustrate the mean square values of the optimal trajectory and of the equilibrium strategy; to this end, we consider the second-moment matrix of the closed-loop state and its values at the sampling instants.
This second-moment matrix solves the forward linear differential equation with finite jumps displayed below, whose coefficients are determined by the closed-loop matrices. Then, we use its values at the sampling instants to produce the plots of the mean square values.
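For this second step, the mean square values can be obtained by propagating the second moments of the closed-loop state forward in time. The sketch below assumes a closed-loop Itô system of the generic form dX = A_cl X dt + A0_cl X dw between sampling instants, for which Y(t) = E[X(t) X(t)^T] satisfies the Lyapunov-type ODE dY/dt = A_cl Y + Y A_cl^T + A0_cl Y A0_cl^T, together with a linear jump Y -> J Y J^T at each sampling instant; the matrices A_cl, A0_cl, J and the sampling grid are illustrative placeholders, not the paper's actual coefficients.

import numpy as np
from scipy.integrate import solve_ivp

def propagate_second_moment(Y0, A_cl, A0_cl, J, t_grid):
    """Propagate Y(t) = E[X X^T] through successive sampling intervals.

    Between instants: dY/dt = A_cl Y + Y A_cl^T + A0_cl Y A0_cl^T.
    At each instant:  Y <- J Y J^T (linear jump induced by the impulses).
    Returns the list of Y-values at the sampling instants t_grid.
    """
    n = Y0.shape[0]

    def rhs(t, y):
        Y = y.reshape(n, n)
        dY = A_cl @ Y + Y @ A_cl.T + A0_cl @ Y @ A0_cl.T
        return dY.ravel()

    Y = Y0.copy()
    samples = [Y.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        sol = solve_ivp(rhs, (t0, t1), Y.ravel(), rtol=1e-8, atol=1e-10)
        Y = sol.y[:, -1].reshape(n, n)
        Y = J @ Y @ J.T              # jump at the sampling instant
        samples.append(Y.copy())
    return samples

if __name__ == "__main__":
    n = 2
    A_cl = np.array([[0.0, 1.0], [-2.0, -1.0]])
    A0_cl = 0.1 * np.eye(n)
    J = np.eye(n)                    # no jump in this toy illustration
    Ys = propagate_second_moment(np.eye(n), A_cl, A0_cl, J, np.linspace(0.0, 1.0, 11))
    print("trace of E[X X^T] at sampling instants:",
          [float(np.trace(Y)) for Y in Ys])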
This algorithm enables us to compute the equilibrium strategy values of the players. The experiments confirm that the optimal strategies are piecewise constant, and the decay of the mean square values suggests a stabilization effect of the closed-loop dynamics.
Further, we consider two examples of the LQ differential game described by the dynamical system (1), the performance criteria (2) and the class of piecewise constant admissible strategies of type (28).
Example 1. We consider the controlled system (1) in a special form, with the coefficient matrices defined accordingly. The evolution of the mean square values of the optimal trajectory (with the given initial point) and of the equilibrium strategies of the two players is depicted in Figure 1 on the short time interval and in Figure 2 on the long time interval, respectively. The values of the optimal trajectory and of the equilibrium strategies of both players are very close to zero in both the short-term and the long-term period.
Example 2. We consider the controlled system (1) in a second special form, with the matrix coefficients defined accordingly. The evolution of the mean square values of the optimal trajectory (with the given initial point) and of the equilibrium strategies of the two players is depicted on the short time interval (Figure 3) and on the long time interval (Figure 4), respectively. The values of the optimal trajectory and of the equilibrium strategies of both players are again very close to zero in both the short-term and the long-term period.
4. Concluding Remarks
In this paper, we have investigated existence conditions for a Nash equilibrium strategy in state feedback form, in the case of piecewise constant admissible strategies. These conditions are expressed through the solvability of the algebraic Equation (26), whose solutions provide the feedback matrices of the desired Nash equilibrium strategy. To obtain such conditions for the existence of a sampled-data Nash equilibrium strategy, we transformed the original problem into an equivalent one, which requires finding a Nash equilibrium strategy in state feedback form for a stochastic differential game whose dynamics are described by Itô-type differential equations controlled by impulses. Unlike the deterministic case, where the problem of finding a sampled-data Nash equilibrium strategy can be transformed into an equivalent discrete-time problem, in the stochastic framework, when the controlled system is described by Itô-type differential equations, such a transformation to the discrete-time case is not possible. The developments in the present work clarify and extend the results from Section 5 of [23], where only a particular case was considered. The key ingredient used for obtaining the feedback matrices of the Nash equilibrium strategy via Equation (26) is the solution of the TVP (16). On each sampling interval, (16a) consists of L uncoupled backward linear differential equations. The boundary values are computed via (16d) on the last interval and via (16b) on the earlier intervals. Finally, we gave an algorithm for calculating the equilibrium strategies of the players, and the numerical experiments suggest a stabilization effect.