1. Introduction
The goal of this paper is to find sufficient and necessary optimality conditions in terms of a stochastic maximum principle (SMP) for a set of admissible controls
which minimize payoff functionals of the form
w.r.t. admissible controls
u, for some given functions
and
, under dynamics driven by a pure jump process
x with state space
whose jump intensity under the probability measure
is of the form
for some given functions
and
, as long as the intensities are predictable. Due to the dependence of the intensities on the mean of (a function of)
under
, the process
x is commonly called a nonlinear Markov chain or Markov chain of mean-field type, although it does not satisfy the standard Markov property, as explained in the seminal paper by McKean [
1] for diffusion processes. The dependence of the intensities on the whole path
over the time interval
makes the jump process cover a large class of real-world applications. For instance, in queuing theory it is desirable that the intensities be functions of
, or
The Markov chain of mean-field type is obtained as the limit of a system of weakly interacting Markov chains
as the size
N becomes large. That is,
Such a weak interaction is usually called a mean-field interaction. It occurs when the jump intensities of the Markov chains depend on their empirical mean. When the system’s size grows to infinity, the sequence of
N-indexed empirical means, which describes the states of the system, converges to the expectation
of
, which evolves according to a McKean–Vlasov equation (or nonlinear Fokker–Planck equation). A more general situation is when the jump intensities of the obtained nonlinear Markov chain depend on the marginal law
of
. To keep the content of the paper as simple as possible, we do not treat this situation.
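The convergence of the empirical mean described above can be illustrated numerically. The following Python sketch is not taken from the paper. It simulates N weakly interacting chains on the two states {0, 1} on a discrete time grid and compares their empirical mean with an Euler approximation of the limiting mean equation. The intensity forms (rate `alpha + beta * m` for jumps from 0 to 1, constant rate `gamma` for jumps from 1 to 0), the parameter names, and the time discretization are assumptions made only for illustration.

```python
import random


def simulate_empirical_mean(n_chains, alpha, beta, gamma, horizon, dt, seed=0):
    """Empirical mean path of n_chains interacting {0, 1}-valued chains.

    Hypothetical model: a chain in state 0 jumps to 1 at rate alpha + beta * m,
    where m is the current empirical mean, and a chain in state 1 jumps to 0 at
    rate gamma. Jumps are approximated on a grid of mesh dt (rate * dt << 1).
    """
    rng = random.Random(seed)
    states = [0] * n_chains
    means = [0.0]
    for _ in range(int(round(horizon / dt))):
        m = sum(states) / n_chains
        p_up = (alpha + beta * m) * dt    # approx. P(0 -> 1) during one step
        p_down = gamma * dt               # approx. P(1 -> 0) during one step
        states = [
            (1 if rng.random() < p_up else 0) if s == 0
            else (0 if rng.random() < p_down else 1)
            for s in states
        ]
        means.append(sum(states) / n_chains)
    return means


def mean_field_limit(alpha, beta, gamma, horizon, dt, m0=0.0):
    """Euler scheme for m' = (alpha + beta * m) * (1 - m) - gamma * m."""
    m, path = m0, [m0]
    for _ in range(int(round(horizon / dt))):
        m += dt * ((alpha + beta * m) * (1.0 - m) - gamma * m)
        path.append(m)
    return path


if __name__ == "__main__":
    for n in (50, 500, 5000):
        emp = simulate_empirical_mean(n, 0.5, 1.0, 1.0, 2.0, 0.01, seed=1)
        lim = mean_field_limit(0.5, 1.0, 1.0, 2.0, 0.01)
        print(n, abs(emp[-1] - lim[-1]))
```

In this toy model the gap between the empirical mean and the deterministic limit is expected to shrink roughly like the inverse square root of N, which is the law-of-large-numbers behaviour behind the mean-field limit.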
Markov chains of mean-field type have been used as models in many different fields, such as chemistry, physics, biology, economics, and epidemiology (e.g., [
2,
3,
4,
5,
6]). Existence and uniqueness results with bounded and unbounded jump intensities were proven in [
7,
8], respectively. We refer to [
9] for existence and uniqueness of solutions to McKean–Vlasov equations with unbounded jump intensities, and to [
10,
11] for results related to the law of large numbers for unbounded jump mean-field models, and large deviations for corresponding empirical processes.
The present work is a continuation of [
12], where the authors proved the existence and uniqueness of this class of processes in terms of a martingale problem, and derived sufficient conditions (cf. Theorem 4.6 in [
12]) for the existence of an optimal control which minimizes
for a rather general class of (unbounded) jump intensities. Since the suggested conditions are rather difficult to apply in concrete situations (see Remark 4.7 and Example 4.8 in [
12]), we aim in this paper to investigate whether the SMP can yield optimality conditions that are tractable and easy to verify.
While in the usual strong-type control problems the dynamics are given in terms of a process which solves a stochastic differential equation (SDE) on a given probability space , the dynamics in our formulation are given in terms of a family of probability measures , where x is the coordinate process (i.e., it does not change with the control u). This type of formulation is usually called a weak-type formulation for control problems.
The main idea in the martingale and dynamic programming approaches to optimal control problems for jump processes (without mean-field coupling) suggested in previous work, including the early papers on the subject [
13,
14,
15,
16] (the list of references is far from exhaustive), is to use the Radon–Nikodym density process
of
w.r.t. some reference probability measure
P as dynamics and recast the control problem as a standard one. In this paper, we apply the same idea and recast the control problem as a mean-field-type control problem to which an SMP can be applied. By a Girsanov-type result for pure jump processes, the density process
is a martingale and solves a linear SDE driven by some accompanying
P-martingale
M. The adjoint process associated to the SMP solves a (Markov chain) backward stochastic differential equation (BSDE) driven by the
P-martingale
M, whose existence and uniqueness can be derived using the results by Cohen and Elliott [
17,
18]. For some linear and quadratic cost functionals, we explicitly solve these BSDEs and derive a closed form of the optimal control.
In
Section 2, we briefly recall the basic stochastic calculus for pure jump processes that we will use in the sequel. In
Section 3, we derive sufficient and necessary optimality conditions for the control problem. As already mentioned, the SMP optimality conditions are derived in terms of a mean-field stochastic maximum principle, where the adjoint equation is a Markov chain BSDE. In
Section 4, we illustrate the results using two examples of optimal control problems that involve two-state chains and linear quadratic cost functionals. We also consider an optimal control of a mean-field version of the Schlögl model for chemical reactions. We consider linear and quadratic cost functionals in all examples for the sake of simplicity and also because, in these cases, we obtain the optimal controls in closed form.
The obtained results can easily be extended to pure jump processes taking values on more general state spaces such as .
2. Preliminaries
Let equipped with its discrete topology and σ-field and let be the space of functions from to I that are right-continuous with left limits at each and are left-continuous at time T. We endow with the Skorohod metric so that is a complete separable metric (i.e., Polish) space. Given and , put and denote by the filtration generated by x. Denote by the Borel σ-field over . It is well known that coincides with .
To
x we associate the indicator process
whose value is 1 if the chain is in state
i at time
t and 0 otherwise, and the counting processes
, independent of
, such that
which counts the number of jumps from state
i into state
j during the time interval
. Obviously, since
x is right-continuous with left limits, both
and
are right-continuous with left limits. Moreover, by the relationship
the state process, the indicator processes, and the counting processes carry the same information, which is represented by the natural filtration
of
x. Note that (
1) is equivalent to the following useful representation:
Let
, where
are constant entries, be a
Q-matrix:
By Theorem 4.7.3 in [
19], or Theorem 20.6 in [
20] (for the finite state-space), given the
Q-matrix
G and a probability measure
over
I, there exists a unique probability measure
P on
under which the coordinate process
x is a time-homogeneous Markov chain with intensity matrix
G and starting distribution
(i.e., such that
). Equivalently,
P solves the martingale problem for
G with initial probability distribution
, meaning that for every
f on
I, the process defined by
is a local martingale relative to
, where
and
By Lemma 21.13 in [
20], the compensated processes associated with the counting processes
, defined by
are zero-mean, square-integrable, and mutually orthogonal
P-martingales whose predictable quadratic variations are
Moreover, at jump times
t, we have
Thus, the optional variation of
M
is
We call the accompanying martingale of the counting process or of the Markov chain x.
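To make the objects of this section concrete, the following Python sketch (not part of the original paper) simulates a path of a time-homogeneous chain with a given Q-matrix and computes, at the horizon T, the jump counts from i to j, the time spent in each state, and the compensated counts. It assumes the standard compensator, namely the entry g_ij of G multiplied by the time spent in state i up to T. The function names and the example Q-matrix are hypothetical.

```python
import random


def simulate_chain(G, x0, horizon, rng):
    """Simulate one path on [0, horizon]; return jump times and visited states."""
    times, states = [0.0], [x0]
    t, i = 0.0, x0
    while True:
        rate = -G[i][i]                    # total exit rate from state i
        if rate <= 0.0:                    # absorbing state: no more jumps
            break
        t += rng.expovariate(rate)         # exponential holding time
        if t >= horizon:
            break
        targets = [j for j in range(len(G)) if j != i and G[i][j] > 0.0]
        weights = [G[i][j] for j in targets]
        i = rng.choices(targets, weights=weights)[0]
        times.append(t)
        states.append(i)
    return times, states


def counting_and_compensated(G, times, states, horizon):
    """Return jump counts N[i][j], occupation times, and compensated counts M[i][j]."""
    n = len(G)
    counts = [[0] * n for _ in range(n)]
    occupation = [0.0] * n
    ends = times[1:] + [horizon]
    for k, (start, end) in enumerate(zip(times, ends)):
        occupation[states[k]] += end - start
        if k + 1 < len(states):
            counts[states[k]][states[k + 1]] += 1
    # Assumed compensator of N_ij: g_ij * (time spent in state i up to horizon).
    compensated = [
        [counts[i][j] - G[i][j] * occupation[i] if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    return counts, occupation, compensated


if __name__ == "__main__":
    rng = random.Random(0)
    G = [[-1.0, 1.0], [2.0, -2.0]]         # hypothetical two-state Q-matrix
    path = simulate_chain(G, 0, 5.0, rng)
    print(counting_and_compensated(G, *path, 5.0))
```

Averaging the compensated counts over many independent paths should give values close to zero, in line with the zero-mean martingale property recalled above.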
Denote by the completion of with the P-null sets of . Hereafter, a process from into a measurable space is said to be predictable (resp. progressively measurable) if it is predictable (resp. progressively measurable) w.r.t. the predictable σ-field on (resp. ).
For a real-valued matrix
indexed by
, we let
Consider the local martingale
Then, the optional variation of the local martingale
W is
and its compensator is
W is a square-integrable martingale and its optional variation satisfies
Moreover, the following Doob’s inequality holds:
3. A Stochastic Maximum Principle
We consider controls with values in some subset
U of
and let
be the set of
-progressively measurable processes
with values in
.
is the set of admissible controls.
For
, let
be the probability measure on
under which the coordinate process
x is a jump process with intensities
where for each
The cost functional associated to
is of the form
where
In this section, we propose to characterize minimizers
of
J, that is,
satisfying
in terms of a stochastic maximum principle (SMP). We first state and prove the sufficient optimality conditions. Then, we state the necessary optimality conditions.
Let
P be the probability measure on
under which
x is a time-homogeneous Markov chain such that
and with
Q-matrix
satisfying (
3). Then, by a Girsanov-type result for pure jump processes (e.g., [
20,
21]), it holds that
where, for
,
which satisfies
where
is given by the formula
and
is the
P-martingale given in (
6). Moreover, the accompanying martingale
satisfies
Integrating by parts and taking expectation, we obtain
We recast our problem of controlling a Markov chain through its intensity matrix to a standard control problem which aims at minimizing the cost functional (
25) under the dynamics given by the density process
which satisfies (
22), to which the mean-field stochastic maximum principle in [
22] can be applied. The corresponding optimal dynamics are given by the probability measure
on
defined by
where
is the associated density process.
is called an optimal pair associated with (
19).
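As an illustration of the change of measure used above, the following Python sketch evaluates a Girsanov-type density along a single piecewise-constant path. It is a simplified sketch, not the paper's general formula: it assumes that the controlled intensities are constant in time (a fixed matrix `lam`) and uses the classical likelihood ratio for pure jump processes, namely the product of the ratios of controlled to reference intensities over the observed jumps, multiplied by the exponential of minus the integrated difference of the total exit rates. The function name and the matrices are hypothetical.

```python
import math


def density_at_horizon(times, states, lam, G, horizon):
    """Likelihood ratio at the horizon of intensities `lam` w.r.t. reference `G`.

    `times` holds 0 followed by the jump times, `states` the visited states.
    Assumption: `lam` and `G` are constant Q-matrices and G[i][j] > 0 for
    every jump i -> j that occurs on the path.
    """
    n = len(G)
    log_density = 0.0
    ends = times[1:] + [horizon]
    for k, (start, end) in enumerate(zip(times, ends)):
        i = states[k]
        exit_lam = sum(lam[i][j] for j in range(n) if j != i)
        exit_ref = sum(G[i][j] for j in range(n) if j != i)
        # Contribution of the sojourn in state i on [start, end).
        log_density -= (exit_lam - exit_ref) * (end - start)
        if k + 1 < len(states):
            j = states[k + 1]
            # Contribution of the jump i -> j at time `end`.
            log_density += math.log(lam[i][j] / G[i][j])
    return math.exp(log_density)


if __name__ == "__main__":
    G = [[-1.0, 1.0], [1.0, -1.0]]         # hypothetical reference Q-matrix
    lam = [[-2.0, 2.0], [1.0, -1.0]]       # hypothetical controlled Q-matrix
    print(density_at_horizon([0.0, 0.5], [0, 1], lam, G, 1.0))
```

When `lam` equals `G` the density is identically one, which is a quick sanity check on the implementation.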
For
,
denotes the partial derivative of the function
w.r.t.
w.
For
, we set
and we define
To the admissible pair of processes
, we associate the solution
(if it exists) of the following linear BSDE of mean-field type, known as the first-order adjoint equation:
In the next proposition we give sufficient conditions on
, and
that guarantee the existence of a unique solution to the BSDE (
27).
Proposition 1. Assume that
- (A1) For each
  - (a) is differentiable,
  - (b) there exists a positive constant s.t. P-a.s. for all ,
- (A2) The functions , and are bounded. f and h are differentiable in with bounded derivatives.
Then, the BSDE (27) admits a solution consisting of an adapted process p which is right-continuous with left limits and a predictable process q which satisfies This solution is unique up to indistinguishability for p and equality -almost everywhere for q.
Remark 1.
- (i) Assumptions (A1) and (3) imply that there exists a positive constant C s.t. for all
- (ii) By Theorem T11 (chapter VII) in [21], the uniform boundedness of implies that for each
Proof. Assumptions (A1) and (A2) make the driver of the BSDE (
27) Lipschitz continuous in
q. The proof is similar to that of Theorem 3.1 for the Brownian motion-driven mean-field BSDE derived in [
23] by considering the following norm:
where
, along with the Itô–Stieltjes formula for purely discontinuous semi-martingales. For the sake of completeness, we give a proof in
Appendix A. □
Remark 2.
- (i) The boundedness on f and h and their derivatives is strong and can be considerably weakened using standard truncation techniques.
- (ii) If (i.e., the intensity does not contain any mean-field coupling), the BSDE (27) becomes standard. Thanks to Theorem 3.10 in [12], it is solvable only by imposing conditions similar to (H1)–(H3) therein.
- (iii) If (i.e., the intensity is of mean-field type), we do not know whether we can relax the imposed boundedness of and , because without this condition the standard comparison theorem for Markov chain BSDEs does not generally apply to such drivers.
Let
be an admissible pair and
be the associated first-order adjoint process solution of (
27).
For
, we introduce the Hamiltonian associated to our control problem
Next, we state the SMP sufficient and necessary optimality conditions, but prove only the sufficient ones: the proof of the necessary conditions is tedious and more involved, though by now “standard”, and can be derived following the same steps as in [
22,
24,
25].
In the next two theorems, we assume that (A1) and (A2) of Proposition 1 hold.
Theorem 1 (Sufficient optimality conditions). Let be an admissible pair and be the associated first-order adjoint process which satisfies (27) and (28). Assume
- (A4) The set of controls U is a convex body (i.e., U is convex and has a nonempty interior) of , and the functions ℓ and f are differentiable in u.
- (A5) The functions and are concave in for , P-almost surely.
- (A6) The function is convex.
If the admissible control satisfies (32), then the pair is optimal.
Proof. We want to show that if the pair
satisfies (
32), then
Since
is convex, we have
Integrating by parts, using (
27), we obtain
We introduce the following “Hamiltonian” function:
Furthermore, for
u and
in
, we set
Since
and
are concave, we have
Since, by (
32),
, we obtain
for
.
Theorem 2 (Necessary optimality conditions (Verification Theorem)).
If is an optimal pair of the control problem (19) and there is a unique pair of -adapted processes associated to which satisfies (27) and (28), then
Remark 3. Unfortunately, the sufficient optimality conditions can rarely be satisfied in practice because the convexity conditions imposed on the involved coefficients are not always satisfied, even for the simplest examples: assume ℓ and f without mean-field coupling and linear in the control u. Then, none of the functions and are concave in . However, the verification theorem in terms of necessary optimality conditions holds for a fairly general class of functions with sufficient smoothness. Hence, if we can solve the associated BSDEs, the necessary optimality conditions result can be useful.
4. Numerical Examples
In this section we first solve the adjoint equation associated with an optimal control problem for a standard two-state Markov chain, then we extend the problem to a two-state Markov chain of mean-field type. As mentioned in Remark 3, whether the sufficient or the necessary conditions apply depends on the smoothness of the functions involved. Not all the functions involved in the next examples satisfy the convexity conditions imposed in Theorem 1.
Example 1. Optimal Control of a Standard Two-State Markov Chain.
We study the optimal control of a simple Markov chain
x whose state space is
, where
are integers, and its jump intensity matrix is
where
is a given positive constant intensity and
u is the control process assumed to be nonnegative, bounded, and predictable. Let
P be the probability measure under which the chain
x has intensity matrix
Further, let
be the density process given by (
22), where
ℓ is defined by
The control problem we want to solve consists of finding the optimal control
that minimizes the linear-quadratic cost functional
Given a control
, consider the Hamiltonian
where
By the first-order optimality conditions, an optimal control
is a solution of the equation
, which implies
The optimal control is thus
where for each
t,
, since
.
It remains to identify
. Consider the associated adjoint equations given by
In view of (
36), the driver reads
The adjoint equation becomes
Now, considering the probability measure
under which
x is a Markov chain whose jump intensity matrix is
the processes defined by
are
-martingales having the same jumps as the martingales
:
and
Integrating (
39) and then taking conditional expectation yields
Under the probability measure
Taking conditional expectation, we obtain
and
which in view of (
41) implies that
Therefore,
which yields the following explicit form of the optimal control:
In the next two examples we highlight the effect of the mean-field coupling in both the jump intensity and the cost functional on the optimal control.
Example 2. Mean-Field Optimal Control of a Two-State Markov Chain.
We consider the same chain as in the first example but with the following mean-field type jump intensities,
,
and want to minimize the cost functional
where
denotes the variance of
under the probability
defined by
Given a control
, consider the Hamiltonian
where
Performing similar calculations as in Example 1, we find that the optimal control is given by
We will now identify
. The associated adjoint equation is given by
In view of (
44), the driver reads
The adjoint equation becomes
Consider the probability measure
, under which
x is a Markov chain whose jump intensity matrix
This change of measure yields the
-martingales
and
Integrating (
46), then taking conditional expectation yields
Next, we compute the right-hand side of (
48), then we identify
by matching.
Set
and
Under
Dynkin’s formula yields
Taking conditional expectation yields
and
where
Therefore,
Matching (
47) with (
49) yields
Noting that
, to guarantee that both
and
above are indeed intensity matrices, it suffices to impose that
We further characterize the optimal control
by finding
which satisfies (
50). Indeed, under
,
x has the representation
Taking the expectation under
yields
In particular, the mapping
is absolutely continuous. Using the fact that
and
, Equation (
51) becomes
with
Thus, in view of (
50),
should satisfy the following constrained Riccati equation:
where
is a given initial value. As is well known, without the imposed constraint on
, the Riccati equation admits an explicit solution that may explode in finite time unless the involved coefficients
and
evolve within certain ranges. With the imposed constraint on
, these ranges may become tighter. Below, we illustrate this with a few cases. As shown in
Table 1,
Table 2,
Table 3,
Table 4 and
Table 5 below, for low values of
, the ODE (
52) can be solved for any time. How low the intensity should be mainly depends on the size of
b and
. The larger
b is, the wider the range of
for which the ODE is solvable. In particular, when
and
, (
52) is solvable for any time when
. For greater values of
, the ODE violates the constraint proportionally “faster”.
The results also show that the initial conditions may affect the time horizon
T. Starting with values reasonably close to
, the ODE (
52) is solvable only for shorter time horizons than when we start with values reasonably close to zero.
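The qualitative behaviour described above (solvability up to the first time the constraint is violated) can be explored numerically. The Python sketch below does not use the coefficients of the ODE (52). It integrates a generic scalar Riccati equation y' = a y^2 + b y + c with a classical fourth-order Runge–Kutta scheme and reports the first grid time at which a box constraint lower <= y <= upper fails. The coefficients a, b, c, the bounds, and the function name are hypothetical placeholders to be replaced by the quantities of the example at hand.

```python
def first_violation_time(a, b, c, y0, lower, upper, horizon, dt=1e-3):
    """First grid time at which y leaves [lower, upper], or None if it never does.

    Integrates y' = a*y**2 + b*y + c on [0, horizon] with classical RK4.
    """
    def rhs(y):
        return a * y * y + b * y + c

    y = y0
    n_steps = int(round(horizon / dt))
    for k in range(n_steps + 1):
        if not (lower <= y <= upper):
            return k * dt                  # constraint violated at time k * dt
        # One RK4 step from time k * dt to (k + 1) * dt.
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * dt * k1)
        k3 = rhs(y + 0.5 * dt * k2)
        k4 = rhs(y + dt * k3)
        y += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return None


if __name__ == "__main__":
    # y' = y^2 with y(0) = 1 has the exact solution 1 / (1 - t), which
    # explodes at t = 1 and crosses the bound 10 near t = 0.9.
    print(first_violation_time(1.0, 0.0, 0.0, 1.0, 0.0, 10.0, 2.0))
    # y' = -y stays inside [0, 1] on the whole horizon.
    print(first_violation_time(0.0, -1.0, 0.0, 0.5, 0.0, 1.0, 5.0))
```

Scanning such a routine over a grid of initial values and coefficients is one simple way to map out the admissible time horizons discussed in this example.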
Example 3. Mean-Field Schlögl Model.
We propose to solve a control problem associated with a mean-field version of the Schlögl model (cf. [
3,
9,
10,
26]) where the intensities are of the form
for some predictable and positive control process
u, where
and
is a deterministic
Q-matrix for which there exists
such that
for
and
for
.
We consider the following mean-field-type cost functional
Given a control
, the associated Hamiltonian reads
The first-order optimality conditions yield
Next, we write the associated adjoint equation and identify
Consider the probability measure
, under which
x is a pure jump process whose jump intensity matrix is
The adjoint equations become
where
are mutually orthogonal
-martingales.
Following the same steps leading to (
42), from (
57), we obtain
, thus
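Final remark. Our results apply to further real-world Markov chain mean-field control problems, such as the mean-field model proposed in [27] for malware propagation over computer networks where the nodes/devices interact through network-based opportunistic meetings (e.g., via internet, email, USB keys). Markov chains of mean-field type can be used to model the state of the devices, which can be either Susceptible/Honest (H), Infected (I), or Dormant (D). The fact that the jump intensities may depend nonlinearly on the mean-field coupling makes closed-form expressions harder to obtain, even for linear quadratic cost functionals.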
27] for malware propagation over computer networks where the nodes/devices interact through network-based opportunistic meetings (e.g., via internet, email, USB keys). Markov chains of mean-field type can be used to model the state of the devices, which can be either Susceptible/Honest (H), Infected (I), or Dormant (D). The fact that the jump intensities may depend nonlinearly on the mean-field coupling makes it more involved to have closed-form expressions even when considering linear quadratic cost functionals.