1. Introduction
Stochastic games were introduced by Shapley in Reference [1] and have been actively studied over the last few decades because of their many applications, mainly in economics and queueing systems; see, for instance, References [2,3,4,5]. In this paper, we study constrained stochastic games with independent state processes. More precisely, at each stage every player chooses an action and pays costs that depend on the actions of all players. However, each player's action only influences the transition probability of the Markov chain controlled by that player. In such a case, each player wishes to minimize a certain expected average cost of primary concern while keeping other expected average costs within prescribed bounds. Game models of this type have been considered in References [6,7]. In Reference [6], the authors present an existence condition for stationary Nash equilibria of constrained average games. Reference [7] derives a necessary and sufficient condition for a stationary Nash equilibrium to be a global minimizer of a mathematical program.
Different from the framework considered in this paper, References [2,8,9] study the so-called centralized stochastic games, in which all players jointly control a single Markov chain and the one-stage costs of each player are influenced by the actions of all players. Reference [8] considers the game model with expected discounted cost criteria and expected average cost criteria and establishes the existence of stationary Nash equilibria in the context of finite state spaces and compact action spaces. Using the vanishing discount approach, Reference [9] generalizes the existence result of Reference [8] to average games with denumerable states. Reference [2] extends the existence result of Reference [8] to discounted games with denumerable states, but under special hypotheses on the transition laws and boundedness conditions imposed on the one-stage costs. By a finite-state approximation approach, Reference [10] weakens the existence condition of Reference [8]. Concerning applications of constrained games, Reference [11] studies wireless powered networks consisting of a user and an attacker. Based on the constrained game theory of Reference [6], Reference [11] gives the energy request and data transmission strategy of the user and the attack strategy of the attacker in the sense of Nash equilibria. Reference [12] considers wireless networks in which each mobile user wants to maximize its expected capacity subject to power and buffer-length constraints. Reference [13] introduces a neighbor-aware energy-efficient monitoring system for energy-harvesting Internet of Things; a constrained stochastic game model is established to minimize the number of transmissions while maintaining a desired monitoring probability, and a best-response-dynamics-based algorithm is developed.
In this paper, we consider constrained average games with denumerable states and unbounded costs. The main contributions of the present paper are as follows. First, by introducing average occupation measures, we establish a so-called best-response linear program that characterizes the occupation measures of stationary constrained Nash equilibria as fixed points of a certain multifunction from the product space of occupation measures into itself. The standard weak convergence technique used in References [6,7] cannot be applied directly to the case of unbounded costs. Moreover, the arguments in References [6,7] rely on the finiteness of the state spaces. However, the costs are typically unbounded and the state spaces are not finite in controlled stochastic models such as stochastic inventory models and queueing systems; see, for instance, References [14,15]. Therefore, we introduce the so-called w-weak convergence topology and impose the standard drift condition on the transition kernels, growth and additivity conditions on the costs, a w-uniform geometric ergodicity condition, and continuity-compactness conditions. Using the properties of w-weak convergence, we study the asymptotic behavior of average occupation measures and expected average costs, which is then used to establish the upper semi-continuity of the multifunction. The existence of stationary constrained Nash equilibria then follows from a fixed-point theorem. It should be mentioned that the vanishing discount approach in Reference [9] is based on the existence of discounted Nash equilibria established in Reference [10]; that is, one shows that the limit of discounted Nash equilibria, as the discount factors tend to one, is an average Nash equilibrium. In this paper, however, we use the fixed-point method directly, because the existence of discounted Nash equilibria for stochastic games with independent state processes has not been established. Finally, we characterize each stationary constrained Nash equilibrium as a global minimizer of a mathematical program that can be viewed as a combination of the best-response linear program and a duality program.
The rest of the paper is organized as follows. In Section 2, we introduce the constrained average game model we are concerned with. In Section 3, we introduce the best-response linear program and study its convergence properties based on the average occupation measures. In Section 4, we establish the main results, including the existence and characterization of stationary constrained Nash equilibria. In Section 5, a controlled wireless network is presented to illustrate the main results.
2. The Game Model
If S is a Borel space, we denote by its Borel σ-algebra, by the set of all probability measures on endowed with the topology of weak convergence, and by the set of all Borel measurable functions on S. Let I stand for the indicator function and let be a fixed integer. Let us define , and use the indices i or j to denote a player. Now, we introduce the N-person constrained game model
For each , the state space is assumed to be a denumerable set and is the action space which is assumed to be a Polish space endowed with the Borel -algebra . Without loss of generality, we assume and is enumerated by the natural order. Let be the product space of the state spaces and be the product space of the action spaces. For each , the nonempty measurable subset of denotes the set of actions available when the state of player i is . Let and . is the stochastic kernel on given controlled by player i, that is is the probability of moving from state to if player i chooses action . For each , the one-stage cost is a real-valued measurable function on . The constants () denote the constraints and denotes the initial distribution of player i. Let .
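For orientation, a generic form of such a game model is sketched below; the symbols are chosen here purely for illustration and need not coincide with the paper's own notation.
$$\mathcal{G}=\Big\{\big(S_i,\ A_i,\ (A_i(x_i)\subseteq A_i)_{x_i\in S_i},\ q_i,\ (c_i^k)_{k=0}^{p_i},\ (\rho_i^k)_{k=1}^{p_i},\ \gamma_i\big)\ :\ i=1,\dots,N\Big\},$$
where $S_i$ is the denumerable state space of player $i$, $A_i(x_i)$ the set of admissible actions in state $x_i$, $q_i(\cdot\mid x_i,a_i)$ the transition kernel controlled by player $i$ alone, $c_i^0,\dots,c_i^{p_i}$ one-stage costs defined on the product state-action space (so that they may depend on the actions of all players), $\rho_i^1,\dots,\rho_i^{p_i}$ the constraint constants, and $\gamma_i$ the initial distribution of player $i$.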
For each , let be the set of all histories up to time t and be the set of all stochastic kernels on given for each i, where denotes the t-power product space of .
Definition 1.
- (1) A randomized history-dependent strategy for player i is a sequence of stochastic kernels on the action space given satisfying
- (2) A randomized history-dependent strategy for player i is said to be randomized stationary if there is a stochastic kernel such that for each . We will write such a stationary strategy as .
Assumption 1. The players do not observe their costs, that is, the strategy chosen by any player does not depend on the realization of the cost.
Remark 1. Assumption 1 is also imposed in [6,7]; it ensures that a player cannot use the one-stage costs to estimate the states and actions of the other players. The sets of all randomized history-dependent strategies and randomized stationary strategies for player i are denoted by and , respectively. A multi-strategy is a vector , where . Let denote the set of all randomized stationary multi-strategies.
Let and be the corresponding product σ-algebra. Then, for each and each initial distribution , the well-known Tulcea’s Theorem ([16], p. 178) ensures the existence of a unique probability measure on such that, for each and ,
where and denote the state and the action at the decision epoch t, respectively. The expectation operator with respect to is denoted by . If is concentrated at some state , we will write and as and , respectively. For each , the product measure of is denoted by and the corresponding expectation operator is denoted by . For each , we denote by the N-vector multi-strategy obtained from by replacing with . Similarly, for each , we denote by the N-vector in which the jth component is for , while the ith component is .
For each and , the expected average criteria are defined, for each player and , as
where and denote the state vector and action vector at time t, respectively.
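In commonly used notation (illustrative only, since the paper's own display fixes the precise convention), such an expected average criterion reads
$$V_i^k(\pi,\gamma)\ =\ \limsup_{T\to\infty}\frac{1}{T}\,E_{\gamma}^{\pi}\Big[\sum_{t=0}^{T-1}c_i^k(x_t,a_t)\Big],\qquad k=0,1,\dots,p_i,$$
with $x_t$ and $a_t$ the state vector and action vector at time $t$.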
Definition 2.
- (1) For a fixed , a strategy is said to be feasible for player i against π if for each . Let be the set of all feasible strategies for player i against π.
- (2) is called a feasible multi-strategy for if is in for each . We denote by Δ the set of all feasible multi-strategies.
- (3) (Nash equilibrium) is called a constrained Nash equilibrium if and
A constrained Nash equilibrium is said to be stationary if belongs to .
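In the same illustrative notation, the feasibility and equilibrium properties of Definition 2 can be written as
$$V_i^k(\gamma,\pi)\ \le\ \rho_i^k\ \ (k=1,\dots,p_i)\qquad\text{and}\qquad V_i^0(\gamma,\pi^{*})\ \le\ V_i^0\big(\gamma,[\pi^{*,-i},\pi^i]\big)\ \ \text{for every feasible deviation }\pi^i,$$
for each player $i$, where $[\pi^{*,-i},\pi^i]$ denotes the multi-strategy obtained from $\pi^{*}$ by replacing its $i$th component with $\pi^i$.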
3. Technical Preliminaries
A monotone nondecreasing function such that will be referred to as a Lyapunov function on for each . For each function g and constant , we denote by the power of the function g.
To guarantee the finiteness of the expected average costs, we need the following drift conditions, which are widely used in References [14,16,17,18] for discrete-time Markov decision processes and in References [5,10] for stochastic games.
Assumption 2. For each , there exist a Lyapunov function on , constants and such that
- (1)
;
- (2)
.
By Jensen’s inequality, it follows that there exist constants and (depending on ) for each such that
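For reference, a drift condition of the type required in Assumption 2 is usually written (again in illustrative notation) as
$$\sum_{y_i\in S_i}w_i(y_i)\,q_i(y_i\mid x_i,a_i)\ \le\ \beta_i\,w_i(x_i)+b_i\qquad\text{for all }x_i\in S_i,\ a_i\in A_i(x_i),$$
with a Lyapunov function $w_i\ge 1$ and constants $0<\beta_i<1$, $b_i\ge 0$; Jensen's inequality then yields an analogous bound for suitable powers of $w_i$, which is the consequence recorded above.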
Now, we give two different kinds of hypotheses imposed on the cost functions to ensure the existence of Nash equilibria.
Assumption 3. - (1)
(Growth condition) There exists a constant such that for each and .
- (2)
The function is continuous in for each fixed .
Assumption 4. Suppose that there exist a constant and functions on with for each and , such that
- (1)
and ;
- (2)
is continuous in for each .
Remark 2. - (1)
It should be mentioned that the cost functions in References [6,7,9] are assumed to be bounded. However, the cost functions are typically unbounded in controlled stochastic models such as stochastic inventory models and queueing systems; see, for instance, References [14,15]. - (2)
As in Assumption 4, the case in which the immediate costs are additive is also considered in References [7,19] for discrete-time unconstrained stochastic games.
Lemma 1. Suppose that Assumptions 2 and 3(1) (resp. 4(1)) hold. For each , and ,
Proof. (4) and (5) follow directly from Lemma 10.4.1 in Reference [17]. □
For each function (resp. ) and , let us define the f-norm by (resp. ) and the Banach space (resp. ). The set of all bounded measurable functions on is denoted by .
For each and , let us define
For each , let denote the t-step transition probability from to corresponding to . Obviously, .
Definition 3. For each and , the transition kernel is said to be irreducible if for all .
Assumption 5. For each ,
- (1)
let , there exist a probability measure on and constants and such that for each ; - (2)
is irreducible for all .
Remark 3. - (1)
Under Assumptions 2 and 5, it is easy to see that is the unique invariant probability measure of the transition kernel ; see Reference ([17], p. 12). Moreover, we have for each and for each . - (2)
The w-uniform geometric ergodicity condition in Assumption 5(1) has been widely used in discrete-time Markov decision processes; see References [17,18] and the references therein. Since the state spaces in References [6,7] are finite, only the standard ergodicity condition is required there. Moreover, as the cost functions in Reference [9] are assumed to be bounded, the weaker uniform geometric ergodicity condition is imposed in [9].
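A typical w-uniform geometric ergodicity bound of the kind referred to in Assumption 5(1) takes the form (illustrative notation)
$$\sup_{\|u\|_{w_i}\le 1}\Big|E_{x_i}^{\varphi_i}\big[u(x_t^i)\big]-\mu_{\varphi_i}(u)\Big|\ \le\ R\,\delta^{t}\,w_i(x_i)\qquad\text{for all }x_i\in S_i,\ t\ge 0,$$
where $\mu_{\varphi_i}$ is the invariant probability measure of the transition kernel induced by the stationary strategy $\varphi_i$, $\|u\|_{w_i}=\sup_{x_i}|u(x_i)|/w_i(x_i)$ is the $w_i$-norm, and $R>0$, $0<\delta<1$ are constants.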
For each , we define the average occupation measure for each and the set
For each and , we introduce the following Markov decision processes :
where for each and , the other components are the same as in (1).
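In illustrative notation, the average occupation measure of player $i$ under an initial distribution $\gamma_i$ and a strategy $\pi_i$ can be written as
$$\eta_{\gamma_i}^{\pi_i}(x_i,\Gamma)\ =\ \limsup_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}P_{\gamma_i}^{\pi_i}\big(x_t^i=x_i,\ a_t^i\in\Gamma\big),\qquad x_i\in S_i,\ \Gamma\in\mathcal{B}(A_i),$$
so that expected average costs become integrals of the one-stage costs with respect to these measures.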
For each and , the expected average cost is defined by
Definition 4. For a fixed , a strategy is said to be feasible for if
Let be the set of all feasible strategies of . A strategy is said to be optimal for if and
We denote by the optimal value of .
For each and , under Assumptions 2, 3 (resp. 4) and 5, we have
Now, we introduce the following constrained optimality problem
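Such a constrained optimality problem can be sketched, in illustrative notation, as
$$\text{minimize }\ J_i^0(\gamma_i,\pi_i)\ \ \text{over }\pi_i,\qquad\text{subject to }\ J_i^k(\gamma_i,\pi_i)\le\rho_i^k,\ \ k=1,\dots,p_i,$$
that is, player $i$ best-responds to the fixed strategies of the other players by solving a constrained Markov decision process.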
Lemma 2. Suppose that Assumptions 1, 2 and 5 are satisfied and in addition either Assumption 3 or 4 holds. Let and with for each . Then
Proof. We only prove for . Let us define
First, we assume Assumption 3 holds. It follows from Assumption 5 that, for each player , we have
where for each . For notational ease, we define for and for and . Hence,
Thus, we have that
which implies the desired result. On the other hand, if Assumption 4 holds,
□
Let be the function as in Assumption 2 for each . Let us define and .
Definition 5. For each , let . is said to converge to with respect to the -weak topology if and only if for each In this case, we denote it by .
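A standard way to state this mode of convergence (illustrative notation) is
$$\mu_n\ \xrightarrow{\,w\,}\ \mu\qquad\Longleftrightarrow\qquad\int u\,d\mu_n\ \longrightarrow\ \int u\,d\mu\ \ \text{for every continuous }u\text{ with }|u|\le c\,(1+w)\text{ for some }c>0;$$
the test functions are thus allowed to grow as fast as the weight function $w$, rather than being bounded as in ordinary weak convergence.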
Let and with . We say if for each . For each and , we define for each .
To ensure the existence of constrained Nash equilibria, we also need the following standard continuity-compactness conditions, which are widely used; see, for instance, References [4,14,16,17,19] and the references therein.
Assumption 6. - (1)
For each and , is a compact set.
- (2)
The functions are continuous in for each fixed and .
- (3)
The functions are continuous in for each fixed and .
Lemma 3. Suppose that Assumptions 2, 5 and 6 hold. For each , the set is convex and compact with respect to the -weak topology.
Proof. Fix . It follows from Remark 3 that and that the set is compact in for each . Hence, it follows from Corollary A.46 in Reference [20] that the set is compact with respect to the -weak topology. The other statement can be obtained by Lemma 5.2.2 in Reference [18]. □
Remark 4. The standard weak convergence technique used in References [6,7] for bounded costs does not apply directly to the unbounded costs considered in this paper.
Lemma 4. Suppose that Assumptions 1, 2, 5 and 6 are satisfied and in addition either Assumption 3 or 4 holds. Let be fixed, and , such that and weakly in . Then, for each .
Proof. For each , we assume . According to Proposition D.8 in Reference [16], there exist and such that and for each . For an arbitrary , it follows from Lemma 5.1.2 in Reference [18] that
Since , it follows from Assumption 6(2) that
which together with Lemma 5.1.2 in Reference [18] implies that for each . Similarly, we also have . Thus, we can see that and weakly in for each . It follows from Assumption 3(2) or Assumption 4 that for each , where for each .
Under Assumptions 2(2) and 5, we have that
Then, under Assumption 3(2), by (7)–(9) and Proposition A.2.6 in Reference [15], it follows that
If Assumption 3 is replaced by Assumption 4, then we have
□
The following Slater condition is common for constrained games; see References [2,6,7,8,9,10].
Assumption 7. (Slater condition) For each stationary multi-strategy and each player i, there exists such that
Lemma 5. Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let and be fixed. Then,
- (1)
for each , there exists a stationary strategy (depending on , and ), such that for each ;
- (2)
there exists a stationary strategy such that for each .
Proof. - (1)
See Lemma 5.7.10 in Reference [16].
- (2)
Let such that for each . For the fixed , let be the corresponding strategy as in Assumption 7. It follows from Lemma 2 that for each . By part (1), there exists a stationary strategy (depending on and ) such that for each .
□
It follows from Lemma 5.1.2 in Reference [18] that if and only if
According to Lemma 5, the constrained optimality problem (6) can be restricted to the set of all stationary strategies. Hence, for each and , the constrained optimality problem (6) is equivalent to the following best response linear program :
For a fixed and , is said to be feasible for LP if it satisfies (10). The sets of all feasible solutions and optimal solutions are denoted by and , respectively. The optimal value of is denoted by .
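In terms of average occupation measures, a best-response linear program of this kind can be sketched, in illustrative notation, as
$$\mathrm{LP}_i(\pi^{-i}):\qquad\min_{\eta_i}\ \int c_i^0(\cdot,\pi^{-i})\,d\eta_i\qquad\text{subject to}\qquad\int c_i^k(\cdot,\pi^{-i})\,d\eta_i\ \le\ \rho_i^k,\quad k=1,\dots,p_i,$$
where the minimum is taken over the average occupation measures of player $i$ and the costs are averaged over the fixed strategies $\pi^{-i}$ of the other players; since the decision variable is the occupation measure itself and the constraints characterizing occupation measures are linear, the program is linear.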
Proposition 1. Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let and such that for each . Then,
- (1)
is an optimal strategy of for each if and only if φ is a stationary constrained Nash equilibrium;
- (2)
is an optimal strategy of LP for each if and only if φ is a stationary constrained Nash equilibrium.
Proof. - (1)
⟹. Let be fixed and . It follows from Lemma 2 that
Thus, we have . Then, Lemma 2 yields that
which implies that is a constrained Nash equilibrium.
⟸. For each , we take an arbitrary strategy such that for each . Lemma 2 yields that for each , which implies that . As is a constrained Nash equilibrium, it follows that , which implies that is an optimal strategy of .
- (2)
is optimal for LP if and only if is optimal for . Hence, we get the desired result by part (1). □
Lemma 6. Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let be fixed and such that , and for each . Then, is relatively compact in with respect to the -weak topology and the accumulation point is a feasible solution of LP.
Proof. It follows from Lemma 3 that there exists a subsequence such that with respect to the -weak topology. Then, Lemma 4 yields that
As , it follows that
which together with (11) implies that . □
Lemma 7. Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let be fixed and take an arbitrary . Then LP has an optimal solution.
Proof. First, it follows from Lemma 5(2) that . Let be a minimizing sequence such that
By Lemma 6, there exists a subsequence of such that . Then, it follows from Lemma 4 and (12) that
That is, . □
The idea of the proof of the following lemma is from Lemma 5.1 and Theorem 3.9 in Reference [21] for constrained discrete-time Markov decision processes with discounted cost criteria.
Lemma 8. Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let be fixed and such that . Then,
- (1)
for each , there exist an integer N and for all , such that weakly in ;
- (2)
if is an accumulation point of sequence in which is an optimal solution of LP for each , then
Proof. - (1)
Lemma 5(2) gives the existence of and a constant , such that and for all . By Lemma 4, it follows that
for all .
Let be such that . By (13)–(14), there exists an integer (depending on ) such that, for each and ,
Let
where . Then, we derive from Lemma 3 that , and for each and , which implies that for all .
Then, let , such that and . For each fixed (corresponding to a given ), as in the previous argument, there exists an integer (depending on s) which is assumed to be increasing in , such that
Let
Since , by (15)–(17), we have weakly in as and for each , which completes the proof of part (1).
- (2)
By Lemma 6, without loss of generality, we assume that weakly in . On the other hand, for any , it follows from part (1) that there exist an integer N and for all , such that weakly in , which together with , gives
Then, by Lemma 4 and (18), we have
which implies .