Article

When Inaccuracies in Value Functions Do Not Propagate on Optima and Equilibria

by
Agnieszka Wiszniewska-Matyszkiel
1,* and
Rajani Singh
1,2,*
1
Institute of Applied Mathematics and Mechanics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, 02-097 Warsaw, Poland
2
Department of Digitalization, Copenhagen Business School, 2000 Copenhagen, Denmark
*
Authors to whom correspondence should be addressed.
Mathematics 2020, 8(7), 1109; https://doi.org/10.3390/math8071109
Submission received: 16 May 2020 / Revised: 27 June 2020 / Accepted: 30 June 2020 / Published: 6 July 2020
(This article belongs to the Special Issue Statistical and Probabilistic Methods in the Game Theory)

Abstract:
We study general classes of discrete time dynamic optimization problems and dynamic games with feedback controls. In such problems, the solution is usually found by using the Bellman or Hamilton–Jacobi–Bellman equation for the value function in the case of dynamic optimization and a set of such coupled equations for dynamic games, which cannot always be done accurately. We derive general rules stating what kind of errors in the calculation or computation of the value function do not result in errors in the calculation or computation of an optimal control or a Nash equilibrium along the corresponding trajectory. This general result concerns not only errors resulting from using numerical methods but also errors resulting from some preliminary assumptions related to replacing the actual value functions by some a priori assumed constraints on them on certain subsets. We illustrate the results by a motivating example of the Fish Wars, with singularities in payoffs.

1. Introduction

Finding an optimal control in the feedback form or a feedback Nash equilibrium in a dynamic game is complicated, especially if the analytic solution cannot be found. The appropriate methods, based on the Bellman equation or a set of coupled Bellman equations, respectively, require finding the value function on the whole state space or at least a large invariant subset of it (for the principle of optimality, i.e., the necessity of the Bellman equation in the infinite horizon, see, e.g., Stokey, Lucas and Prescott [1], Kamihigashi [2], Wiszniewska-Matyszkiel and Singh [3]) with an appropriate terminal condition (for the infinite horizon, necessity has been proven in Wiszniewska-Matyszkiel and Singh [3]).
Currently, the research of mathematicians focuses substantially on high accuracy of the solution of the Bellman equation or the set of Bellman equations in the case of numerical solutions, as well as on existence and uniqueness in some classes of functions and on methods of estimating the infinite horizon value function and the accuracy of this estimation (e.g., Martins-Da-Rocha and Vailakis [4], Le Van and Morhaim [5], Rincón-Zapatero and Rodríguez-Palmero [6], Matkowski and Nowak [7], Kamihigashi [8], Wiszniewska-Matyszkiel and Singh [3]).
An error in calculation or computation of the solution of the Bellman equation, even on a very small set, may result in finding a function that is far from the actual value function, while the resulting candidate for the optimal control differs substantially from the actual optimal control on a considerable part of the state set (see, e.g., Singh and Wiszniewska-Matyszkiel [9]). Moreover, the Bellman equation may have multiple solutions (Rincón-Zapatero and Rodríguez-Palmero [6], Singh and Wiszniewska-Matyszkiel [10], Wiszniewska-Matyszkiel and Singh [3], with the false value function more plausible than the actual one in [3,10]). However, sometimes, errors in calculation and computation of the value function do not propagate on optima and equilibria.
The research presented in this paper has been motivated by an analysis of the well-known Fish Wars model as an example of games, which may be regarded as intractable by numerical methods due to singularities in payoffs. Nevertheless, after applying numerical methods not taking into account preliminary assumptions about the form of the solutions, and leaving the grid purposely sparse on some sets, we have obtained an unexpectedly high level of accuracy of numerically computed optima and Nash equilibria along the corresponding state trajectories in spite of inaccuracy of approximation of the value functions on some intervals. This implies that, in some cases, using numerical methods to solve dynamic games or dynamic optimization problems with singularities in payoffs may turn out to be successful even if the type of singularity is not known a priori.
To explain this paradox, we derive general rules stating what kind of errors in calculation or computation of the value function do not result in errors in calculation or computation of the optimal control or a Nash equilibrium in a wide class of dynamic optimization problems or dynamic games, depending on partial knowledge about the optimal or Nash trajectory or its approximation. These general results concern not only errors resulting from using numerical methods but also errors resulting from some preliminary assumptions related to obvious constraints on the value function or iterates approximating it, as in Stokey, Lucas and Prescott [1], Martins-Da-Rocha and Vailakis [4], Kamihigashi [8], and Wiszniewska-Matyszkiel and Singh [3].
Thus, the main result is theoretical—we study the conditions under which the paradox that we have obtained in the motivating example appears in dynamic optimization problems or dynamic games, and we derive general conclusions concerning some simplifications which can be applied while solving dynamic optimum or Nash equilibrium problems, not only in the case of singularities in payoffs. Those conditions are related to some preliminary constraints on the region in which the optimal or Nash trajectory, respectively, lies, or to some constraints on where its approximation lies, which can be checked ex post.
Our research has been preceded by some papers on theory and applications in which the feedback optimal control has been solved only along the optimal trajectory (e.g., Bock [11]), in a neighborhood of the steady state of the optimal trajectory (e.g., Horwood and Whittle [12,13]), or at the steady state only (e.g., Krawczyk and Tolwinsky [14]). This restriction is caused either by the fact that it is impossible to solve the problem analytically or by the fact that computation in real time is needed. All of those papers are dedicated to finding a partial solution of a dynamic optimization problem, even if initially the model is a dynamic game.
Another reason to restrict the calculation of the optimal control or Nash equilibrium problem only to certain subsets of the state space or the product of the time set and the state space is the presence of singularities, as in our motivating example—a model of exploitation of a common fishery by at least two players (countries or fishermen) whose subsistence is based on fishing—which is an extension of the Fish Wars model of Levhari and Mirman [15]. In the model, there is a singularity at the point of extinction, which is not viable, and singularities at zero fishing decisions. The Fish Wars term comes from the Cod Wars between Iceland and the UK, which have been the motivation for the seminal paper of Levhari and Mirman [15]. It is the first model of Fish Wars using tools of dynamic games, and it has been designed to describe the situation of extraction of common resources which are the base of subsistence of the fishing countries. In such a case, the depletion of the resource, which may even mean extinction of the species being the base of subsistence of the community extracting it, is disastrous to the community. For the same reason, not extracting at all at some time instant (i.e., year) means starvation and is also not acceptable. To emphasize this fact, logarithmic instantaneous and terminal payoff functions are introduced.
The results of Levhari and Mirman have been generalized to n players by Okuguchi [16], Mazalov and Rettieva [17,18] and Rettieva [19], where also cooperative issues were considered. Besides the standard game with the discounted payoff, Nowak [20,21] considers an analogous problem without discounting but either the overtaking optimality criterion or the limiting average and he proves the convergence as the discount factor converges to 1.
Because of the simplicity of calculation of analytic solutions, especially for the infinite time horizon, the model has been analyzed in many aspects and slight modifications. A far from exhaustive selection from the literature encompasses: Fischer and Mirman [22,23] (two species); Wiszniewska-Matyszkiel [24] (increasing the number of players, considered not as a process of new actual players entering the game, but as the decomposition of the decision-making process among an increasing number of subsets of the same set of players), and [25] (analogous with asymmetric players); Kwon [26] (partial cooperation); Breton and Keoula [27] (cooperation and coalition stability in the case of delay in information) and [28] (asymmetric players); Koulovatianos [29] (randomness in the growth function of the resource and the assumption that players rationally learn about it); Dutta and Sundaram [30] (a wider class of problems, defining “the tragedy of the commons” as overexploitation of the resource above the “golden rule” level, and looking for dynamics for which the inequality is opposite); Wiszniewska-Matyszkiel [31,32] (games with distorted information in which players do not exactly know the actual bio-economic structure of the problem in which they are involved, i.e., their influence on the resource, and belief-distorted Nash equilibria for such games—in which players introduce beliefs, not necessarily consistent with the actual dynamics, they best-respond to their beliefs, which results in self-verification of those beliefs) and Wiszniewska-Matyszkiel [33] (continuous time); Hanneson [34] (similar games treated as an infinitely repeated supergame in which a logarithmic part appears) and Górniewicz and Wiszniewska-Matyszkiel [35] (Allee effect causing depletion below some minimal sustainable state), Breton, Daumouni and Zaccour [36] (two species and many specialized players with different levels of cooperation examined). For more exhaustive surveys of the subject of exploitation of common renewable resources, see, e.g., Carraro and Filar [37] or Long [38,39]. This short selection of the papers on this subject shows how important this model is in resource economics. Moreover, it has applications in games of capital accumulation (e.g., Nowak [20]).
If the model is substantially modified, then the analytic calculation of optima and equilibria ceases to be feasible. In such a case, only numerical methods can be used. Therefore, the analysis of dynamic optimization problems or dynamic games of this type using numerical methods is really needed. To that end, one of the consequences of this paper is an answer to the question whether using numerical methods in dynamic games or dynamic optimization problems with singularities in payoffs, resulting from considering instantaneous and terminal payoffs with a logarithmic part, may result in reasonable outcomes. Surprisingly, in this study, the answer is positive even without using any knowledge about the type of singularity a priori.
The main theoretical finding is an answer to the question whether such a phenomenon may appear in a more general class of dynamic optimization problems or dynamic games and how one can use very incomplete initial knowledge about the solution considered (the optimal control or Nash equilibrium) and the value function to simplify the derivation of this solution.
The paper is organized as follows. In Section 2, the Motivating Example of the Fish Wars model with a finite horizon is presented. The problems of finding the dynamic optimum and the Nash equilibrium in feedback strategies are solved numerically by an algorithm that does not assume a priori any specific form of the solution and whose grid is deliberately inaccurate on some subsets of the state space; the numerical solutions are compared to the analytical solutions graphically in Section 2.2, while the computational algorithms are presented in the Appendix. The results for this Motivating Example are the motivation for the theoretical analysis of this paper: Section 3 analyses when, given a certain set of initial conditions, the errors in the value function do not propagate on the optimal paths for a dynamic optimization problem in the general form, while Section 4 analyses possible extensions of those results to Nash equilibria in dynamic games in the general form.

2. The Motivating Example—Fish Wars—The Optima and Equilibria

We consider a dynamic game with n players, being countries exploiting the same marine fishery with one species of fish.
The terminal time $T$ is finite, while the initial time instant is $t_0$. At $T+1$, we calculate the salvage value.
The state variable is the biomass, and it is denoted by $x$. The set of its possible values is $\mathbb{X}=[0,1]$. At state $x$, each of the countries can extract/consume not more than $\frac{x}{n}$, which reflects the situation in which fish is uniformly distributed over the sea and each country can fish only in its Exclusive Economic Zone, equal for each country.
In this paper, we consider the feedback information structure (according to the notation of Haurie et al. [40]; in dynamic games also called closed loop no-memory or Markovian, while in optimal control also closed loop), i.e., the strategies which we consider are feedback strategies, by which we understand that consumption is a function of both time and state and is independent of the initial state. Explicitly stating the form of strategies is important since, although for dynamic optimization the form of strategies does not influence the results, Nash equilibria for open loop and feedback strategies are usually different, and the methods to obtain them are different as well (a discussion of the differences and of the rare cases of coincidence of those two kinds of equilibria can be found in, e.g., Wiszniewska-Matyszkiel [25]).
The consumption of country $i$ is $c_i\colon\{t_0,\dots,T\}\times\mathbb{X}\to[0,1]$ with $c_i(t,x)\le\frac{x}{n}$. The set of all such functions $c_i$ is denoted by $\mathcal{C}_i$, while $\mathcal{C}=\mathcal{C}_1\times\dots\times\mathcal{C}_n$.
Instantaneous payoff of country $i$ for given $c_i$ is $\ln c_i$ (with $\ln 0=-\infty$).
The terminal payoff is $\ln\frac{x}{n}$, which means that after the termination of the game the countries divide the remaining biomass equally.
Payoffs are discounted by a discount factor $\beta\in(0,1)$.
The trajectory $X$ of the biomass resulting from choosing a strategy profile $c$ is
$X(t+1)=\left(X(t)-\sum_{i=1}^{n}c_i(t,X(t))\right)^{\alpha},$
with the initial condition $X(t_0)=x_0$ and for some constant $\alpha\in(0,1)$.
If we start at $t_0$ from the state $x_0$, then the total payoff (payoff for short) of player $i$ in the game is
$J_i\left(t_0,x_0,c_i,c_{-i}\right)=\sum_{t=t_0}^{T}\beta^{t-t_0}\ln c_i(t,X(t))+\beta^{T+1-t_0}\ln\frac{X(T+1)}{n},$
where $c_{-i}$ denotes the vector of all $c_j$ for $j\ne i$.
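As a reading aid, the dynamics and the payoff above can be simulated directly. The following minimal Python sketch is ours (the function and variable names are not from the paper or its appendix); it assumes that each strategy is given as a callable $c_i(t,x)$ satisfying $0\le c_i(t,x)\le x/n$.

```python
import numpy as np

def simulate_fish_wars(alpha, beta, t0, T, x0, strategies):
    """Simulate the biomass trajectory and the players' discounted payoffs.

    `strategies` is a list of n feedback strategies, each a callable
    c_i(t, x) with 0 <= c_i(t, x) <= x / n.  The salvage term ln(X(T+1)/n)
    is added after the terminal time T.
    """
    n = len(strategies)
    X = {t0: x0}
    payoffs = np.zeros(n)
    for t in range(t0, T + 1):
        c = np.array([strategies[i](t, X[t]) for i in range(n)])
        payoffs += beta ** (t - t0) * np.log(c)           # ln 0 = -inf by convention
        X[t + 1] = max(X[t] - c.sum(), 0.0) ** alpha       # biomass regeneration
    payoffs += beta ** (T + 1 - t0) * np.log(X[T + 1] / n)  # equal split of the rest
    return X, payoffs
```

For instance, a symmetric linear profile of the form $c_i(t,x)=a_t\cdot x$, as in the propositions below, can be passed as a list of lambdas.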
The first notion we are interested in is the social optimum. Analogously to Levhari and Mirman [15] and Okuguchi [16], where such profiles are called cooperative solutions, from a wide class of Pareto optimal profiles, we restrict to strategy profiles $c$ maximizing the sum of payoffs, $\sum_{i=1}^{n}J_i\left(t_0,x_0,c_i,c_{-i}\right)$. This is an obvious choice from the set of optimal profiles if we assume that players’ payoffs are transferable. We shall call such a profile the social optimum. It can be regarded as the optimization of an abstract social planner whose aim is to maximize the joint payoff.
The other concept we are interested in is the Nash equilibrium.
Definition 1.
a) A profile of strategies $\bar c$ is a social optimum if
$\bar c\in\operatorname{Argmax}_{c\in\mathcal{C}}\ \sum_{i=1}^{n}J_i\left(t_0,x_0,c_i,c_{-i}\right).$
b) A Nash equilibrium is a profile of strategies $\bar c$ such that no player can benefit from unilateral deviation from it, i.e.,
$\bar c_i\in\operatorname{Argmax}_{c_i\in\mathcal{C}_i}\ J_i\left(t_0,x_0,c_i,\bar c_{-i}\right).$

2.1. Analytic Solutions

The results for this model are regarded as quite standard and they have been calculated by applying the standard backwards induction reasoning using the Bellman equation. Although the original papers of Levhari and Mirman [15] and Okuguchi [16] do not state the whole finite horizon solutions and value functions, they can be easily derived to the form presented below. The specific form we cite here together with the proof can be found in Singh [41], the unpublished thesis whose Chapter 5 is based on a part of this paper.
First, we cite the results for the problem of social optimum.
Proposition 1.
There is a unique social optimum profile $c^S$. It is symmetric and such that, for $t=t_0,\dots,T$, for every player, we have
$c_i^S(t,x)=a_t^S\cdot x\quad\text{for}\quad a_t^S=\frac{1}{n\cdot B_t^S},$
where $B_t^S=\sum_{i=0}^{\tau+1}(\alpha\beta)^{i}$ for $\tau=T-t$. Moreover, for $t=t_0,\dots,T+1$, the value function $V^S$ is
$V^S(t,x):=\sup_{c\in\mathcal{C}}\sum_{i=1}^{n}J_i\left(t,x,c_i,c_{-i}\right)=n\left(A_t^S+B_t^S\ln x\right),$
for $A_t^S=\beta^{\tau+1}\ln\frac{1}{n}+\sum_{i=0}^{\tau}\beta^{\tau-i}\left(\left(B_{T-i}^S-1\right)\ln\frac{B_{T-i}^S-1}{B_{T-i}^S}+\ln\frac{1}{n\,B_{T-i}^S}\right).$
Next, we state the results for the Nash equilibrium.
Proposition 2.
There is a unique Nash equilibrium profile $c^N$. It is symmetric and such that, for $t=t_0,\dots,T$, for every player $i$, we have
$c_i^N(t,x)=a_t^N\cdot x\quad\text{for}\quad a_t^N=\frac{1}{n+B_t^N},$
where $B_t^N=\sum_{i=1}^{\tau+1}(\alpha\beta)^{i}$ for $\tau=T-t$. Moreover, for $t=t_0,\dots,T+1$, the value function $V_i^N$ of player $i$, given that the strategies of the other players are $c_{-i}^N$, is
$V_i^N(t,x):=\sup_{c_i\in\mathcal{C}_i}J_i\left(t,x,c_i,c_{-i}^N\right)=A_t^N+\left(B_t^N+1\right)\ln x$
for $A_t^N=\beta^{\tau+1}\ln\frac{1}{n}+\sum_{i=0}^{\tau}\beta^{\tau-i}\left(B_{T-i}^N\ln\frac{B_{T-i}^N}{n+B_{T-i}^N}+\ln\frac{1}{n+B_{T-i}^N}\right).$
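For quick reference, the consumption fractions of Propositions 1 and 2 can be evaluated directly; a short sketch (the function name is ours):

```python
def fish_wars_fractions(alpha, beta, n, T, t):
    """Fraction of the stock consumed by a single player at time t,
    according to Proposition 1 (social optimum) and Proposition 2 (Nash)."""
    tau = T - t
    B_S = sum((alpha * beta) ** i for i in range(tau + 2))       # i = 0, ..., tau+1
    B_N = sum((alpha * beta) ** i for i in range(1, tau + 2))     # i = 1, ..., tau+1
    a_S = 1.0 / (n * B_S)     # c_i^S(t, x) = a_S * x
    a_N = 1.0 / (n + B_N)     # c_i^N(t, x) = a_N * x
    return a_S, a_N
```

A direct consequence of the formulas above is that $B_t^S=1+B_t^N$, so for $n\ge 2$ and $t\le T$ we always have $a_t^N>a_t^S$, i.e., the equilibrium consumption fraction exceeds the socially optimal one.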

2.2. Comparison of Analytic and Numerical Results

Here, we compare the actual results, calculated in Section 2.1, to the results of numerical computation according to the Algorithms A1 and A2 (precisely described in Appendix A).
The figures are for the following values of the parameters: $n=2$, $\alpha=0.6$, $\beta=\frac{1}{1.02}$, $T=10$, $t_0=1$, $x_0=0.025\,x^{*}$, where $x^{*}=(\alpha\beta)^{\frac{\alpha}{1-\alpha}}$ is the steady state of the infinite horizon social optimum problem. Nevertheless, due to the reduction of the initial problem to Equations (A1) and (A2) for the social optimum problem and to Equations (A3) and (A4) for the Nash equilibrium problem, as done in Appendix A, increasing $n$ increases neither the complexity nor the errors.
We compare the actual results to the numerical results with an initial uniform grid for $x$ of 100 points, refined on the interval $[0,\frac12]$ to about $10^4$ points. For the Nash equilibrium, the number of grid points for $o$—the sum of decisions of the other players (see Algorithm A2 in Appendix A)—for each iteration is 21, and the number of iterations is 4. Intentionally, we do not further increase the number of points in the grid for the state variable for very small $x$ or for $x>\frac12$.
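The computations reported below follow Algorithms A1 and A2 of Appendix A, which are not reproduced here. For orientation only, the following is a minimal, generic sketch of grid-based backward induction for the social planner's problem; it is not the authors' code, and the interpolation of the continuation value and the uniform control grid are our illustrative assumptions.

```python
import numpy as np

def social_optimum_grid(alpha, beta, n, T, grid, n_controls=200):
    """Grid-based backward induction for the social planner's problem.

    `grid` is an increasing array of states in (0, 1].  Returns arrays
    V[t, j] and total-consumption policy C[t, j] on the grid.
    """
    V = np.empty((T + 2, grid.size))
    C = np.empty((T + 1, grid.size))
    V[T + 1] = n * np.log(grid / n)                  # salvage value: n * ln(x / n)
    for t in range(T, -1, -1):
        for j, x in enumerate(grid):
            cs = np.linspace(1e-12, x, n_controls)   # candidate total consumptions
            cont = np.interp((x - cs) ** alpha, grid, V[t + 1])
            rhs = n * np.log(cs / n) + beta * cont   # rhs of the Bellman equation
            k = int(np.argmax(rhs))
            V[t, j], C[t, j] = rhs[k], cs[k]
    return V, C
```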

2.2.1. The Social Optimum

As we can see in Figure 1a, there is a substantial difference between the actual and numerical value functions on two regions of the set of states $x$: close to $0$, the point of a singularity of the actual value function, where the approximate value function is substantially greater, and the interval $(\frac12,1]$, on which the grid is sparse and the approximate value function is substantially less. In fact, using numerical methods for problems with known points of singularities and not making the grid substantially finer as it approaches the singularity point of the calculated function means that we purposely leave the grid too sparse also on this region. Thus, our procedure means that we a priori allowed the value function to have errors substantially larger on some regions than the inherent errors resulting from using numerical methods (if appropriate grids are used).
Despite those differences, both the numerical and the actual consumptions are apparently identical, with errors of rank $10^{-4}$ (see Figure 1b, mainly in the region with the sparse grid).
Similarly, the optimal trajectory, as well as the optimal consumption along it, are apparently identical to their numerically computed counterparts (Figure 2a and Figure 3a). In this case, the rank of the error decreases to $10^{-6}$ (Figure 2b and Figure 3b). This fact is explained by Theorem 1, especially 1c). Thus, there is no need to refine the grid for $x$ on the interval $(\frac12,1]$, on which it is sparse, nor on the set of points close to the singularity at $0$, since neither the numerical nor the analytic optimal trajectory has a non-empty intersection with those sets.

2.2.2. Nash Equilibrium

Analogously, for the Nash equilibrium, when we compare the actual results with the results of the numerical analysis, we have the same observations: the difference between the value functions (Figure 4a) is large on the same regions as for the social optimum, with apparently equal consumptions $c^N(t,x)$, with errors of rank $10^{-4}$ (Figure 4b), and with an accuracy better by two ranks for the Nash equilibrium consumption and state trajectories (Figure 5a and Figure 6a), with errors of rank $10^{-6}$ (Figure 5b and Figure 6b).
Note that the rank of accuracy for computing equilibria is the same as for the less complex problem of computing social optima. This means that the accuracy of finding the fixed point was very high. Thanks to the iterative procedure of refining the grid on small intervals (see Algorithm A2 described in Appendix A), it was achieved at a reasonable time cost.
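The kind of iterative refinement in the aggregate decision $o$ mentioned above can be illustrated by the sketch below. It is not Algorithm A2; in particular, reducing the best response to a function of $o$ alone, and the specific refinement rule, are simplifying assumptions of ours.

```python
import numpy as np

def symmetric_fixed_point(best_response, o_max, n, points=21, iterations=4):
    """Search for o* with (n - 1) * best_response(o*) = o*, i.e., a symmetric
    fixed point in the aggregate decision o of the other players, by
    repeatedly refining a uniform grid around the current best candidate."""
    lo, hi = 0.0, o_max
    o_star = None
    for _ in range(iterations):
        grid = np.linspace(lo, hi, points)
        residuals = np.abs([(n - 1) * best_response(o) - o for o in grid])
        k = int(np.argmin(residuals))
        o_star = grid[k]
        step = (hi - lo) / (points - 1)
        lo, hi = max(0.0, o_star - step), min(o_max, o_star + step)
    return o_star
```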

2.3. General Conclusions from the Analysis of the Fish Wars Example

In both cases, the social optimum and the Nash equilibrium, we obtained a substantial error in the value functions on some intervals and high accuracy of the optimum/equilibrium trajectory and consumption path. This situation is rather unusual.
It is worth emphasizing that this holds not only for the specific parameters and grids which we present graphically in this paper; it is a more general rule. Even the first results obtained for the procedure of finding social optima, with quite few points of the grid, assumed to be used to test the code, before the refinement of the grid on certain subsets of the set of states was introduced, revealed the same apparent paradox. While there was a considerable error in the value function, especially at the initial time and in the regions close to the boundaries of the set of states, the social optimum consumption path $c^S(t,X(t))$ as well as $X(t)$ was computed with unexpected (for this inaccuracy of $V$) accuracy.
A similar paradox took place while computing the Nash equilibrium. In spite of the inaccurate computation of the value function close to the boundaries, the accuracy of computing the equilibrium path is comparable to the distance between the grid points for the sum of decisions of the other players $o$—the maximal precision that can be expected.
This has been obtained although we have not used knowledge about the actual value function and optimal strategy/equilibrium specific to this model while preparing the code for the numerical part, in order to be able to assess the applicability of this approach to a wider class of problems with similar properties.
The only information that we have used is the fact that the value function is continuous, regular for $x>0$ with $V(t,0)=-\infty$, that the optimal trajectory remains below a certain level (here, we took $\frac12$) whenever the initial condition $x_0$ is below this level and that it stays above some small $\epsilon>0$ whenever $x_0$ does, while, in the computation of the equilibrium, we have also used the fact that the best response of a player is a decreasing function of the joint consumption of the others and that the equilibrium exists.

3. When and What Kind of Inaccuracies in Computation or Calculation of the Value Function Do Not Propagate on the Optimal Path in Dynamic Optimization Problems

In this section, we return to the general theoretical motivation of this paper, which is the explanation of when paradoxes similar to those obtained for the Motivating Example in Section 2.3 appear: high accuracy of the derived optimal state trajectory and consumption path despite the inaccuracy of the corresponding value function on some intervals.
We want to emphasize that a situation in which a substantial inaccuracy in the calculation of the value function does not propagate on the optimum path is very unusual. There are examples of a sequence of linear quadratic optimization problems, usually regarded as regular, in which an error in the calculation of the value function and the resulting “candidate for the optimal control”, for which the Bellman sufficient condition is fulfilled except on an interval with length converging to zero as some parameter converges to zero, result in a false value function and a false optimal control on a considerable part of the set of states (Singh and Wiszniewska-Matyszkiel [9]).
We can prove the following theorems, which allow us, in a very general environment of dynamic optimization problems, to determine the types of errors in the approximation of the value function—either resulting from using numerical computation with low accuracy on some sets, or from replacing the actual value function by some a priori estimate of it on some sets—which do not influence the accuracy of the optimal trajectory and the optimal control path, i.e., the trajectory of the optimal control along the corresponding trajectory.
We consider any discrete time dynamic optimization problem:
  • with the time set $\{0,1,\dots\}$ or $\{0,1,\dots,T\}$, denoted by $\mathbb{T}$; the finite or infinite horizon is denoted by $T$;
  • with discount factor $\beta\in(0,1]$;
  • the set of states $\mathbb{X}$;
  • the set of control parameters $\mathbb{C}$;
  • the current payoff function $P\colon\mathbb{T}\times\mathbb{X}\times\mathbb{C}\to\mathbb{R}\cup\{-\infty\}$;
  • where we consider the feedback controls $c\colon\mathbb{T}\times\mathbb{X}\to\mathbb{C}$ (which, if independent of time, are denoted as $c\colon\mathbb{X}\to\mathbb{C}$)
  • with the admissibility constraint $c(t,x)\in C(t,x)$ for all $t,x$ and for a correspondence $C\colon\mathbb{T}\times\mathbb{X}\rightrightarrows\mathbb{C}$; the set of admissible controls is denoted by $\mathcal{C}$, and
  • the transformation of the state variable is given by a function $\phi\colon\mathbb{T}\times\mathbb{X}\times\mathbb{C}\to\mathbb{X}$ or, more generally,
    $\phi\colon\mathbb{T}\times\mathbb{X}\times\mathbb{C}\to\tilde{\mathbb{X}}\supseteq\mathbb{X}$ with $\phi(t,x,c)\in\mathbb{X}$ for all $(t,x,c)\in\mathbb{T}\times\mathbb{X}\times\mathbb{C}$ with $c\in C(t,x)$. In the latter case, there is no need to specify $\tilde{\mathbb{X}}$, as it does not influence the results.
  • In the case when the time horizon is finite, we also consider a terminal payoff given by a function $G\colon\mathbb{X}\to\mathbb{R}\cup\{-\infty\}$, paid after the termination of the game.
The aim is to maximize, given $t_0\in\mathbb{T}$ and $x_0\in\mathbb{X}$, the objective functional $J(t_0,x_0,c)$ over the set of admissible controls $\mathcal{C}$, where $J\colon\mathbb{T}\times\mathbb{X}\times\mathcal{C}\to\mathbb{R}\cup\{\pm\infty\}$ is defined by
  • $J(t_0,x_0,c)=\sum_{t=t_0}^{\infty}P(t,X(t),c(t,X(t)))\,\beta^{t-t_0}$ for the infinite time horizon, while
  • $J(t_0,x_0,c)=\sum_{t=t_0}^{T}P(t,X(t),c(t,X(t)))\,\beta^{t-t_0}+G(X(T+1))\,\beta^{T+1-t_0}$ for a finite time horizon $T$,
  • where the trajectory $X$ corresponding to $c$ is given by $X(t+1)=\phi(t,X(t),c(t,X(t)))$
    with the initial condition $X(t_0)=x_0$.
  • Whenever we want to emphasize the dependence of $X$ on a control $c$, we write $X^{c}$; if we want to emphasize the dependence of $X$ on the initial condition, we write $X_{t_0,x_0}$, and $X^{c}_{t_0,x_0}$ if we want to emphasize both.
  • We assume that the problem is such that $J$ is always well defined, although it may be $\pm\infty$.
  • We restrict the set of initial conditions to $x_0\in\mathbb{X}_0\subseteq\mathbb{X}$.
In the finite time horizon case, the necessary and sufficient condition for a function $V\colon(\mathbb{T}\cup\{T+1\})\times\mathbb{X}\to\mathbb{R}\cup\{-\infty\}$ to be the value function, i.e., to fulfill $V(t,x)=\sup_{c\in\mathcal{C}}J(t,x,c)$ for every $(t,x)$, and for an admissible control $\bar c$ to be the optimal control, is the Bellman equation (see, e.g., Bellman [42], Blackwell [43], Stokey and Lucas [1], Başar and Olsder [44] or Haurie, Krawczyk and Zaccour [40] for various versions)
$V(t,x)=\max_{c\in C(t,x)}P(t,x,c)+\beta V(t+1,\phi(t,x,c))\quad\text{for all }t\in\mathbb{T},\ x\in\mathbb{X};\qquad(3)$
with the terminal condition
$V(T+1,x)=G(x)\quad\text{for all }x\in\mathbb{X}\qquad(4)$
and the inclusion
$\bar c(t,x)\in\operatorname{Argmax}_{c\in C(t,x)}P(t,x,c)+\beta V(t+1,\phi(t,x,c))\quad\text{for all }t\in\mathbb{T},\ x\in\mathbb{X}.\qquad(5)$
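On finite state and control sets, the characterization (3)–(5) translates directly into backward induction. The following is a generic sketch (all names are ours), under the additional assumption that $\phi$ maps grid states into grid states.

```python
def solve_finite_horizon(T, states, controls, P, phi, G, beta, admissible):
    """Backward induction for the finite-horizon problem (3)-(5) on finite
    state and control sets.  `admissible(t, x, c)` encodes c in C(t, x)."""
    V = {(T + 1, x): G(x) for x in states}       # terminal condition (4)
    policy = {}
    for t in range(T, -1, -1):
        for x in states:
            best, best_c = float("-inf"), None
            for c in controls:
                if not admissible(t, x, c):
                    continue
                rhs = P(t, x, c) + beta * V[(t + 1, phi(t, x, c))]   # rhs of (3)
                if rhs > best:
                    best, best_c = rhs, c
            V[(t, x)] = best
            policy[(t, x)] = best_c                # a selection satisfying (5)
    return V, policy
```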
For the infinite time horizon optimization, the Bellman Equation (3) and the inclusion (5) remain necessary conditions; they also constitute a sufficient condition together with a certain terminal condition: usually, the standard terminal condition (see, e.g., Stokey, Lucas and Prescott [1], Theorems 4.2–4.4)
$\limsup_{t\to\infty}\beta^{t}V(t,X(t))=0\quad\text{for every admissible trajectory }X.\qquad(6)$
Obviously, in the infinite time horizon version of our motivating example, the standard terminal condition (6) does not hold, so it has to be replaced by a weaker one, which, together with the Bellman Equation (3) and the inclusion (5), forms a sufficient condition (Wiszniewska-Matyszkiel [45], Theorem 1):
$\limsup_{t\to\infty}V(t,X(t))\,\beta^{t}\ge 0\quad\text{for every admissible trajectory }X\text{ and}\qquad(7)$
$\limsup_{t\to\infty}V(t,X(t))\,\beta^{t}<0\ \Longrightarrow\ J(\bar t,\bar x,c)=-\infty\quad\text{for every }(\bar t,\bar x)\in\mathbb{T}\times\mathbb{X}\text{ and every control }c\text{ such that }X=X^{c}_{\bar t,\bar x}.\qquad(8)$
Terminal conditions (7) and (8) have been proven to be necessary in a large class of optimal control problems, including Fish Wars, in Wiszniewska-Matyszkiel and Singh [3].
We introduce the following notation:
  • $V$—the actual value function of the dynamic optimization problem;
  • $V^{\mathrm{approx}}$—another function regarded as an approximation of $V$; it may be either a solution of a numerical procedure or the actual $V$ with values on certain subsets replaced by other values, e.g., some constraint known a priori.
  • $\mathrm{RHS}_{t,x}(c)=P(t,x,c)+\beta V(t+1,\phi(t,x,c))$—the maximized function on the right-hand side of the Bellman Equation (3) (as a function of $c$).
  • $\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c)=P(t,x,c)+\beta V^{\mathrm{approx}}(t+1,\phi(t,x,c))$—the maximized function on the right-hand side of the Bellman Equation (3) with $V$ replaced by $V^{\mathrm{approx}}$ (as a function of $c$).
  • $\mathrm{OPT}$—the set of optimal controls; we assume that it is non-empty.
  • $\mathrm{OPT}^{\mathrm{approx}}$—the set of controls $\tilde c\in\mathcal{C}$ such that $\tilde c(t,x)\in\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c)$; we assume that it is non-empty.
  • $\Omega=\{(t,x)\colon X^{\bar c}_{t_0,x_0}(t)=x\text{ for some }\bar c\in\mathrm{OPT}\text{ and }x_0\in\mathbb{X}_0\}$.
  • For $\bar c\in\mathcal{C}$, $\Omega^{\bar c}=\{(t,x)\colon X^{\bar c}_{t_0,x_0}(t)=x\text{ for some }x_0\in\mathbb{X}_0\}$.
  • $\Omega^{\mathrm{approx}}=\{(t,x)\colon X^{\tilde c}_{t_0,x_0}(t)=x\text{ for some }\tilde c\in\mathrm{OPT}^{\mathrm{approx}}\text{ and }x_0\in\mathbb{X}_0\}$.
We can state the following theorem, explaining the apparent paradox of low errors in the computation of the optimal control path despite substantial errors in the computation of the value function in our Motivating Example.
Theorem 1.
Assume that either the horizon is finite or the terminal conditions (7) and (8) hold for $V$.
Assume also that one of the following holds:
a) $V(t,x)=V^{\mathrm{approx}}(t,x)$ for all $(t,x)\in\Omega^{\mathrm{approx}}$, and $V(t,x)\le V^{\mathrm{approx}}(t,x)$ for all $(t,x)\in\mathbb{T}\times\mathbb{X}$;
b) $V(t,x)=V^{\mathrm{approx}}(t,x)$ for all $(t,x)\in\Omega$, and $V(t,x)\ge V^{\mathrm{approx}}(t,x)$ for all $(t,x)\in\mathbb{T}\times\mathbb{X}$;
c) $V(t,x)=V^{\mathrm{approx}}(t,x)$ for all $(t,x)\in\Omega\cup\Omega^{\mathrm{approx}}$.
Then, $\Omega=\Omega^{\mathrm{approx}}$, for every $\bar c\in\mathrm{OPT}$ there exists $\tilde c\in\mathrm{OPT}^{\mathrm{approx}}$ such that $\tilde c|_{\Omega}=\bar c|_{\Omega}$, and for every $\tilde c\in\mathrm{OPT}^{\mathrm{approx}}$ there exists $\bar c\in\mathrm{OPT}$ such that $\tilde c|_{\Omega}=\bar c|_{\Omega}$.
To prove Theorem 1, we formulate the following lemmata.
Lemma 1.
Consider an arbitrary set $C$ and two functions $f,g\colon C\to\bar{\mathbb{R}}$ with $f(c)=g(c)$ for all $c\in\operatorname{Argmax}f\ne\emptyset$ and $f(c)\ge g(c)$ otherwise.
Then, $\operatorname{Argmax}f=\operatorname{Argmax}g$.
Proof. 
Take $\bar c\in\operatorname{Argmax}f$ and any other $c$.
By the assumptions, $g(c)\le f(c)\le f(\bar c)=g(\bar c)$. Thus, $\bar c\in\operatorname{Argmax}g$.
Next, take $\tilde c\in\operatorname{Argmax}g$ and assume that $\tilde c\notin\operatorname{Argmax}f$. Take $\bar c\in\operatorname{Argmax}f$.
By the assumptions, $g(\bar c)\le g(\tilde c)\le f(\tilde c)<f(\bar c)=g(\bar c)$, which is a contradiction. □
Lemma 2.
Consider an arbitrary set $C$ and two functions $f,g\colon C\to\bar{\mathbb{R}}$ with $f(c)=g(c)$ for all $c\in\operatorname{Argmax}f\cup\operatorname{Argmax}g$, with both $\operatorname{Argmax}f,\operatorname{Argmax}g\ne\emptyset$.
Then, $\operatorname{Argmax}f=\operatorname{Argmax}g$.
Proof. 
Take $\tilde c\in\operatorname{Argmax}g$ and $\bar c\in\operatorname{Argmax}f$.
By the assumptions, $f(\tilde c)\le f(\bar c)=g(\bar c)\le g(\tilde c)=f(\tilde c)$, which implies that $f(\tilde c)=f(\bar c)$ and $g(\bar c)=g(\tilde c)$. Thus, $\tilde c\in\operatorname{Argmax}f$ and $\bar c\in\operatorname{Argmax}g$. □
Proof of Theorem 1.
We start by noting that, if $(t,x)\in\Omega$, then
$\text{for all }\bar c\in\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}_{t,x}(c),\quad(t+1,\phi(t,x,\bar c))\in\Omega,\qquad(9)$
and, if $(t,x)\in\Omega^{\mathrm{approx}}$, then
$\text{for all }\tilde c\in\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c),\quad(t+1,\phi(t,x,\tilde c))\in\Omega^{\mathrm{approx}}.\qquad(10)$
The proof will be by induction over $t$.
Together with $\Omega=\Omega^{\mathrm{approx}}$, we prove that $\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}_{t,x}(c)=\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c)$ for all $(t,x)\in\Omega$ (and, consequently, for all $(t,x)\in\Omega^{\mathrm{approx}}$).
First, consider $t=t_0$ and an arbitrary $x\in\mathbb{X}_0$.
In this case, $\Omega\cap\{(t,y)\in\mathbb{T}\times\mathbb{X}\colon t\le t_0\}=\{t_0\}\times\mathbb{X}_0=\Omega^{\mathrm{approx}}\cap\{(t,x)\in\mathbb{T}\times\mathbb{X}\colon t\le t_0\}$ in all the cases a)–c).
Next, consider any $t\ge t_0$ and any $x$ such that $(t,x)\in\Omega$.
Assume that $\Omega\cap\{(k,y)\in\mathbb{T}\times\mathbb{X}\colon k\le t\}=\Omega^{\mathrm{approx}}\cap\{(k,y)\in\mathbb{T}\times\mathbb{X}\colon k\le t\}$.
By Lemma 1 applied to the functions $\mathrm{RHS}^{\mathrm{approx}}_{t,x}$ and $\mathrm{RHS}_{t,x}$, and Equations (10) and (9), respectively, we obtain $\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}_{t,x}(c)=\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c)$ in cases a) and b).
By Lemma 2 applied to the functions $\mathrm{RHS}^{\mathrm{approx}}_{t,x}$ and $\mathrm{RHS}_{t,x}$, and any of Equations (10) or (9), we obtain that $\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}_{t,x}(c)=\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c)$ in case c).
Consequently, in all cases a)–c), $\Omega\cap\{(k,y)\in\mathbb{T}\times\mathbb{X}\colon k\le t+1\}=\Omega^{\mathrm{approx}}\cap\{(k,y)\in\mathbb{T}\times\mathbb{X}\colon k\le t+1\}$.
This ends the proof that $\Omega=\Omega^{\mathrm{approx}}$ and that $\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}_{t,x}(c)=\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c)$ for all $(t,x)\in\Omega$.
Take any $\tilde c\in\mathrm{OPT}^{\mathrm{approx}}$. Define $\bar c\in\mathcal{C}$ such that $\bar c(t,x)=\tilde c(t,x)$ for all $(t,x)\in\Omega$ and $\bar c(t,x)$ being any selection from $\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}_{t,x}(c)$, otherwise.
The terminal condition, either (4) or (7) and (8), is fulfilled.
By the fact that $\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}_{t,x}(c)=\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c)$, also the Bellman equation is fulfilled, so $\bar c\in\mathrm{OPT}$.
Next, take any $\bar c\in\mathrm{OPT}$. Define $\tilde c\in\mathcal{C}$ such that $\tilde c(t,x)=\bar c(t,x)$ for all $(t,x)\in\Omega$ and $\tilde c(t,x)$ being any selection from $\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c)$, otherwise. Since $\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}_{t,x}(c)=\operatorname{Argmax}_{c\in C(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{t,x}(c)$, by the definition, $\tilde c\in\mathrm{OPT}^{\mathrm{approx}}$. □
Theorem 1 states that any error described by a), b), or c) does not influence the quality of the computation or calculation of the optimal trajectory of the state variable and the optimal control path $c(t,X(t))$, i.e., the approximate $X$ and $c(t,X(t))$ are the same as for an optimal control, and every optimal control can be calculated using such an approximate value function, with the result correct along the corresponding trajectory. Thus, overestimation of $V$ on the set of points (i.e., states or time-state pairs) which never belong to the trajectory corresponding to the control defined by maximization of the Bellman equation with the approximate value function, as well as underestimation of the value function on the set of points which are suboptimal along the optimal trajectory, does not lead to any errors in the calculation of the optimal consumption path and the optimal trajectory of the state variable. Similarly, this holds for any errors at points which are neither on the optimal trajectory nor on the trajectory corresponding to the control defined by maximization of the Bellman equation with the approximate value function.
This is why we have purposely decreased the number of grid points in the computation of the solutions in the Fish Wars example for $x>\frac12$, although on this set there is a very visible difference between the actual and approximate value functions, in addition to us not caring about large, opposite in sign, differences between the value functions for $x$ very close to $0$.
Theorem 2.
Assume that either the horizon is finite or the terminal conditions (7) and (8) hold for $V$.
Consider a control $c\in\mathcal{C}$.
Assume also that one of the following holds:
a) If $c\in\mathrm{OPT}^{\mathrm{approx}}$ and $V(t,x)=V^{\mathrm{approx}}(t,x)$ for all $(t,x)\in\Omega^{c}$, and $V(t,x)\le V^{\mathrm{approx}}(t,x)$ for all $(t,x)\in\mathbb{T}\times\mathbb{X}$, then there exists $\bar c\in\mathrm{OPT}$ such that $\Omega^{\bar c}=\Omega^{c}$ and $c|_{\Omega^{c}}=\bar c|_{\Omega^{c}}$.
b) If $c\in\mathrm{OPT}$ and $V(t,x)=V^{\mathrm{approx}}(t,x)$ for all $(t,x)\in\Omega^{c}$, and $V(t,x)\ge V^{\mathrm{approx}}(t,x)$ for all $(t,x)\in\mathbb{T}\times\mathbb{X}$, then there exists $\tilde c\in\mathrm{OPT}^{\mathrm{approx}}$ such that $\Omega^{\tilde c}=\Omega^{c}$ and $\tilde c|_{\Omega^{c}}=c|_{\Omega^{c}}$.
Proof. 
Following almost the same lines as the proof of Theorem 1, we only concentrate on a single control from $\mathrm{OPT}$ or $\mathrm{OPT}^{\mathrm{approx}}$. □

Illustration of Usefulness of Theorems 1 and 2 by Examples

For simplicity, we consider the social optimum problem and we show how Theorem 1 can be applied in order to simplify the computation of the social optimum.
We consider the social optimum in the Levhari and Mirman model, either with a finite horizon, like those analyzed in Section 2, or its infinite horizon version. In the analysis of the infinite horizon problem, we consider the value function and controls dependent on the state variable only, while in the finite horizon, they are dependent also on time.
To show that the method is general and it can be applied for the infinite time horizon, we introduce the limit game of our Motivating Example with the infinite time horizon.
Proposition 3.
Consider the game from Section 2 but with $T=+\infty$. The social optimum for the infinite horizon problem is $c_i^S(x)=\frac{x(1-\alpha\beta)}{n}$ with the value function $V^S(x)=n\left(A^S+B^S\ln(x)\right)$, for $A^S=\frac{1}{1-\beta}\left(\frac{\alpha\beta}{1-\alpha\beta}\ln(\alpha\beta)+\ln(1-\alpha\beta)-\ln(n)\right)$ and $B^S=\frac{1}{1-\alpha\beta}$, while the Nash equilibrium is $c_i^N(x)=\frac{x(1-\alpha\beta)}{n(1-\alpha\beta)+\alpha\beta}$, with the value function $V_i^N(x)=A^N+B^N\ln(x)$ for $A^N=\frac{1}{1-\beta}\left(\frac{\alpha\beta}{1-\alpha\beta}\ln\frac{\alpha\beta}{n(1-\alpha\beta)+\alpha\beta}+\ln\frac{1-\alpha\beta}{n(1-\alpha\beta)+\alpha\beta}\right)$ and $B^N=\frac{1}{1-\alpha\beta}$.
Proof. 
The formulae have been proposed by Levhari and Mirman [15] and Okuguchi [16]. Checking that the Bellman Equation (3) and the inclusion (5) hold is just a substitution. Nevertheless, the proofs of [15,16] lack checking the necessary terminal conditions (7) and (8), while the standard sufficient terminal condition (6) does not hold.
Thus, to complete the proof, we check the terminal conditions (7) and (8). Equation (7) is immediate by the fact that $\ln(c_i)\le 0$ for all admissible $c_i$.
To prove Equation (8) for the social optimum, we consider a strategy profile $c$ for which $\limsup_{t\to\infty}\beta^{t}\cdot V^S(X^{c}(t))<0$. Thus, there exists a subsequence $t_k$ such that $\lim_{k\to\infty}\beta^{t_k}V^S(X^{c}(t_k))<0$. Thus, $\lim_{k\to\infty}\beta^{t_k}\ln(X^{c}(t_k))<0$.
Consequently, since all the terms below are non-positive and $c_i(X^{c}(t))\le\frac{X^{c}(t)}{n}$,
$\sum_{i=1}^{n}J_i\left(t_0,x_0,c_i,c_{-i}\right)=\sum_{i=1}^{n}\sum_{t=t_0}^{\infty}\beta^{t-t_0}\ln(c_i(X^{c}(t)))\le\sum_{i=1}^{n}\sum_{k=0}^{\infty}\beta^{t_k-t_0}\ln(c_i(X^{c}(t_k)))$
$\le\sum_{i=1}^{n}\sum_{k=0}^{\infty}\beta^{t_k-t_0}\ln\frac{X^{c}(t_k)}{n}=-\infty,$
since the terms of the last series do not converge to $0$.
The proof for the Nash equilibrium is analogous. □
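As a numerical companion to Proposition 3, the consumption fractions and the induced steady states can be evaluated as below. The steady-state expressions are obtained by us from the dynamics $x=\left(x-n\,c_i(x)\right)^{\alpha}$ and are not part of the proposition; the social one coincides with the $x^{*}$ used in Section 2.2.

```python
def infinite_horizon_summary(alpha, beta, n):
    """Consumption fractions of Proposition 3 and the induced steady states."""
    ab = alpha * beta
    a_S = (1 - ab) / n                          # social optimum fraction of the stock
    a_N = (1 - ab) / (n * (1 - ab) + ab)        # Nash equilibrium fraction of the stock
    x_S = ab ** (alpha / (1 - alpha))           # solves x = (alpha*beta*x)**alpha
    x_N = (ab / (n * (1 - ab) + ab)) ** (alpha / (1 - alpha))
    return a_S, a_N, x_S, x_N
```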
Example 1.
Assume that some preliminary analysis done for the problem has resulted in finding an $\epsilon<x_0$ for which it has been proven that the optimal trajectory fulfills $X(t)>\epsilon$ for all $t$. Such an $\epsilon$ obviously exists.
Changing $V$ by assigning $V^{\mathrm{approx}}=-\infty$ for all $x<\frac{\epsilon}{2}$ (see Figure 7) changes neither the optimal trajectory nor the optimal control path.
Thus, if we want to compute the optimal control, this substitution allows us to look for the social optimum and the value function for $x\ge\frac{\epsilon}{2}$ only and to avoid problems resulting from inaccuracies caused by closeness to the actual singularity.
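A sketch of how the substitution of Example 1 interacts with one Bellman step in the Fish Wars social optimum is given below: controls that would push the next state below $\frac{\epsilon}{2}$ receive a continuation value of $-\infty$ and are discarded automatically, so no grid is needed there. The names and the interpolation are our illustrative assumptions.

```python
import numpy as np

def bellman_step_with_cutoff(x, grid, V_next, alpha, beta, n, eps, n_controls=200):
    """One Bellman maximization for the social planner with V_approx = -inf
    on [0, eps/2): such continuations never win the maximization."""
    cs = np.linspace(1e-12, x, n_controls)               # candidate total consumptions
    next_x = (x - cs) ** alpha
    cont = np.where(next_x < eps / 2, -np.inf,
                    np.interp(next_x, grid, V_next))      # V_approx at the next state
    rhs = n * np.log(cs / n) + beta * cont
    k = int(np.argmax(rhs))
    return cs[k], rhs[k]                                   # optimal total consumption, value
```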
Example 2. 
Assume again that some preliminary analysis done for the problem has resulted in finding an $\epsilon<x_0$ for which we know that the optimal trajectory fulfills $X(t)>\epsilon$ for all $t$.
Since $V(x)\ge J(t_0,x,c)$ for every control $c$, changing $V$ by assigning $V^{\mathrm{approx}}(x)=J(t_0,x,\bar c)$ for all $x<\frac{\epsilon}{2}$, for any control $\bar c$, changes neither the optimal trajectory nor the optimal control path.
Thus, if we want to compute the optimal control, this substitution allows us to look for the value function for $x\ge\frac{\epsilon}{2}$ only (and the resulting optimal control).
Example 3.
Assume that preliminary analysis done for the problem has resulted in finding constants $a$ and $b$ for which we know that $ax+b$ is an upper bound for the value function for $x>\frac12$ (see Figure 8), and in discovering the fact that, if $x_0\le\frac12-\epsilon$, then, for all $t$, the optimal trajectory fulfills $X^{\bar c}_{t_0,x_0}(t)\le\frac12-\epsilon$.
If we change $V$ by assigning $V^{\mathrm{approx}}(x)=ax+b$ for all $x>\frac12$, and calculate the maximizer $\tilde c$ of the right-hand side of the Bellman equation with $V^{\mathrm{approx}}$, then, if the trajectory fulfills $X^{\tilde c}_{t_0,x_0}(t)<\frac12-\epsilon$ for all $t$, then $\tilde c(t,X^{\tilde c}_{t_0,x_0}(t))$ is the accurate solution path.
Thus, if we want to compute the optimal control, this substitution allows us to restrict the computation of the value function to $x<\frac12-\epsilon$, with one restriction: it is proven that our result is really the optimal control only when, for all $x_0<\frac12-\epsilon$, the computed $\tilde c$ is such that $X^{\tilde c}_{t_0,x_0}(t)<\frac12-\epsilon$ for all $t$.
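The ex post restriction of Example 3 can be verified by a direct simulation of the computed control; a generic sketch (all names are ours):

```python
def stays_in_safe_region(policy, phi, t0, T, x0, bound):
    """Check ex post that the trajectory generated by the computed feedback
    control `policy(t, x)` never enters the region [bound, 1] on which the
    value function was replaced by the upper bound a*x + b."""
    x = x0
    for t in range(t0, T + 1):
        if x >= bound:
            return False
        x = phi(t, x, policy(t, x))
    return x < bound
```

Only if this check succeeds for every admissible $x_0<\frac12-\epsilon$ is the computed control guaranteed to be optimal along the corresponding trajectory.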
Example 4.
To define our next $V^{\mathrm{approx}}$, we consider two small numbers $1\gg\epsilon_1>\epsilon>0$ ($\epsilon_1$ is fixed, while $\epsilon$ will be adjusted) and some obvious overestimation $V^{+}\ge V$, e.g., $V^{+}(x)=\ln x$. Next, we define $V_{\epsilon}\colon[\epsilon,1]\to\mathbb{R}$ as the solution of the Bellman equation with one change—$V\left(\left(x-\sum_i c_i\right)^{\alpha}\right)$ on its rhs is replaced by $V^{+}\left(\left(x-\sum_i c_i\right)^{\alpha}\right)$ whenever $\left(x-\sum_i c_i\right)^{\alpha}<\epsilon$—that fulfills the terminal conditions (7) and (8). Note that such a solution obviously exists, it is unique, and it coincides with the actual value function of the problem with an additional constraint on controls, $\left(x-\sum_i c_i\right)^{\alpha}\ge\epsilon$, except on a small neighbourhood of $0$, if $\epsilon$ is small enough.
If $\epsilon$ is such that the solution $V_{\epsilon_2}$ is invariant with respect to $\epsilon_2$ for $0<\epsilon_2<\epsilon$ on the interval $[\epsilon_1,1]$, then we can consider
$V^{\mathrm{approx}}(x)=\begin{cases}V^{+}(x)&\text{if }x<\epsilon,\\ V_{\epsilon}(x)&\text{otherwise}.\end{cases}$
This assumption is easy to check—it is enough to find $\epsilon$ for which the maximum on the rhs of the Bellman equation with the corresponding $V^{\mathrm{approx}}$ is attained only for $c$ for which $\left(x-\sum_i c_i\right)^{\alpha}\ge\epsilon_1$ whenever $x\ge\epsilon_1$. For our example, it obviously holds, since $V^{+}(x)\to-\infty$ as $x\to 0$.
Then, every control $\tilde c$ maximizing $\mathrm{RHS}^{\mathrm{approx}}_{t,x}$ coincides with the actual optimal control whenever $x_0\ge\epsilon_1$.
Note that, in this case, the only knowledge that we used about the value function is its obvious overestimation $V^{+}$.
A theorem stating how to find over- and under-estimations of the actual value function by solving some inequalities analogous to the Bellman equation with the terminal conditions (7) and (8) can be found in, e.g., Kamihigashi [8].
Those two types of over- and under-estimations of the value function in the infinite horizon may also appear as a result of approximating the infinite horizon value function by its finite horizon truncations—for nonnegative payoffs, they are underestimations, while, for nonpositive payoffs, they are overestimations—or as a result of an iterative procedure. Such procedures of estimating the value function by value iterations are currently being extensively studied for infinite horizon problems (for various versions, see, e.g., Martins-da-Rocha and Vailakis [4] and Le Van and Morhaim [5] for continuous problems and Kamihigashi [8] for discontinuous problems).

4. When and What Kind of Inaccuracies in Computation or Calculation of the Value Functions Do Not Propagate on the Nash Equilibrium Path in Dynamic Games

Here, we extend the results of Section 3 for Nash equilibria in dynamic games.
We consider a discrete time dynamic game:
  • with $n$ players;
  • with finite or infinite horizon $T$;
  • the time set $\{0,1,\dots\}$ or $\{0,1,\dots,T\}$ is denoted by $\mathbb{T}$;
  • with discount factor $\beta\in(0,1]$;
  • the set of states $\mathbb{X}$;
  • players’ sets of decisions $\mathbb{C}_i$
  • with the notation $\mathbb{C}_{-i}=\times_{j\ne i}\mathbb{C}_j$ and $\mathbb{C}=\times_{j=1,\dots,n}\mathbb{C}_j$;
  • players’ current payoff functions $P_i\colon\mathbb{T}\times\mathbb{X}\times\mathbb{C}_i\times\mathbb{C}_{-i}\to\mathbb{R}\cup\{-\infty\}$;
  • where we consider feedback strategies $c_i\colon\mathbb{T}\times\mathbb{X}\to\mathbb{C}_i$ (if independent of time, they are written as $c_i\colon\mathbb{X}\to\mathbb{C}_i$)
  • which fulfill the admissibility constraint $c_i(t,x)\in C_i(t,x)$ for all $t,x$ and for a correspondence $C_i\colon\mathbb{T}\times\mathbb{X}\rightrightarrows\mathbb{C}_i$.
  • We denote the set of admissible strategies of player $i$ by $\mathcal{C}_i$, the set of strategies of the remaining players by $\mathcal{C}_{-i}$ and the set of strategy profiles by $\mathcal{C}$;
  • For simplicity, we introduce the symbol $[c_i,\bar c_{-i}]$ to denote the profile at which player $i$ chooses $c_i$ while the remaining players choose their strategies $\bar c_{-i}$, which are their strategies resulting from $\bar c$;
  • the transformation of the state variable is determined by $\phi\colon\mathbb{T}\times\mathbb{X}\times\mathbb{C}\to\mathbb{X}$ or, more generally,
    $\phi\colon\mathbb{T}\times\mathbb{X}\times\mathbb{C}\to\tilde{\mathbb{X}}\supseteq\mathbb{X}$ with $\phi(t,x,c)\in\mathbb{X}$ for all $(t,x,c)\in\mathbb{T}\times\mathbb{X}\times\mathbb{C}$ with $c\in C(t,x)$. Specifying $\tilde{\mathbb{X}}$ is unnecessary, since it does not influence the results.
  • In the case when the time horizon is finite, we also consider terminal payoffs given by functions $G_i\colon\mathbb{X}\to\mathbb{R}\cup\{-\infty\}$, paid after the termination of the game.
  • The payoff is $J_i\left(t_0,x_0,c_i,c_{-i}\right)=\sum_{t=t_0}^{\infty}\beta^{t-t_0}P_i\left(t,X(t),c_i(t,X(t)),c_{-i}(t,X(t))\right)$ in the case of the infinite time horizon
  • and $J_i\left(t_0,x_0,c_i,c_{-i}\right)=\sum_{t=t_0}^{T}\beta^{t-t_0}P_i\left(t,X(t),c_i(t,X(t)),c_{-i}(t,X(t))\right)+\beta^{T+1-t_0}G_i(X(T+1))$ for a finite time horizon $T$,
  • where the trajectory $X$ corresponding to a strategy profile $c$ is given by
    $X(t+1)=\phi(t,X(t),c(t,X(t)))$ with the initial condition $X(t_0)=x_0$.
  • Whenever we want to emphasize the dependence of $X$ on a strategy $c_i$ of player $i$, we write $X^{c_i}$; if we want to emphasize the dependence of $X$ on the initial condition, we write $X_{t_0,x_0}$, and $X^{c_i}_{t_0,x_0}$ if we want to emphasize both. If we also want to emphasize the strategies of the remaining players, we write $X^{[c_i,c_{-i}]}_{t_0,x_0}$, or $X^{c}_{t_0,x_0}$ whenever $c$ denotes the whole profile.
  • We assume that the problem is such that $J_i$ is always well defined, although it may be $\pm\infty$.
  • We restrict the initial condition to $x_0\in\mathbb{X}_0\subseteq\mathbb{X}$.
At a feedback Nash equilibrium, given the strategies of the other players $c_{-i}\in\mathcal{C}_{-i}$, the aim of each player is to maximize, for each $t_0\in\mathbb{T}$ and $x_0\in\mathbb{X}$, the objective functional $J_i\left(t_0,x_0,c_i,c_{-i}\right)$ over the set of admissible controls $\mathcal{C}_i$. Formally, it can be defined as
Definition 2.
A feedback Nash equilibrium is a profile of strategies $\bar c$ such that, for every player $i$, for every strategy $c_i$ of player $i$, for every $t_0$ and $x_0$,
$J_i\left(t_0,x_0,c_i,\bar c_{-i}\right)\le J_i\left(t_0,x_0,\bar c_i,\bar c_{-i}\right).$
First, we consider a profile of strategies $\bar c$ and fix the strategies of the others, $\bar c_{-i}\in\mathcal{C}_{-i}$.
We introduce the following notation:
  • $V_i(\cdot,\cdot,\bar c_{-i})\colon\bar{\mathbb{T}}\times\mathbb{X}\to\bar{\mathbb{R}}$—the actual value function of the dynamic optimization problem of player $i$ given $\bar c_{-i}$, i.e., $V_i(t,x,\bar c_{-i})=\sup_{c_i\in\mathcal{C}_i}J_i(t,x,c_i,\bar c_{-i})$.
  • $V_i^{\mathrm{approx}}(\cdot,\cdot,\bar c_{-i})\colon\bar{\mathbb{T}}\times\mathbb{X}\to\bar{\mathbb{R}}$—another function regarded as an approximation of $V_i(\cdot,\cdot,\bar c_{-i})$; it may be either a solution of a numerical procedure or the actual $V_i(\cdot,\cdot,\bar c_{-i})$ with values on certain subsets replaced by other values, e.g., some constraints; we assume a priori that, in the finite time horizon $T$, $V_i^{\mathrm{approx}}(T+1,x,\bar c_{-i})=G_i(x)$.
  • $\mathrm{RHS}_{i,t,x,\bar c_{-i}}(c_i)=P_i(t,x,c_i,\bar c_{-i}(t,x))+\beta V_i(t+1,\phi(t,x,[c_i,\bar c_{-i}]),\bar c_{-i})$—the maximized function on the right-hand side of the Bellman Equation (3) rewritten for the maximization of player $i$ given $\bar c_{-i}$;
  • $\mathrm{RHS}^{\mathrm{approx}}_{i,t,x,\bar c_{-i}}(c_i)=P_i(t,x,c_i,\bar c_{-i}(t,x))+\beta V_i^{\mathrm{approx}}(t+1,\phi(t,x,[c_i,\bar c_{-i}]),\bar c_{-i})$—the maximized function on the right-hand side of the Bellman Equation (3) rewritten for the maximization of player $i$ given $\bar c_{-i}$ with $V_i$ replaced by $V_i^{\mathrm{approx}}$.
  • $\mathrm{BEST}_i(\bar c_{-i})$—the set of strategies of player $i$ which are best responses to $\bar c_{-i}$; we assume that it is non-empty;
  • $\mathrm{BEST}^{\mathrm{approx}}_i(\bar c_{-i})$—the set of strategies $\tilde c_i$ such that
    $\tilde c_i(t,x)\in\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}^{\mathrm{approx}}_{i,t,x,\bar c_{-i}}(c_i)$; we assume that it is non-empty;
  • For $c_i\in\mathcal{C}_i$: $\Omega_i^{c_i}(\bar c_{-i})=\{(t,x)\colon X^{[c_i,\bar c_{-i}]}_{t_0,x_0}(t)=x\text{ for some }x_0\in\mathbb{X}_0\}$;
  • For $c\in\mathcal{C}$: $\Omega^{c}=\{(t,x)\colon X^{c}_{t_0,x_0}(t)=x\text{ for some }x_0\in\mathbb{X}_0\}$;
  • $\Omega_i(\bar c_{-i})=\{(t,x)\colon X^{[c_i,\bar c_{-i}]}_{t_0,x_0}(t)=x\text{ for some }c_i\in\mathrm{BEST}_i(\bar c_{-i})\text{ and }x_0\in\mathbb{X}_0\}$;
  • $\Omega^{\mathrm{approx}}_i(\bar c_{-i})=\{(t,x)\colon X^{[c_i,\bar c_{-i}]}_{t_0,x_0}(t)=x\text{ for some }c_i\in\mathrm{BEST}^{\mathrm{approx}}_i(\bar c_{-i})\text{ and }x_0\in\mathbb{X}_0\}$.
We can state the following theorem, being an immediate corollary of Theorem 1.
Theorem 3.
Consider a profile $\bar c$ and assume that all players besides $i$ play $\bar c_j$, and, in the case of the infinite time horizon, that $V_i(\cdot,\cdot,\bar c_{-i})$ fulfills the terminal conditions (7) and (8).
Assume also that one of the following holds:
a) $V_i(t,x,\bar c_{-i})=V_i^{\mathrm{approx}}(t,x,\bar c_{-i})$ for all $(t,x)\in\Omega^{\mathrm{approx}}_i(\bar c_{-i})$ and $V_i(t,x,\bar c_{-i})\le V_i^{\mathrm{approx}}(t,x,\bar c_{-i})$, otherwise;
b) $V_i(t,x,\bar c_{-i})=V_i^{\mathrm{approx}}(t,x,\bar c_{-i})$ for all $(t,x)\in\Omega_i(\bar c_{-i})$ and $V_i(t,x,\bar c_{-i})\ge V_i^{\mathrm{approx}}(t,x,\bar c_{-i})$, otherwise;
c) $V_i(t,x,\bar c_{-i})=V_i^{\mathrm{approx}}(t,x,\bar c_{-i})$ for all $(t,x)\in\Omega_i(\bar c_{-i})\cup\Omega^{\mathrm{approx}}_i(\bar c_{-i})$.
Then, $\Omega^{\mathrm{approx}}_i(\bar c_{-i})=\Omega_i(\bar c_{-i})$, for every $\bar c_i\in\mathrm{BEST}_i(\bar c_{-i})$ there exists $\tilde c_i\in\mathrm{BEST}^{\mathrm{approx}}_i(\bar c_{-i})$ such that $\tilde c_i|_{\Omega_i(\bar c_{-i})}=\bar c_i|_{\Omega_i(\bar c_{-i})}$, and for every $\tilde c_i\in\mathrm{BEST}^{\mathrm{approx}}_i(\bar c_{-i})$ there exists $\bar c_i\in\mathrm{BEST}_i(\bar c_{-i})$ such that $\tilde c_i|_{\Omega_i(\bar c_{-i})}=\bar c_i|_{\Omega_i(\bar c_{-i})}$.
Proof. 
Immediately, by Theorem 1 applied to the optimization of player $i$ given the strategies of the others, $\bar c_{-i}$. □
Theorem 3, for $\bar c$ being the Nash equilibrium profile, states that, given the initial state $x_0\in\mathbb{X}_0$ and the correct expectation of the other players’ strategies $\bar c_{-i}$, overestimation of $V_i(t,x,\bar c_{-i})$ on the set of $(t,x)$ which never belong to the trajectory corresponding to the control defined by maximization of the corresponding Bellman equation with the approximate value function, underestimation of the value function on the set of points which are suboptimal along the Nash equilibrium trajectory, and any error appearing outside the union of those sets do not lead to any errors in the calculation of the Nash equilibrium consumption path and trajectory of the state variable. It may help in the calculation or computation of a Nash equilibrium only if we know the Nash equilibrium strategies of all players but one.
The next question is what we can expect if the expectation of the other players’ strategies $\tilde c_{-i}$ is correct only on some subset of $\mathbb{T}\times\mathbb{X}$.
Generally, we cannot expect that a profile resulting from solving the approximate equation is a Nash equilibrium, since the fixed point property may be violated at state or time-state pairs outside $\Omega$, while the definition of a feedback Nash equilibrium requires that the whole profile is a fixed point of the joint best response correspondence. Nevertheless, we are able to prove some results.
Theorem 4.
Consider player $i$ and two profiles of strategies of the other players, $\bar c_{-i}$ and $\tilde c_{-i}$, and assume that $\bar c_{-i}|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}=\tilde c_{-i}|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}$.
In the case of the infinite time horizon, we assume also that
$\forall\,\epsilon>0,\ \forall\,(t,x)\in\Omega_i(\bar c_{-i})\ \exists\,\bar t\ge t\ \text{such that}\ \forall\,s>\bar t,\ \forall\,i\in I,\ \forall\,c_i\in\mathcal{C}_i,\quad\beta^{s}\left|J_i\left(s,X^{[c_i,\bar c_{-i}]}_{t,x}(s),c_i,\bar c_{-i}\right)\right|<\epsilon.$
Then,
$\Omega_i(\bar c_{-i})=\Omega_i(\tilde c_{-i}),\quad\mathrm{BEST}_i(\bar c_{-i})|_{\Omega_i(\bar c_{-i})}=\mathrm{BEST}_i(\tilde c_{-i})|_{\Omega_i(\bar c_{-i})},\quad V_i(\cdot,\cdot,\bar c_{-i})|_{\Omega_i(\bar c_{-i})}=V_i(\cdot,\cdot,\tilde c_{-i})|_{\Omega_i(\bar c_{-i})}.$
Proof. 
First, we prove that $V_i(\cdot,\cdot,\bar c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}=V_i(\cdot,\cdot,\tilde c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}$ and that $\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\bar c_{-i}}(c_i)=\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\tilde c_{-i}}(c_i)$ for every $(t,x)\in\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})$.
We start the proof from the case of a finite time horizon $T$.
By the definition of the value function, $V_i(T+1,x,\bar c_{-i})=V_i(T+1,x,\tilde c_{-i})=G_i(x)$.
We apply Lemma 2 to $\mathrm{RHS}_{i,T,x,\bar c_{-i}}(c_i)$ and $\mathrm{RHS}_{i,T,x,\tilde c_{-i}}(c_i)$.
First, note that, for every $x$ such that $(T,x)\in\Omega_i(\bar c_{-i})$, for every $\hat c_i\in\operatorname{Argmax}_{c_i\in C_i(T,x)}\mathrm{RHS}_{i,T,x,\bar c_{-i}}(c_i)$, $(T+1,\phi(T,x,[\hat c_i,\bar c_{-i}]))=(T+1,\phi(T,x,[\hat c_i,\tilde c_{-i}]))\in\Omega_i(\bar c_{-i})$.
Thus, for every $x$ such that $(T,x)\in\Omega_i(\bar c_{-i})$, $\operatorname{Argmax}_{c_i\in C_i(T,x)}\mathrm{RHS}_{i,T,x,\bar c_{-i}}(c_i)=\operatorname{Argmax}_{c_i\in C_i(T,x)}\mathrm{RHS}_{i,T,x,\tilde c_{-i}}(c_i)$ and $\max_{c_i\in C_i(T,x)}\mathrm{RHS}_{i,T,x,\bar c_{-i}}(c_i)=\max_{c_i\in C_i(T,x)}\mathrm{RHS}_{i,T,x,\tilde c_{-i}}(c_i)$ by Lemma 2.
Similarly, we obtain the same for every $x$ such that $(T,x)\in\Omega_i(\tilde c_{-i})$.
Consequently, by the Bellman Equation (3) (rewritten in the game notation), for every $x$ such that $(T,x)\in\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})$, $V_i(T,x,\tilde c_{-i})=V_i(T,x,\bar c_{-i})$.
Next, consider any $t\le T-1$ and assume that, for every $x$ such that $(t+1,x)\in\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})$, $V_i(t+1,x,\tilde c_{-i})=V_i(t+1,x,\bar c_{-i})$.
We apply Lemma 2 to $\mathrm{RHS}_{i,t,x,\bar c_{-i}}(c_i)$ and $\mathrm{RHS}_{i,t,x,\tilde c_{-i}}(c_i)$.
First, note that, for every $x$ such that $(t,x)\in\Omega_i(\bar c_{-i})$, for every $\hat c_i\in\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\bar c_{-i}}(c_i)$, $(t+1,\phi(t,x,[\hat c_i,\bar c_{-i}]))=(t+1,\phi(t,x,[\hat c_i,\tilde c_{-i}]))\in\Omega_i(\bar c_{-i})$.
Thus, for every $x$ such that $(t,x)\in\Omega_i(\bar c_{-i})$, $\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\bar c_{-i}}(c_i)=\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\tilde c_{-i}}(c_i)$ and $\max_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\bar c_{-i}}(c_i)=\max_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\tilde c_{-i}}(c_i)$.
Similarly, we obtain the same for every $x$ such that $(t,x)\in\Omega_i(\tilde c_{-i})$.
Consequently, for every $x$ such that $(t,x)\in\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})$, $V_i(t,x,\tilde c_{-i})=V_i(t,x,\bar c_{-i})$ and $\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\bar c_{-i}}(c_i)=\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\tilde c_{-i}}(c_i)$, which ends the proof that $V_i(\cdot,\cdot,\bar c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}=V_i(\cdot,\cdot,\tilde c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}$ and $\mathrm{BEST}_i(\bar c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}=\mathrm{BEST}_i(\tilde c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}$ in the case of a finite time horizon $T$.
To prove the result for the infinite time horizon, consider truncations of the game with finite time horizon $T$ and terminal payoff $0$. We denote the payoffs in those games by $J_i^T$, the value functions by $V_i^T$ and the best response by $c_i^T$.
First, note that, with our assumption for the infinite time horizon, if $V_i^T(t,x,\bar c_{-i})$ converges (to a finite or infinite limit), then $V_i(t,x,\bar c_{-i})$ is equal to the limit (by Wiszniewska-Matyszkiel and Singh [3]). Thus, since, for every finite $T$, $V_i^T(t,x,\bar c_{-i})=V_i^T(t,x,\tilde c_{-i})$, the same applies to the limit.
Does $V_i^T(t,x,\bar c_{-i})$ converge?
There are the following cases:
a) $V_i(t,x,\bar c_{-i})$ is finite. Denote an optimal control by $\hat c_i$.
Then, $\forall\,\epsilon>0$, $\exists\,T_{\epsilon}$ such that, for $T>T_{\epsilon}$, $\left|J_i^T(t,x,\hat c_i,\bar c_{-i})-J_i(t,x,\hat c_i,\bar c_{-i})\right|<\epsilon$ and
$\beta^{T+1}\left|J_i\left(T+1,X^{[c_i,\bar c_{-i}]}_{t,x}(T+1),c_i,\bar c_{-i}\right)\right|<\epsilon$ for every $c_i$.
Thus, $J_i\left(t,x,\hat c_i,\bar c_{-i}\right)-\epsilon<J_i^T\left(t,x,\hat c_i,\bar c_{-i}\right)\le V_i^T\left(t,x,\bar c_{-i}\right)=J_i^T\left(t,x,c_i^T,\bar c_{-i}\right)$
$=J_i\left(t,x,\hat c_i^T,\bar c_{-i}\right)-\beta^{T+1}V_i\left(T+1,X^{[\hat c_i^T,\bar c_{-i}]}_{t,x}(T+1),\bar c_{-i}\right)\le V_i\left(t,x,\bar c_{-i}\right)+\epsilon$
for $\hat c_i^T:=\begin{cases}c_i^T&\text{for }t\le T,\\ \hat c_i&\text{otherwise},\end{cases}$ which implies the existence of a limit.
b) $V_i(t,x,\bar c_{-i})=+\infty$. Denote an optimal control by $\hat c_i$.
Then, $\forall\,M>0$, there exists $T_M$ such that, for all $T>T_M$, $J_i^T(t,x,\hat c_i,\bar c_{-i})>M$.
Thus, $M<J_i^T(t,x,\hat c_i,\bar c_{-i})\le V_i^T(t,x,\bar c_{-i})$, which implies that the limit is $+\infty$.
c) $V_i(t,x,\bar c_{-i})=-\infty$. This means that, for every $\hat c_i$, $J_i(t,x,\hat c_i,\bar c_{-i})=-\infty$.
Thus, $\forall\,M<0$, there exists $T_M$ such that, for all $T>T_M$, $J_i^T(t,x,\hat c_i,\bar c_{-i})<M$. Since this holds for all $\hat c_i$, $V_i^T(t,x,\bar c_{-i})\le M$, which implies convergence to $-\infty$.
This ends the proof that $V_i(\cdot,\cdot,\bar c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}=V_i(\cdot,\cdot,\tilde c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}$ and $\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\bar c_{-i}}(c_i)=\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\tilde c_{-i}}(c_i)$ for all $(t,x)\in\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})$ in the case of the infinite time horizon.
The proof that $\mathrm{BEST}_i(\bar c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}=\mathrm{BEST}_i(\tilde c_{-i})|_{\Omega_i(\bar c_{-i})\cup\Omega_i(\tilde c_{-i})}$ is immediate by the fact that, by (5) rewritten in the game notation, $\mathrm{BEST}_i(\bar c_{-i})$ is the set of all functions $\hat c_i$ defined by $\hat c_i(t,x)\in\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\bar c_{-i}}(c_i)$ (by Wiszniewska-Matyszkiel and Singh [3]).
The only thing that remains to be proven is $\Omega_i(\bar{c}_{-i})=\Omega_i(\tilde{c}_{-i})$, which we prove by forward induction, as in the proof of Theorem 1.
First, consider $t=t_0$ and an arbitrary $x\in X_0$.
In this case, $\Omega_i(\bar{c}_{-i})\cap\{(t,y)\in\mathbb{T}\times\mathbb{X}:\ t\le t_0\}=\{t_0\}\times X_0=\Omega_i(\tilde{c}_{-i})\cap\{(t,y)\in\mathbb{T}\times\mathbb{X}:\ t\le t_0\}$.
Next, consider any $t\ge t_0$ and any $x$ such that $(t,x)\in\Omega$.
Assume that $\Omega_i(\bar{c}_{-i})\cap\{(k,y)\in\mathbb{T}\times\mathbb{X}:\ k\le t\}=\Omega_i(\tilde{c}_{-i})\cap\{(k,y)\in\mathbb{T}\times\mathbb{X}:\ k\le t\}$.
Since $\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\bar{c}_{-i}}(c_i)=\operatorname{Argmax}_{c_i\in C_i(t,x)}\mathrm{RHS}_{i,t,x,\tilde{c}_{-i}}(c_i)$, we obtain $\Omega_i(\bar{c}_{-i})\cap\{(k,y)\in\mathbb{T}\times\mathbb{X}:\ k\le t+1\}=\Omega_i(\tilde{c}_{-i})\cap\{(k,y)\in\mathbb{T}\times\mathbb{X}:\ k\le t+1\}$.
This ends the proof that $\Omega_i(\bar{c}_{-i})=\Omega_i(\tilde{c}_{-i})$. □
Nevertheless, the above results do not yet guarantee the equivalence of equilibria and of profiles calculated using the approximate value function, even on the graph of the corresponding trajectory: for a profile to be a feedback Nash equilibrium, its values at the remaining $(t,x)$ also matter, and it may happen that no Nash equilibrium extending a profile that is an equilibrium along the corresponding trajectory exists.
Therefore, let us introduce the following notation.
Consider a profile $\bar{c}\in C$. For this $\bar{c}$, we introduce two types of one stage games.
  • $G_{t,x,\bar{c}}$—a one stage game with strategies $s_i\in C_i(t,x)$ and payoffs $P_i(t,x,s_i,s_{-i})+\beta V_i(t+1,\phi(t,x,s),\bar{c}_{-i})$.
  • $G^{\mathrm{approx}}_{t,x,\bar{c}}$—a one stage game with strategies $s_i\in C_i(t,x)$ and payoffs $P_i(t,x,s_i,s_{-i})+\beta V_i^{\mathrm{approx}}(t+1,\phi(t,x,s),\bar{c}_{-i})$.
Note that, in $G_{t,x,\bar{c}}$ and $G^{\mathrm{approx}}_{t,x,\bar{c}}$, the dependence on $\bar{c}$ reduces to the dependence on $\bar{c}(\tau,x)$ for $\tau>t$ only.
Since a feedback Nash equilibrium $\bar{c}$ has to be defined for all $(t,x)$ and its value at $(t,x)$ should coincide with a Nash equilibrium of $G_{t,x,\bar{c}}$, to prove the equivalence we have to guarantee that there is no problem with the existence of a Nash equilibrium also off the steady state trajectory.
  • $C^{N,t}$ is the set of profiles $c\in C$ for which a Nash equilibrium in $G_{s,x,c}$ exists for each $x\in\mathbb{X}$ and $s\ge t$.
    We assume that $C^{N,t_0}\neq\emptyset$, which is equivalent to the existence of a Nash equilibrium.
  • $C^{N\,\mathrm{approx},t}$ is the set of profiles $c\in C$ for which a Nash equilibrium in $G^{\mathrm{approx}}_{s,x,c}$ exists for each $x\in\mathbb{X}$ and $s\ge t$.
    We assume that $C^{N\,\mathrm{approx},t_0}\neq\emptyset$.
In the finite horizon, these two sequences of sets of strategy profiles can be defined recursively, starting from $t=T+1$ and going backwards, as follows.
  • $C^{N,t}$ by:
    $C^{N,T+1}=C$,
    $C^{N,t}$ being the set of profiles $c\in C^{N,t+1}$ for which a Nash equilibrium in $G_{t,x,c}$ exists for each $x\in\mathbb{X}$.
  • $C^{N\,\mathrm{approx},t}$ by:
    $C^{N\,\mathrm{approx},T+1}=C$,
    $C^{N\,\mathrm{approx},t}$ being the set of profiles $c\in C^{N\,\mathrm{approx},t+1}$ for which a Nash equilibrium in $G^{\mathrm{approx}}_{t,x,c}$ exists for each $x\in\mathbb{X}$.
For finite T, we can also define the following sets of profiles.
  • $\mathbf{CN}$ being the set of feedback Nash equilibria.
  • $\mathbf{CN}^{\mathrm{approx}}$ being the set of profiles $c\in C$ that fulfill, starting from $T$ and going backwards, that $c(t,x)$ is a Nash equilibrium of the static game $G^{\mathrm{approx}}_{t,x,c}$.
In the infinite horizon, those sets still fulfill the recurrence relation, but we lack a terminal condition.
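To illustrate how the one stage games $G_{t,x,c}$ and $G^{\mathrm{approx}}_{t,x,c}$ enter the recursive construction above, the sketch below computes a Nash equilibrium of such a one stage game for the symmetric Fish Wars payoffs by best-response iteration on a discretized decision set. It is only an illustration under simplifying assumptions: all names, grid sizes and the use of best-response iteration (whose convergence is not guaranteed in general) are ours and are not taken from the paper, and it is written in Python rather than in the Matlab environment mentioned in Appendix A.

```python
import numpy as np

# Hypothetical sketch: a symmetric one stage game of the G_{t,x,c} type for the
# Fish Wars payoffs, with decisions c_i in [0, x/n], instantaneous payoff ln(c_i)
# and continuation payoff beta * V_next((x - c_i - o)^alpha), where o is the sum
# of the other players' decisions and V_next approximates V(t+1, .).

def one_stage_nash(x, V_next, n=2, beta=0.9, alpha=0.5,
                   grid_size=400, max_iter=200, tol=1e-10):
    """Best-response iteration for the symmetric one stage game at state x.
    Returns one player's decision at the (assumed) symmetric equilibrium."""
    c_grid = np.linspace(1e-12, x / n, grid_size)     # player i's decision grid

    def best_response(o):
        y = np.maximum(x - c_grid - o, 0.0) ** alpha  # next state for each c_i
        payoffs = np.log(c_grid) + beta * V_next(y)
        return c_grid[np.argmax(payoffs)]

    c = x / (2 * n)                                   # initial symmetric guess
    for _ in range(max_iter):
        c_new = best_response((n - 1) * c)            # the others play the same c
        if abs(c_new - c) < tol:
            break
        c = c_new
    return c

# Example of use with a rough terminal value V(T+1, y) = ln(y/n) for n = 2:
# c_star = one_stage_nash(0.5, lambda y: np.log(np.maximum(y / 2, 1e-300)))
```

In the recursive definitions of $C^{N,t}$ and $C^{N\,\mathrm{approx},t}$ above, the only difference between the exact and the approximate variant is which continuation value function is passed to the one stage game.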
Theorem 5. 
Consider a game with a finite time horizon T.
a) If $\bar{c}$ is a Nash equilibrium and, for every $i$, both the pair $V_i(\cdot,\cdot,\bar{c}_{-i})$, $V_i^{\mathrm{approx}}(\cdot,\cdot,\bar{c}_{-i})$ and the pairs $V_i(\cdot,\cdot,c_{-i})$, $V_i^{\mathrm{approx}}(\cdot,\cdot,c_{-i})$ for every $c\in\mathbf{CN}^{\mathrm{approx}}$ fulfill the assumptions of Theorem 3, then there exists a profile $\tilde{c}$ which fulfills
$\bar{c}|_{\Omega_{\bar{c}}}=\tilde{c}|_{\Omega_{\bar{c}}}$ and such that, for all $i$, $\tilde{c}_i\in\mathrm{BEST}_i^{\mathrm{approx}}(\tilde{c}_{-i})$.
b) If $\tilde{c}\in\mathbf{CN}^{\mathrm{approx}}$ and, for every $i$, both the pair $V_i(\cdot,\cdot,\tilde{c}_{-i})$, $V_i^{\mathrm{approx}}(\cdot,\cdot,\tilde{c}_{-i})$ and the pairs $V_i(\cdot,\cdot,c_{-i})$, $V_i^{\mathrm{approx}}(\cdot,\cdot,c_{-i})$ for every $c\in\mathbf{CN}$ (i.e., every Nash equilibrium) fulfill the assumptions of Theorem 3, then there exists a Nash equilibrium $\bar{c}$ which fulfills $\bar{c}|_{\Omega_{\tilde{c}}}=\tilde{c}|_{\Omega_{\tilde{c}}}$.
Proof. 
The proof is based on a recursive construction of a profile coinciding with $\bar{c}$ or $\tilde{c}$, respectively.
By Theorem 3, for every $c\in C$ appearing in a) and b), $\Omega_i^{\mathrm{approx}}(c_{-i})=\Omega_i(c_{-i})$; moreover, for every $\hat{c}_i\in\mathrm{BEST}_i(c_{-i})$ there exists $\breve{c}_i\in\mathrm{BEST}_i^{\mathrm{approx}}(c_{-i})$ such that $\hat{c}_i|_{\Omega_i(c_{-i})}=\breve{c}_i|_{\Omega_i(c_{-i})}$, and for every $\breve{c}_i\in\mathrm{BEST}_i^{\mathrm{approx}}(c_{-i})$ there exists $\hat{c}_i\in\mathrm{BEST}_i(c_{-i})$ such that $\hat{c}_i|_{\Omega_i(c_{-i})}=\breve{c}_i|_{\Omega_i(c_{-i})}$.
a) This applies both to $\bar{c}$ and to the profile $\tilde{c}$ which we are going to construct. To simplify the notation, let us denote by $\bar{\Omega}$ the union of all $\Omega_i(\bar{c}_{-i})$.
We define $\tilde{c}$ recursively, starting from $t=T$, as follows.
(i) $\tilde{c}(T,x)=\bar{c}(T,x)$ for all $x$ such that $(T,x)\in\bar{\Omega}$, while for the remaining $x$, $\tilde{c}(T,x)$ is an equilibrium of $G^{\mathrm{approx}}_{T,x,c}$. This part of the definition is correct whatever $c$ we write, since $G^{\mathrm{approx}}_{T,x,c}$ is independent of $c$.
(ii) Having defined $\tilde{c}(\tau,x)$ for all $\tau>t$ and $x\in\mathbb{X}$, we define $\tilde{c}(t,x)=\bar{c}(t,x)$ for all $x$ such that $(t,x)\in\bar{\Omega}$, and $\tilde{c}(t,x)$ as an arbitrary equilibrium of $G^{\mathrm{approx}}_{t,x,\tilde{c}}$ for the remaining $x\in\mathbb{X}$.
This definition is correct since $G^{\mathrm{approx}}_{t,x,\tilde{c}}$ depends only on $\tilde{c}(\tau,x)$ for $\tau>t$.
By Theorem 4, c ˜ has the assumed property.
b) Analogously, in the opposite direction. □
Example 5.
Application to the Motivating Example.
Assume that, by some preliminary analysis, we know that the equilibrium profile $\bar{c}$ exists and we know some superset of $\Omega_{\bar{c}}$ given $X_0$ (e.g., the fact that, for $x_0\in(\frac{1}{100},\frac{1}{2})$, $X^{x_0,t_0}_{\bar{c}}(t)\in(\frac{1}{100},\frac{1}{2})$ for all $t$). We also know that the assumption of Theorem 5 b) is fulfilled. By Theorem 5, there exists a profile $\tilde{c}$ with $\tilde{c}_i\in\mathrm{BEST}_i^{\mathrm{approx}}(\tilde{c}_{-i})$ such that $\bar{c}_i|_{\Omega_{\tilde{c}}}=\tilde{c}_i|_{\Omega_{\tilde{c}}}$.
We have obtained a single-valued $\mathrm{BEST}_i^{\mathrm{approx}}(c_{-i})$ for every $i$ and every $c_{-i}$, and a unique solution to the fixed point problem of the approximate joint best response (if we do not assume symmetry a priori, then we obtain it as a result, recursively, starting from the terminal time).
Thus, by Theorem 5, the profile $\tilde{c}$ calculated by the approximate procedure coincides with the Nash equilibrium profile on $\Omega_{\tilde{c}}$. Hence, having calculated $\tilde{c}$, we know the equilibrium $\bar{c}$ on the set $\Omega_{\tilde{c}}$.
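The ex post part of this reasoning can be automated: one simulates the state trajectory generated by the computed profile and verifies that it never leaves the region on which the value function is trusted (here $(\frac{1}{100},\frac{1}{2})$). A minimal sketch of such a check, with hypothetical names and the symmetric Fish Wars dynamics, is given below; it only verifies the assumption about the trajectory and does not by itself prove the equilibrium property.

```python
def trajectory_stays_in_region(x0, c_profile, T, n=2, alpha=0.5,
                               lo=1 / 100, hi=1 / 2):
    """Simulate X(t) under a computed symmetric profile c_profile(t, x)
    (one player's consumption) and check that it stays in (lo, hi)."""
    x = x0
    for t in range(1, T + 1):
        if not (lo < x < hi):
            return False, t, x            # first exit time and state
        c = c_profile(t, x)               # one player's consumption at (t, x)
        x = max(x - n * c, 0.0) ** alpha  # Fish Wars dynamics
    return True, T, x

# Example of use with a hypothetical computed feedback rule c(t, x) = 0.2 * x:
# ok, t_last, x_last = trajectory_stays_in_region(0.3, lambda t, x: 0.2 * x, T=50)
```

If such a check fails, the conclusions of this example cannot be applied with the chosen region, and either the region or the approximation has to be revised.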

5. Conclusions and Further Research

We have analyzed numerically the problems of social optimum and Nash equilibrium in the Levhari and Mirman Fish Wars model, a dynamic game with logarithmic instantaneous and terminal payoffs, using an algorithm in which we purposely left the grid sparse on some sets and did not assume any specific form of the solutions a priori. Despite the fact that the value functions were substantially overestimated on some sets and underestimated on others, we obtained a surprisingly good approximation of the optima and equilibria along the corresponding trajectories.
This has been the starting point for the main achievement of the paper: the formulation of general rules describing which types of over- or under-estimation of the value function do not result in incorrectness of the optimal trajectory or of the optimal strategy along it. This applies both to dynamic optimization problems and to Nash equilibria in dynamic games. In this general rule, the over- or under-estimation is not restricted to that resulting from using numerical methods with grids sparse on some sets; it may also result from replacing the value function, which is not known exactly, by a constraint for it on some intervals a priori, in order to simplify further computation or calculation. Depending on whether we allow the value function to be under-estimated, over-estimated, or just incorrectly calculated on some subsets, the cost is, respectively, prior knowledge about the regions which the optimal trajectory never enters, checking ex post whether the approximate trajectory has non-empty intersection with those subsets, or both.
Among other things, our results prove that, in some dynamic optimization problems, solving the Bellman equation and finding the maximizer of its right-hand side as the candidate for the optimal control only at the steady state of the state variable may yield a correct result. They also justify the procedure of calculating the optimal control only along the optimal trajectory. Both statements hold under the restrictions stated above.
The results of this paper are mainly theoretical. Some of them assume that some other method of restricting the set in which the optimal trajectory may lie is used a priori. Others require checking ex post whether the calculated trajectory has an empty intersection with the sets on which the value function is incorrectly calculated. Although we are able to do this in a specific example, as in Example 4, further research is required to develop more general methods of applying the theoretical results to larger classes of problems. Similarly, it remains to be studied how the theoretical findings of this paper can be used to develop numerical methods, in which the approximate solution is not exactly equal to the accurate one, and to obtain criteria determining when the paradox from our motivating example holds.
It is worth emphasizing that the results for Nash equilibria are weaker than those for optima. They require the additional assumption that the procedure of calculating a feedback Nash equilibrium never reaches a dead end. This is because the players' dynamic optimization problems are coupled in order to find a fixed point in the space of profiles of feedback strategies. This coupling is reduced to finding a sequence of fixed points in spaces of profiles of decisions only if we replace the concept of Nash equilibrium by the pre-belief-distorted Nash equilibrium (pre-BDNE), mentioned in the Introduction in the context of its applications to Fish Wars. Thus, an extension of the results to that solution concept seems a natural next step of this analysis, and, thanks to the partial de-coupling of the solutions, stronger results can be expected. Another next step is the extension of the results to stochastic optimal control problems and stochastic games.

Author Contributions

Conceptualization, A.W.-M.; methodology, A.W.-M.; software, R.S.; validation, A.W.-M. and R.S.; formal analysis, A.W.-M. and R.S.; investigation, A.W.-M. and R.S.; resources, A.W.-M. and R.S.; data curation, R.S.; writing—original draft preparation, A.W.-M. and R.S.; writing—review and editing, A.W.-M. and R.S.; visualization, R.S.; supervision, A.W.-M.; project administration, A.W.-M.; funding acquisition, A.W.-M. and R.S. All authors have read and agreed to the published version of the manuscript.

Funding

The project was financed by funds of the National Science Centre, Poland, Grant No. 2016/21/B/HS4/00695 (Agnieszka Wiszniewska-Matyszkiel) and Grant No. 2016/21/N/HS4/00258 (Rajani Singh).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Numerical Solutions for the Motivating Example

In this Appendix, we present the algorithms used for the motivating Fish Wars example. Generally, the same method of dynamic programming, based on the Bellman equation, is used in both the analytic and the numerical approach, but, in the numerical analysis, we restrict ourselves a priori to symmetric solutions only.
In the numerical approach, as in the analytic approach, we use the Bellman equation to find the value function first, and then the optimal solution using it (the social optimum or the best response to the others' strategies, depending on the case), starting from the terminal time $T$. We do it stage by stage, recursively.
It is important that we purposely do not use any knowledge about the value function having a special form (otherwise, it would be enough to approximate numerically the unknown coefficients). The only theoretical assumptions that we make before solving the problem numerically are that the solution exists, is unique, and is symmetric. For the Nash equilibrium, we additionally assume monotonicity of the best responses.
In the procedure of computing the Nash equilibrium, starting from the terminal time, we first calculate, for each stage, an approximation of the value function of a player optimizing given the sum $o$ of the decisions of the remaining players at this stage—which, since we assume symmetry, reduces to knowing the current decision of each of the other players, $\frac{o}{n-1}$—together with the best response to $o$; then, we look for a fixed point at this stage. Subsequently, with the fixed point and the value function for the equilibrium at this stage, we move to the previous period.
In the calculation of the value function for each stage, we approximate the continuous state space by a finite grid. In the case of computing the Nash equilibrium, we also need a grid for the sum of the other players' consumptions, $o$.

Appendix A.1. Computation of Social Optimum

In this approach, we assume the symmetry of the solution (all $c_i$ identical) a priori. Since we know that the social optimum profile is indeed symmetric, we can impose this in the Bellman equation and reduce the optimization at each stage to a one-dimensional one.
In this case, the Bellman equation and the maximization of its right-hand side simplify to
$$V(t,x)=\sup_{c\in[0,\frac{x}{n}]} n\ln c+\beta V\!\left(t+1,(x-nc)^{\alpha}\right),\qquad\text{(A1)}$$
and
$$c_i(t,x)\in\operatorname*{Argmax}_{c\in[0,\frac{x}{n}]}\ n\ln c+\beta V\!\left(t+1,(x-nc)^{\alpha}\right).\qquad\text{(A2)}$$
There is no need to do it a priori in the analytic approach since the symmetry is obtained almost immediately in calculations; nevertheless, in the numerical approach, it substantially reduces complexity.
We use a grid for the state variable $x\in[0,1]$. The grid is not uniform—we refine it on the subinterval $[0,\frac{1}{2}]$. The same applies to the computation of a Nash equilibrium.
Algorithm A1. Computation of the social optimum.
  • We compute $V^S(T+1,x)=n\ln\frac{x}{n}$ for all $x$ in the $x$ grid.
  • Starting from $t=T$ backwards to $1$, we compute $V^S(t,x)$ from Equation (A1) and $c_i^S(t,x)$ from Formula (A2) for all grid points of $x$, using the previously computed $V^S(t+1,\cdot)$. Since $(x-nc_i)^{\alpha}$ is usually not a grid point of $x$, we use cubic interpolation. These two are calculated in the same operation, using the fminbnd function in Matlab.
  • Using the computed $c^S(t,x)$ and going from $t=1$ to $T$, we calculate $X(t)$ and $c^S(t,X(t))$.
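For illustration only, the following is a simplified Python analogue of Algorithm A1 (the actual computations were done in Matlab with fminbnd and cubic interpolation); the grid, the horizon and all names here are hypothetical, and a uniform state grid is used instead of the refined one described below.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize_scalar

# Simplified sketch of Algorithm A1: backward induction for the symmetric
# social optimum of the Fish Wars model (hypothetical parameter values).
n, alpha, beta, T = 2, 0.5, 0.9, 50
x_grid = np.linspace(1e-6, 1.0, 400)           # state grid (uniform here)

V = n * np.log(x_grid / n)                     # V_S(T+1, x): terminal payoff
policy = np.zeros((T + 1, x_grid.size))

for t in range(T, 0, -1):
    V_next = CubicSpline(x_grid, V)            # cubic interpolation of V_S(t+1, .)
    V_new = np.empty_like(V)
    for j, x in enumerate(x_grid):
        # maximize n*ln(c) + beta * V_S(t+1, (x - n*c)^alpha) over c in [0, x/n]
        obj = lambda c: -(n * np.log(max(c, 1e-300))
                          + beta * float(V_next(max(x - n * c, 0.0) ** alpha)))
        res = minimize_scalar(obj, bounds=(1e-12, x / n), method="bounded")
        V_new[j], policy[t, j] = -res.fun, res.x
    V = V_new

# Forward pass: the trajectory X(t) and the consumption c_S(t, X(t)) from x0.
x = 0.3
for t in range(1, T + 1):
    c = np.interp(x, x_grid, policy[t])
    x = max(x - n * c, 0.0) ** alpha
```

The refinement of the grid near $0$ and the reparametrization $c=a\cdot x$ described below can be added on top of this sketch without changing its structure.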

Appendix A.2. Computation of Nash Equilibria

Since we look for symmetric equilibria only, this can be further reduced to one $V$ and one $c_i$, identical for all players, and to finding $c_i$ being the best response to the sum of the strategies chosen by the others, $o=(n-1)c_i$; therefore, the Bellman equation and the maximization of its right-hand side given a profile of the others' strategies can be replaced by the following:
$$V(t,x)=\sup_{c\in[0,\frac{x}{n}]} \ln c+\beta V\!\left(t+1,(x-c-o)^{\alpha}\right),\qquad\text{(A3)}$$
while the equilibrium profile is every profile which fulfills for every t
$$c_i(t,x)\in\operatorname*{Argmax}_{c\in[0,\frac{x}{n}]}\ \ln c+\beta V\!\left(t+1,(x-c-o)^{\alpha}\right).\qquad\text{(A4)}$$
Again, this restriction is not needed for the analytic calculation of the equilibrium, in which we obtain uniqueness of the Nash equilibrium and we are able to prove that this unique equilibrium is symmetric.
In the case of numerical computation, we use the reduced Conditions (A3) and (A4) in order to reduce dimensionality.
Besides the grid for the state variable, we need a grid for $o\in[0,(n-1)x]$. The initial grid for $o$ is not very fine, since its size is the main component of the computational cost—we are going to refine it further only on small subsets; initially, it is obtained by taking a uniform grid in $[0,n-1]$ and multiplying it by $x$.
Since the value function tends to $-\infty$ as $x$ tends to $0$, and the instantaneous payoff tends to $-\infty$ as $c$ tends to $0$, we recognize that, for small $x$, we need a finer grid for $o$. Therefore, $c$ in the social optimization and both $c$ and $o$ in the computation of the equilibrium are written in the form $a\cdot x$, the optimization is taken over $a$ in the fixed interval $[0,\frac{1}{n}]$ or $[0,\frac{n-1}{n}]$, respectively, and the grid is uniform in $a$.
For the same reason, the grid for x was not uniform—it was denser for small x.
Algorithm A2. Computation of the Nash Equilibrium.
  • We compute $V^N(T+1,x)=\ln\frac{x}{n}$ for all $x$ in the $x$ grid.
  • Starting from $t=T$ backwards to $1$, for every grid point of $x$:
    (a) We compute an auxiliary $V^B(t,x,o)$ from Equation (A3) and $c^B(t,x,o)$ from Equation (A4). We do this for all grid points of $o\in[0,(n-1)x]$, using the previously computed $V^N(t+1,\cdot)$. Since $(x-c-o)^{\alpha}$ is usually not a grid point of $x$, we use cubic interpolation. These two are calculated in the same operation, using the fminbnd function in Matlab.
    (b) We find $\hat{o}$ in the grid minimizing $|o-c^B(t,x,o)\,(n-1)|$.
    (c) We take the interval between the two grid points neighbouring $\hat{o}$, unless $\hat{o}$ is the first or the last point of the grid—in this case, we take the interval with ends at $\hat{o}$ and its neighbour in the grid. We divide the chosen interval into a sub-grid, as in the initial stage, and we repeat (a) and (b) on this sub-grid until the distance between the points of the sub-grid is of the required accuracy.
    (d) We substitute $V^N(t,x)=V^B(t,x,\hat{o})$ and $c^N(t,x)=c^B(t,x,\hat{o})$.
  • Using the computed $c^N(t,x)$ and going from $t=1$ to $T$, we calculate $X(t)$ and $c^N(t,X(t))$.
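Analogously, the sketch below is a simplified Python analogue of one backward step of Algorithm A2 (hypothetical names and parameters; the original computations used Matlab): it performs the inner search for $\hat{o}$ approximately solving $o=(n-1)\,c^B(t,x,o)$, with a few rounds of local refinement of the grid for $o$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def best_response(x, o, V_next, n=2, alpha=0.5, beta=0.9):
    """Maximize ln(c) + beta*V_next((x - c - o)^alpha) over c in [0, x/n]
    (cf. Equations (A3) and (A4)); returns (value, maximizer)."""
    obj = lambda c: -(np.log(max(c, 1e-300))
                      + beta * V_next(max(x - c - o, 0.0) ** alpha))
    res = minimize_scalar(obj, bounds=(1e-12, x / n), method="bounded")
    return -res.fun, res.x

def nash_step(x, V_next, n=2, o_points=20, rounds=4):
    """One backward step at state x: find o_hat minimizing |o - (n-1)*c^B(t,x,o)|
    on a grid for o, refining the grid around the best point a few times."""
    lo, hi = 0.0, (n - 1) * x / n
    for _ in range(rounds):
        o_grid = np.linspace(lo, hi, o_points)
        gaps = [abs(o - (n - 1) * best_response(x, o, V_next, n)[1]) for o in o_grid]
        k = int(np.argmin(gaps))
        # refine around o_hat (its neighbours, or the adjacent point at the ends)
        lo = o_grid[max(k - 1, 0)]
        hi = o_grid[min(k + 1, o_points - 1)]
    o_hat = o_grid[k]
    v, c = best_response(x, o_hat, V_next, n)
    return v, c   # approximations of V_N(t, x) and c_N(t, x)

# Example of use with the terminal value V_N(T+1, y) = ln(y/n) for n = 2:
# v, c = nash_step(0.5, lambda y: np.log(np.maximum(y / 2, 1e-300)))
```

Embedding nash_step in a backward loop over $t$ and over the $x$ grid, together with interpolation of $V^N(t+1,\cdot)$ between grid points, reproduces the structure of Algorithm A2.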

References

1. Stokey, N.L.; Lucas, R.E.; Prescott, E.C. Recursive Methods in Economic Dynamics; Harvard University Press: Cambridge, MA, USA, 1989.
2. Kamihigashi, T. On the principle of optimality for nonstationary deterministic dynamic programming. Int. J. Econ. Theory 2008, 4, 519–525.
3. Wiszniewska-Matyszkiel, A.; Singh, R. Infinite horizon dynamic optimization with unbounded returns—Necessity, sufficiency, existence, uniqueness, convergence and a sequence of counterexamples. SSRN Electron. J. 2018.
4. Martins-da-Rocha, V.F.; Vailakis, Y. Existence and uniqueness of a fixed point for local contractions. Econometrica 2010, 78, 1127–1141.
5. Le Van, C.; Morhaim, L. Optimal growth models with bounded or unbounded returns: A unifying approach. J. Econ. Theory 2002, 105, 158–187.
6. Rincón-Zapatero, J.P.; Rodríguez-Palmero, C. Existence and uniqueness of solutions to the Bellman equation in the unbounded case. Econometrica 2003, 71, 1519–1555.
7. Matkowski, J.; Nowak, A. On discounted dynamic programming with unbounded returns. Econ. Theory 2011, 46, 455–474.
8. Kamihigashi, T. Elementary results on solutions to the Bellman equation of dynamic programming: Existence, uniqueness, and convergence. Econ. Theory 2014, 56, 251–273.
9. Singh, R.; Wiszniewska-Matyszkiel, A. A class of linear quadratic dynamic optimization problems with state dependent constraints. Math. Methods Oper. Res. 2020, 91, 325–355.
10. Singh, R.; Wiszniewska-Matyszkiel, A. Linear quadratic game of exploitation of common renewable resources with inherent constraints. Topol. Methods Nonl. Anal. 2018, 51, 23–54.
11. Bock, H.G. Nonlinear mixed-integer optimal control—From the Maximum Principle Approach to online computation of closed loop controls in real time. In Proceedings of the 14th Viennese Conference on Optimal Control and Dynamic Games, Vienna, Austria, 3–6 July 2018.
12. Horwood, J.; Whittle, P. Optimal control in the neighbourhood of an optimal equilibrium with examples from fisheries models. Math. Med. Biol. A J. IMA 1986, 3, 129–142.
13. Horwood, J.; Whittle, P. The optimal harvest from a multicohort stock. Math. Med. Biol. A J. IMA 1986, 3, 143–155.
14. Krawczyk, J.B.; Tolwinski, B. A cooperative solution for the three-nation problem of exploitation of the southern bluefin tuna. Math. Med. Biol. A J. IMA 1993, 10, 135–147.
15. Levhari, D.; Mirman, L.J. The great fish war: An example using a dynamic Cournot-Nash solution. Bell J. Econ. 1980, 11, 322–334.
16. Okuguchi, K. A dynamic Cournot-Nash equilibrium in fishery: The effects of entry. Decis. Econ. Financ. 1981, 4, 59–64.
17. Mazalov, V.V.; Rettieva, A.N. Fish wars with many players. Int. Game Theory Rev. 2010, 12, 385–405.
18. Mazalov, V.V.; Rettieva, A.N. The compleat fish wars with changing area for fishery. IFAC Proc. Vol. 2009, 42, 168–172.
19. Rettieva, A. A discrete-time bioresource management problem with asymmetric players. Autom. Remote Control 2014, 75, 1665–1676.
20. Nowak, A.S. Equilibrium in a dynamic game of capital accumulation with the overtaking criterion. Econ. Lett. 2008, 99, 233–237.
21. Nowak, A.S. A note on an equilibrium in the great fish war game. Econ. Bull. 2006, 17, 1–10.
22. Fischer, R.D.; Mirman, L.J. Strategic dynamic interaction: Fish wars. J. Econ. Dyn. Control 1992, 16, 267–287.
23. Fischer, R.D.; Mirman, L.J. The compleat fish wars: Biological and dynamic interactions. J. Environ. Econ. Manag. 1996, 30, 34–42.
24. Wiszniewska-Matyszkiel, A. A Dynamic Game with Continuum of Players and Its Counterpart with Finitely Many Players. In Advances in Dynamic Games; Springer: Berlin, Germany, 2005; pp. 455–469.
25. Wiszniewska-Matyszkiel, A. Open and closed loop Nash equilibria in games with a continuum of players. J. Optim. Theory Appl. 2014, 160, 280–301.
26. Kwon, O.S. Partial international coordination in the great fish war. Environ. Resour. Econ. 2006, 33, 463–483.
27. Breton, M.; Keoula, M.Y. Farsightedness in a coalitional great fish war. Environ. Resour. Econ. 2012, 51, 297–315.
28. Breton, M.; Keoula, M.Y. A great fish war model with asymmetric players. Ecol. Econ. 2014, 97, 209–223.
29. Koulovatianos, C. Strategic exploitation of a common-property resource under rational learning about its reproduction. Dyn. Games Appl. 2015, 5, 94–119.
30. Dutta, P.K.; Sundaram, R.K. The tragedy of the commons? Econ. Theory 1993, 3, 413–426.
31. Wiszniewska-Matyszkiel, A. When beliefs about future create future—Exploitation of a common ecosystem from a new perspective. Strat. Behav. Environ. 2014, 4, 237–261.
32. Wiszniewska-Matyszkiel, A. Belief distorted Nash equilibria: Introduction of a new kind of equilibrium in dynamic games with distorted information. Ann. Oper. Res. 2016, 243, 147–177.
33. Wiszniewska-Matyszkiel, A. Common resources, optimality and taxes in dynamic games with increasing number of players. J. Math. Anal. Appl. 2008, 337, 840–861.
34. Hannesson, R. Fishing as a Supergame. J. Environ. Econ. Manag. 1997, 32, 309–322.
35. Górniewicz, O.; Wiszniewska-Matyszkiel, A. Verification and refinement of a two species Fish Wars model. Fisheries Res. 2018, 203, 22–34.
36. Breton, M.; Dahmouni, I.; Zaccour, G. Equilibria in a two-species fishery. Math. Biosci. 2019, 309, 78–91.
37. Carraro, C.; Filar, J. Control and Game-Theoretic Models of the Environment; Springer Science & Business Media: Berlin, Germany, 2012; Volume 2.
38. Van Long, N. Dynamic games in the economics of natural resources: A survey. Dyn. Games Appl. 2011, 1, 115–148.
39. Van Long, N. Applications of dynamic games to global and transboundary environmental issues: A review of the literature. Strat. Behav. Environ. 2012, 2, 1–59.
40. Haurie, A.; Krawczyk, J.B.; Zaccour, G. Games and Dynamic Games; World Scientific Publishing Company: Singapore, 2012; Volume 1.
41. Singh, R. Calculation of Optima and Equilibria in Dynamic Resource Extraction Problems. Available online: https://depotuw.ceon.pl/handle/item/3317 (accessed on 16 May 2020).
42. Bellman, R. Dynamic Programming; Princeton University Press: Princeton, NJ, USA, 1957.
43. Blackwell, D. Discounted dynamic programming. Ann. Math. Stat. 1965, 36, 226–235.
44. Başar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory; SIAM: Philadelphia, PA, USA, 1998.
45. Wiszniewska-Matyszkiel, A. On the terminal condition for the Bellman equation for dynamic optimization with an infinite horizon. Appl. Math. Lett. 2011, 24, 943–949.
Figure 1. (In-)accuracy of the value function versus the accuracy of the optimal control. (a): Numerical and actual value functions, V S ( t , x ) , for the social optimum; (b): The error in c S ( t , x ) (the difference between numerical and actual values) for the social optimum.
Figure 2. Accuracy of the optimal state trajectory. (a): The trajectories of numerical (red) and actual (blue dashed) state variable, X ( t ) , for the social optimum; (b): The error in the trajectory of the state variable, X ( t ) , for the social optimum.
Figure 3. Accuracy of the optimal control trajectory. (a): The optimal consumption path, c S ( t , X ( t ) ) , for the social optimum: numerical (red) and actual (blue dashed); (b): The error in the optimal consumption path, c S ( t , X ( t ) ) , for the social optimum.
Figure 4. (In-)accuracy of the value function of a player versus the accuracy of the Nash equilibrium strategy. (a): Numerical and actual value functions, V N ( t , x ) , for the Nash equilibrium; (b): The error in c N ( t , x ) (difference between numerical and actual values) for the Nash equilibrium.
Figure 5. Accuracy of the Nash equilibrium trajectory. (a): The trajectories of numerical (red) and actual (blue dashed) state variable, X ( t ) , for the Nash equilibrium; (b): The error in the trajectory of the state variable, X ( t ) , for the Nash equilibrium.
Figure 6. Accuracy of the trajectory of the Nash equilibrium strategy. (a): The optimal consumption path, c N ( t , X ( t ) ) , numerical (red) and actual (blue dashed) for the Nash equilibrium; (b): The error in the optimal consumption path, c N ( t , X ( t ) ) , for the Nash equilibrium.
Figure 7. Intentional underestimation of the value function on some interval not influencing the optimal state and consumption trajectories in Example 1.
Figure 8. Intentional overestimation of the value function on some interval not influencing the optimal state and consumption trajectories in Example 3.
