2. Preliminaries: Cognitive Extensive Games
This section introduces a game framework [26] that characterizes players' cognition when playing extensive games by incorporating ANNs into the classic model of extensive games.
2.1. Game Models
First, we introduce the concept of (finite) extensive games with perfect information, characterized by pure strategies.
Extensive form-perfect information games. An extensive form-perfect information game [31] is formally defined as a tuple $G = \langle N, T, t, \{u_i\}_{i \in N}, \{S_i\}_{i \in N} \rangle$, in which:
$N$ represents the set of players;
$T = (V, A, E)$ denotes the (directed, irreflexive and finite) game tree, which consists of a set of nodes (or vertices) $V$, a set of moves or actions $A$, and a set of arcs $E$. For any two nodes $v$ and $v'$, $v'$ is said to be an immediate successor of $v$ if some arc leads from $v$ to $v'$. Nodes without successors are called leaves (terminal nodes) and are denoted by $Z$. The set of moves available at $v$ is denoted as $A(v)$;
$t: V \setminus Z \to N$ denotes the turn function and indicates which player is to move at each nonterminal node;
The utility function $u_i: Z \to \mathbb{R}$ determines the utility of each terminal node in $Z$ for each player $i \in N$;
$S_i$ denotes the set of strategies for player $i$. Each $s_i \in S_i$ is a function that assigns to every nonterminal node $v$ with $t(v) = i$ an immediate successor $s_i(v)$ of $v$.
A strategy profile $s = (s_i)_{i \in N}$ is a combination of the strategies of every player. The set of all strategy profiles is denoted as $S$. For any player $i \in N$, $s_{-i}$ is the strategy of the players in $N \setminus \{i\}$. An outcome function $O: S \to Z$ is a function that assigns a terminal node to each strategy profile. $O(S_i, s_{-i})$ denotes the outcomes that can be achieved by agent $i$, given that all the other players follow $s_{-i}$, and $O(s_i, s_{-i})$ depicts the outcome when player $i$ follows $s_i$ and the other players utilize $s_{-i}$.
An alternative way to depict players' payoffs is through a preference relation $\succeq_i$, such that for each player $i$, $z \succeq_i z'$ if $u_i(z) \geq u_i(z')$. The indifference case is written as $z \sim_i z'$ when $u_i(z) = u_i(z')$. We focus on games with a finite horizon and no infinite branches in the game tree. In this paper, by “extensive games”, we refer to extensive form-perfect information games with a finite horizon.
For an extensive game $G$ and any node $v \in V$, the subgame of $G$ following $v$ is defined as $G(v) = \langle N, T_v, t_v, \{u_i^v\}, \{S_i^v\} \rangle$, in which $T_v$ is the subtree of $T$ rooted at $v$, $t_v$ satisfies $t_v(v') = t(v')$ for every nonterminal $v'$ in $T_v$, $S_i^v$ is the set of strategies $s_i^v$ such that $s_i^v(v') = s_i(v')$ for each $v'$ in $T_v$ with $t(v') = i$, and $u_i^v$ agrees with $u_i$ on the terminal nodes of $T_v$. The outcome of $s$ in subgame $G(v)$ is written as $O_v(s)$.
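To make these definitions concrete, the following is a minimal Python sketch of the game model. It is not taken from [26]; the names (`Node`, `Game`) and the dictionary-based encoding are our own illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node v of the game tree T; `children` maps each available
    move in A(v) to the immediate successor it leads to."""
    name: str
    children: dict[str, "Node"] = field(default_factory=dict)
    # Utility profile u_i(z), set only on terminal nodes (leaves).
    utility: dict[int, float] = field(default_factory=dict)

    @property
    def is_leaf(self) -> bool:
        return not self.children

@dataclass
class Game:
    """An extensive form-perfect information game G = <N, T, t, u, S>."""
    players: set[int]       # N
    root: Node              # the root of T
    turn: dict[str, int]    # turn function t: nonterminal node name -> player

# A two-move toy game: player 1 moves at the root "a", player 2 at node "b".
leaf = lambda name, u1, u2: Node(name, utility={1: u1, 2: u2})
b = Node("b", {"l": leaf("z1", 0, 2), "r": leaf("z2", 1, 1)})
g = Game({1, 2}, Node("a", {"L": leaf("z0", 2, 0), "R": b}), {"a": 1, "b": 2})
```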
While this approach gives a full picture of the game from an omniscient observer’s point of view, the extensive game lacks consideration of the players’ vision of the game, which might differ from the real game being played. Normally, players view the game according to accumulated experience, including their judgment of the plausibility of future moves and the suitability of game configurations. That is, a player’s view of the game is only a part of the original game tree, which is narrow and short due to their limited abilities.
In [26], a new model of extensive games was proposed that takes players' cognition of the game into account. Technically, two kinds of ANNs were introduced into the game model to model and simulate agents' cognition.
The first type of ANN, called a filter net, represents the players' cognition regarding the plausibility of future moves. For a filter net $f$ of a game $G$, the input is any state of $G$, and the output is a probability function over all future moves at that state. For a state $v$ and a possible move $a$ at $v$, the probability of choosing $a$ at $v$ is defined as $f(v)(a)$, which is usually written as $f(v, a)$.
The second type of ANN, the evaluation net, simulates players' evaluation of the quality of game states. For an evaluation net $e$ of a game $G$, the input is any state of $G$, and the output is an evaluation function assigning a probability to each state. For any state $v$, $e(v)$ predicts the probability of winning the game following $v$.
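Abstractly, the two nets are just callables over game states. The following is a minimal sketch of that interface, reusing the illustrative `Node` type above; the protocol names are our assumptions, and the concrete BP architectures used in the example appear in Section 5.

```python
from typing import Protocol

class FilterNet(Protocol):
    """f: given a state v, return a probability f(v, a) for each
    available move a in A(v)."""
    def __call__(self, state: Node) -> dict[str, float]: ...

class EvalNet(Protocol):
    """e: given a state v, return the predicted probability of
    winning the game following v."""
    def __call__(self, state: Node) -> float: ...
```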
The cognitive gameplay process can be modeled based on the filter net and the evaluation net. For each decision at a current state $s$, four subprocedures are involved: the first three capture players' cognition by obtaining the filtration $F(s)$ of the game tree $T$ of an extensive game $G$.
Algorithm for obtaining the filtration:
- 1. Branch-Pick. According to prior knowledge, the branches at the current decision point can be narrowed by selecting several (say, $b$) of the most plausible alternatives among all the available actions. Formally, for any state $s$, the branches corresponding to the $b$ most plausible moves are chosen; that is, only the top $b$ elements of $f(s, \cdot)$ are selected.
- 2. Subtree-Pick. To make decisions, the current player searches the subsequent game tree by following these branches. Due to computational limitations, the exploration of the future involves a finite number of steps (say, $l$). Each branch rooted at $s$ is extended to depth $l$; i.e., the subtree that can be reached within $l$ steps is obtained.
- 3. Evidence-Pick. To choose the best of the $b$ branches, the player evaluates their quality based on the payoffs of the final states of the subtrees. This evaluation of a game state depends on two aspects: (1) an evaluation via prior knowledge and (2) a vague prediction of the future. The former is given by the evaluation net, while the latter requires a simulation of future moves; this simulation is necessary because, even if it is difficult for players to obtain the complete game tree, they can still hold a vague prediction of the far future. The process is completed in the following manner. First, for each leaf $z'$ in the subtree following $s$, a path is computed until the end of the game, where at each node along the path, the most plausible action according to $f$ is chosen. Then, the payoff of the final outcome is taken as the simulation result of $z'$. The overall payoff of $z'$ is the weighted sum of the two parts.
The fourth subprocedure is performed based on the filtration $F(s)$. Given this cognition of the game tree and of the suitability of game states, the player who moves at $s$ can choose the optimal move from the $b$ options via BI.
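The following is a compact sketch of the first three subprocedures under the illustrative types above. The helper names (`branch_pick`, `subtree_pick`, `simulate`, `evidence_pick`) and the weight `alpha` are assumptions; the paper states only that the overall payoff is a weighted sum of the two evidence parts.

```python
def branch_pick(f: FilterNet, s: Node, b: int) -> list[str]:
    """Subprocedure 1: keep the b most plausible moves at s according to f."""
    probs = f(s)
    return sorted(probs, key=probs.get, reverse=True)[:b]

def subtree_pick(s: Node, move: str, l: int) -> list[Node]:
    """Subprocedure 2: collect the nodes reachable via `move` within l steps."""
    frontier, nodes = [s.children[move]], []
    for _ in range(l):
        nodes.extend(frontier)
        frontier = [c for n in frontier for c in n.children.values()]
    return nodes

def simulate(f: FilterNet, v: Node) -> Node:
    """Roll out to the end of the game, always taking the most plausible move."""
    while not v.is_leaf:
        probs = f(v)
        v = v.children[max(probs, key=probs.get)]
    return v

def evidence_pick(f: FilterNet, e: EvalNet, leaf: Node, player: int,
                  alpha: float = 0.5) -> float:
    """Subprocedure 3: weigh e's estimate of a filtration leaf against
    the payoff of the terminal node reached by simulation."""
    z = simulate(f, leaf)
    return alpha * e(leaf) + (1 - alpha) * z.utility[player]
```

The fourth subprocedure then amounts to ordinary BI on the filtered subtree, with these evidence values serving as leaf payoffs (a BI sketch is given in Section 2.2).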
Therefore, the game being played differs from the ideal model of extensive games: players' cognition regarding the game must be included. The following model, called cognition games, was proposed in [26].
Cognition Games. Given an extensive game $G = \langle N, T, t, \{u_i\}, \{S_i\} \rangle$, a filter net $f$ and an evaluation net $e$ for $G$, a cognition game for $G$ at any state $s$ is a tuple $G_s = \langle N_s, F(s), t_s, \{u_i^s\}, \{S_i^s\} \rangle$, in which:
$F(s)$ denotes the filtration of $T$ at $s$;
The set of players $N_s$ consists of those players who move at some node of $F(s)$;
Turn function $t_s$ is consistent with $t$, i.e., $t_s(v) = t(v)$ for any nonterminal $v$ in $F(s)$;
The set of strategies $S_i^s$ coincides with $S_i$: $s_i^s(v) = s_i(v)$ for every $v$ in $F(s)$ with $t(v) = i$;
The utility function $u_i^s$ is an integration of the results obtained via the evaluation net and the simulation. Let $z'$ be any leaf of $F(s)$, and let $z$ be the final point following $z'$ in the simulation process. Then, $u_i^s(z')$ is the weighted sum of $e(z')$ and $u_i(z)$ (e.g., their average).
2.2. Equilibrium Concepts
A substantial concern in games is the equilibrium concept, which characterizes a state of dynamic balance in which no player can unilaterally deviate from their strategy and improve their expected payoff. Two classic equilibrium concepts for extensive games are the Nash equilibrium [32] and the SPE.
Nash equilibrium. Let $s = (s_i, s_{-i})$ be any strategy profile of an extensive game $G$. $s_i$ is the best response for player $i$ if $u_i(O(s_i, s_{-i})) \geq u_i(O(s_i', s_{-i}))$ for each strategy $s_i'$ of $i$. If $s_i$ is the best response for all players $i$, then $s$ is called a Nash equilibrium of $G$.
Subgame perfect equilibrium. Let $s$ be any strategy profile of an extensive game $G$. If, for each player $i$ and any node $v$ with $t(v) = i$, $u_i(O_v(s_i, s_{-i})) \geq u_i(O_v(s_i', s_{-i}))$ holds for any of $i$'s strategies $s_i'$ in $G(v)$, then $s$ is called an SPE of $G$.
The best response is the best option for each player given what the other players will do. Consequently, a Nash equilibrium requires that a strategy profile consist of best responses for every player. An SPE considers sequential moves and must be a Nash equilibrium in all subgames.
For any extensive game $G$ with root $v_0$, an SPE $s$ of $G$ determines a node sequence $v_0, v_1, \dots, v_m$, s.t. for each $0 \leq i < m$, $v_{i+1} = s(v_i)$, and $v_m$ is a terminal node. We call this node sequence the $s$-SPE solution of $G$.
A fundamental way to find the SPE of an extensive game is BI [33], which identifies the best move for the player who moves last in the game. Subsequently, the optimal move of the next-to-last player is found. This process is repeated from the end of the game to the beginning to determine the strategies of all players.
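As a reference point for the cognitive variants below, here is a minimal BI sketch over the full game tree, using the illustrative types from Section 2.1; this is the textbook procedure, not the cognitive one.

```python
def backward_induction(game: Game, node: Node | None = None):
    """Return the SPE utility profile and the node sequence it induces:
    at each nonterminal node, the mover picks the child that maximizes
    their own continuation utility."""
    node = node or game.root
    if node.is_leaf:
        return node.utility, [node]
    mover = game.turn[node.name]
    best_profile, best_path = None, None
    for child in node.children.values():
        profile, path = backward_induction(game, child)
        if best_profile is None or profile[mover] > best_profile[mover]:
            best_profile, best_path = profile, path
    return best_profile, [node, *best_path]
```

For the toy game `g` sketched earlier, `backward_induction(g)` yields the profile `{1: 2, 2: 0}` via the path a → z0: player 1 avoids node "b", where player 2 would pick the leaf worst for player 1.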
An apparent weakness of BI is the need to search the full game tree, which makes the approach impractical for large-scale games. Due to resource limitations or ability constraints [34], players cannot perfectly predict the future in practical game-playing processes. Normally, players must make decisions via a heuristic search over a limited part of the game tree based on prior knowledge. By taking players' cognition about the game into consideration, cognitive game-solving provides a more realistic framework for playing extensive games. The resulting equilibrium concept is called CPE.
Cognitive Perfect Equilibrium [26]. For an extensive game $G = \langle N, T, t, \{u_i\}, \{S_i\} \rangle$, a filter net $f$ and an evaluation net $e$ of $G$, a strategy profile $s$ is called a CPE of $G$ if and only if the following holds:
For each $i \in N$ and each node $v$ with $t(v) = i$, there is a strategy profile $s^v$ of the cognition game $G_v$ satisfying $s^v(v) = s(v)$ and $u_i^v(O_v(s^v)) \geq u_i^v(O_v(s_i', s_{-i}^v))$ for any strategy $s_i'$ of $i$ in $G_v$.
The intuition behind the CPE is that, at every decision point, the CPE is consistent with the SPE of the corresponding cognition game.
Cognitive games provide realistic representations of extensive games, and the CPE reflects the gameplay procedure. However, a major drawback exists concerning cognitive games and the CPE. In cognitive games, players assume that their cognition is consistent with that of their opponents, i.e., their opponents have the same view of the game being played and the same evaluation of the game states.
As a result, cognitive games omit the player's reasoning about their opponents' cognition, which might play a significant role in the player's strategies. In particular, the player may benefit from exploiting the opponents' cognition.
This paper aims to refine cognitive games by endowing players with the ability to learn their opponents’ cognition about the game being played and to evaluate game situations. An appropriate solution concept is obtained under this new game model.
3. Adversarial Cognition Game
In this section, we introduce a refinement of cognition games in which players are allowed to learn and reason about the cognition of their opponents, namely, the game the opponents believe is being played and their evaluation of the game situations, i.e., the utility functions they use. We first introduce the notion of a state pair, a formal structure that allows reasoning about the cognition of opponents.
State Pairs. Consider an extensive game $G = \langle N, T, t, \{u_i\}, \{S_i\} \rangle$, a filter net $f$, and an evaluation net $e$ for $G$. A state pair of $G$ is a pair of states of the form $(v, v')$ satisfying $v' \in F(v)$, i.e., the second state in the pair lies within the filtration at the first.
Without loss of generality, we assume that the opponents' cognitive ability is encoded by the number of future steps that are foreseeable to them, i.e., their search depth. The opponents' evaluation of the quality of leaves of the search tree can be modeled as a static payoff function $p$. The set of utility functions is denoted by $P$. The intuition behind state pairs is to capture the adversarial cognition of the player who moves at $v$. An expression of the form $(v, v')$ encodes the cognition that the player moving at $v$ holds about the cognition of the player moving at $v'$, including what the latter can foresee of the future and what his utility function is. We use $SP(G)$ to denote the set of state pairs of $G$.
Based on the notion of state pairs, we can represent adversarial cognition structures by associating each state pair with a set of states and an evaluation over the terminal states therein.
Adversarial cognition structures. Let $G$ be an extensive game, and let $f$ and $e$ be the filter net and evaluation net for $G$, respectively. An adversarial cognition structure for $G$ is a tuple $C = (c_V, c_P)$, such that $c_V$ is a function associating a subset of the nodes following $v'$ with each state pair $(v, v')$, and $c_P$ is a function $c_P: SP(G) \to P$, associating a payoff function with each state pair. $c_V$ satisfies the following conditions:
(Self-consistence) For all $(v, v') \in SP(G)$ with $t(v') = t(v)$, we have that $c_V(v, v') = F(v')$; i.e., the player has a precise cognition regarding what he himself can foresee.
(Adversary–inclusivity) For all $(v, v') \in SP(G)$, $c_V(v, v') \subseteq F(v)$; i.e., an agent's cognition for the player at $v'$ is a subset of his cognition of himself at $v$.
(Depth-limited) For all $(v, v') \in SP(G)$ with $t(v') \neq t(v)$ and search depth $d$, we have that $c_V(v, v') = T_{v'}^d$; i.e., the player's cognition at $v$ about what the player moving at $v'$ can see is represented by the nodes in the subtree limited by a depth $d$ (we use $T_{v'}^d$ to denote the nodes in $F(v)$ that can be reached from $v'$ within depth $d$).
For $(v, v') \in SP(G)$, $c_P(v, v')$ denotes the player's cognition regarding the utility function of the player moving at $v'$.
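The structure $C$ can be sketched as a pair of mappings over state pairs. The encoding below is our assumption; the names mirror the notation above, and the types from Section 2.1 are reused.

```python
from dataclasses import dataclass
from typing import Callable

StatePair = tuple[str, str]   # (v, v') with v' in the filtration at v

@dataclass
class AdversarialCognition:
    """C = (c_V, c_P): for each state pair, the nodes the player at v
    believes the mover at v' can foresee, and the payoff function the
    mover at v' is believed to use."""
    c_V: Callable[[StatePair], set[str]]
    c_P: Callable[[StatePair], Callable[[Node], dict[int, float]]]

def depth_limited(filtration: set[str], succ: Callable[[str], set[str]],
                  v_prime: str, d: int) -> set[str]:
    """The set T^d_{v'}: nodes of the filtration reachable from v'
    within depth d (the depth-limited condition)."""
    seen, frontier = {v_prime}, {v_prime}
    for _ in range(d):
        frontier = {w for u in frontier for w in succ(u) if w in filtration}
        seen |= frontier
    return seen
```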
A game model with the agent’s cognition regarding their opponents can then be obtained by assigning an adversarial cognition structure to a cognition game.
Adversarial Cognition Game. An adversarial cognition game (ACG) is defined as a tuple $\Gamma = \langle G, f, e, C \rangle$, with $G$ being an extensive game, while $f$, $e$, and $C$ are the filter net, evaluation net, and adversarial cognition structure for $G$, respectively.
Note that an ACG induces a family of extensive games, one for each state pair, reflecting the player's adversarial cognition. For any ACG $\Gamma$, we denote the game induced by any state pair $(v, v')$ as $G(v, v')$, for which the game tree is restricted by $c_V(v, v')$ and the utility function is $c_P(v, v')$.
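Accordingly, the induced game $G(v, v')$ can be sketched by pruning the tree to the foreseen nodes and paying off the cut-off horizon with the assumed utility function. Again, this uses the illustrative types above and is our reading, not the paper's code.

```python
def induced_game(v_prime: Node, C: AdversarialCognition, pair: StatePair) -> Node:
    """Build the tree of G(v, v'): keep only children inside c_V(pair);
    nodes whose subtree is cut off become leaves evaluated by c_P(pair)."""
    visible, payoff = C.c_V(pair), C.c_P(pair)

    def restrict(node: Node) -> Node:
        kept = {a: restrict(c) for a, c in node.children.items()
                if c.name in visible}
        out = Node(node.name, kept, dict(node.utility))
        if not kept and node.children:   # horizon cut: use the assumed payoff
            out.utility = payoff(node)
        return out

    return restrict(v_prime)
```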
4. Game Solving and Equilibrium
Based on the player's adversarial cognition at each state pair, he can search the current game and make an optimal decision with regard to the possible moves and the payoffs of the corresponding outcomes. The combination of all these optimal decisions yields a solution to the ACG. Following this idea, Algorithm 1 presents the process for solving an ACG, which is called adversarial cognitive game solving. The process starts from the root $v_0$ and extends the sequence $q$ by successively adding the successor node that results from the optimal move in the cognition game at the current node, as determined by Algorithm 2. This process is depicted in Figure 1.
Algorithm 1: Solution of $\Gamma$.
Algorithm 2: Adversarial cognition move.
The most important parts of Algorithm 2 are the function Search (Algorithm 3) and the function Best Branch (Algorithm 4). Algorithm 3 computes, for a state pair $(v, v')$, the node sequences determined by the SPE of $G(v, v')$ and then yields the optimal successors following $v'$. Given the optimal move of each node $v'$ in the cognition game of the current player at $v$, Algorithm 4 computes a best path following $v$ for that player and returns the immediate successor of $v$ as required.
Algorithm 3: Search.
Algorithm 4: Best Branch.
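Since the pseudocode bodies are not reproduced here, the following sketch reflects our reading of the prose description of Algorithms 1 and 2. The callables `filtration`, `search` (Algorithm 3), and `best_branch` (Algorithm 4) are assumed to be supplied, and all names are hypothetical.

```python
from typing import Callable

def solve(root: Node, acg_move: Callable[[Node], Node]) -> list[Node]:
    """Algorithm 1 (sketch): starting from the root, repeatedly extend the
    sequence q with the successor chosen by the adversarial cognition move."""
    q = [root]
    while not q[-1].is_leaf:
        q.append(acg_move(q[-1]))
    return q

def adversarial_cognition_move(v: Node,
                               filtration: Callable[[Node], list[Node]],
                               search: Callable[[Node, Node], Node],
                               best_branch: Callable[[Node, dict], Node]) -> Node:
    """Algorithm 2 (sketch): for each v' in the filtration at v, Search
    (Algorithm 3) yields the SPE-optimal successor of v' in G(v, v');
    Best Branch (Algorithm 4) then assembles a best path following v
    and returns its first step."""
    optimal = {vp.name: search(v, vp) for vp in filtration(v)}
    return best_branch(v, optimal)
```

The composition `solve(root, lambda v: adversarial_cognition_move(v, filtration, search, best_branch))` then mirrors the call structure described above.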
Note that the above game-solving algorithm differs from standard BI and from the cognitive game-playing process in [26]. Consequently, the resulting equilibrium of the ACG differs from the SPE and the CPE. Algorithm 3 searches the game $G(v, v')$ induced by the state pair $(v, v')$, which corresponds to an SPE of $G(v, v')$. The optimal move at a state $v$ should be consistent with the SPE of $G(v, v')$ for any state $v'$ within $v$'s cognition game. Therefore, the first equilibrium we can define is the one obtained at a state $v$, which is called the local adversarial cognition equilibrium (LACE).
LACE. Let $\Gamma$ be an ACG, and let $v$ be any node in $V$. A strategy profile $\sigma$ is a LACE of $\Gamma$ at $v$ if:
At each $v' \in F(v)$, there exists a terminal node $z$ in $G(v, v')$, such that $O_{v'}(\sigma) = z$ and $z$ is an outcome of the SPE for $G(v, v')$.
We denote the set of LACE outcomes of $\Gamma$ at $v$ as $LACE_v(\Gamma)$. The composition of such outcomes yields the global solution concept for $\Gamma$:
Adversarial Cognition Equilibrium. Let $\Gamma$ be an ACG. A strategy profile $\sigma$ is an adversarial cognition equilibrium (ACE) of $\Gamma$ if, at each nonterminal node $v$, there exists a terminal history $z$ in the cognition game at $v$, such that $O_v(\sigma) = z$ and $z \in LACE_v(\Gamma)$.
An ACE is the composition of the best strategies of the players at each decision node. Each such strategy is the best response for the player, given his cognition about the opponents' beliefs about the game being played and the quality of the game states.
As is the case for other equilibrium concepts, each ACE of an ACG also determines a specific sequence of nodes. Suppose $\sigma$ is an ACE of $\Gamma$ with root $v_0$. The ACE solution determined by $\sigma$, dubbed the adversarial cognition solution (ACS) and written $ACS(\sigma)$, is a sequence $v_0, v_1, \dots, v_m$ of nodes, satisfying that $v_m$ is a terminal node and $v_{i+1} = \sigma(v_i)$ for all $i$ with $0 \leq i < m$.
The set of adversarial cognition solutions of $\Gamma$ is denoted as $ACS(\Gamma)$.
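Extracting the ACS from a strategy profile is mechanical; the following is a small sketch with $\sigma$ encoded as a successor map over the illustrative `Node` type.

```python
def acs(sigma: dict[str, Node], root: Node) -> list[Node]:
    """ACS(sigma): follow the equilibrium profile from the root until a
    terminal node is reached, collecting the visited nodes."""
    q = [root]
    while not q[-1].is_leaf:
        q.append(sigma[q[-1].name])
    return q
```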
Another fundamental concern in game theory is the existence of an equilibrium. The following lemma clarifies that every ACG has an ACE.
Lemma 1. (Existence) Every ACG Γ has an ACE.
Proof. It is sufficient to show the existence of a LACE at any position $v$. The first step is to prove the existence of an SPE for each $G(v, v')$. We can obtain such an SPE via induction with regard to the depth $h$ of the nodes. Let $\pi$ be a function that associates a path with each state $v''$ of $G(v, v')$. When $h(v'') = 0$, i.e., $v''$ is a leaf, define $\pi(v'') = v''$. Then, if $\pi$ is defined for all nodes with a depth of at most $k$ for some $k \geq 0$, suppose $v''$ is a node with $h(v'') = k + 1$ and $t(v'') = i$. Every immediate successor of $v''$ has a depth of at most $k$, so $\pi$ is defined for each of them. Let $w^*$ be a maximizer, over the immediate successors $w$ of $v''$ in $G(v, v')$, of player $i$'s payoff at the endpoint of $\pi(w)$, and let $\pi(v'') = (v'', \pi(w^*))$. Via this induction, we obtain a strategy profile $\sigma_{v,v'}$ in $G(v, v')$. According to the definition of SPE, $\sigma_{v,v'}$ is an SPE of $G(v, v')$.
For any intermediate node $v$ in $\Gamma$, let $\sigma(v') = \sigma_{v,v'}(v')$ for every $v'$ in $F(v)$, in which $\sigma_{v,v'}$ is an SPE of $G(v, v')$. According to the definition, $\sigma$ is a LACE of $\Gamma$ at $v$. Finally, we can construct such a strategy for every state $v$ in $\Gamma$; thus, it is evident that $\sigma$ is an ACE of $\Gamma$. □
The observation below illustrates the connection between the ACE and the two previously mentioned equilibrium concepts, SPE and CPE, by specifying the conditions under which the ACE collapses into the SPE or CPE.
Proposition 1. Let $\Gamma = \langle G, f, e, C \rangle$ be an ACG.
- (1)
If for every nonterminal node $v$, $c_V(v, v')$ coincides with the part of $F(v)$ following $v'$ for any node $v'$ in the filtration at $v$, then an ACE of Γ is also a CPE, and vice versa.
- (2)
If for every nonterminal node $v$, $F(v) = T_v$ and $c_V(v, v') = T_{v'}$ for any node $v'$ in the filtration at $v$, then an ACE of Γ is also an SPE of $G$, and vice versa.
Proof. (1). Let $\sigma$ be a CPE of $\Gamma$. For all nonterminal nodes $v$ and any node $v'$ in $F(v)$ with $t(v') = i$, there is a strategy profile $\sigma^v$ with $\sigma^v(v) = \sigma(v)$ such that $u_i^v(O_{v'}(\sigma^v)) \geq u_i^v(O_{v'}(\sigma_i', \sigma_{-i}^v))$ for any alternative $\sigma_i'$ in $G_v$. When $c_V(v, v')$ coincides with the part of $F(v)$ following $v'$, we find that $\sigma^v$ is an SPE of $G(v, v')$ for any $v'$ in $F(v)$. Therefore, $\sigma$ is a LACE of $\Gamma$ at each $v$, and thus $\sigma$ is an ACE of $\Gamma$ by definition. Conversely, let $\sigma$ be an ACE of $\Gamma$. For all nonterminal nodes $v$, there is a $\sigma^v$ that is a LACE of $\Gamma$ at $v$ with $\sigma^v(v) = \sigma(v)$. That is, for any $v'$ in $F(v)$, there is an SPE of $G(v, v')$ whose outcome coincides with $O_{v'}(\sigma^v)$. When $c_V(v, v')$ coincides with the part of $F(v)$ following $v'$, it holds that $\sigma^v$ is consistent with an SPE of the subgame of $G_v$ at $v'$ for any such $v'$. Therefore, $\sigma^v$ attests the CPE condition at every $v$. Since $\sigma^v(v) = \sigma(v)$, it follows that $\sigma$ is a CPE of $\Gamma$.
(2). Let $\sigma$ be an SPE of $G$. For every nonterminal node $v$ with $t(v) = i$, $u_i(O_v(\sigma)) \geq u_i(O_v(\sigma_i', \sigma_{-i}))$ holds for any $\sigma_i'$. Consequently, for any $v'$ in $T_v$, $u_i(O_{v'}(\sigma)) \geq u_i(O_{v'}(\sigma_i', \sigma_{-i}))$ holds. Given that $F(v) = T_v$ and $c_V(v, v') = T_{v'}$, the restriction of $\sigma$ yields, for each $v'$, an SPE outcome of $G(v, v')$. That is, $\sigma$ is a LACE of $\Gamma$ at each $v$; thus, $\sigma$ is an ACE of $\Gamma$. Conversely, take any ACE $\sigma$ of $\Gamma$. For any nonterminal node $v$, there exists a strategy profile $\sigma^v$ that is a LACE of $\Gamma$ at $v$. That is, for any $v'$ in $F(v)$, there is an SPE of $G(v, v')$ whose outcome coincides with $O_{v'}(\sigma^v)$. If $F(v) = T_v$ and $c_V(v, v') = T_{v'}$, we find that $\sigma^v$ is consistent with an SPE of $G(v')$ for any such $v'$, and hence with an SPE of $G(v)$ for any $v$. Since $\sigma^v(v) = \sigma(v)$, it follows that $\sigma$ is an SPE of $G$. □
Therefore, if the current player's cognition regarding the following players' cognition is the same as his cognition of himself, then the ACE is equivalent to the CPE; if the player's cognition coincides with the complete subtree therein, then the ACE is equivalent to the SPE. However, these conditions rarely hold during real gameplay, which reflects the rationality of our framework.
Crucial issues concerning the game-solving algorithm include its correctness and complexity. The following theorem shows that each solution returned by Algorithm 1 is an ACS of the game.
Theorem 1. (Correctness) For any ACG Γ with root $v_0$ and any path $q$ returned by Sol(Γ) in Algorithm 1, there exists an ACE $\sigma$ of Γ, such that $q = ACS(\sigma)$.
Proof. This can be proved by induction on the depth $d$ of the game tree.
Base case: a game tree with only a single node is trivial.
When the depth is 2, let $q = (v_0, v_1)$. According to Algorithm 1, $v_1$ is a successor of $v_0$ obtained by executing Algorithm 2 (Lines 5–7). That is, there are a sequence returned by Search (Line 4 in Algorithm 2) and an action $a$, such that $a$ is a best move returned by Best Branch. Therefore, $(v_0, v_1)$ is an SPE solution of the induced game at $v_0$. At the same time, it is a LACE outcome of $\Gamma$ at $v_0$. We can define a strategy profile $\sigma$ with $\sigma(v_0) = v_1$ and arbitrary consistent choices elsewhere. Observe that $\sigma$ is an ACE of $\Gamma$ according to the definition of the ACE. Hence, $q$ is determined by an ACE.
Induction hypothesis: for any ACG with a depth of at most $k$, every path returned by Algorithm 1 is an ACS.
Induction step: Let $q = (v_0, v_1, \dots, v_m)$ be a node sequence returned by Algorithm 1 for an ACG of depth $k + 1$. Then, $(v_0, v_1)$ initiates an SPE solution of the induced game at $v_0$. We can define a strategy profile $\sigma$ with $\sigma(v_i) = v_{i+1}$ for $0 \leq i < m$, and for any other state $v$, $\sigma$ is consistent with an SPE of the corresponding induced game. Then, $q = ACS(\sigma)$. It remains to verify that $\sigma$ is an ACE of $\Gamma$, which is direct according to the definition of the ACE. □
The complexity of Algorithm 1 is analyzed in the following proposition.
Proposition 2. (Complexity) The worst-case time complexity of Algorithm 1 is polynomial in n, where n is the number of nodes in the underlying game.
Proof. First, let $b$ be the number of branches selected in the filtration and let $d$ be the depth of the game; the time complexity of Algorithm 4 is then $O(b \cdot d)$. For Algorithm 3, let $c_k$ be the complexity of searching a game tree with depth $k$. Then, $c_1 = m$, and for any $k = 2, \dots, d$, $c_k = m \cdot c_{k-1}$, where $m$ represents the width of the game tree. Through iterative computation, the time complexity of Algorithm 3 is obtained, i.e., $O(m^d)$, which is linear in the number of nodes of the searched subtree. Algorithm 2 must first obtain the filtration at $v$, then call Algorithm 3 for each node in the filtration, and finally call Algorithm 4; its time complexity is the sum of these three parts. For the filtration, the complexities of the three subprocedures depend on $b$, $l$, $m$, and $d$; considering that $b$ and $l$ are normally much smaller than $m$ and $d$, the complexity of the filtration at $v$ is dominated by the search that follows it. To obtain a node sequence $q$, Algorithm 2 must be called at most $d$ times. Hence, the overall complexity of Algorithm 1 is polynomial in $n$, where $b$ is a constant and $n$ bounds the size of each searched subtree. □
5. An Example: Tic-Tac-Toe
After establishing a model of extensive games involving players' cognition of their opponents and the new solution concept, we proceed with an example illustrating this framework, through which the procedure for solving such games is demonstrated. A comparison with the case without cognition about opponents confirms the feasibility of opponent modeling in gameplay.
We consider the example first presented in [26], which starts with a scenario in a Tic-Tac-Toe game. Tic-Tac-Toe is a simple game that is suitable for illustrating our model and has been used extensively in the literature due to its simplicity. According to [26], with a player's own cognition regarding the game, a cognition game model of Tic-Tac-Toe consists of three components: a classic extensive game model $G$, a filter net $f$, and an evaluation net $e$, where:
- (1)
$G = \langle N, T, t, \{u_i\}, \{S_i\} \rangle$, such that
the set of players is $N = \{1, 2\}$, with player 1 placing × and player 2 placing ∘;
the game tree is $T = (V, A, E)$, with
$V$ the set of legal board configurations;
$A$ the set of moves, i.e., placements of a mark on one of the nine grids;
$E$ the set of arcs connecting each configuration to those reachable by a single move;
$t(v) = 1$ for nodes in which it is player 1's turn to move, and $t(v) = 2$ for player 2's turn;
player $i$'s strategies $S_i$ assign a move at each configuration where $i$ is to move, i.e., $s_i(v)$ is an immediate successor of $v$ for each $s_i \in S_i$ and $v$ with $t(v) = i$;
utility $u_i(z)$ is defined as 1 for any terminal node $z$ where $i$ wins the game, $u_i(z) = -1$ when player $i$ loses at $z$, and $u_i(z) = 0$ when there is a draw.
- (2)
$f$ is a multi-layer backpropagation (BP) neural network, in which the number of input neurons is nine, representing the features of the nine grids (−1 for ×, 1 for ∘, and 0 for idle). There are also nine output neurons, one for each grid, and 50 hidden neurons. The filter function is determined by the output probability of the filter net for any state $s$ and any possible move following $s$.
- (3)
$e$ shares the same structure as $f$, but it has only one output neuron, which outputs a probability for $s$.
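The paper specifies only the layer sizes (9–50–9 for $f$ and 9–50–1 for $e$); a hypothetical PyTorch re-creation might look as follows, where the activation choices are our assumptions.

```python
import torch
import torch.nn as nn

filter_net = nn.Sequential(        # f: board features -> distribution over grids
    nn.Linear(9, 50), nn.Sigmoid(),
    nn.Linear(50, 9), nn.Softmax(dim=-1),
)

eval_net = nn.Sequential(          # e: board features -> winning probability
    nn.Linear(9, 50), nn.Sigmoid(),
    nn.Linear(50, 1), nn.Sigmoid(),
)

# Board encoding from the text: -1 for x, 1 for o, 0 for an idle grid.
board = torch.tensor([-1.0, 1.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 1.0])
move_probs = filter_net(board)     # plausibility of playing each grid
win_prob = eval_net(board)         # predicted quality of the state
```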
The process for game solving in [26] under this model is to compute the CPE. The decisions at each point are made based on the two output probabilities from the filter net and evaluation net, which characterize players' cognition of the plausibility of moves and the quality of game states, respectively.
For comparison with the model proposed here, we consider the same instance, viz., a partial game of Tic-Tac-Toe with a starting point $v_0$ (see Figure 2, which also shows two of its successors $v_1$ and $v_2$), in which it is player O's turn to move. The game tree after filtration via the filter net $f$ is shown in Figure 3, and the board configurations of these nodes are shown in Figure 4 (for intermediate nodes) and Figure 5 (for terminal nodes).
The final utilities of the terminal nodes (based on the cognition of player O) are shown in Table 1. Note that these nodes are not terminal nodes of the original game, but the terminal ones within the cognition of O. For each node, the utility is given as the average of the probability returned by the evaluation net $e$ and the utility of the leaf of the most plausible subsequent path. For each pair of utilities, the first value is the utility of player X, and the second is the utility of player O. For simplicity, further details on obtaining the above figures and table are omitted, since this information does not affect our analysis.
According to the cognition game solution algorithm in [26], for the subtree at $v_1$, player X will choose his branch via a BI process, through which X can receive a utility of 0.30 after O's subsequent choice; similarly, for the subtree at $v_2$, the player chooses his best branch. Consequently, the optimal choice for O at $v_0$ is determined. In the following steps, since there are at most three branches and the search depth is no greater than three, no filtration is needed. Continuing this game-solving procedure results in a leaf node in which the game ends with a draw.
The above process is based on the assumption that player X holds the same cognition as player O. However, if the player's cognition of the opponent is involved, the results are different. If a player can make a precise prediction about their opponent, then he can try to exploit this information and obtain a better outcome.
Suppose now that player X's cognition of the game tree is limited to a subtree of depth 2, i.e., the player can only search two steps forward. Moreover, the player's evaluation function regarding the quality of nodes is given by $p(v) = \sum_{j=1}^{9} w_j x_j$, where $j$ ranges over the 9 grids on the board, and $w_j$ and $x_j$ are the weight and the value of each grid. $x_j = 1$ if the $j$th grid is occupied by player X; $x_j = 0$ if the grid is blank; otherwise, $x_j = -1$. $w_j = 3$ for the four corners; $w_j = 2$ for the center grid; otherwise, $w_j = 1$. For example, the evaluation of $v_1$ (for which there is one grid with weight 3 and value 1, one grid with weight 2 and value 1, two grids with weight 1 and value 1, two grids with weight 3 and value $-1$, two grids with weight 1 and value $-1$, and one grid with weight 3 and value 0) is as follows:
$p(v_1) = 3 \cdot 1 + 2 \cdot 1 + 2 \cdot (1 \cdot 1) + 2 \cdot (3 \cdot (-1)) + 2 \cdot (1 \cdot (-1)) + 3 \cdot 0 = -1$. Correspondingly, the evaluation of $v_1$ for player O (from the perspective of X) is 1. We write this result as the pair $(-1, 1)$.
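The following is a short sketch of this static evaluation in code. The board below is a hypothetical configuration constructed to match the grid counts quoted above; the actual board of $v_1$ is given in Figure 4.

```python
CORNERS, CENTER = {0, 2, 6, 8}, 4

def weight(j: int) -> int:
    """Grid weight: 3 for the four corners, 2 for the center, 1 otherwise."""
    if j in CORNERS:
        return 3
    return 2 if j == CENTER else 1

def value(mark: str) -> int:
    """Grid value from X's perspective: +1 for X, 0 for blank, -1 for O."""
    return {"X": 1, "": 0, "O": -1}[mark]

def evaluate(board: list[str]) -> int:
    """Static evaluation p(v) for player X; O's evaluation is its negation."""
    return sum(weight(j) * value(m) for j, m in enumerate(board))

# Hypothetical board consistent with the counts in the running example:
# one X corner, one X center, two X edges, two O corners, two O edges,
# and one blank corner.
board_v1 = ["X", "X", "O",
            "X", "X", "O",
            "O", "O", ""]
assert evaluate(board_v1) == -1   # matches p(v_1) = -1 computed above
```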
We can then directly obtain the evaluations of the other nodes in the filtration in the same manner.
Furthermore, suppose that player O's prediction of X's cognition is correct. Then, from O's perspective, X's choices at $v_1$ and at $v_2$ can be determined exactly. Therefore, player O will choose the move at $v_0$ that leads to the outcome in which O wins the game. Compared with the case without reasoning about player X's cognition, player O obtains a better result (a win instead of a draw) by knowing the decision that will be made by X.