3.1. Effects of Mutation on the Evolution of ‘TfTx’ Sets and of Xr Sets
Populations can be subdivided into a fraction μ of mutants and a fraction (1 − μ) of non-mutants. Strategy i generates the average payoff πi from interactions with mutants. The average fitness of strategy i individuals is wi = (1 − μ)θi + μπi + K, where θi is the average payoff generated from interactions with non-mutants. We interpret the occurrence of inequalities πi > πAllD as direct effects and of inequalities θi > θAllD as indirect effects if they coincide with the observation of significant effects (wi > wAllD). Direct effects and indirect effects are not mutually exclusive. In the following, the discussion of indirect effects focuses on their emergence in cases where direct effects are excluded (i.e., πAllD ≥ πi for all strategies i and at all population compositions).
For constant u distributions, strategies generate fixed returns πi, i.e., the payoff from interactions with mutants is independent of the population composition. Then, direct effects can be excluded if (πi – πAllD) ≤ 0 for all strategies i. Direct effects can occur (and are inevitable for sufficiently high μ-values) if (πi – πAllD) > 0 for at least one strategy i. For variable u distributions, the averages πi are functions of population compositions. These compositions are also functions of the mutation rate μ. As a consequence, there is no simple expression for when direct effects can emerge. We focus on analytical results assuming constant u distributions and only briefly discuss the more complicated case of variable u distributions.
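The role of μ in these statements can be made explicit by subtracting the fitness of AllD from that of strategy i. The short derivation below uses only the fitness definition given above; it is a sketch for orientation, and the remark on the limit assumes that the θ differences stay bounded, which holds for finite payoffs.

```latex
\begin{align}
w_i - w_{AllD}
  &= \bigl[(1-\mu)\theta_i + \mu\pi_i + K\bigr]
   - \bigl[(1-\mu)\theta_{AllD} + \mu\pi_{AllD} + K\bigr] \\
  &= (1-\mu)\,(\theta_i - \theta_{AllD}) + \mu\,(\pi_i - \pi_{AllD}).
\end{align}
```

The background fitness K cancels in the difference. As μ approaches 1, the first term vanishes, so the sign of wi − wAllD is then set by πi − πAllD whenever that difference is nonzero. For constant u distributions this difference does not depend on the population composition.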
For ‘TfTx’ sets, the difference in performance between TfTx and AllD in interactions with mutants can be given by the recursion πTfTx − πAllD = (πTfTx − πTfTx−1) + (πTfTx−1 − πAllD) (note that in ‘TfTx’ notation, AllD is TfT0). The adjacent strategies TfTx−1 and TfTx perform identically with mutants expressing strategies TfTy for which y ≤ x − 2. TfTx individuals generate one additional round of mutual cooperation from interactions with mutants expressing strategies TfTy for which y ≥ x. TfTx individuals are exploited by TfTx−1 mutants on a single occasion, and they do not exploit TfTx mutants in round x + 1. Consequently, strategy TfTx is more effective in interactions with mutants than TfTx−1 (πTfTx − πTfTx−1 > 0) if the gain from the additional rounds of mutual cooperation outweighs the losses from these two types of encounters.
From this inequality it follows that for uniform ‘TfTx’-u distributions (i.e., uTfT0 = uTfT1 = … = uTfTr), the distribution of the π values has a single peak at πTfTx, where x is the highest integer for which the inequality (r + 1 − x)(R − P) > T − S is satisfied. As a consequence, direct effects can be obtained for uniform distributions by manipulating μ if r(R − P) > T − S (i.e., πTfT1 − πAllD > 0). The expectation that changes in conditions yielding increased x-values also result in increased execution of cooperation at the evolutionary equilibrium was confirmed in a set of simulations.
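The uniform-distribution condition can be checked numerically. The sketch below assumes that TfTx opens with C, copies the opponent's previous action up to round x, and defects from round x + 1 onward (so that TfT0 is AllD). This reading of the strategies, the function names, and the example values r = 10 and P = 1 are assumptions made for illustration, not the authors' code.

```python
from fractions import Fraction

def tftx_move(x, t, opp_prev):
    """Action of TfTx in round t (1-indexed); opp_prev is the opponent's last action."""
    if t > x:
        return "D"          # beyond round x the strategy always defects
    if t == 1:
        return "C"          # opening move
    return opp_prev         # tit-for-tat: copy the opponent's previous action

def payoff(x, y, r, T, S, R, P):
    """Total payoff to TfTx against TfTy in an r-round game."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    total, prev_x, prev_y = 0, None, None
    for t in range(1, r + 1):
        a_x = tftx_move(x, t, prev_y)
        a_y = tftx_move(y, t, prev_x)
        total += table[(a_x, a_y)]
        prev_x, prev_y = a_x, a_y
    return total

def uniform_pi(r, T, S, R, P):
    """Mean payoff of each TfTx (x = 0..r) against uniformly distributed TfTy mutants."""
    return [Fraction(sum(payoff(x, y, r, T, S, R, P) for y in range(r + 1)), r + 1)
            for x in range(r + 1)]

r, T, S, R, P = 10, 5, 0, 3, 1
pi = uniform_pi(r, T, S, R, P)
peak = max(range(r + 1), key=lambda x: pi[x])
predicted = max(x for x in range(1, r + 1) if (r + 1 - x) * (R - P) > T - S)
print(peak, predicted)
```

With these example values, the mean payoff from uniform mutants peaks at x = 8, the highest integer satisfying (r + 1 − x)(R − P) > T − S.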
For Xr sets, the following property should be noted. The response ρij is the action sequence (of length r) that strategy i triggers from strategy j, and ρi is the entire set of responses ρij (j ∈ Xr) of strategy i. Given the comprehensiveness of Xr sets, it follows that, for an arbitrary set ρi (i ∈ Xr), the same number of respective responses is found for each of the 2^r action sequences (i.e., ρi and ρj (i ≠ j) are two permutations of the same set of sequences). The consequence is that, with uniform u distributions, the mean behaviors of mutants are not influenced by the strategy of the opponents. In that case, it can be inferred from the payoff dominance of D over C that AllD generates the highest mean payoff from mutants (πAllD > πi).
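The permutation property can be verified by brute force for small r. The sketch below assumes that an Xr strategy is a string of 2^r − 1 actions: the first move, followed by one action for every possible history of opponent actions. The index layout inside the string, the function names, and the choice P = 1 are illustrative assumptions; the property itself does not depend on the layout.

```python
from collections import Counter
from itertools import product

# Payoff to the focal player for (own action, opponent action); 1 = C, 0 = D.
PAYOFF = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}   # R, S, T, P

def action(strategy, opp_history):
    """Action after the opponent's history (assumed layout, earliest round first)."""
    offset = 2 ** len(opp_history) - 1   # start of the block for the next round
    index = 0
    for a in opp_history:
        index = 2 * index + a
    return strategy[offset + index]

def play(i, j, r):
    """Action sequences of strategies i and j in an r-round game."""
    hist_i, hist_j = [], []
    for _ in range(r):
        a_i, a_j = action(i, hist_j), action(j, hist_i)
        hist_i.append(a_i)
        hist_j.append(a_j)
    return tuple(hist_i), tuple(hist_j)

def response_counts(i, r):
    """How often strategy i triggers each action sequence from the strategies of Xr."""
    return Counter(play(i, j, r)[1] for j in product((0, 1), repeat=2 ** r - 1))

def mean_payoff(i, r):
    """Mean payoff of strategy i against uniformly distributed mutants."""
    mutants = list(product((0, 1), repeat=2 ** r - 1))
    total = sum(PAYOFF[(a, b)] for j in mutants for a, b in zip(*play(i, j, r)))
    return total / len(mutants)

r = 3
all_d = (0,) * (2 ** r - 1)
print(response_counts(all_d, r))
print(mean_payoff(all_d, r))
```

For r = 3, every strategy triggers each of the 8 action sequences from the same number of opponents, so a uniformly drawn mutant behaves identically on average against every opponent, and AllD collects the highest mean payoff.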
For the uniform distributions analyzed above, mean behaviors of mutants are not influenced by the strategy of the opponents. We refer to such u distributions with unbiased average mutant behaviors as symmetric and to alternative u distributions with biased average mutant behaviors as asymmetric. This distinction is useful because not only uniform u distributions of Xr sets are symmetric. For example, any distribution with uniform ui values for the conditional strategies is symmetric because the behavior of unconditional strategies is not influenced by the opponent. In Appendix A, we define the space of symmetric u distributions. Note that, for both symmetric and asymmetric distributions, increasing the share of unconditional strategies tends to favor πAllD as AllD expresses best response behavior to unconditional strategies. As outlined for the uniform distributions, direct effects can be excluded for all symmetric distributions. Hence, direct effects emerge only if strategies can trigger distinct mean mutant behaviors (i.e., the key characteristic of asymmetric distributions).
Symmetric u distributions for Xr sets are a special case. The ‘TfTx’ sets, as shown above, allow for direct effects, and they represent asymmetric distributions (the ui values of ‘TfTx’ sets are formed from u distributions of Xr sets by setting the ui values to zero for strategies outside the ‘TfTx’ sets). For direct effects to occur, average encounters with mutants should evidently be inefficient for AllD but efficient for certain other strategies, i.e., mutants should tend to conditionally defect in interactions with AllD and should tend to conditionally cooperate in interactions with certain other strategies. Examples are distributions (such as ‘TfTx’) for which mutants tend to express reciprocal behaviors [20,23].
3.2. Simulations of Xr-Populations
To gain insight into indirect effects, we performed simulations using the Xr strategy sets {X1, X2, X3, X4} with uniform u distributions. As discussed in the previous subsection, direct effects are excluded with uniform u distributions. A set of simulations was performed with fixed parameters K = 0 and {T, S, R} = {5, 0, 3}, while varying the mutual defection payoff P = {0.05, 0.3, 1} and the mutation rate μ = {0.0001, 0.001, 0.01, 0.1}. For these parameter combinations, Table 2 shows whether populations evolve to an equilibrium or not (equilibrium conditions are described in Appendix B). For all settings, {X1, X2}-populations (i.e., playing the one-round and the two-round game) evolve to equilibrium (Table 2). The table shows that for P = {0.05, 0.3}, no equilibrium is attained in the evolution of certain X3-populations and of certain X4-populations.
The equilibrium populations described in Table 2 are dominated by AllD (i.e., fAllD > fi for i ≠ AllD); this characteristic applies to all observed equilibrium populations in our study. Furthermore, all observed equilibrium strategy frequencies fi are identical for both continuous and discrete-generation models. At equilibrium, dominance of AllD implies that the strategy also has fitness dominance. We do not find persistent indirect effects in the populations that do not reach equilibrium. Consequently, we do not find persistent indirect effects in the simulations.
For P = 1, the {X1, X2, X3, X4}-populations evolve to equilibrium for all mutation rates (Table 2). For rates μ = {0.001, 0.01, 0.1}, Table 3a shows the average number of C executions per Prisoner’s Dilemma game in these equilibrium populations. For each setting, these averages increase with the mutation rate. The averages of X1-populations (Table 3a) are only slightly higher than the inflow of cooperator (c) mutants (~0.5μ). The execution of cooperation can thus be attributed to c-mutants. The table shows for each mutation rate that the averages of {X2, X3, X4}-populations are approximately three times higher than those of X1. We attribute this difference to the fact that sets {X2, X3, X4} contain conditional strategies.
Table 3b shows that evolution to an equilibrium is found in simulations of X3-populations using the three background fitness values K = {0, 5, 20}. Along K = {0, 5, 20}, we find an increase in mean cooperation for each rate μ (Table 3b).
The observed cooperation in the populations of Table 3 is maintained by mutation-selection balance because direct and indirect effects are absent. This interpretation of the mean cooperation data is straightforward for the X1-populations. For the populations with repeated games, cooperation can be argued to be disadvantageous because ‘non-AllD’-individuals would increase their fitness by replacing their strategy with AllD. However, we emphasize that in several {X3, X4}-populations, non-defectors obtain above-average fitness at equilibrium (Table 2 and Table 3). The potential for the evolution of conditional behavior in repeated games ({X2, X3, X4}) seems to reduce selection against cooperation (as cooperation levels are higher for these sets than in X1; see Table 3a). As expected, a similar effect can be attributed to increasing background fitness K (Table 3b).
Table 2 shows that for the lowest mutual defection payoff (P = 0.05), {X3, X4}-populations do not converge to equilibrium in the simulations with the two lowest mutation rates. For the intermediate P-value of Table 2, this phenomenon is also observed for X3-populations at the lowest rate and for X4-populations at the three lowest rates. Because the X3 set is 256 times smaller than the X4 set, X3-populations are more convenient to study. This is why we mainly study non-equilibrium behavior in X3-populations.
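The factor of 256 follows from the set sizes. The sketch below assumes that an Xr strategy fixes one action for the first move and one for every possible history of opponent actions in later rounds, which gives 2^r − 1 entries and matches the 7-character and 15-character strategy strings used in this section. The function name is hypothetical.

```python
def set_size(r):
    """Number of strategies in Xr, assuming 2**r - 1 binary entries per strategy."""
    entries = sum(2 ** t for t in range(r))   # 1 + 2 + ... + 2**(r - 1) = 2**r - 1
    return 2 ** entries

sizes = {r: set_size(r) for r in (1, 2, 3, 4)}
ratio = sizes[4] // sizes[3]
print(sizes)   # {1: 2, 2: 8, 3: 128, 4: 32768}
print(ratio)   # 256
```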
For P = 0.05, Figure 1a shows the mean execution of cooperation per frPD game along μ = {0.00001, 0.0001, 0.001, 0.01, 0.1}. For the lowest rate and for the two highest rates, these means are sampled at equilibrium. As mentioned, the equilibrium frequencies are not affected by the choice of the generation model (i.e., dgm or cgm). Hence, the means are identical in Figure 1a for each of these rates. After transient phases, the populations at rates μ = {0.0001, 0.001} evolve in cycles. As an example, consider the strategy dynamics at rate μ = 0.001 in Figure 1b for dgm and in Figure 1c for cgm. Table 4 lists the strategies with max(fi) > 0.1 during the cycles for these two figures. For mutation rates μ = {0.0001, 0.001}, the means in Figure 1a are averaged over one cycle period. The means are identical whether populations are initialized with fAllD = 1 or with a uniform frequency distribution. For both mutation rates, the averages are higher if sampled over dgm-cycles than if sampled over cgm-cycles (Figure 1a). The figure also shows that for both models, the means are higher in the cycling populations than in the equilibrium populations at μ = 10⁻⁵. The means are higher than the equilibrium values found at the higher rate μ = 0.01 for the dgm at rates μ = {0.0001, 0.001} and for the cgm at rate μ = 0.001 (Figure 1a). Consequently, for both types of generation models, an optimum in mean cooperation exists within the interval 10⁻⁵ < μ < 0.01.
For the two mutation rates μ = {0.0001, 0.001}, we tested the sensitivity of the cycling dynamics in dgm-populations to the choice of background fitness K. The populations show cycling dynamics if K ≤ 45 at μ = 0.0001 and if K ≤ 3 at μ = 0.001, and they evolve to equilibrium for higher K-values. The X3-populations showing non-equilibrium dynamics in our simulations evolve in cycles (and show periodic indirect effects).
The strategy dynamics in Figure 1b,c resemble those in the corresponding simulations with the lower mutation rate μ = 0.0001. All four cycles show (as in Figure 1b,c) alternations of phases with dominance of AllD followed by phases with dominance of TfT1 (cdddddd). As can be inferred from these dynamics, AllD and TfT1, respectively, have the highest fitness when invading the populations. Consequently, these populations express periodic indirect effects. In Figure 2, we give behavioral statistics from the simulation of Figure 1b. Figure 2a shows the dynamics of the mean number of executed C actions for each round i = {1, 2, 3} of the game. Cooperation is more intensively executed during TfT1 phases, especially in round 1 (Figure 2a). The relatively longer TfT1 phase durations in the dgm-populations (compare Figure 1b with Figure 1c) explain why mean cooperation is higher in dgm-populations than in the corresponding cgm-populations (Figure 1a at μ = {0.0001, 0.001}).
For the three payoffs P, T, and R, Figure 2b shows the dynamics of the mean payoff values per frPD game. Steep increases in the generation of T-payoffs (Figure 2b) mark the onset of invasions by TfT1 (Figure 1b). Defectors like AllD generate this payoff in the first round when interacting with TfT1, and defectors are the dominant opponents of this strategy at the onset of invasions (Figure 1b). The increase in the generation of T-payoffs is therefore partly explained by defectors triggering this payoff from TfT1. For TfT1, these first-round interactions seem disadvantageous, but this disadvantage is evidently compensated because TfT1 invades. Figure 2b additionally shows the dynamics of the expected average payoff values generated per game from receiving payoff P, T, or R. For payoff T, the observed value is higher than the expected value (Figure 2b) over the dominance phase of TfT1 (Figure 1b). These differences between observed and expected values are caused by the conditional behaviors in rounds 2 and 3. Hence, we propose that the invasions of TfT1 are fueled by triggering T-payoffs in these rounds. At the onset of invasions, AllD is the dominant strategy (Figure 1b) and defection is the predominant behavior (Figure 2). Defectors (in contrast to non-defectors) are not penalized when interacting with AllD, and they can therefore be expected to perform better than other strategies in AllD-dominated populations. The strategy TfT1 generates T-payoffs from the twelve defectors {ddcd…, dddd.c.}. Game interactions between these defectors and TfT1 indeed significantly contribute (data not shown) to the increases of the mean T-payoff (Figure 2b).
In the appendix, we derive the invasion condition for a single TfT1-individual in a population state with full defection. We find that such an invasion occurs if the combined frequency of defectors {ddcd…, dddd.c.} exceeds (P − S)/(T − P) (~0.01 in the simulation of Figure 1b). This condition is fulfilled over the entire cycle period in Figure 1b, but the population state deviates from full defection due to mutation. In this state, AllD obtains the highest benefit from interactions with mutants (i.e., μ(πAllD − πTfT1) > 0). Thus, the invasion conditions in the simulations should be more stringent than those derived in the appendix. Before the onset of the invasions, the population does converge towards a state of full defection (Figure 2) and thus towards the conditions underlying the analysis in the appendix. In our opinion, the invasions of TfT1 in the simulations are fueled by interactions with defectors {ddcd…, dddd.c.}, just as in the analysis. That AllD subsequently regains dominance, thereby closing the cycle (Figure 1b,c), is in line with the expectations from the selection dynamics of evolutionary frPD games [18,19].
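The threshold value and the set of twelve defectors can be reproduced by enumerating X3. The sketch below rests on two assumptions that are inferred from the strategy strings in this section rather than stated in it: a strategy lists its first move followed by one action per opponent history (d before c, earliest round most significant), and a defector is a strategy that never cooperates when facing AllD. All function and variable names are hypothetical.

```python
from itertools import product

# Payoffs of the Figure 1b setting (R does not enter the threshold).
T, S, P = 5.0, 0.0, 0.05

def action(strategy, opp_history):
    """Action (0 = d, 1 = c) after the opponent's action history (assumed layout)."""
    offset = 2 ** len(opp_history) - 1   # start of the block for the next round
    index = 0
    for a in opp_history:                # earliest round is the most significant bit
        index = 2 * index + a
    return strategy[offset + index]

def play_against(strategy, opp_actions):
    """Actions of `strategy` against a fixed opponent action sequence."""
    return [action(strategy, opp_actions[:t]) for t in range(len(opp_actions))]

def as_string(strategy):
    return "".join("c" if a else "d" for a in strategy)

ALLD_ACTIONS = (0, 0, 0)   # AllD defects in all three rounds
TFT1_ACTIONS = (1, 0, 0)   # TfT1 (cdddddd) cooperates once, then always defects

x3 = list(product((0, 1), repeat=7))
# Assumed definition: a defector never cooperates when facing AllD.
defectors = [s for s in x3 if not any(play_against(s, ALLD_ACTIONS))]
# TfT1 earns T whenever such an opponent cooperates in round 2 or round 3.
t_donors = [s for s in defectors if any(play_against(s, TFT1_ACTIONS)[1:])]

threshold = (P - S) / (T - P)
print(len(defectors), len(t_donors))
print(sorted(as_string(s) for s in t_donors))
print(round(threshold, 4))
```

Under these assumptions, 12 of the 16 defectors hand TfT1 at least one T-payoff, and they match the patterns ddcd… and dddd.c. exactly. The threshold evaluates to about 0.0101 for P = 0.05, in line with the value of ~0.01 quoted for Figure 1b.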
In Table 2 and Table 3, we mark the equilibria (E in Table 2 and italic numbers in Table 3) in which non-defectors have above-average fitness. The strategy TfT1 has the highest fitness among the non-defectors in these equilibria. Furthermore, these equilibria emerge at the higher mutation rates (Table 2 and Table 3), possibly because mutation benefits TfT1 (e.g., by generating defector opponents {ddcd…, dddd.c.}) in these equilibria. However, invasion by this strategy is prevented, in part because AllD is the strategy that benefits most from interactions with mutants (μ(πAllD − πTfT1) > 0).
The X4-simulations are more computation-intensive than the X3-simulations, and we restricted these simulations to 10⁴ generations due to constraints on computation time. Consequently, the data obtained do not allow definitive conclusions on the nature of non-equilibrium X4-dynamics. Over the simulation periods, chaotic dynamics occur for the X4-populations with non-equilibrium dynamics in Table 2. For example, in Figure 3, the X4-frequency dynamics at {P, μ} = {0.3, 0.01} exhibit a transient period of ~2000 generations, after which alternations of dominance by strategies {AllD, ddcdddddddddddd, TfT1, dddddcddddddddd} emerge. The population therefore expresses periodic indirect effects. As in Figure 1b,c, strategies AllD and TfT1 in Figure 3 become periodically dominant, with dominant phases of strategies {AllD, TfT1, dddddcddddddddd} that have fairly regular phase lengths (Figure 3).
For {X1, X2, X3, X4}, Table 2 shows that all populations evolve to equilibrium in the two smallest sets {X1, X2}, and non-equilibrium dynamics occur more frequently when going from set X3 to set X4 (e.g., the X3-population evolves to equilibrium under the conditions of Figure 3). We interpret this observation as an indication that increasing the number of rounds (r) increases the parameter range for which periodic indirect effects emerge. This interpretation meets our intuition because strategy TfT1 generates one T-payoff from X2-defector ddc, two T-payoffs from X3-defectors ddcdc.., and three T-payoffs from X4-defectors ddcdc..d…c….