Figure 1.
CTDE framework vs. DTDE framework.
Figure 2.
Parameter sharing model.
Figure 3.
Parameter independence model.
Figure 4.
Policy network coding method.
Figure 5.
Flowchart of DE-COMA.
Figure 6.
DE-COMA initialization module.
Figure 7.
DE-COMA interaction module.
Figure 8.
DE-COMA RL update module.
Figure 9.
DE-COMA differential evolution update module.
Figure 10.
Training convergence plots of win rates for six algorithms on StarCraft II 2s_vs_1sc (a), 2s3z (b), 3m (c), 8m (d).
Figure 11.
Training convergence plots of average return for six algorithms on StarCraft II 2s_vs_1sc (a), 2s3z (b), 3m (c), 8m (d).
Figure 12.
2s3z scenario action sampling.
Figure 13.
2s_vs_1sc scenario action sampling.
Figure 14.
3m scenario action sampling.
Figure 15.
8m scenario action sampling.
Table 1.
Individual coding information.
Num | Definition | Size ([row, col]) | Symbol |
---|---|---|---|
1 | fc1.weight | [input_shape, rnn_hidden_dim] | |
2 | fc1.bias | [rnn_hidden_dim] | |
3 | reset_gate_weight | [rnn_hidden_dim, rnn_hidden_dim] | |
4 | reset_gate_bias | [rnn_hidden_dim] | |
5 | update_gate_weight | [rnn_hidden_dim, rnn_hidden_dim] | |
6 | update_gate_bias | [rnn_hidden_dim] | |
7 | con_hidden_status.weight | [rnn_hidden_dim, rnn_hidden_dim] | |
8 | con_hidden_status.bias | [rnn_hidden_dim] | |
9 | fc2.weight | [rnn_hidden_dim, n_actions] | |
10 | fc2.bias | [n_actions] | |
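As a rough illustration of how the ten tensors in Table 1 could be laid out end to end in one real-valued individual, the sketch below computes each tensor's slice and the total individual length. The concrete dimensions (`input_shape = 42`, `rnn_hidden_dim = 64`, `n_actions = 9`) are placeholder assumptions, not values from the paper, and the helper names `individual_length` and `slices` are hypothetical.

```python
from math import prod

# Placeholder dimensions, assumed for illustration only (not from the paper).
DIMS = {"input_shape": 42, "rnn_hidden_dim": 64, "n_actions": 9}

# Tensor names and symbolic shapes, in the order listed in Table 1.
LAYOUT = [
    ("fc1.weight", ("input_shape", "rnn_hidden_dim")),
    ("fc1.bias", ("rnn_hidden_dim",)),
    ("reset_gate_weight", ("rnn_hidden_dim", "rnn_hidden_dim")),
    ("reset_gate_bias", ("rnn_hidden_dim",)),
    ("update_gate_weight", ("rnn_hidden_dim", "rnn_hidden_dim")),
    ("update_gate_bias", ("rnn_hidden_dim",)),
    ("con_hidden_status.weight", ("rnn_hidden_dim", "rnn_hidden_dim")),
    ("con_hidden_status.bias", ("rnn_hidden_dim",)),
    ("fc2.weight", ("rnn_hidden_dim", "n_actions")),
    ("fc2.bias", ("n_actions",)),
]


def resolve(shape, dims):
    """Turn a symbolic shape such as ('rnn_hidden_dim',) into integers."""
    return tuple(dims[name] for name in shape)


def individual_length(layout, dims):
    """Total number of genes when all tensors are flattened and concatenated."""
    return sum(prod(resolve(shape, dims)) for _, shape in layout)


def slices(layout, dims):
    """Map each tensor name to its (start, stop) range in the flat vector."""
    out, start = {}, 0
    for name, shape in layout:
        size = prod(resolve(shape, dims))
        out[name] = (start, start + size)
        start += size
    return out
```

With a layout like this, a flat vector produced by the evolutionary operators can be cut back into tensors by slicing with the `(start, stop)` ranges and reshaping each piece to its resolved shape.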
Table 2.
Experimental algorithm parameter table.
Name | Variable | DE-COMA | COMA | VDN | QMIX | QTRAN | MAVEN |
---|---|---|---|---|---|---|---|
Training epochs | n_timestpes | 1 × 10^6 | 1 × 10^6 | 1 × 10^6 | 1 × 10^6 | 1 × 10^6 | 1 × 10^6 |
Evaluation generations | evaluate_epoch | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 |
Replay buffer size | batch_size | 32 | 32 | 32 | 32 | 32 | 32 |
Greedy rate | epsilon | 0.5 | 0.5 | 1 | 1 | 1 | 1 |
Population size | NP | 10 | - | - | - | - | - |
Evolution generations | rounds | 5 | - | - | - | - | - |
Mutation rate | factor | 0.7 | - | - | - | - | - |
Crossover rate | CR | 0.7 | - | - | - | - | - |
Decay rate | min_epsilon | 0.05 | 0.02 | 0.05 | 0.05 | 0.05 | 0.05 |
Discount rate | gamma | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
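To make the differential evolution settings in Table 2 concrete (NP = 10, factor = 0.7, CR = 0.7), here is a minimal sketch of one generation. It assumes the classic DE/rand/1/bin variant with greedy selection and a fitness that is maximised. The source does not state which variant DE-COMA uses, so this is an assumption, and `de_trial` and `de_generation` are hypothetical names.

```python
import random

# Values taken from Table 2: population size, mutation factor, crossover rate.
NP, F, CR = 10, 0.7, 0.7


def de_trial(population, i, rng, f=F, cr=CR):
    """Build one trial vector for individual i (DE/rand/1/bin, assumed variant)."""
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)  # three distinct donors, none equal to i
    target = population[i]
    dim = len(target)
    j_rand = rng.randrange(dim)  # guarantees at least one gene comes from the mutant
    trial = []
    for j in range(dim):
        if j == j_rand or rng.random() < cr:
            mutant_gene = population[r1][j] + f * (population[r2][j] - population[r3][j])
            trial.append(mutant_gene)
        else:
            trial.append(target[j])
    return trial


def de_generation(population, fitness, rng):
    """One generation with greedy selection; higher fitness is better."""
    new_population = []
    for i, target in enumerate(population):
        trial = de_trial(population, i, rng)
        new_population.append(trial if fitness(trial) >= fitness(target) else target)
    return new_population
```

In a setting like the one tabulated here, `fitness` would presumably be an episode-level score such as return or win rate, and `de_generation` would be called `rounds` (5) times per evolution phase. Both points are assumptions rather than details confirmed by the table.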
Table 3.
Win rate data for six algorithms on StarCraft II 2s_vs_1sc, 2s3z, 3m, 8m.
Env | Results | DE-COMA | COMA | QMIX | VDN | QTRAN | MAVEN |
---|---|---|---|---|---|---|---|
2s_vs_1sc | Mean | 0.569 | 0.470 | 0.721 | 0.875 | 0.836 | 0.663 |
| Std | 0.313 | 0.346 | 0.359 | 0.278 | 0.316 | 0.327 |
| Max | 1.00 | 0.600 | 1.00 | 1.00 | 1.00 | 0.950 |
| Min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2s3z | Mean | 0.441 | 0.250 | 0.775 | 0.742 | 0.470 | 0.536 |
| Std | 0.197 | 0.114 | 0.261 | 0.238 | 0.270 | 0.319 |
| Max | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
3m | Mean | 0.750 | 0.668 | 0.766 | 0.746 | 0.761 | 0.711 |
| Std | 0.168 | 0.170 | 0.276 | 0.249 | 0.270 | 0.377 |
| Max | 1.00 | 0.950 | 1.00 | 1.00 | 1.00 | 1.00 |
| Min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
8m | Mean | 0.791 | 0.701 | 0.818 | 0.737 | 0.752 | 0.706 |
| Std | 0.242 | 0.234 | 0.237 | 0.280 | 0.237 | 0.282 |
| Max | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Table 4.
Average ranking of DE-COMA, COMA, QMIX, VDN, QTRAN, MAVEN according to the Friedman test on 2s_vs_1sc win rate.
Algorithms | Average Ranking | Final Rank |
---|---|---|
DE-COMA | 3.87 | 4 |
COMA | 4.63 | 6 |
QMIX | 3.36 | 3 |
VDN | 2.25 | 1 |
QTRAN | 2.53 | 2 |
MAVEN | 3.96 | 5 |
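The average rankings in these Friedman tables can be read as follows: within each block of paired measurements the algorithms are ranked (rank 1 for the best score, ties sharing the mean rank), the ranks are averaged per algorithm, and the final rank orders those averages from lowest to highest. The sketch below illustrates that procedure. What constitutes a block in the paper (for example, an evaluation checkpoint) is not stated here, so the input format is an assumption, and the function names are hypothetical.

```python
def rank_row(scores):
    """Rank one block of scores: rank 1 = highest score, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def average_ranks(blocks):
    """Average the per-block ranks of each algorithm (one column per algorithm)."""
    per_block = [rank_row(block) for block in blocks]
    n_algorithms = len(blocks[0])
    return [sum(r[c] for r in per_block) / len(blocks) for c in range(n_algorithms)]


def final_ranks(avg):
    """Order algorithms by average rank; the lowest average gets final rank 1."""
    order = sorted(range(len(avg)), key=lambda k: avg[k])
    out = [0] * len(avg)
    for place, k in enumerate(order, start=1):
        out[k] = place
    return out
```

Applied to the average rankings of Table 4 in column order (DE-COMA, COMA, QMIX, VDN, QTRAN, MAVEN), `final_ranks([3.87, 4.63, 3.36, 2.25, 2.53, 3.96])` returns `[4, 6, 3, 1, 2, 5]`, which matches the Final Rank column.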
Table 5.
Average ranking of DE-COMA, COMA, QMIX, VDN, QTRAN, MAVEN according to the Friedman test on 2s3z win rate.
Algorithms | Average Ranking | Final Rank |
---|---|---|
DE-COMA | 4.19 | 5 |
COMA | 5.66 | 6 |
QMIX | 1.63 | 1 |
VDN | 2.06 | 2 |
QTRAN | 3.86 | 4 |
MAVEN | 3.59 | 3 |
Table 6.
Average ranking of DE-COMA, COMA, QMIX, VDN, QTRAN, MAVEN according to the Friedman test on 3m win rate.
Algorithms | Average Ranking | Final Rank |
---|---|---|
DE-COMA | 3.19 | 4 |
COMA | 4.74 | 6 |
QMIX | 3.07 | 2 |
VDN | 3.46 | 5 |
QTRAN | 3.17 | 3 |
MAVEN | 3.02 | 1 |
Table 7.
Average ranking of DE-COMA, COMA, QMIX, VDN, QTRAN, MAVEN according to the Friedman test on 8m win rate.
Algorithms | Average Ranking | Final Rank |
---|---|---|
DE-COMA | 3.52 | 2 |
COMA | 3.76 | 5 |
QMIX | 2.43 | 1 |
VDN | 3.56 | 3 |
QTRAN | 3.69 | 4 |
MAVEN | 4.04 | 6 |
Table 8.
Average return data for six algorithms on StarCraft II 2s_vs_1sc, 2s3z, 3m, 8m.
Env | Results | DE-COMA | COMA | QMIX | VDN | QTRAN | MAVEN |
---|---|---|---|---|---|---|---|
2s_vs_1sc | Mean | 10.4 | 9.18 | 13.0 | 15.3 | 14.6 | 12.0 |
| Std | 5.06 | 5.46 | 5.79 | 4.80 | 5.34 | 5.39 |
| Max | 19.3 | 19.0 | 19.5 | 19.5 | 19.5 | 19.5 |
| Min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2s3z | Mean | 8.21 | 3.53 | 14.8 | 14.2 | 9.32 | 10.5 |
| Std | 3.64 | 2.18 | 4.83 | 4.46 | 4.94 | 5.88 |
| Max | 19.9 | 11.7 | 19.8 | 19.6 | 18.8 | 18.9 |
| Min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
3m | Mean | 14.1 | 12.9 | 14.6 | 14.1 | 14.4 | 13.6 |
| Std | 3.28 | 3.25 | 5.19 | 4.68 | 5.00 | 6.92 |
| Max | 19.5 | 18.8 | 19.7 | 19.7 | 19.6 | 19.8 |
| Min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
8m | Mean | 14.7 | 14.3 | 15.6 | 14.1 | 14.3 | 13.5 |
| Std | 4.36 | 4.43 | 4.43 | 5.25 | 4.45 | 5.17 |
| Max | 19.8 | 19.8 | 19.8 | 19.8 | 19.7 | 19.8 |
| Min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Table 9.
Average ranking of DE-COMA, COMA, QMIX, VDN, QTRAN, MAVEN according to the Friedman test on 2s_vs_1sc average return.
Algorithms | Average Ranking | Final Rank |
---|---|---|
DE-COMA | 4.17 | 5 |
COMA | 4.49 | 6 |
QMIX | 3.34 | 3 |
VDN | 2.46 | 1 |
QTRAN | 2.66 | 2 |
MAVEN | 3.87 | 4 |
Table 10.
Average ranking of DE-COMA, COMA, QMIX, VDN, QTRAN, MAVEN according to the Friedman test on 2s3z average return.
Algorithms | Average Ranking | Final Rank |
---|---|---|
DE-COMA | 4.20 | 5 |
COMA | 5.68 | 6 |
QMIX | 1.66 | 1 |
VDN | 2.04 | 2 |
QTRAN | 3.85 | 4 |
MAVEN | 3.58 | 3 |
Table 11.
Average ranking of DE-COMA, COMA, QMIX, VDN, QTRAN, MAVEN according to the Friedman test on 3m average return.
Algorithms | Average Ranking | Final Rank |
---|---|---|
DE-COMA | 3.68 | 5 |
COMA | 4.70 | 6 |
QMIX | 3.03 | 1 |
VDN | 3.50 | 4 |
QTRAN | 3.04 | 2 |
MAVEN | 3.05 | 3 |
Table 12.
Average ranking of DE-COMA, COMA, QMIX, VDN, QTRAN, MAVEN according to the Friedman test on 8m average return.
Algorithms | Average Ranking | Final Rank |
---|---|---|
DE-COMA | 3.49 | 2 |
COMA | 3.71 | 5 |
QMIX | 2.51 | 1 |
VDN | 3.54 | 3 |
QTRAN | 3.67 | 4 |
MAVEN | 4.08 | 6 |
Table 13.
SMAC scenarios.
Name | Ally Units | Enemy Units | Type |
---|---|---|---|
2s3z | 2 Stalkers and 3 Zealots | 2 Stalkers and 3 Zealots | Heterogeneous and Symmetric |
2s_vs_1sc | 2 Stalkers | 1 Spine Crawler | Micro-Trick: Alternating Fire |
3m | 3 Marines | 3 Marines | Homogeneous and Symmetric |
8m | 8 Marines | 8 Marines | Homogeneous and Symmetric |