1. Introduction
Autonomous underwater vehicles (AUVs) are widely used for ocean observation, sea rescue, minefield search, enemy reconnaissance, and other related fields [1,2]. Compared with manned equipment, AUVs offer the advantages of low cost, zero casualties, good concealment, strong mobility, low energy consumption, recyclability, etc. [3,4,5]. However, in the face of high-efficiency and large-scale task requirements, a single AUV can hardly fulfill all these demands. A multi-AUV system is a new alternative for complex ocean tasks due to its high efficiency and reliability, achieved through its time-space distribution and redundant configuration. In recent years, research on multi-AUV cooperative formation and navigation has made significant headway [6,7,8], but studies of multi-AUV cooperative counter-games are still limited. The multi-AUV cooperative countermeasure can be used in marine research and military tasks, including underwater multi-target tracking, monitoring, and detection, which effectively expands the underwater combat radius and reduces underwater equipment loss and casualties.
The maneuver countermeasure is a special case of maneuver decision-making, which can be roughly divided into two categories. One is the 'one-sided' optimization algorithm, which considers only one's own strategy optimization. The other is the 'two-sided' game algorithm, which fully analyzes the influence of both sides and emphasizes the conflict and confrontation of the situation. One-sided optimization algorithms typically include intelligent algorithms, guidance-law algorithms, and expert systems [9]. The expert-system-based maneuver decision-making algorithm presented in Ref. [10] is one of the earliest works; however, this method relies heavily on experience, the chosen strategy is not guaranteed to be optimal, and its adaptability is not reliable. An air combat maneuver decision-making algorithm based on heuristic reinforcement learning is proposed in Ref. [11], which realizes online task allocation in actual confrontation. In Ref. [12], a machine learning algorithm is used to train a large-scale UAV cluster, showcasing its highly intelligent decision-making ability. This kind of one-sided optimization algorithm may achieve the desired results to a certain extent, but the optimization process is not completely objective, because the influence of the opponent's strategy is not considered and the interaction and conflict of the game are ignored.
The matrix game and the differential game are the most widely studied two-sided counter-game algorithms [13]. In Ref. [14], the matrix game is first used to study the two-UAV counter-game, but its model and strategy are relatively simple and cannot be applied to actual confrontation. The integration of a game strategy solution into this model, based on intelligent algorithms, is proposed in Ref. [15], which improves the applicability of the former model. Ref. [16] discusses the maneuver counter-game of UAV clusters based on the intuitionistic fuzzy game, mapping the uncertainty of information such as the game environment to fuzzy membership, which is more aligned with the reality of a counter-game scenario. Furthermore, Ref. [17] examines the problem of an active-defense cooperative differential game with three players, and Ref. [18] discusses a qualitative differential game in which multiple pursuers hunt an advantaged runner. Overall, the two-sided game algorithm can capture the complex conflict of an unmanned-system cluster countermeasure and realize the maneuver strategy in a more scientific and accurate way. However, owing to the complex marine environment [19], it is necessary to further explore multi-stage dynamic countermeasure algorithms for multi-AUV systems that account for underwater characteristics.
Classical game theory only discusses games with clear payment information [20]. However, most of the information in the marine environment is uncertain, which is the main problem in the counter-game process. If the uncertain information is directly converted to a crisp value, the maneuver countermeasure algorithm may lose its credibility in strategy selection. In this paper, this problem is solved by introducing interval information into the cooperative dynamic maneuver countermeasure algorithm of the multi-AUV system. The uncertainty in the multi-AUV counter-game, the confrontation effectiveness, and the marine environment is regarded as an interval information series. Meanwhile, an interval-payment multi-attribute evaluation of the multi-AUV maneuver strategy is carried out, and the interval payment matrix is obtained. The uncertain information is therefore well considered and handled in the proposed counter-game algorithm, which makes it suitable for application in a marine environment. Moreover, the Nash equilibrium condition satisfying the interval payment is then proposed, and the countermeasure model of a Nash equilibrium strategy in a dynamic marine environment is established. To determine the optimal strategy, an improved fractional-order Differential Evolution (DE) algorithm is presented by combining the DE algorithm with fractional calculus, which improves the convergence rate of the optimization process. Fractional calculus possesses a long-memory ability and does not rely on current gradients [21]. Thus, the fractional-order DE has great potential for avoiding slow convergence and local extrema.
Overall, the main contributions of the proposed dynamic maneuver countermeasure algorithm for multi-AUV systems can be summarized as follows. Firstly, the underwater environment and communication conditions are well represented in the countermeasure process using the interval information game; thus, the modeling of the dynamic maneuver counter-game of the multi-AUV system is more accurate. Secondly, an improved fractional-order DE algorithm is proposed to improve the efficiency of obtaining the optimal strategy in a real-time underwater countermeasure. The order of the fractional-order DE affects the convergence rate and optimization error, and it can be tuned to satisfy different underwater requirements. It should be noted, however, that the standard for determining the optimal fractional order is neither analytic nor unified; the determination requires extra computation at the beginning of the proposed algorithm.
The remainder of this paper is organized as follows: Section 2 presents the preliminaries of fractional derivatives and the DE algorithm; Section 3 presents the maneuver attributes, the advantage function, and the payoff matrix; Section 4 presents the payment ranking based on the four-parameter interval set, relative entropy, and the Nash equilibrium optimal solution; the fractional-order DE algorithm is presented in Section 5; Section 6 demonstrates the example of the multi-AUV counter-game; finally, the conclusions are stated in Section 7.
4. Optimal Solution of Multi-AUV Dynamic Maneuver Countermeasure
4.1. Payment Interval Ranking Based on Relative Entropy
Interval information sets cannot be compared by magnitude alone, as real numbers can. The sorting method based on the possibility degree can fail, and the one based on geometric distance may cause serious information loss [31]. To avoid these shortcomings, a method combining the four-parameter interval set and relative entropy is proposed in this paper.
According to the interval information given in the last subsection, the payment interval is obtained from the comprehensive information of both sides in the game. However, these payments do not consider the distribution of points within the interval. In fact, the interior points of the payment interval set cannot simply be regarded as uniformly distributed; their distribution should change with the underwater game situation. For a given strategy, when the confrontation situation is favorable to the attacker, the payment of the attacker inevitably tends toward the upper bound of the interval; otherwise, it tends toward the lower bound. In order to fully exploit the information of the advantage matrix, the payment interval is transformed into a four-parameter interval set in this subsection.
The four-parameter interval set is similar to the trapezoidal fuzzy set [32]. Combined with the advantage function in the last section, the payment interval is transformed into a four-parameter interval set whose two interior parameters are located within the original interval bounds by the normalized advantage function.
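Since the defining equations are typeset in the original, the mapping from a payment interval plus a normalized advantage value to a four-parameter set can only be sketched; the placement of the two interior parameters below (the names `four_param_interval` and `spread` and the construction itself are hypothetical, not the paper's exact formula) illustrates the idea that the plateau follows the advantage function:

```python
def four_param_interval(l, u, w, spread=0.25):
    """Map a payment interval [l, u] and a normalized advantage w in [0, 1]
    to a four-parameter (trapezoid-like) set (a, b, c, d).

    Hypothetical construction: the plateau [b, c] is centered where the
    normalized advantage function says the payment is most likely to fall.
    """
    assert l <= u and 0.0 <= w <= 1.0
    center = l + w * (u - l)           # most plausible payment in the interval
    half = spread * (u - l) / 2.0      # plateau half-width (assumption)
    b, c = max(l, center - half), min(u, center + half)
    return (l, b, c, u)                # a <= b <= c <= d always holds
```

An advantage value near 1 pushes the plateau toward the upper bound, matching the observation above that a favorable situation drives the payment toward the top of the interval.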
The basic idea of this ranking method is to use information entropy to measure the difference between the AUV's own revenue and the maximum (or minimum) revenue under different strategies, and then to choose the strategy with the smallest difference from the maximum revenue (or, equivalently, the largest difference from the minimum revenue). In fact, the highest return indicates that the AUV has completed the scheduled task without casualties, while the lowest return indicates that the AUV has failed to complete the scheduled task with the largest casualties.
First, the Kullback–Leibler distance concept is introduced. For two systems P and Q, the relative information entropy between their respective states can be expressed as [33]: where the unit of the entropy is the bit.
In order to overcome the meaninglessness of Equation (18) when either probability equals zero, the relative information entropy is improved as [33]: in which a smaller value means a smaller difference between the two states. Equation (18) measures the relative information entropy of the two systems under a specific attribute; when the information entropy is zero, the two systems are identical under the evaluation criteria of that attribute.
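As a concrete illustration, a smoothed Kullback–Leibler distance in bits (Equation (18)-style) and a symmetrized variant can be sketched in Python; the `eps` smoothing is one common way to avoid `log 0`, and both function forms are assumptions rather than the paper's exact Equations (19) and (21):

```python
import math

def kl(p, q, eps=1e-12):
    """Kullback-Leibler distance (in bits) between two discrete
    distributions, with a small epsilon to avoid log(0)."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log2((pi / sp) / (qi / sq))
               for pi, qi in zip(p, q))

def sym_kl(p, q):
    """Symmetrized relative entropy, analogous to removing the
    asymmetry criticized in the text."""
    return 0.5 * (kl(p, q) + kl(q, p))
```

The symmetrized form treats both game players identically, which is the motivation the text gives for improving Equation (20).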
Definition 3. For two four-parameter interval sets, the relative information entropy between them is defined with a weighting coefficient. Equation (20) is asymmetrical in its two arguments, which does not correspond to the actual submarine counter-game. Therefore, the definition of the relative information entropy is improved in the following, in which the improved relative information entropy between two four-parameter interval sets is symmetrized.

Property 1. The improved relative information entropy is nonnegative, and the equal sign holds if and only if the two interval sets coincide.

Property 2. The improved relative information entropy is symmetric in its two arguments.
Here, we give the proof of the condition under which Property 1 holds with equality.

Proof. For any component, define the corresponding component of the relative information entropy. Because the associated function is convex, Jensen's inequality yields the nonnegativity of each component, and the equality holds if and only if the two interval sets coincide componentwise. The converse direction can be proven in the same way. This completes the proof of Property 1. □
Let the relative information entropy between the payment of strategy i and the maximum payment, and that between the payment of strategy i and the minimum payment, both be computed from Equation (21). The relative closeness of the payment of strategy i can then be expressed as a combination of these two entropies. The multi-AUV system maneuver strategies can be sorted according to this relative closeness: in the countermeasure process, the strategy with the largest C value gets the highest priority; when the C values of different strategies are equal, the strategy with the smaller entropy relative to the maximum payment gets the higher priority.
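To make the ranking rule concrete, a small sketch follows: `relative_closeness` uses a TOPSIS-style ratio (an assumption — the paper's closeness formula is typeset as an equation), and ties in C are broken by the smaller entropy to the maximum payment, as the text prescribes:

```python
def relative_closeness(d_max, d_min):
    """Relative closeness of a strategy, given its entropy distance to the
    maximum payment (d_max) and to the minimum payment (d_min).
    TOPSIS-style form; the exact expression is an assumption."""
    return d_min / (d_max + d_min)

def rank_strategies(distances):
    """distances: list of (d_max_i, d_min_i), one pair per strategy.
    Returns strategy indices sorted by priority: larger C first; ties are
    broken by the smaller distance to the maximum payment."""
    C = [relative_closeness(dm, dn) for dm, dn in distances]
    return sorted(range(len(C)), key=lambda i: (-C[i], distances[i][0]))
```

A strategy close to the maximum payment (small `d_max`, large `d_min`) thus ranks first, mirroring the "smallest difference from the maximum revenue" criterion above.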
4.2. State Transition of Payment Functions
When the multi-AUV system completes the (k−1)-th game stage and reaches the k-th stage, the multi-AUV strategy indicates that the i-th strategy is used in this stage. At this moment, the game situation of the last stage and the estimated confrontation situation after the current stage are both available. Over time, when the multi-AUV system completes the maneuver strategy, the situation state of the k-th stage transfers to a new state composed of the situation advantage state interval of the current strategy, the variation of the state intervals between the two stages, and the energy situation advantage state interval of the current strategy against the j-th strategy. The variation of the state intervals can be determined easily from the (k−1)-th state and the definition of the i-th strategy; for example, if the i-th strategy is to keep the current speed, the variation is the product of the speed vector and the stage interval time.
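The transition above is interval-valued; under standard interval arithmetic, the stage update can be sketched as below. The three-term decomposition (`current`, `delta`, `energy`) follows the prose; the exact transition equation is typeset in the original, so this is an illustrative sketch rather than the paper's formula:

```python
def interval_add(a, b):
    """Interval arithmetic addition: [a1, a2] + [b1, b2] = [a1+b1, a2+b2]."""
    return (a[0] + b[0], a[1] + b[1])

def next_state_interval(current, delta, energy):
    """One-stage transition of the situation advantage state interval:
    new state = current advantage interval
              + strategy-induced variation of the state intervals
              + energy situation advantage interval."""
    return interval_add(interval_add(current, delta), energy)
```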
4.3. Nash Equilibrium Optimal Solution
The payment of the complete-information zero-sum game is replaced by the interval information set in this paper; thus, the payment of the interval-information zero-sum game discussed here is analogous to that of the complete-information zero-sum game. For the multi-AUV attack-and-defense game, a saddle point is not guaranteed to exist in pure strategies; thus, the attacker and the defender can only choose strategies randomly, with certain probabilities over the strategy set. Therefore, the Nash equilibrium for the mixed strategy is discussed here.
Definition 4. For the game, define the probabilities with which the players choose strategies at the k-th stage of the game from their strategy sets. The mixed strategies of the players in this game can be expressed accordingly.

Definition 5. If the corresponding equilibrium condition holds, these mixed strategies are the optimal mixed strategies of the players at this stage of the game, their combination is the optimal mixed situation, and v is the expected benefit to the players.
According to the interval information game proposed in this paper, the payment is an interval information set. Assume the interval payment is given when our multi-AUV system chooses one maneuver strategy and the enemy multi-AUV system chooses another at this stage of the game.
According to Definition 3 and Definition 4, the Nash equilibrium [34] of our multi-AUV system under the mixed strategy can be obtained as follows. Considering the actual submarine environment constraints, Equation (25) can be transformed into an optimization problem with interval uncertain parameters. Therefore, the optimal mixed strategy of the multi-AUV maneuver counter-game can be obtained by solving this optimization problem.
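For a scalar surrogate of the interval payoffs (e.g., collapsing each interval to its midpoint — an assumption made here purely for illustration), the mixed-strategy equilibrium of the zero-sum stage game can be approximated with fictitious play; this is a simple stand-in for, not the paper's method of, solving the optimization problem above:

```python
def fictitious_play(A, iters=20000):
    """Approximate the mixed-strategy Nash equilibrium of a zero-sum game
    with payoff matrix A (row player maximizes) via fictitious play.
    Returns (row mixture, column mixture, approximate game value)."""
    m, n = len(A), len(A[0])
    row_counts = [0] * m
    col_counts = [0] * n
    r, c = 0, 0
    for _ in range(iters):
        row_counts[r] += 1
        col_counts[c] += 1
        # each player best-responds to the opponent's empirical mixture
        r = max(range(m), key=lambda i: sum(A[i][j] * col_counts[j] for j in range(n)))
        c = min(range(n), key=lambda j: sum(A[i][j] * row_counts[i] for i in range(m)))
    x = [rc / iters for rc in row_counts]
    y = [cc / iters for cc in col_counts]
    v = sum(x[i] * A[i][j] * y[j] for i in range(m) for j in range(n))
    return x, y, v
```

On a matching-pennies matrix the empirical mixtures approach (1/2, 1/2) and the value approaches 0, which is the saddle-free mixed-equilibrium behavior the text describes.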
5. Fractional-Order Differential Evolution Algorithm
The DE algorithm has the characteristics of simple operation, high robustness, strong optimization ability, etc. [16]. However, slow convergence and entrapment in local extrema may occur when the DE algorithm is applied to the optimization process, which makes satisfying real-time counter-game requirements difficult. To address this problem, a fractional-order differential evolution (FDE) algorithm is presented in this section.
(1) Fitness function.
Normally, the optimal strategy of one game player is to maximize the payoff benefit under the constraint conditions, while that of the other game player is the opposite. Therefore, the fitness function here can be taken as the optimization objective function presented in Equation (26).
(2) Mutation.
According to Equations (3)–(5), the variation vector of the FDE mutation is proposed within the Caputo definition: where h is the iteration step, and the other parameters are the same as defined in Equation (5).
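The memory terms of the fractional difference in Equation (27) are weighted by binomial-type coefficients; assuming the Grünwald–Letnikov form (the paper's exact recursive scheme is typeset as an equation), they can be generated recursively:

```python
def gl_coefficients(q, n):
    """Coefficients w_k = (-1)^k * C(q, k) of the fractional difference
    of order q, generated by the standard recursion
    w_0 = 1,  w_k = (1 - (q + 1)/k) * w_{k-1}."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append((1.0 - (q + 1.0) / k) * w[-1])
    return w
```

For q = 1 the coefficients collapse to (1, −1, 0, 0, …), recovering the ordinary first-order difference and hence the standard DE mutation as a special case.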
In particular, the variation vector (27) with q = 1 reduces to the standard DE mutation.

Remark 1. The variation vector (27) of the FDE mutation is derived by combining Equation (5) with the Caputo fractional derivative. In fact, Equation (5) is a special case, with q = 1, of a discrete iteration in which the first-order difference is replaced by the Caputo fractional one (4). The resulting fractional coefficients can be calculated through a recursive scheme, from which the variation vector (27) of the FDE mutation follows directly.

Note that in Equation (27), the scaling factor F scales each base vector and generates a new variation vector. A larger F can search for a potentially optimal solution over a larger range; conversely, a smaller one can speed up convergence and improve accuracy. When the fitness of each individual is relatively good, F should be small to reduce the disturbance of the better individuals; when the fitness is relatively poor, the search range of the solution should be expanded, so a larger F may be applied. Combined with the game algorithm proposed in this paper, the scaling factor is determined by the evolution time and the difference between the best and worst individuals, as follows: where G is the maximum number of iterations, g is the current number of iterations, and the best fitness, the worst fitness, the current individual fitness, and the maximum and minimum values of F parameterize the adaptation. If the fitness difference between the current individual and the optimal individual is large, the individual is far from the optimal individual in the search space; a larger value of F then implies a larger disturbance to the individual, which expands the search scope of the algorithm and enhances its global search ability. If the fitness difference is small, F takes a smaller value, and the disturbance to the individual is also small, so the search is carried out only in the small area near the individual, enhancing the exploitation ability of the algorithm. Moreover, at a later stage of the evolution, F is preferred to be relatively small, which confines the search to the neighborhood of the current individual and ensures the accuracy of the algorithm.
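One plausible realization of such an adaptive scaling factor — the functional form below is an assumption, since the paper's Equation (29) is typeset in the original — combines the normalized fitness gap with an exponential decay over generations:

```python
import math

def adaptive_F(g, G, f_cur, f_best, f_worst, F_min=0.2, F_max=0.9):
    """Adaptive scaling factor: larger when the current individual is far
    from the best (global search), smaller late in the evolution and for
    near-best individuals (local refinement)."""
    gap = abs(f_cur - f_best) / (abs(f_worst - f_best) + 1e-12)  # in [0, 1]
    decay = math.exp(-g / G)          # shrinks F as generations advance
    return F_min + (F_max - F_min) * gap * decay
```

This reproduces the qualitative behavior described above: F stays near its maximum for poor, early individuals and decays toward its minimum for near-optimal or late-stage individuals.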
(3) Crossover.
As shown in Equation (6), the crossover operation is described by the following. The crossover rate determines the crossover probability of the mutated individual and the original individual on each dimension of the vector. A larger crossover rate for an individual with poor fitness can accelerate the change of the individual's structure. Therefore, a smaller crossover rate is preferred in the late evolution stage to reduce the disturbance of the target individual to the trial individual and ensure the convergence speed of the algorithm. The designed crossover rate is as follows:
where the average fitness of the current population, the current crossover rate, and its maximum and minimum values appear. Equation (30) shows that when the fitness of the target individual is smaller than the average fitness, the target individual is relatively superior; a smaller crossover rate should then be chosen, so that more information in the trial vector comes from the target vector. Otherwise, more information in the trial vector comes from the variation vector, which improves the diversity of the population. This design ensures a large crossover rate at an early stage of evolution, increasing population diversity and speeding up convergence, while a small crossover rate in the late evolution stage is conducive to finding the optimal solution.
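A matching sketch of the adaptive crossover rate — again, the exact form of Equation (30) is typeset in the original, so the expression here is an assumption that captures the described behavior (small rate for below-average, i.e., superior, individuals under minimization; a larger rate otherwise):

```python
def adaptive_CR(f_target, f_avg, f_best, CR_min=0.1, CR_max=0.9):
    """Adaptive crossover rate for a minimization problem: superior
    (below-average) target individuals keep CR_min, so the trial vector
    inherits mostly from the target vector; worse individuals get a rate
    growing with their distance above the population average."""
    if f_target <= f_avg:
        return CR_min
    span = f_target - f_avg
    worst_span = f_target - f_best + 1e-12   # f_best <= f_avg < f_target
    return min(CR_max, CR_min + (CR_max - CR_min) * span / worst_span)
```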
(4) Selection.
According to Equation (7), the selection operation is restated as follows: where the selected vector is the next-generation individual.
6. Example
In this section, a multi-AUV countermeasure example is given to verify the effectiveness of the proposed algorithm. In the initialization of the proposed algorithm, several parameters need to be determined according to the following guidelines.

The initial states of the game players, such as the initial positions, velocities, deflection angles, and pitch angles, as well as the time interval of the game steps and the weighting coefficients in Equation (12).

The strategy spaces of both sides, such as keep the current speed, turn left, turn right, pitch up, pitch down, etc.

The maximum number of iterations G in the fractional-order DE, as well as the population size, iteration step h, and fractional order q.
Suppose ‘A’ and ‘D’ are engaged in a two-versus-two AUV underwater counter-game. The initial positions of ‘A1’ and ‘A2’ are (0 m, 100 m, 200 m) and (0 m, −100 m, 200 m), and those of ‘D1’ and ‘D2’ are (800 m, 100 m, 200 m) and (800 m, −100 m, 200 m), respectively. ‘A1’ and ‘A2’ both move at 23 m/s, and ‘D1’ and ‘D2’ both move at 25 m/s, with the corresponding initial deflection and pitch angles. Both sides have the same control ability, and the time interval of the game steps is 5 s. The weighting coefficients are chosen referring to [30]. The strategy spaces of ‘A’ and ‘D’ are both assumed to comprise keep the current speed, speed up, speed down, turn left, turn right, pitch up, and pitch down for each AUV. In addition, the maximum number of iterations G, the population size, the iteration step h, and the fractional order q are set accordingly. With all the above initial conditions, the multi-AUV maneuver countermeasure model is constructed according to Section 3 and Section 4, and the fractional-order differential evolution in Section 5 is employed for strategy optimization. For different fractional orders q, the optimization errors of the first game step are calculated and shown in Figure 1. When the optimization errors are close to 0, the convergence rates are not clearly visible in Figure 1; thus, the base-10 logarithms of the optimization errors for different fractional orders q are exhibited in Figure 2.
According to Figure 1 and Figure 2, one fractional order yields the smallest optimization error when the iteration number is 300, whereas a larger order becomes the most superior when the iteration number reaches 500. Thus, a smaller order q brings a faster optimization rate when the number of iterations is insufficient; as the iterations increase, the optimization effect of the larger order visibly improves. Accordingly, the fractional-order DE algorithm can adjust the fractional order q according to the iteration number to achieve the optimal strategy calculation.
It is obvious that ‘D’ possesses some advantage at the beginning. It should also be noted that the maximum number of maneuver steps should be determined according to the effectiveness of the AUVs used in the confrontation. The game process contains 50 steps, whose expected benefits are shown in Figure 3. According to the last section, the obtained expected benefits demonstrate that the Nash equilibrium of the interval information game is satisfied.
In order to compare the game performance, ‘A’ uses the cooperative dynamic maneuver countermeasure algorithm proposed in this paper, while ‘D’ uses the max–min countermeasure algorithm during the multi-AUV confrontation process [34]. The three-dimensional counter-game process, with five main stages, is shown in Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8. The red dotted line represents the path of ‘A1’, the red solid line represents ‘A2’, and the blue dotted and solid lines represent ‘D1’ and ‘D2’, respectively. The ‘*’ marks the initial position and the ‘▵’ marks the current position. The confrontation ends when one side's expected benefit reaches the absolute advantage. In stage 1, shown in Figure 4, ‘A’ possesses the dominant position and decides to attack ‘D’ actively: ‘A1’ tries to attack ‘D2’, and ‘A2’ moves towards ‘D1’. In stage 2, shown in Figure 5, ‘A1’ misses the attack on ‘D2’ and tries to attack ‘D1’ together with ‘A2’; ‘D2’ escapes the attack of ‘A1’ and tries to outflank ‘A2’. The situation changes in stage 3, when ‘D’ possesses the dominating position. This can also be validated in Figure 3, in which the expected benefits change from positive to negative. In Figure 6, ‘D1’ and ‘D2’ try to achieve a converging attack on ‘A2’, while ‘A1’ tries to attack ‘D1’ and supports ‘A2’. Then, in stage 4, the situation changes again: ‘A’ possesses the dominating position, and the expected benefits change from negative to positive. ‘A2’ turns continuously and drives ‘D1’ away successfully; then, ‘A1’ and ‘A2’ try to attack ‘D2’, but ‘D1’ and ‘D2’ escape in two different directions. In the end, both ‘A1’ and ‘A2’ possess dominating positions, so ‘A’ achieves the absolute advantage and ends the game, as illustrated in Figure 8. This example validates the effectiveness of the proposed multi-AUV dynamic maneuver countermeasure algorithm.