1. Introduction
From conceptualization to realization, the use of multiple intelligent surface vehicles (ISVs) to conduct surface pursuits has attracted widespread attention [1,2,3,4,5]. However, pursuing a moving target effectively and efficiently still poses a significant challenge for a multi-ISV system. Such a system requires numerous advanced technologies, including target recognition, target positioning, and dynamic maneuver planning [6,7]. Focusing on an efficient maneuver planner for the multi-ISV pursuit system, this paper addresses two concerns: realizing strategic interaction with the target and coping with the scarcity of data about the target.
There are three mainstream approaches to maneuver planning for multi-vehicle target pursuit: graph theory-based algorithms, cooperative target hunting approaches, and game theory-based methods. Among graph theory-based algorithms, the prevailing approach is the Voronoi-related algorithm, which visually represents the relationships among the motion sets of the vehicles [8,9]. However, graph theory-based algorithms do not directly provide information about strategic choices or the interaction between the pursuit vehicles and the target. As for cooperative target hunting methods, researchers have focused on formation control and dynamic task allocation during the planning process [10,11,12]. Nevertheless, these methods require detailed models of both the vehicles and the environment. Game theory, as a mature mathematical theory, has broad applicability and theoretical rigor in maneuver planning [13]. It offers advantages such as equilibrium point analysis, strategy optimization, and the handling of incomplete information. To realize intelligent and cognitive interactions between the pursuit team and the target, this paper focuses on developing a novel and practical game theory-based maneuver planning method.
Under the game theory-based target-pursuit framework, three types of game models, namely Stackelberg games [14], cooperative games [15], and zero-sum games (ZSGs) [16,17,18,19], demonstrate different characteristics. The Stackelberg game involves a leader–follower model, where the leader acts first and the follower adjusts its strategy after observing the leader [20]. Hence, the Stackelberg game-based model is suitable for simulating the dynamics of a leader–follower structure. However, the complexity of the Stackelberg game is high, requiring sophisticated real-time decision-making. Cooperative games emphasize teamwork [21]. In cooperative games, team members depend on each other, and the action of one member may affect the entire team [22]. This interdependence may reduce team effectiveness when one member fails to fulfill its responsibilities. Compared with these two games, ZSGs are more applicable to intense competition [23]. In ZSGs, the success of one side inevitably means the failure of the other. ZSGs enable participants to adopt clear confrontational strategies. Moreover, due to the relative simplicity of ZSG models, decision-making is more explicit and easier to implement, contributing to more effective target pursuit in competitive environments.
The application of ZSGs to target pursuit takes the form of zero-sum pursuit–evasion games (ZSPEGs). The ZSPEG-based model comprises two players: the pursuer and the evader [16,17,18,19]. To make the ZSPEG-based model more specific and tailored, researchers have focused on enriching its objective function and considering the various factors involved. A divide-and-conquer approach is used for a multi-player ZSPEG where the pursuers have a twofold goal [11]. To identify unknown sets of incoming attackers, a multi-model adaptive estimator is implemented in the offline design policy sets [24]. Furthermore, a bilateral adaptive parameter estimation method is adopted to deal with a multi-agent ZSPEG model with unknown general quadratic goals [25]. In recent years, various algorithms have been developed to solve ZSG-based models, such as regret matching [26], fictitious play [27], and double oracle [28]. Among them, the most popular are regret learning-based methods, which rely on the concepts of external regret, internal regret, swap regret, and Nash equilibrium-based regret [29]. Building on this foundation, the current mainstream algorithms are optimistic follow-the-regularized-leader [30] and optimistic mirror descent [31]. Despite the extensive research on ZSPEG-based methods and the relevant algorithms, there is little research on applying the ZSPEG to multi-ISV target pursuit. The main reasons are the high complexity, the data scarcity, and the many technical challenges of multi-ISV target pursuit. In this context, compared with the other algorithms for ZSG-based models, fictitious play can provide more effective assistance. Firstly, fictitious play can simulate the behavior of the target, which helps the pursuit system to better understand the target’s behavioral patterns, thereby addressing the data scarcity of the target. Additionally, fictitious play makes it easier to validate experiments and to evaluate the feasibility of different strategies and algorithms. This characteristic could prove advantageous for multi-ISV systems by mitigating the complexity of surface pursuit. Thus, fictitious play is adopted as the solution framework for our ZSPEG-based multi-ISV target-pursuit model.
Building upon the above discussions, this paper formulates an innovative ZSPEG-based maneuver planning method for multiple-pursuit ISVs. The main contributions are summarized as follows:
By employing the ZSPEG framework, the surface pursuit system gains the ability to explore and analyze the intense competition between the pursuit team and the target.
Through the utilization of fictitious play, the data scarcity concerning the target can be alleviated, thereby reducing the complexity associated with surface pursuit.
Under the fictitious play framework, a mixed-strategy Nash equilibrium (MNE)-based decision-making process is employed. This approach could derive the best responses to maneuver the surface vehicles effectively and stably.
The remainder of this paper is organized as follows. Section 2 provides the details of the multi-ISV target-pursuit model, which is based on the ZSPEG framework. Section 3 introduces the design of a motion planner for the pursuit vehicles using the MNE methodology. Section 4 presents the simulations. Finally, Section 5 presents the conclusions of the study.
2. System Model
A ZSG is a game in which, under strict competition, one player’s gain is the other player’s loss, so that the sum of their payoffs is zero [32]. The ZSPEG model for a multi-ISV pursuit system comprises the three foundational elements of a ZSG-based model: the two players, their strategy sets, and the payoff function.
2.1. Game Scenario and the Players
A classic ZSG model can be formulated as follows:
where
denotes the strategy set for player 1. It is assumed that
.
m is the positive integer. Similarly,
denotes the strategy set for player 2, which is assumed to be
.
represents the payoff matrix for player 1. As for the state in the ZSG, when
and
, if player 1 adopts strategy
and player 2 adopts strategy
, the game state could be formulated by
. Under this game state, the payoff for player 1 is assumed to be
. Then, the payoff matrix
for player 1 is obtained. Different rows in
denote the different strategies of player 1, while different columns in
denote the different strategies chosen by player 2.
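To make the matrix-game description above concrete, the following minimal sketch stores a payoff matrix as a nested list, with rows indexed by the strategies of player 1 and columns by those of player 2. The function names and the example matrix (matching pennies) are illustrative assumptions and are not taken from the paper.

```python
def expected_payoff(payoff, row_mix, col_mix):
    """Expected payoff to player 1 when both players randomize.

    payoff[i][j] is the payoff to player 1 when player 1 plays row i
    and player 2 plays column j; player 2 receives the negative.
    """
    return sum(
        row_mix[i] * payoff[i][j] * col_mix[j]
        for i in range(len(payoff))
        for j in range(len(payoff[0]))
    )


def pure_security_levels(payoff):
    """Return (maximin, minimax) over pure strategies."""
    # Player 1 picks the row whose worst-case entry is largest.
    maximin = max(min(row) for row in payoff)
    # Player 2 picks the column whose best-case entry (for player 1) is smallest.
    columns = list(zip(*payoff))
    minimax = min(max(col) for col in columns)
    return maximin, minimax


# Hypothetical 2x2 example (matching pennies), for illustration only.
example = [[1.0, -1.0], [-1.0, 1.0]]
```

When the two security levels differ, as they do for this example, no pure-strategy saddle point exists, which is one reason mixed strategies are of interest in such games.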
The pursuit scenario with
ISVs pursuing a mobile target is presented in
Figure 1. In accordance with this scenario, the constructed ZSPEG model includes two players: the pursuit team, consisting of
vehicles, and the moving target. Then, the ZSPEG-based model for the multi-ISV target-pursuit scenario can be formulated as
, where
and
denote the strategy sets for the two players. It should be noted that the red star in the figure represents the attack point of the moving target. Once the target reaches this point, the pursuit team fails the protection task.
represents the payoff matrix for the pursuit team.
2.2. Strategy Set
Before designing the maneuver strategy sets, the kinematic model of the surface vehicle should be formulated. When the vehicle moves on the surface, three primary types of motion on the horizontal plane are typically identified: surge, sway, and yaw. The corresponding motion analysis is presented in
Figure 2. The motions of the vehicle are described using two reference coordinates: the geodetic reference coordinate
and the vehicle body reference coordinate
[
33]. As shown in
Figure 2, when
, under
, the position of the
vehicle is
.
is the sway velocity, i.e., the speed at which the vehicle moves left and right along
, which is usually set to zero.
is the surge velocity, i.e., the speed at which the vehicle moves forward and backward along
.
is the yaw angle of the vehicle.
is the velocity of the currents at
.
Assumption 1. It is assumed that all the vehicles have a constant-speed motion, and the sway velocity for the individual vehicle is zero.
Under Assumption 1 and the discrete maneuver, the three-degrees-of-freedom kinematics model of the
th vehicle is shown as follows:
where the control variable is the unit vector of yaw angle
at the current
th step, where
is a non-negative integer.
represents the position of the vehicle at the next time step.
represents the position at the current
step.
is the unit time period.
is a standard vector–scalar multiplication, where scalar
scales the components of vector
. This results in a new vector
, which represents the velocity of the vehicle in Cartesian coordinates.
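The discrete update described above can be sketched as follows, under Assumption 1 (constant surge speed, zero sway). The function and parameter names are hypothetical, the yaw angle is assumed to be measured from the x-axis of the geodetic frame, and the effect of currents is left out for simplicity.

```python
import math


def heading_from_yaw(yaw):
    """Unit heading vector for a yaw angle in radians (assumed convention)."""
    return (math.cos(yaw), math.sin(yaw))


def step_position(position, heading, surge_speed, dt):
    """Advance one time step: next position = position + speed * heading * dt."""
    x, y = position
    hx, hy = heading
    # Scaling the unit heading by the scalar speed gives the Cartesian velocity.
    vx, vy = surge_speed * hx, surge_speed * hy
    return (x + vx * dt, y + vy * dt)
```

For instance, a vehicle at the origin heading along the x-axis at 10 m/s moves roughly 0.1 m in a 0.01 s step.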
Based on the above, the unit yaw vector is what must be determined for each pursuit vehicle in the ZSPEG-based model. Because every vehicle selects its own yaw vector, the number of joint choices for the pursuit team grows rapidly with the number of vehicles. Therefore, to reduce the computational overhead, a decomposition method is adopted in this paper. This method is named the target-guided relay-pursuit (TGRP) strategy and is elaborated in Definition 1. It hinges on the principle of relay pursuit, where only one vehicle is active at any given time, while the others remain on standby, ready to take over the pursuit as the scenario evolves. There are two reasons for adopting the TGRP strategy. The core reason is to dynamically assign the pursuit role to the ISV that is best positioned to continue the chase, minimizing the overall time to intercept and enhancing the strategic positioning of the pursuit team. Furthermore, if fewer ISVs are actively maneuvering at any one time, the risk of collisions is significantly decreased, enhancing safety and operational integrity. However, while the TGRP strategy is designed to be effective under a wide range of conditions, it may not always be the optimal strategy. Its performance can be considered close to optimal in scenarios characterized by complex dynamics and the need for rapid tactical shifts.
Definition 1 (TGRP strategy [34]). In the TGRP strategy, only one vehicle is active, while the others are stationary. This relay-pursuit mechanism can effectively avoid collisions among the vehicles. Furthermore, the choice of the active vehicle changes over time, depending on the outcome of the game at each time step. If the active vehicle is determined by index , the heading vector for the pursuit ISV can be obtained by Equation (3), where is the relative position vector from the target to the vehicle. denotes the heading angle for the inactive vehicle towards the direction of the target. Under the TGRP strategy, the pursuit team has
total motions. When
, the
motion denotes that the
vehicle actively pursues the target. By utilizing the fictitious play, the restricted strategy set
for the target is assumed to include
choices. The first
actions represent evasion from the corresponding pursuers. The
action means that the target moves towards its intended node
. Then, when
,
denotes the
maneuver strategy for the moving target. The payoff matrix
for the pursuit team is demonstrated in
Table 1.
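The two restricted strategy sets described above can be sketched as follows. The helper names are hypothetical, and modeling "evasion" as heading directly away from the corresponding pursuer is an assumption made for illustration; the paper's exact heading definitions are given by its own equations.

```python
import math


def unit(vx, vy):
    """Normalize a 2D vector; the zero vector is returned unchanged."""
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        return (0.0, 0.0)
    return (vx / norm, vy / norm)


def pursuer_heading(vehicle_pos, target_pos):
    """Heading of the active vehicle: straight towards the target."""
    return unit(target_pos[0] - vehicle_pos[0], target_pos[1] - vehicle_pos[1])


def target_strategy_set(target_pos, vehicle_positions, attack_node):
    """Candidate target headings: one evasion per pursuer plus one attack move."""
    headings = []
    for vx, vy in vehicle_positions:
        # Assumed evasion model: move directly away from this pursuer.
        headings.append(unit(target_pos[0] - vx, target_pos[1] - vy))
    # Last strategy: head towards the intended (attack) node.
    headings.append(unit(attack_node[0] - target_pos[0],
                         attack_node[1] - target_pos[1]))
    return headings
```

With n pursuit vehicles, the pursuit team has n strategies (one per choice of active vehicle) and the target has n + 1, so the payoff matrix has n rows and n + 1 columns.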
2.3. Pursuit Payoff Function
In this paper, two goals are set for each pursuit vehicle. The first goal is to capture the target as quickly as possible. The second is to prevent the target from reaching its intended node. These goals are quantified by two metrics: the minimum time required for the vehicle to capture the target, and the alignment of the target’s heading with the point of attack from its current position, respectively.
2.3.1. Minimum Time-to-Capture
To calculate the minimum time-to-capture, several parameters need to be introduced. , , and represent the location, the surge velocity, and the unit yaw vector for the target, respectively. is the length of the vehicle.
Then, a time metric is introduced [
23]. It is associated with the minimum positive solution,
, in the following equation:
where
. As the time-of-capture
is bounded by the
, the final
can be obtained as follows:
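As a rough illustration of how a capture time of this kind could be computed, the sketch below uses a standard constant-velocity intercept model. It is an assumption, not necessarily the metric of [23]: the target moves in a straight line at constant speed, the pursuer moves at constant speed along the best straight line, capture occurs once the separation drops to a given radius, and the result is capped at an upper bound `t_max`. All names are hypothetical.

```python
import math


def min_time_to_capture(pursuer_pos, pursuer_speed, target_pos,
                        target_speed, target_heading, capture_radius, t_max):
    """Smallest positive t with |r + v_T*h*t| = v_P*t + R, capped at t_max."""
    rx = target_pos[0] - pursuer_pos[0]
    ry = target_pos[1] - pursuer_pos[1]
    hx, hy = target_heading  # assumed to be a unit vector
    # Squaring both sides gives the quadratic a*t^2 + b*t + c = 0.
    a = target_speed ** 2 - pursuer_speed ** 2
    b = 2.0 * (target_speed * (rx * hx + ry * hy) - pursuer_speed * capture_radius)
    c = rx * rx + ry * ry - capture_radius ** 2
    if c <= 0.0:
        return 0.0  # already within the capture radius
    roots = []
    if abs(a) < 1e-12:
        # Equal speeds: the equation degenerates to a linear one.
        if abs(b) > 1e-12:
            roots.append(-c / b)
    else:
        disc = b * b - 4.0 * a * c
        if disc >= 0.0:
            sq = math.sqrt(disc)
            roots.extend([(-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)])
    positive = [t for t in roots if t > 0.0]
    # No positive solution means the pursuer cannot catch up; use the bound.
    return min(min(positive), t_max) if positive else t_max
```

Capping the result keeps the metric finite when the target cannot be intercepted, which is convenient when the values are later normalized.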
2.3.2. Avoidance of Being Attacked
The second goal is to prevent the target from reaching its point of attack:
,
denotes the extent to which the target is heading towards
from its current location. Therefore,
is assumed to be
, where
is the angle between vectors
and
. Accordingly,
is formed as follows:
where
is the alignment metric for the target with respect to its attack node under the
strategy, and
represents the cosine of the angle between the target’s heading vector
and the vector pointing from the target to the attack node
. This measures how well-aligned the target’s heading direction is with the vector pointing towards the attack node. According to Equation (6), a higher value of
(close to 1) indicates that the target is moving directly towards the attack node, suggesting a higher likelihood of an attack. A lower value (close to −1) suggests the target is moving directly away from the attack node. Values of around 0 indicate perpendicular movement, implying that the target is neither approaching nor directly retreating from the attack node. This metric helps to evaluate the level of threat posed by the target to the attack node. By continuously monitoring
, the pursuit system can prioritize targets that are better aligned with critical points, thus optimizing the pursuit strategies.
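The alignment metric described above is the cosine of the angle between the target's heading and the direction from the target to the attack node. A minimal sketch follows; the function name is hypothetical, and returning 0 when either vector has zero length is an assumption made to keep the function total.

```python
import math


def alignment(target_pos, target_heading, attack_node):
    """Cosine of the angle between the heading and the direction to the node."""
    dx = attack_node[0] - target_pos[0]
    dy = attack_node[1] - target_pos[1]
    hx, hy = target_heading
    norm_d = math.hypot(dx, dy)
    norm_h = math.hypot(hx, hy)
    if norm_d == 0.0 or norm_h == 0.0:
        return 0.0  # assumed convention for degenerate inputs
    cosine = (hx * dx + hy * dy) / (norm_h * norm_d)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return max(-1.0, min(1.0, cosine))
```

A value of 1 corresponds to heading straight at the node, -1 to heading straight away from it, and 0 to perpendicular motion.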
2.3.3. Goal/Payoff Function
To balance multiple critical aspects of the pursuit dynamics, the goal/payoff function (7) is formed based on the two introduced metrics: the minimum time-to-capture
and the avoidance of being attacked
. This goal function could reflect both the urgency of intercepting the target and the strategic necessity of managing risks.
where the minimum time-to-capture
is normalized with the maximum value in vector
. This ensures that all values of
are scaled to lie between zero and one, so that their magnitude is comparable to the values of
.
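The normalization step described above, dividing each time-to-capture by the largest value in the vector, can be sketched as follows. The function name is hypothetical, and returning zeros when the maximum is not positive is an assumed convention.

```python
def normalize_by_max(times):
    """Scale non-negative capture times into [0, 1] by the largest entry."""
    peak = max(times)
    if peak <= 0.0:
        # Assumed convention: nothing to scale if every time is zero.
        return [0.0 for _ in times]
    return [t / peak for t in times]
```

For example, the times 2, 4, and 1 become 0.5, 1.0, and 0.25, which places them on a scale comparable to a cosine-based alignment value.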
There are two main motivations for adopting this goal/payoff function. First, this function facilitates a more holistic evaluation of pursuit strategies by not only focusing on quick interception but also ensuring that the maneuvers are strategically sound and sustainable over the long term. Second, the inclusion of both
and
allows the function to adapt to different tactical situations, providing a flexible tool that can adjust to the target’s behavior and environmental variables. Based on function (7), the payoff assignment is presented in Algorithm 1, which formulates the payoff matrix under the pursuit situation
. The calculated payoff matrix is then used in the subsequent solution of the proposed ZSPEG-based model.
Algorithm 1: Payoff assignment for the pursuit team
Input: ; Output: ;
for do
    for j do
        ;
        ;
        ;
    end for
end for
if then
    ;
    for do
        ;
        ;
    end for
end if
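Once a payoff matrix of this kind is available, fictitious play can be used to approximate a mixed-strategy equilibrium of the zero-sum game. The sketch below is a textbook version of simultaneous fictitious play for a matrix game, given here only as an illustration; it is not the paper's Algorithm 2, and the function name, iteration count, and initial counts are assumptions.

```python
def fictitious_play(payoff, iterations=5000):
    """Approximate equilibrium mixes for a zero-sum matrix game.

    payoff[i][j] is the payoff to the row player (maximizer); the column
    player minimizes it. Each round, both players best-respond to the
    empirical frequency of the opponent's past actions.
    """
    m, n = len(payoff), len(payoff[0])
    row_counts = [0] * m
    col_counts = [0] * n
    row_counts[0] = 1  # arbitrary initial beliefs
    col_counts[0] = 1
    for _ in range(iterations):
        # Value of each row against the column player's action counts.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n))
                      for i in range(m)]
        # Value of each column against the row player's action counts.
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(m))
                      for j in range(n)]
        best_row = max(range(m), key=lambda i: row_values[i])
        best_col = min(range(n), key=lambda j: col_values[j])
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = iterations + 1
    x = [c / total for c in row_counts]
    y = [c / total for c in col_counts]
    value = sum(x[i] * payoff[i][j] * y[j] for i in range(m) for j in range(n))
    return x, y, value
```

In zero-sum games the empirical frequencies produced this way are known to converge to an equilibrium, although convergence can be slow, so the iteration count is a trade-off between accuracy and run time.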
4. Simulations
In this section, we present the results obtained from the simulations and discuss their implications. The objective is to evaluate the performance of the proposed ZSPEG-based model and its effectiveness in achieving stable and accurate equilibrium points. We will analyze the outcomes of different scenarios and compare them with baseline methods to demonstrate the advantages of our method. The results will be examined in terms of time-to-capture, maneuver efficiency, etc.
Before the simulations, a set of parameters was chosen based on their relevance to the performance and functionality of the ISV during pursuit tasks. The chosen parameters are presented in
Table 2. The selection criteria for these parameters were as follows: (i) Surge velocity of the vehicle (
): the surge velocity range of 5 m/s to 15 m/s was determined by referring to currently used ISVs, ensuring that the vehicle could adapt to the various operational speeds required for different scenarios. (ii) Surge velocity of the target (
): a fixed value of 10 m/s was chosen based on the average target speeds observed in similar pursuit and evasion scenarios. This value balances the difficulty for the ISV to capture the target while allowing for a meaningful analysis of the algorithm’s effectiveness. (iii) Length of the vehicle (
): a standard length of 4 m was selected to represent the typical dimensions of ISVs used in practical applications, ensuring the results are applicable to real-world scenarios. (iv) Interception distance to the target (
): the value of 12 m was derived from the length of the vehicle, indicating the interception distance for successful captures while minimizing the risk of overshooting or collision. (v) Attack distance of the target (
): an attack distance of 6 m was chosen based on the ISV’s common response capabilities, ensuring effective engagement. (vi) Safe distance between the ISVs (
): a safe distance of 9 m was determined to prevent collisions between multiple ISVs operating within the same area, allowing for coordinated maneuvers. (vii) Interval time (
): an interval time of 0.01 s was selected to ensure precise and responsive updates to the ISV’s path-planning algorithm, enabling real-time adjustments. (viii) Survival time of the target (
): the survival time of 2.0 s was established to assess the ISV’s rapid response capabilities, providing a challenging yet achievable timeframe for the pursuit. (ix) Protected area (
): the protected area of
was set to represent a controlled environment where the ISVs and targets operate, allowing for consistent and repeatable testing conditions.
First, to compare the proposed ZSPEG-based planning scheme in both the single-ISV and multi-ISV scenarios, the strategy set and the reward function for the single-ISV pursuit scenario are constructed as follows. In the single-ISV pursuit scenario, the heading vector of the ISV is limited by its field-of-view constraint, which is denoted by
. Therefore,
, the unit yaw vector for the pursuit vehicle, is limited to the range of
. Furthermore, the maneuver planning for the ISV is closely related to its minimum steering angle (
). Then, under the constraints of the
and
, there could be
options for the ISV to move. Hence, the
strategy for the pursuit vehicle is denoted by
,
. Similarly, the maneuver strategy set for the moving target is assumed to be
,
. The payoff function for the single-ISV target pursuit is designed to consider two factors: the distance between the pursuit ISV and the target, and the distance between the target and the point of attack. These two factors could be formulated by the following equations:
where
denotes the pursuit feedback of the distance between the vehicle and the target.
denotes the attacking feedback of the distance between the target and the intended node.
and
denote the corresponding distance in the future situation after adopting the related motions.
and
represent the distances in the current situation. Then, the payoff for the pursuit ISV can be obtained by Equation (34).
where
,
, and
are all constants.
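The discretized heading options described above for the single-ISV scenario can be illustrated with the following sketch. It assumes that the field-of-view constraint is a symmetric half-angle around the current yaw and that candidate headings are spaced by the minimum steering angle; both the function name and this way of counting the options are assumptions rather than the paper's formula.

```python
import math


def candidate_yaws(current_yaw, fov_half_angle, min_steer):
    """Yaw angles within +/- fov_half_angle, spaced by min_steer (radians)."""
    # Number of steering steps that fit on each side of the current yaw.
    steps = int(math.floor(fov_half_angle / min_steer + 1e-9))
    return [current_yaw + k * min_steer for k in range(-steps, steps + 1)]
```

Under these assumptions, a half-angle of 60 degrees with a 30 degree minimum steering angle yields five candidate headings, including the option of keeping the current yaw.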
To generate and analyze the trajectories under the single-ISV pursuit model, the initial position of the moving target is set along the boundary of the protected area. The position of the unknown attack node should be within the protected area. Thus, it is assumed that
= (80,5),
= (20,10),
= (40,22),
, and
. After solving the model with Algorithm 2 under different
and
configurations, nine trajectories in the single-ISV pursuit are shown in
Figure 4. Although rational motions were generated sequentially for the ISV in time, the vehicle fails to capture the target in all the situations. If the ISV does not fulfil the necessary conditions, the target can gain the initiative to attack its intended node. This motivates replacing single-ISV pursuit with multi-ISV pursuit.
4.1. Model Evaluation
In this subsection, we evaluate our proposed ZSPEG-based model by comparing it with two other target-pursuit methods through a series of simulations. The compared methods include the following:
Graph theory-based: Many versions of Voronoi algorithms have been applied in multi-robot pursuit scenarios [40]. Among these Voronoi-based methods, the GAM–Voronoi algorithm, which utilizes a global area-minimization (GAM) strategy, outperforms the others in a bounded convex environment [41]. It was therefore chosen as the representative of the Voronoi-related algorithms.
Cooperative target hunting: To compare our method with strategy-based cooperative pursuit methods, a popular method called Cooperative Hunting based on Dynamic Hunting Points Allocation (CH-DHPA) [42,43,44] is selected. This method transforms target pursuit into a dynamic hunting points allocation problem.
Our Proposed Method (TGRP-MNE): This method aims to achieve stable and accurate equilibrium points by implementing the ZSPEG-based model.
The GAM–Voronoi algorithm and CH-DHPA were chosen for comparison with our proposed method for two main reasons. First, both the GAM–Voronoi and CH-DHPA methods address the core issue of maneuver planning in pursuit scenarios, similar to our study’s focus. This relevance ensures that the comparisons are directly applicable and meaningful to the field of pursuit–evasion games involving multiple ISVs. Second, comparing these methods could have significant theoretical implications regarding the applicability of game theory versus graph theory and cooperative strategies in real-world scenarios.
The required parameters, the pros, and the cons of the three methods are shown in
Table 3. To ensure the fairness of the experiment, while our method is provided with additional parameters and information, we established ideal environmental conditions for the other two methods. The relevant environmental elements and geographic information were set to ideal situations devoid of external disturbances. Furthermore, we also idealized the communication situation for these two methods. As they involve distributed communication, effective communication among multiple ISVs is crucial during task allocation and team collaboration. Hence, to enhance fairness during the comparison simulation, signal transmission was also configured to be in an ideal state for the GAM–Voronoi and CH-DHPA methods. Based on the above settings, simulations were conducted under fixed initial positions for the ISVs. The relevant initial positions were set as follows:
= (40,30),
= (40,10),
= (50,20), and
= (30,20). Furthermore,
and
were randomly chosen. The settings for the other parameters are shown in
Table 2. For every group, the simulations were carried out 1000 times. The assessed metrics include the capture success rate—
; the rate of nodes being attacked—
; the number of collisions—
; and the approximate average energy consumption per successful capture—
(m
s).
The three main metrics were computed as follows: (i) The capture success rate: this parameter represents the percentage of simulations where the pursuit strategy successfully led to the target being captured within the defined constraints and conditions. It is computed by dividing the number of successful captures by the total number of simulation runs and then multiplying by 100 to express this as a percentage. (ii) The rate of nodes being attacked: similar to the success rate, this rate measures the percentage of simulations in which the target successfully evaded capture for the duration of the simulation or until it reached a safe zone. This is calculated by counting the instances of successful evasion, dividing this by the total number of simulations, and converting the result into a percentage. (iii) The average energy consumption per successful capture (ms): to compute this metric, the energy consumed in each run is summed and then divided by the number of runs to obtain the average. This provides a measure of how much energy, on average, is required for a vehicle to participate in a pursuit scenario over the course of the simulations. Mathematically, it is the sum of the energy consumed in each simulation divided by the total number of simulations. This calculation helps to understand the energy efficiency of the different tactical approaches under various operational conditions, which is crucial for optimizing the pursuit strategies in terms of both effectiveness and sustainability.
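The metric computations described above amount to simple averages over the simulation runs. A minimal sketch follows; the function names and the representation of run outcomes as booleans are assumptions made for illustration.

```python
def rate_percent(flags):
    """Percentage of runs in which the flagged event occurred."""
    if not flags:
        raise ValueError("at least one simulation run is required")
    return 100.0 * sum(1 for f in flags if f) / len(flags)


def average_energy(energies):
    """Mean energy per run: total energy divided by the number of runs."""
    if not energies:
        raise ValueError("at least one simulation run is required")
    return sum(energies) / len(energies)
```

The same `rate_percent` helper can serve both the capture success rate and the rate of nodes being attacked, since each is a count of runs divided by the total number of runs.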
The simulation results under the varying survival times, the varying pursuit velocities, and the different areas are shown in
Table 4,
Table 5 and
Table 6, respectively. It should be noted that only the CH-DHPA method has the collision parameter
. This is because the CH-DHPA method requires additional collision avoidance measures, whose effectiveness needs to be analyzed. In the GAM–Voronoi method, by contrast, collision avoidance is inherently built in, preventing any collisions from occurring. In our TGRP-MNE method, a relay-pursuit strategy is adopted, so collisions between pursuit vehicles are not a concern either. As presented in these three tables, our method outperforms the other two methods, achieving a relatively high success rate of capture
and the lowest average energy consumption
. To be specific, when comparing
Table 4 and
Table 5, we observed that the rate of nodes being attacked,
in the CH-DHPA method is lower than that in our proposed TGRP-MNE method. This indicates that our TGRP-MNE method might implement a more aggressive pursuit strategy. Such a strategy could better block the routes of the target or more accurately predict the target’s movements, but may not provide sufficient protection for the target’s attack point. Furthermore, according to
Table 6, our proposed method demonstrates a superior performance compared to GAM–Voronoi and CH-DHPA in terms of its success rate and energy consumption across all area scales. Based on these findings, if node protection is the primary goal in surface pursuit, CH-DHPA would be the preferred choice among the three methods. However, in general situations, achieving a high and stable success rate of capture is paramount for the surface pursuit. With adaptability to factors
,
, and the area scale, and the ability to achieve a high and stable success rate of capture with low energy consumption, our method can be considered the preferred choice for general surface pursuit.
4.2. Sensitivity Analysis
In this subsection, the sensitivity of our method under different conditions is tested and analyzed. The experiments include the following: (i) an analysis of the number of vehicles, the survival time, and the pursuit velocity, where we tested our method under multiple scenarios with different initial conditions; (ii) system stability, where we assessed the stability of the pursuit system by observing the consistency of the pursuit strategies over extended simulation runs with different target strategy sets. These experiments allow for a comprehensive evaluation of the proposed method, demonstrating its advantages and identifying potential areas for improvement.
Analysis of
n,
, and
: To explore the impact of variables on the performance, simulations are conducted using different numbers of vehicles
, various survival times
, and changing pursuit velocities
. The results are shown in
Figure 5. In
Figure 5a, once
exceeds 1.6 s, all the success rates stabilize at around 85% with small fluctuations. As shown in
Figure 5b, there were no significant differences or changes among the data. The rates of attack were all below 15%. The above observations indicate that when the survival time exceeds a certain value, further increasing
no longer contributes to improvements in the performance. According to the results shown in
Figure 5c,d, as the pursuit velocity increases, the capture success rate of all the teams increases, while the rates of the node being attacked all decrease. Improving the pursuit velocity therefore helps a pursuit team to obtain a higher success rate and a lower rate of being attacked. Furthermore, the team with
always achieves the best pursuit performance because it has the highest number of vehicles. In sum, variations in the team scale and the pursuit velocity have a more significant impact on the model’s performance than changes in the survival time. Therefore, during the optimization of the system, particular attention should be paid to the
and
adjustments.
System stability: To determine how the strategy set of the target affects the pursuit, the strategy sets for the target are redesigned. For a pursuit team with
= 3, the strategy sets for the target are reformulated by the two newly designed sets:
and
. The first strategy set is based on the ZSG model, which was introduced in
Section 2.2.
is more complex than the ZSG-based strategy set and the original one. The detailed strategies in
are shown as follows, and are also displayed in
Figure 6:
: the target evades from the nearest ISV.
: the target adopts the collective evasion strategy, which is explained in Definition 4.
: the target heads directly toward its attack node, denoted by the red star in
Figure 6.
: the direction of the target is the angle bisector formed by and .
Definition 4 (collective evasion strategy). From the perspective of the target, all angles formed between pairs of adjacent ISVs are taken into account in the collective evasion strategy. The moving direction of the target is the bisector of the maximum angle formed by two adjacent ISVs. The purpose of this strategy is to enable the target to move immediately away from the entire pursuit team, rather than just from one ISV. The calculation for this evasion motion is as follows: Firstly, is set to represent the angle of the vector (). denotes the angle between two adjacent ISVs. If , is equal to . Therefore, can be obtained by Equation (35), where is the index at which attains its maximum value, and is half of the .
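The collective evasion idea in Definition 4 can be sketched as follows: compute the bearing of each ISV as seen from the target, find the largest angular gap between adjacent bearings, and head along the bisector of that gap. This is a minimal illustration under that reading of the definition; the function name and the tie-breaking rule (the first largest gap wins) are assumptions.

```python
import math


def collective_evasion_heading(target_pos, isv_positions):
    """Unit heading along the bisector of the widest gap between adjacent ISVs."""
    two_pi = 2.0 * math.pi
    # Bearings of the ISVs as seen from the target, sorted counterclockwise.
    bearings = sorted(
        math.atan2(py - target_pos[1], px - target_pos[0]) % two_pi
        for px, py in isv_positions
    )
    n = len(bearings)
    if n == 1:
        gaps = [two_pi]  # a single ISV leaves the whole circle as one gap
    else:
        gaps = [(bearings[(k + 1) % n] - bearings[k]) % two_pi for k in range(n)]
    widest = max(range(n), key=lambda k: gaps[k])
    # Head through the middle of the widest gap.
    direction = bearings[widest] + 0.5 * gaps[widest]
    return (math.cos(direction), math.sin(direction))
```

For example, with ISVs to the east and north of the target, the widest gap spans the south-west side, so the resulting heading points south-west, away from both vehicles.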
With changing survival times, simulations were carried out under
,
and
, respectively. The results are shown in
Figure 7. In terms of the success rate of capture and the rate of being attacked, there is no significant difference among the three strategy sets. This finding underscores the robustness of our proposed method, showing that it maintains a consistent performance despite variations in the target’s strategy set.