1. Introduction
At present, mobile communication networks rely mainly on terrestrial base stations and other fixed communication equipment, which require time-consuming network planning that must account for many practical factors. Unmanned Aerial Vehicles (UAVs), with their high flexibility, low cost, and wide coverage, have attracted widespread attention in academia and industry [1,2], e.g., UAV-assisted base station communications [3], relay communications [4], data collection [5], and secure communications [6]. In UAV-assisted communications, UAVs mainly act as mobile base stations that provide high-quality communication services to ground users (GUs) [7]. The mobility and flexibility of UAV Base Stations (UAV-BSs) allow communication connections to be established quickly and improve data transmission efficiency and communication range significantly [8,9,10,11]. For example, when the ground communication infrastructure is damaged by natural disasters, UAVs can be employed as temporary base stations to provide emergency communication services for GUs.
UAVs acting as aerial base stations have many advantages over terrestrial base stations. (1) UAVs can establish good Line-of-Sight (LoS) links with GUs [12]. (2) For mobile GUs, UAVs can adjust their flight trajectories or follow the GUs to provide better communication services [13]. It is worth noting that, due to limited communication resources and coverage, UAV-BSs cannot serve all GUs by merely optimizing their locations. Thus, UAV-BSs face two main challenges in UAV-assisted communication. (1) How to select the GUs to be served by optimizing the locations of the UAVs so as to maximize communication efficiency (e.g., throughput). (2) How to adopt a service strategy that ensures fairness among GUs when providing communication services to multiple GUs.
For the problem of maximizing the efficiency of UAV-assisted communication, most studies take energy efficiency and throughput as the primary optimization objectives. In [14], the authors proposed a centralized multi-agent Q-learning algorithm to maximize the energy efficiency of wireless communication. In [15], a Deep Reinforcement Learning (DRL) algorithm based on Q-learning and convolutional neural networks was proposed to maximize spectral efficiency. However, maximizing communication efficiency makes UAV-BSs tend to hover near a small subset of GUs, which interrupts the communication missions of other GUs that fall outside the communication range of the UAV-BSs. For the fair communication problem, previous works focus on fair coverage. For example, a DRL-based algorithm was proposed in [16] to deploy UAV-BSs so that they provide fair coverage and reduce collisions among UAVs. In [17], a DRL-based control algorithm was proposed to achieve energy-efficient and fair coverage with energy recharge. A mean-field game model was proposed in [18] to maximize the coverage score while ensuring a fair communication range and network connectivity. These studies [16,17,18] focus on fair coverage of ground areas or users. In the pursuit of regional fairness, the UAVs need to serve all cells as much as possible. Thus, focusing only on fair coverage causes partial task interruption and degrades communication efficiency. It is therefore necessary to consider both communication efficiency (throughput) and user fairness in UAV-assisted communications, which has been overlooked in the literature [14,15,16,17,18].
1.1. Related Work
In this subsection, we review relevant work on UAV-assisted communication and point out its inadequacies.
1.1.1. UAV-Assisted Communication
UAV-assisted communication has been extensively researched. For example, Jeong et al. [19] proposed an optimization algorithm based on a concave-convex procedure to maximize the transmission rate by designing the flight trajectory and the transmit power of the UAV. Yin et al. [20] studied UAVs as aerial base stations serving multiple GUs and proposed a deterministic policy gradient algorithm to maximize the total uplink transmission rate. These works [19,20] focus only on the communication performance of UAV-assisted communication networks; however, the energy consumption of the UAV is also a crucial issue due to the limited onboard energy. In [21], an Actor–Critic-based deep stochastic online scheduling algorithm was proposed to minimize the overall energy consumption of the communication network by optimizing the data transmission and hovering time of the UAV; the proposed algorithm reduces energy consumption by 29.94% and 24.84% compared to the DDPG and PPO algorithms, respectively. Yang et al. [22] investigated UAV-assisted data collection, where the data transmission rate and the energy consumption of the UAV conflict with each other during the data collection process. Theoretical expressions for the energy consumption of the UAV and the GUs were derived to achieve different Pareto-optimal trade-offs, providing new insight into the energy efficiency of future UAV-assisted communication. Zhang et al. [23] considered post-disaster rescue scenarios where energy is limited due to the collapse of the power system. To satisfy energy and obstacle constraints, a safe deep Q-learning-based UAV trajectory optimization algorithm was proposed to maximize uplink throughput; its weakness is that it cannot be applied to larger disaster areas because of the limited communication range and onboard energy of a single UAV. The above studies [21,22,23] fully account for the energy consumption of the UAV during trajectory design, enabling the UAV to perform communication services with greater energy efficiency. However, these studies [19,20,21,22,23] consider single-UAV scenarios with stationary GUs; thus, the designed methods are only applicable to small-scale, simple scenarios.
For complex scenarios, it is particularly important that multiple UAVs collaborate to accomplish complex communication tasks. Shi et al. [24] proposed a dynamic deployment algorithm based on hierarchical Reinforcement Learning (RL) to maximize the long-term expected average throughput of the communication network; the proposed algorithm increases throughput by 40%. To meet the Quality of Experience (QoE) requirements of all GUs under limited system resources and energy, Zeng et al. [25] jointly optimized GU scheduling, UAV trajectories, and transmit power to maximize energy efficiency while meeting the GUs' QoE; the proposed algorithm increases energy efficiency by 12.5% compared to the baseline algorithm. Ding et al. [26] modeled the UAVs and GUs as a hybrid cooperative-competitive game, maximizing throughput by simultaneously optimizing the trajectories of the UAVs and the access of the GUs. The above studies [24,25,26] focus on multi-UAV, multi-GU scenarios and maximize throughput and energy efficiency by designing the flight trajectories. However, the optimization objectives of [19,20,21,22,23,24,25,26] focus on communication performance and ignore fairness among GUs. This leads the UAVs to allocate more communication resources to GUs with high throughput or to follow the movement of particular GUs; as a result, the UAV-BSs ignore the service requests of other critical GUs.
1.1.2. UAV-Assisted Fair Communication
It is also crucial to consider service fairness in UAV-assisted communication. Diao et al. [27] studied fair perceptual task allocation and trajectory optimization in UAV-assisted edge computing. By introducing auxiliary variables, the non-convex optimization problem is transformed into multiple convex sub-problems and solved to minimize energy consumption while meeting fairness requirements. In [28], Ding et al. derived an expression for the energy consumption of the UAV. Based on the UAV energy consumption and the fairness of the GUs, a DRL-based algorithm was proposed to maximize the system throughput by jointly optimizing the trajectory and bandwidth allocation; the proposed method increases the fairness index by 15.5%. The works in [27,28] study single-UAV-assisted fair communication, and the proposed algorithms are not applicable to multi-UAV scenarios.
For multi-UAV scenarios, Liu et al. [29] studied the fair coverage problem of UAVs and proposed a DRL-based control algorithm to maximize energy efficiency while guaranteeing communication coverage, fairness, energy consumption, and connectivity; the study improves the coverage score and fairness index by 26% and 206%, respectively. A novel distributed control scheme was proposed in [30] to deploy multiple UAVs in an area to improve coverage with minimum energy consumption and maximum fairness; the proposed algorithm covers 84.6% of the area and improves the fairness index by 3% under the same network conditions. Liu et al. [31] investigated how to deploy UAVs to improve the service quality for GUs and maximize fair coverage according to a designed fairness index. The above studies [29,30,31] address fair coverage in multi-UAV-assisted communication by optimizing the locations of the UAVs to cover the ground area fairly. In contrast, we are concerned with maximizing system throughput while considering user fairness in a multi-UAV mobile network.
1.2. Motivation and Contribution
UAV-assisted mobile wireless communication networks have many advantages compared with traditional terrestrial fixed communication infrastructures. However, UAVs acting as mobile base stations still face the following two problems. (1) Existing research mainly focuses on communication efficiency: the objectives are to maximize communication metrics such as throughput, transmission rate, and energy efficiency by optimizing the flight trajectories and resource allocation of the UAVs, while ignoring fairness among GUs. For example, suppose user A can obtain high throughput while user B has a more urgent need for communication service. If throughput is selected as the service goal, the UAVs will allocate more communication resources to user A to maximize that goal, and the more urgent service request of user B is ignored. (2) Another important problem in UAV-assisted communication is the optimization of UAV locations, which is a typical sequential decision problem. The computational complexity of heuristic and convex optimization algorithms grows exponentially with the number of GUs and UAVs, so they are not suitable for multi-UAV, multi-user scenarios.
Motivated by both communication efficiency and user fairness, this paper investigates fair communication in UAV-BS-assisted communication systems. We propose an information-sharing mechanism based on gated functions and incorporate it into Multi-Agent Deep Reinforcement Learning (MADRL) to obtain near-optimal strategies. The main contributions of this paper are summarized as follows:
To evaluate the trade-off between fairness and communication efficiency, we design a Ratio Fair (RF) metric by weighing fairness and throughput when UAV-BSs serve GUs. Based on the RF metric, we formulate the UAV-assisted fair communication as a non-convex problem and utilize DRL to acquire a near-optimal solution for this problem;
To solve the above continuous control problem with an infinite action space, we propose the UAV-Assisted Fair Communication (UAFC) algorithm. The UAFC algorithm establishes an information sharing mechanism based on gated functions to reduce the distributed decision uncertainty of UAV-BSs;
To address the dimension imbalance and training difficulty due to the high dimension of the state space, we design a novel actor network structure of decomposing and coupling. The actor network utilizes dimension spread and state aggregation to obtain high-quality state information.
The remainder of this paper is organized as follows. Section 2 introduces the energy consumption and communication models. In Section 3, we describe the problem and formulate it as a Markov decision process. Section 4 describes the implementation of the UAFC algorithm. The results and analyses of the experiments are presented in Section 5. We discuss some limitations of this paper and future research work in Section 6. Section 7 concludes the paper.
Notations: In this paper, variables are denoted by italic notation and vectors are denoted by bold notation; the symbols for the L2 norm, the W-dimensional vector space, and sets follow standard usage. For convenience of reading, the important symbols are listed in Table 1 with the corresponding descriptions.
2. System Model
In this paper, we consider a wireless communication scenario in a target area containing multiple UAVs and mobile GUs. The UAVs provide communication services for the GUs by optimizing their trajectories, as shown in Figure 1. Air-to-Ground (A2G) links exist between the UAV-BSs and the GUs. K GUs are randomly distributed in the area, and the location of each GU varies with time t. D UAVs are deployed as mobile base stations to provide communication services for the GUs, and the location of each UAV at time t is expressed in a three-dimensional Cartesian coordinate system. To reduce the additional energy overhead caused by climbing, we assume that the UAVs fly at a fixed altitude H.
2.1. UAV Movement and Energy Consumption Model
The movement of the GUs changes the channel quality between the UAVs and the GUs, so the positions of the UAV-BSs need to be optimized to provide better communication services. As shown in Figure 1, at each time step t a UAV flies towards its next position according to its moving distance and flight angle, both of which are bounded by their maximum values. The next position of the UAV is therefore calculated from its current coordinate, the chosen moving distance, and the chosen flight angle.
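For illustration, a minimal Python sketch of this position update, assuming the usual planar decomposition of the displacement into cosine/sine components and a square target area (the function and parameter names are ours, not the paper's):

```python
import numpy as np

def step_uav(pos, distance, angle, d_max, area_size=500.0):
    """Advance a UAV one time step at fixed altitude.

    pos      : np.array([x, y, H]) current position (m)
    distance : commanded horizontal displacement, clipped to [0, d_max]
    angle    : flight angle in radians, in [0, 2*pi)
    """
    d = np.clip(distance, 0.0, d_max)
    nxt = pos + np.array([d * np.cos(angle), d * np.sin(angle), 0.0])
    # keep the UAV inside the target area (assumption: square area)
    nxt[:2] = np.clip(nxt[:2], 0.0, area_size)
    return nxt
```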
The energy consumption of a UAV mainly consists of propulsion energy consumption and communication energy consumption. According to [32], the propulsion energy consumption is related to the speed and acceleration of the UAV. To simplify the system model and reduce computational complexity, the effect of acceleration on propulsion energy consumption is ignored [33]. At time t, the energy consumed by the movement of a UAV is determined by its propulsion power. The remaining energy of a UAV is then obtained by subtracting, from its maximum (fully charged) energy, the propulsion energy and the communication energy consumed in the [0, t] period.
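A small sketch of this energy bookkeeping, under the simplifying assumption (not stated in the paper's exact equations) that the per-step propulsion energy equals the propulsion power multiplied by the step duration:

```python
def remaining_energy(e_max, p_prop, dt, e_comm):
    """Remaining energy of a UAV after a sequence of movement steps.

    e_max  : fully charged energy (J)
    p_prop : list of propulsion powers, one per time step (W)
    dt     : duration of each time step (s)
    e_comm : total communication energy over [0, t] (J)
    """
    e_move = sum(p * dt for p in p_prop)   # propulsion energy over [0, t]
    return e_max - e_move - e_comm
```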
2.2. Communication Model
The transmission links between the UAVs and the GUs are modeled with a probabilistic channel model [34], in which the probability of establishing a LoS connection between a UAV and a GU depends on the environment-related coefficients X and Y and on the horizontal distance between the UAV and the GU.
The link path losses between a UAV and a GU are given separately for the LoS link and the Non-Line-of-Sight (NLoS) link. Each consists of the free-space path loss, which depends on the distance between the UAV and the GU, the carrier frequency, and the speed of light c, plus the additional path loss of the LoS or NLoS link [35], respectively.
Therefore, the average path loss between a UAV and a GU is calculated by weighting the LoS and NLoS path losses with the corresponding link probabilities. In this model, a path loss threshold is defined, and the link is considered broken when the average path loss exceeds this threshold. Thus, a binary variable is defined that denotes the association of a GU with a UAV at time t.
The transmission rate between a UAV and a GU is determined by the channel power gain, the fixed transmit power of the UAV, the communication bandwidth allocated to the GU, and the noise power spectral density.
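Since the exact equation forms are not reproduced here, the following hedged sketch shows how such a probabilistic A2G channel and rate model is typically evaluated; the sigmoid LoS-probability form and the Shannon-rate expression are common choices from the literature and may differ in detail from the paper's equations:

```python
import numpy as np

def avg_path_loss_db(r_horiz, h, f_c, X, Y, eta_los, eta_nlos):
    """Average A2G path loss (dB) under a probabilistic LoS/NLoS model.

    Assumptions (illustrative, not the paper's exact equations):
      - sigmoid LoS probability in the elevation angle,
      - free-space path loss plus an additional LoS/NLoS loss term.
    """
    c = 3e8
    dist = np.hypot(r_horiz, h)                       # 3D UAV-GU distance
    theta = np.degrees(np.arctan2(h, r_horiz))        # elevation angle (deg)
    p_los = 1.0 / (1.0 + X * np.exp(-Y * (theta - X)))
    fspl = 20 * np.log10(4 * np.pi * f_c * dist / c)  # free-space path loss
    return p_los * (fspl + eta_los) + (1 - p_los) * (fspl + eta_nlos)

def rate_bps(bandwidth, p_tx, path_loss_db, noise_psd):
    """Shannon-type transmission rate for one UAV-GU link."""
    gain = 10 ** (-path_loss_db / 10)
    snr = p_tx * gain / (noise_psd * bandwidth)
    return bandwidth * np.log2(1 + snr)
```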
4. The UAFC Algorithm
Since multi-agent systems are sensitive to changes in the training environment [42], the policies obtained by the agents may fall into local optima. The Multi-Agent Twin Delayed Deep Deterministic policy gradient (MATD3) algorithm [43] is based on the Actor–Critic architecture and incorporates a target policy smoothing technique, which is used to compute the target Q value; this improves the accuracy of the target Q value and stabilizes the training process. Thus, the proposed UAFC algorithm takes MATD3 as its basic algorithm and adopts the MADRL framework with centralized training and distributed execution [44], as shown in Figure 2. In the centralized training stage, the MATD3 algorithm learns policies by jointly modeling all agents: the states and actions of all agents are accessible during training and are used to evaluate the joint policy, so the problem of environment non-stationarity is addressed. In the distributed execution stage, each UAV cannot fully obtain the state information of the environment and of the other agents due to its limited perception ability. The unknown state information results in policy uncertainty and makes it challenging for an agent to obtain the optimal strategy quickly. To reduce the distributed decision-making uncertainty of the UAVs, an information-sharing mechanism based on gated functions is designed in the UAFC algorithm.
4.1. MATD3 Algorithm
As shown in Figure 2, each agent adopts the TD3 algorithm, which introduces two main techniques to enhance performance: clipped double-Q learning and target policy smoothing.

Clipped Double-Q Learning: The TD3 algorithm consists of an actor network and two critic networks, each with its own parameters. We assume that the actions, states, and rewards of all agents are accessible during training. The actor network makes decisions based on local state information, and the critic networks use the state–action pairs to learn two centralized evaluation functions that evaluate the policy. To avoid the overestimation of the Q value that occurs with a single critic network, the target Q value is computed from the minimum of the two critic networks, using the next state and the action generated by the target actor network.

Target Policy Smoothing: Furthermore, clipped Gaussian noise is added to the target action to prevent overfitting of the Q value, which yields a smoother state–action estimate and the modified target action.
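A hedged sketch of these two techniques (the function names, noise parameters, and shapes are illustrative, not the paper's exact implementation):

```python
import numpy as np

def td3_target(reward, next_obs, done, target_actor, target_critic1, target_critic2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Target value with target policy smoothing and clipped double-Q learning."""
    # target policy smoothing: add clipped Gaussian noise to the target action
    a_next = target_actor(next_obs)
    noise = np.clip(np.random.normal(0.0, noise_std, a_next.shape),
                    -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -act_limit, act_limit)
    # clipped double-Q: take the minimum of the two target critics
    q_next = np.minimum(target_critic1(next_obs, a_next),
                        target_critic2(next_obs, a_next))
    return reward + gamma * (1.0 - done) * q_next
```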
4.2. Information Sharing Mechanism Based on Gated Functions
In addition, to reduce the uncertainty of distributed decision-making, an information-sharing mechanism based on gated functions is designed, which enables the UAVs to share state information through a central memory with a storage capacity of M [45]. The memory is used to store the collective state information of the UAVs. As shown in Figure 2, with the information-sharing mechanism, the strategy of each UAV is determined by its own observation and by the information in the memory. Each UAV accesses the central memory to retrieve the information shared by the other UAVs before taking an action. Neural networks are utilized to build the policy networks for DRL, and gated functions are employed to characterize the information interaction between each agent and the memory.
4.2.1. Encoding and Reading Operations
The encoding and reading operations are shown in Figure 3. Each UAV maps its own state vector to an embedding vector representing its state information, using a neural network with its own network parameters.
After encoding the current information, the UAVs perform the reading operation to extract the associated information stored in the memory. A latent context vector is generated by a linear mapping to learn the temporal and spatial dependencies of the embedding vector, where H denotes the dimension of the context vector and E denotes the dimension of the embedding vector. The state embedding vector, the context vector, and the content m currently held in the memory contain different information; the three are jointly used as input to learn a gated mechanism. The resulting gate, computed by concatenating the vectors and applying the sigmoid activation function, is used as a weighting factor that adjusts the information read from the memory, where M represents the dimension of the content m. The information read from the memory is then obtained as the Hadamard product (⊙) of the gate and the memory content.
4.2.2. Writing Operation and Action Selection
The writing operation regulates the keeping and discarding of information through gated functions; the framework is shown in Figure 4. Each UAV obtains a candidate storage vector from its state embedding vector and the shared information m by a nonlinear mapping with its own network parameters. An input gate is employed to regulate the content of the candidate, and a second gate is utilized to decide which existing information is kept. Each UAV then generates the newly updated information by weighting the historical state information and the real-time state information.

After completing the reading and writing operations, each UAV obtains its action, which depends on its current state and on the information read from the memory.
According to the above description, the pseudocode of the reading and writing operations based on gated functions is given in Algorithm 1.
Algorithm 1 Memory-Based Reading and Writing Operations
Input: State information of the UAVs;
Output: Decisions of the UAVs;
1: Initialize the state and the memory;
2: Initialize the actor networks of each UAV with their respective weights;
3: for d = 1 to D do
4:   Obtain the state of UAV d and the shared information m;
5:   Set the input vector;
6:   Generate the observation encoding according to Equation (19);
7:   Generate the read vector according to Equation (22);
8:   Generate the new message according to Equation (25);
9:   Update the information in the memory;
10:  Select the action according to Equation (26);
11: end for
Both the reading and writing operations in Algorithm 1 are the core of the information-sharing mechanism. The agents use gated functions to select the required information from the memory based on their own observations; thus, unknown state information can be obtained through the reading operation. The read information and the observations are jointly used as input to the policy network, so the actions depend both on the observations and on the state information of the other agents. Because both the agents and the environment change dynamically, the information in the memory needs to be updated dynamically; the writing operation regulates the keeping and discarding of this information through gated functions. As a result, Algorithm 1 enables the sharing of state information among the UAVs and avoids the policy uncertainty caused by partial state information.
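A minimal numerical sketch of the gated reading and writing in Algorithm 1; the weight names, gate compositions, and the tanh candidate mapping are our assumptions and may differ from the paper's Equations (19)–(25):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def read_memory(e, h, m, W_r):
    """Gated reading: a sigmoid gate over [e; h; m] weights the memory content.

    e : state embedding vector, h : context vector, m : memory content,
    W_r : gate weight matrix of shape (len(m), len(e)+len(h)+len(m)).
    """
    gate = sigmoid(W_r @ np.concatenate([e, h, m]))
    return gate * m                                    # Hadamard product: read vector

def write_memory(e, m, W_c, W_i, W_k):
    """Gated writing: blend a candidate vector with the old memory content."""
    candidate = np.tanh(W_c @ np.concatenate([e, m]))  # candidate storage vector
    i_gate = sigmoid(W_i @ np.concatenate([e, m]))     # input gate (keep from candidate)
    k_gate = sigmoid(W_k @ np.concatenate([e, m]))     # gate deciding what to keep
    return k_gate * m + i_gate * candidate             # newly updated memory content
```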
4.3. The Architecture of Actor Network
Furthermore, the actor network of the UAFC algorithm consists of more than one sub-network. The input of the actor network can be divided into three categories:
(1) The remaining energy of the UAVs, which determines whether the UAVs can continue performing the mission.
(2) The locations of the UAVs and the GUs, which determine whether the UAVs should move to better locations to provide good communication services.
(3) The information read from the memory, which helps the UAVs form better policies.
The final actions of the UAVs depend on the combined effect of these three categories of input. If all the state information and the shared information were fed directly into a single actor network, it could hardly output a desirable policy because of the imbalance and high dimensionality of the state information. Thus, we design a novel actor network architecture based on decomposing and coupling. The architecture decomposes the input vector into the three categories above, expands the dimension of the low-dimensional part of the state, and aggregates the three parts into one overall input vector. This combination of dimension spreading and state aggregation addresses the dimension imbalance problem and reduces the state dimension, which yields higher-quality policies.
The actor network architecture is shown in Figure 5. It aims to avoid crashes and service interruptions of the UAVs due to insufficient power; thus, the energy state, whose dimension is D, is very important. However, the dimension of the energy state information is much smaller than that of the position information of the UAVs and GUs, which creates a dimension imbalance problem and makes the algorithm difficult to converge. Dimension spreading and linear mapping are therefore used to process the energy state, the location state, and the information read from the memory, producing three state vectors with the same dimension. After this state decomposition and linear mapping, the input dimension is reduced; network 4 then combines the three resulting vectors into a new vector, takes it as input, and outputs the final action.
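A hedged sketch of this decompose-and-couple actor in Keras; the layer sizes, activations, and two-dimensional action head (e.g., moving distance and flight angle) are illustrative assumptions, not the paper's exact architecture:

```python
import tensorflow as tf

def build_actor(num_uavs, loc_dim, mem_dim, hidden=128, act_dim=2):
    """Decompose-and-couple actor: three input branches are mapped to a
    common dimension, concatenated, and fed to the action head (sketch)."""
    e_in = tf.keras.Input(shape=(num_uavs,), name="energy")      # low-dim energy state
    l_in = tf.keras.Input(shape=(loc_dim,), name="locations")    # UAV + GU positions
    m_in = tf.keras.Input(shape=(mem_dim,), name="memory_read")  # info read from memory

    # networks 1-3: dimension spread / linear mapping to a common size
    e_h = tf.keras.layers.Dense(hidden, activation="relu")(e_in)
    l_h = tf.keras.layers.Dense(hidden, activation="relu")(l_in)
    m_h = tf.keras.layers.Dense(hidden, activation="relu")(m_in)

    # network 4: couple the three branches and output the final action
    x = tf.keras.layers.Concatenate()([e_h, l_h, m_h])
    x = tf.keras.layers.Dense(hidden, activation="relu")(x)
    action = tf.keras.layers.Dense(act_dim, activation="tanh")(x)
    return tf.keras.Model([e_in, l_in, m_in], action)
```

Under this sketch, the low-dimensional energy branch is widened to the same hidden size as the other branches before concatenation, mirroring the dimension-spread idea described above.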
4.4. Training of UAV-Assisted Fair Communication
Algorithm 2 summarizes the UAFC algorithm for UAV-assisted fair communication. First, the training data are randomly sampled from the experience replay pool. The sampled states and actions are input into the evaluation and target critic networks to generate the state–action value function and the target state–action value function, respectively. The loss function is constructed from these two values to train the critic networks.
Algorithm 2 UAFC Algorithm
Input: State information of the UAVs;
Output: Decisions of the UAVs;
1: ⊳ Parameter initialization
2: Initialize the actor and critic network parameters;
3: Initialize the replay buffer B;
4: for each episode do
5:   ⊳ Action generation
6:   Obtain the action of each UAV from Algorithm 1;
7:   Set the joint state and the joint action;
8:   ⊳ Experience storage
9:   Take the selected actions;
10:  Obtain the reward R; the state s transfers to the new state;
11:  Store the experience in the replay pool B;
12:  ⊳ Parameter updating
13:  for d = 1 to D do
14:    Sample a random mini-batch from B;
15:    Update the weights of the evaluation critic networks by minimizing the loss function according to Equation (28);
16:    Update the weights of the evaluation actor network according to Equation (30);
17:    Update the weights of the three target networks according to Equations (31) and (32);
18:  end for
19: end for
Initialization (lines 2–3): During the centralized training phase, the actor network and critic network parameters are randomly initialized. Furthermore, the two storage spaces, i.e., the experience replay pool and the memory, are initialized.

Action generation (lines 4–7): Each UAV obtains its action according to its policy function, based on its current observation and the information shared by the other UAVs.

Experience storage (lines 9–11): The experience of each UAV can be expressed as a tuple of its state, action, reward, and next state. After performing its action, each UAV obtains its reward, and the current state transfers to the new state at the next moment. Finally, the experience is stored in the replay pool B, which has a fixed capacity.
Parameter update (lines 13–17): During the training process, a mini-batch of experience is randomly sampled from the experience replay pool. The evaluation actor network generates a policy from the sampled states and the shared information, and its parameters are updated according to the policy gradient in [46]. Based on this policy, two Q values are obtained from the two evaluation critic networks, and the parameters of the critic networks are updated by minimizing the loss function. According to this loss function, each UAV updates its three evaluation networks with the corresponding learning rate, and the target network parameters are softly updated with the updating rate u.
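For concreteness, a minimal sketch of the critic loss and the soft target update described above, assuming a mean-squared TD error and Polyak averaging (standard choices, not copied verbatim from Equations (28)–(32)):

```python
import numpy as np

def critic_loss(q1, q2, target_q):
    """Mean-squared TD error for the two evaluation critics (sketch)."""
    return np.mean((q1 - target_q) ** 2) + np.mean((q2 - target_q) ** 2)

def soft_update(target_params, eval_params, u=0.01):
    """Polyak averaging of target network parameters with updating rate u."""
    return [(1.0 - u) * t + u * e for t, e in zip(target_params, eval_params)]
```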
4.5. Complexity Analysis
We evaluate the efficiency of the UAV-assisted fair communication algorithm by complexity analysis. The non-linear mapping of states to actions is achieved by a deep neural network during the offline training and online execution phases. The actor and critic networks contain
J-th layer and
F-th layer neural networks, respectively. Thus, the time complexity of the UAFC algorithm (denoted as
) is given by
where
represents the number of neurons in the
j-th layer of the actor network, and
represents the number of neurons in the
f-th layer of the critic network.
A matrix of
and a bias of Q exist in a fully connected neural network. Therefore, the number of storage unit required by a fully connected neural network is
, and thus, the space complexity is
. In addition, it is also necessary to allocate storage space to the experience replay pool and memory to store information in the process of training, and the space complexities are
and
, respectively. Hence, the space complexity of the UAFC algorithm (denoted as
) is formulated as
In the distributed execution stage, only the trained actor network is needed. Thus, the space complexity of the execution phase is
and the time complexity is
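As an illustration, a small helper that counts the multiply-accumulate operations and parameters of a fully connected network, consistent with (but not copied from) the complexity expressions above:

```python
def fc_complexity(layer_sizes):
    """Multiply-accumulate operations and parameter count of an FC network.

    layer_sizes: list of layer widths, e.g., [obs_dim, 256, 256, act_dim]
    """
    macs = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    params = sum(a * b + b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    return macs, params
```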
5. Performance Evaluation
In this section, we introduce the detailed settings of the algorithm and simulation parameters, and conduct extensive simulation experiments to verify the effectiveness of the UAFC algorithm.
5.1. Simulation Settings
We verify the performance of the UAFC algorithm through extensive experiments. The experimental platform is built on an Intel Core i9-11900H, an NVIDIA GeForce RTX3090, and Tensorflow-CPU-1.14. GUs are randomly deployed in a 500 m × 500 m target area and move in random directions at random speeds. The UAVs initialize their positions randomly to provide communication services for the GUs. The experimental parameters are shown in Table 2. Two metrics are chosen for performance evaluation: the novel fairness index and the fair throughput, expressed in Equations (12) and (13), respectively.
5.2. Training Results
To verify the impact of the RF metric and of state decomposing and coupling on algorithm performance, we compare the accumulative reward, fair throughput, and fairness index of the UAFC algorithm with those of the UAFC-NSDC and UAFC-NRF algorithms.

No state decomposing and coupling (UAFC-NSDC): Compared with the UAFC algorithm, this algorithm feeds the complete state information directly into the actor network without employing state decomposing and coupling.

No RF (UAFC-NRF): The UAFC-NRF algorithm maximizes throughput while ignoring the fairness of the GUs in communication services.
From Figure 6a, we can observe that the UAFC algorithm achieves a higher accumulative reward than the other two algorithms. This is because the UAFC algorithm takes GU fairness into consideration, serving more GUs and thus obtaining a higher coverage reward. Furthermore, the UAFC algorithm utilizes state decomposing and coupling to eliminate the influence of state dimension imbalance, so the UAVs can learn high-quality policies and achieve higher rewards.

From Figure 6b, we can observe that the UAFC-NRF algorithm converges to its optimal value quickly. This is because the UAFC-NRF algorithm ignores fairness among GUs, so the UAVs tend to hover close to a few GUs to achieve higher throughput. Compared with the UAFC-NRF algorithm, the throughput of both UAFC and UAFC-NSDC is lower, since these two algorithms trade off communication efficiency against fairness. To ensure fair communication services for the GUs, some throughput is sacrificed, especially when the number of UAVs is limited.

The mean fairness index is employed as the evaluation indicator. From Figure 6c, we can observe that the mean fairness index of the UAFC algorithm outperforms the other two algorithms, because UAFC considers the fairness of the GUs and feeds the state information into the actor network after state decomposing and coupling.
5.3. Performance Comparisons with Two Related Algorithms
In this subsection, we compare the accumulative reward, fair throughput, and fairness index of the UAFC algorithm with those of two existing algorithms.

MADDPG: The MADDPG algorithm in [47] is utilized as a benchmark that designs the trajectories of the UAV-BSs to maximize throughput without considering fairness.

MATD3: The MATD3 algorithm in [48] is employed as a UAV trajectory planning and resource allocation algorithm that minimizes the time delay and energy consumption of UAV tasks.
Figure 7 shows the convergence curves of the UAFC algorithm and the two baseline algorithms in terms of accumulative reward, total fair throughput, and mean fairness index, respectively.

Figure 7a shows the accumulative reward results. The reward of UAFC converges to around 6500 after 30,000 episodes and outperforms both MATD3 and MADDPG. This is because the information-sharing mechanism makes full use of the state information shared among the UAVs, which helps find the optimal policy and avoid falling into local optima. Furthermore, the training curves of the three algorithms show obvious oscillations, because MADRL, unlike supervised learning, has no explicit label information.

Figure 7b shows the convergence curve of the fair throughput. We can observe that the total fair throughput of the MATD3 algorithm is better than that of the UAFC algorithm in the first 25,000 episodes. This is because the actor network of UAFC contains the reading and writing operations, which are implemented by a multi-layer fully connected (FC) neural network. In an FC network, the gradient becomes smaller as it propagates backwards through the hidden layers, so neurons in earlier hidden layers learn more slowly than those in later layers, which makes UAFC harder to train than MATD3. As training continues, the total fair throughput of UAFC converges at 40,000 episodes and outperforms both MADDPG and MATD3. This is because: (1) the UAVs share state information with each other to obtain better policies; and (2) the state information is processed by state decomposing and coupling, so the actor network receives a low-dimensional state vector that still contains the complete state information and the information read from the memory.
The experimental results of the three algorithms on the mean fairness index are shown in Figure 7c. In the first 20,000 episodes, the UAFC algorithm fluctuates greatly and its value is lower than those of MATD3 and MADDPG. Because the input of the actor network is decomposed and coupled, the overall actor network has more layers and a more complex structure; moreover, the distribution of the GUs keeps changing, which makes it difficult for the UAVs to find the best cooperative strategy. Both MATD3 and MADDPG converge to local optima quickly, which can also be seen from their final convergence values. The fairness index of UAFC keeps increasing and finally converges to 0.72. Compared with the MADDPG and MATD3 algorithms, the mean fairness index of the UAFC algorithm is improved by 13.82% and 1.99%, respectively.
To demonstrate the performance of the UAFC algorithm more intuitively, we present the results of the UAFC algorithm and the four compared algorithms on the evaluation metrics in Table 3. It can be seen that the UAFC algorithm outperforms the compared algorithms in terms of the reward function and the fairness index.

Table 4 compares the results on the reward function, fair throughput, and fairness index. Compared with UAFC-NRF, UAFC shows a 9.09% decrease in fair throughput. The main reason is that the UAFC-NRF algorithm does not consider fairness among GUs, so its UAVs can always serve the GUs with high throughput. The results also show that the UAFC algorithm obtains a higher fairness index by sacrificing part of the throughput.
Figure 8 shows the impact of the number of GUs on system performance. The results indicate the following:

From Figure 8a, we can observe that the average fair throughput of the three algorithms increases as the number of GUs increases. Since the number of GUs served by the UAVs grows with the total number of GUs, the average fair throughput shows an upward trend. It is worth noting that, as the number of GUs increases, the UAFC algorithm outperforms both MADDPG and MATD3 in fair throughput. This is because each UAV can obtain the state information of the other UAVs and performs state decomposing and coupling, so the UAVs can obtain complete state information about the environment and about each other;

Figure 8b shows how the fairness index of the three algorithms changes as the number of GUs increases. As the number of mobile GUs increases, the fairness of the three algorithms does not change much. The fairness index of the MADDPG algorithm is the lowest, because the absence of target policy smoothing regularization in MADDPG leads to convergence to a local optimum. Furthermore, the UAFC algorithm numerically outperforms the MATD3 algorithm in the fairness index, because the UAVs can provide fair communication services for the GUs in a collaborative way by sharing their state information.
Figure 9 shows the impact of the number of UAV-BSs on system performance. From the results, we can conclude the following:

More GUs can be served as the number of UAVs increases; thus, the fair throughput of the three algorithms gradually increases. The UAFC algorithm outperforms both MADDPG and MATD3 in fair throughput, so it can provide fairer communication services. This is because: (1) the information-sharing mechanism reduces the distributed decision-making uncertainty of the UAVs, so the UAFC algorithm still performs well when the number of UAVs increases; and (2) the dimension of the state information increases with the number of UAVs, and state decomposing and coupling addresses the resulting dimension imbalance and reduces the state dimension, generating higher-quality policies;

As shown in Figure 9b, the fairness index of the three algorithms increases as the number of UAVs increases, because more UAVs provide more extensive coverage and can cover almost all GUs. When the number of UAVs is 3, the UAFC algorithm improves the fairness index compared to the MADDPG and MATD3 algorithms, because the information-sharing mechanism and the actor network architecture help the UAVs make better decisions.
6. Discussion
In this section, we discuss some limitations of this paper. (1) We model the UAV-assisted fair communication problem as a complex non-convex optimization problem that is NP-hard, so it is difficult to find an analytic solution. We therefore solve it with a DRL algorithm; DRL cannot guarantee an optimal solution, but it can be trained to obtain an approximately optimal one, and the experimental results show that our algorithm is more effective than the compared algorithms. (2) UAVs can flexibly adjust their positions to establish good LoS communication links with the GUs and provide a reliable wireless communication environment. However, in some complex environments (e.g., urban scenarios), the LoS link between a UAV and a GU will inevitably be blocked by tall buildings or trees, which degrades the communication quality. In this paper, we assume that the links between the UAVs and the GUs are unaffected by obstructions. (3) In addition, UAVs carry very limited energy due to their limited size. This paper considers the residual energy of the UAVs only as a constraint and does not design methods to extend their flight time, such as recharging the UAVs via wireless power transfer. In future work, we will consider building more realistic mathematical models and designing more accurate solution algorithms.
7. Conclusions
UAV-assisted communication is expected to be a suitable method for wireless communication. In this paper, we have studied the problem of UAV-assisted communication with consideration of user fairness. First, a novel metric that evaluates the trade-off between fairness and communication efficiency is presented to maximize fair system throughput while ensuring user fairness. Then, the UAV-assisted fair communication problem is modeled as a mixed-integer non-convex optimization problem. We reformulated the problem as an MDP and proposed the UAFC algorithm based on MADRL. Furthermore, inspired by communication among agents, an information-sharing mechanism based on gated functions is designed to reduce the distributed decision-making uncertainty of the UAVs. To solve the problem of state dimension imbalance, a new actor network architecture is designed to reduce the impact of dimension imbalance and the curse of dimensionality on policy search through dimension expansion and linear mapping. Finally, we have verified the effectiveness of the proposed algorithm through extensive experiments. Simulation results show that, compared to the MATD3 and MADDPG algorithms, the proposed UAFC algorithm increases fair throughput by 5.62% and 26.57% and the fairness index by 1.99% and 13.82%, respectively. The Intelligent Reflecting Surface (IRS) is a new 6G technology that has received widespread academic attention. Combining IRS with wireless power and information transmission is a promising way to further improve the performance of UAV-assisted communication. In the future, we will extend this paper to design a novel IRS-based algorithm.