1. Introduction
Wireless communication has recently been evolving to deliver not only high throughput but also ultra-reliability, energy efficiency, and support for highly diversified applications with heterogeneous quality-of-service (QoS) requirements [1]. To this end, research efforts have mainly been devoted to fixed terrestrial infrastructure such as ground base stations (BSs), access points, and relays, which are generally limited in their ability to cost-effectively meet the ever-increasing and varied traffic demand. To address this problem, there is growing interest in providing wireless connectivity from the sky using various airborne platforms, such as unmanned aerial vehicles (UAVs) [2], balloons [3], and helikites [4]. UAVs are classified in many ways, for example by weight, altitude, range, wing and rotor type, and application [5]. By leveraging low-altitude UAVs (i.e., less than about one kilometer above the ground [5]), a wireless communication system can offer swift deployment and highly flexible mobility [2]. UAVs can be used as flying base stations to handle short-term, erratic traffic demand in hotspots such as concerts and sports events, or to offload data for congestion mitigation [6,7]. In other words, the UAV can operate either as a stand-alone aerial BS [4,8] or as part of a heterogeneous, multi-tier airborne cellular network [6]. UAV-enabled wireless communication can thus enhance QoS through better channel conditions, line-of-sight (LOS) links, faster deployment, and better maneuverability. Moreover, it can be applied in many practical scenarios, such as public safety communications [9], Internet of Things (IoT) communications [10], and massive machine-type communications [11]. For natural disaster scenarios, the authors in [12] investigated a UAV-enabled method to search for survivors after disasters, while a deployment tool for UAV-based emergency networks in a realistic disaster scenario was proposed in [13]. Given these advantages, UAV-enabled communications are expected to play an increasingly important role in future wireless communication systems.
1.1. Related Works
In recent years, non-orthogonal multiple access (NOMA) has attracted considerable attention as a promising solution to critical issues in next-generation wireless systems [14]. By allowing multiple devices to share the same frequency, time, or code resources, NOMA improves spectral efficiency and offers more balanced and fair access than orthogonal multiple access (OMA) approaches [15,16]. NOMA is typically based on superposition coding (SC) at the transmitters and successive interference cancellation (SIC) at the receivers. Many research efforts have combined NOMA with UAV-enabled wireless communications technologies [
17,
18,
19,
20]. In [
17], the authors proposed multiple access mode selection (NOMA/OMA) based on conditions for a better outage probability in a UAV-enabled downlink wireless network. Sohail et al. [
18] investigated a power allocation approach to maximize the sum rate of the network by reducing the energy expense of the UAV, and numerical results were obtained by simulating various deployment environments, such as rural, urban, and dense-urban. In [
19], an efficient joint placement and power allocation method for a single UAV providing data services as a base station for multiple ground users was investigated to improve the total data rate of the UAV-assisted NOMA network. Besides, the authors in [
20] introduced a comprehensive framework for UAV base station cooperation in UAV-assisted NOMA networks, where the UAV and BS cooperate with each other to serve ground users simultaneously. As a result, by jointly relying on the UAV trajectory and NOMA precoding optimization, the proposed scheme demonstrated large throughput in the system. In [
21], the authors proposed the UAV-aided scheme to guarantee secure transmission for the ground receivers. In the following, we discuss solutions among research efforts dealing with issues of UAV communication systems.
1.1.1. UAV Communications Using Data Caching
During the past few decades, the driving forces behind traffic development have shifted from connection-centric communication demand (e.g., text messages and smart phones) to content-centric communication demand (e.g., popular music or video streaming). Although small base stations are densely employed to accommodate the ever-increasing traffic demand, a heavy traffic burden is still imposed on the backhaul links. One potential solution is to properly cache popular content at the network edge (i.e., UAVs, D2D devices, or relays) to serve the same requests of users without duplicate transmissions via the backhaul links. More explicitly, in UAV-assisted edge caching, UAVs generally cache the content via a limited wireless backhaul link and then distribute it to ground users. In this regard, many contributions to caching in UAV-assisted communication systems have been made [
22,
23,
24,
25,
26]. In [
22], UAVs were dispatched to store enhancement layer segments of video beforehand and then provided the transmissions to users who requested the videos. Chen et al. [
23] proposed appropriate content caching during off-peak times in a cloud radio access network, which is based on the user’s behavior prediction. In [
24], an effective liquid state machine learning approach was investigated to predict the content-request distribution of users in Long-Term Evolution-Unlicensed (LTE-U) UAV transmissions systems while having only limited information on the network states. The proposed algorithm also enables UAVs to optimally allocate the bandwidth over licensed and unlicensed bands and to satisfy the queue stability requirements of each user. In addition, local content caching at the ground users (GUs) was also studied [
25,
26]. In particular, the authors in [
25] proposed a solution to deal with the endurance issue at the UAV in which the ground users cooperatively cache files delivered by the UAV. Then, the files are retrieved either from its local cache or neighbor cache. Meanwhile, a joint UAV-trajectory and time-scheduling approach was proposed in [
26] to maximize the secrecy rate in UAV-relaying systems.
1.1.2. UAV Communications Using Energy Harvesting
From the standpoint of wireless communication, UAV-enabled communication systems are quite energy-consuming, since energy is needed for the UAV's propulsion, for communication with users, and for application-specific tasks. Therefore, UAVs usually have very limited endurance due to energy constraints. To address this issue, several methods have been introduced to reduce UAV energy consumption, for example by reducing the UAV's weight [
27] and planning energy-efficient UAV flight paths [
28,
29]. The authors in [
28] investigated a path planning algorithm that minimizes energy consumption while satisfying coverage and resolution. Meanwhile, an efficient approach was proposed to maximize the UAV’s energy efficiency under the constraints on the trajectory [
29]. However, the energy supply for the UAVs is still basically unsustainable due to the limited battery capacity. Thus, the fundamental UAV endurance problem remains unresolved. Currently, energy harvesting of radio frequency (RF) signals has attracted increasing interest in UAV communications for the purpose of prolonging network lifetime or maximizing system throughput [
30,
31,
32]. In [
30], the authors designed a power splitting-based relaying scheme for a UAV capable of energy harvesting (EH) and information forwarding to optimize the network throughput. Similarly, in order to save energy in the UAV’s battery and to enhance flight endurance, Yang et al. [
31] applied EH technology to the UAV for collecting RF energy from the ground base station and derived an outage probability analysis in terms of urban environment parameters. Apart from RF-powered UAV communication systems, other energy sources have also been used for UAV operation, such as solar power [
32,
33,
34,
35] or laser power [
36]. In [
32], the UAV was equipped with solar panels to convert the harvested solar energy to electrical energy with the aim of enabling long endurance flights. The authors in [
33] proposed the optimal trajectory planning to maximize the harvested solar energy. However, the design did not consider the impact of harvested solar energy for communications. Sun et al. [
34] investigated resource allocation for multi-channel solar-powered UAV systems to maximize throughput. Nevertheless, their constant aerodynamic power consumption model may not hold in realistic scenarios, since aerodynamic power consumption depends significantly on the UAV's flight velocity. Moreover, to satisfy the QoS requirements of the users, the authors in [
35] proposed joint trajectory and resource allocation algorithms to maximize the system sum throughput. However, the designs for UAV communication systems in [
34,
35] may lead to high mass and size overheads for the solar-powered UAVs. Furthermore, using OMA can result in low spectrum efficiency in massive access scenarios.
1.2. Motivations and Contributions
To the best of our knowledge, most existing works on EH-powered wireless communication systems assume that information about the energy harvesting arrival is known. In practice, however, this information is not always available when designing solutions for real UAV communication systems. Moreover, the consumption models for the propulsion energy of UAVs are quite sophisticated and depend critically on many factors such as the UAV's trajectory, velocity, and acceleration [
29,
34,
35]. This can increase the complexity of developing contemporary schemes. On the other hand, establishing the schemes using only harvested energy for both wireless communications and flight operations might not optimize EH-powered UAV communication performance if the UAV carries a high energy consumption aerial base station or if the harvested energy arrival rate is small [
34,
35,
36,
37]. By combining all these issues, devising a method to optimize the service performance of the EH-powered UAV to multiple ground users is still a very challenging task, especially in unexpected circumstances such as temporary disaster areas or complex terrains.
Motivated by the above analysis, in this paper, we propose two joint caching and power allocation schemes for solar-powered, UAV-enabled NOMA communication systems under two scenarios. In the first scenario, the system has prior knowledge of the harvested energy distribution of the UAV. In the second scenario, the system does not know this distribution. The GUs request a number of data items stored at the local station (LS). However, no direct links are available between the LS and the GUs due to unexpected or emergency circumstances such as natural disasters, obstacles, and long transmission distances. Deploying terrestrial infrastructure can be infeasible or challenging owing to complex environments and high operational costs. Thus, the UAV is employed to cache part of the content from the LS and deliver data to the GUs. In this work, the UAV can harvest solar energy from the ambient environment. However, the solar panel on the UAV cannot by itself sustain long-term operation because of the UAV's large mass, high mobility energy, and communication energy. To address this problem, the battery is fully recharged from grid power at the LS whenever the UAV returns to the station.
The battery has two portions: a mobility capacity used for flight operation and a transmission capacity used for data transmissions. The mobility capacity, which represents the space needed for flight energy, occupies a large portion of the battery. Therefore, the remaining space available for data transmissions (i.e., the transmission capacity) is significantly limited, and the initial energy for data transmissions is not enough to provide a high data rate to the GUs in the long term. The UAV is assumed to harvest energy throughout its flight. Hence, during the serving time of each round, the UAV can leverage harvested solar energy to transmit data to the GUs. The mobility energy is assumed to be sufficient in each round; thus, data transmission has higher priority for the harvested energy during the serving time. That is, during the serving time, the harvested energy first replenishes the transmission capacity and only then charges the mobility capacity. Besides, the battery is always recharged by the harvested energy during the non-serving time, which reduces the grid power consumption and the additional charging time when the UAV is at the LS. In other words, the harvested energy is stored in the on-board battery and can be used not only for data transmission services to the GUs during the serving time (i.e., the time during which the UAV flies along the circular trajectory), but also for recharging the battery for flight operation during the non-serving time (i.e., the time when the UAV returns to the LS and the time when it travels to the serving area). Therefore, it is worth applying solar harvesting to the UAV-based communication system.
Instead of using conventional orthogonal multiple access (OMA) (e.g., TDMA, FDMA, CDMA), which causes low spectrum efficiency, the NOMA technique is applied to enhance the data rate of the UAV system in which the UAV can simultaneously transmit data to the GUs. In this paper, there are three phases of the UAV’s operation: (1) performing the caching update process and then approaching the serving area, (2) flying along the circular trajectory while doing the communication process, and (3) returning to the LS for re-caching the files and recharging the battery, as shown in
Figure 1a. The caching update process is carried out at the local station, where the UAV pre-caches part of the content and replenishes the battery for the next round. Then, it approaches its serving area and starts flying along the predefined circular trajectory on which the GUs can be served. Next, the communication process is executed: the UAV transmits data according to the content requests of the GUs while following the predefined circular trajectory. After finishing one circular flight period, the communication process is temporarily terminated, and the UAV returns to the LS to re-cache content and recharge the battery. These processes repeat until the GUs' requests are satisfied. Using solar harvesting for the UAV helps relieve the burden of grid-power-based energy consumption. However, finding a proper solution for the solar-powered UAV to provide energy-efficient communications remains a challenging task under limited energy harvesting technology. Addressing it can make the solar-powered UAV system more applicable to real wireless system scenarios. In a nutshell, the main contributions can be summarized as follows.
Firstly, we study a model of a cache-enabled downlink UAV communication network. Ground users request data items stored in the library of a local station, but direct links are not available. Thus, the solar-powered UAV is employed to cache content from the local station and then approach distant users to execute data transmissions using NOMA technology. However, the UAV is equipped with both limited battery capacity and cache capacity. Therefore, we aim to efficiently allocate the harvested energy to the GUs for the long-term operation.
Secondly, we formulate the problem of the sum data rate maximization as the framework of a partially observable Markov decision process (POMDP). An iteration-based dynamic programming approach is proposed to obtain the optimal policy for the UAV in order to maximize the system data rate under the assumption that the UAV has prior environment information. With this method, the UAV can efficiently cache the content from the local station at the beginning of each flight period and allocate an appropriate portion of transmission power for the GUs throughout every time slot under energy and cache constraints.
Thirdly, we present another approach using an actor-critic-based reinforcement learning algorithm to deal with the problem in the scenario where the UAV does not have information on environment dynamics in advance. With the actor-critic-based method, the UAV can interact with the environment and gradually learn the optimal policy as time goes on, based on trial-and-error without prior environment knowledge.
Lastly, extensive numerical results are provided to validate the proposed algorithm’s performance through various network parameters. We show that, with joint caching and power allocation, the two proposed schemes are superior to the benchmark schemes where the UAV greedily utilizes transmission power without long-term considerations.
The remainder of this paper is organized as follows. The model for the EH-powered UAV downlink communication system is presented in
Section 2. Next, we describe the proposed POMDP-based joint cache scheduling and power allocation scheme in
Section 3, and the proposed actor-critic-based learning framework is presented in
Section 4. The discussions on the simulation results are elaborated in
Section 5. Finally, we conclude this work in
Section 6.
2. System Model
We consider a caching-based UAV-enabled downlink wireless transmission system adopting non-orthogonal multiple access and content caching technologies where a UAV,
F, is employed as a mobile base station to serve a group of
I ground users, denoted by
. We assume that the GUs do not have direct links to the local station (LS), where all content that the GUs request is stored. Such a scenario can arise in suburban environments where the deployment of communication infrastructure is still restricted, or in urban environments where infrastructure may be damaged by natural disasters. Thus, the remote users may not get services from a local station. For that reason, the UAV is dispatched to obtain cached content from the LS, and it then flies along a predefined trajectory to transmit the requested data to the GUs. In existing works, given the user distribution in the network, many effective methods have been used to optimize the UAV's placement and trajectory [
19,
20]. Besides, the approach to maximize the coverage area of the UAVs was well studied in [
8]. Therefore, in this paper, we do not optimize the flight trajectory and coverage region of the UAV in the network. Instead, we aim to maximize the long-term throughput based on the predefined trajectory. For that reason, we assume that the circular trajectory of the UAV’s flight is known based on the locations of the GUs, and the coverage region of
F is large enough to guarantee the connection to all GUs by following the predefined circular trajectory with a reasonable radius and altitude. This means that the GUs are still in the UAV’s coverage during the UAV’s circular flight such that the GUs always get the data delivery from
F. It is noteworthy that the system still can be well applied for other flight trajectories of the UAV. Our main goal is to allocate the appropriate transmission power to the GUs and schedule data caching of the UAV under a predefined trajectory to obtain the maximal long-term data rate of the system.
Each data transmission is executed in every time slot
t, whereas each caching action is executed at the beginning of a flight period, defined as a round in which the UAV flies to the serving area, flies along its predefined circular trajectory, and returns to the LS. Due to its limited cache capacity, the UAV can only cache part of the content from the LS at the beginning of every flight period. The GUs are assumed to have a fixed power supply, whereas the UAV has a limited-capacity battery. Hence, UAV
F is equipped with an energy harvester to scavenge solar energy from the ambient environment to replenish its battery. We assume the UAV works in an ideal environment free of environmental disturbances (e.g., wind). Suppose that the UAV continuously flies at a constant velocity,
, in a circular trajectory with radius
, at altitude
, and the UAV position repeats every
(seconds). Thus, the flight length for the circular trajectory is defined as
, and the number of time slots discretized in each circular trajectory length is determined as
, where
T is the time slot duration. Note that the UAV’s location is assumed to be unchanged during each time slot when
T is chosen sufficiently small in the system [
35].
Without loss of generality, we consider three-dimensional (3D) Cartesian coordinates
where
represents the ground plane and
z is the altitude. The location of
is denoted as
. In fact, when disasters occur, the network infrastructure may be damaged. However, the GUs can still determine their locations easily thanks to the GPS receivers integrated into most current mobile devices. Thus, the GUs can report their locations to the UAV such that the UAV can calculate the flight trajectory to serve the GUs' requests. For devices without GPS, the UAV can still estimate the GUs' locations based on the received signal strength indicator (RSSI), which is well studied in the literature [
38,
39]. Furthermore, when the locations of the users are known, determining the flight trajectory of the UAV was proposed in [
20]. In this paper, we do not focus on an approach to obtain the GUs’ locations and the UAV’s trajectory. Instead, we mainly focus on the power allocation with data caching at the UAV to maximize the long-term data rate of the system. Therefore, it is assumed that the GUs’ locations and the UAV’s trajectory are known in advance. Herein, we establish the formulation for the circular trajectory of the UAV in the serving area, which is defined as the region where the GUs are located. The 3D setup of the considered network consisting of the LS, the UAV, and multiple GUs is illustrated in
Figure 1a. Point
, located at
, is the center of the circular trajectory with radius
, in which
F flies. Let
denote the angle of the circle of
F’s location with respect to the
x-axis. The location of
F at time slot
t can be determined as
. The time frame structure of the system is illustrated in
Figure 1b. The time frame is divided into four phases: GUs’ requests (
), UAV’s decision (
), data transmission (
), and information update (
). At the start of a time slot, the GUs will send data item requests to
F. Then, a decision will be determined at
F by allocating the transmission power to the GUs based on the current state of the system. Subsequently, data transmission will be conducted according to the assigned power portions for the GUs in the data transmission phase. Finally, the system will update its state at the end of the time slot.
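As a rough illustration of the circular trajectory described above, the following Python sketch computes the number of time slots in one lap and the UAV position in a given slot. The function names, the choice of a zero starting angle, the uniform angular step per slot, and the rounding down to whole slots are all assumptions made for this example and are not taken from the formulation in this paper.

```python
import math

def slots_per_period(radius_m, speed_mps, slot_s):
    """Whole time slots in one lap of the circle (rounded down; an assumption)."""
    lap_length = 2.0 * math.pi * radius_m   # flight length of one lap
    lap_time = lap_length / speed_mps       # time after which the position repeats
    return int(lap_time // slot_s)

def uav_position(t, n_slots, center_xy, radius_m, altitude_m):
    """UAV location in slot t, held fixed within the slot.

    Assumes the angle is measured from the x-axis, starts at zero,
    and advances by 2*pi/n_slots per slot.
    """
    theta = 2.0 * math.pi * (t % n_slots) / n_slots
    x = center_xy[0] + radius_m * math.cos(theta)
    y = center_xy[1] + radius_m * math.sin(theta)
    return (x, y, altitude_m)
```

For instance, a 100 m radius flown at 10 m/s with 1 s slots gives 62 slots per lap under these assumptions.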
2.1. Channel and Transmission Models
According to the above network setup, the time-dependent distance between
F and
can be calculated as:
where
denotes the Euclidean norm operation. In practice, the air-to-ground wireless channels from the UAV to GUs are normally dominated by LOS links, where the quality of the channel only depends on communication distance [
40]. Moreover, UAV-assisted information dissemination is more necessary in rural regions than in urban regions [
2]. In rural regions, building density is very low, and thus, the probability of non-line-of-sight links is also low. Therefore, in this paper, wireless channels from
F to the GUs are assumed to follow a free-space path loss model. As a consequence, channel power gain from
F to
at time slot
t can be expressed as [
41]:
where
represents the channel power gain at the reference distance,
m, which depends on the carrier frequency, antenna gain, etc.; and
is the path loss exponent. Suppose that
F has access to the flight control and location information of the GUs for power allocation. Besides, it is worth noting that the channel gain between
F and the GUs varies over period
due to the movement of
F. Given the location of
F at time slot
t, the channels of the GUs are sorted in
F to apply NOMA.
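The distance-dependent channel model above can be sketched in Python as follows. This is only a minimal illustration: it assumes the GUs lie on the ground plane (zero altitude) and a reference distance of 1 m, and the function names and the default path loss exponent of 2 (free space) are hypothetical choices for the example.

```python
import math

def distance(uav_xyz, gu_xy):
    """Euclidean distance between the UAV and a GU assumed to be at z = 0."""
    return math.dist(uav_xyz, (gu_xy[0], gu_xy[1], 0.0))

def channel_gain(dist_m, ref_gain, path_loss_exp=2.0):
    """Channel power gain: reference gain (at 1 m, assumed) over distance^exponent."""
    return ref_gain / dist_m ** path_loss_exp

def sort_by_gain(gains):
    """GU indices ordered from weakest to strongest channel, as used for NOMA."""
    return sorted(range(len(gains)), key=lambda i: gains[i])
```

Because the UAV moves, these gains would be recomputed in every time slot before the GUs are ordered.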
Typically, a NOMA scheme enables a base station to serve multiple users simultaneously over the same frequency band. Power portions are assigned inversely to channel conditions: a user with a low channel gain is allocated a higher transmission power, and vice versa. We assume that the GUs are indexed in ascending order of channel gain in time slot t.
According to the downlink NOMA principle, UAV
F will transmit a combined signal,
, to all GUs with the assigned power portions in time slot
t. Specifically, with the content requests of the GUs in time slot
t, the transmitted signal by UAV
F can be written as:
where
is the normalized information for
in time slot
t with
;
represents the total transmission power that
F uses to transmit data to the GUs, in which
is the amount of transmission energy used by
F in the time slot; and
denotes the power portion allocated for
in time slot
t (s.t.
). The received signal at
in time slot
t can be given by:
where
is the zero-mean additive Gaussian noise with variance
at
. Let us denote the vector of power portions sorted in descending order as
. The GU with the highest power portion (with index
) treats all signals of other GUs as interference and directly decodes its own information without using SIC. The other GUs need to employ SIC: each first decodes the signals that are stronger than its own desired signal (i.e., those of the GUs with higher assigned power portions). Those signals are then subtracted from the received signal, and the process continues until the GU's own signal is decoded. In other words, each GU decodes its own information by treating the signals of GUs with smaller power portions as interference. As explained above, assume that all the signals of
, for
, have been perfectly decoded by
. Thus, the signal-to-interference-plus-noise ratio (SINR) at
for decoding its own information is given as:
Consequently, the achievable rate at
in (b/s/Hz) to decode its own information in time slot
t can be calculated as:
Additionally, the SINR at
to decode the information of
, for
, can be expressed as:
Similarly, the achievable rate at
in (b/s/Hz) to decode the information of
for
in time slot
t can be calculated as:
Finally, the sum rate of the system in time slot
t can be expressed as follows:
where
represents the achievable rate at
in time slot
t subject to
.
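The SIC decoding order described above can be checked numerically with a short Python sketch. It assumes the GUs are already sorted in ascending order of channel gain with power portions in descending order, that SIC is perfect, and that each GU treats only the signals with smaller power portions as interference. The function name and argument names are hypothetical, and the sketch does not enforce any additional decodability constraint.

```python
import math

def noma_rates(gains, portions, total_power, noise_var):
    """Per-GU achievable rates in b/s/Hz under downlink NOMA with perfect SIC.

    gains:    channel power gains in ascending order
    portions: power portions in descending order, summing to at most 1
    """
    rates = []
    for k, (g, a) in enumerate(zip(gains, portions)):
        # Signals with larger portions are removed by SIC; the rest interfere.
        interference = total_power * g * sum(portions[k + 1:])
        sinr = total_power * a * g / (interference + noise_var)
        rates.append(math.log2(1.0 + sinr))
    return rates
```

The sum rate in a time slot is then simply `sum(noma_rates(...))`. With ascending gains, a GU with a stronger channel sees a higher SINR for any weaker GU's signal than that GU itself, so the SIC steps assumed here are feasible.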
More specifically, for a better understanding, let us take an example with
: if
, then
and
. At
, by using SIC, it first decodes
and then cancels it out from (4) to decode its own signal,
. Meanwhile, at
,
is directly decoded without performing SIC. As a result, the achievable data rates at
and
can be respectively calculated by:
and:
Eventually, the sum rate of the system in time slot
t can be given as follows:
2.2. Data Request Behavior of the Ground Users
In this paper, library
in the LS contains
K different data items that the GUs can request. Data items are an abstraction of application data, such as database records, web pages, and FTP files. We consider the content requests of the GUs to be independent of each other. We assume that each GU requests the same data item in two consecutive time slots with high probability and switches to a different data item with a smaller probability. This is realistic, since users tend to access the same data source of interest repeatedly over a long duration. Thus, we model the request of each GU as a discrete-time Markov chain, where the state transition probability of
for two adjacent time slots is illustrated in
Figure 2a.
and
(where
) represent the probabilities that
requests the same data item,
m, or another data item,
, respectively, in two adjacent time slots.
and
(where
) are the probabilities that
requests different items in two adjacent time slots. It is assumed that if the request of
in time slot
t is item
m, then the probability that
requests item
in time slot
can be computed as:
where
K is the total number of data items in library
. It is worth noting that when
requests an item that is not among the cached data items in the UAV, it cannot receive that requested data from the UAV, and thus, no transmission power will be allocated for
in this time slot, i.e.,
.
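To make the request model concrete, the following Python sketch builds one row of a request transition matrix and samples the next request. It assumes, purely for illustration, that the probability of switching is split evenly among the other items in the library; the function names and the parameter `p_same` are hypothetical and items are indexed from zero.

```python
import random

def request_transition_probs(current, n_items, p_same):
    """Probabilities of the next requested item given the current one.

    Assumption: the remaining probability 1 - p_same is shared equally
    by the n_items - 1 other items.
    """
    p_other = (1.0 - p_same) / (n_items - 1)
    return [p_same if m == current else p_other for m in range(n_items)]

def next_request(current, n_items, p_same, rng=random):
    """Sample the item requested in the next time slot."""
    probs = request_transition_probs(current, n_items, p_same)
    return rng.choices(range(n_items), weights=probs, k=1)[0]
```

Choosing `p_same` close to one reproduces the behavior in which a GU mostly keeps requesting the same item across adjacent slots.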
2.3. Content Caching Model of UAV
This paper adopts a traditional caching technique for UAV F to serve the requests of the GUs in the network. Since the number of data items that can be cached by F is restricted to a caching capacity, , the UAV needs to cache new data items from the library after each flight period j to replace the old cached items. With periodical caching, performance can be enhanced according to the GUs' requests. In this paper, the non-serving time, which includes the duration for the UAV to cache the items, approach the serving area, and return to the LS, is approximately constant and does not affect the data rate maximization during the serving time. Therefore, the non-serving time is ignored in the paper, and the term flight period henceforth refers to the circular trajectory period of the UAV. Let denote the cache content vector of UAV F in period j. Based on the data request behavior of the GUs, the cache content vector, , where the data items are cached in period j is divided into two parts: the request-based cache vector, , and the random cache vector, , and can be expressed as , s.t. . The former consists of the items cached based on the latest requests of the GUs, while the latter is determined by randomly caching items from the library, except for the items in the request-based cache. In particular, at the start of new flight period j, F will cache the latest items requested by the GUs (i.e., the items requested in the last time slot of previous period ), and the rest of the space in is filled by randomly selecting other items from library in the LS, such that each item cached in is unique in the current period. The reason for this caching model is that the probability that requests the same item is assumed to be much greater than that of GUs requesting a different item between two adjacent time slots, i.e., , as presented in the previous subsection.
We use
to denote the item request vector of the GUs, where
represents the item request of
at the start of time slot
t, and meanwhile,
denotes the total number of time slots in each circular trajectory period. If the GUs request data items different from each other in the last time slot of period
j, i.e.,
, the request-based cache vector,
, and the random cache vector,
, for the next period,
, can be respectively determined as follows:
and:
where
is the cached item
of
. It is worth noting that if several GUs request the same item in the last time slot of period
j, then UAV
F will only cache these same items one time in
for use in the next period,
, to save cache space in
.
An example of the caching process by UAV
F is illustrated in
Figure 2b with
,
,
, and
. In time slot
, which belongs to period
, the requests of the GUs are
,
, and
, and then,
and
. In time slot
with
, the requests of
are duplicates, and the request of
; thus,
and
.
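The cache update described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `update_cache`, the integer item indices 1..`library_size`, and the seeded random generator are all hypothetical choices made for the example.

```python
import random


def update_cache(last_requests, library_size, capacity, rng=None):
    """Build the cache for the next flight period (illustrative sketch).

    last_requests: items requested by the GUs in the last time slot.
    library_size:  number of items in the library (assumed indexed 1..K).
    capacity:      caching capacity of the UAV.
    Returns (request_based, random_part).
    """
    if capacity > library_size:
        raise ValueError("capacity cannot exceed the library size")
    rng = rng or random.Random()
    # Duplicate requests are cached only once (order of first appearance kept).
    request_based = list(dict.fromkeys(last_requests))[:capacity]
    # The random part is drawn from library items not already cached.
    remaining = [k for k in range(1, library_size + 1) if k not in request_based]
    random_part = rng.sample(remaining, capacity - len(request_based))
    return request_based, random_part
```

For instance, with requests `[3, 3, 7]`, ten library items, and a capacity of four, the request-based part is `[3, 7]` and two further distinct items are drawn at random, so every cached item is unique within the period.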
2.4. Energy Harvesting Model of the UAV
In this paper, UAV
F is assumed to have a limited-capacity battery,
, and it is equipped with an energy harvesting circuit to harvest solar energy for its operation. UAV
F can simultaneously harvest solar energy and perform other operations such as forward movement, climbing up and down, and data transmissions. In this work, we aim at efficiently using the harvested solar energy in the UAV in order to allocate proper transmission power to the GUs during the serving duration. Since the amount of flight energy consumed for a round trip of the UAV can be approximately estimated, for simplicity, the energy portion for the mobility of the UAV is not shown in the formulation. Thus, we only consider the battery capacity portion required for the data transmission (i.e., transmission capacity), and it is also denoted as
for our simplified formulation purposes. If
is full during the serving time (i.e., the maximum value of the transmission capacity portion is achieved), the rest of the amount of harvested energy will be stored in the mobility capacity portion that is used for the UAV’s flight. Herein, the amount of energy harvested by
F in time slot
t, denoted as
, is finite, where
;
, and
and is assumed to follow a Poisson distribution [
42]. The authors in [
42] carried out empirical measurements for the modeling of a solar-powered wireless sensor node in time-slotted operation and showed that the stored energy characteristics depend on many factors such as the time slot duration, light intensity, power level, and the deployment environment. As a result, the Poisson distribution model achieved a near fit for the collected measurements. The probability distribution of the energy harvested by
F can be given by:
where
represents the mean energy harvested by
F. For tractability in the simulation, the amount of harvested energy can be approximated, and the maximum harvested energy can be determined according to network parameters such that the cumulative distribution function is close enough to one.
2.5. Sum Rate Maximization Formulation
In this paper, we aim to optimize the transmission power allocated to the GUs and the content caching by UAV
F such that the sum cumulative data rate of ground users can be maximized in a long-term operation. Thus, the problem formulation can be expressed as follows:
where
is the cache content vector of UAV
F in flight period
j;
represents the upper bound of the transmission power that
F can use to transmit data to the GUs. Constraint (a) specifies that the UAV totally assigns its transmission power,
, to GUs that request items from the UAV’s cache in time slot
t. Constraint (b) guarantees that the total transmission power for GUs in each time slot is no greater than the maximum transmission power that the UAV can use without causing it to be inactive owing to an energy shortage. Finally, Constraint (c) ensures that every cached item is unique in the cache content vector for period
j, where
represents the
item of cache content vector
.
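The three constraints can be read as a per-slot feasibility test on a candidate decision. The following is a minimal sketch based only on the verbal description of Constraints (a)–(c); the data layout (dictionaries keyed by a GU index), the function name `is_feasible`, and the numerical tolerance are assumptions made for illustration.

```python
def is_feasible(power, requests, cache, p_total, p_max, tol=1e-9):
    """Check Constraints (a)-(c) for one time slot (illustrative sketch).

    power:    dict mapping GU index -> transmission power allocated to that GU.
    requests: dict mapping GU index -> requested item.
    cache:    list of items cached by the UAV in the current period.
    p_total:  total transmission power the UAV assigns in this slot.
    p_max:    upper bound on the transmission power the UAV can use.
    """
    served = {gu for gu, item in requests.items() if item in cache}
    if any(p < 0 for p in power.values()):
        return False
    # (a) power goes only to GUs whose request is cached, and sums to p_total.
    if any(p > tol for gu, p in power.items() if gu not in served):
        return False
    if abs(sum(power.get(gu, 0.0) for gu in served) - p_total) > tol:
        return False
    # (b) the total power does not exceed the usable maximum.
    if p_total > p_max + tol:
        return False
    # (c) every cached item is unique.
    return len(set(cache)) == len(cache)
```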
It is worth noting that although maximizing the energy utilization in the current time slot can optimize the instantaneous data rate of the system, it may leave F without enough energy to transmit data in subsequent time slots. Consequently, it can significantly degrade the long-term sum rate of the network. Furthermore, the dynamic data requests of the GUs also affect the performance of the system, since the caching constraint on F is taken into account. Therefore, the main goal of this study is to find, according to the system state, an optimal policy for joint cache scheduling and power allocation in F that maximizes the long-term sum rate of the system.
5. Simulation Results
In this section, we present the numerical simulation results regarding the performance of the two proposed schemes and those of other benchmark schemes based on the Myopic method [
48]: a Myopic-NOMA scheme, a Myopic-NOMA-RC scheme, and a Myopic-OMA scheme. The term “Myopic” refers to a solution in which the optimal decision is made only for the current time slot without considering the future evolution. In the Myopic-NOMA scheme, the UAV always transmits data with optimal transmission power to the GUs by using NOMA whenever more than two GUs’ requests are in the cached content of the UAV. Similarly, in the Myopic-NOMA-RC scheme, the UAV randomly caches items from the LS and always transmits data to the GUs with the optimal transmission power by using NOMA. Lastly, in the Myopic-OMA scheme, OMA data transmission is always used with the optimal transmission power. In particular, with this scheme, the data transmission phase is divided into
equal sub-slots, where
is the number of involved GUs for the data transmissions in time slot
t, and the UAV will transmit the corresponding data to each GU through each sub-slot. Therefore, the sum data rate of the Myopic-OMA scheme in time slot
t can be calculated with
. Nevertheless, these benchmark schemes only consider the current time slot when maximizing the sum rate. Their policy therefore uses the maximum level of transmission energy available in the battery in the current time slot, which can lower system performance in long-term operation owing to an energy shortage for data transmissions in subsequent time slots. In contrast, the proposed schemes consider not only the current reward, but also the future reward, as presented in
Section 3 and
Section 4. Thus, in the following, we can verify the effectiveness of the two proposed schemes under changes in network parameters.
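To make the sub-slot idea of the Myopic-OMA benchmark concrete, the sketch below computes an OMA sum rate when the transmission phase is split equally among the involved GUs. It is not the paper's expression: it assumes a Shannon-type rate, that the UAV uses the same transmission power in every sub-slot, and hypothetical inputs (`gains` as channel power gains and `noise_var` as the noise variance at the GUs).

```python
import math


def oma_sum_rate(power, gains, noise_var):
    """Sum rate (bit/s/Hz) with one equal sub-slot per involved GU.

    power:     transmission power used in each sub-slot (assumed equal).
    gains:     channel power gains of the involved GUs (hypothetical input).
    noise_var: noise variance at the GUs.
    """
    n = len(gains)
    if n == 0:
        return 0.0  # no GU is served in this time slot
    # Each GU is active for only 1/n of the slot, hence the division by n.
    return sum(math.log2(1.0 + power * g / noise_var) for g in gains) / n
```

The `1/n` factor is what penalizes OMA relative to NOMA in this simplified view, since each GU only receives data during its own fraction of the slot.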
Table 2 shows the parameter setup, and the network topology with
is illustrated in
Figure 5. Unless otherwise stated, the transmission energy in the UAV is divided into five equal levels ranging from
, and there are eight levels in the UAV’s battery, from zero to
. The span of power portion
is 0.025. In this paper, the simulation results were achieved by averaging
time slots. Besides, the harvested energy was stochastically generated in each slot by a Poisson distribution with the mean value of harvested energy
J. During the serving time, there might be no energy for data transmissions by the UAV, which is referred to as energy shortage. In that case, it has to stay silent and wait for upcoming harvested energy in subsequent time slots to transmit data to the GUs.
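The energy-shortage effect can be illustrated with a small battery simulation. This is a simplified sketch under several assumptions that are not from the paper: energy is counted in integer units, the energy harvested in a slot becomes available in the next slot, overflow beyond the capacity is simply discarded, and `spend_policy` is a hypothetical callback mapping the current battery level to the energy spent.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson sample with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def count_silent_slots(num_slots, mean_harvest, capacity, spend_policy, seed=0):
    """Count the time slots in which the UAV has no energy to transmit."""
    rng = random.Random(seed)
    battery = capacity
    silent = 0
    for _ in range(num_slots):
        if battery == 0:
            silent += 1  # energy shortage: stay silent and wait for harvest
            spent = 0
        else:
            spent = max(0, min(spend_policy(battery), battery))
        harvested = sample_poisson(mean_harvest, rng)
        battery = min(battery - spent + harvested, capacity)
    return silent
```

Comparing `count_silent_slots(1000, 1.0, 8, lambda b: b)` (spend everything) with `count_silent_slots(1000, 1.0, 8, lambda b: 1)` (spend one unit) under the same seed shows that the greedy policy is silent at least as often as the frugal one.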
We first examine the convergence rate of the actor-critic-based scheme during the training process under various values of
and
for the mean value of harvested energy,
J, based on the achievable sum rate calculated every 1000 time steps, as shown in
Figure 6. Besides, the optimal value line is plotted according to the policy obtained by the POMDP-based approach. It is noted that the convergence condition of the algorithm is defined as the convergence condition of the sum data rate. That means that during the training process, the sum data rate is averaged after every batch of 1000 training time slots, and then, the difference between two adjacent updates,
, is calculated. In the simulation, we set the convergence condition for the algorithm at
. It is observed from
Figure 6 that the sum rate of the system after each iteration of 1000 slots sharply increases in the first 100,000 time slots and then gradually converges to a locally optimal policy that depends on the values of
and
. Therefore, in the simulation, we repeated the training process a number of times and then selected the policy with the actor and critic step size values that provide the maximum average rate. In particular, with step sizes greater than 0.1, the proposed scheme converges faster; however, it leads to a lower data rate after 200,000 time slots of training. We can also see that if we keep decreasing the step size values to less than 0.1, the algorithm might converge to a worse policy due to overfitting. With the network parameters in this paper, the proposed scheme with critic and actor step sizes
provides better performance, in which the data rate mostly converges to the optimal value, given by the POMDP-based scheme, after 200,000 time slots of training. Therefore, we chose actor-critic step size values at
for the rest of the simulations.
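The convergence condition described above, which averages the sum rate over each batch of training slots and compares adjacent batch averages, can be sketched as follows. The default `threshold` is only a placeholder, since the value used in the simulation is not reproduced here, and the function name is hypothetical.

```python
def first_converged_batch(rates, batch_size=1000, threshold=1e-3):
    """Return the index of the first batch whose average sum rate differs from
    the previous batch average by less than `threshold`, or None otherwise.

    rates: per-slot sum rates collected during training.
    """
    previous = None
    for b in range(len(rates) // batch_size):
        chunk = rates[b * batch_size:(b + 1) * batch_size]
        average = sum(chunk) / batch_size
        if previous is not None and abs(average - previous) < threshold:
            return b
        previous = average
    return None
```

Because this test only looks at two adjacent averages, it can report convergence to a locally optimal policy, which is consistent with repeating the training several times and keeping the best run.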
Figure 7 shows the sum rate according to the mean value of harvested energy in the UAV. The throughput of the system increases as the mean value of the harvested energy goes up, because the UAV can harvest more energy from the environment and can therefore use higher transmission power more often during its flight period. The proposed schemes outperform the conventional schemes: the actor-critic-based method can be approximately as good as the POMDP-based method, and the two proposed schemes can provide a system data rate 10% higher than the Myopic approaches. Next, we compare the energy efficiency of the schemes with respect to the mean value of harvested energy in
Figure 8. In this study, we aim to efficiently utilize the solar harvested energy of the UAV in the long-term operation. When the transmission capacity is full during the serving time, the rest of harvested energy can also be stored for the mobility capacity portion to support the UAV’s flight. Moreover, the overflow energy of the battery is considered as the wasted energy consumption of the system. For that reason, in the simulation, the energy consumption is calculated as the total harvested energy during the UAV’s operation. All schemes with each mean value of harvested energy, in
Figure 8, have the same total amount of energy consumption in
time slots. In the paper, energy efficiency is defined as the sum data rate over the total harvested energy during the UAV’s operation. As a consequence, the curves in
Figure 8 can be interpreted as the sum-rate according to energy consumption.
In order to explore the behavior in terms of transmission power by the UAV, in
Figure 9, we plot the statistics of the actions in the POMDP scheme, the actor-critic scheme, the Myopic-NOMA scheme, and the Myopic-OMA scheme over 200,000 time slots. The notation
represents the transmission mode with a level of
where
is the level of transmission energy. We can see in
Figure 9 that the Myopic-NOMA scheme and the Myopic-OMA scheme tend to choose the highest transmission power in order to maximize the instant reward. The statistics of the selected actions in these two myopic schemes are similar, but the achievable reward of the NOMA scheme is higher than that of the OMA scheme owing to the NOMA technique. However, because the harvested energy is limited, using too much energy in one time slot may cause an energy shortage, during which the UAV has to stay silent for many subsequent time slots, which lowers the data rate of the system. In contrast, assigning an appropriate amount of transmission energy in each time slot gives the UAV more chances to stay active and transmit data to the GUs under the environment dynamics, so that the long-term data rate can be maximized.
In
Figure 10, we plot the sum data rate according to different values of caching capacity. The curves show that the system performance is enhanced if the UAV has a higher caching capacity. Obviously, with a larger value of
, the UAV can store more items from the LS, and then, the probability that the GUs’ requests are in the cached content of the UAV will increase, which leads to the higher data transmission rate. On the other hand, we can see that the higher
also brings higher performance of the system. The reason is that the GUs will more frequently request their own items of interest during the time slots.
Figure 11 and
Figure 12, respectively, show the impact of noise variance at the GUs and the effect of the altitude of the UAV on the system reward. We can see that system performance notably declines as the noise power at the ground users (or the altitude of the UAV) grows. This is because higher noise power lowers the throughput received at each GU, while a greater distance between
F and the GUs will increase path loss during data transmissions.
Finally, we further investigated the joint effect of both the number of items,
K, in the library, and caching capacity
in the UAV on the system data rate.
Figure 13 indicates that the system reward will increase with an increment in the ratio of
over
K. For example, if the number of items is
, the data rate of the system will go up when increasing caching capacity
. Furthermore, the results of the POMDP-based and actor-critic schemes are superior to those of the Myopic-NOMA scheme. The reason is that the proposed POMDP scheme exploits prior information on the harvested energy distribution and on the request model of the GUs to calculate the possible situations and their corresponding probabilities. The actor-critic method instead gathers this information by interacting directly with the environment and learns the optimal policy through trial and error. Consequently, the next state of the system can be predicted, and the UAV can efficiently allocate transmission power to the GUs based on NOMA and caching technologies with long-term operation in mind. Overall, the presented numerical results validate the effectiveness of the proposed approaches across the various network parameters considered in this paper.