1. Introduction
In the last few years, the Internet of Things (IoT) has been rapidly growing across the world, which enables users to experience various environmental monitoring services [
1]. In these services, the monitoring server periodically updates its service (e.g., a traffic and air quality monitoring map) by analyzing the received fresh data (e.g., the air quality and traffic intensity on each location) from sensor nodes (SNs) during the service update cycle [
1,
2,
3]. To provide high-quality environmental monitoring services, fresh data need to be collected steadily from as many SNs as possible [
1,
4]. Nonetheless, gathering consistent data from numerous SNs is challenging because of their limited battery capacity. To address this issue, various systems have incorporated intermediate nodes capable of relaying data and/or transmitting RF energy [
5,
6]. However, relaying through intermediate nodes can still be inefficient because of numerous obstacles and significant propagation loss. Deploying intermediate nodes ultra-densely can yield substantial benefits, such as energy savings, but it significantly increases the deployment costs. To mitigate this problem, a number of approaches (e.g., directional antenna transmission [
7], scheduling [
8,
9], and energy waveform optimization [
10,
11]) have been investigated in the literature. One of the most promising solutions is introducing unmanned aerial vehicles (UAVs) [
12,
13,
14,
15,
16,
17,
18,
19,
20,
21] as intermediate nodes to collect data and/or transfer energy, because their mobility allows UAVs to shorten the transmission distance and establish line-of-sight (LoS) links. These solutions fall into two main categories: (1) UAV-based data collection [
12,
13,
14,
15,
16]; and (2) UAV-based wireless power transfer [
17,
18,
19,
20,
21]. However, as these studies did not jointly optimize policies for both data collection and wireless power transfer via a UAV, achieving high service quality becomes challenging.
To address this issue and achieve high service quality, we introduce a UAV-based environmental monitoring system that jointly optimizes the UAV’s data collection, wireless energy transfer, and trajectory. In this system, the UAV navigates the specified region, and SNs intermittently send their collected data to the gateway (GW) or the UAV, depending on their respective distances to the GW and the UAV. To optimize the service quality of the environmental map (i.e., to ensure sufficient fresh data without depleting the energy of the SNs), the UAV periodically makes three types of decisions: (1) where to move, (2) whether to relay or aggregate data from SNs, and (3) whether to transfer energy to SNs. Because the decision space is very large, conventional optimization techniques are difficult to apply to this problem. We therefore introduce a deep reinforcement learning (DRL)-based UAV operation decision algorithm named DeepUAV. In DeepUAV, the controller commands the decisions of the UAV (i.e., trajectory, data relay, and wireless power transfer) by taking into account the transmission distance between the UAV and GW, the freshness of the aggregated data in the UAV, and the energy levels of the UAV and SNs. The controller learns continuously online, refining the three kinds of decisions through trial and error. Specifically, the information derived from state observations and learning experiences is integrated into a nonlinear function approximator (i.e., a neural network), which is trained iteratively to provide the controller with near-optimal decisions for each state.
To evaluate and compare the service quality of DeepUAV with that of the related works, we perform an event-driven simulation in an environment where the general crowdsensing map application is requested. From the evaluation results, we demonstrate that DeepUAV can increase the service quality by up to compared to the trajectory and wireless power transfer optimal scheme because DeepUAV simultaneously optimizes three types of UAV operations based on the given environment.
The main contributions of this paper are three-fold: (1) We jointly optimize the trajectory, data relay, and wireless power transfer of the UAV so that a sufficient amount of fresh data can be collected, leading to high service quality in environmental monitoring systems. In contrast, previous works focused on optimizing individual UAV operations, such as the trajectory, data relay, or wireless power transfer. (2) Because DRL can handle high-dimensional decision-making problems, DeepUAV can be implemented in practical systems. (3) Extensive evaluation results are presented and analyzed across diverse environments, offering insights for the development of systems involving UAV-based data collection and wireless power transfer.
The remainder of this paper is structured as follows. Section 2 provides a summary of the related works, and Section 3 presents the system model. Section 4 details the development of DeepUAV. Then, the evaluation results are given in Section 5, followed by the concluding remarks in Section 6.
2. Related Works
To improve the service quality in an environmental monitoring system, a number of works exploiting a UAV have been reported in the literature [
12,
13,
14,
15,
16,
17,
18,
19,
20,
21], which can be categorized into (1) UAV-based data collection [
12,
13,
14,
15,
16] and (2) UAV-based wireless power transfer [
17,
18,
19,
20,
21].
Dai et al. [
12] presented a centralized solution based on DRL, focusing on regulating the UAV trajectory and scheduling SNs for data collection. The primary goal is to reduce UAV energy usage while maintaining data freshness. Li et al. [
14] proposed a scheme known as the DQN-based Flight Resource Allocation Scheme (DQN-FRAS) to minimize network data loss by determining the UAV data trajectory and data collection schedule. Sun et al. [
15] presented an Age of Information (AoI)- and energy-aware data collection scheme tailored for UAV-assisted IoT networks, focusing on minimizing the weighted sum of the expected average AoI and UAV propulsion energy. Liu et al. [
13] introduced a distributed control framework for maximizing energy efficiency, the data collection ratio, geographic fairness, and minimizing energy consumption through UAV trajectory planning with charging stations. Dai et al. [
16] introduced a UAV crowdsensing framework that prioritizes both minimizing energy consumption across all UAVs and ensuring data freshness while also aiming to maximize the collected data amount and geographical fairness simultaneously.
Yang et al. [
17] proposed a novel UAV-enabled hybrid communication system jointly optimizing the UAV’s transmission power and trajectory and the SNs’ transmission power and duration to maximize the total energy efficiency (EE) of the SNs. Suman and De [
18] analyzed the performance of UAV-aided RF energy transfer and provided a closed-form expression for the power received from a UAV. Jiang [
19] analyzed the degradation of the received power at SNs caused by inappropriate movement of the UAV in UAV-assisted wireless information and energy transfer systems. Liu et al. [
20] developed closed-form expressions for the energy outage probability and rate outage probability, formulating an optimization problem aimed at minimizing the overall outage probability of SNs. Nguyen et al. [
21] explored a simultaneous wireless power transfer and information transmission scheme with reconfigurable intelligent surface (RIS)-assisted UAV communication, optimizing the UAV trajectory, power allocation, and phase-shift matrix of the RIS.
Although these related works improve service quality by formulating policies for either data collection from SNs or wireless power transfer, service quality can be enhanced further if the policies for data collection and wireless energy transfer are determined jointly. For instance, consider a situation where the trajectory and data relay are not jointly optimized. In this situation, even when the UAV is instructed to move closer to the GW (i.e., even though the UAV will soon be able to relay the data with lower energy consumption), the UAV may relay the data to the GW from its current location, which can degrade the service quality by depleting the energy of the UAV. To obtain higher service quality than the related works, we introduced a UAV system that jointly optimizes the trajectory, data relay, and wireless power transfer in [
22] and extend it in this paper. In [
22], the proposed system follows a static data relay policy that transmits data to the GW after a certain number of data are aggregated, which limits the achievable service quality. Thus, we have extended the system in [
22] to dynamically determine the policy according to the environment.
3. UAV-Based Environmental Monitoring System
The system model in this paper is illustrated in
Figure 1. It comprises two planes: (1) the ground plane; and (2) the UAV plane. Each plane is subdivided into multiple locations, and each location is identified by
l. In the ground plane,
I SNs are strategically deployed and equipped with distinct battery capacities. Let
denote the location of SN
i and
represent the battery capacity of SN
i. Additionally, a fixed charger for the UAV is positioned at the charging location identified as
on the ground plane. It is assumed that the monitoring server and the UAV controller are co-located at the gateway (GW).
As depicted in
Figure 2, we focus on the environmental monitoring services (e.g., traffic and air quality monitoring services). In this service, the monitoring server aggregates the data (e.g., traffic and air quality intensity) from SNs at the location of interest (LOI) during the service update cycle
. The environmental monitoring map is then updated at the end of each cycle [
1,
2,
3,
23]. We assume that the data sensed by the SNs change dynamically over time.
Specifically, as shown in
Figure 2, when it is assumed that the monitoring server periodically updates the environmental map every 5 time slots (i.e., service update cycle
is 5), the monitoring server collects the data from SNs during 5 time slots. Then, at the end of the service update cycle, the server updates the environmental map by analyzing the collected data (e.g., predicting the air quality for the air quality map).
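The timing of the service update cycle described above can be sketched as follows. This is a minimal illustration only: the 0-based slot numbering and the function names `cycle_of`, `is_update_slot`, and `data_per_cycle` are assumptions made for this sketch and do not come from the system model.

```python
from collections import Counter


def cycle_of(slot, cycle_len):
    """Index of the service update cycle that a 0-based time slot belongs to."""
    return slot // cycle_len


def is_update_slot(slot, cycle_len):
    """True for the last slot of a cycle, where the map would be refreshed."""
    return (slot + 1) % cycle_len == 0


def data_per_cycle(arrival_slots, cycle_len):
    """Count how many data items the server received in each cycle."""
    return dict(Counter(cycle_of(s, cycle_len) for s in arrival_slots))


# Data arriving in slots 0, 1, 4 belong to cycle 0; slots 5, 9 to cycle 1; slot 12 to cycle 2.
print(data_per_cycle([0, 1, 4, 5, 9, 12], cycle_len=5))  # {0: 3, 1: 2, 2: 1}
```

With a cycle of 5 time slots, the map would be refreshed at slots 4, 9, 14, and so on, using only the data counted for the cycle that has just ended.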
Intuitively, if the monitoring server acquires more data during the service update cycle, it can lead to higher service quality, as discussed in [
1]. To achieve this, the controller directs the operations of the UAV, including the movement, data relay, and wireless power transfer. Subsequent subsections provide detailed descriptions of the operations of both the sensor nodes (SNs) and the UAV.
3.1. Operations of SN
On the ground plane, each SN collects and/or senses its designated target (e.g., air condition and temperature) by expending its energy . Consequently, when the SN runs out of energy, it becomes incapable of collecting and/or sensing the target. After sensing the target, the SNs transmit the data to the GW or UAV, considering the distances to both the GW and the UAV. If the UAV emits a short signal pulse, the SNs can estimate the distance to the UAV by analyzing the received signal power. Additionally, because the GW and SNs are stationary, determining the distance between them is straightforward. In particular, when the distance to the GW is shorter than that to the UAV, the SN sends the sensed data directly to the GW. Conversely, if the distance to the GW is longer than that to the UAV, the SN transmits the data to the UAV. This strategy can reduce the energy consumption for data transmission.
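The destination rule above can be sketched in a few lines. The coordinate representation, the function name `choose_destination`, and the tie-breaking rule (ties go to the GW) are assumptions for illustration; the text does not specify what happens when the two distances are equal.

```python
import math


def choose_destination(sn_xyz, gw_xyz, uav_xyz):
    """Return "GW" or "UAV", whichever is closer to the sensor node.

    Ties are resolved in favour of the GW (an assumption of this sketch).
    """
    d_gw = math.dist(sn_xyz, gw_xyz)
    d_uav = math.dist(sn_xyz, uav_xyz)
    return "GW" if d_gw <= d_uav else "UAV"


# The SN is 30 m from the GW on the ground, while the UAV hovers 10 m above the SN.
print(choose_destination((0, 0, 0), (30, 0, 0), (0, 0, 10)))  # UAV
```

In a deployment, the distance to the UAV would come from the received-signal-power estimate mentioned above rather than from known coordinates.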
To transmit the sensed data with the minimum energy consumption, it is assumed that SNs use the minimum transmission power
to achieve the desired bit error rate. The minimum transmission power can be calculated as
[
24] where
h and
d are the channel coefficient and the distance between the source node and the destination node, respectively, and
is the receiving minimum power at the destination node. On the other hand,
h is defined as
where
and
denote the large-scale fading effect and the small-scale fading, respectively. In particular,
is
, where
and
c represent the carrier frequency and the speed of light, respectively.
and
denote the path loss exponent and the additional path loss. Therefore, the energy consumption of the SNs for data transmission increases in accordance with the distance.
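To make the dependence on distance concrete, the following sketch computes a minimum transmission power under an assumed channel model. The exact expressions of the system model are not reproduced here. The sketch assumes a common log-distance form in which the large-scale loss is (4π f d / c) raised to the path loss exponent, multiplied by an additional loss given in dB, and in which the minimum transmission power is the required receive power divided by the channel power gain. All function and parameter names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def large_scale_gain(distance_m, carrier_hz, path_loss_exp, extra_loss_db):
    """Assumed large-scale channel power gain (linear scale, at most 1 in practice)."""
    loss = (4.0 * math.pi * carrier_hz * distance_m / SPEED_OF_LIGHT) ** path_loss_exp
    loss *= 10.0 ** (extra_loss_db / 10.0)  # additional path loss, dB to linear
    return 1.0 / loss


def min_tx_power(rx_min_w, distance_m, carrier_hz,
                 path_loss_exp=2.0, extra_loss_db=0.0, small_scale_gain=1.0):
    """Smallest transmit power (W) that still delivers rx_min_w at the receiver."""
    gain = large_scale_gain(distance_m, carrier_hz, path_loss_exp, extra_loss_db)
    return rx_min_w / (gain * small_scale_gain)


# Doubling the distance with a path loss exponent of 2 quadruples the required power.
p_near = min_tx_power(1e-9, 10.0, 2.4e9)
p_far = min_tx_power(1e-9, 20.0, 2.4e9)
print(p_far / p_near)
```

Under these assumptions the required power grows polynomially with distance, which is why sending to the nearer of the GW and the UAV saves energy.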
Meanwhile, upon receiving data from the SN, the application server dispatches an acknowledgment message to the SN. Subsequently, the SN senses its target again before transmitting the new data to either the UAV or GW. The energy status of the SN is included in the packet that carries the sensed data.
3.2. Operations of UAV
On the UAV plane, the UAV maintains a constant altitude denoted as
z. In practical systems, the altitude is typically set to the minimum level that avoids obstacles, such as buildings and trees, without the need for frequent descent and ascent [
25]. The operations of the UAV fall into three categories: (1) movement; (2) data relay; and (3) wireless power transfer. In other words, the controller makes determinations and issues commands for these operations to the UAV at each time epoch.
For the movement operation, the UAV possesses the capability to either hover in its current position or traverse to an adjacent location, utilizing energies and , respectively. Meanwhile, a fixed charging station for the UAV is stationed at a specific location identified as . Staying in proximity to the charging location enables the UAV to undergo frequent charging and thus it can sustain operations without energy depletion. However, this scenario, where the UAV remains close to the charging location, poses a challenge for SNs deployed far from the charging point. In such instances, these SNs cannot receive assistance from the UAV, i.e., they are unable to transmit sensed data to the UAV or be charged by it. Consequently, the lifetime of the SNs is expected to be shortened, resulting in inadequate data collection. To prevent this situation, the decision regarding the movement operation of the UAV must account for the energy levels of both the UAV and SNs.
As a relay node between the SNs and GW, the UAV can aggregate data. If the UAV relayed data to the GW immediately upon receiving it from the SNs, its energy could be depleted rapidly. To avoid this, the UAV can hold the received data and wait until a certain amount of data has accumulated and/or until the UAV is close to the GW. At that point, the UAV can deliver the aggregated data at once in a single packet. However, if the data are aggregated for too long, the current service update cycle may end before the data are transmitted to the GW, rendering the aggregated data obsolete. Therefore, the decision regarding the data relay operation of the UAV should consider both the energy level of the UAV and the time remaining in the service update cycle.
To facilitate efficient wireless power transfer, we assume that the UAV uses a directional antenna with a fixed angle [
26]. Specifically, the UAV at location
i in the UAV plane can transfer the energy to only the SN at location
i in the ground plane, and this power transfer consumes the energy
. It is unnecessary to conduct wireless power transfer to SNs that already possess sufficient energy. Additionally, if the UAV is positioned far from its charging location and excessively engages in wireless power transfer, it may face challenges returning to the charging location due to energy depletion. In summary, the decision regarding the wireless power transfer should consider the current location of the UAV and the energy levels of the UAV and SNs, jointly.
To simultaneously optimize the three types of operations (i.e., movement, data relay, and wireless power transfer), a DeepUAV algorithm is introduced in the subsequent section. In DeepUAV, the controller undergoes continuous online learning, refining the UAV’s policies (i.e., sequences of operations) through trial and error. More precisely, state observation information and learning experiences are integrated within a nonlinear function approximator (i.e., a neural network). This neural network is iteratively trained, providing the controller with near-optimal decisions for each state.
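The per-epoch decision of the controller can be viewed as one joint action that combines a movement, a relay decision, and a power transfer decision. The sketch below enumerates such joint actions. The 4-neighbour grid, the move names, and the helper names are assumptions for illustration; the text only states that the UAV hovers or moves to an adjacent location.

```python
from itertools import product

# Assumed 4-neighbour grid; the text only says "an adjacent location".
MOVES = {
    "hover": (0, 0),
    "north": (0, 1),
    "south": (0, -1),
    "east": (1, 0),
    "west": (-1, 0),
}


def joint_actions():
    """All (move, relay_now, transfer_energy) triples for one time epoch."""
    return list(product(MOVES, (False, True), (False, True)))


def valid_actions(x, y, width, height):
    """Keep only the joint actions whose move stays inside the width x height grid."""
    allowed = []
    for move, relay_now, transfer_energy in joint_actions():
        dx, dy = MOVES[move]
        if 0 <= x + dx < width and 0 <= y + dy < height:
            allowed.append((move, relay_now, transfer_energy))
    return allowed


print(len(joint_actions()))               # 20
print(len(valid_actions(0, 0, 10, 10)))   # 12, since south and west leave the grid
```

Treating the three decisions as one action is what lets a single learned policy trade them off against each other, rather than tuning each operation separately.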
5. Evaluation Results
For the performance evaluation, we compare the proposed algorithm, DeepUAV, with the following three schemes: (1) No-UAV, where there is no UAV to relay data from SNs or to transfer energy to SNs; (2) DeepCharge, which follows the optimal policy on the wireless energy transfer and the movement of the UAV while adhering to the fixed policy of always relaying data (i.e., always
); and (3) DeepPush, which follows the optimal policy on the data relay and the movement of the UAV while adhering to the fixed policy of no wireless energy transfer (i.e., always
). Note that DeepCharge and DeepPush are designed by generalizing the UAV-based data collection schemes and UAV-based wireless power transfer schemes in
Section 2.
For the performance evaluation, we consider a
plane (i.e., 100 locations) where 50 SNs are randomly deployed. In addition, the UAV charger and GW are located at the center of the locations. Also, we consider the general crowdsensing map application. This application generates the environment map (e.g., temperature map) in the urban area based on the aggregated SNs’ data (e.g., temperature information) [
1].
The energy consumption parameters are set as follows. First, we normalize the energy consumption for sensing data
as 1 J. Then, comparing the complexity of the operations,
,
,
,
, and
are set to 1, 1, 10,
, and 100 J. Also, the maximum battery capacities of the UAV and SN,
and E[
], are set to 500 and 70 J, respectively. The service update cycle and a sufficient number of data from each SN for the current service update cycle,
and
, are set as 5 and 2, respectively. We consider the average of the service quality based on Equation (
12) during the service operation time as the performance metric.
Algorithm 1 Deep reinforcement learning (DRL)-based UAV operation decision algorithm.
Initialize the replay memory D
Initialize the deep Q-network with its weights
Initialize the target deep Q-network with its weights
for each episode do
    Obtain the initial state
    for each time epoch do
        Randomly draw a probability value p
        if p is below the exploration threshold then
            Select a random action
        else
            Select the action with the highest Q-value in the current state
        end if
        Command the action to the UAV, and obtain the next state and the reward
        Put the experience into replay memory D
        Sample a set of experiences from replay memory D
        Compute the target value for each sampled experience
        Perform a gradient descent step on the loss with respect to the weights of the deep Q-network
        Every K steps, update the target deep Q-network with the weights of the deep Q-network
    end for
end for
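The structure of Algorithm 1 (replay memory, random exploration, target network, and periodic synchronization) can be illustrated with a small runnable sketch. It is not the DeepUAV implementation. The environment is a hypothetical five-state chain standing in for the UAV environment, a linear approximator over one-hot features replaces the neural network, ties between equal Q-values are broken at random, and every hyperparameter value is an assumption.

```python
import random
from collections import deque

import numpy as np

N_STATES, N_ACTIONS = 5, 2            # toy chain: action 0 = left, 1 = right
GAMMA, EPSILON, LR = 0.9, 0.1, 0.1    # assumed hyperparameters
BATCH, SYNC_EVERY, HORIZON = 16, 20, 50


def one_hot(state):
    features = np.zeros(N_STATES)
    features[state] = 1.0
    return features


def step(state, action):
    """Toy stand-in environment: reward 1 when the right end of the chain is reached."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, seed=0):
    random.seed(seed)
    weights = np.zeros((N_ACTIONS, N_STATES))   # online Q-function (linear)
    target = weights.copy()                     # target Q-function
    memory = deque(maxlen=1000)                 # replay memory D
    steps = 0
    for _ in range(episodes):
        state = 0
        for _ in range(HORIZON):
            # Exploration: random action with probability EPSILON, else greedy.
            if random.random() < EPSILON:
                action = random.randrange(N_ACTIONS)
            else:
                q = weights @ one_hot(state)
                best = np.flatnonzero(q == q.max()).tolist()
                action = random.choice(best)    # random tie-breaking
            nxt, reward, done = step(state, action)
            memory.append((state, action, reward, nxt, done))
            # Sample experiences and take one gradient step per sample.
            batch = random.sample(list(memory), min(BATCH, len(memory)))
            for s, a, r, s2, d in batch:
                y = r if d else r + GAMMA * np.max(target @ one_hot(s2))
                td_error = y - weights[a] @ one_hot(s)
                weights[a] += LR * td_error * one_hot(s)  # step on 0.5 * (y - Q)^2
            steps += 1
            if steps % SYNC_EVERY == 0:
                target = weights.copy()         # periodic target update
            state = nxt
            if done:
                break
    return weights


if __name__ == "__main__":
    q_table = train()
    print(np.argmax(q_table[:, :N_STATES - 1], axis=0))  # greedy action per non-terminal state
```

In DeepUAV, the state would instead encode quantities such as the UAV location, the energy levels, and the freshness of the aggregated data, and the action would be the joint movement, relay, and power transfer decision.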
5.1. Convergence Process Analysis
Figure 4a,b represent the service quality and the training loss during the training phase, respectively. In each epoch, 50 iterations are conducted. From
Figure 4a,b, it can be seen that, except for the initial epochs, the service quality increases and the training loss decreases as training proceeds. This indicates that DeepUAV effectively learns how the selected UAV operations affect the achievable service quality. Meanwhile, in the initial training phase (i.e., epochs < 100), high training loss and low service quality are obtained because DeepUAV has little experience of how the selected UAV operations affect the service quality.
5.2. Effect of I
Figure 5 represents the effect of the number of SNs
I on the service quality. In this simulation result, DeepUAV shows the enhanced service quality by
when it is compared to DeepPush and DeepCharge. Especially, in the default simulation setting, DeepUAV demonstrates increased service quality by up to
. From
Figure 5, it can be found that DeepUAV has the highest service quality regardless of
I. This is because DeepUAV simultaneously optimizes three types of decisions (i.e., the decision on where to move, decision on whether to relay or aggregate data, and decision on whether to transfer energy).
Meanwhile, as shown in
Figure 5, the service qualities of all the schemes increase as
I increases. This is because more SNs can transmit their sensed data for the target application and more data indicate a higher service quality.
5.3. Effect of the Number of Locations
Figure 6 represents the effect of the number of locations in the ground plane,
, on the average service quality. DeepUAV increases its service quality by up to
compared to DeepPush and DeepCharge. From
Figure 6, it can be observed that the service qualities of all the schemes are degraded as the number of locations
increases. This is because, when
increases, the SNs are more sparsely deployed in the ground plane and thus need more energy to transmit their data to the GW. In addition, the UAV needs to travel longer distances to help the SNs, which increases the energy consumption of the UAV.
5.4. Effect of the Service Update Cycle
Figure 7 represents the effect of the service update cycle
on the service quality. In this simulation result, DeepUAV demonstrates that it can enhance the service quality by up to
compared to DeepPush and DeepCharge. When
is set to a short value, SNs need to frequently sense the target data and report the sensed data to the monitoring server. In this situation, SNs consume more energy within a short duration, which can cause the energy depletion of SNs. Thus, as shown in
Figure 7, the service qualities of all the schemes decrease as
decreases.
Meanwhile, from
Figure 7, it can be found that DeepPush has almost the same service quality as No-UAV when the service update cycle is short (i.e.,
is 1, 2, or 3). This is because, when a service update cycle is short, most SNs directly transmit their data to the GW before the UAV approaches. That is, the effectiveness of introducing the UAV in DeepPush is not significant.
5.5. Effect of the Battery Capacity of SNs
Figure 8 shows the effect of the average battery capacity of SNs,
. DeepUAV can enhance the service quality by up to
compared to DeepPush and DeepCharge. From
Figure 8, the service qualities of all the schemes increase as the battery capacity of the SNs increases. This is because SNs having more battery capacity can sense and transmit more data to the GW for a longer time.
Meanwhile, when SNs have a lower battery capacity, charging the SNs to prevent their energy depletion is more important than reducing the transmission energy consumption via the UAV relay operation. Therefore, when SNs have a low battery capacity (i.e., ), the service qualities of DeepCharge and DeepUAV are better than that of DeepPush.