2. Related Work
Owing to the unique merits and intriguing applications of UAV-NOMA, many related works have been published recently. Table 1 summarizes the related work and highlights the major contributions of each. In [23], the authors optimized the altitude and PA of NOMA-based UAVs to maximize the achievable sum rate for multiple users via NOMA user-rate gains. However, enhancing spectral and energy efficiency is imperative for achieving the maximum sum rate from UAV-enabled communications. In addition, UAV deployment and power allocation schemes were developed in [24] to improve the performance of a UAV-NOMA network. To maximize network sum rates, the PA for NOMA is optimized based on the ideal location of the UAV. Moreover, in [25], UAV trajectory planning and PA optimization were used to operate multiple UAV Base Stations (BSs) at a minimum average rate. This was performed via NOMA without considering the battery consumed by UAV movement. With the aid of trajectory planning and PA, the authors of [26] were able to maximize the throughput of a UAV relay system. Nevertheless, they considered fading channel and fixed sensor scenarios, not mobile ones. Furthermore, the authors of [27] optimized the broadcast power allocation, ground customer, and UTP in order to maximize the minimum achievable rate in the downlink NOMA scenario.
Recently, UAV-NOMA wireless communication issues have been solved using MABs due to their distinctive benefits. Thus, the authors of [5] mitigated aerial-ground interference in cellular-connected UAV communications via uplink NOMA from the UAV to cellular BSs while sharing the spectrum with existing ground users. The authors of [6] evaluated the performance of a NOMA-aided, Reconfigurable Intelligent Surface (RIS)-assisted hybrid Radio Frequency (RF)-Underwater Optical Wireless Communication (UOWC) system. In [7], the authors investigated NOMA-enhanced dual-hop hybrid communication systems with a decode-and-forward relay. Additionally, they proposed a power allocation optimization technique for achieving outage-optimal performance. Furthermore, the authors of [10] applied UAV-NOMA to construct high-capacity IoT uplink transmission systems, optimizing the sub-channel assignment and the uplink transmit power of IoT nodes. Moreover, the authors of [12] proposed a UAV-assisted NOMA network, where the UAV and BS collaborate to serve ground users simultaneously, maximizing the sum rate by jointly optimizing the UAV trajectory and NOMA precoding.
Also, the authors of [28,29] proposed a MAB solution for the UAV-NOMA system that faces joint resource allocation and power control problems. Using the proposed solution, distributed resource allocation and power levels can be selected by the customers/survivors. The authors of [30] proposed a deep reinforcement learning algorithm for trajectory planning of UAV-aided mobile edge computing. In [31], the optimal UAV placement was treated as a MAB problem and determined in order to achieve the network’s maximum sum rate. The MAB problem was solved using the UCB scheme. Also, in [32], a MAB-aided solution that finds the ideal UTP and PA was used to enhance the network’s sum rate. Furthermore, the authors of [33] proposed solutions that follow UCB principles for stochastic MABs. In particular, a new exploration policy was implemented in order to learn resource-efficient scheduling algorithms. Subsequently, the coauthors of this work proposed in [34] the use of MAB schemes to optimize UAV energy consumption in disaster area scenarios. They used UCB algorithms to solve the UAV optimization problem and find the ideal trajectory without NOMA. However, to the best of our knowledge, BMABs have not been exploited in the UTP-PA problems of UAV-NOMA despite their practical aspects, which motivates this work.
Unlike the aforementioned works, we propose BUCB1/BUCB2 schemes that maximize the data rate and optimize the UTP to cover all customers/survivors while minimizing battery consumption via efficient power allocation. Both algorithms have the same exploitation behavior, which is the ratio of the observed rewards to the arms’ costs. However, they manage exploration differently. BUCB1 assumes prior knowledge of the minimum expected costs of all actions/arms (i.e., survivor locations are known prior to estimation), whereas BUCB2 estimates these costs from previous observations (i.e., survivor locations are unknown). Numerical simulations demonstrate the effectiveness of our proposed BMAB solutions in terms of the total number of assisted survivors, energy consumption, and convergence speed compared to benchmark solutions.
4. Envisioned BMAB Techniques
The problem of allocating resources (power and time) to different clusters in order to increase the number of customers/survivors served while reducing UAV battery consumption can be formulated as a BMAB problem. In this case, the UAV is the bandit player, the clusters are the bandit arms, and the UAV’s power allocation and flight time are the resources to be allocated, i.e., the budget. The reward in this problem is the number of customers/survivors served, and the cost is the UAV battery consumption. In BMABs, the goal is to balance the trade-off between exploration and exploitation while minimizing cost. Accordingly, the UAV needs to explore different clusters to gather information about the number of survivors and the battery consumption, while also exploiting the knowledge gained to maximize the reward and minimize the cost [41].
One approach for solving this problem using BMABs is the Budget Upper Confidence Bound (BUCB) algorithm. In the BUCB algorithm, exploration and exploitation are balanced by selecting the arm with the highest UCB for the reward, which takes both the average reward and the estimated uncertainty into account [42]. The algorithm also includes a budget constraint to ensure that the UAV’s power and flight time do not exceed their maximum limits. Another approach is the linearly constrained bandit algorithm, which balances exploration and exploitation while taking the budget constraints into account. This algorithm uses a linear model to approximate the expected rewards and costs of each arm and solves a linear program in each round. It is worth noting that solving BMAB problems is not straightforward and is computationally expensive. Moreover, these are approximate solutions, so the actual results might not be optimal. The success of the solution also depends on the quality of the approximation used and the assumptions made about the underlying system.
4.1. Proposed UCB Algorithm
UCB is one of the most well-known bandit algorithms for balancing the exploration-exploitation trade-off [31,42]. The balance between exploration and exploitation is continually updated as the algorithm gathers more data about the environment. It first explores all arms; once every arm has been tried the minimum number of times, it exploits the arm with the highest calculated payoff. Applying this to the UTP-PA problem, the player/UAV first selects each arm/cluster once and then follows the UCB policy. Hence, at every trial $t$, the player draws an arm/cluster $k_t$ according to the following formula:
$$k_t = \arg\max_{k} \left( \bar{x}_{k,t} + \sqrt{\frac{2 \ln t}{n_{k,t}}} \right),$$
where $\bar{x}_{k,t}$ refers to the average reward (i.e., the number of aided customers/survivors) delivered from cluster $k$ up to trial $t$, and $n_{k,t}$ is the number of times arm/cluster $k$ has been selected. As a cluster is pulled more often, its confidence interval shrinks. Hence, the player/UAV attempts other, less-drawn arms/clusters as $\sqrt{2 \ln t / n_{k,t}}$ decreases for the frequently pulled ones. By exploiting the cluster with the highest past payoff, the player/UAV is able to gain the maximum allowable reward.
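The selection rule described in this subsection can be sketched in a few lines of Python. This is a minimal illustration of the standard UCB1 index (average reward plus a $\sqrt{2 \ln t / n}$ exploration bonus), not the authors' simulation code; the function names `ucb1_select` and `run_ucb1` and the reward callbacks are hypothetical, with each callback standing in for the number of survivors aided when a cluster is visited.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Return the arm/cluster with the largest UCB1 index at trial t."""
    # Initial phase: every arm is tried once before the index is used.
    for k, n in enumerate(counts):
        if n == 0:
            return k
    scores = [
        means[k] + math.sqrt(2.0 * math.log(t) / counts[k])
        for k in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(reward_fns, horizon, seed=0):
    """Run UCB1 for `horizon` trials; reward_fns[k](rng) returns a reward."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms   # number of times each cluster was selected
    means = [0.0] * n_arms  # running average reward per cluster
    for t in range(1, horizon + 1):
        k = ucb1_select(counts, means, t)
        reward = reward_fns[k](rng)
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]  # incremental mean
    return counts, means
```

A rarely visited cluster keeps a large bonus and is eventually retried, while a frequently visited one is ranked almost purely by its average reward.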
4.2. Proposed BUCB1/BUCB2 Algorithms
BMABs are classified into two primary categories: the pure exploration category, referred to as best arm identification, and the exploitation-exploration category [41]. In the first category, the budget is spent only on exploration in order to determine which arm is best, without any exploitation. In contrast, the BUCB1/BUCB2 algorithms belong to the second category, in which the budget covers both exploration and exploitation.
This allows us to illustrate how the joint UTP and PA problems of the UAV-NOMA system can be solved effectively using the BUCB1 and BUCB2 algorithms. In our considered scenario, the UAV cost is a random time variable that should be efficiently anticipated. Furthermore, it is important to reflect both the cost and the payoff of each arm in the exploration-exploitation trade-off using the BUCB1 and BUCB2 algorithms. There is a fundamental difference between the two algorithms in terms of how they manage exploration and exploitation [41].
The two proposed algorithms, BUCB1 and BUCB2, have the same exploitation component: the ratio of the payoff (i.e., the number of customers/survivors) to the cost (i.e., the UAV energy consumption). In BUCB1, the minimum cost of all arms is assumed to be known before the game starts, i.e., the locations of the survivors are known. On the other hand, BUCB2 eliminates this requirement by relying on previous observations to obtain estimated costs. The bounds of the proposed algorithms also differ: BUCB2 has a looser bound but a wider range of applicability than BUCB1, because the latter requires more prior knowledge.
In contrast to the UCB-based UTP-PA algorithm, which has no explicit stopping time, both BUCB1 and BUCB2 cease operation when the energy in the UAV’s battery is consumed, as long as the average payoff-to-energy ratio exploitation term remains the same. Hence, the UAV will choose/fly to the cluster with the highest payoff-to-energy ratio to assist.
Algorithm 1 summarizes the main steps of the BUCB1 and BUCB2 schemes. In BUCB1, a parameter represents the lower bound of the expected costs based on prior knowledge [41].
Algorithm 1: BUCB1/BUCB2 Algorithms.
Global Positioning System (GPS)-based localization makes it simple to obtain this prior knowledge. However, obtaining such knowledge may be difficult in other scenarios, for example if the GPS signal is lost or if GPS use heavily drains the battery of the customer/survivor handset.
Accordingly, BUCB2 estimates the expected energy cost of the dispersed survivors/customers at each time step by taking into account the previous energy observations from the clusters that were visited. Thus, BUCB2 derives both the minimum necessary cost expectation and the achievable payoffs from empirical observations, following [41].
Following that, the estimate is used to calculate the exploration term. Thus, unlike BUCB1, this method does not require prior knowledge and can be used in many applications. It is noteworthy that the BUCB2 equation in Algorithm (1) cannot be obtained by simply substituting the estimated lower bound for the known one in the BUCB1 equation.
As shown in Algorithm (1), the energy costs are determined by the number of survivors/customers assisted and the input of the next cluster. Algorithm (1) uses the average payoff of each cluster before step t, its average cost, the number of times it has been pulled before step t, and the index of the cluster pulled by the algorithm at time t. When a cluster is pulled many times, its confidence interval shrinks, causing its exploration term to decrease and the player/UAV to try other, less-drawn arms/clusters. The player exploits the cluster with the highest past payoff to gain the maximum allowable payoff.
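The overall loop shared by the two budgeted schemes can be illustrated with the sketch below. It is only a generic budgeted UCB loop under stated assumptions, not a reproduction of Algorithm 1: the exploitation term is the average payoff divided by the average cost, as described above, but the exploration bonus (a $\sqrt{\ln(t-1)/n}$ term scaled by the cost lower bound) is a simplified stand-in chosen for illustration. The names `budgeted_ucb`, `pull`, and `cost_floor` are hypothetical. Passing a known `cost_floor` mimics the BUCB1 setting, while `cost_floor=None` mimics the BUCB2 setting by using the smallest observed average cost.

```python
import math


def budgeted_ucb(pull, n_arms, budget, cost_floor=None):
    """Budget-limited UCB loop (illustrative, not the paper's Algorithm 1).

    pull(k) -> (reward, cost) for cluster k, with cost > 0.
    cost_floor: assumed known lower bound on expected cost (BUCB1-like);
                if None, it is estimated from observations (BUCB2-like).
    """
    counts = [0] * n_arms
    avg_reward = [0.0] * n_arms
    avg_cost = [0.0] * n_arms
    total_reward, spent, t = 0.0, 0.0, 0
    while True:
        t += 1
        if t <= n_arms:
            k = t - 1  # initial round: visit every cluster once
        else:
            floor = cost_floor if cost_floor is not None else min(avg_cost)

            def index(i):
                # Assumed bonus; it shrinks as cluster i is visited more.
                bonus = math.sqrt(math.log(t - 1) / counts[i]) / floor
                return avg_reward[i] / avg_cost[i] + bonus

            k = max(range(n_arms), key=index)
        reward, cost = pull(k)
        if spent + cost > budget:
            break  # stopping rule: the remaining energy cannot pay for this visit
        spent += cost
        total_reward += reward
        counts[k] += 1
        avg_reward[k] += (reward - avg_reward[k]) / counts[k]
        avg_cost[k] += (cost - avg_cost[k]) / counts[k]
    return total_reward, spent, counts
```

In this sketch, `pull(k)` would return the number of survivors aided in cluster `k` together with the energy spent flying to and serving it, and `budget` would be the battery energy E.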
5. Numerical Simulations
Herein, we evaluate the performance of the two proposed algorithms (BUCB1 and BUCB2) in the UAV-NOMA network, assuming equal power allocation within each cluster. Then, we compare their performance with that of a conventional UAV-OMA scenario. The survivors are deployed randomly within each cluster, and their traffic is assumed to follow a Binomial distribution B whose parameters are the number of survivors in the cluster and the on-demand radio access probability o, which equals 0.2 (Table 2).
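The traffic model above can be sketched as follows. The function name `active_survivors` and any cluster sizes passed to it are hypothetical; only the access probability of 0.2 comes from the text. Each survivor in a cluster independently requests access with that probability, which yields a Binomial count.

```python
import random


def active_survivors(n_survivors, access_prob=0.2, rng=random):
    """Draw the number of survivors requesting access in one cluster.

    The result follows Binomial(n_survivors, access_prob): each survivor
    independently requests radio access with probability access_prob.
    """
    return sum(1 for _ in range(n_survivors) if rng.random() < access_prob)
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.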
In this simulation, the following parameters are used. The UAV’s altitude is fixed at
, where a default bandwidth of 100 MHz is used for transmission. A maximum power of
is used by the UAV. Herein, X-NOMA corresponds to a NOMA scheme with equal power allocation for all survivors in each cluster, while X-NOMA PA denotes a NOMA scheme that uses the power allocation strategy in Equation (4), where X denotes the underlying bandit algorithm (UCB, BUCB1, or BUCB2).
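To make the difference between equal power allocation and a channel-aware power split concrete, the sketch below computes per-survivor downlink NOMA rates under ideal successive interference cancellation (SIC). It does not implement Equation (4); it uses the textbook NOMA rate expression, and the function name `noma_rates`, the channel gains, the noise power, and the power fractions are hypothetical values chosen only for illustration. Channel gains are assumed to be distinct.

```python
import math


def noma_rates(gains, power_fracs, total_power, noise_power):
    """Per-user downlink NOMA rates in bit/s/Hz with ideal SIC.

    gains: channel power gains of the users in one cluster.
    power_fracs: fraction of total_power assigned to each user.
    A user cancels the signals of all weaker users, so only the signals
    intended for stronger users remain as interference.
    """
    rates = []
    for g, a in zip(gains, power_fracs):
        stronger_share = sum(
            power_fracs[j] for j in range(len(gains)) if gains[j] > g
        )
        interference = stronger_share * total_power * g
        sinr = a * total_power * g / (interference + noise_power)
        rates.append(math.log2(1.0 + sinr))
    return rates


# Hypothetical two-survivor cluster: one strong and one weak channel.
gains = [1.0, 0.1]
equal_pa = noma_rates(gains, [0.5, 0.5], total_power=1.0, noise_power=0.1)
weighted_pa = noma_rates(gains, [0.2, 0.8], total_power=1.0, noise_power=0.1)
```

In this toy setting, giving the weaker survivor a larger share of the power raises that survivor's rate relative to the equal split, at the expense of the stronger one.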
Figure 2 shows the number of assisted survivors of the three compared algorithms (i.e., UCB, BUCB1, and BUCB2) for both UAV-NOMA and UAV-OMA with E = 1000 Joule (J) against the convergence time horizon. Comparing the convergence provides valuable insights for algorithm selection. The results highlight the benefit of exploiting NOMA transmission compared with OMA, as all three algorithms converge much faster when using NOMA. On the other hand, the results show that BUCB1 performs best, owing to its precise selection policy based on the survivors’ GPS locations. BUCB2 achieves lower performance since it has no access to the prior knowledge available to BUCB1. As shown in
Table 3, at
BUCB1, BUCB2, and UCB for UAV-NOMA achieve a higher number of aided survivors, by 109%, 133%, and 260%, respectively, compared to the same schemes in UAV-OMA.
Figure 3 shows the number of assisted survivors versus the UAV transmit power, ranging from 10 to 40 dBm, at E = 1000 J. For all compared schemes, the number of assisted survivors gradually increases as the power increases, especially for the UAV-NOMA-related schemes. The BUCB1-NOMA algorithm assists the highest number of survivors. As shown in
Table 4, at
dBm, BUCB1-NOMA, BUCB2-NOMA, and UCB-NOMA perform better than the corresponding OMA-based techniques by 92%, 74%, and 70%, respectively.
Figure 4 presents the number of assisted survivors versus the number of visited clusters at
J. For all compared schemes, the number of assisted survivors increases with the number of visited clusters. With a longer flight duration, achieved by minimizing battery consumption, the UAV transmits more information to the clusters. Specifically, the envisioned BUCB1 achieves the best performance, followed by BUCB2 and then UCB. Moreover, UAV-NOMA performs better than UAV-OMA. The overall number of survivors increases slightly with an increase in the number of clusters, especially for BUCB1 and BUCB2. A larger number of visited clusters leads to more flying power consumption by the UAV. As a result, the algorithms we offer have an effective energy management strategy. At
the number of assisted survivors for the compared schemes in the NOMA scenario is larger than in the other case because, with the larger battery capacity, the hovering and flying times in the area are longer. BUCB1 performs best, followed by BUCB2, in both battery capacity scenarios. BUCB1 knows the exact locations of the survivors via GPS. As shown in
Table 5, at 25 clusters the UAV-NOMA-BMAB schemes outperform UAV-OMA-BMAB by 27.47%, 24.02%, and 18.29% for BUCB1, BUCB2, and UCB, respectively.
Figure 5 presents the number of survivors/users versus the number of clusters for different NOMA power allocation schemes. The schemes using the power allocation strategy in Equation (4), denoted by X-NOMA PA, show better performance than their counterparts using equal power allocation, denoted by X-NOMA. This helps in focusing on exploring better channels. Table 6 reveals the percentage improvements for UCB, BUCB1, and BUCB2 when using the PA strategy in Equation (4), which are approximately 333%, 80%, and 153%, respectively, in the number of survivors at
in UAV-NOMA compared to UAV-OMA.
Figure 6 shows the effect of using the NOMA power allocation strategy in Equation (4), compared with the equal power allocation NOMA strategy, on the number of assisted survivors for the UCB, BUCB1, and BUCB2 algorithms at E = 1000 J over time. The results show that implementing the NOMA power allocation improves the performance significantly compared to the equal power allocation for all schemes. The NOMA PA is expected to outperform equal PA in UAV-NOMA scenarios, since adaptive PA improves resource efficiency under favorable channel conditions, increasing the number of assisted survivors over time. As shown in Table 7, the BMAB-NOMA PA techniques outperform the corresponding BMAB-NOMA techniques with equal PA by approximately 48.77%, 62.73%, and 50.49% for UCB, BUCB1, and BUCB2, respectively.
Figure 7 compares the UCB, BUCB1, and BUCB2 algorithms for UAV-NOMA under two power allocation scenarios: equal power allocation and NOMA power allocation. Equal power allocation provides fixed power to each survivor in the cluster, while NOMA power allocation allocates varying power levels based on channel conditions and QoS requirements, exploiting multi-survivor diversity. With increasing UAV transmit power, the number of assisted survivors may increase linearly, but limitations can arise due to varying channel conditions. The performance of the algorithms under NOMA PA depends on the channel conditions and the power allocation policies, and it influences the number of visited clusters. Table 8 illustrates distinct variations in the number of assisted survivors as the UAV transmission power increases. At
= 40 dBm, BUCB1 with NOMA PA achieves 92.15%, followed by BUCB2 with NOMA PA at 85.28%, and UCB with NOMA PA trailing at 73.32%. Finally, BUCB1-NOMA PA aims to achieve a more balanced allocation of power among the survivors in the cluster compared to UCB and BUCB2, ensuring a fair distribution of resources while maximizing the overall sum rate.
6. Conclusions
In this paper, we proposed a novel BMAB-based approach for power allocation and trajectory planning in UAV-NOMA-aided networks, which has shown promising results in optimizing the performance of such networks. Using BMAB algorithms, UAVs can efficiently allocate power and determine optimal trajectories based on available information and environmental feedback. BMABs balance exploration and exploitation and consider the UAV battery budget, leading to better decision-making and improved communication performance. Accordingly, we proposed two UCB-aided budgeted algorithms, i.e., BUCB1 and BUCB2, that effectively assisted more survivors in disaster area scenarios. In addition, UAV-NOMA networks can enhance spectral efficiency and support multiple clusters simultaneously. The proposed algorithms were used to allocate power and optimize UAV positioning/trajectory, further improving UAV network performance and making such networks more efficient and effective in various applications. BMAB-NOMA achieved remarkable improvements, with a 60% increase in assisted survivors, an 80% enhancement in convergence, and significant energy savings compared to the UAV-OMA solution. These findings underscore the effectiveness, efficiency, and potential of the BMAB approach to significantly improve UAV-NOMA power allocation performance. Future directions might include investigating UAV-NOMA in SDN-IoT fog networks with a multiplayer UAV scenario.